38 * {@code Character} contains a single field whose type is
39 * {@code char}.
40 * <p>
41 * In addition, this class provides several methods for determining
42 * a character's category (lowercase letter, digit, etc.) and for converting
43 * characters from uppercase to lowercase and vice versa.
44 * <p>
45 * Character information is based on the Unicode Standard, version 10.0.0.
46 * <p>
47 * The methods and data of class {@code Character} are defined by
48 * the information in the <i>UnicodeData</i> file that is part of the
49 * Unicode Character Database maintained by the Unicode
50 * Consortium. This file specifies various properties, including name
51 * and general category, for every defined Unicode code point or
52 * character range.
53 * <p>
54 * The file and its description are available from the Unicode Consortium at:
55 * <ul>
56 * <li><a href="http://www.unicode.org">http://www.unicode.org</a>
57 * </ul>
58 *
59 * <h3><a id="unicode">Unicode Character Representations</a></h3>
60 *
61 * <p>The {@code char} data type (and therefore the value that a
62 * {@code Character} object encapsulates) is based on the
63 * original Unicode specification, which defined characters as
64 * fixed-width 16-bit entities. The Unicode Standard has since been
65 * changed to allow for characters whose representation requires more
66 * than 16 bits. The range of legal <em>code point</em>s is now
67 * U+0000 to U+10FFFF; excluding surrogate code points, these are known as <em>Unicode scalar value</em>s.
68 * (Refer to the <a
69 * href="http://www.unicode.org/reports/tr27/#notation"><i>
70 * definition</i></a> of the U+<i>n</i> notation in the Unicode
71 * Standard.)
72 *
73 * <p><a id="BMP">The set of characters from U+0000 to U+FFFF</a> is
74 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
75 * <a id="supplementary">Characters</a> whose code points are greater
76 * than U+FFFF are called <em>supplementary character</em>s. The Java
77 * platform uses the UTF-16 representation in {@code char} arrays and
|
38 * {@code Character} contains a single field whose type is
39 * {@code char}.
40 * <p>
41 * In addition, this class provides several methods for determining
42 * a character's category (lowercase letter, digit, etc.) and for converting
43 * characters from uppercase to lowercase and vice versa.
44 * <p>
45 * Character information is based on the Unicode Standard, version 10.0.0.
46 * <p>
47 * The methods and data of class {@code Character} are defined by
48 * the information in the <i>UnicodeData</i> file that is part of the
49 * Unicode Character Database maintained by the Unicode
50 * Consortium. This file specifies various properties, including name
51 * and general category, for every defined Unicode code point or
52 * character range.
53 * <p>
54 * The file and its description are available from the Unicode Consortium at:
55 * <ul>
56 * <li><a href="http://www.unicode.org">http://www.unicode.org</a>
57 * </ul>
58 * <p>
59 * The code point U+32FF is reserved by the Unicode Consortium
60 * to represent the Japanese square character for the new era that begins
61 * in May 2019. Relevant methods in the {@code Character} class return the same
62 * properties as for the existing Japanese era characters (e.g., U+337E for
63 * "Meizi"). For details of the code point, refer to
64 * <a href="http://blog.unicode.org/2018/09/new-japanese-era.html">
65 * http://blog.unicode.org/2018/09/new-japanese-era.html</a>.
66 *
67 * <h3><a id="unicode">Unicode Character Representations</a></h3>
68 *
69 * <p>The {@code char} data type (and therefore the value that a
70 * {@code Character} object encapsulates) is based on the
71 * original Unicode specification, which defined characters as
72 * fixed-width 16-bit entities. The Unicode Standard has since been
73 * changed to allow for characters whose representation requires more
74 * than 16 bits. The range of legal <em>code point</em>s is now
75 * U+0000 to U+10FFFF; excluding surrogate code points, these are known as <em>Unicode scalar value</em>s.
76 * (Refer to the <a
77 * href="http://www.unicode.org/reports/tr27/#notation"><i>
78 * definition</i></a> of the U+<i>n</i> notation in the Unicode
79 * Standard.)
80 *
81 * <p><a id="BMP">The set of characters from U+0000 to U+FFFF</a> is
82 * sometimes referred to as the <em>Basic Multilingual Plane (BMP)</em>.
83 * <a id="supplementary">Characters</a> whose code points are greater
84 * than U+FFFF are called <em>supplementary character</em>s. The Java
85 * platform uses the UTF-16 representation in {@code char} arrays and
|