src/share/classes/java/nio/charset/Charset.java

rev 3975 : 4884238: Adds java.nio.charset.StandardCharset to provide static final constants for the standard charsets.


 126  * <p> If a charset listed in the <a
 127  * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
 128  * Registry</i></a> is supported by an implementation of the Java platform then
 129  * its canonical name must be the name listed in the registry.  Many charsets
 130  * are given more than one name in the registry, in which case the registry
 131  * identifies one of the names as <i>MIME-preferred</i>.  If a charset has more
 132  * than one registry name then its canonical name must be the MIME-preferred
 133  * name and the other names in the registry must be valid aliases.  If a
 134  * supported charset is not listed in the IANA registry then its canonical name
 135  * must begin with one of the strings <tt>"X-"</tt> or <tt>"x-"</tt>.
 136  *
 137  * <p> The IANA charset registry does change over time, and so the canonical
 138  * name and the aliases of a particular charset may also change over time.  To
 139  * ensure compatibility it is recommended that no alias ever be removed from a
 140  * charset, and that if the canonical name of a charset is changed then its
 141  * previous canonical name be made into an alias.
 142  *
 143  *
 144  * <h4>Standard charsets</h4>
 145  *


 146  * <p> Every implementation of the Java platform is required to support the
 147  * following standard charsets.  Consult the release documentation for your
 148  * implementation to see if any other charsets are supported.  The behavior
 149  * of such optional charsets may differ between implementations.
 150  *
 151  * <blockquote><table width="80%" summary="Description of standard charsets">
 152  * <tr><th><p align="left">Charset</p></th><th><p align="left">Description</p></th></tr>
 153  * <tr><td valign=top><tt>US-ASCII</tt></td>
 154  *     <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
 155  *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 156  * <tr><td valign=top><tt>ISO-8859-1&nbsp;&nbsp;</tt></td>
 157  *     <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
 158  * <tr><td valign=top><tt>UTF-8</tt></td>
 159  *     <td>Eight-bit UCS Transformation Format</td></tr>
 160  * <tr><td valign=top><tt>UTF-16BE</tt></td>
 161  *     <td>Sixteen-bit UCS Transformation Format,
 162  *         big-endian byte&nbsp;order</td></tr>
 163  * <tr><td valign=top><tt>UTF-16LE</tt></td>
 164  *     <td>Sixteen-bit UCS Transformation Format,
 165  *         little-endian byte&nbsp;order</td></tr>


 196  *   byte-order marks. </p></li>
 197 
 198  *
 199  *   <li><p> When decoding, the <tt>UTF-16</tt> charset interprets the
 200  *   byte-order mark at the beginning of the input stream to indicate the
 201  *   byte-order of the stream but defaults to big-endian if there is no
 202  *   byte-order mark; when encoding, it uses big-endian byte order and writes
 203  *   a big-endian byte-order mark. </p></li>
 204  *
 205  * </ul>
 206  *
 207  * In any case, byte order marks occurring after the first element of an
 208  * input sequence are not omitted since the same code is used to represent
 209  * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
 210  *
 211  * <p> Every instance of the Java virtual machine has a default charset, which
 212  * may or may not be one of the standard charsets.  The default charset is
 213  * determined during virtual-machine startup and typically depends upon the
 214  * locale and charset being used by the underlying operating system. </p>
 215  *


 216  *
 217  * <h4>Terminology</h4>
 218  *
 219  * <p> The name of this class is taken from the terms used in
 220  * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC&nbsp;2278</i></a>.
 221  * In that document a <i>charset</i> is defined as the combination of
 222  * one or more coded character sets and a character-encoding scheme.
 223  * (This definition is confusing; some other software systems define
 224  * <i>charset</i> as a synonym for <i>coded character set</i>.)
 225  *
 226  * <p> A <i>coded character set</i> is a mapping between a set of abstract
 227  * characters and a set of integers.  US-ASCII, ISO&nbsp;8859-1,
 228  * JIS&nbsp;X&nbsp;0201, and Unicode are examples of coded character sets.
 229  *
 230  * <p> Some standards have defined a <i>character set</i> to be simply a
 231  * set of abstract characters without an associated assigned numbering.
 232  * An alphabet is an example of such a character set.  However, the subtle
 233  * distinction between <i>character set</i> and <i>coded character set</i>
 234  * is rarely used in practice; the former has become a short form for the
 235  * latter, including in the Java API specification.




 126  * <p> If a charset listed in the <a
 127  * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
 128  * Registry</i></a> is supported by an implementation of the Java platform then
 129  * its canonical name must be the name listed in the registry.  Many charsets
 130  * are given more than one name in the registry, in which case the registry
 131  * identifies one of the names as <i>MIME-preferred</i>.  If a charset has more
 132  * than one registry name then its canonical name must be the MIME-preferred
 133  * name and the other names in the registry must be valid aliases.  If a
 134  * supported charset is not listed in the IANA registry then its canonical name
 135  * must begin with one of the strings <tt>"X-"</tt> or <tt>"x-"</tt>.
 136  *
 137  * <p> The IANA charset registry does change over time, and so the canonical
 138  * name and the aliases of a particular charset may also change over time.  To
 139  * ensure compatibility it is recommended that no alias ever be removed from a
 140  * charset, and that if the canonical name of a charset is changed then its
 141  * previous canonical name be made into an alias.
 142  *
 143  *
 144  * <h4>Standard charsets</h4>
 145  *
 146  * <a name="standard"></a>
 147  *
 148  * <p> Every implementation of the Java platform is required to support the
 149  * following standard charsets.  Consult the release documentation for your
 150  * implementation to see if any other charsets are supported.  The behavior
 151  * of such optional charsets may differ between implementations.
 152  *
 153  * <blockquote><table width="80%" summary="Description of standard charsets">
 154  * <tr><th><p align="left">Charset</p></th><th><p align="left">Description</p></th></tr>
 155  * <tr><td valign=top><tt>US-ASCII</tt></td>
 156  *     <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
 157  *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 158  * <tr><td valign=top><tt>ISO-8859-1&nbsp;&nbsp;</tt></td>
 159  *     <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
 160  * <tr><td valign=top><tt>UTF-8</tt></td>
 161  *     <td>Eight-bit UCS Transformation Format</td></tr>
 162  * <tr><td valign=top><tt>UTF-16BE</tt></td>
 163  *     <td>Sixteen-bit UCS Transformation Format,
 164  *         big-endian byte&nbsp;order</td></tr>
 165  * <tr><td valign=top><tt>UTF-16LE</tt></td>
 166  *     <td>Sixteen-bit UCS Transformation Format,
 167  *         little-endian byte&nbsp;order</td></tr>


 198  *   byte-order marks. </p></li>
 199 
 200  *
 201  *   <li><p> When decoding, the <tt>UTF-16</tt> charset interprets the
 202  *   byte-order mark at the beginning of the input stream to indicate the
 203  *   byte-order of the stream but defaults to big-endian if there is no
 204  *   byte-order mark; when encoding, it uses big-endian byte order and writes
 205  *   a big-endian byte-order mark. </p></li>
 206  *
 207  * </ul>
 208  *
 209  * In any case, byte order marks occurring after the first element of an
 210  * input sequence are not omitted since the same code is used to represent
 211  * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
 212  *
 213  * <p> Every instance of the Java virtual machine has a default charset, which
 214  * may or may not be one of the standard charsets.  The default charset is
 215  * determined during virtual-machine startup and typically depends upon the
 216  * locale and charset being used by the underlying operating system. </p>
 217  *
 218  * <p>The {@link StandardCharset} class defines constants for each of the
 219  * standard charsets.
 220  *
 221  * <h4>Terminology</h4>
 222  *
 223  * <p> The name of this class is taken from the terms used in
 224  * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC&nbsp;2278</i></a>.
 225  * In that document a <i>charset</i> is defined as the combination of
 226  * one or more coded character sets and a character-encoding scheme.
 227  * (This definition is confusing; some other software systems define
 228  * <i>charset</i> as a synonym for <i>coded character set</i>.)
 229  *
 230  * <p> A <i>coded character set</i> is a mapping between a set of abstract
 231  * characters and a set of integers.  US-ASCII, ISO&nbsp;8859-1,
 232  * JIS&nbsp;X&nbsp;0201, and Unicode are examples of coded character sets.
 233  *
 234  * <p> Some standards have defined a <i>character set</i> to be simply a
 235  * set of abstract characters without an associated assigned numbering.
 236  * An alphabet is an example of such a character set.  However, the subtle
 237  * distinction between <i>character set</i> and <i>coded character set</i>
 238  * is rarely used in practice; the former has become a short form for the
 239  * latter, including in the Java API specification.
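
The alias and byte-order-mark rules described in this comment can be checked with a short program. The following is only a sketch and not part of the change under review: the class name CharsetSketch and its helper methods are hypothetical, and it deliberately uses only Charset.forName rather than the new constants class. It assumes the runtime registers ISO646-US as an alias of US-ASCII, as the table above suggests.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration; not part of the JDK or of this patch.
final class CharsetSketch {

    /** Encodes s with the named charset and returns the bytes as upper-case hex. */
    static String hex(String s, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(Charset.forName(charsetName))) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the given byte values (0..255) with the named charset. */
    static String decode(String charsetName, int... byteValues) {
        byte[] bytes = new byte[byteValues.length];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) byteValues[i];
        }
        return new String(bytes, Charset.forName(charsetName));
    }

    /** Resolves a canonical name or an alias to the charset's canonical name. */
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // An alias resolves to the same charset as the canonical name.
        System.out.println(canonicalName("ISO646-US"));

        // UTF-16 writes a big-endian byte-order mark when encoding;
        // UTF-16BE and UTF-16LE write none.
        System.out.println(hex("A", "UTF-16"));
        System.out.println(hex("A", "UTF-16BE"));
        System.out.println(hex("A", "UTF-16LE"));

        // When decoding, UTF-16 honours a leading byte-order mark and
        // falls back to big-endian when there is none.
        System.out.println(decode("UTF-16", 0xFF, 0xFE, 0x41, 0x00));
        System.out.println(decode("UTF-16", 0x00, 0x41));

        // A mark after the first element is kept as U+FEFF, so the
        // decoded string has two chars here.
        System.out.println(decode("UTF-16", 0xFE, 0xFF, 0x00, 0x41, 0xFE, 0xFF).length());
    }
}
```

Looking charsets up by name keeps the sketch independent of the constants class this revision introduces. Code written against the patch would presumably use those constants instead and avoid the string lookup.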