
src/java.base/share/classes/java/nio/charset/Charset.java





 130  * name and the other names in the registry must be valid aliases.  If a
 131  * supported charset is not listed in the IANA registry then its canonical name
 132  * must begin with one of the strings {@code "X-"} or {@code "x-"}.
 133  *
 134  * <p> The IANA charset registry does change over time, and so the canonical
 135  * name and the aliases of a particular charset may also change over time.  To
 136  * ensure compatibility it is recommended that no alias ever be removed from a
 137  * charset, and that if the canonical name of a charset is changed then its
 138  * previous canonical name be made into an alias.
 139  *
 140  *
 141  * <h2>Standard charsets</h2>
 142  *
 143  *
 144  *
 145  * <p><a id="standard">Every implementation of the Java platform is required to support the
 146  * following standard charsets.</a>  Consult the release documentation for your
 147  * implementation to see if any other charsets are supported.  The behavior
 148  * of such optional charsets may differ between implementations.
 149  *
 150  * <blockquote><table style="width:80%" summary="Description of standard charsets">


 151  * <tr><th style="text-align:left">Charset</th><th style="text-align:left">Description</th></tr>


 152  * <tr><td style="vertical-align:top">{@code US-ASCII}</td>
 153  *     <td>Seven-bit ASCII, a.k.a. {@code ISO646-US},
 154  *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 155  * <tr><td style="vertical-align:top"><code>ISO-8859-1&nbsp;&nbsp;</code></td>
 156  *     <td>ISO Latin Alphabet No. 1, a.k.a. {@code ISO-LATIN-1}</td></tr>
 157  * <tr><td style="vertical-align:top">{@code UTF-8}</td>
 158  *     <td>Eight-bit UCS Transformation Format</td></tr>
 159  * <tr><td style="vertical-align:top">{@code UTF-16BE}</td>
 160  *     <td>Sixteen-bit UCS Transformation Format,
 161  *         big-endian byte&nbsp;order</td></tr>
 162  * <tr><td style="vertical-align:top">{@code UTF-16LE}</td>
 163  *     <td>Sixteen-bit UCS Transformation Format,
 164  *         little-endian byte&nbsp;order</td></tr>
 165  * <tr><td style="vertical-align:top">{@code UTF-16}</td>
 166  *     <td>Sixteen-bit UCS Transformation Format,
 167  *         byte&nbsp;order identified by an optional byte-order mark</td></tr>

 168  * </table></blockquote>
 169  *
 170  * <p> The {@code UTF-8} charset is specified by <a
 171  * href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC&nbsp;2279</i></a>; the
 172  * transformation format upon which it is based is specified in
 173  * Amendment&nbsp;2 of ISO&nbsp;10646-1 and is also described in the <a
 174  * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 175  * Standard</i></a>.
 176  *
 177  * <p> The {@code UTF-16} charsets are specified by <a
 178  * href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC&nbsp;2781</i></a>; the
 179  * transformation formats upon which they are based are specified in
 180  * Amendment&nbsp;1 of ISO&nbsp;10646-1 and are also described in the <a
 181  * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 182  * Standard</i></a>.
 183  *
 184  * <p> The {@code UTF-16} charsets use sixteen-bit quantities and are
 185  * therefore sensitive to byte order.  In these encodings the byte order of a
 186  * stream may be indicated by an initial <i>byte-order mark</i> represented by
 187  * the Unicode character <code>'\uFEFF'</code>.  Byte-order marks are handled




 130  * name and the other names in the registry must be valid aliases.  If a
 131  * supported charset is not listed in the IANA registry then its canonical name
 132  * must begin with one of the strings {@code "X-"} or {@code "x-"}.
 133  *
 134  * <p> The IANA charset registry does change over time, and so the canonical
 135  * name and the aliases of a particular charset may also change over time.  To
 136  * ensure compatibility it is recommended that no alias ever be removed from a
 137  * charset, and that if the canonical name of a charset is changed then its
 138  * previous canonical name be made into an alias.
 139  *
 140  *
 141  * <h2>Standard charsets</h2>
 142  *
 143  *
 144  *
 145  * <p><a id="standard">Every implementation of the Java platform is required to support the
 146  * following standard charsets.</a>  Consult the release documentation for your
 147  * implementation to see if any other charsets are supported.  The behavior
 148  * of such optional charsets may differ between implementations.
 149  *
 150  * <blockquote><table style="width:80%">
 151  * <caption style="display:none">Description of standard charsets</caption>
 152  * <thead>
 153  * <tr><th style="text-align:left">Charset</th><th style="text-align:left">Description</th></tr>
 154  * </thead>
 155  * <tbody>
 156  * <tr><td style="vertical-align:top">{@code US-ASCII}</td>
 157  *     <td>Seven-bit ASCII, a.k.a. {@code ISO646-US},
 158  *         a.k.a. the Basic Latin block of the Unicode character set</td></tr>
 159  * <tr><td style="vertical-align:top"><code>ISO-8859-1&nbsp;&nbsp;</code></td>
 160  *     <td>ISO Latin Alphabet No. 1, a.k.a. {@code ISO-LATIN-1}</td></tr>
 161  * <tr><td style="vertical-align:top">{@code UTF-8}</td>
 162  *     <td>Eight-bit UCS Transformation Format</td></tr>
 163  * <tr><td style="vertical-align:top">{@code UTF-16BE}</td>
 164  *     <td>Sixteen-bit UCS Transformation Format,
 165  *         big-endian byte&nbsp;order</td></tr>
 166  * <tr><td style="vertical-align:top">{@code UTF-16LE}</td>
 167  *     <td>Sixteen-bit UCS Transformation Format,
 168  *         little-endian byte&nbsp;order</td></tr>
 169  * <tr><td style="vertical-align:top">{@code UTF-16}</td>
 170  *     <td>Sixteen-bit UCS Transformation Format,
 171  *         byte&nbsp;order identified by an optional byte-order mark</td></tr>
 172  * </tbody>
 173  * </table></blockquote>
 174  *
 175  * <p> The {@code UTF-8} charset is specified by <a
 176  * href="http://www.ietf.org/rfc/rfc2279.txt"><i>RFC&nbsp;2279</i></a>; the
 177  * transformation format upon which it is based is specified in
 178  * Amendment&nbsp;2 of ISO&nbsp;10646-1 and is also described in the <a
 179  * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 180  * Standard</i></a>.
 181  *
 182  * <p> The {@code UTF-16} charsets are specified by <a
 183  * href="http://www.ietf.org/rfc/rfc2781.txt"><i>RFC&nbsp;2781</i></a>; the
 184  * transformation formats upon which they are based are specified in
 185  * Amendment&nbsp;1 of ISO&nbsp;10646-1 and are also described in the <a
 186  * href="http://www.unicode.org/unicode/standard/standard.html"><i>Unicode
 187  * Standard</i></a>.
 188  *
 189  * <p> The {@code UTF-16} charsets use sixteen-bit quantities and are
 190  * therefore sensitive to byte order.  In these encodings the byte order of a
 191  * stream may be indicated by an initial <i>byte-order mark</i> represented by
 192  * the Unicode character <code>'\uFEFF'</code>.  Byte-order marks are handled
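
For readers of this hunk, the following is a minimal sketch (not part of the change under review) of how the six standard charsets listed in the table can be looked up, and how the explicit big-endian and little-endian UTF-16 variants differ in byte order. The class name `StandardCharsetsDemo` and its helper methods are hypothetical names chosen for illustration; `Charset.isSupported`, `Charset.forName`, `StandardCharsets`, and `String.getBytes(Charset)` are existing JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical demo class; not part of the JDK or of this change.
public class StandardCharsetsDemo {
    // Canonical names copied from the table of standard charsets in the Javadoc.
    static final List<String> STANDARD_NAMES = List.of(
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16");

    /** Returns true if every standard charset name resolves on this runtime. */
    static boolean allStandardCharsetsSupported() {
        for (String name : STANDARD_NAMES) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    /** Encodes the single character 'A' so the byte order of a charset is visible. */
    static byte[] encodeA(Charset cs) {
        return "A".getBytes(cs);
    }

    public static void main(String[] args) {
        System.out.println("all supported: " + allStandardCharsetsSupported());
        // Only the variants with a fixed byte order are shown here.
        for (Charset cs : List.of(StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeA(cs)) {
                hex.append(String.format("%02X ", b));
            }
            System.out.println(cs.name() + ": " + hex.toString().trim());
        }
    }
}
```

Because the Javadoc requires every implementation to support these charsets, the constants in `java.nio.charset.StandardCharsets` are usually preferable to name-based lookup, which has to handle unknown names.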

