126 * <p> If a charset listed in the <a
127 * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
128 * Registry</i></a> is supported by an implementation of the Java platform then
129 * its canonical name must be the name listed in the registry. Many charsets
130 * are given more than one name in the registry, in which case the registry
131 * identifies one of the names as <i>MIME-preferred</i>. If a charset has more
132 * than one registry name then its canonical name must be the MIME-preferred
133 * name and the other names in the registry must be valid aliases. If a
134 * supported charset is not listed in the IANA registry then its canonical name
135 * must begin with one of the strings <tt>"X-"</tt> or <tt>"x-"</tt>.
136 *
137 * <p> The IANA charset registry does change over time, and so the canonical
138 * name and the aliases of a particular charset may also change over time. To
139 * ensure compatibility it is recommended that no alias ever be removed from a
140 * charset, and that if the canonical name of a charset is changed then its
141 * previous canonical name be made into an alias.
142 *
143 *
144 * <h4>Standard charsets</h4>
145 *
146 * <p> Every implementation of the Java platform is required to support the
147 * following standard charsets. Consult the release documentation for your
148 * implementation to see if any other charsets are supported. The behavior
149 * of such optional charsets may differ between implementations.
150 *
151 * <blockquote><table width="80%" summary="Description of standard charsets">
152 * <tr><th><p align="left">Charset</p></th><th><p align="left">Description</p></th></tr>
153 * <tr><td valign=top><tt>US-ASCII</tt></td>
154 * <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
155 * a.k.a. the Basic Latin block of the Unicode character set</td></tr>
156 * <tr><td valign=top><tt>ISO-8859-1 </tt></td>
157 * <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
158 * <tr><td valign=top><tt>UTF-8</tt></td>
159 * <td>Eight-bit UCS Transformation Format</td></tr>
160 * <tr><td valign=top><tt>UTF-16BE</tt></td>
161 * <td>Sixteen-bit UCS Transformation Format,
162 * big-endian byte order</td></tr>
163 * <tr><td valign=top><tt>UTF-16LE</tt></td>
164 * <td>Sixteen-bit UCS Transformation Format,
165 * little-endian byte order</td></tr>
196 * byte-order marks. </p></li>
197
198 *
199 * <li><p> When decoding, the <tt>UTF-16</tt> charset interprets the
200 * byte-order mark at the beginning of the input stream to indicate the
201 * byte-order of the stream but defaults to big-endian if there is no
202 * byte-order mark; when encoding, it uses big-endian byte order and writes
203 * a big-endian byte-order mark. </p></li>
204 *
205 * </ul>
206 *
207 * In any case, byte-order marks occurring after the first element of an
208 * input sequence are not omitted since the same code is used to represent
209 * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
210 *
211 * <p> Every instance of the Java virtual machine has a default charset, which
212 * may or may not be one of the standard charsets. The default charset is
213 * determined during virtual-machine startup and typically depends upon the
214 * locale and charset being used by the underlying operating system. </p>
215 *
216 *
217 * <h4>Terminology</h4>
218 *
219 * <p> The name of this class is taken from the terms used in
220 * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC 2278</i></a>.
221 * In that document a <i>charset</i> is defined as the combination of
222 * one or more coded character sets and a character-encoding scheme.
223 * (This definition is confusing; some other software systems define
224 * <i>charset</i> as a synonym for <i>coded character set</i>.)
225 *
226 * <p> A <i>coded character set</i> is a mapping between a set of abstract
227 * characters and a set of integers. US-ASCII, ISO 8859-1,
228 * JIS X 0201, and Unicode are examples of coded character sets.
229 *
230 * <p> Some standards have defined a <i>character set</i> to be simply a
231 * set of abstract characters without an associated assigned numbering.
232 * An alphabet is an example of such a character set. However, the subtle
233 * distinction between <i>character set</i> and <i>coded character set</i>
234 * is rarely used in practice; the former has become a short form for the
235 * latter, including in the Java API specification.
|
126 * <p> If a charset listed in the <a
127 * href="http://www.iana.org/assignments/character-sets"><i>IANA Charset
128 * Registry</i></a> is supported by an implementation of the Java platform then
129 * its canonical name must be the name listed in the registry. Many charsets
130 * are given more than one name in the registry, in which case the registry
131 * identifies one of the names as <i>MIME-preferred</i>. If a charset has more
132 * than one registry name then its canonical name must be the MIME-preferred
133 * name and the other names in the registry must be valid aliases. If a
134 * supported charset is not listed in the IANA registry then its canonical name
135 * must begin with one of the strings <tt>"X-"</tt> or <tt>"x-"</tt>.
136 *
137 * <p> The IANA charset registry does change over time, and so the canonical
138 * name and the aliases of a particular charset may also change over time. To
139 * ensure compatibility it is recommended that no alias ever be removed from a
140 * charset, and that if the canonical name of a charset is changed then its
141 * previous canonical name be made into an alias.
142 *
143 *
144 * <h4>Standard charsets</h4>
145 *
146 * <a name="standard"></a>
147 *
148 * <p> Every implementation of the Java platform is required to support the
149 * following standard charsets. Consult the release documentation for your
150 * implementation to see if any other charsets are supported. The behavior
151 * of such optional charsets may differ between implementations.
152 *
153 * <blockquote><table width="80%" summary="Description of standard charsets">
154 * <tr><th><p align="left">Charset</p></th><th><p align="left">Description</p></th></tr>
155 * <tr><td valign=top><tt>US-ASCII</tt></td>
156 * <td>Seven-bit ASCII, a.k.a. <tt>ISO646-US</tt>,
157 * a.k.a. the Basic Latin block of the Unicode character set</td></tr>
158 * <tr><td valign=top><tt>ISO-8859-1 </tt></td>
159 * <td>ISO Latin Alphabet No. 1, a.k.a. <tt>ISO-LATIN-1</tt></td></tr>
160 * <tr><td valign=top><tt>UTF-8</tt></td>
161 * <td>Eight-bit UCS Transformation Format</td></tr>
162 * <tr><td valign=top><tt>UTF-16BE</tt></td>
163 * <td>Sixteen-bit UCS Transformation Format,
164 * big-endian byte order</td></tr>
165 * <tr><td valign=top><tt>UTF-16LE</tt></td>
166 * <td>Sixteen-bit UCS Transformation Format,
167 * little-endian byte order</td></tr>
198 * byte-order marks. </p></li>
199
200 *
201 * <li><p> When decoding, the <tt>UTF-16</tt> charset interprets the
202 * byte-order mark at the beginning of the input stream to indicate the
203 * byte-order of the stream but defaults to big-endian if there is no
204 * byte-order mark; when encoding, it uses big-endian byte order and writes
205 * a big-endian byte-order mark. </p></li>
206 *
207 * </ul>
208 *
209 * In any case, byte-order marks occurring after the first element of an
210 * input sequence are not omitted since the same code is used to represent
211 * <small>ZERO-WIDTH NON-BREAKING SPACE</small>.
212 *
213 * <p> Every instance of the Java virtual machine has a default charset, which
214 * may or may not be one of the standard charsets. The default charset is
215 * determined during virtual-machine startup and typically depends upon the
216 * locale and charset being used by the underlying operating system. </p>
217 *
218 * <p>The {@link StandardCharsets} class defines constants for each of the
219 * standard charsets.
220 *
221 * <h4>Terminology</h4>
222 *
223 * <p> The name of this class is taken from the terms used in
224 * <a href="http://www.ietf.org/rfc/rfc2278.txt"><i>RFC 2278</i></a>.
225 * In that document a <i>charset</i> is defined as the combination of
226 * one or more coded character sets and a character-encoding scheme.
227 * (This definition is confusing; some other software systems define
228 * <i>charset</i> as a synonym for <i>coded character set</i>.)
229 *
230 * <p> A <i>coded character set</i> is a mapping between a set of abstract
231 * characters and a set of integers. US-ASCII, ISO 8859-1,
232 * JIS X 0201, and Unicode are examples of coded character sets.
233 *
234 * <p> Some standards have defined a <i>character set</i> to be simply a
235 * set of abstract characters without an associated assigned numbering.
236 * An alphabet is an example of such a character set. However, the subtle
237 * distinction between <i>character set</i> and <i>coded character set</i>
238 * is rarely used in practice; the former has become a short form for the
239 * latter, including in the Java API specification.
|
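The alias and byte-order-mark rules described in the comment above can be checked with a short program. This is only a sketch, assuming Java 7 or later so that `java.nio.charset.StandardCharsets` is available; the class `CharsetSketch` and its helpers `hex` and `decode` are hypothetical names made up for illustration and are not part of the JDK. It also assumes the implementation accepts `latin1` as an alias of `ISO-8859-1`, as the IANA registry lists it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the JDK.
class CharsetSketch {

    // Encodes s with cs and renders the resulting bytes as space-separated hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the given byte values (passed as ints for readability) with cs.
    static String decode(Charset cs, int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // Looking up an alias yields the charset with its canonical name.
        Charset latin1 = Charset.forName("latin1");
        System.out.println(latin1.name());                           // ISO-8859-1

        // Encoding: UTF-16 writes a big-endian BOM; UTF-16BE and UTF-16LE write none.
        System.out.println(hex("A", StandardCharsets.UTF_16));       // FE FF 00 41
        System.out.println(hex("A", StandardCharsets.UTF_16BE));     // 00 41
        System.out.println(hex("A", StandardCharsets.UTF_16LE));     // 41 00

        // Decoding with UTF-16: the initial BOM selects the byte order,
        // and big-endian is assumed when no BOM is present.
        System.out.println(decode(StandardCharsets.UTF_16, 0xFF, 0xFE, 0x41, 0x00)); // A
        System.out.println(decode(StandardCharsets.UTF_16, 0x00, 0x41));             // A

        // The default charset depends on the platform, so no output is shown here.
        System.out.println(Charset.defaultCharset().name());
    }
}
```

Because the default charset varies between machines, code that must produce the same bytes everywhere would normally pass an explicit charset, as the `hex` helper does, rather than relying on `Charset.defaultCharset()`.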