src/java.base/share/classes/java/io/DataInput.java
*** 52,138 ****
* <p>
* Implementations of the DataInput and DataOutput interfaces represent
* Unicode strings in a format that is a slight modification of UTF-8.
* (For information regarding the standard UTF-8 format, see section
* <i>3.9 Unicode Encoding Forms</i> of <i>The Unicode Standard, Version
! * 4.0</i>).
! * Note that in the following table, the most significant bit appears in the
! * far left-hand column.
*
! * <blockquote>
! * <table class="plain">
! * <caption style="display:none">Bit values and bytes</caption>
* <tbody>
* <tr>
! * <th colspan="9"><span style="font-weight:normal">
! * All characters in the range {@code '\u005Cu0001'} to
! * {@code '\u005Cu007F'} are represented by a single byte:</span></th>
! * </tr>
! * <tr>
! * <td></td>
! * <th colspan="8" id="bit_a">Bit Values</th>
! * </tr>
! * <tr>
! * <th id="byte1_a" style="text-align:left">Byte 1</th>
* <td style="text-align:center">0
* <td colspan="7" style="text-align:center">bits 6-0
* </tr>
* <tr>
! * <th colspan="9"><span style="font-weight:normal">
! * The null character {@code '\u005Cu0000'} and characters
! * in the range {@code '\u005Cu0080'} to {@code '\u005Cu07FF'} are
! * represented by a pair of bytes:</span></th>
! * </tr>
! * <tr>
! * <td></td>
! * <th colspan="8" id="bit_b">Bit Values</th>
! * </tr>
! * <tr>
! * <th id="byte1_b" style="text-align:left">Byte 1</th>
* <td style="text-align:center">1
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="5" style="text-align:center">bits 10-6
* </tr>
* <tr>
! * <th id="byte2_a" style="text-align:left">Byte 2</th>
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="6" style="text-align:center">bits 5-0
* </tr>
* <tr>
! * <th colspan="9"><span style="font-weight:normal">
! * {@code char} values in the range {@code '\u005Cu0800'}
! * to {@code '\u005CuFFFF'} are represented by three bytes:</span></th>
! * </tr>
! * <tr>
! * <td></td>
! * <th colspan="8"id="bit_c">Bit Values</th>
! * </tr>
! * <tr>
! * <th id="byte1_c" style="text-align:left">Byte 1</th>
* <td style="text-align:center">1
* <td style="text-align:center">1
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="4" style="text-align:center">bits 15-12
* </tr>
* <tr>
! * <th id="byte2_b" style="text-align:left">Byte 2</th>
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="6" style="text-align:center">bits 11-6
* </tr>
* <tr>
! * <th id="byte3" style="text-align:left">Byte 3</th>
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="6" style="text-align:center">bits 5-0
* </tr>
* </tbody>
* </table>
! * </blockquote>
* <p>
* The differences between this format and the
* standard UTF-8 format are the following:
* <ul>
* <li>The null byte {@code '\u005Cu0000'} is encoded in 2-byte format
--- 52,145 ----
* <p>
* Implementations of the DataInput and DataOutput interfaces represent
* Unicode strings in a format that is a slight modification of UTF-8.
* (For information regarding the standard UTF-8 format, see section
* <i>3.9 Unicode Encoding Forms</i> of <i>The Unicode Standard, Version
! * 4.0</i>)
*
! * <ul>
! * <li>Characters in the range {@code '\u005Cu0001'} to
! * {@code '\u005Cu007F'} are represented by a single byte.
! * <li>The null character {@code '\u005Cu0000'} and characters
! * in the range {@code '\u005Cu0080'} to {@code '\u005Cu07FF'} are
! * represented by a pair of bytes.
! * <li>Characters in the range {@code '\u005Cu0800'}
! * to {@code '\u005CuFFFF'} are represented by three bytes.
! * </ul>
! *
! * <table class="plain" style="margin-left:2em;">
! * <caption>Encoding of UTF-8 values</caption>
! * <thead>
! * <tr>
! * <th scope="col" rowspan="2">Value</th>
! * <th scope="col" rowspan="2">Byte</th>
! * <th scope="col" colspan="8" id="bit_a">Bit Values</th>
! * </tr>
! * <tr>
! * <!-- Value -->
! * <!-- Byte -->
! * <th scope="col" style="width:3em"> 7 </th>
! * <th scope="col" style="width:3em"> 6 </th>
! * <th scope="col" style="width:3em"> 5 </th>
! * <th scope="col" style="width:3em"> 4 </th>
! * <th scope="col" style="width:3em"> 3 </th>
! * <th scope="col" style="width:3em"> 2 </th>
! * <th scope="col" style="width:3em"> 1 </th>
! * <th scope="col" style="width:3em"> 0 </th>
! * </thead>
* <tbody>
* <tr>
! * <th scope="row" style="text-align:left; font-weight:normal">
! * {@code \u005Cu0001} to {@code \u005Cu007F} </th>
! * <th scope="row" style="font-weight:normal"> 1 </th>
* <td style="text-align:center">0
* <td colspan="7" style="text-align:center">bits 6-0
* </tr>
* <tr>
! * <th scope="row" rowspan="2" style="text-align:left; font-weight:normal">
! * {@code \u005Cu0000},<br>
! * {@code \u005Cu0080} to {@code \u005Cu07FF} </th>
! * <th scope="row" style="font-weight:normal"> 1 </th>
* <td style="text-align:center">1
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="5" style="text-align:center">bits 10-6
* </tr>
* <tr>
! * <!-- (value) -->
! * <th scope="row" style="font-weight:normal"> 2 </th>
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="6" style="text-align:center">bits 5-0
* </tr>
* <tr>
! * <th scope="row" rowspan="3" style="text-align:left; font-weight:normal">
! * {@code \u005Cu0800} to {@code \u005CuFFFF} </th>
! * <th scope="row" style="font-weight:normal"> 1 </th>
* <td style="text-align:center">1
* <td style="text-align:center">1
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="4" style="text-align:center">bits 15-12
* </tr>
* <tr>
! * <!-- (value) -->
! * <th scope="row" style="font-weight:normal"> 2 </th>
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="6" style="text-align:center">bits 11-6
* </tr>
* <tr>
! * <!-- (value) -->
! * <th scope="row" style="font-weight:normal"> 3 </th>
* <td style="text-align:center">1
* <td style="text-align:center">0
* <td colspan="6" style="text-align:center">bits 5-0
* </tr>
* </tbody>
* </table>
! *
* <p>
* The differences between this format and the
* standard UTF-8 format are the following:
* <ul>
* <li>The null byte {@code '\u005Cu0000'} is encoded in 2-byte format
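
Review note: the bit layout in the table can be checked against the JDK with a small sketch. This is not code from the patch. The class name `ModifiedUtf8Sketch` and the helpers `encodeChar` and `writeUtfPayload` are hypothetical names introduced only for illustration. It assumes that `DataOutputStream.writeUTF` writes a 2-byte length prefix followed by the encoded characters, which is what its documentation describes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class ModifiedUtf8Sketch {

    /** Hypothetical helper: encodes one char using the three ranges in the table. */
    static byte[] encodeChar(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            // 1 byte: 0 | bits 6-0
            return new byte[] { (byte) c };
        } else if (c <= 0x07FF) {
            // 2 bytes (this branch also covers U+0000):
            // 110 | bits 10-6, then 10 | bits 5-0
            return new byte[] {
                (byte) (0xC0 | ((c >> 6) & 0x1F)),
                (byte) (0x80 | (c & 0x3F))
            };
        } else {
            // 3 bytes: 1110 | bits 15-12, then 10 | bits 11-6, then 10 | bits 5-0
            return new byte[] {
                (byte) (0xE0 | ((c >> 12) & 0x0F)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
    }

    /** Hypothetical helper: bytes written by writeUTF, minus the 2-byte length prefix. */
    static byte[] writeUtfPayload(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        // One sample per table row, plus the null character.
        char[] samples = { '\u0000', 'A', '\u00E9', '\u20AC' };
        for (char c : samples) {
            byte[] mine = encodeChar(c);
            byte[] jdk = writeUtfPayload(String.valueOf(c));
            System.out.printf("U+%04X -> %s (matches writeUTF: %b)%n",
                    (int) c, Arrays.toString(mine), Arrays.equals(mine, jdk));
        }
    }
}
```

The sample for U+0000 exercises the first listed difference from standard UTF-8: it takes the 2-byte branch, so no encoded string contains an embedded zero byte.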