
src/java.base/share/classes/java/io/DataInput.java





  37  * format.
  38  * <p>
  39  * It is generally true of all the reading
  40  * routines in this interface that if end of
  41  * file is reached before the desired number
  42  * of bytes has been read, an {@code EOFException}
  43  * (which is a kind of {@code IOException})
  44  * is thrown. If any byte cannot be read for
  45  * any reason other than end of file, an {@code IOException}
  46  * other than {@code EOFException} is
  47  * thrown. In particular, an {@code IOException}
  48  * may be thrown if the input stream has been
  49  * closed.
  50  *
  51  * <h3><a id="modified-utf-8">Modified UTF-8</a></h3>
  52  * <p>
  53  * Implementations of the DataInput and DataOutput interfaces represent
  54  * Unicode strings in a format that is a slight modification of UTF-8.
  55  * (For information regarding the standard UTF-8 format, see section
  56  * <i>3.9 Unicode Encoding Forms</i> of <i>The Unicode Standard, Version
  57  * 4.0</i>).
  58  * Note that in the following table, the most significant bit appears in the
  59  * far left-hand column.
  60  *
  61  * <blockquote>
  62  *   <table class="plain">
  63  *     <caption style="display:none">Bit values and bytes</caption>
  64  *     <tbody>
  65  *     <tr>
  66  *       <th colspan="9"><span style="font-weight:normal">
  67  *         All characters in the range {@code '\u005Cu0001'} to
  68  *         {@code '\u005Cu007F'} are represented by a single byte:</span></th>
  69  *     </tr>
  70  *     <tr>
  71  *       <td></td>
  72  *       <th colspan="8" id="bit_a">Bit Values</th>
  73  *     </tr>
  74  *     <tr>
  75  *       <th id="byte1_a" style="text-align:left">Byte 1</th>
  76  *       <td style="text-align:center">0
  77  *       <td colspan="7" style="text-align:center">bits 6-0
  78  *     </tr>
  79  *     <tr>
  80  *       <th colspan="9"><span style="font-weight:normal">
  81  *         The null character {@code '\u005Cu0000'} and characters
  82  *         in the range {@code '\u005Cu0080'} to {@code '\u005Cu07FF'} are
  83  *         represented by a pair of bytes:</span></th>
  84  *     </tr>
  85  *     <tr>
  86  *       <td></td>
  87  *       <th colspan="8" id="bit_b">Bit Values</th>
  88  *     </tr>
  89  *     <tr>
  90  *       <th id="byte1_b" style="text-align:left">Byte 1</th>
  91  *       <td style="text-align:center">1
  92  *       <td style="text-align:center">1
  93  *       <td style="text-align:center">0
  94  *       <td colspan="5" style="text-align:center">bits 10-6
  95  *     </tr>
  96  *     <tr>
  97  *       <th id="byte2_a" style="text-align:left">Byte 2</th>

  98  *       <td style="text-align:center">1
  99  *       <td style="text-align:center">0
 100  *       <td colspan="6" style="text-align:center">bits 5-0
 101  *     </tr>
 102  *     <tr>
 103  *       <th colspan="9"><span style="font-weight:normal">
 104  *         {@code char} values in the range {@code '\u005Cu0800'}
 105  *         to {@code '\u005CuFFFF'} are represented by three bytes:</span></th>
 106  *     </tr>
 107  *     <tr>
 108  *       <td></td>
 109  *       <th colspan="8" id="bit_c">Bit Values</th>
 110  *     </tr>
 111  *     <tr>
 112  *       <th id="byte1_c" style="text-align:left">Byte 1</th>
 113  *       <td style="text-align:center">1
 114  *       <td style="text-align:center">1
 115  *       <td style="text-align:center">1
 116  *       <td style="text-align:center">0
 117  *       <td colspan="4" style="text-align:center">bits 15-12
 118  *     </tr>
 119  *     <tr>
 120  *       <th id="byte2_b" style="text-align:left">Byte 2</th>

 121  *       <td style="text-align:center">1
 122  *       <td style="text-align:center">0
 123  *       <td colspan="6" style="text-align:center">bits 11-6
 124  *     </tr>
 125  *     <tr>
 126  *       <th id="byte3" style="text-align:left">Byte 3</th>

 127  *       <td style="text-align:center">1
 128  *       <td style="text-align:center">0
 129  *       <td colspan="6" style="text-align:center">bits 5-0
 130  *     </tr>
 131  *     </tbody>
 132  *   </table>
 133  * </blockquote>
 134  * <p>
 135  * The differences between this format and the
 136  * standard UTF-8 format are the following:
 137  * <ul>
 138  * <li>The null byte {@code '\u005Cu0000'} is encoded in 2-byte format
 139  *     rather than 1-byte, so that the encoded strings never have
 140  *     embedded nulls.
 141  * <li>Only the 1-byte, 2-byte, and 3-byte formats are used.
 142  * <li><a href="../lang/Character.html#unicode">Supplementary characters</a>
 143  *     are represented in the form of surrogate pairs.
 144  * </ul>
 145  * @author  Frank Yellin
 146  * @see     java.io.DataInputStream
 147  * @see     java.io.DataOutput
 148  * @since   1.0
 149  */
 150 public
 151 interface DataInput {
 152     /**
 153      * Reads some bytes from an input




  37  * format.
  38  * <p>
  39  * It is generally true of all the reading
  40  * routines in this interface that if end of
  41  * file is reached before the desired number
  42  * of bytes has been read, an {@code EOFException}
  43  * (which is a kind of {@code IOException})
  44  * is thrown. If any byte cannot be read for
  45  * any reason other than end of file, an {@code IOException}
  46  * other than {@code EOFException} is
  47  * thrown. In particular, an {@code IOException}
  48  * may be thrown if the input stream has been
  49  * closed.
  50  *
  51  * <h3><a id="modified-utf-8">Modified UTF-8</a></h3>
  52  * <p>
  53  * Implementations of the DataInput and DataOutput interfaces represent
  54  * Unicode strings in a format that is a slight modification of UTF-8.
  55  * (For information regarding the standard UTF-8 format, see section
  56  * <i>3.9 Unicode Encoding Forms</i> of <i>The Unicode Standard, Version
  57  * 4.0</i>).


  58  *
  59  * <ul>
  60  * <li>Characters in the range {@code '\u005Cu0001'} to
  61  *         {@code '\u005Cu007F'} are represented by a single byte.
  62  * <li>The null character {@code '\u005Cu0000'} and characters
  63  *         in the range {@code '\u005Cu0080'} to {@code '\u005Cu07FF'} are
  64  *         represented by a pair of bytes.
  65  * <li>Characters in the range {@code '\u005Cu0800'}
  66  *         to {@code '\u005CuFFFF'} are represented by three bytes.
  67  * </ul>
  68  *
  69  *   <table class="plain" style="margin-left:2em;">
  70  *     <caption>Encoding of UTF-8 values</caption>
  71  *     <thead>
  72  *     <tr>
  73  *       <th scope="col" rowspan="2">Value</th>
  74  *       <th scope="col" rowspan="2">Byte</th>
  75  *       <th scope="col" colspan="8" id="bit_a">Bit Values</th>
  76  *     </tr>
  77  *     <tr>
  78  *       <!-- Value -->
  79  *       <!-- Byte -->
  80  *       <th scope="col" style="width:3em"> 7 </th>
  81  *       <th scope="col" style="width:3em"> 6 </th>
  82  *       <th scope="col" style="width:3em"> 5 </th>
  83  *       <th scope="col" style="width:3em"> 4 </th>
  84  *       <th scope="col" style="width:3em"> 3 </th>
  85  *       <th scope="col" style="width:3em"> 2 </th>
  86  *       <th scope="col" style="width:3em"> 1 </th>
  87  *       <th scope="col" style="width:3em"> 0 </th>
  88  *     </thead>
  89  *     <tbody>
  90  *     <tr>
  91  *       <th scope="row" style="text-align:left; font-weight:normal">
  92  *         {@code \u005Cu0001} to {@code \u005Cu007F} </th>
  93  *       <th scope="row" style="font-weight:normal"> 1 </th>







  94  *       <td style="text-align:center">0
  95  *       <td colspan="7" style="text-align:center">bits 6-0
  96  *     </tr>
  97  *     <tr>
  98  *       <th scope="row" rowspan="2" style="text-align:left; font-weight:normal">
  99  *           {@code \u005Cu0000},<br>
 100  *           {@code \u005Cu0080} to {@code \u005Cu07FF} </th>
 101  *       <th scope="row" style="font-weight:normal"> 1 </th>







 102  *       <td style="text-align:center">1
 103  *       <td style="text-align:center">1
 104  *       <td style="text-align:center">0
 105  *       <td colspan="5" style="text-align:center">bits 10-6
 106  *     </tr>
 107  *     <tr>
 108  *       <!-- (value) -->
 109  *       <th scope="row" style="font-weight:normal"> 2 </th>
 110  *       <td style="text-align:center">1
 111  *       <td style="text-align:center">0
 112  *       <td colspan="6" style="text-align:center">bits 5-0
 113  *     </tr>
 114  *     <tr>
 115  *       <th scope="row" rowspan="3" style="text-align:left; font-weight:normal">
 116  *         {@code \u005Cu0800} to {@code \u005CuFFFF} </th>
 117  *       <th scope="row" style="font-weight:normal"> 1 </th>







 118  *       <td style="text-align:center">1
 119  *       <td style="text-align:center">1
 120  *       <td style="text-align:center">1
 121  *       <td style="text-align:center">0
 122  *       <td colspan="4" style="text-align:center">bits 15-12
 123  *     </tr>
 124  *     <tr>
 125  *       <!-- (value) -->
 126  *       <th scope="row" style="font-weight:normal"> 2 </th>
 127  *       <td style="text-align:center">1
 128  *       <td style="text-align:center">0
 129  *       <td colspan="6" style="text-align:center">bits 11-6
 130  *     </tr>
 131  *     <tr>
 132  *       <!-- (value) -->
 133  *       <th scope="row" style="font-weight:normal"> 3 </th>
 134  *       <td style="text-align:center">1
 135  *       <td style="text-align:center">0
 136  *       <td colspan="6" style="text-align:center">bits 5-0
 137  *     </tr>
 138  *     </tbody>
 139  *   </table>
 140  *
 141  * <p>
 142  * The differences between this format and the
 143  * standard UTF-8 format are the following:
 144  * <ul>
 145  * <li>The null byte {@code '\u005Cu0000'} is encoded in 2-byte format
 146  *     rather than 1-byte, so that the encoded strings never have
 147  *     embedded nulls.
 148  * <li>Only the 1-byte, 2-byte, and 3-byte formats are used.
 149  * <li><a href="../lang/Character.html#unicode">Supplementary characters</a>
 150  *     are represented in the form of surrogate pairs.
 151  * </ul>
 152  * @author  Frank Yellin
 153  * @see     java.io.DataInputStream
 154  * @see     java.io.DataOutput
 155  * @since   1.0
 156  */
 157 public
 158 interface DataInput {
 159     /**
 160      * Reads some bytes from an input
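
Reviewer's illustration (not part of the change under review): the byte layouts in the table above can be sketched in Java as follows. The class name `ModifiedUtf8Sketch` and its methods `encode` and `viaWriteUTF` are hypothetical names chosen for this example. It assumes that `DataOutputStream.writeUTF` emits a two-byte length prefix followed by the modified UTF-8 bytes, and strips that prefix before comparing. This is a minimal sketch of the encoding rules, not the JDK's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ModifiedUtf8Sketch {

    /** Encodes each char of s with the 1-, 2-, or 3-byte form from the table. */
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                // Single byte: 0, bits 6-0
                out.write(c);
            } else if (c <= 0x07FF) {
                // Pair of bytes (also used for the null character):
                // 110 + bits 10-6, then 10 + bits 5-0
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Three bytes: 1110 + bits 15-12, 10 + bits 11-6, 10 + bits 5-0.
                // Surrogates land here too, one char at a time.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    /** Bytes produced by DataOutputStream.writeUTF, minus the 2-byte length prefix. */
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            dos.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // 'A', the null character, U+00E9, U+20AC, and one supplementary
        // character written as a surrogate pair.
        String sample = "A\0\u00E9\u20AC\uD83D\uDE00";
        byte[] mine = encode(sample);
        byte[] jdk = viaWriteUTF(sample);
        System.out.println("encoded length: " + mine.length);
        System.out.println("matches writeUTF: " + Arrays.equals(mine, jdk));
    }
}
```

The supplementary character is encoded as two three-byte sequences, one per surrogate, which is why the format never needs the four-byte form of standard UTF-8.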

