jdk Sdiff src/java.base/share/classes/java/net

src/java.base/share/classes/java/net/URI.java

 115  * <p> The authority component of a hierarchical URI is, if specified, either
 116  * <i>server-based</i> or <i>registry-based</i>.  A server-based authority
 117  * parses according to the familiar syntax
 118  *
 119  * <blockquote>
 120  * [<i>user-info</i><b>{@code @}</b>]<i>host</i>[<b>{@code :}</b><i>port</i>]
 121  * </blockquote>
 122  *
 123  * where the characters <b>{@code @}</b> and <b>{@code :}</b> stand for
 124  * themselves.  Nearly all URI schemes currently in use are server-based.  An
 125  * authority component that does not parse in this way is considered to be
 126  * registry-based.
 127  *
 128  * <p> The path component of a hierarchical URI is itself said to be absolute
 129  * if it begins with a slash character ({@code '/'}); otherwise it is
 130  * relative.  The path of a hierarchical URI that is either absolute or
 131  * specifies an authority is always absolute.
 132  *
 133  * <p> All told, then, a URI instance has the following nine components:
 134  *
 135  * <blockquote><table class="borderless">
 136  * <caption style="display:none">Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment</caption>
 137  * <thead>
 138  * <tr><th><i>Component</i></th><th><i>Type</i></th></tr>
 139  * </thead>
 140  * <tbody>
 141  * <tr><td>scheme</td><td>{@code String}</td></tr>
 142  * <tr><td>scheme-specific-part&nbsp;&nbsp;&nbsp;&nbsp;</td><td>{@code String}</td></tr>
 143  * <tr><td>authority</td><td>{@code String}</td></tr>
 144  * <tr><td>user-info</td><td>{@code String}</td></tr>
 145  * <tr><td>host</td><td>{@code String}</td></tr>
 146  * <tr><td>port</td><td>{@code int}</td></tr>
 147  * <tr><td>path</td><td>{@code String}</td></tr>
 148  * <tr><td>query</td><td>{@code String}</td></tr>
 149  * <tr><td>fragment</td><td>{@code String}</td></tr>
 150  * </tbody>
 151  * </table></blockquote>
 152  *
 153  * In a given instance any particular component is either <i>undefined</i> or
 154  * <i>defined</i> with a distinct value.  Undefined string components are
 155  * represented by {@code null}, while undefined integer components are
 156  * represented by {@code -1}.  A string component may be defined to have the
 157  * empty string as its value; this is not equivalent to that component being
 158  * undefined.
 159  *
 160  * <p> Whether a particular component is or is not defined in an instance
 161  * depends upon the type of the URI being represented.  An absolute URI has a
 162  * scheme component.  An opaque URI has a scheme, a scheme-specific part, and
 163  * possibly a fragment, but has no other components.  A hierarchical URI always
 164  * has a path (though it may be empty) and a scheme-specific-part (which at
 165  * least contains the path), and may have any of the other components.  If the
 166  * authority component is present and is server-based then the host component
 167  * will be defined and the user-information and port components may be defined.
 168  *
 169  *
 170  * <h4> Operations on URI instances </h4>
 171  *

 236  * <blockquote>
 237  * {@code http://example.com/languages/java/sample/a/index.html#28}
 238  * </blockquote>
 239  *
 240  * against the base URI
 241  *
 242  * <blockquote>
 243  * {@code http://example.com/languages/java/}
 244  * </blockquote>
 245  *
 246  * yields the relative URI {@code sample/a/index.html#28}.
 247  *
 248  *
 249  * <h4> Character categories </h4>
 250  *
 251  * RFC&nbsp;2396 specifies precisely which characters are permitted in the
 252  * various components of a URI reference.  The following categories, most of
 253  * which are taken from that specification, are used below to describe these
 254  * constraints:
 255  *
 256  * <blockquote><table class="borderless">
 257  * <caption style="display:none">Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other</caption>
 258  *   <tbody>
 259  *   <tr><th style="vertical-align:top"><i>alpha</i></th>
 260  *       <td>The US-ASCII alphabetic characters,
 261  *        {@code 'A'}&nbsp;through&nbsp;{@code 'Z'}
 262  *        and {@code 'a'}&nbsp;through&nbsp;{@code 'z'}</td></tr>
 263  *   <tr><th style="vertical-align:top"><i>digit</i></th>
 264  *       <td>The US-ASCII decimal digit characters,
 265  *       {@code '0'}&nbsp;through&nbsp;{@code '9'}</td></tr>
 266  *   <tr><th style="vertical-align:top"><i>alphanum</i></th>
 267  *       <td>All <i>alpha</i> and <i>digit</i> characters</td></tr>
 268  *   <tr><th style="vertical-align:top"><i>unreserved</i>&nbsp;&nbsp;&nbsp;&nbsp;</th>
 269  *       <td>All <i>alphanum</i> characters together with those in the string
 270  *        {@code "_-!.~'()*"}</td></tr>
 271  *   <tr><th style="vertical-align:top"><i>punct</i></th>
 272  *       <td>The characters in the string {@code ",;:$&+="}</td></tr>
 273  *   <tr><th style="vertical-align:top"><i>reserved</i></th>
 274  *       <td>All <i>punct</i> characters together with those in the string
 275  *        {@code "?/[]@"}</td></tr>
 276  *   <tr><th style="vertical-align:top"><i>escaped</i></th>
 277  *       <td>Escaped octets, that is, triplets consisting of the percent
 278  *           character ({@code '%'}) followed by two hexadecimal digits
 279  *           ({@code '0'}-{@code '9'}, {@code 'A'}-{@code 'F'}, and
 280  *           {@code 'a'}-{@code 'f'})</td></tr>
 281  *   <tr><th style="vertical-align:top"><i>other</i></th>
 282  *       <td>The Unicode characters that are not in the US-ASCII character set,
 283  *           are not control characters (according to the {@link
 284  *           java.lang.Character#isISOControl(char) Character.isISOControl}
 285  *           method), and are not space characters (according to the {@link
 286  *           java.lang.Character#isSpaceChar(char) Character.isSpaceChar}
 287  *           method)&nbsp;&nbsp;<i>(<b>Deviation from RFC 2396</b>, which is
 288  *           limited to US-ASCII)</i></td></tr>
 289  * </tbody>
 290  * </table></blockquote>
 291  *
 292  * <p><a id="legal-chars"></a> The set of all legal URI characters consists of
 293  * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i>
 294  * characters.
 295  *
 296  *
 297  * <h4> Escaped octets, quotation, encoding, and decoding </h4>
 298  *
 299  * RFC 2396 allows escaped octets to appear in the user-info, path, query, and
 300  * fragment components.  Escaping serves two purposes in URIs:
 301  *
 302  * <ul>
 303  *
 304  *   <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to
 305  *   conform strictly to RFC&nbsp;2396 by not containing any <i>other</i>
 306  *   characters.  </p></li>
 307  *
 308  *   <li><p> To <i>quote</i> characters that are otherwise illegal in a
 309  *   component.  The user-info, path, query, and fragment components differ
 310  *   slightly in terms of which characters are considered legal and illegal.

 115  * <p> The authority component of a hierarchical URI is, if specified, either
 116  * <i>server-based</i> or <i>registry-based</i>.  A server-based authority
 117  * parses according to the familiar syntax
 118  *
 119  * <blockquote>
 120  * [<i>user-info</i><b>{@code @}</b>]<i>host</i>[<b>{@code :}</b><i>port</i>]
 121  * </blockquote>
 122  *
 123  * where the characters <b>{@code @}</b> and <b>{@code :}</b> stand for
 124  * themselves.  Nearly all URI schemes currently in use are server-based.  An
 125  * authority component that does not parse in this way is considered to be
 126  * registry-based.
 127  *
 128  * <p> The path component of a hierarchical URI is itself said to be absolute
 129  * if it begins with a slash character ({@code '/'}); otherwise it is
 130  * relative.  The path of a hierarchical URI that is either absolute or
 131  * specifies an authority is always absolute.
 132  *
 133  * <p> All told, then, a URI instance has the following nine components:
 134  *
 135  * <table class="striped" style="margin-left:2em">
 136  * <caption style="display:none">Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment</caption>
 137  * <thead>
 138  * <tr><th scope="col">Component</th><th scope="col">Type</th></tr>
 139  * </thead>
 140  * <tbody style="text-align:left">
 141  * <tr><th scope="row">scheme</th><td>{@code String}</td></tr>
 142  * <tr><th scope="row">scheme-specific-part</th><td>{@code String}</td></tr>
 143  * <tr><th scope="row">authority</th><td>{@code String}</td></tr>
 144  * <tr><th scope="row">user-info</th><td>{@code String}</td></tr>
 145  * <tr><th scope="row">host</th><td>{@code String}</td></tr>
 146  * <tr><th scope="row">port</th><td>{@code int}</td></tr>
 147  * <tr><th scope="row">path</th><td>{@code String}</td></tr>
 148  * <tr><th scope="row">query</th><td>{@code String}</td></tr>
 149  * <tr><th scope="row">fragment</th><td>{@code String}</td></tr>
 150  * </tbody>
 151  * </table>
 152  *
 153  * In a given instance any particular component is either <i>undefined</i> or
 154  * <i>defined</i> with a distinct value.  Undefined string components are
 155  * represented by {@code null}, while undefined integer components are
 156  * represented by {@code -1}.  A string component may be defined to have the
 157  * empty string as its value; this is not equivalent to that component being
 158  * undefined.
 159  *
 160  * <p> Whether a particular component is or is not defined in an instance
 161  * depends upon the type of the URI being represented.  An absolute URI has a
 162  * scheme component.  An opaque URI has a scheme, a scheme-specific part, and
 163  * possibly a fragment, but has no other components.  A hierarchical URI always
 164  * has a path (though it may be empty) and a scheme-specific-part (which at
 165  * least contains the path), and may have any of the other components.  If the
 166  * authority component is present and is server-based then the host component
 167  * will be defined and the user-information and port components may be defined.
 168  *
 169  *
 170  * <h4> Operations on URI instances </h4>
 171  *
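The rules in lines 153-167 about defined and undefined components can be checked with a small sketch. The class name `UriComponentsSketch`, its helpers, and the sample URIs are illustrative assumptions and not part of this change; only the `java.net.URI` getters are real API.

```java
import java.net.URI;

// Illustrative sketch only: class name, helpers, and sample URIs are assumptions.
final class UriComponentsSketch {

    // Quote defined strings so that an empty string is distinguishable from null.
    static String show(String s) {
        return s == null ? "null" : "\"" + s + "\"";
    }

    // Lists the nine components; null (or -1 for the port) means "undefined".
    static String describe(URI u) {
        return "scheme=" + show(u.getScheme())
             + " ssp=" + show(u.getSchemeSpecificPart())
             + " authority=" + show(u.getAuthority())
             + " userInfo=" + show(u.getUserInfo())
             + " host=" + show(u.getHost())
             + " port=" + u.getPort()
             + " path=" + show(u.getPath())
             + " query=" + show(u.getQuery())
             + " fragment=" + show(u.getFragment());
    }

    public static void main(String[] args) {
        // Hierarchical URI with a server-based authority: host is defined.
        System.out.println(describe(
            URI.create("http://user@example.com:8080/docs/index.html?q=1#top")));
        // Opaque URI: only scheme and scheme-specific part are defined here.
        System.out.println(describe(URI.create("mailto:duke@example.com")));
        // Hierarchical URI with an empty path: defined as "", not null.
        System.out.println(describe(URI.create("http://example.com")));
    }
}
```

The third call shows the distinction drawn in lines 156-158: the path is defined as the empty string, while the query and fragment are undefined.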

 236  * <blockquote>
 237  * {@code http://example.com/languages/java/sample/a/index.html#28}
 238  * </blockquote>
 239  *
 240  * against the base URI
 241  *
 242  * <blockquote>
 243  * {@code http://example.com/languages/java/}
 244  * </blockquote>
 245  *
 246  * yields the relative URI {@code sample/a/index.html#28}.
 247  *
 248  *
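The relativization example in lines 236-246 can be reproduced as follows. This is a minimal sketch; the class name `RelativizeSketch` and its helper are hypothetical, while `URI.relativize` and `URI.resolve` are the existing methods.

```java
import java.net.URI;

// Illustrative sketch only: class and method names are assumptions.
final class RelativizeSketch {

    // Relativizes target against base, mirroring the Javadoc example.
    static URI relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target));
    }

    public static void main(String[] args) {
        String base = "http://example.com/languages/java/";
        String target = "http://example.com/languages/java/sample/a/index.html#28";

        URI relative = relativize(base, target);
        System.out.println(relative);   // prints sample/a/index.html#28

        // Resolving the relative URI against the base restores the original.
        System.out.println(URI.create(base).resolve(relative));
    }
}
```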
 249  * <h4> Character categories </h4>
 250  *
 251  * RFC&nbsp;2396 specifies precisely which characters are permitted in the
 252  * various components of a URI reference.  The following categories, most of
 253  * which are taken from that specification, are used below to describe these
 254  * constraints:
 255  *
 256  * <table class="striped" style="margin-left:2em">
 257  * <caption style="display:none">Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other</caption>
 258  *   <thead>
 259  *   <tr><th scope="col">Category</th><th scope="col">Description</th></tr>
 260  *   </thead>
 261  *   <tbody style="text-align:left">
 262  *   <tr><th scope="row" style="vertical-align:top">alpha</th>
 263  *       <td>The US-ASCII alphabetic characters,
 264  *        {@code 'A'}&nbsp;through&nbsp;{@code 'Z'}
 265  *        and {@code 'a'}&nbsp;through&nbsp;{@code 'z'}</td></tr>
 266  *   <tr><th scope="row" style="vertical-align:top">digit</th>
 267  *       <td>The US-ASCII decimal digit characters,
 268  *       {@code '0'}&nbsp;through&nbsp;{@code '9'}</td></tr>
 269  *   <tr><th scope="row" style="vertical-align:top">alphanum</th>
 270  *       <td>All <i>alpha</i> and <i>digit</i> characters</td></tr>
 271  *   <tr><th scope="row" style="vertical-align:top">unreserved</th>
 272  *       <td>All <i>alphanum</i> characters together with those in the string
 273  *        {@code "_-!.~'()*"}</td></tr>
 274  *   <tr><th scope="row" style="vertical-align:top">punct</th>
 275  *       <td>The characters in the string {@code ",;:$&+="}</td></tr>
 276  *   <tr><th scope="row" style="vertical-align:top">reserved</th>
 277  *       <td>All <i>punct</i> characters together with those in the string
 278  *        {@code "?/[]@"}</td></tr>
 279  *   <tr><th scope="row" style="vertical-align:top">escaped</th>
 280  *       <td>Escaped octets, that is, triplets consisting of the percent
 281  *           character ({@code '%'}) followed by two hexadecimal digits
 282  *           ({@code '0'}-{@code '9'}, {@code 'A'}-{@code 'F'}, and
 283  *           {@code 'a'}-{@code 'f'})</td></tr>
 284  *   <tr><th scope="row" style="vertical-align:top">other</th>
 285  *       <td>The Unicode characters that are not in the US-ASCII character set,
 286  *           are not control characters (according to the {@link
 287  *           java.lang.Character#isISOControl(char) Character.isISOControl}
 288  *           method), and are not space characters (according to the {@link
 289  *           java.lang.Character#isSpaceChar(char) Character.isSpaceChar}
 290  *           method)&nbsp;&nbsp;<i>(<b>Deviation from RFC 2396</b>, which is
 291  *           limited to US-ASCII)</i></td></tr>
 292  * </tbody>
 293  * </table>
 294  *
 295  * <p><a id="legal-chars"></a> The set of all legal URI characters consists of
 296  * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i>
 297  * characters.
 298  *
 299  *
 300  * <h4> Escaped octets, quotation, encoding, and decoding </h4>
 301  *
 302  * RFC 2396 allows escaped octets to appear in the user-info, path, query, and
 303  * fragment components.  Escaping serves two purposes in URIs:
 304  *
 305  * <ul>
 306  *
 307  *   <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to
 308  *   conform strictly to RFC&nbsp;2396 by not containing any <i>other</i>
 309  *   characters.  </p></li>
 310  *
 311  *   <li><p> To <i>quote</i> characters that are otherwise illegal in a
 312  *   component.  The user-info, path, query, and fragment components differ
 313  *   slightly in terms of which characters are considered legal and illegal.
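The two purposes of escaping named in this hunk, quoting and encoding, can be illustrated with a short sketch. The class name `EscapingSketch`, the `build` helper, and the sample paths are assumptions for illustration; the four-argument `URI` constructor, `getPath`, `getRawPath`, and `toASCIIString` are existing API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch only: class name, helper, and sample paths are assumptions.
final class EscapingSketch {

    // Builds a hierarchical URI from components; the multi-argument
    // constructor quotes characters that are illegal in the path.
    static URI build(String path) {
        try {
            return new URI("http", "example.com", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Quoting: the space is illegal in a path, so it is escaped.
        URI quoted = build("/a b");
        System.out.println(quoted);              // prints http://example.com/a%20b
        System.out.println(quoted.getRawPath()); // prints /a%20b
        System.out.println(quoted.getPath());    // decoded form, space restored

        // Encoding: a non-US-ASCII "other" character is kept as-is by
        // toString() and escaped as UTF-8 octets by toASCIIString().
        URI other = build("/caf\u00e9");
        System.out.println(other.toASCIIString()); // prints http://example.com/caf%C3%A9
    }
}
```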
