115 * <p> The authority component of a hierarchical URI is, if specified, either 116 * <i>server-based</i> or <i>registry-based</i>. A server-based authority 117 * parses according to the familiar syntax 118 * 119 * <blockquote> 120 * [<i>user-info</i><b>{@code @}</b>]<i>host</i>[<b>{@code :}</b><i>port</i>] 121 * </blockquote> 122 * 123 * where the characters <b>{@code @}</b> and <b>{@code :}</b> stand for 124 * themselves. Nearly all URI schemes currently in use are server-based. An 125 * authority component that does not parse in this way is considered to be 126 * registry-based. 127 * 128 * <p> The path component of a hierarchical URI is itself said to be absolute 129 * if it begins with a slash character ({@code '/'}); otherwise it is 130 * relative. The path of a hierarchical URI that is either absolute or 131 * specifies an authority is always absolute. 132 * 133 * <p> All told, then, a URI instance has the following nine components: 134 * 135 * <blockquote><table class="borderless"> 136 * <caption style="display:none">Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment</caption> 137 * <thead> 138 * <tr><th><i>Component</i></th><th><i>Type</i></th></tr> 139 * </thead> 140 * <tbody> 141 * <tr><td>scheme</td><td>{@code String}</td></tr> 142 * <tr><td>scheme-specific-part </td><td>{@code String}</td></tr> 143 * <tr><td>authority</td><td>{@code String}</td></tr> 144 * <tr><td>user-info</td><td>{@code String}</td></tr> 145 * <tr><td>host</td><td>{@code String}</td></tr> 146 * <tr><td>port</td><td>{@code int}</td></tr> 147 * <tr><td>path</td><td>{@code String}</td></tr> 148 * <tr><td>query</td><td>{@code String}</td></tr> 149 * <tr><td>fragment</td><td>{@code String}</td></tr> 150 * </tbody> 151 * </table></blockquote> 152 * 153 * In a given instance any particular component is either <i>undefined</i> or 154 * <i>defined</i> with a distinct value. 
Undefined string components are 155 * represented by {@code null}, while undefined integer components are 156 * represented by {@code -1}. A string component may be defined to have the 157 * empty string as its value; this is not equivalent to that component being 158 * undefined. 159 * 160 * <p> Whether a particular component is or is not defined in an instance 161 * depends upon the type of the URI being represented. An absolute URI has a 162 * scheme component. An opaque URI has a scheme, a scheme-specific part, and 163 * possibly a fragment, but has no other components. A hierarchical URI always 164 * has a path (though it may be empty) and a scheme-specific-part (which at 165 * least contains the path), and may have any of the other components. If the 166 * authority component is present and is server-based then the host component 167 * will be defined and the user-information and port components may be defined. 168 * 169 * 170 * <h4> Operations on URI instances </h4> 171 * 236 * <blockquote> 237 * {@code http://example.com/languages/java/sample/a/index.html#28} 238 * </blockquote> 239 * 240 * against the base URI 241 * 242 * <blockquote> 243 * {@code http://example.com/languages/java/} 244 * </blockquote> 245 * 246 * yields the relative URI {@code sample/a/index.html#28}. 247 * 248 * 249 * <h4> Character categories </h4> 250 * 251 * RFC 2396 specifies precisely which characters are permitted in the 252 * various components of a URI reference. 
The following categories, most of 253 * which are taken from that specification, are used below to describe these 254 * constraints: 255 * 256 * <blockquote><table class="borderless"> 257 * <caption style="display:none">Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other</caption> 258 * <tbody> 259 * <tr><th style="vertical-align:top"><i>alpha</i></th> 260 * <td>The US-ASCII alphabetic characters, 261 * {@code 'A'} through {@code 'Z'} 262 * and {@code 'a'} through {@code 'z'}</td></tr> 263 * <tr><th style="vertical-align:top"><i>digit</i></th> 264 * <td>The US-ASCII decimal digit characters, 265 * {@code '0'} through {@code '9'}</td></tr> 266 * <tr><th style="vertical-align:top"><i>alphanum</i></th> 267 * <td>All <i>alpha</i> and <i>digit</i> characters</td></tr> 268 * <tr><th style="vertical-align:top"><i>unreserved</i> </th> 269 * <td>All <i>alphanum</i> characters together with those in the string 270 * {@code "_-!.~'()*"}</td></tr> 271 * <tr><th style="vertical-align:top"><i>punct</i></th> 272 * <td>The characters in the string {@code ",;:$&+="}</td></tr> 273 * <tr><th style="vertical-align:top"><i>reserved</i></th> 274 * <td>All <i>punct</i> characters together with those in the string 275 * {@code "?/[]@"}</td></tr> 276 * <tr><th style="vertical-align:top"><i>escaped</i></th> 277 * <td>Escaped octets, that is, triplets consisting of the percent 278 * character ({@code '%'}) followed by two hexadecimal digits 279 * ({@code '0'}-{@code '9'}, {@code 'A'}-{@code 'F'}, and 280 * {@code 'a'}-{@code 'f'})</td></tr> 281 * <tr><th style="vertical-align:top"><i>other</i></th> 282 * <td>The Unicode characters that are not in the US-ASCII character set, 283 * are not control characters (according to the {@link 284 * java.lang.Character#isISOControl(char) Character.isISOControl} 285 * method), and are not space characters (according to the {@link 286 * java.lang.Character#isSpaceChar(char) Character.isSpaceChar} 287 * method) 
<i>(<b>Deviation from RFC 2396</b>, which is 288 * limited to US-ASCII)</i></td></tr> 289 * </tbody> 290 * </table></blockquote> 291 * 292 * <p><a id="legal-chars"></a> The set of all legal URI characters consists of 293 * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i> 294 * characters. 295 * 296 * 297 * <h4> Escaped octets, quotation, encoding, and decoding </h4> 298 * 299 * RFC 2396 allows escaped octets to appear in the user-info, path, query, and 300 * fragment components. Escaping serves two purposes in URIs: 301 * 302 * <ul> 303 * 304 * <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to 305 * conform strictly to RFC 2396 by not containing any <i>other</i> 306 * characters. </p></li> 307 * 308 * <li><p> To <i>quote</i> characters that are otherwise illegal in a 309 * component. The user-info, path, query, and fragment components differ 310 * slightly in terms of which characters are considered legal and illegal. | 115 * <p> The authority component of a hierarchical URI is, if specified, either 116 * <i>server-based</i> or <i>registry-based</i>. A server-based authority 117 * parses according to the familiar syntax 118 * 119 * <blockquote> 120 * [<i>user-info</i><b>{@code @}</b>]<i>host</i>[<b>{@code :}</b><i>port</i>] 121 * </blockquote> 122 * 123 * where the characters <b>{@code @}</b> and <b>{@code :}</b> stand for 124 * themselves. Nearly all URI schemes currently in use are server-based. An 125 * authority component that does not parse in this way is considered to be 126 * registry-based. 127 * 128 * <p> The path component of a hierarchical URI is itself said to be absolute 129 * if it begins with a slash character ({@code '/'}); otherwise it is 130 * relative. The path of a hierarchical URI that is either absolute or 131 * specifies an authority is always absolute. 
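The distinction between server-based and registry-based authorities, and between absolute and relative paths, can be observed through the accessor methods of `java.net.URI`. The following is a minimal sketch rather than part of the documented class: the class name `UriComponentsSketch`, the helpers `hasAbsolutePath` and `describeAuthority`, and the three example URIs are hypothetical and chosen only for illustration. It assumes that a host name containing an underscore does not parse as a server-based authority and is therefore treated as registry-based.

```java
import java.net.URI;

class UriComponentsSketch {

    /** Returns true when the URI has a path that begins with a slash. */
    static boolean hasAbsolutePath(URI uri) {
        String path = uri.getPath();
        return path != null && path.startsWith("/");
    }

    /** Summarizes the authority-related components of a URI on one line. */
    static String describeAuthority(URI uri) {
        return "authority=" + uri.getAuthority()
                + " userInfo=" + uri.getUserInfo()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort();
    }

    public static void main(String[] args) {
        // Hypothetical example URIs, not taken from the documentation.
        // Parses as [user-info@]host[:port], so it is server-based.
        URI serverBased = URI.create("http://user@example.com:8080/docs/index.html");
        // Assumption: '_' is not legal in a host name, so the authority
        // falls back to registry-based and the host stays undefined.
        URI registryBased = URI.create("http://my_host/docs/index.html");
        // No scheme and no authority; the path does not begin with '/'.
        URI relative = URI.create("docs/index.html");

        for (URI uri : new URI[] {serverBased, registryBased, relative}) {
            System.out.println(uri);
            System.out.println("  " + describeAuthority(uri));
            System.out.println("  absolute path: " + hasAbsolutePath(uri));
        }
    }
}
```

Code that requires a server-based authority can call `parseServerAuthority()` on the instance, which throws `URISyntaxException` when the authority cannot be parsed that way.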
132 * 133 * <p> All told, then, a URI instance has the following nine components: 134 * 135 * <table class="striped" style="margin-left:2em"> 136 * <caption style="display:none">Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment</caption> 137 * <thead> 138 * <tr><th scope="col">Component</th><th scope="col">Type</th></tr> 139 * </thead> 140 * <tbody style="text-align:left"> 141 * <tr><th scope="row">scheme</th><td>{@code String}</td></tr> 142 * <tr><th scope="row">scheme-specific-part</th><td>{@code String}</td></tr> 143 * <tr><th scope="row">authority</th><td>{@code String}</td></tr> 144 * <tr><th scope="row">user-info</th><td>{@code String}</td></tr> 145 * <tr><th scope="row">host</th><td>{@code String}</td></tr> 146 * <tr><th scope="row">port</th><td>{@code int}</td></tr> 147 * <tr><th scope="row">path</th><td>{@code String}</td></tr> 148 * <tr><th scope="row">query</th><td>{@code String}</td></tr> 149 * <tr><th scope="row">fragment</th><td>{@code String}</td></tr> 150 * </tbody> 151 * </table> 152 * 153 * In a given instance any particular component is either <i>undefined</i> or 154 * <i>defined</i> with a distinct value. Undefined string components are 155 * represented by {@code null}, while undefined integer components are 156 * represented by {@code -1}. A string component may be defined to have the 157 * empty string as its value; this is not equivalent to that component being 158 * undefined. 159 * 160 * <p> Whether a particular component is or is not defined in an instance 161 * depends upon the type of the URI being represented. An absolute URI has a 162 * scheme component. An opaque URI has a scheme, a scheme-specific part, and 163 * possibly a fragment, but has no other components. A hierarchical URI always 164 * has a path (though it may be empty) and a scheme-specific-part (which at 165 * least contains the path), and may have any of the other components. 
If the 166 * authority component is present and is server-based then the host component 167 * will be defined and the user-information and port components may be defined. 168 * 169 * 170 * <h4> Operations on URI instances </h4> 171 * 236 * <blockquote> 237 * {@code http://example.com/languages/java/sample/a/index.html#28} 238 * </blockquote> 239 * 240 * against the base URI 241 * 242 * <blockquote> 243 * {@code http://example.com/languages/java/} 244 * </blockquote> 245 * 246 * yields the relative URI {@code sample/a/index.html#28}. 247 * 248 * 249 * <h4> Character categories </h4> 250 * 251 * RFC 2396 specifies precisely which characters are permitted in the 252 * various components of a URI reference. The following categories, most of 253 * which are taken from that specification, are used below to describe these 254 * constraints: 255 * 256 * <table class="striped" style="margin-left:2em"> 257 * <caption style="display:none">Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other</caption> 258 * <thead> 259 * <tr><th scope="col">Category</th><th scope="col">Description</th></tr> 260 * </thead> 261 * <tbody style="text-align:left"> 262 * <tr><th scope="row" style="vertical-align:top">alpha</th> 263 * <td>The US-ASCII alphabetic characters, 264 * {@code 'A'} through {@code 'Z'} 265 * and {@code 'a'} through {@code 'z'}</td></tr> 266 * <tr><th scope="row" style="vertical-align:top">digit</th> 267 * <td>The US-ASCII decimal digit characters, 268 * {@code '0'} through {@code '9'}</td></tr> 269 * <tr><th scope="row" style="vertical-align:top">alphanum</th> 270 * <td>All <i>alpha</i> and <i>digit</i> characters</td></tr> 271 * <tr><th scope="row" style="vertical-align:top">unreserved</th> 272 * <td>All <i>alphanum</i> characters together with those in the string 273 * {@code "_-!.~'()*"}</td></tr> 274 * <tr><th scope="row" style="vertical-align:top">punct</th> 275 * <td>The characters in the string {@code ",;:$&+="}</td></tr> 276 * <tr><th 
scope="row" style="vertical-align:top">reserved</th> 277 * <td>All <i>punct</i> characters together with those in the string 278 * {@code "?/[]@"}</td></tr> 279 * <tr><th scope="row" style="vertical-align:top">escaped</th> 280 * <td>Escaped octets, that is, triplets consisting of the percent 281 * character ({@code '%'}) followed by two hexadecimal digits 282 * ({@code '0'}-{@code '9'}, {@code 'A'}-{@code 'F'}, and 283 * {@code 'a'}-{@code 'f'})</td></tr> 284 * <tr><th scope="row" style="vertical-align:top">other</th> 285 * <td>The Unicode characters that are not in the US-ASCII character set, 286 * are not control characters (according to the {@link 287 * java.lang.Character#isISOControl(char) Character.isISOControl} 288 * method), and are not space characters (according to the {@link 289 * java.lang.Character#isSpaceChar(char) Character.isSpaceChar} 290 * method) <i>(<b>Deviation from RFC 2396</b>, which is 291 * limited to US-ASCII)</i></td></tr> 292 * </tbody> 293 * </table> 294 * 295 * <p><a id="legal-chars"></a> The set of all legal URI characters consists of 296 * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i> 297 * characters. 298 * 299 * 300 * <h4> Escaped octets, quotation, encoding, and decoding </h4> 301 * 302 * RFC 2396 allows escaped octets to appear in the user-info, path, query, and 303 * fragment components. Escaping serves two purposes in URIs: 304 * 305 * <ul> 306 * 307 * <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to 308 * conform strictly to RFC 2396 by not containing any <i>other</i> 309 * characters. </p></li> 310 * 311 * <li><p> To <i>quote</i> characters that are otherwise illegal in a 312 * component. The user-info, path, query, and fragment components differ 313 * slightly in terms of which characters are considered legal and illegal. |