66 * <h3> URI syntax and components </h3>
67 *
68 * At the highest level a URI reference (hereinafter simply "URI") in string
69 * form has the syntax
70 *
71 * <blockquote>
72 * [<i>scheme</i><b>{@code :}</b>]<i>scheme-specific-part</i>[<b>{@code #}</b><i>fragment</i>]
73 * </blockquote>
74 *
75 * where square brackets [...] delineate optional components and the characters
76 * <b>{@code :}</b> and <b>{@code #}</b> stand for themselves.
77 *
78 * <p> An <i>absolute</i> URI specifies a scheme; a URI that is not absolute is
79 * said to be <i>relative</i>. URIs are also classified according to whether
80 * they are <i>opaque</i> or <i>hierarchical</i>.
81 *
82 * <p> An <i>opaque</i> URI is an absolute URI whose scheme-specific part does
83 * not begin with a slash character ({@code '/'}). Opaque URIs are not
84 * subject to further parsing. Some examples of opaque URIs are:
85 *
86 * <blockquote><table cellpadding=0 cellspacing=0 summary="layout">
87 * <tr><td>{@code mailto:java-net@java.sun.com}<td></tr>
88 * <tr><td>{@code news:comp.lang.java}<td></tr>
89 * <tr><td>{@code urn:isbn:096139210x}</td></tr>
90 * </table></blockquote>
91 *
92 * <p> A <i>hierarchical</i> URI is either an absolute URI whose
93 * scheme-specific part begins with a slash character, or a relative URI, that
94 * is, a URI that does not specify a scheme. Some examples of hierarchical
95 * URIs are:
96 *
97 * <blockquote>
98 * {@code http://example.com/languages/java/}<br>
99 * {@code sample/a/index.html#28}<br>
100 * {@code ../../demo/b/index.html}<br>
101 * {@code file:///~/calendar}
102 * </blockquote>
103 *
104 * <p> A hierarchical URI is subject to further parsing according to the syntax
105 *
106 * <blockquote>
107 * [<i>scheme</i><b>{@code :}</b>][<b>{@code //}</b><i>authority</i>][<i>path</i>][<b>{@code ?}</b><i>query</i>][<b>{@code #}</b><i>fragment</i>]
108 * </blockquote>
109 *
110 * where the characters <b>{@code :}</b>, <b>{@code /}</b>,
115 * <p> The authority component of a hierarchical URI is, if specified, either
116 * <i>server-based</i> or <i>registry-based</i>. A server-based authority
117 * parses according to the familiar syntax
118 *
119 * <blockquote>
120 * [<i>user-info</i><b>{@code @}</b>]<i>host</i>[<b>{@code :}</b><i>port</i>]
121 * </blockquote>
122 *
123 * where the characters <b>{@code @}</b> and <b>{@code :}</b> stand for
124 * themselves. Nearly all URI schemes currently in use are server-based. An
125 * authority component that does not parse in this way is considered to be
126 * registry-based.
127 *
128 * <p> The path component of a hierarchical URI is itself said to be absolute
129 * if it begins with a slash character ({@code '/'}); otherwise it is
130 * relative. The path of a hierarchical URI that is either absolute or
131 * specifies an authority is always absolute.
132 *
133 * <p> All told, then, a URI instance has the following nine components:
134 *
135 * <blockquote><table summary="Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment">
136 * <tr><th><i>Component</i></th><th><i>Type</i></th></tr>
137 * <tr><td>scheme</td><td>{@code String}</td></tr>
138 * <tr><td>scheme-specific-part </td><td>{@code String}</td></tr>
139 * <tr><td>authority</td><td>{@code String}</td></tr>
140 * <tr><td>user-info</td><td>{@code String}</td></tr>
141 * <tr><td>host</td><td>{@code String}</td></tr>
142 * <tr><td>port</td><td>{@code int}</td></tr>
143 * <tr><td>path</td><td>{@code String}</td></tr>
144 * <tr><td>query</td><td>{@code String}</td></tr>
145 * <tr><td>fragment</td><td>{@code String}</td></tr>
146 * </table></blockquote>
147 *
148 * In a given instance any particular component is either <i>undefined</i> or
149 * <i>defined</i> with a distinct value. Undefined string components are
150 * represented by {@code null}, while undefined integer components are
151 * represented by {@code -1}. A string component may be defined to have the
152 * empty string as its value; this is not equivalent to that component being
153 * undefined.
154 *
155 * <p> Whether a particular component is or is not defined in an instance
156 * depends upon the type of the URI being represented. An absolute URI has a
157 * scheme component. An opaque URI has a scheme, a scheme-specific part, and
158 * possibly a fragment, but has no other components. A hierarchical URI always
159 * has a path (though it may be empty) and a scheme-specific-part (which at
160 * least contains the path), and may have any of the other components. If the
161 * authority component is present and is server-based then the host component
162 * will be defined and the user-information and port components may be defined.
163 *
164 *
165 * <h4> Operations on URI instances </h4>
231 * <blockquote>
232 * {@code http://example.com/languages/java/sample/a/index.html#28}
233 * </blockquote>
234 *
235 * against the base URI
236 *
237 * <blockquote>
238 * {@code http://example.com/languages/java/}
239 * </blockquote>
240 *
241 * yields the relative URI {@code sample/a/index.html#28}.
242 *
243 *
244 * <h4> Character categories </h4>
245 *
246 * RFC 2396 specifies precisely which characters are permitted in the
247 * various components of a URI reference. The following categories, most of
248 * which are taken from that specification, are used below to describe these
249 * constraints:
250 *
251 * <blockquote><table cellspacing=2 summary="Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other">
252 * <tr><th valign=top><i>alpha</i></th>
253 * <td>The US-ASCII alphabetic characters,
254 * {@code 'A'} through {@code 'Z'}
255 * and {@code 'a'} through {@code 'z'}</td></tr>
256 * <tr><th valign=top><i>digit</i></th>
257 * <td>The US-ASCII decimal digit characters,
258 * {@code '0'} through {@code '9'}</td></tr>
259 * <tr><th valign=top><i>alphanum</i></th>
260 * <td>All <i>alpha</i> and <i>digit</i> characters</td></tr>
261 * <tr><th valign=top><i>unreserved</i> </th>
262 * <td>All <i>alphanum</i> characters together with those in the string
263 * {@code "_-!.~'()*"}</td></tr>
264 * <tr><th valign=top><i>punct</i></th>
265 * <td>The characters in the string {@code ",;:$&+="}</td></tr>
266 * <tr><th valign=top><i>reserved</i></th>
267 * <td>All <i>punct</i> characters together with those in the string
268 * {@code "?/[]@"}</td></tr>
269 * <tr><th valign=top><i>escaped</i></th>
270 * <td>Escaped octets, that is, triplets consisting of the percent
271 * character ({@code '%'}) followed by two hexadecimal digits
272 * ({@code '0'}-{@code '9'}, {@code 'A'}-{@code 'F'}, and
273 * {@code 'a'}-{@code 'f'})</td></tr>
274 * <tr><th valign=top><i>other</i></th>
275 * <td>The Unicode characters that are not in the US-ASCII character set,
276 * are not control characters (according to the {@link
277 * java.lang.Character#isISOControl(char) Character.isISOControl}
278 * method), and are not space characters (according to the {@link
279 * java.lang.Character#isSpaceChar(char) Character.isSpaceChar}
280 * method) <i>(<b>Deviation from RFC 2396</b>, which is
281 * limited to US-ASCII)</i></td></tr>
282 * </table></blockquote>
283 *
284 * <p><a id="legal-chars"></a> The set of all legal URI characters consists of
285 * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i>
286 * characters.
287 *
288 *
289 * <h4> Escaped octets, quotation, encoding, and decoding </h4>
290 *
291 * RFC 2396 allows escaped octets to appear in the user-info, path, query, and
292 * fragment components. Escaping serves two purposes in URIs:
293 *
294 * <ul>
295 *
296 * <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to
297 * conform strictly to RFC 2396 by not containing any <i>other</i>
298 * characters. </p></li>
299 *
300 * <li><p> To <i>quote</i> characters that are otherwise illegal in a
301 * component. The user-info, path, query, and fragment components differ
|
66 * <h3> URI syntax and components </h3>
67 *
68 * At the highest level a URI reference (hereinafter simply "URI") in string
69 * form has the syntax
70 *
71 * <blockquote>
72 * [<i>scheme</i><b>{@code :}</b>]<i>scheme-specific-part</i>[<b>{@code #}</b><i>fragment</i>]
73 * </blockquote>
74 *
75 * where square brackets [...] delineate optional components and the characters
76 * <b>{@code :}</b> and <b>{@code #}</b> stand for themselves.
77 *
78 * <p> An <i>absolute</i> URI specifies a scheme; a URI that is not absolute is
79 * said to be <i>relative</i>. URIs are also classified according to whether
80 * they are <i>opaque</i> or <i>hierarchical</i>.
81 *
82 * <p> An <i>opaque</i> URI is an absolute URI whose scheme-specific part does
83 * not begin with a slash character ({@code '/'}). Opaque URIs are not
84 * subject to further parsing. Some examples of opaque URIs are:
85 *
86 * <blockquote><ul style="list-style-type:none">
87 * <li>{@code mailto:java-net@java.sun.com}</li>
88 * <li>{@code news:comp.lang.java}</li>
89 * <li>{@code urn:isbn:096139210x}</li>
90 * </ul></blockquote>
91 *
92 * <p> A <i>hierarchical</i> URI is either an absolute URI whose
93 * scheme-specific part begins with a slash character, or a relative URI, that
94 * is, a URI that does not specify a scheme. Some examples of hierarchical
95 * URIs are:
96 *
97 * <blockquote>
98 * {@code http://example.com/languages/java/}<br>
99 * {@code sample/a/index.html#28}<br>
100 * {@code ../../demo/b/index.html}<br>
101 * {@code file:///~/calendar}
102 * </blockquote>
103 *
104 * <p> A hierarchical URI is subject to further parsing according to the syntax
105 *
106 * <blockquote>
107 * [<i>scheme</i><b>{@code :}</b>][<b>{@code //}</b><i>authority</i>][<i>path</i>][<b>{@code ?}</b><i>query</i>][<b>{@code #}</b><i>fragment</i>]
108 * </blockquote>
109 *
110 * where the characters <b>{@code :}</b>, <b>{@code /}</b>,
115 * <p> The authority component of a hierarchical URI is, if specified, either
116 * <i>server-based</i> or <i>registry-based</i>. A server-based authority
117 * parses according to the familiar syntax
118 *
119 * <blockquote>
120 * [<i>user-info</i><b>{@code @}</b>]<i>host</i>[<b>{@code :}</b><i>port</i>]
121 * </blockquote>
122 *
123 * where the characters <b>{@code @}</b> and <b>{@code :}</b> stand for
124 * themselves. Nearly all URI schemes currently in use are server-based. An
125 * authority component that does not parse in this way is considered to be
126 * registry-based.
127 *
128 * <p> The path component of a hierarchical URI is itself said to be absolute
129 * if it begins with a slash character ({@code '/'}); otherwise it is
130 * relative. The path of a hierarchical URI that is either absolute or
131 * specifies an authority is always absolute.
132 *
133 * <p> All told, then, a URI instance has the following nine components:
134 *
135 * <blockquote><table class="borderless">
136 * <caption style="display:none">Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment</caption>
137 * <thead>
138 * <tr><th><i>Component</i></th><th><i>Type</i></th></tr>
139 * </thead>
140 * <tbody>
141 * <tr><td>scheme</td><td>{@code String}</td></tr>
142 * <tr><td>scheme-specific-part </td><td>{@code String}</td></tr>
143 * <tr><td>authority</td><td>{@code String}</td></tr>
144 * <tr><td>user-info</td><td>{@code String}</td></tr>
145 * <tr><td>host</td><td>{@code String}</td></tr>
146 * <tr><td>port</td><td>{@code int}</td></tr>
147 * <tr><td>path</td><td>{@code String}</td></tr>
148 * <tr><td>query</td><td>{@code String}</td></tr>
149 * <tr><td>fragment</td><td>{@code String}</td></tr>
150 * </tbody>
151 * </table></blockquote>
152 *
153 * In a given instance any particular component is either <i>undefined</i> or
154 * <i>defined</i> with a distinct value. Undefined string components are
155 * represented by {@code null}, while undefined integer components are
156 * represented by {@code -1}. A string component may be defined to have the
157 * empty string as its value; this is not equivalent to that component being
158 * undefined.
159 *
160 * <p> Whether a particular component is or is not defined in an instance
161 * depends upon the type of the URI being represented. An absolute URI has a
162 * scheme component. An opaque URI has a scheme, a scheme-specific part, and
163 * possibly a fragment, but has no other components. A hierarchical URI always
164 * has a path (though it may be empty) and a scheme-specific-part (which at
165 * least contains the path), and may have any of the other components. If the
166 * authority component is present and is server-based then the host component
167 * will be defined and the user-information and port components may be defined.
168 *
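The component rules above can be observed directly through the accessor methods of `java.net.URI`. The following is a minimal sketch, not part of the original documentation; the class name `UriComponentsDemo` and the helper `describe` are hypothetical names introduced for illustration. It prints each of the nine components for an opaque, an absolute hierarchical, and a relative URI, so that undefined components show up as `null` (strings) or `-1` (port).

```java
import java.net.URI;

public class UriComponentsDemo {

    // Hypothetical helper: renders the nine components of a URI on one line.
    // Undefined string components appear as "null", an undefined port as -1.
    static String describe(URI u) {
        return "scheme=" + u.getScheme()
            + " ssp=" + u.getSchemeSpecificPart()
            + " authority=" + u.getAuthority()
            + " userInfo=" + u.getUserInfo()
            + " host=" + u.getHost()
            + " port=" + u.getPort()
            + " path=" + u.getPath()
            + " query=" + u.getQuery()
            + " fragment=" + u.getFragment();
    }

    public static void main(String[] args) {
        // Opaque: absolute, and the scheme-specific part does not begin with '/'.
        URI opaque = URI.create("mailto:java-net@java.sun.com");
        // Hierarchical and absolute, with a server-based authority.
        URI absolute = URI.create("http://example.com/languages/java/");
        // Hierarchical and relative: no scheme, but a path and a fragment.
        URI relative = URI.create("sample/a/index.html#28");

        for (URI u : new URI[] { opaque, absolute, relative }) {
            System.out.println(u + " (absolute=" + u.isAbsolute()
                + ", opaque=" + u.isOpaque() + ")");
            System.out.println("  " + describe(u));
        }
    }
}
```

For the opaque URI only the scheme and scheme-specific part are defined; the relative URI has no scheme but still has a path, which matches the statement that a hierarchical URI always has a path.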
169 *
170 * <h4> Operations on URI instances </h4>
236 * <blockquote>
237 * {@code http://example.com/languages/java/sample/a/index.html#28}
238 * </blockquote>
239 *
240 * against the base URI
241 *
242 * <blockquote>
243 * {@code http://example.com/languages/java/}
244 * </blockquote>
245 *
246 * yields the relative URI {@code sample/a/index.html#28}.
247 *
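The relativization example above can be reproduced with `URI.relativize`, and undone with `URI.resolve`. This is a small sketch for illustration only; the class name `UriRelativizeDemo` and the helper `relativize` are hypothetical, while the two URIs are the ones used in the text.

```java
import java.net.URI;

public class UriRelativizeDemo {

    // Hypothetical helper: relativizes target against base, both given as strings.
    static URI relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target));
    }

    public static void main(String[] args) {
        String base = "http://example.com/languages/java/";
        String target = "http://example.com/languages/java/sample/a/index.html#28";

        // The base path is a prefix of the target path, so a relative URI results.
        URI relative = relativize(base, target);
        System.out.println(relative);

        // Resolving the relative URI against the same base restores the target.
        System.out.println(URI.create(base).resolve(relative));

        // When scheme or authority differ, relativize returns the given URI unchanged.
        System.out.println(relativize(base, "http://example.org/other"));
    }
}
```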
248 *
249 * <h4> Character categories </h4>
250 *
251 * RFC 2396 specifies precisely which characters are permitted in the
252 * various components of a URI reference. The following categories, most of
253 * which are taken from that specification, are used below to describe these
254 * constraints:
255 *
256 * <blockquote><table>
257 * <caption style="display:none">Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other</caption>
258 * <tbody>
259 * <tr><th valign=top><i>alpha</i></th>
260 * <td>The US-ASCII alphabetic characters,
261 * {@code 'A'} through {@code 'Z'}
262 * and {@code 'a'} through {@code 'z'}</td></tr>
263 * <tr><th valign=top><i>digit</i></th>
264 * <td>The US-ASCII decimal digit characters,
265 * {@code '0'} through {@code '9'}</td></tr>
266 * <tr><th valign=top><i>alphanum</i></th>
267 * <td>All <i>alpha</i> and <i>digit</i> characters</td></tr>
268 * <tr><th valign=top><i>unreserved</i> </th>
269 * <td>All <i>alphanum</i> characters together with those in the string
270 * {@code "_-!.~'()*"}</td></tr>
271 * <tr><th valign=top><i>punct</i></th>
272 * <td>The characters in the string {@code ",;:$&+="}</td></tr>
273 * <tr><th valign=top><i>reserved</i></th>
274 * <td>All <i>punct</i> characters together with those in the string
275 * {@code "?/[]@"}</td></tr>
276 * <tr><th valign=top><i>escaped</i></th>
277 * <td>Escaped octets, that is, triplets consisting of the percent
278 * character ({@code '%'}) followed by two hexadecimal digits
279 * ({@code '0'}-{@code '9'}, {@code 'A'}-{@code 'F'}, and
280 * {@code 'a'}-{@code 'f'})</td></tr>
281 * <tr><th valign=top><i>other</i></th>
282 * <td>The Unicode characters that are not in the US-ASCII character set,
283 * are not control characters (according to the {@link
284 * java.lang.Character#isISOControl(char) Character.isISOControl}
285 * method), and are not space characters (according to the {@link
286 * java.lang.Character#isSpaceChar(char) Character.isSpaceChar}
287 * method) <i>(<b>Deviation from RFC 2396</b>, which is
288 * limited to US-ASCII)</i></td></tr>
289 * </tbody>
290 * </table></blockquote>
291 *
292 * <p><a id="legal-chars"></a> The set of all legal URI characters consists of
293 * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i>
294 * characters.
295 *
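The effect of these character categories can be seen when parsing strings with the single-argument `URI` constructor. The sketch below is illustrative only; the class name `UriLegalCharsDemo` and the helper `parses` are hypothetical. It assumes the behavior described above: a space is not a legal URI character, an escaped octet is, and a non-US-ASCII letter falls into the <i>other</i> category.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriLegalCharsDemo {

    // Hypothetical helper: reports whether the string parses as a URI.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A literal space is in none of the legal categories.
        System.out.println(parses("http://example.com/a b"));
        // The same space written as an escaped octet is legal.
        System.out.println(parses("http://example.com/a%20b"));
        // '%' must be followed by two hexadecimal digits.
        System.out.println(parses("http://example.com/%zz"));

        // U+00E9 is a non-US-ASCII letter, so it is an "other" character.
        URI other = URI.create("http://example.com/caf\u00e9");
        // toASCIIString encodes "other" characters as escaped octets.
        System.out.println(other.toASCIIString());
    }
}
```

Accepting <i>other</i> characters on input while offering `toASCIIString` for output is how the class combines its deviation from RFC 2396 with the ability to produce a strictly conforming string.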
296 *
297 * <h4> Escaped octets, quotation, encoding, and decoding </h4>
298 *
299 * RFC 2396 allows escaped octets to appear in the user-info, path, query, and
300 * fragment components. Escaping serves two purposes in URIs:
301 *
302 * <ul>
303 *
304 * <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to
305 * conform strictly to RFC 2396 by not containing any <i>other</i>
306 * characters. </p></li>
307 *
308 * <li><p> To <i>quote</i> characters that are otherwise illegal in a
309 * component. The user-info, path, query, and fragment components differ
|