 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.Document;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * An interface to an object that is able to build, or augment, a DOM tree
 * from various input sources.
 * <p> <code>LSParser</code> provides an API for parsing XML and building the
 * corresponding DOM document structure. An <code>LSParser</code> instance
 * can be obtained by invoking the
 * <code>DOMImplementationLS.createLSParser()</code> method.
 * <p> As specified in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
 * , when a document is first made available via the LSParser:
 * <ul>
 * <li> there will
 * never be two adjacent nodes of type <code>TEXT_NODE</code>, and there will never be
 * empty text nodes.
 * </li>
 * <li> it is expected that the <code>value</code> and
 * <code>nodeValue</code> attributes of an <code>Attr</code> node initially
 * return the <a href='http://www.w3.org/TR/2004/REC-xml-20040204#AVNormalize'>XML 1.0
 * normalized value</a>. However, if the parameters "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-validate-if-schema'>
 * validate-if-schema</a>" and "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-datatype-normalization'>
 * datatype-normalization</a>" are set to <code>true</code>, depending on the attribute normalization
 * used, the attribute values may differ from the ones obtained by the XML
 * 1.0 attribute normalization.
 * If the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-datatype-normalization'>
 * datatype-normalization</a>" is set to <code>false</code>, the XML 1.0 attribute normalization is
 * guaranteed to occur, and if the attributes list does not contain
 * namespace declarations, the <code>attributes</code> attribute on the
 * <code>Element</code> node represents the property <b>[attributes]</b> defined in
 * [<a href='http://www.w3.org/TR/2004/REC-xml-infoset-20040204/'>XML Information Set</a>].
 * </li>
 * </ul>
 * <p> Asynchronous <code>LSParser</code> objects are expected to also
 * implement the <code>events::EventTarget</code> interface so that event
 * listeners can be registered on asynchronous <code>LSParser</code>
 * objects.
 * <p> Events supported by asynchronous <code>LSParser</code> objects are:
 * <dl>
 * <dt>load</dt>
 * <dd>
 * The <code>LSParser</code> has finished loading the document. See also the
 * definition of the <code>LSLoadEvent</code> interface. </dd>
 * <dt>progress</dt>
 * <dd> The
 * <code>LSParser</code> signals progress as data is parsed. This
 * specification does not attempt to define exactly when progress events
 * should be dispatched. That is intentionally left as
 * implementation-dependent. Here is one example of how an implementation might
 * dispatch progress events: once the parser starts receiving data, a
 * progress event is dispatched to indicate that the parsing starts. From
 * there on, a progress event is dispatched for every 4096 bytes of data
 * that is received and processed. This is only one example, though, and
 * implementations can choose to dispatch progress events at any time while
 * parsing, or not dispatch them at all. See also the definition of the
 * <code>LSProgressEvent</code> interface.
 * </dd>
 * </dl>
 * <p><b>Note:</b> All events defined in this specification use the
 * namespace URI <code>"http://www.w3.org/2002/DOMLS"</code>.
 * <p> While parsing an input source, errors are reported to the application
 * through the error handler (<code>LSParser.domConfig</code>'s "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
 * error-handler</a>" parameter). This specification does not try to define all possible
 * errors that can occur while parsing XML, or any other markup, but some
 * common error cases are defined. The types (<code>DOMError.type</code>) of
 * errors and warnings defined by this specification are:
 * <dl>
 * <dt>
 * <code>"check-character-normalization-failure" [error]</code> </dt>
 * <dd> Raised if
 * the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-check-character-normalization'>
 * check-character-normalization</a>" is set to <code>true</code> and a string is encountered that fails normalization
 * checking. </dd>
 * <dt><code>"doctype-not-allowed" [fatal]</code></dt>
 * <dd> Raised if the
 * configuration parameter "disallow-doctype" is set to <code>true</code>
 * and a doctype is encountered. </dd>
 * <dt><code>"no-input-specified" [fatal]</code></dt>
 * <dd>
 * Raised when loading a document and no input is specified in the
 * <code>LSInput</code> object. </dd>
 * <dt>
 * <code>"pi-base-uri-not-preserved" [warning]</code></dt>
 * <dd> Raised if a processing
 * instruction is encountered in a location where the base URI of the
 * processing instruction cannot be preserved.
 * One example of a case where
 * this warning will be raised is if the configuration parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-entities'>
 * entities</a>" is set to <code>false</code> and the following XML file is parsed:
 * <pre>
 * &lt;!DOCTYPE root [ &lt;!ENTITY e SYSTEM 'subdir/myentity.ent'&gt; ]&gt;
 * &lt;root&gt; &amp;e; &lt;/root&gt;</pre>
 * And <code>subdir/myentity.ent</code>
 * contains:
 * <pre>&lt;one&gt; &lt;two/&gt; &lt;/one&gt; &lt;?pi 3.14159?&gt;
 * &lt;more/&gt;</pre>
 * </dd>
 * <dt><code>"unbound-prefix-in-entity" [warning]</code></dt>
 * <dd> An
 * implementation-dependent warning that may be raised if the configuration
 * parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-namespaces'>
 * namespaces</a>" is set to <code>true</code> and an unbound namespace prefix is
 * encountered in an entity's replacement text. Raising this warning is not
 * required, since some existing parsers may not recognize unbound namespace
 * prefixes in the replacement text of entities. </dd>
 * <dt>
 * <code>"unknown-character-denormalization" [fatal]</code></dt>
 * <dd> Raised if the
 * configuration parameter "ignore-unknown-character-denormalizations" is
 * set to <code>false</code> and a character is encountered for which the
 * processor cannot determine the normalization properties. </dd>
 * <dt>
 * <code>"unsupported-encoding" [fatal]</code></dt>
 * <dd> Raised if an unsupported
 * encoding is encountered. </dd>
 * <dt><code>"unsupported-media-type" [fatal]</code></dt>
 * <dd>
 * Raised if the configuration parameter "supported-media-types-only" is set
 * to <code>true</code> and an unsupported media type is encountered.
 * </dd>
 * </dl>
 * <p> In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation-specific errors and warnings for any
 * other error and warning cases such as I/O errors (file not found,
 * permission denied, ...), XML well-formedness errors, and so on.
 * <p>See also the <a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407'>Document Object Model (DOM) Level 3 Load
 * and Save Specification</a>.
 *
 * @since 1.5
 */
public interface LSParser {
    /**
     * The <code>DOMConfiguration</code> object used when parsing an input
     * source. This <code>DOMConfiguration</code> is specific to the parse
     * operation. No parameter values from this <code>DOMConfiguration</code>
     * object are passed automatically to the <code>DOMConfiguration</code>
     * object on the <code>Document</code> that is created, or used, by the
     * parse operation. The DOM application is responsible for passing any
     * needed parameter values from this <code>DOMConfiguration</code>
     * object to the <code>DOMConfiguration</code> object referenced by the
     * <code>Document</code> object.
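     * <br> For example, suppose an application wants later operations on the
     * document, such as <code>Document.normalizeDocument()</code>, to report
     * to the same error handler that was used during parsing. It then has
     * to copy that parameter across itself. The following is a sketch, not
     * a prescribed pattern. It assumes <code>parser</code> is an existing
     * synchronous <code>LSParser</code>, <code>handler</code> is a
     * <code>DOMErrorHandler</code> and <code>uri</code> is a document
     * location. All three names are hypothetical.
     * <pre>{@code
     * DOMConfiguration parserConfig = parser.getDomConfig();
     * parserConfig.setParameter("error-handler", handler);
     * Document doc = parser.parseURI(uri);
     * if (doc != null) {
     *     // Not copied automatically: pass the value on to the document.
     *     doc.getDomConfig().setParameter("error-handler",
     *             parserConfig.getParameter("error-handler"));
     * }
     * }</pre>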
     * <br> In addition to the parameters recognized on the <a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMConfiguration'>
     * DOMConfiguration</a> interface defined in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the <code>DOMConfiguration</code> objects for <code>LSParser</code>
     * add or modify the following parameters:
     * <dl>
     * <dt>
     * <code>"charset-overrides-xml-encoding"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>optional</em>] (<em>default</em>) If a higher-level protocol such as HTTP [<a href='http://www.ietf.org/rfc/rfc2616.txt'>IETF RFC 2616</a>] provides an
     * indication of the character encoding of the input stream being
     * processed, that will override any encoding specified in the XML
     * declaration or the Text declaration (see also section 4.3.3,
     * "Character Encoding in Entities", in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>]).
     * Explicitly setting an encoding in the <code>LSInput</code> overrides
     * any encoding from the protocol. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] The parser ignores any character set encoding information from
     * higher-level protocols. </dd>
     * </dl></dd>
     * <dt><code>"disallow-doctype"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>optional</em>] Report a fatal <b>"doctype-not-allowed"</b> error if a doctype node is found while parsing the document. This is
     * useful when dealing with things like SOAP envelopes where doctype
     * nodes are not allowed. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Allow doctype nodes in the document.
     * </dd>
     * </dl></dd>
     * <dt>
     * <code>"ignore-unknown-character-denormalizations"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If, while verifying full normalization when [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] is
     * supported, a processor encounters characters for which it cannot
     * determine the normalization properties, then the processor will
     * ignore any possible denormalizations caused by these characters.
     * This parameter is ignored for [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>]. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>optional</em>] Report a fatal <b>"unknown-character-denormalization"</b> error if a character is encountered for which the processor cannot
     * determine the normalization properties. </dd>
     * </dl></dd>
     * <dt><code>"infoset"</code></dt>
     * <dd> See
     * the definition of <code>DOMConfiguration</code> for a description of
     * this parameter. Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , this parameter will default to <code>true</code> for
     * <code>LSParser</code>. </dd>
     * <dt><code>"namespaces"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Perform namespace processing as defined in [<a href='http://www.w3.org/TR/1999/REC-xml-names-19990114/'>XML Namespaces</a>]
     * and [<a href='http://www.w3.org/TR/2004/REC-xml-names11-20040204/'>XML Namespaces 1.1</a>]
     * . </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>optional</em>] Do not perform namespace processing. </dd>
     * </dl></dd>
     * <dt>
     * <code>"resource-resolver"</code></dt>
     * <dd>[<em>required</em>] A reference to an <code>LSResourceResolver</code> object, or <code>null</code>.
     * If
     * the value of this parameter is not <code>null</code> when an external resource
     * (such as an external XML entity or an XML schema location) is
     * encountered, the implementation will request that the
     * <code>LSResourceResolver</code> referenced in this parameter resolve
     * the resource. </dd>
     * <dt><code>"supported-media-types-only"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>optional</em>] Check that the media type of the parsed resource is a supported media
     * type. If an unsupported media type is encountered, a fatal error of
     * type <b>"unsupported-media-type"</b> will be raised. The media types defined in [<a href='http://www.ietf.org/rfc/rfc3023.txt'>IETF RFC 3023</a>] must always
     * be accepted. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Accept any media type. </dd>
     * </dl></dd>
     * <dt><code>"validate"</code></dt>
     * <dd> See the definition of
     * <code>DOMConfiguration</code> for a description of this parameter.
     * Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the internal subset is always processed, even
     * if this parameter is set to <code>false</code>. </dd>
     * <dt>
     * <code>"validate-if-schema"</code></dt>
     * <dd> See the definition of
     * <code>DOMConfiguration</code> for a description of this parameter.
     * Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the internal subset is always processed, even
     * if this parameter is set to <code>false</code>. </dd>
     * <dt>
     * <code>"well-formed"</code></dt>
     * <dd> See the definition of
     * <code>DOMConfiguration</code> for a description of this parameter.
     * Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , this parameter cannot be set to <code>false</code>. </dd>
     * </dl>
     */
    public DOMConfiguration getDomConfig();

    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, if "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-validate'>
     * validate</a>" is set to <code>true</code>, the validation is done before invoking the
     * filter.
     */
    public LSParserFilter getFilter();
    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, if "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-validate'>
     * validate</a>" is set to <code>true</code>, the validation is done before invoking the
     * filter.
     */
    public void setFilter(LSParserFilter filter);

    /**
     * <code>true</code> if the <code>LSParser</code> is asynchronous,
     * <code>false</code> if it is synchronous.
     */
    public boolean getAsync();

    /**
     * <code>true</code> if the <code>LSParser</code> is currently busy
     * loading a document, otherwise <code>false</code>.
     */
    public boolean getBusy();

    /**
     * Parse an XML document from a resource identified by an
     * <code>LSInput</code>.
     * @param input The <code>LSInput</code> from which the source of the
     *   document is to be read.
     * @return If the <code>LSParser</code> is a synchronous
     *   <code>LSParser</code>, the newly created and populated
     *   <code>Document</code> is returned. If the <code>LSParser</code> is
     *   asynchronous, <code>null</code> is returned since the document
     *   object may not yet be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML document. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public Document parse(LSInput input)
                          throws DOMException, LSException;

    /**
     * Parse an XML document from a location identified by a URI reference [<a href='http://www.ietf.org/rfc/rfc2396.txt'>IETF RFC 2396</a>]. If the URI
     * contains a fragment identifier (see section 4.1 in [<a href='http://www.ietf.org/rfc/rfc2396.txt'>IETF RFC 2396</a>]), the
     * behavior is not defined by this specification; future versions of
     * this specification may define the behavior.
     * @param uri The location of the XML document to be read.
     * @return If the <code>LSParser</code> is a synchronous
     *   <code>LSParser</code>, the newly created and populated
     *   <code>Document</code> is returned, or <code>null</code> if an error
     *   occurred.
     *   If the <code>LSParser</code> is asynchronous,
     *   <code>null</code> is returned since the document object may not yet
     *   be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML document. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public Document parseURI(String uri)
                             throws DOMException, LSException;

    // ACTION_TYPES
    /**
     * Append the result of the parse operation as children of the context
     * node. For this action to work, the context node must be an
     * <code>Element</code> or a <code>DocumentFragment</code>.
     */
    public static final short ACTION_APPEND_AS_CHILDREN = 1;
    /**
     * Replace all the children of the context node with the result of the
     * parse operation. For this action to work, the context node must be an
     * <code>Element</code>, a <code>Document</code>, or a
     * <code>DocumentFragment</code>.
     */
    public static final short ACTION_REPLACE_CHILDREN = 2;
    /**
     * Insert the result of the parse operation as the immediately preceding
     * [...]
     * context node (or its parent, depending on where the result will be
     * inserted) is used for resolving unbound namespace prefixes. The
     * context node's <code>ownerDocument</code> node (or the node itself if
     * the node is of type <code>DOCUMENT_NODE</code>) is used to resolve
     * default attributes and entity references.
     * <br> As the new data is inserted into the document, at least one
     * mutation event is fired per new immediate child or sibling of the
     * context node.
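     * <br> For example, the following sketch appends two parsed elements as
     * children of an existing element. It assumes <code>impl</code> is a
     * <code>DOMImplementationLS</code>, <code>parser</code> is a synchronous
     * <code>LSParser</code> obtained from it, and <code>target</code> is an
     * <code>Element</code> in an existing document. All three names are
     * hypothetical. Implementations may raise <code>NOT_SUPPORTED_ERR</code>
     * for this method, so a caller may need a fallback.
     * <pre>{@code
     * LSInput input = impl.createLSInput();
     * input.setStringData("<item>one</item><item>two</item>");
     * try {
     *     // Returns the first top-level node produced by the parse.
     *     Node first = parser.parseWithContext(input, target,
     *             LSParser.ACTION_APPEND_AS_CHILDREN);
     * } catch (DOMException e) {
     *     if (e.code != DOMException.NOT_SUPPORTED_ERR) {
     *         throw e;
     *     }
     *     // Hypothetical fallback: parse a wrapper document, then import
     *     // its children into target's document.
     * }
     * }</pre>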
     * <br> If the context node is a <code>Document</code> node and the action
     * is <code>ACTION_REPLACE_CHILDREN</code>, then the document that is
     * passed as the context node will be changed such that its
     * <code>xmlEncoding</code>, <code>documentURI</code>,
     * <code>xmlVersion</code>, <code>inputEncoding</code>,
     * <code>xmlStandalone</code>, and all other such attributes are set to
     * what they would be set to if the input source were parsed using
     * <code>LSParser.parse()</code>.
     * <br> This method is always synchronous, even if the
     * <code>LSParser</code> is asynchronous (<code>LSParser.async</code> is
     * <code>true</code>).
     * <br> If an error occurs while parsing, the caller is notified through
     * the <code>DOMErrorHandler</code> instance associated with the "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     * error-handler</a>" parameter of the <code>DOMConfiguration</code>.
     * <br> When calling <code>parseWithContext</code>, the values of the
     * following configuration parameters will be ignored and their default
     * values will always be used instead: "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-validate'>
     * validate</a>", "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-validate-if-schema'>
     * validate-if-schema</a>", and "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-element-content-whitespace'>
     * element-content-whitespace</a>". Other parameters will be treated normally, and the parser is expected
     * to call the <code>LSParserFilter</code> just as if a whole document
     * was parsed.
     * @param input The <code>LSInput</code> from which the source document
     *   is to be read. The source document must be an XML fragment, i.e.
     *   anything except a complete XML document (except in the case where
     *   the context node is of type <code>DOCUMENT_NODE</code> and the action
     *   is <code>ACTION_REPLACE_CHILDREN</code>), a DOCTYPE (internal
     *   subset), entity declaration(s), notation declaration(s), or XML or
     *   text declaration(s).
     * @param contextArg The node that is used as the context for the data
     *   that is being parsed. This node must be a <code>Document</code>
     *   node, a <code>DocumentFragment</code> node, or a node of a type
     *   that is allowed as a child of an <code>Element</code> node, e.g. it
     *   cannot be an <code>Attr</code> node.
     * @param action This parameter describes which action should be taken
     *   between the new set of nodes being inserted and the existing
     *   children of the context node. The set of possible actions is
     *   defined in <code>ACTION_TYPES</code> above.
     * @return The node that is the result of the parse operation. If
     *   the result is more than one top-level node, the first one is
     *   returned.
     * @exception DOMException
     *   HIERARCHY_REQUEST_ERR: Raised if the content cannot replace the
     *   context node, or cannot be inserted before it, after it, or as a
     *   child of it (see also
     *   <code>Node.insertBefore</code> or <code>Node.replaceChild</code> in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     *   ).
     *   <br> NOT_SUPPORTED_ERR: Raised if the <code>LSParser</code> doesn't
     *   support this method, or if the context node is of type
     *   <code>Document</code> and the DOM implementation doesn't support
     *   the replacement of the <code>DocumentType</code> child or
     *   <code>Element</code> child.
     *   <br> NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a
     *   read-only node and the content is being appended to its child list,
     *   or if the parent node of the context node is a read-only node and the
     *   content is being inserted in its child list.
     *   <br> INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML fragment. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public Node parseWithContext(LSInput input,
                                 Node contextArg,
                                 short action)
                                 throws DOMException, LSException;

    /**
     * Abort the loading of the document that is currently being loaded by
     * the <code>LSParser</code>. If the <code>LSParser</code> is currently
     * not busy, a call to this method does nothing.
     */
    public void abort();

}
|
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.Document;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * An interface to an object that is able to build, or augment, a DOM tree
 * from various input sources.
 * <p> <code>LSParser</code> provides an API for parsing XML and building the
 * corresponding DOM document structure. An <code>LSParser</code> instance
 * can be obtained by invoking the
 * <code>DOMImplementationLS.createLSParser()</code> method.
 * <p> As specified in
 * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
 * , when a document is first made available via the LSParser:
 * <ul>
 * <li> there will
 * never be two adjacent nodes of type <code>TEXT_NODE</code>, and there will never be
 * empty text nodes.
 * </li>
 * <li> it is expected that the <code>value</code> and
 * <code>nodeValue</code> attributes of an <code>Attr</code> node initially
 * return the <a href='http://www.w3.org/TR/2004/REC-xml-20040204#AVNormalize'>XML 1.0
 * normalized value</a>. However, if the parameters
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-validate-if-schema'>validate-if-schema</a>" and
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-datatype-normalization'>datatype-normalization</a>"
 * are set to <code>true</code>, depending on the attribute normalization
 * used, the attribute values may differ from the ones obtained by the XML
 * 1.0 attribute normalization. If the parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-datatype-normalization'>datatype-normalization</a>"
 * is set to <code>false</code>, the XML 1.0 attribute normalization is
 * guaranteed to occur, and if the attributes list does not contain
 * namespace declarations, the <code>attributes</code> attribute on the
 * <code>Element</code> node represents the property <b>[attributes]</b> defined in
 * [<a href='http://www.w3.org/TR/2004/REC-xml-infoset-20040204/'>XML Information Set</a>].
 * </li>
 * </ul>
 * <p> Asynchronous <code>LSParser</code> objects are expected to also
 * implement the <code>events::EventTarget</code> interface so that event
 * listeners can be registered on asynchronous <code>LSParser</code>
 * objects.
 * <p> Events supported by asynchronous <code>LSParser</code> objects are:
 * <dl>
 * <dt>load</dt>
 * <dd>
 * The <code>LSParser</code> has finished loading the document. See also the
 * definition of the <code>LSLoadEvent</code> interface. </dd>
 * <dt>progress</dt>
 * <dd> The
 * <code>LSParser</code> signals progress as data is parsed.
 * This
 * specification does not attempt to define exactly when progress events
 * should be dispatched. That is intentionally left as
 * implementation-dependent. Here is one example of how an implementation might
 * dispatch progress events: once the parser starts receiving data, a
 * progress event is dispatched to indicate that the parsing starts. From
 * there on, a progress event is dispatched for every 4096 bytes of data
 * that is received and processed. This is only one example, though, and
 * implementations can choose to dispatch progress events at any time while
 * parsing, or not dispatch them at all. See also the definition of the
 * <code>LSProgressEvent</code> interface. </dd>
 * </dl>
 * <p><b>Note:</b> All events defined in this specification use the
 * namespace URI <code>"http://www.w3.org/2002/DOMLS"</code>.
 * <p> While parsing an input source, errors are reported to the application
 * through the error handler (<code>LSParser.domConfig</code>'s
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
 * parameter). This specification does not try to define all possible
 * errors that can occur while parsing XML, or any other markup, but some
 * common error cases are defined. The types (<code>DOMError.type</code>) of
 * errors and warnings defined by this specification are:
 * <dl>
 * <dt>
 * <code>"check-character-normalization-failure" [error]</code> </dt>
 * <dd> Raised if the parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-check-character-normalization'>check-character-normalization</a>"
 * is set to <code>true</code> and a string is encountered that fails normalization
 * checking.
 * </dd>
 * <dt><code>"doctype-not-allowed" [fatal]</code></dt>
 * <dd> Raised if the
 * configuration parameter "disallow-doctype" is set to <code>true</code>
 * and a doctype is encountered. </dd>
 * <dt><code>"no-input-specified" [fatal]</code></dt>
 * <dd>
 * Raised when loading a document and no input is specified in the
 * <code>LSInput</code> object. </dd>
 * <dt>
 * <code>"pi-base-uri-not-preserved" [warning]</code></dt>
 * <dd> Raised if a processing
 * instruction is encountered in a location where the base URI of the
 * processing instruction cannot be preserved. One example of a case where
 * this warning will be raised is if the configuration parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-entities'>entities</a>"
 * is set to <code>false</code> and the following XML file is parsed:
 * <pre>
 * &lt;!DOCTYPE root [ &lt;!ENTITY e SYSTEM 'subdir/myentity.ent'&gt; ]&gt;
 * &lt;root&gt; &amp;e; &lt;/root&gt;</pre>
 * And <code>subdir/myentity.ent</code>
 * contains:
 * <pre>&lt;one&gt; &lt;two/&gt; &lt;/one&gt; &lt;?pi 3.14159?&gt;
 * &lt;more/&gt;</pre>
 * </dd>
 * <dt><code>"unbound-prefix-in-entity" [warning]</code></dt>
 * <dd> An
 * implementation-dependent warning that may be raised if the configuration parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-namespaces'>namespaces</a>"
 * is set to <code>true</code> and an unbound namespace prefix is
 * encountered in an entity's replacement text. Raising this warning is not
 * required, since some existing parsers may not recognize unbound namespace
 * prefixes in the replacement text of entities.
 * </dd>
 * <dt>
 * <code>"unknown-character-denormalization" [fatal]</code></dt>
 * <dd> Raised if the
 * configuration parameter "ignore-unknown-character-denormalizations" is
 * set to <code>false</code> and a character is encountered for which the
 * processor cannot determine the normalization properties. </dd>
 * <dt>
 * <code>"unsupported-encoding" [fatal]</code></dt>
 * <dd> Raised if an unsupported
 * encoding is encountered. </dd>
 * <dt><code>"unsupported-media-type" [fatal]</code></dt>
 * <dd>
 * Raised if the configuration parameter "supported-media-types-only" is set
 * to <code>true</code> and an unsupported media type is encountered. </dd>
 * </dl>
 * <p> In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation-specific errors and warnings for any
 * other error and warning cases such as I/O errors (file not found,
 * permission denied, ...), XML well-formedness errors, and so on.
 * <p>See also the
 * <a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407'>Document Object Model (DOM) Level 3 Load and Save Specification</a>.
 *
 * @since 1.5
 */
public interface LSParser {
    /**
     * The <code>DOMConfiguration</code> object used when parsing an input
     * source. This <code>DOMConfiguration</code> is specific to the parse
     * operation. No parameter values from this <code>DOMConfiguration</code>
     * object are passed automatically to the <code>DOMConfiguration</code>
     * object on the <code>Document</code> that is created, or used, by the
     * parse operation. The DOM application is responsible for passing any
     * needed parameter values from this <code>DOMConfiguration</code>
     * object to the <code>DOMConfiguration</code> object referenced by the
     * <code>Document</code> object.
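     * <br> Several of the parameters listed here are marked optional. An
     * application that depends on one of them can use
     * <code>DOMConfiguration.canSetParameter</code> to check for support
     * before setting it. The following is a sketch, assuming
     * <code>parser</code> is an existing <code>LSParser</code> (a
     * hypothetical name):
     * <pre>{@code
     * DOMConfiguration config = parser.getDomConfig();
     * if (config.canSetParameter("disallow-doctype", Boolean.TRUE)) {
     *     // A doctype in the input now causes a fatal
     *     // "doctype-not-allowed" error.
     *     config.setParameter("disallow-doctype", Boolean.TRUE);
     * } else {
     *     // Not supported by this implementation; doctypes must be
     *     // rejected some other way.
     * }
     * }</pre>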
     * <br> In addition to the parameters recognized on the
     * <a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#DOMConfiguration'>DOMConfiguration</a>
     * interface defined in
     * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the <code>DOMConfiguration</code> objects for <code>LSParser</code>
     * add or modify the following parameters:
     * <dl>
     * <dt>
     * <code>"charset-overrides-xml-encoding"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>optional</em>] (<em>default</em>) If a higher-level protocol such as HTTP
     * [<a href='http://www.ietf.org/rfc/rfc2616.txt'>IETF RFC 2616</a>] provides an
     * indication of the character encoding of the input stream being
     * processed, that will override any encoding specified in the XML
     * declaration or the text declaration (see also section 4.3.3,
     * "Character Encoding in Entities", in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>]).
     * Explicitly setting an encoding in the <code>LSInput</code> overrides
     * any encoding from the protocol. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] The parser ignores any character set encoding information from
     * higher-level protocols. </dd>
     * </dl></dd>
     * <dt><code>"disallow-doctype"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>optional</em>] Throw a fatal <b>"doctype-not-allowed"</b> error
     * if a doctype node is found while parsing the document. This is
     * useful when dealing with things like SOAP envelopes where doctype
     * nodes are not allowed. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Allow doctype nodes in the document.
     * </dd>
     * </dl></dd>
     * <dt>
     * <code>"ignore-unknown-character-denormalizations"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If, while verifying full normalization when
     * [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] is
     * supported, a processor encounters characters for which it cannot
     * determine the normalization properties, then the processor will
     * ignore any possible denormalizations caused by these characters.
     * This parameter is ignored for [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>].
     * </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>optional</em>] Report a fatal <b>"unknown-character-denormalization"</b>
     * error if a character is encountered for which the processor cannot
     * determine the normalization properties. </dd>
     * </dl></dd>
     * <dt><code>"infoset"</code></dt>
     * <dd> See
     * the definition of <code>DOMConfiguration</code> for a description of
     * this parameter. Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , this parameter will default to <code>true</code> for
     * <code>LSParser</code>. </dd>
     * <dt><code>"namespaces"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Perform namespace processing as defined in
     * [<a href='http://www.w3.org/TR/1999/REC-xml-names-19990114/'>XML Namespaces</a>]
     * and [<a href='http://www.w3.org/TR/2004/REC-xml-names11-20040204/'>XML Namespaces 1.1</a>]
     * . </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>optional</em>] Do not perform namespace processing. </dd>
     * </dl></dd>
     * <dt>
     * <code>"resource-resolver"</code></dt>
     * <dd>[<em>required</em>] A reference to a <code>LSResourceResolver</code> object, or null.
     * If the value of this parameter is not null when an external resource
     * (such as an external XML entity or an XML schema location) is
     * encountered, the implementation will request that the
     * <code>LSResourceResolver</code> referenced in this parameter resolve
     * the resource. </dd>
     * <dt><code>"supported-media-types-only"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>optional</em>] Check that the media type of the parsed resource is a supported media
     * type. If an unsupported media type is encountered, a fatal error of
     * type <b>"unsupported-media-type"</b> will be raised. The media types defined in
     * [<a href='http://www.ietf.org/rfc/rfc3023.txt'>IETF RFC 3023</a>] must always
     * be accepted. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Accept any media type. </dd>
     * </dl></dd>
     * <dt><code>"validate"</code></dt>
     * <dd> See the definition of
     * <code>DOMConfiguration</code> for a description of this parameter.
     * Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the processing of the internal subset is always accomplished, even
     * if this parameter is set to <code>false</code>. </dd>
     * <dt>
     * <code>"validate-if-schema"</code></dt>
     * <dd> See the definition of
     * <code>DOMConfiguration</code> for a description of this parameter.
     * Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , the processing of the internal subset is always accomplished, even
     * if this parameter is set to <code>false</code>. </dd>
     * <dt>
     * <code>"well-formed"</code></dt>
     * <dd> See the definition of
     * <code>DOMConfiguration</code> for a description of this parameter.
     * Unlike in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     * , this parameter cannot be set to <code>false</code>. </dd>
     * </dl>
     */
    public DOMConfiguration getDomConfig();

    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, if "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-validate'>validate</a>"
     * is set to <code>true</code>, the validation is done before invoking the
     * filter.
     */
    public LSParserFilter getFilter();
    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, if "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-validate'>validate</a>"
     * is set to <code>true</code>, the validation is done before invoking the
     * filter.
     */
    public void setFilter(LSParserFilter filter);

    /**
     * <code>true</code> if the <code>LSParser</code> is asynchronous,
     * <code>false</code> if it is synchronous.
     */
    public boolean getAsync();

    /**
     * <code>true</code> if the <code>LSParser</code> is currently busy
     * loading a document, otherwise <code>false</code>.
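     * <p>For example, a caller can check this attribute before starting a
     * new load, because the parse methods raise <code>INVALID_STATE_ERR</code>
     * while it is <code>true</code>. This is a sketch only. The names
     * <code>parser</code> and <code>input</code> are hypothetical variables.
     * The check is not atomic, so concurrent callers sharing one parser
     * still need their own synchronization:
     * <pre>{@code
     * Document doc = null;
     * if (!parser.getBusy()) {
     *     // No load was in progress at the time of the check.
     *     doc = parser.parse(input);
     * }
     * }</pre>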
     */
    public boolean getBusy();

    /**
     * Parse an XML document from a resource identified by a
     * <code>LSInput</code>.
     * @param input The <code>LSInput</code> from which the source of the
     *   document is to be read.
     * @return If the <code>LSParser</code> is a synchronous
     *   <code>LSParser</code>, the newly created and populated
     *   <code>Document</code> is returned. If the <code>LSParser</code> is
     *   asynchronous, <code>null</code> is returned since the document
     *   object may not yet be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the <code>LSParser</code>'s
     *   <code>LSParser.busy</code> attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML document. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter
     *   "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     *   if they wish to get details on the error.
     */
    public Document parse(LSInput input)
        throws DOMException, LSException;

    /**
     * Parse an XML document from a location identified by a URI reference
     * [<a href='http://www.ietf.org/rfc/rfc2396.txt'>IETF RFC 2396</a>]. If the URI
     * contains a fragment identifier (see section 4.1 in
     * [<a href='http://www.ietf.org/rfc/rfc2396.txt'>IETF RFC 2396</a>]), the
     * behavior is not defined by this specification; future versions of
     * this specification may define it.
     * @param uri The location of the XML document to be read.
     * @return If the <code>LSParser</code> is a synchronous
     *   <code>LSParser</code>, the newly created and populated
     *   <code>Document</code> is returned, or <code>null</code> if an error
     *   occurred.
     *   If the <code>LSParser</code> is asynchronous,
     *   <code>null</code> is returned since the document object may not yet
     *   be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML document. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter
     *   "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     *   if they wish to get details on the error.
     */
    public Document parseURI(String uri)
        throws DOMException, LSException;

    // ACTION_TYPES
    /**
     * Append the result of the parse operation as children of the context
     * node. For this action to work, the context node must be an
     * <code>Element</code> or a <code>DocumentFragment</code>.
     */
    public static final short ACTION_APPEND_AS_CHILDREN = 1;
    /**
     * Replace all the children of the context node with the result of the
     * parse operation. For this action to work, the context node must be an
     * <code>Element</code>, a <code>Document</code>, or a
     * <code>DocumentFragment</code>.
     */
    public static final short ACTION_REPLACE_CHILDREN = 2;
    /**
     * Insert the result of the parse operation as the immediately preceding
     * context node (or its parent, depending on where the result will be
     * inserted) is used for resolving unbound namespace prefixes. The
     * context node's <code>ownerDocument</code> node (or the node itself if
     * the node is of type <code>DOCUMENT_NODE</code>) is used to resolve
     * default attributes and entity references.
     * <br> As the new data is inserted into the document, at least one
     * mutation event is fired per new immediate child or sibling of the
     * context node.
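     * <p>For example, the following sketch appends two elements to an
     * existing element. It assumes that <code>ls</code> is the
     * <code>DOMImplementationLS</code> that created <code>parser</code> and
     * that <code>element</code> belongs to an existing document. These names
     * are illustrative only. Because an implementation may not support this
     * method, the sketch handles <code>NOT_SUPPORTED_ERR</code>:
     * <pre>{@code
     * LSInput input = ls.createLSInput();
     * input.setStringData("<item>one</item><item>two</item>");
     * try {
     *     // Two top-level nodes result, so the first <item> is returned.
     *     Node first = parser.parseWithContext(input, element,
     *             LSParser.ACTION_APPEND_AS_CHILDREN);
     * } catch (DOMException e) {
     *     if (e.code != DOMException.NOT_SUPPORTED_ERR) {
     *         throw e;
     *     }
     *     // Fall back, for example by parsing a complete wrapper document
     *     // with parse() and copying its children in with importNode().
     * }
     * }</pre>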
     * <br> If the context node is a <code>Document</code> node and the action
     * is <code>ACTION_REPLACE_CHILDREN</code>, then the document that is
     * passed as the context node will be changed such that its
     * <code>xmlEncoding</code>, <code>documentURI</code>,
     * <code>xmlVersion</code>, <code>inputEncoding</code>,
     * <code>xmlStandalone</code>, and all other such attributes are set to
     * what they would be set to if the input source were parsed using
     * <code>LSParser.parse()</code>.
     * <br> This method is always synchronous, even if the
     * <code>LSParser</code> is asynchronous (<code>LSParser.async</code> is
     * <code>true</code>).
     * <br> If an error occurs while parsing, the caller is notified through
     * the <code>DOMErrorHandler</code> instance associated with the
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     * parameter of the <code>DOMConfiguration</code>.
     * <br> When calling <code>parseWithContext</code>, the values of the
     * following configuration parameters will be ignored and their default
     * values will always be used instead:
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-validate'>validate</a>",
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-validate-if-schema'>validate-if-schema</a>",
     * and
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-element-content-whitespace'>element-content-whitespace</a>".
     * Other parameters will be treated normally, and the parser is expected
     * to call the <code>LSParserFilter</code> just as if a whole document
     * was parsed.
     * @param input The <code>LSInput</code> from which the source document
     *   is to be read. The source document must be an XML fragment, i.e.
     *   anything except a complete XML document (except in the case where
     *   the context node is of type <code>DOCUMENT_NODE</code> and the action
     *   is <code>ACTION_REPLACE_CHILDREN</code>), a DOCTYPE (internal
     *   subset), entity declaration(s), notation declaration(s), or XML or
     *   text declaration(s).
     * @param contextArg The node that is used as the context for the data
     *   that is being parsed. This node must be a <code>Document</code>
     *   node, a <code>DocumentFragment</code> node, or a node of a type
     *   that is allowed as a child of an <code>Element</code> node, e.g. it
     *   cannot be an <code>Attr</code> node.
     * @param action This parameter describes which action should be taken
     *   between the new set of nodes being inserted and the existing
     *   children of the context node. The set of possible actions is
     *   defined in <code>ACTION_TYPES</code> above.
     * @return The node that is the result of the parse operation. If
     *   the result is more than one top-level node, the first one is
     *   returned.
     * @exception DOMException
     *   HIERARCHY_REQUEST_ERR: Raised if the content cannot replace, be
     *   inserted before, after, or as a child of the context node (see also
     *   <code>Node.insertBefore</code> or <code>Node.replaceChild</code> in
     *   [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>]
     *   ).
     *   <br> NOT_SUPPORTED_ERR: Raised if the <code>LSParser</code> doesn't
     *   support this method, or if the context node is of type
     *   <code>Document</code> and the DOM implementation doesn't support
     *   the replacement of the <code>DocumentType</code> child or
     *   <code>Element</code> child.
     *   <br> NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a
     *   read-only node and the content is being appended to its child list,
     *   or if the parent node of the context node is a read-only node and the
     *   content is being inserted in its child list.
     *   <br> INVALID_STATE_ERR: Raised if the <code>LSParser.busy</code>
     *   attribute is <code>true</code>.
     * @exception LSException
     *   PARSE_ERR: Raised if the <code>LSParser</code> was unable to load
     *   the XML fragment. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter
     *   "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     *   if they wish to get details on the error.
     */
    public Node parseWithContext(LSInput input,
                                 Node contextArg,
                                 short action)
        throws DOMException, LSException;

    /**
     * Abort the loading of the document that is currently being loaded by
     * the <code>LSParser</code>. If the <code>LSParser</code> is currently
     * not busy, a call to this method does nothing.
     */
    public void abort();

}