 * Informatics and Mathematics, Keio University). All Rights Reserved. This
 * work is distributed under the W3C(r) Software License [1] in the hope that
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * An <code>LSSerializer</code> provides an API for serializing (writing) a
 * DOM document out into XML. The XML data is written to a string or an
 * output stream. Any changes or fixups made during the serialization affect
 * only the serialized data. The <code>Document</code> object and its
 * children are never altered by the serialization operation.
 * <p> During serialization of XML data, namespace fixup is done as defined in
 * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>],
 * Appendix B. [<a href='http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113'>DOM Level 2 Core</a>]
 * allows the empty string as a real namespace URI. If the
 * <code>namespaceURI</code> of a <code>Node</code> is the empty string, the
 * serialization treats it as <code>null</code>, ignoring the prefix,
 * if any.
 * <p> <code>LSSerializer</code> accepts any node type for serialization. For
 * nodes of type <code>Document</code> or <code>Entity</code>, well-formed
 * XML will be created when possible (well-formedness is guaranteed if the
 * document or entity comes from a parse operation and is unchanged since it
 * was created). The serialized output for these node types is either an
 * XML document or an external XML entity, respectively, and is acceptable
 * input for an XML parser. For all other types of nodes the serialized form
 * is implementation dependent.
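 * <p>For illustration only, here is a minimal sketch (not part of this
 * specification) of serializing a whole document to a string. It assumes
 * that the <code>DOMImplementation</code> returned by
 * <code>Document.getImplementation()</code> also implements
 * <code>DOMImplementationLS</code>. The built-in JDK implementation does;
 * for other implementations, use
 * <code>org.w3c.dom.bootstrap.DOMImplementationRegistry</code> to look one up.
 * The namespace URI <code>urn:example</code> is a placeholder.
 * <pre>{@code
 * DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
 * dbf.setNamespaceAware(true);
 * Document doc = dbf.newDocumentBuilder().newDocument();
 * // Placeholder namespace; no xmlns attribute is created in the DOM here.
 * doc.appendChild(doc.createElementNS("urn:example", "ex:root"));
 *
 * // Assumption: the implementation supports Load and Save.
 * DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
 * LSSerializer serializer = ls.createLSSerializer();
 *
 * // Serializes the entire document, including the XML declaration.
 * // Namespace fixup is expected to emit xmlns:ex="urn:example" in the
 * // output only; per the rules above, the DOM itself is not changed.
 * String xml = serializer.writeToString(doc);
 * }</pre>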
 * <p>Within a <code>Document</code>, <code>DocumentFragment</code>, or
 * <code>Entity</code> being serialized, <code>Nodes</code> are processed as
 * follows:
 * <ul>
 * <li> <code>Document</code> nodes are written, including the XML
 * declaration (unless the parameter "xml-declaration" is set to
 * <code>false</code>) and a DTD subset, if one exists in the DOM. Writing a
 * <code>Document</code> node serializes the entire document.
 * </li>
 * <li>
 * <code>Entity</code> nodes, when written directly by
 * <code>LSSerializer.write</code>, output the entity expansion, but no
 * namespace fixup is done. The resulting output will be valid as an
 * external entity.
 * </li>
 * <li> If the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-entities'>
 * entities</a>" is set to <code>true</code>, <code>EntityReference</code> nodes are
 * serialized as an entity reference of the form
 * "<code>&amp;entityName;</code>" in the output. Child nodes (the expansion)
 * of the entity reference are ignored. If the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-entities'>
 * entities</a>" is set to <code>false</code>, only the children of the entity reference
 * are serialized. <code>EntityReference</code> nodes with no children (no
 * corresponding <code>Entity</code> node, or the corresponding
 * <code>Entity</code> node has no children) are always serialized.
 * </li>
 * <li>
 * <code>CDATASection</code> nodes containing content characters that cannot be
 * represented in the specified output encoding are handled according to the
 * "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-split-cdata-sections'>
 * split-cdata-sections</a>" parameter. If the parameter is set to <code>true</code>,
 * <code>CDATASection</code> nodes are split, and the unrepresentable characters
 * are serialized as numeric character references in ordinary content.
 * The exact position and number of splits are not specified. If the parameter
 * is set to <code>false</code>, unrepresentable characters in a
 * <code>CDATASection</code> are reported as
 * <code>"wf-invalid-character"</code> errors if the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-well-formed'>
 * well-formed</a>" is set to <code>true</code>. The error is not recoverable: there is no
 * mechanism for supplying alternative characters and continuing with the
 * serialization.
 * </li>
 * <li> <code>DocumentFragment</code> nodes are serialized by
 * serializing the children of the document fragment in the order they
 * appear in the document fragment.
 * </li>
 * <li> All other node types (Element, Text,
 * etc.) are serialized to their corresponding XML source form.
 * </li>
 * </ul>
 * <p><b>Note:</b> The serialization of a <code>Node</code> does not always
 * generate a well-formed XML document, so an <code>LSParser</code> might
 * throw fatal errors when parsing the resulting serialization.
 * <p> Within the character data of a document (outside of markup), any
 * characters that cannot be represented directly are replaced with
 * character references. Occurrences of '&lt;' and '&amp;' are replaced by
 * the predefined entities &amp;lt; and &amp;amp;. The other predefined
 * entities (&amp;gt;, &amp;apos;, and &amp;quot;) might not be used, except
 * where needed (e.g. using &amp;gt; in cases such as ']]&gt;'). Any
 * characters that cannot be represented directly in the output character
 * encoding are serialized as numeric character references (and since
 * character encoding standards commonly use hexadecimal representations of
 * characters, using the hexadecimal representation when serializing
 * character references is encouraged).
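 * <p>Here is a sketch of the character-data escaping rules above. It assumes that
 * <code>doc</code> is an existing <code>Document</code> and that
 * <code>serializer</code> is an <code>LSSerializer</code> obtained from its
 * <code>DOMImplementationLS</code>. The element name <code>p</code> is
 * arbitrary, and the exact output is implementation dependent.
 * <pre>{@code
 * Element p = doc.createElementNS(null, "p");
 * p.appendChild(doc.createTextNode("a < b & c"));
 *
 * // '<' and '&' in character data must be written as &lt; and &amp;,
 * // so the result is expected to contain <p>a &lt; b &amp; c</p>,
 * // preceded by an XML declaration ("xml-declaration" defaults to true).
 * String out = serializer.writeToString(p);
 * }</pre>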
 * <p> To allow attribute values to contain both single and double quotes, the
 * apostrophe or single-quote character (') may be represented as
 * "&amp;apos;", and the double-quote character (") as "&amp;quot;". Newline
 * characters and other characters that cannot be represented directly
 * in attribute values in the output character encoding are serialized as a
 * numeric character reference.
 * <p> Within markup, but outside of attributes, any occurrence of a character
 * that cannot be represented in the output character encoding is reported
 * as a <code>DOMError</code> fatal error. An example would be serializing
 * the element &lt;LaCañada/&gt; with <code>encoding="us-ascii"</code>.
 * This will result in the generation of a <code>DOMError</code>
 * "wf-invalid-character-in-node-name" (as proposed in "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-well-formed'>
 * well-formed</a>").
 * <p> When requested by setting the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-normalize-characters'>
 * normalize-characters</a>" on <code>LSSerializer</code> to true, character normalization is
 * performed according to the definition of <a href='http://www.w3.org/TR/2004/REC-xml11-20040204/#dt-fullnorm'>fully
 * normalized</a> characters included in appendix E of [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] on all
 * data to be serialized, both markup and character data. The character
 * normalization process affects only the data as it is being written; it
 * does not alter the DOM's view of the document after serialization has
 * completed.
 * <p> Implementations are required to support the encodings "UTF-8",
 * "UTF-16", "UTF-16BE", and "UTF-16LE" to guarantee that data is
 * serializable in all encodings that are required to be supported by all
 * XML parsers.
 * When the encoding is UTF-8, whether or not a byte order mark
 * is serialized, or if the output is big-endian or little-endian, is
 * implementation dependent. When the encoding is UTF-16, whether or not the
 * output is big-endian or little-endian is implementation dependent, but a
 * Byte Order Mark must be generated for non-character outputs, such as
 * <code>LSOutput.byteStream</code> or <code>LSOutput.systemId</code>. If
 * the Byte Order Mark is not generated, a "byte-order-mark-needed" warning
 * is reported. When the encoding is UTF-16LE or UTF-16BE, the output is
 * big-endian (UTF-16BE) or little-endian (UTF-16LE) and the Byte Order Mark
 * is not generated. In all cases, the encoding declaration, if
 * generated, will correspond to the encoding used during the serialization
 * (e.g. <code>encoding="UTF-16"</code> will appear if UTF-16 was
 * requested).
 * <p> Namespaces are fixed up during serialization: the serialization process
 * will verify that namespace declarations, namespace prefixes and the
 * namespace URI associated with elements and attributes are consistent. If
 * inconsistencies are found, the serialized form of the document will be
 * altered to remove them. The method used for doing the namespace fixup
 * while serializing a document is the algorithm defined in Appendix B.1,
 * "Namespace normalization", of
 * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>].
 * <p> While serializing a document, the parameter "discard-default-content"
 * controls whether or not non-specified data is serialized.
 * <p> While serializing, errors and warnings are reported to the application
 * through the error handler (<code>LSSerializer.domConfig</code>'s "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
 * error-handler</a>" parameter).
 * This specification does not attempt to define all possible
 * errors and warnings that can occur while serializing a DOM node, but some
 * common error and warning cases are defined. The types
 * (<code>DOMError.type</code>) of errors and warnings defined by this
 * specification are:
 * <dl>
 * <dt><code>"no-output-specified" [fatal]</code></dt>
 * <dd> Raised when
 * writing to an <code>LSOutput</code> if no output is specified in the
 * <code>LSOutput</code>. </dd>
 * <dt>
 * <code>"unbound-prefix-in-entity-reference" [fatal]</code> </dt>
 * <dd> Raised if the
 * configuration parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-namespaces'>
 * namespaces</a>" is set to <code>true</code> and an entity whose replacement text
 * contains unbound namespace prefixes is referenced in a location where
 * there are no bindings for the namespace prefixes. </dd>
 * <dt>
 * <code>"unsupported-encoding" [fatal]</code></dt>
 * <dd> Raised if an unsupported
 * encoding is encountered. </dd>
 * </dl>
 * <p> In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation-specific errors and warnings for any
 * other error and warning cases, such as I/O errors (file not found,
 * permission denied, and so on).
 * <p>See also the <a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407'>Document Object Model (DOM) Level 3 Load
 * and Save Specification</a>.
 *
 * @since 1.5
 */
public interface LSSerializer {
    /**
     * The <code>DOMConfiguration</code> object used by the
     * <code>LSSerializer</code> when serializing a DOM node.
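     * <p>Here is a minimal sketch of adjusting the configuration before writing.
     * It assumes that <code>serializer</code> is an existing
     * <code>LSSerializer</code>. It checks optional parameters such as
     * "format-pretty-print" with <code>canSetParameter</code> first, because
     * an implementation may not support them. The handler body is illustrative.
     * <pre>{@code
     * DOMConfiguration config = serializer.getDomConfig();
     * if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
     *     config.setParameter("format-pretty-print", Boolean.TRUE);
     * }
     * // Both values of "xml-declaration" are required to be supported.
     * config.setParameter("xml-declaration", Boolean.FALSE);
     *
     * // Illustrative handler: print each error or warning and ask the
     * // serializer to continue where it can (return value true).
     * DOMErrorHandler handler = error -> {
     *     System.err.println(error.getType() + ": " + error.getMessage());
     *     return true;
     * };
     * config.setParameter("error-handler", handler);
     * }</pre>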
     * <br> In addition to the parameters recognized by the <a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMConfiguration'>
     * DOMConfiguration</a> interface defined in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>],
     * the <code>DOMConfiguration</code> objects for
     * <code>LSSerializer</code> add or modify the following
     * parameters:
     * <dl>
     * <dt><code>"canonical-form"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>optional</em>] Writes the document according to the rules specified in [<a href='http://www.w3.org/TR/2001/REC-xml-c14n-20010315'>Canonical XML</a>].
     * In addition to the behavior described in "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-canonical-form'>
     * canonical-form</a>" [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>],
     * setting this parameter to <code>true</code> will set the parameters
     * "format-pretty-print", "discard-default-content", and "xml-declaration"
     * to <code>false</code>. Setting one of those parameters to
     * <code>true</code> will set this parameter to <code>false</code>.
     * Serializing an XML 1.1 document when "canonical-form" is
     * <code>true</code> will generate a fatal error. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Do not canonicalize the output. </dd>
     * </dl></dd>
     * <dt><code>"discard-default-content"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Use the <code>Attr.specified</code> attribute to decide what attributes
     * should be discarded. Note that some implementations might use
     * whatever information is available to the implementation (e.g.
     * XML schema, DTD, the <code>Attr.specified</code> attribute, and so on) to
     * determine what attributes and content to discard if this parameter is
     * set to <code>true</code>. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] Keep all attributes and all content.</dd>
     * </dl></dd>
     * <dt><code>"format-pretty-print"</code></dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>optional</em>] Format the output by adding whitespace to produce a pretty-printed,
     * indented, human-readable form. The exact form of the transformations
     * is not specified by this specification. Pretty-printing changes the
     * content of the document and may affect the validity of the document;
     * validating implementations should preserve validity. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Don't pretty-print the result. </dd>
     * </dl></dd>
     * <dt>
     * <code>"ignore-unknown-character-denormalizations"</code> </dt>
     * <dd>
     * <dl>
     * <dt>
     * <code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If, while verifying full normalization when [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] is
     * supported, a character is encountered for which the normalization
     * properties cannot be determined, then raise a
     * <code>"unknown-character-denormalization"</code> warning (instead of
     * raising an error, if this parameter is not set) and ignore any
     * possible denormalizations caused by these characters. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>optional</em>] Report a fatal error if a character is encountered for which the
     * processor cannot determine the normalization properties. </dd>
     * </dl></dd>
     * <dt>
     * <code>"normalize-characters"</code></dt>
     * <dd> This parameter is equivalent to
     * the one defined by <code>DOMConfiguration</code> in [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>].
     * Unlike in the Core, the default value for this parameter is
     * <code>true</code>. While DOM implementations are not required to
     * support <a href='http://www.w3.org/TR/2004/REC-xml11-20040204/#dt-fullnorm'>fully
     * normalizing</a> the characters in the document according to appendix E of [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], this
     * parameter must be activated by default if supported. </dd>
     * <dt>
     * <code>"xml-declaration"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If a <code>Document</code>, <code>Element</code>, or <code>Entity</code>
     * node is serialized, the XML declaration, or text declaration, should
     * be included. The version (<code>Document.xmlVersion</code> if the
     * document is a Level 3 document and the version is non-null, otherwise
     * use the value "1.0"), and the output encoding (see
     * <code>LSSerializer.write</code> for details on how to find the output
     * encoding) are specified in the serialized XML declaration. </dd>
     * <dt>
     * <code>false</code></dt>
     * <dd>[<em>required</em>] Do not serialize the XML and text declarations. Report a
     * <code>"xml-declaration-needed"</code> warning if this will cause
     * problems (i.e. the serialized data is of an XML version other than [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>], or an
     * encoding would be needed to be able to re-parse the serialized data). </dd>
     * </dl></dd>
     * </dl>
     */
    public DOMConfiguration getDomConfig();

    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out.
     * Any string is supported, but XML treats only a certain
     * set of character sequences as end-of-line (see section 2.11,
     * "End-of-Line Handling", in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>] if the
     * serialized content is XML 1.0, or section 2.11, "End-of-Line Handling",
     * in [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] if the
     * serialized content is XML 1.1). Using character sequences other than
     * the recommended ones can result in a document that is either not
     * serializable or not well-formed.
     * <br> On retrieval, the default value of this attribute is the
     * implementation-specific default end-of-line sequence. DOM
     * implementations should choose the default to match the usual
     * convention for text files in the environment being used.
     * Implementations must choose a default sequence that matches one of
     * those allowed by XML 1.0 or XML 1.1, depending on the serialized
     * content. Setting this attribute to <code>null</code> will reset its
     * value to the default value.
     * <br>
     */
    public String getNewLine();
    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out. Any string is supported, but XML treats only a certain
     * set of character sequences as end-of-line (see section 2.11,
     * "End-of-Line Handling", in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>] if the
     * serialized content is XML 1.0, or section 2.11, "End-of-Line Handling",
     * in [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] if the
     * serialized content is XML 1.1). Using character sequences other than
     * the recommended ones can result in a document that is either not
     * serializable or not well-formed.
     * <br> On retrieval, the default value of this attribute is the
     * implementation-specific default end-of-line sequence.
     * DOM implementations should choose the default to match the usual
     * convention for text files in the environment being used.
     * Implementations must choose a default sequence that matches one of
     * those allowed by XML 1.0 or XML 1.1, depending on the serialized
     * content. Setting this attribute to <code>null</code> will reset its
     * value to the default value.
     * <br>
     */
    public void setNewLine(String newLine);

    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. The filter implementation
     * can choose to remove the node from the stream or to terminate the
     * serialization early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, CDATA sections won't be passed to the filter if "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-cdata-sections'>
     * cdata-sections</a>" is set to <code>false</code>.
     */
    public LSSerializerFilter getFilter();
    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. The filter implementation
     * can choose to remove the node from the stream or to terminate the
     * serialization early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, CDATA sections won't be passed to the filter if "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-cdata-sections'>
     * cdata-sections</a>" is set to <code>false</code>.
     */
    public void setFilter(LSSerializerFilter filter);

    /**
     * Serialize the specified node as described above in the general
     * description of the <code>LSSerializer</code> interface. The output is
     * written to the supplied <code>LSOutput</code>.
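     * <p>Here is a minimal sketch of writing to an in-memory byte stream. It assumes
     * that <code>ls</code> is a <code>DOMImplementationLS</code>,
     * <code>serializer</code> is one of its serializers, and <code>doc</code> is
     * the node to write. <code>ByteArrayOutputStream</code> and
     * <code>StandardCharsets</code> come from <code>java.io</code> and
     * <code>java.nio.charset</code>.
     * <pre>{@code
     * ByteArrayOutputStream bytes = new ByteArrayOutputStream();
     * LSOutput output = ls.createLSOutput();
     * output.setByteStream(bytes);
     * // LSOutput.encoding is consulted before Document.inputEncoding
     * // and Document.xmlEncoding when choosing the output encoding.
     * output.setEncoding("UTF-8");
     * boolean ok = serializer.write(doc, output);
     * String xml = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
     * }</pre>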
     * <br> When writing to an <code>LSOutput</code>, the encoding is found by
     * looking at the encoding information that is reachable through the
     * <code>LSOutput</code> and the item to be written (or its owner
     * document) in this order:
     * <ol>
     * <li> <code>LSOutput.encoding</code>,
     * </li>
     * <li>
     * <code>Document.inputEncoding</code>,
     * </li>
     * <li>
     * <code>Document.xmlEncoding</code>.
     * </li>
     * </ol>
     * <br> If no encoding is reachable through the above properties, a
     * default encoding of "UTF-8" will be used. If the specified encoding
     * is not supported, an "unsupported-encoding" fatal error is raised.
     * <br> If no output is specified in the <code>LSOutput</code>, a
     * "no-output-specified" fatal error is raised.
     * <br> The implementation is responsible for associating the appropriate
     * media type with the serialized data.
     * <br> When writing to an HTTP URI, an HTTP PUT is performed. When writing
     * to other types of URIs, the mechanism for writing the data to the URI
     * is implementation dependent.
     * @param nodeArg The node to serialize.
     * @param destination The destination for the serialized DOM.
     * @return Returns <code>true</code> if <code>nodeArg</code> was
     *   successfully serialized. Returns <code>false</code> if normal
     *   processing stopped but the implementation kept serializing the
     *   document; in that case the result of the serialization is
     *   implementation dependent.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public boolean write(Node nodeArg,
                         LSOutput destination)
                         throws LSException;

    /**
     * A convenience method that acts as if <code>LSSerializer.write</code>
     * was called with an <code>LSOutput</code> with no encoding specified
     * and <code>LSOutput.systemId</code> set to the <code>uri</code>
     * argument.
     * @param nodeArg The node to serialize.
     * @param uri The URI to write to.
     * @return Returns <code>true</code> if <code>nodeArg</code> was
     *   successfully serialized. Returns <code>false</code> if normal
     *   processing stopped but the implementation kept serializing the
     *   document; in that case the result of the serialization is
     *   implementation dependent.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public boolean writeToURI(Node nodeArg,
                              String uri)
                              throws LSException;

    /**
     * Serialize the specified node as described above in the general
     * description of the <code>LSSerializer</code> interface. The output is
     * written to a <code>DOMString</code> that is returned to the caller.
     * The encoding used is the encoding of the <code>DOMString</code> type,
     * i.e. UTF-16. Note that no Byte Order Mark is generated in a
     * <code>DOMString</code> object.
     * @param nodeArg The node to serialize.
     * @return Returns the serialized data.
     * @exception DOMException
     *   DOMSTRING_SIZE_ERR: Raised if the resulting string is too long to
     *   fit in a <code>DOMString</code>.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node.
     *   DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter "<a href='http://www.w3.org/TR/DOM-Level-3-Core/core.html#parameter-error-handler'>
     *   error-handler</a>" if they wish to get details on the error.
     */
    public String writeToString(Node nodeArg)
                                throws DOMException, LSException;

}
|
 * Informatics and Mathematics, Keio University). All Rights Reserved. This
 * work is distributed under the W3C(r) Software License [1] in the hope that
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * An <code>LSSerializer</code> provides an API for serializing (writing) a
 * DOM document out into XML. The XML data is written to a string or an
 * output stream. Any changes or fixups made during the serialization affect
 * only the serialized data. The <code>Document</code> object and its
 * children are never altered by the serialization operation.
 * <p> During serialization of XML data, namespace fixup is done as defined in
 * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>],
 * Appendix B. [<a href='http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113'>DOM Level 2 Core</a>]
 * allows the empty string as a real namespace URI. If the
 * <code>namespaceURI</code> of a <code>Node</code> is the empty string, the
 * serialization treats it as <code>null</code>, ignoring the prefix,
 * if any.
 * <p> <code>LSSerializer</code> accepts any node type for serialization.
 * For nodes of type <code>Document</code> or <code>Entity</code>, well-formed
 * XML will be created when possible (well-formedness is guaranteed if the
 * document or entity comes from a parse operation and is unchanged since it
 * was created). The serialized output for these node types is either an
 * XML document or an external XML entity, respectively, and is acceptable
 * input for an XML parser. For all other types of nodes the serialized form
 * is implementation dependent.
 * <p>Within a <code>Document</code>, <code>DocumentFragment</code>, or
 * <code>Entity</code> being serialized, <code>Nodes</code> are processed as
 * follows:
 * <ul>
 * <li> <code>Document</code> nodes are written, including the XML
 * declaration (unless the parameter "xml-declaration" is set to
 * <code>false</code>) and a DTD subset, if one exists in the DOM. Writing a
 * <code>Document</code> node serializes the entire document.
 * </li>
 * <li>
 * <code>Entity</code> nodes, when written directly by
 * <code>LSSerializer.write</code>, output the entity expansion, but no
 * namespace fixup is done. The resulting output will be valid as an
 * external entity.
 * </li>
 * <li> If the parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-entities'>entities</a>"
 * is set to <code>true</code>, <code>EntityReference</code> nodes are
 * serialized as an entity reference of the form
 * "<code>&amp;entityName;</code>" in the output. Child nodes (the expansion)
 * of the entity reference are ignored. If the parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-entities'>entities</a>"
 * is set to <code>false</code>, only the children of the entity reference
 * are serialized.
 * <code>EntityReference</code> nodes with no children (no
 * corresponding <code>Entity</code> node, or the corresponding
 * <code>Entity</code> node has no children) are always serialized.
 * </li>
 * <li>
 * <code>CDATASection</code> nodes containing content characters that cannot be
 * represented in the specified output encoding are handled according to the
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-split-cdata-sections'>split-cdata-sections</a>"
 * parameter. If the parameter is set to <code>true</code>,
 * <code>CDATASection</code> nodes are split, and the unrepresentable characters
 * are serialized as numeric character references in ordinary content. The
 * exact position and number of splits are not specified. If the parameter
 * is set to <code>false</code>, unrepresentable characters in a
 * <code>CDATASection</code> are reported as
 * <code>"wf-invalid-character"</code> errors if the parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-well-formed'>well-formed</a>"
 * is set to <code>true</code>. The error is not recoverable: there is no
 * mechanism for supplying alternative characters and continuing with the
 * serialization.
 * </li>
 * <li> <code>DocumentFragment</code> nodes are serialized by
 * serializing the children of the document fragment in the order they
 * appear in the document fragment.
 * </li>
 * <li> All other node types (Element, Text,
 * etc.) are serialized to their corresponding XML source form.
 * </li>
 * </ul>
 * <p><b>Note:</b> The serialization of a <code>Node</code> does not always
 * generate a well-formed XML document, so an <code>LSParser</code> might
 * throw fatal errors when parsing the resulting serialization.
 * <p> Within the character data of a document (outside of markup), any
 * characters that cannot be represented directly are replaced with
 * character references. Occurrences of '&lt;' and '&amp;' are replaced by
 * the predefined entities &amp;lt; and &amp;amp;. The other predefined
 * entities (&amp;gt;, &amp;apos;, and &amp;quot;) might not be used, except
 * where needed (e.g. using &amp;gt; in cases such as ']]&gt;'). Any
 * characters that cannot be represented directly in the output character
 * encoding are serialized as numeric character references (and since
 * character encoding standards commonly use hexadecimal representations of
 * characters, using the hexadecimal representation when serializing
 * character references is encouraged).
 * <p> To allow attribute values to contain both single and double quotes, the
 * apostrophe or single-quote character (') may be represented as
 * "&amp;apos;", and the double-quote character (") as "&amp;quot;". Newline
 * characters and other characters that cannot be represented directly
 * in attribute values in the output character encoding are serialized as a
 * numeric character reference.
 * <p> Within markup, but outside of attributes, any occurrence of a character
 * that cannot be represented in the output character encoding is reported
 * as a <code>DOMError</code> fatal error. An example would be serializing
 * the element &lt;LaCañada/&gt; with <code>encoding="us-ascii"</code>.
 * This will result in the generation of a <code>DOMError</code>
 * "wf-invalid-character-in-node-name" (as proposed in
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-well-formed'>well-formed</a>").
 * <p> When requested by setting the parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-normalize-characters'>normalize-characters</a>"
 * on <code>LSSerializer</code> to true, character normalization is
 * performed according to the definition of
 * <a href='http://www.w3.org/TR/2004/REC-xml11-20040204/#dt-fullnorm'>fully
 * normalized</a> characters included in appendix E of
 * [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] on all
 * data to be serialized, both markup and character data. The character
 * normalization process affects only the data as it is being written; it
 * does not alter the DOM's view of the document after serialization has
 * completed.
 * <p> Implementations are required to support the encodings "UTF-8",
 * "UTF-16", "UTF-16BE", and "UTF-16LE" to guarantee that data is
 * serializable in all encodings that are required to be supported by all
 * XML parsers. When the encoding is UTF-8, whether or not a byte order mark
 * is serialized, or if the output is big-endian or little-endian, is
 * implementation dependent. When the encoding is UTF-16, whether or not the
 * output is big-endian or little-endian is implementation dependent, but a
 * Byte Order Mark must be generated for non-character outputs, such as
 * <code>LSOutput.byteStream</code> or <code>LSOutput.systemId</code>. If
 * the Byte Order Mark is not generated, a "byte-order-mark-needed" warning
 * is reported. When the encoding is UTF-16LE or UTF-16BE, the output is
 * big-endian (UTF-16BE) or little-endian (UTF-16LE) and the Byte Order Mark
 * is not generated. In all cases, the encoding declaration, if
 * generated, will correspond to the encoding used during the serialization
 * (e.g. <code>encoding="UTF-16"</code> will appear if UTF-16 was
 * requested).
 * <p> Namespaces are fixed up during serialization: the serialization process
 * verifies that namespace declarations, namespace prefixes, and the
 * namespace URIs associated with elements and attributes are consistent. If
 * inconsistencies are found, the serialized form of the document is
 * altered to remove them. The namespace fixup performed while serializing a
 * document uses the algorithm defined in Appendix B.1,
 * "Namespace normalization", of
 * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>].
 * <p> While serializing a document, the parameter "discard-default-content"
 * controls whether or not non-specified data is serialized.
 * <p> While serializing, errors and warnings are reported to the application
 * through the error handler (<code>LSSerializer.domConfig</code>'s
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
 * parameter). This specification does not attempt to define all possible
 * errors and warnings that can occur while serializing a DOM node, but some
 * common error and warning cases are defined. The types
 * (<code>DOMError.type</code>) of errors and warnings defined by this
 * specification are:
 * <dl>
 * <dt><code>"no-output-specified" [fatal]</code></dt>
 * <dd> Raised when
 * writing to an <code>LSOutput</code> if no output is specified in the
 * <code>LSOutput</code>.
 * </dd>
 * <dt><code>"unbound-prefix-in-entity-reference" [fatal]</code></dt>
 * <dd> Raised if the
 * configuration parameter
 * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-namespaces'>namespaces</a>"
 * is set to <code>true</code> and an entity whose replacement text
 * contains unbound namespace prefixes is referenced in a location where
 * there are no bindings for the namespace prefixes. </dd>
 * <dt><code>"unsupported-encoding" [fatal]</code></dt>
 * <dd> Raised if an unsupported
 * encoding is encountered. </dd>
 * </dl>
 * <p> In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation-specific errors and warnings for any
 * other error and warning cases, such as I/O errors (file not found,
 * permission denied, and so on).
 * <p>See also the
 * <a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-LS-20040407'>
 * Document Object Model (DOM) Level 3 Load and Save Specification</a>.
 *
 * @since 1.5
 */
public interface LSSerializer {
    /**
     * The <code>DOMConfiguration</code> object used by the
     * <code>LSSerializer</code> when serializing a DOM node.
     * <br> In addition to the parameters recognized by the
     * <a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#DOMConfiguration'>DOMConfiguration</a>
     * interface defined in
     * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>],
     * the <code>DOMConfiguration</code> object for
     * <code>LSSerializer</code> adds, or modifies, the following
     * parameters:
     * <dl>
     * <dt><code>"canonical-form"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>optional</em>] Writes the document according to the rules specified in
     * [<a href='http://www.w3.org/TR/2001/REC-xml-c14n-20010315'>Canonical XML</a>].
     * In addition to the behavior described in
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-canonical-form'>canonical-form</a>"
     * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>],
     * setting this parameter to <code>true</code> will set the parameters
     * "format-pretty-print", "discard-default-content", and "xml-declaration"
     * to <code>false</code>. Setting one of those parameters to
     * <code>true</code> will set this parameter to <code>false</code>.
     * Serializing an XML 1.1 document when "canonical-form" is
     * <code>true</code> will generate a fatal error. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Do not canonicalize the output. </dd>
     * </dl></dd>
     * <dt><code>"discard-default-content"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Use the <code>Attr.specified</code> attribute to decide what attributes
     * should be discarded. Note that some implementations might use
     * whatever information is available to the implementation (e.g. XML
     * schema, DTD, the <code>Attr.specified</code> attribute, and so on) to
     * determine what attributes and content to discard if this parameter is
     * set to <code>true</code>. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] Keep all attributes and all content.</dd>
     * </dl></dd>
     * <dt><code>"format-pretty-print"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>optional</em>] Format the output by adding whitespace to produce a pretty-printed,
     * indented, human-readable form. The exact form of the transformations
     * is not specified by this specification. Pretty-printing changes the
     * content of the document and may affect the validity of the document;
     * validating implementations should preserve validity.
     * </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) Don't pretty-print the result. </dd>
     * </dl></dd>
     * <dt><code>"ignore-unknown-character-denormalizations"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If, while verifying full normalization when
     * [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>] is
     * supported, a character is encountered for which the normalization
     * properties cannot be determined, then raise a
     * <code>"unknown-character-denormalization"</code> warning (instead of
     * raising an error, if this parameter is not set) and ignore any
     * possible denormalizations caused by these characters. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>optional</em>] Report a fatal error if a character is encountered for which the
     * processor cannot determine the normalization properties. </dd>
     * </dl></dd>
     * <dt><code>"normalize-characters"</code></dt>
     * <dd> This parameter is equivalent to
     * the one defined by <code>DOMConfiguration</code> in
     * [<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>DOM Level 3 Core</a>].
     * Unlike in the Core, the default value for this parameter is
     * <code>true</code>. While DOM implementations are not required to
     * support <a href='http://www.w3.org/TR/2004/REC-xml11-20040204/#dt-fullnorm'>fully
     * normalizing</a> the characters in the document according to appendix E of
     * [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], this
     * parameter must be activated by default if supported.
     * </dd>
     * <dt><code>"xml-declaration"</code></dt>
     * <dd>
     * <dl>
     * <dt><code>true</code></dt>
     * <dd>[<em>required</em>] (<em>default</em>) If a <code>Document</code>,
     * <code>Element</code>, or <code>Entity</code>
     * node is serialized, the XML declaration, or text declaration, should
     * be included. The version (<code>Document.xmlVersion</code> if the
     * document is a Level 3 document and the version is non-null, otherwise
     * the value "1.0") and the output encoding (see
     * <code>LSSerializer.write</code> for details on how to find the output
     * encoding) are specified in the serialized XML declaration. </dd>
     * <dt><code>false</code></dt>
     * <dd>[<em>required</em>] Do not serialize the XML and text declarations. Report an
     * <code>"xml-declaration-needed"</code> warning if this will cause
     * problems (i.e. the serialized data is of an XML version other than
     * [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>], or an
     * encoding would be needed to be able to re-parse the serialized data). </dd>
     * </dl></dd>
     * </dl>
     */
    public DOMConfiguration getDomConfig();

    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out. Any string is supported, but XML treats only a certain
     * set of character sequences as end-of-line (see section 2.11,
     * "End-of-Line Handling" in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>],
     * if the serialized content is XML 1.0, or section 2.11, "End-of-Line Handling"
     * in [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], if the
     * serialized content is XML 1.1). Using character sequences other than
     * the recommended ones can result in a document that is either not
     * serializable or not well-formed.
     * <br> On retrieval, the default value of this attribute is the
     * implementation-specific default end-of-line sequence.
     * DOM implementations should choose the default to match the usual
     * convention for text files in the environment being used.
     * Implementations must choose a default sequence that matches one of
     * those allowed by XML 1.0 or XML 1.1, depending on the serialized
     * content. Setting this attribute to <code>null</code> will reset its
     * value to the default value.
     * <br>
     */
    public String getNewLine();
    /**
     * The end-of-line sequence of characters to be used in the XML being
     * written out. Any string is supported, but XML treats only a certain
     * set of character sequences as end-of-line (see section 2.11,
     * "End-of-Line Handling" in [<a href='http://www.w3.org/TR/2004/REC-xml-20040204'>XML 1.0</a>],
     * if the serialized content is XML 1.0, or section 2.11, "End-of-Line Handling"
     * in [<a href='http://www.w3.org/TR/2004/REC-xml11-20040204/'>XML 1.1</a>], if the
     * serialized content is XML 1.1). Using character sequences other than
     * the recommended ones can result in a document that is either not
     * serializable or not well-formed.
     * <br> On retrieval, the default value of this attribute is the
     * implementation-specific default end-of-line sequence. DOM
     * implementations should choose the default to match the usual
     * convention for text files in the environment being used.
     * Implementations must choose a default sequence that matches one of
     * those allowed by XML 1.0 or XML 1.1, depending on the serialized
     * content. Setting this attribute to <code>null</code> will reset its
     * value to the default value.
     * <br>
     */
    public void setNewLine(String newLine);

    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. The filter implementation
     * can choose to remove the node from the stream or to terminate the
     * serialization early.
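     * <br> A minimal filter sketch follows; it assumes the JDK's bundled
     * implementation, and the names <code>FilterDemo</code> and
     * <code>DropComments</code> are hypothetical. Because
     * <code>getWhatToShow</code> returns <code>NodeFilter.SHOW_COMMENT</code>,
     * only comment nodes are passed to <code>acceptNode</code> (inherited
     * from <code>org.w3c.dom.traversal.NodeFilter</code>), and rejecting each
     * of them omits every comment from the output.
     * <pre>{@code
     * import javax.xml.parsers.DocumentBuilderFactory;
     * import org.w3c.dom.Document;
     * import org.w3c.dom.Element;
     * import org.w3c.dom.Node;
     * import org.w3c.dom.ls.DOMImplementationLS;
     * import org.w3c.dom.ls.LSSerializer;
     * import org.w3c.dom.ls.LSSerializerFilter;
     * import org.w3c.dom.traversal.NodeFilter;
     *
     * class FilterDemo {
     *     // Is shown only comment nodes and rejects every one of them.
     *     static final class DropComments implements LSSerializerFilter {
     *         public short acceptNode(Node node) {
     *             return NodeFilter.FILTER_REJECT;
     *         }
     *
     *         public int getWhatToShow() {
     *             return NodeFilter.SHOW_COMMENT;
     *         }
     *     }
     *
     *     static String serializeWithoutComments() throws Exception {
     *         Document doc = DocumentBuilderFactory.newInstance()
     *                 .newDocumentBuilder().newDocument();
     *         Element root = doc.createElement("root");
     *         root.appendChild(doc.createComment("internal note"));
     *         root.appendChild(doc.createTextNode("kept"));
     *         doc.appendChild(root);
     *
     *         DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
     *         LSSerializer serializer = ls.createLSSerializer();
     *         serializer.getDomConfig().setParameter("xml-declaration", false);
     *         serializer.setFilter(new DropComments());
     *         return serializer.writeToString(doc);
     *     }
     *
     *     public static void main(String[] args) throws Exception {
     *         // The text node is kept; the comment is filtered out.
     *         System.out.println(serializeWithoutComments());
     *     }
     * }
     * }</pre>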
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, CDATA sections won't be passed to the filter if
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-cdata-sections'>cdata-sections</a>"
     * is set to <code>false</code>.
     */
    public LSSerializerFilter getFilter();
    /**
     * When the application provides a filter, the serializer will call out
     * to the filter before serializing each Node. The filter implementation
     * can choose to remove the node from the stream or to terminate the
     * serialization early.
     * <br> The filter is invoked after the operations requested by the
     * <code>DOMConfiguration</code> parameters have been applied. For
     * example, CDATA sections won't be passed to the filter if
     * "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-cdata-sections'>cdata-sections</a>"
     * is set to <code>false</code>.
     */
    public void setFilter(LSSerializerFilter filter);

    /**
     * Serialize the specified node as described above in the general
     * description of the <code>LSSerializer</code> interface. The output is
     * written to the supplied <code>LSOutput</code>.
     * <br> When writing to an <code>LSOutput</code>, the encoding is found by
     * looking at the encoding information that is reachable through the
     * <code>LSOutput</code> and the item to be written (or its owner
     * document) in this order:
     * <ol>
     * <li> <code>LSOutput.encoding</code>,
     * </li>
     * <li>
     * <code>Document.inputEncoding</code>,
     * </li>
     * <li>
     * <code>Document.xmlEncoding</code>.
     * </li>
     * </ol>
     * <br> If no encoding is reachable through the above properties, a
     * default encoding of "UTF-8" will be used. If the specified encoding
     * is not supported, an "unsupported-encoding" fatal error is raised.
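     * <br> The encoding lookup order above can be sketched as follows,
     * assuming the JDK's bundled implementation; <code>EncodingLookupDemo</code>
     * is a hypothetical name. The document is parsed from bytes declared as
     * ISO-8859-1, so <code>Document.inputEncoding</code> and
     * <code>Document.xmlEncoding</code> are both set. An explicit
     * <code>LSOutput.encoding</code> should take precedence over them; when it
     * is <code>null</code>, the declaration written should name the
     * document's encoding instead.
     * <pre>{@code
     * import java.io.ByteArrayInputStream;
     * import java.io.StringWriter;
     * import java.nio.charset.StandardCharsets;
     * import javax.xml.parsers.DocumentBuilderFactory;
     * import org.w3c.dom.Document;
     * import org.w3c.dom.ls.DOMImplementationLS;
     * import org.w3c.dom.ls.LSOutput;
     * import org.w3c.dom.ls.LSSerializer;
     *
     * class EncodingLookupDemo {
     *     // Parses a document declared as ISO-8859-1, then serializes it to a
     *     // character stream with an optional LSOutput encoding.
     *     static String serialize(String outputEncoding) throws Exception {
     *         byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
     *                 .getBytes(StandardCharsets.ISO_8859_1);
     *         Document doc = DocumentBuilderFactory.newInstance()
     *                 .newDocumentBuilder().parse(new ByteArrayInputStream(xml));
     *
     *         DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
     *         LSSerializer serializer = ls.createLSSerializer();
     *         LSOutput output = ls.createLSOutput();
     *         StringWriter writer = new StringWriter();
     *         output.setCharacterStream(writer);
     *         output.setEncoding(outputEncoding); // null: fall back to the document
     *         serializer.write(doc, output);
     *         return writer.toString();
     *     }
     *
     *     public static void main(String[] args) throws Exception {
     *         System.out.println(serialize("UTF-8")); // LSOutput.encoding wins
     *         System.out.println(serialize(null));    // document's encoding used
     *     }
     * }
     * }</pre>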
     * <br> If no output is specified in the <code>LSOutput</code>, a
     * "no-output-specified" fatal error is raised.
     * <br> The implementation is responsible for associating the appropriate
     * media type with the serialized data.
     * <br> When writing to an HTTP URI, an HTTP PUT is performed. When writing
     * to other types of URIs, the mechanism for writing the data to the URI
     * is implementation dependent.
     * @param nodeArg The node to serialize.
     * @param destination The destination for the serialized DOM.
     * @return Returns <code>true</code> if <code>node</code> was
     *   successfully serialized. Returns <code>false</code> if normal
     *   processing stopped but the implementation kept serializing the
     *   document; in that case the result of the serialization is
     *   implementation dependent.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter
     *   "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     *   if they wish to get details on the error.
     */
    public boolean write(Node nodeArg,
                         LSOutput destination)
        throws LSException;

    /**
     * A convenience method that acts as if <code>LSSerializer.write</code>
     * was called with an <code>LSOutput</code> with no encoding specified
     * and <code>LSOutput.systemId</code> set to the <code>uri</code>
     * argument.
     * @param nodeArg The node to serialize.
     * @param uri The URI to write to.
     * @return Returns <code>true</code> if <code>node</code> was
     *   successfully serialized. Returns <code>false</code> if normal
     *   processing stopped but the implementation kept serializing the
     *   document; in that case the result of the serialization is
     *   implementation dependent.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter
     *   "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     *   if they wish to get details on the error.
     */
    public boolean writeToURI(Node nodeArg,
                              String uri)
        throws LSException;

    /**
     * Serialize the specified node as described above in the general
     * description of the <code>LSSerializer</code> interface. The output is
     * written to a <code>DOMString</code> that is returned to the caller.
     * The encoding used is the encoding of the <code>DOMString</code> type,
     * i.e. UTF-16. Note that no Byte Order Mark is generated in a
     * <code>DOMString</code> object.
     * @param nodeArg The node to serialize.
     * @return Returns the serialized data.
     * @exception DOMException
     *   DOMSTRING_SIZE_ERR: Raised if the resulting string is too long to
     *   fit in a <code>DOMString</code>.
     * @exception LSException
     *   SERIALIZE_ERR: Raised if the <code>LSSerializer</code> was unable to
     *   serialize the node. DOM applications should attach a
     *   <code>DOMErrorHandler</code> using the parameter
     *   "<a href='https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-error-handler'>error-handler</a>"
     *   if they wish to get details on the error.
     */
    public String writeToString(Node nodeArg)
        throws DOMException, LSException;

}