 * invokes the appropriate methods in the ParserCallback class. This
 * is the default parser used by HTMLEditorKit to parse HTML URLs.
 * <p>This will message the callback for all valid tags, as well as
 * tags that are implied but not explicitly specified. For example, the
 * html string (&lt;p&gt;blah) only has a p tag defined. The callback
 * will see the following methods:
 * <ol><li><i>handleStartTag(html, ...)</i></li>
 *     <li><i>handleStartTag(head, ...)</i></li>
 *     <li><i>handleEndTag(head)</i></li>
 *     <li><i>handleStartTag(body, ...)</i></li>
 *     <li>handleStartTag(p, ...)</li>
 *     <li>handleText(...)</li>
 *     <li><i>handleEndTag(p)</i></li>
 *     <li><i>handleEndTag(body)</i></li>
 *     <li><i>handleEndTag(html)</i></li>
 * </ol>
 * The items in <i>italic</i> are implied, that is, although they were not
 * explicitly specified, to be correct html they should have been present
 * (head isn't necessary, but it is still generated). For tags that
 * are implied, the AttributeSet argument will have a value of
 * <code>Boolean.TRUE</code> for the key
 * <code>HTMLEditorKit.ParserCallback.IMPLIED</code>.
 * <p>HTML.Attribute defines a type-safe enumeration of html attributes.
 * If an attribute key of a tag is defined in HTML.Attribute, the
 * HTML.Attribute will be used as the key, otherwise a String will be used.
 * For example &lt;p foo=bar class=neat&gt; has two attributes. foo is
 * not defined in HTML.Attribute, whereas class is, therefore the
 * AttributeSet will have two values in it, HTML.Attribute.CLASS with
 * a String value of 'neat' and the String key 'foo' with a String value of
 * 'bar'.
 * <p>The position argument will indicate the start of the tag, comment
 * or text. Similar to arrays, the first character in the stream has a
 * position of 0. For tags that are implied the position will indicate
 * the location of the next encountered tag.
 * In the first example, the implied start body and html tags will have the
 * same position as the p tag, and the implied end p, html and body tags
 * will all have the same position.
 * <p>As html skips whitespace the position for text will be the position
 * of the first valid character, e.g. in the string '\n\n\nblah'
 * the text 'blah' will have a position of 3, because the newlines are
 * skipped.
 * <p>
 * For attributes that do not have a value, e.g. the attribute
 * <code>blah</code> in the html string <code>&lt;foo blah&gt;</code>,
 * there are two possible values that will be placed in the AttributeSet:
 * <ul>
 * <li>If the DTD does not contain a definition for the element, or the
 *     definition does not have an explicit value then the value in the
 *     AttributeSet will be <code>HTML.NULL_ATTRIBUTE_VALUE</code>.
 * <li>If the DTD contains an explicit value, as in:
 *     <code>&lt;!ATTLIST OPTION selected (selected) #IMPLIED&gt;</code>
 *     this value from the dtd (in this case selected) will be used.
 * </ul>
 * <p>
 * Once the stream has been parsed, the callback is notified of the most
 * likely end of line string. The end of line string will be one of
 * \n, \r or \r\n, whichever is encountered the most in parsing the
 * stream.
 *
 * @author Sunita Mani
 */
public class DocumentParser extends javax.swing.text.html.parser.Parser {

    private int inbody;
    private int intitle;
    private int inhead;
    private int instyle;
    private int inscript;
    private boolean seentitle;
    private HTMLEditorKit.ParserCallback callback = null;
    private boolean ignoreCharSet = false;
|
 * invokes the appropriate methods in the ParserCallback class. This
 * is the default parser used by HTMLEditorKit to parse HTML URLs.
 * <p>This will message the callback for all valid tags, as well as
 * tags that are implied but not explicitly specified.
 * For example, the html string (&lt;p&gt;blah) only has a p tag defined.
 * The callback will see the following methods:
 * <ol><li><i>handleStartTag(html, ...)</i></li>
 *     <li><i>handleStartTag(head, ...)</i></li>
 *     <li><i>handleEndTag(head)</i></li>
 *     <li><i>handleStartTag(body, ...)</i></li>
 *     <li>handleStartTag(p, ...)</li>
 *     <li>handleText(...)</li>
 *     <li><i>handleEndTag(p)</i></li>
 *     <li><i>handleEndTag(body)</i></li>
 *     <li><i>handleEndTag(html)</i></li>
 * </ol>
 * The items in <i>italic</i> are implied, that is, although they were not
 * explicitly specified, to be correct html they should have been present
 * (head isn't necessary, but it is still generated). For tags that
 * are implied, the AttributeSet argument will have a value of
 * {@code Boolean.TRUE} for the key
 * {@code HTMLEditorKit.ParserCallback.IMPLIED}.
 * <p>HTML.Attribute defines a type-safe enumeration of html attributes.
 * If an attribute key of a tag is defined in HTML.Attribute, the
 * HTML.Attribute will be used as the key, otherwise a String will be used.
 * For example &lt;p foo=bar class=neat&gt; has two attributes. foo is
 * not defined in HTML.Attribute, whereas class is, therefore the
 * AttributeSet will have two values in it, HTML.Attribute.CLASS with
 * a String value of 'neat' and the String key 'foo' with a String value of
 * 'bar'.
 * <p>The position argument will indicate the start of the tag, comment
 * or text. Similar to arrays, the first character in the stream has a
 * position of 0. For tags that are implied the position will indicate
 * the location of the next encountered tag. In the first example,
 * the implied start body and html tags will have the same position as the
 * p tag, and the implied end p, html and body tags will all have the same
 * position.
 * <p>As html skips whitespace the position for text will be the position
 * of the first valid character, e.g. in the string '\n\n\nblah'
 * the text 'blah' will have a position of 3, because the newlines are
 * skipped.
 * <p>
 * For attributes that do not have a value, e.g. the attribute
 * {@code blah} in the html string {@code <foo blah>},
 * there are two possible values that will be placed in the AttributeSet:
 * <ul>
 * <li>If the DTD does not contain a definition for the element, or the
 *     definition does not have an explicit value then the value in the
 *     AttributeSet will be {@code HTML.NULL_ATTRIBUTE_VALUE}.
 * <li>If the DTD contains an explicit value, as in:
 *     {@code <!ATTLIST OPTION selected (selected) #IMPLIED>}
 *     this value from the dtd (in this case selected) will be used.
 * </ul>
 * <p>
 * Once the stream has been parsed, the callback is notified of the most
 * likely end of line string. The end of line string will be one of
 * \n, \r or \r\n, whichever is encountered the most in parsing the
 * stream.
 *
 * @author Sunita Mani
 */
public class DocumentParser extends javax.swing.text.html.parser.Parser {

    private int inbody;
    private int intitle;
    private int inhead;
    private int instyle;
    private int inscript;
    private boolean seentitle;
    private HTMLEditorKit.ParserCallback callback = null;
    private boolean ignoreCharSet = false;
|