
src/java.desktop/share/classes/javax/swing/text/html/parser/DocumentParser.java





  41  * invokes the appropriate methods in the ParserCallback class. This
  42  * is the default parser used by HTMLEditorKit to parse HTML URLs.
  43  * <p>This will message the callback for all valid tags, as well as
  44  * tags that are implied but not explicitly specified. For example, the
  45  * html string (&lt;p&gt;blah) only has a p tag defined. The callback
  46  * will see the following methods:
  47  * <ol><li><i>handleStartTag(html, ...)</i></li>
  48  *     <li><i>handleStartTag(head, ...)</i></li>
  49  *     <li><i>handleEndTag(head)</i></li>
  50  *     <li><i>handleStartTag(body, ...)</i></li>
  51  *     <li><i>handleStartTag(p, ...)</i></li>
  52  *     <li><i>handleText(...)</i></li>
  53  *     <li><i>handleEndTag(p)</i></li>
  54  *     <li><i>handleEndTag(body)</i></li>
  55  *     <li><i>handleEndTag(html)</i></li>
  56  * </ol>
  57  * The items in <i>italic</i> are implied, that is, although they were not
  58  * explicitly specified, to be correct html they should have been present
  59  * (head isn't necessary, but it is still generated). For tags that
  60  * are implied, the AttributeSet argument will have a value of
  61  * <code>Boolean.TRUE</code> for the key
  62  * <code>HTMLEditorKit.ParserCallback.IMPLIED</code>.
  63  * <p>HTML.Attributes defines a type safe enumeration of html attributes.
  64  * If an attribute key of a tag is defined in HTML.Attribute, the
  65  * HTML.Attribute will be used as the key, otherwise a String will be used.
  66  * For example &lt;p foo=bar class=neat&gt; has two attributes. foo is
  67  * not defined in HTML.Attribute, whereas class is, therefore the
  68  * AttributeSet will have two values in it, HTML.Attribute.CLASS with
  69  * a String value of 'neat' and the String key 'foo' with a String value of
  70  * 'bar'.
  71  * <p>The position argument will indicate the start of the tag, comment
  72  * or text. Similar to arrays, the first character in the stream has a
  73  * position of 0. For tags that are
  74  * implied the position will indicate
  75  * the location of the next encountered tag. In the first example,
  76  * the implied start body and html tags will have the same position as the
  77  * p tag, and the implied end p, html and body tags will all have the same
  78  * position.
  79  * <p>As html skips whitespace the position for text will be the position
  80  * of the first valid character, eg in the string '\n\n\nblah'
  81  * the text 'blah' will have a position of 3, the newlines are skipped.
  82  * <p>
  83  * For attributes that do not have a value, eg in the html
  84  * string <code>&lt;foo blah&gt;</code> the attribute <code>blah</code>
  85  * does not have a value, there are two possible values that will be
  86  * placed in the AttributeSet's value:
  87  * <ul>
  88  * <li>If the DTD does not contain a definition for the element, or the
  89  *     definition does not have an explicit value then the value in the
  90  *     AttributeSet will be <code>HTML.NULL_ATTRIBUTE_VALUE</code>.
  91  * <li>If the DTD contains an explicit value, as in:
  92  *     <code>&lt;!ATTLIST OPTION selected (selected) #IMPLIED&gt;</code>
  93  *     this value from the dtd (in this case selected) will be used.
  94  * </ul>
  95  * <p>
  96  * Once the stream has been parsed, the callback is notified of the most
  97  * likely end of line string. The end of line string will be one of
  98  * \n, \r or \r\n, whichever is encountered the most in parsing the
  99  * stream.
 100  *
 101  * @author      Sunita Mani
 102  */
 103 public class DocumentParser extends javax.swing.text.html.parser.Parser {
 104 
 105     private int inbody;
 106     private int intitle;
 107     private int inhead;
 108     private int instyle;
 109     private int inscript;
 110     private boolean seentitle;
 111     private HTMLEditorKit.ParserCallback callback = null;
 112     private boolean ignoreCharSet = false;




  41  * invokes the appropriate methods in the ParserCallback class. This
  42  * is the default parser used by HTMLEditorKit to parse HTML URLs.
  43  * <p>This will message the callback for all valid tags, as well as
  44  * tags that are implied but not explicitly specified. For example, the
  45  * html string (&lt;p&gt;blah) only has a p tag defined. The callback
  46  * will see the following methods:
  47  * <ol><li><i>handleStartTag(html, ...)</i></li>
  48  *     <li><i>handleStartTag(head, ...)</i></li>
  49  *     <li><i>handleEndTag(head)</i></li>
  50  *     <li><i>handleStartTag(body, ...)</i></li>
  51  *     <li><i>handleStartTag(p, ...)</i></li>
  52  *     <li><i>handleText(...)</i></li>
  53  *     <li><i>handleEndTag(p)</i></li>
  54  *     <li><i>handleEndTag(body)</i></li>
  55  *     <li><i>handleEndTag(html)</i></li>
  56  * </ol>
  57  * The items in <i>italic</i> are implied, that is, although they were not
  58  * explicitly specified, to be correct html they should have been present
  59  * (head isn't necessary, but it is still generated). For tags that
  60  * are implied, the AttributeSet argument will have a value of
  61  * {@code Boolean.TRUE} for the key
  62  * {@code HTMLEditorKit.ParserCallback.IMPLIED}.
  63  * <p>HTML.Attributes defines a type safe enumeration of html attributes.
  64  * If an attribute key of a tag is defined in HTML.Attribute, the
  65  * HTML.Attribute will be used as the key, otherwise a String will be used.
  66  * For example &lt;p foo=bar class=neat&gt; has two attributes. foo is
  67  * not defined in HTML.Attribute, whereas class is, therefore the
  68  * AttributeSet will have two values in it, HTML.Attribute.CLASS with
  69  * a String value of 'neat' and the String key 'foo' with a String value of
  70  * 'bar'.
  71  * <p>The position argument will indicate the start of the tag, comment
  72  * or text. Similar to arrays, the first character in the stream has a
  73  * position of 0. For tags that are
  74  * implied the position will indicate
  75  * the location of the next encountered tag. In the first example,
  76  * the implied start body and html tags will have the same position as the
  77  * p tag, and the implied end p, html and body tags will all have the same
  78  * position.
  79  * <p>As html skips whitespace the position for text will be the position
  80  * of the first valid character, eg in the string '\n\n\nblah'
  81  * the text 'blah' will have a position of 3, the newlines are skipped.
  82  * <p>
  83  * For attributes that do not have a value, eg in the html
  84  * string {@code <foo blah>} the attribute {@code blah}
  85  * does not have a value, there are two possible values that will be
  86  * placed in the AttributeSet's value:
  87  * <ul>
  88  * <li>If the DTD does not contain a definition for the element, or the
  89  *     definition does not have an explicit value then the value in the
  90  *     AttributeSet will be {@code HTML.NULL_ATTRIBUTE_VALUE}.
  91  * <li>If the DTD contains an explicit value, as in:
  92  *     {@code < !ATTLIST OPTION selected (selected) #IMPLIED>}
  93  *     this value from the dtd (in this case selected) will be used.
  94  * </ul>
  95  * <p>
  96  * Once the stream has been parsed, the callback is notified of the most
  97  * likely end of line string. The end of line string will be one of
  98  * \n, \r or \r\n, whichever is encountered the most in parsing the
  99  * stream.
 100  *
 101  * @author      Sunita Mani
 102  */
 103 public class DocumentParser extends javax.swing.text.html.parser.Parser {
 104 
 105     private int inbody;
 106     private int intitle;
 107     private int inhead;
 108     private int instyle;
 109     private int inscript;
 110     private boolean seentitle;
 111     private HTMLEditorKit.ParserCallback callback = null;
 112     private boolean ignoreCharSet = false;
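
The callback sequence described in the class comment can be observed with a small standalone program. The following is only a minimal sketch and not part of this change: the class name `ImpliedTagDemo` and the helper `collect` are hypothetical, and it assumes the parser is reached through `ParserDelegator`, which is how `HTMLEditorKit` normally obtains it. Each start tag is marked as implied when its attribute set carries `HTMLEditorKit.ParserCallback.IMPLIED`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class; not part of the JDK or of the change under review.
public class ImpliedTagDemo {

    // Parses the markup and returns one entry per callback invocation, in order.
    static List<String> collect(String html) throws IOException {
        List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Implied tags carry Boolean.TRUE under the IMPLIED key.
                Object flag = a.getAttribute(HTMLEditorKit.ParserCallback.IMPLIED);
                boolean implied = Boolean.TRUE.equals(flag);
                events.add("start:" + t + (implied ? " implied" : ""));
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("end:" + t);
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text:" + new String(data));
            }
        };
        // The third argument is ignoreCharSet; true skips charset checks.
        new ParserDelegator().parse(new StringReader(html), cb, true);
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Print the callbacks triggered by the example from the class comment.
        for (String event : collect("<p>blah")) {
            System.out.println(event);
        }
    }
}
```

Running `main` prints the events in the order the parser delivers them, so the list in the class comment can be compared against the actual behavior of a given JDK build.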

