lempinen.neatseeker.html
Class HTMLParser
java.lang.Object
|
+--lempinen.neatseeker.html.HTMLParser
public class HTMLParser
extends java.lang.Object
A utility class for retrieving the title and a fragment of sample text
from an HTML resource.
The class uses a modified version of the Java Tidy tool as its HTML
parser, because Tidy copes remarkably well with poor-quality HTML.
The parse() method returns an HTMLTarget whose title and sample text
attributes have been initialised.
Field Summary
protected java.lang.String sample
protected org.w3c.tidy.Tidy tidy
protected java.lang.String title
Constructor Summary
HTMLParser()
Creates a new HTMLParser.
Method Summary
private java.lang.String collectText(org.w3c.dom.Node node)
Recursively collects the textual contents of a node.
static void main(java.lang.String[] args)
HTMLTarget parse(java.io.InputStream in)
Parses the specified input stream.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
tidy
protected org.w3c.tidy.Tidy tidy
title
protected java.lang.String title
sample
protected java.lang.String sample
HTMLParser
public HTMLParser()
- Creates a new HTMLParser.
parse
public HTMLTarget parse(java.io.InputStream in)
- Parses the specified input stream.
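The following is a minimal sketch of the kind of work parse() describes: reading an input stream into a DOM tree and pulling out a title and a short sample of body text. It is not the NeatSeeker implementation. It assumes well-formed XHTML and uses the JDK's own XML parser as a stand-in for Tidy, returns a plain String array instead of an HTMLTarget, and the class name TitleSampleSketch and the SAMPLE_LENGTH limit are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TitleSampleSketch {

    // Hypothetical limit on the sample fragment; the real class may use another rule.
    public static final int SAMPLE_LENGTH = 40;

    // Returns { title, sample } for a well-formed XHTML stream.
    // Assumption: the JDK XML parser replaces Tidy, so malformed HTML is rejected.
    public static String[] extract(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        String title = firstText(doc, "title");
        String sample = firstText(doc, "body");
        if (sample.length() > SAMPLE_LENGTH) {
            sample = sample.substring(0, SAMPLE_LENGTH);
        }
        return new String[] { title, sample };
    }

    // Text of the first element with the given tag name, whitespace-normalised.
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head><title> Neat  Seeker </title></head>"
                + "<body>\n<h1>Welcome</h1>\n<p>Some sample text.</p>\n</body></html>";
        String[] result = extract(
                new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));
        System.out.println("title:  " + result[0]);
        System.out.println("sample: " + result[1]);
    }
}
```

With the real class, the equivalent call would be parse(in) on an HTMLParser instance, with the title and sample read from the returned HTMLTarget.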
collectText
private java.lang.String collectText(org.w3c.dom.Node node)
- Recursively collects the textual contents of a node.
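A recursive text collector over org.w3c.dom.Node can be sketched as below. This is an illustration under stated assumptions, not the method's actual source: the class name CollectTextSketch and the helper parseXml are hypothetical, and the JDK XML parser is used only to build a small DOM tree for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class CollectTextSketch {

    // Collects the text beneath a node by walking the tree depth-first.
    public static String collectText(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Leaf with character data: append it and stop descending.
            sb.append(node.getNodeValue());
            return;
        }
        // Any other node type: recurse into its children in document order.
        // Comments have no children, so they contribute nothing.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collect(child, sb);
        }
    }

    // Hypothetical helper: builds a DOM tree from a well-formed XML string.
    public static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseXml("<p>Hello, <b>neat</b> world</p>");
        System.out.println(collectText(doc.getDocumentElement()));
        // prints "Hello, neat world"
    }
}
```

Accumulating into a single StringBuilder avoids creating a new String at every level of the recursion.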
main
public static void main(java.lang.String[] args)