lempinen.neatseeker.html
Class HTMLParser

java.lang.Object
  |
  +--lempinen.neatseeker.html.HTMLParser

public class HTMLParser
extends java.lang.Object

A utility class for retrieving the title and a fragment of sample text from an HTML resource.

The class uses a modified version of the Java Tidy tool as its HTML parser, because Tidy can make sense of poor-quality HTML.

The parse() method returns an HTMLTarget whose title and sample text attributes have been initialised.


Field Summary
protected  java.lang.String sample
           
protected  org.w3c.tidy.Tidy tidy
           
protected  java.lang.String title
           
 
Constructor Summary
HTMLParser()
          Creates a new HTMLParser.
 
Method Summary
private  java.lang.String collectText(org.w3c.dom.Node node)
          Recursively collects the textual contents of a node.
static void main(java.lang.String[] args)
           
 HTMLTarget parse(java.io.InputStream in)
          Parses the specified input stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

tidy

protected org.w3c.tidy.Tidy tidy

title

protected java.lang.String title

sample

protected java.lang.String sample
Constructor Detail

HTMLParser

public HTMLParser()
Creates a new HTMLParser.
Method Detail

parse

public HTMLTarget parse(java.io.InputStream in)
Parses the specified input stream.

collectText

private java.lang.String collectText(org.w3c.dom.Node node)
Recursively collects the textual contents of a node.
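
The recursive collection described above can be illustrated with a minimal sketch. It is not the class's actual implementation, which is private and not shown here. The class name CollectTextSketch and the parseXml helper are hypothetical, and the JDK's built-in XML parser stands in for Tidy, so the input must be well-formed markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class CollectTextSketch {

    // Hypothetical stand-in for collectText(Node): concatenates the text
    // nodes found below the given node, in document order.
    public static String collectText(Node node) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            return node.getNodeValue();
        }
        StringBuilder text = new StringBuilder();
        // Recurse into each child; nodes without children (comments,
        // empty elements) contribute nothing.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            text.append(collectText(child));
        }
        return text.toString();
    }

    // Example-only helper (an assumption, not part of HTMLParser): builds a
    // DOM from well-formed markup using the JDK's XML parser instead of Tidy.
    public static Document parseXml(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml(
                "<html><head><title>Neat <b>Seeker</b></title></head>"
                + "<body><p>Hello</p></body></html>");
        Node title = doc.getElementsByTagName("title").item(0);
        System.out.println(collectText(title)); // prints "Neat Seeker"
    }
}
```

Nested markup inside the title is flattened to plain text, which is the kind of result a title or sample extractor would need.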

main

public static void main(java.lang.String[] args)