NeatSeeker framework documentation: Class HTMLParser

lempinen.neatseeker.html
Class HTMLParser

java.lang.Object
  |
  +--lempinen.neatseeker.html.HTMLParser

public class HTMLParser
extends java.lang.Object

A utility class for retrieving the title and a fragment of sample text from an HTML resource.

The class uses a modified version of the Java Tidy tool as its HTML parser, because Tidy is remarkably good at making sense of poor-quality HTML.

The parse() method returns an HTMLTarget whose title and sample text attributes have been initialised.

Field Summary

protected java.lang.String sample


protected org.w3c.tidy.Tidy tidy


protected java.lang.String title


Constructor Summary

HTMLParser()
          Creates a new HTMLParser.

Method Summary

private java.lang.String collectText(org.w3c.dom.Node node)
          Recursively collects the textual contents of a node.

static void main(java.lang.String[] args)


HTMLTarget parse(java.io.InputStream in)
          Parses the specified input stream.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait

Field Detail

tidy

protected org.w3c.tidy.Tidy tidy

title

protected java.lang.String title

sample

protected java.lang.String sample

Constructor Detail

HTMLParser

public HTMLParser()

Creates a new HTMLParser.

Method Detail

parse

public HTMLTarget parse(java.io.InputStream in)

Parses the specified input stream.
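
The sketch below illustrates the kind of work parse() is described as doing: read a document from an input stream, then pull out a title and a short text sample. It is not the NeatSeeker implementation. It uses the JDK's own XML parser as a stand-in for Tidy, so it only accepts well-formed markup, and the names TitleAndSampleSketch, Extract, titleAndSample, firstText and MAX_SAMPLE_LENGTH are hypothetical. Extract merely stands in for HTMLTarget, whose accessors are not documented on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class TitleAndSampleSketch {

    /** Assumed cut-off for the sample text; the real limit is not documented. */
    static final int MAX_SAMPLE_LENGTH = 40;

    /** Hypothetical stand-in for HTMLTarget: just a title and a sample. */
    static final class Extract {
        final String title;
        final String sample;

        Extract(String title, String sample) {
            this.title = title;
            this.sample = sample;
        }
    }

    /** Parses well-formed markup and extracts the title and a truncated sample. */
    static Extract titleAndSample(InputStream in) {
        Document doc;
        try {
            // JDK parser used here instead of Tidy, so malformed HTML is rejected.
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        String title = firstText(doc, "title");
        String sample = firstText(doc, "body");
        if (sample.length() > MAX_SAMPLE_LENGTH) {
            sample = sample.substring(0, MAX_SAMPLE_LENGTH);
        }
        return new Extract(title, sample);
    }

    /** Returns the whitespace-normalised text of the first element with the given tag. */
    static String firstText(Document doc, String tag) {
        NodeList matches = doc.getElementsByTagName(tag);
        if (matches.getLength() == 0) {
            return "";
        }
        return matches.item(0).getTextContent().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Sample page</title></head>"
                + "<body><p>First paragraph.</p> <p>Second paragraph.</p></body></html>";
        Extract extract = titleAndSample(
                new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8)));
        System.out.println(extract.title);
        System.out.println(extract.sample);
    }
}
```

With Tidy in place of the JDK parser, the same two lookups could be run against the DOM that Tidy builds from malformed input; that substitution is the part this sketch leaves out.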

collectText

private java.lang.String collectText(org.w3c.dom.Node node)

Recursively collects the textual contents of a node.
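
As an illustration of recursive text collection over an org.w3c.dom.Node, here is a minimal sketch. The actual body of collectText is private and not shown on this page, so the details below (which node types count as text, how the pieces are joined) are assumptions. The class name CollectTextSketch and the parseXml helper are hypothetical, and the helper uses the JDK parser rather than Tidy purely to build a small tree for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class CollectTextSketch {

    /** Recursively concatenates the text of a node and all of its descendants. */
    static String collectText(Node node) {
        if (node == null) {
            return "";
        }
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Leaf case: the node itself carries the text.
            return node.getNodeValue();
        }
        // Recursive case: visit the children in document order.
        // Comments and other non-text leaves contribute nothing.
        StringBuilder text = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            text.append(collectText(child));
        }
        return text.toString();
    }

    /** Demo helper (hypothetical): parses well-formed markup with the JDK parser. */
    static Document parseXml(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse demo markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml("<p>Hello <b>big</b> world<!-- hidden --></p>");
        System.out.println(collectText(doc.getDocumentElement())); // prints "Hello big world"
    }
}
```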

main

public static void main(java.lang.String[] args)