lempinen.neatseeker.html
Class HTMLIndexer

java.lang.Object
  |
  +--lempinen.neatseeker.core.AbstractIndexer
        |
        +--lempinen.neatseeker.html.HTMLIndexer
All Implemented Interfaces:
Indexer

public class HTMLIndexer
extends AbstractIndexer
implements Indexer

Implements an Indexer for HTML documents.

This class provides a simple HTML indexing mechanism: the HTML tags are ignored and the body text is inserted into the index. It uses the HTMLParser class to obtain the document title and a fragment of sample text.
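The following is a minimal, self-contained sketch of the general idea described above (ignore the tags, index the remaining text). It is not the NeatSeeker implementation and does not use HTMLParser or any NeatSeeker class. The class name TagStrippingSketch, the method indexTerms, the regex-based tag removal, the lower-casing, and the term-count map are all assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for illustration; not part of NeatSeeker. */
public class TagStrippingSketch {

    /**
     * Reads the whole stream, drops anything that looks like an HTML tag,
     * and returns a map from lower-cased term to its occurrence count.
     */
    public static Map<String, Integer> indexTerms(InputStream in) throws IOException {
        String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);

        // Crude tag removal (assumption): replace every <...> run with a space.
        // This does not handle comments, scripts, or '>' inside attribute values.
        String text = html.replaceAll("<[^>]*>", " ");

        // Split on anything that is not an ASCII letter or digit and count terms.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><head><title>Neat page</title></head>"
                + "<body><p>Neat <b>search</b> engine</p></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        // Tag names such as "html" or "b" never reach the map; only text does.
        System.out.println(indexTerms(in));
    }
}
```

Unlike this sketch, which counts the title text along with the body, the class documented here obtains the title and a sample fragment through HTMLParser; how it tokenises the text and applies the inherited stoplist and lowerCase settings is not specified on this page.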

Version:
$Id: HTMLIndexer.java,v 1.9 2000/10/05 15:53:23 lempinen Exp $

Field Summary
protected  HTMLParser htmlparser
          The HTML parser object.
 
Fields inherited from class lempinen.neatseeker.core.AbstractIndexer
cache, collector, conf, lowerCase, repository, statistics, stoplist
 
Constructor Summary
HTMLIndexer()
           
 
Method Summary
 void init(Configuration c)
          Initialises the Indexer if an empty constructor was used.
 void process(java.io.InputStream in, java.lang.String uri)
          Indexes the data in the URI specified by uri.
 
Methods inherited from class lempinen.neatseeker.core.AbstractIndexer
add, createCache, getCache, getCollector, getConfiguration, getRepository, setCollector, setConfiguration, start
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 
Methods inherited from interface lempinen.neatseeker.core.Indexer
add, getCollector, getConfiguration, getRepository, setCollector, setConfiguration, start
 

Field Detail

htmlparser

protected HTMLParser htmlparser
The HTML parser object.
Constructor Detail

HTMLIndexer

public HTMLIndexer()
Method Detail

init

public void init(Configuration c)
          throws java.io.IOException
Description copied from interface: Indexer
Initialises the Indexer if an empty constructor was used.
Specified by:
init in interface Indexer
Overrides:
init in class AbstractIndexer

process

public void process(java.io.InputStream in,
                    java.lang.String uri)
             throws java.io.IOException
Indexes the data in the URI specified by uri.
Specified by:
process in interface Indexer