java.lang.Object
  |
  +--lempinen.neatseeker.core.AbstractIndexer
        |
        +--lempinen.neatseeker.html.HTMLIndexer
Implements an Indexer for HTML documents.
This class provides a simple HTML indexing mechanism, where the HTML tags are ignored and the body of the text is inserted in the index. It uses the HTMLParser class to obtain the document title and a fragment of sample text.
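The approach described above (ignore the tags, keep the text, and pull out a title and a sample fragment) can be illustrated roughly as follows. This is only a sketch under stated assumptions: `HtmlTextSketch` and its methods are hypothetical names invented for illustration, the regular-expression handling is a simplification, and none of it is taken from the NeatSeeker `HTMLParser` or `HTMLIndexer` source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not the NeatSeeker implementation. */
final class HtmlTextSketch {
    // DOTALL lets '.' span line breaks; CASE_INSENSITIVE accepts <TITLE> as well as <title>.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern HEAD = Pattern.compile(
            "<head[^>]*>.*?</head>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the text of the first title element, or an empty string if there is none. */
    static String title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? collapse(m.group(1)) : "";
    }

    /** Drops the head section, script/style blocks and all remaining tags. */
    static String bodyText(String html) {
        String s = HEAD.matcher(html).replaceAll(" ");
        s = SCRIPT_OR_STYLE.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        return collapse(s);
    }

    /** Returns at most maxChars characters from the start of the body text. */
    static String sample(String html, int maxChars) {
        String text = bodyText(html);
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    // Collapses runs of whitespace into single spaces and trims both ends.
    private static String collapse(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><h1>Hello</h1><p>indexing <b>world</b></p></body></html>";
        System.out.println(title(html));
        System.out.println(bodyText(html));
    }
}
```

The sketch does not decode character entities such as `&amp;` and does not handle malformed markup; a real parser would need to deal with both.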
Field Summary
protected HTMLParser | htmlparser | The HTML parser object.
Fields inherited from class lempinen.neatseeker.core.AbstractIndexer |
cache, collector, conf, lowerCase, repository, statistics, stoplist |
Constructor Summary
HTMLIndexer()
Method Summary
void | init(Configuration c) | Initialises the Indexer if an empty constructor was used.
void | process(java.io.InputStream in, java.lang.String uri) | Indexes the data in the URI specified by uri.
Methods inherited from class lempinen.neatseeker.core.AbstractIndexer |
add, createCache, getCache, getCollector, getConfiguration, getRepository, setCollector, setConfiguration, start |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface lempinen.neatseeker.core.Indexer |
add, getCollector, getConfiguration, getRepository, setCollector, setConfiguration, start |
Field Detail |
protected HTMLParser htmlparser
Constructor Detail |
public HTMLIndexer()
Method Detail |
public void init(Configuration c) throws java.io.IOException
Initialises the Indexer if an empty constructor was used.
Specified by: init in interface Indexer
Overrides: init in class AbstractIndexer
public void process(java.io.InputStream in, java.lang.String uri) throws java.io.IOException
Indexes the data in the URI specified by uri.
Specified by: process in interface Indexer
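To make the shape of a `process(InputStream, String)` call more concrete, the following self-contained sketch reads a stream, strips tags, and records each remaining term against the given URI in a small in-memory index. Everything here is an assumption made for illustration: `TinyHtmlIndexSketch`, its `lookup` method, the UTF-8 decoding, the lower-casing and the stop-word filtering are hypothetical and only loosely suggested by the inherited `lowerCase` and `stoplist` field names. It does not use or reproduce the NeatSeeker `Configuration`, collector, cache or repository classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for illustration; not the NeatSeeker HTMLIndexer. */
final class TinyHtmlIndexSketch {
    // term -> URIs of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();
    private final Set<String> stopWords;

    TinyHtmlIndexSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    /** Reads the whole stream, ignores the tags and indexes the remaining words under uri. */
    void process(InputStream in, String uri) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // Assumption: the document is UTF-8 encoded.
        String html = new String(buf.toByteArray(), StandardCharsets.UTF_8);

        // Replace every tag with a space so adjacent words stay separated.
        String text = html.replaceAll("<[^>]*>", " ");
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (term.isEmpty() || stopWords.contains(term)) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(uri);
        }
    }

    /** Returns the URIs indexed for term, or an empty set if the term is unknown. */
    Set<String> lookup(String term) {
        Set<String> uris = postings.get(term.toLowerCase(Locale.ROOT));
        return uris == null ? Collections.<String>emptySet() : uris;
    }
}
```

A caller would open a stream for each document, pass it in together with the document's URI, and close the stream afterwards; the sketch leaves closing to the caller.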