lempinen.neatseeker.ir
Class TfIdfWeighter

java.lang.Object
  |
  +--lempinen.neatseeker.core.AbstractWeighter
        |
        +--lempinen.neatseeker.ir.TfIdfWeighter
All Implemented Interfaces:
java.util.Comparator, Weighter

public class TfIdfWeighter
extends AbstractWeighter

Provides a simple implementation of TF*IDF-based term weighting.

The formula used in this class comes from Ari Pirkola's Linguistic Problems and Methods in Text Retrieval (p. 19). Pirkola credits Allan et al. (1997) in his text, but I have not verified the original source.

Basically, the formula implements a TF*IDF (term frequency * inverse document frequency) method that favours shorter documents over longer ones.
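
As an illustration of how such a weighting behaves, here is a minimal sketch of one common length-normalised TF*IDF form. It is not taken from this class: the class name TfIdfSketch, the method weight, its parameters, and the constants 0.5 and 1.5 are all assumptions made for the example, and whether this matches the formula on p. 19 of the source has not been checked here.

```java
public class TfIdfSketch {

    /**
     * Hypothetical helper, not part of neatseeker.
     *
     * @param tf           occurrences of the term in the document
     * @param docLength    length of the document, in words
     * @param avgDocLength average document length in the collection
     * @param df           number of documents containing the term
     * @param docCount     total number of documents in the collection
     */
    public static double weight(int tf, int docLength, double avgDocLength,
                                int df, int docCount) {
        if (tf <= 0 || df <= 0 || docCount <= 0
                || docLength <= 0 || avgDocLength <= 0) {
            return 0.0; // nothing sensible to compute
        }
        // TF part: grows with tf, shrinks as the document gets longer
        // relative to the collection average (assumed constants).
        double tfPart = tf / (tf + 0.5 + 1.5 * (docLength / avgDocLength));
        // IDF part: terms found in fewer documents score higher.
        double idfPart = Math.log((docCount + 0.5) / df)
                / Math.log(docCount + 1.0);
        return tfPart * idfPart;
    }

    public static void main(String[] args) {
        // Same term statistics, different document lengths.
        double shortDoc = weight(3, 100, 200.0, 10, 1000);
        double longDoc = weight(3, 400, 200.0, 10, 1000);
        System.out.println("short=" + shortDoc + " long=" + longDoc);
    }
}
```

With equal term counts, the shorter document receives the larger weight, which is the behaviour described above.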

I do not know if I even got the formula right -- I am not an IR guy :) And yes, the copies of variables already assigned in AbstractWeighter are there only to match the terminology of the source document. Sue me.

Version:
$Id: TfIdfWeighter.java,v 1.3 2000/09/28 18:55:08 lempinen Exp $
Author:
Sami Lempinen

Fields inherited from class lempinen.neatseeker.core.AbstractWeighter
documentCount, repository, resultCount, wordCount
 
Constructor Summary
TfIdfWeighter()
           
 
Method Summary
 float calculateWeight(Pointer p)
          Calculates the weight of a Pointer using a simple TF*IDF algorithm.
 java.lang.String toString()
          Returns a textual identification of this Weighter.
 
Methods inherited from class lempinen.neatseeker.core.AbstractWeighter
compare, init, setResultCount
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
 
Methods inherited from interface java.util.Comparator
equals
 

Constructor Detail

TfIdfWeighter

public TfIdfWeighter()
Method Detail

calculateWeight

public float calculateWeight(Pointer p)
Calculates the weight of a Pointer using a simple TF*IDF algorithm.
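
Since the class also implements java.util.Comparator, computed weights are presumably used to order search results. The sketch below shows that idea with stand-in types only: RankingSketch, Hit, and BY_WEIGHT_DESC are hypothetical names invented for this example, and nothing here reflects how Pointer or AbstractWeighter.compare actually work.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RankingSketch {

    /** Hypothetical stand-in for a search hit; not neatseeker's Pointer. */
    public static final class Hit {
        final String document;
        final float weight;

        public Hit(String document, float weight) {
            this.document = document;
            this.weight = weight;
        }
    }

    /** Orders hits so that the highest weight comes first (assumed ordering). */
    static final Comparator<Hit> BY_WEIGHT_DESC =
            (a, b) -> Float.compare(b.weight, a.weight);

    /** Returns the document names sorted by descending weight. */
    public static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits); // leave the input untouched
        sorted.sort(BY_WEIGHT_DESC);
        List<String> names = new ArrayList<>();
        for (Hit h : sorted) {
            names.add(h.document);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("a.html", 0.2f));
        hits.add(new Hit("b.html", 0.9f));
        hits.add(new Hit("c.html", 0.5f));
        System.out.println(rank(hits));
    }
}
```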

toString

public java.lang.String toString()
Description copied from interface: Weighter
Returns a textual identification of this Weighter.
Overrides:
toString in class java.lang.Object