Class TfIdfWeighter

All Implemented Interfaces:
java.util.Comparator, Weighter

public class TfIdfWeighter
extends AbstractWeighter

Provides an implementation of simple TF*IDF based term weighting.

The formula used in this class is from Ari Pirkola's "Linguistic Problems and Methods in Text Retrieval" (p. 19). Pirkola credits Allan et al. (1997) in his text, but I have not verified the source.

In short, the formula implements a TF*IDF (term frequency * inverse document frequency) method that favours shorter documents over longer ones.
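To make the idea concrete, here is a minimal sketch of a generic textbook TF*IDF weight with length normalisation. It is not the formula from Pirkola's text and not the code of this class; the class name TfIdfSketch, the method weight and all parameter names are hypothetical, chosen only for illustration. It also returns a double, whereas calculateWeight returns a float.

```java
/** Hypothetical illustration only; not part of NeatSeeker. */
final class TfIdfSketch {
    private TfIdfSketch() {}

    /**
     * Generic TF*IDF weight (an assumed textbook variant).
     *
     * @param termCount         occurrences of the term in the document
     * @param documentLength    total number of words in the document
     * @param documentCount     number of documents in the collection
     * @param documentFrequency number of documents containing the term
     */
    static double weight(int termCount, int documentLength,
                         int documentCount, int documentFrequency) {
        // Guard against empty inputs and division by zero.
        if (termCount <= 0 || documentLength <= 0
                || documentCount <= 0 || documentFrequency <= 0) {
            return 0.0;
        }
        // Dividing by the document length is what favours shorter
        // documents: the same count weighs more in a shorter text.
        double tf = (double) termCount / documentLength;
        // Terms that occur in few documents get a higher IDF.
        double idf = Math.log((double) documentCount / documentFrequency);
        return tf * idf;
    }
}
```

With this variant, a term that occurs three times in a 50-word document scores twice as high as the same term occurring three times in a 100-word document, all else being equal.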

I do not know if I even got the formula right -- I am not an IR guy :) And yes, the copies made of variables already assigned in AbstractWeighter are there simply to comply with the terminology in the source document. Sue me.

$Id: TfIdfWeighter.java,v 1.3 2000/09/28 18:55:08 lempinen Exp $
Sami Lempinen

Fields inherited from class lempinen.neatseeker.core.AbstractWeighter
documentCount, repository, resultCount, wordCount
Constructor Summary
TfIdfWeighter()
Method Summary
 float calculateWeight(Pointer p)
          Calculates the weight of a Pointer using a simple TF*IDF algorithm.
 java.lang.String toString()
          Returns a textual identification of this Weighter.
Methods inherited from class lempinen.neatseeker.core.AbstractWeighter
compare, init, setResultCount
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
Methods inherited from interface java.util.Comparator

Constructor Detail


public TfIdfWeighter()
Method Detail


public float calculateWeight(Pointer p)
Calculates the weight of a Pointer using a simple TF*IDF algorithm.
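
The class is listed as implementing java.util.Comparator and inherits compare from AbstractWeighter, so the calculated weights are presumably used to order search results. The sketch below shows one way such an ordering could work. The types RankingSketch and Hit are hypothetical stand-ins, not the NeatSeeker Pointer class or the real compare implementation, and sorting heaviest-first is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not part of NeatSeeker. */
final class RankingSketch {
    private RankingSketch() {}

    /** Stand-in for a search hit that already carries its weight. */
    static final class Hit {
        final String document;
        final float weight;

        Hit(String document, float weight) {
            this.document = document;
            this.weight = weight;
        }
    }

    /** Orders hits so that the highest weight comes first (assumed order). */
    static final Comparator<Hit> BY_WEIGHT_DESCENDING = new Comparator<Hit>() {
        public int compare(Hit a, Hit b) {
            // Arguments are swapped to get a descending order.
            return Float.compare(b.weight, a.weight);
        }
    };

    /** Returns the document names ranked by descending weight. */
    static List<String> rank(List<Hit> hits) {
        // Sort a copy so the caller's list is left untouched.
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, BY_WEIGHT_DESCENDING);
        List<String> names = new ArrayList<String>();
        for (Hit hit : sorted) {
            names.add(hit.document);
        }
        return names;
    }
}
```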


public java.lang.String toString()
Description copied from interface: Weighter
Returns a textual identification of this Weighter.
Overrides: toString in class java.lang.Object