Class TfIdfWeighter

All Implemented Interfaces:
java.util.Comparator, Weighter

public class TfIdfWeighter
extends AbstractWeighter

Provides an implementation of simple TF*IDF based term weighting.

The formula used in this class is from Ari Pirkola's "Linguistic Problems and Methods in Text Retrieval" (p. 19). Pirkola credits Allan et al. (1997) in his text, but I have not verified the source.

Basically, the formula implements a TF*IDF (term frequency * inverse document frequency) method that prioritises shorter documents over longer ones.
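
The exact formula from Pirkola is not reproduced on this page, so the following is only a minimal sketch of a generic length-normalised TF*IDF weight, not the calculation this class actually performs. The class name TfIdfSketch, the weight method and its four parameters are hypothetical and do not come from AbstractWeighter or Pointer. It assumes a term frequency divided by document length (which is what favours shorter documents) multiplied by the natural log of total documents over documents containing the term.

```java
/** Hypothetical illustration only; not the TfIdfWeighter source. */
final class TfIdfSketch {

    /**
     * Generic length-normalised TF*IDF weight (an assumed textbook variant).
     *
     * @param termFreq  occurrences of the term in the document
     * @param docLength total number of terms in the document
     * @param docFreq   number of documents that contain the term
     * @param totalDocs number of documents in the collection
     * @return the weight, or 0.0 if any argument is not positive
     */
    static double weight(int termFreq, int docLength, int docFreq, int totalDocs) {
        if (termFreq <= 0 || docLength <= 0 || docFreq <= 0 || totalDocs <= 0) {
            return 0.0;
        }
        // TF: dividing by document length favours shorter documents.
        double tf = (double) termFreq / docLength;
        // IDF: terms found in fewer documents receive a larger factor.
        double idf = Math.log((double) totalDocs / docFreq);
        return tf * idf;
    }

    public static void main(String[] args) {
        // Same term counts, different document lengths.
        System.out.println("short document: " + weight(2, 10, 1, 100));
        System.out.println("long document:  " + weight(2, 100, 1, 100));
    }
}
```

With equal term counts, the shorter document gets the larger weight, and a term that occurs in every document gets a weight of zero because log(1) is 0.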

I do not know if I even got the formula right -- I am not an IR guy :) And yes, the copies made of variables already assigned in AbstractWeighter are there simply to match the terminology of the source document. Sue me.

Sami Lempinen

Constructor Summary
TfIdfWeighter()
Method Summary
 float calculateWeight(Pointer p)
          Calculates the weight of a Pointer using a simple TF*IDF algorithm.
 java.lang.String toString()
          Returns a textual identification of this Weighter.
Constructor Detail


public TfIdfWeighter()
Method Detail


public float calculateWeight(Pointer p)
Calculates the weight of a Pointer using a simple TF*IDF algorithm.


public java.lang.String toString()
Description copied from interface: Weighter
Returns a textual identification of this Weighter.
