lempinen.neatseeker.ir
Class TfIdfWeighter
java.lang.Object
  |
  +--lempinen.neatseeker.core.AbstractWeighter
        |
        +--lempinen.neatseeker.ir.TfIdfWeighter
- All Implemented Interfaces:
- java.util.Comparator, Weighter
- public class TfIdfWeighter
- extends AbstractWeighter
Provides an implementation of simple TF*IDF based term weighting.
The formula used in this class is from Ari Pirkola's
Linguistic Problems and Methods in Text Retrieval (p.19).
Pirkola credits Allan et al. (1997) in his text, but I have not
verified the source.
Basically, the formula implements a TF*IDF (term frequency * inverse
document frequency) method, prioritising shorter documents over
longer ones.
I do not know if I even got the formula right -- I am not an IR guy :)
And yes, the copies made of variables already assigned in
AbstractWeighter are there simply to match the terminology in the
source document. Sue me.
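To make the idea of a length-sensitive TF*IDF weight concrete, here is a minimal sketch. It is not the NeatSeeker source and not necessarily the formula from Pirkola's text: it assumes the textbook form (term frequency divided by document length) * log(collection size / document frequency). The class name TfIdfSketch and all parameter names are hypothetical.

```java
// Hypothetical sketch: a generic length-normalised TF*IDF weight.
// This is NOT the formula used by TfIdfWeighter, which this page does not show.
final class TfIdfSketch {

    private TfIdfSketch() {}

    /**
     * @param termFrequency     occurrences of the term in the document
     * @param documentLength    total number of terms in the document
     * @param documentFrequency number of documents that contain the term
     * @param collectionSize    total number of documents in the collection
     * @return the weight, or 0.0 if any argument is not positive
     */
    static double weight(int termFrequency, int documentLength,
                         int documentFrequency, int collectionSize) {
        if (termFrequency <= 0 || documentLength <= 0
                || documentFrequency <= 0 || collectionSize <= 0) {
            return 0.0;
        }
        // Dividing by document length is what favours shorter documents.
        double tf = (double) termFrequency / documentLength;
        // Terms that occur in few documents get a larger idf.
        double idf = Math.log((double) collectionSize / documentFrequency);
        return tf * idf;
    }

    public static void main(String[] args) {
        double shortDoc = weight(3, 100, 10, 1000);
        double longDoc = weight(3, 1000, 10, 1000);
        System.out.println(shortDoc > longDoc); // prints true
    }
}
```

With the same term count, the 100-term document scores higher than the 1000-term one, which is the "shorter documents first" behaviour described above.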
- Version:
- $Id: TfIdfWeighter.java,v 1.3 2000/09/28 18:55:08 lempinen Exp $
- Author:
- Sami Lempinen
Method Summary
float | calculateWeight(Pointer p): Calculates the weight of a Pointer using a simple TF*IDF algorithm.
java.lang.String | toString(): Returns a textual identification of this Weighter.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, wait, wait, wait
Methods inherited from interface java.util.Comparator
equals
TfIdfWeighter
public TfIdfWeighter()
calculateWeight
public float calculateWeight(Pointer p)
- Calculates the weight of a Pointer using a simple TF*IDF algorithm.
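This page does not say how the weight is used, but since the class also implements java.util.Comparator, one plausible use is ordering results by descending weight. The sketch below shows that pattern with stand-in types only: Hit, ByWeightDescending and RankingSketch are hypothetical and are not part of NeatSeeker, and Hit is not the real Pointer class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: ranking results by a precomputed weight.
final class RankingSketch {

    private RankingSketch() {}

    /** Stand-in for a search hit; NOT the NeatSeeker Pointer class. */
    static final class Hit {
        final String document;
        final float weight;

        Hit(String document, float weight) {
            this.document = document;
            this.weight = weight;
        }
    }

    /** Orders hits so that the highest weight comes first. */
    static final class ByWeightDescending implements Comparator<Hit> {
        @Override
        public int compare(Hit a, Hit b) {
            // Arguments are swapped to get descending order.
            return Float.compare(b.weight, a.weight);
        }
    }

    /** Returns the document names ordered from highest to lowest weight. */
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits); // leave the input untouched
        Collections.sort(sorted, new ByWeightDescending());
        List<String> names = new ArrayList<>();
        for (Hit hit : sorted) {
            names.add(hit.document);
        }
        return names;
    }
}
```

Keeping the comparison in its own Comparator means the ranking order can be swapped without touching the code that computes the weights.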
toString
public java.lang.String toString()
- Description copied from interface: Weighter
- Returns a textual identification of this Weighter.
- Overrides: toString in class java.lang.Object