org.apache.mahout.utils.vectors.lucene
Class ClusterLabels

java.lang.Object
  extended by org.apache.mahout.utils.vectors.lucene.ClusterLabels

public class ClusterLabels
extends Object

Gets labels for clusters using the log-likelihood ratio (LLR).

"The most useful way to think of this (LLR) is as the percentage of in-cluster documents that have the feature (term) versus the percentage out, keeping in mind that both percentages are uncertain since we have only a sample of all possible documents." - Ted Dunning

More about LLR can be found at: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
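
To make the in-cluster versus out-of-cluster comparison concrete, here is a minimal, self-contained sketch of Dunning's LLR for a 2x2 contingency table of document counts. The class name LlrSketch, its method names, and the way the four counts are laid out are assumptions made for illustration; they are not part of this class's API, and the computation ClusterLabels performs internally may differ.

```java
// Illustrative sketch only; LlrSketch and its methods are hypothetical names,
// not part of org.apache.mahout.utils.vectors.lucene.
final class LlrSketch {

    // x * ln(x), using the convention that 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts: N ln N - sum(k ln k).
    static double entropy(long... counts) {
        long total = 0;
        double sum = 0.0;
        for (long c : counts) {
            total += c;
            sum += xLogX(c);
        }
        return xLogX(total) - sum;
    }

    /**
     * LLR for a 2x2 table (assumed layout):
     * k11: in-cluster documents that contain the term
     * k12: in-cluster documents that do not contain the term
     * k21: out-of-cluster documents that contain the term
     * k22: out-of-cluster documents that do not contain the term
     */
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Term appears in 90 of 100 in-cluster docs but only 10 of 900 others.
        System.out.println(llr(90, 10, 10, 890));
        // Term is equally common inside and outside the cluster.
        System.out.println(llr(10, 10, 10, 10));
    }
}
```

A term whose in-cluster and out-of-cluster rates match scores close to zero, while a term concentrated in the cluster scores high. Note that LLR by itself is symmetric: a term that is unusually rare in the cluster also scores high, so a labeler would typically check the direction of the difference as well.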


Field Summary
static int DEFAULT_MAX_LABELS
           
static int DEFAULT_MIN_IDS
           
 
Constructor Summary
ClusterLabels(org.apache.hadoop.fs.Path seqFileDir, org.apache.hadoop.fs.Path pointsDir, String indexDir, String contentField, int minNumIds, int maxLabels)
           
 
Method Summary
protected  List<org.apache.mahout.utils.vectors.lucene.TermInfoClusterInOut> getClusterLabels(Integer integer, Collection<WeightedPropertyVectorWritable> wpvws)
          Get the list of labels, sorted by best score.
 String getIdField()
           
 void getLabels()
           
 String getOutput()
           
static void main(String[] args)
           
 void setIdField(String idField)
           
 void setOutput(String output)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MIN_IDS

public static final int DEFAULT_MIN_IDS
See Also:
Constant Field Values

DEFAULT_MAX_LABELS

public static final int DEFAULT_MAX_LABELS
See Also:
Constant Field Values
Constructor Detail

ClusterLabels

public ClusterLabels(org.apache.hadoop.fs.Path seqFileDir,
                     org.apache.hadoop.fs.Path pointsDir,
                     String indexDir,
                     String contentField,
                     int minNumIds,
                     int maxLabels)
Method Detail

getLabels

public void getLabels()
               throws IOException
Throws:
IOException

getClusterLabels

protected List<org.apache.mahout.utils.vectors.lucene.TermInfoClusterInOut> getClusterLabels(Integer integer,
                                                                                             Collection<WeightedPropertyVectorWritable> wpvws)
                                                                                      throws IOException
Get the list of labels, sorted by best score.

Throws:
IOException
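
As a rough illustration of "sorted by best score", the sketch below orders candidate terms by descending score and keeps at most a fixed number of them. ScoredTerm and TopLabelsSketch are hypothetical stand-ins, not TermInfoClusterInOut or any Mahout type, and truncating the list to maxLabels is an assumption suggested by the constructor parameter of that name rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a scored term; not Mahout's TermInfoClusterInOut.
final class ScoredTerm {
    final String term;
    final double score;

    ScoredTerm(String term, double score) {
        this.term = term;
        this.score = score;
    }
}

final class TopLabelsSketch {

    // Returns at most maxLabels terms, highest score first.
    // The input list is left unmodified.
    static List<ScoredTerm> topLabels(List<ScoredTerm> terms, int maxLabels) {
        List<ScoredTerm> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingDouble((ScoredTerm t) -> t.score).reversed());
        int limit = Math.min(Math.max(maxLabels, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        // Scores here are made-up values for demonstration.
        List<ScoredTerm> candidates = Arrays.asList(
                new ScoredTerm("hadoop", 12.5),
                new ScoredTerm("the", 0.1),
                new ScoredTerm("cluster", 48.0));
        for (ScoredTerm t : topLabels(candidates, 2)) {
            System.out.println(t.term + "\t" + t.score);
        }
    }
}
```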

getIdField

public String getIdField()

setIdField

public void setIdField(String idField)

getOutput

public String getOutput()

setOutput

public void setOutput(String output)

main

public static void main(String[] args)


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.