org.apache.mahout.clustering.iterator
Class ClusterIterator

java.lang.Object
  extended by org.apache.mahout.clustering.iterator.ClusterIterator

public final class ClusterIterator
extends Object

A clustering iterator that works with a set of Vector data and a prior ClusterClassifier that has been initialized with a set of models. Its implementation is algorithm-neutral: it works for any iterative clustering algorithm (currently k-means, fuzzy k-means and Dirichlet) that processes all of the input vectors in each iteration. The ClusterClassifier is configured with a ClusteringPolicy, which selects the desired clustering algorithm.
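The algorithm-neutral design described above can be illustrated with a small, self-contained sketch. It does not use Mahout and is not the library's implementation: the class ClusterIterationSketch, the Policy interface, and the HARD policy are hypothetical stand-ins for ClusterIterator, ClusteringPolicy, and a k-means-style policy, and plain double[] arrays stand in for Vector and for the cluster models. The point is only that the iteration loop stays the same while the policy decides how each input vector is weighted across the models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterIterationSketch {

    /** Hypothetical stand-in for a clustering policy: weights one point across all models. */
    interface Policy {
        double[] weights(double[] point, List<double[]> centers);
    }

    /** Hard assignment (k-means-like): all weight goes to the closest center. */
    static final Policy HARD = (point, centers) -> {
        double[] w = new double[centers.size()];
        int best = 0;
        for (int i = 1; i < centers.size(); i++) {
            if (dist2(point, centers.get(i)) < dist2(point, centers.get(best))) {
                best = i;
            }
        }
        w[best] = 1.0;
        return w;
    };

    /** Squared Euclidean distance between two points of equal dimension. */
    static double dist2(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Runs numIterations passes over all of the data, starting from the prior
     * centers, and returns the posterior centers. The loop itself knows nothing
     * about the algorithm; only the policy does.
     */
    static List<double[]> iterate(Iterable<double[]> data, List<double[]> prior,
                                  Policy policy, int numIterations) {
        List<double[]> centers = new ArrayList<>();
        for (double[] c : prior) {
            centers.add(c.clone());
        }
        for (int iteration = 0; iteration < numIterations; iteration++) {
            int k = centers.size();
            int dim = centers.get(0).length;
            double[][] sums = new double[k][dim];
            double[] totals = new double[k];
            // Every input vector is processed in every iteration.
            for (double[] point : data) {
                double[] w = policy.weights(point, centers);
                for (int i = 0; i < k; i++) {
                    totals[i] += w[i];
                    for (int d = 0; d < dim; d++) {
                        sums[i][d] += w[i] * point[d];
                    }
                }
            }
            // Recompute each model from its weighted observations; keep empty models as-is.
            for (int i = 0; i < k; i++) {
                if (totals[i] > 0) {
                    double[] updated = new double[dim];
                    for (int d = 0; d < dim; d++) {
                        updated[d] = sums[i][d] / totals[i];
                    }
                    centers.set(i, updated);
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 2},
            new double[] {10, 10}, new double[] {10, 12});
        List<double[]> prior = Arrays.asList(new double[] {1, 1}, new double[] {9, 9});
        for (double[] center : iterate(data, prior, HARD, 5)) {
            System.out.println(Arrays.toString(center));
        }
    }
}
```

Swapping HARD for a policy that returns fractional weights would give a fuzzy-style update without touching the iterate loop, which is the separation the class description refers to.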


Field Summary
static String PRIOR_PATH_KEY
           
 
Method Summary
static ClusterClassifier iterate(Iterable<Vector> data, ClusterClassifier classifier, int numIterations)
          Iterate over data using a prior-trained ClusterClassifier, for a number of iterations
static void iterateMR(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inPath, org.apache.hadoop.fs.Path priorPath, org.apache.hadoop.fs.Path outPath, int numIterations)
          Iterate over data using a prior-trained ClusterClassifier, for a number of iterations, using a MapReduce implementation
static void iterateSeq(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inPath, org.apache.hadoop.fs.Path priorPath, org.apache.hadoop.fs.Path outPath, int numIterations)
          Iterate over data using a prior-trained ClusterClassifier, for a number of iterations, using a sequential implementation
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PRIOR_PATH_KEY

public static final String PRIOR_PATH_KEY
See Also:
Constant Field Values
Method Detail

iterate

public static ClusterClassifier iterate(Iterable<Vector> data,
                                        ClusterClassifier classifier,
                                        int numIterations)
Iterate over data using a prior-trained ClusterClassifier, for a number of iterations

Parameters:
data - an Iterable<Vector> of input vectors
classifier - a prior ClusterClassifier
numIterations - the int number of iterations to perform
Returns:
the posterior ClusterClassifier

iterateSeq

public static void iterateSeq(org.apache.hadoop.conf.Configuration conf,
                              org.apache.hadoop.fs.Path inPath,
                              org.apache.hadoop.fs.Path priorPath,
                              org.apache.hadoop.fs.Path outPath,
                              int numIterations)
                       throws IOException
Iterate over data using a prior-trained ClusterClassifier, for a number of iterations, using a sequential implementation

Parameters:
conf - the Configuration
inPath - a Path to input VectorWritables
priorPath - a Path to the prior classifier
outPath - a Path to the output directory
numIterations - the int number of iterations to perform
Throws:
IOException

iterateMR

public static void iterateMR(org.apache.hadoop.conf.Configuration conf,
                             org.apache.hadoop.fs.Path inPath,
                             org.apache.hadoop.fs.Path priorPath,
                             org.apache.hadoop.fs.Path outPath,
                             int numIterations)
                      throws IOException,
                             InterruptedException,
                             ClassNotFoundException
Iterate over data using a prior-trained ClusterClassifier, for a number of iterations, using a MapReduce implementation

Parameters:
conf - the Configuration
inPath - a Path to input VectorWritables
priorPath - a Path to the prior classifier
outPath - a Path to the output directory
numIterations - the int number of iterations to perform
Throws:
IOException
InterruptedException
ClassNotFoundException


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.