org.apache.mahout.clustering.fuzzykmeans
Class FuzzyKMeansDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class FuzzyKMeansDriver
extends AbstractJob


Field Summary
static String M_OPTION
           
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Constructor Summary
FuzzyKMeansDriver()
           
 
Method Summary
static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, float m, boolean runSequential)
          Iterate over the input vectors to produce cluster directories for each iteration.
static void clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, float m, boolean emitMostLikely, double threshold, boolean runSequential)
          Cluster the input points against the supplied clusters, using the supplied arguments.
static void main(String[] args)
           
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, float m, boolean runClustering, boolean emitMostLikely, double threshold, boolean runSequential)
          Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.
static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, double convergenceDelta, int maxIterations, float m, boolean runClustering, boolean emitMostLikely, double threshold, boolean runSequential)
          Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

M_OPTION

public static final String M_OPTION
See Also:
Constant Field Values
Constructor Detail

FuzzyKMeansDriver

public FuzzyKMeansDriver()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception

run

public static void run(org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path clustersIn,
                       org.apache.hadoop.fs.Path output,
                       double convergenceDelta,
                       int maxIterations,
                       float m,
                       boolean runClustering,
                       boolean emitMostLikely,
                       double threshold,
                       boolean runSequential)
                throws IOException,
                       ClassNotFoundException,
                       InterruptedException
Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.

Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
m - the fuzzification factor, see http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
runClustering - true if points are to be clustered after iterations complete
emitMostLikely - if true, emit only the most likely cluster for each point
threshold - when emitMostLikely is false, emit every cluster whose pdf for the point is greater than this value
runSequential - if true, run in sequential execution mode
Throws:
IOException
ClassNotFoundException
InterruptedException
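
The role of the fuzzification factor m can be illustrated with the standard fuzzy c-means membership formula, u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)). The following is a minimal sketch under that assumption, not Mahout's implementation; the class FuzzyMembershipSketch and its method are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration of fuzzy membership; not part of Mahout. */
final class FuzzyMembershipSketch {

    /**
     * Computes the membership of one point in each cluster from its
     * distances to the cluster centers, for a fuzzification factor m > 1.
     */
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("m must be greater than 1");
        }
        double[] u = new double[distances.length];
        // A point that coincides with a center belongs fully to that cluster.
        for (int i = 0; i < distances.length; i++) {
            if (distances[i] == 0.0) {
                u[i] = 1.0;
                return u;
            }
        }
        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < distances.length; i++) {
            double sum = 0.0;
            for (double dk : distances) {
                sum += Math.pow(distances[i] / dk, exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[] distances = {1.0, 3.0};
        // A larger m spreads membership more evenly across clusters.
        System.out.println(Arrays.toString(memberships(distances, 2.0)));
        System.out.println(Arrays.toString(memberships(distances, 5.0)));
    }
}
```

With m close to 1 the memberships approach a hard assignment to the nearest center; as m grows they move toward an even split.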

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path clustersIn,
                       org.apache.hadoop.fs.Path output,
                       double convergenceDelta,
                       int maxIterations,
                       float m,
                       boolean runClustering,
                       boolean emitMostLikely,
                       double threshold,
                       boolean runSequential)
                throws IOException,
                       ClassNotFoundException,
                       InterruptedException
Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.

Parameters:
conf - the Hadoop Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
m - the fuzzification factor, see http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
runClustering - true if points are to be clustered after iterations complete
emitMostLikely - if true, emit only the most likely cluster for each point
threshold - when emitMostLikely is false, emit every cluster whose pdf for the point is greater than this value
runSequential - if true, run in sequential execution mode
Throws:
IOException
ClassNotFoundException
InterruptedException

buildClusters

public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path input,
                                                      org.apache.hadoop.fs.Path clustersIn,
                                                      org.apache.hadoop.fs.Path output,
                                                      double convergenceDelta,
                                                      int maxIterations,
                                                      float m,
                                                      boolean runSequential)
                                               throws IOException,
                                                      InterruptedException,
                                                      ClassNotFoundException
Iterate over the input vectors to produce cluster directories for each iteration.

Parameters:
conf - the Hadoop Configuration to use
input - the directory pathname for input points
clustersIn - the file pathname for initial cluster centers
output - the directory pathname for output points
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
m - the fuzzification factor, see http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
runSequential - if true, run in sequential execution mode
Returns:
the Path of the final clusters directory
Throws:
IOException
InterruptedException
ClassNotFoundException
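
How convergenceDelta and maxIterations might bound the iteration can be sketched with a small one-dimensional fuzzy c-means loop. This is an illustrative sketch only: it assumes convergence means that no center moved by more than convergenceDelta in an iteration, which may differ from Mahout's actual test, and FuzzyIterationSketch is a hypothetical class that does not read or write cluster directories.

```java
/** Hypothetical 1-D illustration of the iteration loop; not part of Mahout. */
final class FuzzyIterationSketch {

    /**
     * Updates the centers in place until no center moves by more than
     * convergenceDelta, or until maxIterations is reached.
     * Returns the number of iterations performed.
     */
    static int iterate(double[] points, double[] centers, double m,
                       double convergenceDelta, int maxIterations) {
        double exponent = 2.0 / (m - 1.0);
        for (int iter = 1; iter <= maxIterations; iter++) {
            double[] weightedSum = new double[centers.length];
            double[] weightTotal = new double[centers.length];
            for (double p : points) {
                for (int i = 0; i < centers.length; i++) {
                    // Each point contributes to every center, weighted by u^m.
                    double w = Math.pow(membership(p, centers, i, exponent), m);
                    weightedSum[i] += w * p;
                    weightTotal[i] += w;
                }
            }
            double maxShift = 0.0;
            for (int i = 0; i < centers.length; i++) {
                if (weightTotal[i] == 0.0) {
                    continue; // no point has any weight here; keep the old center
                }
                double updated = weightedSum[i] / weightTotal[i];
                maxShift = Math.max(maxShift, Math.abs(updated - centers[i]));
                centers[i] = updated;
            }
            if (maxShift <= convergenceDelta) {
                return iter; // assumed convergence criterion
            }
        }
        return maxIterations;
    }

    /** Membership of point p in cluster i, from absolute 1-D distances. */
    static double membership(double p, double[] centers, int i, double exponent) {
        double di = Math.abs(p - centers[i]);
        if (di == 0.0) {
            return 1.0;
        }
        double sum = 0.0;
        for (double c : centers) {
            double dk = Math.abs(p - c);
            if (dk == 0.0) {
                return 0.0; // the point sits exactly on another center
            }
            sum += Math.pow(di / dk, exponent);
        }
        return 1.0 / sum;
    }

    public static void main(String[] args) {
        double[] points = {0.0, 1.0, 9.0, 10.0};
        double[] centers = {0.0, 10.0};
        int iterations = iterate(points, centers, 2.0, 1e-6, 50);
        System.out.println(iterations + " iterations, centers: "
            + centers[0] + ", " + centers[1]);
    }
}
```

A smaller convergenceDelta generally costs more iterations, and maxIterations caps the work when the centers keep moving.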

clusterData

public static void clusterData(org.apache.hadoop.conf.Configuration conf,
                               org.apache.hadoop.fs.Path input,
                               org.apache.hadoop.fs.Path clustersIn,
                               org.apache.hadoop.fs.Path output,
                               double convergenceDelta,
                               float m,
                               boolean emitMostLikely,
                               double threshold,
                               boolean runSequential)
                        throws IOException,
                               ClassNotFoundException,
                               InterruptedException
Cluster the input points against the supplied clusters, using the supplied arguments.

Parameters:
conf - the Hadoop Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for input clusters
output - the directory pathname for output points
convergenceDelta - the convergence delta value
m - the fuzzification factor, see http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
emitMostLikely - if true, emit only the most likely cluster for each point
threshold - when emitMostLikely is false, emit every cluster whose pdf for the point is greater than this value
runSequential - if true, run in sequential execution mode
Throws:
IOException
ClassNotFoundException
InterruptedException
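
The interaction between emitMostLikely and threshold described above can be sketched as follows. This is a hedged illustration of the documented behavior, not Mahout's code: EmitSelectionSketch is a hypothetical class, the input array stands in for the per-cluster pdf values of one point, and the strict "greater than" comparison is an assumption based on the parameter description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of cluster emission rules; not part of Mahout. */
final class EmitSelectionSketch {

    /**
     * Returns the indexes of the clusters that would be emitted for one point,
     * given that point's pdf (membership) value for each cluster.
     */
    static List<Integer> clustersToEmit(double[] pdfs, boolean emitMostLikely,
                                        double threshold) {
        List<Integer> emitted = new ArrayList<>();
        if (pdfs.length == 0) {
            return emitted;
        }
        if (emitMostLikely) {
            // Emit only the single cluster with the highest pdf.
            int best = 0;
            for (int i = 1; i < pdfs.length; i++) {
                if (pdfs[i] > pdfs[best]) {
                    best = i;
                }
            }
            emitted.add(best);
        } else {
            // Emit every cluster whose pdf is greater than the threshold.
            for (int i = 0; i < pdfs.length; i++) {
                if (pdfs[i] > threshold) {
                    emitted.add(i);
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        double[] pdfs = {0.6, 0.3, 0.1};
        System.out.println(clustersToEmit(pdfs, true, 0.0));
        System.out.println(clustersToEmit(pdfs, false, 0.2));
    }
}
```

In this sketch the threshold is ignored whenever emitMostLikely is true, which matches the "(emitMostLikely = false)" qualifier in the original parameter description.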


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.