org.apache.mahout.clustering.canopy
Class CanopyDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.canopy.CanopyDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class CanopyDriver
extends AbstractJob


Field Summary
static String DEFAULT_CLUSTERED_POINTS_DIRECTORY
           
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Constructor Summary
CanopyDriver()
           
 
Method Summary
static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, double t3, double t4, int clusterFilter, boolean runSequential)
          Build a directory of Canopy clusters from the input vectors and other arguments.
static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, int clusterFilter, boolean runSequential)
          Convenience method for backward compatibility; omits the t3 and t4 parameters
static void main(String[] args)
           
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, boolean runClustering, double clusterClassificationThreshold, boolean runSequential)
          Convenience method for backward compatibility; omits the t3, t4, and clusterFilter parameters
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, double t3, double t4, int clusterFilter, boolean runClustering, double clusterClassificationThreshold, boolean runSequential)
          Build a directory of Canopy clusters from the input arguments and, if requested, cluster the input vectors using these clusters
static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, boolean runClustering, double clusterClassificationThreshold, boolean runSequential)
          Convenience method that creates a new Configuration(), then builds a directory of Canopy clusters from the input arguments and, if requested, clusters the input vectors using these clusters
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_CLUSTERED_POINTS_DIRECTORY

public static final String DEFAULT_CLUSTERED_POINTS_DIRECTORY
See Also:
Constant Field Values
Constructor Detail

CanopyDriver

public CanopyDriver()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double t1,
                       double t2,
                       double t3,
                       double t4,
                       int clusterFilter,
                       boolean runClustering,
                       double clusterClassificationThreshold,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
Build a directory of Canopy clusters from the input arguments and, if requested, cluster the input vectors using these clusters

Parameters:
conf - the Configuration
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
t3 - the reducer's double T1 distance threshold
t4 - the reducer's double T2 distance threshold
clusterFilter - the minimum canopy size output by the mappers
runClustering - cluster the input vectors if true
clusterClassificationThreshold - vectors whose pdf is below this value will not be clustered; the value should be between 0 and 1
runSequential - execute sequentially if true
Throws:
IOException
InterruptedException
ClassNotFoundException
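
The roles of the t1 and t2 thresholds can be illustrated with a minimal, self-contained sketch of a single canopy-generation pass. It is not Mahout's implementation: the class CanopySketch, its nested Canopy type, and the methods buildCanopies and euclidean are hypothetical names chosen for illustration, and plain double[] arrays with Euclidean distance stand in for Mahout vectors and the DistanceMeasure. The sketch assumes the conventional reading in which a point within T1 of a canopy center joins that canopy, and a point not within T2 of any center starts a new canopy (so T1 is normally larger than T2).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not Mahout API.
class CanopySketch {

    /** A canopy: the point that founded it plus every point loosely bound to it. */
    static final class Canopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
        }
    }

    /** Euclidean distance, standing in for a DistanceMeasure. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Single pass over the points. A point joins every canopy whose center is
     * closer than t1. If no center is closer than t2, the point founds a new canopy.
     */
    static List<Canopy> buildCanopies(List<double[]> points, double t1, double t2) {
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) {
            boolean stronglyBound = false;
            for (Canopy c : canopies) {
                double dist = euclidean(c.center, p);
                if (dist < t1) {
                    c.members.add(p);
                }
                stronglyBound |= dist < t2;
            }
            if (!stronglyBound) {
                Canopy c = new Canopy(p);
                c.members.add(p);
                canopies.add(c);
            }
        }
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {1, 0},
                new double[] {10, 0}, new double[] {11, 0});
        for (Canopy c : buildCanopies(points, 3.0, 2.0)) {
            System.out.println(Arrays.toString(c.center) + " members=" + c.members.size());
        }
    }
}
```

In this sketch, raising t2 produces fewer canopies, while raising t1 makes existing canopies overlap more.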

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double t1,
                       double t2,
                       boolean runClustering,
                       double clusterClassificationThreshold,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
Convenience method for backward compatibility; omits the t3, t4, and clusterFilter parameters

Throws:
IOException
InterruptedException
ClassNotFoundException

run

public static void run(org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double t1,
                       double t2,
                       boolean runClustering,
                       double clusterClassificationThreshold,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
Convenience method that creates a new Configuration(), then builds a directory of Canopy clusters from the input arguments and, if requested, clusters the input vectors using these clusters

Parameters:
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
runClustering - cluster the input vectors if true
clusterClassificationThreshold - vectors whose pdf is below this value will not be clustered; the value should be between 0 and 1
runSequential - execute sequentially if true
Throws:
IOException
InterruptedException
ClassNotFoundException

buildClusters

public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path input,
                                                      org.apache.hadoop.fs.Path output,
                                                      DistanceMeasure measure,
                                                      double t1,
                                                      double t2,
                                                      int clusterFilter,
                                                      boolean runSequential)
                                               throws IOException,
                                                      InterruptedException,
                                                      ClassNotFoundException
Convenience method for backward compatibility; omits the t3 and t4 parameters

Throws:
IOException
InterruptedException
ClassNotFoundException

buildClusters

public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path input,
                                                      org.apache.hadoop.fs.Path output,
                                                      DistanceMeasure measure,
                                                      double t1,
                                                      double t2,
                                                      double t3,
                                                      double t4,
                                                      int clusterFilter,
                                                      boolean runSequential)
                                               throws IOException,
                                                      InterruptedException,
                                                      ClassNotFoundException
Build a directory of Canopy clusters from the input vectors and other arguments. Runs sequential or MapReduce execution as requested

Parameters:
conf - the Configuration to use
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
t3 - the reducer's double T1 distance threshold
t4 - the reducer's double T2 distance threshold
clusterFilter - the int minimum size of canopies produced
runSequential - a boolean indicating whether to run the sequential (reference) algorithm
Returns:
the canopy output directory Path
Throws:
IOException
InterruptedException
ClassNotFoundException
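
How t3, t4, and clusterFilter relate to t1 and t2 can be sketched as a two-stage process, following the parameter descriptions above: each mapper builds canopies with t1 and t2 and outputs only canopies of at least clusterFilter points, and the reducer then builds canopies over the mappers' output using t3 and t4. The code below is a simplified, single-process sketch under those assumptions, not Mahout's implementation. The class TwoStageCanopySketch and its methods are hypothetical, points are one-dimensional doubles with absolute difference as the distance, and two details are assumptions of this sketch: that each mapper emits the mean of a canopy's members, and that the size comparison is inclusive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not Mahout API.
class TwoStageCanopySketch {

    /** A canopy over 1-D points, tracking enough state to compute its mean. */
    static final class Canopy {
        final double center; // the point that founded the canopy
        double sum;          // running sum of member values
        int size;            // number of members

        Canopy(double center) {
            this.center = center;
        }

        void observe(double p) {
            sum += p;
            size++;
        }

        double mean() {
            return sum / size;
        }
    }

    /** One canopy pass: join canopies closer than loose, found a new one if none is closer than tight. */
    static List<Canopy> canopies(List<Double> points, double loose, double tight) {
        List<Canopy> result = new ArrayList<>();
        for (double p : points) {
            boolean stronglyBound = false;
            for (Canopy c : result) {
                double dist = Math.abs(c.center - p);
                if (dist < loose) {
                    c.observe(p);
                }
                stronglyBound |= dist < tight;
            }
            if (!stronglyBound) {
                Canopy c = new Canopy(p);
                c.observe(p);
                result.add(c);
            }
        }
        return result;
    }

    /** Mapper stage: canopies over one input split, keeping those with at least clusterFilter members (assumed inclusive). */
    static List<Double> mapStage(List<Double> split, double t1, double t2, int clusterFilter) {
        List<Double> centers = new ArrayList<>();
        for (Canopy c : canopies(split, t1, t2)) {
            if (c.size >= clusterFilter) {
                centers.add(c.mean());
            }
        }
        return centers;
    }

    /** Reducer stage: canopies over the centers emitted by all mappers, using t3 and t4. */
    static List<Canopy> reduceStage(List<Double> mapperCenters, double t3, double t4) {
        return canopies(mapperCenters, t3, t4);
    }

    public static void main(String[] args) {
        List<Double> centers = new ArrayList<>();
        centers.addAll(mapStage(Arrays.asList(0.0, 1.0, 10.0, 11.0, 50.0), 3.0, 2.0, 2));
        centers.addAll(mapStage(Arrays.asList(0.5, 1.5, 10.5, 11.5), 3.0, 2.0, 2));
        for (Canopy c : reduceStage(centers, 3.0, 2.0)) {
            System.out.println("final canopy mean=" + c.mean() + " from " + c.size + " mapper centers");
        }
    }
}
```

In this sketch the isolated point 50.0 forms a canopy of size 1 in its split and is dropped by the filter before the reducer stage, which is the kind of noise suppression a minimum canopy size is meant to provide.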


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.