org.apache.mahout.math.hadoop.stats
Class BasicStats

java.lang.Object
  extended by org.apache.mahout.math.hadoop.stats.BasicStats

public final class BasicStats
extends Object

Methods for calculating basic statistics (mean, variance, standard deviation, etc.) with MapReduce


Method Summary
static double stdDev(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration baseConf)
          Calculate the standard deviation
static double stdDevForGivenMean(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, double mean, org.apache.hadoop.conf.Configuration baseConf)
          Calculate the standard deviation given a predefined mean
static double variance(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration baseConf)
          Calculate the variance of the values stored in the input file
static double varianceForGivenMean(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, double mean, org.apache.hadoop.conf.Configuration baseConf)
          Calculate the variance, relative to a predefined mean, of the values stored in the input file
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

variance

public static double variance(org.apache.hadoop.fs.Path input,
                              org.apache.hadoop.fs.Path output,
                              org.apache.hadoop.conf.Configuration baseConf)
                       throws IOException,
                              InterruptedException,
                              ClassNotFoundException
Calculate the variance of the values stored in the input file

Parameters:
input - The input file containing the key and the count
output - The output to store the intermediate values
baseConf - The base configuration
Returns:
The variance (based on sample estimation)
Throws:
IOException
InterruptedException
ClassNotFoundException
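
The following is a minimal plain-Java sketch of a sample variance computed from (key, count) records. It is not Mahout's implementation and does not use Hadoop. It assumes that each key is an observed integer value and that its count is the number of times that value occurred; the class name `SampleVarianceSketch` and the method `sampleVariance` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; this is not Mahout's implementation. */
final class SampleVarianceSketch {

  /**
   * Sample variance of a data set given as value -> occurrence count.
   * Assumption: each record's key is an observed value and its count
   * says how often that value occurred.
   */
  static double sampleVariance(Map<Integer, Long> counts) {
    double sum = 0.0;          // sum of value * count
    double sumOfSquares = 0.0; // sum of value^2 * count
    double total = 0.0;        // sum of counts
    for (Map.Entry<Integer, Long> e : counts.entrySet()) {
      double value = e.getKey();
      double count = e.getValue();
      sum += value * count;
      sumOfSquares += value * value * count;
      total += count;
    }
    if (total < 2.0) {
      throw new IllegalArgumentException("need at least two observations");
    }
    double mean = sum / total;
    // Sample estimate: divide by (n - 1) rather than n.
    return (sumOfSquares - total * mean * mean) / (total - 1.0);
  }

  public static void main(String[] args) {
    // Represents the observations 2, 2, 2, 4.
    Map<Integer, Long> counts = new LinkedHashMap<>();
    counts.put(2, 3L);
    counts.put(4, 1L);
    System.out.println(sampleVariance(counts));
  }
}
```

Only three running totals are needed, which is why this kind of calculation suits a single aggregation pass.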

varianceForGivenMean

public static double varianceForGivenMean(org.apache.hadoop.fs.Path input,
                                          org.apache.hadoop.fs.Path output,
                                          double mean,
                                          org.apache.hadoop.conf.Configuration baseConf)
                                   throws IOException,
                                          InterruptedException,
                                          ClassNotFoundException
Calculate the variance, relative to a predefined mean, of the values stored in the input file

Parameters:
input - The input file containing the key and the count
output - The output to store the intermediate values
mean - The mean based on which to compute the variance
baseConf - The base configuration
Returns:
The variance (based on sample estimation)
Throws:
IOException
InterruptedException
ClassNotFoundException
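
As a rough illustration of what supplying the mean changes, the plain-Java sketch below measures squared deviations from a caller-provided mean instead of one estimated from the data. It is not Mahout's implementation. The (value, count) reading of the input, the division by n - 1, and the names `GivenMeanVarianceSketch` and `varianceAroundMean` are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; this is not Mahout's implementation. */
final class GivenMeanVarianceSketch {

  /**
   * Variance of value -> count data around a caller-supplied mean.
   * Assumption: the divisor is (n - 1), matching a sample estimate.
   */
  static double varianceAroundMean(Map<Integer, Long> counts, double mean) {
    double squaredDeviations = 0.0; // sum of count * (value - mean)^2
    double total = 0.0;             // sum of counts
    for (Map.Entry<Integer, Long> e : counts.entrySet()) {
      double diff = e.getKey() - mean;
      squaredDeviations += diff * diff * e.getValue();
      total += e.getValue();
    }
    if (total < 2.0) {
      throw new IllegalArgumentException("need at least two observations");
    }
    return squaredDeviations / (total - 1.0);
  }

  public static void main(String[] args) {
    // Represents the observations 2, 2, 2, 4, whose own mean is 2.5.
    Map<Integer, Long> counts = new LinkedHashMap<>();
    counts.put(2, 3L);
    counts.put(4, 1L);
    System.out.println(varianceAroundMean(counts, 2.5)); // mean taken from the data
    System.out.println(varianceAroundMean(counts, 3.0)); // externally supplied mean
  }
}
```

In this sketch, a supplied mean that differs from the data's own mean always yields a larger result, because the sum of squared deviations is smallest around the data's own mean.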

stdDev

public static double stdDev(org.apache.hadoop.fs.Path input,
                            org.apache.hadoop.fs.Path output,
                            org.apache.hadoop.conf.Configuration baseConf)
                     throws IOException,
                            InterruptedException,
                            ClassNotFoundException
Calculate the standard deviation

Parameters:
input - The input file containing the key and the count
output - The output file to write the counting results to
baseConf - The base configuration
Returns:
The standard deviation
Throws:
IOException
InterruptedException
ClassNotFoundException
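
The standard deviation is the square root of the variance. The plain-Java sketch below shows one way the required totals could be accumulated per data partition and then merged, which is the general shape of a map/reduce aggregation. It is not Mahout's implementation; the `Totals` holder and all of its member names are hypothetical, and the (value, count) reading of the input is an assumption.

```java
/** Illustrative sketch only; this is not Mahout's implementation. */
final class StdDevTotalsSketch {

  /** Hypothetical holder for running totals over (value, count) records. */
  static final class Totals {
    double sum;          // sum of value * count
    double sumOfSquares; // sum of value^2 * count
    double count;        // sum of counts

    /** Adds one record: a value and how many times it occurred. */
    void add(double value, double occurrences) {
      sum += value * occurrences;
      sumOfSquares += value * value * occurrences;
      count += occurrences;
    }

    /** Combines totals from another partition into this one. */
    void merge(Totals other) {
      sum += other.sum;
      sumOfSquares += other.sumOfSquares;
      count += other.count;
    }

    /** Sample standard deviation: square root of the (n - 1) variance. */
    double stdDev() {
      if (count < 2.0) {
        throw new IllegalStateException("need at least two observations");
      }
      double mean = sum / count;
      double variance = (sumOfSquares - count * mean * mean) / (count - 1.0);
      // Guard against a tiny negative value caused by rounding.
      return Math.sqrt(Math.max(variance, 0.0));
    }
  }

  public static void main(String[] args) {
    Totals partitionA = new Totals();
    partitionA.add(2, 3); // the value 2 seen three times
    Totals partitionB = new Totals();
    partitionB.add(4, 1); // the value 4 seen once
    partitionA.merge(partitionB);
    System.out.println(partitionA.stdDev());
  }
}
```

Because the totals are plain sums, they can be merged in any order.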

stdDevForGivenMean

public static double stdDevForGivenMean(org.apache.hadoop.fs.Path input,
                                        org.apache.hadoop.fs.Path output,
                                        double mean,
                                        org.apache.hadoop.conf.Configuration baseConf)
                                 throws IOException,
                                        InterruptedException,
                                        ClassNotFoundException
Calculate the standard deviation given a predefined mean

Parameters:
input - The input file containing the key and the count
output - The output file to write the counting results to
mean - The mean based on which to compute the standard deviation
baseConf - The base configuration
Returns:
The standard deviation
Throws:
IOException
InterruptedException
ClassNotFoundException


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.