org.apache.mahout.math.hadoop.decomposer
Class EigenVerificationJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class EigenVerificationJob
extends AbstractJob

Takes the output of an eigendecomposition (specified as a Path location) and verifies its correctness in the following sense: given a vector e and a matrix m, let e' = m.timesSquared(e); the error w.r.t. eigenvector-ness is the cosine of the angle between e and e':

   error(e,e') = e.dot(e') / (e.norm(2)*e'.norm(2))
 
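A minimal standalone sketch of the check above, written with plain Java double arrays instead of Mahout's Vector and Matrix types. It assumes that timesSquared(e) computes (m^T m) e; the class EigenCosineCheck and its helper methods are hypothetical names used only for illustration, not part of the Mahout API.

```java
/** Hypothetical sketch of the eigenvector-ness check using plain arrays (not Mahout types). */
public final class EigenCosineCheck {

  // Computes (M^T M) v for a dense row-major matrix m and a vector v (assumed meaning of timesSquared).
  static double[] timesSquared(double[][] m, double[] v) {
    int cols = v.length;
    double[] result = new double[cols];
    for (double[] row : m) {
      // One entry of M v: the dot product of this row with v.
      double rowDot = 0.0;
      for (int j = 0; j < cols; j++) {
        rowDot += row[j] * v[j];
      }
      // Accumulating row * (row . v) over all rows yields M^T (M v).
      for (int j = 0; j < cols; j++) {
        result[j] += row[j] * rowDot;
      }
    }
    return result;
  }

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  static double norm2(double[] a) {
    return Math.sqrt(dot(a, a));
  }

  // Cosine of the angle between e and e' = timesSquared(m, e); 1.0 means e is an exact eigenvector.
  // Returns NaN if e' is the zero vector.
  static double cosAngle(double[][] m, double[] e) {
    double[] ePrime = timesSquared(m, e);
    return dot(e, ePrime) / (norm2(e) * norm2(ePrime));
  }

  public static void main(String[] args) {
    double[][] m = {{2, 0}, {0, 1}};  // m^T m = diag(4, 1)
    System.out.println(cosAngle(m, new double[] {1, 0}));  // an eigenvector of m^T m
    System.out.println(cosAngle(m, new double[] {1, 1}));  // not an eigenvector
  }
}
```

For this diagonal example the first line prints 1.0 and the second roughly 0.857 (that is, 5 / sqrt(34)), showing how the cosine drops as e moves away from an eigenvector.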

A set of eigenvectors should also be very close to mutually orthogonal, so this job computes all pairwise inner products between the eigenvectors and checks that the resulting matrix is close to the identity matrix.
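The orthogonality check can be sketched in plain Java as follows: build the matrix of pairwise inner products (the Gram matrix) and report its largest entrywise deviation from the identity. This is an illustrative sketch with plain arrays; the class OrthogonalityCheck and its methods are hypothetical and do not reflect how the job distributes this computation.

```java
/** Hypothetical sketch: compare the inner-product matrix of a set of eigenvectors with the identity. */
public final class OrthogonalityCheck {

  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Returns G with G[i][j] = eigens[i] . eigens[j].
  static double[][] gram(double[][] eigens) {
    int k = eigens.length;
    double[][] g = new double[k][k];
    for (int i = 0; i < k; i++) {
      for (int j = i; j < k; j++) {
        g[i][j] = dot(eigens[i], eigens[j]);
        g[j][i] = g[i][j];  // inner products are symmetric, so compute each pair once
      }
    }
    return g;
  }

  // Largest absolute entry of G - I; 0.0 means the vectors are exactly orthonormal.
  static double maxDeviationFromIdentity(double[][] g) {
    double max = 0.0;
    for (int i = 0; i < g.length; i++) {
      for (int j = 0; j < g.length; j++) {
        double target = (i == j) ? 1.0 : 0.0;
        max = Math.max(max, Math.abs(g[i][j] - target));
      }
    }
    return max;
  }

  public static void main(String[] args) {
    double s = Math.sqrt(0.5);
    double[][] eigens = {{s, s}, {s, -s}};  // an orthonormal pair
    System.out.println(maxDeviationFromIdentity(gram(eigens)));
  }
}
```

For the orthonormal pair in main, the printed deviation is only floating-point rounding noise (on the order of 1e-16); a threshold on this value decides whether the set counts as "close to the identity".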

Parameters used in the cleanup (other than the input/output path options) include --minEigenvalue, which specifies the value below which eigenvector/eigenvalue pairs are discarded, and --maxError, which specifies the maximum error (as defined above) tolerated in an eigenvector.
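The cleanup rule can be illustrated with a small filter. This is a sketch under stated assumptions: the class EigenPruner and its EigenCandidate holder are hypothetical, and the error is assumed here to be measured as 1 - cosAngle so that a perfect eigenvector has error 0. The description above defines the cosine itself, so the exact comparison used by the job may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of pruning eigenvector/eigenvalue pairs by --minEigenvalue and --maxError. */
public final class EigenPruner {

  /** Hypothetical holder for one eigenvector's index, eigenvalue, and cosine score. */
  static final class EigenCandidate {
    final int index;
    final double eigenvalue;
    final double cosAngle;

    EigenCandidate(int index, double eigenvalue, double cosAngle) {
      this.index = index;
      this.eigenvalue = eigenvalue;
      this.cosAngle = cosAngle;
    }
  }

  // Keeps pairs whose eigenvalue is not below minEigenvalue and whose error
  // (assumed here to be 1 - cosAngle) is at most maxError.
  static List<EigenCandidate> prune(List<EigenCandidate> candidates, double maxError, double minEigenvalue) {
    List<EigenCandidate> kept = new ArrayList<>();
    for (EigenCandidate c : candidates) {
      boolean bigEnough = c.eigenvalue >= minEigenvalue;
      boolean accurateEnough = (1.0 - c.cosAngle) <= maxError;
      if (bigEnough && accurateEnough) {
        kept.add(c);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<EigenCandidate> candidates = List.of(
        new EigenCandidate(0, 12.5, 0.9999),
        new EigenCandidate(1, 0.001, 0.9999),  // eigenvalue below the minimum
        new EigenCandidate(2, 7.0, 0.80));     // too far from an eigenvector
    for (EigenCandidate c : prune(candidates, 0.05, 0.01)) {
      System.out.println("kept eigenvector " + c.index);
    }
  }
}
```

With maxError = 0.05 and minEigenvalue = 0.01, only the first candidate survives, so main prints "kept eigenvector 0".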

If all the eigenvectors fit in memory, --inMemory speeds up this task by loading them all into memory.


Field Summary
static String CLEAN_EIGENVECTORS
           
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Constructor Summary
EigenVerificationJob()
           
 
Method Summary
 org.apache.hadoop.fs.Path getCleanedEigensPath()
           
static void main(String[] args)
           
 int run(org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path output, org.apache.hadoop.fs.Path tempOut, double maxError, double minEigenValue, boolean inMemory, org.apache.hadoop.conf.Configuration conf)
          Run the job with the given arguments
 int run(String[] args)
           
 void runJob(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path eigenInput, org.apache.hadoop.fs.Path corpusInput, org.apache.hadoop.fs.Path output, boolean inMemory, double maxError, int maxEigens)
          Programmatic invocation of run()
 void setEigensToVerify(VectorIterable eigens)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CLEAN_EIGENVECTORS

public static final String CLEAN_EIGENVECTORS
See Also:
Constant Field Values
Constructor Detail

EigenVerificationJob

public EigenVerificationJob()
Method Detail

setEigensToVerify

public void setEigensToVerify(VectorIterable eigens)

run

public int run(String[] args)
        throws Exception
Throws:
Exception

run

public int run(org.apache.hadoop.fs.Path corpusInput,
               org.apache.hadoop.fs.Path eigenInput,
               org.apache.hadoop.fs.Path output,
               org.apache.hadoop.fs.Path tempOut,
               double maxError,
               double minEigenValue,
               boolean inMemory,
               org.apache.hadoop.conf.Configuration conf)
        throws IOException
Run the job with the given arguments

Parameters:
corpusInput - the corpus input Path
eigenInput - the eigenvector input Path
output - the output Path
tempOut - temporary output Path
maxError - the maximum error (as defined in the class description) tolerated in an eigenvector
minEigenValue - the eigenvalue below which eigenvector/eigenvalue pairs are discarded
inMemory - whether to hold all the eigenvectors in memory, which speeds up the job when they fit
conf - the Configuration to use, or null if a default is acceptable (this spares calling classes from referencing Configuration unless they need to)
Throws:
IOException

getCleanedEigensPath

public org.apache.hadoop.fs.Path getCleanedEigensPath()

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

runJob

public void runJob(org.apache.hadoop.conf.Configuration conf,
                   org.apache.hadoop.fs.Path eigenInput,
                   org.apache.hadoop.fs.Path corpusInput,
                   org.apache.hadoop.fs.Path output,
                   boolean inMemory,
                   double maxError,
                   int maxEigens)
            throws IOException
Programmatic invocation of run()

Parameters:
eigenInput - the output of LanczosSolver
corpusInput - the input of LanczosSolver
Throws:
IOException


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.