org.apache.mahout.vectorizer.collocations.llr
Class CollocDriver
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.mahout.common.AbstractJob
      org.apache.mahout.vectorizer.collocations.llr.CollocDriver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public final class CollocDriver
- extends AbstractJob
Driver for the LLR (log-likelihood ratio) collocation discovery MapReduce job.
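LLR collocation discovery ranks candidate n-grams by Dunning's log-likelihood ratio over a 2x2 contingency table of counts. The Java sketch below computes that ratio in the entropy form 2 * (H_row + H_col - H_matrix), where each H is the unnormalized entropy N log N minus the sum of k log k over the cells. It is an illustration, not Mahout's code. The class name `LlrSketch`, the helper `scoreBigram`, and the way its four cells are filled from n-gram, head, tail, and total counts are assumptions made for this example.

```java
// Sketch of Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.
// LlrSketch and scoreBigram are hypothetical names, not Mahout API.
final class LlrSketch {

    // x * ln(x), with the convention 0 * ln(0) = 0.
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy: N ln N - sum(k ln k), where N = sum(k).
    private static double entropy(long... counts) {
        long sum = 0;
        double sumXLogX = 0.0;
        for (long c : counts) {
            sumXLogX += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - sumXLogX;
    }

    /** LLR for the table [[k11, k12], [k21, k22]]; larger means stronger association. */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp at zero so floating-point round-off cannot yield a negative score.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /**
     * Assumed mapping from counts to cells for an n-gram split into head and tail:
     * k11 = head followed by tail, k12 = head without tail,
     * k21 = tail without head, k22 = n-grams containing neither.
     */
    static double scoreBigram(long ngramFreq, long headFreq, long tailFreq, long totalNGrams) {
        long k11 = ngramFreq;
        long k12 = headFreq - ngramFreq;
        long k21 = tailFreq - ngramFreq;
        long k22 = totalNGrams - (headFreq + tailFreq - ngramFreq);
        return logLikelihoodRatio(k11, k12, k21, k22);
    }

    public static void main(String[] args) {
        // Every cell equal: no association, so the score is (numerically) zero.
        System.out.println(logLikelihoodRatio(10, 10, 10, 10));
        // Perfectly associated toy table [[1, 0], [0, 1]]: the score is 4 ln 2, about 2.7726.
        System.out.println(scoreBigram(1, 1, 1, 2));
    }
}
```

A job like this would then drop any n-gram whose score falls below a threshold such as the `minLLRValue` parameter of `generateAllGrams`. The clamp matters because the three entropy terms nearly cancel for independent counts.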
Method Summary
Modifier and Type | Method | Description
static void | generateAllGrams(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, org.apache.hadoop.conf.Configuration baseConf, int maxNGramSize, int minSupport, float minLLRValue, int reduceTasks) | Generate all ngrams for the DictionaryVectorizer job.
static void | main(String[] args) |
int | run(String[] args) |
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SUBGRAM_OUTPUT_DIRECTORY
public static final String SUBGRAM_OUTPUT_DIRECTORY
- See Also:
- Constant Field Values
NGRAM_OUTPUT_DIRECTORY
public static final String NGRAM_OUTPUT_DIRECTORY
- See Also:
- Constant Field Values
EMIT_UNIGRAMS
public static final String EMIT_UNIGRAMS
- See Also:
- Constant Field Values
DEFAULT_EMIT_UNIGRAMS
public static final boolean DEFAULT_EMIT_UNIGRAMS
- See Also:
- Constant Field Values
CollocDriver
public CollocDriver()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Throws:
Exception
generateAllGrams
public static void generateAllGrams(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
org.apache.hadoop.conf.Configuration baseConf,
int maxNGramSize,
int minSupport,
float minLLRValue,
int reduceTasks)
throws IOException,
InterruptedException,
ClassNotFoundException
- Generate all ngrams for the DictionaryVectorizer job.
- Parameters:
input - input path containing tokenized documents
output - output path where the ngrams, including unigrams, are written
baseConf - job configuration
maxNGramSize - maximum ngram size to generate; the minimum allowed value is 2
minSupport - minimum support used to prune ngrams, including unigrams
minLLRValue - minimum LLR threshold used to prune ngrams
reduceTasks - number of reducers to use
- Throws:
IOException
InterruptedException
ClassNotFoundException
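The parameters of `generateAllGrams` outline the counting pass: contiguous n-grams up to `maxNGramSize` tokens are collected (unigrams included), and grams below `minSupport` are pruned before LLR scoring. The single-machine Java sketch below illustrates only that counting and support pruning; it is not the MapReduce implementation. The class `NGramSketch`, its methods, the space-joined gram keys, and the rule "keep grams whose count is at least `minSupport`" are assumptions for illustration, and the `emitUnigrams` flag merely stands in for the role suggested by the `EMIT_UNIGRAMS` field.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine sketch of n-gram counting plus minSupport pruning.
// All names are hypothetical; this is not Mahout's MapReduce code.
final class NGramSketch {

    /**
     * Counts every contiguous n-gram of 2..maxNGramSize tokens, plus unigrams
     * when emitUnigrams is true. Each gram is keyed by its space-joined tokens.
     */
    static Map<String, Integer> countNGrams(List<String> tokens, int maxNGramSize, boolean emitUnigrams) {
        // Mirrors the documented constraint that maxNGramSize is at least 2.
        if (maxNGramSize < 2) {
            throw new IllegalArgumentException("maxNGramSize must be at least 2");
        }
        Map<String, Integer> counts = new TreeMap<>();
        int minSize = emitUnigrams ? 1 : 2;
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the window from minSize tokens until it hits maxNGramSize or the end.
            for (int n = minSize; n <= maxNGramSize && start + n <= tokens.size(); n++) {
                String gram = String.join(" ", tokens.subList(start, start + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps only grams whose count is at least minSupport (assumed semantics). */
    static Map<String, Integer> prune(Map<String, Integer> counts, int minSupport) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minSupport) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("new", "york", "is", "in", "new", "york");
        Map<String, Integer> grams = prune(countNGrams(doc, 2, false), 2);
        System.out.println(grams); // prints {new york=2}
    }
}
```

With `maxNGramSize = 2` and `minSupport = 2`, the toy document "new york is in new york" keeps only `new york`, which occurs twice; the other three bigrams occur once and are pruned. A `TreeMap` is used only so the printed order is deterministic.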
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.