org.apache.mahout.utils
Class SplitInputJob
java.lang.Object
org.apache.mahout.utils.SplitInputJob
public final class SplitInputJob
extends Object
Method Summary
static void  run(org.apache.hadoop.conf.Configuration initialConf,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
int keepPct,
float randomSelectionPercent)
Run job to downsample, randomly permute and split data into test and training sets.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
run
public static void run(org.apache.hadoop.conf.Configuration initialConf,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
int keepPct,
float randomSelectionPercent)
throws IOException,
ClassNotFoundException,
InterruptedException
Run job to downsample, randomly permute and split data into test and
training sets. This job takes a SequenceFile as input and outputs two
SequenceFiles, test-r-00000 and training-r-00000, which contain the test and
training sets, respectively.
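The three stages named above (downsample, randomly permute, split) can be illustrated with a minimal in-memory sketch. This is not Mahout's implementation: the class SplitSketch, its method split, the Result holder and the seed parameter are all hypothetical, and the real job runs as a Hadoop MapReduce job over SequenceFiles. The sketch assumes both keepPct and randomSelectionPercent are percentages in [0, 100], keeps each record independently with probability keepPct / 100, and cuts the shuffled survivors at an exact fraction; the actual job may downsample and assign records differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical in-memory illustration of downsample / permute / split; not part of Mahout. */
public final class SplitSketch {

  /** Holds the two partitions produced by split(). */
  public static final class Result<T> {
    public final List<T> test;
    public final List<T> training;

    Result(List<T> test, List<T> training) {
      this.test = test;
      this.training = training;
    }
  }

  /**
   * Downsamples, shuffles and splits the input (assumed semantics, see lead-in).
   *
   * @param keepPct percentage (0-100) of input records to keep
   * @param randomSelectionPercent percentage (0-100) of kept records sent to the test set
   * @param seed hypothetical seed so that runs are repeatable
   */
  public static <T> Result<T> split(List<T> input, int keepPct,
                                    float randomSelectionPercent, long seed) {
    if (keepPct < 0 || keepPct > 100) {
      throw new IllegalArgumentException("keepPct must be in [0, 100]");
    }
    if (randomSelectionPercent < 0f || randomSelectionPercent > 100f) {
      throw new IllegalArgumentException("randomSelectionPercent must be in [0, 100]");
    }
    Random rnd = new Random(seed);

    // 1. Downsample: keep each record with probability keepPct / 100; the rest are discarded.
    List<T> kept = new ArrayList<>();
    for (T record : input) {
      if (rnd.nextInt(100) < keepPct) {
        kept.add(record);
      }
    }

    // 2. Randomly permute the surviving records.
    Collections.shuffle(kept, rnd);

    // 3. Split: the first randomSelectionPercent of the shuffled records become the test set,
    //    the remainder become the training set.
    int testSize = Math.round(kept.size() * randomSelectionPercent / 100f);
    List<T> test = new ArrayList<>(kept.subList(0, testSize));
    List<T> training = new ArrayList<>(kept.subList(testSize, kept.size()));
    return new Result<>(test, training);
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      data.add(i);
    }
    // Keep about half of the records, then send 20% of the survivors to the test set.
    Result<Integer> r = split(data, 50, 20f, 42L);
    System.out.println("test=" + r.test.size() + " training=" + r.training.size());
  }
}
```

Cutting the shuffled list gives an exact test fraction of the kept records; assigning each record independently (closer to what a streaming reducer could do) would only hit the fraction on average.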
Parameters:
initialConf - initial Hadoop configuration for the job
inputPath - path to the input data SequenceFile
outputPath - path for the output data SequenceFiles
keepPct - percentage of key-value pairs in the input to keep; the rest are discarded
randomSelectionPercent - percentage of key-value pairs to allocate to the test set; the remainder are allocated to the training set
Throws:
IOException
ClassNotFoundException
InterruptedException
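To make the interaction of keepPct and randomSelectionPercent concrete, the hypothetical helper below (ExpectedSplitSizes.expected is not a Mahout API) computes expected partition sizes. It assumes both parameters are percentages in [0, 100] and that randomSelectionPercent applies only to the key-value pairs that survive downsampling, as the parameter descriptions suggest; the actual counts produced by the job can differ from these expectations.

```java
/** Hypothetical helper: expected partition sizes under the assumptions stated above. */
public final class ExpectedSplitSizes {

  /**
   * Returns {expectedTest, expectedTraining} for n input key-value pairs.
   * Assumes randomSelectionPercent is applied to the kept pairs, not to all n.
   */
  public static double[] expected(long n, int keepPct, float randomSelectionPercent) {
    double kept = n * (keepPct / 100.0);                  // pairs surviving downsampling
    double test = kept * (randomSelectionPercent / 100.0); // share of survivors sent to test
    return new double[] {test, kept - test};               // remainder goes to training
  }

  public static void main(String[] args) {
    // 10,000 input pairs, keep 50%, send 20% of the survivors to the test set.
    double[] sizes = expected(10_000L, 50, 20f);
    System.out.println("test=" + Math.round(sizes[0]) + " training=" + Math.round(sizes[1]));
    // prints "test=1000 training=4000"
  }
}
```

Under these assumptions the test set holds about keepPct / 100 * randomSelectionPercent / 100 of the original input, here 0.5 * 0.2 = 10%.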
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.