org.apache.mahout.text.wikipedia
Class WikipediaDatasetCreatorDriver

java.lang.Object
  extended by org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver

public final class WikipediaDatasetCreatorDriver
extends Object

Create and run the Wikipedia Dataset Creator.


Method Summary
static void main(String[] args)
          Takes two arguments: (1) the input Path where the input documents live, and (2) the output Path where to write the classifier as a SequenceFile.
static void runJob(String input, String output, String catFile, boolean exactMatchOnly, Class<? extends org.apache.lucene.analysis.Analyzer> analyzerClass)
          Runs the dataset creation job on the given input, output, and category file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] args)
                 throws IOException,
                        InterruptedException
Takes in two arguments:
  1. The input Path where the input documents live
  2. The output Path where to write the classifier as a SequenceFile

Throws:
IOException
InterruptedException

runJob

public static void runJob(String input,
                          String output,
                          String catFile,
                          boolean exactMatchOnly,
                          Class<? extends org.apache.lucene.analysis.Analyzer> analyzerClass)
                   throws IOException,
                          InterruptedException,
                          ClassNotFoundException
Runs the dataset creation job on the given input, output, and category file.

Parameters:
input - the input pathname String
output - the output pathname String
catFile - the file containing the Wikipedia categories
exactMatchOnly - if true, a Wikipedia category must match exactly rather than merely contain the category string
analyzerClass - the Lucene Analyzer implementation to use for the job
Throws:
IOException
InterruptedException
ClassNotFoundException
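
The difference between exact and substring matching described for exactMatchOnly can be illustrated with a minimal, self-contained sketch. This is not Mahout's implementation: the class name CategoryMatchSketch and the method matches are hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption made only for the illustration.

```java
import java.util.Locale;

// Hypothetical illustration of the exactMatchOnly semantics; not part of Mahout.
final class CategoryMatchSketch {

  private CategoryMatchSketch() {
  }

  // Returns true if documentCategory is accepted for targetCategory.
  // Assumption: both strings are trimmed and compared case-insensitively.
  static boolean matches(String documentCategory, String targetCategory, boolean exactMatchOnly) {
    String doc = documentCategory.trim().toLowerCase(Locale.ROOT);
    String target = targetCategory.trim().toLowerCase(Locale.ROOT);
    // Exact mode requires equality; otherwise containing the target string is enough.
    return exactMatchOnly ? doc.equals(target) : doc.contains(target);
  }

  public static void main(String[] args) {
    System.out.println(matches("History of science", "science", false)); // true
    System.out.println(matches("History of science", "science", true));  // false
    System.out.println(matches("Science", "science", true));             // true
  }
}
```

With exactMatchOnly set to false, a target such as "science" also accepts broader or narrower categories that contain it, which enlarges the resulting dataset at the cost of precision.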


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.