java.lang.Object
  org.apache.mahout.text.wikipedia.WikipediaXmlSplitter
public final class WikipediaXmlSplitter
The Bayes example package provides helper classes for training the Naive Bayes classifier
on the Twenty Newsgroups data. See PrepareTwentyNewsgroups for details on running the trainer
and formatting the Twenty Newsgroups data for training.
The easiest way to prepare the data is to use the ant task in core/build.xml:
ant extract-20news-18828
This runs the arg line:
-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse -a ${analyzer} -c UTF-8
To run the Wikipedia examples (assumes you have built the Mahout job jar):
ant enwiki-files
bin/hadoop jar $MAHOUT_HOME/target/mahout-examples-0.x
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter
-d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles.xml
-o $MAHOUT_HOME/examples/work/wikipedia/chunks/ -c 64
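The command above writes the dump out as a set of chunk files; the `-c 64` option presumably sets the target chunk size. As a rough illustration of the general idea only, and not of how WikipediaXmlSplitter itself is implemented, the sketch below groups complete `<page>` elements into chunks of a bounded size without splitting a page across chunks. The class name `XmlChunkSketch`, the method `splitPages`, and the character-based size limit are all hypothetical, and the sketch assumes each `<page>` and `</page>` tag starts or ends its own line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: size-bounded chunking of page elements in an XML dump. */
public final class XmlChunkSketch {

  private XmlChunkSketch() { }

  /**
   * Groups complete page blocks into chunks. A chunk is closed as soon as it
   * holds at least maxChunkChars characters; a page is never split.
   */
  public static List<String> splitPages(BufferedReader in, int maxChunkChars)
      throws IOException {
    List<String> chunks = new ArrayList<>();
    StringBuilder chunk = new StringBuilder();
    StringBuilder page = new StringBuilder();
    boolean inPage = false;
    String line;
    while ((line = in.readLine()) != null) {
      String trimmed = line.trim();
      if (trimmed.startsWith("<page>")) {
        inPage = true;
      }
      if (inPage) {
        // Lines outside a page (dump header and footer) are skipped.
        page.append(line).append('\n');
      }
      if (inPage && trimmed.endsWith("</page>")) {
        inPage = false;
        chunk.append(page);
        page.setLength(0);
        if (chunk.length() >= maxChunkChars) {
          chunks.add(chunk.toString());
          chunk.setLength(0);
        }
      }
    }
    if (chunk.length() > 0) {
      // Flush the final, possibly undersized, chunk.
      chunks.add(chunk.toString());
    }
    return chunks;
  }

  public static void main(String[] args) throws IOException {
    String xml = "<mediawiki>\n<page>\n<title>A</title>\n</page>\n"
        + "<page>\n<title>B</title>\n</page>\n</mediawiki>\n";
    List<String> chunks = splitPages(new BufferedReader(new StringReader(xml)), 10);
    System.out.println(chunks.size() + " chunks");
  }
}
```

A real splitter would stream each chunk to a file instead of holding strings in memory, and would measure size in bytes rather than characters.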
Method Summary | |
---|---|
static void | main(String[] args) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static void main(String[] args) throws IOException
Throws: IOException