Package org.apache.mahout.text.wikipedia

Class Summary
WikipediaAnalyzer  
WikipediaDatasetCreatorDriver Create and run the Wikipedia Dataset Creator.
WikipediaDatasetCreatorMapper Maps over the Wikipedia XML format and outputs every document that has a category listed in the input category file.
WikipediaDatasetCreatorReducer Can also be used as a local Combiner
WikipediaMapper Maps over the Wikipedia XML format and outputs every document that has a category listed in the input category file.
WikipediaXmlSplitter Splits a Wikipedia XML dump into smaller chunks.
XmlInputFormat Reads records that are delimited by a specific begin/end tag.
XmlInputFormat.XmlRecordReader Reads through a given XML document and outputs XML blocks as records, each delimited by the configured start tag and end tag.
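
The tag-delimited record reading described for XmlInputFormat can be illustrated with a small standalone sketch. This is not Mahout's implementation and does not use Hadoop: the class name TagDelimitedRecords and the method extractRecords are hypothetical, and the sketch works on an in-memory string rather than an input split. It only shows the general idea of emitting each block between a start tag and an end tag as one record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of org.apache.mahout.text.wikipedia.
class TagDelimitedRecords {

    // Returns every block that begins with startTag and ends with endTag,
    // tags included, in document order. An unterminated trailing block is dropped.
    static List<String> extractRecords(String text, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(startTag, from);
            if (start < 0) {
                break; // no further start tag
            }
            int end = text.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // start tag without a matching end tag
            }
            int stop = end + endTag.length();
            records.add(text.substring(start, stop));
            from = stop; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<mediawiki><page><title>A</title></page>"
                + "<page><title>B</title></page></mediawiki>";
        for (String record : extractRecords(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

With "<page>" and "</page>" as the delimiters, each Wikipedia page element becomes one record, which is the kind of input a mapper over the dump would then process one document at a time.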

Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.