Package org.apache.mahout.clustering.lda.cvb

Class Summary
CachingCVB0Mapper Runs ensemble learning by loading the ModelTrainer with two TopicModel instances: one from the previous iteration, the other empty.
CachingCVB0PerplexityMapper  
CVB0DocInferenceMapper  
CVB0Driver See CachingCVB0Mapper for more details on scalability and room for improvement.
CVB0Driver.DualDoubleSumReducer Sums keys and values independently.
CVB0TopicTermVectorNormalizerMapper Performs L1 normalization of input vectors.
InMemoryCollapsedVariationalBayes0 Runs the same algorithm as CVB0Driver, but sequentially, in memory.
ModelTrainer Multithreaded LDA model trainer, which primarily operates by running a "map/reduce" operation entirely in local memory (i.e., not as a Hadoop job): the "map" operation takes the "read-only" TopicModel and uses it to iteratively learn the p(topic|term, doc) distribution for documents. This can be done in parallel across many documents, because the "read-only" model is never modified during the pass.
TopicModel Thin wrapper around a Matrix of counts of occurrences of (topic, term) pairs.
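
Example: the L1 normalization named in the CVB0TopicTermVectorNormalizerMapper summary divides each entry of a vector by the sum of the absolute values of all entries, so that a row of (topic, term) counts becomes a probability distribution. The following is a minimal sketch of that operation on a plain double[]. It is not Mahout's implementation and does not use the Mahout Vector API; the class name L1NormalizeSketch, the method l1Normalize, and the handling of an all-zero vector are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of org.apache.mahout.clustering.lda.cvb.
public class L1NormalizeSketch {

    /**
     * Returns a copy of v scaled so that the absolute values of its entries sum to 1.
     * Assumption: an all-zero input is returned as all zeros rather than raising an error.
     */
    public static double[] l1Normalize(double[] v) {
        // The L1 norm is the sum of absolute values.
        double norm = 0.0;
        for (double x : v) {
            norm += Math.abs(x);
        }

        double[] out = new double[v.length];
        if (norm == 0.0) {
            return out; // nothing to scale
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up counts of three terms under one topic.
        double[] counts = {2.0, 6.0, 2.0};
        System.out.println(Arrays.toString(l1Normalize(counts)));
    }
}
```

For non-negative count vectors, the normalized entries can be read as p(term | topic) for the corresponding topic row.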
 

Enum Summary
CachingCVB0PerplexityMapper.Counters Hadoop counters for CachingCVB0PerplexityMapper, to aid in debugging.
 



Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.