org.apache.mahout.text
Class MailArchivesClusteringAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.util.StopwordAnalyzerBase
          extended by org.apache.mahout.text.MailArchivesClusteringAnalyzer
All Implemented Interfaces:
Closeable

public final class MailArchivesClusteringAnalyzer
extends org.apache.lucene.analysis.util.StopwordAnalyzerBase

Custom Lucene Analyzer designed for aggressive feature reduction when clustering the ASF Mail Archives. It uses an extended set of stop words, excludes non-alphanumeric tokens, and applies Porter stemming.
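The three reduction steps named above can be sketched without Lucene. The plain-JDK example below is an illustration only, not this class's implementation. It splits on whitespace instead of using a Lucene tokenizer. It assumes a token must match `^[a-z][a-z0-9_]+$` to count as alphanumeric. It also applies only Step 1a of the Porter algorithm (plural stripping) rather than the full stemmer. The class name `FeatureReductionSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

final class FeatureReductionSketch {

  // Assumed token shape: a lowercase letter followed by one or more letters, digits
  // or underscores. Illustrative choice, not necessarily the analyzer's exact rule.
  private static final Pattern ALPHA_NUMERIC = Pattern.compile("^[a-z][a-z0-9_]+$");

  private FeatureReductionSketch() {}

  /** Whitespace-tokenizes, lowercases, filters by shape, removes stop words, strips plurals. */
  static List<String> reduce(String text, Set<String> stopWords) {
    List<String> out = new ArrayList<>();
    for (String raw : text.split("\\s+")) {
      String token = raw.toLowerCase(Locale.ROOT);
      if (!ALPHA_NUMERIC.matcher(token).matches()) {
        // drop tokens containing punctuation, tokens starting with a digit,
        // single characters, and empty strings
        continue;
      }
      if (stopWords.contains(token)) {
        continue; // drop stop words before stemming
      }
      out.add(porterStep1a(token));
    }
    return out;
  }

  /** Step 1a of the Porter algorithm only: SSES -> SS, IES -> I, SS -> SS, S -> "". */
  static String porterStep1a(String w) {
    if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
    if (w.endsWith("ies")) return w.substring(0, w.length() - 2);
    if (w.endsWith("ss")) return w;
    if (w.endsWith("s")) return w.substring(0, w.length() - 1);
    return w;
  }
}
```

For example, with the stop set {"hi", "the"}, the input `Hi the THREADS 3d caresses -- x ponies commits` reduces to `[thread, caress, poni, commit]`. Here `3d`, `--` and `x` fail the token pattern, and `poni` shows that stems need not be dictionary words.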


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.Analyzer.GlobalReuseStrategy, org.apache.lucene.analysis.Analyzer.PerFieldReuseStrategy, org.apache.lucene.analysis.Analyzer.ReuseStrategy, org.apache.lucene.analysis.Analyzer.TokenStreamComponents
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
matchVersion, stopwords
 
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
 
Constructor Summary
MailArchivesClusteringAnalyzer()
MailArchivesClusteringAnalyzer(org.apache.lucene.analysis.util.CharArraySet stopSet)
 
Method Summary
protected  org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName, Reader reader)
 
Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, tokenStream, tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MailArchivesClusteringAnalyzer

public MailArchivesClusteringAnalyzer()

MailArchivesClusteringAnalyzer

public MailArchivesClusteringAnalyzer(org.apache.lucene.analysis.util.CharArraySet stopSet)
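The second constructor takes a caller-supplied `CharArraySet` in place of the analyzer's own stop words. The plain-JDK sketch below shows one way to prepare such a list by reading one word per line from a `Reader` into a lowercased set. The file format (one word per line, blank lines and `#` comment lines ignored) and the class name `StopListLoader` are assumptions for illustration. The inherited `loadStopwordSet` methods may expect a different format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class StopListLoader {

  private StopListLoader() {}

  /** Reads one stop word per line; skips blank lines and lines starting with '#'. */
  static Set<String> load(Reader source) throws IOException {
    Set<String> words = new LinkedHashSet<>();
    try (BufferedReader in = new BufferedReader(source)) {
      String line;
      while ((line = in.readLine()) != null) {
        String word = line.trim();
        if (word.isEmpty() || word.startsWith("#")) {
          continue; // assumed format: blank lines and '#' comments carry no words
        }
        // Lowercase so lookups match an analyzer that lowercases tokens first.
        words.add(word.toLowerCase(Locale.ROOT));
      }
    }
    return words;
  }
}
```

With the Lucene 4.x API this page appears to target, the resulting set could then be wrapped as `new CharArraySet(matchVersion, words, true)` and passed to the second constructor. Here `matchVersion` stands for whatever `Version` constant matches the Lucene release bundled with your Mahout version.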

Method Detail

createComponents

protected org.apache.lucene.analysis.Analyzer.TokenStreamComponents createComponents(String fieldName,
                                                                                     Reader reader)
Specified by:
createComponents in class org.apache.lucene.analysis.Analyzer
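`createComponents` is where an analyzer assembles its pipeline for a field. A tokenizer reads from the supplied `Reader`, a chain of filters wraps it, and the two are returned together as `TokenStreamComponents`, which holds both the source tokenizer and the final filter. The plain-JDK sketch below mimics that decorator shape with hypothetical classes (`TokenSource`, `WhitespaceSource`, `LowerCaseStage`, `StopStage`, `ChainDemo`). They are not Lucene types. Lucene's real streams expose tokens through `incrementToken()` and attributes rather than returning strings.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pull-based token stream: next() returns null when exhausted. */
interface TokenSource {
  String next() throws IOException;
}

/** Source stage: splits the Reader's characters on whitespace. */
final class WhitespaceSource implements TokenSource {
  private final Reader reader;

  WhitespaceSource(Reader reader) {
    this.reader = reader;
  }

  @Override
  public String next() throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = reader.read()) != -1) {
      if (Character.isWhitespace(c)) {
        if (sb.length() > 0) {
          return sb.toString(); // whitespace ends the current token
        }
      } else {
        sb.append((char) c);
      }
    }
    return sb.length() > 0 ? sb.toString() : null; // last token, or end of input
  }
}

/** Filter stage: lowercases each token pulled from its input. */
final class LowerCaseStage implements TokenSource {
  private final TokenSource input;

  LowerCaseStage(TokenSource input) {
    this.input = input;
  }

  @Override
  public String next() throws IOException {
    String t = input.next();
    return t == null ? null : t.toLowerCase(Locale.ROOT);
  }
}

/** Filter stage: skips tokens found in the stop set. */
final class StopStage implements TokenSource {
  private final TokenSource input;
  private final Set<String> stopWords;

  StopStage(TokenSource input, Set<String> stopWords) {
    this.input = input;
    this.stopWords = stopWords;
  }

  @Override
  public String next() throws IOException {
    String t;
    while ((t = input.next()) != null && stopWords.contains(t)) {
      // keep pulling until a non-stop word or the end of input
    }
    return t;
  }
}

final class ChainDemo {
  private ChainDemo() {}

  /** Mirrors the shape of createComponents: build a source, wrap it in filters, drain the sink. */
  static List<String> analyze(Reader reader, Set<String> stopWords) throws IOException {
    TokenSource source = new WhitespaceSource(reader);
    TokenSource sink = new StopStage(new LowerCaseStage(source), stopWords);
    List<String> tokens = new ArrayList<>();
    for (String t = sink.next(); t != null; t = sink.next()) {
      tokens.add(t);
    }
    return tokens;
  }
}
```

Each stage holds a reference only to its input and pulls tokens on demand, so no intermediate token list is built between stages. Stop-word matching works on lowercase text here only because `LowerCaseStage` sits before `StopStage`, so the order in which filters wrap each other matters.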


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.