org.apache.mahout.utils.vectors.lucene
Class LuceneIterator

java.lang.Object
  extended by com.google.common.collect.UnmodifiableIterator<T>
      extended by com.google.common.collect.AbstractIterator<Vector>
          extended by org.apache.mahout.utils.vectors.lucene.AbstractLuceneIterator
              extended by org.apache.mahout.utils.vectors.lucene.LuceneIterator
All Implemented Interfaces:
Iterator<Vector>

public class LuceneIterator
extends AbstractLuceneIterator

An Iterator over Vectors that are built from a Lucene index. The field used to create the Vectors must currently have term vectors stored for it.
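The class hierarchy above routes iteration through computeNext() and endOfData(). As a rough illustration only, the following sketch reimplements that lookahead contract with plain JDK types. LookaheadIterator, vectorsOf and the use of double[] in place of Vector are hypothetical stand-ins, and skipping null entries is an assumption about how documents without term vectors might be handled, not Mahout's actual code.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ComputeNextSketch {

    /** Hypothetical miniature of the computeNext()/endOfData() contract. */
    abstract static class LookaheadIterator<T> implements Iterator<T> {
        private T next;
        private boolean ready;
        private boolean done;

        /** Returns the next element, or the result of endOfData() when exhausted. */
        protected abstract T computeNext();

        /** Marks the iterator as exhausted; the returned null is ignored. */
        protected final T endOfData() {
            done = true;
            return null;
        }

        @Override
        public final boolean hasNext() {
            if (done) {
                return false;
            }
            if (!ready) {
                next = computeNext();
                ready = !done;
            }
            return ready;
        }

        @Override
        public final T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            ready = false;
            T result = next;
            next = null;
            return result;
        }
        // remove() is not overridden: Iterator's default throws UnsupportedOperationException.
    }

    /**
     * Iterates over per-document term vectors, skipping documents whose entry is
     * null (assumed here to model a document without a stored term vector).
     */
    static Iterator<double[]> vectorsOf(final List<double[]> termVectors) {
        return new LookaheadIterator<double[]>() {
            private int nextDocId = 0;

            @Override
            protected double[] computeNext() {
                while (nextDocId < termVectors.size()) {
                    double[] termVector = termVectors.get(nextDocId++);
                    if (termVector != null) {
                        return termVector;
                    }
                }
                return endOfData();
            }
        };
    }
}
```

Callers only ever see hasNext() and next(); the subclass supplies a single computeNext() method and signals exhaustion through endOfData().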


Field Summary
protected  String idField
           
protected  Set<String> idFieldSelector
           
 
Fields inherited from class org.apache.mahout.utils.vectors.lucene.AbstractLuceneIterator
bump, field, indexReader, maxErrorDocs, nextDocId, nextLogRecord, normPower, numErrorDocs, skippedErrorMessages, terminfo, weight
 
Constructor Summary
LuceneIterator(org.apache.lucene.index.IndexReader indexReader, String idField, String field, TermInfo termInfo, Weight weight, double normPower)
          Produce a LuceneIterator that creates each Vector and normalizes it.
LuceneIterator(org.apache.lucene.index.IndexReader indexReader, String idField, String field, TermInfo termInfo, Weight weight, double normPower, double maxPercentErrorDocs)
           
 
Method Summary
protected  String getVectorName(int documentIndex)
          Given the document index, derive a name for the vector.
 
Methods inherited from class org.apache.mahout.utils.vectors.lucene.AbstractLuceneIterator
computeNext
 
Methods inherited from class com.google.common.collect.AbstractIterator
endOfData, hasNext, next, peek
 
Methods inherited from class com.google.common.collect.UnmodifiableIterator
remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

idFieldSelector

protected final Set<String> idFieldSelector

idField

protected final String idField

Constructor Detail

LuceneIterator

public LuceneIterator(org.apache.lucene.index.IndexReader indexReader,
                      String idField,
                      String field,
                      TermInfo termInfo,
                      Weight weight,
                      double normPower)
Produce a LuceneIterator that creates each Vector and normalizes it.

Parameters:
indexReader - IndexReader to read the documents from.
idField - field containing the id. May be null.
field - field to use for the Vector
termInfo - the TermInfo describing the terms of the field
weight - the Weight used to compute each term's value in the Vector
normPower - the normalization value. Must be non-negative, or LuceneIterable.NO_NORMALIZING
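To make the normPower parameter concrete, here is a minimal sketch of p-norm normalization on a plain double[]. The class name, the normalize helper and the sentinel value -1.0 standing in for LuceneIterable.NO_NORMALIZING are assumptions made for illustration. The sketch also rejects normPower == 0 for simplicity, even though the parameter description allows any non-negative value.

```java
final class NormPowerSketch {

    // Assumed stand-in for LuceneIterable.NO_NORMALIZING; the real value is not stated on this page.
    static final double NO_NORMALIZING = -1.0;

    /** Returns a copy of values divided by its p-norm, where p = normPower. */
    static double[] normalize(double[] values, double normPower) {
        if (normPower == NO_NORMALIZING) {
            return values.clone(); // normalization switched off
        }
        if (!(normPower > 0.0)) { // also rejects NaN
            throw new IllegalArgumentException("normPower must be positive in this sketch: " + normPower);
        }
        double norm = 0.0;
        if (Double.isInfinite(normPower)) {
            // Infinity norm: the largest absolute component.
            for (double v : values) {
                norm = Math.max(norm, Math.abs(v));
            }
        } else {
            // General p-norm: (sum |v|^p)^(1/p).
            double sum = 0.0;
            for (double v : values) {
                sum += Math.pow(Math.abs(v), normPower);
            }
            norm = Math.pow(sum, 1.0 / normPower);
        }
        double[] result = new double[values.length];
        if (norm == 0.0) {
            return result; // all-zero input stays all zeros
        }
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] / norm;
        }
        return result;
    }
}
```

With normPower = 2 the result has unit Euclidean length; with normPower = 1 the absolute values of the components sum to 1.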

LuceneIterator

public LuceneIterator(org.apache.lucene.index.IndexReader indexReader,
                      String idField,
                      String field,
                      TermInfo termInfo,
                      Weight weight,
                      double normPower,
                      double maxPercentErrorDocs)
Parameters:
indexReader - IndexReader to read the documents from.
idField - field containing the id. May be null.
field - field to use for the Vector
termInfo - the TermInfo describing the terms of the field
weight - the Weight used to compute each term's value in the Vector
normPower - the normalization value. Must be non-negative, or LuceneIterable.NO_NORMALIZING
maxPercentErrorDocs - the maximum fraction of documents without a term frequency vector that will be tolerated. Must be in [0,1].
See Also:
LuceneIterator(org.apache.lucene.index.IndexReader, String, String, org.apache.mahout.utils.vectors.TermInfo, org.apache.mahout.vectorizer.Weight, double)
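One way to picture maxPercentErrorDocs is as an error budget derived from the number of documents in the index. The sketch below is an assumption-laden illustration, not the library's implementation: the class ErrorBudgetSketch and its methods are hypothetical, the field names maxErrorDocs and numErrorDocs are borrowed from the inherited fields listed above, and both the truncating conversion and the choice to fail only once the budget is exceeded are guesses.

```java
final class ErrorBudgetSketch {

    private final int maxErrorDocs; // largest number of tolerated documents
    private int numErrorDocs;       // documents seen so far without a term vector

    ErrorBudgetSketch(int numDocs, double maxPercentErrorDocs) {
        if (!(maxPercentErrorDocs >= 0.0 && maxPercentErrorDocs <= 1.0)) {
            throw new IllegalArgumentException("maxPercentErrorDocs must be in [0,1]: " + maxPercentErrorDocs);
        }
        // Assumption: the fraction is converted to a document count by truncation.
        this.maxErrorDocs = (int) (maxPercentErrorDocs * numDocs);
    }

    /** Records one document without a term vector; fails once the budget is exceeded. */
    void recordMissingTermVector(int docId) {
        numErrorDocs++;
        if (numErrorDocs > maxErrorDocs) {
            throw new IllegalStateException("Document " + docId + " has no term vector and the budget of "
                + maxErrorDocs + " such documents is exhausted");
        }
    }

    int getNumErrorDocs() {
        return numErrorDocs;
    }
}
```

Under these assumptions, a value of 0 makes the first document without a term vector fatal, while a value of 1 tolerates every document in the index.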
Method Detail

getVectorName

protected String getVectorName(int documentIndex)
                        throws IOException
Description copied from class: AbstractLuceneIterator
Given the document index, derive a name for the vector. This may involve reading the document from Lucene and setting up any other state that the subclass needs. This method is called once for each document that the iterator processes.

Specified by:
getVectorName in class AbstractLuceneIterator
Parameters:
documentIndex - the Lucene document index.
Returns:
the name to store in the vector.
Throws:
IOException
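Because idField may be null, a name has to be derived even when no id field is configured. The following sketch shows one plausible rule using only JDK types: a nested Map stands in for the stored fields of the Lucene index, and vectorName is a hypothetical helper. Falling back to the document index, both when idField is null and when the field is missing from the document, is an assumption rather than documented behavior.

```java
import java.util.Map;

final class VectorNameSketch {

    /**
     * Derives a vector name for a document.
     *
     * @param storedFields  stand-in for the index: document index to (field name to stored value)
     * @param idField       name of the id field, or null if none is configured
     * @param documentIndex the document index
     */
    static String vectorName(Map<Integer, Map<String, String>> storedFields,
                             String idField,
                             int documentIndex) {
        if (idField == null) {
            // No id field configured: use the document index itself as the name.
            return String.valueOf(documentIndex);
        }
        Map<String, String> document = storedFields.get(documentIndex);
        String name = (document == null) ? null : document.get(idField);
        // Assumption: fall back to the document index when the id field is absent.
        return (name != null) ? name : String.valueOf(documentIndex);
    }
}
```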


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.