org.apache.mahout.utils.vectors.lucene
Class AbstractLuceneIterator

java.lang.Object
  extended by com.google.common.collect.UnmodifiableIterator<T>
      extended by com.google.common.collect.AbstractIterator<Vector>
          extended by org.apache.mahout.utils.vectors.lucene.AbstractLuceneIterator
All Implemented Interfaces:
Iterator<Vector>
Direct Known Subclasses:
LuceneIterator

public abstract class AbstractLuceneIterator
extends com.google.common.collect.AbstractIterator<Vector>

Iterates over a Lucene index, extracting term vectors. Subclasses define how much information to retrieve from the Lucene index.


Field Summary
protected  Bump125 bump
           
protected  String field
           
protected  org.apache.lucene.index.IndexReader indexReader
           
protected  int maxErrorDocs
           
protected  int nextDocId
           
protected  long nextLogRecord
           
protected  double normPower
           
protected  int numErrorDocs
           
protected  int skippedErrorMessages
           
protected  TermInfo terminfo
           
protected  Weight weight
           
 
Constructor Summary
AbstractLuceneIterator(TermInfo terminfo, double normPower, org.apache.lucene.index.IndexReader indexReader, Weight weight, double maxPercentErrorDocs, String field)
           
 
Method Summary
protected  Vector computeNext()
           
protected abstract  String getVectorName(int documentIndex)
          Given the Lucene document index, derive a name for the vector.
 
Methods inherited from class com.google.common.collect.AbstractIterator
endOfData, hasNext, next, peek
 
Methods inherited from class com.google.common.collect.UnmodifiableIterator
remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

indexReader

protected final org.apache.lucene.index.IndexReader indexReader

field

protected final String field

terminfo

protected final TermInfo terminfo

normPower

protected final double normPower

weight

protected final Weight weight

bump

protected final Bump125 bump

nextDocId

protected int nextDocId

maxErrorDocs

protected int maxErrorDocs

numErrorDocs

protected int numErrorDocs

nextLogRecord

protected long nextLogRecord

skippedErrorMessages

protected int skippedErrorMessages
Constructor Detail

AbstractLuceneIterator

public AbstractLuceneIterator(TermInfo terminfo,
                              double normPower,
                              org.apache.lucene.index.IndexReader indexReader,
                              Weight weight,
                              double maxPercentErrorDocs,
                              String field)
Method Detail

getVectorName

protected abstract String getVectorName(int documentIndex)
                                 throws IOException
Given the Lucene document index, derive a name for the vector. This may involve reading the document from Lucene and setting up any other state that the subclass needs. It is called once for each document that the iterator processes.

Parameters:
documentIndex - the Lucene document index.
Returns:
the name to store in the vector.
Throws:
IOException
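
The contract above (an abstract hook called once per processed document, allowed to throw IOException) can be illustrated with a minimal, dependency-free sketch. This is not Mahout code: NamedDocIteratorSketch, PrefixNamingIterator, and GetVectorNameDemo are hypothetical stand-ins. The sketch iterates over a plain document count instead of an IndexReader and returns only the derived names rather than Vector instances. Wrapping the IOException in an IllegalStateException is likewise an assumption of the sketch, not documented behaviour.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the template-method shape of AbstractLuceneIterator. */
abstract class NamedDocIteratorSketch implements Iterator<String> {
  private final int numDocs; // assumption: stands in for the number of documents in the index
  private int nextDocId;     // mirrors the role of the protected nextDocId field

  NamedDocIteratorSketch(int numDocs) {
    this.numDocs = numDocs;
  }

  /** Subclass hook with the same signature as getVectorName(int documentIndex). */
  protected abstract String getVectorName(int documentIndex) throws IOException;

  @Override
  public boolean hasNext() {
    return nextDocId < numDocs;
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    int doc = nextDocId++;
    try {
      // The hook is invoked exactly once for each document the iterator visits.
      return getVectorName(doc);
    } catch (IOException e) {
      // Assumption of this sketch: surface I/O failures as unchecked exceptions.
      throw new IllegalStateException(e);
    }
  }
}

/** Hypothetical subclass that names each vector from a fixed prefix and the document index. */
final class PrefixNamingIterator extends NamedDocIteratorSketch {
  private final String prefix;

  PrefixNamingIterator(int numDocs, String prefix) {
    super(numDocs);
    this.prefix = prefix;
  }

  @Override
  protected String getVectorName(int documentIndex) {
    return prefix + documentIndex;
  }
}

class GetVectorNameDemo {
  public static void main(String[] args) {
    Iterator<String> names = new PrefixNamingIterator(3, "doc-");
    while (names.hasNext()) {
      System.out.println(names.next()); // prints doc-0, doc-1, doc-2 on separate lines
    }
  }
}
```

A real subclass would typically read a stored field from the Lucene document at documentIndex to build the name, which is why the hook is declared to throw IOException.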

computeNext

protected Vector computeNext()
Specified by:
computeNext in class com.google.common.collect.AbstractIterator<Vector>


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.