org.apache.mahout.vectorizer.encoders
Class AdaptiveWordValueEncoder

java.lang.Object
  extended by org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
      extended by org.apache.mahout.vectorizer.encoders.WordValueEncoder
          extended by org.apache.mahout.vectorizer.encoders.AdaptiveWordValueEncoder

public class AdaptiveWordValueEncoder
extends WordValueEncoder

Encodes words into vectors in the same way as WordValueEncoder, while also maintaining an adaptive dictionary of the values seen so far. This allows terms to be weighted without a pre-scan of all of the data.
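The idea of weighting terms from a dictionary that grows as data arrives can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Mahout implementation: the class AdaptiveDictionarySketch, its method observe, and the smoothed negative-log-frequency formula are all hypothetical choices made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an adaptive dictionary; not part of Mahout.
class AdaptiveDictionarySketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private long total = 0;

    /** Records one more occurrence of the given word. */
    void observe(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;
    }

    /**
     * Assumed weighting: negative log of a smoothed relative frequency.
     * Each distinct word, plus one hypothetical unseen word, gets an extra
     * 0.5 count so that a word never seen before still has a finite weight.
     */
    double weight(String word) {
        double thisWord = counts.getOrDefault(word, 0) + 0.5;
        double allWords = total + 0.5 * counts.size() + 0.5;
        return -Math.log(thisWord / allWords);
    }
}
```

Under this assumed formula, frequent words receive small weights and rare or unseen words receive large ones, and the weights shift as more words are observed, which is why no pre-scan of the data is needed.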


Field Summary
 
Fields inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
CONTINUOUS_VALUE_HASH_SEED, WORD_LIKE_VALUE_HASH_SEED
 
Constructor Summary
AdaptiveWordValueEncoder(String name)
           
 
Method Summary
 void addToVector(String originalForm, double weight, Vector data)
          Adds a value to a vector.
 com.google.common.collect.Multiset<String> getDictionary()
          Returns the adaptive dictionary of values seen so far.
protected  double getWeight(byte[] originalForm, double w)
           
protected  double weight(byte[] originalForm)
           
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.WordValueEncoder
addToVector, asString, hashForProbe
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AdaptiveWordValueEncoder

public AdaptiveWordValueEncoder(String name)

Method Detail

addToVector

public void addToVector(String originalForm,
                        double weight,
                        Vector data)
Adds a value to a vector.

Overrides:
addToVector in class FeatureVectorEncoder
Parameters:
originalForm - The original form of the value as a string.
weight - The weight to be applied to this feature.
data - The vector to which the value should be added.
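The inherited hash and hashForProbe methods indicate that values are placed into the vector by hashing. A rough sketch of that idea in plain Java follows. The class HashedAddSketch, the use of String.hashCode, and the double[] standing in for Vector are assumptions made for illustration only; the real encoder uses its own hash functions and may write to more than one location per value.

```java
// Hypothetical illustration of hashed feature encoding; not Mahout's code.
class HashedAddSketch {
    /**
     * Adds the weight at the slot selected by hashing the encoder name and
     * the word, and returns the index of that slot.
     */
    static int addToVector(String name, String word, double weight, double[] data) {
        // floorMod keeps the index non-negative even for negative hash codes
        int index = Math.floorMod((name + ":" + word).hashCode(), data.length);
        data[index] += weight;
        return index;
    }
}
```

In this sketch, adding the same word twice accumulates both weights in a single slot, so the vector has a fixed size no matter how many distinct words are seen.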

getWeight

protected double getWeight(byte[] originalForm,
                           double w)
Overrides:
getWeight in class WordValueEncoder

weight

protected double weight(byte[] originalForm)
Specified by:
weight in class WordValueEncoder

getDictionary

public com.google.common.collect.Multiset<String> getDictionary()
Returns the adaptive dictionary of values seen so far.


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.