org.apache.mahout.vectorizer.encoders
Class WordValueEncoder

java.lang.Object
  extended by org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
      extended by org.apache.mahout.vectorizer.encoders.WordValueEncoder
Direct Known Subclasses:
AdaptiveWordValueEncoder, StaticWordValueEncoder

public abstract class WordValueEncoder
extends FeatureVectorEncoder

Encodes words as sparse updates to a Vector. The weight given to each word is defined by a subclass.


Field Summary
 
Fields inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
CONTINUOUS_VALUE_HASH_SEED, WORD_LIKE_VALUE_HASH_SEED
 
Constructor Summary
protected WordValueEncoder(String name)
           
 
Method Summary
 void addToVector(byte[] originalForm, double w, Vector data)
          Adds a value to a vector.
 String asString(String originalForm)
          Converts a value into a form that would help a human understand the internals of how the value is being interpreted.
protected  double getWeight(byte[] originalForm, double w)
           
protected  int hashForProbe(byte[] originalForm, int dataSize, String name, int probe)
          Provides the unique hash for a particular probe.
protected abstract  double weight(byte[] originalForm)
           
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordValueEncoder

protected WordValueEncoder(String name)

Method Detail

addToVector

public void addToVector(byte[] originalForm,
                        double w,
                        Vector data)
Adds a value to a vector.

Specified by:
addToVector in class FeatureVectorEncoder
Parameters:
originalForm - The original form of the value as a byte array.
w - The weight to apply to the value.
data - The vector to which the value should be added.
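
As a rough illustration of what "adds a value to a vector" can mean for a hashed encoder, the sketch below adds a weighted word to a plain array at one hashed slot per probe. It is self-contained and does not use Mahout: the class name HashedWordSketch, the double[] standing in for Vector, the explicit probes argument, and the index function built on Arrays.hashCode are all assumptions made for illustration, not the library's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in; not part of Mahout.
class HashedWordSketch {
    // Assumed index function: mixes the word bytes with the probe number
    // and maps the result into [0, dataSize).
    static int indexFor(byte[] originalForm, int dataSize, int probe) {
        int h = 31 * Arrays.hashCode(originalForm) + probe;
        return Math.floorMod(h, dataSize);
    }

    // Adds weight w to one slot of data per probe.
    static void addToVector(byte[] originalForm, double w, double[] data, int probes) {
        for (int probe = 0; probe < probes; probe++) {
            data[indexFor(originalForm, data.length, probe)] += w;
        }
    }

    public static void main(String[] args) {
        double[] data = new double[16];
        addToVector("hello".getBytes(StandardCharsets.UTF_8), 1.5, data, 2);
        // Two slots receive 1.5 each; the rest stay at 0.0.
        System.out.println(Arrays.toString(data));
    }
}
```

Because the update is additive, encoding the same word twice doubles the values in its slots rather than overwriting them.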

getWeight

protected double getWeight(byte[] originalForm,
                           double w)
Overrides:
getWeight in class FeatureVectorEncoder

hashForProbe

protected int hashForProbe(byte[] originalForm,
                           int dataSize,
                           String name,
                           int probe)
Description copied from class: FeatureVectorEncoder
Provides the unique hash for a particular probe. For all encoders except text, this is all that is needed and the default implementation of hashesForProbe will do the right thing. For text and similar values, hashesForProbe should be overridden and this method should not be used.

Specified by:
hashForProbe in class FeatureVectorEncoder
Parameters:
originalForm - The original byte array value
dataSize - The length of the vector being encoded
name - The name of the variable being encoded
probe - The probe number
Returns:
The hash of the current probe
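
To make the parameter roles concrete, here is a minimal sketch of a per-probe hash that depends on the variable name, the value bytes, and the probe number, and that always returns an index in [0, dataSize). It is not Mahout's hash: the FNV-1a mixing, the class name ProbeHashSketch, and the BASE_SEED constant are assumptions. This page lists an inherited WORD_LIKE_VALUE_HASH_SEED field, but its value and its use are not shown here, so BASE_SEED is only a placeholder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in; not part of Mahout.
class ProbeHashSketch {
    // Assumed placeholder seed; not taken from the library.
    static final int BASE_SEED = 100;

    // FNV-1a style mixing over the variable name and the value bytes,
    // started from a probe-dependent seed, then reduced into [0, dataSize).
    static int hashForProbe(byte[] originalForm, int dataSize, String name, int probe) {
        int h = 0x811C9DC5 ^ (BASE_SEED + probe);
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            h = (h ^ (b & 0xFF)) * 0x01000193;
        }
        for (byte b : originalForm) {
            h = (h ^ (b & 0xFF)) * 0x01000193;
        }
        return Math.floorMod(h, dataSize);
    }

    public static void main(String[] args) {
        byte[] word = "apple".getBytes(StandardCharsets.UTF_8);
        for (int probe = 0; probe < 3; probe++) {
            System.out.println("probe " + probe + " -> "
                    + hashForProbe(word, 1000, "title", probe));
        }
    }
}
```

Including the name in the hash means the same word encoded under two different variable names will usually land in different slots, and varying the seed per probe spreads one word over several slots.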

asString

public String asString(String originalForm)
Converts a value into a form that would help a human understand the internals of how the value is being interpreted. For text-like things, this is likely to be a list of the terms found with associated weights (if any).

Specified by:
asString in class FeatureVectorEncoder
Parameters:
originalForm - The original form of the value as a string.
Returns:
A string that a human can read.

weight

protected abstract double weight(byte[] originalForm)
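
Since weight is the one method a subclass must supply, a small sketch of a dictionary-backed weighting may help. The classes below are hypothetical stand-ins written without Mahout on the classpath. This page does not document how getWeight combines w with weight(originalForm); the sketch assumes a simple product. The dictionary and the missingValueWeight fallback are likewise illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; neither class is part of Mahout.
abstract class WordWeigherSketch {
    // Subclasses decide how much each word counts.
    protected abstract double weight(byte[] originalForm);

    // Assumption: the caller-supplied w scales the subclass-defined weight.
    double getWeight(byte[] originalForm, double w) {
        return w * weight(originalForm);
    }
}

class DictionaryWeigherSketch extends WordWeigherSketch {
    private final Map<String, Double> dictionary = new HashMap<>();
    private final double missingValueWeight;

    DictionaryWeigherSketch(Map<String, Double> dictionary, double missingValueWeight) {
        this.dictionary.putAll(dictionary);
        this.missingValueWeight = missingValueWeight;
    }

    // Looks the word up in the dictionary; unknown words get the fallback weight.
    @Override
    protected double weight(byte[] originalForm) {
        String word = new String(originalForm, StandardCharsets.UTF_8);
        return dictionary.getOrDefault(word, missingValueWeight);
    }
}
```

With a dictionary that maps "rare" to 4.0 and a fallback of 1.0, getWeight for "rare" with w = 0.5 yields 2.0 under the product assumption, while an unlisted word with w = 3.0 yields 3.0.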


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.