java.lang.Object
  org.apache.mahout.classifier.AbstractVectorClassifier
    org.apache.mahout.clustering.classify.ClusterClassifier

public class ClusterClassifier
extends AbstractVectorClassifier
implements OnlineLearner, org.apache.hadoop.io.Writable
This classifier works with any ClusteringPolicy and its associated Clusters. It is initialized with a policy and a list of compatible clusters, and thereafter it can classify any new Vector into one or more of the clusters based on the pdf() function that each cluster supports. In addition, it is an OnlineLearner and can be trained. Training amounts to asking the model selected by the actual argument to observe the vector, and closing the classifier causes all the models to compute their parameters (computeParameters()). Because a ClusterClassifier implements Writable, it can be written to and read from a sequence file as a single entity. For sequential and MapReduce clustering in conjunction with a ClusterIterator, however, it uses an exploded file format. In this format, the iterator writes the policy to a single POLICY_FILE_NAME file in the clustersOut directory, and the models are written to one or more part-n files so that multiple reducers may be employed to produce them.
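The classification step described above (score a vector against every cluster's pdf()) can be sketched without Mahout on the classpath. This is a minimal sketch under stated assumptions: PdfModel, UnitGaussianModel and PdfClassifierSketch are hypothetical stand-ins, not Mahout's Cluster or ClusteringPolicy API, and normalizing the pdfs so they sum to one is an assumed policy chosen for illustration.

```java
import java.util.List;

// Hypothetical stand-in for Mahout's Cluster: only the pdf() contract matters here.
interface PdfModel {
    double pdf(double[] x);
}

// Hypothetical model: isotropic unit-variance Gaussian around a fixed center.
// Returns an unnormalized density; the constant factor is shared by all models
// of this type and cancels when the scores are normalized below.
final class UnitGaussianModel implements PdfModel {
    private final double[] center;

    UnitGaussianModel(double... center) {
        this.center = center.clone();
    }

    @Override
    public double pdf(double[] x) {
        double sq = 0.0;
        for (int i = 0; i < center.length; i++) {
            double d = x[i] - center[i];
            sq += d * d;
        }
        return Math.exp(-0.5 * sq);
    }
}

final class PdfClassifierSketch {
    private PdfClassifierSketch() {}

    // Assumed policy: divide each model's pdf by the sum over all models,
    // so the returned scores are non-negative and sum to 1.
    static double[] classify(List<? extends PdfModel> models, double[] instance) {
        double[] scores = new double[models.size()];
        double total = 0.0;
        for (int i = 0; i < scores.length; i++) {
            scores[i] = models.get(i).pdf(instance);
            total += scores[i];
        }
        for (int i = 0; i < scores.length; i++) {
            scores[i] /= total;
        }
        return scores;
    }
}
```

A vector that lies close to one center receives almost all of the probability mass, while a vector equidistant from two identical-variance models is split evenly between them.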
Field Summary |
---|
Fields inherited from class org.apache.mahout.classifier.AbstractVectorClassifier |
---|
MIN_LOG_LIKELIHOOD |
Constructor Summary
| Modifier | Constructor and Description |
|---|---|
| | ClusterClassifier() |
| protected | ClusterClassifier(ClusteringPolicy policy) |
| | ClusterClassifier(List<Cluster> models, ClusteringPolicy policy): The public constructor accepts a list of clusters to become the models. |
Method Summary
| Modifier and Type | Method and Description |
|---|---|
| Vector | classify(Vector instance): Compute and return a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance. |
| double | classifyScalar(Vector instance): Classifies a vector in the special case of a binary classifier where AbstractVectorClassifier.classify(Vector) would return a vector with only one element. |
| void | close(): Prepares the classifier for classification and deallocates any temporary data structures. |
| List<Cluster> | getModels() |
| ClusteringPolicy | getPolicy() |
| int | numCategories(): Returns the number of categories that a target variable can be assigned to. |
| void | readFields(DataInput in) |
| void | readFromSeqFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path) |
| static ClusteringPolicy | readPolicy(org.apache.hadoop.fs.Path path) |
| void | train(int actual, Vector instance): Updates the model using a particular target variable value and a feature vector. |
| void | train(int actual, Vector data, double weight): Train the models given an additional weight. |
| void | train(long trackingKey, int actual, Vector instance): Updates the model using a particular target variable value and a feature vector. |
| void | train(long trackingKey, String groupKey, int actual, Vector instance): Updates the model using a particular target variable value and a feature vector. |
| void | write(DataOutput out) |
| static void | writePolicy(ClusteringPolicy policy, org.apache.hadoop.fs.Path path) |
| void | writeToSeqFiles(org.apache.hadoop.fs.Path path) |
Methods inherited from class org.apache.mahout.classifier.AbstractVectorClassifier |
---|
classify, classifyFull, classifyFull, classifyFull, classifyNoLink, classifyScalar, logLikelihood |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ClusterClassifier(List<Cluster> models, ClusteringPolicy policy)
Parameters:
models - a List<Cluster>
policy - a ClusteringPolicy

public ClusterClassifier()

protected ClusterClassifier(ClusteringPolicy policy)
Method Detail |
---|
public Vector classify(Vector instance)
Description copied from class: AbstractVectorClassifier
Compute and return a vector containing n-1 scores, where n is equal to numCategories(), given an input vector instance. Higher scores indicate that the input vector is more likely to belong to the corresponding category. The categories are denoted by the integers 0 through n-1 (inclusive), and the scores in the returned vector correspond to categories 1 through n-1 (leaving out category 0). It is assumed that the score for category 0 is one minus the sum of the scores in the returned vector.
Specified by:
classify in class AbstractVectorClassifier
Parameters:
instance - A feature vector to be classified.
Returns:
A vector of probabilities in 1 of n-1 encoding.

public double classifyScalar(Vector instance)
Description copied from class: AbstractVectorClassifier
Classifies a vector in the special case of a binary classifier where AbstractVectorClassifier.classify(Vector) would return a vector with only one element. As such, using this method can avoid the allocation of a vector.
Specified by:
classifyScalar in class AbstractVectorClassifier
Parameters:
instance - The feature vector to be classified.
See Also:
AbstractVectorClassifier.classify(Vector)
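The n-1 encoding used by classify(Vector) omits category 0, whose score is one minus the sum of the returned scores; in the binary case there is only one remaining score, which is presumably what classifyScalar(Vector) returns without allocating a vector. The helpers below are a hypothetical illustration on plain double[] arrays (not Mahout Vector methods) of expanding such a result back to all n = numCategories() categories and picking the best one.

```java
// Hypothetical helpers, not part of Mahout: they only illustrate the n-1 encoding.
final class NMinusOneEncoding {
    private NMinusOneEncoding() {}

    // Expand scores for categories 1..n-1 into a full array for categories 0..n-1,
    // using the documented assumption that score(0) = 1 - sum(other scores).
    static double[] expand(double[] nMinusOneScores) {
        double[] full = new double[nMinusOneScores.length + 1];
        double sum = 0.0;
        for (int i = 0; i < nMinusOneScores.length; i++) {
            full[i + 1] = nMinusOneScores[i];
            sum += nMinusOneScores[i];
        }
        full[0] = 1.0 - sum;
        return full;
    }

    // Index of the highest score; ties go to the lower category index.
    static int argMax(double[] fullScores) {
        int best = 0;
        for (int i = 1; i < fullScores.length; i++) {
            if (fullScores[i] > fullScores[best]) {
                best = i;
            }
        }
        return best;
    }

    // Binary case: a single score s for category 1 implies 1 - s for category 0.
    static double[] expandScalar(double scoreForCategory1) {
        return expand(new double[] {scoreForCategory1});
    }
}
```

For a well-formed result, the expanded array has exactly numCategories() entries.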
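public int numCategories()
Description copied from class: AbstractVectorClassifier
Returns the number of categories that a target variable can be assigned to. Categories are numbered from 0 to numCategories()-1 (inclusive).
Specified by:
numCategories in class AbstractVectorClassifier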
public void write(DataOutput out) throws IOException
Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
IOException
public void readFields(DataInput in) throws IOException
Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
IOException
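write(DataOutput) and readFields(DataInput) follow Hadoop's Writable contract, so readFields must consume fields in exactly the order write produced them. The sketch below shows that symmetry using only java.io; the ClassifierStateSketch class, its fields (a policy name and a list of model centers) and its length-prefixed layout are hypothetical and are not the byte format ClusterClassifier actually writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical state object following the Writable pattern (write/readFields)
// without depending on Hadoop. Field choice and layout are illustrative only.
final class ClassifierStateSketch {
    String policyName = "";
    final List<double[]> centers = new ArrayList<>();

    void write(DataOutput out) throws IOException {
        out.writeUTF(policyName);
        out.writeInt(centers.size());       // number of models
        for (double[] c : centers) {
            out.writeInt(c.length);         // dimension of this model
            for (double v : c) {
                out.writeDouble(v);
            }
        }
    }

    // Reads fields in the same order write() emitted them, replacing current state.
    void readFields(DataInput in) throws IOException {
        policyName = in.readUTF();
        centers.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            double[] c = new double[in.readInt()];
            for (int j = 0; j < c.length; j++) {
                c[j] = in.readDouble();
            }
            centers.add(c);
        }
    }

    // Serialize to bytes, then read them back into a fresh instance.
    static ClassifierStateSketch roundTrip(ClassifierStateSketch src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                src.write(out);
            }
            ClassifierStateSketch copy = new ClassifierStateSketch();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing each count before its elements is what lets readFields rebuild variable-length lists without any separators.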
public void train(int actual, Vector instance)
Description copied from interface: OnlineLearner
Updates the model using a particular target variable value and a feature vector.
Specified by:
train in interface OnlineLearner
Parameters:
actual - The value of the target variable. This value should be in the half-open interval [0..n) where n is the number of target categories.
instance - The feature vector for this example.

public void train(int actual, Vector data, double weight)
Train the models given an additional weight.
Parameters:
actual - the int index of a model
data - a data Vector
weight - a double weighting factor

public void train(long trackingKey, String groupKey, int actual, Vector instance)
Description copied from interface: OnlineLearner
Updates the model using a particular target variable value and a feature vector.
Specified by:
train in interface OnlineLearner
Parameters:
trackingKey - The tracking key for this training example.
groupKey - An optional value that allows examples to be grouped in the computation of the update to the model.
actual - The value of the target variable. This value should be in the half-open interval [0..n) where n is the number of target categories.
instance - The feature vector for this example.

public void train(long trackingKey, int actual, Vector instance)
Description copied from interface: OnlineLearner
Updates the model using a particular target variable value and a feature vector.
Specified by:
train in interface OnlineLearner
Parameters:
trackingKey - The tracking key for this training example.
actual - The value of the target variable. This value should be in the half-open interval [0..n) where n is the number of target categories.
instance - The feature vector for this example.

public void close()
Description copied from interface: OnlineLearner
Prepares the classifier for classification and deallocates any temporary data structures.
Specified by:
close in interface Closeable
Specified by:
close in interface OnlineLearner
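Per the class description, training asks the model indexed by actual to observe the data (optionally with a weight), and close() makes every model compute its parameters. A minimal sketch of that accumulate-then-finalize pattern, assuming each model is summarized by a weighted mean; WeightedMeanModel and TrainableClassifierSketch are hypothetical names, not Mahout classes, and treating unweighted training as weight 1.0 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: accumulates weighted sums while training and only
// updates its visible center when computeParameters() is called.
final class WeightedMeanModel {
    private final double[] center;
    private double[] weightedSum;
    private double totalWeight;

    WeightedMeanModel(double[] initialCenter) {
        center = initialCenter.clone();
        weightedSum = new double[center.length];
    }

    void observe(double[] x, double weight) {
        for (int i = 0; i < x.length; i++) {
            weightedSum[i] += weight * x[i];
        }
        totalWeight += weight;
    }

    void computeParameters() {
        if (totalWeight > 0) {
            for (int i = 0; i < center.length; i++) {
                center[i] = weightedSum[i] / totalWeight;
            }
        }
        // Reset the accumulators so the next training pass starts clean.
        weightedSum = new double[center.length];
        totalWeight = 0.0;
    }

    double[] getCenter() {
        return center.clone();
    }
}

final class TrainableClassifierSketch {
    private final List<WeightedMeanModel> models;

    TrainableClassifierSketch(List<WeightedMeanModel> models) {
        this.models = new ArrayList<>(models);
    }

    // "actual" is the index of the model that should observe the data.
    void train(int actual, double[] data, double weight) {
        models.get(actual).observe(data, weight);
    }

    // Assumption: unweighted training behaves like weight 1.0.
    void train(int actual, double[] data) {
        train(actual, data, 1.0);
    }

    // Mirrors the documented close() behavior: every model computes its parameters.
    void close() {
        for (WeightedMeanModel m : models) {
            m.computeParameters();
        }
    }

    List<WeightedMeanModel> getModels() {
        return models;
    }
}
```

Deferring the parameter update to close() keeps every observation in a pass scored against the same, unchanged models.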
public List<Cluster> getModels()
public ClusteringPolicy getPolicy()
public void writeToSeqFiles(org.apache.hadoop.fs.Path path) throws IOException
Throws:
IOException

public void readFromSeqFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path path) throws IOException
Throws:
IOException

public static ClusteringPolicy readPolicy(org.apache.hadoop.fs.Path path) throws IOException
Throws:
IOException

public static void writePolicy(ClusteringPolicy policy, org.apache.hadoop.fs.Path path) throws IOException
Throws:
IOException
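The exploded format from the class description keeps the policy in one file in the clustersOut directory and the models in one or more part-n files, so that several reducers can each write their own part. The sketch below mimics that directory layout with java.nio.file under loud assumptions: the file name _policy-sketch, the part-00000 naming and the plain-text contents are all hypothetical. It does not reproduce the real POLICY_FILE_NAME value or the sequence-file encoding used by writePolicy, writeToSeqFiles and readFromSeqFiles.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an "exploded" layout: one policy file plus part-n model files.
final class ExplodedLayoutSketch {
    static final String POLICY_FILE = "_policy-sketch"; // hypothetical file name

    private ExplodedLayoutSketch() {}

    // Each inner list plays the role of one reducer's output (a group of model lines).
    static void write(Path clustersOut, String policy, List<List<String>> modelsPerPart) {
        try {
            Files.createDirectories(clustersOut);
            Files.write(clustersOut.resolve(POLICY_FILE), List.of(policy), StandardCharsets.UTF_8);
            for (int p = 0; p < modelsPerPart.size(); p++) {
                Path part = clustersOut.resolve(String.format("part-%05d", p));
                Files.write(part, modelsPerPart.get(p), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readPolicy(Path clustersOut) {
        try {
            return Files.readAllLines(clustersOut.resolve(POLICY_FILE), StandardCharsets.UTF_8).get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Concatenate the models from every part-* file, in file-name order.
    static List<String> readModels(Path clustersOut) {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(clustersOut, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
            parts.sort(null); // natural Path ordering
            List<String> models = new ArrayList<>();
            for (Path p : parts) {
                models.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
            }
            return models;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the policy lives in its own file, a reader can load it once and then merge any number of part files without knowing how many reducers produced them.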