org.apache.mahout.classifier.sgd
Class ModelDissector
java.lang.Object
org.apache.mahout.classifier.sgd.ModelDissector
public class ModelDissector
- extends Object
Uses sample data to reverse engineer a feature-hashed model.
The result gives approximate weights for features and interactions
in the original space.
The idea is that the hashed encoders have the option of having a trace dictionary. This
tells us where each feature is hashed to, or each feature/value combination in the case
of word-like values. Using this dictionary, we can put values into a synthetic feature
vector in just the locations specified by a single feature or interaction. Then we can
push this through a linear part of a model to see the contribution of that input. For
any generalized linear model like logistic regression, there is a linear part of the
model that allows this.
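The probing idea above can be sketched in plain Java. This is a minimal illustration, not Mahout's implementation: the class `TraceProbeSketch`, its methods, and the use of a bare `double[]` as the model's weight vector are all hypothetical stand-ins, and the model is assumed to have a single linear score.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the probing idea; not part of Mahout.
final class TraceProbeSketch {

    // Linear part of the model: dot product of weights and features.
    static double linearScore(double[] weights, double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // For each traced variable, build a synthetic vector that is 1 only at
    // the locations recorded for that variable, then score it.
    static Map<String, Double> probe(double[] weights,
                                     Map<String, Set<Integer>> traceDictionary) {
        Map<String, Double> contributions = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> entry : traceDictionary.entrySet()) {
            double[] synthetic = new double[weights.length];
            for (int location : entry.getValue()) {
                synthetic[location] = 1.0;
            }
            contributions.put(entry.getKey(), linearScore(weights, synthetic));
        }
        return contributions;
    }

    public static void main(String[] args) {
        // Assumed toy weights for a hashed feature space of size 4.
        double[] weights = {0.5, -2.0, 0.0, 1.5};
        Map<String, Set<Integer>> trace = new HashMap<>();
        trace.put("color=red", new HashSet<>(Arrays.asList(0, 3)));
        trace.put("size", new HashSet<>(Arrays.asList(1)));
        System.out.println(probe(weights, trace));
    }
}
```

Because a feature may be hashed to more than one location, its recovered weight is the sum of the model weights at those locations. Collisions with other features are why the result is only approximate.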
The ModelDissector accepts a trace dictionary and a model in its update method. It
computes the weights for the elements in the trace dictionary and stores them. The
summary method then returns the largest of the stored weights. This update/flush
style is used so that the trace dictionary doesn't have to grow to an enormous size;
it can be cleared between updates.
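The update-then-summarize pattern can be sketched as follows. This is an illustrative sketch under stated assumptions, not the class's actual code: `UpdateSummarySketch` is a hypothetical name, the model is reduced to a `double[]` of weights, and ranking by absolute weight is an assumption about what "biggest" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the update/summary pattern; not part of Mahout.
final class UpdateSummarySketch {
    // Weights recovered so far, keyed by feature name.
    private final Map<String, Double> stashed = new HashMap<>();

    // Record the weight of every feature currently in the trace dictionary.
    void update(double[] weights, Map<String, Set<Integer>> traceDictionary) {
        for (Map.Entry<String, Set<Integer>> entry : traceDictionary.entrySet()) {
            double weight = 0.0;
            for (int location : entry.getValue()) {
                weight += weights[location];
            }
            stashed.put(entry.getKey(), weight);
        }
    }

    // Return up to n stored features, largest absolute weight first (assumed ordering).
    List<Map.Entry<String, Double>> summary(int n) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(stashed.entrySet());
        ranked.sort(Comparator.comparingDouble(
                (Map.Entry<String, Double> e) -> Math.abs(e.getValue())).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    public static void main(String[] args) {
        double[] weights = {0.5, -2.0, 0.0, 1.5};
        UpdateSummarySketch sketch = new UpdateSummarySketch();
        Map<String, Set<Integer>> trace = new HashMap<>();

        trace.put("a", new HashSet<>(Arrays.asList(0)));
        sketch.update(weights, trace);
        trace.clear(); // the dictionary can be emptied once its weights are stored

        trace.put("b", new HashSet<>(Arrays.asList(1)));
        trace.put("c", new HashSet<>(Arrays.asList(2, 3)));
        sketch.update(weights, trace);

        System.out.println(sketch.summary(2));
    }
}
```

Since the recovered weights live in the sketch's own map, the caller is free to clear the trace dictionary after each update, which keeps its memory footprint bounded by one batch of examples.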
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ModelDissector
public ModelDissector()
update
public void update(Vector features,
Map<String,Set<Integer>> traceDictionary,
AbstractVectorClassifier learner)
- Probes a model to determine the effect of a particular variable. This is done
with the aid of a trace dictionary, which records the locations in the feature
vector that are modified by various variable values. We can set these locations to
1 and then look at the resulting score. This tells us the weight the model places
on that variable.
- Parameters:
features
- A feature vector to use (destructively)
traceDictionary
- A trace dictionary containing variables and what locations
in the feature vector are affected by them
learner
- The model that we are probing to find weights on features
summary
public List<ModelDissector.Weight> summary(int n)
- Returns the n most important features with their
weights, their most important category, and the top few
categories that they affect.
- Parameters:
n
- How many results to return.
- Returns:
- A list of the top variables.
Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.