org.apache.mahout.classifier.sgd
Class ModelDissector

java.lang.Object
  extended by org.apache.mahout.classifier.sgd.ModelDissector

public class ModelDissector
extends Object

Uses sample data to reverse engineer a feature-hashed model. The result gives approximate weights for features and interactions in the original space.

The hashed encoders can optionally record a trace dictionary. This dictionary tells us where each feature is hashed to, or where each feature/value combination is hashed to in the case of word-like values. Using it, we can put values into a synthetic feature vector in just the locations specified by a single feature or interaction, then push that vector through the linear part of a model to see the contribution of that input. Any generalized linear model, such as logistic regression, has a linear part that allows this.

ModelDissector accepts a trace dictionary and a model in an update method, computes the weights for the elements in the trace dictionary, and stores them. A summary method then returns the largest weights. This update/flush style means the trace dictionary does not have to grow to an enormous size; it can be cleared between updates.
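
The probing idea described above can be illustrated with a minimal, self-contained sketch. It does not use any Mahout classes: the class TraceProbeSketch, its methods, the weight array and the dictionary entries are all hypothetical stand-ins, and the "model" is assumed to be nothing more than a dot product over a small hashed weight array.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of probing a hashed linear model; not Mahout code.
class TraceProbeSketch {
    // Linear part of a model: dot product of weights and features.
    static double linearScore(double[] weights, double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // Builds a synthetic vector with 1 at the traced locations only, then scores it.
    static double probe(double[] weights, Set<Integer> locations) {
        double[] synthetic = new double[weights.length];
        for (int where : locations) {
            synthetic[where] = 1.0;
        }
        return linearScore(weights, synthetic);
    }

    public static void main(String[] args) {
        // Assumed hashed weights and trace dictionary, made up for this example.
        double[] hashedWeights = {0.5, -2.0, 0.0, 1.0};
        Map<String, Set<Integer>> trace = new LinkedHashMap<>();
        trace.put("color=red", new TreeSet<>(Arrays.asList(0, 3)));
        trace.put("size", new TreeSet<>(Arrays.asList(1)));

        for (Map.Entry<String, Set<Integer>> e : trace.entrySet()) {
            System.out.println(e.getKey() + " -> " + probe(hashedWeights, e.getValue()));
        }
    }
}
```

In this sketch a feature hashed to several locations gets the sum of the weights at those locations, which is why the recovered weights are only approximate when different features collide in the hashed space.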


Nested Class Summary
static class ModelDissector.Weight
           
 
Constructor Summary
ModelDissector()
           
 
Method Summary
 List<ModelDissector.Weight> summary(int n)
          Returns the n most important features, each with its weight, its most important category, and the top few categories that it affects.
 void update(Vector features, Map<String,Set<Integer>> traceDictionary, AbstractVectorClassifier learner)
          Probes a model to determine the effect of a particular variable.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ModelDissector

public ModelDissector()

Method Detail

update

public void update(Vector features,
                   Map<String,Set<Integer>> traceDictionary,
                   AbstractVectorClassifier learner)
Probes a model to determine the effect of a particular variable. This is done with the aid of a trace dictionary that has recorded the locations in the feature vector that are modified by various variable values. We can set these locations to 1 and then look at the resulting score. This tells us the weight the model places on that variable.

Parameters:
features - A feature vector to use (destructively)
traceDictionary - A trace dictionary containing variables and what locations in the feature vector are affected by them
learner - The model that we are probing to find weights on features
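
As a rough illustration of what "destructively" and the update/flush style imply, the following self-contained sketch mimics an update step in plain Java. It is an assumption-laden stand-in, not the Mahout implementation: UpdateSketch, its recovered map and the modelWeights array (which plays the role of the learner's linear part) are hypothetical names, and skipping keys that were already probed is a design choice of this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an update step; names and fields are assumptions.
class UpdateSketch {
    // Weights recovered so far, keyed by trace-dictionary entry.
    final Map<String, Double> recovered = new HashMap<>();

    // modelWeights stands in for the linear part of the learner.
    void update(double[] features, Map<String, Set<Integer>> trace, double[] modelWeights) {
        Arrays.fill(features, 0.0); // destructive: the caller's vector is overwritten
        for (Map.Entry<String, Set<Integer>> entry : trace.entrySet()) {
            if (recovered.containsKey(entry.getKey())) {
                continue; // already probed in an earlier update
            }
            for (int where : entry.getValue()) {
                features[where] = 1.0; // place probe values
            }
            double score = 0.0;
            for (int i = 0; i < features.length; i++) {
                score += modelWeights[i] * features[i];
            }
            recovered.put(entry.getKey(), score);
            for (int where : entry.getValue()) {
                features[where] = 0.0; // reset before the next probe
            }
        }
    }

    public static void main(String[] args) {
        double[] model = {0.25, 0.75, -1.0};
        double[] scratch = {9.0, 9.0, 9.0};
        Map<String, Set<Integer>> trace = new HashMap<>();
        UpdateSketch sketch = new UpdateSketch();

        trace.put("a", new TreeSet<>(Arrays.asList(0, 1)));
        sketch.update(scratch, trace, model);
        trace.clear(); // the dictionary can be emptied between updates

        trace.put("b", new TreeSet<>(Arrays.asList(2)));
        sketch.update(scratch, trace, model);

        System.out.println("a -> " + sketch.recovered.get("a"));
        System.out.println("b -> " + sketch.recovered.get("b"));
    }
}
```

Because the recovered weights live in the dissector rather than in the dictionary, the caller can clear the trace dictionary after every update and still get a summary that covers everything seen so far.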

summary

public List<ModelDissector.Weight> summary(int n)
Returns the n most important features, each with its weight, its most important category, and the top few categories that it affects.

Parameters:
n - How many results to return.
Returns:
A list of the top variables.
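
One plausible way to select the top n entries is a bounded min-heap, sketched below in plain Java. This is only an illustration under stated assumptions: it treats "most important" as "largest absolute weight", it uses a single weight per feature rather than per-category weights, and SummarySketch and topN are hypothetical names rather than part of this class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical top-n selection; assumes importance means absolute weight.
class SummarySketch {
    static List<String> topN(Map<String, Double> weights, int n) {
        Comparator<Map.Entry<String, Double>> byMagnitude =
                Comparator.comparingDouble(e -> Math.abs(e.getValue()));

        // Min-heap that never holds more than n entries.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(byMagnitude);
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            heap.add(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest-magnitude entry
            }
        }

        // Order the survivors from largest to smallest magnitude.
        List<Map.Entry<String, Double>> kept = new ArrayList<>(heap);
        kept.sort(byMagnitude.reversed());

        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Double> entry : kept) {
            names.add(entry.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("a", 0.5);
        weights.put("b", -3.0);
        weights.put("c", 2.0);
        weights.put("d", -0.1);
        System.out.println(topN(weights, 2)); // prints [b, c]
    }
}
```

Keeping the heap bounded at n entries means the selection costs O(m log n) for m recovered weights, which matters when the trace dictionaries have covered many features.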


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.