org.apache.mahout.cf.taste.hadoop.similarity.item
Class ItemSimilarityJob

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public final class ItemSimilarityJob
extends AbstractJob

Distributed precomputation of the item-item similarities for item-based collaborative filtering.

Preferences in the input file should look like userID,itemID[,preferencevalue]

Preference value is optional to accommodate applications that have no notion of a preference value (that is, the user simply expresses a preference for an item, but no degree of preference).

The preference value is assumed to be parseable as a double. The user IDs and item IDs are parsed as longs.
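
The line format described above can be illustrated with a minimal, self-contained sketch. This is not Mahout's own parsing code: the class PreferenceLineExample and the holder ParsedPreference are hypothetical names introduced only for illustration, and the sketch assumes a plain comma delimiter with no header line.

```java
import java.util.OptionalDouble;

public class PreferenceLineExample {

    /** Hypothetical holder for one parsed line; not a Mahout class. */
    static final class ParsedPreference {
        final long userID;
        final long itemID;
        final OptionalDouble value; // empty when the line carries no preference value

        ParsedPreference(long userID, long itemID, OptionalDouble value) {
            this.userID = userID;
            this.itemID = itemID;
            this.value = value;
        }
    }

    /** Parses one line of the form userID,itemID[,preferencevalue]. */
    static ParsedPreference parse(String line) {
        String[] tokens = line.trim().split(",");
        if (tokens.length < 2 || tokens.length > 3) {
            throw new IllegalArgumentException(
                "Expected userID,itemID[,preferencevalue] but got: " + line);
        }
        // IDs are parsed as longs, the optional value as a double.
        long userID = Long.parseLong(tokens[0].trim());
        long itemID = Long.parseLong(tokens[1].trim());
        OptionalDouble value = tokens.length == 3
            ? OptionalDouble.of(Double.parseDouble(tokens[2].trim()))
            : OptionalDouble.empty();
        return new ParsedPreference(userID, itemID, value);
    }

    public static void main(String[] args) {
        for (String line : new String[] {"42,1001,3.5", "42,1002"}) {
            ParsedPreference p = parse(line);
            System.out.println(p.userID + " -> " + p.itemID
                + (p.value.isPresent() ? " (value " + p.value.getAsDouble() + ")" : " (no value)"));
        }
    }
}
```

The second sample line has no third field, which corresponds to the case where a user expresses a preference without a degree of preference.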

Command line arguments specific to this class are:

  1. --input (path): Directory containing one or more text files with the preference data
  2. --output (path): Output path where the similarity data should be written
  3. --similarityClassname (classname): Name of the distributed similarity measure class to instantiate, or a predefined similarity from VectorSimilarityMeasure
  4. --maxSimilaritiesPerItem (integer): Maximum number of similarities considered per item (default: 100)
  5. --maxPrefsPerUser (integer): Maximum number of preferences to consider per user; users with more preferences are sampled down (default: 1000)
  6. --minPrefsPerUser (integer): Ignore users with fewer preferences than this (default: 1)
  7. --booleanData (boolean): Treat the input data as having no preference values (default: false)
  8. --threshold (double): Discard item pairs with a similarity value below this
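
To make the effect of --maxSimilaritiesPerItem and --threshold concrete, the following in-memory sketch prunes the candidate similarities of a single item. It only illustrates the semantics stated in the list above and is not the distributed implementation used by this job; the class TopSimilaritiesExample, the method prune, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopSimilaritiesExample {

    /**
     * Keeps at most maxSimilaritiesPerItem entries whose similarity is not
     * below the threshold, ordered from most to least similar.
     * Keys are the IDs of the other items, values are similarity scores.
     */
    static List<Map.Entry<Long, Double>> prune(Map<Long, Double> similarities,
                                               double threshold,
                                               int maxSimilaritiesPerItem) {
        List<Map.Entry<Long, Double>> kept = new ArrayList<>();
        for (Map.Entry<Long, Double> entry : similarities.entrySet()) {
            if (entry.getValue() >= threshold) { // discard pairs below the threshold
                kept.add(entry);
            }
        }
        // Highest similarity first.
        kept.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        int limit = Math.min(maxSimilaritiesPerItem, kept.size());
        return new ArrayList<>(kept.subList(0, limit));
    }

    public static void main(String[] args) {
        // Hypothetical similarities of one item to four other items.
        Map<Long, Double> similarities = new LinkedHashMap<>();
        similarities.put(2L, 0.9);
        similarities.put(3L, 0.2);
        similarities.put(4L, 0.6);
        similarities.put(5L, 0.75);

        for (Map.Entry<Long, Double> entry : prune(similarities, 0.5, 2)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With a threshold of 0.5 and a limit of 2, item 3 is dropped by the threshold and item 4 by the limit, leaving items 2 and 5.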

General command line options are documented in AbstractJob.

Note that because of how Hadoop parses arguments, all "-D" arguments must appear before all other arguments.
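
The ordering rule above can be sketched with a small helper that assembles the argument array so that every "-D" property precedes the job-specific options. The class ItemSimilarityArgsExample, the method buildArgs, the paths, and the property value are hypothetical; SIMILARITY_LOGLIKELIHOOD is assumed to be one of the predefined similarity names and should be checked against the Mahout version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ItemSimilarityArgsExample {

    /**
     * Builds an argument array with all -D properties first, followed by
     * the job options as "--name value" pairs, preserving insertion order.
     */
    static String[] buildArgs(Map<String, String> hadoopProperties,
                              Map<String, String> jobOptions) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> property : hadoopProperties.entrySet()) {
            args.add("-D" + property.getKey() + "=" + property.getValue());
        }
        for (Map.Entry<String, String> option : jobOptions.entrySet()) {
            args.add("--" + option.getKey());
            args.add(option.getValue());
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        Map<String, String> hadoopProperties = new LinkedHashMap<>();
        hadoopProperties.put("mapred.reduce.tasks", "4"); // illustrative property only

        Map<String, String> jobOptions = new LinkedHashMap<>();
        jobOptions.put("input", "/path/to/prefs");         // hypothetical path
        jobOptions.put("output", "/path/to/similarities"); // hypothetical path
        jobOptions.put("similarityClassname", "SIMILARITY_LOGLIKELIHOOD"); // assumed name
        jobOptions.put("maxSimilaritiesPerItem", "50");
        jobOptions.put("threshold", "0.1");

        System.out.println(String.join(" ", buildArgs(hadoopProperties, jobOptions)));
    }
}
```

With Mahout and Hadoop on the classpath, an array built this way could be passed to ToolRunner.run(new Configuration(), new ItemSimilarityJob(), args), since this class implements org.apache.hadoop.util.Tool.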


Nested Class Summary
static class ItemSimilarityJob.MostSimilarItemPairsMapper
           
static class ItemSimilarityJob.MostSimilarItemPairsReducer
           
 
Field Summary
static String ITEM_ID_INDEX_PATH_STR
           
static String MAX_SIMILARITIES_PER_ITEM
           
 
Fields inherited from class org.apache.mahout.common.AbstractJob
argMap, inputFile, inputPath, outputFile, outputPath, tempPath
 
Constructor Summary
ItemSimilarityJob()
           
 
Method Summary
static void main(String[] args)
           
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getConf, getDimensions, getFloat, getFloat, getGroup, getInputFile, getInputPath, getInt, getInt, getOption, getOption, getOption, getOptions, getOutputFile, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, prepareJob, setConf, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ITEM_ID_INDEX_PATH_STR

public static final String ITEM_ID_INDEX_PATH_STR

MAX_SIMILARITIES_PER_ITEM

public static final String MAX_SIMILARITIES_PER_ITEM

Constructor Detail

ItemSimilarityJob

public ItemSimilarityJob()

Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.