org.apache.mahout.fpm.pfpgrowth
Class PFPGrowth

java.lang.Object
  extended by org.apache.mahout.fpm.pfpgrowth.PFPGrowth

public final class PFPGrowth
extends Object

Parallel FP Growth Driver Class. Runs each stage of PFPGrowth as described in the paper http://infolab.stanford.edu/~echang/recsys08-69.pdf


Field Summary
static String ENCODING
           
static String F_LIST
           
static String FILE_PATTERN
           
static String FP_GROWTH
           
static String FREQUENT_PATTERNS
           
static String INPUT
           
static String MAX_HEAP_SIZE
           
static String MAX_PER_GROUP
           
static String MIN_SUPPORT
           
static String NUM_GROUPS
           
static int NUM_GROUPS_DEFAULT
           
static String OUTPUT
           
static String PARALLEL_COUNTING
           
static String PFP_PARAMETERS
           
static String SPLIT_PATTERN
           
static Pattern SPLITTER
           
static String USE_FPG2
           
 
Method Summary
static int getGroup(int itemId, int maxPerGroup)
           
static IntArrayList getGroupMembers(int groupId, int maxPerGroup, int numFeatures)
           
static List<Pair<String,Long>> readFList(org.apache.hadoop.conf.Configuration conf)
          Generates the fList from the serialized string representation
static List<Pair<String,Long>> readFList(Parameters params)
          Reads the feature frequency list that is built at the end of the parallel counting job
static List<Pair<String,TopKStringPatterns>> readFrequentPattern(Parameters params)
          Read the Frequent Patterns generated from Text
static void runPFPGrowth(Parameters params)
           
static void runPFPGrowth(Parameters params, org.apache.hadoop.conf.Configuration conf)
           
static void saveFList(Iterable<Pair<String,Long>> flist, Parameters params, org.apache.hadoop.conf.Configuration conf)
          Serializes the fList so that it can later be deserialized with readFList; the method is declared void and returns nothing
static void startAggregating(Parameters params, org.apache.hadoop.conf.Configuration conf)
          Runs the aggregation job, which groups the TopK patterns by the features present in each pattern and from these computes the final top K frequent patterns for each feature
static void startParallelCounting(Parameters params, org.apache.hadoop.conf.Configuration conf)
          Count the frequencies of various features in parallel using Map/Reduce
static void startParallelFPGrowth(Parameters params, org.apache.hadoop.conf.Configuration conf)
          Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group dependent shards
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ENCODING

public static final String ENCODING
See Also:
Constant Field Values

F_LIST

public static final String F_LIST
See Also:
Constant Field Values

NUM_GROUPS

public static final String NUM_GROUPS
See Also:
Constant Field Values

NUM_GROUPS_DEFAULT

public static final int NUM_GROUPS_DEFAULT
See Also:
Constant Field Values

MAX_PER_GROUP

public static final String MAX_PER_GROUP
See Also:
Constant Field Values

OUTPUT

public static final String OUTPUT
See Also:
Constant Field Values

MIN_SUPPORT

public static final String MIN_SUPPORT
See Also:
Constant Field Values

MAX_HEAP_SIZE

public static final String MAX_HEAP_SIZE
See Also:
Constant Field Values

INPUT

public static final String INPUT
See Also:
Constant Field Values

PFP_PARAMETERS

public static final String PFP_PARAMETERS
See Also:
Constant Field Values

FILE_PATTERN

public static final String FILE_PATTERN
See Also:
Constant Field Values

FP_GROWTH

public static final String FP_GROWTH
See Also:
Constant Field Values

FREQUENT_PATTERNS

public static final String FREQUENT_PATTERNS
See Also:
Constant Field Values

PARALLEL_COUNTING

public static final String PARALLEL_COUNTING
See Also:
Constant Field Values

SPLIT_PATTERN

public static final String SPLIT_PATTERN
See Also:
Constant Field Values

USE_FPG2

public static final String USE_FPG2
See Also:
Constant Field Values

SPLITTER

public static final Pattern SPLITTER

Method Detail

readFList

public static List<Pair<String,Long>> readFList(org.apache.hadoop.conf.Configuration conf)
                                         throws IOException
Generates the fList from the serialized string representation

Returns:
Deserialized Feature Frequency List
Throws:
IOException

saveFList

public static void saveFList(Iterable<Pair<String,Long>> flist,
                             Parameters params,
                             org.apache.hadoop.conf.Configuration conf)
                      throws IOException
Serializes the fList so that it can later be deserialized with readFList; the method is declared void and returns nothing

Throws:
IOException

readFList

public static List<Pair<String,Long>> readFList(Parameters params)
Reads the feature frequency list that is built at the end of the parallel counting job

Returns:
Feature Frequency List

getGroup

public static int getGroup(int itemId,
                           int maxPerGroup)

getGroupMembers

public static IntArrayList getGroupMembers(int groupId,
                                           int maxPerGroup,
                                           int numFeatures)
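
Neither grouping method carries a description on this page. As a minimal sketch, assuming item ids are dense integers in the range 0 to numFeatures - 1 and that each group is a contiguous block of maxPerGroup ids (an assumption, not something this page confirms), the two methods could relate to each other as shown here. The class GroupingSketch is hypothetical, and List<Integer> stands in for IntArrayList so that the example needs no Mahout dependency.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupingSketch {

  private GroupingSketch() { }

  // Assumed rule: ids 0..maxPerGroup-1 map to group 0, the next block to group 1, and so on.
  static int getGroup(int itemId, int maxPerGroup) {
    return itemId / maxPerGroup;
  }

  // Inverse mapping under the same assumption; the last group may hold fewer ids.
  static List<Integer> getGroupMembers(int groupId, int maxPerGroup, int numFeatures) {
    int start = groupId * maxPerGroup;
    int end = Math.min(start + maxPerGroup, numFeatures);
    List<Integer> members = new ArrayList<>();
    for (int id = start; id < end; id++) {
      members.add(id);
    }
    return members;
  }

  public static void main(String[] args) {
    System.out.println(getGroup(7, 3));           // prints 2
    System.out.println(getGroupMembers(2, 3, 8)); // prints [6, 7]
  }
}
```

Under this assumed scheme, getGroup(id, maxPerGroup) returns g exactly when id appears in getGroupMembers(g, maxPerGroup, numFeatures).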

readFrequentPattern

public static List<Pair<String,TopKStringPatterns>> readFrequentPattern(Parameters params)
                                                                 throws IOException
Read the Frequent Patterns generated from Text

Returns:
List of TopK patterns for each string frequent feature
Throws:
IOException

runPFPGrowth

public static void runPFPGrowth(Parameters params,
                                org.apache.hadoop.conf.Configuration conf)
                         throws IOException,
                                InterruptedException,
                                ClassNotFoundException
Parameters:
params - the Parameters for the run
conf - the Hadoop Configuration
Throws:
ClassNotFoundException
InterruptedException
IOException

runPFPGrowth

public static void runPFPGrowth(Parameters params)
                         throws IOException,
                                InterruptedException,
                                ClassNotFoundException
Parameters:
params - should contain the input and output locations as string values. Additional parameters include minSupport(3), maxHeapSize(50), and numGroups(1000)
Throws:
IOException
InterruptedException
ClassNotFoundException

startAggregating

public static void startAggregating(Parameters params,
                                    org.apache.hadoop.conf.Configuration conf)
                             throws IOException,
                                    InterruptedException,
                                    ClassNotFoundException
Runs the aggregation job, which groups the TopK patterns by the features present in each pattern and from these computes the final top K frequent patterns for each feature

Throws:
IOException
InterruptedException
ClassNotFoundException
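
To make the grouping step concrete, here is a small in-memory sketch of the idea described above: each pattern is filed under every feature it contains, and only the K best patterns per feature are kept. It is not the Map/Reduce job itself. The class AggregationSketch, the FoundPattern type, and the choice to rank by support alone are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AggregationSketch {

  // Hypothetical value type: an itemset together with its support count.
  static final class FoundPattern {
    final List<String> items;
    final long support;

    FoundPattern(List<String> items, long support) {
      this.items = items;
      this.support = support;
    }
  }

  private AggregationSketch() { }

  static Map<String, List<FoundPattern>> topKPerFeature(List<FoundPattern> patterns, int k) {
    Map<String, List<FoundPattern>> byFeature = new TreeMap<>();
    // Group: every pattern is filed under each feature it contains.
    for (FoundPattern p : patterns) {
      for (String feature : p.items) {
        byFeature.computeIfAbsent(feature, f -> new ArrayList<>()).add(p);
      }
    }
    // Rank: keep only the k patterns with the highest support for each feature.
    Comparator<FoundPattern> bySupportDesc =
        Comparator.comparingLong((FoundPattern p) -> p.support).reversed();
    for (Map.Entry<String, List<FoundPattern>> e : byFeature.entrySet()) {
      List<FoundPattern> ranked = e.getValue();
      ranked.sort(bySupportDesc);
      if (ranked.size() > k) {
        e.setValue(new ArrayList<>(ranked.subList(0, k)));
      }
    }
    return byFeature;
  }
}
```

A pattern with several features appears in several result lists, which is why the output is keyed by feature rather than by pattern.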

startParallelCounting

public static void startParallelCounting(Parameters params,
                                         org.apache.hadoop.conf.Configuration conf)
                                  throws IOException,
                                         InterruptedException,
                                         ClassNotFoundException
Count the frequencies of various features in parallel using Map/Reduce

Throws:
IOException
InterruptedException
ClassNotFoundException
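
The following in-memory sketch shows the kind of feature frequency list (fList) that the counting stage is described as producing. It is not the Map/Reduce implementation. The class FListSketch, the whitespace-or-comma delimiter (the actual SPLITTER value is not shown on this page), counting each feature once per transaction, and the descending-frequency ordering are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class FListSketch {

  // Hypothetical delimiter; the real SPLITTER pattern is not given on this page.
  private static final Pattern DELIMITER = Pattern.compile("[ ,\t]+");

  private FListSketch() { }

  static List<Map.Entry<String, Long>> buildFList(List<String> transactions, long minSupport) {
    Map<String, Long> counts = new HashMap<>();
    for (String line : transactions) {
      // Count a feature once per transaction, even if it is repeated on the line.
      Set<String> seen = new HashSet<>();
      for (String item : DELIMITER.split(line.trim())) {
        if (!item.isEmpty() && seen.add(item)) {
          counts.merge(item, 1L, Long::sum);
        }
      }
    }
    // Drop features below the minimum support.
    List<Map.Entry<String, Long>> fList = new ArrayList<>();
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      if (e.getValue() >= minSupport) {
        fList.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
      }
    }
    // Most frequent first; ties are broken alphabetically to keep the order deterministic.
    fList.sort((a, b) -> {
      int byCount = Long.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    return fList;
  }
}
```

With the transactions "milk bread", "milk eggs", and "bread milk butter" and a minimum support of 2, this sketch yields milk with count 3 followed by bread with count 2.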

startParallelFPGrowth

public static void startParallelFPGrowth(Parameters params,
                                         org.apache.hadoop.conf.Configuration conf)
                                  throws IOException,
                                         InterruptedException,
                                         ClassNotFoundException
Run the Parallel FPGrowth Map/Reduce Job to calculate the Top K features of group dependent shards

Throws:
IOException
InterruptedException
ClassNotFoundException


Copyright © 2008–2014 The Apache Software Foundation. All rights reserved.