java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
      org.apache.hadoop.mapreduce.lib.input.TextInputFormat
        org.apache.mahout.text.wikipedia.XmlInputFormat
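public class XmlInputFormat
extends org.apache.hadoop.mapreduce.lib.input.TextInputFormat
Reads records that are delimited by a specific begin tag and end tag. Each record is the block of text from one begin tag through its matching end tag.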
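The idea of a tag-delimited record can be illustrated with a minimal, self-contained sketch. It is not the Mahout implementation: the class name `TagDelimitedRecords` and the method `extractRecords` are hypothetical, and the sketch works on an in-memory string, whereas the real record reader operates on Hadoop input splits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Mahout or Hadoop.
class TagDelimitedRecords {

    /**
     * Returns every substring of input that starts with startTag and ends
     * with the next occurrence of endTag. Text outside the tags is skipped.
     */
    static List<String> extractRecords(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = input.indexOf(startTag, from);
            if (start < 0) {
                break; // no further begin tag
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // begin tag without an end tag: the partial record is dropped
            }
            end += endTag.length();
            records.add(input.substring(start, end));
            from = end; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<doc><page>a</page>junk<page>b</page></doc>";
        for (String record : extractRecords(xml, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

In this sketch, text between records and any record without an end tag are discarded. Whether the actual reader behaves the same way in every edge case is an assumption and should be checked against the Mahout source.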
Nested Class Summary | |
---|---|
static class | XmlInputFormat.XmlRecordReader: record reader that scans an XML document and emits, as records, the XML blocks delimited by the configured start tag and end tag |
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
---|
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter |
Field Summary | |
---|---|
static String | END_TAG_KEY |
static String | START_TAG_KEY |
Constructor Summary |
---|
XmlInputFormat() |
Method Summary | |
---|---|
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.TextInputFormat |
---|
isSplitable |
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
---|
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String START_TAG_KEY
public static final String END_TAG_KEY
Constructor Detail |
---|
public XmlInputFormat()
Method Detail |
---|
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
Overrides: createRecordReader in class org.apache.hadoop.mapreduce.lib.input.TextInputFormat