
1. Fetches job resources locally.
2. Enters the copy phase to fetch local copies of all the assigned map results from the map worker nodes.
3. When the copy phase completes, executes the sort phase to merge the copied results into a single sorted set of (key, value-list) pairs.
4. When the sort phase completes, executes the reduce phase, invoking the job-supplied reduce function on each (key, value-list) pair.
5. Saves the final results to the output destination, such as HDFS.
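The copy, sort, and reduce phases above can be sketched in miniature. This is an illustrative model of the data flow only, not the Hadoop implementation; the function names are invented for this sketch:

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(map_outputs):
    """Model of the copy and sort phases: merge each mapper's (key, value)
    pairs into a single key-sorted list of (key, value-list) pairs."""
    # Copy phase: gather every mapper's output into one local list.
    copied = [pair for output in map_outputs for pair in output]
    # Sort phase: order by key, then group the values that share a key.
    copied.sort(key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(copied, key=itemgetter(0))]

def run_reduce(map_outputs, reduce_fn):
    """Model of the reduce phase: invoke reduce_fn on each pair."""
    return [reduce_fn(key, values)
            for key, values in shuffle_and_sort(map_outputs)]
```

For example, `run_reduce([[('eat', 2)], [('eat', 1)]], lambda k, vs: (k, sum(vs)))` consolidates the two mappers' pairs into `('eat', [2, 1])` and reduces them to `('eat', 3)`.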

The input to a reduce function is key-value pairs where the value is a list of values sharing the same key. For example, if one map task produces a key-value pair ('eat', 2) and another map task produces the pair ('eat', 1), then these pairs are consolidated into ('eat', (2, 1)) for input to the reduce function. If the purpose of the reduce phase is to compute a sum of all the values for each key, then the final output key-value pair for this input is ('eat', 3). For a more complete example, see Example: Calculating Word Occurrences. Output from the reduce phase is saved to the destination configured for the job, such as HDFS or MarkLogic Server. Reduce tasks use an OutputFormat subclass to record results. The Hadoop API provides OutputFormat subclasses for using HDFS as the output destination. The MarkLogic Connector for Hadoop provides OutputFormat subclasses for using a MarkLogic Server database as the destination. For a list of available subclasses, see OutputFormat Subclasses. The connector also provides classes for defin
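A summing reduce function matching the example above can be written in a few lines. This is an illustrative sketch, not the connector's API; the function name is invented here:

```python
def sum_reduce(key, values):
    """Reduce function that sums per-mapper counts for one key.

    `values` is the consolidated value list for the key, e.g. the
    (2, 1) produced for 'eat' by the two map tasks above."""
    return (key, sum(values))

print(sum_reduce('eat', (2, 1)))  # ('eat', 3)
```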
