
Ex No: 9

Word count program to demonstrate the use of Map and Reduce tasks

Date:

Aim:
To write a word count program to demonstrate the use of Map and Reduce tasks.

Description:
MapReduce is a processing technique and a programming model for distributed computing
based on Java. The MapReduce algorithm contains two important tasks, namely Map and
Reduce.

- Map stage: The map or mapper's job is to process the input data. Generally the input
data is in the form of a file or directory and is stored in the Hadoop Distributed File
System (HDFS). The input file is passed to the mapper function line by line. The mapper
processes the data and creates several small chunks of data.

- Reduce stage: This stage is the combination of the Shuffle stage and the Reduce stage.
The Reducer's job is to process the data that comes from the mapper. After processing,
it produces a new set of output, which is stored in HDFS.
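For illustration (this sample input is not part of the original exercise), suppose the
input file contains the single line:

hello world hello

Map output:    (hello, 1), (world, 1), (hello, 1)
After shuffle: hello -> [1, 1], world -> [1]
Reduce output: (hello, 2), (world, 1)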

Steps in Eclipse IDE:

1. File -> New -> Java Project -> Next. Enter "wordcount" as the project name and click Finish.

2. Right-click on the wordcount project and select Properties.

3. Click Add External JARs and browse to File System -> usr -> lib -> hadoop.

4. Select all the JARs and click OK. Once again, click Add External JARs and add all the libraries in the "client" directory.

5. Right-click on src, then New -> Class. Name the class WordCount and click Finish.

6. Add the program given below.

Exporting the jar:

1. Right-click on the wordcount project and select Export -> Java -> JAR file.

2. Select the destination and the files to include.

To view the input file:

cat /home/<location of file>/wordcount.txt
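For illustration, if wordcount.txt holds the sample line from the Description's example,
the cat command would simply print:

hello world hello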

Program:

package wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Split each input line into tokens and emit (word, 1) for every token.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        // Sum the counts collected for each word and emit (word, total).
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner: summing is associative and commutative.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: HDFS output path (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
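Note on the combiner: because IntSumReducer is also set as the combiner, partial sums are
computed on the map side before the shuffle. For the sample line above, the map output
(hello, 1), (hello, 1), (world, 1) would be combined locally into (hello, 2), (world, 1),
reducing the data transferred between nodes. This is safe only because addition is
associative and commutative.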

Creating and setting up the path:

To list files:

hadoop fs -ls

To create a directory:

hadoop fs -mkdir input
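Note: a relative path such as input is created under the user's HDFS home directory,
typically /user/<username>/input. It can be checked with (the <username> placeholder is
illustrative):

hadoop fs -ls /user/<username>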


Execution:

hadoop fs -put /home/loc/wordcount.txt input/wordcount1.txt

hadoop fs -ls input

To Run:

hadoop jar /home/.../wordcount.jar wordcount.WordCount input/wordcount1.txt output


Output Commands:

hadoop fs -ls output

hadoop fs -cat output/part-r-00000
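For the sample line "hello world hello" used above, the cat command would print
tab-separated word counts along the lines of:

hello	2
world	1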

Result:
Thus the word count program to demonstrate the use of Map and Reduce tasks was
written, executed, and the output was verified.
