
Steps to write a MapReduce example program (word count, sample counter, consonants and vowels)


The requirements to write any Big Data program are:
Linux - Ubuntu (12.04)
Eclipse - Kepler (or any version that supports the OS)
Hadoop framework (2.2.0 or any supported version)
Hadoop-Eclipse plugin (compatible with the Eclipse and Hadoop versions in use)
With the above requirements in place, we are ready to write the MapReduce program. The following
steps are necessary to run the word count program:
a. Start Hadoop and make sure all the nodes are running
b. Prepare the DFS location for Hadoop; the steps are:
a. Open the perspective switcher and choose the MapReduce perspective
b. Create a new Hadoop location by giving the credentials
c. Check in Eclipse that the DFS location is working (a quick programmatic check is sketched after this list)
c. Create a Java project (a MapReduce project is also an option)
d. Copy the required packages (jars) from the Hadoop installation to the Eclipse lib folder
e. Build the path for the above packages
f. Create the Java classes for the program (Mapper, Reducer and main driver classes)
g. Run the main driver program as a Java application
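
As a quick check for step b above, the small class below is a minimal sketch that lists the HDFS root directory from Java. The class name DfsCheck is only illustrative, and the NameNode address hdfs://localhost:8020 is an assumption matching the paths used in the driver program further down.

package P2;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsCheck {

    public static void main(String[] args) throws Exception {
        // hypothetical helper, not part of the word count program;
        // the NameNode URI is an assumption and must match your Hadoop configuration
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);

        // list the HDFS root directory to confirm the DFS location is reachable
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}

If this prints the HDFS directories without errors, the DFS location configured in Eclipse should work as well.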

Below is the sample program for the word count application.
The main driver program is:
package P2;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {


    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        // create a Configuration object pointing to the default configuration
        Configuration conf = new Configuration();

        // prepare a Job object
        Job job = Job.getInstance(conf, "MyWordCountJob");

        // link the driver class with the job
        job.setJarByClass(MyDriver.class);

        // link the Mapper with the job
        job.setMapperClass(MyMapper.class);

        // link the Reducer with the job
        job.setReducerClass(MyReducer.class);

        // set the final output key type
        job.setOutputKeyClass(Text.class);

        // set the final output value type
        job.setOutputValueClass(IntWritable.class);

        // set the map output key and value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // define the input path
        Path input_dir = new Path("hdfs://localhost:8020/input_data/word.txt");
        FileInputFormat.addInputPath(job, input_dir);

        // define the output path
        Path output_dir = new Path("hdfs://localhost:8020/output_data/");
        FileOutputFormat.setOutputPath(job, output_dir);

        // submit the job and exit with 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
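
Note that the output directory given to FileOutputFormat must not already exist in HDFS when the job is submitted; if it does, the job fails at startup, so /output_data/ has to be deleted (or a new path chosen) before each run.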

The mapper program is below.


package P2;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {


    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String currentLine = line.toString();
        System.out.println("MyMapper.map(): offset=" + offset + " :: currentLine=" + currentLine);

        // split the line into words on spaces
        String[] words = currentLine.split(" ");

        // emit (word, 1) for every word in the line
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
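
A small caveat: splitting on a single space keeps punctuation attached to the words and produces empty tokens when a line contains consecutive spaces; java.util.StringTokenizer, which the standard word count examples use, could be swapped in here for more forgiving tokenization.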

The reducer program is below.


package P2;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text,IntWritable,Text,IntWritable>{

    @Override
    protected void reduce(Text word, Iterable<IntWritable> arr, Context ctx)
            throws IOException, InterruptedException {

        Iterator<IntWritable> it = arr.iterator();
        int count = 0;

        // sum up all the 1s emitted for this word
        while (it.hasNext()) {
            count = count + it.next().get();
        }

        ctx.write(word, new IntWritable(count));
    }
}
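
For the consonants-and-vowels example mentioned in the title, only the mapper has to change; the reducer above can be reused as it is. The class below is a minimal sketch under that assumption: the class name MyVowelConsonantMapper and the output keys "vowels" and "consonants" are illustrative choices, not part of the original program.

package P2;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyVowelConsonantMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String currentLine = line.toString().toLowerCase();

        // classify every letter of the line as a vowel or a consonant
        for (char c : currentLine.toCharArray()) {
            if (c < 'a' || c > 'z') {
                continue; // skip spaces, digits and punctuation
            }
            if ("aeiou".indexOf(c) >= 0) {
                context.write(new Text("vowels"), new IntWritable(1));
            } else {
                context.write(new Text("consonants"), new IntWritable(1));
            }
        }
    }
}

Linking this mapper in the driver with job.setMapperClass(MyVowelConsonantMapper.class) and keeping MyReducer gives the total vowel and consonant counts for the input file.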
