AICTE SPONSORED Faculty Development Programme (FDP) on “DATA SCIENCE RESEARCH AND
BIG DATA ANALYTICS” scheduled from 11.12.2017 to 23.12.2017
Introduction:
Apache Hadoop is an open-source framework built for distributed Big Data storage and processing across computer clusters. The project is based on the following components:
1. Hadoop Common – it contains the Java libraries and utilities needed by other
Hadoop modules.
2. HDFS – Hadoop Distributed File System – A Java based scalable file system
distributed across multiple nodes.
3. MapReduce – A YARN-based framework for parallel processing of large data sets.
4. Hadoop YARN – A framework for cluster resource management.
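Once installed, each component surfaces through its own command-line entry point; for illustration (assuming the Hadoop binaries are on the PATH, as configured below):
$ hadoop version        # Hadoop Common
$ hdfs dfsadmin -report # HDFS: DataNode status report
$ mapred job -list      # MapReduce: list running jobs
$ yarn node -list       # YARN: list active NodeManagers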
Procedure:
Append the following environment variables to the hadoop user's shell profile (e.g. .bash_profile):
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
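After appending these variables, reload the profile and confirm that the Hadoop binaries resolve (a quick sanity check; the profile file name may differ on your system):
$ source ~/.bash_profile
$ hadoop version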
$ vi etc/hadoop/hdfs-site.xml
Add the following excerpt to the hdfs-site.xml file:
<property>
<name>dfs.name.dir</name>
<value>file:///opt/volume/namenode</value>
</property>
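The filesystem URI itself goes in core-site.xml; a minimal excerpt (the hostname and port are placeholders for your NameNode address):
$ vi etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>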
$ vi etc/hadoop/yarn-site.xml
Add the following excerpt to the yarn-site.xml file:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
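For jobs to run on YARN, mapred-site.xml typically sets the execution framework as well (a standard excerpt, not part of the original listing):
$ vi etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>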
$ vi etc/hadoop/hadoop-env.sh
Edit the following line to point to your Java system path.
export JAVA_HOME=/usr/java/default/
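If you are unsure of the Java path on your system, it can usually be discovered as follows (strip the trailing /jre/bin/java or /bin/java to obtain the JAVA_HOME value):
$ readlink -f $(which java)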
(Screenshot: Hadoop NodeManager web UI)
Step 7: Manage Hadoop Services
To stop all Hadoop instances, run the commands below:
$ stop-yarn.sh
$ stop-dfs.sh
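To bring the cluster back up, start the daemons again in the reverse order:
$ start-dfs.sh
$ start-yarn.sh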
Then, add execute permission to the rc.local file, and enable, start, and check the service status by issuing the commands below:
$ chmod +x /etc/rc.d/rc.local
$ systemctl enable rc-local
$ systemctl start rc-local
$ systemctl status rc-local
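A minimal rc.local sketch that brings the daemons up at boot might look like this (the hadoop user and the /opt/hadoop path are taken from the earlier steps; adapt to your layout):
#!/bin/bash
# Start HDFS and YARN as the hadoop user at boot time
su - hadoop -c "/opt/hadoop/sbin/start-dfs.sh"
su - hadoop -c "/opt/hadoop/sbin/start-yarn.sh"
exit 0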
What is FUSE?
• FUSE (Filesystem in Userspace) lets you write a normal userland application as a bridge to a conventional file system interface.
• The hadoop-hdfs-fuse package lets you use your HDFS cluster as if it were a conventional file system on Linux.
• It is assumed that you have a working HDFS cluster and know the hostname and port that your NameNode exposes.
• The Hadoop FUSE installation and configuration, including mounting HDFS through FUSE, is done by following the steps below.
Step 1 : Required Dependencies
Step 2 : Download and Install FUSE
Step 3 : Install RPM Packages
Step 4 : Modify HDFS FUSE
Step 5 : Check HADOOP Services
Step 6 : Create a Directory to Mount HADOOP
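Once these steps are complete, HDFS can be mounted and browsed like a local directory. A typical invocation with the hadoop-hdfs-fuse package looks like this (the NameNode hostname, port, and mount point are placeholders):
$ hadoop-fuse-dfs dfs://namenode.example.com:8020 /mnt/hdfs
$ ls /mnt/hdfs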
Extract hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
[hadoop@hadoop ~]$ tar -zxvf hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ToolRunner;
if (args.length < 2) {
System.err.println("HdfsWriter [local input path] [hdfs output path]");
return 1;
}
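The listing above is an excerpt of the HdfsWriter program; a minimal complete version along the same lines (a sketch using the standard FileSystem API, with the surrounding class structure assumed) is:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class HdfsWriter extends Configured implements Tool
{
public int run(String[] args) throws Exception
{
if (args.length < 2) {
System.err.println("HdfsWriter [local input path] [hdfs output path]");
return 1;
}
String localInputPath = args[0];
Path outputPath = new Path(args[1]);
Configuration conf = getConf();
// Obtain the HDFS handle from the configuration and stream the local file in
FileSystem fs = FileSystem.get(conf);
OutputStream os = fs.create(outputPath);
InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
IOUtils.copyBytes(is, os, conf); // copies and closes both streams
return 0;
}
public static void main(String[] args) throws Exception
{
System.exit(ToolRunner.run(new HdfsWriter(), args));
}
}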
Step 3: Verify whether the file is written into HDFS and check the contents of the
file.
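For example (the paths are placeholders for whatever was written in the previous step):
[hadoop@localhost ~]$ hdfs dfs -ls /user/hadoop
[hadoop@localhost ~]$ hdfs dfs -cat /user/hadoop/output.txt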
MAP REDUCE
[student@localhost ~]$ su
Password:
[root@localhost student]# su - hadoop
Last login: Wed Aug 31 10:14:26 IST 2016 on pts/1
[hadoop@localhost ~]$ mkdir mapreduce
[hadoop@localhost ~]$ cd mapreduce
[hadoop@localhost mapreduce]$ vi WordCountMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
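With the mapper in place, create the reducer class (the file name below is assumed from the standard WordCount layout):
[hadoop@localhost mapreduce]$ vi WordCountReducer.java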
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;
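A minimal reducer matching these imports, following the standard WordCount example (the class body is a sketch, not from the original listing):
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
{
// Sum the per-word counts emitted by the mappers
int sum = 0;
Iterator<IntWritable> it = values.iterator();
while (it.hasNext())
{
sum += it.next().get();
}
context.write(key, new IntWritable(sum));
}
}
Then create the driver class (file name assumed):
[hadoop@localhost mapreduce]$ vi WordCount.java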
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
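// A driver skeleton following the standard WordCount pattern; the
// job-configuration lines below are assumed, not from the original listing.
public class WordCount extends Configured implements Tool
{
public int run(String[] args) throws Exception
{
Configuration conf = getConf();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);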
// to accept the HDFS input and output directories at run time
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
return job.waitForCompletion(true) ? 0 : 1;
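}
public static void main(String[] args) throws Exception
{
System.exit(ToolRunner.run(new WordCount(), args));
}
}
The job can then be compiled, packaged, and run roughly as follows (the jar name and HDFS paths are placeholders):
[hadoop@localhost mapreduce]$ javac -classpath $(hadoop classpath) *.java
[hadoop@localhost mapreduce]$ jar cf wc.jar *.class
[hadoop@localhost mapreduce]$ hadoop jar wc.jar WordCount /input /output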
********************************************
“Knowing is not enough
We must apply
Willing is not enough
We must do”
Best Wishes
By
D. Kesavaraja, M.E., (Ph.D.), MISTE, AMIE
Assistant Professor/CSE
Tiruchendur
Website: www.k7cloud.in | Mail: k7cloud@gmail.com | Mobile: +91 9865213214