Hadoop installation in fully distributed mode on master and slave nodes in an Ubuntu
environment is explained below.
Steps to follow when configuring the master and slave nodes
1) Add the Java package source to the repository list from a PPA (Personal Package
Archive)
2) Update the package index
$ sudo apt-get update
3) Install Java 8
$ sudo apt-get install oracle-java8-installer
4) Verify which Java version is installed
$ java -version
5) Create the same user account on all nodes for the Hadoop installation
$ sudo useradd -m cse (cse is the username; -m creates its home directory)
$ sudo passwd cse (enter a password for cse)
6) Edit the /etc/hosts file on all nodes so that it maps the IP address of each node to
its hostname. Add the following lines (the addresses shown are assumed for this example)
192.168.100.22 master
192.168.100.23 slave1
192.168.100.24 slave2
7) Install SSH on all nodes
$ sudo apt-get install openssh-server
8) Configure key-based login so that the nodes can communicate with each other
without prompting for a password
$ su cse (switch to the cse account)
$ ssh-keygen -t rsa -P "" (generates a key pair with an empty passphrase)
Then copy the public key from the master to the slave nodes.
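Steps 6 and 8 can be sketched as a small script. This is only an illustration: the helper names add_host_entry and copy_key, and the variables HOSTS_FILE, PUBKEY and DRY_RUN, are hypothetical and do not come from the original procedure. By default the script writes to a scratch file and only prints the ssh-copy-id commands, so nothing on the system is changed.

```shell
#!/usr/bin/env bash
# Step 6: append "IP hostname" to a hosts-style file unless the hostname is already listed.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # -F: fixed string, -w: whole word, so "slave1" does not match "slave10"
    if ! grep -qwF -- "$name" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Step 8: copy the public key to one node, or only print the command when DRY_RUN=1.
copy_key() {
    local target="$1"
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh-copy-id -i $PUBKEY $target"
    else
        ssh-copy-id -i "$PUBKEY" "$target"
    fi
}

# Hypothetical defaults: a scratch hosts file and dry-run mode.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"
DRY_RUN="${DRY_RUN:-1}"

add_host_entry "$HOSTS_FILE" 192.168.100.22 master
add_host_entry "$HOSTS_FILE" 192.168.100.23 slave1
add_host_entry "$HOSTS_FILE" 192.168.100.24 slave2
add_host_entry "$HOSTS_FILE" 192.168.100.22 master   # repeated call adds nothing

# Run as the cse user on the master, after ssh-keygen has created the key pair.
for node in slave1 slave2; do
    copy_key "cse@$node"
done
```

To apply the changes for real, one would presumably run the script as root with HOSTS_FILE=/etc/hosts on every node, and with DRY_RUN=0 as the cse user on the master once the key pair exists.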
Installing Hadoop on the master node
9) Create a new folder for Hadoop, say 'hadoop'
10) Download Hadoop 2.8.1 (from the Apache site)
11) Extract the tar.gz file; the folder hadoop-2.8.1 is created
12) Move the Hadoop installation folder (hadoop-2.8.1) to the newly created
directory (hadoop)
13) Make cse the owner of the Hadoop folder
$ sudo chown -R cse hadoop/hadoop-2.8.1
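Steps 9 to 13 can be sketched as follows. The download URL is an assumption (the Apache release archive layout) and should be checked against the Apache site before use. The helper unpack_hadoop is a hypothetical name, and it extracts straight into the target folder instead of extracting first and moving afterwards. The download and ownership commands are shown as comments because they need network access and root privileges.

```shell
#!/usr/bin/env bash
HADOOP_VERSION="2.8.1"
# Assumed download location (Apache release archive); verify before use.
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# Extract a Hadoop tarball directly into the target directory (steps 9, 11 and 12).
unpack_hadoop() {
    local tarball="$1" target="$2"
    mkdir -p "$target"                 # step 9: create the 'hadoop' folder
    tar -xzf "$tarball" -C "$target"   # steps 11 and 12: extract into that folder
}

# On the master, the full sequence would look like this (not run here):
#   wget "$HADOOP_URL"                                      # step 10
#   unpack_hadoop "hadoop-${HADOOP_VERSION}.tar.gz" hadoop  # steps 9, 11, 12
#   sudo chown -R cse "hadoop/hadoop-${HADOOP_VERSION}"     # step 13
```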
Configuring Hadoop on the master node
14) Several files in the etc directory of the Hadoop folder need to be configured:
a) core-site.xml (contains configuration settings that Hadoop uses when it starts, such
as the port number used for the Hadoop instance, the memory allocated for the file
system, the memory limit for storing data, and the size of the read/write buffers. It
also specifies where the NameNode runs in the cluster: property name = fs.default.name,
value = hdfs://master:54310)
b) hdfs-site.xml (contains information such as the replication factor, the NameNode
path, and the DataNode paths on the local file systems)
Property name            Value
dfs.replication          3
dfs.namenode.name.dir    hadoop/hadoop-2.8.1/namenodenew
dfs.datanode.data.dir    hadoop/hadoop-2.8.1/datanodenew
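As an illustration, the properties listed for core-site.xml and hdfs-site.xml could be written out with a short script like the one below. The values are copied verbatim from the text. The CONF_DIR variable is hypothetical and defaults to a scratch directory, so it would have to be pointed at the configuration directory of the actual installation. Note that in Hadoop 2.x fs.default.name is a deprecated alias of fs.defaultFS; the sketch keeps the name used here.

```shell
#!/usr/bin/env bash
# Hypothetical default: write into a scratch directory. On the master, set
# CONF_DIR to the configuration directory of the Hadoop installation instead.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# core-site.xml: where the NameNode runs.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: replication factor and local storage paths.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>hadoop/hadoop-2.8.1/namenodenew</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>hadoop/hadoop-2.8.1/datanodenew</value>
  </property>
</configuration>
EOF

echo "Wrote core-site.xml and hdfs-site.xml to $CONF_DIR"
```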