
Setting up a Hadoop 2.4.1 multi-node cluster in Ubuntu 14.04 64-bit

1. Follow my post on setting up a single-node Hadoop cluster and do that setup on all of your slave computers.

2. One PC will be the master, from which everything is controlled. All other PCs are slaves. NOTE: We will assume mypc1 is the master and the other PCs are slaves.
3. Edit the hosts file so it says at which IP address each of your computers (the master and all slave PCs) can be reached, and modify the following lines accordingly. NOTE: The hostname used here can be different from the hostname set on that PC.

sudo gedit /etc/hosts


For example, if PC1 is the name of a computer and its IP is 10.200.1.8, the hostname in its hosts file entry can be mypc2 or anything else.
NOTE: Remove the line starting with 127.0.1.1 in the hosts file. You should also have the same PC name (as specified by the 127.0.0.1 line, in this case PC1) in both the /etc/hosts file and the /etc/hostname file, or else you will get a "host not found" error. Restart the system for the changes to take effect.
127.0.0.1 localhost PC1
10.200.1.7 mypc1
10.200.1.8 mypc2
10.200.1.9 mypc3
10.200.1.10 mypc4
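
A quick way to sanity-check the hosts file on each machine is to resolve every name and confirm the IPs match the table above (a small optional check; the hostnames are the example ones used in this post):

getent hosts mypc1 mypc2 mypc3 mypc4
# each line should show the IP from /etc/hosts, e.g. 10.200.1.8 mypc2
ping -c 1 mypc2
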
4. Configurations to be done in both Master and Slave Computers

Replace the code in the core-site.xml file with the following code. Change mypc1 to the name of your Master PC. (First change directory using the cd command below.)

cd /usr/local/hadoop/etc/hadoop
sudo gedit core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://mypc1:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
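
After saving the file, a quick way to confirm that Hadoop actually picks the value up (a small optional check, run on any node once the file is in place):

hdfs getconf -confKey fs.default.name
# should print hdfs://mypc1:54310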

Replace the code in the hdfs-site.xml file with the following code. (The value of replication should equal the number of slave computers; in this case, 4.)

sudo gedit hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>4</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs</value>
    <description>Directory to store files in HDFS.
    This directory is not formatted when namenode is formatted.
    </description>
  </property>
</configuration>
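
The dfs.data.dir path above must exist and be writable by hduser on every node, or the DataNode on that node will fail to start. A minimal sketch, assuming the hduser user and hadoop group from the single-node post:

sudo mkdir -p /usr/local/hadoop/hdfs
sudo chown -R hduser:hadoop /usr/local/hadoop/hdfs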

Replace the code in the mapred-site.xml file with the following code. (Modify mypc1 to the name of your Master PC.)

sudo gedit mapred-site.xml
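
On a fresh Hadoop 2.4.x install the mapred-site.xml file may not exist yet; only a template ships with the tarball. If gedit opened an empty new file, you can start from the template instead (optional, assuming the install path used above):

sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml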

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>mypc1:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
</configuration>

Replace the code in the yarn-site.xml file with the following code. (Replace mypc1 with your Master Node's name or IP address.)

sudo gedit yarn-site.xml

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>mypc1:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>mypc1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>mypc1:8050</value>
  </property>
</configuration>
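
Since these four files must be identical on every node, one option instead of re-editing them on each slave is to copy them from the master once passwordless SSH (step 6 below) is working. A rough sketch, assuming the same /usr/local/hadoop path on all PCs and that hduser can write to that directory (otherwise copy to a temporary location and move the files with sudo):

for host in mypc2 mypc3 mypc4; do
  scp /usr/local/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml} hduser@$host:/usr/local/hadoop/etc/hadoop/
done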

5. Configurations to be done only on the Master Computer. (NOTE: The user should be hduser in the terminal. Redo this step if you change IPs later.) First delete the .ssh folder in hduser's home and generate a new key:

sudo rm -R /home/hduser/.ssh && ssh-keygen -t rsa -P "" && cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
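
If SSH still asks for a password later on, one common cause (an optional extra, not part of the original steps) is overly open permissions on the key files, which OpenSSH refuses to use:

chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys
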
6. Enabling SSH access, so that the master can reach all computers, including the master PC mypc1 itself. (Use id_dsa.pub in case id_rsa.pub doesn't work.)

ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc1
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc2
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc3
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc4

Now test the SSH connection to all PCs using these commands. (SSH from your PC's hduser account, i.e. type exit after you SSH into a PC and then SSH into the next one.)

ssh mypc1
ssh mypc2
ssh mypc3
ssh mypc4
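
To check all four in one go, a small optional loop works as well (same hostnames as above); each line should print the remote hostname without asking for a password:

for host in mypc1 mypc2 mypc3 mypc4; do ssh $host hostname; done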

Create a masters file and a slaves file to specify which PCs are masters and which are slaves. First change directory, then paste the following lines into each file respectively:

cd /usr/local/hadoop/etc/hadoop/
sudo gedit masters

mypc1

sudo gedit slaves

mypc1
mypc2
mypc3
mypc4

Finally, the output of the masters file should be (check using cat /usr/local/hadoop/etc/hadoop/masters):

mypc1

The output of the slaves file (it also includes the master PC mypc1, as we want to run programs on the master PC too) should be (check using cat /usr/local/hadoop/etc/hadoop/slaves):

mypc1
mypc2
mypc3
mypc4

7. Testing Time!! - On the Master PC (NOTE: the user should be hduser in the terminal).

Change directory:

cd /usr/local/hadoop/bin

Format the NameNode:

hadoop namenode -format

By starting the HDFS daemons (start-dfs.sh), the NameNode daemon is started on the Master PC and DataNode daemons are started on all nodes listed in the slaves file (which here includes the master). By starting the YARN daemons (start-yarn.sh), the ResourceManager daemon is started on the Master PC and NodeManager daemons are started on all PCs. NOTE: Check whether the respective daemons are running by using jps.

cd /usr/local/hadoop/etc/hadoop
start-dfs.sh && start-yarn.sh
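
Roughly what jps should report once everything is up (a sketch only: process IDs will differ, and this assumes mypc1 is also listed in the slaves file as configured above):

ssh mypc1 jps   # expect NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager, Jps
ssh mypc2 jps   # expect DataNode, NodeManager, Jps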

A warning about being unable to load native Hadoop libraries is OK; it won't affect Hadoop's functionality.

To stop the daemons:

stop-dfs.sh
stop-yarn.sh

8. Web Interfaces (NOTE: mypc1 can be replaced by the IP address of the Master PC)

Web UI displaying all cluster info - http://mypc1:8088/

Web UI of the cluster health and the NameNode - http://mypc1:50070/

Web UI of the DataNodes, for accessing logs & browsing the file system - http://mypc1:50075/
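
To confirm the web UIs are actually listening without opening a browser, a quick optional check from any PC in the cluster (assumes curl is installed; the ResourceManager UI may answer with a redirect instead of 200):

curl -s -o /dev/null -w "%{http_code}\n" http://mypc1:8088/
curl -s -o /dev/null -w "%{http_code}\n" http://mypc1:50070/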

If any of the daemons are not running, follow these steps (do this on all PCs):

1. Remove temporary folders:

sudo rm -R /tmp/*

2. Only on the master - check that SSH to all the PCs works without asking for a password every time.

3. Kill all the processes if they are already running (don't do this if you have any jobs running):

sudo kill -9 $(lsof -ti:8088) && sudo kill -9 $(lsof -ti:8042) && sudo kill -9 $(lsof -ti:50070) && sudo kill -9 $(lsof -ti:50075) && sudo kill -9 $(lsof -ti:50090)

4. Format the NameNode (it should not ask you to re-format the file system, and the exit status should be 0):

hadoop namenode -format

5. From the master, start all the daemons again.
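
Once all the daemons are up again, a simple end-to-end test is to run one of the bundled example jobs from the master. A sketch only: the jar path assumes the Hadoop 2.4.1 tarball was unpacked to /usr/local/hadoop as in the single-node post, so adjust the version in the filename if yours differs:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 2 10
# a successful run prints an estimate of pi and the job shows up at http://mypc1:8088/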

Comments are highly appreciated.
