Hadoop Multi-Node Cluster in Ubuntu 14.04 64-bit
1. Follow my post on setting up a single-node Hadoop cluster, and set it up on all your slave computers.
2. One PC will be the master, from where everything is controlled. All other PCs are slaves. NOTE: We will assume mypc1 is the master and the other PCs are slaves.
3. Edit the hosts file to specify the IP address at which each of your computers (the master and all slave PCs) can be reached, and modify the following lines accordingly. NOTE: The name used here can be different from the hostname set on that PC. A sketch of the entries is shown below.
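A minimal sketch of the /etc/hosts entries, assuming four machines on a local network (the 192.168.0.x addresses are placeholders; use the actual addresses of your PCs). Do this on every PC so each machine can resolve the others by name:

sudo gedit /etc/hosts

192.168.0.1 mypc1
192.168.0.2 mypc2
192.168.0.3 mypc3
192.168.0.4 mypc4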
4. Replace the code in the core-site.xml file with the following code. (Change mypc1 to the name of your Master PC.)
cd /usr/local/hadoop/etc/hadoop
sudo gedit core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://mypc1:54310</value>
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
</property>
</configuration>
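Once the cluster is up (after step 7 below), a quick way to confirm the NameNode is answering on this URI, assuming the daemons are already running:

hadoop fs -ls hdfs://mypc1:54310/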
Replace the code in the hdfs-site.xml file with the following code. (The value of Replication should be equal to the number of slave computers; in this case, 4.)
<configuration>
<property>
<name>dfs.replication</name>
<value>4</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/hdfs</value>
</property>
</configuration>
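The directory named in dfs.data.dir must exist and be writable by the Hadoop user on every machine. A minimal sketch, assuming the hduser user and hadoop group from the single-node guide:

sudo mkdir -p /usr/local/hadoop/hdfs
sudo chown -R hduser:hadoop /usr/local/hadoop/hdfs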
Replace the code in the mapred-site.xml file with the following code. (Modify mypc1 to the name of your Master PC.)
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>mypc1:54311</value>
<description>The host and port that the MapReduce job tracker runs at.</description>
</property>
</configuration>
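Note: depending on your Hadoop 2.x distribution, mapred-site.xml may not exist yet and only a template is shipped. If that is the case, create it from the template first:

cd /usr/local/hadoop/etc/hadoop
sudo cp mapred-site.xml.template mapred-site.xml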
Replace the code in the yarn-site.xml file with the following code. (Replace mypc1 with your Master Node's name or IP address.)
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>mypc1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>mypc1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>mypc1:8050</value>
</property>
</configuration>
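Once YARN is running (step 7), you can check that all the NodeManagers have registered with the ResourceManager; a quick check, assuming the daemons are already up:

yarn node -list

You can also open the ResourceManager web UI at http://mypc1:8088 in a browser (8088 is the default port).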
5. Configurations to be done only on the Master computer. (NOTE: The user should be hduser in the terminal. Redo this step if you change IPs later.) First delete the .ssh folder in hduser's home and generate a new key:

sudo rm -R /home/hduser/.ssh
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
6. Enable SSH access, so that the master can reach all computers, including the master PC mypc1 itself. (Use id_dsa.pub in case id_rsa.pub doesn't work.)
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc1
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc2
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc3
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@mypc4
Now test the SSH connection to all PCs using these commands. (SSH from your PC's hduser; i.e. type exit after you ssh into a PC, then ssh into the next one. A sketch of the pattern follows the commands below.)
ssh mypc1
ssh mypc2
ssh mypc3
ssh mypc4
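A sketch of the test pattern for one machine (hostname is just a quick way to confirm which PC you landed on):

ssh mypc2
hostname   # should print mypc2
exit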
Create a masters file and a slaves file to specify which PCs are masters and which are slaves. First change directory, then paste the following lines into the respective files:

cd /usr/local/hadoop/etc/hadoop/
sudo gedit masters

mypc1

sudo gedit slaves

mypc1
mypc2
mypc3
mypc4

Output of the masters file (see using cat /usr/local/hadoop/etc/hadoop/masters):

mypc1

Output of the slaves file (it also includes the master PC mypc1, as we want to run programs on the master PC too; see using cat /usr/local/hadoop/etc/hadoop/slaves):

mypc1
mypc2
mypc3
mypc4
7. Change dir -
cd /usr/local/hadoop/bin
Format Namenode -
hadoop namenode -format
Start the daemons (from the master) and check that they are running with jps -
start-dfs.sh
start-yarn.sh
jps
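A sketch of the expected jps output on the master, assuming everything started cleanly (the process IDs are placeholders; the master also runs DataNode and NodeManager because mypc1 is listed in the slaves file):

12701 NameNode
12886 SecondaryNameNode
13041 ResourceManager
13128 DataNode
13360 NodeManager
13471 Jps

On the slaves you should see DataNode and NodeManager.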
A warning about being unable to load native Hadoop libraries is OK; it won't affect Hadoop functionality.
To stop -
stop-dfs.sh
stop-yarn.sh
If any of the daemons are not running, follow these steps (do this on all PCs):
1. Delete the temporary files -
sudo rm -R /tmp/*
2. Only on the master - check that ssh to all the PCs is working without entering a password every time.
3. Kill all processes if they are already running (don't do it if you have any jobs running) - see the sketch below.
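A minimal sketch of one way to do this: stop everything via the stop scripts first, then kill anything jps still lists by its PID.

stop-yarn.sh
stop-dfs.sh
jps
# if a daemon is still listed, kill it by its PID, e.g.:
kill -9 <PID>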