
Big-Data Hadoop

Minimum System Requirements:


Processor: Intel i3 or higher, with 64-bit architecture
RAM: 4 GB
Disk: 30 GB of free space on the C drive.

Pseudo-distributed mode


1. Windows users, please download and install VMware Workstation 11 from the following
download link:
https://my.vmware.com/web/vmware/info/slug/desktop_end_user_computing/vmware_workstation/11_0
and complete an online registration, which provides a 30-day trial license. [This is about a 350 MB
download.] If you cannot download it, install VMware Player instead; you will still be able to complete
more than 95% of the exercises.
2. Mac users, please install VMware Fusion 7 (instead of Workstation 11) on your machine:
http://www.vmware.com/in/products/fusion/fusion-evaluation.html
and complete an online registration, which provides a 30-day trial license.
3. Install the VMware Workstation or VMware Fusion that you downloaded, as appropriate for your system.
4. Download the Ubuntu operating system ISO image (64-bit or 32-bit) from the link below:
http://www.ubuntu.com/download/desktop

5. Install Ubuntu OS in VMware.


When installing Ubuntu you may face the error below:
This kernel requires an x86-64 CPU, but only detected an i686 CPU.
Unable to boot - please use a kernel appropriate for your CPU.
Solution: enable VT-x (virtualization technology) in the BIOS/boot setup.
Then try to install the Ubuntu operating system again. Once the installation is done, follow the steps below in the
Ubuntu terminal.
6. Add a new group called "hadoop" using the following command.
sudo addgroup hadoop
It will prompt for a password; enter the password of the user you are logged in as. sudo is used when we
want to run a command as the superuser (the administrator of the system).
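To confirm the group was created, you can look it up with, for example:
getent group hadoop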

7. Add a new user for Hadoop in the group.


sudo adduser --ingroup hadoop hduser
hduser is the name of the new user. Once you hit Enter for this command, it will ask for different things,
such as the password you want to set, the display name of the user, etc. Give appropriate answers. You can
also set the display name to hduser for easy maintenance.
8. Now add hduser to the list of sudoers, so that you can run any command with sudo when you are
logged in as hduser.
sudo adduser hduser sudo
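You can verify that hduser is now a member of both the hadoop and sudo groups with, for example:
groups hduser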
9. Now log out and log back in as hduser.
10. Open the terminal and type the command below:
sudo apt-get update
It will prompt for a password; type your hduser password.
11. Install Java using the command below:
sudo apt-get install openjdk-6-jdk
It will prompt for a password; type your hduser password.
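You can check that Java installed correctly with, for example:
java -version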
12. Install the ssh server using the command below:
sudo apt-get install openssh-server
Once the ssh-server installation is done, generate an ssh key pair using the command below:
ssh-keygen
It will prompt you for a path in which to store the keys; don't type anything, just press Enter.
This command generates two keys at "/home/hduser/.ssh/": id_rsa and id_rsa.pub.
id_rsa is the private key.
id_rsa.pub is the public key.
If I want to log in to a remote machine X, I share my public key with machine X. In our case it is the
local machine, so the following command is used.
ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@localhost
This will prompt for a password. Give the password for hduser.
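To verify that passwordless login works, you can try, for example:
ssh localhost
exit
It should log you in without asking for a password.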

Hadoop Installation

Download hadoop 2.7.1.


Extract Hadoop and put it in the folder "/home/hduser/hadoop", for example as shown below.
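A minimal sketch of downloading and extracting, assuming the Apache archive URL for Hadoop 2.7.1 (your mirror or exact URL may differ):
cd /home/hduser
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -xzf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop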
Now we need to make changes in the Hadoop configuration files. You will find these files in the
"/home/hduser/hadoop/etc/hadoop" folder (this is the path on my system; in Hadoop 2.x the configuration files live under etc/hadoop).
There are 5 important files in this folder:
a) hadoop-env.sh
b) hdfs-site.xml
c) mapred-site.xml
d) core-site.xml
e) yarn-site.xml

hadoop-env.sh is a file which contains Hadoop environment related properties. Here we can set
properties such as where the Java home is, what the heap memory size is, what the Hadoop classpath is, which
version of IP to use, etc. We will set the Java home in this file. For me the Java home is "/usr/lib/jvm/java-6-openjdk-i386", so put the following line in the file and save.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
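The exact JVM directory varies by system and by the JDK you installed, so check it before setting JAVA_HOME; one way to find it is, for example:
readlink -f /usr/bin/java
This prints the full path of the Java binary (something like /usr/lib/jvm/java-6-openjdk-i386/jre/bin/java); use the directory up to the JVM folder as JAVA_HOME.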

hdfs-site.xml is the file which contains properties related to HDFS (Hadoop Distributed File System). We
need to set the replication factor here. By default the replication factor is 3; since we are installing
Hadoop on a single machine, we will set it to 1. Copy the following in between the configuration tags in the file.
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/hadoop_tmp/hdfs/datanode</value>
</property>
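These namenode and datanode directories must exist before you format HDFS; you can create them with, for example:
mkdir -p /home/hduser/hadoop_tmp/hdfs/namenode
mkdir -p /home/hduser/hadoop_tmp/hdfs/datanode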

mapred-site.xml is a file that contains properties related to MapReduce. In Hadoop 1.x this is where the IP
address and port of the machine running the JobTracker were set; here we simply tell Hadoop to run MapReduce
on the YARN framework. Copy the following in between the configuration tags.

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
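In Hadoop 2.7.1 this file is not present by default; if you only find mapred-site.xml.template, copy it first (the path assumes the layout used above):
cp /home/hduser/hadoop/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop/etc/hadoop/mapred-site.xml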

core-site.xml is the property file which contains properties that are common to, or used by, both MapReduce
and HDFS. Here we set the IP address and port number of the machine on which the NameNode
will be running. Other properties in this file tell Hadoop where it should store files such as the fsimage and blocks.
Copy the following in between the configuration tags.

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
yarn-site.xml contains YARN-related properties (ResourceManager and NodeManager). Copy the following in between the configuration tags.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Now open the terminal and edit .bashrc using the command below:

vi .bashrc or gedit .bashrc


Here we are going to set the Java path and the Hadoop path. .bashrc is a shell script that Bash runs
whenever it is started interactively. You can put any command in that file that you could type at the
command prompt. Set JAVA_HOME to the same Java home you used in hadoop-env.sh.
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Save and exit (press Esc, then type :wq).
Then reload the .bashrc file using the command below:
. .bashrc
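To confirm the environment variables are picked up, you can run, for example:
hadoop version
It should print the Hadoop 2.7.1 version information.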

Now open a terminal and format the NameNode with the following command. The NameNode should be
formatted only once, before you start using your Hadoop cluster; if you format the NameNode later, you
will lose all the data stored on HDFS. Notice that the "/home/hduser/hadoop/bin/" and "/home/hduser/hadoop/sbin/"
folders contain all the important scripts to access HDFS, format HDFS, and start and stop Hadoop.
/home/hduser/hadoop/bin/hadoop namenode -format
Now you can start Hadoop using the following command.
/home/hduser/hadoop/sbin/start-all.sh
You can check whether Hadoop has started using the following command:
jps
It shows all running Java processes; it should show the following processes.

ResourceManager
DataNode
Jps
SecondaryNameNode
NameNode
NodeManager
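As a quick smoke test, you can try a couple of HDFS commands, for example:
hdfs dfs -mkdir -p /user/hduser
hdfs dfs -ls /
You can also browse the NameNode web UI at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088 (default ports in Hadoop 2.x).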
