Hadoop Installation
hadoop-env.sh is a file that contains Hadoop environment-related properties. Here we can set
properties such as the Java home, the heap memory size, the Hadoop classpath, which version
of IP to use, etc. We will set the Java home in this file. For me the Java home is "/usr/lib/jvm/java-6-openjdk-i386", so put the following line in the file and save it.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
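If you are not sure where your JVM is installed, you can resolve it from the java binary on your path. This is an illustrative sketch; the java-6-openjdk-i386 path used in this guide is just one Ubuntu example and may differ on your machine.

```shell
# Resolve the real JVM directory from the java binary (may differ per system).
JAVA_BIN=$(readlink -f "$(command -v java)" 2>/dev/null)
# Strip the trailing /bin/java to get the directory suitable for JAVA_HOME.
echo "${JAVA_BIN%/bin/java}"
# After editing hadoop-env.sh, confirm the variable in a fresh shell:
echo "$JAVA_HOME"
```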
hdfs-site.xml is a file that contains properties related to HDFS (the Hadoop Distributed File System). We
need to set the replication factor here. By default the replication factor is 3; since we are installing
Hadoop on a single machine, we will set it to 1. Copy the following in between the configuration tags in the file.
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/hadoop_tmp/hdfs/datanode</value>
</property>
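The directories named in dfs.namenode.name.dir and dfs.datanode.data.dir must exist and be writable by the hadoop user before the NameNode is formatted. A minimal sketch, assuming the hduser user and hadoop group used in the paths above:

```shell
# Create the NameNode and DataNode storage directories from hdfs-site.xml.
mkdir -p /home/hduser/hadoop_tmp/hdfs/namenode /home/hduser/hadoop_tmp/hdfs/datanode
# Make the hadoop user the owner (hduser:hadoop is an assumption matching the paths above).
chown -R hduser:hadoop /home/hduser/hadoop_tmp
```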
mapred-site.xml is a file that contains properties related to MapReduce. In Hadoop 2.x there is
no standalone JobTracker; instead we tell MapReduce to run on the YARN framework. Copy the
following in between the configuration tags.
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
core-site.xml is a property file that contains properties which are common to, or used by, both
MapReduce and HDFS. Here we will set the IP address and port number of the machine on which the
NameNode will be running. Other properties tell Hadoop where it should store files such as the fsimage,
blocks, etc. Copy the following in between the configuration tags.
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
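To sanity-check that the NameNode URI was saved correctly, you can grep the file. This sketch assumes the Hadoop 2.x config layout under /home/hduser/hadoop/etc/hadoop/; adjust the path if your configuration files live elsewhere.

```shell
# Print the fs.default.name property and the line after it (its value).
grep -A1 'fs.default.name' /home/hduser/hadoop/etc/hadoop/core-site.xml
```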
yarn-site.xml is a file that contains YARN-related properties. Here we enable the auxiliary
shuffle service that MapReduce jobs need. Copy the following in between the configuration tags.
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Now open the terminal and edit your .bashrc so that the Hadoop environment variables are set in every shell.
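The exact variables depend on your setup; a common sketch for the layout used in this guide (the HADOOP_HOME name and /home/hduser/hadoop install path are assumptions matching the commands below) is:

```shell
# Append Hadoop environment variables to ~/.bashrc (paths are assumptions).
export HADOOP_HOME=/home/hduser/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
# Put the Hadoop control scripts on the PATH.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After saving, run `source ~/.bashrc` (or open a new terminal) so the changes take effect.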
Now open a terminal and format the NameNode with the following command. The NameNode should be
formatted only once, before you start using your Hadoop cluster; if you format the NameNode later, you
will lose all the data stored on HDFS. Notice that the "/home/hduser/hadoop/bin/" folder contains all
the important scripts to start Hadoop, stop Hadoop, access HDFS, format HDFS, etc.
/home/hduser/hadoop/bin/hadoop namenode -format
Now you can start Hadoop using the following command.
/home/hduser/hadoop/bin/start-all.sh
You can check whether Hadoop has started using the following command:
jps
It lists all running Java processes. It should show the following processes.
ResourceManager
DataNode
Jps
JobHistoryServer
NameNode
NodeManager
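The steps above can be wrapped in a quick scripted check. This is a sketch: it loops over the daemon names listed above and greps the jps output for each (the exact process set can vary slightly between Hadoop versions).

```shell
# Report which of the expected Hadoop daemons appear in the jps output.
for proc in NameNode DataNode ResourceManager NodeManager; do
  if jps 2>/dev/null | grep -q "$proc"; then
    echo "$proc is running"
  else
    echo "$proc is NOT running"
  fi
done
```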