-------------Prepare to Start the Hadoop Cluster:------------------
Unpack the downloaded Hadoop distribution. In the distribution, edit the file hadoop-env.sh with the following command:
sudo vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
and define some parameters as follows:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
-------------Setting Global Variables (for both users)--------------------
sudo vi ~/.bashrc
Add the following at the end of the file:
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
#export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
After adding the above lines, reload the settings using: source ~/.bashrc
Try the following command:
$ hadoop
---------------------------------Standalone Operation:---------------------------
(might not run if we have already set up Hadoop for pseudo-distributed mode, i.e. Connection refused)
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp /usr/local/hadoop/etc/hadoop/*.xml input
$ stop-dfs.sh    (stop any running HDFS daemons first if Hadoop was earlier set up for pseudo-distributed mode)
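The grep job itself can then be run with the examples jar that ships with Hadoop; a sketch of that step, assuming the /usr/local/hadoop install used throughout these notes (the exact jar file name varies by release, hence the wildcard):

```shell
# Run the bundled grep example: find every match of the given regular
# expression in the files under input/ and write the matches to output/.
# The jar path assumes the /usr/local/hadoop install from these notes.
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    grep input output 'dfs[a-z.]+'
# Inspect the results
cat output/*
```

Note that the job fails if the output directory already exists, so remove it (rm -r output) before re-running.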
-----------------------YARN on Single Node:--------------------------------
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running the ResourceManager daemon and NodeManager daemon in addition.
The following instructions assume that steps 1–4 of the above instructions have already been executed.
1. Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
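On many Hadoop 2.x releases, etc/hadoop/mapred-site.xml does not exist out of the box and can be created from the shipped template before adding the property above (paths assume the /usr/local/hadoop install used in these notes):

```shell
# mapred-site.xml is not shipped by default on Hadoop 2.x;
# create it from the template, then edit in the yarn framework property.
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template \
   /usr/local/hadoop/etc/hadoop/mapred-site.xml
```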
2. Start ResourceManager daemon and NodeManager daemon:
$ start-yarn.sh
Use 'jps' to check the running daemons.
3. Browse the web interface for the ResourceManager; by default it is available
at:
ResourceManager - http://localhost:8088/
4. Run a MapReduce job.
5. When you're done, stop the daemons with:
$ stop-yarn.sh
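For step 4, any of the bundled examples can serve as the MapReduce job; a sketch using the pi estimator from the examples jar (the exact jar file name varies by release, hence the wildcard):

```shell
# Submit a sample MapReduce job to YARN: estimate pi using
# 2 map tasks with 5 samples each.
# The jar path assumes the /usr/local/hadoop install from these notes.
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    pi 2 5
```

While the job runs, it should appear on the ResourceManager web interface at http://localhost:8088/.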
Inputs and Outputs for WordCount example:
The MapReduce framework operates exclusively on <key, value> pairs, that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
Input and Output types of a MapReduce job:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
----------------Example: WordCount v1.0------------------------------
Before we jump into the details, let's walk through an example MapReduce application to get a flavour for how they work.
WordCount is a simple application that counts the number of occurrences of each word in a given input set.
This works with a local-standalone, pseudo-distributed or fully-distributed Hado