
2/19/2015

How-to: Install CDH on Mac OSX 10.9 Mavericks | Cloudera Engineering Blog


How-to: Install CDH on Mac OSX 10.9 Mavericks


by Jordan Hambleton


September 16, 2014

11 comments

This overview will cover the basic tarball setup for your Mac.
If you're an engineer building applications on CDH and becoming familiar with all the rich features for designing the next big solution, it becomes essential to have a native Mac OSX install. Sure, you may argue that your MBP with its four-core, hyper-threaded i7, SSD, and 16GB of DDR3 memory is sufficient for spinning up a VM, and in most instances, such as using a VM for a quick demo, you're right. However, when experimenting with a slightly heavier workload that is a bit more resource intensive, you'll want to explore a native install.
In this post, I will cover setup of a few basic dependencies and the necessities to run HDFS, MapReduce with YARN, Apache ZooKeeper, and Apache HBase. It should be used as a guideline to get your local CDH box set up, with the objective of enabling you to build and run applications on the Apache Hadoop stack.
Note: This process is not supported, and thus you should be comfortable as a self-supporting sysadmin. With that in mind, the configurations throughout this guideline are suggested for your default bash shell environment and can be set in your ~/.profile.

Dependencies

Install the Java version that is supported for the CDH version you are installing. In my case, for CDH 5.1, I've installed JDK 1.7 u67. Historically the JDK for Mac OSX was only available from Apple, but since JDK 1.7 it's available directly through Oracle's Java downloads. Download the .dmg (in the example below, jdk-7u67-macosx-x64.dmg) and install it.

Verify and configure the installation:

Old Java path: /System/Library/Frameworks/JavaVM.framework/Home


New Java path: /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home

http://blog.cloudera.com/blog/2014/09/how-to-install-cdh-on-mac-osx-10-9-mavericks/


export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home"
Note: You'll notice that after installing the Oracle JDK, the original path used to manage versioning, /System/Library/Frameworks/JavaVM.framework/Versions, will not be updated, and you now have the control to manage your versions independently.
Enable ssh on your Mac by turning on remote login. You can find this option under your toolbar's Apple icon > System Preferences > Sharing.


1. Check the box for Remote Login to enable the service.


2. Allow access for: Only these users: Administrators


Note: In this same window, you can modify your computer's hostname.


Enable password-less ssh login to localhost for MRv1 and HBase.


1. Open your terminal.

2. Generate an rsa or dsa key.


1. ssh-keygen -t rsa -P ""


2. Continue through the key generator prompts (use default options).


3. Test: ssh localhost
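
If ssh localhost still asks for a password after these steps, the new public key may not yet be listed in ~/.ssh/authorized_keys. The following is a minimal sketch of that extra step and is not part of the original walkthrough; the function name authorize_local_key is made up for illustration, and the paths in the usage line assume the ssh-keygen defaults.

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# unless an identical line is already there.
authorize_local_key() {
  pub_key="$1"     # e.g. ~/.ssh/id_rsa.pub (ssh-keygen default)
  auth_file="$2"   # e.g. ~/.ssh/authorized_keys
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if ! grep -qxF -- "$(cat "$pub_key")" "$auth_file"; then
    cat "$pub_key" >> "$auth_file"
  fi
  # With default sshd settings, key files writable by others are rejected.
  chmod 600 "$auth_file"
}

# Usage, assuming the default key location:
# authorize_local_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

Calling the helper twice leaves a single entry, so it should be safe to re-run.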


Homebrew

Another toolkit I admire is Homebrew, a package manager for OSX. While Xcode developer command-line tools are great, the savvy naming conventions and ease of use of Homebrew get the job done in a fun way.

I haven't needed Homebrew for much other than installing the dependencies required for building native Snappy libraries for Mac OSX and for easily installing MySQL for Hive. Snappy is commonly used within HBase, HDFS, and MapReduce for compression and decompression.

CDH

Finally, the easy part: The CDH tarballs are very nicely packaged and easily downloadable from Cloudera's repository. I've downloaded tarballs for CDH 5.1.0.

Download and explode the tarballs in a lib directory where you can manage the latest versions with a simple symlink, as in the following layout. Although Mac OSX's Make Alias feature is bi-directional, do not use it; instead use the command-line ln -s command, such as ln -s source_file target_file.

/Users/jordanh/cloudera/
    cdh5.1/
        hadoop -> /Users/jordanh/cloudera/lib/hadoop-2.3.0-cdh5.1.0
        hbase -> /Users/jordanh/cloudera/lib/hbase-0.98.1-cdh5.1.0
        hive -> /Users/jordanh/cloudera/lib/hive-0.12.0-cdh5.1.0
        zookeeper -> /Users/jordanh/cloudera/lib/zookeeper-3.4.5-cdh4.7.0
    ops/
        dn
        logs/hadoop, logs/hbase, logs/yarn
        nn/
        pids
        tmp/
        zk/
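
The layout above could be created in one pass with something like the following. This is a sketch rather than the author's own script: make_layout and link_component are hypothetical helper names, and the base path and version strings in the usage lines are simply the ones from the listing.

```shell
# Hypothetical helper: create lib/, the versioned link directory, and ops/.
make_layout() {
  base="$1"   # e.g. /Users/jordanh/cloudera
  cdh="$2"    # e.g. cdh5.1
  mkdir -p "$base/lib" "$base/$cdh"
  for d in dn nn pids tmp zk logs/hadoop logs/hbase logs/yarn; do
    mkdir -p "$base/ops/$d"
  done
}

# Hypothetical helper: point a stable name at an exploded tarball.
link_component() {
  base="$1"; cdh="$2"
  name="$3"      # link name, e.g. hadoop
  exploded="$4"  # directory name under lib/, e.g. hadoop-2.3.0-cdh5.1.0
  # -s symbolic, -f replace an existing link, -n do not follow it
  ln -sfn "$base/lib/$exploded" "$base/$cdh/$name"
}

# Usage:
# make_layout /Users/jordanh/cloudera cdh5.1
# link_component /Users/jordanh/cloudera cdh5.1 hadoop hadoop-2.3.0-cdh5.1.0
```

Re-running link_component with a newer directory name repoints the link, which is the point of keeping versioned tarballs under lib.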
You'll notice above that you've created a handful of directories under a folder named ops. You'll use them later to customize the configuration of the essential components for running Hadoop. Set your environment properties according to the paths where you've exploded your tarballs.

Shell
~/.profile

CDH="cdh5.1"
export HADOOP_HOME="/Users/jordanh/cloudera/${CDH}/hadoop"
export HBASE_HOME="/Users/jordanh/cloudera/${CDH}/hbase"
export HIVE_HOME="/Users/jordanh/cloudera/${CDH}/hive"
export HCAT_HOME="/Users/jordanh/cloudera/${CDH}/hive/hcatalog"

export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME
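
Once ~/.profile has been sourced, a quick check that each variable resolves to a real directory can catch a mistyped path early. The snippet below is only a sketch; check_homes is a hypothetical helper and not part of CDH.

```shell
# Hypothetical helper: report variables that are unset or point nowhere.
check_homes() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "dir=\${$name:-}"
    if [ -z "$dir" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$name points to a missing directory: $dir"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# check_homes JAVA_HOME HADOOP_HOME HBASE_HOME HIVE_HOME HCAT_HOME
```

The function prints nothing and returns 0 when every named variable points at an existing directory.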

Update your main Hadoop configuration files, as shown in the sample files below. You can also download all files
referenced in this post directly from here.
$HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.ha
<description>A comma-separated list of the compression codec classes that can
be used for compression/decompression. In addition to any classes specified
with this property (which take precedence), codec classes on the classpath
are discovered using a Java ServiceLoader.</description>
</property>
</configuration>

$HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/jordanh/cloudera/ops/nn</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table (fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/jordanh/cloudera/ops/dn/</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>localhost:50075</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
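
After editing these files it can help to confirm what they actually contain. With Hadoop on your PATH, hdfs getconf -confKey fs.defaultFS reports the effective value. The dependency-free sketch below reads a value straight from a *-site.xml file instead; get_prop is a hypothetical helper, and it assumes each <name> and <value> sits on a single line, as in the samples above.

```shell
# Hypothetical helper: print the <value> that follows a given <name>.
get_prop() {
  file="$1"   # e.g. $HADOOP_HOME/etc/hadoop/core-site.xml
  prop="$2"   # e.g. fs.defaultFS
  awk -v name="$prop" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n)
      sub(/<\/name>.*/, "", n)
    }
    /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      if (n == name) { print v; exit }
    }
  ' "$file"
}

# Usage:
# get_prop "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
# get_prop "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" dfs.replication
```

This is line-oriented text matching, not XML parsing, so it is only suitable for simple files like the ones in this post.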

I took the YARN and MRv2 configuration and setup from the CDH 5 installation docs. I will not digress into the specifications of each property or the orchestration and details of how YARN and MRv2 operate, but there's some great information that my colleague Sandy has already shared for developers and admins.
Be sure to make the necessary adjustments per your system's memory and CPU constraints. Per the image below, it is easy to see how these parameters will affect your machine's performance when you execute jobs.

Next, edit the following files as shown.


$HADOOP_HOME/etc/hadoop/yarn-site.xml
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated
for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>The minimum allocation for every container request at the RM,
in MBs. Memory requests lower than this won't take effect,
and the specified value will get allocated at minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>The maximum allocation for every container request at the RM,
in MBs. Memory requests higher than this won't take effect,
and will get capped to this value.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests lower than this won't take effect,
and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests higher than this won't take effect,
and will get capped to this value.</description>
</property>
</configuration>
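
Read literally, the minimum and maximum descriptions above describe a clamp on each container memory request. The sketch below only mirrors that wording using the sample values; clamp_mb is a hypothetical helper, and the real scheduler may round or reject requests differently.

```shell
MIN_MB=1024   # yarn.scheduler.minimum-allocation-mb from the sample
MAX_MB=2048   # yarn.scheduler.maximum-allocation-mb from the sample

# Hypothetical helper: apply the min/max bounds to a request in MB.
clamp_mb() {
  req="$1"
  if [ "$req" -lt "$MIN_MB" ]; then
    echo "$MIN_MB"
  elif [ "$req" -gt "$MAX_MB" ]; then
    echo "$MAX_MB"
  else
    echo "$req"
  fi
}

clamp_mb 512    # prints 1024
clamp_mb 1536   # prints 1536
clamp_mb 4096   # prints 2048
```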

$HADOOP_HOME/etc/hadoop/mapred-site.xml
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
<description>
The number of virtual cores required for each reduce task.
</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of reduces.</description>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
<description>The amount of memory the MR AppMaster needs.</description>
</property>
</configuration>
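
The sample pairs 1024 MB containers with -Xmx768m heaps, so the JVM heap is three quarters of the container and the rest is left for non-heap memory. If you change the container size, a helper like the hypothetical one below keeps the same ratio. The 3/4 factor is just what these sample values imply, not a rule stated in this post.

```shell
# Hypothetical helper: derive a java.opts heap flag from a container size in MB.
heap_opts_for() {
  container_mb="$1"
  echo "-Xmx$(( container_mb * 3 / 4 ))m"
}

heap_opts_for 1024   # prints -Xmx768m
heap_opts_for 2048   # prints -Xmx1536m
```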

$HADOOP_HOME/etc/hadoop/hadoop-env.sh (indicated properties only)

Shell

# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR="/Users/jordanh/cloudera/ops/logs/hadoop"
export YARN_LOG_DIR="/Users/jordanh/cloudera/ops/logs/yarn"

# The directory where pid files are stored when processes run as daemons. /tmp by default.
export HADOOP_PID_DIR="/Users/jordanh/cloudera/ops/pids"
export YARN_PID_DIR=${HADOOP_PID_DIR}

You can configure HBase to run without separately downloading Apache ZooKeeper. Rather, it has a bundled package that you can easily run as a separate instance or in standalone mode in a single JVM. I recommend using either distributed or standalone mode instead of a separately downloaded ZooKeeper tarball on your machine for ease of use, configuration, and management.
The primary difference in configuration between running HBase in distributed or standalone mode is the hbase.cluster.distributed property in hbase-site.xml. Set the property to false to launch HBase in standalone mode, or true to spin up separate instances for services such as HBase's ZooKeeper and RegionServer. Update the following configurations for HBase as specified to run it per this type of configuration.
Note regarding hbase-site.xml: Property hbase.cluster.distributed is set to false by default and will launch in standalone mode. Also, hbase.zookeeper.quorum is set to localhost by default and does not need to be overridden in our scenario.
$HBASE_HOME/conf/hbase-site.xml
false, startup will run all HBase and ZooKeeper daemons together
in the one JVM.
</description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hbase-${user.name}</value>
<description>Temporary directory on the local filesystem.
Change this setting to point to a location more permanent
than '/tmp' (The '/tmp' directory is often cleared on
machine restart).
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/jordanh/cloudera/ops/zk</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
</configuration>

Note regarding $HBASE_HOME/conf/hbase-env.sh: By default HBASE_MANAGES_ZK is set to true; it is listed below only to make the setting explicit.
$HBASE_HOME/conf/hbase-env.sh

Shell

# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR="/Users/jordanh/cloudera/ops/logs/hbase"

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR="/Users/jordanh/cloudera/ops/pids"

# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
Pulling it All Together


By now, you should have finished setting up HDFS, YARN, and HBase. Hadoop setup and configuration is quite tedious, let alone managing it over time (thus Cloudera Manager, which is unfortunately not available for Macs). These are the bare essentials for getting your local machine ready for running MapReduce jobs and building applications on HBase. In the next few steps, we will start and stop the services and provide examples to ensure each service is operating correctly. The steps are listed in the order required for initialization so that dependencies are respected. Reverse the order to halt the services.
Service HDFS
    NameNode
        format: hdfs namenode -format
        start: hdfs namenode
        stop: Ctrl-C
        url: http://localhost:50070/dfshealth.html
    DataNode
        start: hdfs datanode
        stop: Ctrl-C
        url: http://localhost:50075/browseDirectory.jsp?dir=%2F&nnaddr=127.0.0.1:8020
    Test
        hadoop fs -mkdir /tmp
        hadoop fs -put /path/to/local/file.txt /tmp/
        hadoop fs -cat /tmp/file.txt
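
The three HDFS test commands can be folded into a repeatable round-trip check that writes a file, reads it back, and compares the bytes. This is a sketch under assumptions: hdfs_roundtrip is a hypothetical helper, it expects the NameNode and DataNode to be running, and the /tmp/roundtrip-... target name is invented so that repeated runs do not collide.

```shell
# Hypothetical helper: put a local file into HDFS and verify it reads back intact.
hdfs_roundtrip() {
  local_file="$1"
  remote="/tmp/roundtrip-$$-$(basename "$local_file")"
  hadoop fs -mkdir -p /tmp || return 1
  hadoop fs -put "$local_file" "$remote" || return 1
  # cmp -s is silent; "-" compares standard input with the local file.
  if hadoop fs -cat "$remote" | cmp -s - "$local_file"; then
    result=0
  else
    result=1
  fi
  # Clean up the temporary copy regardless of the outcome.
  hadoop fs -rm "$remote" >/dev/null 2>&1 || true
  return "$result"
}

# Usage:
# hdfs_roundtrip /path/to/local/file.txt && echo "HDFS round trip OK"
```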
Service YARN
    ResourceManager
        start: yarn resourcemanager
        stop: Ctrl-C
        url: http://localhost:8088/cluster
    NodeManager
        start: yarn nodemanager
        stop: Ctrl-C
        url: http://localhost:8042/node
    MapReduce Job History Server
        start: mapred historyserver, mr-jobhistory-daemon.sh start historyserver
        stop: Ctrl-C, mr-jobhistory-daemon.sh stop historyserver
        url: http://localhost:19888/jobhistory/app
Test Vanilla YARN Application
Shell
hadoop jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.3.0

Test MRv2 YARN TestDFSIO


Shell
hadoop org.apache.hadoop.fs.TestDFSIO -write -nrFiles 5 -size 1GB
hadoop org.apache.hadoop.fs.TestDFSIO -read -nrFiles 5 -size 1GB

Test MRv2 YARN Terasort/Teragen


Shell
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar

Test MRv2 YARN Pi


Shell
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar

Service HBase
    HBase Master/RegionServer/ZooKeeper
        start: start-hbase.sh
        stop: stop-hbase.sh
        logs: /Users/jordanh/cloudera/ops/logs/hbase/
        url: http://localhost:60010/master-status
    Test
Shell
hbase shell
create 'URL_HITS', {NAME=>'HOURLY'},{NAME=>'DAILY'},{NAME=>'YEARLY'}
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090110', '10'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090111', '5'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090112', '30'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090113', '80'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090114', '7'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'DAILY:20140901', '10012'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'YEARLY:2014', '93310101'

scan 'URL_HITS'

Kite SDK Test


Get familiar with the Kite SDK by trying out this example that loads data to both HDFS and then HBase. Note that
there are a few common issues on your OSX that may surface when running through the Kite SDK example. They can
be easily resolved with additional setup/config as specified below.
Problem: NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
Resolution: Fix your classpath by making sure to set HIVE_HOME and HCAT_HOME in your environment.
Shell
export HIVE_HOME="/Users/jordanh/cloudera/${CDH}/hive"
export HCAT_HOME="/Users/jordanh/cloudera/${CDH}/hive/hcatalog"

Problem: InvocationTargetException Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
Resolution: Snappy libraries are not compiled for Mac OSX out of the box. A Snappy Java port was introduced in CDH 5 and will likely need to be recompiled on your machine.
Shell
git clone https://github.com/xerial/snappy-java.git
cd snappy-java
make

cp target/snappy-java-1.1.1.3.jar $HADOOP_HOME/share/hadoop/common/lib/asnappy-java-1.1.1.3.jar

Landing Page
Creating a landing page will help consolidate all the HTTP addresses of the services that you're running. Please note that localhost can be replaced with your local hostname (such as jakuza-mbp.local).
Service Apache HTTPD
    start: sudo -s launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist
    stop: sudo -s launchctl unload -w /System/Library/LaunchDaemons/org.apache.httpd.plist
    logs: /var/log/apache2/
    url: http://localhost/index.html
Create index.html (edit /Library/WebServer/Documents/index.html, which you can download here).
It will look something like this:
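
The downloadable file is not reproduced here. As a stand-in, a minimal page linking the service URLs listed in this post could be generated as sketched below. write_landing_page is a hypothetical helper and the markup is an assumption, not the author's index.html.

```shell
# Hypothetical generator for a bare-bones landing page.
write_landing_page() {
  out="$1"               # e.g. /Library/WebServer/Documents/index.html
  host="${2:-localhost}" # or your local hostname
  cat > "$out" <<EOF
<html>
<head><title>Local CDH services</title></head>
<body>
<ul>
<li><a href="http://$host:50070/dfshealth.html">HDFS NameNode</a></li>
<li><a href="http://$host:50075/browseDirectory.jsp?dir=%2F&amp;nnaddr=127.0.0.1:8020">HDFS DataNode</a></li>
<li><a href="http://$host:8088/cluster">YARN ResourceManager</a></li>
<li><a href="http://$host:8042/node">YARN NodeManager</a></li>
<li><a href="http://$host:19888/jobhistory/app">MapReduce Job History Server</a></li>
<li><a href="http://$host:60010/master-status">HBase Master</a></li>
</ul>
</body>
</html>
EOF
}

# Usage (write to a scratch file, then copy into place with sudo):
# write_landing_page /tmp/index.html jakuza-mbp.local
# sudo cp /tmp/index.html /Library/WebServer/Documents/index.html
```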

Conclusion
With this guide, you should have a locally running Hadoop cluster with HDFS, MapReduce, and HBase. These are the core components for Hadoop, and are a good initial foundation for building and prototyping your applications locally.
I hope this will be a good starting point on your dev box to try out more ways to build your products, whether they are data pipelines, analytics, machine learning, search and exploration, or more, on the Hadoop stack.
Jordan Hambleton is a Solutions Architect at Cloudera.

Filed under:
CDH
General

11 Responses
KRIS / SEPTEMBER 16, 2014 / 11:41 PM

Hi Justin,
It might be valuable to point readers to 2 blog posts describing a local install and configuring it to run in local and pseudo-distributed mode. The second blog post even describes how it's automated with the help of Ansible.
Thanks in advance
http://blog.godatadriven.com/local-and-pseudo-distributed-cdh5-hadoop-on-your-laptop.html
http://blog.godatadriven.com/automated-cdh5-hadoop-on-your-laptop-with-ansible.html
Kris
CHEN, JIANZHONG / SEPTEMBER 17, 2014 / 6:31 AM

really cool stuff!!


STEPHEN BOESCH / SEPTEMBER 18, 2014 / 6:02 PM

Hi, please add instructions for Hive with MySQL as the metastore. Hive is an essential ingredient of a Hadoop ecosystem, more so than HBase. In any case, thanks for putting together what is there so far (including HBase). Thanks.
JORDAN HAMBLETON / SEPTEMBER 25, 2014 / 11:37 AM

Hi Stephen,
Appreciate the comment. If you've followed the steps above, hive will work out of the box using its embedded metastore using derby. Be sure to use the same local directory for all of the instances you use hive shell in order to use the same metastore created.
In addition to the local metastore, you can install mysql via brew. Follow the config & setup from the link below. I've listed a few tips below.
https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/5.0/CDH5-InstallationGuide/cdh5ig_hive_metastore_configure.html
1. Install Mysql & Setup (mac conversions)
1.1. brew install mysql
1.2. follow instructions for mysql config from above CDH5 install link.
1.3. get mysql connector (ie. mysql-connector-java-5.1.16-bin.jar) and copy it to $HIVE_HOME/lib/
quick tips
* start mysql: mysql.server start
* stop mysql: mysql.server stop
Lastly, update hive-site.xml as specified on CDH5 install link above or find a copy on my github:
https://github.com/joropolis/misc-data/blob/master/blog-2014-09-01/hive-site.xml
Launch hive shell & if you have any issues, try running in debug mode.
hive -hiveconf hive.root.logger=DEBUG,console
If you see an error like the following, ensure the mysql connector jar is in $HIVE_HOME/lib/.
* The specified datastore driver (com.mysql.jdbc.Driver) was not found in the CLASSPATH
DEBASISH / OCTOBER 04, 2014 / 11:21 AM

Are you sure it works ?


I am getting this error:
org.apache.hadoop.util.Shell$ExitCodeException:
I saw on other posts that yarn.application.classpath has to be set to fix this error but I could not make that work yet as
well
JORDAN HAMBLETON / OCTOBER 06, 2014 / 11:19 PM

Thanks for the note Debasish. Yes, this is working without additional configuration. Did you check your yarn logs for the
DistributedShell command you executed (see snip below)? In my example, it will print out the top cpu hogs on your mac!
Also, note that only mapreduce job logs are viewable through the mapred historyserver web. Use the cmd line to view
your yarn logs based on the application id per example below.
$ yarn logs -applicationId application_1412661426311_0001
Container: container_1412661426311_0001_01_000002 on jakuza-mbp_54669
=======================================================================
LogType: stderr
LogLength: 0
Log Contents:
LogType: stdout
LogLength: 6724
Log Contents:
PID STAT %CPU TIME COMMAND
2703 S+ 25.0 0:01.76 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/
159 Ss 14.4 1:42.13 /Library/StartupItems/SymAutoProtect/
122 Ss 5.1 6:30.38 /System/Library/Frameworks/ApplicationServices
2516 S+ 4.1 0:07.47 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/

SOMNATH / NOVEMBER 21, 2014 / 10:14 AM

HUE install fails with the message


/Users/somnathchoudhuri/software/cloudera/hue-3.6.0-cdh5.1.3> make apps
/Users/somnathchoudhuri/software/cloudera/hue-3.6.0-cdh5.1.3/Makefile.vars:42: *** Error: must have python
development packages for 2.6 or 2.7. Could not find Python.h. Please install python2.6-devel or python2.7-devel. Stop.
We have tried uninstalling and installing Python using brew and also installing gcc. Nothing seems to work. Other than Hue, everything else starts up OK (datanode, namenode, resourcemanager, nodemanager, proxyserver and history server).
MIKE B / NOVEMBER 29, 2014 / 11:01 AM

@Somnath:
I had the same problem, and I took a look in Makefile.vars, and it is looking for the python libs in /usr/include/python2.7
I ran the following, and it seems to work:
sudo mkdir /usr/include
sudo ln -s /usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/include/python2.7 /usr/include/python2.7
(You might have to adjust these slightly if you're using a different version of Python.)
Hope this helps.
IS IT POSSIBLE TO INSTALL CLOUDERA MANAGER AGENT ON MAC / DECEMBER 11, 2014 / 12:06 PM

Awesome post. I successfully configured it on my mac. One question is whether it's possible to install cloudera manager agent on mac. I have some old linux machines and I've configured them through cloudera manager. I want to add my mac to the cluster managed by cm. Thanks a lot.
SOON / DECEMBER 12, 2014 / 4:30 PM

Thanks, this was really helpful. I was able to install it successfully on my mac with no problems. Is it possible to run
HiveServer2?
JORDAN HAMBLETON / DECEMBER 29, 2014 / 1:13 PM

Soon, HiveServer2 requires configuring your hive client config property hive.metastore.uris in $HIVE_HOME/conf/hive-site.xml as below (a copy can be found from mentioned links).
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
To start the metastore & hiveserver2, use the following commands:

> hive --service metastore
> hive --service hiveserver2
Connect using beeline and query your tables. Be sure hdfs, yarn, and mysql (if in use) are running prior to any queries
that result in MapReduce jobs.
> beeline -u jdbc:hive2://localhost:10000
If you see the expected error below, you can log in as a different user who has access to the hdfs data you're querying by passing -n to the beeline command.
* RuntimeException org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous
