How-To - Install CDH On Mac OSX 10

2/19/2015
How-to: Install CDH on Mac OSX 10.9 Mavericks | Cloudera Engineering Blog
How-to: Install CDH on Mac OSX 10.9 Mavericks

by Jordan Hambleton
September 16, 2014
This overview will cover the basic tarball setup for your Mac.
If you're an engineer building applications on CDH and becoming familiar with all the rich features for designing the
next big solution, it becomes essential to have a native Mac OSX install. Sure, you may argue that your MBP with its
four-core, hyper-threaded i7, SSD, and 16GB of DDR3 memory is sufficient for spinning up a VM, and in most instances,
such as using a VM for a quick demo, you're right. However, when experimenting with a slightly heavier workload
that is a bit more resource intensive, you'll want to explore a native install.
In this post, I will cover setup of a few basic dependencies and the necessities to run HDFS, MapReduce with YARN,
Apache ZooKeeper, and Apache HBase. Use it as a guideline for setting up your local CDH box so that you can build
and run applications on the Apache Hadoop stack.
Note: This process is not supported and thus you should be comfortable as a self-supporting sysadmin. With that in
mind, the configurations throughout this guideline are suggested for your default bash shell environment that can be
set in your ~/.profile.
Dependencies
Install the Java version that is supported for the CDH version you are installing. In my case, for CDH 5.1, I've installed
JDK 1.7 u67. Historically the JDK for Mac OSX was only available from Apple, but since JDK 1.7 it's available directly
through Oracle's Java downloads. Download the .dmg (in the example below, jdk-7u67-macosx-x64.dmg) and
install it.
Verify and configure the installation:
Old Java path: /System/Library/Frameworks/JavaVM.framework/Home

New Java path: /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home
http://blog.cloudera.com/blog/2014/09/how-to-install-cdh-on-mac-osx-10-9-mavericks/
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home"
Note: You'll notice that after installing the Oracle JDK, the original path used to manage versioning,
/System/Library/Frameworks/JavaVM.framework/Versions, will not be updated, and you now have the
control to manage your versions independently.
Enable ssh on your Mac by turning on Remote Login. You can find this option under your toolbar's Apple icon > System
Preferences > Sharing.
1. Check the box for Remote Login to enable the service.
2. Allow access for: Only these users: Administrators
Note: In this same window, you can modify your computer's hostname.
Enable password-less ssh login to localhost for MRv1 and HBase.
1. Open your terminal.
2. Generate an rsa or dsa key.
   1. ssh-keygen -t rsa -P ""
   2. Continue through the key generator prompts (use default options).
3. Test: ssh localhost
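Key-based login to localhost only works once the public key is listed in ~/.ssh/authorized_keys, a step the list above does not spell out. The following is a minimal sketch of that step, assuming the default key path ~/.ssh/id_rsa.pub produced by ssh-keygen; authorize_key is a hypothetical helper name, not something from the original post.

```shell
# Append a public key to an authorized_keys file unless it is already listed.
# authorize_key is an illustrative helper, not part of OpenSSH or CDH.
authorize_key() {
  pub_key="$1"      # e.g. ~/.ssh/id_rsa.pub
  auth_file="$2"    # e.g. ~/.ssh/authorized_keys
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -q quiet, -x match the whole line, -F treat the key as a fixed string
  if ! grep -qxF -- "$(cat "$pub_key")" "$auth_file"; then
    cat "$pub_key" >> "$auth_file"
  fi
  # sshd typically ignores an authorized_keys file that others can write to
  chmod 600 "$auth_file"
}
```

Calling authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys before the ssh localhost test should let the login proceed without a password prompt. The function is safe to run more than once because it skips keys that are already present.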
Homebrew
Another toolkit I admire is Homebrew, a package manager for OSX. While Xcode developer command-line tools are
great, the savvy naming conventions and ease of use of Homebrew get the job done in a fun way.
I haven't needed Homebrew for much other than installing the dependencies required for building native Snappy
libraries for Mac OSX and for easily installing MySQL for Hive. Snappy is commonly used within HBase, HDFS, and
MapReduce for compression and decompression.
CDH
Finally, the easy part: The CDH tarballs are very nicely packaged and easily downloadable from Cloudera's repository.
I've downloaded tarballs for CDH 5.1.0.
Download and explode the tarballs in a lib directory where you can manage latest versions with a simple symlink, as in
the following. Although Mac OSX's Make Alias feature is bi-directional, do not use it; instead use your command-line
ln -s command, such as ln -s source_file target_file.
/Users/jordanh/cloudera/
    cdh5.1/
        hadoop -> /Users/jordanh/cloudera/lib/hadoop-2.3.0-cdh5.1.0
        hbase -> /Users/jordanh/cloudera/lib/hbase-0.98.1-cdh5.1.0
        hive -> /Users/jordanh/cloudera/lib/hive-0.12.0-cdh5.1.0
        zookeeper -> /Users/jordanh/cloudera/lib/zookeeper-3.4.5-cdh4.7.0
    ops/
        dn
        logs/hadoop, logs/hbase, logs/yarn
        nn/
        pids
        tmp/
        zk/
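The layout above can be created with a few mkdir and ln -s calls. This is a minimal sketch, assuming the tarballs have already been exploded under a lib directory; make_cdh_layout and link_component are hypothetical helper names, and the root path and release names are whatever you pass in.

```shell
# Create the version directory and the ops directories under a root folder.
# Both helpers are illustrative; they are not shipped with CDH.
make_cdh_layout() {
  root="$1"   # e.g. /Users/jordanh/cloudera
  cdh="$2"    # e.g. cdh5.1
  mkdir -p "$root/lib" "$root/$cdh" \
    "$root/ops/dn" "$root/ops/nn" "$root/ops/pids" "$root/ops/tmp" "$root/ops/zk" \
    "$root/ops/logs/hadoop" "$root/ops/logs/hbase" "$root/ops/logs/yarn"
}

# Point a stable name (hadoop, hbase, ...) at one exploded tarball in lib.
link_component() {
  root="$1"; cdh="$2"; name="$3"; release="$4"
  # -s symbolic, -f replace an existing link, -n do not follow an existing link
  ln -sfn "$root/lib/$release" "$root/$cdh/$name"
}
```

For example, make_cdh_layout /Users/jordanh/cloudera cdh5.1 followed by link_component /Users/jordanh/cloudera cdh5.1 hadoop hadoop-2.3.0-cdh5.1.0 would reproduce the hadoop entry. Upgrading a component then only means re-running link_component with the new release directory.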
You'll notice above that you've created a handful of directories under a folder named ops. You'll use them later to
customize the configuration of the essential components for running Hadoop. Set your environment properties
according to the paths where you've exploded your tarballs.
Shell
~/.profile
CDH="cdh5.1"
export HADOOP_HOME="/Users/jordanh/cloudera/${CDH}/hadoop"
export HBASE_HOME="/Users/jordanh/cloudera/${CDH}/hbase"
export HIVE_HOME="/Users/jordanh/cloudera/${CDH}/hive"
export HCAT_HOME="/Users/jordanh/cloudera/${CDH}/hive/hcatalog"
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HBASE_HOME
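After sourcing ~/.profile, it can help to confirm that each variable resolves to a real directory before touching any Hadoop configuration. A minimal sketch of such a check follows; check_homes is a hypothetical helper, not a CDH tool, and it only tests that each named variable is set and points at an existing directory (symlinks are followed).

```shell
# Report whether each named variable is set and points at an existing directory.
# Returns non-zero if any variable fails the check.
check_homes() {
  status=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value"
      status=1
    else
      echo "$name ok: $value"
    fi
  done
  return $status
}
```

A typical call would be check_homes JAVA_HOME HADOOP_HOME HBASE_HOME HIVE_HOME HCAT_HOME, run in a new terminal so that the profile has been re-read.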
Update your main Hadoop configuration files, as shown in the sample files below. You can also download all files
referenced in this post directly from here.
$HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.ha
<description>A comma-separated list of the compression codec classes that can
be used for compression/decompression. In addition to any classes specified
with this property (which take precedence), codec classes on the classpath
are discovered using a Java ServiceLoader.</description>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/jordanh/cloudera/ops/nn</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table (fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/jordanh/cloudera/ops/dn/</value>
<description>Determines where on the local filesystem an DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>localhost:50075</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
I took the YARN and MRv2 configuration and setup from the CDH 5 installation docs. I will not digress into the
specifications of each property or the orchestration and details of how YARN and MRv2 operate, but there's some
great information that my colleague Sandy has already shared for developers and admins.
Be sure to make the necessary adjustments per your system's memory and CPU constraints. Per the image below, it
is easy to see how these parameters will affect your machine's performance when you execute jobs.
Next, edit the following files as shown.

$HADOOP_HOME/etc/hadoop/yarn-site.xml
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated
for containers.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
<description>The minimum allocation for every container request at the RM,
in MBs. Memory requests lower than this won't take effect,
and the specified value will get allocated at minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>The maximum allocation for every container request at the RM,
in MBs. Memory requests higher than this won't take effect,
and will get capped to this value.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests lower than this won't take effect,
and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM,
in terms of virtual CPU cores. Requests higher than this won't take effect,
and will get capped to this value.</description>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/mapred-site.xml
</property>
<property>
<name>mapreduce.reduce.cpu.vcores</name>
<value>1</value>
<description>
The number of virtual cores required for each reduce task.
</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for maps.</description>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
<description>Larger resource limit for reduces.</description>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of maps.</description>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx768m</value>
<description>Heap-size for child jvms of reduces.</description>
</property>
<property>
<name>yarn.app.mapreduce.am.resource.mb</name>
<value>1024</value>
<description>The amount of memory the MR AppMaster needs.</description>
</property>
</configuration>
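The map and reduce settings above pair a 1024 MB container (mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) with a 768 MB heap (-Xmx768m), so the heap is 75% of the container. A minimal sketch of that arithmetic follows, assuming you want to keep the same ratio when you resize containers. The function names and the 4096 MB figure for yarn.nodemanager.resource.memory-mb are illustrative assumptions, not values from the original post.

```shell
# Heap size (MB) for a container, keeping the 768/1024 = 75% ratio used above.
heap_for_container_mb() {
  container_mb="$1"
  echo $(( container_mb * 3 / 4 ))
}

# Number of containers of a given size that fit in the NodeManager's memory.
containers_that_fit() {
  node_mb="$1"        # hypothetical yarn.nodemanager.resource.memory-mb value
  container_mb="$2"
  echo $(( node_mb / container_mb ))
}

heap_for_container_mb 1024      # prints 768
containers_that_fit 4096 1024   # prints 4
```

Leaving roughly a quarter of each container for non-heap JVM memory is a common rule of thumb rather than a requirement; tune it to what your jobs actually need.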
$HADOOP_HOME/etc/hadoop/hadoop-env.sh (indicated properties only)
Shell
# Where log files are stored. $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR="/Users/jordanh/cloudera/ops/logs/hadoop"
export YARN_LOG_DIR="/Users/jordanh/cloudera/ops/logs/yarn"
# The directory where pid files are stored when processes run as daemons. /tmp by default.
export HADOOP_PID_DIR="/Users/jordanh/cloudera/ops/pids"
export YARN_PID_DIR=${HADOOP_PID_DIR}
You can configure HBase to run without separately downloading Apache ZooKeeper: HBase bundles ZooKeeper, which
you can run either as a separate instance or in standalone mode within a single JVM. I recommend using either
distributed or standalone mode instead of a separately downloaded ZooKeeper tarball on your machine for ease of
use, configuration, and management.
The primary configuration difference between running HBase in distributed or standalone mode is the
hbase.cluster.distributed property in hbase-site.xml. Set the property to false to launch HBase in
standalone mode, or to true to spin up separate instances for services such as HBase's ZooKeeper and RegionServer.
Update the following configurations for HBase as specified to run it per this type of configuration.
Note regarding hbase-site.xml: Property hbase.cluster.distributed is set to false by default and will launch
in standalone mode. Also, hbase.zookeeper.quorum is set to localhost by default and does not need to be
overridden in our scenario.
$HBASE_HOME/conf/hbase-site.xml
false, startup will run all HBase and ZooKeeper daemons together
in the one JVM.
</description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/Users/jordanh/cloudera/ops/tmp/hbase-${user.name}</value>
<description>Temporary directory on the local filesystem.
Change this setting to point to a location more permanent
than '/tmp' (The '/tmp' directory is often cleared on
machine restart).
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/jordanh/cloudera/ops/zk</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
<description>The directory shared by region servers and into
which HBase persists. The URL should be 'fully-qualified'
to include the filesystem scheme. For example, to specify the
HDFS directory '/hbase' where the HDFS instance's namenode is
running at namenode.example.org on port 9000, set this value to:
hdfs://namenode.example.org:9000/hbase. By default HBase writes
into /tmp. Change this configuration else all data will be lost
on machine restart.
</description>
</property>
</configuration>
Note regarding $HBASE_HOME/conf/hbase-env.sh: By default HBASE_MANAGES_ZK is set as true and is listed
below only for explicit definition.
$HBASE_HOME/conf/hbase-env.sh
Shell
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR="/Users/jordanh/cloudera/ops/logs/hbase"
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR="/Users/jordanh/cloudera/ops/pids"
# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=true
Pulling it All Together

By now, you should have accomplished setting up HDFS, YARN, and HBase. Hadoop setup and configuration is quite
tedious, let alone managing it over time (thus Cloudera Manager, which is unfortunately not available for Macs).
These are the bare essentials for getting your local machine ready for running MapReduce jobs and building
applications on HBase. In the next few steps, we will start/stop the services and provide examples to ensure each
service is operating correctly. The steps are listed in the order required for initialization so that dependencies
are respected. The order can be reversed for halting the services.
Service HDFS
NameNode
format: hdfs namenode -format
start: hdfs namenode
stop: Ctrl-C
url: http://localhost:50070/dfshealth.html
DataNode
start: hdfs datanode
stop: Ctrl-C
url: http://localhost:50075/browseDirectory.jsp?dir=%2F&nnaddr=127.0.0.1:8020
Test
hadoop fs -mkdir /tmp
hadoop fs -put /path/to/local/file.txt /tmp/
hadoop fs -cat /tmp/file.txt
Service YARN
ResourceManager
start: yarn resourcemanager

stop: Ctrl-C
url: http://localhost:8088/cluster
NodeManager
start: yarn nodemanager
stop: Ctrl-C
url: http://localhost:8042/node
MapReduce Job History Server
start: mapred historyserver, mr-jobhistory-daemon.sh start historyserver
stop: Ctrl-C, mr-jobhistory-daemon.sh stop historyserver
url: http://localhost:19888/jobhistory/app
Test Vanilla YARN Application
Shell
hadoop jar $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.3.0
Test MRv2 YARN TestDFSIO

Shell
hadoop org.apache.hadoop.fs.TestDFSIO -write -nrFiles 5 -size 1GB
hadoop org.apache.hadoop.fs.TestDFSIO -read -nrFiles 5 -size 1GB
Test MRv2 YARN Terasort/Teragen

Shell
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0-cdh5.1.0.jar
Test MRv2 YARN Pi

Shell
Service HBase
HBase Master/RegionServer/ZooKeeper
start: start-hbase.sh
stop: stop-hbase.sh
logs: /Users/jordanh/cloudera/ops/logs/hbase/
url: http://localhost:60010/master-status
Test
Shell
hbase shell
create 'URL_HITS', {NAME=>'HOURLY'},{NAME=>'DAILY'},{NAME=>'YEARLY'}
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'HOURLY:2014090110', '10'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'DAILY:20140901', '10012'
put 'URL_HITS', 'com.cloudera.blog.osx.localinstall', 'YEARLY:2014', '93310101'
scan 'URL_HITS'
Kite SDK Test

Get familiar with the Kite SDK by trying out this example that loads data to HDFS and then HBase. Note that
there are a few common issues on OSX that may surface when running through the Kite SDK example. They can
be easily resolved with additional setup/config as specified below.
Problem: NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/NoSuchObjectException
Resolution: Fix your classpath by making sure to set HIVE_HOME and HCAT_HOME in your environment.
Shell
export HIVE_HOME="/Users/jordanh/cloudera/${CDH}/hive"
export HCAT_HOME="/Users/jordanh/cloudera/${CDH}/hive/hcatalog"
Problem: InvocationTargetException Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
Resolution: Snappy libraries are not compiled for Mac OSX out of the box. A Snappy Java port was introduced in
CDH 5 and will likely need to be recompiled on your machine.
Shell
git clone https://github.com/xerial/snappy-java.git
cd snappy-java
make
cp target/snappy-java-1.1.1.3.jar $HADOOP_HOME/share/hadoop/common/lib/asnappy-java-1.1.1.3.jar
Landing Page
Creating a landing page will help consolidate all the HTTP addresses of the services that you're running. Please note
that localhost can be replaced with your local hostname (such as jakuza-mbp.local).
Service Apache HTTPD
start: sudo -s launchctl load -w /System/Library/LaunchDaemons/org.apache.httpd.plist
stop: sudo -s launchctl unload -w /System/Library/LaunchDaemons/org.apache.httpd.plist
logs: /var/log/apache2/
url: http://localhost/index.html
Create index.html (edit /Library/WebServer/Documents/index.html, which you can download here).
It will look something like this:
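A minimal sketch of generating such a page from the service URLs listed in this post. The landing_page helper, the page title, and the markup are illustrative assumptions, not the author's actual index.html.

```shell
# Print a bare-bones landing page linking to the service UIs listed in this post.
# The host argument defaults to localhost; pass your local hostname to override it.
landing_page() {
  host="${1:-localhost}"
  echo "<html><body><h1>Local CDH services</h1><ul>"
  # Each input line is "label|port-and-path"; & is written as &amp; for HTML
  while IFS='|' read -r label path; do
    echo "<li><a href=\"http://${host}${path}\">${label}</a></li>"
  done <<EOF
NameNode|:50070/dfshealth.html
DataNode|:50075/browseDirectory.jsp?dir=%2F&amp;nnaddr=127.0.0.1:8020
ResourceManager|:8088/cluster
NodeManager|:8042/node
MapReduce Job History Server|:19888/jobhistory/app
HBase Master|:60010/master-status
EOF
  echo "</ul></body></html>"
}
```

Running landing_page > index.html and then copying the file to /Library/WebServer/Documents/ with sudo would publish it at http://localhost/index.html once Apache HTTPD is started.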
Conclusion
With this guide, you should have a locally running Hadoop cluster with HDFS, MapReduce, and HBase. These are the
core components for Hadoop, and are a good initial foundation for building and prototyping your applications locally.
I hope this will be a good starting point on your dev box to try out more ways to build your products, whether they are
data pipelines, analytics, machine learning, search and exploration, or more, on the Hadoop stack.
Jordan Hambleton is a Solutions Architect at Cloudera.
11 Responses
KRIS / SEPTEMBER 16, 2014 / 11:41 PM
Hi Justin,
Might be valuable for readers to point them to 2 blogposts describing a local install and configuring it to run in local and
pseudo distributed mode. The second blogpost describes the way it's automated with the help of ansible even.
Thanks in advance
http://blog.godatadriven.com/local-and-pseudo-distributed-cdh5-hadoop-on-your-laptop.html
http://blog.godatadriven.com/automated-cdh5-hadoop-on-your-laptop-with-ansible.html
Kris
CHEN, JIANZHONG / SEPTEMBER 17, 2014 / 6:31 AM
really cool stuff!!

STEPHEN BOESCH / SEPTEMBER 18, 2014 / 6:02 PM
Hi, please add instructions for hive with mysql as the metastore. Hive is an essential ingredient of an Hadoop Ecosytem
more so than HBase. In any case thanks for putting together what is there so far (Including HBase). thanks.
JORDAN HAMBLETON / SEPTEMBER 25, 2014 / 11:37 AM
Hi Stephen,
Appreciate the comment. If you've followed the steps above, hive will work out of the box using its embedded metastore
using derby. Be sure to use the same local directory for all of the instances you use hive shell in order to use the same
metastore created.
In addition to the local metastore, you can install mysql via brew. Follow the config & setup from the link below. I've
listed a few tips below.
https://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/5.0/CDH5-InstallationGuide/cdh5ig_hive_metastore_configure.html
1. Install Mysql & Setup (mac conversions)
1.1. brew install mysql
1.2. follow instructions for mysql config from above CDH5 install link.
1.3. get mysql connector (ie. mysql-connector-java-5.1.16-bin.jar) and copy it to $HIVE_HOME/lib/
quick tips
* start mysql: mysql.server start
* stop mysql: mysql.server stop
Lastly, update hive-site.xml as specified on CDH5 install link above or find a copy on my github:
https://github.com/joropolis/misc-data/blob/master/blog-2014-09-01/hive-site.xml
Launch hive shell & if you have any issues, try running in debug mode.
hive -hiveconf hive.root.logger=DEBUG,console
If you see an error like the following, ensure the mysql connector jar is in $HIVE_HOME/lib/.
* The specified datastore driver (com.mysql.jdbc.Driver) was not found in the CLASSPATH
DEBASISH / OCTOBER 04, 2014 / 11:21 AM
Are you sure it works ?

I am getting this error:
org.apache.hadoop.util.Shell$ExitCodeException:
I saw on other posts that yarn.application.classpath has to be set to fix this error but I could not make that work yet as
well
JORDAN HAMBLETON / OCTOBER 06, 2014 / 11:19 PM
Thanks for the note Debasish. Yes, this is working without additional configuration. Did you check your yarn logs for the
DistributedShell command you executed (see snip below)? In my example, it will print out the top cpu hogs on your mac!
Also, note that only mapreduce job logs are viewable through the mapred historyserver web. Use the cmd line to view
your yarn logs based on the application id per example below.
$ yarn logs -applicationId application_1412661426311_0001
Container: container_1412661426311_0001_01_000002 on jakuza-mbp_54669
=======================================================================
LogType: stderr
LogLength: 0
Log Contents:
LogType: stdout
LogLength: 6724
Log Contents:
PID STAT %CPU TIME COMMAND
2703 S+ 25.0 0:01.76 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/
159 Ss 14.4 1:42.13 /Library/StartupItems/SymAutoProtect/
122 Ss 5.1 6:30.38 /System/Library/Frameworks/ApplicationServices
2516 S+ 4.1 0:07.47 /Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/
SOMNATH / NOVEMBER 21, 2014 / 10:14 AM
HUE install fails with the message

/Users/somnathchoudhuri/software/cloudera/hue-3.6.0-cdh5.1.3> make apps
/Users/somnathchoudhuri/software/cloudera/hue-3.6.0-cdh5.1.3/Makefile.vars:42: *** Error: must have python
development packages for 2.6 or 2.7. Could not find Python.h. Please install python2.6-devel or python2.7-devel. Stop.
We have tried uninstalling and installing Python using brew and also installing gcc. Nothing seems to work. Other than
Hue everything else starts up ok (datanode, namenode, resourcemanager, nodemanager, proxyserver and history
server).
MIKE B / NOVEMBER 29, 2014 / 11:01 AM
@Somnath:
I had the same problem, and I took a look in Makefile.vars, and it is looking for the python libs in /usr/include/python2.7
I ran the following, and it seems to work:
sudo mkdir /usr/include
sudo ln -s /usr/local/Cellar/python/2.7.8_2/Frameworks/Python.framework/Versions/2.7/include/python2.7 /usr/include/python2.7
(You might have to adjust these slightly if you're using a different version of Python.)
Hope this helps.
IS IT POSSIBLE TO INSTALL CLOUDERA MANAGER AGENT ON MAC / DECEMBER 11, 2014 / 12:06 PM
Awesome post. I successfully configured it on my mac. One question is whether it's possible to install cloudera manager
agent on mac. I have some old linux machines and I've configured them through cloudera manager. I want to add my
mac to the cluster managed by cm. Thanks a lot.
SOON / DECEMBER 12, 2014 / 4:30 PM
Thanks, this was really helpful. I was able to install it successfully on my mac with no problems. Is it possible to run
HiveServer2?
JORDAN HAMBLETON / DECEMBER 29, 2014 / 1:13 PM
Soon, HiveServer2 requires configuring your hive client config property hive.metastore.uris in $HIVE_HOME/conf/hive-site.xml as below (a copy can be found from mentioned links).
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
To start the metastore & hiveserver2, use the following commands:
> hive --service metastore
> hive --service hiveserver2
Connect using beeline and query your tables. Be sure hdfs, yarn, and mysql (if in use) are running prior to any queries
that result in MapReduce jobs.
> beeline -u jdbc:hive2://localhost:10000
If you see the below expected error, you can login using a different user using -n in the beeline command that has
access to the hdfs data you're querying.
* RuntimeException org.apache.hadoop.security.AccessControlException: Permission denied: user=anonymous