
posted Monday, June 17, 2013 10:29:52 AM

(Mods if you think this should be in the Hadoop forum, or both, please move)

I wouldn't say it was hard per se, but Hadoop has a lot of funky concepts.

To prepare, searching the net for "24 Hadoop interview questions" is a great starting point. I then went
through Tom White's Hadoop: The Definitive Guide, 3rd Ed., chapters 2-8 completely, typing out
full notes and reading them constantly. For the other parts of the ecosystem, you need a high-level
overview: the basics of how they work and what they can do that core Hadoop can't.

On my laptop, I played with local mode in Eclipse plus pseudo-distributed mode with jar files. I tried lots of
variations on a simple wordcount program (see the sketch after this list):
Zero reducers, identity reducer, identity for both, combiners added
Logging and counter techniques
Made a chart of the 5-6 ways to set a property value and which takes priority
Made a list of the default values for things you don't specify (Hadoop can run given just I/O paths)
Made a list of HDFS commands and kept playing around with them
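
Here is a minimal driver sketch of those variations (new org.apache.hadoop.mapreduce API, Hadoop 1.x style as in the book; TokenMapper and SumReducer are placeholder class names, not from any particular source):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenMapper.class);   // placeholder mapper

            // Zero reducers: uncomment to write map output straight to HDFS.
            // job.setNumReduceTasks(0);

            // Identity reducer: the base org.apache.hadoop.mapreduce.Reducer
            // class passes every (key, values) pair through unchanged.
            // job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

            // Combiner added: commonly just the reducer class reused.
            job.setCombinerClass(SumReducer.class);  // placeholder reducer
            job.setReducerClass(SumReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

For the counter experiments, calling context.getCounter("MyGroup", "records").increment(1) from a mapper or reducer is enough; the totals show up in the job's console summary.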

I studied mock questions on the net; some had wrong answers, so be careful. I was going to
purchase from HadoopExam.com, but they required submitting my MS Product Key, and I
wasn't willing to do that. A shame, since they advertised 200 questions for $45.

It took me a couple of months of hardcore studying until it all became clear. The book took the most
time: reading about 270 pages twice and taking notes.

In the future, I plan to also do the Cloudera certs for HBase, Hadoop admin, and Data Scientist. I
will also take the Coursera Scala class, which comes with a signed certificate. And maybe check
into Apache Cassandra (no cert yet, from what I could tell).

Big Data is fun.


I am in the process of taking the Coursera class called "Introduction to Data Science". We got to practice
Hadoop on Amazon's platform.
You would probably like this class if it comes around again.
Course Page


Hi All,

I have also cleared both the Hadoop Developer and Administrator exams on the first attempt. The Cloudera
exams should not be taken lightly, because of their cost as well as the level of the questions. Please find below
everything I followed to prepare for the exams. However, I want to clarify here that
www.HadoopExam.com does not ask for any Microsoft (MS) keys for its simulator. Once you install the trial
version of their simulator on your computer, the software generates a unique key, which they need
to identify the machine and prevent theft of the software, and I don't see anything
wrong with that.
I passed both CCAH & CCDH, and here is what I did, in that order.
1) www.HadoopExam.com training videos and certification simulator
Their videos are very well designed for a core understanding of the Hadoop architecture. The lectures are
precise yet comprehensive, which cuts learning time significantly, besides providing a definite edge
in the certification exam and job interviews. The trainer did a great job of delivering the essential material in
such a concise and effective manner that it gives the learner a very good foundation in the Hadoop framework.
2) http://developer.yahoo.com/hadoop/tutorial/index.html
3) http://hadoop.apache.org/docs/current/
4) Hadoop: The Definitive Guide by Tom White
5) Hadoop Operations by Eric Sammer
6) Practice: implemented a simple word count program to get hands-on experience (a minimal version is sketched below).
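
For reference, a minimal sketch of such a word count (mapper and reducer only, new org.apache.hadoop.mapreduce API; the class names are just placeholders):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Emits (word, 1) for every whitespace-separated token in a line.
    class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Sums the counts emitted for each word.
    class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }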
Thanks
There is another course on Coursera named Web Intelligence and Big Data.

Congratulations!

Five (sometimes six) ways to set configuration values (from highest priority to least):

1. In Java code with a custom setter (which only some properties have)
       job.setNumReduceTasks(25);
   Getters are also available:
       System.out.println("numReducers: " + job.getNumReduceTasks());

2. In Java code via the conf object, before job creation
       conf.set("mapred.userlog.retain.hours", "23");
       conf.set("mapred.reduce.tasks", "100");
       Job job = new Job(conf, "movie count");

3. Set the property on the command line
       hadoop jar hadoop-student.jar actorlist -D mapred.userlog.retain.hours=22
           /user/hadoop/veggie/input /user/hadoop/veggie/output
           -D mapred.reduce.tasks=6
   Repeat the -D entry for multiple properties:
       -D mapred.heartbeats.in.second=80 -D mapred.map.max.attempts=2
   Put the value in single quotes if it has multiple words:
       -D mapred.job.name='My Job'

4. Load a custom configuration file on the command line
       hadoop jar hadoop-student-homework-1.0.0-SNAPSHOT.jar actorlist
           -conf /Users/john/custom.xml /user/hadoop/veggie/input
           /user/hadoop/veggie/output

5. Set an environment variable to a dir which contains the three site-specific configuration files
       export HADOOP_CONF_DIR=/Users/john/psuedo-conf

6. Change the "read-only" default configuration files under src/
   Bad idea because it affects all users, but possible (and may get you walked out the door).
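
Worth noting for options 3 and 4: the -D and -conf flags are handled by GenericOptionsParser, which only runs if the driver implements Tool and is launched through ToolRunner. A rough sketch of that pattern (the driver class name here is made up):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class ActorListDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already reflects any -D key=value and -conf file.xml
            // options, because ToolRunner ran GenericOptionsParser first
            // and stripped them out of args.
            Job job = new Job(getConf(), "actor list");
            job.setJarByClass(ActorListDriver.class);
            // ... mapper/reducer/path wiring goes here ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new ActorListDriver(), args));
        }
    }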
