
MapR Certified Hadoop Developer

Study Guide
CONTENTS

About MapR Study Guides
MapR Certified Hadoop Developer (MCHD)

SECTION 1 - WHAT'S ON THE EXAM?
1. Describe How MapReduce Programs Work - 12%
2. Job Execution Framework MapReduce - 3%
3. Write a MapReduce Program - 7%
4. Use the MapReduce API - 12%
5. Managing, Monitoring, and Testing MapReduce Jobs - 17%
6. Managing MapReduce Job Performance - 16%
7. Working with Data - 13%
8. Launching Jobs - 10%
9. Using Non-Java Programs in MapReduce Streaming - 10%
Sample Questions

SECTION 2 - PREPARING FOR THE CERTIFICATION
Instructor and Virtual Instructor-led Training
DEV 301 - Developing Hadoop Applications
Tutorials, Books, Blogs, and Other Resources
Datasets

SECTION 3 - TAKING THE EXAM
Register for the Exam
Reserve a Test Session
Cancellation & Rescheduling
Test System Compatibility
Day of the Exam
After the Exam - Sharing Your Results
Exam Retakes

About MapR Study Guides

MapR certification study guides are intended to help you prepare for certification by
providing additional study resources, sample questions, and details about how to take the
exam. The study guide by itself is not enough to prepare you for the exam. You'll need
training, practice, and experience. The study guide will point you in the right direction and
help you get ready.

If you use all the resources in this guide, and spend 6-12 months on your own using the
software, experimenting with the tools, and practicing the role you are certifying for, you
should be well prepared to attempt the exams.

MapR Certified Hadoop Developer (MCHD)

The MapR Certified Hadoop Developer credential is designed for Developers who program
MapReduce in Java. The credential measures the specific technical knowledge, skills, and
abilities required to design, develop, deploy, and manage MapReduce programs in Java.

Exam Cost: $250


Duration: 2 Hours

Section 1 - What's on the Exam?


The MapR Certified Hadoop Developer exam comprises 9 exam topic sections and 25
objectives. There are 60-80 questions on the exam. MapR exams are updated frequently,
so the number of exam questions changes frequently.

MapR tests new questions on the exam in an unscored manner. This means that you may see
test questions on the exam that are not used for scoring your exam. You will not know which
items are scored and which are unscored. Unscored items are being tested for inclusion in
future versions of the exam. They do not affect your results.

MapR exams are Pass or Fail. We do not publish the exam cut score because the passing score
changes frequently based on the scored items that are being used.

1. Describe how MapReduce programs work - 12%

1.1 Describe the MapReduce computational model including input, map, reduce, splits,
outputs, and combiners

1.2 Define how data flows in a MapReduce workflow including details on how data is
loaded, analyzed, stored, and read

2. Job Execution Framework MapReduce - 3%

2.1 Describe how MapReduce jobs are executed and monitored in both MapReduce v.1
and in YARN

3. Write a MapReduce Program - 7%

3.1 Design and implement a Mapper class

3.2 Design and implement a Reducer class

3.3 Design and implement a Driver class

4. Use the MapReduce API - 12%

4.1 Demonstrate how to use the MapReduce API to solve common programming
problems

4.2 Describe how Mapper input and Reducer output work in processing data in
MapReduce

4.3 Demonstrate how to use the Mapper, Reducer, and Job class APIs

5. Managing, Monitoring, and Testing MapReduce Jobs - 17%

5.1 Demonstrate how to use counters to validate jobs and how to write custom counters
for specific tasks

5.2 Demonstrate how to manage and display jobs, history, and logs using the command
line interface

5.3 Demonstrate how to use MRUnit to test Mapper and Reducer class functionality

6. Managing MapReduce Job Performance - 16%

6.1 Demonstrate how to enhance performance of MapReduce using combiners, output
compression, configuring Java properties, and JVM properties

6.2 Demonstrate MapR-specific performance enhancements including ExpressLane and
direct shuffle

6.3 Describe the strategies that can be used to improve MapReduce performance

7. Working with Data - 13%

7.1 Describe the requirements for working with sequence files and compressing sequence
files on a MapR cluster

7.2 Demonstrate how to work with the distributed cache including distribution of jar
files, dynamic information to run a task, and using map-side joins.

7.3 Demonstrate how to work with HBase in MapReduce jobs as a source, as a sink, and
as both a source and a sink in your data flow

8. Launching Jobs - 10%

8.1 Demonstrate how to use ChainMapper and ChainReducer in job chaining

8.2 Demonstrate how to use Oozie to manage complex MapReduce workflows

8.3 Demonstrate how to manage multiple jobs in MapReduce within the driver

9. Using Non-Java Programs in MapReduce Streaming - 10%

9.1 Define the programming contract for mappers & reducers in MapReduce streaming

9.2 Demonstrate how to use non-Java programs such as Perl and Python to stream
MapReduce jobs
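
As a study aid for objective 1.1, the computational model (input splits, map, combine, shuffle and sort, reduce, output) can be sketched in plain Java. This is only an in-memory illustration under simplifying assumptions: the class and method names below are hypothetical and are not part of the Hadoop API, each inner list stands in for one input split, and word counting is used as the example problem.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the model; none of these names are Hadoop APIs.
public class MapReduceModelSketch {

    // Map phase: one input record (a line) in, zero or more (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word.toLowerCase(), 1));
            }
        }
        return out;
    }

    // Reduce phase: one key and all of its values in, one total out.
    // Summation is associative and commutative, so the same function
    // is reused here as the combiner.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups pairs by key in sorted key order (stands in for shuffle and sort).
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Each inner list plays the role of one input split handled by one map task.
    static TreeMap<String, Integer> run(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
            for (String line : split) {
                mapOutput.addAll(map(line));
            }
            // Combiner: pre-aggregate this map task's output before it is shuffled.
            for (Map.Entry<String, List<Integer>> e : group(mapOutput).entrySet()) {
                shuffled.add(new SimpleEntry<>(e.getKey(), reduce(e.getKey(), e.getValue())));
            }
        }
        // Reduce: one call per distinct key over the grouped, sorted values.
        TreeMap<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : group(shuffled).entrySet()) {
            output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("the quick brown fox", "the lazy dog"),
                Arrays.asList("The fox"));
        System.out.println(run(splits));
    }
}
```

In a real job the framework, not your code, performs the splitting, shuffling, and sorting; the sketch only makes the order of the phases visible.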

Sample Questions
The following questions represent the kinds of questions you will see on the exam. The
answers to these sample questions can be found in the answer key following the sample
questions.

Q1. Which statement is true about the MapReduce programming paradigm?

A. The MapReduce paradigm exploits parallelism
B. The MapReduce paradigm was invented at Google
C. The MapReduce paradigm was invented in Hadoop
D. The MapReduce paradigm requires synchronization

Q2. What information is included in a heartbeat from task tracker to job tracker?

A. Job status
B. Network errors
C. CPU health
D. Task status

Q3. Which Java statement converts a Text value parameter to a list of tokens?

A. new StringTokenizer(value, "\\s+");
B. new StringTokenizer(value.toString(),"\\s+");
C. value.toString();
D. new String(value);

Q4. Which Java statement correctly sums the values of an Iterable<IntWritable> values
input parameter?

A. for (IntWritable value : values) { sum += value.get(); }
B. for (IntWritable value : values) { sum += value.getNext(); }
C. while (values.peek()) { sum += values.get(); }
D. while (values.hasNext()) { sum += values.getNext(); }

Q5. Which statement is true of custom counters?

A. The task tracker JVM stores custom counters for a job
B. Custom counters are not supported in Hadoop
C. The job tracker JVM stores custom counters for a job
D. You must declare and initialize custom counter variables

Q6. Which Java statement correctly defines a combiner class in a MapReduce driver?

A. job.setCombinerClass(MyCombiner.class);
B. combiner.setClass(MyCombiner.class);
C. job.setCombiner(MyCombiner.class);
D. combiner.set(MyCombiner.class);

Q7. What is the correct signature for the map method of TableMapper?

A. private void map(ImmutableBytesWritable, Result, Context)
B. protected int map(ImmutableBytesWritable, Result, Context)
C. protected void map(ImmutableBytesWritable, Result, Context)
D. protected void map(ImmutableBytesWritable, Context)

Q8. Which statement is true of distributing a streaming MapReduce program to
mappers and reducers?

A. Map and reduce programs are distributed to mappers and reducers using the
distributed cache
B. Map and reduce programs are automatically distributed to mappers and
reducers
C. Map and reduce programs may be pre-installed on mappers and reducers in
a directory contained in the HADOOP_PATH environment variable
D. Map and reduce programs may be pre-installed on the mappers and reducers
in the $STREAMING_DIR/bin directory

Q9. Which is a responsibility of the job client?

A. Compute input splits
B. Instantiate the RecordReader
C. Compute the number of spills
D. Instantiate the record writer

Q10. Which statement accurately describes how data flows through a streaming
reducer?
A. Each line in the partition is sent to the reducer one line at a time and then
standard input is closed
B. Every line from the input file is sent at once and then standard input is closed
C. Input keys are separated from values by the newline character
D. Input values are terminated by the tab character

Sample Question Answer Key

Q1. Which statement is true about the MapReduce programming paradigm?

A. *The MapReduce paradigm exploits parallelism
B. The MapReduce paradigm was invented at Google
C. The MapReduce paradigm was invented in Hadoop
D. The MapReduce paradigm requires synchronization

Q2. What information is included in a heartbeat from task tracker to job tracker?

A. Job status
B. Network errors
C. CPU health
D. *Task status

Q3. Which Java statement converts a Text value parameter to a list of tokens?

A. new StringTokenizer(value, "\\s+");
B. *new StringTokenizer(value.toString(),"\\s+");
C. value.toString();
D. new String(value);

Q4. Which Java statement correctly sums the values of an Iterable<IntWritable> values
input parameter?

A. *for (IntWritable value : values) { sum += value.get(); }
B. for (IntWritable value : values) { sum += value.getNext(); }
C. while (values.peek()) { sum += values.get(); }
D. while (values.hasNext()) { sum += values.getNext(); }

Q5. Which statement is true of custom counters?

A. The task tracker JVM stores custom counters for a job
B. Custom counters are not supported in Hadoop
C. *The job tracker JVM stores custom counters for a job
D. You must declare and initialize custom counter variables

Q6. Which Java statement correctly defines a combiner class in a MapReduce driver?

A. *job.setCombinerClass(MyCombiner.class);
B. combiner.setClass(MyCombiner.class);
C. job.setCombiner(MyCombiner.class);
D. combiner.set(MyCombiner.class);

Q7. What is the correct signature for the map method of TableMapper?

A. private void map(ImmutableBytesWritable, Result, Context)
B. protected int map(ImmutableBytesWritable, Result, Context)
C. *protected void map(ImmutableBytesWritable, Result, Context)
D. protected void map(ImmutableBytesWritable, Context)

Q8. Which statement is true of distributing a streaming MapReduce program to
mappers and reducers?

A. *Map and reduce programs are distributed to mappers and reducers using
the distributed cache
B. Map and reduce programs are automatically distributed to mappers and
reducers
C. Map and reduce programs may be pre-installed on mappers and reducers in
a directory contained in the HADOOP_PATH environment variable
D. Map and reduce programs may be pre-installed on the mappers and reducers
in the $STREAMING_DIR/bin directory

Q9. Which is a responsibility of the job client?

A. *Compute input splits
B. Instantiate the RecordReader
C. Compute the number of spills
D. Instantiate the record writer

Q10. Which statement accurately describes how data flows through a streaming
reducer?

A. *Each line in the partition is sent to the reducer one line at a time and then
standard input is closed
B. Every line from the input file is sent at once and then standard input is closed
C. Input keys are separated from values by the newline character
D. Input values are terminated by the tab character
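
Questions 3 and 4 turn on two small Java details that are easy to try out. The sketch below is plain Java with no Hadoop dependency: String stands in for the result of Text.toString(), and Integer stands in for IntWritable, so the class and method names are illustrative only. It also shows a detail worth knowing beyond the answer key: StringTokenizer treats its second argument as a set of individual delimiter characters, not as a regular expression, so the single-argument constructor is the usual way to split on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative stand-in code; String replaces Text and Integer replaces IntWritable.
public class TokenizeAndSumSketch {

    // The default delimiters are space, tab, newline, carriage return, and form feed.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // With an explicit delimiter string, every character in it is a delimiter.
    static List<String> tokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // An Iterable exposes only iterator(), so the for-each loop is the idiomatic
    // way to walk it; it has no hasNext(), peek(), or get() methods of its own.
    static int sum(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(tokens("one\ttwo  three"));      // [one, two, three]
        // The delimiters here are the characters '\', 's', and '+', not whitespace.
        System.out.println(tokens("sea shells", "\\s+"));   // [ea , hell]
        System.out.println(sum(Arrays.asList(1, 2, 3)));     // 6
    }
}
```

Option B remains the keyed answer to Q3 because it is the only choice that first converts the Text value to a String and then builds a tokenizer from it.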

Section 2 - Preparing for the Certification


MapR provides several ways to prepare for the certification including classroom
training, self-paced online training, videos, webinars, blogs, and ebooks.

MapR offers a number of training courses that will help you prepare. We recommend
taking the classroom training first, followed by self-paced online training, and then
several months of experimentation on your own learning the tools in a real-world
environment.

We also provide additional resources in this guide to support your learning. The blogs,
whiteboard walkthroughs, and ebooks are excellent supporting material in your efforts
to become a Hadoop Developer.

Instructor and Virtual Instructor-led Training


All courses include:
A Certified MapR Instructor who is an SME in the topic and an expert in classroom facilitation and course delivery techniques
Collaboration and assistance for all students on completion of exercises
Lab exercises, a lab guide, slide guide, and job aids as appropriate
A course cluster provided for completing labs
Certification exam fee included (one exam attempt only), taken on the student's own time (not in class)

DEV 3000 - Developing Hadoop Applications


Duration: 3 days
Cost: $2400
learn.mapr.com

Course Description:
This course teaches developers how to write Hadoop Applications using MapReduce
and YARN in Java. The course covers debugging, managing jobs, improving
performance, working with custom data, managing workflows, and using other
programming languages for MapReduce.

Prerequisites for Success in this Course


Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE
Basic Hadoop knowledge (helpful but not required)
A Linux, PC, or Mac computer with a MapR Sandbox downloaded (on-demand course) or connected to a Hadoop cluster via SSH and web browser (for ILT or vILT course)

Self-paced Training
MapR self-paced courses are based on the instructor-led courses. We recommend that
everyone considering the certification take advantage of the free online training in
conjunction with the instructor-led training.

DEV 301 - Developing Hadoop Applications


Duration: about 5 hours + labs
Cost: FREE!
learn.mapr.com

Course Description:
This course teaches developers how to write Hadoop applications using MapReduce and
YARN in Java. The course covers debugging, managing jobs, improving performance,
working with custom data, managing workflows, and using other programming
languages for MapReduce.

Lesson 1: Introduction to Developing Hadoop Applications


Illustrate the MapReduce model conceptually
Brief history of MapReduce
Discuss how MapReduce works at a high level
Define how data flows in MapReduce

Lesson 2: Job Execution Framework - MapReduce v1 & v2


Describe the MapReduce v1 job execution framework
Compare MapReduce v1 to MapReduce v2 (YARN)
Describe how jobs execute in YARN
Describe how to manage jobs in YARN

Lesson 3: Write a MapReduce Program


Summary of the programming problem
Design and implement the Mapper class, Reducer class, and driver
Build and execute the code then examine the output

Lesson 4: Use the MapReduce API


API overview
Mapper input processing and Reducer output processing data flow
Explore the Mapper, Reducer, and Job class API

Lesson 5: Managing, Monitoring, and Testing MapReduce Jobs


Work with counters
Use the MCS to monitor jobs
Use the Hadoop CLI to manage jobs
Display job history and logs
Write unit tests for MapReduce programs

Lesson 6: Managing Performance


Review components of MapReduce performance
Enhance performance in your MapReduce jobs
Overview of MapR performance enhancements

Lesson 7: Working with Data


Work with sequence files
Working with the distributed cache
Working with HBase

Lesson 8: Launching Jobs


Implement programmatic job control in the driver
Use MapReduce chaining
Use Oozie to manage MapReduce workflows

Lesson 9: Using Non-Java Programs (Streaming MapReduce)


Overview of the MapReduce streaming paradigm
Configure MapReduce streaming parameters
Define the programming contract for mappers and reducers
Monitor and debug MapReduce streaming jobs
Hands-on exercises
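
The programming contract covered in Lesson 9 can be sketched briefly. Under the default streaming settings, a reducer reads lines from standard input, each holding a key and a value separated by a tab, with lines already sorted by key, and it writes its results to standard output. The framework does not group values for you, so the program must detect key boundaries itself. The sketch below is written in Java only to keep this guide's examples in one language; a real streaming job would typically use a script in a language such as Python or Perl. The class name is hypothetical, and treating a line without a tab as a value of 0 is an assumption made for this example.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.function.Consumer;

// Illustrative streaming-style reducer that sums integer values per key.
public class StreamingSumReducerSketch {

    // Reads "key<TAB>value" lines that arrive sorted by key and emits
    // "key<TAB>sum" each time the key changes, plus once at end of input.
    static void reduce(Iterator<String> sortedLines, Consumer<String> emit) {
        String currentKey = null;
        long sum = 0;
        while (sortedLines.hasNext()) {
            String line = sortedLines.next();
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            // Assumption for this sketch: a line with no tab contributes 0.
            long value = tab < 0 ? 0 : Long.parseLong(line.substring(tab + 1).trim());
            if (currentKey != null && !key.equals(currentKey)) {
                emit.accept(currentKey + "\t" + sum); // key boundary reached
                sum = 0;
            }
            currentKey = key;
            sum += value;
        }
        if (currentKey != null) {
            emit.accept(currentKey + "\t" + sum); // flush the final key
        }
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        // Standard input is consumed line by line until it is closed.
        reduce(in.lines().iterator(), System.out::println);
    }
}
```

Because only the running total for the current key is kept in memory, the reducer can process a partition of any size one line at a time.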

Tutorials, Books, Blogs, and Other Resources


We recommend these resources to help you prepare for the MapR Certified Hadoop
Developer exam.

1. Real-World Hadoop ebook - Ted Dunning and Ellen Friedman
https://www.mapr.com/real-world-hadoop

2. MapReduce: Simplified Data Processing on Large Clusters - Jeffrey Dean and Sanjay Ghemawat
http://research.google.com/archive/mapreduce.html

3. MapReduce Tutorial - apache.org
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

4. Data-Intensive Text Processing with MapReduce - Jimmy Lin and Chris Dyer
http://lintool.github.io/MapReduceAlgorithms/

5. Hadoop: The Definitive Guide: MapReduce for the Cloud - Tom White
http://shop.oreilly.com/product/9780596521981.do

6. How To: Launching MapReduce Jobs - James Casaletto
https://www.mapr.com/blog/how-to-launching-mapreduce-jobs

7. How To: Using Non-Java Programs or Streaming for MapReduce Jobs - James Casaletto
https://www.mapr.com/blog/how-using-non-java-programs-or-streaming-mapreduce-jobs

8. How to Write a MapReduce Program - James Casaletto
https://www.mapr.com/blog/how-write-mapreduce-program

9. How to Use the MapReduce API - James Casaletto
https://www.mapr.com/blog/how-use-mapreduce-api

10. Apache Spark vs. MapReduce - Whiteboard Walkthrough
https://www.mapr.com/resources/videos/apache-spark-vs-mapreduce-whiteboardwalkthrough

Datasets
These are some datasets that we recommend for experimenting with.

1. UCI Machine Learning Repository
This site has almost 300 datasets of various types and sizes for tasks including
classification, regression, clustering, and recommender systems.
http://archive.ics.uci.edu/ml/

2. Amazon AWS public datasets
These datasets include the Human Genome Project, the Common Crawl web corpus,
Wikipedia data, and Google Books Ngrams. Information on these datasets can be
found at http://aws.amazon.com/publicdatasets/

3. Kaggle
This site includes a collection of datasets used in machine learning competitions run
by Kaggle. Areas include classification, regression, ranking, recommender systems,
and image analysis. These datasets can be found under the Competitions section at
http://www.kaggle.com/competitions

4. KDnuggets
This site has a detailed list of public datasets, including some of those mentioned
earlier. The list is available at http://www.kdnuggets.com/datasets/index.html

5. SF Open Data
SF OpenData is the central clearinghouse for data published by the City and County
of San Francisco and is part of the broader open data program.
https://data.sfgov.org/data

Section 3 - Taking the Exam

MapR Certification exams are delivered online using a service from Innovative Exams. A
human will proctor your exam. Your proctor will have access to your webcam and
desktop during your exam. Once you are logged in for your test session, and your
webcam and desktop are shared, your proctor will launch your exam.

This method allows you to take our exams anytime, and anywhere, but you will need a
quiet environment where you will remain uninterrupted for up to two hours. You will
also need a reliable Internet connection for the entire test session.

There are five steps in taking your exam:


1) Register for the exam
2) Reserve a test session
3) Test your system compatibility
4) Take the exam
5) Get your results

Register for the Exam


MapR exams are available for purchase exclusively at learn.mapr.com. You have six
months to complete your certification after you purchase the exam. After six months
have expired, your exam registration will be canceled. There are no refunds for expired
certification purchases.

1) Sign in to your profile at learn.mapr.com
2) Find the exam in the learn.mapr.com catalog and click Purchase
3) If you have a voucher code, enter it in the Promotion Code field
4) Use a credit card to pay for the exam
You may use a Visa, MasterCard, American Express, or Discover credit card. The
charge will appear as MAPR TECHNOLOGIES on your credit card statement.
5) Look for a confirmation with your Order ID

Reserve a Test Session


MapR exams are delivered on a platform called Innovative Exams. When you are ready
to schedule your exam, go back to your profile in learn.mapr.com, click on your exam,
and click the Continue to Innovative Exams link to proceed to scheduling. This will take
you to Examslocal.com.

1) In learn.mapr.com, find your exam and click Continue to Innovative Exams
2) Single sign-on will bring you to Innovative Exams
3) Go to My Exams
4) Enter your exam title in the Search field
5) Choose an exam date

6) Choose a time slot at least 24 hours in advance

Once confirmed, your reservation will be in your My Exams tab of Innovative Exams

7) Check your email for a reservation confirmation

Cancellation & Rescheduling


Examinees may cancel or reschedule an exam without a cancellation penalty by giving at
least 24 hours' notice. An examinee who cancels or reschedules within 24 hours of the
scheduled appointment forfeits the entire cost of the exam and must pay for it again to
reschedule. Examinees must cancel or reschedule more than 24 hours in advance to
receive a full refund and remain eligible to take the exam.

To cancel an exam, the examinee logs into www.examslocal.com and clicks My Exams,
selects the exam to cancel, and then selects the Cancel button to confirm their
cancellation. A cancellation confirmation email will be sent to the examinee following
the cancellation.

Test System Compatibility


We recommend that you check your system compatibility several days before your
exam to make sure you are ready to go. Go to
https://www.examslocal.com/ScheduleExam/Home/CompatibilityCheck

These are the system requirements:

1) Mac, Windows, Linux, or Chrome OS
2) Google Chrome or Chromium version 32 or above
3) Your browser must accept third-party cookies for the duration of the exam ONLY
4) Install the Innovative Exams Google Chrome Extension
5) TCP ports 80 and 443
6) 1GB RAM & 2GHz dual core processor
7) Minimum 1280 x 800 resolution
8) Sufficient bandwidth to share your screen via the Internet

Day of the Exam


Make sure your Internet connection is strong and stable
Make sure you are in a quiet, well-lit room without distractions
Clear the room - you must be alone when taking your exam
No breaks are allowed during the exam; use the bathroom before you log in
Clear your desk of any materials, notebooks, and mobile devices
Silence your mobile and remove it from your desk
Configure your computer for a single display; multiple displays are not allowed
Close out of all other applications except for Chrome

We recommend that you sign in 30 minutes in advance of your testing time so that you
can communicate with your proctor, and get completely set up well in advance of your
test time.

You will be required to share your desktop and your webcam prior to the exam start.
YOUR EXAM SESSION WILL BE RECORDED. If the Proctor senses any misconduct, your
exam will be paused and you will be contacted by the proctor. If your misconduct is not
corrected, the Proctor will shut down your exam, resulting in a Fail.

Examples of misconduct and/or misuse of the exam include, but are not limited to, the
following:

Impersonating another person
Accepting assistance or providing assistance to another person
Disclosure of exam content including, but not limited to, web postings, formal or
informal test preparation or discussion groups, or reconstruction through
memorization or any other method
Possession of unauthorized items during the exam. This includes study
materials, notes, computers and mobile devices.
Use of unauthorized materials (including brain-dump material and/or
unauthorized publication of exam questions with or without answers).
Making notes of any kind during the exam
Removing or attempting to remove exam material (in any format)
Modifying and/or altering the results and/or scoring the report or any other
exam record

MapR Certification exam policies can be viewed at:
https://www.mapr.com/mapr-certification-policies

After the Exam - Sharing Your Results

When you pass a MapR Certification exam, you will receive a confirmation email with
the details of your success. This will include the title of your certification and details on
how you can download your digital certificate, and share your certification on social
media.

Your certification will be updated in learn.mapr.com in your profile. From your profile
you can view your certificate and share it on LinkedIn.

Your certificate is available as a PDF. You can download and print your certificate from
your profile in learn.mapr.com.

Your credential contains a unique Certificate Number and a URL. You can share your
credential with anyone who needs to verify your certification.

If you happen to fail the exam, you will automatically qualify for a discounted exam
retake voucher. Retakes are $100 USD and can be purchased by contacting
certification@maprtech.com. MapR will verify your eligibility and supply you with a
special 1-time use discount code which you can apply at the time of purchase.

Exam Retakes

If you fail an exam, you are eligible to purchase and retake the exam in 14 days. Once
you have passed the exam, you may not take that version (e.g., v.4.0) of the exam again,
but you may take any newer version of the exam (e.g., v.4.1). A test result found to be in
violation of the retake policy will result in no credit awarded for the test taken. Violators
of these policies may be banned from participation in the MapR Certification Program.
