account at examslocal.com using the same email address you used to register with Cloudera.
Once you create an account and log in on examslocal.com, navigate to "Schedule an Exam", and
then enter "Cloudera" in the "Search Here" field. Select the exam you want to schedule and
follow the instructions to schedule your exam.
Where do I take Cloudera certification exams?
Anywhere. All you need is a computer, a webcam, a Chrome or Chromium browser, and an
internet connection. For a full set of requirements,
visit https://www.examslocal.com/ScheduleExam/Home/CompatibilityCheck
What if I lose internet connectivity during the exam?
It is the sole responsibility of the test taker to maintain connectivity throughout the exam session.
If connectivity is lost, for any reason, it is the responsibility of the test taker to reconnect and
finish the exam within the scheduled time slot. No refunds or retakes will be given. Unfinished
or abandoned exam sessions will be scored as a fail.
Can I take the exam at a test center?
Cloudera no longer offers exams in test centers or approves the delivery of our exams in test
centers.
Steps to schedule your exam
1. Create an account at www.examslocal.com. You MUST use the exact same email you
used to register on university.cloudera.com.
2. Select the exam you purchased from the drop-down list (type Cloudera to find our
exams).
3. Choose a date and time you would like to take your exam. You must schedule a minimum
of 24 hours in advance.
4. Select a time slot for your exam.
5. Pass the compatibility tool and install the screen sharing Chrome Extension.
How do I reschedule an Exam Reservation?
If you need to reschedule your exam, please sign in at https://www.examslocal.com, click on
"My Exams", click on your scheduled exam and use the reschedule option. Email Innovative
Exams at examsupport@examslocal.com, or call +1-888-504-9178, +1-312-612-1049 for
additional support.
What is your exam cancellation policy?
If you wish to reschedule your exam, you must contact Innovative Exams at least 24 hours prior
to your scheduled appointment. Rescheduling less than 24 hours prior to your appointment
results in a forfeiture of your exam fees. All exams are non-refundable and non-transferable. All
exam purchases are valid for one year from date of purchase.
One form of government-issued photo identification (e.g., driver's license, passport). Any
international passport or government-issued form of identification must contain Western
(English) characters. You will be required to provide a means of photo identification
before the exam can be launched. If acceptable proof of identification is not provided to
the proctor prior to the exam, you will be refused entry to the exam. You must also
consent to having your photo taken. The ID will be used for identity verification only and
will not be stored. The proctor cannot release the exam to you until identification has
been successfully verified and you have agreed to the terms and conditions of the exam.
No refund or rescheduling is provided when an exam cannot be started due to failure to
provide proper identification.
You must log in to take the exam on a computer that meets the minimum requirements
provided within the compatibility
check: https://www.examslocal.com/ScheduleExam/Home/CompatibilityCheck
as you want until you pass, however, you must pay for each attempt; Cloudera offers no
discounts for retake exams. Retakes are not allowed after the successful completion of a test.
Does my certification expire?
CCA certifications are valid for two years. CCP certifications are valid for three years.
CCDH, CCAH, and CCSHB certifications align to a specific CDH release and remain valid for
that version. Once that CDH version retires or the certification or exam retires, your certification
retires.
Are there prerequisites? Do I need to take training to take a certification test?
There are no prerequisites. Anyone can take a Cloudera Certification Test at any time.
I passed, but I'd like to take the test again to improve my score. Can I do that?
Retakes are not allowed after the successful completion of a test. A test result found to be in
violation of the retake policy will not be processed, which will result in no credit awarded for the
test taken. Repeat violators will be banned from participation in the Cloudera Certification
Program.
Can I review my test or specific test questions and answers?
Cloudera certification tests adhere to the industry standard for high-stakes certification tests,
which includes the protection of all test content. As a certifying body, we go to great lengths to
protect the integrity of the items in our item pool. Cloudera does not provide exam items in any
other format than a proctored environment.
What is the confidentiality agreement I must agree to in order to test?
All content, specifically questions, answers, and exhibits of the certification exams, is the proprietary and
confidential property of Cloudera. They may not be copied, reproduced, modified, published,
uploaded, posted, transmitted, shared, or distributed in any way without the express written
authorization of Cloudera. Candidates who sit for Cloudera exams must agree they have read and
will abide by the terms and conditions of the Cloudera Certifications and Confidentiality
Agreement before beginning the certification exam. The agreement applies to all exams.
Agreeing and adhering to this agreement is required to be officially certified and to maintain
valid certification. Candidates must first accept the terms and conditions of the Cloudera
Certification and Confidentiality Agreement prior to testing. Failure to accept the terms of this
Agreement will result in a terminated exam and forfeiture of the entire exam fee.
If Cloudera determines, in its sole discretion, that a candidate has shared any content of an exam
and is in violation of the Cloudera Certifications and Confidentiality Agreement, it reserves the
right to take action up to and including, but not limited to, decertification of an individual and a
permanent ban of the individual from Cloudera Certification programs, revocation of all previous
Cloudera Certifications, notification to the candidate's employer, and notification to law
enforcement agencies. Candidates found in violation of the Cloudera Certifications and
Confidentiality Agreement forfeit all fees previously paid to Cloudera or to Cloudera's
authorized vendors and may be required to pay additional fees for services rendered.
Helpful Tips:
The username for the primary account is cloudera, and the password for that account is
also cloudera.
The cloudera user has permission to run the sudo command, so separate root account
credentials are not needed.
To open a terminal window, right-click on the desktop (not in the browser) and select
Open in Terminal or click on the Applications menu at the top of the desktop and select
System Tools > Terminal.
To open a file editor, click on the Applications menu at the top of the desktop and select
either Accessories > gedit Text Editor for a simple text editor or Programming > Geany
for a simple IDE environment.
Many commands use the $STREAMING environment variable rather than long paths. The
variable represents the path to the streaming jar file, which is usually located at
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-*.jar.
In the VM, the $STREAMING environment variable has been automatically set for you.
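As a rough illustration of the $STREAMING tip, the sketch below assembles a Hadoop Streaming command line in Python and reads the jar location from the environment. The script names, HDFS paths, and the jar path passed in the example are hypothetical placeholders, and the fallback is simply the glob quoted above. The function only builds the argument list; running it (for example with subprocess.run) on a machine with Hadoop installed is left out.

```python
import os

# Path quoted in the tips above; used only when $STREAMING is not set.
# It contains a glob, so it would need expanding (e.g. with glob.glob)
# before being handed to subprocess without a shell.
DEFAULT_STREAMING_GLOB = (
    "/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-*.jar"
)


def streaming_command(input_dir, output_dir, mapper, reducer, env=None):
    """Build the argv list for a Hadoop Streaming job (does not run it)."""
    env = os.environ if env is None else env
    jar = env.get("STREAMING", DEFAULT_STREAMING_GLOB)
    return [
        "hadoop", "jar", jar,
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", mapper,
        "-reducer", reducer,
        # Ship the local scripts to the cluster nodes along with the job.
        "-file", mapper,
        "-file", reducer,
    ]


# Hypothetical scripts, paths, and jar location, for illustration only.
cmd = streaming_command(
    "input/logs", "output/counts", "mapper.py", "reducer.py",
    env={"STREAMING": "/tmp/hadoop-streaming.jar"},
)
print(" ".join(cmd))
```

Passing env explicitly keeps the example independent of the machine it runs on; inside the VM, calling streaming_command without env would pick up the preset variable.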
Identify the commands to manipulate files in the Hadoop File System Shell
Determine which files you must change and how in order to migrate a cluster
from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running
on YARN
Network Topologies: understand network usage in Hadoop (for both HDFS and
MapReduce) and propose or identify key network design components for a
given scenario
Given a scenario, identify how the cluster will handle disk and machine
failures
Identify the function and purpose of available tools for cluster monitoring
Be able to install all the ecosystem components in CDH 5, including (but not
limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and
Pig
Identify the function and purpose of available tools for managing the Apache
Hadoop file system
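For the file-manipulation objective in the list above, a minimal sketch of a typical Hadoop File System Shell round trip might look like the following. The local file name and the /user/cloudera/staging directory are assumptions made for the example, and the helper only prints the commands rather than executing them.

```python
import shlex


def fs_command(action, *args):
    """Return the argv list for one `hadoop fs` shell action (not executed)."""
    return ["hadoop", "fs", "-" + action] + list(args)


# Hypothetical local file and HDFS directory names.
steps = [
    fs_command("mkdir", "-p", "/user/cloudera/staging"),             # create a directory
    fs_command("put", "sales.csv", "/user/cloudera/staging/"),       # local -> HDFS
    fs_command("ls", "/user/cloudera/staging"),                      # list contents
    fs_command("get", "/user/cloudera/staging/sales.csv", "sales_copy.csv"),  # HDFS -> local
    fs_command("rm", "-r", "/user/cloudera/staging"),                # clean up
]

for argv in steps:
    print(" ".join(shlex.quote(arg) for arg in argv))
```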
Disclaimer: These exam preparation pages are intended to provide information about the
objectives covered by each exam, related resources, and recommended reading and courses. The
material contained within these pages is not intended to guarantee a passing score on any exam.
Cloudera recommends that a candidate thoroughly understand the objectives for each exam and
utilize the resources and training courses recommended on these pages to gain a thorough
understanding of the domain of knowledge related to the role the exam evaluates.
Promote Your Achievement: Every CCA receives a logo for business cards,
resumes, and online profiles.
Verify Your Achievement: Every CCA certification comes with a license that
allows current and potential employers to validate your CCA status.
Current: The big data space evolves rapidly, nowhere more so than in the Apache
Spark and Hadoop developer space. We update our CCA exams regularly to
reflect the skills and tools relevant for today and beyond. And because
change is the only constant in open-source environments, Cloudera requires
all CCA credential holders to stay current with two-year mandatory re-testing
in order to maintain current status and privileges.
Each CCA question requires you to solve a particular scenario. In some cases, a tool such as
Impala or Hive may be used. In other cases, coding is required. In order to speed up development
time of Spark questions, a template is often provided that contains a skeleton of the solution,
asking the candidate to fill in the missing lines with functional code. This template is written in
either Scala or Python.
You are not required to use the template and may solve the scenario using a language you prefer.
Be aware, however, that coding every problem from scratch may take more time than is allocated
for the exam.
Evaluation, Score Reporting, and Certificate
Your exam is graded immediately upon submission and you are e-mailed a score report the same
day as your exam. Your score report displays the problem number for each problem you
attempted and a grade on that problem. If you fail a problem, the score report includes the
criteria you failed (e.g., Records contain incorrect data or Incorrect file format). We do not
report more information in order to protect the exam content. Read more about reviewing exam
content on the FAQ.
If you pass the exam, you receive a second e-mail within a few days of your exam with your
digital certificate as a PDF, your license number, a LinkedIn profile update, and a link to
download your CCA logos for use in your personal business collateral and social media profiles.
Audience and Prerequisites
There are no prerequisites required to take any Cloudera certification exam. The CCA Spark and
Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training
for Spark and Hadoop and the training course is an excellent preparation for the exam.
The skills to transfer data between external systems and your cluster. This includes the following:
Change the delimiter and file format of data during import using Sqoop
Ingest real-time and near-real time (NRT) streaming data into HDFS using
Flume
Load data into and out of HDFS using the Hadoop File System (FS) commands
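To make the Sqoop item above more concrete, here is a hedged sketch that builds a sqoop import command with either a custom field delimiter or a different output file format. The JDBC URL, table name, and target directories are hypothetical, and connection credentials (which a real import would normally need, for example via --username and -P) are left out for brevity.

```python
def sqoop_import_command(jdbc_url, table, target_dir,
                         file_format="text", delimiter=","):
    """Build the argv list for `sqoop import` (not executed)."""
    format_flags = {
        "text": "--as-textfile",
        "avro": "--as-avrodatafile",
        "sequence": "--as-sequencefile",
    }
    if file_format not in format_flags:
        raise ValueError("unsupported file format: %r" % file_format)
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        format_flags[file_format],
    ]
    # A field delimiter only applies to delimited text output.
    if file_format == "text":
        argv += ["--fields-terminated-by", delimiter]
    return argv


# Hypothetical database, table, and HDFS directories.
tsv_import = sqoop_import_command(
    "jdbc:mysql://dbhost/retail", "orders", "/user/cloudera/orders_tsv",
    delimiter="\\t",  # Sqoop itself interprets the \t escape sequence
)
avro_import = sqoop_import_command(
    "jdbc:mysql://dbhost/retail", "orders", "/user/cloudera/orders_avro",
    file_format="avro",
)
print(" ".join(tsv_import))
print(" ".join(avro_import))
```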
Convert a set of data values in a given format stored in HDFS into new data values and/or a new
data format and write them into HDFS. This includes writing Spark applications in both Scala
and Python (see note above on exam question format for more information on using either Scala
or Python):
Load data from HDFS and store results back to HDFS using Spark
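One possible shape for such a conversion job in Python is sketched below. The three-field comma-separated layout is an assumption chosen for the example, and convert_dataset expects an already-created SparkContext (such as the sc object the pyspark shell provides) plus HDFS paths supplied by the caller. The per-record logic is kept in a plain function so it can be checked without a cluster.

```python
def to_tab_delimited(line):
    """Convert one comma-separated record to tab-separated text.

    Returns None for malformed records so they can be filtered out.
    """
    fields = [field.strip() for field in line.split(",")]
    # Assumed layout for this example: exactly three non-empty fields.
    if len(fields) != 3 or not all(fields):
        return None
    return "\t".join(fields)


def convert_dataset(sc, input_path, output_path):
    """Read text from HDFS, convert each record, and write the result back.

    `sc` is an existing SparkContext; both paths are HDFS locations
    chosen by the caller.
    """
    (sc.textFile(input_path)
       .map(to_tab_delimited)
       .filter(lambda record: record is not None)
       .saveAsTextFile(output_path))


# The record-level function can be tried locally without Spark.
print(to_tab_delimited("1, widget, 9.99"))
```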
Data Analysis
Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and
Impala.
Create a table in the Hive metastore using the Avro file format and an
external schema file
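As an illustration of the Avro item above, the sketch below generates a HiveQL statement for an external table whose columns come from an Avro schema file. The table name, data location, and schema path are hypothetical. The long SerDe form is used here on the assumption that it works on older Hive releases as well as newer ones; it is one way to write the statement, not necessarily the form the exam expects.

```python
def avro_table_ddl(table, data_location, schema_url):
    """Return HiveQL that creates an external Avro table from a schema file."""
    return "\n".join([
        "CREATE EXTERNAL TABLE %s" % table,
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'",
        "STORED AS",
        "  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'",
        "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'",
        "LOCATION '%s'" % data_location,
        # No column list is given: the SerDe reads columns from the schema file.
        "TBLPROPERTIES ('avro.schema.url'='%s');" % schema_url,
    ])


# Hypothetical table name and HDFS paths.
ddl = avro_table_ddl(
    "orders_avro",
    "/user/cloudera/orders_avro",
    "hdfs:///user/cloudera/schemas/orders.avsc",
)
print(ddl)
```

The printed statement could then be run from the Hive shell or saved to a file and passed to hive -f.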
CCA175 is a remote-proctored exam available anywhere, anytime. See the FAQ for more
information and system requirements.
CCA175 is a hands-on, practical exam using Cloudera technologies. Each user is given their own
CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka,
Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition the cluster also
comes with Python (2.6, 2.7, and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive
Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.
Documentation Available online during the exam
Scala Documentation
Apache Hive
Apache Pig
Apache Parquet
Cloudera HUE
Apache Oozie
DataFu 1.1.0
Only the documentation, links, and resources listed above are accessible during the exam. All
other websites, including Google/search functionality, are disabled. You may not use notes or other
exam aids.
Course Overview    Role                 Days
                   Developer            4
                   Developer            4
                   Developer, Analyst   3
Search Training    Developer            3
HBase Training     Developer            3
Spark Training     Developer            3
                   Developer            4
                   Administer           4
                   Analyst              4
Required Exams
Each exam may be taken in any order. All three exams must be passed within 365 days of each
other. Candidates who fail an exam must wait a period of thirty calendar days, beginning the day
after the failed attempt, before they may retake the same exam. Candidates must pay for each
exam attempt.
Each passed exam is verifiable in your exam transcript and history.
Exam Format
Each exam is a single challenge scenario. You are provided access to the scenario, the data sets,
and the cluster. You are given eight (8) hours to complete the challenge. See below for more
information on the cluster.
Required Skills
Common Skills (all exams)
Extract relevant features from a large dataset that may contain bad records,
partial records, errors, or other forms of noise
Assign data records from a large dataset into a defined set of data groupings
Evaluate goodness of fit for a given set of data groupings and a dataset
Predict labels for an unlabeled dataset using a labeled dataset for reference
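Two of the skills listed above, discarding noisy records and predicting labels from a labeled reference set, can be sketched in a few lines of plain Python. The two-feature record layout, the sample data, and the choice of a 1-nearest-neighbor rule are all assumptions made for illustration; a real exam scenario would call for whatever method and tooling suit its data.

```python
def parse_record(line):
    """Parse 'x,y[,label]' into ((x, y), label); return None for bad rows."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) not in (2, 3):
        return None
    try:
        features = (float(parts[0]), float(parts[1]))
    except ValueError:
        return None
    label = parts[2] if len(parts) == 3 and parts[2] else None
    return features, label


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_label(features, labeled):
    """Return the label of the closest labeled point (1-nearest neighbor)."""
    nearest = min(labeled, key=lambda rec: squared_distance(features, rec[0]))
    return nearest[1]


# Made-up sample data; the fourth row is deliberately malformed.
raw_labeled = [
    "1.0,1.0,small",
    "1.2,0.8,small",
    "8.0,9.0,large",
    "oops,3,large",
    "9.1,8.7,large",
]
parsed = [parse_record(line) for line in raw_labeled]
labeled = [rec for rec in parsed if rec is not None and rec[1] is not None]

for point in [(1.1, 0.9), (8.5, 8.5)]:
    print(point, predict_label(point, labeled))
```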
How to Prepare
Theory
Machine Learning
Coursera specialization
Introductory
Advanced
Statistics
Linear Algebra
https://www.coursera.org/course/matrix
Engineering
Tools
Spark
R Programming (Coursera)
All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime. See
the FAQ for more information and system requirements.
Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each
user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with
Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many
others (See a full list). In addition the cluster also comes with Python (2.6 and 3.4), Perl 5.10,
Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime,
Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyrimpaladb,
SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical
parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER),
Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib,
ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling water.
Currently, the cluster is open to the internet and there are no restrictions on tools you can install
or websites or resources you may use.