Hortonworks
https://www.pass4sures.com/
Question 1
Which Hadoop component is responsible for managing the distributed file system metadata?
A. NameNode
B. Metanode
C. DataNode
D. NameSpaceManager
Answer: A
Question 2
Answer: B
Question 3
Which TWO of the following statements are true about HDFS? Choose 2 answers
Answer: A, B
Question 4
In Hadoop 2.2, which one of the following statements is true about a standby NameNode?
The Standby NameNode:
A. Communicates directly with the active NameNode to maintain the state of the active NameNode.
B. Receives the same block reports as the active NameNode.
C. Runs on the same machine and shares the memory of the active NameNode.
D. Processes all client requests and block reports from the appropriate DataNodes.
Answer: B
Question 5
Which HDFS command uploads a local file X into an existing HDFS directory Y?
A. hadoop scp X Y
B. hadoop fs -localPut X Y
C. hadoop fs -put X Y
D. hadoop fs -get X Y
Answer: C
Question 6
Answer: B
Question 7
In Hadoop 2.2, which TWO of the following processes work together to provide automatic failover of the NameNode?
Choose 2 answers
A. ZKFailoverController
B. ZooKeeper
C. QuorumManager
D. JournalNode
Answer: A, D
Question 8
Which one of the following statements is FALSE regarding the communication between DataNodes and a federation of
NameNodes in Hadoop 2.2?
Answer: A
Question 9
What is the term for the process of moving map outputs to the reducers?
A. Reducing
B. Combining
C. Partitioning
D. Shuffling and sorting
Answer: D
Question 10
A. The job's Partitioner shuffles and sorts all (key, value) pairs and sends the output to all reducers
B. The default Hash Partitioner sends key-value pairs with the same key to the same Reducer
C. The reduce method is invoked once for each unique value
D. The Mapper must sort its output of (key, value) pairs in descending order based on value
Answer: A
Question 11
What are the TWO main components of the YARN ResourceManager process? Choose 2 answers
A. Job Tracker
B. Task Tracker
C. Scheduler
D. Applications Manager
Answer: C, D
Question 12
Which one of the following statements describes the relationship between the NodeManager and the
ApplicationMaster?
Answer: D
Question 13
Which one of the following statements describes the relationship between the ResourceManager and the
ApplicationMaster?
Answer: A
Question 14
Which one of the following statements regarding the components of YARN is FALSE?
Answer: D
Question 15
Which YARN component is responsible for monitoring the success or failure of a Container?
A. ResourceManager
B. ApplicationMaster
C. NodeManager
D. JobTracker
Answer: A
Question 16
Answer: B
Question 17
Which one of the following statements describes a Pig bag, tuple, and map, respectively?
A. Unordered collection of maps, ordered collection of tuples, ordered set of key:value pairs
B. Unordered collection of tuples, ordered set of fields, set of key-value pairs
Answer: B
Question 18
Answer: C
Question 19
Which Pig statement combines A by its first field and B by its second field?
Answer: B
Question 20
A Pig JOIN statement that combined relations A by its first field and B by its second field would produce what output?
A. 2 Jim Chris 2
3 Terry 3
4 Brian 4
B. 2 cherry
2 cherry
3 orange
4 peach
C. 2 cherry Jim, Chris
3 orange Terry
4 peach Brian
D. 2 cherry Jim 2
2 cherry Chris 2
3 orange Terry 3
4 peach Brian 4
Answer: D
Question 21
A. Option A
B. Option B
C. Option C
D. Option D
Answer: D
Question 22
Answer: D
Question 23
Which two of the following statements are true about Pig's approach toward data? Choose 2 answers
Answer: B, E
Question 24
Answer: D
Question 25
What command to define B would produce the output (M,62,95l02) when invoking the DUMP operator on B?
Answer: A
Question 26
Which two of the following are true about this trivial Pig program? (Choose two)
Answer: A, D
Question 27
A. The logevents relation represents the data from the my.log file, using a comma as the parsing delimiter
B. The logevents relation represents the data from the my.log file, using a tab as the parsing delimiter
C. The first field of logevents must be a properly-formatted date string or the table returns an error
D. The statement is not a valid Pig command
Answer: B
Question 28
Answer: A
Question 29
Answer: B
Question 30
To use a Java user-defined function (UDF) with Pig, what must you do?
Answer: C
Question 31
Which TWO of the following statements are true regarding Hive? Choose 2 answers
A. Useful for data analysts familiar with SQL who need to do ad-hoc queries
B. Offers real-time queries and row level updates
C. Allows you to define a structure for your unstructured Big Data
D. Is a relational database
Answer: A, C
Question 32
A. Records can only be added to the table using the Hive INSERT command.
B. When the table is dropped, the underlying folder in HDFS is deleted.
C. Hive dynamically defines the schema of the table based on the FROM clause of a SELECT query.
D. Hive dynamically defines the schema of the table based on the format of the underlying data.
Answer: B
Question 33
Assuming the statements above execute successfully, which one of the following statements is true?
Answer: A
Question 34
Answer: C
Question 35
Which one of the following Hive commands uses an HCatalog table named x?
A. SELECT * FROM x;
B. SELECT x.* FROM org.apache.hcatalog.hive.HCatLoader('x');
C. SELECT * FROM org.apache.hcatalog.hive.HCatLoader('x');
D. Hive commands cannot reference an HCatalog table
Answer: C
Question 36
Which one of the following classes would a Pig command use to store data in a table defined in HCatalog?
A. org.apache.hcatalog.pig.HCatOutputFormat
B. org.apache.hcatalog.pig.HCatStorer
C. No special class is needed for a Pig script to store data in an HCatalog table
D. Pig scripts cannot use an HCatalog table
Answer: B
Question 37
Assuming the statements above execute successfully, which one of the following statements is true?
A. Hive reformats File1 into a structure that Hive can access and moves it into /user/joe/x/
B. The file named File1 is moved to /user/joe/x/
C. The contents of File1 are parsed as comma-delimited rows and loaded into /user/joe/x/
D. The contents of File1 are parsed as comma-delimited rows and stored in a database
Answer: B
Question 38
Which one of the following statements describes a Hive user-defined aggregate function?
Answer: A
Question 39
Answer: A
Question 40
Answer: D
Question 41
Answer: B
Question 42
A. A bigram of the top 80 sentences that contain the substring "you are" in the lines column of the inputdata table.
B. An 80-value ngram of sentences that contain the words "you" or "are" in the lines column of the inputdata table.
C. A trigram of the top 80 sentences that contain "you are" followed by a null space in the lines column of the inputdata table.
D. A frequency distribution of the top 80 words that follow the subsequence "you are" in the lines column of the inputdata table.
Answer: D
Question 43
Which one of the following files is required in every Oozie Workflow application?
A. job.properties
B. Config-default.xml
C. Workflow.xml
D. Oozie.xml
Answer: C
Question 44
A. mapreduce
B. pig
C. hive
D. mrunit
Answer: D
Question 45
When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
B. As soon as a mapper has emitted at least one record.
C. Not until all mappers have finished processing all records.
D. It depends on the InputFormat used for the job.
Answer: C
In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The programmer-defined reduce method is called only after all the mappers have finished.
Note: The reduce phase has 3 steps: shuffle, sort, reduce. Shuffle is where the data is collected by the reducer from each mapper. This can happen while mappers are generating data since it is only a data transfer. On the other hand, sort and reduce can only start once all the mappers are done.
Why is starting the reducers early a good thing? Because it spreads out the data transfer from the mappers to the reducers over time, which is a good thing if your network is the bottleneck.
Why is starting the reducers early a bad thing? Because they "hog up" reduce slots while only copying data. Another job that starts later and would actually use the reduce slots cannot use them.
You can customize when the reducers start up by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis.
Typically, keep mapred.reduce.slowstart.completed.maps above 0.9 if the system ever has multiple jobs running at once. This way the job doesn't hog reducers when they aren't doing anything but copying data. If you only ever have one job running at a time, 0.1 would probably be appropriate.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "When are the reducers started in a MapReduce job?"
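As a cluster-wide setting, the property above lives in mapred-site.xml; a sketch of such a fragment might look like this (the 0.90 value simply illustrates the "above 0.9" recommendation, not a required default):

```xml
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.90</value>
  <description>Start reducers only after 90% of the map tasks have completed.</description>
</property>
```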
Question 46
A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
Answer: A
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "How does the Client communicate with HDFS?"
Question 47
You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
Answer: D
Question 48
Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. mapred
Answer: D
Hadoop Streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer.
Reference: http://hadoop.apache.org/common/docs/r0.20.1/streaming.html (Hadoop Streaming, second sentence)
Question 49
How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
A. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
Answer: A
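A minimal Python sketch of this default behavior (hypothetical pairs; keys are sorted by the framework, while values stay in arrival order):

```python
from itertools import groupby

# Intermediate (key, value) pairs as they might arrive from several mappers.
pairs = [("the", 3), ("fox", 1), ("the", 1), ("dog", 2), ("the", 2)]

# The framework sorts by key only; a stable sort leaves each key's values
# in their arrival order, i.e. values for a given key are NOT sorted.
pairs.sort(key=lambda kv: kv[0])
grouped = {k: [v for _, v in g] for k, g in groupby(pairs, key=lambda kv: kv[0])}

print(sorted(grouped))   # keys reach the reducer in sorted order
print(grouped["the"])    # values unsorted: [3, 1, 2]
```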
Question 50
Assuming default settings, which best describes the order of data provided to a reducer's reduce method:
A. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
B. Both the keys and values passed to a reducer always appear in sorted order.
C. Neither keys nor values are in any predictable order.
D. The keys given to a reducer are in sorted order but the values associated with each key are in no predictable order
Answer: D
Question 51
You wrote a map function that throws a runtime exception when it encounters a control character in input data. The input supplied to your mapper contains twelve such characters in total, spread across five file splits. The first four file splits each have two control characters and the last split has four control characters.
Identify the number of failed task attempts you can expect when you run the job with mapred.max.map.attempts set to 4:
A. You will have forty-eight failed task attempts
B. You will have seventeen failed task attempts
C. You will have five failed task attempts
D. You will have twelve failed task attempts
E. You will have twenty failed task attempts
Answer: E
There will be four failed task attempts for each of the five file splits.
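The arithmetic behind answer E, sketched in Python (assuming a split fails on its first control character, so each of the five splits exhausts all four attempts):

```python
max_map_attempts = 4   # mapred.max.map.attempts
failing_splits = 5     # every split contains at least one control character

# Each failing split is retried until the attempt limit is reached,
# and every attempt on it fails.
failed_attempts = max_map_attempts * failing_splits
print(failed_attempts)  # 20
```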
Question 52
You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed.
Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array?
A. combine
B. map
C. init
D. configure
Answer: D
Question 53
You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?
A. Partitioner
B. OutputFormat
C. WritableComparable
D. Writable
E. InputFormat
F. Combiner
Answer: F
Combiners are used to increase the efficiency of a MapReduce program. They are used to aggregate intermediate map output locally on individual mapper outputs. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "What are combiners? When should I use a combiner in my MapReduce Job?"
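A toy Python simulation of the idea (hypothetical word-count pairs; the local pre-aggregation stands in for a Combiner and shrinks what crosses the network):

```python
from collections import Counter

# Intermediate output of one mapper for a word-count job.
mapper_output = [("the", 1), ("fox", 1), ("the", 1), ("the", 1), ("dog", 1)]

# Without a combiner, all 5 pairs are shipped to the reducers.
shipped_without = len(mapper_output)

# A combiner applies the (commutative, associative) sum locally first,
# so only one pair per distinct key leaves the mapper.
combined = Counter()
for key, value in mapper_output:
    combined[key] += value
shipped_with = len(combined)

print(shipped_without, shipped_with)  # 5 3
```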
Question 54
Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
A. Yes.
B. Yes, but only if one of the tables fits into memory
C. Yes, so long as both tables fit into memory.
D. No, MapReduce cannot perform relational operations.
E. No, but it can be done with either Pig or Hive.
Answer: A
Note:
* Join Algorithms in MapReduce
A) Reduce-side join
B) Map-side join
C) In-memory join
- Striped variant
- Memcached variant
Question 55
You have just executed a MapReduce job. Where is intermediate data written to after being emitted from the Mapper's map method?
A. Intermediate data is streamed across the network from Mapper to the Reduce and is never written to disk.
B. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
C. Into in-memory buffers that spill over to the local file system of the TaskTracker node running the Mapper.
D. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer
E. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.
Answer: C
The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each individual mapper node. This is typically a temporary directory location which can be set up in config by the Hadoop administrator. The intermediate data is cleaned up after the Hadoop job completes.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "Where is the Mapper Output (intermediate key-value data) stored?"
Question 56
You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?
Answer: A
Question 57
Answer: A, B
Question 58
You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface.
Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop?
Answer: C
Question 59
You are developing a MapReduce job for sales reporting. The mapper will process input keys representing the year (IntWritable) and input values representing product identifiers (Text).
Identify what determines the data types used by the Mapper for a given job.
A. The key and value types specified in the JobConf.setMapInputKeyClass and JobConf.setMapInputValuesClass methods
B. The data types specified in the HADOOP_MAP_DATATYPES environment variable
C. The mapper-specification.xml file submitted with the job determines the mapper's input key and value types.
D. The InputFormat used by the job determines the mapper's input key and value types.
Answer: D
The input types fed to the mapper are controlled by the InputFormat used. The default input format, "TextInputFormat," will load data in as (LongWritable, Text) pairs. The long value is the byte offset of the line in the file. The Text object holds the string contents of the line of the file.
Note: The data types emitted by the reducer are identified by setOutputKeyClass() and setOutputValueClass(). By default, it is assumed that these are the output types of the mapper as well. If this is not the case, the setMapOutputKeyClass() and setMapOutputValueClass() methods of the JobConf class will override these.
Reference: Yahoo! Hadoop Tutorial, THE DRIVER METHOD
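A rough Python imitation of what TextInputFormat hands to each map call (plain int/str stand in for LongWritable/Text; the helper name is illustrative, not Hadoop's API):

```python
def text_input_format(data: bytes):
    """Yield (byte_offset, line) pairs, mimicking TextInputFormat."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # Key: byte offset of the line; value: line content without the newline.
        yield offset, raw.rstrip(b"\r\n").decode()
        offset += len(raw)

records = list(text_input_format(b"first line\nsecond\n"))
print(records)  # [(0, 'first line'), (11, 'second')]
```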
Question 60
Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and monitoring application resource usage?
A. ResourceManager
B. NodeManager
C. ApplicationMaster
D. ApplicationMasterService
E. TaskTracker
F. JobTracker
Answer: B
Question 61
Which best describes how TextInputFormat processes input files and line breaks?
A. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
B. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
C. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
D. Input file splits may cross line breaks. A line that crosses file splits is ignored.
E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
Answer: A
Reference: How Map and Reduce operations are actually carried out
Question 62
A. As many intermediate key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B. As many intermediate key-value pairs as desired, but they cannot be of the same type as the input key-value pair.
C. One intermediate key-value pair, of a different type.
D. One intermediate key-value pair, but of the same type.
E. As many intermediate key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
Answer: E
Question 63
You have the following key-value pairs as output from your Map task:
(the, 1)
(fox, 1)
(faster, 1)
(than, 1)
(the, 1)
(dog, 1)
How many keys will be passed to the Reducer's reduce method?
A. Six
B. Five
C. Four
D. Two
E. One
F. Three
Answer: B
Only one key-value pair will be passed from the two (the, 1) key-value pairs.
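Sketched in Python (the shuffle/sort groups values by key, so reduce is called once per distinct key):

```python
pairs = [("the", 1), ("fox", 1), ("faster", 1), ("than", 1), ("the", 1), ("dog", 1)]

# Group values by key, as the shuffle/sort phase does before reduce().
groups = {}
for key, value in pairs:
    groups.setdefault(key, []).append(value)

print(len(groups))    # 5 distinct keys reach the reduce method
print(groups["the"])  # the two (the, 1) pairs collapse into one key: [1, 1]
```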
Question 64
You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?
A. HDFS command
B. Pig LOAD command
C. Sqoop import
D. Hive LOAD DATA command
E. Ingest with Flume agents
F. Ingest with Hadoop Streaming
Answer: C
Question 65
What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?
Answer: C
Question 66
Given a directory of files with the following structure: line number, tab character, string:
Example:
1 abialkjfkaoasdfksdlkjhqweroij
2 kadfhuwqounahagtnbiaswslmnbfgy
3 kjfeiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line:
conf.setInputFormat(____.class); ?
A. SequenceFileAsTextInputFormat
B. SequenceFileInputFormat
C. KeyValueFileInputFormat
D. BDBInputFormat
Answer: C
http://stackoverflow.com/questions/9s21s54/how-to-parse-customwritable-from-text-in-hadoop
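The key/value line parsing that this InputFormat performs can be imitated in Python (illustrative helper name and sample data, not Hadoop's actual implementation; each record splits at the first tab):

```python
def key_value_records(text: str):
    """Split each line at the first tab into a (key, value) record."""
    for line in text.splitlines():
        key, _, value = line.partition("\t")
        yield key, value

# Lines shaped like the question's input: line number, tab, string.
sample = "1\tabialkjf\n2\tkadfhuwq\n"
print(list(key_value_records(sample)))
```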
Question 67
You need to perform statistical analysis in your MapReduce job and would like to call methods in the Apache Commons Math library, which is distributed as a 1.3 megabyte Java archive (JAR) file. Which is the best way to make this library available to your MapReduce job at runtime?
A. Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job.
B. Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
C. When submitting the job on the command line, specify the -libjars option followed by the JAR file path.
D. Package your code and the Apache Commons Math library into a zip file named JobJar.zip
Answer: C
Question 68
The Hadoop framework provides a mechanism for coping with machine issues such as faulty configuration or impending hardware failure. MapReduce detects that one or a number of machines are performing poorly and starts more copies of a map or reduce task. All the tasks run simultaneously and the task that finishes first is used. This is called:
A. Combine
B. IdentityMapper
C. IdentityReducer
D. Default Partitioner
E. Speculative Execution
Answer: E
Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example, if one node has a slow disk controller, then it may be reading its input at only 10% the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the other nodes.
By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to just deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon the tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully, first.
Reference: Apache Hadoop, Module 4: MapReduce
Note:
* Hadoop uses "speculative execution." The same task may be started on multiple boxes. The first one to finish wins, and the other copies are killed.
Failed tasks are tasks that error out.
* There are a few reasons Hadoop can kill tasks by its own decision:
a) Task does not report progress during timeout (default is 10 minutes)
b) FairScheduler or CapacityScheduler needs the slot for some other pool (FairScheduler) or queue (CapacityScheduler).
c) Speculative execution causes results of a task not to be needed since it has completed elsewhere.
Reference: Difference between failed tasks and killed tasks
Question 69
A. As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
B. As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
C. As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
D. One final key-value pair per value associated with the key; no restrictions on the type.
E. One final key-value pair per key; no restrictions on the type.
Answer: C
Question 70
D. All data for a given value, regardless of which mapper(s) produced it.
Answer: C
Reducing lets you aggregate values together. A reducer function receives an iterator of input values from an input list. It then combines these values together, returning a single output value.
All values with the same key are presented to a single reduce task.
Reference: Yahoo! Hadoop Tutorial, Module 4: MapReduce
Question 71
Answer: C
The MapReduce framework operates exclusively on <key, value> pairs; that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
Reference: MapReduce Tutorial
Question 72
On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker on your cluster, and alerts the JobTracker it has an open map task slot.
What determines how the JobTracker assigns each map task to a TaskTracker?
Answer: E
The TaskTrackers send out heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that it is still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated. When the JobTracker tries to find somewhere to schedule a task within the MapReduce operations, it first looks for an empty slot on the same server that hosts the DataNode containing the data, and if not, it looks for an empty slot on a machine in the same rack.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "How does the JobTracker schedule a task?"
Question 73
Answer: D
Question 74
A client application creates an HDFS file named foo.txt with a replication factor of 3. Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?
A. The file will be marked as corrupted if data node B fails during the creation of the file.
B. Each data node locks the local file to prohibit concurrent readers and writers of the file.
C. Each data node stores a copy of the file in the local file system with the same name as the HDFS file.
D. The file can be accessed if at least one of the data nodes storing the file is available.
Answer: D
HDFS keeps three copies of a block on three different datanodes to protect against true data corruption. HDFS also tries to distribute these three replicas on more than one rack to protect against data availability issues. The fact that HDFS actively monitors any failed datanode(s) and upon failure detection immediately schedules re-replication of blocks (if needed) implies that three copies of data on three different nodes is sufficient to avoid corrupted files.
Note:
HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. HDFS uses a rack-aware replica placement policy. In the default configuration there are a total of 3 copies of a data block on HDFS; 2 copies are stored on datanodes on the same rack and the 3rd copy on a different rack.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, "How are HDFS blocks replicated?"
Question 75
In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.
Answer: D
FileInputFormat is the base class for all file-based InputFormats. This provides a generic implementation of getSplits(JobContext). Subclasses of FileInputFormat can also override the isSplitable(JobContext, Path) method to ensure input files are not split up and are processed as a whole by Mappers.
Reference: org.apache.hadoop.mapreduce.lib.input, Class FileInputFormat<K,V>
Question 76
A. The JobTracker calls the TaskTracker's configure() method, then its map() method and finally its close() method.
B. The TaskTracker spawns a new Mapper to process all records in a single input split.
C. The TaskTracker spawns a new Mapper to process each key-value pair.
D. The JobTracker spawns a new Mapper to process all records in a single file.
Answer: B
For each map task that runs, the TaskTracker creates a new instance of your Mapper.
Note:
* The Mapper is responsible for processing Key/Value pairs obtained from the InputFormat. The mapper may perform a number of Extraction and Transformation functions on the Key/Value pair before ultimately outputting none, one or many Key/Value pairs of the same, or different, Key/Value type.
* With the new Hadoop API, mappers extend the org.apache.hadoop.mapreduce.Mapper class. This class defines an 'Identity' map function by default - every input Key/Value pair obtained from the InputFormat is written out.
Examining the run() method, we can see the lifecycle of the mapper:
/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}
setup(Context) - Perform any setup for the mapper. The default implementation is a no-op method.
map(Key, Value, Context) - Perform a map operation on the given Key/Value pair. The default implementation calls Context.write(Key, Value).
cleanup(Context) - Perform any cleanup for the mapper. The default implementation is a no-op method.
Reference: Hadoop/MapReduce/Mapper
Question 77
Determine which best describes when the reduce method is first called in a MapReduce job?
A. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The programmer can configure in the job what percentage of the intermediate data should arrive before the reduce method begins.
B. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called only after all intermediate data has been copied and sorted.
C. Reduce methods and map methods all start at the beginning of a job, in order to provide optimal performance for map-only or reduce-only jobs.
D. Reducers start copying intermediate key-value pairs from each Mapper as soon as it has completed. The reduce method is called as soon as the intermediate key-value pairs start to arrive.
Answer: B
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, When are the reducers started in a MapReduce job?
Question 78
You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:
output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));
How many times will the Reducer's reduce method be invoked?
A. 6
B. 3
C. 1
D. 0
E. 5
Answer: B
reduce() gets called once for each [key, (list of values)] pair. To explain, let's say you called:
out.collect(new Text("Car"), new Text("Subaru"));
out.collect(new Text("Car"), new Text("Honda"));
out.collect(new Text("Car"), new Text("Ford"));
out.collect(new Text("Truck"), new Text("Dodge"));
out.collect(new Text("Truck"), new Text("Chevy"));
Then reduce() would be called twice with the pairs
reduce(Car, <Subaru, Honda, Ford>)
reduce(Truck, <Dodge, Chevy>)
Reference: Mapper output.collect()?
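The grouping behavior can be checked with a small stand-alone simulation: group the emitted pairs by key, then count the groups. This is plain Java standing in for Hadoop's shuffle-and-sort step (`group` is an illustrative helper, not a Hadoop API):

```java
import java.util.*;

public class ReduceCallCount {
    // Groups (key, value) pairs by key, mimicking the shuffle/sort phase:
    // reduce() is invoked once per distinct key, with all of that key's values.
    public static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[][] emitted = {
            {"Apple", "Red"}, {"Banana", "Yellow"}, {"Apple", "Yellow"},
            {"Cherry", "Red"}, {"Apple", "Green"}
        };
        Map<String, List<String>> grouped = group(emitted);
        // One reduce() call per distinct key: Apple, Banana, Cherry -> 3 calls.
        System.out.println("reduce() invocations: " + grouped.size());
        System.out.println("Apple values: " + grouped.get("Apple"));
    }
}
```

Five collect() calls but only three distinct keys, hence answer B.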
Question 79
To process input key-value pairs, your mapper needs to load a 512 MB data file in memory. What is the best way to accomplish this?
A. Serialize the data file, insert it in the JobConf object, and read the data into memory in the configure method of the mapper.
B. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.
C. Place the data file in the DataCache and read the data into memory in the configure method of the mapper.
D. Place the data file in the DistributedCache and read the data into memory in the configure method of the mapper.
Answer: D
The DistributedCache distributes read-only files needed by tasks to every worker node; reading the file once in the mapper's configure() method avoids re-reading it for every input record. (Hadoop has no "DataCache".)
Question 80
In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?
Answer: B
Note:
* Input to the Reducer is the sorted output of the mappers.
* The framework calls the application's Reduce function once for each unique key in the sorted order.
* Example:
For the given sample input the first map emits:
< Hello, 1>
< World, 1>
< Bye, 1>
< World, 1>
The second map emits:
< Hello, 1>
< Hadoop, 1>
< Goodbye, 1>
< Hadoop, 1>
Question 81
You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?
A. Processor and network I/O
B. Disk I/O and network I/O
Answer: B
Question 82
You want to count the number of occurrences for each unique word in the supplied input data. You've decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize this by specifying a combiner. Will you be able to reuse your existing Reducer as your combiner in this case, and why or why not?
A. Yes, because the sum operation is both associative and commutative and the input and output types to the reduce method match.
B. No, because the sum operation in the reducer is incompatible with the operation of a Combiner.
C. No, because the Reducer and Combiner are separate interfaces.
D. No, because the Combiner is incompatible with a mapper which doesn't use the same data type for both the key and value.
E. Yes, because Java is a polymorphic object-oriented language and thus reducer code can be reused as a combiner.
Answer: A
Combiners are used to increase the efficiency of a MapReduce program. They are used to aggregate intermediate map output locally on individual mapper outputs. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of a combiner is not guaranteed; Hadoop may or may not execute a combiner, and if required it may execute it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
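The "commutative and associative" condition can be demonstrated with a self-contained sketch (plain Java, not the Hadoop Reducer API; `Pair`, `sumByKey` and `combine` are illustrative names): running the reduce logic locally on each mapper's output shrinks the data shuffled, while the final totals come out identical however often the combine step runs.

```java
import java.util.*;

public class CombinerInvariant {
    // (word, count) pair emitted by a mapper; standing in for (Text, IntWritable).
    public static class Pair {
        final String word; final int count;
        public Pair(String word, int count) { this.word = word; this.count = count; }
    }

    // The reduce operation: sum counts per word. Reused unchanged as the combiner.
    public static Map<String, Integer> sumByKey(List<Pair> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Pair p : pairs) totals.merge(p.word, p.count, Integer::sum);
        return totals;
    }

    // Apply the reduce logic locally to one mapper's output: fewer records
    // cross the network, totals unchanged because + is commutative and associative.
    public static List<Pair> combine(List<Pair> mapperOutput) {
        List<Pair> out = new ArrayList<>();
        sumByKey(mapperOutput).forEach((w, c) -> out.add(new Pair(w, c)));
        return out;
    }

    public static void main(String[] args) {
        List<Pair> mapper1 = List.of(new Pair("the", 1), new Pair("cat", 1), new Pair("the", 1));
        List<Pair> mapper2 = List.of(new Pair("the", 1), new Pair("dog", 1));

        List<Pair> raw = new ArrayList<>(mapper1);
        raw.addAll(mapper2);
        List<Pair> combined = new ArrayList<>(combine(mapper1));
        combined.addAll(combine(mapper2));

        // 5 records shuffled without the combiner, 4 with it.
        System.out.println("records shuffled: " + raw.size() + " vs " + combined.size());
        // Final totals are identical either way.
        System.out.println(sumByKey(raw).equals(sumByKey(combined)));
    }
}
```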
Question 83
Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
A. TaskTracker
B. NameNode
C. DataNode
D. JobTracker
E. Secondary NameNode
Answer: D
JobTracker is the daemon service for submitting and tracking MapReduce jobs in Hadoop. There is only one JobTracker process run on any Hadoop cluster. JobTracker runs in its own JVM process; in a typical production cluster it runs on a separate machine. Each slave node is configured with the JobTracker node location. The JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted. JobTracker in Hadoop performs the following actions (from the Hadoop Wiki):
Client applications submit jobs to the Job tracker.
The JobTracker talks to the NameNode to determine the location of the data.
The JobTracker locates TaskTracker nodes with available slots at or near the data.
The JobTracker submits the work to the chosen TaskTracker nodes.
The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
When the work is completed, the JobTracker updates its status.
Client applications can poll the JobTracker for information.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What is a JobTracker in Hadoop? How many instances of JobTracker run on a Hadoop Cluster?
Question 84
Which project gives you a distributed, scalable data store that allows you random, realtime read/write access to hundreds of terabytes of data?
A. HBase
B. Hue
C. Pig
D. Hive
E. Oozie
F. Flume
G. Sqoop
Answer: A
Use Apache HBase when you need random, realtime read/write access to your Big Data.
Note: This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Features
Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables.
Automatic failover support between RegionServers.
Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
Easy to use Java API for client access.
Block cache and Bloom Filters for real-time queries.
Query predicate push down via server side Filters.
Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
Extensible jruby-based (JIRB) shell.
Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX.
Reference: http://hbase.apache.org/ (When would I use HBase? First sentence)
Question 85
You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?
A. They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.
B. They would see the current state of the file, up to the last bit written by the command.
C. They would see the current state of the file through the last completed block.
D. They would see no content until the whole file is written and closed.
Answer: C
Question 86
Identify the tool best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?
A. Oozie
B. Flume
C. Pig
D. Hue
E. Hive
F. Sqoop
G. fuse-dfs
Answer: F
Question 87
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a path object representing this directory?
Answer: C
Files starting with '_' are considered 'hidden', like unix files starting with '.'.
'#' characters are allowed in HDFS file names.
Question 88
You write a MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat: the mapper applies a regular expression over input values and emits key-value pairs with the key consisting of the matching text, and the value containing the filename and byte offset. Determine the difference between setting the number of reducers to one and setting the number of reducers to zero.
Answer: D
* It is legal to set the number of reduce-tasks to zero if no reduction is desired.
In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
* Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
Note:
Reduce
In this phase the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.
The output of the reduce task is typically written to the FileSystem via OutputCollector.collect(WritableComparable, Writable).
Applications can use the Reporter to report progress, set application-level status messages and update Counters, or just indicate that they are alive.
The output of the Reducer is not sorted.
Question 89
A combiner reduces:
A. The number of values across different keys in the iterator supplied to a single reduce method call.
B. The amount of intermediate data that must be transferred between the mapper and reducer.
C. The number of input files a mapper must process.
D. The number of output files a reducer must produce.
Answer: B
Combiners are used to increase the efficiency of a MapReduce program. They are used to aggregate intermediate map output locally on individual mapper outputs. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of a combiner is not guaranteed; Hadoop may or may not execute a combiner, and if required it may execute it more than once. Therefore your MapReduce jobs should not depend on the combiner's execution.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
Question 90
In a MapReduce job with 500 map tasks, how many map task attempts will there be?
Answer: D
Explanation:
From Cloudera Training Course:
A task attempt is a particular instance of an attempt to execute a task
– There will be at least as many task attempts as there are tasks
– If a task attempt fails, another will be started by the JobTracker
– Speculative execution can also result in more task attempts than completed tasks
Question 91
MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.
Answer: B, C
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.
Note:
The central goal of YARN is to clearly separate two things that are unfortunately smushed together in current Hadoop, specifically in (mainly) the JobTracker:
* Monitoring the status of the cluster with respect to which nodes have which resources available. Under YARN, this will be global.
* Managing the parallel execution of any specific job. Under YARN, this will be done separately for each job.
Reference: Apache Hadoop YARN – Concepts & Applications
Question 92
A. Algorithms that require applying the same mathematical function to large numbers of individual binary records.
B. Relational operations on large amounts of structured and semi-structured data.
C. Algorithms that require global, shared state.
D. Large-scale graph algorithms that require one-step link traversal.
E. Text analysis algorithms on large collections of unstructured text (e.g., Web crawls).
Answer: C
See 3) below.
Limitations of MapReduce – where not to use MapReduce
While very powerful and applicable to a wide variety of problems, MapReduce is not the answer to every problem. Here are some problems I found where MapReduce is not suited, and some papers that address the limitations of MapReduce.
1. Computation depends on previously computed values
If the computation of a value depends on previously computed values, then MapReduce cannot be used. One good example is the Fibonacci series, where each value is the summation of the previous two values, i.e., f(k+2) = f(k+1) + f(k). Also, if the data set is small enough to be computed on a single machine, then it is better to do it as a single reduce(map(data)) operation rather than going through the entire map reduce process.
2. Full-text indexing or ad hoc searching
The index generated in the Map step is one dimensional, and the Reduce step must not generate a large amount of data or there will be a serious performance degradation. For example, CouchDB's MapReduce may not be a good fit for full-text indexing or ad hoc searching. This is a problem better suited for a tool such as Lucene.
3. Algorithms depend on shared global state
Solutions to many interesting problems in text processing do not require global synchronization. As a result, they can be expressed naturally in MapReduce, since map and reduce tasks run independently and in isolation. However, there are many examples of algorithms that depend crucially on the existence of shared global state during processing, making them difficult to implement in MapReduce (since the single opportunity for global synchronization in MapReduce is the barrier between the map and reduce phases of processing).
Reference: Limitations of MapReduce – where not to use MapReduce
Question 93
In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next() method return?
D. It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object.
E. It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.
Answer: C
Calling Iterator.next() will always return the SAME EXACT instance of IntWritable, with the contents of that instance replaced with the next value.
Reference: manipulating iterators in mapreduce
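Why this matters in practice: if you save the references handed out by such an iterator instead of copying the values, every saved reference ends up pointing at the last value. A minimal stand-alone imitation of the behavior (a mutable `Holder` class standing in for IntWritable; this is not Hadoop code):

```java
import java.util.*;

public class ReusedIteratorSketch {
    // Mutable box standing in for Hadoop's IntWritable.
    public static class Holder {
        private int value;
        public int get() { return value; }
        public void set(int v) { value = v; }
    }

    // Iterator that reuses ONE Holder instance, overwriting its contents on
    // every next() call - the same trick Hadoop uses to avoid allocation.
    public static Iterator<Holder> reusingIterator(int[] values) {
        Holder shared = new Holder();
        return new Iterator<Holder>() {
            int i = 0;
            public boolean hasNext() { return i < values.length; }
            public Holder next() { shared.set(values[i++]); return shared; }
        };
    }

    // Copying the primitive out of the holder is safe; storing the
    // reference is not, because the holder is mutated on each next().
    public static List<Integer> copyValues(int[] values) {
        List<Integer> copied = new ArrayList<>();
        Iterator<Holder> it = reusingIterator(values);
        while (it.hasNext()) copied.add(it.next().get());
        return copied;
    }

    public static void main(String[] args) {
        int[] vals = {10, 20, 30};
        List<Holder> refs = new ArrayList<>();
        Iterator<Holder> it = reusingIterator(vals);
        while (it.hasNext()) refs.add(it.next());
        // All three saved references point at the same object, now holding 30.
        System.out.println(refs.get(0).get() + " " + refs.get(1).get()); // 30 30
        System.out.println(copyValues(vals)); // [10, 20, 30]
    }
}
```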
Question 94
Answer: C
By default, Hive uses an embedded Derby database to store metadata information. The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
The Metastore is an application that runs on an RDBMS and uses an open source ORM layer called DataNucleus to convert object representations into a relational schema and vice versa. They chose this approach as opposed to storing this information in HDFS because they need the Metastore to be very low latency. The DataNucleus layer allows them to plug in many different RDBMS technologies.
Note:
* By default, Hive stores metadata in an embedded Apache Derby database, and other client/server databases like MySQL can optionally be used.
* Features of Hive include:
Metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query execution.
Reference: Store Hive Metadata into RDBMS
Question 95
Analyze each scenario below and identify which best describes the behavior of the default partitioner?
A. The default partitioner assigns key-value pairs to reducers based on an internal random number generator.
B. The default partitioner implements a round-robin strategy, shuffling the key-value pairs to each reducer in turn. This ensures an even partition of the key space.
C. The default partitioner computes the hash of the key. Hash values between specific ranges are associated with different buckets, and each bucket is assigned to a specific reducer.
D. The default partitioner computes the hash of the key and divides that value modulo the number of reducers. The result determines the reducer assigned to process the key-value pair.
E. The default partitioner computes the hash of the value and takes the mod of that value with the number of reducers. The result determines the reducer assigned to process the key-value pair.
Answer: D
The default partitioner computes a hash value for the key and assigns the partition based on this result.
The default Partitioner implementation is called HashPartitioner. It uses the hashCode() method of the key objects modulo the total number of partitions to determine which partition to send a given (key, value) pair to.
In Hadoop, the default partitioner is HashPartitioner, which hashes a record's key to determine which partition (and thus which reducer) the record belongs in. The number of partitions is then equal to the number of reduce tasks for the job.
Reference: Getting Started With (Customized) Partitioning
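The logic is small enough to sketch directly. HashPartitioner's getPartition is essentially `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the mask clears the sign bit so negative hash codes cannot produce a negative partition. A stand-alone re-implementation of that arithmetic (a sketch, not the Hadoop class itself):

```java
public class HashPartitionSketch {
    // Mirrors the default HashPartitioner logic: mask off the sign bit,
    // then take the hash modulo the number of reducers.
    public static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[]{"Apple", "Banana", "Cherry"}) {
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
        // The same key always maps to the same reducer, which is what
        // guarantees all values for a key meet in a single reduce() call.
        System.out.println(getPartition("Apple", reducers) == getPartition("Apple", reducers));
    }
}
```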
Question 96
You need to move a file titled "weblogs" into HDFS. When you try to copy the file, you can't. You know you have ample space on your DataNodes. Which action should you take to relieve this situation and store more files in HDFS?
Answer: C
Question 97
In a large MapReduce job with m mappers and n reducers, how many distinct copy operations will there be in the sort/shuffle phase?
Answer: A
A MapReduce job with m mappers and n reducers involves up to m × n distinct copy operations, since each mapper may have intermediate output going to every reducer.
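A quick sanity check of that bound: each of the m map tasks writes one output partition per reducer, and each of the n reducers fetches its partition from every map task, giving m × n fetches in total. Sketched with hypothetical job sizes:

```java
public class ShuffleCopyCount {
    // Each of the n reducers fetches one map-output partition from each of
    // the m mappers, so the shuffle performs up to m * n distinct copies.
    public static long copyOperations(long mappers, long reducers) {
        return mappers * reducers;
    }

    public static void main(String[] args) {
        // Hypothetical job sizes, just to illustrate the growth.
        System.out.println(copyOperations(100, 10));  // 1000
        System.out.println(copyOperations(500, 20));  // 10000
    }
}
```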
Question 98
A. Sequences of MapReduce and Pig jobs. These sequences can be combined with other actions including forks, decision points, and path joins.
B. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
C. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
D. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
Answer: A
An Oozie workflow is a collection of actions (i.e. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control dependency DAG (Direct Acyclic Graph), specifying a sequence of actions to execute. This graph is specified in hPDL (an XML Process Definition Language).
hPDL is a fairly compact language, using a limited amount of flow control and action nodes. Control nodes define the flow of execution and include the beginning and end of a workflow (start, end and fail nodes) and mechanisms to control the workflow execution path (decision, fork and join nodes).
Note: Oozie is a Java Web-Application that runs in a Java servlet-container - Tomcat - and uses a database to store:
Workflow definitions
Currently running workflow instances, including instance states and variables
Reference: Introduction to Oozie
Question 99
Which best describes what the map method accepts and emits?
A. It accepts a single key-value pair as input and emits a single key and list of corresponding values as output.
B. It accepts a single key-value pair as input and can emit only one key-value pair as output.
C. It accepts a list of key-value pairs as input and can emit only one key-value pair as output.
D. It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.
Answer: D
Question 100
When can a reduce class also serve as a combiner without affecting the output of a MapReduce program?
A. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value and when the reduce operation is both commutative and associative.
B. When the signature of the reduce method matches the signature of the combine method.
C. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
D. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
E. Never. Combiners and reducers must be implemented separately because they serve different purposes.
Answer: A
You can use your reducer code as a combiner if the operation performed is commutative and associative.
Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
Question 101
You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce, but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?
A. SequenceFiles
B. Avro
C. JSON
D. HTML
E. XML
F. CSV
Answer: B
Question 102
You want to run Hadoop jobs on your development workstation for testing before you submit them to your production cluster. Which mode of operation in Hadoop allows you to most closely simulate a production cluster while using a single machine?
A. Run all the nodes in your production cluster as virtual machines on your development workstation.
B. Run the hadoop command with the -jt local and the -fs file:/// options.
C. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.
D. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
Answer: C
Question 103
Your cluster's HDFS block size is 64 MB. You have a directory containing 100 plain text files, each of which is 100 MB in size. The InputFormat for your job is TextInputFormat. Determine how many Mappers will run?
A. 64
B. 100
C. 200
D. 640
Answer: C
Each file would be split into two as the block size (64 MB) is less than the file size (100 MB), so 200 mappers would be running.
Note:
If you're not compressing the files then Hadoop will process your large files (say 10G) with a number of mappers related to the block size of the file.
Say your block size is 64M, then you will have ~160 mappers processing this 10G file (160*64 ≈ 10G). Depending on how CPU intensive your mapper logic is, this might be an acceptable block size, but if you find that your mappers are executing in sub-minute times, then you might want to increase the work done by each mapper (by increasing the block size to 128, 256, 512M - the actual size depends on how you intend to process the data).
Reference: http://stackoverflow.com/questions/11014493/hadoop-mapreduce-appropriate-input-files-size (first answer, second paragraph)
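The mapper count can be checked mechanically: for uncompressed text, TextInputFormat creates roughly ceil(fileSize / blockSize) splits per file, and the framework runs one map task per split. A small sketch using the numbers from this question (`splitsPerFile` and `totalMappers` are illustrative helpers, not Hadoop APIs):

```java
public class MapperCount {
    // One map task per input split; TextInputFormat makes roughly
    // ceil(fileSize / blockSize) splits for each uncompressed file.
    public static long splitsPerFile(long fileSizeMB, long blockSizeMB) {
        return (fileSizeMB + blockSizeMB - 1) / blockSizeMB; // ceiling division
    }

    public static long totalMappers(int numFiles, long fileSizeMB, long blockSizeMB) {
        return numFiles * splitsPerFile(fileSizeMB, blockSizeMB);
    }

    public static void main(String[] args) {
        // 100 files of 100 MB with a 64 MB block size: 2 splits each -> 200 mappers.
        System.out.println(totalMappers(100, 100, 64)); // 200
    }
}
```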
Question 104
What is a SequenceFile?
Answer: D
Question 105
Which HDFS command displays the contents of the file x in the user's HDFS home directory?
A. hadoop fs -ls x
B. hdfs fs -get x
C. hadoop fs -cat x
D. hadoop fs -cp x
Answer: C
Question 106
Which HDFS command copies an HDFS file named foo to the local filesystem as localFoo?
Answer: A
Question 107
Which of the following tools was designed to import data from a relational database into HDFS?
A. HCatalog
B. Sqoop
C. Flume
D. Ambari
Answer: B
Question 108
You want to ingest log files into HDFS. Which tool would you use?
A. HCatalog
B. Flume
C. Sqoop
D. Ambari
Answer: B