
500 Hadoop Interview Questions will be covered as part of the Hadoop Online Training.

500 Hadoop Interview Questions

1. What is your favourite tool in the Hadoop ecosystem?


2. In your previous project, did you maintain the Hadoop cluster in-house or use Hadoop in the cloud?
3. What is the difference between an RDBMS and Hadoop?
4. What are Hadoop's core components?
5. What is the NameNode in Hadoop?
6. What are the data components used by Hadoop?
7. What is the data storage component used by Hadoop?
8. What are the most common input formats defined in Hadoop?
9. What is an InputSplit in Hadoop?
10. For a Hadoop job, how will you write a custom partitioner? (A sketch follows this list.)
11. For a job in Hadoop, is it possible to change the number of mappers to be created?
12. What is a sequence file in Hadoop?
13. What happens if you try to run a Hadoop job with an output directory that is already present?
14. How can you debug Hadoop code?
15. What is speculative execution in Hadoop?
16. How can I restart NameNode or all the daemons in Hadoop?
17. What is the difference between an HDFS Block and an Input Split?
18. Name the three modes in which Hadoop can run.
19. What is MapReduce? What is the syntax to run a MapReduce program?
20. What are the main configuration parameters in a MapReduce program?
21. Why can't we perform aggregation (addition) in the mapper? Why do we need the reducer for this?
22. What is the purpose of RecordReader in Hadoop?
23. Explain Distributed Cache in a MapReduce Framework.
24. How do reducers communicate with each other?
25. What does a MapReduce Partitioner do?
26. How will you write a custom partitioner?
27. What is a Combiner?
28. What do you know about SequenceFileInputFormat?
29. What are the benefits of Apache Pig over MapReduce?
30. What are different data types in Pig Latin?
31. What are the different relational operations in Pig Latin that you have worked with?
32. What is a UDF?
33. What is SerDe in Hive?
34. Explain the differences between Hadoop 1.x and Hadoop 2.x.
35. What are the core changes in Hadoop 2.0?
36. Differentiate between NFS, Hadoop NameNode and JournalNode.
37. What are the modules that constitute the Apache Hadoop 2.0 framework?
38. How is the distance between two nodes defined in Hadoop?
39. What is the size of the biggest Hadoop cluster the company operates?
40. For what kind of big data problems did the organization choose to use Hadoop?
41. What kind of data does the organization work with, and which HDFS file formats does the company use?
42. When the NameNode is down, what happens to the JobTracker?
43. How is indexing done in HDFS?
44. Is it possible to search for files using wildcards?
45. List Hadoop's three configuration files.
46. How can you check whether the NameNode is working, besides using the jps command?
47. What is a mapper and what is a reducer in Hadoop?
48. Which file controls reporting in Hadoop?
49. What are the network requirements for using Hadoop?
50. What is rack awareness?
51. What is a TaskTracker in Hadoop?
52. How can you debug Hadoop code?
53. What are storage and compute nodes?
54. What is the use of the Context object?
55. What is the next step after the Mapper or MapTask?
56. What is the default partitioner in Hadoop?
57. What is the purpose of the RecordReader in Hadoop?
58. How is data partitioned before it is sent to the reducer if no custom partitioner is defined in Hadoop?
59. What happens when Hadoop spawns 50 tasks for a job and one of the tasks fails?
60. What is the best way to copy files between HDFS clusters?
61. How do you configure the replication factor in HDFS? (A sketch follows this list.)
62. Can free-form SQL queries be used with the Sqoop import command? If yes, how can they be used?
63. Differentiate between Sqoop and DistCp.
64. What are the limitations of importing RDBMS tables into HCatalog directly?
65. What are the benefits of using counters in Hadoop? (A sketch follows this list.)
66. How can you write a custom partitioner?
67. What are some of the jobs that the JobTracker performs?
68. How will you describe a sequence file?
69. What are the different ways of executing Apache Pig scripts?
70. How do you compress the mapper output but not the reducer output? (A sketch follows this list.)
71. What is the difference between a map-side join and a reduce-side join?
72. How can you transfer data from Hive to HDFS?
73. Which companies use Hadoop?
74. Explain Big Data. What are the five Vs of Big Data?
75. What is Hadoop, and what are its components?
76. What are HDFS and YARN?
77. In what modes can Hadoop be run?
78. What is SequenceFile in Hadoop?
79. What is the JobTracker's role in Hadoop?
80. Tell me about the various Hadoop daemons and their roles in a Hadoop cluster.
81. Compare HDFS with Network Attached Storage (NAS).
82. What is rack awareness, and on what basis is data stored in a rack?
83. What happens to a NameNode that has no data?
84. What happens when a user submits a Hadoop job while the NameNode is down? Does the job get put on hold, or does it fail?
85. What happens when a user submits a Hadoop job while the JobTracker is down? Does the job get put on hold, or does it fail?
86. Whenever a client submits a Hadoop job, who receives it?
87. Explain the usage of the Context object.
88. What are the core methods of a Reducer?
89. Explain the partitioning, shuffle, and sort phases.
90. How do you write a custom partitioner for a Hadoop MapReduce job?
91. When should you use HBase and what are the key components of HBase?
92. What are the different operational commands in HBase at record level and table level?
93. What is Row Key?
94. Explain the difference between the RDBMS data model and the HBase data model.
95. Explain the different catalog tables in HBase.
96. What are column families? What happens if you alter the block size of a column family on an already populated database?
97. Explain the difference between HBase and Hive.
98. Explain the process of row deletion in HBase.
99. What are the different types of tombstone markers in HBase for deletion?
100. Explain about HLog and WAL in HBase.
101. Explain some important Sqoop commands other than import and export.
102. How can Sqoop be used in a Java program?
103. What is the process to perform an incremental data load in Sqoop?
104. Is it possible to do an incremental import using Sqoop?
106. What is SerDe in Hive? How can you write your own custom SerDe?
107. What are the stable versions of Hadoop?
108. What is Apache Hadoop YARN?
109. Is YARN a replacement of Hadoop MapReduce?
110. Explain the different channel types in Flume. Which channel type is faster?
111. Which is the reliable channel in Flume to ensure that there is no data loss?
112. Explain about the replication and multiplexing selectors in Flume.
113. How can a multi-hop agent be set up in Flume?
114. What are the basic differences between a relational database and HDFS?
115. List the differences between Hadoop 1 and Hadoop 2.
116. What are active and passive NameNodes?
117. How many InputSplits are made by the Hadoop framework?
118. What is the distributed cache in Hadoop?
119. Explain how the Hadoop classpath plays a vital role in starting or stopping Hadoop daemons.
120. How does the Hadoop framework work?
121. Give any three differences between NAS and HDFS.
122. What do you mean by column families? What happens if the block size of a column family is altered?
123. What is the difference between HBase and Hive?
124. What do you mean by the term speculative execution in Hadoop?
125. How is HDFS fault tolerant?
126. Can NameNode and DataNode be a commodity hardware?
127. Why do we use HDFS for applications having large data sets and not when there are a lot of small files?
128. How do you define a block in HDFS? What is the default block size in Hadoop 1 and in Hadoop 2? Can it be changed?
129. What does the jps command do?
130. How do you define Rack Awareness in Hadoop?
131. What is Speculative Execution in Hadoop?
132. What is the most complex problem the company is trying to solve using Apache Hadoop?
133. Will I get an opportunity to attend big data conferences? Will the organization cover the costs of advanced Hadoop or big data certifications?
134. What are the challenges that you faced when implementing Hadoop projects?
135. How were you involved in data modelling, data ingestion, data transformation and data aggregation?
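
For the custom-partitioner questions above (10, 26, 66, 90), a minimal sketch using the standard org.apache.hadoop.mapreduce.Partitioner API is shown below. The class name and the Text/IntWritable key and value types are hypothetical; the modulo formula simply mirrors what the default HashPartitioner does and would normally be replaced with your own routing rule.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner: decides which reduce partition each (key, value) pair goes to.
public class CustomKeyPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Same arithmetic as the default HashPartitioner; swap in any
        // domain-specific rule (e.g., route by key prefix or region).
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Registered on the Job object in the driver:
//   job.setPartitionerClass(CustomKeyPartitioner.class);
//   job.setNumReduceTasks(4);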
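
For question 61, a sketch of how the replication factor is typically set. The cluster-wide default comes from the dfs.replication property in hdfs-site.xml; the file path and values below are only illustrative, and the per-file change can equally be made with the hdfs dfs -setrep shell command.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationFactorExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default replication for files created through this configuration.
        conf.setInt("dfs.replication", 2);
        FileSystem fs = FileSystem.get(conf);
        // Change the replication of one existing (hypothetical) file.
        fs.setReplication(new Path("/data/logs/part-00000"), (short) 3);
        fs.close();
    }
}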
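
For question 65, a sketch of a user-defined counter. Counters let a job report statistics (bad rows, skipped records, and so on) back to the client without writing extra output files. The mapper class, counter group, and counter name here are hypothetical; context.getCounter(group, name).increment(n) is the standard MapReduce call.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that counts malformed CSV lines instead of failing on them.
public class LineValidatorMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {
            // Group and counter names are arbitrary strings chosen by the job author.
            context.getCounter("DataQuality", "MALFORMED_LINES").increment(1);
            return; // skip the bad record
        }
        context.write(new Text(fields[0]), new LongWritable(1));
    }
}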
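
For question 70, a sketch of the driver-side configuration that compresses only the intermediate map output while leaving the final reducer output uncompressed. The property names are the standard Hadoop 2.x ones; the choice of SnappyCodec assumes the Snappy native libraries are available on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class CompressMapOutputOnly {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress the intermediate (map) output to cut shuffle traffic.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);
        // Keep the final (reducer) output uncompressed.
        conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);
        Job job = Job.getInstance(conf, "compress-map-output-only");
        // Mapper, reducer, input and output paths for the real job would be set here.
    }
}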
