
Big Data Testing: Why & How

The use of Big Data is becoming a crucial way for leading companies to outperform their
peers. In most industries, established competitors and new entrants alike will leverage data-
driven strategies to innovate, compete, and capture value.

By now it is mostly common knowledge that Big Data is essentially a collection of large
volumes of information gathered from myriad sources in various formats. Big Data is
growing at a rapid pace: according to IBM, 90% of the world’s data has been created in the
past two years.

And as we all know, with Big Data comes bad data, and such datasets cannot be processed
using traditional computing techniques, so Big Data testing is completely different. The
primary goals of testing your Big Data are verifying data completeness, ensuring correct
data transformation, ensuring data quality, and automating regression testing. Testing
these datasets involves a range of tools, techniques, and frameworks.
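The completeness and quality goals above can be sketched as simple checks. This is a minimal illustration in plain Python, with hypothetical record lists and field names; real pipelines would run equivalent checks at scale with a framework such as Spark.

```python
# Sketch: validating completeness and basic quality of a dataset after it
# has been moved from a source to a target system. Records and field
# names are hypothetical examples.

def validate_completeness(source_records, target_records):
    """Check that no records were lost or duplicated during transfer."""
    return len(source_records) == len(target_records)

def validate_quality(records, required_fields):
    """Return records that are missing a required field or contain nulls."""
    return [
        rec for rec in records
        if any(rec.get(field) in (None, "") for field in required_fields)
    ]

source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
target = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]

print(validate_completeness(source, target))   # record counts match
print(validate_quality(target, ["id", "name"]))  # flags the record with a null name
```

The same idea scales up: compare row counts between source and target, then profile the target for nulls and malformed values before any transformation logic is trusted.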

Big Data Testing Strategy


As data engineering and data analytics advance to the next level, Big Data testing is
inevitable. Testing a Big Data application is more a verification of its data processing
than testing the individual features of the software product. When it comes to Big Data
testing, performance and functional testing are key.

● Big data processing could be Batch, Real-Time, or Interactive


● Big Data Testing can be broadly divided into three steps
○ Data staging validation: The first step of Big Data testing, also referred to
as the pre-Hadoop stage, involves process validation.
○ "MapReduce" validation: The second step is validation of "MapReduce".
In this stage, the tester verifies the business logic on a single node
and then validates it after running against multiple nodes.
○ Output validation phase: The third and final stage of Big Data testing is the
output validation process, in which the output data files are generated and
made ready to be moved to an EDW (Enterprise Data Warehouse) or any other
system, based on the requirement.
● Architecture testing is an important phase of Big Data testing, as a poorly designed
system may lead to unexpected errors and degradation of performance
● Performance testing for Big data includes verifying
○ Data throughput
○ Data processing
○ Sub-component performance
● Big Data testing challenges include virtualization, test automation, and dealing with
large datasets. Performance testing of Big Data applications is also an issue.
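The "MapReduce" validation step described above can be sketched without a cluster: run the same business logic on one partition and on several partitions, then check the aggregated results agree. This is an illustrative stand-in (a word count in plain Python), not a real Hadoop job.

```python
# Sketch: verify business logic produces identical output whether the
# input is processed as one partition ("single node") or split across
# several partitions ("multiple nodes").
from collections import Counter
from itertools import chain

def map_phase(lines):
    # Emit (word, 1) pairs for every word in every line.
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    # Sum the counts per word.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

data = ["big data testing", "testing big data pipelines"]

# Single partition: process everything at once.
single = reduce_phase(map_phase(data))

# Multiple partitions: map each split separately, merge in the reducer.
partitions = [data[:1], data[1:]]
multi = reduce_phase(chain.from_iterable(map_phase(p) for p in partitions))

assert single == multi  # logic agrees regardless of partitioning
print(single)
```

In a real environment the same comparison is done by running the job on one node, then on the cluster, and diffing the key/value output of each stage.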
