
BIG DATA ANALYTICS

17MITC15
Semester III    Hours of Instruction/week: 4    No. of credits: 3

Objective

To explore the fundamental concepts of big data analytics, the filtering of data streams, Hadoop, and visual data analysis techniques.

UNIT I INTRODUCTION TO BIG DATA 12


Introduction to Big Data Platform – Challenges of Conventional Systems – Intelligent Data Analysis – Nature of Data – Analytic Processes and Tools – Analysis vs. Reporting – Modern Data Analytic Tools – Statistical Concepts: Sampling Distributions – Re-Sampling – Statistical Inference – Prediction Error.

UNIT II MINING DATA STREAMS 12


Introduction to Streams Concepts – Stream Data Model and Architecture – Stream Computing – Sampling Data in a Stream – Filtering Streams – Counting Distinct Elements in a Stream – Estimating Moments – Counting Ones in a Window – Decaying Window – Real Time Analytics Platform (RTAP) Applications – Case Studies: Real Time Sentiment Analysis, Stock Market Predictions.

UNIT III HADOOP 12


History of Hadoop – The Hadoop Distributed File System – Components of Hadoop – Analyzing the Data with Hadoop – Scaling Out – Hadoop Streaming – Design of HDFS – Java Interfaces to HDFS: Basics – Developing a MapReduce Application – How MapReduce Works – Anatomy of a MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Task Execution – MapReduce Types and Formats – MapReduce Features.

UNIT IV HADOOP ENVIRONMENT 12


Setting up a Hadoop Cluster – Cluster Specification – Cluster Setup and Installation – Hadoop Configuration – Security in Hadoop – Administering Hadoop – HDFS – Monitoring – Maintenance – Hadoop Benchmarks – Hadoop in the Cloud.

UNIT V FRAMEWORKS 12
Applications on Big Data Using Pig and Hive – Data Processing Operators in Pig – Hive Services – HiveQL – Querying Data in Hive – Fundamentals of HBase and ZooKeeper – IBM InfoSphere BigInsights and Streams – Visualizations: Visual Data Analysis Techniques, Interaction Techniques, Systems and Applications.

Total Hours: 60
REFERENCES

1. Michael Berthold, David J. Hand, “Intelligent Data Analysis”, Springer, 2007.


2. Tom White, “Hadoop: The Definitive Guide”, Third Edition, O’Reilly Media, 2012.
3. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos,
“Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”,
McGraw-Hill Publishing, 2012.
4. Anand Rajaraman and Jeffrey David Ullman, “Mining of Massive Datasets”, Cambridge
University Press, 2012.
5. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data
Streams with Advanced Analytics”, John Wiley & sons, 2012.
6. Glenn J. Myatt, “Making Sense of Data”, John Wiley & Sons, 2007
7. Pete Warden, “Big Data Glossary”, O’Reilly, 2011.
8. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, Second
Edition, Elsevier, Reprinted 2008.
9. Da Ruan, Guoqing Chen, Etienne E. Kerre, Geert Wets, “Intelligent Data Mining”,
Springer, 2007.
10. Paul Zikopoulos, Dirk deRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles,
David Corrigan, “Harness the Power of Big Data: The IBM Big Data Platform”, Tata
McGraw-Hill Publications, 2012.
11. Paul Zikopoulos, Chris Eaton, “Understanding Big Data: Analytics for Enterprise Class
Hadoop and Streaming Data”, Tata McGraw-Hill Publications, 2011.
DATA ANALYTICS – PRACTICAL IV

17MITC18
Semester III    Hours of Instruction/week: 6    No. of credits: 4

List of Programs

1. Implement the following data structures:
a) Linked Lists
b) Stacks
c) Queues
d) Sets
e) Maps
2. Set up and install Hadoop in its different operating modes:
a) Standalone
b) Pseudo-distributed
c) Fully distributed
3. Use web-based tools to monitor your Hadoop setup.
4. Implement the following file management tasks in Hadoop (a Java sketch appears after this list):
a) Adding files and directories
b) Retrieving files
c) Deleting files

Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them
into HDFS using command-line utilities or the file system API.
5. Develop a MapReduce application for word counting on a Hadoop cluster (a sketch appears after this list).

6. Write a MapReduce program that mines weather data (a sketch appears after this list).

Weather sensors collecting data every hour at many locations across the globe gather a
large volume of log data, which is a good candidate for analysis with MapReduce, since it
is semi-structured and record-oriented.
7. Implement matrix multiplication with Hadoop MapReduce.
8. Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your
data.
9. Install and run Hive, then use Hive to create, alter, and drop databases, tables, views,
functions, and indexes.
10. Load unstructured data into a NoSQL database and perform operations, such as NoSQL queries, through its API.
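
Sample Sketches for Selected Programs

The following sketch relates to program 4. It is a minimal illustration, not a prescribed solution: it assumes the Hadoop Java FileSystem API (org.apache.hadoop.fs) is on the classpath and that fs.defaultFS in the client configuration points at the running HDFS; the class name and the paths used are illustrative only.

// HdfsFileOps.java – minimal sketch of HDFS file management (program 4).
// Assumes the Hadoop client libraries are on the classpath and that
// fs.defaultFS in core-site.xml points at the target HDFS instance.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // loads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/lab/input");     // illustrative HDFS directory
        fs.mkdirs(dir);                             // (a) adding a directory

        // (a) adding a file: copy a local log file into HDFS
        fs.copyFromLocalFile(new Path("/tmp/sample.log"), new Path(dir, "sample.log"));

        // (b) retrieving a file: copy it back to the local file system
        fs.copyToLocalFile(new Path(dir, "sample.log"), new Path("/tmp/sample_copy.log"));

        // list what is currently stored under the directory
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        // (c) deleting files and directories (true = recursive)
        fs.delete(dir, true);
        fs.close();
    }
}

The same three tasks can also be carried out from the shell with the hadoop fs utility (for example -put, -get, and -rm).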
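
A compact sketch for program 5, following the standard word-count pattern of the newer MapReduce API (org.apache.hadoop.mapreduce); it assumes Hadoop 2.x or later and takes the HDFS input and output paths from the command line.

// WordCount.java – classic word-count MapReduce job (program 5), new API.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, it can be submitted with something like hadoop jar wordcount.jar WordCount /user/lab/input /user/lab/output (paths illustrative; the output directory must not already exist).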
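
A sketch for program 6 that reports the maximum temperature recorded per year. To keep it short, it assumes a simplified tab-separated input format of "year<TAB>temperature"; the real NCDC weather records used in Hadoop: The Definitive Guide are fixed-width and would need a different parsing step in the mapper.

// MaxTemperature.java – sketch for program 6: maximum temperature per year.
// Assumes tab-separated records "year<TAB>temperature" (an assumption for brevity).
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    // Mapper: parse each record and emit (year, temperature).
    public static class TempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length != 2) return;          // skip malformed records
            try {
                int temp = Integer.parseInt(fields[1].trim());
                context.write(new Text(fields[0].trim()), new IntWritable(temp));
            } catch (NumberFormatException e) {
                // skip records whose temperature field is not numeric
            }
        }
    }

    // Reducer: keep the maximum temperature seen for each year.
    public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text year, Iterable<IntWritable> temps, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            for (IntWritable t : temps) max = Math.max(max, t.get());
            context.write(year, new IntWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max temperature");
        job.setJarByClass(MaxTemperature.class);
        job.setMapperClass(TempMapper.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}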
