Slide Pack

Pedagogy
Data
Warehousing

Data
Warehousing

Text Book 1 On Data Warehousing,

Data Mining

Data Mining

Business
Intelligence

Lab Sessions

Pedagogy
Theory Classes and Lab Sessions
End Term, No Mid Term

Project Submission and Presentation

Formal Class Quiz (No Presentation)
Class Participation: likely to be augmented by end-of-class assignments

Overall Ecology

Analytics 1.0: Era of Business Intelligence
Analytics 2.0: Era of Big Data
Analytics 3.0: Era of Data-Enriched Offerings

BC AD
BG AG
BBD - ABG

System Evolution

1960s: Standalone Applications
Inventory, Accounts, Order Processing
Systems to run day-to-day operations

1990s: ERP
Systems to run day-to-day operations in an integrated manner

Overwhelmed by Scale, Global Operations, Long-Term Decisions

Information Crisis! Only its nature has changed

1960s
Scarcity of Information

2000s
Information Ubiquity

Information Scarcity to Ubiquity

Gateway to
Information

Web !
Semantic !
3.0 !

Making Sense of Data Is Difficult

Too Much Data

Technology was supporting storage and capture of more and more data
Number of Transactions was increasing exponentially
IoT
Digital Exhaust

Data in Multiple Systems

Custom Developed Software


SAP
Oracle
Social Media
..

Data in Multiple Structures

Structured, Unstructured
Videos, Images, etc.

US Library of Congress
Budget of USD 600+ million, 3,000+ employees
Walmart
Database size: 2.5 petabytes
167 times that of the US Library of Congress

DATA DOUBLES EVERY 18 TO 24 MONTHS
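As a rough illustration of what this doubling rate implies, the sketch below assumes clean exponential growth (an idealization, not a measured result). The function name `growth_factor` and the 10-year horizon are chosen here purely for illustration.

```python
def growth_factor(months_elapsed: float, doubling_period_months: float) -> float:
    """Return how many times larger the data is after months_elapsed,
    assuming it doubles once every doubling_period_months."""
    return 2 ** (months_elapsed / doubling_period_months)


# Compare the two doubling periods quoted on the slide over 10 years (120 months).
for period in (18, 24):
    factor = growth_factor(120, period)
    print(f"Doubling every {period} months: about {factor:.0f}x after 10 years")
```

Under these assumptions the difference between the two periods is large: roughly a hundredfold increase over a decade at 18 months versus 32-fold at 24 months.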

Data

800 TB per day

Data Transfer:
1 TB from a hard disk at 500 MB per second: about 33 minutes
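The 33-minute figure can be checked with a short calculation. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a sustained, constant read rate; the names `transfer_minutes`, `TB`, and `MB` are illustrative.

```python
def transfer_minutes(size_bytes: float, rate_bytes_per_second: float) -> float:
    """Time in minutes to read size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_second / 60


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

minutes = transfer_minutes(1 * TB, 500 * MB)
print(f"Reading 1 TB at 500 MB/s takes about {minutes:.1f} minutes")
```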

Big Data
Refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
It is not defined in terms of a fixed number of terabytes.
The size that counts as big varies by industry (nature of data, speed of data, etc.).
As technology improves, the size of the datasets that can be tackled also increases.

Big Data: Tackled by Distributed Computing

Single disk at 500 MB per second: 33 minutes

Multiple disks read in parallel, each at 500 MB per second: 3 minutes

Storage is a smaller challenge than reading and crunching the data
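The drop from 33 minutes to about 3 minutes is what one would expect if the same 1 TB were split evenly across roughly ten disks read in parallel. The sketch below works under that assumption; it ignores coordination and network overhead, and the disk counts are illustrative rather than taken from the slides.

```python
import math


def parallel_read_minutes(size_bytes: float, rate_per_disk: float, num_disks: int) -> float:
    """Minutes to read size_bytes when it is split evenly across num_disks
    that are all read at the same time (idealized, no overhead)."""
    return size_bytes / (rate_per_disk * num_disks) / 60


def disks_needed(size_bytes: float, rate_per_disk: float, target_minutes: float) -> int:
    """Smallest number of disks that meets the target read time."""
    return math.ceil(size_bytes / (rate_per_disk * target_minutes * 60))


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

print(f"1 disk:   {parallel_read_minutes(1 * TB, 500 * MB, 1):.1f} minutes")
print(f"10 disks: {parallel_read_minutes(1 * TB, 500 * MB, 10):.1f} minutes")
print(f"Disks needed for a 3-minute read: {disks_needed(1 * TB, 500 * MB, 3)}")
```

The per-disk speed never changes; only the number of disks working at once does, which is why reading, not storage, is the bottleneck that distribution addresses.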

Big Data - Distributed File System (DFS)

Data is physically stored in different places but logically appears to be in one place.

Big Data - Hadoop


Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model.
Human Genome Project: from 10 years to 1 week.
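The programming model usually associated with Hadoop is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in plain Python on a single machine, using a word count as the example. It does not use any Hadoop API; the function names and sample text are hypothetical, and in a real cluster each chunk would be handled by a different machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a block of a large file stored on a different node.
chunks = ["big data big clusters", "data warehousing and data mining"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the map step can run on many machines at once, which is what makes the model suit clusters of commodity computers.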

Data 3V Model
3Vs
V : Volume
Volume of Data

V : Variety
Range of Data and Data Sources
Data, Voice, Videos, Structured Data / Unstructured Data etc.

V : Velocity
Speed of Data
IoT, Machine Interfaces, SmartGrid Systems

Other Vs
Value (of Data), Veracity (truthfulness of data)

Technologies Helping Us Comprehend the Explosive Growth of Data

Technologies that made elaborate data analysis possible:
Moore's Law

Distributed Computing
Data Warehousing
Data Mining

Human Genome Project

From 10 years to 1 week

BI Names Galore
Business Intelligence
Competitive Intelligence
Decision Support Systems
Business Analytics
Executive Information System
Strategic Information Systems

???

Strategic Information Systems Evolution

[Figure: evolution over time from operational information toward strategic information, through the following stages]
Small Applications
Special Extract Programs
Ad hoc / Canned Reports
Information Centers
Decision Support Systems
Executive Information Systems

Early Decision Support Systems Failed!

Operational System

Informational System

Operational Systems/ Informational Systems

Informational Systems: A New Class of Systems

Separate from operational systems
Not used for the daily running of the business
Used for strategic analysis and decision making
Running simple queries against current and historical data
What-if analysis
Ability to query, step back, analyze, and then continue the process
Ability to spot historical trends
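Two of the capabilities listed above, a simple query against historical data and a what-if analysis, can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns, the figures, and the 10% growth scenario are all made up for illustration; a real informational system would hold far more data in a separate warehouse.

```python
import sqlite3

# In-memory database standing in for an informational (non-operational) store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2021, "North", 100.0),
        (2021, "South", 80.0),
        (2022, "North", 120.0),
        (2022, "South", 90.0),
    ],
)

# Simple query against historical data: total revenue per year (trend spotting).
trend = conn.execute(
    "SELECT year, SUM(revenue) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print("Revenue by year:", trend)

# What-if analysis: what would the 2022 total be if North had been 10% higher?
what_if_total = conn.execute(
    "SELECT SUM(CASE WHEN region = 'North' THEN revenue * 1.10 ELSE revenue END) "
    "FROM sales WHERE year = 2022"
).fetchone()[0]
print(f"What-if 2022 total: {what_if_total:.1f}")

conn.close()
```

Neither query changes the stored data, which reflects the point that an informational system is used for analysis rather than for running daily operations.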

Business Intelligence

Data to Information
Information to Knowledge
Business Intelligence

Business Intelligence

Data Warehousing

The focus is not on generating new data but on using existing data.
Existing data is transformed into forms suitable for providing strategic information.
User-centric environment: users are put in touch with the data.
Informational environment that is flexible, interactive, and user-driven.
Combines internal data and external data.
Combines historical data with current data (but not real-time data).

ANNEXURE

Attempts of IT to Provide Strategic Information Floundered

1. Too many requests
2. Ever-changing requests
3. Supplementary requests on account of improper requirements
4. Users needed to go to IT every time
5. The information environment was not as flexible as strategic decisions require
