
Introduction to Analytics

Lesson 1

Copyright 2014, Simplilearn, All rights reserved.



Objective Slide
After completing this course, you will be able to:

Understand what analytics is and how it differs from analysis

Know the popular tools used in analytics

Understand the role of a data scientist

Know the processes involved in analytics

Define a problem statement

Collect and summarize data

Detect and treat outliers in the data


Analytics versus Analysis

Analytics
Analytics is the science of analysis, in which statistics, data mining,
computer technology, and similar disciplines are used to carry out the analysis.

Analysis
Analysis is the process of breaking down a complex object into its simpler forms.


What is Analytics?
It is the science of wisely acquiring meaningful results from given data using various methods and
technologies.
It aims at discovering patterns of variation in the given data.

It helps to understand the future from past data and the uncertainty related to business.
It is a sophisticated process that uses statistical, mathematical, and economic models to predict the
future and prescribe strategies.

How analytics works

Gather Data

Organize Data

Analyze Data


Analytics Stages

Descriptive: How many students dropped out last year?

Diagnostic: Why has the drop-out rate increased in the last one year?

Predictive: Which students are most likely to drop out?

Prescriptive: Which students should I target to keep from dropping out?

Information

Decision
Insights
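
As a rough illustration of the descriptive stage, the sketch below counts drop-outs per year and derives a drop-out rate. The records, field layout, and the `dropout_summary` helper are hypothetical and invented for this example; they are not part of the course material.

```python
from collections import Counter

# Hypothetical enrollment records: (student_id, year, dropped_out)
records = [
    ("s1", 2012, False),
    ("s2", 2012, True),
    ("s3", 2012, False),
    ("s4", 2012, False),
    ("s5", 2013, True),
    ("s6", 2013, True),
    ("s7", 2013, False),
    ("s8", 2013, False),
]


def dropout_summary(rows):
    """Return {year: (dropouts, enrolled, rate)} for each year in rows."""
    enrolled = Counter(year for _, year, _ in rows)
    dropped = Counter(year for _, year, out in rows if out)
    return {
        year: (dropped[year], enrolled[year], dropped[year] / enrolled[year])
        for year in sorted(enrolled)
    }


summary = dropout_summary(records)
for year, (n_out, n_all, rate) in summary.items():
    print(f"{year}: {n_out} of {n_all} dropped out ({rate:.0%})")
```

Counting answers the descriptive question. Comparing the yearly rates is only a starting point for the diagnostic question of why the rate changed.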


Popular Tools
R
Revolution R
RStudio
Tableau
SAP HANA
Weka
KXEN
SAS

Role of a Data Scientist

Be inquisitive: stare at data and spot trends.

Uncover stories hidden in data that help create more useful insights and solve business problems.
Work in sync with application developers to get relevant data for analysis.
Make an analytical plan in such a way that the results satisfy the business needs.
Come up with an effective data mining architecture and prepare suitable models.
Respond to and resolve data mining performance issues.
Generate reports that are affordable from a business perspective.

Data Analytics Methodology


DISCOVERY

DATA PREPARING

MODEL PLANNING

MODEL BUILDING

DELIVER RESULTS

PUT INTO USE

Problem Definition

WHAT IS THE PROBLEM?

WHAT IS IT NOT?

WE HAVE THIS PROBLEM BECAUSE?

WE DON'T HAVE A SOLUTION BECAUSE?

Techniques involved in defining a problem

State the problem in a general way

Understand the nature of the problem

Survey the available literature

Go for discussions for developing ideas

Rephrase the research problem into a working proposition

Types of Data

Data can be of two types: qualitative and quantitative.

Qualitative Data

Data expressed as groups or categories
Descriptive data
E.g. dividing a population into high, medium, and low height groups

Quantitative Data

Data expressed as numbers
Definitive data
E.g. the height of a person
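
The height example can be sketched in code: a measured height is quantitative, and grouping it into low, medium, and high turns it into qualitative data. The cut-off values and the `height_group` function below are illustrative assumptions, not values from the course.

```python
def height_group(height_cm, low_below=160.0, high_from=175.0):
    """Map a numeric height to a category. The cut-offs are illustrative assumptions."""
    if height_cm < low_below:
        return "low"
    if height_cm < high_from:
        return "medium"
    return "high"


# Hypothetical quantitative data: heights in centimetres.
heights_cm = [152.0, 168.5, 181.0, 175.0, 159.9]

# Derived qualitative data: one category label per person.
groups = [height_group(h) for h in heights_cm]
print(groups)
```

Note that the conversion loses information: the group label cannot be turned back into the exact height.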

Summarizing Data

Summarizing is the process of converting huge amounts of raw data into a format
that can be easily analyzed.
Summaries differ based on the type of data and can be descriptive or graphical.

Marital Status    Frequency
Single                  203
Married               2,580
Widowed                 334
Divorced                367
Separated                46
Total                 3,530
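
A frequency table like the one above is often extended with relative frequencies, that is, each category's share of the total. The following is a minimal sketch using the counts from the table; the variable names are chosen for this example only.

```python
# Frequencies copied from the Marital Status table above.
frequency = {
    "Single": 203,
    "Married": 2580,
    "Widowed": 334,
    "Divorced": 367,
    "Separated": 46,
}

total = sum(frequency.values())

# Relative frequency: each category's share of all observations.
relative = {status: count / total for status, count in frequency.items()}

for status, count in frequency.items():
    print(f"{status:<10} {count:>6,} {relative[status]:>7.1%}")
print(f"{'Total':<10} {total:>6,}")
```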

Summarizing Data
Numeric - Descriptive: Mean, Median, Mode

Categorical - Descriptive: Frequency distribution tables

Numeric - Graphical: Box plot, Histograms

Categorical - Graphical: Bar charts
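
The numeric descriptive summaries can be computed with Python's standard `statistics` module. This is a small sketch on hypothetical marks invented for illustration.

```python
import statistics

# Hypothetical numeric data: marks of seven students.
marks = [55, 62, 62, 70, 71, 78, 92]

mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value of the sorted data
mode_mark = statistics.mode(marks)      # most frequent value

print("mean:", mean_mark)
print("median:", median_mark)
print("mode:", mode_mark)
```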

Data Collection

Process of collecting relevant data that aids in solving the problem statement

The data collection process needs to be defined and systematic.

Observations need to be recorded and organized for optimal usefulness.

Collect Relevant Data

Categorize the Data

Organize the Data

Data Collection Methods


Observation

Experiment

Census
Questionnaire
Survey

Reporting
Registration
Data Sources

Data collection methods fall broadly into two categories: primary and secondary.
Primary methods are where the data is gathered directly through investigating,
experimenting or observing various entities.
Secondary methods refer to the methods where the data has already been gathered
before the study, and is available as already published facts and reports.

Data Dictionary

A Data Dictionary is a file that describes the structure of the database itself.

Includes details like

Number of records
Name of each field
Characteristic of each field
Description of each field
Relationships between different fields

It helps in analyzing the different data variables and their relationships to one another.
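
One possible way to picture a data dictionary is as metadata derived from the records: the number of records plus the name, type, and description of each field. The sketch below assumes a small list of hypothetical records; `build_data_dictionary` and all field names are invented for this example.

```python
def build_data_dictionary(rows, descriptions):
    """Describe the structure of `rows` without copying any record values."""
    fields = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        fields[name] = {
            "type": type(values[0]).__name__,           # characteristic of the field
            "missing": sum(v is None for v in values),  # count of empty values
            "description": descriptions.get(name, ""),
        }
    return {"number_of_records": len(rows), "fields": fields}


# Hypothetical records and descriptions, invented for this sketch.
rows = [
    {"student_id": 1, "study_minutes": 120, "mark_pct": 64.5},
    {"student_id": 2, "study_minutes": 45, "mark_pct": 51.0},
    {"student_id": 3, "study_minutes": 200, "mark_pct": 88.0},
]
descriptions = {
    "student_id": "Unique identifier of the student",
    "study_minutes": "Study time in minutes",
    "mark_pct": "Mark as a percentage",
}

data_dictionary = build_data_dictionary(rows, descriptions)
print(data_dictionary["number_of_records"])
for name, info in data_dictionary["fields"].items():
    print(name, info)
```

The result holds only metadata. The actual records stay in `rows` and are not part of the dictionary.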

Outlier Treatment

An outlier is a point or an observation that deviates significantly from the other
observations.
Outliers arise due to experimental errors or special circumstances.
Outlier detection tests check for outliers.
Outlier treatment:

Retention
Exclusion
Other treatment methods

[Figure: scatter plot of Mark (Percentage) against Study time (Minutes), with one point labeled "Outlier!"]
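
The slides do not name a specific detection test. As one common choice, the sketch below applies the 1.5 x IQR rule (Tukey's fences) to hypothetical marks and then shows exclusion as a treatment. The data, the `iqr_outliers` helper, and the factor `k=1.5` are assumptions made for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical marks (percentage); 98 is planted as an unusually high value.
marks = [52, 55, 58, 60, 61, 63, 65, 67, 70, 98]

outliers = iqr_outliers(marks)

# Exclusion: drop the flagged points. Retention would keep `marks` unchanged.
retained = [m for m in marks if m not in outliers]

print("outliers:", outliers)
print("after exclusion:", retained)
```

Whether to retain or exclude a flagged point depends on its cause: a recording error is a candidate for exclusion, while a genuine but unusual observation may be worth keeping.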

Summary
Here is a quick recap of what we have learned in this lesson:

What analytics and analysis are, and the differences between them

Popular tools used in analytics

What a data scientist does

The processes involved in the analytics life cycle

How to formally define a problem statement

Methods of collecting and summarizing data for analytics

The data dictionary and its contents

What outliers are and how to detect and treat them


Quiz

QUIZ 1

Which of the following is a secondary data collection method?

a. Surveys
b. Interviews
c. Data Sources
d. Experiments

Answer: c.
Explanation: Surveys, interviews, and experiments are conducted personally by the
researchers and hence are primary data collection methods. Data sources are already
existing sources of data and thus belong to secondary methods.
QUIZ 2

Which of the following is NOT a part of a data dictionary?

a. Number of records
b. Characteristic of fields
c. Type of fields
d. Actual records

Answer: d.
Explanation: A data dictionary holds the metadata, i.e., it defines the attributes of the data.
It does not contain the actual data.
QUIZ 3

Which of the following is a way of summarizing categorical data?

a. Mean
b. Frequency distribution
c. Median
d. Mode

Answer: b.
Explanation: Mean, median and mode are mathematical summaries of numeric or
quantitative data. Frequency distribution is used to summarize categorical or qualitative
data.
QUIZ 4

Which one of the following is NOT a step in data analytics methodology?

a. Discovery
b. Deliver results
c. Model building
d. Re-checking

Answer: d.
Explanation: Re-checking is not a step in data analytics methodology. The steps are
discovery, data preparing, model planning, model building, deliver results, and put into use.

QUIZ 5

What are the two categories of data collection methods?

a. Primary and secondary
b. Interval and ratio
c. Nominal and ordinal
d. Random and selective

Answer: a.
Explanation: Data collection methods are classified into primary and secondary.

QUIZ 6

Which of the following is NOT a step in analytics?

a. Prescriptive
b. Predictive
c. Descriptive
d. Productive

Answer: d.
Explanation: Productive is not a step in analytics. The stages are descriptive, diagnostic,
predictive, and prescriptive.

QUIZ 7

Which of the following is FALSE with reference to the role of a data scientist?

a. Develop and plan required analytic projects in response to business needs.
b. Work with application developers to extract data relevant for analysis.
c. There is no need to consider how statistical algorithms work.
d. Contribute to data mining architectures, modeling standards, reporting,
   and data analysis methodologies.

Answer: c.
Explanation: A data scientist needs to consider how statistical algorithms work.


Thank You
