
Data Analysis and Presentation

What is data analysis?


Data analysis is the act of transforming data with
the aim of extracting useful information and
facilitating conclusions.
(From Wikipedia, the free encyclopedia)


Data Types

Data

    Categorical data (no order)
        Examples: Marital Status, Political Party, Eye Color (defined categories)

    Numerical data (ordered)
        Discrete
            Examples: No. of Children, Defects per hour (counted items)
        Continuous
            Examples: Weight, Voltage (measured characteristics)


Data Types

Time Series Data

Ordered data values observed over time

Cross Section Data

Data values observed at a fixed point in time


Data Types

          2004     2005     2006     2007     2008
<12       1.2%     1.3%     1.4%     1.3%     1.4%
12-15    47.2%    52.1%    54.1%    55.2%    57.5%
16-17    29.9%    31.5%    29.5%    27.1%    27.7%
18-20    21.6%    15.1%    15.0%    16.5%    13.4%

Time Series Data: one row, read across the years
Cross Section Data: one column, read down a single year
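The difference between the two data types can be sketched by slicing the table in two directions. This is a minimal illustration, not part of the original slides: the helper names `time_series` and `cross_section` are hypothetical, and the row labels are used exactly as they appear in the table.

```python
# Shares (in %) from the table: one row per group, one column per year.
years = [2004, 2005, 2006, 2007, 2008]
shares = {
    "<12":   [1.2, 1.3, 1.4, 1.3, 1.4],
    "12-15": [47.2, 52.1, 54.1, 55.2, 57.5],
    "16-17": [29.9, 31.5, 29.5, 27.1, 27.7],
    "18-20": [21.6, 15.1, 15.0, 16.5, 13.4],
}


def time_series(group):
    """One group observed over time: a row of the table."""
    return dict(zip(years, shares[group]))


def cross_section(year):
    """All groups at one fixed point in time: a column of the table."""
    col = years.index(year)
    return {group: row[col] for group, row in shares.items()}


print(time_series("12-15"))
print(cross_section(2008))
```

Reading along a row keeps the group fixed and lets time vary; reading down a column keeps time fixed and lets the group vary.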


Statistical Tables and Charts / Graphs

A good statistical chart is quiet and lets the data tell its story accurately and clearly.


Statistical Tables and Charts / Graphs

Categorical Data: Bar Chart, Pie Chart

Numerical Data: Stem and Leaf Plot, Histogram
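As a rough illustration of one of the numerical-data displays, a stem and leaf plot can be built in a few lines. The sketch below assumes non-negative whole numbers, uses the tens as the stem and the units as the leaf, and skips stems that have no data; the `ages` values and the function name `stem_and_leaf` are made up for the example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


ages = [21, 24, 24, 26, 27, 30, 32, 38, 41]  # hypothetical data

for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row keeps the individual values visible while still showing how many observations fall in each group of ten, which is what a histogram shows with bars.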

[Figure: Population pyramids (1967-2007)]

[Figure: chart with axes 0% to 25% and 2002 to 2008]

Misleading Statistical Chart


Misleading Histogram


Summary Statistics

Describing Data Numerically

    Central Tendency: Mean, Median, Mode, Weighted Mean

    Other Measures: Percentiles, Quartiles

    Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation


Measures of Central Tendency

Central Tendency

    Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

    Median

    Mode

    Weighted Mean: $\bar{x}_W = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$


1.52

$16,000
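The four measures of central tendency can be computed with Python's standard `statistics` module plus a one-line weighted mean. This is a sketch on made-up numbers: `values`, `weights` and the helper `weighted_mean` are assumptions for illustration and do not come from the slides.

```python
from statistics import mean, median, mode

values = [2, 3, 3, 4, 8]    # hypothetical observations x_i
weights = [1, 2, 2, 1, 4]   # hypothetical weights w_i


def weighted_mean(xs, ws):
    """Sum of w_i * x_i divided by the sum of w_i."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


print("mean:", mean(values))
print("median:", median(values))
print("mode:", mode(values))
print("weighted mean:", weighted_mean(values, weights))
```

With equal weights the weighted mean reduces to the ordinary mean; here the large weight on 8 pulls it above the ordinary mean.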


Shape of a Distribution
Describes how data is distributed: symmetric or skewed

Left-Skewed: Mean < Median (longer tail extends to left)

Symmetric: Mean = Median

Right-Skewed: Median < Mean (longer tail extends to right)
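The mean-versus-median comparison can be turned into a quick check. The sketch below is only a rule of thumb applied to made-up samples, not a formal test of skewness, and the function name `describe_shape` is hypothetical.

```python
from statistics import mean, median


def describe_shape(data):
    """Compare mean and median to suggest the direction of skew."""
    m, med = mean(data), median(data)
    if m > med:
        return "right-skewed"   # a long right tail pulls the mean up
    if m < med:
        return "left-skewed"    # a long left tail pulls the mean down
    return "symmetric"          # mean equals median


print(describe_shape([1, 2, 3, 4, 20]))
print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 17, 18, 19, 20]))
```

The extreme value 20 in the first sample raises the mean to 6 while the median stays at 3, which is why the mean sits on the side of the longer tail.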


Other Measures

Other Measures: Percentiles, Quartiles

Percentiles

The pth percentile in a data array (where 0 ≤ p ≤ 100):

    p% are less than or equal to this value

    (100 - p)% are greater than or equal to this value

Quartiles

    1st quartile = 25th percentile

    2nd quartile = 50th percentile = median

    3rd quartile = 75th percentile
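The percentile definition can be checked directly on a small sample. The sketch below uses `statistics.quantiles` from the Python standard library with its "inclusive" method; other software may use a slightly different interpolation rule and return slightly different quartiles. The data and the two helper functions are made up for illustration.

```python
from statistics import median, quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

# Three cut points: the 25th, 50th and 75th percentiles.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")


def share_at_or_below(values, cut):
    """Fraction of the values that are less than or equal to cut."""
    return sum(v <= cut for v in values) / len(values)


def share_at_or_above(values, cut):
    """Fraction of the values that are greater than or equal to cut."""
    return sum(v >= cut for v in values) / len(values)


print("quartiles:", q1, q2, q3)
print("share <= Q1:", share_at_or_below(data, q1))
print("share >= Q1:", share_at_or_above(data, q1))
```

In a small sample the shares are at least 25% and at least 75% rather than exactly those figures, because the value at the cut point is counted on both sides.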


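Use of percentile: Value at Risk (VaR)

[Figure: profit/loss distribution, horizontal axis Profit(+)/Loss(-), with 5% of the area to the left of -VaR]

The 95% VaR corresponds to the 5th percentile of the profit/loss distribution (marked -VaR on the axis).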
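A minimal sketch of the idea, assuming a made-up list of profit/loss outcomes: take the empirical 5th percentile of the outcomes and report its negative as the 95% VaR. The `pnl` values are hypothetical, and real VaR calculations involve modelling choices that this example ignores.

```python
from statistics import quantiles

# Hypothetical profit(+)/loss(-) outcomes: -10, -9, ..., 9, 10
pnl = list(range(-10, 11))

# n=20 gives cut points every 5%; the first one is the 5th percentile.
fifth_percentile = quantiles(pnl, n=20, method="inclusive")[0]

# The 5th percentile sits at -VaR, so VaR is reported as a positive loss.
var_95 = -fifth_percentile

print("5th percentile of P/L:", fifth_percentile)
print("95% VaR:", var_95)
```

Read this as: in this sample, only about 5% of the outcomes are losses of `var_95` or worse.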

[Figure: Accumulator (I kill you later); chart with a percentage axis from 0.00% to 60.00%]


Quartiles

Quartiles split the ranked data into 4 equal groups:

    25%  |  25%  |  25%  |  25%
         Q1      Q2      Q3

Note that the second quartile (the 50th percentile) is the median.

Deciles?
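By the same logic, deciles split the ranked data into 10 equal groups using 9 cut points, and the 5th decile is again the median. A small sketch with made-up data, using the standard library's `statistics.quantiles` (the "inclusive" method is one of several interpolation conventions):

```python
from statistics import median, quantiles

data = list(range(1, 22))  # hypothetical sample: 1, 2, ..., 21

# n=10 returns the 9 cut points between the 10 groups.
deciles = quantiles(data, n=10, method="inclusive")

print("deciles:", deciles)
print("5th decile equals median:", deciles[4] == median(data))
```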


Distribution Shape and Box and Whisker Plot

[Figure: box and whisker plots, each marked Q1, Q2, Q3, for Left-Skewed, Symmetric and Right-Skewed distributions]


Box and Whisker Plot


Employment Income


Measures of Variation

Variation

    Range

    Interquartile Range

    Variance
        Population Variance
        Sample Variance

    Standard Deviation
        Population Standard Deviation
        Sample Standard Deviation

    Coefficient of Variation


Variation

Measures of variation give information on the spread or variability of the data values.

[Figure: Same center, different variation]
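The measures of variation listed earlier can be computed with the standard `statistics` module. This is a sketch on a made-up sample: the population versions divide the summed squared deviations by n, the sample versions by n - 1, and the coefficient of variation is taken here as the sample standard deviation divided by the mean, expressed as a percentage.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

data_range = max(data) - min(data)

pop_var = pvariance(data)     # sum of squared deviations / n
sample_var = variance(data)   # sum of squared deviations / (n - 1)
pop_sd = pstdev(data)         # square root of the population variance
sample_sd = stdev(data)       # square root of the sample variance

# Coefficient of variation: spread relative to the mean, in percent.
cv_percent = sample_sd / mean(data) * 100

print("range:", data_range)
print("population variance / sd:", pop_var, pop_sd)
print("sample variance / sd:", sample_var, sample_sd)
print("coefficient of variation (%):", cv_percent)
```

Because it is unit-free, the coefficient of variation is the usual choice for comparing spread across data sets measured on different scales.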


Interquartile Range Example
Example:

    X:   minimum     Q1     Median (Q2)     Q3     maximum
           12        30         45          57        70
               25%       25%         25%        25%

Interquartile range = 57 - 30 = 27
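The arithmetic of the example can be restated in a few lines, using the five values from the slide (12, 30, 45, 57, 70). The dictionary layout is just one convenient way to hold them.

```python
# Five-number summary from the example.
summary = {"minimum": 12, "Q1": 30, "median": 45, "Q3": 57, "maximum": 70}

# Interquartile range: the spread of the middle 50% of the data.
iqr = summary["Q3"] - summary["Q1"]

# Range, for comparison: the spread of all the data.
data_range = summary["maximum"] - summary["minimum"]

print("interquartile range:", iqr)
print("range:", data_range)
```

Unlike the range, the interquartile range ignores the lowest and highest 25% of the values, so a single extreme observation does not change it.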


Aim of Statistical Charts

The aim of good statistical charts is to display data accurately and clearly.


Garbage In Garbage Out

No Copy!!!

Team Work
