
Descriptive Statistics

Continuous and Discrete Variables


Continuous variables can (in principle) take any real values within some interval
Discrete variables can only take certain values, but not any others
Many discrete variables have only a finite number of possible values.
o The simplest possibility is a binary (dichotomous) variable, with just two
values
A discrete variable can also have an unlimited number of possible values (e.g.
number of people visiting a shop)
Categorical (discrete) variables take two forms, depending on whether their categories are ordered:
o Nominal data: unordered categories
o Ordinal data: ordered categories; however, the ordering does not necessarily
indicate the magnitude of the difference between values
The sample distribution
Sample distribution of a variable consists of:
A list of the values of the variable that are observed in the sample
The number of times each value occurs (frequencies of observed values)
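As a small illustration of the two ingredients listed above (observed values and their frequencies), the sketch below tabulates a sample distribution in Python. The data are hypothetical, invented only for this example, and `Counter` is just one convenient way to count occurrences.

```python
from collections import Counter

# Hypothetical sample: number of children in ten surveyed households.
sample = [0, 2, 1, 2, 3, 0, 2, 1, 2, 4]

# Counter maps each observed value to the number of times it occurs.
frequencies = Counter(sample)

# Print the sample distribution as "value frequency" pairs in value order.
for value in sorted(frequencies):
    print(value, frequencies[value])
```

The frequencies always add up to the sample size n, which is a quick sanity check on any frequency table.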
Skewness

Left-skewed: longer left tail
Right-skewed: longer right tail
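Summation Notation

$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$

Properties

For any constant $a$:

$\sum_{i=1}^{n} a = na$

Proof: $\sum_{i=1}^{n} a = (a + a + \dots + a) = na$

$\sum_{i=1}^{n} aX_i = a \sum_{i=1}^{n} X_i$

Proof: $\sum_{i=1}^{n} aX_i = (aX_1 + aX_2 + \dots + aX_n) = a(X_1 + X_2 + \dots + X_n) = a \sum_{i=1}^{n} X_i$

$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$

Proof: $\sum_{i=1}^{n} (X_i + Y_i)$
$= [(X_1 + Y_1) + (X_2 + Y_2) + \dots + (X_n + Y_n)]$
$= [(X_1 + X_2 + \dots + X_n) + (Y_1 + Y_2 + \dots + Y_n)]$
$= (X_1 + X_2 + \dots + X_n) + (Y_1 + Y_2 + \dots + Y_n)$
$= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$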
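The three summation properties (a constant summed n times gives n times the constant, a constant factor can be taken outside a sum, and a sum of pairwise sums splits into two sums) can be checked numerically. The following Python sketch does so on made-up lists `x` and `y` and a made-up constant `a`; none of these values come from the notes.

```python
# Hypothetical data and constant, chosen only for illustration.
x = [2, 5, 7, 1]
y = [3, 0, 4, 6]
a = 3
n = len(x)

# Property 1: summing the constant a over n terms gives n * a.
sum_of_constant = sum(a for _ in range(n))

# Property 2: a constant factor can be taken outside the sum.
sum_scaled = sum(a * xi for xi in x)

# Property 3: the sum of (X_i + Y_i) splits into two separate sums.
sum_of_pairs = sum(xi + yi for xi, yi in zip(x, y))

print(sum_of_constant == n * a)
print(sum_scaled == a * sum(x))
print(sum_of_pairs == sum(x) + sum(y))
```

A check on one data set is not a proof, but it is a useful way to catch a misremembered rule.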




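Product notation

$\prod_{i=1}^{n} X_i = X_1 \times X_2 \times \dots \times X_n$

Properties
1. $\prod_{i=1}^{n} a = a \times a \times \dots \times a$ ($n$ times) $= a^n$

2. $\prod_{i=1}^{n} aX_i = a^n \prod_{i=1}^{n} X_i$

3. $\prod_{i=1}^{n} X_i Y_i = \left( \prod_{i=1}^{n} X_i \right) \left( \prod_{i=1}^{n} Y_i \right)$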
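The standard product rules (a constant multiplied n times gives the n-th power, a constant factor comes out of a product as its n-th power, and a product of pairwise products factorises) can be verified in the same spirit. This is a minimal sketch assuming Python 3.8 or later for `math.prod`; the lists `x` and `y` and the constant `a` are hypothetical.

```python
import math

# Hypothetical data and constant, chosen only for illustration.
x = [2, 5, 7, 1]
y = [3, 1, 4, 6]
a = 3
n = len(x)

# Multiplying the constant a by itself n times gives a ** n.
product_of_constant = math.prod(a for _ in range(n))

# A constant factor comes out of the product raised to the power n.
product_scaled = math.prod(a * xi for xi in x)

# The product of (X_i * Y_i) equals the product of the two separate products.
product_of_pairs = math.prod(xi * yi for xi, yi in zip(x, y))

print(product_of_constant == a ** n)
print(product_scaled == a ** n * math.prod(x))
print(product_of_pairs == math.prod(x) * math.prod(y))
```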
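The Sample Mean

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Sum of deviations from the mean is 0
$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$

Proof: $\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$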
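To see that the deviations from the sample mean sum to zero on concrete numbers, the sketch below computes the mean and the deviations for a hypothetical sample. Exact fractions are used instead of floating-point numbers so that the sum is exactly zero rather than zero up to rounding error; that choice is an assumption of this example, not something the notes prescribe.

```python
from fractions import Fraction

# Hypothetical sample, chosen only for illustration.
x = [3, 8, 4, 10]
n = len(x)

# Sample mean as an exact fraction: (1/n) * sum of the observations.
mean = Fraction(sum(x), n)

# Deviation of each observation from the mean.
deviations = [xi - mean for xi in x]

# The deviations cancel out exactly.
total_deviation = sum(deviations)

print(mean, total_deviation)
```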


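Mean minimises the sum of squared deviations
The smallest possible value of the sum of squared deviations $\sum_{i=1}^{n} (X_i - C)^2$ for
any constant $C$ is obtained when $C = \bar{X}$.

Proof:
$\sum_{i=1}^{n} (X_i - C)^2 = \sum_{i=1}^{n} [(X_i - \bar{X}) + (\bar{X} - C)]^2$
$= \sum_{i=1}^{n} [(X_i - \bar{X})^2 + 2(X_i - \bar{X})(\bar{X} - C) + (\bar{X} - C)^2]$
$= \sum_{i=1}^{n} (X_i - \bar{X})^2 + 2(\bar{X} - C) \sum_{i=1}^{n} (X_i - \bar{X}) + n(\bar{X} - C)^2$
$= \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - C)^2$ (the middle term vanishes because $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$)
$\geq \sum_{i=1}^{n} (X_i - \bar{X})^2$
since $n(\bar{X} - C)^2 \geq 0$ for any choice of $C$.

Equality is obtained only when $C = \bar{X}$, so that $n(\bar{X} - C)^2 = 0$.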
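The minimising property of the mean can also be explored numerically. The sketch below evaluates the sum of squared deviations around many candidate constants C and checks that it equals the sum of squared deviations around the mean plus n times the squared distance between the mean and C. The sample, the grid of candidates and the helper name `sum_sq_dev` are all hypothetical choices made for this illustration.

```python
from fractions import Fraction

def sum_sq_dev(values, c):
    """Sum of squared deviations of the values from the constant c."""
    return sum((v - c) ** 2 for v in values)

# Hypothetical sample, chosen only for illustration.
x = [3, 8, 4, 10]
n = len(x)
mean = Fraction(sum(x), n)

# Sum of squared deviations around the mean.
minimum = sum_sq_dev(x, mean)

# Candidate constants 0, 1/4, 2/4, ..., 15 (the mean 25/4 is one of them).
candidates = [Fraction(k, 4) for k in range(61)]

# Check the decomposition SSD(C) = SSD(mean) + n * (mean - C)^2 for every C.
identity_holds = all(
    sum_sq_dev(x, c) == minimum + n * (mean - c) ** 2 for c in candidates
)

# The candidate with the smallest sum of squared deviations.
best = min(candidates, key=lambda c: sum_sq_dev(x, c))

print(identity_holds, best == mean)
```

A grid search only inspects finitely many values of C, so it illustrates the result rather than proving it; the algebraic argument is what covers every C.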


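The Sample Median
Let $X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$ denote the observations sorted in increasing order.

If $n$ is odd, median $= X_{((n+1)/2)}$

If $n$ is even, median $= \frac{1}{2} \left[ X_{(n/2)} + X_{(n/2+1)} \right]$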


In general, the mean is affected much more than the median by outliers.
The mode is the only measure of central tendency which can be used for nominal data
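The median rule for odd and even n, the effect of an outlier on the mean versus the median, and the use of the mode for nominal data can be sketched in Python as follows. The function name `sample_median` and all data values are hypothetical; `statistics.mode` is the standard-library function for the most common value.

```python
import statistics

def sample_median(values):
    """Median via the sorted observations: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th ordered value, which is index n // 2 when counting from 0.
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th ordered values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical sample, and the same sample with its largest value replaced by an outlier.
data = [2, 3, 5, 7, 11]
with_outlier = [2, 3, 5, 7, 1100]

mean_before = sum(data) / len(data)
mean_after = sum(with_outlier) / len(with_outlier)
median_before = sample_median(data)
median_after = sample_median(with_outlier)

# The mean moves a lot, the median does not move at all.
print(mean_before, mean_after)
print(median_before, median_after)

# For nominal data only the mode is meaningful (hypothetical colour labels).
colours = ["red", "blue", "red", "green"]
most_common_colour = statistics.mode(colours)
print(most_common_colour)
```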
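Sample Variance

$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

Alternative expression for the sum of squares in $S^2$
$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$

Proof: $\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i^2 - 2X_i\bar{X} + \bar{X}^2) = \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$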


The standard deviation is the square root of the variance: $S = \sqrt{S^2}$
The standard deviation and the variance are never negative. They are 0 only if all
observations $X_i$ are the same.
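The sketch below computes the sum of squared deviations in two ways (directly, and as the sum of squared observations minus n times the squared mean), then the sample variance and standard deviation. It assumes the usual n - 1 divisor for the sample variance, which is also what `statistics.variance` uses; the data are hypothetical.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
x = [4, 7, 1, 8]
n = len(x)
mean = sum(x) / n

# Sum of squares computed directly from the deviations.
ss_direct = sum((xi - mean) ** 2 for xi in x)

# Sum of squares from the alternative expression: sum of X_i^2 minus n * mean^2.
ss_shortcut = sum(xi ** 2 for xi in x) - n * mean ** 2

# Sample variance (n - 1 divisor, an assumption stated above) and standard deviation.
variance = ss_direct / (n - 1)
std_dev = math.sqrt(variance)

print(ss_direct, ss_shortcut)
print(variance, std_dev)

# A sample whose observations are all equal has variance 0.
constant_variance = statistics.variance([5, 5, 5, 5])
print(constant_variance)
```

With floating-point data the two expressions for the sum of squares can differ slightly through rounding, so comparing them with `math.isclose` is safer than testing for exact equality.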
Quantile-based measures of dispersion
Range = maximum value - minimum value
The range is extremely sensitive to outliers
Interquartile range (IQR) = $q_{75} - q_{25}$ = Upper quartile - Lower quartile
The IQR is insensitive to outliers, since it depends only on the two quartiles
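The contrast between the range and the interquartile range can be seen by replacing one observation with an extreme value. This is a minimal sketch assuming Python 3.8 or later for `statistics.quantiles`; several conventions exist for computing quartiles, and the `"inclusive"` method used here is just one of them, not necessarily the one intended in these notes. The helper names and the data are hypothetical.

```python
import statistics

def value_range(values):
    """Range: maximum value minus minimum value."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR: upper quartile minus lower quartile (inclusive quantile method)."""
    q25, _median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q75 - q25

# Hypothetical sample, and the same sample with its largest value replaced by an outlier.
data = [1, 3, 4, 6, 7, 9, 10, 12, 15]
with_outlier = [1, 3, 4, 6, 7, 9, 10, 12, 150]

# The range changes drastically, the IQR stays the same.
print(value_range(data), value_range(with_outlier))
print(interquartile_range(data), interquartile_range(with_outlier))
```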
