
Probability and Statistics (Lecture 1)

Instructor: Mr. Ronrick A. Da-Ano


Reference Book: Elementary Statistics by Ronald Walpole
Statistics: the science concerned with methods of collecting, organizing, presenting, analyzing, and interpreting data.
Categories of Statistics:
1. Descriptive Statistics: is the discipline of quantitatively describing the main features of a collection of
information, or the quantitative description itself.
* concerned with organizing, summarizing, presenting, and interpreting data.
* describing only (e.g. mean, median, mode)
2. Inferential Statistics: deals with making generalizations about the population where only part of it is examined
* from the word infer which means conclude
Types of Data:
1. Primary: data that have been acquired directly from the source.
2. Secondary: studies made by others for another purpose
Variable: is a particular attribute of interest that is measurable or observable
Types of Variable:
a. Quantitative: any attribute that can be measured by numbers (e.g. height, grades, weight, age)
b. Qualitative: have labels / names rather than numbers

Population and Sample:


Population - sum total of all units of analysis (e.g. all TIP students)
Sample: a subset or portion of the total population
Distribution: is a pattern of variation of a variable
Scale of Measurement
1. Nominal (categorical): names / labels (gender, course)
2. Ordinal: order / ranking
3. Interval - 75, 80, 83, 90, 100 or IQ: 100, 103, 120, 121
4. Ratio: obtained from interval, 1.00 = 99-100% / 1.25 = 96-98
Notation:

Properties:

MATH 009 Page 1

*constants can be multiplied after doing the summation

*adding two variables can be done by getting the summation individually and add their sum together.
*Do exponents first before multiplying the coefficient. Extract coefficient out of the notation first.
*Always check the upper and lower limits.
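The summation properties above can be checked numerically. This is a minimal sketch: the lists x and y and the constant c are hypothetical values chosen only for illustration, not data from the lecture.

```python
# Hypothetical values used only to illustrate the summation properties.
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
c = 3  # a constant coefficient

# A constant can be multiplied after doing the summation.
lhs_constant = sum(c * xi for xi in x)
rhs_constant = c * sum(x)

# The summation of a sum equals the sum of the individual summations.
lhs_sum = sum(xi + yi for xi, yi in zip(x, y))
rhs_sum = sum(x) + sum(y)

# Exponents are applied to each term first; the coefficient is extracted.
sum_of_squares = c * sum(xi ** 2 for xi in x)

# Limits matter: summing from i = 2 to i = 3 (1-based) uses only part of x.
partial = sum(x[1:3])

print(lhs_constant, rhs_constant, lhs_sum, rhs_sum, sum_of_squares, partial)
```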


Probability and Statistics (Lecture 2)


Methods of Collecting Data
1. Objective: may use a measuring device like a meter stick or weighing scale, which aims to accumulate data.
2. Subjective: relying on people's subjective responses, which may all be different like a survey.
3. Use of existing records - library, publication house
Methods of Presenting Data
1. Textual form: report / paragraph
2. Tabular form: data in rows and columns
3. Graphical form:
a. Histogram (bar graph)
b. Line graph
c. Pie graph
d. Stem and Leaf Plot

Frequency Distribution Table (FDT)
Steps:
1. Arrange the numbers by value. Follow the columns x rows of the given.
2. Determine the range (R) = highest value- lowest value.
3. Identify the number of classes (K).
a. Rule of thumb: 2^k ≥ N (N = number of observations)
b. Choose the value of k which makes the value of 2^k just above N, but nearest to N (one step higher).
c. Determine the Class Size Interval (C) = R / K. It must be a whole number, so round up. Then, to determine the classes, add it to the
lowest value.

4. Tally the data based on the # of frequency (F).


5. Compute the Class Mark (X). It is just the average of the limits.

6. Compute for the relative frequency (RF).

7. Determine the True Class Boundaries (TCB).


a. Lower TCB = LL - 0.5
b. Upper TCB = UL + 0.5
8. Get the Cumulative Frequency (CF), which are <CF (increasing) and >CF (decreasing).
9. Get the Cumulative relative frequency (RCF)
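The steps above can be sketched in code. This is an illustrative sketch, not the lecture's own procedure: it assumes integer data, reads the rule of thumb as "smallest k with 2^k ≥ N", and rounds the class size up. The data list is hypothetical, chosen to have N = 30, a lowest value of 6, and a highest value of 30.

```python
import math

def fdt_parameters(data):
    """Return (range R, number of classes k, class size C)."""
    n = len(data)
    r = max(data) - min(data)
    k = 1
    while 2 ** k < n:          # smallest k such that 2**k reaches N
        k += 1
    c = math.ceil(r / k)       # class size rounded up to a whole number
    return r, k, c

def build_fdt(data):
    """Build the table rows: class limits, F, X, RF, TCB, <CF and >CF."""
    _, _, c = fdt_parameters(data)
    n = len(data)
    rows, cum, lower = [], 0, min(data)
    while lower <= max(data):  # keep adding classes until the maximum is covered
        upper = lower + c - 1
        f = sum(lower <= v <= upper for v in data)
        cum += f
        rows.append({
            "class": (lower, upper),
            "F": f,
            "X": (lower + upper) / 2,          # class mark
            "RF": f / n,                       # relative frequency
            "TCB": (lower - 0.5, upper + 0.5), # true class boundaries
            "<CF": cum,
            ">CF": n - cum + f,
        })
        lower = upper + 1
    return rows

# Hypothetical data set (not the lecture's raw data).
data = list(range(6, 31)) + [11, 11, 13, 21, 27]
rows = build_fdt(data)
for row in rows:
    print(row)
```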


Example 1:
Create a Frequency Distribution Table using the following given:
6, 20, 21, 25, 10, 18, 30, 23, 11, 13, 21, 28, 24, 12, 15, 27, 30, 16, 11, 11, 29, 13, 19, 14, 22

Step 1: Arrange the given into ordered numbers.
     10   13   20   25
     11   14   21   27
     11   15   21   28
     11   16   22   29
     12   18   23   30
     13   19   24   30

Step 2: Determine the Range (R) = Highest - Lowest. R = 30 - 6 = 24


Step 3: Identify the number of classes (K): 2^k ≥ N. 2^k ≥ 30, so k = 5 (2^5 = 32).
Step 4: Determine the Class Size Interval (C) = R / K = 24 / 5 = 4.8, rounded up to 5.
Step 5: Create the table
Class      F    X     RF       TCB            <CF   >CF   <RCF      >RCF
6 - 10     6    8     20.00%   5.5 - 10.5     6     30    20.00%    100.00%
11 - 15    8    13    26.67%   10.5 - 15.5    14    24    46.67%    80.00%
16 - 20    5    18    16.67%   15.5 - 20.5    19    16    63.33%    53.33%
21 - 25    6    23    20.00%   20.5 - 25.5    25    11    83.33%    36.67%
26 - 30    5    28    16.67%   25.5 - 30.5    30    5     100.00%   16.67%
N = 30

Probability and Statistics (Lecture 3)


Sampling: is concerned with selection of a subset of individuals from a statistical population
Types of Variable
1. Quantitative: any attribute that can be measured by numbers (e.g. height, grades, weight, age)
a. Continuous (R) - Variables that can take any real value (integers, fractions, floating-point
numbers, etc.)
b. Discrete (Z) - Variables where only integers are allowed (85, 86, 87, etc.)
2. Qualitative: have labels / names rather than numbers
Sampling Methods:
1. Random Sampling - every member of the population is given an equal probability of being selected.
a. No bias.
b. Drawing the names of TIP students from a fishbowl.
2. Stratified Sampling - the sample is chosen through stratification, which is the
process of dividing members of the population into homogeneous subgroups before sampling.
a. From a club, we will choose 50% females and 50% males.
3. Cluster Sampling - is commonly clustered by geography or by time frame
a. We will choose a few from Quezon City and a few from Marikina.
4. Systematic Sampling - relies on arranging the study population according to some ordering
scheme.
a. Students will be arranged first according to GPA from lowest to highest and then we'll
choose.
5. Convenience Sampling - is a type of non-probability sampling that involves the sample being
drawn from that of the population that is close at hand
a. The club was supposed to choose engineering students, but since Marine students are more
accessible, we will choose them instead.
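Three of the sampling methods above can be sketched with Python's standard random module. The population of 20 student IDs, the "F"/"M" tags, and the sample sizes are hypothetical; the systematic sample here simply takes every k-th member of the ordered list, which is one common reading of the method.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 student IDs, alternately tagged "M" and "F".
population = [(i, "F" if i % 2 == 0 else "M") for i in range(1, 21)]

# Random sampling: every member has the same chance of being selected.
simple = random.sample(population, 4)

# Stratified sampling: sample separately inside each homogeneous subgroup.
females = [p for p in population if p[1] == "F"]
males = [p for p in population if p[1] == "M"]
stratified = random.sample(females, 2) + random.sample(males, 2)

# Systematic sampling: order the population, then take every k-th member.
step = len(population) // 4
systematic = sorted(population)[::step]

print(simple)
print(stratified)
print(systematic)
```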
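Measures of Central Tendency
1. Mean (Arithmetic Mean)
Ungrouped Data (Raw Data): x̄ = Σx / N
Grouped Data: x̄ = Σ(F * X) / N
2. Mode (Most Frequent)
Ungrouped Data (Raw Data): *Observation of the most frequent value
Grouped Data: Mode = LTCB + [d1 / (d1 + d2)] * C, where d1 and d2 are the differences between the frequency of the modal class and the frequencies of the classes before and after it
3. Median
Ungrouped Data (Raw Data): the middle value of the ordered data
Grouped Data: Median = LTCB + [(N/2 - <CF of the class before the median class) / F of the median class] * C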
Example:
Recall the ordered data:
 6   11   14   20   27
     11   15   21   27
     11   16   21   28
     12   17   22   29
     13   18   23   30
10   13   19   24   30

1. Mean (x̄) = Σx / N = 520 / 30 = 17.33

2. Mode = 11 since it appeared 3 times.
a. Note that there can be more than one mode: 2 modes (bimodal), 3 modes (trimodal), and 4
modes (quadmodal)
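3. Median:
Since N = 30 is even, the median is the average of the (N/2)th and (N/2 + 1)th values:
Median = (15th value + 16th value) / 2 = (16 + 17) / 2
= 16.5
*But if N = 31, then the median is the ((N + 1)/2)th value:
16th value = 17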
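The three measures for ungrouped data can be checked with Python's standard statistics module. This is a minimal sketch on a short hypothetical list of scores (not the lecture's data set), showing both the even and the odd case for the median.

```python
import statistics

# Hypothetical raw scores, used only for illustration.
scores = [11, 11, 11, 12, 13, 13, 14, 15]

mean = statistics.mean(scores)                 # sum of the values divided by N
modes = statistics.multimode(scores)           # every most-frequent value
median_even = statistics.median(scores)        # N = 8: average of the 4th and 5th values
median_odd = statistics.median(scores + [16])  # N = 9: the 5th value

print(mean, modes, median_even, median_odd)
```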
Recall:
Class      F    X     RF       TCB            <CF   >CF   <RCF      >RCF
6 - 10     6    8     20.00%   5.5 - 10.5     6     30    20.00%    100.00%
11 - 15    8    13    26.67%   10.5 - 15.5    14    24    46.67%    80.00%
16 - 20    5    18    16.67%   15.5 - 20.5    19    16    63.33%    53.33%
21 - 25    6    23    20.00%   20.5 - 25.5    25    11    83.33%    36.67%
26 - 30    5    28    16.67%   25.5 - 30.5    30    5     100.00%   16.67%
N = 30
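1. Mean:
x̄ = Σ(F * X) / N = 520 / 30
= 17.33
2. Median:
*To get the median class, N / 2 = 30 / 2 = 15. Get the class whose <CF first reaches 15 (here, 16 - 20).
Median = LTCB + [(N/2 - <CF of the class before) / F] * C = 15.5 + [(15 - 14) / 5] * 5
= 16.5
3. Mode:
*To get the modal class, look for the class with the highest frequency (here, 11 - 15).
Mode = LTCB + [d1 / (d1 + d2)] * C = 10.5 + [2 / (2 + 3)] * 5
= 12.5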
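The grouped-data results above (17.33, 16.5, 12.5) can be reproduced with the usual textbook formulas. This sketch rests on two assumptions: the class frequencies are taken as RF × N from the table (6, 8, 5, 6, 5), and the interpolation formulas for the grouped median and mode are the standard ones, since the lecture's own formulas are not legible here.

```python
# Rows of the frequency distribution table: (lower limit, upper limit, frequency).
# Frequencies are assumed to be RF x N, e.g. 20.00% of 30 = 6.
classes = [(6, 10, 6), (11, 15, 8), (16, 20, 5), (21, 25, 6), (26, 30, 5)]
n = sum(f for _, _, f in classes)
c = 5  # class size

# Grouped mean: sum of frequency times class mark, divided by N.
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes, n, c):
    """Interpolate inside the first class whose cumulative frequency reaches N/2."""
    cum = 0
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            return (lo - 0.5) + (n / 2 - cum) / f * c
        cum += f

def grouped_mode(classes, c):
    """Interpolate inside the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)
    d2 = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)
    return (classes[i][0] - 0.5) + d1 / (d1 + d2) * c

median = grouped_median(classes, n, c)
mode = grouped_mode(classes, c)
print(round(mean, 2), median, mode)
```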


Probability and Statistics (Lecture 4)


Measures of Location:
Where:
N = number of samples or the total population
j = percentile, quartile or decile
Percentile: P1 (1%), P2 (2%), P3 (3%), ..., P100 (100%)
Quartile: Q1 (25%), Q2 (50%), Q3 (75%) and Q4 (100%)
Decile: D1 (10%), D2 (20%), D3 (30%), ..., D10 (100%)
Conversion:
Q1 = P25
D2 = P20
Example:

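Measure of Dispersion (how widely dispersed the data are)
1. Variance
Ungrouped Data: s^2 = Σ(x - x̄)^2 / N
Grouped Data: s^2 = ΣF(X - x̄)^2 / N
2. Standard Deviation
Ungrouped Data: s = sqrt(Σ(x - x̄)^2 / N)
Grouped Data: s = sqrt(ΣF(X - x̄)^2 / N)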

Example:
Example 1: Ungrouped Data
x̄ = 17.33
 6   11   14   20   27
     11   15   21   27
     11   16   21   28
     12   17   22   29
     13   18   23   30
10   13   19   24   30

Variance:
s^2 = Σ(x - x̄)^2 / N
= 54.47
Standard Deviation:
s = sqrt(54.47) = 7.38
Example 2 (Grouped Data):
x̄ = 17.33
Class      F    X     RF       TCB            <CF   >CF   <RCF      >RCF
6 - 10     6    8     20.00%   5.5 - 10.5     6     30    20.00%    100.00%
11 - 15    8    13    26.67%   10.5 - 15.5    14    24    46.67%    80.00%
16 - 20    5    18    16.67%   15.5 - 20.5    19    16    63.33%    53.33%
21 - 25    6    23    20.00%   20.5 - 25.5    25    11    83.33%    36.67%
26 - 30    5    28    16.67%   25.5 - 30.5    30    5     100.00%   16.67%
N = 30

Variance:
s^2 = ΣF(X - x̄)^2 / N
= 48
Standard Deviation
s = sqrt(48)
= 6.93
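The grouped variance and standard deviation can be recomputed from the table. This sketch assumes the class frequencies are RF × N (6, 8, 5, 6, 5) and uses the population form of the variance (dividing by N), which is the form that matches the lecture's result. Unrounded, the variance is about 47.89 and the standard deviation about 6.92; the notes round the variance to 48 first, which gives 6.93.

```python
import math

# (lower limit, upper limit, frequency); frequencies assumed to be RF x N.
classes = [(6, 10, 6), (11, 15, 8), (16, 20, 5), (21, 25, 6), (26, 30, 5)]
n = sum(f for _, _, f in classes)

# Grouped mean from the class marks.
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Population variance of the grouped data: frequency-weighted squared deviations over N.
variance = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes) / n
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
```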


Probability and Statistics (Lecture 5)


Measure of Variation
1. Interquartile Range (IQR) = 75th percentile - 25th percentile
2. Semi-Interquartile Range (SIQR) = IQR / 2
Example:
 6   11   14   20   27
     11   15   21   27
     11   16   21   28
     12   17   22   29
     13   18   23   30
10   13   19   24   30

IQR = 23.5 - 11 = 12.5


SIQR = 12.5 / 2 = 6.25
Measure of Skewness
*symmetry of the distribution about its measures of central tendency
*measured along the horizontal (x) axis
*If positive skewness, Mode < Median < Mean
*If negative skewness, Mode > Median > Mean
*If skewness is equal to 0, Mode = Median = Mean

Measure of Kurtosis
*Measure of peakedness, or how tall the graph is.
*Measured along the vertical (y) axis.

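Skewness
Ungrouped Data and Grouped Data: Sk = 3(x̄ - Median) / s
Kurtosis
Ungrouped Data: K = Σ(x - x̄)^4 / (N * s^4) - 3
Grouped Data: K = ΣF(X - x̄)^4 / (N * s^4) - 3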

Example 1 (Ungrouped Data):


x̄ = 17.33
Median = 16.5
s = 7.38
 6   11   14   20   27
     11   15   21   27
     11   16   21   28
     12   17   22   29
     13   18   23   30
10   13   19   24   30

Skewness:
Sk = 3(x̄ - Median) / s = 3(17.33 - 16.5) / 7.38
= 0.34
*positively skewed
Kurtosis:
K = Σ(x - x̄)^4 / (N * s^4) - 3
= 1.82 - 3
= -1.18
Recall:
x̄ = 17.33
Median = 16.5
s = 6.93
Class      F    X     RF       TCB            <CF   >CF   <RCF      >RCF
6 - 10     6    8     20.00%   5.5 - 10.5     6     30    20.00%    100.00%
11 - 15    8    13    26.67%   10.5 - 15.5    14    24    46.67%    80.00%
16 - 20    5    18    16.67%   15.5 - 20.5    19    16    63.33%    53.33%
21 - 25    6    23    20.00%   20.5 - 25.5    25    11    83.33%    36.67%
26 - 30    5    28    16.67%   25.5 - 30.5    30    5     100.00%   16.67%
N = 30
Skewness:
Sk = 3(x̄ - Median) / s = 3(17.33 - 16.5) / 6.93
= 0.36
*Positively skewed
Kurtosis:
K = ΣF(X - x̄)^4 / (N * s^4) - 3
= 1.72 - 3
= -1.28
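The grouped skewness and kurtosis can be recomputed as follows. This sketch assumes Pearson's median-based skewness, 3(mean - median) / s, and the fourth-moment kurtosis minus 3, with class frequencies taken as RF × N; both formulas are assumptions that happen to reproduce the lecture's figures. Without intermediate rounding the kurtosis comes out near -1.27; the notes' -1.28 follows from using the rounded s = 6.93.

```python
import math

# (lower limit, upper limit, frequency); frequencies assumed to be RF x N.
classes = [(6, 10, 6), (11, 15, 8), (16, 20, 5), (21, 25, 6), (26, 30, 5)]
n = sum(f for _, _, f in classes)
median = 16.5  # grouped median taken from the notes

marks = [((lo + hi) / 2, f) for lo, hi, f in classes]  # (class mark, frequency)
mean = sum(f * x for x, f in marks) / n
std_dev = math.sqrt(sum(f * (x - mean) ** 2 for x, f in marks) / n)

# Pearson's second coefficient of skewness.
skewness = 3 * (mean - median) / std_dev

# Fourth standardized moment minus 3 (excess kurtosis).
kurtosis = sum(f * (x - mean) ** 4 for x, f in marks) / (n * std_dev ** 4) - 3

print(round(skewness, 2), round(kurtosis, 2))
```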


M - Probability Distribution
Probability Distribution:
A continuous probability distribution describes a random variable that can assume an uncountably infinite
number of possible values. Say we have a function f(x) from which probability estimates about x
are made; then the function is called the probability density function of x: pdf(x).
1. Normal Probability Distribution
f(x) = [1 / (σ * sqrt(2π))] * e^(-(x - μ)^2 / (2σ^2))
where -∞ < x < ∞
2. Standard Normal Distribution is a normal distribution with mean 0 and standard deviation of 1
* z ~ N(0,1)
*normal distribution can be standardized by
z = (x - μ) / σ
Example:
*If a person scored a 70 in a test with mean of 50 and standard deviation of 10, converting it to z
will be?
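The example can be worked with the standardization z = (x - mean) / standard deviation. A minimal sketch; the function name z_score is illustrative.

```python
def z_score(x, mean, std_dev):
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

# A score of 70 on a test with mean 50 and standard deviation 10.
z = z_score(70, 50, 10)
print(z)
```

The score lies two standard deviations above the mean, so z = 2.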
Areas of the Normal Curve:
1. P (0 < Z < Z1) = A(Z1)
2. P (-Z1 < Z <0) = A(-Z1)
3. P (Z1 < Z < Z2) = A (Z2) - A(Z1)
4. P (-Z1 < Z < -Z2) = A (-Z1) - A(-Z2)
5. P (-Z1 < Z < Z2) = A (-Z1) + A(Z2)
6. P (Z1 below) = 0.5 + A(Z1) , P (Z1 above) = 0.5 - A(Z1)
7. P (-Z1 below) = 0.5 - A(Z1), P (-Z1 above) = 0.5 + A (Z1)
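The area rules above can be evaluated without a printed table. This is a sketch using the error function from Python's math module, where A(z) is the area between 0 and z; the values z1 = 1 and z2 = 2 are hypothetical.

```python
import math

def area(z):
    """A(z): area under the standard normal curve between 0 and z (same for +z and -z)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z1, z2 = 1.0, 2.0  # hypothetical z values

p_0_to_z1 = area(z1)                     # rule 1: P(0 < Z < z1)
p_z1_to_z2 = area(z2) - area(z1)         # rule 3: P(z1 < Z < z2)
p_neg_z1_to_z2 = area(-z1) + area(z2)    # rule 5: P(-z1 < Z < z2)
p_below_z1 = 0.5 + area(z1)              # rule 6: P(Z below z1)
p_above_z1 = 0.5 - area(z1)              # rule 6: P(Z above z1)

print(round(p_0_to_z1, 4), round(p_z1_to_z2, 4), round(p_neg_z1_to_z2, 4))
```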


M - Normal Table


M - Permutation and Combination


M - Probability, Experiments, Random Experiments


Probability - is synonymous to chance. The probability of an event occurring is a measure of how likely an event will occur.
Experiment - is a process designed to discover, test or illustrate a truth, principle, or effect.
Random experiment - a process for gathering data. It can be repeated under basically the same conditions, leading to well-defined outcomes.
*Well-defined outcomes = no doubts about the results.
Examples of Random Experiments:
1. Tossing a coin
2. Throwing a pair of dice.
3. Observing the number of students who secure dropping forms per semester.
4. Recording the time it takes to enroll under BSE Program.
5. Number of commercial breaks in a TV program per show.
Sample Space - is the set of all possible outcomes of a random experiment usually denoted by the letter S.
*It is the total possible outcomes.
*In the Venn diagram, S is the universe.
Example:
1. In the experiment of tossing a coin, the sample space is S = {H,T}.
2. In throwing a die, S = {1,2,3,4,5,6}.
Event - is a subset of the sample space, denoted by E or any letter in the alphabet except S.
*In the Venn diagram, E is one of the circles.
Examples:
1. In tossing three fair coins,
S = {HHH,HHT,HTH,HTT,THH,THT,TTT,TTH}, with 2^n = 2^3 = 8 possibilities.
E = event of getting at least two heads
= {HHH,HHT,HTH,THH} = 4 possibilities
2. In throwing a pair of dice
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),...,(6,6)} = 6*6 = 36 possibilities
E = event of getting a sum of 5
= {(1,4),(2,3),(3,2),(4,1)} = 4 possibilities
Operations on Events
1. Union of Two Events - Combine
2. Intersection of Two Events - Common Components
3. Complement of an Event = (S - E) or E' (All S components that are not in E)
4. Mutually Exclusive Events - if (E1 ^ E2) = null or empty, then E1 and E2 are mutually exclusive.
5. Independent Event = Event 1 doesn't affect event 2.
Example: In the random experiment of tossing three coins the sample space,
S = {HTT, HHT, HTH,HHH,THH,THT,TTH,TTT}
E1 = {HHT,HTH,THH}
E2 = {HTT,HHT,HTH,THH,TTH,TTT}
E3 = {HHH,THT,HTT}
Then, E1 U E2 = {HHT,HTH,THH,HTT,TTH,TTT}
E2 ^ E3 = {HTT}
E2' = {HHH,THT}
E1 ^ E3 = Null / Empty, hence they are mutually exclusive events.
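The operations on events map directly onto Python's set operations. A minimal sketch using the three-coin sample space and the events E1, E2, and E3 from the example; the variable names are illustrative.

```python
from itertools import product

# Sample space for tossing three coins: all strings of H/T of length 3.
S = {"".join(t) for t in product("HT", repeat=3)}

E1 = {"HHT", "HTH", "THH"}
E2 = {"HTT", "HHT", "HTH", "THH", "TTH", "TTT"}
E3 = {"HHH", "THT", "HTT"}

union = E1 | E2                        # E1 U E2: combine
intersection = E2 & E3                 # E2 ^ E3: common components
complement = S - E2                    # E2': everything in S that is not in E2
mutually_exclusive = E1.isdisjoint(E3) # True when E1 ^ E3 is empty

print(sorted(union), intersection, sorted(complement), mutually_exclusive)
```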
Approaches to Probability
1. A Priori Approach
*You have knowledge beforehand.
*P(E) = n(E) / n(S) = (number of outcomes in E) / (number of outcomes in S)
Example 1:
S = {HHH,HHT,HTH,HTT,THH,THT,TTT,TTH} = 8 possibilities
E = event of getting at least two heads
= {HHH,HHT,HTH,THH} = 4 possibilities
P(E) = 4/8 = 1/2
Example 2:
S = {HTT, HHT, HTH,HHH,THH,THT,TTH,TTT} --> 8
E1 U E2 = {HHT,HTH,THH,HTT,TTH,TTT} --> 6
P(E1 U E2) = 6/8 = 3/4
Example 3:
E2 ^ E3 = {HTT}
P (E2 ^ E3) = 1/8
Example 4:
E2' = {HHH,THT}
P (E2') = 2/8 = 1/4
Example 5:
E1 ^ E3 = Null / Empty
P(E1 ^ E3) = 0
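The a priori probabilities in these examples are counts of outcomes divided by the size of the sample space. A minimal sketch using exact fractions; the helper name prob is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing three fair coins.
S = ["".join(t) for t in product("HT", repeat=3)]

def prob(event):
    """A priori probability: outcomes in the event over outcomes in S."""
    return Fraction(len(event), len(S))

at_least_two_heads = [o for o in S if o.count("H") >= 2]
E2_complement = ["HHH", "THT"]
empty_event = []

print(prob(at_least_two_heads), prob(E2_complement), prob(empty_event))
```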
2. A Posteriori Approach
*Law of large numbers
*How many times was the experiment performed?
Out of 100 experiments:
Outcome:     HTT   HHT   HTH   HHH   THH   THT   TTH   TTT
Frequency:   10    13    12    17    18    12    10    8
P(E) of at least two heads, E = {HHH,HHT,HTH,THH}
= (17+13+12+18) / 100
= 60/100 = 3/5
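The law of large numbers behind the a posteriori approach can be illustrated by simulation: as the number of repetitions grows, the relative frequency of an event settles near its a priori probability. This sketch simulates tossing three fair coins and tracks the event "at least two heads" (a priori probability 1/2); the trial counts and the seed are arbitrary choices.

```python
import random

def relative_frequency(trials, seed=0):
    """Fraction of simulated three-coin tosses that show at least two heads."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(3))
        if heads >= 2:
            hits += 1
    return hits / trials

for trials in (100, 10_000, 100_000):
    print(trials, relative_frequency(trials))
```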
3. A Subjective Approach (the probability is based on a person's own judgment, not on relative frequency)
*Pacquiao vs. Mayweather. (Pacquiao will win, according to X.)
Operations on Probability:
1. Addition Rule: (U) (Union) (Or)


*P[E1 U E2] = P[E1] + P[E2] - P[E1 ^ E2]
*If they are mutually exclusive events: P[E1 U E2] = P[E1] + P[E2]

2. Multiplication Rule: (^) (Intersection) (And) (But)


*P[E1 ^ E2] = P[E1 | E2] X P[E2]
*If E1 and E2 are independent events, P[E1 | E2] = P[E1], so
* P[E1 ^ E2] = P[E1] X P[E2]
3. The probability of the complement of E is
*P(E') = 1 - P(E)
Example: Throwing two dice, what is the probability that the sum is not equal to 6.
Answer: 31/36
4. Conditional Probability
*Probability that A will happen given that B has occurred.
*P[A | B] = P[A ^ B] / P[B]
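The rules above can be checked by enumerating the 36 outcomes of two dice. A minimal sketch: event A (the sum is 6) comes from the complement example, while event B (the first die shows 1) is a hypothetical event added to illustrate the addition rule and conditional probability.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing two dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

A = [o for o in S if sum(o) == 6]   # the sum is 6
B = [o for o in S if o[0] == 1]     # hypothetical event: first die shows 1
A_and_B = [o for o in A if o in B]

p_not_A = 1 - prob(A)                         # complement rule
p_A_or_B = prob(A) + prob(B) - prob(A_and_B)  # addition rule
p_A_given_B = prob(A_and_B) / prob(B)         # conditional probability

print(p_not_A, p_A_or_B, p_A_given_B)
```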


M - More Examples


M - Random Variables and Probability Distributions


Random Variable - a rule or function defined over a sample space, denoted by any capital letter in the English alphabet.
It assigns a real number to every outcome of the sample space.
Two Types
1. Discrete Random Variable - a random variable that can assume a finite or a countable number of values. (e.g. number of
heads in tossing of 2 coins, number of car owners)
2. Continuous Random Variable - a random variable that can assume an interval or continuum of values. (e.g. height of a TIP
student, weight of a newborn baby)
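As an illustration of a discrete random variable, the sketch below defines X = number of heads in tossing 2 coins and tabulates its probability distribution. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# The random variable X assigns a number (count of heads) to every outcome.
counts = Counter(o.count("H") for o in outcomes)

# Probability distribution of X: P(X = x) for each possible value x.
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```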


Chapter 15
Wednesday, February 11, 2015

4:34 PM


F - Chi Square Test


Chi-Squared Test
* to test how likely it is that an observed distribution is due to chance.
* Goodness of fit statistic or test of independence
*significant relationship between two variables
When to use:
1. Random sampling method is used.
2. Each population is at least 10 times as large as its respective sample.
3. Variables under study are categorical.
4. The expected frequency count for each cell of the table is at least 5.
Steps:
1. State null (H0) and alternative hypotheses (HA).
*H0 (Variable A and B are independent) and HA (Variable A and B are dependent)
2. Degrees of Freedom, Expected and Observed Frequencies, Chi-Squared
*Degrees of Freedom DF = (r-1) * (c-1)
*Expected Frequency Er,c = (nr * nc ) / n (nr and nc are the row and column totals and n is the overall total)
*Observed Frequency Or,c = based on the table
*Test Statistic X^2 = Σ (O - E)^2 / E
3. Level of Significance (α = 0.05)
4. Decision Rule: If significance level > P - value, reject H0. Otherwise, fail to reject H0. Conclusion.


F - Analysis of Variance (ANOVA)


ANOVA - statistical comparison of at least two population
One Way of Analysis of Variance - technique used to compare the means of three or more samples

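Formula:
n = total population size
p = number of groups
Correction term = (Σx)^2 / n (Square of the sum divided by population)
Total SS = Σx^2 - (Σx)^2 / n
Between SS = Σ[(group total)^2 / (group size)] - (Σx)^2 / n
Within SS = Total SS - Between SS
Between MS = Between SS / (p - 1)
Within MS = Within SS / (n - p)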

x=


Analysis of Variance (ANOVA)


*tests all the samples simultaneously, in a single test
*is a technique designed to test whether or not more than two samples (a group) differ significantly from each other
*The t-test and the z-test are used to test the significance of the difference between a single pair of samples.
Problem: The data below represent the number of hours of pain relief provided by 3 different brands of headache tablets administered to
subjects. It shows the mean number of hours of relief provided by the tablets.
XA   XB   XC   XD

Step 1: Formulate the null hypothesis.


*Null Hypothesis Ho always shows that THERE IS NO SIGNIFICANT DIFFERENCE between the samples.
*Alternate Hypothesis Ha always shows that THERE IS A SIGNIFICANT DIFFERENCE between the samples.
HO: There is no significant difference in the number of hours of relief provided by the 3 different brands of headache tablets.
Step 2: Set the level of significance
*The level of significance is at default 0.05 unless otherwise stated in the problem!
α = 0.05
Step 3: Choose the appropriate test statistic
*F-test is normally employed since we are comparing variances.
Test Statistic: F-test (ANOVA)
Step 4: Compute the ANOVA
a. Compute for the TSS (Total Sum of the Squares)
TSS = Σx^2 - (Σx)^2 / N
*(Square every value and add the squares together) minus (add all the values, square the sum, and divide by the total population)
XA   XB   XC   XD   XA^2   XB^2   XC^2   XD^2
25   81   16   49
25   64   64   16

b. Determine the between-column sum of squares (SSB) defined by the formula:
SSB = Σ(column total)^2 / (number of rows) - (Σx)^2 / N
*(Summation of squared totals divided by number of rows) - (Sum of the totals squared over total population)
XA   XB   XC   XD

c. Compute the within-column variance or within-column sum of squares defined by,
SSW = TSS - SSB
d. Construct an analysis of variance table (ANOVA table) as shown below:


ANOVA table on the three samples subjected to different tablets
Source of Variation   Sum of Squares   Df   MSS = SS / Df
Between-column        48.67            3    16.22
Within-column         17.33            8    2.17
Total                 66.00            11

*between-column df = columns (k) - 1 = 4 - 1 = 3
*total df = (rows * columns) - 1 = 12 - 1 = 11
*within-column df = total df - between-column df = 11 - 3 = 8
*total = SSB + SSW = 48.67 + 17.33 = 66.00
*between column MSS = SSB / DF = 48.67 / 3 = 16.22
*within-column MSS = SSW / DF = 17.33 / 8 = 2.17
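e. Compute the F-test (Fisher) formula:
*F = between-column MSS / within-column MSS = 16.22 / 2.17 = 7.47
f. Locate the tabular value of F.
*To do this, use the coordinates (between-column df, within-column df) = (3, 8) at the 0.05 level of significance.
FTV = 4.07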
g. Decision Rules:
1. If the computed F is greater than the tabular F, reject Ho.
2. If the computed F is less than the tabular F, accept Ho.
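The ANOVA computation can be sketched as a short function. Because the raw tablet data are not legible in these notes, the groups below are hypothetical; the function follows the sum-of-squares formulas described in words above (total, between, and within), and the last lines recompute F from the mean squares in the ANOVA table and compare it with the tabular value 4.07.

```python
def one_way_anova(groups):
    """Return (total SS, between SS, within SS, F) for a list of groups."""
    values = [v for g in groups for v in g]
    n, p = len(values), len(groups)
    correction = sum(values) ** 2 / n                 # square of the sum over n
    total_ss = sum(v ** 2 for v in values) - correction
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    within_ss = total_ss - between_ss
    between_ms = between_ss / (p - 1)
    within_ms = within_ss / (n - p)
    return total_ss, between_ss, within_ss, between_ms / within_ms

# Hypothetical data: three groups of three observations each.
hypothetical = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
total_ss, between_ss, within_ss, f_value = one_way_anova(hypothetical)
print(total_ss, between_ss, within_ss, f_value)

# F from the mean squares reported in the ANOVA table of the notes.
f_notes = 16.22 / 2.17
reject_h0 = f_notes > 4.07   # compare with the tabular F value
print(round(f_notes, 2), reject_h0)
```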


Chi-Square Test
Chi-Squared Test
* to test how likely it is that an observed distribution is due to chance.
* Goodness of fit statistic or test of independence
*significant relationship between two variables
When to use:
1. Random sampling method is used.
2. Each population is at least 10 times as large as its respective sample.
3. Variables under study are categorical.
4. The expected frequency count for each cell of the table is at least 5.
Test the hypothesis that educational attainment does not depend on socio-economic status for the
following 100 persons in a particular community.
               Finished College   Did not finish College   Total
Poor           18                 10                       28
Middle Class   28                 24                       52
Rich           14                 6                        20
Total          60                 40                       100

Step 1: State null (H0).


*H0 (Variable A and B are independent)
Null Hypothesis (HO) = Socio-Economic Status is independent from Educational Attainment
Step 2: Degrees of Freedom, Expected and Observed Frequencies, Chi-Squared
a. Degrees of Freedom DF = (r-1) * (c-1)
*Rows (not including total) = 3
*Column (not including total) = 2
DF = (r-1)*(c-1) = (3-1)*(2-1) = 2
b. FE (Expected Frequency) = (row total * column total) / overall total
c. FO (Observed Frequency) = based on the table
               Finished College       Did not finish College   Total
Poor           FO = 18 / FE = 16.8    FO = 10 / FE = 11.2      28
Middle Class   FO = 28 / FE = 31.2    FO = 24 / FE = 20.8      52
Rich           FO = 14 / FE = 12      FO = 6 / FE = 8          20
Total          60                     40                       100
d. Test Statistic X^2 = Σ (O - E)^2 / E

Step 3. Level of Significance


α = 0.05
Step 4. Get the tabulated value.
*To get it, use the coordinates, (level of significance, df)
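The whole test can be carried through in a few lines. This sketch uses the observed counts from the table (with 6 for Rich / did not finish college, as in "FO = 6"), computes each expected frequency as row total × column total / overall total, and compares the statistic with 5.991, the standard tabulated chi-square value for α = 0.05 and df = 2. The conclusion printed here is computed by the sketch and is not stated in the notes.

```python
# Observed frequencies: rows = Poor, Middle Class, Rich;
# columns = finished college, did not finish college.
observed = [[18, 10], [28, 24], [14, 6]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / overall total.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 5.991  # tabulated value for alpha = 0.05, df = 2
reject_h0 = chi2 > critical

print(expected)
print(round(chi2, 3), df, reject_h0)
```

Under these assumptions the statistic is about 1.868, which is below the tabulated value, so the null hypothesis of independence is not rejected.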


Z - Test for two population samples


Comparing two sample means:
z = (x̄1 - x̄2) / sqrt(s1^2 / n1 + s2^2 / n2)
Problem: In a study of abstract reasoning, a sample group of male and female students scored as shown below:
Gender   Sample Size   Mean    Standard Deviation
Male     95            29.25   10.83
Female   85            30.72   8.72

Step 1: Get the Null Hypothesis (HO)


*The two samples are normally independent.
HO = There is no significant difference between sample 1 and sample 2.
Step 2: Get the level of significance
α = 0.10
Step 3: Use appropriate statistic
Use Z - test since each sample size is greater than 30.
Test Statistic: Z-test
Step 4: Get tabulated value
*To get tabulated value, use this coordinate (significance level, two tailed test)
TV = 1.645
Step 5: Compute for the Z-Value

Step 6: Compare Calculated Value and Tabulated Value


CV = -1.00
TV = 1.645
*Decision Rule: if |CV| > TV, reject Ho. If |CV| < TV, accept Ho.
Since |-1.00| = 1.00 < 1.645, accept Ho.
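The computed value can be reproduced with the usual two-sample z statistic. This sketch assumes the formula z = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), which is not legible in the notes but gives roughly -1.01, in line with the reported CV of -1.00; the function name is illustrative.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two independent sample means."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Male: n = 95, mean = 29.25, sd = 10.83; Female: n = 85, mean = 30.72, sd = 8.72.
z = two_sample_z(29.25, 10.83, 95, 30.72, 8.72, 85)

tabulated = 1.645               # two-tailed test at the 0.10 level of significance
reject_h0 = abs(z) > tabulated  # compare the magnitude of z with the tabulated value

print(round(z, 3), reject_h0)
```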
