Statistics

Probability & Statistics
Prepared by: Mr. Dishank Upahyay, ME CAD/CAM (Batch 2018-20), LDCE, Ahmedabad
Presented to: Mr. S. S. Pathan, Associate Professor - Mechanical Department, LDCE, Ahmedabad
Outline
1. Point Estimation of Parameters
2. Confidence Interval
3. Testing Hypothesis & Decision
4. Goodness of Fit, Chi-Square Test
5. Non-Parametric Test
6. Linear Regression Analysis
7. Correlation
1. Point Estimation of Parameters
• What?
• Decision?
• Why?
• To make estimates about the population
• Where?
• Everywhere a decision has to be made
• Who?
• Managers
• When?
• On demand
• How?
• By estimation and defining the interval
• Types
• Point Estimation
• Interval Estimation
• Point Estimation
• A single value used to estimate a population parameter
• Interval Estimation
• A range of values within which the population parameter is expected to lie
• Also called a confidence interval
2. Confidence Interval
• It can be constructed using 2 types of statistics
• By z statistics (for larger sample sizes)
• By t statistics (for smaller sample sizes)
z Statistics | t Statistics
For n > 30 | For n < 30
Uses the normal distribution curve with values of z | Uses the t distribution (t transformation) with degrees of freedom
z Statistics | t Statistics
Interval equation: $\bar{x} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$ | Interval equation: $\bar{x} \pm t_{\alpha/2,\, n-1} \, s / \sqrt{n}$
Values are estimated within the confidence level | As the sample size increases, the t values match the standard normal curve
Here $\bar{x}$ = sample mean, σ = population standard deviation, s = sample standard deviation, n = sample size.
• Example (z statistics):
• A researcher has taken a random sample of size 70 from a population
with a sample mean of 35 and a population standard deviation of
4.62. Construct a 90% confidence interval to estimate the population
mean.
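The z example can be checked numerically. The following is a minimal sketch, assuming the usual z interval (sample mean plus or minus z times σ/√n); the variable names are illustrative and not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example.
n = 70            # sample size
x_bar = 35.0      # sample mean
sigma = 4.62      # population standard deviation
confidence = 0.90

# Two-sided critical value z_{alpha/2}; about 1.645 for 90% confidence.
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Margin of error and interval limits.
margin = z_crit * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z_crit:.3f}, margin of error = {margin:.3f}")
print(f"90% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval comes out at roughly 34.09 to 35.91.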
• Example (t statistics):
• The personnel department of an organization wants to apply cost-cutting measures for improving
efficiency. As the first step, the personnel department wants to curtail telephone expenses
incurred by employees. For this, the personnel department took a random sample of 10 employees
and gathered the following data on telephone expenses (in thousands) in the previous year:
• 10, 12, 24, 23, 11, 14, 15, 34, 16, 23
• Construct a 95% confidence interval to estimate the average telephone expenses of the employees
in the population.
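The t example can be worked the same way. This is a sketch only: it assumes the usual t interval (sample mean plus or minus t times s/√n) and takes the critical value 2.262 from a standard t table (95% confidence, 9 degrees of freedom) rather than computing it.

```python
from math import sqrt
from statistics import mean, stdev

# Telephone expenses (in thousands) from the example.
expenses = [10, 12, 24, 23, 11, 14, 15, 34, 16, 23]

n = len(expenses)
x_bar = mean(expenses)   # sample mean
s = stdev(expenses)      # sample standard deviation (n - 1 in the denominator)

# Assumption: critical value read from a t table for alpha/2 = 0.025
# and n - 1 = 9 degrees of freedom.
t_crit = 2.262

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.2f}, s = {s:.4f}, margin of error = {margin:.3f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

With these inputs the sample mean is 18.2 and the interval is roughly 12.76 to 23.64 (in thousands).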
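• In Excel:
• CONFIDENCE.NORM
• By entering
• Alpha
• Std. Dev.
• Sample Size
• It returns the margin of error, which is added to and subtracted from the sample mean to obtain the confidence interval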
3. Testing Hypothesis & Decision
• A hypothesis is an assumption about an unknown population parameter
• The process helps us decide whether to accept or reject the hypothesis
• Process:
Step 1: Set null and alternative hypothesis
Step 2: Determine the appropriate statistical test
Step 3: Set the level of significance
Step 4: Set the decision rule
Step 5: Collect the sample data
Step 6: Analyse the data
Step 7: Arrive at a statistical conclusion and business implication

Step 1: Set null and alternative hypothesis
• Null hypothesis: H0 (H sub-zero)
• It is tested for possible rejection under the assumption that it is true.
• Theoretically, a null hypothesis is set as no difference or status quo and is considered true until and unless it is proved wrong by the collected sample data.
• H0: µ = µ0
• µ = population mean
• µ0 = hypothesized value of the population mean
• Alternative hypothesis (H1)
• Logical opposite of H0
• H1: µ ≠ µ0
• or µ > µ0
• or µ < µ0
Step 2: Determine the appropriate statistical test
• To obtain the best result, the appropriate test should be carried out.
Step 3: Set the level of significance
• α (alpha)
• It is the probability of rejecting a null hypothesis even though it is true.
• It is also known as the size of the rejection region or the size of the critical region.
Step 4: Set the decision rule
• The area under the normal curve is divided into 2 mutually exclusive regions. These regions are termed the acceptance region and the rejection region (or critical region).
• Rejection region: H0 is rejected
• Acceptance region: H0 is accepted
• The researcher has to decide the critical value, which separates the rejection and acceptance regions.
Step 5: Collect the sample data
• Sample data are to be collected
Step 6: Analyse the data
• The researcher has to compute the test statistic.
• Methods to compute the statistic are
o z test
o t test
o F test
o χ² test
Step 7: Arrive at a statistical conclusion and business implication
• A statistical conclusion can be drawn and a decision can be taken
• Test of Hypothesis
• Two tailed
• Rejection in both tails
• One tailed
• Rejection in one tail
• z Statistics:
• Testing for large samples (n ≥ 30) is based on the assumption that the population from which the sample
is drawn has a normal distribution.
• z formula for a single population mean: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
• µ = population mean
• σ = population standard deviation
• n = sample size
• $\bar{x}$ = sample mean
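The difference between the two-tailed and one-tailed rejection regions can be made concrete with a short sketch. The level of significance of 0.05 and the helper name `reject_h0` are assumptions chosen for illustration; they do not come from the slides.

```python
from statistics import NormalDist

alpha = 0.05  # assumed level of significance, for illustration only
std_normal = NormalDist()

# Two-tailed test: alpha is split equally between both tails.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96
# One-tailed test: all of alpha sits in a single tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)       # about 1.645


def reject_h0(z: float, tail: str = "two") -> bool:
    """Return True if z falls in the rejection region for the chosen tail."""
    if tail == "two":
        return abs(z) > z_two_tailed
    if tail == "right":
        return z > z_one_tailed
    if tail == "left":
        return z < -z_one_tailed
    raise ValueError("tail must be 'two', 'right' or 'left'")


print(f"two-tailed critical value: +/-{z_two_tailed:.3f}")
print(f"one-tailed critical value: {z_one_tailed:.3f}")
```

A computed z of 1.8, for instance, falls in the acceptance region of the two-tailed test but in the rejection region of a right-tailed test at the same α.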
• Example
• A marketing research firm conducted a survey 10 years ago and found that the average household income of a particular
geographic region is ₹10,000. Mr. Gupta, who has recently joined the firm as vice president, has expressed doubt about the
accuracy of the data. To verify the data, the firm has decided to take a random sample of 200 households, which yields a
sample mean (for household income) of ₹11,000. Assume that the population standard deviation of the household
income is ₹1,200. Verify Mr. Gupta’s doubts using the seven steps of hypothesis testing. Let α = 0.05.
Step 1: Set null and alternative hypothesis
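The remaining steps of this example can be sketched in a few lines. The sketch assumes a two-tailed test (H0: µ = 10,000 against H1: µ ≠ 10,000), which is one reasonable reading of the stated doubt about accuracy, and it uses the z formula for a single population mean.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example (household income in rupees).
mu_0 = 10_000     # hypothesized population mean under H0
x_bar = 11_000    # sample mean
sigma = 1_200     # population standard deviation
n = 200           # sample size
alpha = 0.05      # level of significance

# Assumed hypotheses: H0: mu = 10,000 against H1: mu != 10,000 (two-tailed).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Test statistic: z = (x_bar - mu) / (sigma / sqrt(n)).
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Decision rule: reject H0 when |z| exceeds the critical value.
reject_h0 = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = +/-{z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is about 11.79, far beyond ±1.96, so the null hypothesis is rejected and the sample supports Mr. Gupta's doubt.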
• t Statistics
• $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ with n − 1 degrees of freedom, where s = sample standard deviation
• For a small random sample (n < 30), when the population standard deviation is unknown and the population is
normally distributed, the t-test can be applied to test the population mean µ.
• Example:
• Royal tyres has launched a new brand of tyres for tractors and claims that under normal circumstances the average life of a
tyre is 40,000 km. A retailer wants to test this claim and has taken a random sample of 8 tyres. He tests the life of the tyres
under normal circumstances. The results obtained are presented in the table below.
Tyres 1 2 3 4 5 6 7 8
km 35,000 38,000 42,000 41,000 39,000 41,500 43,000 38,500
• Solution
Step 1: Set Null and alternative hypothesis
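The remaining steps for the tyre example might be worked as in the sketch below. Two assumptions are made because the slide does not state them: the test is two-tailed (H0: µ = 40,000 against H1: µ ≠ 40,000) and α = 0.05, giving a t-table critical value of 2.365 for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Tyre life in km for the 8 sampled tyres.
tyre_life_km = [35_000, 38_000, 42_000, 41_000, 39_000, 41_500, 43_000, 38_500]
mu_0 = 40_000  # claimed average life under H0

n = len(tyre_life_km)
x_bar = mean(tyre_life_km)   # sample mean
s = stdev(tyre_life_km)      # sample standard deviation (n - 1 denominator)

# Test statistic: t = (x_bar - mu) / (s / sqrt(n)) with n - 1 degrees of freedom.
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# Assumption: alpha = 0.05, two-tailed, df = 7, value read from a t table.
t_crit = 2.365
reject_h0 = abs(t_stat) > t_crit

print(f"mean = {x_bar}, s = {s:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these assumptions the sample mean is 39,750 km and t is about −0.27, well inside the acceptance region, so the claim of 40,000 km is not rejected.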

4. Goodness of Fit / χ² Test (Chi-Square Test)
• A statistical test for comparing a random sample with a theoretical probability distribution.
• χ² is a continuous probability distribution with range 0 to ∞.
• The statistic can be defined as follows:
• $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$
• f_o = observed frequency
• f_e = expected frequency
• [Figure: χ² distribution curve showing the acceptance region (accept H0) and the rejection region (reject H0)]
• The calculated χ² value is compared with the critical value of χ².
• To decide:
• If χ²calculated > χ²critical, reject the null hypothesis; otherwise do not reject the null hypothesis.
• The χ² test provides a platform that can be used to ascertain whether a theoretical
probability distribution coincides with the empirical sample distribution.
• Example:
• A company is concerned about the increasing violent altercations between its employees. The number of
violent incidents recorded by the management during six randomly selected months is given in the table. Use
α = 0.05 to determine whether the data fits a uniform distribution.
Months  Jan  Feb  March  April  May  June
Number of violent incidents  55  65  68  72  80  85
• Step 1: Set null and alternative hypothesis
o H0: The numbers of violent altercations are uniformly distributed over the months.
o H1: The numbers of violent altercations are not uniformly distributed over the months.
• Step 2: Determine the appropriate statistical test
o $\chi^2 = \sum (f_o - f_e)^2 / f_e$
o With df = k − 1 − c, where k = number of categories and c = number of parameters estimated from the sample (here k = 6 and c = 0, so df = 5)
• Step 3: Set the level of significance
o α = 0.05
• Step 4: Set the decision rule
o For the given significance level of 0.05, the rule for acceptance or rejection of the null hypothesis is:
o Reject H0 if χ²calculated > χ²critical; otherwise do not reject H0
o The critical value is χ²(0.05, 5) = 11.07, where degrees of freedom = 5 and α = 0.05
• Steps 5-6: Collect the sample data & analyse the data
• Expected frequency: fe = Sum(fo) / Number of months
• Expected frequency: fe = 425 / 6 = 70.8333
Months  fo  fe  (fo − fe)²/fe
Jan  55  70.8333  3.5392
Feb  65  70.8333  0.4804
March  68  70.8333  0.1133
April  72  70.8333  0.0192
May  80  70.8333  1.1863
June  85  70.8333  2.8333
Total  425  425  8.1717
• Step 7: Arrive at the statistical conclusion
• The calculated value is 8.1717, which is less than the tabular value of 11.07. The
result therefore falls in the acceptance region: the null hypothesis is accepted (not rejected)
and the alternative hypothesis is rejected.
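The χ² calculation for the violent-incidents example can be reproduced with a short sketch. It uses the critical value 11.07 quoted in the decision rule (α = 0.05, 5 degrees of freedom) instead of computing it; the variable names are illustrative.

```python
# Observed number of violent incidents, January to June.
observed = [55, 65, 68, 72, 80, 85]

k = len(observed)
# Under a uniform distribution every month has the same expected frequency.
expected = sum(observed) / k

# Chi-square statistic: sum of (fo - fe)^2 / fe over all months.
chi_square = sum((fo - expected) ** 2 / expected for fo in observed)

df = k - 1          # no parameters estimated from the sample, so c = 0
chi_crit = 11.07    # tabulated critical value for alpha = 0.05, df = 5

reject_h0 = chi_square > chi_crit

print(f"fe = {expected:.4f}, chi-square = {chi_square:.4f}, df = {df}")
print(f"reject H0: {reject_h0}")
```

The statistic comes out at about 8.1718, below 11.07, so the hypothesis of a uniform distribution is not rejected.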
• Solution using MS Excel:
• Functions used:
• To get the probability: Formula > Functions > Statistics > CHISQ.TEST
• To get the final result: Formula > Functions > Statistics > CHISQ.INV.RT
Months  fo  fe  χ²
Jan  55  70.83333  3.539216
Feb  65  70.83333  0.480392
March  68  70.83333  0.113333
April  72  70.83333  0.019216
May  80  70.83333  1.186275
June  85  70.83333  2.833333
Total  425  425  8.171765
• Probability (CHISQ.TEST) = 0.14701995
• χ² (CHISQ.INV.RT) = 8.171765
5. Non-Parametric Test
• Distribution-free test
• Valid for any distribution
• Used in cases when the kind of distribution is unknown
• Tests to be discussed here:
• Sign Test for Median
• Test of Arbitrary Trend
• Sign Test for Median
• A median of the population is a solution x = µ’ of the equation F(x) = 0.5, where F is the distribution function of the population.
• Steps:
• 1. Tests one population median (µ’)
• 2. Corresponds to the t-test for one mean
• 3. Assumes the population is continuous
• 4. Small-sample test statistic: number of sample values above (or below) the median
• 5. Can use the normal approximation if n ≥ 10
• Example:
• Suppose that eight radio operators were tested, first in rooms without air-conditioning and then in
air-conditioned rooms over the same period of time, and the differences of errors (unconditioned
minus conditioned) were
9 4 0 6 4 0 7 11
• Test the hypothesis µ’ = 0 (that is, air-conditioning has no effect) against the alternative µ’ > 0 (that is,
inferior performance in unconditioned rooms).
Solution:
• Here α = 5%
• Under the hypothesis, P+ = P−, so p = 0.5
• X = number of positive values among the n values
• The sample has 8 values; removing the zeros leaves 6 values, all of them positive
• $P(X = 6) = \binom{6}{6} (0.5)^6 (0.5)^0 = (1)(0.015625)(1) = 0.0156 = 1.56\% < 5\%$
• Therefore the hypothesis is rejected in favour of µ’ > 0.
• The number of errors made in unconditioned rooms is significantly higher, so the installation of air conditioning
should be considered.
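The sign test for the radio-operator data can be sketched as follows. The code drops the zero differences, counts the positive ones, and evaluates the upper-tail binomial probability with p = 0.5; the variable names are illustrative.

```python
from math import comb

# Differences of errors (unconditioned minus conditioned) from the example.
differences = [9, 4, 0, 6, 4, 0, 7, 11]
alpha = 0.05

# Zero differences carry no sign and are removed.
nonzero = [d for d in differences if d != 0]
n = len(nonzero)
positives = sum(1 for d in nonzero if d > 0)

# Under H0 the number of positive signs X is Binomial(n, 0.5).
# One-sided p-value: P(X >= observed number of positive signs).
p_value = sum(comb(n, k) for k in range(positives, n + 1)) * 0.5 ** n

reject_h0 = p_value < alpha

print(f"n = {n}, positives = {positives}, p-value = {p_value:.6f}")
print(f"reject H0: {reject_h0}")
```

Here all 6 non-zero differences are positive, so the p-value is (0.5)^6 = 0.015625, which is below 5%.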
• Test of Arbitrary Trend
• Example:
• A certain machine is used for cutting lengths of wire. Five successive pieces had the lengths
29 31 28 30 32
• Using this sample, test the hypothesis that there is no trend, that
is, the machine does not have the tendency to produce longer and
longer pieces or shorter and shorter pieces. Assume that the type
of machine suggests the alternative that there is a positive trend,
that is, there is the tendency of successive pieces to get longer.
Solution:
• We count the number of transpositions in the sample, that is, the number of times
a larger value precedes a smaller value:
• 29 precedes 28 (1 transposition)
• 31 precedes 28 & 30 (2 transpositions)
• Therefore there are 3 transpositions in total. We now consider the random variable
T = number of transpositions. If the hypothesis is true, then each of the 5! =
120 permutations has the same probability (1/120).
• Permutations of 1 2 3 4 5 grouped by number of transpositions:
T = 0: 12345
T = 1: 12354 12435 13245 21345
T = 2: 12453 12534 13254 13425 14235 21354 21435 23145 31245
T = 3: 12543 13452 13524 14253 14325 15234 21453 21534 23154 23415 24135 31254 31425 32145 41235
Etc.
• From this we obtain:
P(T ≤ 3) = 1/120 + 4/120 + 9/120 + 15/120 = 29/120 = 24%
• We accept the hypothesis because we have observed an event
that has a relatively large probability (certainly much more than
5%) if the hypothesis is true.
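The transposition count and the probability P(T ≤ 3) can be verified by brute force, since there are only 120 permutations of five values. This is a minimal sketch; the function name `count_transpositions` and α = 0.05 are illustrative choices.

```python
from itertools import permutations


def count_transpositions(values):
    """Count pairs where an earlier value is larger than a later value."""
    return sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[i] > values[j]
    )


# Lengths of the five successive pieces of wire.
lengths = [29, 31, 28, 30, 32]
observed_t = count_transpositions(lengths)

# Under the no-trend hypothesis all 5! orderings are equally likely.
all_perms = list(permutations(range(1, len(lengths) + 1)))
at_most_observed = sum(
    1 for p in all_perms if count_transpositions(p) <= observed_t
)
p_value = at_most_observed / len(all_perms)

alpha = 0.05
reject_h0 = p_value < alpha

print(f"T = {observed_t}, P(T <= {observed_t}) = {at_most_observed}/{len(all_perms)}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The enumeration gives 29 of the 120 permutations with at most 3 transpositions, matching 29/120 ≈ 24%.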
6. Simple Linear Regression Analysis (SLRA)
• Regression analysis is the process of developing a statistical model.
• It is used to predict the value of a dependent variable from at least one
independent variable.
• We consider the modelling between the dependent variable and one independent
variable. When there is only one independent variable in the linear regression
model, the model is generally termed the simple linear regression model.
• When there is more than one independent variable in the model, the
linear model is termed the multiple linear regression model.
• Determining the equation:
• SLRA is based on the slope-intercept equation of a line: y = ax + b
• b = y intercept of the line
• a = slope
• SLRA with respect to the population parameters β0 & β1 can be given as
• y = β0 + β1x
• β0 = population y intercept, which represents the average value of the dependent variable when x = 0
• β1 = slope of the regression line, which indicates the expected change in the value of y per unit
change in the value of x
• For an individual value of the dependent variable
• y_i = β0 + β1x_i + ε_i
• ε_i = random error
• The equation for the simple regression line is given as
• ŷ = b0 + b1x
• b0 = sample y intercept, which represents the average value of the dependent variable when x = 0
• b1 = slope of the sample regression line
• The least squares criterion is given by: minimize $\sum (y_i - \hat{y}_i)^2$
• Slope: b1 = SSxy / SSxx, where SSxy = Σxy − (Σx)(Σy)/n and SSxx = Σx² − (Σx)²/n
• Intercept: b0 = ȳ − b1x̄
• After b0 & b1 are determined, the researcher can plot the graph and compare it with the
original data.
• Regression model
• Sample layout: n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn)
• From the sample, b0, b1 & ŷ are computed to give the estimated regression equation
• The sample statistics b0 & b1 provide estimates of the population parameters β0 & β1
• Example
• A cable wire company has spent heavily on advertisements. The sales and advertisement expenses (in
thousand rupees) for 12 randomly selected months are given in the table. Develop a regression model to
predict the impact of advertisement on sales.
Months  Advertisement (in thousand rupees)  Sales (in thousand rupees)
January 92 930
February 94 900
March 97 1020
April 98 990
May 100 1100
June 102 1050
July 104 1150
August 105 1120
September 105 1130
October 107 1200
November 107 1250
December 110 1220
• Solution:
• Step 1
Months  Advertisement x (in thousand rupees)  Sales y (in thousand rupees)  x^2  xy
January  92  930  8464  85560
February  94  900  8836  84600
March  97  1020  9409  98940
April  98  990  9604  97020
May  100  1100  10000  110000
June  102  1050  10404  107100
July  104  1150  10816  119600
August  105  1120  11025  117600
September  105  1130  11025  118650
October  107  1200  11449  128400
November  107  1250  11449  133750
December  110  1220  12100  134200
Total  1221  13060  124581  1335420
• Step 2
• SSxy = Σxy − (Σx)(Σy)/n = 1335420 − (1221)(13060)/12 = 6565
• SSxx = Σx² − (Σx)²/n = 124581 − (1221)²/12 = 344.25
• b1 = SSxy / SSxx = 19.07044
• b0 = ȳ − b1x̄ = −852.084
• ŷ = −852.084 + 19.07x
• b1 indicates that for each unit increase in x, y is predicted to increase by 19.07 units.
• b0 indicates the predicted value of y when x = 0.
• When there is no expenditure on advertisement (x = 0), the model predicts sales of −852.08 thousand rupees.
Since x = 0 lies far outside the observed range of 92 to 110, this value should not be interpreted literally.
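The least squares estimates for the advertisement example can be reproduced from the column totals. This is a sketch using the slope formula b1 = SSxy / SSxx and the intercept b0 = ȳ − b1x̄; the function name `predict` is illustrative.

```python
# Advertisement (x) and sales (y), both in thousand rupees, January to December.
advertisement = [92, 94, 97, 98, 100, 102, 104, 105, 105, 107, 107, 110]
sales = [930, 900, 1020, 990, 1100, 1050, 1150, 1120, 1130, 1200, 1250, 1220]

n = len(advertisement)
sum_x = sum(advertisement)
sum_y = sum(sales)
sum_xx = sum(x * x for x in advertisement)
sum_xy = sum(x * y for x, y in zip(advertisement, sales))

# Sums of squares used by the least squares slope.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_xx - sum_x ** 2 / n

b1 = ss_xy / ss_xx                   # slope
b0 = sum_y / n - b1 * sum_x / n      # intercept: y_bar - b1 * x_bar


def predict(x: float) -> float:
    """Predicted sales (thousand rupees) for advertisement spend x."""
    return b0 + b1 * x


print(f"sum_x = {sum_x}, sum_y = {sum_y}, sum_xx = {sum_xx}, sum_xy = {sum_xy}")
print(f"b1 = {b1:.5f}, b0 = {b0:.3f}")
print(f"predicted sales at x = 100: {predict(100):.2f}")
```

The fitted line is only meaningful near the observed range of x (92 to 110); predictions far outside it, such as x = 0, are extrapolations.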
[Figure: Regression Analysis. Plot of sales (y) against advertisement (x) with the fitted line f(x) = 19.07x - 852.08]
7. Correlation
• Correlation measures the degree of association between two variables
• We will determine the correlation between 2 variables using Karl Pearson’s coefficient of
correlation.
• Karl Pearson’s coefficient of correlation:
• $r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2] \, [n \sum y^2 - (\sum y)^2]}}$
• r lies between −1 and +1
• [Figure: relationship between the two variables for different values of r]
• Example:
• Table shows the sales revenue and advertisement expenses of a company for the past 10 months. Find the
coefficient of correlation between sales and advertisement.
Month  Sales (x)  Advertisement (y)
January  110  10
February  120  11
March  115  12
April  128  13
May  137  11
June  145  10
July  150  9
August  130  10
September  120  11
October  115  14
• Solution:
Month  Sales (x)  Advertisement (y)  xy  x^2  y^2
January  110  10  1100  12100  100
February  120  11  1320  14400  121
March  115  12  1380  13225  144
April  128  13  1664  16384  169
May  137  11  1507  18769  121
June  145  10  1450  21025  100
July  150  9  1350  22500  81
August  130  10  1300  16900  100
September  120  11  1320  14400  121
October  115  14  1610  13225  196
Sum  1270  111  14001  162928  1253
• Coefficient of correlation:
• r = [10(14001) − (1270)(111)] / √{[10(162928) − (1270)²] [10(1253) − (111)²]} = −960 / √(16380 × 209)
• r ≈ −0.52 (−0.5188)
• Result:
• This indicates that sales and advertisement are negatively correlated
to the extent of −0.52. In this sample, an increase in the expenditure on
advertisement is not accompanied by an increase in sales.
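The coefficient for this example can be recomputed from the raw data. The sketch below uses the usual computational form of Karl Pearson's coefficient, built from the same column sums as the solution table; the variable names are illustrative.

```python
from math import sqrt

# Sales (x) and advertisement (y) for January to October.
sales = [110, 120, 115, 128, 137, 145, 150, 130, 120, 115]
advertisement = [10, 11, 12, 13, 11, 10, 9, 10, 11, 14]

n = len(sales)
sum_x = sum(sales)
sum_y = sum(advertisement)
sum_xy = sum(x * y for x, y in zip(sales, advertisement))
sum_xx = sum(x * x for x in sales)
sum_yy = sum(y * y for y in advertisement)

# Pearson's r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2]).
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
r = numerator / denominator

print(f"sums: x = {sum_x}, y = {sum_y}, xy = {sum_xy}, x^2 = {sum_xx}, y^2 = {sum_yy}")
print(f"r = {r:.4f}")
```

The result is about −0.5188, a moderate negative linear association in this sample; a correlation coefficient on its own does not establish cause and effect.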
Reference
• Books:
• Advanced Engineering Mathematics, Erwin Kreyszig
• Business Statistics, Naval Bajpai, Pearson
Thank You
• Open for discussion