Dr. K.S.Basavarajappa, Professor & Head, Department of Mathematics, Bapuji Institute of Engineering and Technology, Davangere-577004
Email: ksbraju@hotmail.com
Statistical Inference: It is often necessary to draw valid and reasonable conclusions concerning a large mass of individuals or things. The entire group of individuals is known as the population. A small part of this population is known as a sample. The process of drawing valid and reasonable conclusions about the entire population from a sample is Statistical Inference.
Random sampling: A large collection of individuals, attributes or numerical data can be understood as a population or universe. A finite subset of the universe is called a sample. The number of individuals in a sample is called the Sample Size (n).
Sampling distribution: For every sample of size n we can compute quantities like the mean, median, standard deviation etc.; obviously these will not be the same from sample to sample. Suppose we group these characteristics according to their frequencies; the frequency distributions so generated are called Sampling Distributions. The sampling distribution of large samples is assumed to be a normal distribution. The standard deviation of a sampling distribution is called the standard error.
Hypothesis: To arrive at a decision regarding the population on the basis of a sample, we make certain assumptions about the population. Such an assumption is referred to as a hypothesis. The hypothesis formulated for the purpose of its rejection, under the assumption that it is true, is called the null hypothesis, denoted by H0.
Errors: In a test process there can be four possible situations, which lead to the two types of errors. They are tabulated as follows:

                       Accepting the hypothesis          Rejecting the hypothesis
Hypothesis is true     Correct decision                  Wrong decision (Type I error)
Hypothesis is false    Wrong decision (Type II error)    Correct decision

In order to minimize both these types of errors we need to increase the sample size.
Significance level: The probability level below which we reject the hypothesis is known as the significance level. This probability is conventionally fixed at 0.05 or 0.01, i.e., 5% or 1%. Rejecting a hypothesis at the 5% level of significance therefore means that the probability of rejecting a true hypothesis (Type I error) is 0.05; at the 1% level it is 0.01.
TESTS OF SIGNIFICANCE AND CONFIDENCE INTERVALS
The process which helps us to decide about the acceptance or rejection of the hypothesis is called the test of significance. Suppose that we have a normal population with mean $\mu$ and S.D. $\sigma$. If $\bar{x}$ is the sample mean of a random sample of size n, the quantity t defined by
$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$   (1)
is called the standard normal variate (SNV), whose mean is 0 and S.D. is 1. From the table of normal areas, we find that 95% of the area lies between t = -1.96 and t = 1.96. Further, the 5% level of significance is denoted by $t_{0.05}$; therefore
$-1.96 \le \dfrac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \le 1.96$   (2)
that is,
$\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}$   (3)
Similarly, from the table of normal areas, 99% of the area lies between t = -2.58 and t = 2.58, so that
$\bar{x} - 2.58\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 2.58\,\dfrac{\sigma}{\sqrt{n}}$   (4)
Therefore representation (3) is the 95% confidence interval and representation (4) is the 99% confidence interval.
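The 95% and 99% limits for the population mean can be checked numerically. The sketch below assumes the limits have the form x_bar -/+ z * sigma / sqrt(n) with z = 1.96 or 2.58; the function name `confidence_interval` and the sample figures are hypothetical and are not taken from these notes.

```python
import math

def confidence_interval(sample_mean, sigma, n, z=1.96):
    """Return (lower, upper) limits sample_mean -/+ z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical figures for illustration only: x_bar = 50, sigma = 10, n = 100.
low95, high95 = confidence_interval(50.0, 10.0, 100)           # z = 1.96 (95%)
low99, high99 = confidence_interval(50.0, 10.0, 100, z=2.58)   # z = 2.58 (99%)

print("95% limits:", round(low95, 2), round(high95, 2))
print("99% limits:", round(low99, 2), round(high99, 2))
```

The 99% interval is wider than the 95% interval for the same data, since a larger multiple of the standard error is used.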
Graph:
Tests of significance for large samples: Let N large samples be drawn, each having n members. Let p and q denote the probabilities of success and failure respectively, so that p + q = 1. By the binomial distribution, the terms of $N(p + q)^n$ give the frequencies of samples with 0, 1, ..., n successes. Therefore $N(p + q)^n$ denotes the sampling distribution of the number of successes in the sample. We know that for the binomial distribution,
mean $\mu = np$ and S.D. $\sigma = \sqrt{npq}$,
so that the mean proportion of successes is $np/n = p$ and the S.D. (standard error) of the proportion of successes is $\sqrt{npq}/n = \sqrt{pq/n}$.
The standard normal variate Z is defined as
$Z = \dfrac{x - \mu}{\sigma} = \dfrac{x - np}{\sqrt{npq}}$
where x is the observed number of successes in the sample.
For a normal distribution, only 5% of the members lie outside $\mu \pm 1.96\sigma$, while only 1% of the members lie outside $\mu \pm 2.58\sigma$. If |Z| > 2.58, we conclude that the difference is highly significant and reject the hypothesis. Then $p - 2.58\sqrt{pq/n}$ and $p + 2.58\sqrt{pq/n}$ are the probable limits of the proportion of successes.
We have the following test of significance:
If |Z| < 1.96, the difference between the observed and expected number of successes is not significant.
If |Z| > 1.96, the difference is significant at the 5% level of significance.
If |Z| > 2.58, the difference is significant at the 1% level of significance.
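The test above can be sketched in a few lines, assuming Z = (x - np) / sqrt(npq) with the thresholds 1.96 and 2.58 stated in the notes. The helper names `z_for_successes` and `verdict` are illustrative, not part of the notes; the figures used are those of the coin example that follows (540 heads in 1000 tosses, p = 1/2).

```python
import math

def z_for_successes(x, n, p):
    """Standard normal variate for x observed successes in n trials."""
    q = 1.0 - p
    return (x - n * p) / math.sqrt(n * p * q)

def verdict(z):
    """Classify |Z| against the 5% (1.96) and 1% (2.58) thresholds."""
    a = abs(z)
    if a > 2.58:
        return "significant at the 1% level"
    if a > 1.96:
        return "significant at the 5% level"
    return "not significant"

z_coin = z_for_successes(540, 1000, 0.5)
print(round(z_coin, 2), "->", verdict(z_coin))
```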
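Example: A coin is tossed 1000 times and it turns up heads 540 times. Decide on the hypothesis that the coin is unbiased.
Solution: Let us suppose that the coin is unbiased, so that p = 1/2. Since p + q = 1, q = 1/2.
Expected number of heads: np = 1000 × (1/2) = 500.
Observed number of heads: x = 540, so x - np = 540 - 500 = 40.
$\sigma = \sqrt{npq} = \sqrt{1000 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 15.81$
$Z = \dfrac{x - np}{\sqrt{npq}} = \dfrac{40}{15.81} = 2.53$
Since 1.96 < Z < 2.58, the difference is significant at the 5% level but not at the 1% level; at the 5% level the hypothesis that the coin is unbiased is rejected.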
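Example: A survey was conducted in one locality of 2000 families by selecting a sample of size 800. It was revealed that 180 families were illiterate. Find the probable limits of the number of illiterate families in the population of 2000.
Solution: Proportion of illiterate families in the sample: p = 180/800 = 0.225, so q = 1 - p = 0.775.
$\sqrt{pq/n} = \sqrt{0.225 \times 0.775 / 800} = 0.0148$
Probable limits of the proportion: $p \pm 2.58\sqrt{pq/n} = 0.225 \pm 0.038$, i.e. 0.187 and 0.263.
Therefore the probable limits of illiterate families in the population of 2000 are 0.187 × 2000 = 374 and 0.263 × 2000 = 526.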
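As a check on the survey example, the following sketch computes the probable limits of a proportion, assuming they are p -/+ 2.58 * sqrt(pq/n) as in the large-sample theory above. The function name `probable_limits` is an illustrative choice.

```python
import math

def probable_limits(successes, n, z=2.58):
    """Probable limits p -/+ z * sqrt(p*q/n) for a sample proportion."""
    p = successes / n
    q = 1.0 - p
    se = math.sqrt(p * q / n)   # standard error of the proportion
    return p - z * se, p + z * se

# Survey data from the example: 180 illiterate families in a sample of 800.
low, high = probable_limits(180, 800)
print(round(low, 3), round(high, 3))

# Scale the proportions up to the population of 2000 families.
print(round(low * 2000), round(high * 2000))
```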
Example: A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the assumption of random throwing, do the data indicate an unbiased die?
Solution: Suppose the die is unbiased. Then the probability of throwing 5 or 6 with one die is p = p(5) + p(6) = (1/6) + (1/6) = 1/3, and q = 1 - p = 1 - (1/3) = 2/3.
Expected number of successes: np = 9000 × (1/3) = 3000.
Observed number of successes: x = 3240, so x - np = 240.
$\sigma = \sqrt{npq} = \sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}} = 44.72$
$Z = \dfrac{x - np}{\sqrt{npq}} = \dfrac{240}{44.72} = 5.37$
Since Z > 2.58, the difference is highly significant and the hypothesis that the die is unbiased is rejected.
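Example: A biased die is tossed 500 times and a particular number appears 120 times. Find the 95% confidence limits for the number of times this value is obtained. Also find the standard error of the proportion of successes (use the binomial distribution).
Solution: Let p = 120/500 = 0.24; then q = 0.76 and n = 500.
Standard error of the number of successes: $\sqrt{npq} = \sqrt{500 \times 0.24 \times 0.76} = 9.55$
Mean proportion of successes = np/n = p = 0.24, and
S.E. of the proportion of successes: $\sqrt{pq/n} = \sqrt{0.24 \times 0.76 / 500} = 0.0191$
95% confidence limits: $np \pm 1.96\sqrt{npq} = 120 \pm 18.72$, i.e.
$101 \le np \le 138$
The interval is [101, 138]. We can say with 95% confidence that, out of 500 tosses, the particular number appears between 101 and 138 times.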
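The figures in this example can be reproduced with a short sketch, assuming the 95% limits for the number of successes are np -/+ 1.96 * sqrt(npq) and that the interval end points are truncated to whole numbers as in the notes. Variable names are illustrative.

```python
import math

n, x = 500, 120          # tosses and observed successes from the example
p = x / n                # 0.24
q = 1.0 - p              # 0.76

se_count = math.sqrt(n * p * q)   # standard error of the number of successes
se_prop = math.sqrt(p * q / n)    # standard error of the proportion

lower = n * p - 1.96 * se_count
upper = n * p + 1.96 * se_count

print(round(se_count, 2), round(se_prop, 4))
print(int(lower), int(upper))     # end points truncated to whole tosses
```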
Degrees of freedom (d.f.): It is the number of values in a set which may be assigned arbitrarily.
d.f. = n - 1 for n observations
d.f. = n - 2 for n - 1 observations
d.f. = n - 3 for n - 2 observations, etc.
Ex: for 25 observations we have 24 d.f.
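Student's t distribution: It is used to test the significance of a sample mean for a normal population where the population S.D. is not known. It is given by
$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s}$
where
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, $\quad s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$,
with n - 1 degrees of freedom.
We need to test the hypothesis whether the sample mean $\bar{x}$ differs significantly from the population mean $\mu$. If the calculated value of t, i.e. |t|, is greater than the table value $t_{0.05}$, the difference between $\bar{x}$ and $\mu$ is significant at the 5% level and the hypothesis is rejected; otherwise the hypothesis is accepted.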
Example: A machine is expected to produce nails of length 3 inches. A random sample of 25 nails gave an average length of 3.1 inches with standard deviation 0.3. Can it be said that the machine is producing nails as per the specification? (The value of Student's $t_{0.05}$ for 24 d.f. is 2.064.)
Solution: Given $\mu = 3$, $\bar{x} = 3.1$, n = 25, s = 0.3.
$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s} = \dfrac{(3.1 - 3)\sqrt{25}}{0.3} = 1.67 < 2.064$
The hypothesis that the machine is producing nails as per specification is accepted at the 5% level of significance.
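The nail example can be verified with a minimal sketch. It assumes the statistic is t = (x_bar - mu) * sqrt(n) / s, which is the form consistent with the figures in these notes; the function name `t_statistic` is hypothetical.

```python
import math

def t_statistic(sample_mean, mu, s, n):
    """Student's t for a sample mean against a hypothesised population mean."""
    return (sample_mean - mu) * math.sqrt(n) / s

# Nail example: mu = 3, x_bar = 3.1, s = 0.3, n = 25; table value 2.064 (24 d.f.).
t_nails = t_statistic(3.1, 3.0, 0.3, 25)
accepted = abs(t_nails) < 2.064

print(round(t_nails, 2), "accepted" if accepted else "rejected")
```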
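Example: Ten individuals are chosen at random from a population and their heights in inches are found to be 63, 63, 66, 67, 68, 69, 70, 70, 71, 71. Test the hypothesis that the mean height of the universe is 66 inches (value of $t_{0.05}$ = 2.262 for 9 d.f.).
Solution: We have $\mu = 66$, n = 10, d.f. = 9.
$\bar{x} = \dfrac{1}{n}\sum x_i = \dfrac{678}{10} = 67.8$
$s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2 = \dfrac{1}{9}\left[(63 - 67.8)^2 + \dots + (71 - 67.8)^2\right] = 9.067$
s = 3.011
$t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{s} = \dfrac{(67.8 - 66)\sqrt{10}}{3.011} = 1.89 < 2.262$ (the value of $t_{0.05}$ given in the problem)
The hypothesis that the mean height of the universe is 66 inches is accepted at the 5% level of significance.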
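When the raw observations are given, the mean and s can be computed directly. The sketch below uses Python's `statistics.stdev`, which divides by n - 1. One assumption is made loudly here: the problem speaks of ten individuals, and the tenth height is taken to be 70, the value that reproduces the stated mean 67.8 and $s^2$ = 9.067.

```python
import math
import statistics

# Heights in inches; the value 70 appearing twice is an assumption (see lead-in).
heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
mu = 66                      # hypothesised mean height
t_table = 2.262              # t_0.05 for 9 d.f., as given in the problem

mean = statistics.mean(heights)
s = statistics.stdev(heights)            # sample S.D. with divisor n - 1
t = (mean - mu) * math.sqrt(len(heights)) / s

print(round(mean, 1), round(s, 3), round(t, 2))
print("accepted" if abs(t) < t_table else "rejected")
```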
Example: Eleven school boys were given a test in drawing. They were given a month's further tuition and a second test of equal difficulty was held at the end of it. Do the marks give evidence that the students have benefitted by extra coaching? ($t_{0.05}$ = 2.228 for d.f. = 10)

Boys    Marks (test 1)    Marks (test 2)
2       20                19
3       19                22
4       21                18
5       18                20
6       20                22
7       18                20
8       17                20
9       23                23
10      16                20
11      19                17
Chi-Square distribution ($\chi^2$):
It provides a measure of correspondence between the theoretical frequencies and the observed frequencies. Let $O_i$ (i = 1, 2, ..., n) be the observed frequencies and $E_i$ (i = 1, 2, ..., n) the estimated (expected) frequencies. The quantity $\chi^2$ (chi square) is defined as
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$ ; degrees of freedom = n - 1
Chi square test as a test of goodness of fit: The $\chi^2$ test helps us to test the goodness of fit of distributions such as the Binomial, Poisson and Normal distributions. If the calculated value of $\chi^2$ is less than the table value of $\chi^2$ at a specified level of significance, the hypothesis is accepted. Otherwise the hypothesis is rejected.
Example: A die is thrown 264 times and the number appearing on the face (x) follows the following frequency distribution:

x    1    2    3    4    5    6
f    40   32   28   58   54   60

Calculate the value of $\chi^2$.
Solution: Frequencies in the given table are the observed frequencies. Assuming that the die is unbiased, the expected frequency for each of the numbers 1, 2, 3, 4, 5, 6 to appear on the face is 264/6 = 44. Then the data is as follows:

No. on the die              1    2    3    4    5    6
Observed frequency (Oi)     40   32   28   58   54   60
Expected frequency (Ei)     44   44   44   44   44   44

$\chi^2 = \sum \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{16 + 144 + 256 + 196 + 100 + 256}{44} = \dfrac{968}{44} = 22$
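The arithmetic of this example is easy to check. The following is a minimal sketch of the statistic, assuming it is the sum of (O - E)^2 / E over all classes; the function name `chi_square` is an illustrative choice.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over paired observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [40, 32, 28, 58, 54, 60]   # frequencies of faces 1..6 from the table
expected = [44] * 6                   # 264 / 6 for an unbiased die

chi2_die = chi_square(observed, expected)
print(round(chi2_die, 2))
```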
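Example: Five dice were thrown 96 times and the number of dice showing 1 or 2 or 3 follows the following frequency distribution:

No. of dice showing 1 or 2 or 3    5    4    3    2    1    0
Frequency                          7    19   35   24   8    3

Test the hypothesis that the data follows a binomial distribution.
Solution: The probability of a single die showing 1 or 2 or 3 is p = 1/6 + 1/6 + 1/6 = 1/2, so q = 1/2.
The binomial distribution to fit the data is
$N(q + p)^n = 96\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^5$, with terms $96 \cdot \binom{5}{x} \cdot \dfrac{1}{2^5}$ for x = 5, 4, ..., 0.

No. of dice showing 1 or 2 or 3    5    4    3    2    1    0
Observed frequency (Oi)            7    19   35   24   8    3
Expected frequency (Ei)            3    15   30   30   15   3

$\chi^2 = \dfrac{(7-3)^2}{3} + \dfrac{(19-15)^2}{15} + \dfrac{(35-30)^2}{30} + \dfrac{(24-30)^2}{30} + \dfrac{(8-15)^2}{15} + \dfrac{(3-3)^2}{3} = 11.7$
The table value of $\chi^2_{0.05}$ for 5 d.f. is 11.07, and 11.7 > 11.07.
Hence the hypothesis that the data follows a binomial distribution is rejected.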
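The binomial fit can be sketched as follows, assuming the expected frequencies are the terms of N(q + p)^n with N = 96, n = 5 and p = 1/2. The table value 11.07 for 5 d.f. is a standard tabulated figure and is not stated in the garbled source; the variable names are illustrative.

```python
from math import comb

N, n, p = 96, 5, 0.5
# Observed frequencies keyed by the number of dice showing 1, 2 or 3.
observed = {5: 7, 4: 19, 3: 35, 2: 24, 1: 8, 0: 3}

# Expected frequencies: N * C(n, k) * p^k * q^(n - k).
expected = {k: N * comb(n, k) * p**k * (1 - p)**(n - k) for k in observed}

chi2_fit = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
table_value = 11.07   # chi-square at 5% for 5 d.f. (assumed standard table value)

print(expected)
print(round(chi2_fit, 2), "rejected" if chi2_fit > table_value else "accepted")
```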
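Example: Fit the Poisson distribution for the following data and test the goodness of fit, given that $\chi^2_{0.05}$ = 7.815 for 3 degrees of freedom.

x    0     1    2    3    4
f    122   60   15   2    1

Solution: Here N = 200 and the mean is $m = \sum fx / \sum f = 100/200 = 0.5$. The expected frequencies are given by
$N e^{-m} \dfrac{m^x}{x!} = 200\, e^{-m} \dfrac{m^x}{x!}$, where x = 0, 1, 2, 3, 4.
The calculated value of $\chi^2$ is then compared with the table value $\chi^2_{0.05}$ = 7.815.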
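The expected Poisson frequencies for this data can be generated with a short sketch, assuming the formula N * e^(-m) * m^x / x! with m estimated as the sample mean. The chi-square value itself is left out here because it depends on how the expected frequencies are rounded and how small classes are pooled, which the notes do not show.

```python
import math

x_values = [0, 1, 2, 3, 4]
freqs = [122, 60, 15, 2, 1]

N = sum(freqs)                                           # total frequency
m = sum(x * f for x, f in zip(x_values, freqs)) / N      # sample mean

# Expected frequency for each x under a Poisson distribution with mean m.
expected = [N * math.exp(-m) * m**x / math.factorial(x) for x in x_values]

print(N, m)
print([round(e, 1) for e in expected])
```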
Example: The number of accidents per day (x) as recorded in a textile industry over a period of 400 days is given below. Test the goodness of fit of the Poisson distribution to the given data ($\chi^2_{0.05}$ = 9.49).

x    0     1     2    3    4    5
f    173   168   37   18   3    1
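Example: In experiments of pea breeding, the following frequencies of seeds were obtained:

Round & yellow    Wrinkled & yellow    Round & green    Wrinkled & green    Total
315               101                  108              32                  556

Theory predicts that the frequencies should be in the proportion 9:3:3:1. Examine the correspondence between theory and experiment.
Solution: The corresponding (expected) frequencies are 313, 104, 104, 35.
$\chi^2 = \dfrac{(315-313)^2}{313} + \dfrac{(101-104)^2}{104} + \dfrac{(108-104)^2}{104} + \dfrac{(32-35)^2}{35} = 0.51$
The calculated value of $\chi^2$ = 0.51 < $\chi^2_{0.05}$ = 7.815 for 3 d.f., so there is good correspondence between theory and experiment.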
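The expected frequencies for a theoretical ratio can be derived from the total and then compared with the observations. This sketch assumes the expected values are rounded to whole numbers, as the figures 313, 104, 104, 35 in the notes suggest; variable names are illustrative.

```python
observed = [315, 101, 108, 32]   # round-yellow, wrinkled-yellow, round-green, wrinkled-green
ratio = [9, 3, 3, 1]             # proportion predicted by theory

total = sum(observed)            # 556 seeds
# Share the total out in the ratio 9:3:3:1 and round to whole seeds.
expected = [round(total * r / sum(ratio)) for r in ratio]

chi2_peas = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)
print(round(chi2_peas, 2))
```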