
7 The Central Limit Theorem

7.1 Statistics and Their Distributions


We denote the observations in a single sample by $x_1, x_2, \ldots, x_n$. Before we obtain data, there is uncertainty about the value of each $x_i$. Because of this uncertainty, before the data become available we view each observation as a random variable and denote the sample by $X_1, X_2, \ldots, X_n$.
This variation in observed values in turn implies that the value of any function of the sample observations, such as the sample mean or the sample standard deviation, also varies from sample to sample. That is, prior to obtaining $x_1, x_2, \ldots, x_n$ there is uncertainty as to the value of $\bar{x}$, the value of $s$, and so on.
Definition: A statistic is any quantity whose value can be calculated from sample data.
Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result.
Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is
used to represent the calculated or observed value of the statistic.
For instance, the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by $\bar{X}$; the calculated value of this statistic is $\bar{x}$.
Example 1. Suppose that material strength for a randomly selected specimen of a particular type has a certain distribution with mean $\mu = 4.4311$, median $\tilde{\mu} = 4.1628$, and standard deviation $\sigma = 2.316$. We take 6 samples, each of size $n = 10$, from this distribution (material strengths for six different groups of ten specimens each). The following table shows the results obtained from MINITAB. Comment on your observations.

7.2 Random Sample


The probability distribution of a statistic is sometimes referred to as its sampling distribution. This
emphasizes that it describes how the statistic varies in value across all samples that might be selected.
There are several methods to select a sample. Random sampling is one sampling method often encountered (at least approximately) in practice.
Definition: The random variables $X_1, X_2, \ldots, X_n$ are said to form a (simple) random sample of size $n$ if the $X_i$'s are independent random variables and every $X_i$ has the same probability distribution. Such variables are often described as iid (independent and identically distributed).
If sampling is either with replacement or from an infinite (conceptual) population, these conditions are satisfied exactly. They are approximately satisfied if sampling is without replacement but the sample size $n$ is much smaller than the population size $N$ (in practice, if $n/N \le .05$).
There are two general methods for obtaining information about a statistic's sampling distribution. One
method involves calculations based on probability rules, and the other involves carrying out a simulation
experiment.
7.3 Deriving a Sampling Distribution
Example 2. A large automobile service center charges $40, $45, and $50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. If 20% of its tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars, then the probability distribution of revenue from a single randomly selected tune-up is given below.

x       40    45    50
p(x)    .2    .3    .5

Check:
Population mean $\mu = 46.5$
Population variance $\sigma^2 = 15.25$
Population standard deviation $\sigma = 3.9$
Suppose on a particular day only two servicing jobs involve tune-ups. Let
$X_1$ = the revenue from the first tune-up and $X_2$ = the revenue from the second.
Suppose that $X_1$ and $X_2$ are independent, each with the probability distribution shown above.
(a) List all possible samples [$(x_1, x_2)$ pairs] with corresponding probabilities, sample mean $\bar{x}$, and sample variance $s^2$.
(b) Give the sampling distribution (probability distribution) of the sample mean $\bar{X}$ and the sample variance $S^2$.
(c) Calculate the expectation of the sample mean, $E(\bar{X})$, and the variance of the sample mean, $V(\bar{X})$.
(d) Identify any relation between the population mean $\mu$ and $E(\bar{X})$.
(e) Identify any relation between the population variance $\sigma^2$ and $V(\bar{X})$.
(a)
$x_1$   $x_2$   $p(x_1, x_2)$   $\bar{x}$   $s^2$
40      40      .04             40          0
40      45      .06             42.5        12.5
40      50      .10             45          50
45      40      .06             42.5        12.5
45      45      .09             45          0
45      50      .15             47.5        12.5
50      40      .10             45          50
50      45      .15             47.5        12.5
50      50      .25             50          0

(b)
$\bar{x}$       40     42.5   45     47.5   50
$p(\bar{x})$    .04    .12    .29    .30    .25

$s^2$       0      12.5   50
$p(s^2)$    .38    .42    .20

(c) $E(\bar{X}) = 46.5$ and $V(\bar{X}) = 7.625$.
(d) $E(\bar{X}) = \mu$
(e) $V(\bar{X}) = \frac{1}{2}\sigma^2$
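The enumeration in Example 2 can also be checked mechanically. The following is a minimal sketch, not part of the original notes: it lists every $(x_1, x_2)$ pair, multiplies the probabilities (using the stated independence), and accumulates the sampling distributions of the sample mean and sample variance. The variable names (`pmf`, `mean_dist`, `var_dist`) are illustrative choices, and exact fractions are used so that no rounding enters the check.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Population distribution of revenue from one tune-up (Example 2).
pmf = {40: Fraction(2, 10), 45: Fraction(3, 10), 50: Fraction(5, 10)}

mean_dist = defaultdict(Fraction)  # sampling distribution of the sample mean
var_dist = defaultdict(Fraction)   # sampling distribution of the sample variance

for x1, x2 in product(pmf, repeat=2):
    prob = pmf[x1] * pmf[x2]       # X1 and X2 are assumed independent
    xbar = Fraction(x1 + x2, 2)
    # Sample variance with n = 2: the divisor n - 1 equals 1.
    s2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2
    mean_dist[xbar] += prob
    var_dist[s2] += prob

# Expectation and variance of the sample mean, from its sampling distribution.
e_xbar = sum(x * p for x, p in mean_dist.items())
v_xbar = sum((x - e_xbar) ** 2 * p for x, p in mean_dist.items())

for x, p in sorted(mean_dist.items()):
    print(f"xbar = {float(x):5.1f}   p = {float(p):.2f}")
for s, p in sorted(var_dist.items()):
    print(f"s^2  = {float(s):5.1f}   p = {float(p):.2f}")
print("E(Xbar) =", float(e_xbar), "  V(Xbar) =", float(v_xbar))
```

The same loop extends to larger samples by changing `repeat=2` and the variance divisor, although the number of possible samples grows quickly.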

7.4 Simulation Experiments


Example 3. Consider a simulation experiment in which the population distribution is quite skewed. The figure below shows the density curve for lifetimes of a certain type of electronic control with $\mu = E(X) = 21.7$ and $V(X) = 82.1$.

Note that this is not a normal curve and it is not symmetric. [This is actually a lognormal distribution with $E(\ln(X)) = 3$ and $V(\ln(X)) = .16$.]
Again the statistic of interest is the sample mean $\bar{X}$. The experiment utilized 500 replications for each of the sample sizes $n = 5, 10, 20,$ and $30$. Observe the corresponding histograms. Comment on your observations.

Observe that $\bar{X}$ based on a large $n$ tends to be closer to $\mu$ than does $\bar{X}$ based on a small $n$, and as $n$ increases the histograms look more like a normal curve.
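A simulation of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the program used to produce the original histograms: it draws lognormal lifetimes with $E(\ln X) = 3$ and $V(\ln X) = .16$ using Python's standard `random.lognormvariate`, uses 500 replications per sample size, and reports the mean and standard deviation of the simulated sample means instead of plotting them. The function name `simulate_sample_means` and the fixed seed are hypothetical choices made for reproducibility.

```python
import math
import random
import statistics

MU_LOG = 3.0       # E(ln X), from the example
SIGMA_LOG = 0.4    # sqrt(V(ln X)) = sqrt(.16)
REPLICATIONS = 500

def simulate_sample_means(n, replications=REPLICATIONS, seed=1):
    """Return `replications` sample means, each computed from n lognormal draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        sample = [rng.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Population mean and variance implied by the lognormal parameters.
pop_mean = math.exp(MU_LOG + SIGMA_LOG ** 2 / 2)
pop_var = (math.exp(SIGMA_LOG ** 2) - 1) * math.exp(2 * MU_LOG + SIGMA_LOG ** 2)
print(f"population mean = {pop_mean:.2f}, population variance = {pop_var:.2f}")

results = {}
for n in (5, 10, 20, 30):
    means = simulate_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.2f}, "
          f"sd of sample means = {results[n][1]:.2f}")
```

The spread of the simulated sample means should shrink as $n$ grows, which is the numerical counterpart of the histograms tightening around $\mu$.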

7.5 The Distribution of the Sample Mean


Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2/n$. Further, if $T = X_1 + X_2 + \cdots + X_n$, then $E(T) = n\mu$ and $V(T) = n\sigma^2$.
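A short derivation, added here as a sketch, shows where these formulas come from. It uses only linearity of expectation, the fact that variances of independent random variables add, and the identity $\bar{X} = T/n$.

```latex
\begin{align*}
E(T) &= E(X_1) + E(X_2) + \cdots + E(X_n) = n\mu, \\
V(T) &= V(X_1) + V(X_2) + \cdots + V(X_n) = n\sigma^2
        \quad \text{(by independence)}, \\
E(\bar{X}) &= \tfrac{1}{n}\, E(T) = \mu, \\
V(\bar{X}) &= \tfrac{1}{n^2}\, V(T) = \frac{\sigma^2}{n}.
\end{align*}
```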
Example 4. In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is $\mu = 28{,}000$, and the standard deviation of the number of cycles is $\sigma = 5000$. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size 25, where each $X_i$ is the number of cycles on a different randomly selected specimen.
(a) Find the expected value of the sample mean number of cycles until first emission.
(b) Find the standard deviation of the sample mean.
(c) Find the expected total number of cycles for the 25 specimens.
(d) Find the standard deviation of the total number of cycles.
(e) Repeat (a)-(d) if the sample size increases to n = 100 and compare your answers.
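Answers.
(a) $E(\bar{X}) = \mu = 28{,}000$
(b) $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 5000/\sqrt{25} = 1000$
(c) $E(T) = n\mu = 25(28{,}000) = 700{,}000$
(d) $\sigma_T = \sqrt{n}\,\sigma = \sqrt{25}\,(5000) = 25{,}000$
(e) With $n = 100$, the expected value of the sample mean is unchanged at 28,000, but the expected total rises to $100(28{,}000) = 2{,}800{,}000$. The standard deviation of the sample mean decreases to 500, and the standard deviation of the total increases to 50,000.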
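The arithmetic in Example 4 follows directly from the theorem, $E(\bar{X}) = \mu$, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, $E(T) = n\mu$, and $\sigma_T = \sqrt{n}\,\sigma$, and can be reproduced with a small helper. The function name `mean_and_total_summary` is a hypothetical one chosen for this sketch.

```python
import math

def mean_and_total_summary(mu, sigma, n):
    """Expected value and standard deviation of the sample mean and sample total."""
    return {
        "E(Xbar)": mu,
        "SD(Xbar)": sigma / math.sqrt(n),
        "E(T)": n * mu,
        "SD(T)": math.sqrt(n) * sigma,
    }

# Example 4: mu = 28,000 cycles, sigma = 5,000 cycles, n = 25 and n = 100.
summary_25 = mean_and_total_summary(28_000, 5_000, 25)
summary_100 = mean_and_total_summary(28_000, 5_000, 100)
for n, summary in ((25, summary_25), (100, summary_100)):
    print(f"n = {n}: {summary}")
```

Quadrupling $n$ halves the standard deviation of the sample mean and doubles the standard deviation of the total, since both depend on $\sqrt{n}$.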
Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then for any $n$, $\bar{X}$ is normally distributed (with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$). Further, $T$ is normally distributed (with mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$).


Example 5. The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed random variable with $\mu = 1.5$ min and $\sigma = .35$ min. Suppose five rats are selected. Let $X_1, \ldots, X_5$ denote their times in the maze. Assume the $X_i$'s to be a random sample from this normal distribution.
(a) What is the probability that the total time for the five is between 6 and 8 min?
(b) What is the probability that the sample average time $\bar{X}$ is at most 2 min?
Answers.
(a) $P(6 < T < 8) = P\left(\frac{6 - 7.5}{.783} < Z < \frac{8 - 7.5}{.783}\right) = .7115$
(b) $P(\bar{X} < 2) = P\left(Z < \frac{2 - 1.5}{.1565}\right) = .9993$
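These probabilities can be checked without a normal table. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; it builds the distributions of $T$ (mean $n\mu = 7.5$, standard deviation $\sqrt{n}\,\sigma \approx .783$) and $\bar{X}$ (mean $\mu = 1.5$, standard deviation $\sigma/\sqrt{n} \approx .1565$) and evaluates the two probabilities. The results may differ from the table-based answers in the third or fourth decimal place because table lookups round $z$ to two decimals.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 1.5, 0.35, 5  # maze time parameters and sample size (Example 5)

# Distribution of the total time T and of the sample mean.
total_time = NormalDist(mu=N * MU, sigma=math.sqrt(N) * SIGMA)
mean_time = NormalDist(mu=MU, sigma=SIGMA / math.sqrt(N))

p_total = total_time.cdf(8) - total_time.cdf(6)  # P(6 < T < 8)
p_mean = mean_time.cdf(2)                        # P(Xbar <= 2)

print(f"P(6 < T < 8) = {p_total:.4f}")
print(f"P(Xbar <= 2) = {p_mean:.4f}")
```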
7.6 The Central Limit Theorem (CLT)
Central Limit Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then if $n$ is sufficiently large, $\bar{X}$ has approximately a normal distribution with $E(\bar{X}) = \mu$ and $V(\bar{X}) = \frac{\sigma^2}{n}$ $\left[\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\right]$. (Rule of Thumb: If $n > 30$, the Central Limit Theorem can be used.)
Example 6. The amount of a particular impurity in a batch of a certain chemical product is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity $\bar{X}$ is between 3.5 and 3.8 g?
Apply the CLT; then $\bar{X} \approx N\left(4, \frac{1.5}{\sqrt{50}}\right) = N(4, .2121)$. Thus,
$P(3.5 \le \bar{X} \le 3.8) \approx P\left(\frac{3.5 - 4}{.2121} \le Z \le \frac{3.8 - 4}{.2121}\right) = P(-2.36 \le Z \le -.94) = .1645$
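The CLT calculation in Example 6 can be reproduced numerically. This is a minimal sketch assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `clt_mean_probability` is hypothetical. It treats $\bar{X}$ as approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, which is the approximation the theorem provides, not an exact distribution. The computed value may differ slightly from .1645 because the table-based answer rounds $z$ to two decimals.

```python
import math
from statistics import NormalDist

def clt_mean_probability(mu, sigma, n, low, high):
    """Approximate P(low <= Xbar <= high) using the normal approximation from the CLT."""
    approx = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
    return approx.cdf(high) - approx.cdf(low)

# Example 6: mu = 4.0 g, sigma = 1.5 g, n = 50 batches.
p_impurity = clt_mean_probability(4.0, 1.5, 50, 3.5, 3.8)
print(f"P(3.5 <= Xbar <= 3.8) is approximately {p_impurity:.4f}")
```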
