CHAPTER 1
What is Statistics?
The discipline of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation.
Branches of Statistics
Some Definitions
Histogram
Constructing a Histogram for Discrete Data
First, determine the frequency or relative frequency of each X value. Then mark the possible X values on a horizontal scale, and above each value draw a rectangle whose height is the frequency or relative frequency of that value.
Examples:
Median
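The median is the middle value of the measurements when they are ordered from smallest to largest.
The position of the median = $(n+1)/2$. When $n$ is even, this position falls halfway between two ordered values, and the median is their average.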
Example: The prices for 14 different brands of water-packed light tuna are 0.99,
1.92,1.23, 0.85, 0.65, 0.53, 1.41, 1.12, 0.63, 0.67, 0.69, 0.60, 0.60, 0.66
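The position rule can be checked on the tuna prices. The following is a minimal sketch in Python (the variable names are illustrative and not part of the notes); it locates position $(n+1)/2$ by hand and compares the result with the standard library's `statistics.median`.

```python
from statistics import median

# Tuna prices copied from the example above (n = 14).
prices = [0.99, 1.92, 1.23, 0.85, 0.65, 0.53, 1.41,
          1.12, 0.63, 0.67, 0.69, 0.60, 0.60, 0.66]

ordered = sorted(prices)
n = len(ordered)
position = (n + 1) / 2            # 7.5: halfway between the 7th and 8th values

# n is even here, so average the two ordered values around the position.
lower = ordered[n // 2 - 1]       # 7th ordered value
upper = ordered[n // 2]           # 8th ordered value
manual_median = (lower + upper) / 2

print(position, manual_median, median(prices))
```

The 7th and 8th ordered prices are 0.67 and 0.69, so the median is 0.68.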
Percentiles
The pth percentile is the value that exceeds p% of the measurements in the ordered data.
Quartiles
Quartiles divide the ordered data set into four equal parts: the first is the lower quartile, the second is the median, and the third is the upper quartile.
Box plot
A box plot describes the center of the data, the spread of the data, the extent and nature of any departure from symmetry, and the identification of outliers.
Order the n observations from smallest to largest and separate the smallest half from the largest half; the median $\tilde{x}$ is included in both halves if n is odd. Then the lower fourth is the median of the smallest half and the upper fourth is the median of the largest half. The fourth spread $f_s$ is
$f_s$ = upper fourth $-$ lower fourth
In general, a box plot is based on the five-number summary:
smallest value, lower fourth, median, upper fourth, largest value.
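Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{S_{xx}}{n-1} = \frac{\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2 / n}{n-1}$
Note: $\sum X_i^2$ = sum of squares of the measurements, and $\left(\sum X_i\right)^2$ = square of the sum of the measurements.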
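Standard deviation
The standard deviation is the positive square root of the variance: $\sigma = \sqrt{\sigma^2}$ for a population and $s = \sqrt{s^2}$ for a sample.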
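As an illustration of the sample variance formulas, the sketch below reuses the 14 tuna prices from the median example in this chapter. It is only a check under that assumption: it computes $s^2$ from the definition and from the shortcut form based on $\sum X_i^2$ and $(\sum X_i)^2$, then compares both with Python's `statistics` module.

```python
from statistics import variance, stdev

# Tuna prices from the median example in this chapter.
prices = [0.99, 1.92, 1.23, 0.85, 0.65, 0.53, 1.41,
          1.12, 0.63, 0.67, 0.69, 0.60, 0.60, 0.66]
n = len(prices)

sum_x = sum(prices)                      # sum of the measurements
sum_x2 = sum(x * x for x in prices)      # sum of squares of the measurements

# Shortcut form: Sxx = sum(x^2) - (sum x)^2 / n
s_xx = sum_x2 - sum_x ** 2 / n
s2_shortcut = s_xx / (n - 1)

# Definition: squared deviations from the sample mean.
mean = sum_x / n
s2_definition = sum((x - mean) ** 2 for x in prices) / (n - 1)

s = s2_shortcut ** 0.5                   # sample standard deviation

print(s2_shortcut, s2_definition, s)
```

Both forms give the same value up to rounding error; the shortcut form only needs two running sums, which is why it is convenient for hand calculation.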
CHAPTER 2
PROBABILITY
Probability is used in inferential statistics as a tool for making statements about a population from sample information.
Experiment is a process for generating observations.
Sample space is the set of all possible outcomes of an experiment.
Event is a collection of one or more outcomes from the sample space, usually denoted by a capital letter.
Simple Event: an event that cannot be decomposed.
Venn Diagram is used to show the outcomes of an experiment; each simple event is shown as a point inside a box that represents the sample space.
Tree Diagram is used when the experiment is generated in several steps.
Some Relations Between Events
Union: The union of events $A$ and $B$, denoted by $A \cup B$, is the event that contains all outcomes that are either in $A$ or $B$ or both.
Intersection: The intersection of events $A$ and $B$, denoted by $A \cap B$, is the event that contains all outcomes that are in both $A$ and $B$.
Complement: The complement of an event $A$, denoted by $A'$, is the event that contains all outcomes in the sample space $S$ that are not in $A$.
Two events are mutually exclusive, or disjoint, if they do not have any common outcome; that is, when one event occurs, the other cannot, and vice versa.
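Calculating Probability: $P(A)$ is a measure of the chance that $A$ will occur. If $A_1, A_2, \ldots$ are mutually exclusive events, then
$P(A_1 \cup A_2 \cup \cdots) = \sum_{i=1}^{\infty} P(A_i)$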
Properties of Probability
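$P(A) = 1 - P(A')$ for any event $A$.
$P(A \cap B) = 0$ if $A$ and $B$ are mutually exclusive events.
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ for any events $A$ and $B$.
For any three events $A$, $B$, and $C$,
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$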
Example: Consider the following table.

                              Used eyeglasses for reading
Judged to need eyeglasses     Yes       No
Yes                           0.44      0.14
No                            0.02      0.40
If a person is selected from this large group, find the probability of each event:
a. The adult is judged to need eyeglasses.
b. The adult needs eyeglasses for reading but does not use them.
c. The adult uses eyeglasses for reading whether he or she needs them or not.
Counting Techniques
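One method for computing a probability uses simple events. When all simple events are equally likely,
$P(A) = \frac{n(A)}{n}$
where $n(A)$ is the number of simple events in $A$ and $n$ is the total number of simple events in the sample space.
The conditional probability of $A$ given $B$ is
$P(A|B) = \frac{P(A \cap B)}{P(B)}$ if $P(B) \ne 0$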
Example: A new magazine publishes three columns entitled Art (A), Book (B), and Cinema (C). Reading habits of a randomly selected reader with respect to these columns are

Read regularly    $A \cap B$    $A \cap C$    $B \cap C$    $A \cap B \cap C$
Probability       0.08          0.09          0.13          0.05
Multiplication Rule
$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$
Law of Total Probability
If $A_1, A_2, \ldots, A_k$ are mutually exclusive and exhaustive events, then for any event $B$,
$P(B) = P(B|A_1)P(A_1) + \cdots + P(B|A_k)P(A_k) = \sum_{i=1}^{k} P(B|A_i)P(A_i)$
Bayes Rule
Let $A_1, A_2, \ldots, A_k$ be mutually exclusive and exhaustive events. If an event $B$ occurs, then
$P(A_j|B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^{k} P(B|A_i)P(A_i)}, \quad j = 1, \ldots, k$
Example: Only 1 in 1000 adults is afflicted with a rare disease for which a diagnostic test
has been developed. The test is such that when an individual actually has the disease, a
positive result will occur 99% of the time, whereas an individual without the disease will
show a positive test result only 2% of the time. If a randomly selected individual is tested
and the result is positive, what is the probability that the individual has the disease?
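A short sketch of how Bayes' rule applies to this example, with $A_1$ = "has the disease", $A_2$ = "does not have the disease", and $B$ = "positive test". The variable names are illustrative only; the three input numbers come from the example statement.

```python
# Inputs taken from the example statement.
prior_disease = 0.001          # P(A1): 1 in 1000 adults has the disease
sensitivity = 0.99             # P(B | A1): positive result given disease
false_positive_rate = 0.02     # P(B | A2): positive result given no disease

prior_healthy = 1 - prior_disease        # P(A2)

# Law of total probability: P(B) = P(B|A1)P(A1) + P(B|A2)P(A2)
p_positive = sensitivity * prior_disease + false_positive_rate * prior_healthy

# Bayes' rule: P(A1 | B) = P(B|A1)P(A1) / P(B)
posterior = sensitivity * prior_disease / p_positive

print(round(posterior, 4))
```

The posterior probability is about 0.047: because the disease is rare, most positive results come from the much larger group of people without the disease.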
Independence
Two events $A$ and $B$ are independent if
$P(A \cap B) = P(A)P(B)$
$A$, $B$ and $C$ are mutually independent if every pair of them is independent and
$P(A \cap B \cap C) = P(A)P(B)P(C)$
Example: Two cards are drawn from a deck of 52 cards. Calculate the probability that the draw includes an ace and a ten.
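One way to approach this example is by counting equally likely simple events. The sketch below assumes the two cards are drawn without replacement from a standard deck (4 aces, 4 tens) and that order does not matter; these assumptions are not stated in the notes.

```python
from fractions import Fraction
from math import comb

# Favorable hands: one of the 4 aces together with one of the 4 tens.
favorable = comb(4, 1) * comb(4, 1)

# All unordered two-card hands from a 52-card deck.
total = comb(52, 2)

p_ace_and_ten = Fraction(favorable, total)
print(p_ace_and_ten, float(p_ace_and_ten))
```

Under these assumptions the probability is 16/1326 = 8/663, or about 0.012.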
Suggested Exercises for Chapter 2: 3, 11, 13, 17, 21, 23, 25, 45, 47, 49, 51, 63, 71, 73,
77,79, 80, 83, 87, 91
CHAPTER 3
Random Variable
A random variable is a rule that associates a number with each outcome of an experiment (that is, with each outcome in the sample space S).
Bernoulli random variable: any random variable whose only possible values are 0 and 1.
Example: Three automobiles are selected at random, and each is categorized as having a
diesel (S) or nondiesel (F) engine. If X=the number of cars among the three with diesel
engine, list each outcome in S and its associated X value.
y       45   46   47   48   49   50   51   52   53   54   55
p(y)    .05  .10  .12  .14  .25  .17  .06  .05  .03  .02  .01

a. What is the probability that the flight will accommodate all ticketed passengers who show up?
b. What is the probability that not all ticketed passengers who show up can be accommodated?
An automobile service facility specializing in engine tune-ups knows that 45% of all tune-ups are done on four-cylinder automobiles, 40% on six-cylinder automobiles, and 15% on eight-cylinder automobiles. Let $X$ = the number of cylinders on the next car to be tuned.
What is the pmf of $X$?
$p(x; \alpha) = \begin{cases} 1 - \alpha & \text{if } x = 0 \\ \alpha & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}$
Example: Starting at a fixed time, we observe the gender of each newborn child until a boy (B) is born. Let $p = P(B)$, and define the random variable $X$ by $X$ = number of births observed. Then
$p(x) = \begin{cases} (1-p)^{x-1} p & x = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$
For any number $x$, the cumulative distribution function $F(x)$ is the probability that the observed value of $X$ will be at most $x$.
The cumulative distribution function for the random variable in the above example is
$F(x) = \begin{cases} 1 - (1-p)^{[x]} & x \ge 1 \\ 0 & x < 1 \end{cases}$
where $[x]$ is the largest integer less than or equal to $x$.
1 2 3 4
.4 .3 .2 .1
Based on the definition of the cdf, for any two numbers $a$ and $b$ with $a \le b$,
$P(a \le X \le b) = F(b) - F(a^-)$
where $a^-$ represents the largest possible $X$ value that is strictly less than $a$. In particular, if the only possible values are integers and if $a$ and $b$ are integers, then
$P(a \le X \le b) = F(b) - F(a - 1)$
Taking $a = b$ yields $P(X = a) = F(a) - F(a - 1)$.
$F(x) = \begin{cases} 0 & x < 1 \\ 0.30 & 1 \le x < 3 \\ 0.40 & 3 \le x < 4 \\ 0.45 & 4 \le x < 6 \\ 0.60 & 6 \le x < 12 \\ 1 & 12 \le x \end{cases}$
a. What is the pmf of X?
b. Using just the cdf, compute $P(3 \le X \le 6)$ and $P(4 \le X)$.
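The two parts of this exercise can be sketched in code. The jump points and cdf values below are read from the piecewise cdf in the exercise; the helper name `F` and the dictionary `pmf` are illustrative. The pmf is the size of each jump, and because all possible values are integers, $F(a^-) = F(a - 1)$.

```python
# Jump points of the cdf and the value F takes from each point onward.
cdf_steps = [(1, 0.30), (3, 0.40), (4, 0.45), (6, 0.60), (12, 1.00)]

def F(x):
    """Step-function cdf: value at the largest jump point not exceeding x."""
    value = 0.0
    for point, cumulative in cdf_steps:
        if x >= point:
            value = cumulative
    return value

# Part a: the pmf is the size of the jump at each possible value.
pmf = {}
previous = 0.0
for point, cumulative in cdf_steps:
    pmf[point] = round(cumulative - previous, 10)
    previous = cumulative

# Part b: possible values are integers, so F(a-) = F(a - 1).
p_3_to_6 = F(6) - F(3 - 1)       # P(3 <= X <= 6)
p_at_least_4 = 1 - F(4 - 1)      # P(4 <= X)

print(pmf, p_3_to_6, p_at_least_4)
```

The possible values are 1, 3, 4, 6 and 12 with probabilities 0.30, 0.10, 0.05, 0.15 and 0.40, which gives $P(3 \le X \le 6) = 0.30$ and $P(4 \le X) = 0.60$.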
Expected Values of Discrete Random Variable
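$E(X) = \mu_X = \sum_{x \in D} x \, p(x)$
The expected value of a function $h(X)$ is $E[h(X)] = \sum_{x \in D} h(x) p(x)$.
The expected value of a linear function is $E(aX + b) = aE(X) + b$; therefore, for any constants $a$ and $b$,
$E(aX) = aE(X)$
$E(X + b) = E(X) + b$
The variance of $X$ is
$V(X) = \sigma_X^2 = \sum_{x \in D} (x - \mu)^2 p(x) = E[(X - \mu)^2]$.
Also
$V(X) = E(X^2) - [E(X)]^2 = \left[\sum_{x \in D} x^2 p(x)\right] - \mu^2$.
The standard deviation of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.
The variance of a function $h(X)$ is
$V[h(X)] = \sum_{x \in D} \{h(x) - E[h(X)]\}^2 p(x)$.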
Compute
a. E(X)
b. V (X)
c. The standard deviation of X.
Binomial Distribution
A binomial experiment is one that has these five characteristics:
1. The experiment consists of n identical trials.
2. Each trial results in one of two outcomes. One outcome is called a success, S, and the other a failure, F.
3. The probability of success on a single trial is equal to $p$ and the probability of failure is equal to $(1 - p) = q$.
4. The trials are independent.
5. We are interested in $X$, the number of successes observed during the n trials, for $x = 0, 1, \ldots, n$.
Example: A marksman hits a target 80% of the time. He fires five shots at the target.
What is the probability that exactly 3 shots hit the target? What is the probability that
more than 3 shots hit the target?
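The marksman example can be worked with the standard binomial pmf, $P(X = x) = \binom{n}{x} p^x q^{n-x}$, which these notes do not write out explicitly. The following sketch assumes the five shots are independent with the same hit probability; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials, success prob p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.8     # five shots, each hitting the target 80% of the time

p_exactly_3 = binomial_pmf(3, n, p)
p_more_than_3 = sum(binomial_pmf(x, n, p) for x in range(4, n + 1))

print(round(p_exactly_3, 4), round(p_more_than_3, 5))
```

This gives $P(X = 3) = 0.2048$ and $P(X > 3) = P(X = 4) + P(X = 5) = 0.73728$.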
Cumulative Probability Tables
You can use the cumulative probability tables to find probabilities for selected binomial
distributions.
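$P(X = x) = \frac{C_x^M \, C_{n-x}^{N-M}}{C_n^N}, \quad \max(0, n - N + M) \le x \le \min(n, M)$
where $N$ is the population size, $M$ is the number of successes in the population, and $n$ is the sample size.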
Example: A candy dish contains five blue and three red candies. A child reaches up and
selects three candies without looking.
a. What is probability that there are two blue and one red candies in the selection?
b. What is the probability that the candies are all red?
c. What is the probability that the candies are all blue?
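The candy example fits the counting formula with $N = 8$ candies, $M = 5$ blue candies and a sample of $n = 3$, taking $X$ = number of blue candies selected. This is a sketch under that labeling; the function name `hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, n, M, N):
    """P(X = x): x successes in a sample of n from N items, M of them successes."""
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

N, M, n = 8, 5, 3     # 8 candies in total, 5 blue, 3 selected

p_two_blue_one_red = hypergeom_pmf(2, n, M, N)   # part a
p_all_red = hypergeom_pmf(0, n, M, N)            # part b: no blue candies
p_all_blue = hypergeom_pmf(3, n, M, N)           # part c

print(p_two_blue_one_red, p_all_red, p_all_blue)
```

Using exact fractions, the answers are 15/28, 1/56 and 5/28.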
Examples:
An instructor who taught two sections of engineering statistics last term, the first with 20 students and the second with 30, decided to assign a term project. After all projects had been turned in, the instructor randomly ordered them before grading. Consider the first 15 graded projects.
a. What is the probability that exactly 10 of these are from the second section?
b. What is the probability that at least 10 of these are from the second section?
c. What is the probability that at least 10 of these are from the same section?
d. What is the mean value and standard deviation of the number among these 15 that are
from the second section?
e. What are the mean value and standard deviation of the number of projects not among
these first 15 that are from the second section?
A family decides to have children until it has three children of the same gender. Assuming
P (B) = P (G) = 0.5, what is the pmf of X = the number of children in the family?
E(X) =
Poisson Distribution
The Poisson random variable $X$ is a model for data that represent the number of occurrences of a specified event in a given unit of time or space.
Examples:
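The probability of exactly $x$ occurrences is
$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$
where $\lambda$ is the average number of occurrences in the given unit of time or space.
The mean and variance of the Poisson random variable $X$ are
Mean: $E(X) = \lambda$
Variance: $V(X) = \lambda$
so the standard deviation is $\sqrt{\lambda}$.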
Example: Suppose pulses arrive at the counter at an average rate of six per minute, what
is the probability that in a 0.5-min interval at least one pulse is received?
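A minimal sketch of the pulse example, assuming the number of pulses in an interval is Poisson with mean equal to the rate times the interval length (so $\lambda = 6 \times 0.5 = 3$). The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

rate_per_minute = 6
interval_minutes = 0.5
lam = rate_per_minute * interval_minutes     # expected pulses in the interval

# "At least one" is the complement of "none".
p_at_least_one = 1 - poisson_pmf(0, lam)

print(lam, round(p_at_least_one, 4))
```

The result is $1 - e^{-3}$, or about 0.950.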
Cumulative Probability Tables
You can use the cumulative probability tables to find probabilities for selected Poisson
distributions.
Find the column for the correct value of $\lambda$.
The row marked $k$ gives the cumulative probability, $P(x \le k) = P(x = 0) + \cdots + P(x = k)$
The Poisson Approximation to the Binomial Distribution
The Poisson probability distribution provides a simple, easy-to-compute, and accurate approximation to binomial probabilities when $n$ is large and $\lambda = np$ is small, preferably with $n > 50$ and $np < 5$, i.e.
$b(x; n, p) \to p(x; \lambda)$ when $n \to \infty$, $p \to 0$, with $np \to \lambda$
Examples:
1. The number X of people entering the intensive care unit at the particular hospital on
any one day has a Poisson probability distribution with mean equal to five persons per
day.
a. What is the probability that the number of people entering the intensive care unit
one particular day is two? Less than or equal to two?
b. Is it likely that X will exceed 10? Explain.
2. Sporadic outbreaks of E.coli have occurred at a rate of 2.5 per 100,000 for period of
one year.
a. What is the probability that at most five cases of E.coli per 100,000 are reported
in a given year?
b. What is the probability that more than five cases of E.coli per 100,000 are reported
in a given year?
Suggested Exercises from Chapter 3: 7, 11, 13, 17, 23, 29, 39, 47, 49, 55, 57, 65,
69, 71, 73, 79, 81, 85, 95, 97, 101, 103, 109,
CHAPTER 4
Continuous Random Variables and Probability Distributions
Basic definitions and properties of continuous random variables
Continuous random variable: A random variable is continuous if its set of possible values is an entire interval of numbers.
Probability distribution for continuous variables: A probability histogram (the same as a relative frequency histogram) can be constructed for a continuous variable. As the variable is measured more and more finely, the resulting histogram approaches a smooth curve. The total area under this curve is 1, and the probability that the variable falls between two points is the area under the curve between those points. A probability distribution or probability density function (pdf) for a continuous random variable $X$ is therefore a function $f(x)$ such that
$P(a \le X \le b) = \int_a^b f(x)\,dx$
Uniform Distribution
A continuous random variable $X$ has a uniform distribution on the interval $[a, b]$ if the pdf of $X$ is
$f(x; a, b) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
Cumulative Distribution Function
As for a discrete random variable, the probabilities of intervals can be computed from $F(x)$ as
$P(X > a) = 1 - F(a)$,
$P(a \le X \le b) = F(b) - F(a)$
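$E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$
$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E[(X - \mu)^2]$
$\sigma_X = \sqrt{V(X)}$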
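When $X \sim N(\mu, \sigma^2)$, the probability $P(a \le X \le b)$ is
$\int_a^b \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / (2\sigma^2)}\,dx$
To evaluate this expression, use the standard normal distribution ($\mu = 0$, $\sigma = 1$), which is tabulated for different values of $a$ and $b$.
A normal random variable with $\mu = 0$ and $\sigma = 1$ is called a standard normal random variable and is denoted by $Z$. The pdf of $Z$ is
$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
The cdf of $Z$ is $P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy$, which is denoted by $\Phi(z)$.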
$z_\alpha$ Notation
$z_\alpha$ denotes the value on the z axis for which $\alpha$ of the area under the z curve lies to the right of $z_\alpha$. Thus $z_\alpha$ is the $100(1 - \alpha)$th percentile of the standard normal distribution.
Examples:
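If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma}$ has a standard normal distribution, so
$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$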
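Let $X$ be a binomial random variable with mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$. If the probability histogram is not too skewed, $X$ has approximately a normal distribution with the same mean and standard deviation, and
$P(X \le x) \approx \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right)$
The condition for this approximation is both np > 10 and nq > 10.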
Example: Suppose only 40% of all drivers in a certain state wear a seatbelt.
A random sample of 500 drivers is selected, what is the probability that
a. Between 180 and 230 (inclusive) of the drivers in the sample wear a seatbelt?
b. Fewer than 170 of those in the sample wear a seatbelt? Fewer than 150?
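The seatbelt example can be sketched with the normal approximation to the binomial. The code below assumes the usual continuity correction of 0.5 and uses `statistics.NormalDist` for $\Phi$; the helper name `approx_cdf` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 500, 0.40
mu = n * p                          # np = 200
sigma = sqrt(n * p * (1 - p))       # sqrt(npq)
Z = NormalDist()                    # standard normal distribution

def approx_cdf(x):
    """Approximate P(X <= x) for the binomial count, with continuity correction."""
    return Z.cdf((x + 0.5 - mu) / sigma)

# a. P(180 <= X <= 230) = P(X <= 230) - P(X <= 179)
p_180_to_230 = approx_cdf(230) - approx_cdf(179)

# b. "Fewer than 170" means X <= 169; "fewer than 150" means X <= 149.
p_fewer_170 = approx_cdf(169)
p_fewer_150 = approx_cdf(149)

print(round(p_180_to_230, 4), round(p_fewer_170, 4), p_fewer_150)
```

Here $np = 200$ and $nq = 300$, so the condition for the approximation is comfortably met. Part (a) comes out near 0.97, while the probabilities in part (b) are very small.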
The Gamma Distribution
The normal distribution is bell shaped and symmetric, but many random variables have skewed distributions. For these variables, first define the gamma function. For $\alpha > 0$, the gamma function $\Gamma(\alpha)$ is
$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx$
3. $\Gamma(\frac{1}{2}) = \sqrt{\pi}$
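In general, a continuous random variable $X$ has a gamma distribution if the pdf of $X$ is
$f(x; \alpha, \beta) = \begin{cases} \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta} & x > 0 \\ 0 & \text{otherwise} \end{cases}$
$\alpha$ and $\beta$ are the parameters of the distribution, with $\alpha > 0$ and $\beta > 0$.
For the standard gamma distribution $\beta = 1$, so the pdf of a standard gamma random variable is
$f(x; \alpha) = \begin{cases} \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)} & x > 0 \\ 0 & \text{otherwise} \end{cases}$
$E(X) = \mu = \alpha\beta$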
Examples:
1. Evaluate the following:
$\Gamma(6)$
$\Gamma(5/2)$
$F(5; 4)$
P (3 < X < 8)
P (X < 4 or X > 6)
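The first three items can be checked numerically. The sketch below assumes $F(x; \alpha)$ denotes the cdf of the standard gamma distribution ($\beta = 1$). For a positive integer $\alpha$ that cdf has the closed form $1 - e^{-x}\sum_{k=0}^{\alpha-1} x^k/k!$, which is what the illustrative helper `standard_gamma_cdf_int` uses; it is not valid for non-integer $\alpha$.

```python
from math import exp, factorial, gamma, pi, sqrt

gamma_6 = gamma(6)          # Gamma(n) = (n - 1)! for a positive integer n
gamma_5_2 = gamma(2.5)      # Gamma(5/2) = (3/2)(1/2)Gamma(1/2) = 0.75 * sqrt(pi)

def standard_gamma_cdf_int(x, alpha):
    """Standard gamma cdf F(x; alpha), valid for positive integer alpha only."""
    return 1 - exp(-x) * sum(x ** k / factorial(k) for k in range(alpha))

F_5_4 = standard_gamma_cdf_int(5, 4)

print(gamma_6, round(gamma_5_2, 4), round(F_5_4, 3))
```

This gives $\Gamma(6) = 120$, $\Gamma(5/2) \approx 1.329$ and $F(5; 4) \approx 0.735$.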
P (X > t + t0 |X > t0 ) =
CHAPTER 5
Jointly Distributed Random Variable
In some situations an experiment involves more than one variable, and the researcher is interested in studying the joint behavior of several variables at the same time.
Joint Probability Mass Function for Two Discrete Random Variables:
Let $X$ and $Y$ be discrete random variables. The joint pmf $p(x, y)$ is defined for each pair of numbers $(x, y)$ by
$p(x, y) = P(X = x \text{ and } Y = y)$,
and the probability $P[(X, Y) \in A]$ can be found by
$P[(X, Y) \in A] = \sum\sum_{(x,y) \in A} p(x, y)$
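The marginal pmfs of $X$ and $Y$ are
$p_X(x) = \sum_y p(x, y)$, $\quad p_Y(y) = \sum_x p(x, y)$
For two continuous random variables $X$ and $Y$ with joint pdf $f(x, y)$,
$P(a \le X \le b, c \le Y \le d) = \int_a^b \int_c^d f(x, y)\,dy\,dx$.
The marginal pdfs of $X$ and $Y$ are
$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$, $\quad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$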
Two continuous random variables X and Y are independent, if for every pair of x and y
f (x, y) = fX (x)fY (y)
Example: Each front tire on a particular type of vehicle is supposed to be filled to a pressure
of 26 psi. Suppose the actual air pressure in each tire is a random variable (X) for the right
tire and (Y ) for the left tire, with joint pdf
$f(x, y) = \begin{cases} K(x^2 + y^2) & 20 \le x \le 30, \; 20 \le y \le 30 \\ 0 & \text{otherwise} \end{cases}$
a. What is the value of $K$?
b. What is the probability that both tires are underfilled?
c. What is the probability that the difference in air pressure between the two tires is at most
2 psi?
d. Determine the distribution of air pressure in the right tire alone.
e. Are X and Y independent rvs?
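Parts (a) and (b) only need the integral of $x^2 + y^2$ over a rectangle, which has a closed form. The sketch below uses exact fractions and treats "underfilled" as a pressure below the target of 26 psi; the helper names are illustrative.

```python
from fractions import Fraction

def integral_of_square(lo, hi):
    """Integral of t^2 from lo to hi."""
    return Fraction(hi ** 3 - lo ** 3, 3)

def integral_x2_plus_y2(x_lo, x_hi, y_lo, y_hi):
    """Double integral of (x^2 + y^2) over the rectangle [x_lo, x_hi] x [y_lo, y_hi]."""
    return (integral_of_square(x_lo, x_hi) * (y_hi - y_lo)
            + integral_of_square(y_lo, y_hi) * (x_hi - x_lo))

# a. K makes the joint pdf integrate to 1 over 20 <= x, y <= 30.
K = 1 / integral_x2_plus_y2(20, 30, 20, 30)

# b. Both tires underfilled: X < 26 and Y < 26.
p_both_under = K * integral_x2_plus_y2(20, 26, 20, 26)

print(K, float(p_both_under))
```

The normalizing constant is $K = 3/380{,}000$, and the probability that both tires are underfilled is 0.3024.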
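For two continuous rvs $X$ and $Y$, the conditional pdf of $Y$ given that $X = x$ is
$f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}, \quad -\infty < y < \infty$
If $X$ and $Y$ are discrete, the conditional pmf of $Y$ given that $X = x$ is
$p_{Y|X}(y|x) = \frac{p(x, y)}{p_X(x)}, \quad -\infty < y < \infty$
The covariance between $X$ and $Y$ is $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$, which equals
$\sum_x \sum_y (x - \mu_X)(y - \mu_Y) p(x, y)$ if $X$ and $Y$ are discrete
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) f(x, y)\,dx\,dy$ if $X$ and $Y$ are continuous
Also
$Cov(X, Y) = E(XY) - \mu_X \mu_Y$
The correlation coefficient of two random variables is
$Corr(X, Y) = \rho_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$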
p(x, y)              y
            0      5      10     15
x     0     .02    .06    .02    .10
      5     .04    .15    .20    .10
      10    .01    .15    .14    .01
a. What is E(X + Y )?
b. What is expected value for maximum of X and Y ?
c. Compute the covariance for X and Y .
d. Compute $\rho$ for $X$ and $Y$.
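All four parts reduce to expectations of functions of $(X, Y)$ over the joint pmf. The sketch below assumes the table is read with $x \in \{0, 5, 10\}$ as rows and $y \in \{0, 5, 10, 15\}$ as columns (the layout under which the probabilities sum to 1); the helper name `expect` is illustrative.

```python
from math import sqrt

x_values = [0, 5, 10]
y_values = [0, 5, 10, 15]
table = [
    [0.02, 0.06, 0.02, 0.10],   # x = 0
    [0.04, 0.15, 0.20, 0.10],   # x = 5
    [0.01, 0.15, 0.14, 0.01],   # x = 10
]
joint = {(x, y): table[i][j]
         for i, x in enumerate(x_values)
         for j, y in enumerate(y_values)}

def expect(h):
    """E[h(X, Y)] = sum over all (x, y) of h(x, y) * p(x, y)."""
    return sum(h(x, y) * prob for (x, y), prob in joint.items())

e_sum = expect(lambda x, y: x + y)          # a. E(X + Y)
e_max = expect(lambda x, y: max(x, y))      # b. E[max(X, Y)]

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y          # c. Cov(X, Y)

var_x = expect(lambda x, y: x * x) - mu_x ** 2
var_y = expect(lambda x, y: y * y) - mu_y ** 2
rho = cov / sqrt(var_x * var_y)                         # d. correlation

print(round(e_sum, 2), round(e_max, 2), round(cov, 4), round(rho, 3))
```

With this reading of the table, $E(X + Y) = 14.10$, $E[\max(X, Y)] = 9.60$, $Cov(X, Y) \approx -3.20$ and $\rho \approx -0.207$.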
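A statistic is any quantity that is calculated from sample data, such as the sample mean $\bar{X}$.
Random variables $X_1, X_2, \ldots, X_n$ form a random sample of size n if
1. The $X_i$'s are independent random variables.
2. Every $X_i$ has the same probability distribution.
If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with mean $\mu$ and variance $\sigma^2$, then
1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$ (so $\bar{X}$ is unbiased)
2. $V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$
If the distribution is normal, then
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
also, for the sample total $T = X_1 + \cdots + X_n$,
$T \sim N(n\mu, n\sigma^2)$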
The Central limit theorem
For a random sample $X_1, X_2, \ldots, X_n$ from a distribution with mean $\mu$ and variance $\sigma^2$, the sample mean has approximately a normal distribution with mean $\mu$ and variance $\sigma^2/n$, if n is sufficiently large.
(The sample total also has approximately a normal distribution.)
As a rule of thumb, the central limit theorem can be used if n > 30.
Example: The inside diameter of a randomly selected piston ring is a random variable with mean value 12 cm and standard deviation 0.04 cm.
a. If $\bar{X}$ is the sample mean for a random sample of n = 16 rings, where is the sampling distribution of $\bar{X}$ centered, and what is the standard deviation of the $\bar{X}$ distribution?
b. Answer the questions in part (a) for a sample size of n = 64 rings.
c. For which of the two random samples is $\bar{X}$ more likely to be within 0.01 cm of 12 cm?
d. Calculate $P(11.99 \le \bar{X} \le 12.01)$ when n = 64.
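A sketch of the calculations in this example, assuming $\bar{X}$ is treated as approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. For n = 16 this is a rough approximation, since the sample size is below the usual rule of thumb. The helper name `prob_within` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 12.0, 0.04       # population mean and standard deviation (cm)

def standard_error(n):
    """Standard deviation of the sample mean for a sample of size n."""
    return sigma / sqrt(n)

def prob_within(n, half_width=0.01):
    """Approximate P(mu - half_width <= sample mean <= mu + half_width)."""
    dist = NormalDist(mu, standard_error(n))
    return dist.cdf(mu + half_width) - dist.cdf(mu - half_width)

se_16, se_64 = standard_error(16), standard_error(64)    # parts a and b
p_16, p_64 = prob_within(16), prob_within(64)            # parts c and d

print(se_16, se_64, round(p_16, 4), round(p_64, 4))
```

The standard deviation of $\bar{X}$ drops from 0.01 cm to 0.005 cm when n goes from 16 to 64, so the larger sample is more likely to give a mean within 0.01 cm of 12 cm (about 0.95 versus about 0.68).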
Example: Let $X_1, X_2, X_3, X_4, X_5$ be the observed numbers of miles per gallon for five cars. Suppose these variables are independent and normally distributed with $\mu_1 = \mu_2 = 20$, $\mu_3 = \mu_4 = \mu_5 = 21$, and $\sigma^2 = 4$ for $X_1$ and $X_2$ and $\sigma^2 = 3.5$ for the others. Define $Y$ as
$Y = \frac{X_1 + X_2}{2} - \frac{X_3 + X_4 + X_5}{3}$
Suggested Exercises for Chapter 5: 3, 5, 11, 13, 15, 19, 25, 27, 31, 37, 39, 41, 47, 49,
51, 55, 59, 63, 65, 69, 73, 75,
CHAPTER 6
Point Estimate
The goal in this section is to estimate a population parameter based on a random sample of size n. A single number used as an estimate of a parameter is called a point estimate. A point estimate is therefore the value of a suitable statistic computed from the sample data. For example:
$\bar{X}$, the sample mean, is a point estimator for the population mean $\mu$.
$\tilde{X}$, the sample median, is a point estimator for the population median $\tilde{\mu}$.
$\hat{p}$, the sample proportion, is a point estimator for the population proportion $p$.
$s^2$, the sample variance, is a point estimator for the population variance $\sigma^2$; an alternative estimator for $\sigma^2$ is $\frac{\sum (X_i - \bar{X})^2}{n}$.
xi
x
xi
X
Examples:
$f(x; \theta) = \begin{cases} (\theta + 1) x^{\theta} & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$
CHAPTER 7
Statistical Intervals Based on a Single Sample
A point estimate reports a single number and does not provide any information about the precision and reliability of the estimation. An alternative is an interval estimate. A confidence interval is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval.
For example, a confidence level of 95% implies that 95% of all samples would give an interval that includes the parameter.
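Confidence Intervals for $\mu$
Suppose that the parameter of interest is the population mean $\mu$ and also that
the population distribution is normal, and
the population standard deviation $\sigma$ is known.
Since the area under the standard normal curve between $-1.96$ and $1.96$ is 0.95,
$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
then
$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$,
so $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}, \; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ is a 95% CI for $\mu$.
In general, a $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$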
Choice of Sample Size
The width of the 95% interval is $2(1.96)\sigma/\sqrt{n}$, which specifies the precision or accuracy of the interval. It is possible to determine n for a desired width, i.e.,
$n = \left(2 z_{\alpha/2} \frac{\sigma}{w}\right)^2$
where $w$ is the width of the interval.
c. Compute a 99% CI for $\mu$ when n = 100 and $\bar{x} = 58.3$.
d. Compute an 82% CI for $\mu$ when n = 100 and $\bar{x} = 58.3$.
e. How large must n be if the width of the 99% interval for $\mu$ is to be 1.0?
For large n,
$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
has approximately a standard normal distribution, so
$\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$
is a $100(1-\alpha)\%$ large-sample CI for $\mu$, which does not depend on the shape of the population distribution.
Example: A random sample of 110 lightning flashes in a certain region resulted in a sample average radar echo duration of 0.81 sec and a sample standard deviation of 0.34 sec. Calculate a 99% CI for the true average echo duration $\mu$, and interpret the resulting interval.
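A sketch of the large-sample interval $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$ for this example. The critical value is taken from `statistics.NormalDist().inv_cdf` instead of a table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 110, 0.81, 0.34      # sample size, sample mean, sample sd (sec)
confidence = 0.99
alpha = 1 - confidence

# z_{alpha/2}: the value with area alpha/2 to its right under the z curve.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

margin = z_crit * s / sqrt(n)
ci = (xbar - margin, xbar + margin)

print(round(z_crit, 3), tuple(round(limit, 3) for limit in ci))
```

The interval is roughly (0.727, 0.893) sec. The interpretation is that the method used to build it captures the true average echo duration for 99% of all samples.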
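For a population proportion, the sample size needed for a large-sample interval of width $w$ is approximately
$n = \frac{4 z_{\alpha/2}^2 \, \hat{p}\hat{q}}{w^2}$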
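When the population is normal and $\sigma$ is unknown, a $100(1-\alpha)\%$ CI for $\mu$ is
$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$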
CHAPTER 8
Test of Hypotheses Based on a Single Sample
Hypothesis testing is a method for deciding which of two contradictory claims about a parameter is correct. Here the parameters of interest are the population mean and the population proportion.
Hypotheses and Test Procedures
In any hypothesis testing problem, there are:
Null Hypothesis, denoted by $H_0$, which is the initial assumption about the population parameter. $H_0$ is an equality claim of the form $H_0: \theta = \theta_0$, where $\theta$ is the parameter of interest.
Alternative Hypothesis, denoted by $H_a$, which contradicts $H_0$ and takes one of the following forms:
$H_a: \theta > \theta_0$
$H_a: \theta < \theta_0$
$H_a: \theta \ne \theta_0$
Test of Hypotheses is a method for using sample data to decide whether $H_0$ should be rejected.
Test Procedure is a rule, based on sample data, for deciding whether to reject $H_0$. It consists of:
Test Statistic, a function of the sample data on which the decision is based.
Rejection Region, the set of all test statistic values for which $H_0$ will be rejected.
that the weight on each trial is normally distributed with $\sigma = 0.2$ kg. Let $\mu$ denote the true average weight reading on the scale.
a. What hypotheses should be tested?
b. Suppose the scale is to be recalibrated if either $\bar{x} \ge 10.1032$ or $\bar{x} \le 9.8968$. What is the probability that recalibration is carried out when it is actually unnecessary?
c. What is the probability that recalibration is judged unnecessary when in fact $\mu = 10.1$? When $\mu = 9.83$?
d. Let $z = (\bar{x} - 10)/(\sigma/\sqrt{n})$. For what value $c$ is the rejection region of part (b) equivalent to the two-tailed region either $z \ge c$ or $z \le -c$?
e. If the sample size were only 10 rather than 25, how should the procedure of part (d) be altered so that $\alpha = 0.05$?
f. Using the test of part (e), what would you conclude from the following sample data:
9.981 10.006 9.857 10.107 9.888 9.728 10.439 10.214 10.190 9.793
g. Reexpress the test procedure of part (b) in terms of the standardized test statistic $Z = (\bar{X} - 10)/(\sigma/\sqrt{n})$.
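Parts (b) and (d) can be sketched as a type I error calculation. The code assumes the null value is $\mu_0 = 10$ (as in the test statistic of part d) and that the sample size is n = 25, which is inferred from the wording of part (e); the start of the exercise is missing from these notes.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 10.0, 0.2, 25       # n = 25 is inferred from part (e)
upper_cutoff, lower_cutoff = 10.1032, 9.8968

se = sigma / sqrt(n)                 # standard deviation of the sample mean
null_dist = NormalDist(mu0, se)      # distribution of the sample mean under H0

# b. P(recalibrate | recalibration unnecessary) = P(type I error)
alpha = (1 - null_dist.cdf(upper_cutoff)) + null_dist.cdf(lower_cutoff)

# d. The same cutoffs expressed on the z scale.
c = (upper_cutoff - mu0) / se

print(round(c, 2), round(alpha, 4))
```

Under these assumptions the cutoffs correspond to $c = 2.58$, and the probability of an unnecessary recalibration is about 0.01.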
$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
has a standard normal distribution when $H_0$ is true, and it is the test statistic for $H_0: \mu = \mu_0$. First consider $H_a: \mu > \mu_0$. The rejection region is determined by the type I error probability (the level of significance, denoted by $\alpha$). If $\alpha = 0.05$, then because $Z$ is standard normal the cutoff point $c$ is 1.645, and $H_0$ is rejected if $z \ge 1.645$. The same argument applies to the other alternatives.
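In general
Null hypothesis: $H_0: \mu = \mu_0$
Test statistic value: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Alternative hypothesis      Rejection region for a level $\alpha$ test
$H_a: \mu > \mu_0$          $z \ge z_\alpha$ (upper-tailed test)
$H_a: \mu < \mu_0$          $z \le -z_\alpha$ (lower-tailed test)
$H_a: \mu \ne \mu_0$        $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed test)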
Example: The melting point of 16 samples of a certain brand of hydrogenated vegetable oil was determined, resulting in $\bar{x} = 94.32$. Assume that the distribution of melting point is normal with $\sigma = 1.20$. Test $H_0: \mu = 95$ versus $H_a: \mu \ne 95$ using a two-tailed level 0.01 test.
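A sketch of the two-tailed z test for this example. The critical value and P-value come from `statistics.NormalDist` in place of a table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, sigma, n = 95.0, 94.32, 1.20, 16
alpha = 0.01
Z = NormalDist()

# Test statistic value: z = (xbar - mu0) / (sigma / sqrt(n))
z = (xbar - mu0) / (sigma / sqrt(n))

# Two-tailed rejection region: z >= z_{alpha/2} or z <= -z_{alpha/2}
z_crit = Z.inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= z_crit

# Two-tailed P-value for comparison.
p_value = 2 * (1 - Z.cdf(abs(z)))

print(round(z, 2), round(z_crit, 3), reject_h0, round(p_value, 3))
```

The test statistic is about $-2.27$, which does not fall in the rejection region ($|z| \ge 2.576$), so $H_0$ is not rejected at level 0.01.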
2. Large-Sample Tests
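When the population is not normal and $\sigma$ is also unknown, then for large n, again based on the CLT,
$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
has approximately a standard normal distribution when $H_0$ is true.
When the population is normal, $\sigma$ is unknown and n is small, the statistic
$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
has a t distribution with $n - 1$ degrees of freedom when $H_0$ is true, and the test statistic value is
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$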
Example: The amount of shaft wear after a fixed mileage was determined for each of n = 8 internal combustion engines having copper lead as a bearing material, resulting in $\bar{x} = 3.72$ and $s = 1.254$. Assuming that the distribution of shaft wear is normal with mean $\mu$, use the t test at level 0.05 to test $H_0: \mu = 3.5$ versus $H_a: \mu > 3.5$.
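For a population proportion $p$ with null hypothesis $H_0: p = p_0$, the test statistic value is
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
$H_a: p > p_0$          $z \ge z_\alpha$ (upper-tailed test)
$H_a: p < p_0$          $z \le -z_\alpha$ (lower-tailed test)
$H_a: p \ne p_0$        $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed test)
These tests are valid if $np_0 > 10$ and $n(1 - p_0) > 10$.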
Example: A random sample of 150 recent donations at a certain blood bank reveals that 92 were type A blood. Does this suggest that the actual percentage of type A donations differs from 40%, the percentage of the population having type A blood? Carry out a test of the appropriate hypotheses using a significance level of 0.01. Would your conclusion have been different if a significance level of 0.05 had been used?
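A sketch of the two-tailed large-sample test for a proportion, using the counts as they appear in the example (92 of 150) and $p_0 = 0.40$. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, successes, p0 = 150, 92, 0.40
Z = NormalDist()

p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Validity check from the notes: np0 and n(1 - p0) should both exceed 10.
conditions_ok = n * p0 > 10 and n * (1 - p0) > 10

# Two-tailed P-value, compared with each significance level.
p_value = 2 * (1 - Z.cdf(abs(z)))
reject_at_01 = p_value <= 0.01
reject_at_05 = p_value <= 0.05

print(round(p_hat, 4), round(z, 2), p_value, reject_at_01, reject_at_05)
```

With these numbers $z \approx 5.33$ and the P-value is far below both 0.01 and 0.05, so the conclusion (reject $H_0$) is the same at either level.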
P-Value is the smallest level of significance at which $H_0$ would be rejected. Once the P-value has been determined, the conclusion at any particular level $\alpha$ results from comparing the P-value to $\alpha$:
P-value $\le \alpha$: reject $H_0$ at level $\alpha$
P-value $> \alpha$: do not reject $H_0$ at level $\alpha$
Suggested Exercises for Chapter 8: 3, 7, 15, 17, 19, 23, 27, 31, 35, 37, 39, 47, 49, 53,
57, 59, 61, 65
CHAPTER 9
Inference Based on Two Samples
z Test and Confidence Intervals for a Difference Between Two Population Means
Case I: Normal populations with known variances
Assumptions:
1. $X_1, X_2, \ldots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$
2. $Y_1, Y_2, \ldots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$
3. The X and Y samples are independent of one another
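1. Null hypothesis: $H_0: (\mu_1 - \mu_2) = D_0$, where $D_0$ is some specified difference that you wish to test.
2. Alternative hypothesis:
One-Tailed Test
$H_a: (\mu_1 - \mu_2) > D_0$ or $(\mu_1 - \mu_2) < D_0$
Two-Tailed Test
$H_a: (\mu_1 - \mu_2) \ne D_0$
3. Test statistic:
$z = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}$
$100(1-\alpha)\%$ Confidence Interval
$(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$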
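1. Null hypothesis: $H_0: (\mu_1 - \mu_2) = D_0$, where $D_0$ is some specified difference that you wish to test.
2. Alternative hypothesis:
One-Tailed Test
$H_a: (\mu_1 - \mu_2) > D_0$ or $(\mu_1 - \mu_2) < D_0$
Two-Tailed Test
$H_a: (\mu_1 - \mu_2) \ne D_0$
3. Test statistic:
$z = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}$
Assumptions: The samples are randomly and independently selected from the two populations, and m > 30 and n > 30.
$100(1-\alpha)\%$ Confidence Interval
$(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$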
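1. Null hypothesis: $H_0: (\mu_1 - \mu_2) = D_0$, where $D_0$ is some specified difference that you wish to test.
2. Alternative hypothesis:
One-Tailed Test
$H_a: (\mu_1 - \mu_2) > D_0$ or $(\mu_1 - \mu_2) < D_0$
Two-Tailed Test
$H_a: (\mu_1 - \mu_2) \ne D_0$
3. Test statistic:
$t = \frac{(\bar{x} - \bar{y}) - D_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}$
$100(1-\alpha)\%$ Confidence Interval
$(\bar{x} - \bar{y}) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$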
Examples:
1. Random samples of 50 recent college graduates in each major were selected and the following information was obtained:
Major     Education     Social science
Mean      40554         38348
SD        2225          2375
a. Do the data provide sufficient evidence to indicate a difference in average starting salaries
for college graduates who majored in education and the social sciences? Test using $\alpha = 0.05$.
b. Find a 95% confidence interval for difference between means for the two groups in the
general population. Compare your result with part a.
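Because both samples have 50 observations, the large-sample z procedure applies. The sketch below takes $D_0 = 0$ and uses the sample standard deviations in place of the unknown population values; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table: education (x) and social science (y).
m, xbar, s1 = 50, 40554, 2225
n, ybar, s2 = 50, 38348, 2375
alpha = 0.05
Z = NormalDist()

diff = xbar - ybar
se = sqrt(s1 ** 2 / m + s2 ** 2 / n)

# a. Two-tailed test of H0: mu1 - mu2 = 0.
z = (diff - 0) / se
z_crit = Z.inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= z_crit

# b. 95% confidence interval for mu1 - mu2.
ci = (diff - z_crit * se, diff + z_crit * se)

print(round(z, 2), reject_h0, tuple(round(limit) for limit in ci))
```

The test statistic is about 4.79, so $H_0$ is rejected at $\alpha = 0.05$. The 95% interval, roughly (1304, 3108), does not contain 0, which agrees with the test.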
2. A geologist collected the titanium contents of the samples, found using two different
methods:
Method 1: 0.011, 0.013, 0.013, 0.015, 0.014, 0.013, 0.010, 0.013, 0.011, 0.012
Method 2: 0.011, 0.016, 0.013, 0.012, 0.015, 0.012, 0.017, 0.013, 0.014, 0.015
a. Use an appropriate method to test for a significant difference in the average titanium
contents using the two different methods.
b. Determine a 95% confidence interval estimate for $(\mu_1 - \mu_2)$. Does your interval estimate
support your conclusion in part a?
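1. Null hypothesis: $H_0: (p_1 - p_2) = 0$, or alternatively $H_0: p_1 = p_2$.
2. Alternative hypothesis:
One-Tailed Test
$H_a: (p_1 - p_2) > 0$ or $(p_1 - p_2) < 0$
Two-Tailed Test
$H_a: (p_1 - p_2) \ne 0$
3. Test statistic:
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{p_1 q_1}{m} + \frac{p_2 q_2}{n}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{pq}{m} + \frac{pq}{n}}}$
where $\hat{p}_1 = x/m$ and $\hat{p}_2 = y/n$. Since the common value of $p_1 = p_2 = p$ (used in the standard error) is unknown, it is estimated by
$\hat{p} = \frac{x + y}{m + n}$
and the test statistic is
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$
4. Rejection region: reject $H_0$ when
One-Tailed Test
$z > z_\alpha$ or $z < -z_\alpha$
Two-Tailed Test
$z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
or when p-value $< \alpha$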
Assumptions: Samples are selected in a random and independent manner from two
binomial populations and m and n are large enough.
Example: Independent random samples of 280 and 350 observations were selected from
binomial populations 1 and 2 respectively. Sample 1 had 132 successes, and sample 2 had
178 successes. Do the data present sufficient evidence to indicate that the proportion of
successes in population 1 is smaller than the proportion in population 2?
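A sketch of the pooled two-proportion z test for this example, with the lower-tailed alternative $H_a: p_1 - p_2 < 0$. The significance level 0.05 is an assumption, since the example does not state one.

```python
from math import sqrt
from statistics import NormalDist

m, x = 280, 132      # sample 1: size and number of successes
n, y = 350, 178      # sample 2: size and number of successes
alpha = 0.05         # assumed level; not given in the example

p1_hat, p2_hat = x / m, y / n
p_pooled = (x + y) / (m + n)                  # estimate of the common p under H0
q_pooled = 1 - p_pooled

z = (p1_hat - p2_hat) / sqrt(p_pooled * q_pooled * (1 / m + 1 / n))

# Lower-tailed test: small values of z support Ha: p1 < p2.
p_value = NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(round(z, 3), round(p_value, 3), reject_h0)
```

The test statistic is about $-0.93$ with a P-value near 0.18, so at the assumed 0.05 level the data do not provide sufficient evidence that the proportion in population 1 is smaller.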
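Two-Tailed Test
$H_a: \mu_d \ne 0$
3. Test statistic: $t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{\bar{d}}{s_d/\sqrt{n}}$
where
$n$ = number of paired differences
$\bar{d}$ = mean of the sample differences
$s_d$ = standard deviation of the sample differences
Reject $H_0$ when
One-Tailed Test
$t > t_\alpha$ (or $t < -t_\alpha$ when the alternative hypothesis is $H_a: \mu_d < 0$)
Two-Tailed Test
$t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$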
Albertsons     Ralphs
254.26         256.03
240.62         255.65
231.90         255.12
234.13         261.18
a. Is there a significant difference in the average prices for these two different supermarket
chains?
b. What is the approximate p-value for the test conducted in part a?
c. Construct a 99% confidence interval for the difference in the average prices for the two
supermarket chains. Interpret this interval.
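The paired-difference test statistic for these data can be sketched as follows. The code assumes the prices are paired row by row (the first Albertsons value with the first Ralphs value, and so on), since the description of the data is missing from these notes. Critical values and the p-value would still come from a t table with $n - 1 = 3$ degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

albertsons = [254.26, 240.62, 231.90, 234.13]
ralphs = [256.03, 255.65, 255.12, 261.18]

# Paired differences d = Albertsons - Ralphs, assuming row-wise pairing.
differences = [a - r for a, r in zip(albertsons, ralphs)]

n = len(differences)            # number of paired differences
d_bar = mean(differences)       # mean of the sample differences
s_d = stdev(differences)        # standard deviation of the sample differences

t = d_bar / (s_d / sqrt(n))     # test statistic for H0: mu_d = 0
degrees_of_freedom = n - 1

print(round(d_bar, 4), round(s_d, 3), round(t, 3), degrees_of_freedom)
```

Under this pairing, $\bar{d} \approx -16.77$, $s_d \approx 11.18$ and $t \approx -3.00$ with 3 degrees of freedom.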
Suggested Exercises for Chapter 9: 3, 7, 19, 25, 29, 37, 39, 41, 43(b,c), 47, 49, 51