
Hypothesis testing

Usually the population parameters μ and σ are unknown and have to be estimated by the sample statistics x̄ and s. We know that a confidence interval can be used to represent the precision with which x̄ estimates μ. Here we take this concept further by looking at ways to use the sample mean to test claims made about the population mean. More generally, hypothesis testing has many business applications, for example testing the information collected in a marketing research survey.

Example 1
An advertisement put out recently by a manufacturer of Compact Fluorescent Light (CFL) bulbs indicates that the average life time of its new product is 10000 hours. Imagine that you run the marketing division of a competing company and your boss comes in demanding that you find out whether the first company's claim is correct. After all, your bulbs don't last as long. How would you approach evaluating the claim that μ = 10000 hours? You should take a sample of the life times of the company's light bulbs.

The idea is that the sample mean x̄ provides some information about the true mean μ.

Example 2
Okay, now suppose:
- You collect a sample, say n = 100 bulbs.
- You test the life time of each one (waiting approximately 10000 hours!).
- You compute the sample mean and find that x̄ = 9930 hours.
- You also compute the sample standard deviation and find that s = 400 hours.
What then? How does this help you test the claim that the true mean is μ = 10000 hours? This is where hypothesis testing comes in. The idea is that we define a hypothesis and test it against some condition. Let's start by looking at what that condition is.

Significance level
The condition for a hypothesis test is usually defined by a significance level. For now you should know the following:
- It is the probability of rejecting the null hypothesis when that hypothesis is in fact true, e.g. the probability that a device indicates you have a disease when in fact you do not.
- It is represented by the symbol α.
- It typically lies in the range α = 0.01 to 0.1.
On the standard normal curve, α is split between the two tails, with an area of α/2 shaded in each tail; this is why the procedure is called two-tailed hypothesis testing.

The shaded areas are known as the rejection region. The manager is usually responsible for specifying a significance level based on the issue at hand.

Step 1: Define the hypotheses
The primary, or null, hypothesis is represented as H0. It is a statement concerning the population parameter that the researcher wishes to discredit. In the case of population means, the null hypothesis takes the form:
H0: μ = μ0
where μ0 is the value of the mean to be tested. In the case described in Example 1, the null hypothesis is:
H0: μ = 10000
We also usually define an alternative hypothesis, Ha, which is the complementary statement to H0. In Example 1, this is:
Ha: μ ≠ 10000
When solving problems of this type, you need to write both the null and the alternative hypothesis in your solution. Be clear, however, that the task of hypothesis testing is either to reject H0 or to fail to reject H0.

Step 2: Sketch the situation
The next step is to sketch the situation, including the rejection regions and the z-scores. Let's do this assuming we've been told to use a significance level of α = 0.10. First, draw a standard normal curve and shade and label the two rejection regions, each with an area of α/2 = 0.05 in one tail.

Then add the critical z-scores, -1.645 and +1.645. You can work these out because the area between 0 and z is 0.5 - 0.05 = 0.45 (they are also among the z-score values summarized for your equation sheet).
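The table lookup above can also be done numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper name `two_tailed_critical_z` is hypothetical. It finds the z-value whose cumulative area is 1 - α/2, which is the same as asking for an area of 0.5 - α/2 between 0 and z.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Positive critical value z_c with area alpha/2 in each tail of N(0, 1)."""
    # The area between 0 and z_c is 0.5 - alpha/2 (0.45 when alpha = 0.10),
    # so the cumulative area to the left of z_c is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.02, 0.01):
    z_c = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if z* < {-z_c:.3f} or z* > {z_c:.3f}")
```

For α = 0.10 this gives ±1.645, and for α = 0.02 it gives ±2.326, which tables commonly round to 2.33.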

Computationally, the goal of the test is to find the z-score corresponding to x̄, for which we use the symbol z*, and to check whether it falls in one of the rejection regions.

Let's do this now.

Step 3: Compute the test statistic (z-score)
Finding the test statistic z* just means finding the z-score for x̄. You should be familiar enough by now to recognize that the following applies:

$$ z^* = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$

Because we don't know the population standard deviation σ, we'll approximate the above equation with the following, replacing σ by the sample standard deviation s:

$$ z^* = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
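As a quick check of the formula, here is a minimal sketch, assuming Python 3.8+ with only the standard library; `z_statistic` is a hypothetical helper name. It computes z* for the Example 2 numbers and compares it with the α = 0.10 critical values.

```python
import math
from statistics import NormalDist

def z_statistic(xbar, mu0, s, n):
    """Large-sample test statistic z* = (xbar - mu0) / (s / sqrt(n)).

    The sample standard deviation s stands in for the unknown population sigma.
    """
    return (xbar - mu0) / (s / math.sqrt(n))

# Example 2: n = 100 bulbs, sample mean 9930 h, sample standard deviation 400 h.
alpha = 0.10
z_star = z_statistic(xbar=9930, mu0=10000, s=400, n=100)
z_c = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.645
reject = abs(z_star) > z_c                  # True if z* is in either rejection region
print(f"z* = {z_star:.2f}, rejection regions beyond ±{z_c:.3f}, reject H0: {reject}")
```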

Plugging in the numbers from Example 2, we get:
z* = (9930 - 10000) / (400/sqrt(100)) = -70 / 40 = -1.75

Step 4: Determine whether to reject
Add z* = -1.75 to your sketch. Because -1.75 is less than -1.645, z* falls in the lower rejection region.

This means that the null hypothesis should be rejected. You could write something like: I reject H0 because -1.75 is less than -1.645.

Step 5: Give a conclusion
Now just give a succinct conclusion, e.g.: Based on a sample of 100 and a significance level of 0.10, the average life time of the CFL bulbs is significantly different from 10000 hours.

Notes:
- If you reject H0, you can say that there is a difference (as we did above).
- If you don't reject H0, you should NEVER say that you proved there is no difference (you didn't prove that!); you can only say that you cannot reject the claim.

That's it! In the case presented in Example 1, you could now tell your boss that the claim made by the competing company is not supported by your data at the 0.10 significance level.

Now let's summarize the basic steps:
1. Define the hypotheses, i.e.: H0: μ = some number, Ha: μ ≠ that number
2. Sketch the situation, i.e. a standard normal curve with a rejection region of area α/2 in each tail, bounded by the critical z-scores
3. Compute the test statistic, i.e. using: z* = (x̄ - μ0) / (s/√n)
4. Determine whether to reject
5. Give a conclusion
You should perform each of these steps every time you have a two-tailed test of the mean.

Let us compare the hypothesis testing situation to the confidence interval situation, using another example.

Example 3
a) Compute the confidence interval for the level of confidence corresponding to the significance level α = 0.10 in Example 2. Use the same values of n = 100, x̄ = 9930, s = 400.
b) Does the confidence interval support your hypothesis testing conclusion obtained above?
c) Sketch both situations together.

Recall that for large n (n > 30), σ ≈ s, so we can use:

$$ CI = \left[ \bar{x} - z\,\frac{s}{\sqrt{n}},\ \bar{x} + z\,\frac{s}{\sqrt{n}} \right] $$

A significance level of 0.10 corresponds to a confidence level of 0.90, or 90%. Convince yourself of this with a sketch: the central 90% of the standard normal curve leaves 5% in each tail, which are exactly the two rejection regions of the test.

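z is therefore 1.645 (the lower tail boundary is -1.645). Filling in the values, you get:

$$ CI = \left[ 9930 - 1.645\,\frac{400}{\sqrt{100}},\ 9930 + 1.645\,\frac{400}{\sqrt{100}} \right] = [9864.21,\ 9995.79] $$

(These endpoints use the unrounded value z ≈ 1.64485; with z = 1.645 they are 9864.20 and 9995.80.)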
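The interval can be reproduced with a short calculation. This is a minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; `mean_confidence_interval` is a hypothetical helper name, and like the text it uses s in place of σ.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(xbar, s, n, confidence):
    """Large-sample CI [xbar - z*s/sqrt(n), xbar + z*s/sqrt(n)], with s standing in for sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90%
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_confidence_interval(xbar=9930, s=400, n=100, confidence=0.90)
print(f"90% CI = [{low:.2f}, {high:.2f}]")   # 90% CI = [9864.21, 9995.79]
```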

We found that CI = [9864.21, 9995.79]. This is the range in which we are 90% confident that the true mean lies. The company is claiming that μ0 = 10000, which does not fall in this range. So, yes, the confidence interval supports the conclusion reached with hypothesis testing.

Sketching both situations together shows their complementary nature:
- With the confidence interval, we are testing whether μ0 lies outside the confidence interval (which it does here).
- With the hypothesis test, we are testing whether z* lies in one of the rejection regions (which it does here).
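To see why the two approaches always agree, note that both reduce to the same inequality, |x̄ - μ0| > z·s/√n. The following minimal sketch, assuming Python 3.8+ with only the standard library, checks this for several trial values of μ0 using the Example 2 data; the trial values and the helper names `rejected_by_test` and `outside_ci` are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

# Example 2 data and the matching 90% confidence level (alpha = 0.10).
xbar, s, n, alpha = 9930, 400, 100, 0.10
z_c = NormalDist().inv_cdf(1 - alpha / 2)
se = s / math.sqrt(n)                    # estimated standard error of the mean
ci_low, ci_high = xbar - z_c * se, xbar + z_c * se

def rejected_by_test(mu0):
    """Hypothesis test: is z* in a rejection region?"""
    return abs((xbar - mu0) / se) > z_c

def outside_ci(mu0):
    """Confidence interval: does mu0 fall outside the interval?"""
    return not (ci_low <= mu0 <= ci_high)

# Both checks reduce to |xbar - mu0| > z_c * se, so they give the same
# answer for any mu0 that does not sit exactly on an interval endpoint.
for mu0 in (9850, 9900, 9990, 10000, 10050):
    print(f"mu0 = {mu0}: test rejects = {rejected_by_test(mu0)}, outside CI = {outside_ci(mu0)}")
```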

Practice Problem 2 (ES)
You work for a manufacturer of Compact Fluorescent Light (CFL) bulbs that has been selling standard CFL bulbs for many years. The average life time of its standard bulbs is 10000 hours. The company has recently come out with a new type of bulb that it believes lasts longer on average. You take a sample of 65 of the new bulbs and find that the average life time is 10100 hours and the standard deviation is 450 hours.
a) Using a 5% level of significance, determine whether there is a significant difference in the average life time of the new bulbs.
b) Find the confidence interval for the average life time for a 95% confidence level.
c) Do the hypothesis test and the confidence interval lead you to the same conclusion?

Solution
a) Step 1: Define the hypotheses:
H0: μ = 10000
Ha: μ ≠ 10000
Step 2: Sketch the situation: with α = 0.05, each tail has an area of 0.025, so the rejection regions are z < -1.96 and z > 1.96.

Step 3: Compute the test statistic:
z* = (x̄ - μ0) / (s/√n) = (10100 - 10000) / (450/sqrt(65)) = 1.7916
Step 4: Determine whether to reject: Since 1.7916 is less than 1.96, z* is not in a rejection region, so we don't reject H0.
Step 5: Give a conclusion: Based on the sample of 65 of the new bulbs and a significance level of 5%, the average life time of the new bulbs is not significantly different from 10000 hours.
b) Using CI = [x̄ - z(s/√n), x̄ + z(s/√n)] with z ≈ 1.96, we find:
CI = [9990.6034, 10209.3966]

c) For the hypothesis test, we did not reject H0. For the confidence interval, μ0 = 10000 lies inside the interval, i.e. between 9990.6034 and 10209.3966, so we would not reject either. So yes, both methods lead us to the same conclusion.
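The five steps for a two-tailed test of the mean, together with the matching confidence interval, can be bundled into one function. This is a minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; `two_tailed_mean_test` is a hypothetical name. It is applied to Practice Problem 2 and, for comparison, to the donut factory example that follows.

```python
import math
from statistics import NormalDist

def two_tailed_mean_test(xbar, mu0, s, n, alpha):
    """Large-sample two-tailed z-test of H0: mu = mu0 against Ha: mu != mu0.

    Returns (z_star, z_c, reject, (ci_low, ci_high)), where the interval is the
    matching (1 - alpha) confidence interval. s stands in for the unknown sigma.
    """
    se = s / math.sqrt(n)                          # estimated standard error
    z_c = NormalDist().inv_cdf(1 - alpha / 2)      # Step 2: reject if |z*| > z_c
    z_star = (xbar - mu0) / se                     # Step 3: test statistic
    reject = abs(z_star) > z_c                     # Step 4: decision
    ci = (xbar - z_c * se, xbar + z_c * se)
    return z_star, z_c, reject, ci

# Practice Problem 2: n = 65 new bulbs, mean 10100 h, s = 450 h, alpha = 0.05.
z_star, z_c, reject, (low, high) = two_tailed_mean_test(10100, 10000, 450, 65, 0.05)
print(f"z* = {z_star:.4f}, critical value = {z_c:.2f}, reject H0: {reject}")
print(f"95% CI = [{low:.4f}, {high:.4f}], contains 10000: {low <= 10000 <= high}")

# Donut factory example: n = 28, mean $378.86, s = $49.20, alpha = 0.02.
z_d, z_c_d, reject_d, _ = two_tailed_mean_test(378.86, 400, 49.20, 28, 0.02)
print(f"donut: z* = {z_d:.3f}, critical value = {z_c_d:.3f}, reject H0: {reject_d}")
```

Note that the donut sample (n = 28) is below the n > 30 guideline for replacing σ by s; with unknown σ and a small normal sample, a t-test is the more usual choice. The z version is shown here only to mirror the worked solution.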

Example
A donut factory has just completed contract negotiations with the union representing its workers. The company claims that the workers' average salary is now $400 per week. In a survey of 28 workers, the average wage was found to be $378.86 and the standard deviation was $49.20. The wages are known to be normally distributed. Is the company's claim incorrect? Use a level of significance of 2%.

Solution
Given: x̄ = 378.86, s = 49.20, n = 28
Step 1: Hypotheses:
H0: μ = μ0 = 400
Ha: μ ≠ 400
Step 2: Sketch the situation. Here, you should show that the area in each outer tail is 0.01 and that the critical values are z = ±2.33.
Step 3: Compute the test statistic:
z* = (378.86 - 400) / (49.20/sqrt(28)) = -2.274
Step 4: Determine whether to reject: Do not reject H0, since -2.33 < -2.274, i.e. z* is not in the lower rejection region.
Step 5: Draw a conclusion: Based on the sample of 28 and a significance level of 2%, the average salary is not significantly different from $400. No evidence was found that the company's claim is incorrect.

Suppose we want to show that only children have a higher average cholesterol level than the national average. It is known that the mean cholesterol level for all Trinis is 190. Construct the relevant hypothesis test:
H0: μ = 190
H1: μ > 190
We test 100 only children and find that x̄ = 198, and suppose we know the population standard deviation σ = 15. Do we have evidence to suggest that only children have a higher average cholesterol level than the national average? We have

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{198 - 190}{15 / \sqrt{100}} = \frac{8}{1.5} \approx 5.33 $$

z is called the test statistic. Since z is so large, a sample mean this far above 190 would be extremely unlikely if H0 were true, so we decide to reject H0 and accept H1. Therefore, we can conclude that only children have a higher average cholesterol level than the national average.

Example
50 smokers were questioned about the number of hours they sleep each day. We want to test the hypothesis that smokers need less sleep than the general public, which needs an average of 7.7 hours of sleep. We follow the steps below.
A. Compute a rejection region for a significance level of .05.
B. If the sample mean is 7.5 and the population standard deviation is 0.5, what can you conclude?

Solution
First, we write down the null and alternative hypotheses:
H0: μ = 7.7
H1: μ < 7.7

This is a left-tailed test. The z-score that corresponds to .05 is -1.645. The critical region is the area that lies to the left of -1.645. If the z-value is less than -1.645, we will reject the null hypothesis and accept the alternative hypothesis. If it is greater than -1.645, we will fail to reject the null hypothesis and say that the test was not statistically significant. We have

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{7.5 - 7.7}{0.5 / \sqrt{50}} \approx -2.83 $$
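The one-tailed tests in this part (cholesterol, right-tailed; smokers, left-tailed) differ from the two-tailed case only in putting all of α in one tail. A minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; `one_tailed_z_test` is a hypothetical helper. The cholesterol example states no significance level, so α = 0.01 is assumed there purely for illustration.

```python
import math
from statistics import NormalDist

def one_tailed_z_test(xbar, mu0, sigma, n, alpha, tail):
    """One-tailed z-test of H0: mu = mu0 with a known population sigma.

    tail="right" tests H1: mu > mu0; tail="left" tests H1: mu < mu0.
    Returns (z, critical_value, reject).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    z_c = NormalDist().inv_cdf(1 - alpha)      # 1.645 when alpha = 0.05
    if tail == "right":
        return z, z_c, z > z_c
    if tail == "left":
        return z, -z_c, z < -z_c
    raise ValueError("tail must be 'left' or 'right'")

# Smokers: H1: mu < 7.7, sample mean 7.5, sigma = 0.5, n = 50, alpha = 0.05.
z_s, crit_s, reject_s = one_tailed_z_test(7.5, 7.7, 0.5, 50, 0.05, tail="left")
print(f"smokers: z = {z_s:.2f}, critical value = {crit_s:.3f}, reject H0: {reject_s}")

# Cholesterol: H1: mu > 190, sample mean 198, sigma = 15, n = 100.
# Assumption: alpha = 0.01 (the example does not give a significance level).
z_chol, crit_chol, reject_chol = one_tailed_z_test(198, 190, 15, 100, 0.01, tail="right")
print(f"cholesterol: z = {z_chol:.2f}, critical value = {crit_chol:.3f}, reject H0: {reject_chol}")
```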

Since -2.83 is to the left of -1.645, it is in the critical region. Hence we reject the null hypothesis and accept the alternative hypothesis. We can conclude that smokers need less sleep.

Hypothesis Testing for a Population Proportion
We have seen how to conduct hypothesis tests for a mean. We now turn to proportions. The process is completely analogous, although we need to use the standard deviation formula for a proportion.

Example
Suppose that you interview 1000 exiting voters about whom they voted for in the governor's race. Of the 1000 voters, 550 reported that they voted for the Democratic candidate. Is there sufficient evidence at the .01 level to suggest that the Democratic candidate will win the election?
H0: p = .5
H1: p > .5

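Since it is a large sample, we can use the central limit theorem to say that the distribution of sample proportions is approximately normal. We compute the test statistic:

$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} = \frac{0.55 - 0.5}{\sqrt{0.5 (1 - 0.5) / 1000}} \approx 3.16 $$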

Notice that in this formula we have used the hypothesized proportion p0 = .5, rather than the sample proportion, to compute the standard deviation. This is because if the null hypothesis is correct, then .5 is the true proportion, so we do not need to estimate the standard deviation from the sample. We compute the rejection region using the z-table and find that zc = 2.33. Since 3.16 > 2.33, the test statistic lies in the rejection region. Therefore we reject H0 and conclude that the Democratic candidate will win, with a p-value of .0008.

Example
1500 randomly selected pine trees were tested for traces of Bark Beetle infestation. It was found that 153 of the trees showed such traces. Test the hypothesis that more than 10% of the Tahoe trees have been infested. (Use a 5% level of significance.)

Solution
The hypotheses are
H0: p = .1
H1: p > .1
We have that

$$ \hat{p} = \frac{153}{1500} = 0.102 $$

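Next we compute the z-score:

$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} = \frac{0.102 - 0.1}{\sqrt{0.1 (0.9) / 1500}} \approx 0.26 $$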

Since we are using a 5% level of significance with a one-tailed test, we have zc = 1.645, and the rejection region is the area to the right of 1.645. We see that 0.26 does not lie in the rejection region, hence we fail to reject the null hypothesis. We say that there is insufficient evidence to conclude that more than 10% of the pines are infested.
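Both proportion examples follow the same recipe, so a single function covers them. This is a minimal sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; `right_tailed_proportion_test` is a hypothetical name. As in the text, the standard deviation uses the hypothesized p0, and the p-value is the normal-curve area to the right of z.

```python
import math
from statistics import NormalDist

def right_tailed_proportion_test(successes, n, p0, alpha):
    """Large-sample z-test of H0: p = p0 against H1: p > p0.

    The standard deviation uses the hypothesized p0, not the sample proportion.
    Returns (p_hat, z, z_c, p_value, reject).
    """
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    z_c = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)     # area to the right of z
    return p_hat, z, z_c, p_value, z > z_c

cases = {
    "exit poll (550 of 1000, p0 = .5, alpha = .01)": (550, 1000, 0.5, 0.01),
    "bark beetle (153 of 1500, p0 = .1, alpha = .05)": (153, 1500, 0.1, 0.05),
}
for label, args in cases.items():
    p_hat, z, z_c, p_value, reject = right_tailed_proportion_test(*args)
    print(f"{label}: p_hat = {p_hat:.3f}, z = {z:.2f}, z_c = {z_c:.3f}, "
          f"p-value = {p_value:.4f}, reject H0: {reject}")
```

The exact critical value for α = .01 is about 2.326, which the z-table rounds to 2.33.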
