
Statistical Power

Power refers to the ability of a study to detect a difference that is real. One way to think about this is in terms of false negative results: power is the likelihood of avoiding a false negative. A bit like your motorcycle: the more power it has, the less likely it is to get stuck in the mud. Real statisticians use more formal terms, of course, and speak of power as the probability of not making a beta, or "Type II", error: falsely concluding that there was no difference (e.g., between experimental and control groups) when in fact there was a difference, but the study failed to show it. Any study involves only a sample of people from the population of interest, and there are several reasons why the study may fail to detect a real difference that exists in the population. What factors influence whether or not a study will be able to detect a real difference?

One of the most interesting introductions to the idea of statistical power is given in the 'OJ' Page, which was created by Rob Becker to illustrate how the decision a jury has to reach (guilty vs. not guilty) is similar to the decision a researcher makes when assessing a relationship. The OJ Page uses the infamous OJ Simpson murder trial to introduce the idea of statistical power and to illustrate how manipulating various factors (e.g., the amount of evidence, the "effect size", and the level of risk) affects the validity of the verdict. There are four interrelated components that influence the conclusions you might reach from a statistical test in a research project. The logic of statistical inference with respect to these components is often difficult to understand and explain. The four components are:

- sample size, or the number of units (e.g., people) accessible to the study
- effect size, or the salience of the treatment relative to the noise in measurement
- alpha level (α, or significance level), or the odds that the observed result is due to chance
- power, or the odds that you will observe a treatment effect when it occurs

Given values for any three of these components, it is possible to compute the value of the fourth. For instance, you might want to determine what a reasonable sample size would be for a study. If you could make reasonable estimates of the effect size, alpha level, and power, it would be simple to compute the sample size, as the sketch below illustrates. Some of these components will be more manipulable than others depending on the circumstances of the project. For example, if the project is an evaluation of an educational program or counseling program with a specific number of available consumers, the sample size is set or predetermined. The goal is to achieve a balance of the four components that allows the maximum level of power to detect an effect if one exists, given programmatic, logistical, or financial constraints on the other components.
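As a hedged illustration of "solving for the fourth component", here is a minimal Python sketch using the statsmodels power module. The effect size of 0.5 (a "medium" standardized difference), the two-group t-test design, and the sample sizes are assumed values chosen for the example, not figures from the text.

```python
# A minimal sketch: given any three of effect size, alpha, and power,
# solve for the fourth component (here, sample size) for a two-group t-test.
# The effect size of 0.5 (Cohen's d) is an assumed value for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for sample size per group, given the other three components.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.1f}")  # roughly 64

# The same function can instead solve for power when the sample size is
# fixed (e.g., an evaluation with a predetermined number of consumers).
achieved_power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=40)
print(f"Power with 40 per group: {achieved_power:.2f}")
```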


The table below shows the basic decision matrix involved in a statistical conclusion. All statistical conclusions involve constructing two mutually exclusive hypotheses, termed the null (labeled H0) and alternative (labeled H1) hypothesis. Together, the hypotheses describe all possible outcomes with respect to the inference. The central decision involves determining which hypothesis to accept and which to reject. For instance, in the typical case, the null and alternative hypotheses might be:

H0: Program Effect = 0
H1: Program Effect < 0 or Program Effect > 0
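To make this decision concrete, here is a minimal Python sketch of how a researcher might choose between H0 and H1 with a two-sample t-test. The simulated data, the true effect of 0.5, and the alpha = .05 threshold are illustrative assumptions.

```python
# A minimal sketch of the H0 vs. H1 decision using a two-sample t-test.
# The simulated data and the alpha = .05 threshold are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=0.0, scale=1.0, size=50)  # no-program group
program = rng.normal(loc=0.5, scale=1.0, size=50)  # assumed true effect of 0.5

t_stat, p_value = stats.ttest_ind(program, control)  # two-sided test

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0, accept H1 (program effect)")
else:
    print(f"p = {p_value:.3f} >= {alpha}: accept H0 (no demonstrable effect)")
```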

The null hypothesis is so termed because it usually refers to the "no difference" or "no effect" case. Usually in social research we expect that our treatments and programs will make a difference. So, typically, our theory is described in the alternative hypothesis. The table below is a complex figure that you should take some time studying. First, look at the header row (the shaded area). This row depicts reality: whether there really is a program effect, difference, or gain. Of course, the problem is that you never know for sure what is really happening. Nevertheless, because we have set up mutually exclusive hypotheses, one must be right and one must be wrong. Therefore, consider the table as though you knew which hypothesis is correct. The first column of the 2x2 table shows the case where our program does not have an effect; the second column shows where it does have an effect or make a difference. The left header column describes the world we mortals live in. Regardless of what's true, we have to make decisions about which of our hypotheses is correct. This header column describes the two decisions we can reach: that our program had no effect (the first row of the 2x2 table) or that it did have an effect (the second row).

Now, let's examine the cells of the 2x2 table. Each cell shows the Greek symbol for that cell. Notice that the columns sum to 1 (i.e., α + (1 − α) = 1 and β + (1 − β) = 1). Why can we sum down the columns, but not across the rows? Because if one column is true, the other is irrelevant: if the program has a real effect (the right column), it can't at the same time not have one. Therefore, the odds or probabilities have to sum to 1 for each column, because the two rows in each column describe the only possible decisions (accept or reject the null/alternative) for each possible reality. Below the Greek symbol is a typical value for that cell. You should especially note the values in the bottom two cells. The value of α is typically set at .05 in the social sciences. A newer, but growing, tradition is to try to achieve a statistical power of at least .80. Below the typical values is the name typically given for that cell (in caps). If you haven't already, you should note that two of the cells describe errors (you reach the wrong conclusion) and in the other two you reach the correct conclusion. Sometimes it's hard to remember which error is Type I and which is Type II. If you keep in mind that Type I is the same as the α, or significance, level, it might help you to remember that it is the odds of finding a difference or effect by chance alone. People are more likely to be susceptible to a Type I error, because they almost always want to conclude that their program works. If they find a statistical effect, they tend to advertise it loudly. On the other hand, people probably check more thoroughly for Type II errors, because when you find that the program was not demonstrably effective, you immediately start looking for why (in this case, you might hope to show that you had low power and high β, that is, that the odds of saying there was no treatment effect even when there was one were too high). Following the capitalized common name are several different ways of describing the value of each cell, one in terms of outcomes and one in terms of theory-testing. In italics, we give an example of how to express the numerical value in words. To better understand the strange relationships between the two columns, think about what happens if you want to increase your power in a study. As you increase power, you increase the chances that you are going to find an effect if it's there (wind up in the bottom row). But if you increase the chances that you wind up in the bottom row, you must at the same time be increasing the chances of making a Type I error! Although we can't sum to 1 across rows, there is clearly a relationship. Since we usually want high power and low Type I error, you should be able to appreciate that we have a built-in tension here.


| We say... | In reality: H0 true, H1 false. There is no relationship, no difference or gain; our theory is wrong. | In reality: H0 false, H1 true. There is a relationship, a difference or gain; our theory is correct. |
| --- | --- | --- |
| We accept H0, reject H1: "There is no relationship." "There is no difference, no gain." "Our theory is wrong." | 1 − α (e.g., .95). THE CONFIDENCE LEVEL. The odds of saying there is no relationship, difference, or gain when in fact there is none. The odds of correctly not confirming our theory. 95 times out of 100, when there is no effect, we'll say there is none. | β (e.g., .20). TYPE II ERROR. The odds of saying there is no relationship, difference, or gain when in fact there is one. The odds of not confirming our theory when it's true. 20 times out of 100, when there is an effect, we'll say there isn't. |
| We reject H0, accept H1: "There is a relationship." "There is a difference or gain." "Our theory is correct." | α (e.g., .05). TYPE I ERROR (SIGNIFICANCE LEVEL). The odds of saying there is a relationship, difference, or gain when in fact there is not. The odds of confirming our theory incorrectly. 5 times out of 100, when there is no effect, we'll say there is one. We should keep this small when we can't afford to risk wrongly concluding that our program works. | 1 − β (e.g., .80). POWER. The odds of saying there is a relationship, difference, or gain when in fact there is one. The odds of confirming our theory correctly. 80 times out of 100, when there is an effect, we'll say there is one. We generally want this to be as large as possible. |
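To make the four cells of the table concrete, the following Python sketch estimates each quantity by simulating many studies under each column of the table. The effect size (0.5), sample size (40 per group), and number of replications are assumed values for illustration only.

```python
# A minimal simulation sketch of the 2x2 decision matrix.
# Effect size (0.5), sample size (40/group), and 10,000 replications
# are assumed values chosen for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, effect, reps = 0.05, 40, 0.5, 10_000

def reject_rate(true_effect):
    """Fraction of simulated studies in which H0 is rejected."""
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n)
        program = rng.normal(true_effect, 1.0, n)
        _, p = stats.ttest_ind(program, control)
        rejections += p < alpha
    return rejections / reps

type_i = reject_rate(true_effect=0.0)    # left column: H0 is true
power = reject_rate(true_effect=effect)  # right column: H0 is false

print(f"Type I error rate  ~ {type_i:.3f} (alpha = {alpha})")
print(f"Confidence level   ~ {1 - type_i:.3f}")
print(f"Power              ~ {power:.3f}")
print(f"Type II error rate ~ {1 - power:.3f} (beta)")
```

Note that each column's two estimates sum to 1 by construction, just as the table requires.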


With all of this in mind, let's consider a few common associations evident in the table (illustrated numerically in the sketch below):

- The lower the α, the lower the power; the higher the α, the higher the power.
- The lower the α, the less likely it is that you will make a Type I error (i.e., reject the null when it's true).
- The lower the α, the more "rigorous" the test.
- An α of .01 (compared with .05 or .10) means the researcher is being relatively careful; the researcher is only willing to risk being wrong 1 in 100 times in rejecting the null when it's true (i.e., saying there's an effect when there really isn't).
- An α of .01 (compared with .05 or .10) limits one's chances of ending up in the bottom row, of concluding that the program has an effect. This means that both your statistical power and the chances of making a Type I error are lower.
- An α of .01 means you have a 99% chance of saying there is no difference when there in fact is no difference (being in the upper left box).
- Increasing α (e.g., from .01 to .05 or .10) increases the chances of making a Type I error (i.e., saying there is a difference when there is not), decreases the chances of making a Type II error (i.e., saying there is no difference when there is), and decreases the rigor of the test.
- Increasing α (e.g., from .01 to .05 or .10) increases power, because one will be rejecting the null more often (i.e., accepting the alternative) and, consequently, when the alternative is true, there is a greater chance of accepting it (i.e., power).
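As a rough numerical check of these associations, here is a small Python sketch that computes power at several α levels with statsmodels; the effect size of 0.5 and the sample size of 40 per group are assumed values.

```python
# A minimal sketch: power rises as alpha rises, holding effect size and
# sample size fixed. Effect size 0.5 and n = 40 per group are assumed.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.01, 0.05, 0.10):
    power = analysis.solve_power(effect_size=0.5, nobs1=40, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> power = {power:.2f}")
# Lower alpha gives a more rigorous test but less power;
# higher alpha gives more power but more Type I errors.
```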

