
The STANDARD ERROR OF MEASUREMENT (which we will discuss in detail shortly) ESTIMATES the RANGE of scores within which an individual's true score, or true level of ability, lies.

If Student A gets a 75 on a test, we can only hope that A's TRUE SCORE - her actual level of ability - is somewhere around 75. The closer the reliability of the test is to perfect (r = 1.00), the more likely it is that the true score is very close to 75.
ERROR
•If your obtained scores do not always reflect your true ability (if they underestimate or overestimate it), then they are associated with some error.

•In other words, your OBTAINED SCORE has a TRUE SCORE component (actual level of ability, skill, knowledge) and an ERROR component (which acts to raise or lower the obtained score).

•Obtained score = True score ± error score


The Standard Error of Measurement (Sm)
•The standard error of measurement is the STANDARD DEVIATION of the ERROR scores of a test.

•Although we can never know the error scores, we can ESTIMATE the standard error of measurement by using the following formula, where SD is the standard deviation of the test and r is the reliability of the test:

Sm = SD √(1 − r)
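As a quick sketch of the formula in code (the function name and the example numbers here are illustrative, not from the slides):

```python
import math

def sem(sd, r):
    """Standard error of measurement: Sm = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - r)

# Hypothetical example: a test with SD = 10 and reliability r = .91
print(round(sem(10, 0.91), 2))  # 3.0
```

Notice that a highly reliable test (r = .91) still leaves an SEM of 3 score points.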


Using the Standard Error of Measurement

•The distribution of error scores approximates the normal distribution.

•We can extend this information to construct a band around any obtained score to identify the range of scores that, at a certain level of confidence, will capture or span an individual's true score.
The SEM can be used to estimate the value of a person's true score. In other words, we can use it to predict what would happen if a person took additional equivalent tests:

68% of the scores would fall between ±1 SEM of the true score.
95% of the scores would fall between ±2 SEM of the true score.
99.7% of the scores would fall between ±3 SEM of the true score.
Thus, if a person achieved a score of 80 on a math test, and the SEM for that test was 5, then we could state the following:

68% of the scores would fall between ____ and ____
95% of the scores would fall between ____ and ____
99.7% of the scores would fall between ____ and ____
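Filling in those blanks is simple arithmetic; a minimal sketch (variable names are mine):

```python
score, sem = 80, 5  # obtained score and SEM from the example above

def band(k):
    """Range spanned by +/- k SEMs around the obtained score."""
    return (score - k * sem, score + k * sem)

for k, pct in [(1, "68"), (2, "95"), (3, "99.7")]:
    lo, hi = band(k)
    print(f"{pct}% of the scores would fall between {lo} and {hi}")
```

Running it prints the bands 75-85, 70-90, and 65-95.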


When applied to the prediction of future test performance, these ranges are known as CONFIDENCE INTERVALS. That is, we can be:

68% confident that the true score lies within ±1 SEM of the obtained score.
95% confident that the true score lies within ±2 SEM of the obtained score.
99.7% confident that the true score lies within ±3 SEM of the obtained score.
Confidence Intervals
•Finally, the SEM can be used to determine whether a score is significantly different from a particular criterion, such as a cutoff score.

•If a person received a score of 105 on the WAIS, which has an SD of 15, a reliability of .97, and an SEM of 2.5, how confident can we be that repeated testing would not place this person in the gifted range (130 or above)?

68% confident that the true score lies between ____ and ____
95% confident that the true score lies between ____ and ____
99.7% confident that the true score lies between ____ and ____
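The same band logic answers the WAIS question; this sketch uses the values given on the slide:

```python
score, sem, cutoff = 105, 2.5, 130  # values from the WAIS example

def band(k):
    """Range spanned by +/- k SEMs around the obtained score."""
    return (score - k * sem, score + k * sem)

for k, pct in [(1, "68"), (2, "95"), (3, "99.7")]:
    lo, hi = band(k)
    print(f"{pct}% confident the true score lies between {lo} and {hi}")

# Even the widest (99.7%) band tops out at 112.5, far below the 130 cutoff,
# so repeated testing is very unlikely to reach the gifted range.
print(band(3)[1] < cutoff)  # True
```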
Confidence Intervals
In conclusion, the SEM is a statistic that estimates for us just how fallible, or error-prone, tests are.

In education, we have long had a tendency to OVERINTERPRET small differences in test scores, since we too often consider obtained scores to be completely accurate.
Reliability and the SEM
If a test is perfectly reliable (r = 1.00), then a student will always get exactly the same score: there is no error, and the SEM will be 0. If the test is completely unreliable (r = 0), the SEM will be as large as the SD itself.

Remember: the SD is the variability of raw scores; the SEM is the variability of error scores.
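This relationship is easy to see numerically; a small sketch (the SD of 15 and the reliability values are illustrative):

```python
import math

sd = 15  # illustrative standard deviation

for r in (1.00, 0.97, 0.75, 0.00):
    sm = sd * math.sqrt(1 - r)
    print(f"r = {r:.2f} -> SEM = {sm:.2f}")
# At r = 1.00 the SEM is 0; at r = 0.00 it equals the SD (15).
```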
Sources of Error
•Error Within Test-Takers (Intra-Individual Error)
These include any within-student factors that would result in obtained scores being lower or higher than true scores.

•Error Within the Test
This error is within-test and can include: trick questions; reading level too high; ambiguous questions; grammatical cues in the items; items too easy or too difficult; and poorly written items.
Error in Test Administration
This error includes the following:
•Physical comfort
•Instructions & explanations - different test administrators provide different amounts of guidance to test takers.
•Test administrator attitudes - administrators differ in the notions they convey about the importance of the test, the extent to which they are emotionally supportive of students, and the way in which they monitor the test.

Error in Scoring
Computer scoring has decreased this source of error, but teachers and administrators can still make mistakes on answer keys; students may not use #2 pencils or may make stray marks; and hand scoring can lead to error.
Sources of Error Influencing Various Reliability Coefficients

Test-Retest Reliability
If test-retest coefficients are determined over a short time, the effects of within-student error should be small.
What about sources of:
•within-test error?
•error in administration?
•error in scoring?
Sources of Error
Alternate-forms reliability
Since this form of reliability is determined by administering two different forms of the test to the same group close together in time, the effects of within-student error should be small.
Sources of Error

Internal consistency
With this type of reliability, neither within-student nor within-test sources of error will exert an influence, since only one test is given, one time. The same goes for administration and scoring error.
Band Interpretation
John's scores on an end-of-year achievement test:

Sub-test         Score
Reading           103
Listening         104
Writing           105
Social Studies     98
Science           100
Math               91
Band Interpretation
How large a difference do we need between test scores to conclude that the differences represent real, and not chance, differences? We can use the SEM to answer this question, using a technique called BAND INTERPRETATION.

1. First, determine the SEM for each sub-test.
2. Add and subtract the SEM for each sub-test score.
3. Graph each scale - shade in the bands to represent the range of scores that has a 68% (or 95%) chance of capturing John's true score.
4. Interpret the bands - interpret the profile of bands by visually inspecting the bars to see which bands overlap and which do not.
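The steps above can be sketched in code. The common SEM of 3 here is a hypothetical stand-in, since the slides derive each sub-test's SEM separately:

```python
SEM = 3  # hypothetical: assume every sub-test has SEM = 3
scores = {"Reading": 103, "Listening": 104, "Writing": 105,
          "Social Studies": 98, "Science": 100, "Math": 91}

def band(score, k=1):
    """68% band (k=1) or 95% band (k=2) around an obtained score."""
    return (score - k * SEM, score + k * SEM)

def overlap(a, b):
    """Two bands overlap when neither lies entirely above the other."""
    return a[0] <= b[1] and b[0] <= a[1]

# Step 4: which sub-tests really differ from Math at the 68% level?
for name, s in scores.items():
    if name != "Math" and not overlap(band(s), band(scores["Math"])):
        print(f"Math differs from {name} (no band overlap)")
```

With this assumed SEM, Math's 68% band (88-94) clears every other sub-test's band, matching the slide's conclusion that Math represents a real difference.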
Band Interpretation
Using the 68% band, those bands that overlap probably represent differences that occurred by chance.

In John's case, the difference between Math and the other sub-tests, and between Social Studies and Writing, represent real differences (because there is no overlap).
Band Interpretation
What happens if we take a more conservative approach by using the 95% band?

Since the bands are larger with the 95% approach, the only real differences we find at the 95% level are between John's math achievement and his achievement in listening and writing.
Band Interpretation

All the other bands overlap, suggesting that at the 95% level the differences in obtained scores are due to chance. If we employ the more conservative approach, we would conclude that even though the difference between John's obtained reading and math scores is 12 points (103 − 91 = 12), the difference is due to chance, not to a real difference in achievement.
Band Interpretation
To make it simpler: let differences at the 68% level be a signal to you. Let differences at the 95% level be a signal to the school and to parents.

Also, use the 95% approach when determining real differences between a student's potential for achievement (aptitude) and actual achievement.
