More often than not, teachers prepare, administer, and score a test, return the test papers to their students, possibly discuss the test, and then either file or discard it. Ebel (1965) believes that one of the common mistakes teachers make is failing to check on the effectiveness of their tests. Some of the probable reasons are:

1. Teachers feel that test analysis is too time-consuming;
2. Teachers are not aware of the methods of analysing tests;
3. Teachers do not always understand the importance of accurate evaluation.

After this training, the selected Master Teachers 1 are expected to:

1. Give the importance of item analysis.
2. Define difficulty index and discrimination index.
3. Compute and interpret difficulty and discrimination indices.
4. Enumerate the different ways of establishing test validity and reliability.
5. Try out the test items prepared.
6. Discuss the reasons for interpreting test results.

Trying Out the Test

A test cannot be considered good unless it is tried out. The ultimate judge of the test is the user.

A. First Tryout

The test is tried out on a larger sample. The physical conditions should be as comfortable as possible and the examinees should be as relaxed as possible. The main purpose of the first tryout is item analysis: the process of examining the students' responses to each test item. Specifically, what one looks for is the difficulty and discriminating ability of each item, as well as the effectiveness of each alternative.

The analysis of student responses to objective test items is a powerful tool for test improvement. Item analysis indicates which items may be too easy or too difficult, and which may fail for other reasons to discriminate clearly between the better and the poorer examinees. It sometimes suggests why an item has not functioned effectively and how it might be revised. A test composed of items revised and selected on the basis of item analysis data is almost certain to be much more reliable than one composed of an equal number of untested items.

Item Analysis Procedure

One method that can be used for item analysis is the U-L Index Method.

A procedure simple enough to be used regularly by classroom teachers but complete and precise enough to contribute substantially to test improvement is outlined below.

1. Arrange the scored tests or answer sheets in order of score, from high to low.
2. Separate two subgroups of test papers: a higher group consisting of approximately 27% of the total group, who received the highest scores on the test, and a lower group consisting of an equal number of papers from those who received the lowest scores.
3. Tally the number of times each possible response to each item was chosen on the papers of the higher group. Do the same separately for the papers of the lower group.
4. Add the counts from the higher and lower groups for the keyed correct response. Divide this sum by the maximum possible sum. The result is the index of item difficulty (p).
5. Subtract the lower group's count of correct responses from the higher group's count of correct responses. Divide this difference by the maximum possible difference. This quotient, expressed as a decimal fraction, is the index of discrimination (D). (A sketch of both computations follows the formulas below.)

Formula for p and D:

p = (RH + RL) / N

D = (RH − RL) / (N/2)

Where:

p = difficulty index
D = discrimination index
RH = number of pupils in the higher (H) group who answered the item correctly
RL = number of pupils in the lower (L) group who answered the item correctly
N = total number of pupils in the H-group and L-group combined (number of cases)
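Below is a minimal Python sketch of these two formulas. The function names and sample counts are illustrative only, not part of the original handout; the counts (9 of 15 in the higher group, 3 of 15 in the lower group) are taken from the worked distracter-analysis example later in this material.

```python
# Minimal sketch of the U-L index formulas; function names and
# sample counts are illustrative, not part of the original handout.

def difficulty_index(rh: int, rl: int, n: int) -> float:
    """p = (RH + RL) / N, where N is the total number of pupils
    in the higher and lower groups combined."""
    return (rh + rl) / n

def discrimination_index(rh: int, rl: int, n: int) -> float:
    """D = (RH - RL) / (N/2): the H-L difference divided by the
    size of one group, i.e. the maximum possible difference."""
    return (rh - rl) / (n / 2)

# Example: 15 pupils per group (N = 30); 9 in the higher group and
# 3 in the lower group answered the item correctly.
p = difficulty_index(9, 3, 30)       # (9 + 3) / 30 = 0.40
d = discrimination_index(9, 3, 30)   # (9 - 3) / 15 = 0.40
print(f"p = {p:.2f}, D = {d:.2f}")   # p = 0.40, D = 0.40
```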

6. The values obtained from the computations for each item (difficulty and discrimination indices and effectiveness of distracters) are interpreted using the following criteria:

DIFFICULTY INDEX     INTERPRETATION
0.76 and above       Very easy
0.26 to 0.75         Moderate
0.25 and below       Very difficult

DISCRIMINATION INDEX     INTERPRETATION
0.40 and above           High
0.20 to 0.39             Satisfactory
0.19 and below           Low

ITEM CATEGORY

Item Category    Conditions                                            Recommendation on the Test Item

Good             High discrimination index (0.40 and above) and        Include as is
                 moderate difficulty index (0.26 to 0.79)

Fair             Satisfactory discrimination index (0.20 to 0.39),     Include as is, or revise
                 possibly with a somewhat high or low difficulty
                 index (0.80 or higher, 0.20 or lower)

Fair             Low discrimination index (0.16 to 0.19)               Revise

Poor             Very low discrimination index (below 0.16), or an     Do not include
                 extreme difficulty index: too easy (0.80 and above)
                 or too difficult (0.20 and below)
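The criteria above can be collapsed into a small decision function. The flattened source table is ambiguous about how overlapping difficulty bands interact, so the sketch below is one reading of it, with cut-offs copied from the table as printed; treat it as illustrative rather than authoritative.

```python
# One reading of the item-category table; boundary values are
# copied from the table as printed and may need local adjustment.

def categorize_item(p: float, d: float) -> tuple[str, str]:
    """Return (category, recommendation) for an item with
    difficulty index p and discrimination index d."""
    if d >= 0.40 and 0.26 <= p <= 0.79:
        return "Good", "Include as is"
    if 0.20 <= d <= 0.39:
        return "Fair", "Include as is, or revise"
    if 0.16 <= d <= 0.19:
        return "Fair", "Revise"
    # Very low discrimination (below 0.16) or extreme difficulty
    # (0.80 and above, or 0.20 and below) falls through to Poor.
    return "Poor", "Do not include"

print(categorize_item(0.40, 0.40))  # ('Good', 'Include as is')
print(categorize_item(0.85, 0.10))  # ('Poor', 'Do not include')
```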

Effectiveness of Distracters
1. Each distracter should be selected by a roughly equal number of members of the lower group.
2. Substantially more students in the higher group than in the lower group should choose the correct alternative.
3. Substantially more students in the lower group than in the higher group should choose the distracters.

Carrying the inspection one step further, one can tell how effectively each distracter is operating. The formula for the discrimination index can be employed to calculate an index of effectiveness for each alternative.

Option                  U (27%)    L (27%)    D
A (correct response)    9          3          .40
B                       3          4          -.07
C                       2          5          -.20
D                       1          3          -.13

Analysis: The discrimination value for the correct response (A) is the discrimination index as defined. The positive D value for the correct response shows that the item has been effective in identifying high-scoring students. Distracters B, C, and D are functioning appropriately and as intended, since they attract a larger proportion of the lower group. The negative values for alternatives B, C, and D indicate that more low-scoring than high-scoring students chose each distracter, which is what a good distracter should do.
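The same computation can be scripted. The sketch below is hypothetical code, with tallies taken from the table above and the per-group size of 15 pupils that those tallies imply.

```python
# Option-level discrimination: the D formula applied to each
# alternative. Tallies are those from the table above.

counts = {
    "A": (9, 3),  # (higher-group tally, lower-group tally); A is keyed correct
    "B": (3, 4),
    "C": (2, 5),
    "D": (1, 3),
}
GROUP_SIZE = 15  # pupils per group (upper and lower 27% groups)

for option, (upper, lower) in counts.items():
    d = (upper - lower) / GROUP_SIZE
    print(f"Option {option}: D = {d:+.2f}")

# Prints A +0.40, B -0.07, C -0.20, D -0.13: the keyed answer
# discriminates positively, and each distracter draws more pupils
# from the lower group (negative D), as a good distracter should.
```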

ANALYSIS OF STUDENT SELECTIONS

1. In the event that students in the upper group select an incorrect option for a multiple-choice item nearly as often as the correct option, the teacher is alerted to some inadequacy in the item. Apparently the item is vague, ambiguous, or not specific enough to allow the better students to respond correctly: either the students lack knowledge in the content area, or the item is poorly written.

2. When an incorrect option to an item is selected by a large number of students in the higher group, the possibility exists that the item may be keyed incorrectly. The teacher should scan the test key for all items that have been missed by a large number of students to check for this possibility.

3. For a good test item, the incorrect choices should come from the lower group and should be distributed about equally among the distracters. Ideally, none of the higher group would choose a distracter and all of the lower group would; in practice, however, this is a standard rather than an expectation.

4. Some of the general guidelines for effective item writing, as well as the simpler statistical techniques of item analysis, can materially improve classroom tests and are worth using even with small groups.

B. Second Tryout

After analysing the results of the first tryout, test items are usually revised for improvement. After the items that need revision have been revised, another tryout is necessary. The revised form of the test is administered to a new sample, under the same conditions as the first tryout. After this tryout, another item analysis is done to find out whether the revised items have improved in terms of the difficulty and discrimination indices.

C. Third or Final Tryout

Usually, after two revisions, the test is considered ready for its final form. The test should by now be satisfactory in terms of the difficulty and discrimination indices, and is ready to be tested for reliability and validity.

RELIABILITY

Reliability can be defined as the degree of consistency between two measures of the same thing. Tests are reliable when their results can be reproduced. Confusing or ambiguous test items may mean different things to a test-taker under different circumstances. In addition, tests may be too short to adequately determine the abilities being tested.

If a test yields different results when it is administered on different occasions or when scored by different people, it lacks reliability.

Ways of Testing for Reliability

There are at least four methods used in testing reliability: (1) the test-retest method, or measure of stability; (2) the parallel-forms method, or measure of equivalence; (3) the split-half method; and (4) the internal consistency method.

Test-Retest Method
The same test is administered twice to the same group or sample, and the correlation coefficient is computed on the two sets of scores. There are, however, built-in limitations to the method: (1) when the time interval is short, the subjects may recall their previous responses, resulting in a spuriously high correlation coefficient; (2) when the time between testings is too long, factors such as unlearning (the pattern is destroyed), forgetting, and maturation, among others, may affect test results, resulting in a low correlation coefficient between the two tests.
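As a concrete illustration, the test-retest coefficient is simply the Pearson correlation between the two administrations. The scores below are made up; statistics.correlation requires Python 3.10 or later.

```python
# Test-retest reliability: correlate scores from two administrations
# of the same test to the same pupils. Scores are hypothetical.

from statistics import correlation  # Python 3.10+

first_admin  = [12, 18, 25, 30, 22, 15, 28, 20]
second_admin = [14, 17, 27, 29, 20, 16, 30, 21]

r = correlation(first_admin, second_admin)
print(f"test-retest reliability: r = {r:.2f}")
```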

Split-Half Method

It requires that the same test yield two sets of scores for every student: the scores on the odd-numbered and the even-numbered items. When the two sets of scores are correlated (odd against even), the result is a reliability coefficient for a half-test; this is customarily stepped up to a full-test estimate with the Spearman-Brown formula.
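A sketch of the computation, with hypothetical odd/even subscores; the Spearman-Brown step-up at the end converts the half-test coefficient into a full-test estimate.

```python
# Split-half reliability: correlate odd-item and even-item subscores,
# then apply the Spearman-Brown correction. Scores are hypothetical.

from statistics import correlation  # Python 3.10+

odd_scores  = [10, 14, 8, 12, 15, 9, 13, 11]   # each pupil's odd-item subscore
even_scores = [11, 13, 9, 12, 14, 8, 14, 10]   # same pupils, even-item subscore

r_half = correlation(odd_scores, even_scores)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```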

Internal Consistency Method

An internally consistent pattern in test-taking is described as the tendency of an individual to obtain passing scores on the easier items and to fail the more difficult ones.
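The handout does not name a statistic for this method; the one most commonly used in practice is Cronbach's alpha (equivalent to KR-20 for right/wrong scoring), sketched below on made-up 0/1 item data.

```python
# Cronbach's alpha, a common internal-consistency statistic (not
# named in the handout). Items are rows, pupils are columns; data
# are hypothetical 0/1 (wrong/right) scores.

def cronbach_alpha(item_scores: list[list[int]]) -> float:
    k = len(item_scores)        # number of items
    n = len(item_scores[0])     # number of pupils

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item_scores[i][j] for i in range(k)) for j in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

scores = [
    [1, 1, 0, 1, 0],  # item 1, five pupils
    [1, 0, 0, 1, 0],  # item 2
    [1, 1, 0, 1, 1],  # item 3
]
print(f"alpha = {cronbach_alpha(scores):.2f}")  # alpha = 0.79
```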

VALIDITY

A test is valid when it measures what it aims to measure. There are three kinds of validity: content validity, criterion-related validity, and construct validity.

Content validity is related to how adequately the content of the test samples the domain about which inferences are to be made.

Criterion-related validity pertains to the empirical technique of studying the relationship between test scores and some external measure.

Construct validity is the degree to which test scores can be accounted for by certain explanatory constructs in psychological theory. If an instrument has construct validity, people's scores will vary as the theory underlying the construct predicts.
