
Where Did Your Cut Score Come From? A Primer in Standard Setting


Dr. Susan Gracia
Director of Assessment
Simmons College
Boston, MA

March 27, 2012

Cut Score

A selected point on the score scale of a test. The cut score separates students into categories, such as passing and failing. The process of setting a cut score is called standard setting, which is a multi-stage, judgmental process.

Nomenclature

Cutoff scores
Achievement levels
Mastery levels
Classification scores
Passing scores
Criterion levels
Performance levels
Cut points
Performance standards
Cut scores
Standards
Cutscores
Cutting scores

(ETS, 2005)

Determine the Need for Cut Scores

Why is there a need to set cut scores? What benefits are expected from the use of cut scores? What decisions will be made on the basis of the cut scores? How are those decisions being made now in the absence of cut scores?
What reasons are there to believe

There is no true cut score

There's no objective way to set cut scores. Cut scores are constructed, not found. They are based on judgments about people, student work, or test items. Judges depend on subjective, internalized norms about what people can do.

Errors of Classification

Some people who deserve to pass will fail. Some people who deserve to fail will pass. Raising or lowering the cut score to reduce one type of error will increase the other type of error. There's no way to prove that a cut score is correct.

To Do for All Methods


1. Select judges.
2. Teach judges about cut scores.
3. Define borderline knowledge/skills for a particular task/assessment. (Borderline = Minimally competent)
4. Train judges in the use of the method.
5. Implement the cut score study.
6. Document the results.

Judges

Judges need to:

Be qualified
Know subject and population
Be representative and diverse
Be acceptable to stakeholders
Be willing to follow procedures

Teach them about test purpose, cut score purpose, consequences of passing and failing.

Defining Borderline
1. Make sure judges understand what the assessment measures and how scores will be used.

2. Ask judges to describe in their own words a person whose knowledge and skills would represent the borderline between acceptable and unacceptable levels of knowledge/skills the assessment measures.

Examples of Borderlines

Worst reader who should get a diploma / best reader who should not
Worst surgeon who should be licensed / best surgeon who should not
Worst essay that deserves a score of 6 / best essay that deserves a 5
Highest bacteria count in safe water / lowest bacteria count in polluted water

Judge Training

Overview of the standard setting method: What is it? Why do it? How is it done? Practice using the standard setting method. Observe judges. Correct errors. Answer all questions. Practice more until all judges are

Selecting a Method

There's no one best method. Different methods will result in different cut scores. Some methods are better for certain types of assessments and assessment situations.

3 Standard Setting Methods


Modified Angoff Method
Based on judgments about: Test items
Best for: Test that is only or primarily multiple choice

Contrasting Groups Method
Based on judgments about: People
Best for: Small assessment situations where faculty have strong knowledge of students' abilities

Body of Work Method
Based on judgments about: Student work
Best for: Performance tasks

Modified Angoff Method

Judges examine each question on an assessment and estimate the probability that a borderline student would answer the question correctly.

Or: Imagine a group of 100 borderline students and estimate how many of them would answer the question correctly.


Probabilities will range from .00 to 1.00.



Modified Angoff Method

Discuss probabilities. Aim for a range of 10-15 percentage points per question. Judges can change judgments based on discussion. Sum each judge's probabilities for all questions to get a recommended cut score for each judge. Average the judges' recommended cut scores to arrive at an average cut score for minimum competency.
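The arithmetic of the Modified Angoff steps can be sketched in a few lines of Python. This is only an illustration: the judge names, the five-question test, and every probability below are hypothetical, and reading the 10-15 point guideline as the spread between the highest and lowest estimate for a question is an assumption.

```python
# Hypothetical ratings (not from the presentation): for each judge, the
# estimated probability that a borderline student answers each of five
# questions correctly.
ratings = {
    "Judge A": [0.60, 0.75, 0.40, 0.90, 0.55],
    "Judge B": [0.65, 0.70, 0.50, 0.85, 0.60],
    "Judge C": [0.55, 0.80, 0.45, 0.95, 0.50],
}


def judge_cut_scores(ratings):
    """Sum each judge's probabilities to get that judge's recommended cut score."""
    return {judge: sum(probs) for judge, probs in ratings.items()}


def average_cut_score(ratings):
    """Average the judges' recommended cut scores."""
    per_judge = judge_cut_scores(ratings)
    return sum(per_judge.values()) / len(per_judge)


def question_ranges(ratings):
    """Spread (highest minus lowest estimate) per question, in percentage points."""
    by_question = zip(*ratings.values())
    return [round((max(col) - min(col)) * 100) for col in by_question]


per_judge = judge_cut_scores(ratings)
cut = average_cut_score(ratings)
spreads = question_ranges(ratings)
print(per_judge)
print(round(cut, 2))
print(spreads)
```

Questions whose spread is much wider than the target range are natural candidates for further discussion among the judges before the ratings are finalized.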

Modified Angoff Example

[Figure: Judge 1 example. Source: Livingston & Zieky]

Modified Angoff Example


Judge    Recommended Cut Score
1        5.80
2        6.00
3        6.00
4        5.40
5        5.00
6        5.30
7        5.50
8        4.80
9        6.10
10       5.50

Average Cut Score for Minimum Competency: 5.54
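The average in this example can be reproduced directly from the ten recommended cut scores. The short Python check below uses only the values shown in the example; the variable names are illustrative.

```python
from statistics import mean

# Recommended cut scores for judges 1 through 10, as listed in the example.
recommended = [5.80, 6.00, 6.00, 5.40, 5.00, 5.30, 5.50, 4.80, 6.10, 5.50]

average_cut = round(mean(recommended), 2)
lowest, highest = min(recommended), max(recommended)

print(average_cut)      # average cut score for minimum competency
print(lowest, highest)  # most lenient and most demanding judge
```

The gap between the lowest and highest recommendation gives a quick sense of how far apart individual judges were before averaging.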

Modified Angoff

Pros
Most researched method
Stands up to court challenges
Does not require student data so it can be carried out prior to test administration

Cons
A lot of data entry
Difficult cognitive task
Does not work well with heavily open-ended/performance-based tests

Contrasting Groups Method

Faculty consider all they know about a population of students. Faculty predict individual students' level of performance on an assessment (e.g., expected passing/failing, proficient/nonproficient) without reference to scores. Obtain assessment scores.

Predictions are compared to actual

Contrasting Groups Example


Another Example

Graph the scores of expected passing and expected failing students in 2 separate distributions. The point at which the 2 distributions intersect is the cut score.
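As a rough numerical stand-in for reading the intersection off a graph, the Python sketch below tries every observed score as a candidate cut and keeps the one that misclassifies the fewest students. This is a swapped-in approximation rather than the graphical procedure itself, and the two score lists are hypothetical.

```python
def contrasting_groups_cut(expected_fail, expected_pass):
    """Return (cut, errors) for the cut score with the fewest misclassifications.

    A student passes when score >= cut. Ties go to the lowest candidate cut.
    """
    candidates = sorted(set(expected_fail) | set(expected_pass))
    best_cut, best_errors = None, None
    for cut in candidates:
        # Expected-failing students who would pass at this cut.
        false_passes = sum(1 for s in expected_fail if s >= cut)
        # Expected-passing students who would fail at this cut.
        false_fails = sum(1 for s in expected_pass if s < cut)
        errors = false_passes + false_fails
        if best_errors is None or errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut, best_errors


# Hypothetical scores for the two groups of students.
expected_fail = [3, 4, 4, 5, 5, 6, 7]
expected_pass = [5, 6, 7, 7, 8, 8, 9, 10]

cut, errors = contrasting_groups_cut(expected_fail, expected_pass)
print(cut, errors)
```

With these hypothetical scores, cuts of 6 and 7 misclassify the same number of students, which illustrates that the choice between them trades one type of classification error for the other.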


Contrasting Groups

Pros
Uses real data, not conjecture
Uses external validation info in addition to test scores
Easy to explain

Cons
Inconvenient to get judgments of people
Relies on human judgment without examining actual test performance
Need scores before cut is set

Body of Work Method

Pre-work:

Performance level categories, as well as performance level descriptors for each performance level, must be established and agreed upon. Scoring of a large sample of student work must occur before standard-setting can begin.

Select 40 to 50 intact samples of student work to represent the range of student performance on an assessment.

Body of Work Method

Range-Finding phase:

Identical sets of student work are provided to each judge. Judges are asked to independently categorize the student work samples based on the performance level descriptors, without any discussion.

This process reveals which work samples (e.g., Graduation Portfolios) generate the most agreement and which generate the most disagreement.
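One simple way to see which work samples produce the most disagreement is to compute, for each sample, the share of judges who chose its most common performance level. The Python sketch below assumes that measure; the sample IDs, the level names, and the four-judge panel are all hypothetical.

```python
from collections import Counter

# Hypothetical range-finding results: for each work sample, the performance
# level assigned independently by each of four judges.
categorizations = {
    "portfolio_01": ["Proficient", "Proficient", "Proficient", "Proficient"],
    "portfolio_02": ["Basic", "Proficient", "Basic", "Proficient"],
    "portfolio_03": ["Basic", "Basic", "Basic", "Proficient"],
}


def agreement_rate(levels):
    """Share of judges who chose the most common level for one work sample."""
    most_common_count = Counter(levels).most_common(1)[0][1]
    return most_common_count / len(levels)


def rank_by_disagreement(categorizations):
    """Sample IDs ordered from most disagreement to most agreement."""
    return sorted(categorizations, key=lambda s: agreement_rate(categorizations[s]))


ranked = rank_by_disagreement(categorizations)
print(ranked)
```

Samples near the front of the ranking are the ones judges split on, so they are the likeliest to sit near a boundary between performance levels.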

Examination of Student Work Method

Pinpointing phase:

Judges examine sets of work about which they disagreed in the range-finding phase, along with additional work samples representing those same score intervals. Judges assign performance levels to these work samples.

The minimum score for each performance level is precisely "pinpointed" by determining the score

Body of Work

Pros
Uses real data, not conjecture
More intuitive; panelists are not asked to imagine a hypothetical minimally competent examinee or to estimate the

Cons
Need scores before cut is set
Requires a lot of prior preparation, volumes of materials
Grueling work

Setting the Operational Cut Score

Only a legally authorized entity (e.g., policy makers) can authorize the use of a cut score. Once authorized, a study cut score becomes an operational cut score.

Complying with the Standards

The Standards of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education specify that the following be included in the standard setting process:
Document how judges were selected, their qualifications and training, the procedures used, whether or not judges' ratings were independent, the level of

Observe Cut Score Effects


Seek opinions on cut scores. Find out what happened to people who failed.

Is there evidence that any of them were actually qualified?

Is there evidence that any people who passed are really unqualified? What were the consequences of misclassification errors?

Comments? Questions?


Discussion Questions

How could you use standard setting in your setting? Which standard setting approach might you utilize? Why? What challenges do you anticipate in implementing this approach? How might you address these challenges?

References
American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.

Cizek, G.J. An NCME instructional module on setting pass scores. Educational Measurement: Issues and Practice.

Cizek, G.J., Bunch, M.B., Koons, H. (Winter). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice.

Horn, C., Ramos, M., Blumer, I. & Madaus, G. Cut scores: Results may vary. National Board on Educational Testing and Public Policy Monographs, 1(1). Chestnut Hill, MA: Boston College.

Livingston, S.A. & Zieky, M.J. Excerpts from Passing Scores: A Manual for Setting Standards of Performance on Educational and Occupational Tests. Princeton, NJ: Educational Testing Service.

Measured measures: Technical considerations for developing a local assessment system. Augusta, ME: Maine Department of Education.

Pitoniak, M. Considerations in Setting Performance Standards (Cutscores). Training session at the National Council on Measurement in Education conference, Montreal, Canada.
