
Characteristics of Quality Educational Assessments



Northwest Evaluation Association
Nicole A. Zdeb



Characteristics of Quality Educational Assessments

Assessment literacy involves understanding that assessments are designed to provide teachers, students, parents, and other stakeholders with the information needed to make decisions that will support students on their educational paths. Validity, reliability, fairness, student engagement, and consequential relevance are traits of high-quality educational assessment, whether large-scale, high-stakes, or classroom-based. Assessment designers strive to create assessments that show a high degree of fidelity to these traits. Doing so is a path of continuous improvement as we learn more about how students learn, what motivates them, and how assessment can best help teachers and students.

Content Validity: The assessment measures only what it is supposed to measure: the intended learning targets. The assessment enables learners and educators to make accurate inferences about what the learner understands, knows, and can do.

One of the most important characteristics of a quality educational assessment is content validity. This simply means that the assessment measures what it is intended to measure. Content validity is related to, though distinct from, construct validity, and both bear on the concept of accuracy.
There are two main considerations with content validity. One is that the content itself is correct, accurate, and agreed upon by the majority of practitioners in the field. This means, first of all, that there are no errors, misconceptions, or esoteric understandings embedded in the content. Areas of unresolved controversy and trivia (little bits of knowledge or information that might be accurate but are not widely known or taught) should also be avoided.
The other main consideration involves designing the assessment to measure only the intended learning
in a specific content area and not something else incidentally, such as reading comprehension or prior
background knowledge. If the assessment is about math, then reading comprehension issues should not
interfere with a student's ability to demonstrate what he or she knows, understands, and can do.
Content validity is supported in a number of ways in large-scale assessments, such as:
• General assessment design principles that control for readability
• Multiple rounds of content-expert review
• Evidence-centered design methodology
• Statistical analysis of student performance on test items
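
To make the last bullet concrete, here is a minimal sketch of two classic item statistics: the p-value (the proportion of students answering an item correctly) and the point-biserial correlation (roughly, how well an item separates high scorers from low scorers). The data and the 0.2 review threshold are hypothetical illustrations, not operational procedure.

    # Minimal item-analysis sketch: p-value and point-biserial correlation.
    # Data and thresholds are hypothetical; operational analyses are far richer.

    def item_statistics(responses):
        """responses: list of per-student lists of 0/1 item scores."""
        n_students = len(responses)
        n_items = len(responses[0])
        totals = [sum(student) for student in responses]
        mean_total = sum(totals) / n_students
        sd_total = (sum((t - mean_total) ** 2 for t in totals) / n_students) ** 0.5
        stats = []
        for i in range(n_items):
            item = [student[i] for student in responses]
            p = sum(item) / n_students  # p-value: proportion answering correctly
            if 0 < p < 1 and sd_total > 0:
                # Point-biserial: how success on this item relates to total score.
                mean_correct = sum(t for t, x in zip(totals, item) if x == 1) / sum(item)
                r_pb = (mean_correct - mean_total) / sd_total * (p / (1 - p)) ** 0.5
            else:
                r_pb = 0.0
            stats.append((p, r_pb))
        return stats

    # Hypothetical 5-student, 3-item response matrix (1 = correct, 0 = incorrect).
    data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
    for i, (p, r) in enumerate(item_statistics(data), start=1):
        flag = "  <- review" if r < 0.2 else ""  # hypothetical review threshold
        print(f"Item {i}: p = {p:.2f}, point-biserial = {r:.2f}{flag}")

Items flagged this way are typically routed back through the content-expert review cycles listed above, so the statistical and review-based supports reinforce one another.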


For informal, classroom-based assessment, one way to ensure content validity is to answer the guiding
questions:
1. What knowledge or skills does the student most need to perform successfully on this
assessment?
2. What is this assessment asking the student to demonstrate?
3. If the student performs successfully on this assessment, what does that mean?
4. How closely does what the assessment measures match the intended (instructed) content?
Content validity is foundational to making accurate inferences. If one is unclear about what the
assessment is measuring, then the inferences made will be muddy, weak, and essentially uninformative,
which means that the assessment has failed in its prime directive: to provide valuable information. An
assessment can have all sorts of bells and whistles, incorporate cutting-edge technology and functionality, and have a great suite of reports that tell a compelling assessment narrative, but if it does not have content validity, it is not worth much. That is why content validity is central to a high-quality educational assessment.
Reliability: Reliability is concerned with making sure that different test forms in a single administration are equivalent; that retests of a given test are equivalent to the original test; and that test difficulty remains constant year to year, administration to administration.

Reliability is another key concept in educational assessment; in fact, it is a virtue for all measurement
tools. You want your bathroom scale to be consistent day to day, and person to person; likewise, you
need to be able to trust the measurement given to you by your thermometer to track whether your
fever is getting worse or better.
Reliability is particularly an issue for high-stakes assessments such as end of course tests required for
progressing to the next grade or course, state summative assessments, high school graduation exams,
and college readiness assessments, but it is important for all large-scale assessments. Reliability is established and monitored through statistical analysis, including a process called equating, which places different test forms and administrations on a common scale. Equating is one of the many behind-the-scenes functions performed by psychometricians, folks trained in the statistical measurement of knowledge.
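
For a flavor of how reliability is quantified, the sketch below computes KR-20 (Kuder-Richardson Formula 20), one classic internal-consistency reliability coefficient for right/wrong items. The data are invented for illustration; operational equating is considerably more involved than this.

    # KR-20 internal-consistency sketch for dichotomously scored (0/1) items.
    # Hypothetical data; values closer to 1.0 indicate more consistent measurement.

    def kr20(responses):
        """responses: list of per-student lists of 0/1 item scores."""
        n_students = len(responses)
        k = len(responses[0])  # number of items
        totals = [sum(student) for student in responses]
        mean_total = sum(totals) / n_students
        var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
        # Sum of p * (1 - p) across items, where p is the proportion correct.
        pq_sum = 0.0
        for i in range(k):
            p = sum(student[i] for student in responses) / n_students
            pq_sum += p * (1 - p)
        return (k / (k - 1)) * (1 - pq_sum / var_total)

    data = [[1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0], [1, 1, 0, 1]]
    print(f"KR-20 reliability estimate: {kr20(data):.2f}")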
Informal, classroom-based, teacher-created assessments generally do not directly engage with the concept of reliability, since it requires advanced statistical analysis, but they do engage with the concept informally. When a student has to take a make-up test, for example, the make-up should be approximately as difficult as the original test. There are many such informal assessment examples where reliability is a desired trait. In fact, it is hard to conceive of a situation where reliability would not be a desired trait. The main difference is how it is tracked. For informal assessments, professional judgment is often called upon; for large-scale assessments, it is tracked statistically.
Fairness: All students, regardless of their individual characteristics, have the same chance to show what they understand, know, or can do. Nothing about the assessment is systematically unfair to a group of students based on gender, culture, geographical location, linguistic heritage, physical capabilities, etc.

Fairness in educational assessment is an issue that came to light on the heels of the civil rights
movement in the early seventies. Since then, there have been great strides in assessment development
practices to ensure an assessment experience that is as fair as possible to the largest possible population
of students. Why fairness in testing is important is self-evident: every student deserves an equal opportunity to demonstrate what he or she understands, knows, and can do. It is an ethical imperative, and, in the case of summative assessment providers, a legal imperative, that assessments be culturally inclusive, accommodating to students with special physical or cognitive needs, and accessible to students for whom English is not a first language. In short, that they be fair.
The issue of fairness in testing can be subdivided into three distinct categories: cultural sensitivity, bias,
and accessibility to special populations, such as English Language Learners and special education
students.
An assessment that demonstrates cultural sensitivity respects diversity, strives to fairly represent gender in non-stereotypical ways, and contains content that a student from anywhere in the country, from any socio-economic stratum, would have access to understanding.
Cultural sensitivity is more about including content, scenarios, and contexts that are relevant to people
from all sorts of different backgrounds and perspectives than it is about policing content and
bowdlerizing, or sanitizing, it. Cultural sensitivity is a qualitative trait that is ensured through rigorous
reviews during large-scale assessment development, often including the use of rubrics and checklists.
For state summative assessment providers, cultural sensitivity is ensured through well-documented
external panel reviews by folks who represent a wide swath of the constituency of the state.
Formative, classroom-based, and teacher-developed assessments can foster cultural sensitivity by
intentionally and explicitly reviewing for it using professional judgment and easily available checklists.
Bias exists when a group of students has an unfair advantage, and that advantage is statistically observable. Unfair advantages do not include things such as better preparation, higher aptitude, or ease with test taking. Unfair advantages can come from many different directions. They can occur when students have not had the Opportunity to Learn (OTL) the content, or when the content privileges a certain kind of background knowledge or experience.
Bias is not a trait that classroom or informal assessment tracks since it is revealed through analyzing the
response patterns of various testing populations and looking for statistically meaningful deviation from
the general spread of response patterns. Ferreting out bias is another service provided by
psychometricians when they are parsing and making sense of student response data.
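
To give a feel for that kind of analysis, here is a deliberately simplified sketch of a differential item functioning (DIF) screen: it compares how often two groups of students with the same total score succeed on a single item. The group labels, data, and flagging threshold are hypothetical; operational DIF work relies on more rigorous procedures such as the Mantel-Haenszel statistic.

    # Simplified DIF screen: compare one item's success rates for two groups of
    # students matched on total score. Hypothetical data and threshold.
    from collections import defaultdict

    def dif_screen(records, flag_at=0.15):
        """records: list of (group, total_score, item_correct) tuples."""
        by_score = defaultdict(lambda: {"A": [], "B": []})
        for group, total, correct in records:
            by_score[total][group].append(correct)
        gaps = []
        for total, groups in sorted(by_score.items()):
            if groups["A"] and groups["B"]:  # compare only matched score bands
                rate_a = sum(groups["A"]) / len(groups["A"])
                rate_b = sum(groups["B"]) / len(groups["B"])
                gaps.append(rate_a - rate_b)
                print(f"total {total}: group A {rate_a:.2f} vs group B {rate_b:.2f}")
        avg_gap = sum(gaps) / len(gaps)
        print("Potential DIF: route item for review" if abs(avg_gap) > flag_at
              else "No flag")

    # Hypothetical records: (group, total test score, 1 if this item was correct).
    records = [("A", 10, 1), ("A", 10, 1), ("B", 10, 0), ("B", 10, 1),
               ("A", 20, 1), ("B", 20, 1), ("A", 20, 1), ("B", 20, 0)]
    dif_screen(records)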
Accessibility for special education students and English Language Learners is a fundamentally different issue from sensitivity and bias, but it relates to the same organizing concept: fairness in assessment.
Accessibility for state summative assessments is legislated because it directly addresses the rights of
individuals based on the legal premise that every American student has the right to a quality public
education.
Accessibility in assessment translates into the tools, assists, devices, and accommodations that are
allowed so that students can either take the same test as their peers, or have an equivalent assessment
experience. On a classroom level, teachers are acutely aware when issues of accessibility due to
linguistic, physical, cognitive, or emotional capabilities arise. In a school ecosystem, there are teams of
support providers, including classroom and special education teachers, tutors, school psychologists, case
workers and social services personnel focused on ensuring that students have equal access to the same
educational opportunities as their peers.
Student Engagement and Motivation: The assessment provides an accurate picture of what students understand, know, and can do because students are motivated to produce their best work.

Student engagement and motivation are somewhat intangible traits. Engagement is not synonymous
with enjoyment. A student might be engaged in the assessment, but still not enjoy the activity of being
assessed. Engagement speaks to the effort that the student is putting forth in the assessment. The idea is that the better the effort the student puts forth, the more likely the results of her effort (as evidenced by her assessment performance) will give a representative snapshot of what she understands, knows, and can do.
Student engagement is tracked by many different indicators depending on the type of assessment and its delivery. Some indicators might include the following (a simple flagging sketch appears after the list):
• Student response patterns on the infamous "bubble tests." If a student has an abnormal response pattern (one that seems artificial, such as AAAABBBBCCCCC), then the test will likely be considered invalid.

• With assessments delivered via a computer, student response time can be measured. If a student's response time is consistently shorter than the average response time range, their test will likely be invalidated.
• With assessments that require a written response, such as an extended essay, student engagement might be tracked by the length of the response. A response that falls well below the expected word count, for example, might cause the test to be flagged or invalidated.
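
Here is the simple flagging sketch promised above, combining two of the listed indicators: response time and response length. The thresholds and field names are hypothetical choices for illustration, not actual invalidation rules.

    # Minimal engagement-flagging sketch based on the indicators above.
    # Thresholds and field names are hypothetical, not real invalidation rules.

    def flag_disengagement(responses, min_seconds=5.0, min_words=25):
        """responses: list of dicts with 'item', 'seconds', and optional 'words'."""
        flags = []
        for r in responses:
            if r["seconds"] < min_seconds:
                flags.append((r["item"], "rapid response; possible guessing"))
            if r.get("words") is not None and r["words"] < min_words:
                flags.append((r["item"], "response length below expectation"))
        return flags

    student_responses = [
        {"item": 1, "seconds": 42.0},
        {"item": 2, "seconds": 2.3},                 # answered suspiciously fast
        {"item": 3, "seconds": 310.0, "words": 12},  # essay far shorter than expected
    ]
    for item, reason in flag_disengagement(student_responses):
        print(f"Item {item}: {reason}")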
Engagement can be observed through response patterns, word count, time spent on the item, etc., but
motivation can only be inferred. Engagement and motivation are not the same thing. A student
demonstrates engagement because of his or her motivation. A student can be motivated to do his or her
best in an assessment experience for a variety of reasons, generally organized into an external/internal
schema. External motivations to demonstrate engagement might come from prizes or rewards tied to
assessment performance, or, conversely, punishments or consequences such as extra study sessions or
more homework. Internal motivation to demonstrate engagement might come from the desire to
garner positive attention or praise, or the competitive urge to be best in class.
Classroom teachers in many ways have an advantage when it comes to gauging student engagement
and understanding the motivating, and demotivating, factors that might be in play for their students.
During formative assessment practice and classroom assessment, teachers can see firsthand whether students are engaged and adjust midstream. Teachers can work with students to support the factors that motivate them, and help them navigate the factors that demotivate them.

Consequential Relevance: The usefulness of the assessment results justifies the investment of time and effort in administering and scoring the assessment, and then understanding and meaningfully applying the information to adjust instruction and better support student learning.

Teachers, educators, and stakeholders are constantly making decisions. Some decisions are small, local, and inconsequential; others have far-reaching implications for a student's educational experience. Some stakeholders, such as superintendents, make decisions from assessment data that impact entire districts and communities.
Assessments are given for many reasons, but foundationally, the reasons are all the same: assessments intend to help answer questions and inform decisions. These questions can range from whether a student has shown proficiency on a state summative exam to whether students have performed well enough to earn college credit via an AP exam.


The usefulness of an assessment resides in the usefulness of its data to help inform decisions. Another aspect of usefulness is how well assessment-based inferences relate to other data. For example, the SAT is considered relevant because it has been a reasonable predictor of freshman grades.
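
That kind of predictive relationship is usually summarized with a correlation coefficient. Below is a minimal sketch computing the Pearson correlation between test scores and later freshman GPAs; all numbers are invented for illustration.

    # Pearson correlation sketch: how strongly does a test score track a later
    # outcome such as freshman GPA? All numbers are invented for illustration.

    def pearson_r(xs, ys):
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
        sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
        return cov / (sd_x * sd_y)

    scores = [1050, 1190, 1300, 980, 1420, 1210]  # hypothetical test scores
    gpas = [2.8, 3.1, 3.4, 2.5, 3.8, 3.0]         # hypothetical freshman GPAs
    print(f"Predictive correlation: {pearson_r(scores, gpas):.2f}")

The closer the coefficient is to 1.0, the more useful the score is as a predictor of that outcome.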
For data to be useful, one needs to understand what it means: what it is measuring, what kind of inferences can be made from it, and what kind of decisions it can inform. You wouldn't ask a ruler to tell you how much you weigh or to inform your decision to modify your sleeping habits. A ruler is a good tool, inarguably useful, but not for those purposes.
How do the educational assessments you use reflect these traits?
Still have questions? Ask an expert! We are here to provide you with the information and support you need to understand assessment and unleash its power. Please send any queries to info@nwea.org. Put "assessment literacy question" into the subject line and your question will be forwarded to an assessment specialist.
