To Assess the Process
By: Markus Simeon K. Maubuthy
Definition
Target
Component
Strengths & Weaknesses (+/-)
Validity & Reliability
Examples

Clarifying Performance
Developing Performance Exercises
Scoring and Recording Results
Methods of recording results:

Checklist: key attributes of good performance are listed and checked present or absent. Quick, and useful with a large number of criteria. (The checklist and rating scale are sketched in code below.)

Rating Scale: the performance continuum is mapped onto a several-point numerical scale ranging from low to high. Can record a judgment and its rationale with one rating.

Anecdotal Record: student performance is described in detail in writing. Can provide rich portraits of achievement, but is time consuming.

Mental Record: the assessor stores judgments and/or descriptions of performance in memory. Difficult to retain accurately over time.
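To make the first two methods concrete, here is a minimal sketch of how a checklist and a numerical rating scale might be recorded and summarized; the criteria, attribute names, and scores are hypothetical examples, not taken from the cited sources:

```python
# Minimal sketch: recording a checklist vs. a numerical rating scale.
# All criteria and scores below are hypothetical examples.

checklist = {                      # each attribute is simply present/absent
    "ideas logically organized": True,
    "research thorough and accurate": True,
    "lettering large and neat": False,
    "original in plan and execution": True,
}

rating_scale = {                   # each dimension rated 1 (low) to 5 (high)
    "scientific method of thinking": 4,
    "dramatic presentation": 3,
    "creative ability": 5,
    "technical skills": 2,
}

# Checklist summary: proportion of attributes present.
checklist_score = sum(checklist.values()) / len(checklist)

# Rating-scale summary: mean rating across dimensions.
rating_score = sum(rating_scale.values()) / len(rating_scale)

print(f"Checklist: {checklist_score:.0%} of attributes present")
print(f"Rating scale: mean rating {rating_score:.2f} / 5")
```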
Target: Determining Indicators
Thorndike (1971:249):
- Scientific Method of Thinking: ideas logically organized; a well-planned, complete, and concise story; thorough and accurate research; evidence that the student knows the subject thoroughly.
- Effective, Dramatic Presentation of Scientific Truths: the idea of the exhibit should be clearly and dramatically conveyed; explanatory lettering should be large, neat, and brief.
- Creative Ability: the exhibit should be original in plan and execution; clever use of salvage materials is important; new applications of scientific ideas are best of all.
- Technical Skills: workmanship, craftsmanship, rugged construction, neatness, safety, lettering, etc.
Gronlund (1985:384): skill, work habits, social attitudes, scientific attitudes, interest, appreciation, adjustment.

Stiggins (1994:171-174)
Caution! The only target for which performance assessment is not recommended is the assessment of simple elements or complex components of subject-matter knowledge to be mastered through memorization.
Components

- Task
  - Types: restricted, extended
  - Task description
  - Task question
- Rubric
  - Performance criteria
  - Rating scale: numerical, qualitative, graphic, descriptive graphic
  - Types: holistic vs. analytic; general vs. specific
Definition

A performance task is an assessment activity that requires a student to demonstrate her/his achievement by producing an extended written or spoken answer, by engaging in group or individual activities, or by creating a specific product. The performance task you administer may be used to assess the product the student produces and/or the process the student uses to complete the product. (Nitko & Brookhart, 2007:244)

A performance task is what students are required to do in the performance assessment, either individually or in groups. (McMillan, 2007:239)
Types of Task

Extended-type tasks are more complex, elaborate, and time-consuming. Extended-type tasks often include collaborative work with small groups of students. The assignment usually requires that …
Developing the task (McMillan, 2007:239):
1. Generate or identify an idea for a task
2. Develop the task and context description
3. Develop the task question or prompt
The task and context description is refined through successive drafts (Draft 1 through Draft 4), each incorporating the relevant standards: content standards, complex reasoning standards, information processing standards, effective communication standards, collaboration/cooperation standards, and habits of mind standards.
1. Essential: the task fits into the core of the curriculum. It represents a big idea.
2. Authentic: the task should be authentic. Wiggins (1998), as cited in McMillan (2007:244), suggests standards for judging the degree of authenticity in an assessment task as follows:
   1) Realistic. The task replicates the ways in which a person's knowledge and ability are tested in real-world situations.
   2) Requires judgment and innovation. The student has to use knowledge and skills wisely and effectively to solve unstructured problems, and the solution involves more than following a set routine or procedure or plugging in knowledge.
   3) Asks the student to "do" the subject. The student has to carry out exploration and work within the discipline of the subject area, rather than restating what is already known or what was taught.
   4) Replicates or simulates the contexts in which adults are tested in the workplace, in civic life, and in personal life. Contexts involve specific situations that have particular constraints, purposes, and audiences.
   5) Assesses the student's ability to efficiently and effectively use a repertoire of knowledge and skills to negotiate a complex task. Students should be required to integrate all knowledge and skills needed.
   6) Allows appropriate opportunities to rehearse, practice, consult resources, and get feedback on and refine performances and products.
3. …
4. …
5. …
Developing performance criteria (Stiggins, 1994:181-186):
Step 1: Reflective brainstorming
Step 2: …
Step 3: …
Step 4: Contrasting
Step 5: Describing success
Step 6: …
Qualitative rating scales use verbal descriptions to indicate different levels, e.g., "never, seldom, occasionally, frequently, always" or "excellent, good, fair, poor." (McMillan, 2007:250-252)
See also Gronlund (1985:391-392).
Top-Down Approach
The top-down approach begins with a conceptual framework that you can use to evaluate students' performance and to develop scoring rubrics; follow these steps:
1. Adapt or create a conceptual framework of achievement dimensions that describes the content and performance that you should assess.
2. Develop a detailed outline that arranges the content and performance from step 1 in a way that identifies what you should include in the general rubric.
3. Craft a general scoring rubric that conforms to this detailed outline and focuses on the important aspects of content and process to be assessed across different tasks. It can be used as is to score student work, or it can be used to craft specific rubrics.
4. Craft a specific scoring rubric for the specific performance task you are going to use.
5. Use the specific scoring rubric to assess the performances of several students; use this experience to revise the rubric as necessary.
Bottom-Up Approach
With the bottom-up approach you begin with samples of students' work, using actual responses to create your own framework. Use examples of different quality levels to help you identify the dimensions along which students can be assessed; follow these steps (a sketch of a resulting rubric follows the list):
1. Obtain about 10 to 12 students' actual responses to a performance item. Be sure the responses you select illustrate various levels of quality of the general achievement you are assessing.
2. Read the responses and sort all of them into three groups: high-quality, medium-quality, and low-quality responses.
3. After sorting, carefully study each student's responses within the groups, and write very specific reasons why you put those responses into a particular group.
4. Look at your comments across all categories and identify the emerging dimensions.
5. Separately for each of the three quality levels of each achievement dimension you identified in step 4, write a specific student-centered description of what the responses at that level are typically like.
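Either approach ends in a specific rubric. Purely as an illustration, here is a minimal sketch of a specific analytic rubric represented as data, with a scoring helper; the dimensions and level descriptions are hypothetical and loosely echo the Thorndike exhibit criteria above, not any of the cited sources:

```python
# Minimal sketch of a specific analytic rubric as a data structure.
# Dimensions and level descriptions are hypothetical examples.

rubric = {
    "scientific thinking": {
        3: "ideas logically organized; research thorough and accurate",
        2: "mostly organized; research adequate but with gaps",
        1: "disorganized; little evidence of research",
    },
    "presentation": {
        3: "idea clearly and dramatically conveyed; lettering large and neat",
        2: "idea conveyed but presentation uneven",
        1: "idea unclear; lettering cramped or messy",
    },
    "technical skill": {
        3: "rugged construction; careful, safe workmanship",
        2: "sound construction with minor flaws",
        1: "fragile or unsafe construction",
    },
}

def analytic_score(ratings: dict[str, int]) -> tuple[int, int]:
    """Sum the per-dimension ratings; return (score, maximum possible)."""
    for dim, level in ratings.items():
        assert level in rubric[dim], f"{level} is not a defined level for {dim!r}"
    total = sum(ratings.values())
    maximum = sum(max(levels) for levels in rubric.values())
    return total, maximum

# Example: one student's ratings on each dimension.
score, out_of = analytic_score(
    {"scientific thinking": 3, "presentation": 2, "technical skill": 2}
)
print(f"Analytic score: {score}/{out_of}")
```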
Leniency error occurs when a rater tends to make almost all ratings toward the high end of the scale, avoiding the low end. Severity error is the opposite of leniency error.

Central tendency error occurs when a rater hesitates to use the extremes and uses only the middle part of the scale. (A simple screen for these three errors is sketched below.)

Halo effect occurs when a rater's general impression of a person influences the rating of individual characteristics.

Personal bias occurs when a rater tends to rate based on inappropriate or irrelevant stereotypes: favoring boys over girls, whites over blacks, etc.

A logical error occurs when a rater gives similar ratings on two or more dimensions of performance that the rater believes are logically related but that are in fact unrelated.

Rater drift occurs when raters, whose ratings originally agreed, begin to redefine the rubrics for themselves. Reliability decay is a related error: immediately after training, raters apply the rubrics consistently across students and mark consistently with one another; as time passes, however, the ratings become less consistent.
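Leniency, severity, and central-tendency errors show up in a rater's score distribution, so a simple screen is to compare each rater against the pooled ratings. A minimal sketch, with fabricated raters, scores, and flagging thresholds:

```python
from statistics import mean, stdev

# Hypothetical ratings by three raters of the same six performances,
# on a 1 (low) to 5 (high) scale.
ratings = {
    "rater_A": [5, 5, 4, 5, 5, 4],   # hugs the top of the scale
    "rater_B": [1, 2, 1, 2, 1, 2],   # hugs the bottom of the scale
    "rater_C": [3, 3, 3, 3, 3, 4],   # hugs the middle of the scale
}

pooled = [r for scores in ratings.values() for r in scores]
pooled_mean, pooled_sd = mean(pooled), stdev(pooled)

for rater, scores in ratings.items():
    m, sd = mean(scores), stdev(scores)
    flags = []
    if m > pooled_mean + 1:          # far above the pooled mean
        flags.append("possible leniency")
    elif m < pooled_mean - 1:        # far below the pooled mean
        flags.append("possible severity")
    if sd < pooled_sd / 2:           # unusually narrow spread
        flags.append("possible central tendency")
    print(f"{rater}: mean={m:.2f}, sd={sd:.2f} -> {', '.join(flags) or 'no flags'}")
```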
Strengths | Performance Assessment
…

Weaknesses | Performance Assessment
…
Validity | Performance Assessment

Noll et al. (1979:90) list the following qualities:
a. Validity
b. Reliability
c. Objectivity
d. Ease of administering
e. Ease of scoring
f. Ease of interpreting
g. Adequate norms
h. Equivalent forms
i. Economy
Validity | Performance Assessment
Sources of Information for Validity
Content-related evidence: the extent to which the assessment is representative of the domain of interest; for example, when a teacher gives a test that appropriately measures the content and behavior that are the objectives of instruction.

Criterion-related evidence: the relationship between an assessment and another measure of the same trait. Generally based on agreement between the scores on a test and some outside measure, called the criterion (sketched in code below).

Construct-related evidence: concerned with the question of how well differences in test scores conform to predictions about characteristics that are based on an underlying theory or construct. Judgments of construct validity are in fact most often based on a combination of logical analyses and an accumulation of empirical studies.
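Criterion-related evidence is typically quantified as the correlation between assessment scores and the criterion measure. A minimal sketch using the Pearson product-moment correlation; the score lists are fabricated for illustration:

```python
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: performance-task scores vs. an outside criterion
# (e.g., later course grades) for the same six students.
task_scores = [12, 15, 9, 18, 14, 11]
criterion   = [70, 82, 60, 90, 78, 68]

print(f"criterion-related validity coefficient: {pearson_r(task_scores, criterion):.2f}")
```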
Reliability | Performance Assessment

Estimating Reliability
The reliability of ratings is an important criterion for evaluating performance assessments.

Estimating reliability over time: test-retest; alternate forms on different occasions.
Estimating reliability on a single occasion: alternate forms; coefficient alpha; split-halves coefficient (see the sketch below).
Estimating scorer reliability.
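As a concrete illustration, here is a minimal Python sketch of two of the single-occasion estimates: coefficient alpha, and an odd/even split-half coefficient stepped up with the Spearman-Brown formula. The item-score matrix is fabricated for illustration:

```python
from statistics import mean, pvariance

def coefficient_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha: items holds one list of scores per item,
    all over the same students."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-student totals
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def split_half(items: list[list[float]]) -> float:
    """Split-half reliability: correlate odd/even half-test totals,
    then step up with the Spearman-Brown formula."""
    odd  = [sum(s) for s in zip(*items[0::2])]
    even = [sum(s) for s in zip(*items[1::2])]
    mo, me = mean(odd), mean(even)
    cov = sum((a - mo) * (b - me) for a, b in zip(odd, even))
    r_hh = cov / (sum((a - mo) ** 2 for a in odd) *
                  sum((b - me) ** 2 for b in even)) ** 0.5
    return 2 * r_hh / (1 + r_hh)                        # Spearman-Brown

# Hypothetical scores: 4 items (rows) x 5 students (columns).
items = [
    [2, 3, 1, 4, 3],
    [1, 3, 2, 4, 2],
    [2, 4, 1, 3, 3],
    [1, 2, 1, 4, 2],
]
print(f"alpha = {coefficient_alpha(items):.2f}")
print(f"split-half (Spearman-Brown) = {split_half(items):.2f}")
```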
Reliability | Performance Assessment

Split-half coefficients: Product Moment (to correlate the two halves); Spearman-Brown; Flanagan; Rulon (formulas below).
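The named coefficients, in their standard textbook form (here $r_{\frac{1}{2}\frac{1}{2}}$ is the product-moment correlation between the two half-tests, $s_1^2$ and $s_2^2$ the variances of the halves, $s_d^2$ the variance of the difference between the halves, and $s_t^2$ the variance of total scores):

$$r_{11} = \frac{2\,r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}} \quad \text{(Spearman-Brown)}$$

$$r_{11} = 2\left(1 - \frac{s_1^2 + s_2^2}{s_t^2}\right) \quad \text{(Flanagan)}$$

$$r_{11} = 1 - \frac{s_d^2}{s_t^2} \quad \text{(Rulon)}$$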
Reliability | Performance Assessment

Internal-consistency and agreement coefficients: KR20; KR21; alpha coefficient; kappa coefficient (formulas below).
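For reference, the standard forms of these coefficients ($k$ is the number of items, $p_i$ the proportion of students answering item $i$ correctly, $q_i = 1 - p_i$, $M$ the mean total score, $s_i^2$ the variance of item $i$, and $s_t^2$ the variance of total scores; for kappa, $p_o$ is the observed and $p_e$ the chance-expected agreement between scorers):

$$r_{11} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{s_t^2}\right) \quad \text{(KR20)}$$

$$r_{11} = \frac{k}{k-1}\left(1 - \frac{M(k-M)}{k\,s_t^2}\right) \quad \text{(KR21)}$$

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{s_t^2}\right) \quad \text{(coefficient alpha)}$$

$$\kappa = \frac{p_o - p_e}{1 - p_e} \quad \text{(kappa)}$$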
Reliability | Performance Assessment
How to Improve the Reliability of Ratings
…
Examples | Performance Assessment

References
Arikunto, S. (2012). Dasar-dasar evaluasi pendidikan (Ed. 2). Jakarta: Bumi Aksara.