
Performance Assessment for Assessing Processes

By: Markus Simeon K. Maubuthy

Definition | Performance Assessment

Thorndike (1971:238)

... one in which some criterion situation is simulated to a much greater degree than is represented by the usual paper and pencil test.

James H. McMillan (2007:229)


A performance assessment is one in which the teacher observes and makes a judgment about the student's demonstration of a skill or competency in creating a product, constructing a response, or making a presentation.

Nitko & Brookhart (2007:244)


A performance assessment (a) presents a task requiring
students to do an activity (make something, produce a
report, or demonstrate a process) that requires applying
their knowledge and skill and (b) uses clearly defined
criteria to evaluate how well the student has achieved this
application.

Definition | Performance Assessment

Pedoman MP IPA (Appendix to Permendikbud No. 58 of 2014, p. 465)

Performance or practical assessment is carried out by observing students' activities as they do something.

Definition | Performance Assessment

How to Design a Performance Assessment

1. Clarifying performance
Behavior to be demonstrated or product to be created?
Individual performance or group performance?
Performance criteria

2. Developing performance exercises
The ways to cause students to perform in a manner that will reveal their level of proficiency
Structured assignments or naturally occurring events
Defines targets, conditions, and standards
Number of exercises

3. Scoring and recording results

Definition | Performance Assessment

How to Design a Performance Assessment

3. Scoring and recording results
Level of detail of results (holistic or analytic)
Recording procedures:
Checklist: key attributes of good performance are listed and checked present or absent. Strength: quick; useful with a large number of criteria.
Rating scale: the performance continuum is mapped on a several-point numerical scale ranging from low to high. Strength: can record a judgment and its rationale with one rating.
Anecdotal record: student performance is described in detail in writing. Strength: can provide rich portraits of achievement.
Mental record: the assessor stores judgments and/or descriptions of performance in memory. Strength: quick and easy way to record.
Limitations noted for these procedures include being time consuming, results that can lack detail, demands for extensive depth, and difficulty retaining accurate mental records.
Identify the rater

Definition | Performance Assessment

How to Design a Performance Assessment

1. Determine the indicators
2. Choose the focus of the assessment
3. Choose the level of realism
4. Choose the observation, recording, and scoring methods
5. Try out the task and rubric
6. Revise the task and rubric for the next lesson
(Ana Ratna Wulan)

Definition | Performance Assessment

When should I choose to focus on process?

1. The steps in a procedure can be specified and have been explicitly taught.
2. The extent to which an individual deviates from accepted procedure can be accurately and objectively measured.
3. Much or all of the evidence needed to evaluate performance is to be found in the way that the performance is carried out, and/or little or none of the evidence needed to evaluate performance is present at the end of the performance.
4. An ample number of persons are available to observe, record, and score the procedures used during performance.

Targets | Performance Assessment

Thorndike (1971:249)
Scientific Method of Thinking: Ideas logically organized,
well-planned, complete and concise story, thorough and
accurate research, evidence that student knows the
subject thoroughly.
Effective, Dramatic Presentation of Scientific Truths:
Idea of exhibit should be clearly and dramatically
conveyed. Explanatory lettering should be large, neat,
brief.
Creative Ability: Exhibit should be original in plan and execution. Clever use of salvage materials is important. New applications of scientific ideas are best of all.
Technical Skills: workmanship, craftsmanship, rugged construction, neatness, safety, lettering, etc.

Targets | Performance Assessment

Gronlund (1985:384)
Skill: speaking, writing, listening, oral reading, performing laboratory experiments.
Work habits: effectiveness in planning, use of time, use of equipment, use of resources.
Social attitudes: concern for the welfare of others, respect for laws, respect for the property of others.
Scientific attitudes: open-mindedness, willingness to suspend judgment, sensitivity to cause-effect relations, an inquiring mind.
Interest: expressed feelings toward various educational activities.
Appreciation: feelings of satisfaction and enjoyment expressed toward instruction.
Adjustment: relationship to peers, reaction to praise and criticism, reaction to authority, emotional stability, social adaptability.

Targets | Performance Assessment

Stiggins (1994:171-174)

Knowledge: use of reference materials to acquire knowledge. Determine whether students have gained control over a body of knowledge through the proper and efficient use of reference materials.
Reasoning: application of that knowledge in a variety of problem-solving contexts.
Skill: proficiency in a range of skill arenas.
Affective: feelings, attitudes, values, and other affective characteristics.

Caution!
The only target for which performance assessment is
not recommended is the assessment of simple
elements or complex components of subject matter
knowledge to be mastered through memorization.

Targets | Performance Assessment

Marzano, et al (Nitko & Brookhart, 2007:244)


Complex thinking learning target
a. Effectively translates issues and situations into meaningful tasks that have a clear purpose.
b. Effectively uses a variety of complex reasoning strategies.
1. Comparison
2. Classification
3. Induction
4. Deduction
5. Error Analysis
6. Constructing Support
7. Abstracting
8. Analyzing Perspectives
9. Decision Making
10. Investigation
11. Problem Solving
12. Experimental Inquiry

Targets | Performance Assessment

Marzano, et al (Nitko & Brookhart, 2007:244)


Information processing learning target: your ability to review and evaluate how valuable each source of information is to the parts of your project.
a. Effectively interprets and synthesizes information.
b. Effectively uses a variety of information-gathering techniques and resources.
c. Accurately assesses the value of information.
d. Recognizes where and how projects would benefit from additional information.

Targets | Performance Assessment

Marzano, et al (Nitko & Brookhart, 2007:244)


Habits of mind learning target: your ability to effectively define your goal in the assignment and to explain your plan for attaining the goal.
1. Is aware of own thinking.
2. Makes effective plans.
3. Is aware of and uses necessary resources.
4. Evaluates the effectiveness of own actions.
5. Is sensitive to feedback.
6. Is accurate and seeks accuracy.
7. Is clear and seeks clarity.
8. Is open-minded.
9. Restrains impulsivity.
10. Takes a position when the situation warrants it.
11. Is sensitive to the feelings and level of knowledge of others.
12. Engages intensively in tasks even when answers or solutions are not immediately apparent.
13. Pushes the limits of own knowledge and ability.
14. Generates, trusts, and maintains own standards of evaluation.
15. Generates new ways of viewing a situation outside the boundaries of standard convention.

Targets | Performance Assessment

Marzano, et al (Nitko & Brookhart, 2007:244)


Effective communication learning target: your ability to communicate your conclusions and findings.
1. Expresses ideas clearly.
2. Effectively communicates with diverse audiences.
3. Effectively communicates in a variety of ways.
4. Effectively communicates for a variety of purposes.
5. Creates quality products.

Targets | Performance Assessment

Marzano, et al (Nitko & Brookhart, 2007:244)


Collaboration/Cooperation.
1. Works toward the achievement of group goals.
2. Demonstrates effective interpersonal skills.
3. Contributes to group maintenance.
4. Effectively performs a variety of roles within a group.

Content learning target: your understanding of the subject.

Components | Performance Assessment

A performance assessment has two components:
1. Performance task
Types: restricted and extended
Task description
Task question
2. Scoring rubric
Performance criteria
Rating scale: numerical, qualitative, graphic, descriptive graphic
Types: holistic and analytic; general and specific

Components | Performance Task

Definition
A performance task is an assessment activity that requires a student to demonstrate her/his achievement by producing an extended written or spoken answer, by engaging in group or individual activities, or by creating a specific product. The performance task you administer may be used to assess the product the student produces and/or the process the student uses to complete the product.
(Nitko & Brookhart, 2007:244)
A performance task is what students are required to do in the performance assessment, either individually or in groups.
(McMillan, 2007:239)

Components | Performance Task


Types of Task

Restricted-type tasks target a narrowly defined skill and require relatively brief responses. The task is structured and specific.

Extended-type tasks are more complex, elaborate, and time-consuming. Extended-type tasks often include collaborative work with small groups of students. The assignment usually requires that students use a variety of sources of information.

Components | Performance Task

Task Description

A task description is used to provide a blueprint or listing of specifications to ensure that essential criteria are met, that the task is reasonable, and that it will elicit the desired student performance.
The task description should include the following:
Content and skill targets to be assessed
Description of student activities
Group or individual
Help allowed
Resources needed
Teacher role
Administrative process
Scoring procedures

Components | Performance Task


Task Question or Prompt
The actual question, problem, or prompt given to students based on the task description.
It needs to be stated so that it clearly identifies what the outcome is, outlines what students are allowed and encouraged to do, and explains the criteria that will be used to judge.
It also provides a context that helps students understand the meaningfulness and relevance of the task.

Components | Performance Task

How to Craft a Task

Generate or identify an idea for a task, then develop the task and context description, and then develop the task question or prompt (McMillan, 2007:239).

Marzano et al. (1993), as cited in Nitko & Brookhart (2007:265), craft a task through successive drafts (Draft 1 through Draft 4, then the final draft of the task), with each draft folding in another set of standards: content standards, complex reasoning standards, information processing standards, habits of mind standards, effective communication standards, and collaboration/cooperation standards.

Components | Performance Task

Criteria for Performance Task

1. Essential
The task fits into the core of the curriculum. It represents a big idea.
2. Authentic
The task should be authentic. Wiggins (1998), as cited in McMillan (2007:244), suggests standards for judging the degree of authenticity in an assessment task as follows:
1) Realistic. The task replicates the ways in which a person's knowledge and ability are tested in real-world situations.
2) Requires judgment and innovation. The student has to use knowledge and skills wisely and effectively to solve unstructured problems, and the solution involves more than following a set routine or procedure or plugging in knowledge.
3) Asks the student to do the subject. The student has to carry out exploration and work within the discipline of the subject area, rather than restating what is already known or what was taught.
4) Replicates or simulates the contexts in which adults are tested in the workplace, in civic life, and in personal life. Contexts involve specific situations that have particular constraints, purposes, and audiences.
5) Assesses the student's ability to efficiently and effectively use a repertoire of knowledge and skills to negotiate a complex task. Students should be required to integrate all knowledge and skills needed.

Components | Performance Task

Criteria for Performance Task

6) Allows appropriate opportunities to rehearse, practice, consult resources, and get feedback on and refine performances and products. Rather than relying on secure tests as an audit of performance, learning should be focused through cycles of performance-feedback-revision-performance, on the production of known high-quality products and standards, and on learning in context.

A set of standards developed by Newmann (1997), cited in McMillan (2007:245), states that authentic tasks require the following:
Construction of meaning (use of reasoning and higher-order thinking skills to produce meaning or knowledge)
1) Organization of information
2) Consideration of alternatives
Disciplined inquiry (thinking like experts searching for in-depth understanding)
3) Disciplinary content
4) Disciplinary process
5) Elaborated written communication
Value beyond school (aesthetic, utilitarian, or personal value apart from documenting the competence of the learner)
6) Problem connected to the world
7) Audience beyond the school

Components | Performance Task

Criteria for Performance Task

3. Structure the task to assess multiple learning targets.
4. Structure the task so that you can help students succeed.
5. Feasible. Think through what students will do, to be sure that the task is feasible:
- It is developmentally appropriate for students. It should be realistic for students to implement the task. (Consider: resources, time, costs, and the opportunity to be successful.)
- It is safe.
6. The task should allow for multiple solutions.
7. The task should be clear.
8. Engaging. The task should be challenging and stimulating to students:
- The task is thought provoking.
- It fosters persistence.
9. Include explicitly stated scoring criteria as part of the task.
10. Include constraints for completing the task.

Components | Scoring Rubric

Definition

A coherent set of rules used to assess the quality of a student's performance. The rules guide your judgments and ensure that you apply your judgments consistently. The rules may be in the form of a rating scale or a checklist.
(Nitko & Brookhart, 2007:244)
A rubric contains scoring criteria/performance criteria and a rating scale. A rubric, or scoring rubric, is a scoring guide that uses criteria to differentiate between levels of student proficiency.
(McMillan, 2007:252)

Components | Scoring Rubric

Performance Criteria

Scoring criteria / performance criteria are what you look for in student responses to evaluate their progress toward meeting the learning target.
(McMillan, 2007)

The specific behaviors a student should display when properly carrying out a performance or creating a product.
(Russell & Airasian, 2012:209)

Components | Scoring Rubric

Developing Performance Criteria

Step 1: Reflective brainstorming
Step 2: Categorize the many elements
Step 3: Define each key dimension in clear, simple language
Step 4: Contrasting
Step 5: Describing success
Step 6: Revising and refining
(Stiggins, 1994:181-186)

Components | Scoring Rubric

Developing Performance Criteria

1. Select the performance to be assessed and either perform it yourself or imagine yourself performing it.
2. List the important aspects of the performance.
3. Try to limit the number of performance criteria, so they can all be observed during a student's performance.
4. If possible, have groups of teachers think through the important criteria included in a task.
5. Express the performance criteria in terms of observable student behaviors.
6. Do not use ambiguous words that cloud the meaning of the performance criteria.
7. Arrange the performance criteria in the order in which they are likely to be observed.
8. Check for existing performance criteria before defining your own.
(Russell & Airasian, 2012:214-215)

Components | Scoring Rubric

Rating Scale

A rating scale is used to indicate the degree to which a particular dimension is present. It provides a way to record and communicate qualitatively different levels of performance.
Types of rating scales:
Numerical: uses numbers on a continuum to indicate different levels of proficiency in terms of frequency or quality.
Complete understanding 5 4 3 2 1 No understanding
Qualitative: uses verbal descriptions to indicate different levels.
Never, seldom, occasionally, frequently, always
Excellent, good, fair, poor
(McMillan, 2007:250-252)

Components | Scoring Rubric

Rating Scale

Types of rating scales:
Numerical rating scale: a series of numbers indicates the degree to which a characteristic is present. Typically, each of a series of numbers is given a verbal description that remains constant from one characteristic to another.
Directions: Indicate the degree to which this pupil contributes to class discussion by circling the appropriate number. The numbers represent the following values: 5 outstanding, 4 above average, 3 average, 2 below average, and 1 unsatisfactory.
1. To what extent does the pupil participate in discussion?  1  2  3  4  5
Graphic rating scale: the distinguishing feature of the graphic rating scale is that each characteristic is followed by a horizontal line.

Components | Scoring Rubric

Rating Scale

Types of rating scales:
Descriptive graphic rating scale: uses descriptive phrases to identify the points on a graphic scale.
(Gronlund, 1985:391-392)

Components | Scoring Rubric

How to Craft a Rubric

General Steps in Preparing and Using Rubrics
Step 1: Select a performance/process to be assessed.
Step 2: State performance criteria for the process.
Step 3: Decide on the number of scoring levels for the rubric, usually three to five.
Step 4: State the description of performance criteria at the highest level of student performance.
Step 5: State the description of performance criteria at the remaining scoring levels.
Step 6: Compare each student's performance with each scoring level.
Step 7: Select the scoring level closest to a student's actual performance.
Step 8: Grade the student.
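For illustration only, a minimal sketch (not from the cited authors) of what the rubric produced by Steps 1-5 might look like as data, and of applying Steps 6-7 to one student. The criteria, level descriptors, and student ratings are hypothetical.

```python
# Hypothetical analytic rubric for a lab-report performance: three criteria,
# each with four scoring levels (the wording is invented for illustration).
RUBRIC = {
    "hypothesis": {4: "testable and clearly justified", 3: "testable", 2: "vague", 1: "missing"},
    "data recording": {4: "complete and accurate", 3: "minor omissions", 2: "major omissions", 1: "missing"},
    "conclusion": {4: "follows from the data", 3: "partly supported", 2: "unsupported", 1: "missing"},
}

def score_student(observed_levels):
    """Steps 6-7: compare the observed performance with the scoring levels and
    report the level selected for each criterion (analytic scoring)."""
    return {criterion: (level, RUBRIC[criterion][level])
            for criterion, level in observed_levels.items()}

# One hypothetical student's observed levels.
student = {"hypothesis": 3, "data recording": 4, "conclusion": 2}
for criterion, (level, descriptor) in score_student(student).items():
    print(f"{criterion}: level {level} ({descriptor})")
```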

Components | Scoring Rubric

How to Craft a Rubric

Top-Down Approach
The top-down approach begins with a conceptual framework that you can use to evaluate students' performance and to develop scoring rubrics; follow these steps:
1. Adapt or create a conceptual framework of achievement dimensions that describes the content and performance that you should assess.
2. Develop a detailed outline that arranges the content and performance from step 1 in a way that identifies what you should include in the general rubric.
3. Craft a general scoring rubric that conforms to this detailed outline and
focuses on the important aspects of content and process to be
assessed across different tasks. It can be used as is to score student
work, or it can be used to craft specific rubrics.
4. Craft a specific scoring rubric for the specific performance task you
are going to use.
5. Use the specific scoring rubric to assess the performances of several
students; use this experience to revise the rubric as necessary.

Components | Scoring Rubric

How to Craft a Rubric

Bottom-Up Approach
With the bottom-up approach you begin with samples of students' work, using actual responses to create your own framework. Use examples of different quality levels to help you identify the dimensions along which students can be assessed; follow these steps:
1. Obtain about 10 to 12 students' actual responses to a performance item. Be sure the responses you select illustrate various levels of quality of the general achievement you are assessing.
2. Read the responses and sort all of them into three groups: high-quality responses, medium-quality responses, and low-quality responses.
3. After sorting, carefully study each student's responses within the groups, and write very specific reasons why you put those responses into a particular group.
4. Look at your comments across all categories and identify the emerging dimensions.
5. Separately for each of the three quality levels of each achievement dimension you identified in step 4, write a specific student-centered description of what the responses at that level are typically like.

Components | Scoring Rubric

How to Craft a Rubric

Suggestions for developing rubrics
1. Be sure the criteria focus on important aspects of the performance.
2. Match the type of rating with the purpose of the assessment. If your purpose is more global and you need an overall judgment, a holistic scale should be used. If the major reason for the assessment is to provide feedback about different aspects of the performance, an analytical approach would be best.
3. The descriptions of the criteria should be directly observable.
4. The criteria should be written so that students, parents, and others understand them. Recall that the criteria should be shared with students so they can incorporate the descriptions as standards in doing their work.
5. The characteristics and traits used in the scale should be clearly and specifically defined. You need sufficient detail in your descriptions so that the criteria are not vague. If only a few general terms are used, observed behaviors are open to different interpretations. The wording needs to be clear and unambiguous.
6. Take appropriate steps to minimize scoring error.
7. The scoring system needs to be feasible.

Components | Scoring Rubric

Types of Scoring Rubrics

Analytic Scoring Rubric
Each criterion is evaluated separately
Gives diagnostic information to the teacher
Gives formative feedback to students
Easier to link to instruction than holistic rubrics
Good for formative assessment; adaptable for summative assessment
Takes more time to score than holistic rubrics
Takes more time to achieve inter-rater reliability than with holistic rubrics

Holistic Scoring Rubric
All criteria are evaluated simultaneously
Scoring is faster than with analytic rubrics
Requires less time to achieve inter-rater reliability
Good for summative assessment
A single overall score does not communicate information about what to do to improve
Not good for formative assessment

Components | Scoring Rubric

Types of Scoring Rubrics

Generic Scoring Rubric
Description of work gives characteristics that apply to a whole family of tasks
Can share with students, explicitly linking assessment and instruction
Reuse the same rubric with several tasks or assignments
Supports learning by helping students see good work as bigger than one task
Supports student self-evaluation
Students can help construct generic rubrics
Lower reliability at first than with task-specific rubrics
Requires practice to apply well

Task-Specific Rubric
Description of work refers to the specific content of a particular task
Teachers sometimes say using these makes scoring easier
Requires less time to achieve inter-rater reliability
Cannot share with students
Need to write new rubrics for each task
For open-ended tasks, good answers not listed in the rubric may be evaluated poorly

Components | Scoring Rubric

How to Craft a Checklist

List and describe clearly each specific subperformance or step in the procedure you want the student to follow.
Add to the list specific errors that students commonly make (avoid unwieldy lists, however).
Order the correct steps and the errors in the approximate sequence in which they should occur.
Make sure you include a way either to check the steps as the student performs them or to number the sequence in which the student performs them.
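As a rough sketch only (not from the slides), the kind of checklist these steps produce might be recorded like this; the microscope procedure, step wording, and common errors are hypothetical.

```python
# Hypothetical procedural checklist: correct steps and common errors listed in
# the approximate order they occur during the performance.
CHECKLIST = [
    "Carries the microscope with two hands",
    "Starts focusing with the lowest-power objective",
    "Error: turns the coarse focus while on high power",
    "Centers the specimen before switching objectives",
    "Error: leaves the stage wet after use",
    "Returns the microscope with the lowest-power objective in place",
]

def record(observed):
    """Mark each listed behaviour as observed (X) or not (-) for one student."""
    return [(item, "X" if item in observed else "-") for item in CHECKLIST]

observed_today = {
    "Carries the microscope with two hands",
    "Starts focusing with the lowest-power objective",
    "Centers the specimen before switching objectives",
}
for item, mark in record(observed_today):
    print(f"[{mark}] {item}")
```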

Components | Scoring Rubric

Common Errors in Rating

Leniency error occurs when a rater tends to make almost all ratings toward the high end of the scale, avoiding the low end.
Severity error is the opposite of leniency error.
Central tendency error occurs when a rater hesitates to use the extremes and uses only the middle part of the scale.
Halo effect occurs when a rater's general impression of a person influences the rating of individual characteristics.
Personal bias occurs when a rater tends to rate based on inappropriate or irrelevant stereotypes, favoring boys over girls, whites over blacks, etc.
A logical error occurs when a rater gives similar ratings on two or more dimensions of performance that the rater believes are logically related but that are in fact unrelated.
Rater drift occurs when raters, whose ratings originally agreed, begin to redefine the rubrics for themselves.
Reliability decay is a related error: immediately after training, raters apply the rubrics consistently across students and mark consistently with one another; as time passes, however, the ratings become less consistent.

Strengths | Performance Assessment

Integrates assessment with instruction.
Learning occurs during assessment.
Provides agreement between teachers and students about assessment criteria and given tasks.
Emphasizes having pupils demonstrate a process that can be directly observed (provides an additional way for students to show what they know and can do).
Performance tasks require integration of knowledge, reasoning, skills, and abilities.
Emphasizes application of knowledge (real-world situations).
Performance tasks clarify the meaning of complex learning targets.
Tends to be more authentic than other types of assessment.
More engaging; active involvement of students.
Provides opportunities for formative assessment.
Performance tasks let teachers assess the processes students use as well as the products they produce.
Forces teachers to establish specific criteria to identify successful performance.
Encourages student self-assessment.

Weaknesses | Performance Assessment

High-quality performance tasks and scoring rubrics are difficult to craft.
Requires considerable teacher time to prepare and student time to complete.
Scores from performance tasks may have lower scorer reliability (reliability may be difficult to establish).
Students' performance on one task provides little information about their performance on other tasks.
Limited ability to generalize to a larger domain of knowledge.
Performance tasks do not assess all learning targets well.
Completing performance tasks may be discouraging to less able students.
Performance assessments may underrepresent the learning of some cultural groups.
Performance assessments may be corruptible (measurement error due to the subjective nature of scoring may be significant).

Validity | Performance Assessment

Principles of a good measuring instrument
All good measuring instruments have certain primary qualities:
a. Validity
b. Reliability
c. Objectivity
d. Ease of administering
e. Ease of scoring
f. Ease of interpreting
g. Adequate norms
h. Equivalent forms
i. Economy
(Noll, et al., 1979:90)

Validity | Performance Assessment

Validity means the degree to which it is relevant to its purpose. In the case of a performance test, validity is the degree of correspondence between performance on the test and ability to perform the criterion activity.
(Thorndike, 1971:240)

Validity is the effectiveness of a test for the purposes for which it is used.
(Noll, et al., 1979:1971)

Validity | Performance Assessment

Source of Information for Validity

Content-related evidence
The extent to which the assessment is representative of the domain of interest. When a teacher gives a test that appropriately measures the content and behavior that are the objectives of instruction.
Criterion-related evidence
The relationship between an assessment and another measure of the same trait. Generally based on agreement between the scores on a test and some outside measure, called the criterion.
Construct-related evidence
Is concerned with the question of how well differences in test scores conform to predictions about characteristics that are based on an underlying theory or construct. Judgments of construct validity are in fact most often based on a combination of logical analyses and an accumulation of empirical studies.

Validity | Performance Assessment

To ensure a valid performance assessment, students should be instructed on the desired performance criteria before being assessed.
To improve the validity of a performance assessment:
Stating performance criteria in observable terms, and:
a. setting performance criteria at an appropriate difficulty level for students
b. limiting the number of performance criteria
c. maintaining a written record of student performance and checking to determine whether extraneous factors influenced a student's performance
(Russell & Airasian, 2012)
Be sure that what you require students to do in your performance activity matches the learning targets and that your scoring rubrics evaluate those same learning targets.
Be sure the performance tasks you craft require students to use curriculum-specified thinking processes.
Be sure to use many different types of assessment procedures to sample the breadth of your state's standards and your local curriculum's learning targets.
(Nitko & Brookhart, 2007:245)

Reliability | Performance Assessment

Definition
Reliability refers to the consistency of measurement.
Reliability is concerned with the consistency, stability, and dependability of scores.
Reliability is the degree to which students' results remain consistent over replication of an assessment procedure.
A reliable measure is one that provides a consistent and stable indication of the characteristic being investigated.

Reliability | Performance Assessment

Estimating Reliability
The reliability of ratings is an important criterion for evaluating performance assessments.
Estimating reliability over time:
Test-retest
Alternate forms on different occasions
Estimating reliability on a single occasion:
Alternate forms
Coefficient alpha
Split-halves coefficient
Estimating scorer reliability:
Correlation of two scorers' results
Percentage agreement
Kappa coefficient
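As a rough sketch of the last two scorer-reliability estimates, assuming two raters have scored the same students on an ordinal rubric scale; the ratings below are hypothetical and the code is not taken from any of the cited sources.

```python
from collections import Counter

def percentage_agreement(rater_a, rater_b):
    """Proportion of students who received the same rating from both raters."""
    agree = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agree / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    p_observed = percentage_agreement(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    p_expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical rubric levels (1-4) assigned by two raters to ten students.
rater_1 = [4, 3, 3, 2, 4, 1, 2, 3, 4, 2]
rater_2 = [4, 3, 2, 2, 4, 1, 2, 3, 3, 2]
print(percentage_agreement(rater_1, rater_2))  # 0.8
print(cohens_kappa(rater_1, rater_2))          # agreement beyond chance
```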

Reliability | Performance Assessment

Estimating Reliability
Product moment
Spearman-Brown
Flanagan
Rulon
Reliability | Performance Assessment

Estimating Reliability
KR-20
KR-21
Alpha coefficient
Kappa coefficient
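Again only the names appear on the slide; the standard forms are given below for reference, with $k$ the number of items, $p_i q_i$ the variance of dichotomous item $i$, $\sigma_i^2$ the variance of item $i$, $\sigma_t^2$ the total-score variance, $M$ the mean total score, and $p_o$, $p_e$ the observed and chance agreement between raters.

```latex
r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum p_i q_i}{\sigma_t^2}\right)

r_{KR21} = \frac{k}{k-1}\left(1 - \frac{M(k-M)}{k\,\sigma_t^2}\right)

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right)

\kappa = \frac{p_o - p_e}{1 - p_e}
```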

Reliability | Performance Assessment
How to improve the reliability of ratings

Organize the achievement dimensions within a scoring rubric into logical groups that match the content and process framework of the curriculum.
For each achievement dimension, use behavioral descriptors to define each level of performance.
Provide specimens or examples of students' work to help define each level of an achievement dimension.
Have several teachers work together to develop a scoring rubric or rating scale.
Have several teachers review and critique the draft of a scoring rubric or rating scale.
Provide training and supervised practice for all persons who will use the scoring rubric or rating scale.
Have more than one rater rate each student's performance on the task.
Monitor raters by periodically sampling their ratings, checking on the accuracy and consistency with which they are applying the scoring rubrics and rating scales. Retrain those persons whose ratings are inaccurate or inconsistent.

References

Arikunto, S. (2012). Dasar-dasar evaluasi pendidikan (Ed. 2). Jakarta: Bumi Aksara.
Gronlund, N. E. (1985). Measurement and evaluation in teaching. New York: Macmillan Publishing Company.
McMillan, J. H. (2007). Classroom assessment: Principles and practice for effective standards-based instruction (4th ed.). Boston: Pearson Education, Inc.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching. New Jersey: Pearson Education, Inc.
Nitko, A. J., & Brookhart, S. M. (2007). Educational assessment of students (5th ed.). New Jersey: Pearson Education, Inc.
Noll, V. H., Scannell, D. P., & Craig, R. C. (1979). Introduction to educational measurement. Boston: Houghton Mifflin Company.
Russell, M. K., & Airasian, P. W. (2012). Classroom assessment (7th ed.). New York: McGraw-Hill.
Thorndike, R. L. (1971). Educational measurement (2nd ed.). Washington:
Wulan, A. R. Penilaian Kinerja dan Portofolio pada Pembelajaran Biologi (Handout Penilaian Kinerja dan Portofolio). Bandung: FMIPA UPI.
