
EVALUATION

Basic Concepts

Sylvester Saimon Simin


Keningau Teachers Training College
You should be able to….

6. Evaluation

6.1 Test Blue Print and Question Construction (10 hours)
Skills:
• Prepare a test blue print based on the KBSM Science Syllabus (an illustrative sketch follows this table)
• Construct objective, structure, and essay questions based on the test blue print
• Prepare marking schemes for the above questions
T & L Resources:
• Kementerian Pendidikan Malaysia (1995d)
• Past year PMR and SPM examination papers

6.2 Centralised examinations (PMR and SPM): format of PMR and SPM examination papers
Skills:
• Study and analyze the format of the examination papers with respect to the distribution of multiple-choice, structure and essay questions
• Analyze each MCQ and classify it based on Bloom’s Taxonomy
Values:
• To be aware of the importance of planning and preparation before administering a test
• To be aware of the accountability of the test
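To make the test blueprint task concrete, here is a minimal, hypothetical sketch (the topics, cognitive levels, item counts and paper length are illustrative and are not taken from the KBSM Science Syllabus) of how a blueprint can be laid out and checked before any question is written:

```python
# Hypothetical test blueprint: rows are content areas, columns are cognitive
# levels (after Bloom), and each cell is the number of items planned.
blueprint = {
    "Cell structure":    {"Knowledge": 4, "Comprehension": 3, "Application": 2},
    "Forces and motion": {"Knowledge": 3, "Comprehension": 4, "Application": 3},
    "Acids and bases":   {"Knowledge": 3, "Comprehension": 2, "Application": 1},
}

PLANNED_TOTAL = 25  # intended number of items on the paper (illustrative)

def total_items(bp):
    """Sum the planned items across every topic and cognitive level."""
    return sum(sum(levels.values()) for levels in bp.values())

def level_weights(bp):
    """Share of the paper devoted to each cognitive level."""
    totals = {}
    for levels in bp.values():
        for level, n in levels.items():
            totals[level] = totals.get(level, 0) + n
    grand = sum(totals.values())
    return {level: n / grand for level, n in totals.items()}

if __name__ == "__main__":
    assert total_items(blueprint) == PLANNED_TOTAL, "blueprint does not match paper length"
    for level, share in level_weights(blueprint).items():
        print(f"{level}: {share:.0%} of the paper")
```

In practice the same grid is simply drawn on paper; the point is that every topic-by-cognitive-level cell carries an agreed number of items before question writing starts.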
I Have a Dream About Assessment
Roger Farr
• I have a dream that assessment...
– ...will be accepted as a means to help teachers plan
instruction rather than as a contrivance to force teachers to
jump through hoops;
– ...will be based on trust in a teacher’s judgment as much as
numbers on a page are trusted;
• I have a dream that assessment...
– ...will become a helpful means to guide children to identify
their own literacy strengths rather than a means to
conveniently label them;
– ...will support each child in becoming the best he or she can
be rather than
a means to sort children into groups of the best and the
worst;
• And I have a dream that assessment...
– ...will be put to use to honor what children can do rather
than destroying them for what they can’t do.
• If we all work together we can make such dreams
become a reality as we work to help each child grow.
Purpose of Evaluation
• to determine the students’ achievement of certain
knowledge and skills as specified by the syllabus of the
subject
• to measure students’ progress over time,
• to rank students in terms of their achievement,
• to diagnose the main difficulties faced by the students
in the areas of study,
• to determine how effective the teacher’s instructional
strategies are,
• to determine the effectiveness of the curriculum, its
strengths and weaknesses,
• to encourage good study habits,
• to motivate students

Assessment Terms
• Performance Assessments
– Assessment requiring students to demonstrate their
achievement of understandings and skills by actually
performing a task or set of tasks (e.g., writing a story,
giving a speech, conducting an experiment, operating a
machine)

Assessment Terms
• Alternative Assessment
– A title for performance assessments that
emphasizes that these assessment methods
provide an alternative to traditional paper-and-
pencil testing.

Assessment Terms
• Authentic Assessment
– A title for performance assessments that
stresses the importance of focusing on the
application of understandings and skills to real
problems in “real-world” contextual settings

TERM WORKING DEFINITION
accreditation The official endorsement of the procedures and/or standards of an
institution by an authority. For example, an examination board may accredit
a center for the assessment of course work.

aim (educational aim) A long-term goal which may or may not be achievable within the teaching
program.

appeal A challenge by a candidate or a school to the results awarded by an examining authority.

assessment General term used for the 'measurement' of a behavior or characteristic

assessment component One part of an assessment package - e.g., a written paper, a practical test,
an oral exam, a piece of coursework.

assessment objective A statement of an expected learning outcome which will be assessed.

assessment package The total assessment scheme which may be composed of one or more
components

aural examination Listening test (not to be confused with an 'oral test', i.e., a test of speaking).

backwash effect (occasionally 'washback effect') The effect (positive or negative) of the scheme of assessment on the teaching/learning program which precedes it.
bias Tendency of a test, or an item, to place one group at an advantage over
another on the basis of a factor (e.g., gender, ethnicity, language) other
than that which the test purports to assess.

camera-ready copy (CRC) Final proof of an examination paper as it will appear, after printing, on the
candidate's desk.

centralized marking Administrative arrangement where all answer scripts are brought to a
central location for marking. Where markers remain at the center
throughout the marking period, this may be referred to as 'residential
marking'.

certification Use of examination results to provide individuals with documentary evidence of achievement (i.e., a certificate).

classical item statistics Statistics describing the behavior of a test item (typically its level of
difficulty and its discriminatory power) by analyzing the responses of a
particular group of test-takers. Note that such statistics are dependent on
the group taking the test. (See also IRT).
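As an illustration of the two statistics named in this entry, the sketch below computes an item's facility (the proportion of candidates answering it correctly) and a simple discrimination index (the difference in facility between the top- and bottom-scoring candidates). The response data and the one-third group split are hypothetical, not drawn from any real examination.

```python
# Each row is one candidate's scored responses (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

def facility(item):
    """Proportion of the group answering the item correctly (difficulty/facility index)."""
    scores = [row[item] for row in responses]
    return sum(scores) / len(scores)

def discrimination(item, fraction=1/3):
    """Difference in facility between the highest- and lowest-scoring candidates."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top = sum(row[item] for row in ranked[:k]) / k
    bottom = sum(row[item] for row in ranked[-k:]) / k
    return top - bottom

for i in range(len(responses[0])):
    print(f"item {i}: facility {facility(i):.2f}, discrimination {discrimination(i):+.2f}")
```

Because both numbers come from one particular group of candidates, they change if a stronger or weaker group takes the same item, which is the dependence this entry warns about.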

coaching Special preparation of candidates for an examination, typically by practicing the techniques of test taking, rote learning of past questions and answers, 'question spotting', etc.

code of practice Set of guidelines and/or regulations controlling the procedures of assessment authorities in the conduct of public examinations. Where examination bodies have constitutional autonomy, this may have to be a voluntary code of practice.
curriculum All educational aspects of an institution and its teaching programs -
including non-examined subjects

cut-off point Test score at which students are deemed successful (and below which
they are deemed unsuccessful). See also grade threshold.

double marking Procedure in which answer scripts are independently scored by two raters.
Where there is a discrepancy between scores, set procedures apply for
reaching the final score. Typically these include averaging small
differences and using an 'expert marker' as an arbiter where differences
are large.
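A minimal sketch of the resolution rule described above; the tolerance value and the arbiter step are illustrative, since boards set their own procedures.

```python
def resolve_double_marks(first, second, tolerance=3, arbiter=None):
    """Combine two independent marks for one script.

    Small differences (within `tolerance`) are averaged; larger ones are
    referred to an expert marker, represented here by the `arbiter` callable.
    """
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    if arbiter is None:
        raise ValueError("difference too large: refer script to an expert marker")
    return arbiter(first, second)

# Usage: a 2-mark gap is averaged; an 8-mark gap goes to the arbiter.
print(resolve_double_marks(14, 16))                           # -> 15.0
print(resolve_double_marks(10, 18, arbiter=lambda a, b: 13))  # expert decides
```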

end-users Individuals or institutions who use examination results for their own
purposes e.g., universities, schools, employers.

equity An equitable examination ensures that all students who possess the same
degree of ability receive the same result. Where there are inequities, an
individual or group gains an unfair advantage over others. It follows that
inequity places some individuals and/or groups at a disadvantage due to
factors other than the ability that the examination purports to assess.

evaluation Assessment for the purpose of making a value judgment, e.g., to judge the
effectiveness of a teaching program

examination center Place officially recognized for the conduct of examinations. Typically
centers are state schools, private schools, university halls or private
buildings hired for examination purposes.

feedback The systematic flow of information gained from an assessment to educationists, policy makers, and others, e.g., examiner reports for teachers.

formative assessment Assessment which takes place as an integral part of the teaching-learning
program (see also summative assessment).

grade threshold Test score between two reporting grades. For example, if the A-grade
threshold is 81%, students scoring 80% will be awarded grade B and those
scoring 81%, grade A.

group certificate Examination system which requires candidates to take a prescribed number and combination of subjects. The award of the certificate is dependent on the candidate meeting pre-determined criteria for success.

high-stakes examination An examination where students, parents and teachers invest a great deal
of effort, and perhaps money, in preparing because success can potentially
bring great rewards whilst failure may damage the candidate's life-chances.

impersonation Form of malpractice where someone takes an examination in place of the registered candidate.

invigilator Person who supervises and is responsible for the conduct of an examination in a particular examination room/hall.

IRT Item Response Theory (sometimes IRM - Item Response Modeling). Psychometric tool which, in its simplest form, uses a mathematical model to link a student's chance of being successful on an item with the student's ability and the item's difficulty. This allows items to be calibrated on an absolute measurement scale.
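As an illustration, in its simplest one-parameter (Rasch) form the model can be written as follows, where theta_j is the ability of candidate j and b_i is the difficulty of item i. This is the standard textbook formulation, not one specific to any particular examination board.

```latex
P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{1}{1 + e^{-(\theta_j - b_i)}}
```

Because abilities and difficulties sit on the same scale, items calibrated with different groups of candidates can still be compared, which is what calibration on an "absolute measurement scale" refers to.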

item bank A collection of items categorized according to their characteristics e.g., type
of item, topic, skill being assessed, level of difficulty, etc. Items are then
drawn from the bank to build a test according to predetermined test
specifications.
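A minimal sketch of drawing items from a bank against a test specification; the item records, category tags, and counts below are hypothetical.

```python
import random

# A toy item bank: each record carries the categories used to file the item.
bank = [
    {"id": 1, "topic": "forces", "skill": "recall",      "difficulty": "easy"},
    {"id": 2, "topic": "forces", "skill": "application", "difficulty": "hard"},
    {"id": 3, "topic": "energy", "skill": "recall",      "difficulty": "easy"},
    {"id": 4, "topic": "energy", "skill": "application", "difficulty": "medium"},
    {"id": 5, "topic": "forces", "skill": "recall",      "difficulty": "medium"},
]

# Test specification: how many items to draw from each category.
specification = [
    ({"topic": "forces", "skill": "recall"}, 2),
    ({"topic": "energy"}, 1),
]

def build_test(bank, specification, seed=0):
    """Draw the requested number of items for each cell of the specification."""
    rng = random.Random(seed)
    chosen = []
    for wanted, count in specification:
        pool = [item for item in bank
                if all(item.get(k) == v for k, v in wanted.items())
                and item not in chosen]
        chosen.extend(rng.sample(pool, count))
    return chosen

print([item["id"] for item in build_test(bank, specification)])
```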

league table A table which ranks schools on the basis of examination results and other
indicators (see also 'value added').

leakage Unauthorized release of examination materials and/or information prior to the official release date.

localization Where an independent country takes responsibility for the maintenance and further development of an examination system introduced by a former colonial authority.

malpractice Any deliberate act of wrongdoing, contrary to the rules of the examination,
designed to give a candidate an unfair advantage or, albeit less frequently,
to place a candidate at a disadvantage.

marker One who marks/scores candidate responses (also rater).

marking scheme Instructions as to how marks are to be allocated to student responses (answers). These may be detailed for objective and semi-objective tasks. For open-ended and subjective tasks, they may take the form of general descriptions ('band descriptors').

measurement An assessment made using the concept of a well-defined ability scale to quantify a behavior or characteristic, e.g., mathematical ability.

moderation General term used by examining authorities for the process of checking
quality. Question paper moderation typically involves the review of draft
question papers by an expert panel. Moderation of school-based
assessment may involve a Board representative visiting the school to look
at work and interview teachers and students. Alternatively, samples of
student work may be sent for review by a Board moderator.

National Assessment Assessment designed to determine national standards, usually conducted using a representative sample of students.
objective item Item that can be scored without the marker making a personal judgment as
to the quality of the response e.g., multiple-choice.

OMR Optical Mark Reader - scanning device for reading marks from special
forms thereby allowing the automatic input of student responses to, for
example, multiple-choice question papers.

parastatal Term applied, especially in Africa, to an organization established by a government but which, through its constitution and budgetary arrangements, enjoys a great degree of operational freedom and insulation from direct political interference.
pedagogy The science of teaching including both theory and practice.

private candidate Candidate who enters, and pays for, his/her own entry to a public
examination as compared with a candidate who is entered by the institution
(school) in which he/she is studying and which is recognized by the
examining authority as an authorized center.

psychometry (psychometrics) Field concerned with the measurement, and hence quantification, of human behaviors and characteristics. Psychometric strategies are built on statistical models of measurement and human behavior.

public examination An examination offered by a national or provincial (state) authority, or on behalf of such an authority, to students at a particular level of an education system. The primary purpose is to certify the level of achievement of individual students and/or to select students for the next level of the education system.

quota system Form of selection system where the share of available opportunities to be
awarded to a particular group is pre-determined. For example, in order to
ensure gender balance in a selective secondary school system, 50% of
places may be awarded to boys and 50% to girls. As a consequence, some
boys may be selected with lower examination scores than those achieved
by girls who are rejected (or vice versa).

rater One who marks/scores candidate responses - a marker.

registration Key process whereby the details of individuals (students) are entered into
the administrative database as candidates for forthcoming examinations.

regular candidate Term used, particularly in the Asian sub-continent, for candidates
registering through recognized centers for a series of examinations for the
first time. Private candidates and those re-sitting examinations are
considered irregular.

reliability A measure of the stability of the results produced by an examination. This includes the stability of scores on re-testing, the stability of scores with re-marking, and the correlation of scores for sub-sections within the test (homogeneity).
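One of the stability checks mentioned above, the agreement between first marking and re-marking, can be expressed as a correlation. A small sketch with hypothetical marks:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sd_x = sum((a - mx) ** 2 for a in x) ** 0.5
    sd_y = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

first_marking  = [12, 15, 9, 18, 14, 11]   # hypothetical scores, first marker
second_marking = [13, 14, 10, 17, 15, 10]  # the same scripts re-marked

print(f"re-marking reliability: {pearson(first_marking, second_marking):.2f}")
```

A coefficient near 1 means re-marking leaves candidates in essentially the same rank order; the same statistic applied to a re-test, or to split halves of one paper, gives the other forms of reliability listed.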

school-based assessment Any assessment of student performance which takes place in a school and
is incorporated into the public examination result. Note that the degree of
freedom allowed to the school will depend on the regulations and
moderation procedures of the examining authority.
script General term for an answer booklet or sheets produced by a candidate in
response to an assessment task.

selection Use of examination results to select individuals for educational or employment opportunities where the number of such opportunities is limited. In many developing countries, examination results are used to select students for the next phase of education, e.g., primary-secondary, lower secondary-higher secondary, secondary-tertiary.

specification grid A plan or 'blueprint' giving the format of a question paper or other
assessment component.

stakes (of an examination) The importance of an examination as judged by what may be gained
through success - and what may be lost through failure. Therefore, a 'high-
stakes' examination will typically be highly competitive because the
successful will enjoy greatly enhanced opportunities.

structured question Task composed of a number of sub-questions (items) linked by a common context or piece of stimulus material. The sub-questions may be independent of each other or may be sequenced to lead candidates through a more complex task (progressive).

subjective item Item that requires the marker (rater) to make a personal judgment as to the
quality of the response e.g. the literary merits of an essay or the artistic
merits of a painting. Note that in order to minimize variation, rater
judgments may be guided and constrained by marking schemes and
descriptors of performance.

summative assessment Assessment which takes place at the end of the teaching-learning program
to record 'final achievement' (see also formative assessment).

supplementary examination A follow-up examination allowing students to retake subjects in which they
have not reached the required level. This issue is of particular importance
in systems awarding group certificates.

syllabus (examination syllabus) A document formally specifying what will be assessed by the examination and how the assessment will be carried out.

tamper-evident packaging Plastic envelopes for examination materials which cannot be resealed
without showing obvious signs of being opened.

teaching objective (curriculum objective) A specific short-term goal of the teaching program.

teaching program The program of instruction.

teaching/learning program The instruction delivered by a teacher coupled with the learning that takes
place during the program.

transparency Extent to which the processes involved in the examination system are
visible to the public - especially schools, teachers and students.

validity A measure of the extent to which an examination measures what it purports to measure.
Achievement A demonstration of learning at a particular moment in time
Alternative assessment Any and all assessments that differ from the multiple-choice, one-word-answer, timed items that characterize standard tests
Assessment The gathering of data about students or a program, often used as a formative process to guide instruction
Criterion The standard against which performance is measured
Criterion-referenced Judgement of performance against a previously agreed standard
Diagnostic assessment Determines the level of achievement/performance prior to entering a program of study
Evaluation The application of judgement to the data in the form of a grade or
comment, placing a value on that work
Formative assessment Ongoing feedback on a student's performance throughout the learning process
Grading Assigning a letter, percentage or score
Ipsative assessment The measure of student growth
Learning outcome A general statement which describes an observable result by which a
student demonstrates knowledge, skill or attitude
Norm-referenced Judgement of performance against the norm for the group
Objective A specific statement of intent
Peer assessment Reflective practice in which students make observations about the performance of their peers
Performance assessment Usually an alternate or authentic assessment, where a student completes a relevant task which demonstrates learning by using or applying knowledge
Portfolio assessment The assessment of a representative collection of a student's work over time
Process assessment Focuses on the variety of strategies, thinking skills and processes that a student uses to complete a task
Product assessment Focuses on the end product of a learning process

Reporting Communicating process or achievement to the student or his/her parents or guardian

Rubric A set of quality criteria

Self-assessment Reflective practice in which students make observations about their own performance

Self-referenced Judgement of performance against the student's own previous performance

Standard A point of reference against which judgements can be made


Summative evaluation A report on the final achievement, given at the end of a unit of work, semester or year
What is performance assessment?
• A performance assessment is an assessment
activity that requires students to construct a
response, create a product or demonstrate a
skill they have acquired. Rubrics, based on the
selected criteria, are given to students to ensure
that they know what they need to do to meet or
exceed the learner outcomes.
• Well-constructed performance assessments:
– are the most authentic types of assessment since
they replicate out of school experiences, encourage
self-evaluation and demonstrate what students know
and can do;
– put students in a role (e.g. scientist, newspaper
editor) and provide an audience for their task
– provide degrees of proficiency based on criteria and
make public the criteria.
A few things to know ……
• Bloom’s taxonomy
• Differences between:
– Testing, measurement, evaluation
– objective & subjective items
– formative & summative evaluation
– criterion-referenced test & norm-referenced test
• Validity & Reliability

The Assessment Process
1. Preparation (including Test / Task Blueprint)
– Determine the kind of information needed and decide how
and when to obtain it.
2. Information gathering
– Obtain a variety of information as accurately as
possible.
3. Forming judgements
– Judgements are made by comparing the
information to selected criteria.
4. Decision making and reporting
– Record significant findings and determine
appropriate courses of action.

INFORMATION GATHERING
• Information gathering techniques
– Procedures for obtaining information
– Inquiry (asking), observation (senses), analysis
(performance, product), testing (a common situation to which
all students respond, a common set of instructions governing
responses, a set of rules for scoring responses, and a description
of performance, i.e., a score)
• Information gathering instruments
– Tools used to gather information
– 3 basic types: tests, rubrics and questionnaires
• Teacher-made tests / classroom tests vs standardized tests
• Rubric: a set of rules for scoring student products or
performance. Typically takes the form of a checklist or a rating
scale (see the sketch after this list)
• Questionnaires: useful for getting opinions, feelings and
interests
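A minimal sketch of a rubric used as a rating scale; the criteria, the 1-4 scale and the sample ratings are hypothetical.

```python
# A simple analytic rubric: each criterion is rated on a 1-4 scale.
rubric = {
    "states a testable hypothesis": 4,
    "controls variables":           4,
    "records observations":         4,
    "draws a supported conclusion": 4,
}

def score_performance(ratings, rubric):
    """Total a student's ratings and report them against the rubric maximum."""
    for criterion, rating in ratings.items():
        if criterion not in rubric:
            raise KeyError(f"not a rubric criterion: {criterion}")
        if not 1 <= rating <= rubric[criterion]:
            raise ValueError(f"rating out of range for: {criterion}")
    return sum(ratings.values()), sum(rubric.values())

# Usage: one student's ratings on a science investigation.
ratings = {
    "states a testable hypothesis": 3,
    "controls variables":           2,
    "records observations":         4,
    "draws a supported conclusion": 3,
}
total, maximum = score_performance(ratings, rubric)
print(f"{total}/{maximum}")
```

A checklist is the simpler case in which each criterion is only marked present (1) or absent (0).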

Information Gathering Techniques

Kind of information obtainable
• Inquiry: opinions; self-perceptions; subjective, affective judgements (especially emotional reactions); social perceptions
• Observation: performance or end products of some performance process; affective (especially attitudes); social interaction; psychomotor skills; typical behavior
• Analysis: learning outcomes during the learning process; cognitive outcomes (intermediate goals); some affective outcomes
• Testing: attitude and achievement; terminal goals; cognitive outcomes; maximum performance; cognitive and psychomotor skills

Objectivity
• Inquiry: least objective; highly subject to bias and error
• Observation: subjective, but can be objective if care is taken in the construction and use of the instruments
• Analysis: objective, but not stable over time
• Testing: most objective and reliable

Cost
• Inquiry: inexpensive, but can be time-consuming
• Observation: inexpensive, but time-consuming
• Analysis: fairly inexpensive; preparation time is somewhat lengthy but crucial
• Testing: most expensive, but the most information gained per unit of time
Information Gathering Instrument

Standardized tests
• Used: when accurate information is needed
• Advantage: usually well developed and reliable; include norms for comparing the performance of a class or an individual
• Disadvantage: often do not measure exactly what has been taught; expensive; limited in what is measured

Teacher-made tests
• Used: routinely, as a way to obtain achievement information
• Advantage: usually measure exactly what has been taught; inexpensive; can be constructed as the need arises
• Disadvantage: no norms beyond the class are available; often unreliable; require quite a bit of time to construct

Checklists
• Used: to determine the presence or absence of specific behaviors or characteristics of performance
• Advantage: helpful in keeping observations focused on key points or critical behaviors
• Disadvantage: measure only the presence or absence of a trait or behavior

Rubrics / Rating scales
• Used: to assess the quality of student performance; to judge the quality of a performance
• Advantage: allow observational data to be used in making qualitative as well as quantitative judgements
• Disadvantage: take time and effort to construct; can be clumsy to use if too complex

Questionnaires
• Used: to inquire about feelings, opinions, and interests
• Advantage: keep inquiry focused and help the teacher obtain the same information from each student
• Disadvantage: take time and effort to construct; difficult to score; no right or wrong answers; data difficult to summarize
