This paper describes performance assessment and some of its important attributes in the
vocational education and training setting. It is asserted that the widespread introduction of
competency-based education and assessment in recent years has reinforced the use of
performance assessment. Some examples of performance assessment in vocational education
and training are examined to illustrate some of the important attributes of performance
assessment. The paper then discusses some of the key issues concerning performance
assessment, including validity, reliability, costs and consequences of assessment. It is argued
that performance assessment is a very useful tool in vocational education and training, and
may not need to satisfy all the reliability criteria of paper and pencil tests.
Introduction
In recent years the use of performance assessment has increased steadily
in vocational education and training and other sectors in education
systems across the world (Broadfoot 1995). Increasingly educators are
called upon to promote the learning of skills, knowledge and attitudes
that cannot be tested by the traditional paper and pencil assessment
techniques. Performance assessments are expected to deliver a wide
range of benefits to the learner such as higher motivation, deeper and
more meaningful learning, better connection between assessment and
learning and more valid assessment.
There has been much debate in the United Kingdom and the United
States concerning the use of performance assessment for elementary
school students and particularly the use of performance assessments for
accountability purposes to make comparisons across schools and
regions. The proponents of performance assessment (Wiggins 1989;
Frederickson & Collins 1989; Linn & Baker 1996) have mounted a
sustained attack on the traditional measurement model of assessment
and especially on standardised norm-referenced testing using multiple
choice items. The opponents of performance assessment have questioned
its validity (Messick 1994) and outlined its psychometric problems
(Shavelson et al. 1993).
A widely cited definition states that performance assessment:
... refers to assessment tasks that require students to perform an activity (e.g.
laboratory experiment in science) or construct a response. Extended periods of
time, ranging from several minutes to several weeks, may be needed to
perform a task. Often the tasks are simulations of, or representations of,
criterion activities valued in their own right. Evaluations of performance
depend heavily on professional judgement.
(Linn 1993, p.9)
Assessment is multi-dimensional
Performance assessments are practical assessment methods that assess
a wider range of outcomes than knowledge alone.
Product or process
Performance assessments can involve the assessment of a product (e.g. a
business plan, a soup or a stained microscope slide) or the observation of a
process.
Simple to complex
Within the area of performance-based assessment there is a continuum of
complexity ranging from the performance of a small, simple skill sample
(turning on an oven) through to a complex multi-dimensional activity
(planning, preparing and serving a three-course meal for a function).
Linn and Gronlund (1995) use the terms restricted performance to refer to
assessment of specific skills and extended performance to refer to the
integration of knowledge, skills and attitudes in the assessment of more
complex learning outcomes.
Open-endedness
Another attribute of performance assessment is its open-endedness
(Baker et al. 1993). The assessments are not fixed-choice and a student
may respond to a task in a number of ways, some of which may be
unexpected by the assessor. This is more likely to occur as the
complexity of the set task increases.
Human judgement
Another feature of performance assessments is that they are usually
rated by human scorers using predetermined criteria (Green 1995). Some
degree of assessor expertise is therefore required. The assessors can score
the performance holistically or globally, based on an overall impression,
or analytically, using a list of criteria (Athanasou 1997). The development
of rating sheets or assessment guidelines, and the training of assessors who
do the scoring, have therefore become critically important aspects of
assessment.
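To make the holistic/analytic distinction concrete, the sketch below shows one way an analytic rating sheet might combine predetermined criteria into a score, with a holistic score recorded instead as a single overall impression. The criteria, weights and 0-4 rating scale are invented for illustration and are not drawn from the paper.

# A sketch of analytic scoring with a rating sheet. The criteria, weights
# and 0-4 rating scale are hypothetical, invented for illustration.

RATING_SHEET = {                          # criterion -> weight
    "follows safe work practices": 2.0,
    "correct sequence of steps": 1.5,
    "quality of finished product": 1.0,
}

def analytic_score(ratings):
    """Combine per-criterion ratings (0-4) into a weighted average."""
    total_weight = sum(RATING_SHEET.values())
    weighted_sum = sum(RATING_SHEET[c] * r for c, r in ratings.items())
    return weighted_sum / total_weight

# Analytic: one rating per predetermined criterion.
ratings = {
    "follows safe work practices": 4,
    "correct sequence of steps": 3,
    "quality of finished product": 2,
}
print(f"analytic score: {analytic_score(ratings):.2f}")  # 3.22

# Holistic: the assessor records a single overall impression instead.
holistic_score = 3
print(f"holistic score: {holistic_score}")

In practice the analytic criteria would be drawn from the relevant competency standards, and assessor training would focus on applying them consistently.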
Performance assessment itself is not new. One of the earliest recorded
performance assessments appears in the Old Testament:
'Are you a member of the tribe of Ephraim?' they asked. If the man replied that
he was not, then they demanded, 'Say Shibboleth'. But if he could not
pronounce the 'sh' and said Sibboleth instead of Shibboleth he was dragged
away and killed. As a result 42 000 people of Ephraim died there at that time.
(Judges 12:5-6)
Performance assessments have also been used in the military for at least
50 years and in industry for 70 years (Bond 1995). They have long been
employed in examinations for professional certification in fields such as
medicine (van der Vleuten & Swanson 1990), law and architecture. Projects,
portfolios, extended problems, presentations and the like have long been
used in school, vocational and higher education.
The key issues concerning performance assessment discussed below are:
•  validity
•  consequences of assessment
•  use of resources
•  reliability
•  holistic assessment
Validity
Some argue that in comparison with traditional assessment approaches,
performance assessment provides more valid information. Performance
assessments appear to have good face validity (Mehrens 1992) in being
acceptable to industry and the community. This face validity assists the
acceptance of performance assessment by the various stakeholder groups, but
it is not sufficient on its own and cannot take the place of overall test
validity. Overall test validity may be defined as the 'degree to
which a certain inference from a test is appropriate and meaningful'
(Athanasou 1997, p.160).
Two related aspects of validity are authenticity and directness. It is
claimed that compared to other approaches to assessment, performance
assessment has the central advantages of authenticity and directness
(Frederickson & Collins 1989). Authenticity in assessment means that all
or nearly all of the criterion construct (usually a vocational education
and training goal expressed as a job performance at the workplace) is
captured by the assessment task. Directness in assessment means that
none or few skills outside the criterion construct are captured by the
assessment task. Performance assessment is often thought to be more
reflective of workplace requirements because of its authenticity and
directness.
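One informal way to picture these two definitions (a formalisation added here for illustration, not one proposed by Frederickson and Collins) is as set overlap: if C is the set of skills in the criterion construct and A is the set of skills the assessment task captures, authenticity is high when A covers most of C, and directness is high when little of A lies outside C.

# An informal set-overlap picture of authenticity and directness.
# The skill names are hypothetical, invented for illustration.

criterion_construct = {"plan menu", "prepare dishes", "serve meal", "cost the menu"}
assessment_task = {"plan menu", "prepare dishes", "serve meal", "write an essay"}

captured = criterion_construct & assessment_task

# Authenticity: how much of the criterion construct the task captures.
authenticity = len(captured) / len(criterion_construct)

# Directness: how little of the task falls outside the criterion construct.
directness = len(captured) / len(assessment_task)

print(f"authenticity = {authenticity:.2f}")  # 0.75
print(f"directness = {directness:.2f}")      # 0.75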
Consequences of assessment
Authenticity and directness in performance assessment lead to positive
consequences for teaching and learning (Linn 1993; Torrance 1995;
Wiggins 1989; Frederickson & Collins 1989).
Use of resources
There may be problems with either keeping the questions secure or
developing new questions for further use (Mehrens 1992). If exactly the
same performance is required semester after semester students can
memorise the response to higher order questions just as they can to more
basic questions. This may present no problems for some performances
(e.g. baking a croissant) but any performance tasks that involve a larger
meta-cognitive component cannot be so readily reused. This may be at
odds with notions of transparency of assessment. In these cases there
may be higher developmental costs than anticipated and there may also
be difficulties making comparisons of cohorts of students/trainees
across years if this is required.
The time required for performance assessment, and therefore its cost, is
high. There are also consumables and equipment costs. Assessors may need
extra training to develop the skills of observation and recording for
performance assessments. Performance assessments generally yield less
information per hour than traditional assessments such as short answer
supply or multiple choice tests (Green 1995).
Wolf (1995) cautions that in the United Kingdom there has been an
enormous increase in the volume of assessment (and the amount of class
time devoted to it). This may be due in part to the atomistic approach
adopted to assessment and the fragmentation of the curriculum. Wolf also
observes that when the volume of assessment becomes large, formative
assessment disappears.
Reliability
Problems with reliability centre on sampling issues, subjectivity of the
assessors and to what extent we can generalise from the performance to
the larger domain. Because performance assessments take more time
than traditional assessments and use more assessor time, there are
generally fewer tasks. This lesser quantity of information collected from
performance assessment usually will lower reliability (due to the
sampling problem discussed above). For acceptable levels of score
reliability there should be more than one task, and preferably several
tasks. Brennan and Johnson (1995) found that the number of tasks has a
strong effect on reliability. However, having several tasks in the
performance assessment may be costly in time and resources if each task
is time consuming. This may be justified if the assessment is critical, as
in high-cost/high-risk situations such as the licensing of a doctor
(Linn 1993). A second justification for the use of multiple tasks is that the
task performance itself is a beneficial part of instruction. When the
assessment tasks are valued learning activities in their own right, the
result is a better integration of instruction and assessment.
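The effect of adding tasks can be illustrated with the Spearman-Brown prophecy formula from classical test theory. This is an illustration added here; Brennan and Johnson's own analysis uses generalisability theory, and the single-task reliability of 0.40 below is an invented value.

# Spearman-Brown prophecy formula: projected reliability when one task of
# reliability r is extended to k comparable tasks. Illustrative only: the
# single-task reliability of 0.40 is an invented value.

def projected_reliability(r, k):
    return k * r / (1 + (k - 1) * r)

for k in (1, 2, 4, 8):
    print(f"{k} task(s): reliability = {projected_reliability(0.40, k):.2f}")
# 1 task(s): reliability = 0.40
# 2 task(s): reliability = 0.57
# 4 task(s): reliability = 0.73
# 8 task(s): reliability = 0.84

The diminishing returns visible in the output are one reason each extra task must be weighed against its cost in time and resources.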
Holistic assessment
Performance assessments are able to assess complex thinking in a more
holistic way, relating learning outcomes across the cognitive, psychomotor
and affective domains. An important feature of performance assessment
is that it tests complex competencies rather than the component skills
into which they have been decomposed. According to Resnick and Resnick
(1991), testing for the decomposed skills fails to recognise that:
... complicated skills and competencies owe their complexity not just to the
number of components they engage but also to interactions among the
components and heuristics for calling upon them.
(Resnick & Resnick 1991, p.42)
While this point is not in dispute in the literature, the implications for
teaching and assessment practice are in dispute. Messick (1994) suggests
that in many settings it would be most effective to teach and assess both
the complex skill and its component skills. He asks: 'Might not
assessment of component skills help one to understand the nature of the
complex skill and the sources of its complexity, providing a functional
basis for improving methods of teaching?' (p.20). However, focussing on
component skills will not be sound if it means that effective teaching and
practice of the complex skill are foregone. In the process of decomposing
the complex skill into its components, something could be left out. The
implication is that performance assessment of the complex skill as a
functioning whole guarantees that nothing important will be left out
(Messick 1994, p.20).
Conclusion
In this paper we have described performance assessment and some of its
important attributes in the vocational education and training setting.
Although performance assessments have a long history of use in the
vocational education and training sector and in vocational courses in
higher education, the widespread introduction of competency-based
education and assessment in recent years has reinforced the use of
performance assessment. Some examples of performance assessment in
vocational education and training were examined. We then discussed
some of the key issues concerning performance assessment.
Performance assessment is very useful in most vocational education and
training settings, having the key characteristics of authenticity and
directness. Programs delivered flexibly off the job and/or on the job may
rely heavily on performance assessment because it is readily applied in both
settings and easily used by most teachers and trainers. This does not mean
that paper and pencil or oral tests of knowledge should not be used to
supplement performance tests.
Assessment tasks need to be selected carefully to ensure satisfactory
validity, and other strategies (such as increasing the number of tasks
assessed) may be required to ensure adequate reliability.
References
Arter, J & Spandel, V 1992, 'Using portfolios of student work in instruction and
assessment', Educational Measurement: Issues and Practice, vol.11, no.3, pp.36-44.
Athanasou, J 1997, Introduction to educational testing, Social Science Press, Wentworth
Falls, NSW.
Baker, E, O'Neil, H & Linn, R 1993, 'Policy and performance prospects for
performance-based assessment', American Psychologist, vol.48, no.12, pp.1210-1218.
Bond, L 1995, 'Unintended consequences of performance assessment: Issues of bias
and fairness', Educational Measurement: Issues and Practice, vol.14, no.4, pp.21-24.
Brennan, R & Johnson, E 1995, 'Generalisability of performance assessments',
Educational Measurement: Issues and Practice, vol.14, no.4, pp.9-12 & 27.
Broadfoot, P 1995, 'Performance assessment in perspective: International trends and
current English experience', in Evaluating authentic assessment, ed. H Torrance,
Open University Press, Buckingham.
Fitzpatrick, R & Morrison, E 1971, 'Performance and product evaluation', in
Educational measurement, 2nd edn, ed. R Thorndike, American Council on
Education, Washington, DC.
van der Vleuten, C & Swanson, D 1990, 'Assessment of clinical skills with
standardised patients: The state of the art', Teaching and Learning in Medicine, vol.2,
no.2, pp.58-76.
Wiggins, G 1989, 'Teaching to the (authentic) test', Educational Leadership, April,
pp.41-47.
Wolf, A 1995, 'Authentic assessments in a competitive sector: Institutional
prerequisites and cautionary tales', in Evaluating authentic assessment, ed.
H Torrance, Open University Press, Buckingham.