
Part 1: Defining Speaking for Assessment Purposes

Part 2: Assessing Speaking: Challenges and Solutions


Dr. Sari Luoma

Why do we assess speaking skills?


important part of life
important part of a language curriculum
assessment needs to reflect that
However, assessing speaking is challenging
So many dimensions, so little time: sampling
Links with personality, intelligence, culture, context
Important to define speaking to ensure fairness

Values in fair assessment practices


Transparency
of the meaning and role of the assessment in society
Coherence
between learning, teaching, assessment, and score use
Shared understanding
of the meaning and purpose of the assessment
Validity of the scores for the intended purpose
Reliability, or consistency of measurement
Relevance and utility of the scores for stakeholders
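
Reliability as consistency of measurement can be illustrated with two raters scoring the same performances. The sketch below is only an illustration: the rater scores are hypothetical, and exact agreement and Cohen's kappa are two common indices among several that a testing program might choose.

```python
from collections import Counter

def exact_agreement(r1, r2):
    """Proportion of performances given the same score by both raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(r1)
    p_observed = exact_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportions per level
    p_expected = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / (n * n)
    if p_expected == 1:
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical scores on a 1-5 scale for six performances
rater_a = [3, 4, 2, 5, 3, 4]
rater_b = [3, 4, 3, 5, 3, 3]

print(round(exact_agreement(rater_a, rater_b), 2))  # 0.67
print(round(cohens_kappa(rater_a, rater_b), 2))     # 0.52
```

Kappa is lower than raw agreement here because some matching scores would be expected by chance alone.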

Values in fair assessment practices


Transparency
Coherence

System-related

Shared understanding
Validity
Reliability
Relevance and utility

Score-related

Fundamental considerations
in creating good speaking assessments
WHY this speaking assessment?
Who wants to know what about whom for what purpose?

(McNamara, 1996)

WHAT SPEAKING SKILLS will be assessed?


What is essential? What is practical? What is (not) needed?

Purposeful design
Create tasks and criteria that serve the defined need
Monitor quality

High quality implementation (administration, scoring)

Responsible score use


Monitor, report, evaluate how well the assessment works

The Cycle of Speaking Assessment

Formal Assessment (Testing)
[Figure: cycle diagram with the following labelled stages]
Score need
Purpose, Design, Specs
System Development: Tasks, Criteria, Instructions
Administration/Performance: Examiner(s), Test taker(s), Tasks
Performances
Rating/Evaluation: Raters, Performances, Criteria
Scores
Score use
QA/QC Design

Values: Coherence
Everything fits together
WHO wants to know WHAT about WHOM for WHAT PURPOSE? (McNamara, 1996)
WHY do they want to know it?
HOW will the scores be used?
All actors during the process know enough to do their job well so that their work supports test quality
The actual actions follow the design: test taking process, scoring process, score use

Once you decide how to define speaking in a particular assessment situation, it affects everything in the assessment
Setting
Participants
Tasks
Criteria
Scores and score reporting

What is speaking?
Sound/Pronunciation
Speed & pausing
Stress & intonation
Variation in pitch & volume
Words/Vocabulary
Specific & generic words (splendid/fine; saunter/go)
Fixed phrases & fillers (what a nice thing to say; I mean)
Grammar/Structures
Sentences and idea units
Topic-comment structure & conversation structure
Fluency

Pronunciation
Speed, pausing, pitch, volume: flow, communication
Rhythm, stress, intonation: sequences of sounds, meaning
Individual sounds: identification, accuracy
Accuracy and comprehensibility
Is it possible to define a gold standard?
At least, a criterion for success for this test
Pronunciation is part of identity & personality, so there are limits to justifiable scoring criteria

Vocabulary in speaking
Breadth and depth of vocabulary
Naturalness, appropriateness for speech
Precise, well-chosen words and expressions
Simple, ordinary words, generic words
this one/that one, thing, do, go, fine, good
Fixed phrases, fillers
I thought you'd never ask; What a nice thing to say
Accuracy, comprehensibility, and naturalness

Grammar in speaking
Speech consists of idea units
Short phrases/clauses
Strung together with and, or, but
Unplanned and planned speech
Continuum from written-like to spoken-like grammar
Fractured sentences, topicalization
he's quite a comic, that fellow, you know
Accuracy, comprehensibility, and communicative effectiveness

Fluency
= proficiency
Conceptual fluency
Presentation skills
Intelligence
Fillers, speech particles (Hasselgreen, 1998; discourse analysts)
Flow or smoothness, rate of speech, absence of excessive pausing, absence of disturbing hesitation markers, length of utterances, connectedness (Koponen, 1995)
Speed, pausing

Implications: Tasks

Pronunciation
Probably not much variation across tasks

Grammar
Unplanned vs. planned tasks
Prepared speech

Vocabulary
Can vary a lot by topic and task familiarity

Fluency
Unplanned vs. planned tasks
Comfort vs. anxiety

Implications: Criteria

Pronunciation
Clear range of levels
Low levels quite clearly definable; top?

Grammar
Clear range of levels
Are top level descriptors natural for speech?

Fluency
Clear range of levels
Need to keep scale focus constant, or at least defensible, across levels

Vocabulary
Clear range of levels
Topic & task variation
What is enough for higher levels?

Implications: Examiners & Raters

Examiners: awareness of
Task goals
Possible scores
Allowable, and possibly forbidden, ways of prompting
Ways of getting around construct-irrelevant task difficulty
Practice, share experiences

Raters: awareness of
Rating criteria
Ways of connecting performance features with ratings
Restrictions of the task setting (fair expectations)
Independence of different analytical criteria (no halo)
Training

Discourse skills (Co-construction)


Taking the other speaker(s) into account
Supportive speech moves
Paying attention, indicating interest, indicating comprehension, agreeing
Picking up new information introduced by the other speaker and making it the next theme in talk
Repeating own and others' words and structures, explaining links/coherence
Providing topic closure

Co-construction in speaking assessment


How is co-construction typically done in the target language and culture?


How relevant is it to the tasks on the test?
Politeness
Cooperation

Power in the test situation

How much cultural variation is acceptable in the test?


How do we define the assessment criteria fairly?
Knowledge of language & culture vs. personality

Tasks, Language Functions, Action: Task Completion as a Criterion

Functional views of language (1950s & 60s onwards)
U.S. Foreign Service Institute -> ILR scale
Threshold Level -> Common European Framework
Certain tasks are more predictable and easier linguistically
-> progression of task difficulty
-> sequencing of language learning materials in terms of topics, words, structures, and increasingly complex texts and tasks

Task-based language learning


Authentic, meaningful tasks from the very beginning; criterion for success is completing the task rather than language form
Scaffolding (support) is an integral part of activities
Weaker task-based task types: information gap, reasoning gap, opinion gap

Implications: Tasks & Criteria


Range of tasks at each proficiency level is determined by the theory-driven order of functions & topics

Criterion for success: STRONG performance tests
Success at completing the task using whatever means and strategies available

Criterion for success: WEAK performance tests
Proficiency levels determined by the expected progression of control of linguistic features

Types of speaking tasks

Structured
Reading aloud
Sentence repetition
Sentence completion
Factual short-answer questions
Reacting to phrases
Short, predictable answers
Often recorded (computer)
Often scored 0/1
Values: comparability, control, scalability

Open-ended
Text types: describe, narrate, instruct, compare, explain, justify, predict, decide
Role play tasks
Reacting in situations
Based on examinee opinions or task materials
Any testing mode
Usually scored on a scale
Values: naturalness, directness, interaction

Scoring criteria and scales


Important because they direct what is tested and what is assessed
Scales: holistic/analytic, verbal/numerical
Verbally defined scales describe what the learner can do, and how well
Concrete descriptors are easiest to apply, but may only be applicable to a narrow selection of situations (CEFR)
Relationship between rating scales and score reporting
Single scores most useful for decision-making
Detailed score information most useful for diagnosis and feedback

Rating scales
Holistic scales express an overall impression of a test taker's ability in one score
Analytic scales contain a limited number of criteria, usually 3-5. Each criterion has descriptors for each level of the scale.
Different rules for combining for an overall score, or not.

There is no single correct solution for rating scales; only options for different situations
Number of levels, number of criteria, wording of criteria, methods for arriving at a total score
Solutions depend on test purpose, construct definition, implementation conditions, and intended score use
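
One possible rule for combining analytic scores into an overall score is a weighted average. This is a minimal sketch under stated assumptions: the four criteria are taken from the earlier slides, while the 1-5 scores, the weights, and the function name combine_analytic are hypothetical. A program may equally decide to report the profile without any total.

```python
def combine_analytic(scores, weights=None):
    """Weighted average of analytic criterion scores.

    scores:  dict mapping criterion name -> score on the scale
    weights: dict mapping criterion name -> weight (equal weights if None)
    """
    if weights is None:
        weights = {criterion: 1.0 for criterion in scores}
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight

# Hypothetical analytic profile for one test taker (1-5 scale)
profile = {"pronunciation": 3, "vocabulary": 4, "grammar": 3, "fluency": 4}

# Equal weighting
overall_equal = combine_analytic(profile)

# Hypothetical rule that counts fluency double
overall_weighted = combine_analytic(
    profile,
    {"pronunciation": 1, "vocabulary": 1, "grammar": 1, "fluency": 2},
)

print(overall_equal)               # 3.5
print(round(overall_weighted, 2))  # 3.6
```

The same profile yields different overall scores under different rules, which is why the combination rule needs to follow from test purpose and intended score use.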

When tasks, scales, test takers, examiners, and raters come together

High quality implementation: two processes

The performance process
Task(s)
Test taker(s)
Examiner(s)

The assessment process
Performance(s)
Criteria
Rater(s)

Ensuring quality in the process


Reliability
Validity
Fairness
Impact
Practicality

Challenges & Solutions

Share information
Test takers
Teachers
Examiners
Raters
Score users
Decision makers
Train assessment personnel
Monitor process