
APHASIOLOGY

Volume 17 Number 5 May 2003


32nd Clinical Aphasiology Conference

Editor: Patrick J. Doyle

CONTENTS

Preface
Patrick J. Doyle

Papers
Right hemisphere syndrome is in the eye of the beholder
Margaret Lehman Blake, Joseph R. Duffy, Connie A. Tompkins, and Penelope S. Myers

A comparison of the relative effects of phonologic and semantic cueing treatments
Julie L. Wambaugh

Measures of lexical diversity in aphasia
Heather Harris Wright, Stacy W. Silverman, and Marilyn Newhoff

Limb apraxia, pantomime, and lexical gesture in aphasic speakers: Preliminary findings
Miranda Rose and Jacinta Douglas

Teaching self-cues: A treatment approach for verbal naming
Gayle DeDe, Diane Parris, and Gloria Waters

Functional measures of naming in aphasia: Word retrieval in confrontation naming versus connected speech
Jamie F. Mayer and Laura L. Murray

Narrative and conversational discourse of adults with closed head injuries and non-brain-injured adults: A discriminant analysis
Carl A. Coelho, Kathleen M. Youse, Karen N. Le, and Richard Feinn

Relationship between discourse and Western Aphasia Battery performance in African Americans with aphasia
Hanna K. Ulatowska, Gloria Streit Olness, Robert T. Wertz, Agnes M. Samson, Molly W. Keebler, and Karen E. Goins

The inter-rater reliability of the story retell procedure
William D. Hula, Malcolm R. McNeil, Patrick J. Doyle, Hillel J. Rubinsky, and Tepanta R. D. Fossett

32nd Clinical Aphasiology Conference
Ridgedale, Missouri, May 31st to June 4th, 2002

Editor
Patrick J. Doyle, Ph.D., Geriatric Research Education & Clinical Center, VA Pittsburgh Healthcare System, and Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA.
Associate Editors
Kirrie J. Ballard, Ph.D., Department of Speech Pathology and Audiology, University of Iowa, Iowa City, IA, USA.
Annette Baumgaertner, Ph.D., Neurologische Universitaetsklinik, University of Hamburg, Hamburg, Germany.
Mary Boyle, Ph.D., Department of Communication Sciences and Disorders, Montclair State University, Upper Montclair, NJ, USA.
Carol Frattali, Ph.D., National Institutes of Health, Bethesda, MD, USA.
Michael E. Groher, Ph.D., Department of Communicative Disorders, University of Florida Health Science Center, Gainesville, FL, USA.
Katherine Odell, Ph.D., Department of Communication, University of Wisconsin, Madison, WI, USA.
Grace H. Park, Ph.D., National Institutes of Health, NIDCD, Language Section, Bethesda, MD, USA.
Anastasia Raymer, Ph.D., Child Study Center, Old Dominion University, Norfolk, VA, USA.
Linda Schuster, Ph.D., University of West Virginia, Morgantown, WV, USA.
Nina Simmons-Mackie, Ph.D., Department of Communication Science & Disorders, Southeastern Louisiana University, Hammond, LA, USA.
Evy Visch-Brink, Ph.D., Department of Neuropsychology, Erasmus University Rotterdam, Rotterdam, The Netherlands.
Julie Wambaugh, Ph.D., Department of Communication Disorders, University of Utah and VA Salt Lake City Healthcare System, Salt Lake City, UT, USA.

APHASIOLOGY

SUBSCRIPTION INFORMATION
Subscription rates to Volume 17, 2003 (12 issues) are as follows:
To individuals: UK £361.00; Rest of World $596.00
To institutions: UK £857.00; Rest of World $1414.00
A subscription to the print edition includes free access for any number of
concurrent users across a local area network to the online edition, ISSN 1464-5041.
Print subscriptions are also available to individual members of the British
Aphasiology Society (BAS), on application to the Society.
For a complete and up-to-date guide to Taylor & Francis Group's journals and
books publishing programmes, visit the Taylor and Francis website: http://www.tandf.co.uk/
Aphasiology (USPS permit number 001413) is published monthly. The 2003
US Institutional subscription price is $1414.00. Periodicals postage paid at
Champlain, NY, by US Mail Agent IMS of New York, 100 Walnut Street,
Champlain, NY.
US Postmaster: Please send address changes to pAPH, PO Box 1518,
Champlain, NY 12919, USA.
Dollar rates apply to subscribers in all countries except the UK and the
Republic of Ireland where the pound sterling price applies. All subscriptions are
payable in advance and all rates include postage. Journals are sent by air to the
USA, Canada, Mexico, India, Japan and Australasia. Subscriptions are entered
on an annual basis, i.e. from January to December. Payment may be made by
sterling cheque, dollar cheque, international money order, National Giro, or
credit card (AMEX, VISA, Mastercard).


Orders originating in the following territories should be sent direct to the local
distributor.
India Universal Subscription Agency Pvt. Ltd, 101-102 Community Centre,
Malviya Nagar Extn, Post Bag No. 8, Saket, New Delhi 110017.
Japan Kinokuniya Company Ltd, Journal Department, PO Box 55, Chitose,
Tokyo 156.
USA, Canada and Mexico Psychology Press, a member of the Taylor &
Francis Group, 325 Chestnut St, Philadelphia, PA 19106, USA
UK and other territories Taylor & Francis Ltd, Rankine Road, Basingstoke,
Hampshire RG24 8PR.
The print edition of this journal is typeset by DP Photosetting, Aylesbury and
printed by Hobbs the Printer, Totton, Hants. The online edition of this journal is
hosted by Metapress at journalsonline.tandf.co.uk
Copyright 2003 Psychology Press Limited. All rights reserved. No part
of this publication may be reproduced, stored, transmitted or disseminated,
in any form, or by any means, without prior written permission from
Psychology Press Ltd, to whom all requests to reproduce copyright material
should be directed, in writing.
Psychology Press Ltd grants authorization for individuals to photocopy
copyright material for private research use, on the sole basis that requests for
such use are referred directly to the requestor's local Reproduction Rights
Organization (RRO). In order to contact your local RRO, please contact:
International Federation of Reproduction Rights Organisations (IFRRO), rue
de Prince Royal, 87, B-1050 Brussels, Belgium; email: ifrro@skynet.be
Copyright Clearance Centre Inc., 222 Rosewood Drive, Danvers, MA 01923,
USA; email: info@copyright.com
Copyright Licensing Agency, 90 Tottenham Court Road, London, W1P 0LP, UK;
email: cla@cla.co.uk
This authorization does not extend to any other kind of copying, by any means,
in any form, and for any purpose other than private research use.

The Editors of Aphasiology and the Guest Editors of the CAC issues are grateful
to the following people for reviewing papers for the CAC issues during 2002:

Larry Boles, Ph.D.
Caterina Breitenstein, Ph.D.
Ruby Drew, Ph.D.
Donald Freed, Ph.D.
Margaret Greenwald, Ph.D.
Katrina Haley, Ph.D.
Brooke Hallowell, Ph.D.
Jackie Hinckley, Ph.D.
Stefan Kemeny, M.D.
Margaret Lemme, Ph.D.
Jamie Mayer, Ph.D.
Robert M. Miller, Ph.D.
Charlotte Mitchum, M.S.
Penelope Myers, Ph.D.
Mary Oeschlager, Ph.D.
Philippe Paquier, Ph.D.
Janet Patterson, Ph.D.
Scott Rubin, Ph.D.
Barry Slansky, Ph.D.

Preface

The papers that appear in this special edition of Aphasiology were selected based
upon their theoretical importance, clinical relevance, and scientific merit, from
among the many platform and poster presentations comprising the 32nd Annual
Clinical Aphasiology Conference held in Ridgedale, Missouri in June of 2002.
Each paper was peer-reviewed by the Editorial Consultants and Associate
Editors acknowledged herein consistent with the standards of Aphasiology and
the rigours of merit review that represent this indexed, archival journal.
Patrick J. Doyle, Ph.D.
VA Pittsburgh Healthcare System
Pittsburgh, PA, USA

© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000210

Right hemisphere syndrome is in the eye of the beholder
Margaret Lehman Blake
Syracuse University, NY, USA
Joseph R.Duffy
Mayo Clinic, MN, USA
Connie A.Tompkins
University of Pittsburgh, PA, USA
Penelope S.Myers
Mayo Clinic, MN, USA
Background: Specific information about prevalence and patterns of
deficits associated with right hemisphere brain damage (RHD) is
incomplete. A recent large-scale study of inpatients in a United
States rehabilitation centre (Lehman Blake, Duffy, Myers &
Tompkins, 2002) provided initial estimates of deficit prevalence and
co-occurrence. The data obtained were based on information from
multiple medical disciplines, and may not adequately reflect the
typical caseload seen by US speech-language pathologists (SLP).
Differences in how professionals view RHD may influence whether
patients are appropriately referred for services.
Aims: The first aim was to evaluate whether prevalence and
patterns of deficits differ when diagnoses are made by SLPs versus
other disciplines. The second aim was to examine whether the
presence of certain deficits is associated with referrals to SLP.
Methods and Procedures: A retrospective chart review was
conducted examining medical records for 122 adults with RHD in an
inpatient rehabilitation unit. Diagnoses made by speech-language pathology were
compared with those made by a group of other medical professionals,
including neurology/physiatry, neuropsychology, and occupational
therapy. Frequencies and cluster analyses were computed for both
groups of diagnosticians to examine differences between groups.
Relationships between performance on a screening measure of
mental status and cognitive/communicative diagnoses were
examined to determine if there were obvious connections between
specific disorders and referrals to SLP.

Outcomes and Results: Diagnoses of pragmatic and
communicative deficits were made more often by SLPs, while the
other professionals more often diagnosed deficits in attention,
visuoperception, and learning/memory. Moderate-strong correlations
between diagnoses from the two groups were obtained only for
deficits of attention, linguistics, and neglect. Referral to SLP was not
related to performance on a general mental status screening test.
Patients who presented with neglect, aprosodia, or deficits in
interpersonal interactions were more likely to be referred to SLP than
patients without these deficits.
Conclusions: This study raises the question of how to ensure
appropriate referrals to SLP when referring professionals may not
always identify the communicative disorders exhibited by
individuals with RHD. A descriptive definition of right hemisphere
syndrome and a consistent set of terminology would facilitate
communication about right hemisphere deficits within and across
disciplines. A broader scope of referrals to SLP would increase the
number of patients who receive appropriate care for their cognitive
and communicative deficits.
Despite the well-known conventional descriptions of deficits associated with
right hemisphere brain damage (RHD), limited data are available regarding
specific deficits and patterns of deficits caused by RHD (e.g., Joanette & Goulet,
1994; Myers, 1999; Tompkins, 1995). A previous study (Lehman Blake et al.,
2002) evaluated the prevalence and patterns of co-occurrence of cognitive/
communicative deficits in a large retrospective sample. This was the first
large-scale exploration of right hemisphere syndrome as seen in a US inpatient
rehabilitation unit. Deficit categories were created to classify the large number of
diagnostic labels used in the medical charts (see Appendix A). Results from that
study indicated that the most commonly diagnosed deficits were in attention,
neglect, visuoperception, and learning/memory. Additionally, the deficit
categories of calculation, hyperaffectivity, and linguistics were not closely
related to any of the other deficits evaluated.

Address correspondence to: Margaret Lehman Blake PhD, University of Houston,
Communication Disorders, 100 Clinical Research Center, Houston, TX 77204-6018,
USA. Email: mtblake@uh.edu
Margaret Lehman Blake is now at the University of Houston, TX, USA. The data
for this project were collected while the first author was a post-doctoral fellow
in Speech Pathology at the Mayo Clinic in Rochester, Minnesota, under the
direction of the second author. Thanks to Jacque Danielson for her assistance in
retrieving the medical records.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000120
The current study examined the same group of patients to explore some
questions that remained unanswered after the initial study. Deficit categories
analysed in the original study were based on diagnoses made by four disciplines
combined: neurology/physiatry, neuropsychology, occupational therapy (OT),
and speech-language pathology (SLP). Because diagnoses from all four
professions were pooled, the picture of right hemisphere syndrome described in the
previous study (Lehman Blake et al., 2002) may not reflect the typical
caseload of RHD patients seen by a US speech-language pathologist: SLPs and
other professionals may not recognise or identify the same deficits.
Thus, the first aim was to evaluate whether prevalence and deficit patterns differ
when diagnoses are made by SLPs versus other disciplines. Differences between
disciplines may provide insight into how cognitive/communicative deficits are
perceived by various medical professionals. The previous study also indicated
that while 94% of cases exhibited at least one cognitive/communicative deficit,
only 44% were referred for an SLP evaluation. Thus, the second aim was to
examine which deficits are likely to lead to a referral to SLP.
METHOD
Inpatient medical records were reviewed for patients with RHD consecutively
admitted to a US inpatient rehabilitation unit over a 3-year period. Diagnoses of
RHD were made by neurologists. For 88% of the cases, CT or MRI scans
confirmed the diagnosis. The initial list contained 246 cases. Seven of these were
excluded because the patients did not release their medical records for research
purposes. Another 117 cases were excluded due to incomplete charts, lesions
restricted to the cerebellum or brain stem, other neurological disease (e.g.,
dementia, Parkinson's disease), psychiatric disorder other than depression, and/or
bilateral cerebral lesions (see Lehman Blake et al., 2002, for complete details
of the chart review). This left a total of 122 cases available for group analyses.
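The sample flow described above can be checked with a few lines of arithmetic. This is a minimal Python sketch; the constant and function names are illustrative, while the counts (246 initial cases, 7 and 117 exclusions, 54 referred cases) are the ones reported in the text and Table 1. It also recomputes the 44% SLP referral rate mentioned in the introduction.

```python
# Counts as reported in the Method section and Table 1; names are illustrative.
INITIAL_CASES = 246
EXCLUSIONS = {
    "records not released for research": 7,
    "incomplete chart, cerebellar/brain-stem lesion, other neurological or "
    "psychiatric disease, or bilateral lesions": 117,
}
REFERRED_TO_SLP = 54  # cases referred for an SLP evaluation


def remaining_cases(initial, exclusions):
    """Charts left after every exclusion group has been removed."""
    return initial - sum(exclusions.values())


n_final = remaining_cases(INITIAL_CASES, EXCLUSIONS)    # 122
referral_rate = round(100 * REFERRED_TO_SLP / n_final)  # 44 (% referred to SLP)
```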
TABLE 1 Demographic and clinical information for cases with lesions restricted to the
right hemisphere

Demographic and clinical variables | All RHD cases (n=122) | Cases referred to SLP (n=54)
---|---|---
Sex | 71 male, 51 female | 26 male, 28 female
Age (years): Mean (SD) | 68.6 (12.4) | 68.6 (12.6)
Age (years): Range | 12–95 | 15–94
Education (years): Mean (SD) | 12.0 (3.0) | 12.0 (3.3)
Education (years): Range | 7–20 | 7–20
Handedness | 87% right, 5% left, 1% ambidextrous [7% missing] | 87% right, 9% left, 2% ambidextrous [2% missing]
Reason for hospital admission | 86% RH stroke, 14% other medical condition* | 87% RH stroke, 13% other medical condition*
Presence of previous stroke | 81% no previous stroke, 19% prior RH stroke | 83% no previous stroke, 17% prior RH stroke
Days between onset and admission to rehabilitation unit: Mean (SD) | 13.9 (26.6) | 13.7 (21.5)
Days between onset and admission to rehabilitation unit: Range | 0–240 | 1–120

RHD = right hemisphere brain damaged; SLP = speech-language pathology; RH = right hemisphere.
* These represent patients who were admitted to the hospital for a medical condition
other than CVA, who either had a CVA while hospitalised (e.g., as a
complication), or who then received inpatient therapy for deficits resulting from
a CVA that occurred prior to the current admission.

Demographic and clinical data are provided in Table 1. Information about the
presence or absence of selected disorders and deficits was obtained from
inpatient neurology/physiatry, neuropsychology, OT, and SLP reports. As
detailed in Lehman Blake et al. (2002), the long list of diagnostic labels obtained
from the medical charts was reduced to 14 deficit categories based on broad
traditional classifications (e.g., linguistics, attention, learning, and memory), and
other behavioural characteristics (e.g., hyporesponsive, hyperresponsive). Two of
the authors independently classified the labels. Initial agreement was 83%.
Disagreements were resolved by discussion. Appendix A contains descriptions and
examples of these categories. For each patient, a deficit category was considered
present if one or more of the labels within the category was reported by any one
discipline. Aprosodia and neglect were not merged into a deficit category, but were
analysed as distinct disorders. (Essentially each one was a category of its own.) In
this paper, deficit categories will be indicated by italics, while the separate deficits
(aprosodia and neglect) will be printed in regular font.
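The coding rule in the preceding paragraph (a category counts as present when any discipline reports any label belonging to it) and the simple percent-agreement figure can be sketched as follows. This is an illustrative Python sketch, not the authors' procedure: the label-to-category dictionary is hypothetical and contains only a handful of the example labels listed in Appendix A, and percent agreement is assumed to mean the share of labels that both raters placed in the same category.

```python
# Hypothetical lookup built from a few Appendix A example labels.
LABEL_TO_CATEGORY = {
    "distractible": "attention",
    "concentration": "attention",
    "flat affect": "hypoaffective",
    "anomia": "linguistic",
    "paraphasias": "linguistic",
    "impulsive": "hyperresponsive",
    "left-sided neglect": "neglect",  # neglect is analysed as its own category
    "aprosodia": "aprosodia",         # and so is aprosodia
}


def categories_present(chart):
    """chart maps each discipline to the labels it reported for one patient.
    A category is present if ANY discipline reported ANY label in it."""
    return {
        LABEL_TO_CATEGORY[label]
        for labels in chart.values()
        for label in labels
        if label in LABEL_TO_CATEGORY
    }


def percent_agreement(rater_a, rater_b):
    """Percentage of labels that two raters assigned to the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty classifications of equal length")
    matches = sum(1 for x, y in zip(rater_a, rater_b) if x == y)
    return 100.0 * matches / len(rater_a)


# Invented chart for one patient.
chart = {
    "neurology/physiatry": ["distractible"],
    "OT": [],
    "SLP": ["anomia", "flat affect"],
}
present = categories_present(chart)  # attention, linguistic, hypoaffective
```

Simple percent agreement does not correct for agreement expected by chance; a chance-corrected index such as Cohen's kappa would be one alternative if a stricter reliability estimate were wanted.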

ANALYSES AND RESULTS

The subset of individuals evaluated by SLP (n=54) was used to compare
diagnoses made by SLP versus the other three disciplines. Results of frequency
analyses, provided in Table 2, indicate that for both groups of diagnosticians the
most commonly identified disorder was neglect. Following that, SLPs most
commonly diagnosed other cognitive deficits and hyporesponsivity. In
contrast, the other disciplines most often reported deficits in the categories
attention, visuoperception, and learning/memory. Further examination of the
results illustrates how the focus of a discipline affects diagnosis. Speech
pathologists, focusing on communication, diagnosed deficits in interpersonal
interactions in nearly 30% of patients, while other disciplines identified such
pragmatic deficits in only 13% of those same patients. Aprosodia also was
diagnosed twice as often by SLPs (26%) as by the other professionals (12%).

TABLE 2 Frequency of occurrence of deficits and deficit categories present in 54 patients,
diagnosed by SLPs or other medical professionals

Deficits and deficit categories | Prevalence diagnosed by neurology/physiatry, neuropsychology, and OT | Prevalence diagnosed by SLP only
---|---|---
neglect | 66.4% | 53.7%
attention | 63.9% | 35.2%
perception | 58.2% | 27.8%
learning/memory | 58.2% | 24.1%
reasoning & problem solving | 56.6% | 37.0%
other cognitive deficits | 45.1% | 42.6%
orientation | 40.2% | 27.8%
awareness | 38.5% | 27.8%
hyperresponsive | 36.1% | 18.5%
hyporesponsive | 30.3% | 38.9%
calculation | 28.7% | 5.6%
hypoaffective | 24.6% | 18.5%
linguistic | 21.3% | 24.1%
hyperaffective | 15.6% | 7.4%
aprosodia | 12.3% | 25.9%
interpersonal interactions | 7.4% | 29.6%

OT = occupational therapy; SLP = speech-language pathology.
Deficit categories are indicated by italics.
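Each prevalence figure in Table 2 is the number of patients in whom a category was diagnosed, divided by the number of patients evaluated, expressed as a percentage. A minimal Python sketch of that computation, assuming each patient's diagnoses are stored as a set of category names (a hypothetical data layout). The count of 29 used in the example is back-calculated from the 53.7% neglect figure in the SLP column over 54 patients; it is not reported in the text.

```python
from collections import Counter


def prevalence_table(patients):
    """patients: one set of diagnosed categories per patient.
    Returns category -> percentage of patients, rounded to one decimal."""
    n = len(patients)
    if n == 0:
        raise ValueError("no patients")
    counts = Counter(cat for cats in patients for cat in set(cats))
    return {cat: round(100.0 * k / n, 1) for cat, k in counts.items()}


# 29 of 54 patients with neglect reproduces the SLP-column value of 53.7%.
patients = [{"neglect"}] * 29 + [set()] * 25
table = prevalence_table(patients)  # {"neglect": 53.7}
```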
In order to examine differences in patterns of co-occurrence related to
disciplines, hierarchical cluster analyses (SPSS, 1999) were performed. A cluster
analysis is an exploratory tool that identifies related groups or clusters within a
body of data (Aldenderfer & Blashfield, 1984). The two categories that co-occur
most often are linked to form a cluster, and the linking continues until all categories
fit into a specified number of clusters. For the current purposes, clusters were
based on how often deficit categories co-occurred across the sample of cases. Six
clusters were specified based on findings from the previous study (Lehman
Blake et al., 2002). Analyses were conducted first on the data from SLP
diagnoses, then on diagnoses from the other disciplines combined. As shown in
Table 3, the affective deficits (hypoaffective and hyperaffective) separated into
their own clusters when diagnosed by either SLP or other professionals. This
result indicates that these deficit categories are relatively dissimilar to all others,
regardless of who makes the diagnosis. No other obvious patterns were
identified.
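The agglomerative procedure described above can be illustrated with a small, dependency-free Python sketch. It treats the number of cases in which two categories co-occur as their similarity and uses average linkage between clusters; both choices are assumptions made for illustration, since SPSS offers several similarity measures and linkage methods for binary data and the options used in the study are not stated here. The patient data are invented.

```python
from itertools import combinations


def cooccurrence_counts(cases, categories):
    """For every pair of categories, count the cases in which both are present."""
    return {
        frozenset(pair): sum(1 for case in cases if pair[0] in case and pair[1] in case)
        for pair in combinations(categories, 2)
    }


def agglomerative_clusters(categories, counts, n_clusters):
    """Merge the two most strongly linked clusters until n_clusters remain.

    Linkage = mean co-occurrence count over all cross-cluster category pairs
    (average linkage); this similarity choice is an assumption of the sketch.
    """
    if not 1 <= n_clusters <= len(categories):
        raise ValueError("n_clusters must be between 1 and the number of categories")
    clusters = [[c] for c in categories]

    def linkage(g, h):
        total = sum(counts[frozenset((x, y))] for x in g for y in h)
        return total / (len(g) * len(h))

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the highest average co-occurrence.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [g for k, g in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Invented cases: each set lists the categories diagnosed in one patient.
categories = ["attention", "neglect", "orientation", "reasoning", "hyperaffective"]
cases = [
    {"attention", "neglect"},
    {"attention", "neglect"},
    {"orientation", "reasoning"},
    {"orientation", "reasoning"},
    {"attention", "neglect", "hyperaffective"},
]
found = agglomerative_clusters(categories, cooccurrence_counts(cases, categories), 3)
# [['hyperaffective'], ['attention', 'neglect'], ['orientation', 'reasoning']]
```

Raw co-occurrence counts favour frequently diagnosed categories; a normalised binary similarity such as the Jaccard coefficient would be one way to reduce that bias.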
Phi correlation coefficients also were computed to evaluate similarities
between the diagnoses by SLPs versus other disciplines. Based on Cohen's rule
of thumb for evaluating correlation coefficients (Cohen, 1988), moderate to high
correlations were obtained for diagnoses of linguistics (phi = .70), attention (phi
= .46), and neglect (phi = .42). Small correlations were obtained for all other
deficits and deficit categories (phi = .15 to .29), with the exception of learning/
memory (phi = .09).
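The phi coefficient is the Pearson correlation between two dichotomous variables, here whether the SLP and whether the other disciplines diagnosed a given category in the same patient. A minimal Python sketch, assuming each diagnosis is coded 0/1 per patient; the cohen_label helper is a hypothetical convenience that applies Cohen's (1988) conventional benchmarks of .10, .30, and .50 for small, medium, and large correlations.

```python
import math


def phi_coefficient(x, y):
    """Phi between two equal-length 0/1 sequences (e.g., SLP vs other diagnoses)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)          # both diagnosed
    b = sum(1 for i, j in zip(x, y) if i and not j)      # first source only
    c = sum(1 for i, j in zip(x, y) if not i and j)      # second source only
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # neither
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def cohen_label(r):
    """Classify |r| against Cohen's benchmarks: .10 small, .30 medium, .50 large."""
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "below small"


# Coefficients reported in the text.
reported = {"linguistic": 0.70, "attention": 0.46, "neglect": 0.42, "learning/memory": 0.09}
labels = {name: cohen_label(r) for name, r in reported.items()}
```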
To address the second aim, identifying which patients with RHD are referred
to SLP, chi-square cross-tabulation analyses (SPSS, 1999) were performed to
evaluate the association between referral to SLP and presence of deficits. First,
the relationship
TABLE 3 Results of cluster analyses for diagnoses by speech-language pathologists (a)
and other medical professionals (b)
Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5

Cluster 6

(a) Clusters of deficit categories based on diagnoses by speech-language pathologists


learning
visuopercep linguistic
hyperaffective hypoaffective attention
orientation
hyperrespo tion
awareness
interperson
hyporespons nsive
reasoning
al
ive
aprosodia
other
neglect
cognitive
calculation
(b) Clusters of deficit categories based on diagnoses by neurologists/physiatrists,
neuropsychologists, and occupational therapists
calculation
hyperrespo linguistic
hyperaffective hypoaffective attention
nsive
awareness
interperson
learning/
al
memory
visuopercept aprosodia
ion
hyporespons
ive

orientation
reasoning
other cognitive
neglect

between presence of dysarthria and SLP referral was examined to see if a
majority of cases was referred based on that diagnosis. If most cases were
referred to SLP due to a motor speech disorder, then it would be difficult to
determine what cognitive or other communicative deficits might influence the
referral process. The entire group of 122 cases was included in these analyses,
using diagnoses of dysarthria from neurology, neuropsychology, and OT. The
data indicated that only 50% of the patients diagnosed with dysarthria were
referred for an SLP evaluation (phi = .09, p > .05). Given that presence of
dysarthria did not compel an SLP referral, cross-tabulation procedures were
conducted to examine the relationship between the presence/absence of
cognitive/communicative deficits and referral to SLP. The results suggest that
the presence of deficits in visuoperception (phi = .18, p = .05), interpersonal
interactions (phi = .19, p = .04), neglect (phi = .29, p = .001), or aprosodia (phi =
.22, p = .02) was associated with more frequent SLP referral than when these
deficits were absent. Although significant, all of these correlations are small.
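For a 2 × 2 table of deficit (present/absent) by referral (yes/no), the Pearson chi-square statistic and the phi coefficient are directly related: chi-square equals N times phi squared, and with one degree of freedom the p-value is the upper tail of a chi-square distribution. The Python sketch below computes both from the four cell counts without a continuity correction; whether the original SPSS analyses applied a correction is not stated, so treat this as an illustration rather than a reproduction. The counts in the example are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and phi for a 2x2 table:

                         referred   not referred
        deficit present      a           b
        deficit absent       c           d

    Returns (chi2, phi, p), where p is the upper-tail probability for 1 df.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    phi = (a * d - b * c) / math.sqrt(denom)
    chi2 = n * phi ** 2                 # identical to n * (ad - bc)^2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, phi, p


# Hypothetical counts for 122 cases; not data from the study.
chi2, phi, p = chi_square_2x2(30, 20, 24, 48)
```

As a rough consistency check, if the neglect analysis used all 122 cases, a phi of .29 corresponds to a chi-square of about 10.3 and a p-value of roughly .001, in line with the value reported above.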
A second analysis was conducted to further explore the second aim of the
study. Data from The Short Test of Mental Status (Kokmen, Naessens, &
Offord, 1987) were available for 63 of the RHD patients. This screening tool is
similar to the Mini Mental State Examination (Folstein, Folstein, & McHugh,
1975), and provides information about cognitive abilities such as orientation,
memory, language, attention, calculation, and visuoperception. A copy of the
screening is provided in Appendix B. A series of one-tailed independent t-tests
was conducted to compare Kokmen mental status scores for patients who were
and were not referred for SLP evaluation. There was no difference in mean total
scores for these two groups (t = 0.26, p > .05). Examination of each subtest
indicated that patients who scored low on orientation were more likely to be
referred to SLP than those with higher scores (t = 1.8, p < .05). No group
differences were found for any other subtest score. This suggests that
neurologists did not base referrals to SLP solely on patients' performance on this
screening. The small range of possible points per subtest (3–8 points) also may
have contributed to the nonsignificant results.
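The t values reported here and in Table 4 come from independent-samples t-tests. The Python sketch below computes Student's t with a pooled variance from group means, standard deviations, and sizes; the equal-variance form is an assumption, because the text does not say which variant was used. Converting t to a one-tailed p-value requires the t distribution's survival function (for example scipy.stats.t.sf(t, df) if SciPy is available), which is left out to keep the sketch dependency-free.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's independent-samples t (equal variances assumed) from summary data.
    Returns (t, degrees of freedom); a positive t means group 1 scored higher."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Orientation subtest, deficit absent (group 1) vs present (group 2), Table 4.
t, df = pooled_t(7.7, 0.72, 40, 6.8, 1.7, 23)
```

For the orientation row this gives t of roughly 2.9 on 61 degrees of freedom, close to the reported 2.98; the table's means and SDs are rounded, so an exact match should not be expected.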
Another explanation for the nonsignificant results is that this screening tool is
not a valid measure of cognitive/communicative deficits. To address this
possibility, a second series of one-tailed independent t-tests was conducted to
evaluate the relationship between this assessment and the deficit categories.
Means on the Kokmen subtests were compared in patients who did or did not
present with deficits in a corresponding category, using diagnoses from all four
disciplines. As shown in Table 4, relationships were found between orientation
(t = 2.98), recall (t = 2.39), and maths (t = 1.74) subtests and their corresponding
deficit categories (all p < .05). However, there was no difference in Kokmen
scores on the attention, construction, and abstraction subtests for patients who
were and were not diagnosed with deficits in the comparable categories (all
p > .05). These results suggest that for the more complex, multifaceted abilities,
the Kokmen screening and the deficit categories do not clearly capture the same
behavioural characteristics.

TABLE 4 T-test results of scores on the Short Test of Mental Status (Kokmen et al.,
1987) for patients diagnosed with deficit categories present or absent

Kokmen subtest–deficit category | Deficit | n | Mean (SD) | t | p
---|---|---|---|---|---
orientation–orientation* | absent | 40 | 7.7 (.72) | 2.98 | .002
 | present | 23 | 6.8 (1.7) | |
attention–attention | absent | 22 | 6.0 (1.5) | 0.16 | .88
 | present | 41 | 6.1 (1.0) | |
recall–learning/memory* | absent | 22 | 2.6 (1.4) | 2.39 | .010
 | present | 41 | 1.7 (1.4) | |
abstraction–reasoning/problem solving | absent | 23 | 2.5 (.90) | 1.35 | .09
 | present | 40 | 2.1 (2.1) | |
construction–visuoperception | absent | 9 | 2.4 (1.4) | 0.78 | .22
 | present | 20 | 2.1 (1.2) | |
maths–calculation* | absent | 42 | 2.5 (1.3) | 1.74 | .04
 | present | 21 | 1.9 (1.6) | |

Present = deficit diagnosed as present by at least one of four disciplines; Absent = deficit
not present.
* Significantly different at p < .05.

CONCLUSIONS AND CLINICAL IMPLICATIONS
This study examined a large group of adults with RHD and provides initial data
regarding how RHD syndrome is perceived by different medical professionals.
The results obtained must be interpreted with caution given the retrospective
nature of the study and the imprecision of the deficit classification scheme.
Additionally, the data were gathered
from only one rehabilitation unit, and thus are influenced by sampling biases
present in that facility. Despite these limitations, broad clinical implications can
be drawn. One implication is that the characteristics of right hemisphere
syndrome may vary depending on who makes the diagnosis, as prevalence of
deficits may be a reflection of the biases of the professional conducting the
evaluation. There appears to be substantial overlap across disciplines in the
conceptualisation and recognition of attention, neglect, and linguistic deficits,
but much diversity across disciplines for other cognitive/communicative
disorders. The important question is not "who is right?" about the deficits that
occur after RHD; the data suggest that different professionals focus on different
deficits, which is appropriate given the training and expertise that characterise
various professions. The relevant question that arises from this study is: how can
we ensure that patients with RHD are appropriately referred to SLP when they
exhibit deficits that are not consistently recognised by those professionals who
make such referrals?
There is no obvious explanation for which patients are referred to SLP. The
presence of some deficits (e.g., interpersonal interactions, aprosodia, neglect)
was associated with SLP referrals. This suggests that when other professionals
do identify communicative disorders (pragmatic deficits and aprosodia), they
refer those patients to SLP. However, the frequency analyses indicated that
neurologists, neuropsychologists, and OTs do not consistently identify
communicative deficits, or may not be as stringent in judging aspects of
communication, and thus many appropriate referrals are missed. Performance on
a general mental status screening test was not meaningfully related to referrals or
to higher-level deficit categories, and thus does not add much information about
how referral decisions are made. Several factors not taken into account here
include experience of the referring neurologist/physiatrist, and individual
referring preference. For example, some physicians are more likely to refer to
SLPs due to their approach to referring in general, without regard for patients'
specific deficits.
As discussed in the initial study (Lehman Blake et al., 2002), one important
weakness with current practices of diagnosis and treatment of adults with RHD
is the absence of a definition of right hemisphere syndrome. This study suggests
that different disciplines have their own criteria or expectations regarding what
deficits may occur after RHD, likely based on their professional expertise. While
it is appropriate that different disciplines focus on different disorders, some
patients may not receive proper referrals if deficits that can be treated by one
discipline are not recognised by another. Related to this problem is the lack of
consistent terminology, both within and across disciplines. A descriptive
definition of the deficits associated with RHD would benefit our discipline and
would be a step towards developing criteria for other disciplines to use when
making decisions about referral for SLP evaluation and management. Of course,
terminology or definitions alone cannot solve the problems associated with
diagnosis and treatment of right hemisphere syndrome, and it may be impossible
to develop a standard set of terms that is used consistently across disciplines.
Additionally, even with an official diagnostic label, referrals may not always
be forthcoming. For example, in this study only 50% of individuals diagnosed
with dysarthria were referred to SLP. Perhaps the best solution to the current
referral problem is to urge that all patients admitted to a rehabilitation unit with a
cerebral lesion should be referred to SLP. This practice would most definitely
increase the rate of identifying cognitive and communicative deficits (not only
those associated solely with RHD), although it also would presumably increase
the number of evaluations in which such disorders are not identified. Open and
active communication within the SLP community and between SLP clinicians
and other medical professionals is needed to find an optimal solution that ensures
that patients receive the best care possible without being subjected to unnecessary
examinations.
REFERENCES
Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis (pp. 62–74). Beverly
Hills, CA: Sage.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini Mental State: A practical
method for grading the cognitive state of patients for the clinician. Journal of
Psychiatric Research, 12, 189–198.
Joanette, Y., & Goulet, P. (1994). Right hemisphere and verbal communication:
Conceptual, methodological, and clinical issues. Clinical Aphasiology, 22, 1–23.
Kokmen, E., Naessens, J. M., & Offord, K. P. (1987). A short test of mental status:
Description and preliminary results. Mayo Clinic Proceedings, 62, 281–288.
Lehman Blake, M., Duffy, J. R., Myers, P. S., & Tompkins, C. A. (2002). Prevalence and
patterns of right hemisphere cognitive/communicative deficits: Retrospective data
from an inpatient rehabilitation unit. Aphasiology, 16, 537–547.
Myers, P. S. (1999). Right hemisphere damage: Disorders of communication and
cognition. San Diego, CA: Singular.
SPSS. (1999). SPSS Base 10.0 user's guide. Chicago: SPSS Inc.
Tompkins, C. A. (1995). Right hemisphere communication disorders: Theory and
management. San Diego, CA: Singular.

APPENDIX A
Deficits and deficit categories defined by Lehman Blake et al. (2002)

Category | Description | Illustrative labels encompassed under category
Hyperaffective | heightened affective response | labile, pseudobulbar effect, hallucinations
Hypoaffective | dampened or restricted affective response | flat affect
Attention | ability to focus on stimuli; includes focused, sustained, and divided attention | attention, concentration, distractible
Awareness | awareness of, or insight into, deficits and consequences of the deficits | insight, awareness, refusal, denial
Learning/Memory | ability to learn and retain new information | learning, memory
Perception | visual and tactile perception and construction | visuoperception (includes perception & construction), agraphesthesia
Hyperresponsive | heightened responsivity to stimuli | impulsive, disinhibition, verbosity, talkative, tangential
Hyporesponsive | dampened or restricted responsivity to stimuli | paucity of speech, slow responses, poor initiation, unelaborated speech
Linguistic | basic expressive and receptive language functions | aphasia or other language deficits, auditory comprehension, anomia, paraphasias
Orientation | orientation to self, time, situation | orientation, confusion, confabulation, right/left orientation
Reasoning & Problem solving | cognitive skills associated with identifying problems, identifying relevant information and appropriate solutions, and goal achievement | problem solving, verbal reasoning, planning, executive function, mental flexibility, abstraction, inferencing, higher cognitive deficits, perseveration, detail oriented
Other Cognitive Deficits | cognitive skills associated with organising, sequencing, categorising, and integrating information | organisation, sorting, sequencing, integration, cognitive deficits, slow processing, vague speech, poor details
Calculation | mathematical skills | calculation, money handling
Interpersonal Interactions | behavioural aspects of interpersonal communication | eye contact, humour, inappropriate pragmatics, overpersonalisation
Aprosodia | aprosodia | aprosodia
Neglect | visuospatial, hemispatial, or left-sided neglect | visuospatial neglect
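Used for a retrospective chart review, the table amounts to a lookup from each label recorded by clinicians to one deficit category. The Python sketch below illustrates that coding step under assumptions of our own: the dictionary holds only a hypothetical handful of labels taken from the table (not the full scheme), matching is case-insensitive, and labels that are not found are returned for manual review rather than guessed.

```python
# Hypothetical subset of the Appendix A labels; a real review would list every
# label the reviewers actually encountered in the records.
LABEL_TO_CATEGORY = {
    "labile": "Hyperaffective",
    "pseudobulbar effect": "Hyperaffective",
    "flat affect": "Hypoaffective",
    "distractible": "Attention",
    "denial": "Awareness",
    "impulsive": "Hyperresponsive",
    "disinhibition": "Hyperresponsive",
    "paucity of speech": "Hyporesponsive",
    "anomia": "Linguistic",
    "confabulation": "Orientation",
    "perseveration": "Reasoning & Problem solving",
    "money handling": "Calculation",
    "overpersonalisation": "Interpersonal Interactions",
}


def categorise(labels):
    """Map the labels recorded for one patient to deficit categories.

    Returns (sorted list of categories, list of labels not in the lookup);
    unmatched labels are kept so that they can be coded by hand.
    """
    categories, unmatched = set(), []
    for label in labels:
        key = label.strip().lower()
        if key in LABEL_TO_CATEGORY:
            categories.add(LABEL_TO_CATEGORY[key])
        else:
            unmatched.append(label)
    return sorted(categories), unmatched
```

Several labels can fall into one category (here "impulsive" and "disinhibition" both count once toward Hyperresponsive), which mirrors the table's many-to-one structure.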

APPENDIX B
The Short Test of Mental Status (Kokmen et al., 1987)

Orientation (8 points)
Full name, day, date, month, year, address, city, building name (1 point per
item)
Attention: forward digit span (7 points)
Repeat a string of numbers, starting with five digits and increasing to seven.
Score is the number of digits correctly repeated.
Learning (4 points)
Patient repeats four words after all are presented (apple, Mr. Johnson, charity,
tunnel).
Examiner can repeat the words up to four times if needed for the patient to learn
all of them.
Score is the total number of words, minus one point for each trial needed beyond
the first (e.g., if the patient requires two trials, score = 3; if the patient requires
only one trial, score = 4).
Calculation (4 points)
multiply 5 by 13
subtract 7 from 65
divide 58 by 2
add 11 and 29
Abstraction: similarities (3 points)
orange/banana
horse/dog
table/bookcase
Information (4 points)
current president
first president
number of weeks in a year
define the word "island"
Construction (4 points)
draw the face of a clock, showing the time of 11:15
copy a 3D cube
points (per picture): adequate drawing = 2; incomplete = 1; inability to
perform task = 0
Recall (4 points)
Recall the four words presented earlier in the Learning task
Total score possible = 38 points
Mean for normally ageing adults (average age = 51.5) = 33.1 (SD = 3.0)
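The scoring rules above can be collected into a single function. The Python sketch below is illustrative only: the `STMSRaw` field names are hypothetical, each subscore is simply capped at the maximum listed above, and the learning rule is read from the worked example (one point is lost for each trial beyond the first).

```python
from dataclasses import dataclass


@dataclass
class STMSRaw:
    """Raw results for one administration (field names are hypothetical)."""
    orientation_correct: int   # items correct out of 8
    digits_repeated: int       # digits correctly repeated, forward span up to 7
    words_learned: int         # words learned, normally all 4
    learning_trials: int       # presentations needed, 1 to 4
    calculations_correct: int  # out of 4
    similarities_correct: int  # out of 3
    information_correct: int   # out of 4
    clock_points: int          # 0, 1 or 2
    cube_points: int           # 0, 1 or 2
    words_recalled: int        # out of 4


def learning_score(words_learned, trials):
    """Words learned minus one point per trial beyond the first (never below 0)."""
    return max(0, words_learned - (trials - 1))


def stms_subscores(r):
    # Each subscore is capped at the maximum listed in the appendix.
    return {
        "orientation": min(r.orientation_correct, 8),
        "attention": min(r.digits_repeated, 7),
        "learning": learning_score(r.words_learned, r.learning_trials),
        "calculation": min(r.calculations_correct, 4),
        "abstraction": min(r.similarities_correct, 3),
        "information": min(r.information_correct, 4),
        "construction": min(r.clock_points, 2) + min(r.cube_points, 2),
        "recall": min(r.words_recalled, 4),
    }


def stms_total(r):
    """Total out of 38 (8 + 7 + 4 + 4 + 3 + 4 + 4 + 4)."""
    return sum(stms_subscores(r).values())
```

With a perfect record the function returns the 38-point maximum; requiring two learning trials lowers the learning subscore from 4 to 3, as in the appendix example.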

A comparison of the relative effects of phonologic and semantic cueing treatments
Julie L.Wambaugh
VA Salt Lake City Healthcare System and University of Utah, USA

Background: Lexical retrieval problems are pervasive in aphasia and
are often an important focus of treatment. Although many treatments
have been demonstrated to positively impact lexical retrieval in
aphasia, comparisons of such treatments have been relatively rare.
Aims: The purpose of this investigation was to compare the
relative effects of two lexical retrieval cueing treatments when
administered concurrently with a participant with chronic anomic
aphasia. The cueing treatments, phonological cueing treatment
(PCT) and semantic cueing treatment (SCT), were designed to target
the lexical phonologic and lexical semantic levels of processing,
respectively.
Methods & Procedures: The participant received both treatments
concomitantly in the context of an alternating treatments design and
multiple baseline design across behaviours. Separate lists of words
were assigned to each treatment and additional word lists were
designated for generalisation assessment. Following achievement of
criterion levels of performance, each treatment was then applied to one of
the additional lists in an attempt to replicate the treatment effects.
Outcomes & Results: The participant showed a positive response
to both treatments. However, he achieved higher levels of accuracy of
naming for items treated with SCT. This effect was observed in both
phases of treatment application.
Conclusions: For this participant, SCT appeared to be the
preferred treatment, at least in the context of concurrent
administration of the treatments. This preferential response may be
related to a pretreatment pattern of responding in which the

COMPARISON OF CUEING TREATMENTS 15

participant routinely used descriptions and semantically related
sentence cues to attempt to retrieve words.
The development and evaluation of effective treatments for word-finding
deficits continues to be an important issue in the remediation of aphasia. Recent
trends have consistently reflected a movement towards model-based treatments
designed to target specific levels of lexical retrieval processing. In general,
process-oriented treatments have been shown to result in positive increases in
word-finding behaviours (Nickels & Best, 1996). However, evidence has rarely been
presented to indicate that the participant(s) who received process-oriented
treatment did not or would not respond positively to an alternative treatment. As
with most aphasia treatments, there is little research comparing types of word-
retrieval treatments within or across participants.
Early and influential research by Howard, Patterson, Franklin, Orchard-Lisle,
and Morton (1985) indicated that semantically oriented therapy may have more
robust effects than phonologically oriented therapy. However, several
subsequent noncomparative investigations have indicated that phonological
approaches may produce more lasting effects than had previously been predicted
(Davis & Pring, 1991; Miceli, Amitrano, Capasso, & Caramazza, 1996; Raymer,
Thompson, Jacobs, & LeGrand, 1993). Findings from a few recent studies that
have examined both lexical semantic and lexical phonological treatments suggest
that each may produce positive effects.
Visch-Brink, Doesborgh, van Harskamp, Bippel, Koudstaal, and van de
Sandt-Loenderman (2002) compared the effects of lexical semantic and lexical
phonological therapy across two groups of patients with aphasia, with 58 patients
randomly assigned to the treatments. Therapists were provided with a variety of
tasks that fell within each general category (i.e., semantic or phonological). The
investigators designated the type of therapy each patient received, but allowed
the therapists to select tasks within the specified category at their own
discretion. Each patient received 40 to 60 hours of therapy between 3 and
12 months post-onset. Progress was measured with the ANELT (Blomert, 1992),
and the results revealed no significant differences between groups/therapy
conditions.

Address correspondence to: Julie L. Wambaugh, Department of Communication
Sciences and Disorders, Rm. 1201, 390 South 1530 East, University of Utah, Salt
Lake City, Utah 84112, USA. Email: julie.wambaugh@health.utah.edu
Thanks are extended to Aida Martinez, Michelene Kalinyak-Fliszar, and Michele
Allegre for their assistance with this project. This research was supported by
Rehabilitation Research and Development, Department of Veterans Affairs.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html
DOI: 10.1080/02687030344000085
Wambaugh and colleagues (Wambaugh, Doyle, Martinez, & Kalinyak-Fliszar,
2002; Wambaugh, Linebaugh, Doyle, Martinez, Kalinyak-Fliszar, & Spencer,
2001; Wambaugh, Linebaugh, Doyle, Spencer, & Kalinyak-Fliszar, 1999) have
evaluated the effects of two cueing treatments for word-finding deficits in
aphasia using single-case experimental designs. The treatments, phonological
cueing treatment (PCT) and semantic cueing treatment (SCT), were designed to
target the lexical phonologic and lexical semantic levels of processing,
respectively. Findings suggested that, at least for PCT and SCT, whether
treatment targets one or the other of these levels of lexical processing may not
be critical for many patients with aphasia. That is, most patients responded
positively to both treatments. Unfortunately, the multiple baseline designs
employed with PCT and SCT to date have not allowed for direct comparison of
the treatments.
The purpose of the present investigation, therefore, was to further examine the
effects of SCT and PCT by comparing the treatments within an individual
speaker with aphasia.
METHOD
Participant
The participant in this investigation was a 44-year-old male speaker with chronic
anomic aphasia who was 15 months post-onset of a single, left-hemisphere,
thromboembolic stroke. He was premorbidly right-handed, had completed 14
years of formal education, and had worked as an electrician prior to his stroke.
Pretreatment assessment results are shown in Table 1. The participant exhibited
moderate word-finding difficulties that appeared to be predominantly semantic in
nature, as indicated by lexical retrieval assessments and a word-retrieval error
analysis. As seen in Table 1, the most frequent type of error was semantic
paraphasia (37% of all errors), with an additional 13% of errors being mixed,
semantic-phonemic paraphasias. He exhibited few nonmixed, phonemic
paraphasias (7% of all errors). The participant gave the response "I don't know"
for a relatively large percentage of items (24% of all error items). Comprehension
testing for error items on the TAAWF (German, 1990) revealed accurate
verbal-label-to-picture matching, indicating sufficient semantic information to
make such judgements. Similarly, he performed accurately with category sorting
and semantic association tasks. However, the participant exhibited difficulty in
making auditory synonym judgements. In light of his performance on these tasks
and the types of errors exhibited, the participant's lexical retrieval difficulties
seemed to be predominantly semantic in nature, with the likelihood of some
co-existing phonologic-level disruptions.

TABLE 1 Pretreatment assessment results
Measure | Score or result
Western Aphasia Battery (Kertesz, 1982), administered 2 months prior to study |
  Aphasia Quotient (100 possible) | 79
  Aphasia Classification | Anomic
  Subtests (AQ totals) |
    Spontaneous Speech | 16.5
    Comprehension | 9.5
    Repetition | 6.8
    Naming | 6.7
Test of Adolescent/Adult Word Finding (German, 1990) |
  Total raw score (107 possible) | 19 (18%)
  Subtests |
    Picture Naming: Nouns | 4
    Sentence Completion Naming | 4
    Description Naming | 1
    Picture Naming: Verbs | 2
    Category Naming | 8
  Spoken Word-Picture Matching (Comprehension) (88 possible) | 88 (100%)
Category Sorting (informal assessment) |
  Total items correct (7 categories, 49 possible correct) | 49 (100%)
PALPA Subtests (Kay, Lesser, & Coltheart, 1992) |
  Auditory synonym judgements (#49) |
    High imageability (30 possible) | 28 (93%)
    Low imageability (30 possible) | 21 (70%)
  Word semantic association (printed stimuli, #51) |
    High imageability (15 possible) | 15 (100%)
    Low imageability (15 possible) | 15 (100%)
Confrontation Naming of Objects: Snodgrass and Vanderwart (1980) items (1st administration) |
  Total items correct (260 possible) | 189 (73%)
  Error analysis (71 errors) |
    Semantic paraphasias | 26 (37%)*
    Phonemic paraphasias | 5 (7%)*
    Mixed semantic & phonemic paraphasias | 9 (13%)*
    Gestural response | 3 (4%)*
    Unrelated, real-word response | 5 (7%)*
    Neologism | 5 (7%)*
    Perseverative response | 1 (1%)*
    No response | 17 (24%)*
* Calculated as percentage of total errors.
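The percentages in Table 1 are simple proportions, and the WAB Aphasia Quotient can be checked from the four subtest scores. In the Python sketch below, the error counts are those reported in Table 1; the AQ formula (twice the sum of Spontaneous Speech, Comprehension, Repetition, and Naming) is the standard WAB computation as we understand it, and the table's values agree with it (2 × 39.5 = 79).

```python
def percent(count, total):
    """Whole-number percentage, rounded as in Table 1."""
    return round(100 * count / total)


# Error counts from the confrontation-naming error analysis in Table 1.
ERRORS = {
    "Semantic paraphasias": 26,
    "Phonemic paraphasias": 5,
    "Mixed semantic & phonemic paraphasias": 9,
    "Gestural response": 3,
    "Unrelated, real-word response": 5,
    "Neologism": 5,
    "Perseverative response": 1,
    "No response": 17,
}


def error_profile(errors):
    """Return (total errors, percentage of total errors for each error type)."""
    total = sum(errors.values())
    return total, {kind: percent(n, total) for kind, n in errors.items()}


def wab_aq(spontaneous, comprehension, repetition, naming):
    """WAB Aphasia Quotient: twice the sum of the four subtest scores (max 100)."""
    return round(2 * (spontaneous + comprehension + repetition + naming), 1)
```

The eight error counts sum to the 71 errors reported, and the same `percent` helper reproduces the table's 73% (189/260) for confrontation naming and 18% (19/107) for the word-finding test.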
Experimental stimuli
The participant was asked to name a set of 260 line drawings depicting objects
(Snodgrass & Vanderwart, 1980) on two separate occasions. Performance on the
two administrations of the 260 items was used as the basis for selection of the
experimental stimuli. Specifically, the items selected were those that the
participant had missed on both naming occasions. Four sets of 12 items each
were selected for this participant (see Appendix A). The stimulus sets were
matched as closely as possible for frequency of occurrence and relative
difficulty during baseline testing.
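The text states only that the four lists were matched "as closely as possible" for frequency and baseline difficulty. One hypothetical way to produce such lists is sketched below in Python: first keep the items missed on both administrations, then deal them into lists in a back-and-forth ("snake") order of descending word frequency. The frequency values and the snake rule are assumptions, not the procedure used in the study, and baseline difficulty is not modelled.

```python
def missed_on_both(first, second):
    """Items scored incorrect on both administrations.

    first, second: dict mapping item name -> True if named correctly.
    Items absent from the second administration are not selected.
    """
    return sorted(item for item, ok in first.items()
                  if not ok and second.get(item) is False)


def snake_split(items, frequency, n_lists=4, per_list=12):
    """Deal items into lists in back-and-forth order of descending frequency,
    so that each list receives a similar mix of high- and low-frequency words.

    frequency: dict item -> frequency value (hypothetical source).
    """
    if len(items) != n_lists * per_list:
        raise ValueError(f"need exactly {n_lists * per_list} items")
    ranked = sorted(items, key=lambda w: frequency[w], reverse=True)
    lists = [[] for _ in range(n_lists)]
    for i, word in enumerate(ranked):
        rnd, pos = divmod(i, n_lists)
        # Even rounds fill lists left to right, odd rounds right to left.
        lists[pos if rnd % 2 == 0 else n_lists - 1 - pos].append(word)
    return lists
```

The alternating direction keeps the list that receives the most frequent word in one round from also receiving the most frequent word in the next.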
The two lists with the most similar and stable baseline naming performance
were selected for initial application of treatment and the remaining lists were
designated for generalisation assessment and for secondary application of
treatment.
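Choosing "the two lists with the most similar and stable baseline naming performance" requires some explicit rule for similarity and stability. The Python sketch below uses one such rule, chosen purely for illustration: the gap between the two lists' mean baseline scores plus the range of each list's probe scores, with the lowest-cost pair selected. The study does not report the rule it used.

```python
from itertools import combinations
from statistics import mean


def pick_initial_pair(baselines):
    """Return the two lists whose baseline probes are most similar and stable.

    baselines: dict list name -> percent correct on each baseline probe.
    Cost = gap between the two lists' means + the range of each list's probes
    (an illustrative rule, not one stated in the study).
    """
    def spread(scores):
        return max(scores) - min(scores)

    def cost(pair):
        a, b = (baselines[name] for name in pair)
        return abs(mean(a) - mean(b)) + spread(a) + spread(b)

    return min(combinations(sorted(baselines), 2), key=cost)
```

For example, with hypothetical probe values in which two lists each score 8% on every baseline probe while the other two fluctuate between 17% and 25%, this rule selects the two flat lists.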
Treatments
The treatments studied in this investigation, SCT and PCT, were hierarchical
cueing treatments that were designed to be similar to each other in terms of
general application of treatment. Each treatment comprised a
prestimulation phase and a traditional cueing hierarchy (Patterson, 2001). SCT
and PCT both began with a prestimulation phase in which the target item was
presented with three picture foils and the participant was asked to point to the
picture that corresponded to either a description (SCT) or nonword rhyme (PCT)
(see Appendix B for treatment descriptions).
Following the prestimulation phases, the cueing treatments were applied. Both
cueing hierarchies were composed of five levels of cueing that were
response-contingent (i.e., the cues were applied only upon an incorrect response). With
both hierarchies, the successive cues became increasingly powerful in terms of
eliciting the target response. Upon elicitation of a correct response, the cueing
hierarchies were applied in reverse order, beginning with the level of cue that
preceded the correct response.
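The forward-then-reverse logic, together with the rule in Appendix B that an error during the reversal sends the clinician back up the hierarchy, can be written as a small state machine. In this Python sketch `respond` is a hypothetical stand-in for presenting the cue at a given level and scoring the participant's answer; the step limit is our addition so that a simulation always terminates.

```python
def run_hierarchy(respond, n_levels=5, max_steps=50):
    """Simulate one item's pass through a response-contingent cueing
    hierarchy and return the (level, correct) attempts in order.

    respond(level) -> bool stands in for presenting the cue at `level`
    (1 = picture alone, n_levels = verbal model for repetition) and
    scoring the response.
    """
    log = []
    level = 1
    while len(log) < max_steps:
        correct = respond(level)
        log.append((level, correct))
        if correct and level == 1:
            return log  # named with the picture alone: item finished
        # Correct -> step back to the cue level that preceded it (reversal);
        # incorrect -> move to the next, stronger cue (capped at the top level).
        step = -1 if correct else 1
        level = min(max(level + step, 1), n_levels)
    raise RuntimeError("no correct response at level 1 within max_steps")
```

For an item named correctly only at the third cue level, the log runs up the hierarchy (levels 1, 2, 3) and then back down (levels 2, 1), ending once the picture alone elicits the name.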
For both treatments, each of the pictures designated for treatment was
presented individually, in random order. One presentation of each of the 12
pictures constituted a treatment trial. The participant completed three trials per
treatment session.
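The session structure (each of the 12 pictures presented singly in random order, three trials per session) can be generated as below. This Python sketch assumes that each trial is reshuffled independently, which the text does not specify; passing a `seed` makes a session's order reproducible.

```python
import random


def session_order(items, trials=3, seed=None):
    """Presentation order for one treatment session: each trial presents every
    item once, in a freshly shuffled order (independent shuffles per trial
    are an assumption)."""
    rng = random.Random(seed)
    session = []
    for _ in range(trials):
        trial = list(items)
        rng.shuffle(trial)
        session.append(trial)
    return session
```

With 12 items and three trials, a session contains 36 presentations.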

Treatment was conducted until 100% accuracy of naming was achieved on
two of three consecutive probes for at least one of the treated lists, or until 20
treatment applications were conducted for both treatments.
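The stopping rule combines an accuracy criterion with a session cap. A minimal Python sketch, assuming probe scores are recorded as percent correct per list and treatment applications are counted per treatment (the data layout is ours):

```python
def met_accuracy_criterion(probes, target=100):
    """True if at least 2 of any 3 consecutive probe scores reach `target`.
    Windows at the end of the series may be shorter than three probes."""
    return any(sum(score >= target for score in probes[i:i + 3]) >= 2
               for i in range(len(probes)))


def should_stop(probes_by_list, sessions_by_treatment, max_sessions=20):
    """Stop when at least one treated list meets criterion, or when every
    treatment has been applied max_sessions times."""
    criterion = any(met_accuracy_criterion(p) for p in probes_by_list.values())
    exhausted = all(n >= max_sessions for n in sessions_by_treatment.values())
    return criterion or exhausted
```

Because the criterion is checked across lists, treatment of both lists ends as soon as either one reaches it, which matches the study's practice of stopping both treatments together.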
Experimental design
An alternating treatments design (ATD) was employed in combination with a
multiple baseline design across behaviours. PCT and SCT were applied to two
word lists while the other two lists were left untreated. Following achievement
of the probe performance criterion, the treatments were applied to the remaining
two word lists.
As indicated previously, the two word lists for which performance was the
most similar and stable over three baseline probes were selected for initial
application of treatment. PCT and SCT were then randomly assigned to those
lists. Treatments were applied concurrently to the word lists. Specifically, on
each day that the participant received treatment, one treatment was applied, a rest
period of 10 to 20 minutes was provided, and then the other treatment was applied.
The order of the treatments was alternated across days in keeping with design constraints.
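On each treatment day both treatments were given, separated by a 10- to 20-minute rest, with their order alternated across days. The Python sketch below assumes strict day-by-day alternation and a rest length drawn at random within the reported range; the actual schedule was constrained by the design in ways the text does not detail.

```python
import random


def daily_schedule(n_days, treatments=("PCT", "SCT"), seed=None):
    """Return (first treatment, rest in minutes, second treatment) per day.
    Strict alternation of which treatment goes first is an assumption."""
    rng = random.Random(seed)
    a, b = treatments
    plan = []
    for day in range(n_days):
        first, second = (a, b) if day % 2 == 0 else (b, a)
        rest = rng.randint(10, 20)  # minutes, within the reported 10 to 20 range
        plan.append((first, rest, second))
    return plan
```

Alternating the order controls for any advantage of always being delivered first (or after the rest period) on a given day.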
Baseline phase. During baseline probes, the experimental picture stimuli (i.e.,
four sets of pictures) were presented in random order. The participant was
instructed to name each picture to the best of his ability, and a 15-second
response interval was provided. Each final response was scored according to a
multidimensional scoring system; a binary (correct/incorrect) scoring system was
used for the purposes of graphing.
Treatment phases. Treatment was conducted two to three times per week.
Probes, identical to those administered in baseline, were conducted immediately
prior to the start of each treatment session. Items in lists that were not currently
undergoing treatment were scheduled for probing every fourth session.
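The probing schedule can be expressed as a function of the session number. In the Python sketch below, treated lists are probed before every session and untreated lists every fourth session; numbering sessions from 1 and probing untreated lists at sessions 4, 8, 12, and so on is an assumption, since the text does not say which sessions were chosen.

```python
def lists_to_probe(session, treated, untreated, every=4):
    """Lists probed before a given treatment session (sessions numbered from 1).

    Treated lists are probed at every session; untreated lists only at every
    `every`-th session (assumed here to be sessions 4, 8, 12, ...).
    """
    probe = list(treated)
    if session % every == 0:
        probe += list(untreated)
    return probe
```

Probing untreated lists less often limits the repeated exposure to those items, a concern taken up in the Conclusions.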
Maintenance phase and follow-up phases. Following completion of treatment
with Lists 1 and 2, maintenance probing of items in those lists continued during
treatment of Lists 3 and 4. Follow-up probes with all items were conducted at 2- and 6-week intervals following the completion of treatment.
RESULTS
The percentage of items named correctly by the participant in probe sessions is
depicted in Figure 1. The top graph shows responses to items in Sets 1 and 2 and
the bottom graph shows responses to items in Sets 3 and 4. The participant
named only one item correctly per probe for both Sets 1 and 2 during baseline,
an accuracy level of 8% for each. He correctly named two to three items for
Sets 3 and 4 during baseline (accuracy levels ranging from 17% to 25%).
Following application of PCT to Set 1 items and SCT to Set 2 items, correct
responses increased for both sets of items. The participant achieved criterion for
SCT items after five treatment sessions and reached 83% correct responding
for PCT items.
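Because each list contains 12 items, every reported percentage corresponds to a whole number of items (e.g., one item is 8%, ten items are 83%). The Python helpers below convert in both directions, rounding to the nearest whole percent or item as the text appears to do.

```python
def percent_correct(n_correct, n_items=12):
    """Percent correct for a probe of one 12-item list, to the nearest percent."""
    return round(100 * n_correct / n_items)


def items_from_percent(pct, n_items=12):
    """Number of items that a reported whole-number percentage corresponds to."""
    return round(pct * n_items / 100)
```

Read this way, the 83% reached on the PCT list in the first phase means 10 of 12 items, and the 33% reached on an untreated list means 4 of 12.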
The participant's responses to untrained items (Sets 3 and 4) remained relatively
stable during the initial training phase. Specifically, the set subsequently assigned
to SCT (Set 4) showed one more item named correctly than its maximum
baseline level (i.e., an increase from 25% correct to 33% correct). Because of
this increase, additional probing was conducted to establish stability of
responding. Two additional probes (see sessions 9 and 10 on the lower graph)
indicated relative stability at 33% correct for Sets 3 and 4.
After the termination of treatment with Sets 1 and 2 and the extended probing
of Sets 3 and 4, treatment was extended to the untreated sets: PCT was
applied to Set 3 and SCT was applied to Set 4. Correct responses increased
for both sets. Criterion was reached for Set 4 (SCT) after nine treatment
sessions, whereas correct naming of Set 3 (PCT) items reached 75%.
Probes of performance with Set 1 and 2 items during treatment of Sets 3 and 4
revealed initially strong maintenance (i.e., 100% and 83% accuracy for SCT and
PCT items, respectively) followed by a reduction in accuracy (i.e., 67% and 58%
accuracy for SCT and PCT, respectively). However, follow-up probing at 2 and 6
weeks following cessation of all treatment indicated maintenance of trained
behaviours at levels that approximated treatment probe performance for all sets:
PCT #1 (Set 1) = 83% and 75%; SCT #1 (Set 2) = 83% and 92%; PCT #2 (Set 3)
= 75% and 75%; SCT #2 (Set 4) = 92% and 92%.
CONCLUSIONS
The results of this investigation are in accord with findings by Visch-Brink et
al. (2002) and Wambaugh et al. (2001) in that both treatments produced positive
changes with this participant. Although the participant displayed superior
performance with SCT, he also responded positively to PCT.
The participant achieved higher levels of naming accuracy with SCT in both
treatment comparisons (i.e., the ATD was replicated within the participant).
For both treatment comparisons, he correctly named SCT items at levels that
were approximately 20% higher than PCT items. This difference also remained at
6 weeks post-treatment. His greater success with SCT may be related to word-
retrieval behaviours that were observed prior to the start of treatment. That is, he
often spontaneously used semantically related sentence cues and descriptions to
facilitate word retrieval. It is unknown whether this self-cueing strategy was
self-initiated or was a result of previous therapy. In either case, the participant may
have been predisposed to favour semantic cues. It is also possible that SCT had a
more facilitative impact than PCT on accurate lexical processing.
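The Conclusions describe SCT items as named at levels approximately 20% higher than PCT items. The Python sketch below recomputes the SCT advantage from the percentages reported in the Results, both in percentage points and as a relative increase, since the text does not say which is meant. Taking the end-of-treatment SCT values as 100% (the criterion level) is our assumption.

```python
# (SCT %, PCT %) for each comparison, as reported in the Results section.
# End-of-treatment SCT values are taken as 100% because the SCT lists met
# the 100% criterion; that reading is an assumption.
COMPARISONS = {
    "first comparison, end of treatment": (100, 83),
    "second comparison, end of treatment": (100, 75),
    "first comparison, 6-week follow-up": (92, 75),
    "second comparison, 6-week follow-up": (92, 75),
}


def advantage(sct, pct):
    """SCT advantage as (percentage points, relative increase in %)."""
    return sct - pct, round(100 * (sct - pct) / pct)


def all_advantages(comparisons=COMPARISONS):
    return {name: advantage(*scores) for name, scores in comparisons.items()}
```

The point differences range from 17 to 25 and the relative increases from 20% to 33%, so "approximately 20%" fits the percentage-point reading and the smallest of the relative differences.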
It should be noted that the participant received a limited number of treatment
sessions during both treatment phases (i.e., five applications of each treatment
during the first treatment phase and nine applications during the second phase).
Additional treatment sessions may have resulted in higher accuracy of
responding to PCT items. That is, the maximal effects of PCT may not have
been observed with this participant.

Figure 1. Percentage of items named correctly in probes.

The results of this investigation provide further support for the use of SCT and
PCT, in that both appear likely to be beneficial in promoting increased accuracy
of naming of trained items. Clinicians may consider the use of a period of trial
therapy in the form of an ATD to assist in treatment selection.
The use of an ATD to compare speech/language treatments is almost always
complicated by the issue of possible generalisation effects. Although the
measurement of additional, untreated behaviours is not a requisite in the
application of an ATD (Barlow & Hersen, 1984), the use of such measurements
may assist in the determination of the presence of potential generalisation
effects. However, in the case of treating a behaviour that may improve through
repeated exposure to probe stimuli (as in the case of word retrieval),
improvements in performance may be misinterpreted as generalisation. If
measuring untrained behaviours repeatedly, the investigator may wish to
measure other untreated behaviours at pre- and post-treatment intervals (i.e.,
limited repeated measurement) to compare the effects of repeated exposure on
untrained behaviours. If previous research has indicated that generalisation
effects can be expected to be minimal and the researcher's interest is in the
relative differences of treatments being administered concurrently, the researcher
may choose to forgo the repeated measurement of untreated behaviours and use
a more traditional ATD. Regardless of design specifics, the replication of the
observed effects is recommended both within and across speakers to strengthen
internal and external validity, respectively.
REFERENCES
Barlow, D. H., & Hersen, M. (1984). Single case experimental designs: Strategies for
studying behavior change (2nd ed.). New York: Pergamon Press.
Blomert, L. (1992). The Amsterdam-Nijmegen Everyday Language Test (ANELT). In
N. Steinbuchel & D. Y. von Cramon (Eds.), Neuropsychological rehabilitation
(pp. 121-127). Berlin: Springer Verlag.
Davis, A., & Pring, T. (1991). Therapy for word-finding deficits: More on the effects of
semantic and phonologic approaches to treatment with dysphasic patients.
Neuropsychological Rehabilitation, 1, 135-145.
German, D. J. (1990). Test of adolescent/adult word finding. Austin, TX: Pro-Ed.
Howard, D., Patterson, K., Franklin, S., Orchard-Lisle, V., & Morton, J. (1985). Treatment
of word retrieval deficits in aphasia: A comparison of two methods. Brain, 108,
817-829.
Kay, J., Lesser, R., & Coltheart, M. (1992). Psycholinguistic Assessment of Language
Processes in Aphasia (PALPA). Hove, UK: Lawrence Erlbaum Associates Ltd.
Kertesz, A. (1982). The Western Aphasia Battery. New York: Grune & Stratton.
Miceli, G., Amitrano, A., Capasso, R., & Caramazza, A. (1996). The treatment of anomia
resulting from output lexical damage: Analysis of two cases. Brain and Language,
52, 150-174.
Nickels, L., & Best, W. (1996). Therapy for naming disorders (Part I): Principles,
puzzles, and progress. Aphasiology, 10,
Patterson, J. (2001). The effectiveness of cueing hierarchies as a treatment for word
retrieval impairment. ASHA Special Interest Division 2 Newsletter, 11(2), 11-17.
Raymer, A. M., Thompson, C. K., Jacobs, B., & LeGrand, H. R. (1993). Phonological
treatment of naming deficits in aphasia: Model-based generalization analysis.
Aphasiology, 7, 27-53.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for
name agreement, image agreement, familiarity, and complexity. Journal of
Experimental Psychology: Human Learning and Memory, 6, 174-215.
Visch-Brink, E., Doesborgh, S., van Harskamp, F., Bippel, D., Koudstaal, P., & van de
Sandt-Loenderman, M. (2002, June). The efficacy of lexical semantic therapy in
aphasia, a randomized controlled trial. Paper presented at the annual Clinical
Aphasiology Conference, Branson, MO.
Wambaugh, J. L., Doyle, P. J., Martinez, A. L., & Kalinyak-Fliszar, M. (2002). Effects of
two lexical retrieval cueing treatments on action naming in aphasia. Journal of
Rehabilitation Research and Development, 39(4), 455-466.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Martinez, A. L., Kalinyak-Fliszar, M.
M., & Spencer, K. A. (2001). Effects of two cueing treatments on lexical retrieval in
aphasic speakers with different levels of deficit. Aphasiology, 15(10/11), 933-950.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Spencer, K. A., & Kalinyak-Fliszar, M.
(1999). Effects of deficit-oriented treatments on lexical retrieval in a patient with
semantic and phonologic deficits. Brain and Language, 73, 446-450.

APPENDIX A
EXPERIMENTAL STIMULI
List 1 (PCT #1): axe, cannon, eagle, frog, ironing board, kettle, necklace, peanut,
sled, suitcase, toothbrush, wheel
List 2 (SCT #1): alligator, bowl, chisel, fence, lips, lobster, mitten, pumpkin,
rollerskate, ruler, thread, toaster
List 3 (PCT #2): basket, cigarette, couch, crown, donkey, garbage can, grapes,
mushroom, saw, skirt, traffic light, wagon
List 4 (SCT #2): bottle, bow, caterpillar, desk, doorknob, envelope, guitar,
kangaroo, lettuce, pliers, tennis racket, windmill

APPENDIX B
DESCRIPTION OF TREATMENTS
Semantic Cueing Treatment (SCT)
Prestimulation. The target item was presented in picture form with three picture
foils (two semantically related, one unrelated). The examiner provided a verbal
phrase corresponding to the item and asked the participant to point to the correct
picture.
Cueing hierarchy. The application of the steps of the hierarchy was response-contingent. The steps were applied sequentially until a correct naming response
was elicited. Then, the order of the steps was reversed to elicit correct responses
at each of the preceding steps. In the event that an incorrect response occurred
during the hierarchy reversal, the order of hierarchy steps was again reversed
until a correct response was obtained.
(1) Picture of target item presented, naming response requested, verbal feedback
provided for correct or incorrect responses (a 7- to 8-second response time was
allowed; the same applied to the following steps).
(2) Picture of target item presented along with a verbal description of target,
naming response requested, verbal feedback provided for correct or
incorrect responses (e.g., target = cow, "a farm animal that gives milk").
(3) Picture of target item presented along with a semantically nonspecific
sentence completion phrase, naming response requested, verbal feedback
provided for correct or incorrect responses (e.g., "The farmer fed the ___").
(4) Picture of target item presented along with a semantically loaded sentence
completion phrase, naming response requested, verbal feedback provided for
correct or incorrect responses (e.g., "The farmer went to the barn to milk
the ___").
(5) Picture of target item presented along with verbal model of target word,
repetition of target word requested.
Phonologic cueing treatment (PCT)
Prestimulation. The target item was presented in picture form with three picture
foils (two phonetically related, one unrelated). The examiner provided a verbal
phrase corresponding to the item and asked the participant to point to the correct
picture.
Cueing hierarchy. The application of the steps of the hierarchy was the same
as above.
(1) Picture of target item presented, naming response requested, verbal feedback
provided for correct or incorrect responses (a 7- to 8-second response time was
allowed; the same applied to the following steps).
(2) Picture of target item presented along with a verbal production of a non-real
word that rhymed with the target (e.g., target = pig, "it rhymes with chig").
(3) Picture of target item presented along with a verbal first sound cue (e.g., "it
starts with /p/").
(4) Picture of target item presented along with a sentence completion phrase
that included the rhyme and the sound cue, naming response requested,
verbal feedback provided for correct or incorrect responses (e.g., "The name
of this picture rhymes with chig, it is a /p/").
(5) Picture of target item presented along with verbal model of target word,
repetition of target word requested.

Measures of lexical diversity in aphasia


Heather Harris Wright
University of Kentucky, USA
Stacy W.Silverman
University of Missouri-Columbia, USA
Marilyn Newhoff
San Diego State University, USA

Background: Analyses of discourse production, and in particular analyses of
the lexical diversity of verbal production, are important to the assessment of
adults with aphasia. Previous researchers have
used type-token ratio (TTR) to measure conversational vocabulary in
adults with aphasia; however, this measure is known to be sensitive
to sample size, requiring that only samples of equivalent length be
compared. The number of different words (NDW) is another measure
of lexical diversity, but it, too, requires samples of equivalent
length. An alternative to these measures, D, has been developed
(Malvern & Richards, 1997) to address this problem. D allows for
comparisons across samples of varying lengths.
Aims: The first objective of the current study was to examine the
relationships among three measures of productive vocabulary in
discourse for adults with aphasia: TTR, NDW, and D. The second
objective was to use these measures to determine in what ways, and
to what degree, they each can differentiate fluent and nonfluent
aphasia.
Methods & Procedures: Eighteen adults with aphasia participated
in this study (nine with nonfluent aphasia; nine with fluent aphasia).
Participants completed the Western Aphasia Battery (WAB) and
produced language samples consisting of conversation and picture
description. Samples were then subjected to the three lexical
diversity analyses.
Outcomes & Results: Results indicated that, although the
measures generally correlated with each other, adults with fluent
aphasia evidenced significantly higher D and NDW values than those

LEXICAL DIVERSITY IN APHASIA 27

with nonfluent aphasia when whole samples were analysed. When
samples were truncated to 100 and 200 words, the groups differed
significantly on all three measures.
Conclusions: These findings add further support to the notion that,
because TTR and, to a lesser extent, NDW are sensitive to
sample size, length differences across samples tend to confound
results. As an alternative to these measures, the use of D for the
measurement of conversational vocabulary of adults with aphasia
enables the analysis of entire language samples, so that discarding
language sample data is not necessary. In the present study, D values
differed for fluent and nonfluent aphasia samples.
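TTR and NDW are simple to compute, and the model behind D can be sketched in a few lines. The Python below is an illustration under stated assumptions: `tokenize` is a crude stand-in for the transcription and word-counting rules a study would actually use; `first_n` mirrors truncation to 100- or 200-word samples; and `estimate_d` follows our understanding of the vocd approach associated with D (mean TTR of 100 random subsamples at each size from 35 to 50 tokens, with D chosen by least squares to fit the curve TTR = (D/N)[(1 + 2N/D)^(1/2) - 1]). It omits the repetition and averaging that published software performs and is not the implementation used in this study.

```python
import math
import random
import re


def tokenize(text):
    """Lower-case word tokens (a simplification of transcript conventions)."""
    return re.findall(r"[a-z']+", text.lower())


def ndw(tokens):
    """Number of different words (types)."""
    return len(set(tokens))


def ttr(tokens):
    """Type-token ratio: different words / total words."""
    return ndw(tokens) / len(tokens)


def first_n(tokens, n):
    """Truncate a sample to its first n tokens (e.g., 100 or 200 words)."""
    if len(tokens) < n:
        raise ValueError(f"sample has only {len(tokens)} tokens")
    return tokens[:n]


def ttr_model(n, d):
    """TTR predicted for n tokens by the model behind D (Malvern & Richards)."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def fit_d(points, lo=1.0, hi=500.0, tol=1e-6):
    """Least-squares D for (n, mean TTR) points, by golden-section search."""
    def sse(d):
        return sum((t - ttr_model(n, d)) ** 2 for n, t in points)

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, e = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if sse(c) < sse(e):  # minimum lies in [a, e]
            b, e = e, c
            c = b - g * (b - a)
        else:                # minimum lies in [c, b]
            a, c = c, e
            e = a + g * (b - a)
    return (a + b) / 2


def estimate_d(tokens, sizes=range(35, 51), draws=100, seed=0):
    """Mean TTR of random subsamples (without replacement) at each size, then fit D."""
    rng = random.Random(seed)
    points = []
    for n in sizes:
        mean_ttr = sum(ttr(rng.sample(tokens, n)) for _ in range(draws)) / draws
        points.append((n, mean_ttr))
    return fit_d(points)
```

Because D is fitted to subsamples of fixed sizes, a whole language sample of any length (at least 50 tokens here) can be used, whereas TTR and NDW values are only comparable after `first_n` has cut every sample to the same length.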

Adults with aphasia present with word retrieval deficits during discourse
production. These deficits may present themselves in discourse through the
person's use of nonreferential terminology, pauses, filler terms, paraphasias, or
neologisms. Typically, adults with a nonfluent type of aphasia use pauses and
filler terms as they struggle with verbal output. By contrast, adults with a fluent
type of aphasia have little difficulty with verbal output, although they do produce
paraphasias and neologisms during verbal production.
Clearly, an important aspect of aphasia assessment is the analysis of discourse
production, especially given that many of the above characteristics are
primarily detectable through the analysis of discourse. Several researchers have
assessed the percentage of information units produced by adults with aphasia when
stimuli are controlled (e.g., McNeil, Doyle, Fossett, Park, & Goda, 2001;
Nicholas & Brookshire, 1993). However, one important aspect of discourse that
has not been readily assessed in adults with aphasia is the lexical diversity of
their verbal production. Given the observation that many of the error types
observed in adults with aphasia appear to be, at least in part, lexical in nature, it
seems of particular importance to refine the tools we use in measuring aspects of
the lexical domain in discourse production.
Address correspondence to: Heather Harris Wright PhD, The University of Kentucky, Division of Communication Disorders, CHS Building, 900 S. Limestone, Lexington, KY 40536-0200, USA. Email: hhwrig2@uky.edu
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000166

One measure of lexical diversity in conversation has enjoyed particular
popularity in the child language literature for decades: type-token ratio (TTR).
TTR is a measure of conversational vocabulary and is defined as the ratio of the
number of different words (types) in a language sample to the total number of
words (tokens) in the sample (Miller, 1981; Templin, 1957). Ratios closer to 0 reflect less
diversity of vocabulary, whereas values closer to 1.0 reflect greater diversity. As
was identified early on, however, TTR measurements are sensitive to sample size
variations; larger samples tend to yield lower TTR values than smaller samples
(Fillenbaum, Jones, & Wepman, 1961; Hess, Sefton, & Landry, 1986). Wachal
and Spreen (1973) used TTR and a variety of TTR-based alternatives (e.g., mean
segmental TTR, Johnson, 1944; bilogarithmic TTR, Chotlos, 1944; Herdan,
1960), designed to account for differences in sample sizes across participants, to
measure vocabulary diversity in adults with aphasia and their non-brain-damaged
counterparts. They found that adults with aphasia presented with less lexical
diversity in conversation than adults with no brain damage, and that
several of the measures, including the original TTR calculation, mean segmental
TTR, bilogarithmic TTR, and root TTR (Guiraud, 1959), significantly
differentiated the two groups. The concern about sensitivity to sample size
variation also applies, although perhaps to a lesser extent, to many of the
transformations of TTR. For example, mean segmental TTR is the
average TTR for several consecutive, equal-length segments of the sample. This
allows samples of different lengths to be compared, as long as the same
segment size is used. However, because segment size must be
controlled, this measure still depends on sample size.
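The following Python sketch makes these definitions concrete. It computes TTR and the three TTR-based alternatives named above for a sample that has already been tokenised into a list of words. It illustrates the formulas under stated assumptions and is not the procedure Wachal and Spreen (1973) used. The function names are hypothetical. Dropping an incomplete final segment in mean segmental TTR is also an assumption, not a detail taken from Johnson (1944).

```python
import math

def ttr(tokens):
    """Type-token ratio: number of different words / total number of words."""
    return len(set(tokens)) / len(tokens)

def root_ttr(tokens):
    """Root TTR (Guiraud): types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def bilog_ttr(tokens):
    """Bilogarithmic TTR (Herdan): log(types) / log(tokens); needs 2+ tokens."""
    return math.log(len(set(tokens))) / math.log(len(tokens))

def mean_segmental_ttr(tokens, segment_size):
    """Average TTR over consecutive, equal-length segments.

    Assumption: a final segment shorter than segment_size is dropped.
    """
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens) - segment_size + 1, segment_size)]
    if not segments:
        raise ValueError("sample is shorter than one segment")
    return sum(ttr(s) for s in segments) / len(segments)

sample = "the cat sat on the mat the end".split()
print(ttr(sample), mean_segmental_ttr(sample, 4))  # 0.75 0.875
```

Take the eight-token toy sample as an example. TTR is 6/8 = .75, while mean segmental TTR with four-word segments is (1.0 + .75)/2 = .875. The segmental value is higher only because each segment is shorter, which is why segment size has to be held constant across speakers.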
Other investigations involving the use of TTR as a measure of lexical diversity
in discourse production of adults with aphasia (e.g., Prins, Snow, & Wagenaar,
1978; Spreen & Wachal, 1973) have produced similar findings relative to the
productive vocabulary of adults with aphasia, but they highlight the central
weakness of TTR: its sensitivity to sample size variation. More recently, within
child language research, some data suggest
that TTR, when used in the diagnosis of specific language impairment, is not
sufficiently sensitive to separate the lexical performance of children with and
without language impairments (Watkins, Kelly, Harbers, & Hollis, 1995). In
particular, Watkins and colleagues found that, even when samples were truncated
to 50- and 100-utterance subsamples, TTR did not distinguish between the two
groups. Only when samples were controlled for the number of words rather
than utterances (i.e., calculating the number of different words occurring in
subsamples of 100 and 200 total words) did the measure differentiate
the two groups. Thus, it appears critical to control sample size by
truncating samples to a common length. Given the findings of Watkins and
colleagues (1995), sample length should be determined by the number of words,
rather than by the more common practice of counting the number of
utterances.
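A brief Python sketch shows how the two truncation rules differ. Utterances are assumed to be pre-tokenised lists of words, and the toy speakers are invented. For simplicity the helpers keep the first n units, whereas the present study used the middle of each sample. Truncating by utterances leaves the number of tokens, and therefore the TTR denominator, free to vary with utterance length. Truncating by words fixes it.

```python
def truncate_by_utterances(utterances, n_utterances):
    """Keep the first n utterances; the resulting word count still varies."""
    return [word for utt in utterances[:n_utterances] for word in utt]

def truncate_by_words(utterances, n_words):
    """Keep the first n words, ignoring utterance boundaries."""
    words = [word for utt in utterances for word in utt]
    if len(words) < n_words:
        raise ValueError(f"sample has only {len(words)} words; need {n_words}")
    return words[:n_words]

# Hypothetical speakers: one with short utterances, one with long ones.
short_utts = [["want", "go"], ["no"], ["car", "go"]]
long_utts = [["i", "want", "to", "go", "home", "now"],
             ["the", "car", "is", "outside"],
             ["we", "can", "go"]]

for utts in (short_utts, long_utts):
    by_utt = truncate_by_utterances(utts, 3)
    by_word = truncate_by_words(utts, 5)
    print(len(by_utt), len(by_word))
# prints "5 5", then "13 5"
```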
Relatedly, number of different words (NDW) has also been used to estimate
the diversity of conversational vocabulary across clinical populations (e.g.,
Ratner & Silverman, 2000; Watkins et al., 1995). Of importance, however,
although NDW has become the preferred measure in child language studies (e.g.,
Dollaghan et al., 1999; Goffman & Leonard, 2000), investigators who compute
TTR on a standard number of words across samples (as opposed to a standard
number of utterances) are, in essence, computing NDW, because they are simply
dividing the number of different words in the samples by a common denominator
(e.g., 100 words). That is, TTR and NDW are perfectly correlated when samples
contain the same number of total words. Although NDW is also somewhat
sample-size sensitive, it is less so than TTR. As a given sample grows in length,
the probability that each consecutive word represents a new word (as opposed
to one that has already been produced in the sample) decreases. That is, as the
sample becomes longer, NDW increases more slowly. TTR, however, does not show
this same stability: as a sample grows, the probability that the numerator
(NDW) increases with each new word falls, but the denominator (total
number of words) increases with every word. From a computational standpoint,
then, NDW is less affected by size variation than TTR. Despite its greater
stability, however, to date NDW has not been studied empirically as a measure of
lexical diversity with the language samples of adults with aphasia.
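A short Python sketch traces NDW and TTR as a sample grows by recording both values at chosen points in a word sequence. The function name and the toy sentence are hypothetical. The sketch also shows the equivalence noted above. At any fixed length N, TTR is simply NDW divided by N. Across samples of equal length, the two measures are therefore a linear rescaling of each other and correlate perfectly.

```python
def growth_curve(tokens, checkpoints):
    """Return (tokens so far, NDW, TTR) at each requested sample length."""
    targets = set(checkpoints)
    seen = set()
    rows = []
    for i, word in enumerate(tokens, start=1):
        seen.add(word)
        if i in targets:
            rows.append((i, len(seen), len(seen) / i))
    return rows

words = "the dog saw the cat and the cat saw the dog run".split()
for n, ndw, ratio in growth_curve(words, [4, 8, 12]):
    print(n, ndw, ratio)
# 4 3 0.75
# 8 5 0.625
# 12 6 0.5
```

NDW keeps rising, but by smaller steps (3, 5, 6). TTR falls steadily (.75, .625, .50) because its denominator grows with every word.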
Recently, an alternative to these two measures, D, has been developed by
Malvern and Richards (1997) to address the problem of varying sample sizes. D
is a parameter obtained by fitting a mathematical model to the curve of TTR against the number of tokens. As McKee, Malvern, and Richards
(2000, p. 323) noted:
The new measure is calculated by, first, randomly sampling words from
the transcript to produce a curve of the TTR against Tokens for the
empirical data. Then the software finds the best fit between this empirical
curve and theoretical curves calculated from the model by adjusting the
value of a parameter. The parameter, D, is shown to be a valid and reliable
measure of vocabulary diversity without the problems of sample size found
with previous methods.
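For reference, the D literature (e.g., McKee et al., 2000) commonly states the theoretical curve that vocd fits as a function of the number of tokens N, with D as the only free parameter. It is given here as the standard statement of the model, not as a result of the present study. Expanding the square root for small N shows that the curve starts near 1 and falls more slowly the larger D is, which is why a higher D indicates greater lexical diversity:

```latex
\mathrm{TTR}(N) = \frac{D}{N}\left[\left(1 + \frac{2N}{D}\right)^{1/2} - 1\right]
\qquad\text{and, for } N \ll D,\qquad
\mathrm{TTR}(N) \approx 1 - \frac{N}{2D}
```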
D is computed as described above using the vocd utility of the CLAN language
analysis program (MacWhinney, 2000). D can be computed for samples of any size
greater than 50 words, and preliminary results have suggested that the
resulting D values are relatively stable despite size variations across samples.
McKee and colleagues (2000) found that D values obtained with split-half
samples were not significantly different from those obtained with whole samples
of children with normal language. Similar results have been reported by Silverman and Ratner (in press) in the measurement of conversational vocabulary in
children who stutter. Owen and Leonard (2002), in comparing diversity of
conversational vocabulary in children with and without specific language
impairment, concluded that, although sample-size effects could not be ruled out,
it was clear that D was less susceptible to size variation than TTR or NDW. The
validity of D has been evaluated using samples of normal (e.g., McKee et al.,
2000) and disordered language production in children (Owen & Leonard, 2002)
and samples of adults learning English as a second language (Malvern &
Richards, 1997), although to date there has been no known application to
language samples of individuals with aphasia.
Of additional interest, although measures of lexical diversity have been used to
examine lexical differences between the discourse of adults with and without
aphasia (Prins et al., 1978; Spreen & Wachal, 1973; Wachal & Spreen, 1973),
these measures are not known to have been applied to the differentiation of
individuals with fluent and nonfluent aphasia. Given the distinct verbal
production characteristics of adults with fluent and nonfluent types of aphasia,
and the greater ease of verbal output by adults with a fluent type of aphasia, it
would perhaps be expected that vocabulary diversity would differ between groups,
with adults with fluent aphasia possessing greater vocabulary diversity. On the
other hand, both groups are apt to present with word retrieval deficits that would
impact the diversity of vocabulary they demonstrate. Whether or not it is
reasonable to expect adults with fluent aphasia to demonstrate greater vocabulary
diversity in discourse than nonfluent counterparts, one would expect differences
in the lengths of samples obtained from adults with fluent and nonfluent aphasia
when time and/or content are controlled. For this reason, a lexical measure that
allows for such variation in sample lengths could be beneficial as a tool for
assessment of lexical content across aphasia types.
The purpose of this study, then, was to examine the relationships among three
measures of productive vocabulary in discourse for adults with aphasia: TTR,
NDW, and D. Moreover, we sought to determine whether, and how, these measures
differentiate fluent from nonfluent aphasia. Given that adults
with nonfluent aphasia tend to produce smaller language samples, in both number of
words and number of utterances, than their counterparts with fluent aphasia, we
were particularly interested in ascertaining whether this difference appeared to
impact results when whole language samples were analysed. We expected that
the measures, when used as they were designed (i.e., using whole samples for D and
restricting sample size for TTR and NDW), would differentiate the nonfluent and
fluent groups. Additionally, we expected that these
measures, each performed as originally intended, would be strongly correlated
with each other.
METHOD
Participants
A total of 23 adults with unilateral left brain damage subsequent to
cerebrovascular accident participated in the study. Once language samples were
transcribed and sample length was determined, data from five participants were
excluded because these participants did not produce the minimum of 200 words.
Data from nine adults with nonfluent aphasia (NF) and nine adults with fluent
aphasia (F) were included and subjected to analyses. Type and severity of
aphasia were confirmed by performance on the Western Aphasia Battery (WAB)
(Kertesz, 1982). Aphasia quotients (AQ) were obtained for each participant. The
mean AQ for participants with nonfluent aphasia was 77.2 (SD = 5.7; range 67.0–81.0)
and the mean AQ for participants with fluent aphasia was 85.8 (SD = 7.7;
range 71.0–94.0). Participants in the NF group scored 4 or lower on the fluency
portion of the WAB and participants in the F group scored 5 or higher.
Participants in the two aphasia groups were matched on their performance on the
auditory comprehension subtests of the WAB (NF group: M = 9.2, SD = 0.8; F
group: M = 9.3, SD = 0.7). Further, the aphasia groups did not differ significantly in
age, t(8) = 1.13, p = .29, years of education, t(8) = 1.17, p = .28, WAB AQ, t(8) =
2.19, p = .06, or score on the auditory comprehension subtests, t(8) = 0.69, p = .51.
Table 1 shows demographic and clinical data for the individual participants.
TABLE 1 Demographic and clinical description data for the aphasia participants

Participant  Age  Gender  Education (years)  M/p CVA  WAB AQ  Aud comp  Sample size (words)
NF1          86   Female  14                 25       80.6    9.00      421
NF2          59   Female  14                 41       81.0    10.00     489
NF3          57   Female  20                 72       80.6    9.80      273
NF4          85   Female  14                 25       72.3    9.45      587
NF5          47   Female  19                 48       82.0    9.10      277
NF6          53   Female  18                 261      79.5    9.75      409
NF7          38   Male    16                 10       81.9    9.35      375
NF8          52   Female  14                 12       70.1    8.75      208
NF9          35   Female  11                 204      67.0    7.20      240
F1           67   Male    15                 15       71.0    8.70      392
F2           76   Male    12                 163      76.3    8.65      598
F3           54   Female  12                 6        85.6    9.70      396
F4           55   Female  13                 29       87.3    9.95      463
F5           76   Male    16                 12       87.6    9.80      438
F6           63   Male    16                 17       94.0    10.00     655
F7           76   Female  15                 6        85.4    9.30      427
F8           60   Female  13                 42       93.0    9.60      474
F9           83   Male    16                 8        92.0    8.20      458

Note. M/p CVA = months since cerebrovascular accident; WAB AQ = Western Aphasia Battery aphasia quotient; Aud comp = score on auditory comprehension subtests of the WAB; NF = nonfluent; F = fluent.

Language elicitation and transcription
Spontaneous conversation, supplemented by elicited conversation in response to
the WAB Picnic Scene, constituted the language sample for each participant.
The samples were audiorecorded, then transcribed and coded according to the
conventions of the Child Language Data Exchange System (CHILDES;
MacWhinney, 2000). The CHILDES system consists of a transcription protocol
(CHAT) and a series of language analysis programs (CLAN). Samples were first
transcribed verbatim and then coded for CLAN analysis. For inter-rater
agreement, the first author reviewed 22% of the audiotaped samples for
correspondence to the transcript. Word-by-word agreement was determined to be
98.5%. Revisions, direct repetitions, and fillers were coded for exclusion, so as
not to be counted in calculations of vocabulary diversity. This decision was made
because counting them would have penalised participants who were more
disfluent, regardless of the lexical content of their language. In addition,
paraphasias that were recognisable English words were transcribed verbatim
rather than replaced with guesses at intended lexical targets; that is, the transcriptions
reflect only the words and utterances actually produced.
Unrecognisable words and neologisms were coded for exclusion from the
transcripts. Each of the language samples obtained had at least 200 words; a
minimum sample length of 50 words is required to compute D. Sample size
ranged from 208 to 587 words for the NF group (M = 364.33; SD = 125.65) and
from 392 to 655 words for the F group (M = 477.89; SD = 89.92); the F group
produced significantly more words than the NF group, t(8) = 2.63, p < .05.
Language analysis
Each sample was subjected to three measures of lexical diversity, each
performed by CLAN (MacWhinney, 2000): D, NDW, and TTR. Because TTR is
known to be sensitive to sample size variation, truncated samples of the middle
100 and 200 words were obtained, and each was subjected to the three analyses.
This procedure, similar to that of Watkins and colleagues (1995), allowed for a
more equitable comparison of TTR, NDW, and D. If these measures are each
performed as originally intended (restricting sample size for TTR and NDW, and
using whole samples for D), they should be strongly correlated with each other.
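The preparation described in this section excludes coded tokens and then keeps the middle 100 or 200 words. It can be sketched in Python as follows. The `&` exclusion prefix is a hypothetical stand-in for the study's coding of revisions, repetitions, fillers, unrecognisable words, and neologisms. How the window is centred when the surplus is odd is an assumption, because the article does not say how the middle was located.

```python
EXCLUDE_MARK = "&"  # hypothetical prefix for tokens coded for exclusion

def analysable_words(utterances):
    """Flatten utterances and drop tokens coded for exclusion."""
    return [w for utt in utterances for w in utt if not w.startswith(EXCLUDE_MARK)]

def middle_window(words, n):
    """Return the middle n words; with an odd surplus, one extra word is dropped from the end."""
    if len(words) < n:
        raise ValueError(f"only {len(words)} analysable words; need {n}")
    start = (len(words) - n) // 2
    return words[start:start + n]

utterances = [["i", "&uh", "want"], ["&um", "the", "car"], ["go", "home", "now"]]
words = analysable_words(utterances)
window = middle_window(words, 3)
print(window, len(set(window)) / len(window))  # ['the', 'car', 'go'] 1.0
```

In the study itself the window would be 100 or 200 words rather than 3. TTR and NDW are then computed on the window, and D on the full list of analysable words.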
RESULTS
Relationships among D, NDW, and TTR
Pearson correlations were computed to determine the possible relationships
among D, NDW, and TTR. Because of the number of correlations performed, a
Bonferroni adjustment was applied to minimise the probability of Type I error.
Consequently, only results with p values below .001 were considered significant. As
shown in the correlation matrix in Table 2, when the samples were truncated to
100 and 200 words, the three measures correlated with each other for each
sample length. However, when whole samples were used, none of the three
measures was significantly related to the others at the p < .001 level. In addition,
as shown in the correlation matrix, although the D values for all three sample sizes
were significantly correlated with one another, this was not the case for the other
two measures: correlations across the three sample sizes were significant only
once for TTR (between TTR100 and TTR200) and twice for NDW (between
whole-sample NDW and NDW200, and between NDW100 and NDW200).
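A few lines of Python can check the arithmetic behind the adjusted criterion. The text does not state how many correlations were in the family. The sketch assumes they are the 36 distinct pairs among the nine variables that appear in Table 2, with a familywise alpha of .05. Under that assumption the Bonferroni threshold is about .0014, so the .001 criterion used here is slightly more conservative.

```python
from math import comb

# The nine variables that appear in Table 2.
measures = ["NDW", "NDW100", "NDW200", "TTR", "TTR100", "TTR200",
            "D", "D100", "D200"]
n_tests = comb(len(measures), 2)    # distinct pairwise correlations
adjusted_alpha = 0.05 / n_tests     # Bonferroni: familywise .05 split evenly
print(n_tests, round(adjusted_alpha, 5))  # 36 0.00139
```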
Finally, the relationships among lexical diversity measures, when used as
intended, were assessed. Specifically, we evaluated the relationships among
whole-sample D, NDW for 100- and 200-word samples, and TTR for 100- and
200-word samples. Each of these correlations was significant at
the p < .001 level.
TABLE 2 Pearson correlation matrix for whole samples, 100-word samples, and 200-word samples of NDW, TTR, and D

         NDW     NDW100  NDW200  TTR     TTR100  TTR200  D       D100
NDW      1.0
NDW100   .66     1.0
NDW200   .79*    .92*    1.0
TTR      .06     .46     .46     1.0
TTR100   .66     1.0*    .92*    .46     1.0
TTR200   .79*    .91*    1.0*    .44     .91*    1.0
D        .69     .77*    .81*    .56     .77*    .80*    1.0
D100     .63     .96*    .86*    .48     .96*    .85*    .79*    1.0
D200     .78*    .88*    .95*    .40     .88*    .95*    .86*    .86*

Note. NDW, TTR, and D without a suffix = whole samples; 100 and 200 = 100- and 200-word samples; NDW = number of different words; TTR = type-token ratio; * significant at p < .001.

Lexical diversity in adults with fluent vs nonfluent aphasia
In an attempt to determine the sensitivity of the measures to aphasia type, several
paired-samples t-tests were performed. Groups differed significantly in the
conversational vocabulary they produced when measured by D with whole
samples, t(8) = 2.69, p < .05, 100-word samples, t(8) = 2.74, p < .05, and 200-word
samples, t(8) = 3.55, p < .01. Groups also differed significantly for
NDW with whole language samples, t(8) = 3.07, p < .05, and when samples were
truncated to 100 words, t(8) = 2.44, p < .05, and 200 words, t(8) = 2.77, p < .05.
However, groups did not differ significantly for TTR with whole samples, t(8) =
0.65, p > .05, but did once samples were truncated to 100 words, t(8) = 2.45, p < .05,
and 200 words, t(8) = 2.71, p < .05. See Table 3 for the groups' means and
standard deviations for D, NDW, and TTR.
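Each comparison above is reported with 8 degrees of freedom. That is what a paired-samples t test on nine matched NF and F pairs yields (9 minus 1). The Python sketch below computes the statistic from within-pair differences. The function name and the toy numbers are hypothetical. Obtaining p values would also require the t distribution, for example from a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for matched lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1

t, df = paired_t([3, 4, 5], [2, 2, 2])  # toy data: differences 1, 2, 3
print(round(t, 3), df)  # 3.464 2
```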


DISCUSSION
The first objective of the investigation was to examine the relationships among D,
TTR, and NDW across three sample lengths: whole sample, 100-word, and 200-word.
Since D is a relatively new measure of lexical diversity, one that had not
previously been used to assess the conversational vocabulary of adults with
aphasia, it was appropriate to determine the extent of its relationships with
two other, well-established measures. Findings suggested
that the measures were all significantly correlated when samples were truncated
to 100 and 200 words, but that there were no significant relationships among any
of the measures when whole samples were used. This finding is likely
attributable to the fact that, in the present study, adults with fluent aphasia
produced (a) greater vocabulary diversity, but also (b) significantly longer
samples. Although longer samples should not affect D, they would
be expected to lower TTR while raising NDW. Because the fluent group
combined greater diversity with longer samples, and the nonfluent group combined
lower diversity with shorter samples, the two effects appear to have offset each other,
producing a cancellation effect that stems from the uncontrolled length of the samples.
It is clear from these data that, in general, the correlations across analyses are
stronger for samples of the same length (e.g., stronger for D100 and NDW100
than for whole-sample NDW and NDW200). These stronger correlations would
appear to result from the fact that the same subset of language sample data is
used. Such findings would seem to suggest that, as long as the decision is made to limit
samples to a particular length, any of the three analyses might be used to arrive
at similar conclusions about conversational vocabulary for adults with aphasia.

TABLE 3 D, NDW, and TTR means (standard deviations) for whole language samples, 100-word samples, and 200-word samples for fluent and nonfluent aphasia groups

Measure                  NF group (N = 9)   F group (N = 9)
D whole sample           55.11 (19.35)      79.39 (12.14)
D 100-word sample        45.17 (15.82)      70.43 (19.12)
D 200-word sample        50.35 (20.04)      79.16 (11.03)
NDW whole sample         148.33 (43.93)     202.89 (29.86)
NDW 100-word sample      58.44 (6.46)       66.22 (5.09)
NDW 200-word sample      95.22 (13.06)      110.89 (7.98)
TTR whole sample         .41 (.05)          .43 (.04)
TTR 100-word sample      .58 (.06)          .66 (.05)
TTR 200-word sample      .48 (.08)          .56 (.04)

Note. NF = nonfluent; F = fluent; NDW = number of different words; TTR = type-token ratio.

The finding that the D values for each sample length were significantly related
seems to provide evidence of the stability of this measure of vocabulary
measurement. In contrast, TTR and NDW did not demonstrate this same
consistency of results across sample sizes; rather, the significance of their
correlations varied from one sample size to another. As a
whole, this finding seems to highlight the sample-size sensitivity of
TTR and NDW, compared with D. As discussed above, although NDW is also
somewhat sample-size sensitive, it is less so than TTR, whose value
changes with every new word added to the sample.
In some sense, given concerns about sample-size sensitivity, comparing TTR and NDW with
D using whole samples for each analysis is a rather unfair comparison, although
such analyses seemed to lead to a more complete understanding of both the
analyses and the lexical abilities of each group of adults with aphasia. Results of
the present study suggest that if each analysis is used as intended, with
equivalent samples for TTR and NDW and with whole samples for D, the
measures are each significantly related to one another, highlighting the
importance of using truncated sample data with TTR and NDW.
There is an equally important issue, however, related to the ecological validity
of using measures that require discarding language sample data. In collecting
language sample data, of course, there is an attempt to obtain as representative a
sample as possible. When language sample data are discarded because of the
constraints of a particular analysis, it is important to question whether the sample
is then less representative of the person's language abilities. In addition, arbitrary
decisions come into play, related to selecting the subset of words or utterances to
be included in the sample. In the present study, for example, we chose the middle
100 and 200 words for our truncated analyses; however, not all participants will
be at their best in the middle of the samples. Due to fatigue and/or frustration,
some might perform better earlier in the conversation. Still others, due to slow
rise time, might actually perform better later. Finally, in some cases, adults with
aphasia may not be capable of providing a sample of sufficient length to allow
the examiner the option of selecting 100 or 200 words for analysis. An a priori
intent to truncate samples to a specific predetermined length, then, could lead
either to misrepresentation of conversational abilities in cases in which a
substantial amount of language sample data is discarded, or to discarding
language sample data altogether if a client is not able to produce a sample of the
predetermined length.
The second objective of our study was to determine if these measures of
lexical diversity adequately differentiated adults with nonfluent and fluent
aphasia. Again, we were most interested in the use of whole language samples for
these analyses since it is our position that these will be most representative of the
abilities of individuals with aphasia. When whole language samples were used,
only D and NDW differentiated NF and F aphasia types. It is not surprising that
TTR did not reveal between-group differences, given its sensitivity to sample size
variation. The finding that NDW produced between-group differences on whole
samples, however, is initially somewhat surprising. In theory, it, too, is
sample-size sensitive. That is, if a person produces a 200-word language sample, there is
a greater opportunity to produce more different words than if the person only
produces a 100-word sample. Upon reflection, however, this finding might have
been expected; the adults with fluent aphasia tended to produce longer samples,
as well as to use more diverse vocabulary. The greater diversity of their
vocabulary can be seen in all three analyses when sample length was controlled.
With NDW, however, this between-groups difference is magnified by the fact that
the adults with fluent aphasia also produced more language, resulting in their
performance appearing even more different from those with nonfluent aphasia
than it actually was. Thus, our finding of group differences for whole samples
with NDW in no way indicates that the measure is stable across sample-size
variation.
The question of sample-size sensitivity might also be raised with respect to D.
If D were sample-size sensitive, it too should inflate the values for adults with
fluent aphasia. To assess this possibility informally, D analyses were performed
on split halves of five randomly selected samples, such that for each
sample every other utterance was omitted from the analysis. The 10 half-sample
D values (even utterances and odd utterances) were then compared to the 5
whole-sample D values. In theory, if D were sample-size sensitive, D values for
the half samples would each fall below their respective D value for the whole
sample, demonstrating that fewer words yield lower values than more words.
The results of this informal analysis, however, were that six of the halves fell
below their whole D value, and four of the halves fell above their whole D
value. These results are consistent with those of McKee and colleagues (2000),
and suggest that D is not, in fact, sample-size sensitive, at least to the extent that
TTR and NDW are.
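The informal split-half check can be written as a small Python sketch that accepts any diversity function. Computing D requires the full vocd procedure, so the example plugs in plain TTR purely as an illustrative substitute, not the analysis reported here. The toy utterances are invented. For TTR, both halves land above the whole-sample value. That is a consistent one-directional shift, and the mixed result for D (six halves below, four above) argues against any such shift.

```python
def split_half(utterances):
    """Split a sample into odd- and even-numbered utterances (counting from 1)."""
    return utterances[0::2], utterances[1::2]

def compare_halves(utterances, measure):
    """Apply `measure` to the whole sample and to each half (words flattened)."""
    def flatten(utts):
        return [w for utt in utts for w in utt]
    odd, even = split_half(utterances)
    return measure(flatten(utterances)), measure(flatten(odd)), measure(flatten(even))

def ttr(words):
    """Type-token ratio, used here only as a stand-in diversity measure."""
    return len(set(words)) / len(words)

sample = [["the", "dog", "ran"], ["the", "cat", "ran"],
          ["a", "dog", "sat"], ["a", "cat", "sat"]]
whole, odd_half, even_half = compare_halves(sample, ttr)
print(whole, round(odd_half, 3), round(even_half, 3))  # 0.5 0.833 0.833
```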
Taken as a whole, our results in relation to D are of interest because they
suggest that this analysis is appropriate for quantifying conversational
vocabulary performance of adults with aphasia. Moreover, our results add to the
growing literature regarding the utility of D as a measure of lexical diversity in
clinical populations (e.g., Malvern & Richards, 1997; Owen & Leonard, 2002).
With respect to NDW, our findings that truncated samples can be used to
distinguish groups of differing language skills corroborate those of Watkins and
colleagues (1995), although they found that TTR did not distinguish children with
language impairment from normal-language peers even in samples truncated to a
fixed number of utterances. The most likely reason for this difference is that Watkins
and colleagues truncated those samples by utterances, whereas we truncated by words.
One limitation of the present study is that the language samples obtained for
analyses are relatively small. Hess et al. (1986), for example, have made the case
that a minimum of 350 words is needed for reliable computation of TTR, at least
for analyses of preschool children with normal language. Despite this limitation,
however, findings and implications of the present study are important for at least
two reasons. First, samples of the present study allow for comparison to other
work with similar samples (e.g., Watkins et al., 1995) and, arguably, are valuable
in that insight can be gained from evaluating the extent to which smaller samples
can be used with these analyses for this clinical population. Second, from our
perspective it is important to propose measurement procedures that are both
realistic and capable of implementation in a clinical setting. Although longer
samples are desirable from a research perspective, conclusions based on longer
samples are not as readily applied to clinical endeavours, given the nature of
fluent and nonfluent aphasias and the inherent time constraints of clinical work.
Conclusion and clinical implications
D appears to be a promising tool for the analysis of lexical diversity in
the conversation of adults with aphasia. Its greatest strength is its ability to
accommodate whole language samples while controlling for sample size in its
output. In contrast, TTR and NDW both require that language sample data be
discarded so that only samples of equivalent length in words are compared.
We have raised concerns about the ecological validity of procedures
that require the discarding of language sample data. It appears that D analysis
provides group separation between fluent and nonfluent aphasia samples,
suggesting perhaps its future use as an additional tool in the differential
diagnosis of aphasia. Future studies might further investigate the validity and
reliability of D as a measure of conversational vocabulary in adults with aphasia.
In addition, conversational vocabulary diversity among other populations with
acquired neurogenic disorders warrants exploration. Such work might increase
our confidence in the clinical utility of this new measure, as well as enhance our
understanding of the conversational lexical abilities of adults with a range of
acquired neurogenic disorders.
REFERENCES
Chotlos, J. W. (1944). Studies in language behavior. IV. A statistical and comparative analysis of individual written language samples. Psychological Monographs, 56(2), 77–111.
Dollaghan, C. A., Campbell, T. F., Paradise, J. L., Feldman, H. M., Janosky, J. E., Pitcairn, D. N., et al. (1999). Maternal education and measures of early speech and language. Journal of Speech, Language, and Hearing Research, 42, 1432–1443.
Fillenbaum, S., Jones, L. V., & Wepman, J. M. (1961). Some linguistic features of speech from aphasic patients. Language and Speech, 4, 91–108.
Goffman, L., & Leonard, J. (2000). Growth of language skills in preschool children with specific language impairment: Implications for assessment and intervention. American Journal of Speech-Language Pathology, 9, 151–161.
Guiraud, P. (1959). Problèmes et méthodes de la statistique linguistique. Dordrecht: D. Reidel.
Herdan, G. (1960). Type-token mathematics: A textbook of mathematical linguistics. The Hague: Mouton.
Hess, C. K., Sefton, K. M., & Landry, R. G. (1986). Sample size and type-token ratios for oral language of preschool children. Journal of Speech and Hearing Research, 29, 129–134.
Johnson, W. (1944). Studies in language behavior. I. A program of research. Psychological Monographs, 56(2), 1–15.
Kertesz, A. (1982). Western aphasia battery. New York: Grune & Stratton.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (2nd ed.). Hillsdale, NJ: Erlbaum.
Malvern, D. D., & Richards, B. J. (1997). A new measure of lexical diversity. In A. Ryan & A. Wray (Eds.), Evolving models of language (pp. 58–71). Clevedon: Multilingual Matters.
McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15(3), 323–338.
McNeil, M. R., Doyle, P. J., Fossett, T. R. D., Park, G. H., & Goda, A. J. (2001). Reliability and concurrent validity of the information unit scoring metric for the story retelling procedure. Aphasiology, 15, 991–1006.
Miller, J. (1981). Assessing language production in children. Baltimore: University Park Press.
Nicholas, L. E., & Brookshire, R. H. (1993). A system for quantifying the informativeness and efficiency of connected speech of adults with aphasia. Journal of Speech and Hearing Research, 36, 338–350.
Owen, A. J., & Leonard, L. B. (2002). Lexical diversity in the spontaneous speech of children with specific language impairment: Application of D. Journal of Speech, Language, and Hearing Research, 45, 927–937.
Prins, R. S., Snow, C. E., & Wagenaar, E. (1978). Recovery from aphasia: Spontaneous speech versus language comprehension. Brain and Language, 6, 192–211.
Ratner, N. B., & Silverman, S. (2000). Parental perceptions of children's communicative development at stuttering onset. Journal of Speech, Language, and Hearing Research, 43, 1252–1263.
Silverman, S., & Ratner, N. B. (in press). Measuring lexical diversity in children who stutter: Application of D. Journal of Fluency Disorders.
Spreen, O., & Wachal, R. S. (1973). Psycholinguistic analysis of aphasic language: Theoretical formulations and procedures. Language and Speech, 16, 130–146.
Templin, M. C. (1957). Certain language skills in children, their development and interrelationships. Minneapolis: University of Minnesota Press.
Wachal, R. S., & Spreen, O. (1973). Some measures of lexical diversity in aphasic and normal language performance. Language and Speech, 16, 169–181.
Watkins, R., Kelly, D., Harbers, H., & Hollis, W. (1995). Measuring children's lexical diversity: Differentiating typical and impaired language learners. Journal of Speech and Hearing Research, 38, 1349–1355.

Limb apraxia, pantomime, and lexical gesture in aphasic speakers: Preliminary findings
Miranda Rose and Jacinta Douglas
La Trobe University, Victoria, Australia

Background: Speech-language pathologists considering the use of gesture as a therapeutic modality for clients with aphasia must first evaluate the integrity of their clients' gesture systems. Questions arise with respect to which behaviours to assess and how to assess them. There has been a long-held belief that tests of limb apraxia and pantomime provide valid information about candidacy for gesture-based interventions, yet the theoretical and empirical basis of this assumption is limited. Further, the relationship between conversational gesture skill and limb apraxia in speakers with co-occurring aphasia has been largely unexplored. It is possible that a client's gesture performance in natural conversation provides more valid information about gesture treatment candidacy than do tests of limb apraxia.
Aims: This study aimed to investigate the relationship between the
presence of limb apraxia and conversational gesture use in speakers
with nonfluent aphasia. Following the assumption that limb praxis
and conversational gesture reflect differing underlying processing, it
was hypothesised that speakers with aphasia and limb apraxia would
produce the full range of conversational gesture types in a
conversational context. Further, it was hypothesised that speakers
with demonstrated pantomime deficits on formal tests of pantomime
would produce pantomimes naturally in conversation. Thus, a
dissociation would be demonstrated between the processing
responsible for gesture production as measured in limb apraxia tests
and that subserving the production of conversational gesture.

Methods & Procedure: Seven participants with nonfluent aphasia and ideomotor and conceptual limb apraxia took part in a semi-structured conversation with the researcher. All arm and hand gestures produced by the participants were counted and rated according to guidelines provided by Herrmann, Reichle, and Lucius-Hoene (1988), and the time they spent in gesture and in spoken expression was compared. Correlations were calculated between limb apraxia scores and proportions of meaning-laden gestures used in conversation.
Outcomes & Results: All seven participants produced a wide range
of gesture types. Participants with limited verbal output produced
large amounts of meaning-laden gesture. Importantly, even
participants with severe limb apraxia produced high proportions of
meaning-laden gestures (codes and pantomimes) in the natural
setting. There were no significant relationships found between scores
on limb apraxia tests and natural gesture use.
Conclusions: Patients with nonfluent aphasia and limb apraxia
may still use meaningful conversational gesture in naturalistic
settings. Tests of limb apraxia may be poor predictors of use of
lexical gesture. Thus, clinicians are advised to sample lexical gesture
use in spontaneous interactions.
Speech-language pathologists often consider the use of gesture as either an alternative to or a facilitator of verbal communication for individuals with aphasia (Christopoulou & Bonvillian, 1985; Helm-Estabrooks, Fitzpatrick, & Baressi, 1982; Rao, 1995; Skelly, Schinsky, Smith, & Fust, 1974; Wertz, LaPointe, & Rosenbek, 1984). Clients may be taught to use pantomime (a sequence of gestures demonstrating objects or actions without the need for speech) or Amer-Ind gestures (Skelly et al., 1974) to communicate thoughts and feelings when verbal communication is not possible. Alternatively, gestures may be paired with verbal targets in order to facilitate verbal production (Hanlon, Brown, & Gerstmann, 1990; Kearns, Simmons, & Sisterhen, 1982; Rose & Douglas, 2001; Rose, Douglas, & Matyas, 2002). Thus, there is a need to determine the integrity of clients' gesture systems when considering the use of gesture in speech-language pathology interventions for aphasia. However, when considering the gesture abilities of a client, the method of assessment and the processes and skills requiring assessment need to be defined.

Address correspondence to: Dr Miranda Rose, School of Human Communication Sciences, La Trobe University, Bundoora, 3086, Victoria, Australia. Email: M.Rose@latrobe.edu.au
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000157
There have been three main approaches to assessing the gesture abilities of speakers with aphasia. The first has focused on assessing pantomime skills through tests of limb apraxia (Duffy & Duffy, 1981; Goodglass & Kaplan, 1963; Wang & Goodglass, 1992). The second is best described as a trial-and-error method in which gesture treatments are trialled and their success monitored (Wertz et al., 1984). The third has measured the use of conversational gesture in natural settings (Behrmann & Penn, 1984; Herrmann et al., 1988; Le May, David, & Thomas, 1988). Confusion persists about the utility of these three approaches and about exactly what information speech-language pathologists require when considering candidacy for gesture-based treatments.
There has been a long-held notion that testing for the presence of limb apraxia in patients with aphasia provides information about treatment candidacy for gesture-based interventions (Helm-Estabrooks et al., 1982), although evidence to support this assumption is limited. Limb apraxia is commonly defined as the inability to perform skilled, purposeful limb movements in the absence of elementary sensorimotor disorders, intellectual deterioration, or comprehension difficulties (Chainay & Humphreys, 2002). It frequently co-occurs with aphasia (De Renzi, Raglioni, Lodesani, & Vecchi, 1983; Goodglass & Kaplan, 1963; Kertesz, Ferro, & Shewan, 1984). Limb apraxia is assessed by asking the individual to produce limb movements to verbal command and/or to imitation. Clients are asked to make socially regulated, intransitive gestures such as saluting, waving goodbye, and making an OK sign. They are also asked to make transitive gestures showing how objects are used, such as demonstrating how to cut bread, either with or without the actual object present. In some apraxia batteries, meaningless movements are also tested to imitation (Kimura & Archibald, 1974). Three types of limb apraxia are currently recognised (Ochipa, Rothi, & Heilman, 1992):
- ideomotor apraxia, a disorder of the temporal, sequential, and spatial organisation of action;
- ideational apraxia, an incapacity to mentally evoke the action associated with a sequence of objects;
- conceptual apraxia, an incapacity to mentally evoke the action associated with a single object.
Several seminal texts and papers have suggested that the presence of limb apraxia in speakers with aphasia prevents them from learning a gesture system for communication and/or from responding to gestural facilitation treatments (Helm-Estabrooks et al., 1982; Rothi & Heilman, 1997). However, the empirical evidence to support this assumption is extremely limited. Helm, Kaplan, and Vercruysse (as cited in Helm-Estabrooks et al., 1982) found that patients with severe aphasia and limb apraxia did not produce verbal labels or representational gestures to picture stimuli. They suggested that limb apraxia may prevent patients from using representational gestures as a natural means of communication. However, natural gesture use was not assessed in the Helm et al. study. Rather, the patients' abilities to gesture to command or to picture stimuli were evaluated, and their natural gesture abilities were inferred from these tasks. Similarly, Borod, Fitzpatrick, Helm-Estabrooks, and Goodglass (1989) found significant correlations between two sets of ratings. The first was the conversational gesture abilities of individuals with aphasia, as rated on their Nonvocal Communication Scale. The second was limb apraxia, as rated on the Boston Apraxia Test (Helm-Estabrooks, 1986, as cited in Borod et al., 1989). Borod et al. concluded that tests of limb apraxia could predict the probability of a patient's competence in nonverbal social interaction. However, the adequacy of the gestural repertoire rated in the Nonvocal Communication Scale is questionable. The scale fails to rate many important natural gestural acts, such as yes/no gestures, pantomimes used to indicate affective states, and the strength of gesture movements indicating emphasis.
Limb praxis may not be the only behaviour requiring assessment when considering gesture-based interventions for aphasia. The integrity of a client's lexical gesture may be a valid predictor of gesture-based treatment candidacy. Lexical gestures are arm and hand gestures that spontaneously accompany verbalisation (Krauss, Chen, & Gottesman, 2000). Early studies used formal tests of pantomime and limb praxis and inferred from them that speakers with aphasia made poor use of lexical gesture (Duffy & Duffy, 1981; Goodglass & Kaplan, 1963; Wang & Goodglass, 1992). Later research that directly examined lexical gesture in naturalistic settings, however, demonstrated that speakers with aphasia had considerable gesture skills (Behrmann & Penn, 1984; Cicone, Wapner, Foldi, Zurif, & Gardner, 1979; Feyereisen, Barter, Goosens, & Clerebaut, 1988; Glosser, Weiner, & Kaplan, 1986; Hadar & Krauss, 1999; Herrmann et al., 1988; Le May et al., 1988). McNeill, Levy, and Pedelty (1990) reported Pedelty's study of four participants with Broca's aphasia and five participants with Wernicke's aphasia. In that study, rates of gesture production were similar to those of normal speakers. Those with Broca's aphasia demonstrated brief, meaningful, interpretable gestures. Those with Wernicke's aphasia predominantly produced fluent, vague, meaningless, and uninterpretable gestures. Similar findings were reported by Behrmann and Penn (1984), Cicone et al. (1979), and Feyereisen (1983), suggesting that the nature of the lexical gesture produced is closely related to aphasia type. However, the impact of limb apraxia on lexical gesture skills in aphasia remains uncertain, and controversy is emerging in the current literature about this relationship.
Several authors have argued for a direct relationship between limb praxis and lexical gesture (Feyereisen & de Lannoy, 1991; Hadar & Krauss, 1999). Glosser, Wiley, and Barnoski (1998) examined the lexical gesture that patients with Alzheimer's disease produced during a 5-minute conversation, together with the degree of limb apraxia in each patient. Compared with normal control participants, patients with Alzheimer's disease produced proportionately more referentially unclear or semantically ambiguous gestures relative to content-bearing gestures. Further, production of ambiguous lexical gesture correlated significantly with imitation and production of meaningful pantomimic movements (conceptual apraxia). It did not correlate with non-meaningful movements (ideomotor apraxia). Thus, Glosser et al. concluded that tests of conceptual limb apraxia are predictive of referential gesture ability in spontaneous production. An underlying semantic/conceptual disorder, such as is presumed in Alzheimer's disease, might be expected to affect both representational gestures and spontaneous gesture. The case may be quite different in nonfluent aphasic populations, where semantic representations are frequently intact.
Lausberg, Davis, and Rothenhausler (2000) presented a clear dissociation between limb apraxia and lexical gesture. They found extensive use of left-handed lexical gesture in a patient with severe left limb apraxia (in movements to command and to imitation) following a complete callosal infarction. Lausberg et al. argued that lexical gesture largely occurs without conscious control, whereas the movements made during limb apraxia testing are made consciously and in an abstract context. The dissociation in Lausberg et al.'s patient highlighted two factors: the different performance demands of unconscious and conscious gesture production, and the impact of context on production.
Recent developments in cognitive neuropsychology and psycholinguistics have led to models of word production, limb praxis, and lexical gesture. These models help to explain the differences in processing that underlie consciously produced tool-related gestures and unconsciously produced lexical gestures. Rothi and Heilman (1997) presented a cognitive neuropsychological model of the relationship between tool knowledge and use on the one hand, and comprehension and production of tool names on the other (see Figure 1). The model builds on the model of word processing described by Patterson and Shewell (1987) by adding a series of action lexicons in which movement representations are stored. It is extremely useful in highlighting the complex processing involved in comprehending and producing tool-related actions and words. It also helps to clarify some of the differences between the three currently recognised types of limb apraxia. For example, conceptual apraxia relates to deficits at the semantic level of processing, while ideomotor apraxia relates to deficits at the action output lexicon.
In the field of psycholinguistics, researchers interested in conversational or lexical gesture have developed models of the relationships between speech production and lexical gesture production (Hadar & Butterworth, 1997; Krauss et al., 2000; see Figure 2). These later models are not restricted to tool names or tool actions. Rather, they attempt to account for the processing associated with a wide range of nouns, verbs, and adjectives, and for the lexical gesture that arises from pre-communication imagistic thought processes. Krauss and Hadar's model postulates three phases of gesture production: spatial/dynamic feature selection, spatial/dynamic feature specification, and motor planning. The model posits direct communication between gesture and speech production, in an attempt to account for gestural facilitation effects. It is unclear how the models of limb praxis and lexical gesture relate, and many questions remain unanswered. Do the movements evaluated in tests of limb apraxia and modelled by Rothi and Heilman share underlying processing components with lexical gesture as modelled in Krauss et al.'s psycholinguistic representation? Does the presence of limb apraxia have any bearing on lexical gesture use?

Figure 1. A model of praxis processing and its relation to semantics, naming, and word and object recognition. From Apraxia: The neuropsychology of action, L. Rothi and K. Heilman (Eds.), (1997), p. 45. Hove, UK: Psychology Press. Copyright by Psychology Press. Reprinted with permission.
PURPOSE OF THE STUDY
This paper reports on a study of the relationship between ideomotor and conceptual limb apraxia and lexical gesture use in speakers with nonfluent aphasia. Following the theoretical models proposed by Krauss et al. (2000) and Rothi and Heilman (1997), it was reasoned that the gesture movements assessed in tests of limb apraxia and those observed in lexical gesture reflect different underlying processing. Therefore, it was hypothesised that in a conversational context, speakers with nonfluent aphasia and concomitant limb apraxia would produce the full range of lexical gesture types. Further, in speakers with nonfluent aphasia, pantomime deficits measured on formal tests of pantomime were not expected to correlate with the amount of pantomime-type lexical gesture produced during spontaneous conversation. Thus, a dissociation would be demonstrated between two kinds of processing. The first is the processing responsible for skilled action associated with tool use (impaired in the clinical construct of limb apraxia) and for formal pantomime production. The second is the processing responsible for producing spontaneous lexical gestures.

Figure 2. Krauss et al.'s (2000) model of cognitive architecture for the speech-gesture production process. From Language and gesture, D. McNeill (Ed.), (2000), p. 261. Cambridge: Cambridge University Press. Copyright by Cambridge University Press. Reprinted with permission.
METHOD
Participants
Seven participants with aphasia were recruited for this study. Each had sustained a single left-sided stroke at least 18 months previously. All seven participants met the following inclusionary and exclusionary criteria:
- English was their first and only language.
- They were right-handed premorbidly (Simplified Hand Preference Score = +1.0; Bryden, 1982).
- There was no history of drug or alcohol abuse.
- Gesture had not been targeted in any previous speech-language pathology intervention.
Aphasia syndrome and severity were assigned on the basis of the language profiles obtained from relevant subtests of either the Boston Diagnostic Aphasia Examination (BDAE) (Goodglass & Kaplan, 1983) or the Western Aphasia Battery Aphasia Quotient (WAB AQ) (Kertesz, 1982). Each participant was required to demonstrate ideomotor and conceptual limb apraxia, as defined by Rothi and Heilman (1997). This was established by impaired performance on the limb and gestured pictures subtests of the Test of Oral and Limb Apraxia (TOLA) (Helm-Estabrooks, 1992). Participant demographic data and results of language assessments are provided in Table 1.
Procedure
All participants were tested on a range of formal measures and conversation tasks by the first-named investigator during a 2-week period. The TOLA (Helm-Estabrooks, 1992) was used to obtain standard scores for both ideomotor limb apraxia and pantomime to picture stimuli. In addition, the abbreviated form of Kimura and Archibald's movement copying test described by Corina, Poizner, Bellugi, Feinberg, Dowde, and O'Grady-Batch (1992) was used as a measure of praxis on non-meaningful stimuli. Impaired performance on the limb and gestured pictures subtests of the TOLA was defined as a total subtest score below the published mean score for a group of normal participants. Participants' errors were recorded and scored in accordance with the TOLA scoring protocol. Errors in timing, spatial orientation, and precision are consistent with a diagnosis of ideomotor apraxia.
Errors on transitive gesture items were analysed for possible underlying conceptual problems, such as performing a sawing motion for a paintbrush. Such content errors are consistent with a diagnosis of conceptual apraxia.

TABLE 1 Participant demographic and linguistic data

WAB AQ = Western Aphasia Battery Aphasia Quotient (Kertesz, 1982); BDAE Severity Rating = Boston Diagnostic Aphasia Examination severity rating, ranging from 0 (no usable speech) to 5 (minimal discernible handicap).


Finally, each participant took part in a 20-minute conversation with the first investigator. To give the seven conversations some structural similarity while keeping them as natural as possible, the investigator used a list of topics to direct the interaction (see Appendix). This produced semi-structured conversations, which were videotaped for later analysis. The investigator aimed to have the participant take as many conversational turns as possible. The investigator spoke only to encourage continuation of the topic, to show interest, or to ask the next question on the list of topics.
Data analysis
Conversation analysis was based on 6 minutes of conversation for each participant, starting after a 3-minute warm-up period. Gesture behaviour was assessed by a system of observations focusing on head, arm, and body movements; isolated movements of the face were not analysed. The entire 6-minute conversations were transcribed in broad phonetic transcription, using the gesture transcription conventions suggested by McNeill (1992). Following definitions provided by Herrmann et al. (1988), each verbalisation and each gesture was recorded, together with the time at which it occurred and its duration. Verbalisations were operationally defined as verbal utterances preceded and followed by pauses of 2 seconds or more. Gestures were operationally defined as any movement or sequence of movements of the head, arms, or body that had a perceptible beginning and end.

Gestures were then rated as one of four types described by Herrmann et al. (1988):
- Speech-focused gestures: communicative actions that subserve spoken language and cannot be interpreted in isolation.
- Descriptive gestures: actions that convey information independently of spoken language and can therefore be interpreted in isolation.
- Codified gestures: actions that are not restricted to the given situation or context, but are generally used in connection with, or as a substitute for, verbal utterances (e.g., nodding as a sign of approval).
- Pantomimes: actions of a complex, usually sequential nature, which substitute for a verbal utterance.

Spearman rank-order correlation coefficients (Spearman, 1904, as cited in Pett, 1997) were calculated between four measures: pantomime scores from the TOLA, limb apraxia scores from the TOLA, scores on the non-meaningful movements test, and the proportion of codified and pantomime gestures produced in conversation.
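The operational definition of a verbalisation amounts to a simple segmentation rule over time-stamped speech. The Python sketch below illustrates that rule under stated assumptions. It assumes that word onsets and offsets (in seconds) have already been time-stamped from the video. Words separated by a silence shorter than 2 seconds are merged into one verbalisation, and the durations are summed in the way Table 3 reports total verbal time. The function names and the timings are hypothetical, and the sketch does not reproduce the authors' coding procedure.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds

def segment_verbalisations(words: List[Interval], pause: float = 2.0) -> List[Interval]:
    """Hypothetical helper: group time-stamped words into verbalisations.

    A new verbalisation starts whenever the silence since the end of the
    previous verbalisation is at least `pause` seconds.
    """
    segments: List[List[float]] = []
    for onset, offset in sorted(words):
        if segments and onset - segments[-1][1] < pause:
            # Gap shorter than the pause criterion: extend the current verbalisation.
            segments[-1][1] = max(segments[-1][1], offset)
        else:
            # First word, or a pause of `pause` seconds or more: start a new one.
            segments.append([onset, offset])
    return [(start, end) for start, end in segments]

def total_duration(intervals: List[Interval]) -> float:
    """Hypothetical helper: sum the durations of non-overlapping intervals (seconds)."""
    return sum(end - start for start, end in intervals)

# Hypothetical word timings from a short stretch of conversation.
words = [(0.0, 0.4), (0.6, 1.1), (3.5, 4.0), (4.2, 5.0)]
verbalisations = segment_verbalisations(words)
print(verbalisations)                  # [(0.0, 1.1), (3.5, 5.0)]
print(total_duration(verbalisations))  # 2.6
```

The threshold is a parameter, so other pause criteria could be tried. A gap of exactly 2 seconds starts a new verbalisation, which matches "pauses of 2 seconds or more".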

RESULTS
Two qualified speech-language pathologists acted as independent raters. After a 1-hour training session provided by the first investigator, the two raters classified 50% of each participant's gestures using Herrmann et al.'s four categories. Point-to-point percentage agreement was 90%. Where discrepancies emerged, the raters re-coded the item in contention by consensus discussion. The first investigator rated the remaining 50% of the gestures. To calculate intra-rater agreement, the first investigator rated all the gestures produced by participant JS on two separate occasions. The point-to-point percentage intra-rater agreement was 86%. Where discrepancies in rating occurred, the first investigator and one of the independent raters re-coded the items in contention by consensus discussion.
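Point-to-point agreement is the percentage of individual gestures to which both codings assigned the same category. The Python sketch below assumes that each coding is stored as a list of category labels in the same gesture order. The function names and the five example codings are hypothetical and are not data from this study. The second helper lists the items that would go forward to consensus discussion.

```python
from typing import List, Sequence

# The four gesture categories of Herrmann et al. (1988), as used in this study.
CATEGORIES = {"speech-focused", "descriptive", "codified", "pantomime"}

def point_to_point_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Hypothetical helper: percentage of gestures given the same category by both codings."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both codings must cover the same, non-empty set of gestures")
    unknown = (set(rater_a) | set(rater_b)) - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)

def items_for_consensus(rater_a: Sequence[str], rater_b: Sequence[str]) -> List[int]:
    """Hypothetical helper: indices of gestures the codings disagree on."""
    return [i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]

# Hypothetical codings of five gestures (not data from this study).
rater_a = ["codified", "pantomime", "descriptive", "speech-focused", "codified"]
rater_b = ["codified", "pantomime", "codified", "speech-focused", "codified"]
print(point_to_point_agreement(rater_a, rater_b))  # 80.0
print(items_for_consensus(rater_a, rater_b))       # [2]
```

Simple percentage agreement does not correct for agreement expected by chance. A chance-corrected index such as Cohen's kappa is a common alternative, but it is not the statistic reported here.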
The results of testing for each participant are presented in Table 2. All seven participants demonstrated ideomotor limb apraxia, ranging from severe for GC to mild for SA. Pantomime deficits (and conceptual apraxia), as measured on a formal test of pantomime (the TOLA "gestured pictures" subtest), were also present in all seven participants. Two participants, SA and BO, performed within normal limits on imitation of non-meaningful movements, while the remaining five participants had impaired performance.
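The note to Table 2 gives the cut-off used on the non-meaningful movements test: a score below 19.24 out of 24 is impaired. The Python sketch below uses only those published values. It applies the cut-off to the Kimura scores reported in Table 2 and recovers the split between SA and BO and the other five participants. The helper name is hypothetical.

```python
# Cut-off and maximum score from the note to Table 2 (Kimura movement copying test).
KIMURA_MAX = 24
KIMURA_CUTOFF = 19.24  # scores below this value count as impaired

# Movement copying scores as reported in Table 2.
kimura_scores = {"SA": 22, "BO": 22, "KC": 15, "WS": 16, "GC": 12, "RG": 11, "JS": 14}

def is_impaired(score: float, cutoff: float = KIMURA_CUTOFF) -> bool:
    """Hypothetical helper: True if a score falls below the impairment cut-off."""
    if not 0 <= score <= KIMURA_MAX:
        raise ValueError(f"score must lie between 0 and {KIMURA_MAX}")
    return score < cutoff

impaired = [p for p, s in kimura_scores.items() if is_impaired(s)]
within_limits = [p for p, s in kimura_scores.items() if not is_impaired(s)]
print(within_limits)  # ['SA', 'BO']
print(impaired)       # ['KC', 'WS', 'GC', 'RG', 'JS']
```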
Table 3 shows the absolute duration of all verbal and gesture elements used by each participant. Four participants (SA, GC, RG, JS) spent more time in gestural than in verbal expression, reflecting the severity of their verbal expression deficits. Table 4 gives the percentage and actual number of each gesture type used by each participant. The types of gesture used varied considerably across participants. Of particular note was the high mean percentage use of descriptive, codified, and pantomimic gestures, which are known to carry a high meaning load (M = 73.4%, range 46–97%). By comparison, McNeill (1992) reported that normal speakers in conversational settings produced a lower percentage of meaning-laden gestures (iconics, M = 56%), with the remaining 44% being non-meaning-laden gestures (beats, metaphorics, deictics). Participants GC and JS showed similar distributions of gesture type, with the greatest use of pantomime and codified gestures. Similarly, 42.4% of SA's gestures were codified or pantomime, again reflecting severe verbal output deficits and attempts to enhance meaningful output through the gestural modality.
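The percentages in Table 4 are each gesture type's share of the total number of gestures a participant produced. The comparison in Table 3 contrasts each participant's gesture time with their verbal time. The Python sketch below reproduces both from the published figures, using SA's counts from Table 4 and all the durations from Table 3. The helper names are hypothetical, and the sketch is not presented as the authors' analysis script.

```python
from typing import Dict, Iterable

def percent_distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical helper: convert gesture counts per category into percentages (1 d.p.)."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

def percent_of(counts: Dict[str, int], categories: Iterable[str]) -> float:
    """Hypothetical helper: percentage of all gestures falling in the given categories."""
    total = sum(counts.values())
    return round(100 * sum(counts[c] for c in categories) / total, 1)

# SA's gesture counts as reported in Table 4.
sa_counts = {"speech-focused": 9, "descriptive": 10, "codified": 10, "pantomime": 4}
print(percent_distribution(sa_counts))
# {'speech-focused': 27.3, 'descriptive': 30.3, 'codified': 30.3, 'pantomime': 12.1}
print(percent_of(sa_counts, ["codified", "pantomime"]))  # 42.4

# (verbal, gesture) durations in seconds, as reported in Table 3.
durations = {"SA": (72, 77), "BO": (197, 88), "KC": (288, 66), "WS": (256, 138),
             "GC": (0, 89), "RG": (37, 78), "JS": (28, 79)}
gesture_dominant = [p for p, (verbal, gesture) in durations.items() if gesture > verbal]
print(gesture_dominant)  # ['SA', 'GC', 'RG', 'JS']
```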
Spearman rank-order correlation coefficients were computed to examine the
relationships between meaning-laden lexical gesture (codes and pantomimes)
and scores on the tests of ideomotor and conceptual limb apraxia (Table 5). No
significant relationships were found.
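Spearman's rank-order coefficient is the Pearson correlation computed on ranks. Tied scores, such as two participants sharing a percentile rank, are each given the average of the rank positions they occupy. The Python sketch below follows that standard definition. The names and the toy data are hypothetical, and the sketch computes the coefficient only, not its significance test.

```python
from statistics import mean
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    if spread == 0:
        raise ValueError("rho is undefined when all values in a sample are tied")
    return cov / spread

# Hypothetical toy scores for five speakers (not data from this study).
apraxia_percentile = [10, 30, 30, 60, 90]
meaningful_gesture_pct = [80, 50, 60, 40, 20]
print(average_ranks(apraxia_percentile))                                # [1.0, 2.5, 2.5, 4.0, 5.0]
print(round(spearman_rho(apraxia_percentile, meaningful_gesture_pct), 3))  # -0.975
```

With samples as small as seven participants, judging significance requires a critical-value table or a permutation test, which this sketch leaves out.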
TABLE 2 Test results on standard measures

                 SA    BO    KC    WS    GC    RG    JS
Limb Apraxia     84    75    37    63    16    25    63
Pantomime        50    84    63    63    37    50    75
Kimura Test      22    22    15*   16*   12*   11*   14*

Limb Apraxia and Pantomime: from the Test of Oral and Limb Apraxia (Helm-Estabrooks, 1992), expressed as percentile ranks relative to the performance of participants with brain damage. Kimura Test: Movement Copying Test (Kimura & Archibald, 1982, as cited in Corina et al., 1992), scored out of 24 points (* < 19.24 = impaired performance).
TABLE 3 Total duration of verbalisation and gesture production during 6-minute conversation sample (seconds)

                   SA    BO    KC    WS    GC    RG    JS
Verbal elements    72   197   288   256     0    37    28
Gesture elements   77    88    66   138    89    78    79

TABLE 4 Percent distribution of the types of gesture behaviour for each participant

      Speech-focused   Descriptive   Codified    Pantomime
      movements        gesture       gesture
SA    27.3 (9)         30.3 (10)     30.3 (10)   12.1 (4)
BO    46.4 (13)        25 (7)        18 (5)      10.6 (3)
KC    54 (19)          23 (8)        9 (3)       14 (5)
WS    32.5 (13)        35 (14)       15 (6)      17.5 (7)
GC    3 (1)            15 (5)        12 (4)      70 (24)
RG    34 (12)          37 (13)       26 (9)      3 (1)
JS    14 (5)           25 (9)        19 (7)      42 (15)

Figures in brackets refer to actual number of gestures produced in the sample.

TABLE 5 Spearman rank-order correlation coefficients between proportions of lexical gestures, scores on Test of Oral and Limb Apraxia subtests, and non-meaningful movements

                                      TOLA Limb Apraxia   TOLA Pantomime   Non-meaningful
                                      Subtest             Subtest          movements
% Codified gestures and pantomimes    .00                 .072             .291

DISCUSSION
As hypothesised, despite significant ideomotor and conceptual limb apraxia, all seven participants were able to supplement their verbal communication with lexical gesture or to replace verbalisation with gesture entirely. Most noticeable was GC, who had the most severe limb apraxia and yet used the greatest number of meaningful gestures per minute of communication time. Further, pantomime deficits detected on a formal test of pantomime ability did not prevent the use of codified gesture and pantomime in conversation. SA and GC scored in the significantly impaired range on the pantomime section of the TOLA, yet both clearly produced considerable pantomime in conversation. GC scored very poorly on the TOLA pantomime test but produced a total of 24 pantomimes during a 6-minute conversation sample. As a group, the participants' scores on limb apraxia assessments did not correlate significantly with their use of codes and pantomimes in conversation. With respect to aphasia severity, the data from these seven participants suggest that even patients with severe global aphasia can use meaningful gestures (codes and pantomimes) in a natural communication setting. We argue that the models of lexical gesture (Krauss et al., 2000) and limb praxis (Rothi & Heilman, 1997) portray different processing and behaviours.
These seven participants scored poorly on limb praxis and formal pantomime tests yet produced meaning-laden gestures (codes and pantomimes) in conversation. One source of this discrepancy may be the differing processing demands of the tasks. In commonly used limb apraxia protocols, the processing involved in creating a gestural response is both highly conscious and removed from the usual context. Consider a gesture made to a verbal command such as "show me how to use a saw". One must comprehend the auditory label "saw" and then consciously work out how to represent the features and movements of a saw in spatial and dynamic terms that the examiner can understand. Similarly, when gesturing to a pictorial stimulus, one has to identify the salient features of the object that can be represented in movement, recall the movement patterns usually associated with the object, and then demonstrate these in a way that the examiner can interpret. Such processing involves a good deal of cognitive abstraction. In contrast, when a codified gesture or pantomime is generated spontaneously during conversation, the processing is largely unconscious and embedded in context. It is hypothesised to arise from the imagistic components of thought generation during pre-communication stages (Krauss et al., 2000; see Figure 2). The differences in the processing underlying these two types of gesture output may be sufficient to explain why the seven participants with aphasia in this study performed differently in test-related and naturalistic gesture.
The notion that context may be an important factor in determining gesture performance is supported by a recent study by Neiman, Duffy, Belanger, and Coelho (2000). Contrary to commonly held views, Neiman et al. found that patients with limb apraxia performed more poorly on test items using single objects than on items using multiple related objects. They suggested that their patients' performances were enhanced on the multiple-object items because the greater associations among related objects provided additional contextual cues.
Some patients with aphasia score poorly on limb apraxia and formal pantomime tests despite good naturalistic gesture production. We speculate that this is explained by the lack of context and the demand for abstract thought in those test protocols. We also speculate that difficulties with abstraction may explain why some speakers with aphasia do not naturally compensate for their lack of verbal output with gesture, even though they appear able to produce meaningful gestures (Coelho, 1991). Feyereisen and de Lannoy (1991) set out the steps a speaker must take to use the gestural modality to compensate for word-finding difficulties. First, the speaker must abandon word-searching behaviour and decide to represent their thoughts with gesture. Second, the speaker must consider the concept in terms of its distinctive visual and/or spatial features so that appropriate and meaningful movements can be chosen, which may involve considerable innovation. Finally, the relevant motor programs must be chosen and enacted. Feyereisen and de Lannoy therefore argued that gestural performance in this regard relies as much on strategic choice and creativity as on gestural capacity.
One may then ask how relevant a diagnosis of limb apraxia is for a speech-language pathologist considering gesture-based intervention for a client with aphasia. It has been argued that gesture-teaching interventions for aphasia, such as those proposed by Rao (2001) and Wertz et al. (1984), require patients to adopt a meta-analytic approach and to learn to represent a picture or object with an action. On this argument, such activity is exactly what a test of limb apraxia requires, so a client's performance on limb apraxia testing should reflect how the client will perform when learning to use gestures to represent thoughts. However, a client who uses pantomime and codes naturally in conversation may have these extended and shaped in context through modelling and encouragement, even if formal testing shows limb apraxia. Thus, while the capacity for abstraction may be impaired, the ability to follow models in context may be preserved and harnessed in treatment protocols. The data from the seven aphasic speakers reported here suggest that two forms of assessment alone will be insufficient. These are formal tests of pantomime and limb apraxia, and trials in which a client attempts to represent an object or picture with a gesture out of context. Assessment should at least include conversational sampling. This sampling should examine the client's gestural repertoire in naturalistic settings and the client's response to gestural modelling and encouragement during conversation.
The challenge for speech-language pathologists working in the gestural
modality with patients with aphasia and co-occurring limb apraxia is how to
extend the gestural repertoire of the client without creating learning
environments that are artificial, abstract, and devoid of natural context. To our
knowledge, such procedures have received little discussion in the
speech-language pathology intervention literature to date.
REFERENCES
Behrmann, M., & Penn, C. (1984). Non-verbal communication of aphasic patients. British
Journal of Disorders of Communication, 19, 155–168.
Borod, J., Fitzpatrick, P., Helm-Estabrooks, N., & Goodglass, H. (1989). The relationship
between limb apraxia and the spontaneous use of communicative gesture in aphasia.
Brain and Cognition, 10, 121–131.
Bryden, M. (1982). Handedness and its relation to cerebral function. In Laterality:
Functional asymmetry in the intact brain (pp. 157–179). New York: Academic
Press.
Chainay, H., & Humphreys, G. (2002). Neuropsychological evidence for a convergent
route model for action. Cognitive Neuropsychology, 19(1), 67–93.
Christopoulou, C., & Bonvillian, J. (1985). Sign language, pantomime, and gestural
processing in aphasic persons: A review. Journal of Communication Disorders, 18,
1–20.
Cicone, M., Wapner, W., Foldi, N., Zurif, E., & Gardner, H. (1979). The relation between
gesture and language in aphasic communication. Brain and Language, 8, 324–349.
Coelho, C. (1991). Manual sign acquisition and use in two aphasic subjects. In T. Prescott
(Ed.), Clinical Aphasiology, 19. Austin, TX: Pro-Ed.
Corina, D., Poizner, H., Bellugi, U., Feinberg, T., Dowde, D., & O'Grady-Batch, L.
(1992). Dissociation between linguistic and nonlinguistic gestural systems: A case for
compositionality. Brain and Language, 43, 414–447.
DeRenzi, E., Faglioni, P., Lodesani, M., & Vecchi, A. (1983). Performance of left
brain-damaged patients on imitation of single movements and motor sequences: Frontal
and parietal-injured patients compared. Cortex, 19, 333–343.
Duffy, R., & Duffy, J. (1981). Three studies of deficits in pantomimic expression and
pantomimic recognition in aphasia. Journal of Speech and Hearing Research, 46,
70–84.
Feyereisen, P. (1983). Manual activity during speaking in aphasic subjects. International
Journal of Psychology, 18, 545–556.
Feyereisen, P., Barter, D., Goosens, M., & Clerebaut, N. (1988). Gestures and speech in
referential communication by aphasic subjects: Channel use and efficiency.
Aphasiology, 2, 21–32.
Feyereisen, P., & de Lannoy, J. (1991). Gestures and speech: Psychological
investigations. New York: Cambridge University Press.
Glosser, G., Weiner, M., & Kaplan, E. (1986). Communicative gestures in aphasia. Brain
and Language, 27, 345–359.
Glosser, G., Wiley, M., & Barnoski, E. (1998). Gestural communication in Alzheimer's
disease. Journal of Clinical and Experimental Neuropsychology, 20(1), 1–13.
Goodglass, H., & Kaplan, E. (1963). Disturbance of gesture and pantomime in aphasia.
Brain, 86, 703–720.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders.
Philadelphia: Lea & Febiger.
Hadar, U., & Butterworth, B. (1997). Iconic gestures, imagery, and word retrieval in
speech. Semiotica, 115, 147–172.
Hadar, U., & Krauss, R. (1999). Iconic gestures: The grammatical categories of lexical
affiliates. Journal of Neurolinguistics, 12(1), 1–12.
Hanlon, R., Brown, J., & Gerstmann, L. (1990). Enhancement of naming in nonfluent
aphasia through gesture. Brain and Language, 38, 298–314.
Helm-Estabrooks, N. (1992). Test of oral and limb apraxia. Chicago: The Riverside
Publishing Company.
Helm-Estabrooks, N., Fitzpatrick, P., & Barresi, B. (1982). Visual action therapy for
global aphasia. Journal of Speech and Hearing Disorders, 47, 385–389.
Herrmann, R., Reichle, T., & Lucius-Hoene, G. (1988). Nonverbal communication as a
compensation strategy for severely nonfluent aphasics? A quantitative approach.
Brain and Language, 33, 41–54.
Kearns, K., Simmons, N., & Sisterhen, C. (1982). Gestural sign (Amer-Ind) as a
facilitator of verbalisation in patients with aphasia. In R. Brookshire (Ed.), Clinical
aphasiology (pp. 183–191). Minneapolis: BRK Publishers.
Kertesz, A. (1982). Western aphasia battery. New York: Grune & Stratton.
Kertesz, A., Ferro, J., & Shewan, C. (1984). Apraxia and aphasia: The
functional-anatomical basis for their dissociation. Neurology, 34, 40–47.
Kimura, D., & Archibald, Y. (1974). Motor functions of the left hemisphere. Brain, 97,
337–350.
Krauss, R., Chen, Y., & Gottesman, R. (2000). Lexical gestures and lexical access: A
process model. In D. McNeill (Ed.), Language and gesture (pp. 261–283). Cambridge:
Cambridge University Press.
Lausberg, H., Davis, M., & Rothenhausler, A. (2000). Hemispheric specialization in
spontaneous gesticulation in a patient with callosal disconnection.
Neuropsychologia, 38, 1654–1663.
Le May, A., David, R., & Thomas, A. (1988). The use of spontaneous gesture by aphasic
patients. Aphasiology, 2, 137–145.
McNeill, D. (1992). Hand and mind: What gesture reveals about thought. Chicago:
University of Chicago Press.

McNeill, D., Levy, E., & Pedelty, L. (1990). Speech and gesture. In G. Hammond (Ed.),
Cerebral control of speech and limb movements (pp. 203–256). North Holland:
Elsevier.
Neiman, M., Duffy, R., Belanger, S., & Coelho, C. (2000). The assessment of limb
apraxia: Relationship between performances on single- and multiple-object tasks by
left hemisphere damaged aphasic subjects. Neuropsychological Rehabilitation, 10(4),
429–448.
Ochipa, C., Rothi, L., & Heilman, K. (1992). Conceptual apraxia in Alzheimer's disease.
Brain, 115, 1061–1071.
Patterson, K., & Shewell, C. (1987). Speak and spell: Dissociations and word-class
effects. In M. Coltheart, R. Job, & G. Sartori (Eds.), The cognitive neuropsychology of
language. Hove, UK: Lawrence Erlbaum Associates Ltd.
Pett, M. (1997). Nonparametric statistics for health care research. California: Sage
Publications.
Rao, P. (1995). Drawing and gesture as communication options in a person with severe
aphasia. Topics in Stroke Rehabilitation, 2(1), 49–56.
Rao, P. (2001). Use of Amer-Ind code by persons with aphasia. In R. Chapey (Ed.),
Language intervention strategies in aphasia and related neurogenic communication
disorders (4th Edn., pp. 688–702). Maryland: Lippincott Williams & Wilkins.
Rose, M., & Douglas, J. (2001). The differential facilitatory effects of gesture and
visualisation processes on object naming in aphasia. Aphasiology, 15(10/11),
977–990.
Rose, M., Douglas, J., & Matyas, T. (2002). The comparative effectiveness of gesture and
verbal treatments for a specific phonologic naming impairment. Aphasiology,
16(10/11), 1001–1030.
Rothi, L., & Heilman, K. (Eds.). (1997). Apraxia: The neuropsychology of action. Hove,
UK: Psychology Press.
Skelly, M. L., Schinsky, L., Smith, R., & Fust, R. (1974). American Indian sign
(Amer-Ind) as a facilitator of verbalisation for the oral verbal apraxic. Journal of Speech
and Hearing Disorders, 34, 445–455.
Wang, L., & Goodglass, H. (1992). Pantomime, praxis, and aphasia. Brain and Language,
42, 402–418.
Wertz, T., LaPointe, L., & Rosenbek, J. (1984). Apraxia of speech in adults: The disorder
and its management. Orlando: Grune & Stratton Inc.

APPENDIX
List of conversation questions
What have you been doing today?
Can you tell me about your stroke?
What sort of work have you done in your life?
Can you tell me about your family?

Teaching self-cues: A treatment approach for verbal naming
Gayle DeDe, Diane Parris, and Gloria Waters
Boston University, USA

Background: Very few treatment studies have examined the effects of
training individuals with anomia to self-generate phonological cues.
There is evidence that treatments using written language can improve
phonological access for some patients. Such approaches are most
effective when the patients are taught strategies to facilitate oral
reading of targets.
Aim: The goal of the present study was to evaluate the effects of a
naming treatment designed to teach a chronic nonfluent aphasic to
generate self-cues based on partial access to the written form of
words and tactile (placement) cues.
Methods: Therapy focused on naming items using a modified
cueing hierarchy that incorporated written naming and tactile cues.
An AB design was used to examine treatment effects in an
individual with aphasia and apraxia of speech.
Outcomes and Results: Verbal naming improved for target
compared to control items. Generalisation was observed to verbal
and written naming on standardised measures but not to novel
stimuli beginning with the target and control phonemes. Testing 6 weeks
post-treatment revealed limited loss of treatment gains.
Conclusions: The results provide qualified support for the
treatment programme.
Treatment studies of verbal naming have focused on increasing the strength of
and access to lexical and phonological representations of words. Semantic
treatments emphasise development of lexical representations through
categorisation or other tasks designed to facilitate learning of fine-grained
semantic details. The rationale is that treatment of specific lexical items
increases the specification of the network associated with that item, which
improves access to both the trained items and related lexical items (Drew &
Thompson, 1999; Hillis, 1989). Treatment studies that target the semantic system
have demonstrated generalisation to untreated stimuli and maintenance of
treatment effects following termination of treatment (e.g., Drew & Thompson,
1999; see Nickels & Best, 1996, for a review of naming studies using both
semantic and phonological cueing treatments).
Phonological treatments tend to take the form of repetition and phonemic cues
such as provision of the first sound or syllable of target items. In their review,
Nickels and Best (1996) concluded that phonological treatments facilitate
naming but that the treatment gains are often short-lasting and do not generalise.
Other work has demonstrated that phonological treatments can result in
persistent gains in naming performance (e.g., Davis & Pring, 1991; Miceli,
Amitrano, Capasso, & Caramazza, 1996; Wambaugh, Linebaugh, Doyle,
Martinez, Kalinyak-Fliszar, & Spencer, 2001) and in generalisation to
semantically and phonologically related items (Raymer, Thompson, Jacobs, &
Le Grand, 1993). It has also been suggested that phonological treatments
requiring active participation by the patient, for example by providing a choice
of phonemic cues, may result in longer-lasting effects than phonological
treatments in which patients are simply provided with phonemic information
about targets (Best, Hickin, Howard, & Osborne, 2000). This paper describes a
phonological treatment approach for verbal naming in which a patient was
trained to independently generate phonological cues using written naming and
tactile cues.
Written language has previously been used as a method of indirectly tapping
the phonological output system (Hillis, 1989; Nickels, 1992; Yampolsky &
Waters, 2002). Hillis (1989) compared the effectiveness of a cueing hierarchy
based on written naming for two patients whose underlying impairments were
primarily in the semantic or the phonological systems. The patients, who were
initially trained using the same cueing hierarchy, showed different patterns of
generalisation from written to verbal naming. The patient with primarily
semantic impairments showed generalisation from written to verbal naming
following the treatment that focused on written naming alone. The patient with a
primarily phonological impairment did not show generalisation from written to
verbal naming, but required additional treatment using a cueing hierarchy that
targeted the oral modality before showing improvement in verbal naming.
Although the cueing hierarchy designed to target verbal naming included a level
in which the patient was encouraged to read what she had written, Hillis (1989)
did not incorporate any strategies to facilitate the patient's ability to orally read
the items. According to Bruce and Howard (1988), failure to self-cue, despite a
subjective tip-of-the-tongue state, may be attributed to a patient's lack of at least
one of the abilities considered necessary for self-cueing: knowing and sounding
the first letter of a word, and using that information to generate a phonemic cue.
This finding suggests that written naming may show limited generalisation to
verbal naming if the correspondences between sounds and letters are not
explicitly trained.

Address correspondence to: Gayle DeDe, Boston University, Sargent College of
Health and Rehabilitation Sciences, Department of Communication Disorders,
635 Commonwealth Avenue, Boston, MA 02215, USA. Email: gdede@bu.edu
We thank LN, whose cooperation made this project possible. We also thank Julie
Wambaugh and three anonymous reviewers who made helpful suggestions on a
previous version of this manuscript.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html
DOI: 10.1080/02687030344000094
Nickels (1992) found that explicitly teaching grapheme-phoneme
correspondences to a patient with intact written naming and impaired verbal
naming generalised to oral reading of words and verbal naming. She argued that
improved naming was observed because the patient was able to independently
generate phonemic cues by recalling the first grapheme of a word. This minimal
phonological information was sufficient to facilitate verbal naming. This
approach holds promise for patients who have a relatively intact semantic system
and a marked phonological deficit, because it may provide a method of
compensating for the phonological deficit. However, this treatment relies on the
patient's ability to perform written confrontation naming tasks in order to apply
the grapheme–phoneme correspondences.
Yampolsky and Waters (2002) taught grapheme–phoneme correspondences to
a patient with the goal of improving oral reading. Following treatment, there was
evidence of improved access to lexical phonology. Verbal confrontation naming
did not significantly improve, but the patient was able to provide more partial
phonological information about items, such as word length and first sounds, than
she could prior to treatment. Her performance on picture homophone judgement
tasks, which assess access to lexical phonology, also improved. Unlike
Nickels' (1992) patient, Yampolsky and Waters' patient performed poorly on
written confrontation naming tasks prior to therapy. This implies that she was
not learning to visualise and orally read graphemic representations that were
accessible prior to therapy. Instead, relearning the grapheme–phoneme
correspondences generalised to improved access to lexical phonology.
These findings suggest that a therapy that incorporates reading could be
effective for patients whose naming impairments primarily reflect a deficit in the
phonological output lexicon, if the patient is provided with strategies to orally
read words. Patients with intact written naming may benefit most from such
treatments (Nickels, 1992), but this might not be a necessary condition
(Yampolsky & Waters, 2002). The patient in the current study, LN, did not
present with intact written naming, but did show evidence of relatively stronger
access to partial lexical information in the written than in the verbal modality.
This provided the motivation for using written language as a means of targeting
verbal language production.
However, LN also presented with apraxia of speech, which we postulated would
have a negative impact on his ability to use grapheme–phoneme
correspondences effectively. In an effort to provide him with nonverbal cues that he could
use to tap into phonology, we decided to incorporate tactile cues into our
treatment. This method was chosen because tactile cues had been effective in
previous treatment sessions with LN. The tactile cueing system, which will be
described in greater detail below, was a modified version of a method described
by Bashir, Grahamjones, and Bostwick (1984) and associates specific hand shapes
and positions with specific phonemes.
Another important factor in the effectiveness of naming treatments is the
frequency of treatment sessions. Hillis (1998) reported that intensity of treatment
was more important than the therapeutic approach in determining outcomes of
therapy. Yampolsky and Waters (2002) incorporated a family member as part of
the therapeutic team by training the patient's mother to conduct daily therapy
sessions. This approach is novel in that it increases the frequency of therapy in a
realistic manner. It is commonly accepted that more therapy is better, but this is
seldom possible in today's healthcare climate. Hence, intensive home practice
may be an important part of any therapy programme. In the present study, we
created a video version of the treatment programme to be used for home
practice.
In sum, the goal of the present study was to train a patient with aphasia and
apraxia of speech to cue himself to verbally name items by using a combination
of written naming and tactile cues. We hypothesised that this treatment approach
would result in generalisation to words beginning with targeted phonemes
because the patient would be able to use the tactile cues independently.
Following Best and colleagues (2000), we predicted that treatment effects would
be maintained 6 weeks following termination of treatment because the cueing
hierarchy, though primarily phonological, required a high level of participation
from the patient.
METHOD
Participant
LN was a 49-year-old man who suffered a stroke in the territory of the left
middle cerebral artery with subsequent haemorrhage into the left basal ganglia
approximately 4 years prior to the inception of the present study. He lived at
home with his wife and three children. Premorbidly, LN had no history of
learning disability or language impairment. He was a college graduate who was a
vice-president at a large restaurant chain.
At the time of study, LN completed extensive speech-language testing. Both
informal measures and selected subtests from the Psycholinguistic Assessment of
Language (PAL; Caplan, 1992) were completed. Performance on PAL subtests
was compared to normative data for age-matched non-brain-damaged controls.
Performance was considered to be impaired if LN's score fell more than two
standard deviations below the mean. The major features of LN's performance on
the PAL are summarised below and in Table 1.
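The impairment criterion above amounts to a simple z-score rule. The sketch below is illustrative only: the function name `is_impaired` is hypothetical, and the control mean and standard deviation in the usage example are invented, because the PAL normative values are not reported here.

```python
def is_impaired(score: float, control_mean: float, control_sd: float,
                criterion_sd: float = 2.0) -> bool:
    """Return True when `score` lies more than `criterion_sd` standard
    deviations below the control mean (the criterion described in the text)."""
    if control_sd <= 0:
        raise ValueError("control_sd must be positive")
    z = (score - control_mean) / control_sd
    # "More than two standard deviations below" is a strict inequality,
    # so a score exactly 2 SD below the mean is not flagged as impaired.
    return z < -criterion_sd


# Hypothetical norms (not taken from the PAL): mean 19.5/20, SD 0.8.
print(is_impaired(16, 19.5, 0.8))  # True (z is about -4.4)
print(is_impaired(19, 19.5, 0.8))  # False (z is about -0.6)
```

Using a strict inequality mirrors the wording "more than two standard deviations below the mean"; a clinic applying an "at or below" rule would change `<` to `<=`.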
LN completed portions of the Boston Diagnostic Aphasia Examination (BDAE;
Goodglass & Kaplan, 1983) 1 to 1.5 years prior to inception of the study. Given
the chronic nature of his impairment, this testing was not repeated at the time of
the study. The results of the BDAE are presented in Table 2. Note that LN's
performance on the BDAE was stronger than on the PAL, particularly on the
written confrontation naming subtests. This likely reflects the inclusion of a
wider range of stimuli and the manipulation of factors such as spelling
regularity, length, and word frequency on the PAL.
LN presents with moderate–severe nonfluent aphasia that affects both the
verbal and written modalities, as well as oral, verbal, and limb apraxias.¹ He is
highly motivated to speak and resistant to using alternative modes of
communication such as augmentative devices.
Oral Language Comprehension
LN's auditory comprehension is functional for one-to-one conversation. Lexical
semantic access is generally within functional limits, but comprehension of
abstract and derived words is mildly impaired. Sentence comprehension is within
functional limits for syntactically simple or lexically constrained sentences but
breaks down for sentences of increased length and syntactic complexity.
Oral language and speech production
LN's verbal language production is characterised by severe anomia. Word length
and frequency did not have an impact on his performance. He was unable to
produce the first sound of words, but could identify the first letter of the word
from a field of three with 72% accuracy. He showed limited knowledge of the
underlying phonological representation of words. He was unable to access and
retrieve phonological representations, as evidenced by his performance on the
PAL Picture Homophone Matching subtest. Repetition of words and nonwords
was a relative strength. LN's speech and verbal language production are
characterised by articulatory groping, inconsistent phonemic substitution and
prolongation errors, and insertion of schwa during production of words.

Written language comprehension
Reading comprehension was assessed with subtests of the PAL. Testing was
completed 35 months prior to beginning this study as part of LN's therapy
programme. Given the chronic nature of LN's impairment, and the lack of
therapy addressing this modality in the interim, these data were considered to
reflect his level of performance at the time of study.
LN's ability to discriminate words and nonwords was mildly impaired. He
was moderately impaired in accessing the lexical-semantic system via the written
modality, especially for morphologically complex items.
TABLE 1 Performance on language pretesting: PAL subtests

Language processing component | PAL subtest | Number (%, A′) correct

Auditory comprehension
1. Auditory lexical access | PAL Lexical Decision | 19/20 (95%, A′ = .98)
2. Auditory lexical access (affixed words) | PAL Lexical Decision–Derived Words | 39/48 (81%, A′ = .88)*
3. Lexical semantic access | PAL Word–Picture Matching | 30/32 (94%)
4. Lexical semantic access | PAL Probe Verification: a. Structural; b. Functional | a. 8/9 (88%)*; b. 7/9 (78%)*
5. Lexical semantic access (abstract words) | PAL Relatedness Judgement: Abstract Words | 16/20 (80%)*
6. Lexical semantic access (affixed words) | PAL Relatedness Judgement: Affixed Words | 15/20 (75%)*
7. Sentence-level processing | PAL Sentence Comprehension | 27/40 (68%)*
   | a. Syntactic Processing | 18/20 (90%)
   | b. Lexico-inferential Processing | 9/20 (45%)

Verbal language production
1. Access to lexical phonological representations | PAL Picture–Homophone Matching | 19/32 (59%)*
2. Production of lexical phonological representations | PAL Repetition: Words | 16/20 (80%)*
3. Production of nonlexical phonological representations | PAL Repetition: Nonwords | 16/20 (80%)*
4. Access to and production of lexical phonological representations | PAL Picture Naming | 5/32 (16%)*

Written language comprehension
1. Visual lexical access | PAL Written Lexical Decision | 31/40 (78%)*
2. Lexical semantic access | PAL Written Word–Picture Matching | 27/32 (84%)*
3. Lexical semantic access | PAL Written Attribute Verification | 39/48 (81%)*
4. Lexical semantic access (affixed words) | PAL Written Relatedness Judgement: Affixed Words | 12/20 (60%)*
5. Sentence-level processing | PAL Written Sentence Comprehension | 22/40 (55%)

Written language production
1. Production of lexical graphemic representations | Writing to Dictation | 0/10**; 1st grapheme: 5/10 (50%)
2. Access to and production of lexical graphemic representations | Written Naming | 0/10**; 1st grapheme: 6/10 (60%)

* More than 2 standard deviations below the mean for age-matched non-brain-damaged controls.
Note that for the sentence comprehension subtest, norms are available for the
full test and not for specific sentence types.
** Unable to compare to age-matched norms because the test was not administered in
full. Performance was severely impaired relative to the normative sample, who
performed at > 95% on both writing tasks.

¹ The Adult Apraxia Battery (Dabul, 1979) was administered 1 year following cessation
of the study. LN was classified in the moderate range of apraxia on the basis of that test.
Performance at least partly reflects his strong repetition skills.
TABLE 2 Performance on language pretesting: BDAE subtests

BDAE subtest | Date administered (months prior to study) | Score

1. Visual Confrontation Naming | 13 | 36/114
   (items per category)
   a. Objects | | 1/6
   b. Letters | | 0/6
   c. Forms | | 1/2
   d. Actions | | 4/6
   e. Numbers | | 3/6
   f. Colours | | 1/6
   g. Body Parts | | 2/6
2. Responsive Naming | 13 | 18/30
3. Verbal Agility | 18 | 7/14
4. Word Repetition | 18 | 9/10
5. Repeating Phrases | 18 | 5/16
6. Written Confrontation Naming | 13 | 4/10
7. Writing to Dictation | 16 | 3/10

Written language production
LN's ability to write single words was assessed using informal measures based
on subtests of the PAL. Testing was limited to 10 items from each subtest to
minimise LN's frustration. He was unable to write words to dictation or
complete written naming, but he was able to write the first graphemes of words
with 50–60% accuracy.
Summary of deficits
Comprehension. Access to the auditory input lexicon was better than access to the
visual input lexicon, as demonstrated by better scores on the auditory than
written version of the lexical decision task. The semantic system also followed
this pattern, with better performance on comprehension tasks presented in the
oral than written modality. The results from the verbal and written modalities
suggest that LN's semantic system is mildly impaired, with factors such as
morphological complexity and imageability affecting performance.
Production. LN's oral and written production are severely impaired at the
word level. His ability to access partial information about lexical forms is
stronger in the written than verbal modality. He is able to write the first letter of
words with greater accuracy than he is able to verbally produce first sounds, as
evidenced by his ability to identify and write the first letters of words in verbal
and written naming tasks.
Previous and concurrent treatments
Previous treatment included written and verbal naming tasks, verbal apraxia
drills, as well as training with augmentative devices such as communication
books and computer programs (C-Speak Aphasia, Nicholas & Elliott, 1999). LN
had demonstrated negligible gains in verbal naming of objects, although his
overall naming score on the BDAE showed some improvement (0/114 when
tested 24 months prior to the present study, compared to 36/114 when tested 13
months prior to the present study).

During the course of the study, LN participated in 1 hour of group treatment
each week in addition to the study-related treatment sessions. The emphasis of
group treatment was on counselling and multi-modal communication, for
example, use of gestures and drawing. Special effort was taken to avoid treatment
of verbal or written naming in the group forum.
Treatment rationale
Verbal naming was the primary target of therapy because of LN's strong
motivation to pursue this mode of communication. The rationale for the
development of this treatment was based on the results of the pretesting and
several informal observations made during previous testing and treatment
sessions. Pretesting suggested that LN's partial lexical access was better in the
written than verbal modality. Although LN was able to identify or write the first
letters of words, he was unable to orally read what he had written, that is,
graphemic representations were not sufficient to trigger phonemic associations.
Informal observations indicated that he was able to benefit from tactile/placement
cues to facilitate articulatory placement. Additionally, minimal phonemic cues,
such as demonstration of placement without sound, were successful in eliciting
targets.
Treatment was designed to tap into LN's partial access to the written form of
lexical items so that he could use it to independently generate the first phoneme
of the word from tactile cues. Based on his performance in treatment and testing,
we hypothesised that the combination of the first letter of the word and a tactile
cue for the articulatory placement associated with that grapheme would be
sufficient for LN to generate the first sound of the word independently. Because
he was able to verbally name items with fairly minimal phonological information,
we hoped that the first sound would provide enough information for him to
verbally name the item.
Experimental stimuli
The target and control word lists comprised one- and two-syllable words judged
to be relevant to LN by his primary clinician. Target words began with initial
phonemes /d, f, t, k/, while control items began with phonemes /b, p, s, g/. These
phonemes were selected because they are considered to be in the mid-range of
difficulty for apraxic speakers (Johns & Darley, 1970). All items beginning with
the phoneme /k/ began with the grapheme c, and all items beginning with the
phoneme /s/ began with the grapheme s. These choices were made to facilitate
selection of items deemed to be relevant to LN. A total of 48 words were
included, with three examples of one- and two-syllable words in each phoneme
class. Target and control words were matched for frequency (Francis & Kucera,
1982) t(46) = .80, p = NS. A complete list of experimental stimuli is presented
in Appendix A.
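As an arithmetic check on the reported statistic, 48 words split evenly into target and control lists gives 24 + 24 − 2 = 46 degrees of freedom, matching t(46). The sketch below assumes a standard independent-samples t-test with pooled variance (the text does not state which variant was used), and the frequency values are invented placeholders, not the study's stimuli.

```python
import math
from statistics import mean, variance


def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent-samples t statistic with pooled variance.

    Returns (t, df) with df = n1 + n2 - 2. Assumes each group has at
    least two items and that the pooled variance is non-zero.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample (n - 1) variance.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Hypothetical word frequencies, 24 per list to mirror the study design.
targets = [12, 45, 8, 30, 22, 15, 60, 9, 33, 18, 27, 41,
           11, 25, 14, 38, 20, 7, 52, 16, 29, 35, 19, 24]
controls = [14, 40, 10, 28, 20, 17, 55, 12, 30, 21, 25, 36,
            13, 22, 16, 33, 18, 9, 48, 15, 26, 31, 23, 27]
t, df = pooled_t_test(targets, controls)
print(f"t({df}) = {t:.2f}")
```

A p value would additionally require the t distribution's tail probability (for example from a statistics package); it is omitted here to keep the sketch dependency-free.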

Picture stimuli were colour photographs either downloaded from an internet
picture gallery or supplied by LN's family.
Treatment
Treatment consisted of a confrontation naming task using the modified cueing
hierarchy presented in Appendix B and described below. LN was not required to
make a verbal response until the last step of the hierarchy. If he verbally named
the item spontaneously or after writing it, he was required to go through each
step of the hierarchy in order to reinforce the self-cueing strategies. For each step
in the hierarchy, verbal feedback was provided for both correct and incorrect
responses. LN's most frequent error in response to general prompts was a failure
to respond, in which case the clinician moved on to the next step in the
hierarchy.
(1) The first step in the modified cueing hierarchy targeted written naming. LN
was first prompted to write the word using a general prompt such as "Can
you write it?" If he was unable to independently write the name of the
target, he was provided with a series of cues that included a choice of first
letters (from a field of three) and blanks (e.g., providing half of the letters in
a word and requiring him to fill in the rest). If he was unable to write the
word using these cues, the clinician wrote it and LN copied the word.
(2) The second stage of the hierarchy targeted use of tactile cues. LN was taught
to associate particular graphemes with placement cues. For example, the
letter c was associated with the index finger placed against the top of the
neck. A complete description of the placement cues is given in Appendix C.
If LN was unable to generate the cue independently, he was provided with
pictures of the clinician modelling the four cues and asked to select the
correct picture. If he failed to select the correct picture, the clinician
modelled the tactile cue.
(3) In the third step of the hierarchy, LN was provided with specific
phonological cues such as the first sound and syllable of the word. As a final
measure, the clinician named the item and LN repeated it.
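The three-step procedure can be sketched as ordered data plus a loop. This is a minimal illustration only: the cue labels paraphrase Appendix B, while `run_item` and the simulated responder are hypothetical and not part of the published protocol. Following the text, every step is completed even if LN names the item early, and each step ends with a clinician model that LN copies or repeats.

```python
from typing import Callable

# Steps of the modified cueing hierarchy (paraphrasing Appendix B), each with
# cues ordered from least to most support. The final cue of every step is a
# clinician model that the client copies or repeats.
HIERARCHY = [
    ("written naming", ["general prompt", "first letter from field of three",
                        "fill in the blanks", "clinician writes, client copies"]),
    ("tactile cue", ["general prompt", "picture of cue",
                     "clinician models cue"]),
    ("verbal naming", ["general prompt", "phonemic cue",
                       "word provided, client repeats"]),
]


def run_item(responds: Callable[[str, str], bool]) -> dict:
    """Present one item; return, for each step, the first cue that elicited
    a correct response. All steps are run in order, even if the item was
    named early, to rehearse the self-cueing routine."""
    levels = {}
    for step, cues in HIERARCHY:
        for cue in cues:
            # The last cue is a clinician model, so the step always ends there.
            if cue == cues[-1] or responds(step, cue):
                levels[step] = cue
                break
    return levels


# Hypothetical client: writes after the first-letter choice, needs the
# tactile cue modelled, and names the item after a phonemic cue.
def example_client(step: str, cue: str) -> bool:
    return (step, cue) in {("written naming", "first letter from field of three"),
                           ("verbal naming", "phonemic cue")}


print(run_item(example_client))
```

Recording the cue level per step, rather than a single pass/fail, is what allows responses to be coded later by the amount of support required.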
Treatment application
Each of the 24 treatment items was presented in random order once per session.
Treatment lasted for 13 weeks and occurred once per week. Treatment sessions
were 1 hour in length. Treatment was conducted by a graduate student clinician
under the supervision of a certified speech-language pathologist.
Home practice programme
A video version of the treatment programme was made to give LN more intensive
practice in using the steps of the hierarchy to generate self-cues than weekly
treatment sessions permitted. In the video, LN's clinician presented each step of
the hierarchy in succession. LN was verbally prompted to attempt each stage of
the hierarchy and was provided with visual prompts, such as pictures and choices
of first letters for the items, in the same way as in treatment sessions. Pauses
were left in the video to allow LN time to respond. All of the trained items were
included in the home video given to LN during the treatment phase of the study.
No family members were available for training to provide feedback to LN
during home practice or to provide data regarding his performance at home.
LN was not provided with specific instructions as to how often he should use
the video. Each week, LN brought the paper on which he had written the names
of items during home practice. Calculation of practice frequency was based on this
paperwork. LN practised using his home video four to five times each week.
In order to assess the effectiveness of the home video in the absence of
structured treatment sessions, LN practised 12 (50%) of the trained items using his
home video for 6 weeks following termination of the study. This was
accomplished by editing the home video so that it included only half of the target
items. Items to be practised were chosen at random with respect to initial phoneme
and naming success during the course of structured treatment. Frequency of
practice was not monitored during this period, but LN reported that he continued
to practise four to five times per week on average.
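One way to draw a practised half that is unrelated to initial phoneme and prior naming success is a simple random sample. The authors do not describe their drawing procedure, so this method and the helper `split_for_home_practice` are assumptions for illustration; the word list is the 24 target words from Appendix A.

```python
import random

# The 24 trained (target) words listed in Appendix A.
TRAINED_ITEMS = [
    "Camera", "Coat", "Coffee", "College", "Couch", "Cup",
    "Date", "Desk", "Dinner", "Doctor", "Dollar", "Door",
    "Family", "Father", "Finger", "Fish", "Food", "Foot",
    "Table", "Teacher", "Time", "Tire", "Tissue", "Toast",
]


def split_for_home_practice(items, seed=None):
    """Hypothetical helper: draw half of the items by simple random sampling
    for the edited home video; the remaining items are not practised."""
    rng = random.Random(seed)
    practised = rng.sample(items, k=len(items) // 2)
    chosen = set(practised)
    not_practised = [item for item in items if item not in chosen]
    return practised, not_practised


practised, not_practised = split_for_home_practice(TRAINED_ITEMS, seed=1)
print(len(practised), len(not_practised))  # 12 12
```

A simple random draw can still happen to favour one phoneme; drawing three of the six words for each initial phoneme would be an alternative design choice if balance mattered.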
Experimental design
An AB design using multiple baselines was used to examine the effects of
treatment. The behaviours of interest were measured repeatedly during a baseline
phase and then treatment was applied to one set of behaviours. A second set of
behaviours remained untreated and was used to evaluate generalisation effects of
treatment.
Baseline phase
Baseline verbal naming performance was documented over three sessions. LN
was presented with each of the 48 to-be-trained and control pictures in random
order and instructed to name each item. Pictures were presented until LN
provided a verbal label or stated "I don't know". Responses were recorded
verbatim, including errors and the presence of gestures. Final responses were
coded as correct or incorrect for the purposes of calculating baseline measures of
performance.
Treatment phase
There were two parts to each treatment session. In the treatment portion of the
session, LN was presented with pictures of target items and asked to name the
pictured item. During treatment, LN was prompted to use self-cueing strategies
through application of the modified cueing hierarchy. The criterion for
terminating treatment was 80% accuracy in naming the target items.
In the end-of-session probes, LN was presented with pictures of target and
control items and asked to name the pictured items. LN was not prompted to use
any strategies, although he was permitted to do so. All responses, including
errors and the presence of gestures, were recorded verbatim. Final responses were
coded as correct or incorrect. No feedback regarding accuracy was provided
during probes. In the interest of time, items were split such that half of the items
were probed each week.
Follow-up and post-testing
Maintenance. Naming of all treated and control items was probed 6 weeks
following termination of treatment.
Generalisation. Generalisation of treatment effects was assessed in three
ways. First, naming of the untrained, control items was assessed repeatedly
across the course of the study. Second, a confrontation naming task using novel
stimuli was employed at the cessation of treatment. Specifically, two groups of
novel stimuli beginning with the same phonemes as the original target and
control items were presented to LN to probe response generalisation effects of
treatment. These items were matched for frequency of occurrence, t(38) = .34,
NS, and are presented in Appendix D. Third, the verbal and written naming
subtests of the PAL were re-administered to determine whether naming ability
had changed as measured by standardised tests.
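Frequency matching of this kind is typically checked with an independent-samples t-test. The source does not name the variant; the reported df of 38 for two 20-item sets (and df of 46 for the 24 + 24 training items) fits Student's pooled-variance test, which is what this sketch assumes. Real input would be the Francis and Kucera (1982) frequency counts; the values below are toy numbers, not the study's stimuli, and `pooled_t` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(freqs_a, freqs_b):
    """Student's independent-samples t with pooled variance.

    Returns (t, df), where df = n_a + n_b - 2 (38 for two 20-item sets).
    """
    n_a, n_b = len(freqs_a), len(freqs_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance from the two sample variances.
    pooled_var = ((n_a - 1) * variance(freqs_a) + (n_b - 1) * variance(freqs_b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(freqs_a) - mean(freqs_b)) / se, df


# Toy frequency counts, for illustration only.
t, df = pooled_t([1, 2, 3, 4], [2, 3, 4, 5])
print(round(t, 3), df)  # -1.095 6
```

A small |t| with a non-significant p, as reported here, indicates that the groups do not differ reliably in mean frequency. It does not prove the groups are equivalent.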
Reliability
To test reliability, 15% of all baseline and probe sessions were re-scored by an
independent examiner. Verbal naming responses were scored as correct or
incorrect. Average point-to-point agreement between observers was 98%, with a
range of 96–100%.
Reliability of the independent variable, that is, of the determination of the
level of cueing required, was also calculated by an independent observer for 15%
of the treatment sessions. Average point-to-point agreement was 92%, with a
range of 88–96%.
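Point-to-point agreement is the percentage of items that both scorers coded identically, that is, agreements / (agreements + disagreements) × 100. The minimal sketch below assumes that the reported average and range are taken over per-session percentages; the source does not say whether sessions were averaged or pooled. The function names and toy data are hypothetical.

```python
from statistics import mean


def point_to_point_agreement(codes_a, codes_b):
    """Percent of items given the same code by two independent scorers:
    100 * agreements / (agreements + disagreements)."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both scorers must code the same, non-empty item list")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * agreements / len(codes_a)


def summarise_sessions(sessions):
    """Mean and range of per-session agreement (assumed summary method)."""
    scores = [point_to_point_agreement(a, b) for a, b in sessions]
    return mean(scores), min(scores), max(scores)


# Toy data: 1 = correct, 0 = incorrect, two re-scored sessions of 20 items.
session_1 = ([1] * 12 + [0] * 8, [1] * 11 + [0] * 9)    # one disagreement
session_2 = ([1] * 10 + [0] * 10, [1] * 10 + [0] * 10)  # full agreement
print(summarise_sessions([session_1, session_2]))  # (97.5, 95.0, 100.0)
```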

RESULTS
Treatment
Figure 1 illustrates LN's performance on control and target items over time.
Table 3 summarises performance at baseline, immediately post-treatment, and 6
weeks post-treatment. At baseline, LN achieved an average of 4.2% correct for
items beginning with control phonemes and 6.8% for items beginning with target
phonemes. None of the items was named accurately more than once during the
baseline phase. Post-treatment results were obtained by averaging accuracy over
the final three treatment sessions. LN achieved 12.1% accuracy on control items
and 55.5% accuracy on target items. By the final three sessions, LN was also
able to write all of the target items with 100% accuracy. No data for written
naming were collected for control items.
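The post-treatment figure is a mean over the last three probe sessions. Because half of each item set was probed per week, each weekly score is based on 12 items per set. The sketch below shows that averaging step with toy weekly counts, not LN's data; the helper names are hypothetical.

```python
from statistics import mean


def percent_correct(correct, probed):
    """Percent of probed items named correctly in one session."""
    return 100 * correct / probed


def post_treatment_accuracy(session_percents, last_n=3):
    """Mean percent correct over the final `last_n` probe sessions."""
    if len(session_percents) < last_n:
        raise ValueError("not enough probe sessions")
    return mean(session_percents[-last_n:])


# Toy weekly counts out of 12 probed items (half of a 24-item set).
weekly = [percent_correct(c, 12) for c in (3, 6, 6, 9, 6)]
print(round(post_treatment_accuracy(weekly), 1))  # 58.3
```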
Data from the treatment portion of each session were analysed to determine
how LN's responses changed over time. Figure 2 shows how the level of cue
required for LN to verbally name target items changed over time. Data were not
available from sessions 2 and 5. In the figure, "spontaneous" refers to situations in
which LN independently named items without use of any cueing strategies.
Instances in which LN named the item following application of the written
naming portion of the hierarchy (step 1 in the hierarchy) were coded as use of
graphemic cues. If LN verbally named the item after being prompted to generate
a tactile cue (step 2 in the hierarchy), the response was coded as use of tactile
cues. Verbal cues refer to phonemic cues ranging from phonemes to word
repetition (step 3 in the hierarchy). LN required fewer verbal prompts over time
and provided more self-cues, either through writing alone or by writing and using
a tactile cue to generate a phonemic cue. Note that he named items (when
prompted to use strategies) with an average of 72% accuracy over the last three
sessions.

Figure 1. Percent correct of control and trained items named accurately at baseline (B1,
B2, B3), during end-of-session probes (1–13), and in post-treatment assessment of
maintenance (M) and generalisation (G).

TABLE 3 Percent correct on target and control items (end-of-session probes)
Word group    Baseline    Post-treatment    6 weeks post-treatment
Target        6.8%        55.5%             25%
Control       4.2%        12.1%             8.3%
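The coding behind Figure 2 assigns each trained item to the point in the hierarchy at which LN named it. The sketch below tallies one session under that scheme. The labels mirror the text, while the function name and toy codes are hypothetical. Because step 3 ends with word repetition, the sketch assumes that every item receives exactly one of the four codes.

```python
from collections import Counter

# Codes used for Figure 2, in order of increasing clinician support.
CATEGORIES = ("spontaneous", "graphemic", "tactile", "verbal")


def cue_profile(codes):
    """Percent of trained items per cue category for one session."""
    if not codes:
        raise ValueError("no items coded")
    unknown = set(codes) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected codes: {sorted(unknown)}")
    counts = Counter(codes)
    return {cat: 100 * counts[cat] / len(codes) for cat in CATEGORIES}


# Toy session of 10 items.
session = ["spontaneous"] * 2 + ["graphemic"] * 3 + ["tactile"] + ["verbal"] * 4
print(cue_profile(session))
# {'spontaneous': 20.0, 'graphemic': 30.0, 'tactile': 10.0, 'verbal': 40.0}
```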
Maintenance
Post-testing following the 6-week break showed some loss of treatment gains.
LN named 8.3% (2/24) of control and 25% (6/24) of target items. He wrote the
names of 21% of control words and 58% of target words. Items practised using
the home video were not more likely to be named verbally (1 of 6 targets named
correctly) or in writing (6 of 14 written correctly) than nonpractised items. No
spontaneous use of tactile cues was observed.

Figure 2. Percent correct of trained items accurately named spontaneously and when
prompted to use graphemic, tactile, and verbal cues.
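Comparing success rates within each half shows why practised items were not favoured. This sketch assumes, as the text implies but does not state outright, that 12 of the 24 target items were practised. It also assumes that "1 of 6" and "6 of 14" count practised items among the targets named and written correctly at follow-up. The helper name is hypothetical.

```python
def success_rates(practised_successes, total_successes,
                  n_practised=12, n_items=24):
    """Proportion of practised and of non-practised target items that
    were named (or written) correctly at follow-up."""
    n_not_practised = n_items - n_practised
    return (practised_successes / n_practised,
            (total_successes - practised_successes) / n_not_practised)


named = success_rates(1, 6)     # 1 of the 6 correctly named targets was practised
written = success_rates(6, 14)  # 6 of the 14 correctly written targets were practised
print([round(p, 2) for p in named], [round(p, 2) for p in written])
# [0.08, 0.42] [0.5, 0.67]
```

Under these assumptions, practised items did no better than non-practised ones (about 8% vs 42% named; 50% vs 67% written), which is consistent with the conclusion in the text.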
Generalisation
There was some evidence of generalisation to naming ability in general, as LN's
performance improved on the verbal and written naming subtests of the PAL
(Table 4). He achieved 11/32 correct on the verbal naming test at time 2 compared
to 5/32 at time 1. On the written naming test, he achieved 5/32 correct compared
to 0/10 at pretesting, and successfully wrote the first letter on 3 additional items.

TABLE 4 Tasks administered at initial assessment and after 13 weeks of treatment
(percent correct)
                        Accuracy
Test                    Pretreatment    Post-treatment
PAL: Verbal Naming      16% (N = 32)    34% (N = 32)
PAL: Written Naming     0% (N = 10)     16% (N = 32)

There was not strong evidence of generalisation to untrained items beginning
with target and control phonemes (i.e., novel items probed at post-treatment
only). Naming of items beginning with trained phonemes was 25% accurate (5/20
items), while naming of items beginning with untrained phonemes was 15%
accurate (3/20 items).
DISCUSSION
This single-case study provides qualified support for a treatment programme that
targets self-generation of phonemic cues through the use of tactile cues and
partial access to the written form of words. Treatment gains were observed in
target compared to control words over the period of study. Although LN required
prompts to use cueing strategies, he was increasingly able to benefit from
self-generated cues across treatment sessions. When compared to LN's baseline level
of performance, effects of treatment were observed 6 weeks following cessation
of the structured treatment phase. However, there was some loss of treatment
gains, as verbal naming at the 6-week follow-up revealed a decline from the
level of performance observed at the end of the treatment period.
Data regarding generalisation of the treatment effects were inconclusive.
Response generalisation to untrained words beginning with trained phonemes
was difficult to assess in the context of this design. However, post-treatment
assessment of frequency-matched items (Francis & Kucera, 1982) revealed that
LN named the untrained exemplars of /k, d, f, t/ at a higher level of accuracy
(i.e., 25%) than was observed during baseline for the items used in treatment
(i.e., 6.8%). Response generalisation to untrained words beginning with
untrained phonemes was evaluated throughout the course of treatment (i.e.,
control items) as well as in a post-treatment assessment. Little change was
observed in the accuracy with which LN named /b, p, s, g/ items, indicating
negligible generalisation. A comparison of pre- and post-treatment performance
on the verbal naming subtest of the PAL revealed improved accuracy. These
generalisation findings indicate that treatment may have had some effect on
verbal naming in general.
There were two components of self-cueing in this treatment programme:
writing the word and producing a tactile cue. Although LN benefited from tactile
cues when prompted to use them, he did not use them independently when
naming of target items was probed during treatment sessions or when
maintenance of treatment effects was tested following the 6-week break. On the
other hand, there was evidence of generalisation to written naming, as
demonstrated by his improved performance on the PAL written naming subtest.
He also wrote names of items when maintenance of treatment effects was tested
after the break and had greater success naming words he was able to write. In
sum, the writing portion of the treatment programme appears to have been more
critical to the observed gains in verbal naming than the tactile cueing portion.
LN's reliance on writing over tactile cues suggests a candidate explanation for
the lack of generalisation to novel stimuli beginning with targeted phonemes.
Hillis (1989) proposed that for patients with phonological impairments, the
effect of verbal naming treatments was to increase access to specific
phonological representations, thus limiting generalisation to untreated items. It
could be argued that Nickels (1992) overcame this limitation by providing a
strategy for verbally producing items that were not explicitly trained, namely
teaching grapheme-phoneme conversion to facilitate oral reading. In the
present study, we attempted to teach LN to use tactile cues as a strategy to
facilitate oral reading. His failure to use the tactile cues may have played a role
in the lack of generalisation to items beginning with targeted phonemes.
We had hoped that LN's partial lexical access in the written modality, in
combination with the tactile cues, would be sufficient to allow generalisation
beyond the trained items. We did not predict generalisation to written naming of
untrained items beginning with target phonemes, because their graphemic
representations would not necessarily be more accessible following treatment.
This may have limited the extent to which the tactile cues could be generalised to
novel items beginning with the targeted phonemes. However, verbal naming of
items whose graphemic representations were at least partially accessible to LN
may have shown improvement as a result of learning to use self-cueing
strategies. This is consistent with the evidence of generalisation to verbal naming
in general discussed above.
Along the same lines, the lack of a semantic component to the cueing
hierarchy may have limited the treatment programme's ability to produce more
robust generalisation (Nickels & Best, 1996). Although LN's primary impairment
appeared to be in the phonological output lexicon, he did show some mild
deficits in the semantic system. A treatment programme that also targeted
semantics may have resulted in generalisation to a novel set of stimuli
semantically related to the target items by improving access to phonological
representations through the semantic system.
It is unclear why LN showed little spontaneous use of the tactile cues, even for
trained items. Given their effectiveness in treatment sessions before and during
the present study, it is unlikely that LN found them unhelpful. LN's
resistance to alternative modes of communication may have been a factor.
Although writing is not typically used to facilitate verbal language, it is a
modality that most people use to communicate at least occasionally. In contrast,
tactile cues are not used by non-brain-damaged individuals. For this reason,
writing may have seemed more acceptable to LN than tactile cues. This raises the
possibility that patients who are more receptive to using tactile cues may show
greater treatment effects and generalisation.
Another aspect of the present study was the use of a home practice video to
increase the intensity of exposure to the treatment materials. Previous work has
suggested that trained family members can provide effective treatment, thereby
increasing intensity of treatment (Yampolsky & Waters, 2002). However, some
patients, such as LN, do not have family members who are able to take an active
role in treatment. The home video programme used in the present study provided
an opportunity for LN to have more intense exposure to treatment materials and
the self-cueing strategies. However, home practice was not effective in
maintaining treatment effects in the absence of structured treatment sessions.
Further research is needed to determine how frequently structured sessions must
occur to maintain treatment gains. A possible clinical application would be
periodic treatment sessions with patients discharged from outpatient services to
review compensatory strategies and provide materials for home practice.
Several limitations of the present study restrict the scope of its
interpretation. First, the complete set of tactile cues was not explicitly taught to
LN prior to inception of the treatment programme. Specifically, he was not taught
the tactile cues associated with the control items. This may have limited his
ability to generalise the self-cueing strategies to the control items. As a result, our
ability to measure generalisation to untrained phoneme classes may have been
limited. Another method to assess generalisation would have been to include
nontrained items beginning with target phonemes in the verbal naming probes.
This would have allowed us to assess generalisation to nontrained stimuli during
the treatment period. Future research should address the issue of generalisation in
these ways.
A related issue is that we did not extend treatment to the control items in an
ABA design, due to limitations on the time course of the present study.
Replication of treatment effects with the control stimuli would have provided a
stronger demonstration of experimental control, strengthening the results of the
study.
Another limitation is that probes were conducted at the end of treatment
sessions, following exposure to the target items in the context of the modified
cueing hierarchy. Other researchers (e.g., Raymer et al., 1993; Wambaugh et al.,
2001) have conducted probes at the beginning of treatment sessions to avoid
inflating accuracy due to the recent exposure or deflating accuracy due to client
fatigue. Another issue related to exposure pertains to the greater number of times
that LN was exposed to target compared to control stimuli. Given these
confounds, it is difficult to isolate the source of observed treatment effects.
Future research should address these issues by probing performance on trained
and control stimuli prior to the treatment session and controlling for the number
of exposures to trained and control stimuli.
In sum, the present study suggests that a programme focusing on self-generation
of phonemic cues can be an effective treatment approach for anomia.
Functionally, this approach may be most effective in building a repertoire of
trained items, with some generalisation to verbal naming in general.
REFERENCES
Bashir, A. S., Grahjones, F., & Bostwick, R. Y. (1984). A touch cue method of therapy for developmental verbal apraxia. Seminars in Speech and Language, 5(2), 127–137.
Best, W., Hickin, J., Herbert, R., Howard, D., & Osborne, F. (2000). Phonological facilitation of aphasic naming and predicting the outcome of treatment for anomia. Brain and Language, 74(3), 435–438.
Bruce, C., & Howard, D. (1988). Why don't Broca's aphasics cue themselves? An investigation of phonemic cueing and tip of the tongue information. Neuropsychologia, 26(2), 253–264.
Caplan, D. (1992). Language: Structure, processing, and disorders (pp. 403–441). Cambridge, MA: MIT Press.
Dabul, B. L. (1979). Apraxia Battery for Adults. Austin, TX: Pro-Ed.
Davis, A., & Pring, T. (1991). Therapy for word-finding deficits: More on the effects of semantic and phonological approaches to treatments with dysphasic patients. Neuropsychological Rehabilitation, 1(2), 135–145.
Drew, R. L., & Thompson, C. K. (1999). Model-based semantic treatment for naming deficits in aphasia. Journal of Speech, Language, and Hearing Research, 42, 972–989.
Francis, W. N., & Kucera, H. (1982). Frequency Analysis of English Usage. Boston, MA: Houghton Mifflin.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders. Philadelphia, PA: Lea & Febiger.
Hillis, A. (1989). Efficacy and generalization of treatment for aphasic naming errors. Archives of Physical Medicine and Rehabilitation, 70, 632–636.
Hillis, A. (1998). Treatment of naming disorders: New issues regarding old therapies. Journal of the International Neuropsychological Society, 4, 648–660.
Johns, D. F., & Darley, F. L. (1970). Phonemic variability in apraxia of speech. Journal of Speech and Hearing Research, 13, 556.
Miceli, G., Amitrano, A., Capasso, R., & Caramazza, A. (1996). The treatment of anomia resulting from output lexical damage: Analysis of two cases. Brain and Language, 52, 150–174.
Nicholas, M., & Elliott, S. (1999). C-Speak aphasia: A communication system for adults with aphasia. Solana Beach, CA: Mayer-Johnson Co.
Nickels, L. (1992). The autocue? Self-generated phonemic cues in the treatment of a disorder of reading and naming. Cognitive Neuropsychology, 9(2), 155–182.
Nickels, L., & Best, W. (1996). Therapy for naming disorders (part I): Principles, puzzles, and progress. Aphasiology, 10(1), 21–47.
Raymer, A. M., Thompson, C. K., Jacobs, B., & Le Grand, H. R. (1993). Phonological treatment of naming deficits in aphasia: Model based generalization analysis. Aphasiology, 7(1), 27–53.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Martinez, A. L., Kalinyak-Fliszar, M., & Spencer, K. A. (2001). Effects of two cueing treatments on lexical retrieval in aphasic speakers with different levels of deficit. Aphasiology, 15(10/11), 933–950.
Yampolsky, S., & Waters, G. (2002). Treatment of single word oral reading in an individual with deep dyslexia. Aphasiology, 16(2), 455–471.
APPENDIX A
TRAINING & CONTROL STIMULI IN TREATMENT PHASE
Target words    Control words
Camera          Bank
Coat            Bill
Coffee          Body
College         Book
Couch           Business
Cup             Butter
Date            Garage
Desk            Garbage
Dinner          Garden
Doctor          Gas
Dollar          Girl
Door            Gum
Family          Park
Father          Pool
Finger          Pill
Fish            Paper
Food            Police
Foot            Popcorn
Table           Salt
Teacher         Sister
Time            Sock
Tire            Son
Tissue          Subway
Toast           Summer

APPENDIX B
MODIFIED CUEING HIERARCHY
(1) Present picture & produce written form
a. General prompt, e.g., "Can you write it?"
b. Choose first letter from field of three.
c. Fill in the blanks provided.
d. Clinician writes word.
(2) Generate tactile cue
a. General prompt, e.g., "What is the cue for that and what sound does it make?"
b. Picture of cue presented.
c. Clinician demonstrates cue.
(3) Verbal naming
a. General prompt, e.g., "What is it called?"
b. Phonemic cue.
c. Word provided.
APPENDIX C
DESCRIPTION OF TACTILE CUES
(1) /d/ Index finger bent and placed on upper lip. Thumb placed on the neck to
indicate voicing (and to distinguish from /t/).
(2) /t/ Index finger bent and placed on upper lip.
(3) /k/ Index finger placed at top of throat.
(4) /f/ Index finger bent and placed below lower lip.
APPENDIX D
STIMULI FOR GENERALIZATION PROBE
Target phonemes    Control phonemes
Cake               Bat
Candle             Belt
Card               Bird
Corn               Boat
Cow                Bottle
Deer               Gift
Dentist            Goat
Diamond            Golf
Dog                Guitar
Duck               Gun
Fan                Paint
Farm               Pie
Feather            Pillow
Fork               Police
Tail               Pot
Tape               Sink
Tent               Soap
Tie                Soldier
Tulip              Suitcase

Functional measures of naming in aphasia:
Word retrieval in confrontation naming versus connected speech
Jamie F.Mayer and Laura L.Murray
Indiana University, USA

Background: Word-finding difficulties are central to aphasia and, as such, have
received a great deal of attention in aphasia research. Although treatment for
lexical retrieval impairments can be effective, studies often use measurement of
single-word performance (e.g., confrontation naming) to support such claims. In
contrast, what matters most to patients with aphasia and their families is the
ability to converse. Few aphasia studies, however, have addressed word retrieval
in connected speech. Furthermore, one could debate whether generating names for
single pictured stimuli bears resemblance to the online, multifaceted retrieval
required during conversation.
Aims: The purpose of this study was to assess the adequacy of Percent Word
Retrieval (%WR) as well as two supplementary analyses, Percent Substantive
Verbs (%SV) and Percent Corrected Errors (%CERR), to depict word retrieval in
connected and conversational speech with respect to lexical class (noun vs verb)
and aphasia severity (mild vs moderate). Specifically, we examined: (1) the
relationship between lexical retrieval in confrontation naming, composite
description, and conversational samples; and (2) the clinical utility and
feasibility of %WR, %SV, and %CERR in quantifying such data.
Methods & Procedures: A total of 14 individuals with aphasia, divided into mild
(n = 7) and moderate (n = 7) groups based on aphasia severity, participated. Word
retrieval was tested in three different contexts: single-word confrontation
naming, composite description, and conversational speech. Lexical retrieval was
analysed in each context using the analyses described above (%WR, %SV, and
%CERR). The effects of context, grammatical class, and measurement technique
were explored using repeated measures ANOVA and correlational analyses.
Outcomes & Results: Statistical analyses revealed a significant effect of context
for both %WR and %CERR, with superior lexical retrieval and self-correction of
errors in connected speech versus single-word naming tasks. Moreover, %SV in
conjunction with %WR was sensitive to possible verb retrieval deficits undetected
by %WR alone, particularly for mild patients. Confrontation naming scores were
strongly related to aphasia severity classification (mild vs moderate), but were
not significantly correlated with naming abilities in connected speaking tasks.
Conclusions: These findings endorse the incorporation of discourse-level tasks
into aphasia assessment and treatment protocols. Use of simple and easily
quantifiable measures (e.g., %WR) may be an option to extend current
methodology and reconcile issues of ecological validity and clinical feasibility.

The widespread prevalence of word-finding difficulties in aphasia is well known
(Helm-Estabrooks, 1997; Larfeuil & Le Dorze, 1997). As such, the treatment of
word retrieval disorders has received a great deal of research attention compared
to remediation of other areas of language or communication. Although word
retrieval treatments can be effective (e.g., Osborne, Hickin, Best, & Howard,
1998), studies often use measurement of single-word performance (e.g.,
confrontation naming) to support such claims. In contrast, what matters most to
patients with aphasia and their families is the ability to converse (Boles, 1998;
Edwards, 1998); likewise, it is in such situations that aphasia is most intrusive
(Wilkinson et al., 1998). Furthermore, it could be contended that generating
names for single pictured stimuli bears little resemblance to the online,
multifaceted word retrieval required during conversation.
Address correspondence to: Jamie F. Mayer, Dept. of Speech and Hearing
Sciences, 200 South Jordan Avenue, Bloomington, IN 47405, USA. Email:
jfmayer@indiana.edu
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/02687030344000148

Few aphasia studies, however, have either addressed word retrieval in connected
or conversational speech (Doesborgh, van de Sandt-Koenderman, Dippel, van
Harskamp, Koudstaal, & Visch-Brink, 2002; Jordan, Ward, & Cremona-Meteyard,
1997) or explored the relationship between confrontation naming and
conversational word retrieval abilities. The limited research conducted thus far
has produced conflicting results. For example, there are reports of patients with
aphasia demonstrating vastly superior word retrieval during confrontation
naming than during connected speech (Manning & Warrington, 1996; Schwartz
& Hodgson, 2002; Wilshire & McCarthy, 2002), as well as patients with the
inverse profile of better word retrieval in discourse than during confrontation
naming (Ingles, Mate-Kole, & Connolly, 1996; Pashek & Tompkins, 2002;
Zingeser & Berndt, 1990). Whereas some investigators have found nominal
correlation, in general, between confrontation naming and conversational speech
(e.g., Nicholas, Obler, Albert, & Helm-Estabrooks, 1985), others have concluded
that the relationship between confrontational versus discourse-level word
retrieval may vary as a function of aphasia classification (Williams & Canter,
1982, 1987) or error type (e.g., phonemic vs neologistic paraphasias; Vermeulen,
Bastiaanse, & Van Wageningen, 1989). Although findings from two studies
support a substantial relationship between confrontation naming and connected
speech (Brown & Cullinan, 1981; Hickin, Best, Herbert, Howard, & Osborne,
2001), methodological limitations (e.g., imprecise measures of conversational
word retrieval; insufficient description of patient characteristics such as aphasia
type, severity, and chronicity) restrict confident conclusions based solely on
these results.
Several theoretical perspectives have been posited for observed discrepancies
between confrontation naming and discourse-level word retrieval. These
explanations typically highlight the differential context-dependent, nonlinguistic
demands (e.g., attention, level of abstraction) and linguistic factors (e.g., syntax
formulation, pragmatic factors) inherent within each naming context (Berndt,
Mitchum, Haendiges, & Sandson, 1997a, 1997b; Edwards, 1998; Murray &
Karcher, 2000; Penn, 2000; Williams & Canter, 1987). For example, Pashek and
Tompkins (2002) suggested that semantic, phonologic, or syntactic priming in
addition to probabilistic lexical co-occurrence of words (p. 228) might
facilitate word retrieval in connected speech tasks over and above confrontation
naming. A number of single-case studies, in contrast, have implicated
differential impairments to various processing mechanisms to explain observed
contextual dissociations: for example, damage to nominal versus propositional
speech routes (Manning & Warrington, 1996), interference during multiple
lexeme selection (Schwartz & Hodgson, 2002), or impaired lexical control
(Wilshire & McCarthy, 2002).
Given the disparate findings and theoretical rationales from previous research,
the intent of the current study was to examine further the possible relationship
between confrontation naming and word retrieval in the discourse of adults with
aphasia. Whereas there are several well-accepted tools with which to assess
single-word naming in aphasia (e.g., Boston Naming Test; Kaplan, Goodglass, &
Weintraub, 1983), there is less agreement regarding what measures are most
suitable for quantifying and qualifying word retrieval during discourse.
Conversational analysis (CA; Perkins, 1995) has a number of potential strengths
for qualifying conversational behaviour in that it imposes no structure on the data
except that of the conversation itself, allowing for an emphasis on ecological
validity (Osborne et al., 1998). It is this validity, however, that also underlies one
of CA's caveats: that is, "[it] does not marry easily with quantification" (Perkins,
1995, p. 372). Although attempts have been made to quantify aspects of CA such
as proportion of major conversational turns or number/types of repairs
(Crockford & Lesser, 1994; Osborne et al., 1998; Perkins, 1995), reliability data
for application of such measures are unavailable; furthermore, the time demands
of such analyses may exceed standards of clinical practicality (Crockford &
Lesser, 1994). Likewise, the reliability and clinical feasibility of Correct
Information Units (CIUs; Nicholas & Brookshire, 1993a, 1993b), a measure used
fairly frequently, at least in aphasia research studies, have been questioned
(Oelschlaeger & Thorne, 1999). Therefore, the challenge remains to develop a
word retrieval measure that is clinically reliable, easily quantifiable, and
meaningful to patients with aphasia and their families.
One approach may be the direct measurement of the percentage of successful
word retrieval in conversation, quantified by dividing the number of word-finding
errors (e.g., paraphasias, circumlocutions, hesitations) by the total number
of content words produced. Use of a similar measure has been reported in a few
previous studies, such as that by Hickin et al. (2001), who described their
application of lexical selection as a measure of the person's ability to retrieve
lexical items in conversation and how often this process fails (p. 16). Their
data provided a good initial step towards the use of valid, conversational word
retrieval measures; however, many important factors (e.g., aphasia severity) were
unspecified, and Hickin et al.'s primary focus on nouns, like that of the majority
of word-finding studies (but for exceptions see Kemmerer & Tranel, 2000a,
2000b; Murray & Karcher, 2000), precludes application of their conclusions
across lexical classes (e.g., to verbs).
More recently, Pashek and Tompkins (2002) explored contextual influences
on lexical retrieval in aphasia by quantifying noun and verb retrieval during
confrontation naming and video narration tasks, and by carefully controlling
several potentially influential variables (e.g., word class, frequency, and length;
lexical stimulus matching; aphasia type/severity). Whereas this level of control
is, admittedly, crucial in overcoming the methodological limitations inherent in
studying discourse-level skills (Crockford & Lesser, 1994; Marshall & Pound,
1997), it may come at the expense of ecological validity. Therefore,
generalisation of Pashek and Tompkins' (2002) results across communication
situations (e.g., unstructured conversation), data collection options (e.g., online
scoring), or aphasia severity levels (e.g., moderate aphasia) remains open to
examination.
In summary, the clinical utility or feasibility of discourse-level word retrieval
measures has yet to be verified. Furthermore, the relationship between naming
abilities in single-word paradigms versus connected speech, and the theoretical

WORD RETRIEVAL IN CONNECTED SPEECH 81

basis of the association, remain unresolved. Accordingly, the purpose of this
study was to assess the adequacy of Percent Word Retrieval (%WR), as well as
two supplementary analyses, Percent Substantive Verbs (%SV; cf. Berndt et al.,
1997a, 1997b; Breedin, Saffran, & Schwartz, 1998) and Percent Corrected
Errors (%CERR; Larfeuil & Le Dorze, 1997), to depict word retrieval in
connected and conversational speech with respect to lexical class (noun vs verb)
and aphasia severity (mild vs moderate). Specifically, we examined: (1) the
relationship between lexical retrieval in confrontation naming, composite naming
(i.e., picture description), and question-elicited conversational samples; and (2)
the clinical utility and feasibility of %WR, %SV, and %CERR in quantifying
such data.
METHOD
Subjects
A total of 14 right-handed individuals with aphasia secondary to unilateral,
left-hemisphere damage participated (see Table 1). All subjects were native speakers
of English and demonstrated hearing and visual skills adequate for the testing
protocol. The Western Aphasia Battery (Kertesz, 1982) was used to determine
aphasia type and severity (based on the Aphasia Quotient; AQ). Subjects were
then divided into mild (MI) (n = 7; mean AQ = 87.8) and moderate (MO) (n = 7;
mean AQ = 51.7) aphasia groups, using a classification system modified from
Shewan and Bandur (1986). The two groups differed significantly with respect to
AQ, t(12) = 7.18, p < .001, but not age, t(12) = .69, p = .50, or education,
t(12) = .60, p = .56.
Tasks
Word retrieval was tested in three different contexts, with the order of
administration randomised across participants. All subjects completed Sections 1
and 4 of the Test of Adolescent and Adult Word Finding (TAWF; German, 1990)
to assess labelling of pictured nouns (n = 37) and verbs (n = 21), respectively.
TAWF noun and verb stimuli were matched roughly for frequency of occurrence
and word length (German, 1990). Composite naming samples were elicited
through description of pictured scenes, which were constructed so as to depict a
series of events via a sequence of three sketches. Scene topics included
restaurant, beach, shopping mall, and classroom scenarios, with multiple
characters and activities depicted (e.g., the shopping mall scene characters
included Santa Claus, children, parents, and policemen; events included a child
pulling off Santa's beard, policemen eating doughnuts, and a robbery). All
sketches were drawn by the first author for a separate study, piloted with a group
of normal adults, and found to elicit similar numbers of words and correct
information units (Murray, unpublished data). Finally, subjects participated in
brief conversations with the first author, who initiated similar topics (e.g., family,
travel, occupation) across subjects to promote comparable transcripts. The
number of scenes described and conversational topics initiated varied across
subjects according to the amount of language elicited, with one to three
three-scene sketches and/or conversational topics required in an attempt to elicit
a criterion number of words per context for each subject.

TABLE 1 Subject demographic data

TCM = transcortical motor; * = retired.
Data analyses
Sample collection. All connected speech samples were tape-recorded and
transcribed in standard orthography, with neologisms and phonemic paraphasias
transcribed phonetically. Following previously suggested guidelines (Larfeuil &
Le Dorze, 1997; Nicholas & Brookshire, 1993a; Vermeulen et al., 1989), the
first 300 words of each discourse context sample (composite and conversational
word retrieval) were tallied for further analysis. Unintelligible words as well as
real- and non-word fillers (e.g., well, um) were excluded from this count
(Nicholas & Brookshire, 1993b). Due to moderate verbal output deficits, two
subjects, MO2 and MO3, were unable to meet the 300word criterion; for these
speakers, sample sizes of 150200 words from each context were considered
acceptable (Berndt et al., 1997b; Brown & Cullinan, 1981). A third subject
(MO7) demonstrated moderate-to-severe initiation deficits in addition to his
moderate aphasia; for this individual, therefore, it was necessary to follow time-

WORD RETRIEVAL IN CONNECTED SPEECH 83

based guidelines (Boles, 1998; Crockford & Lesser, 1994), by which 10 minutes
of connected speech (25 words and 43 words for composite description and
conversation, respectively) were utilised for analyses. The resultant sample size,
although small, was considered representative of this subjects typical daily
output
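The word-counting rule above can be sketched in Python. This is an illustrative sketch only, not the procedure used in the study: it assumes the transcript has already been split into tokens, and the filler list, the "xxx" marker for unintelligible words, and the "&" prefix for real-word fillers are hypothetical transcription conventions introduced here, because deciding whether a real word such as "well" is functioning as a filler still requires the clinician's judgement.

```python
# Hypothetical transcription conventions (assumptions, not taken from the study):
NON_WORD_FILLERS = {"um", "uh", "er", "erm"}  # non-word fillers to skip
UNINTELLIGIBLE = "xxx"                         # marker for an unintelligible word
REAL_WORD_FILLER_PREFIX = "&"                  # transcriber marks real-word fillers, e.g. "&well"


def countable_sample(tokens, limit=300):
    """Return the first `limit` countable words, skipping fillers and unintelligible words.

    Returns fewer than `limit` words when the transcript is shorter, as for
    speakers who could not reach the 300-word criterion.
    """
    if limit <= 0:
        return []
    sample = []
    for token in tokens:
        t = token.lower()
        if (t in NON_WORD_FILLERS or t == UNINTELLIGIBLE
                or t.startswith(REAL_WORD_FILLER_PREFIX)):
            continue  # excluded from the word count
        sample.append(token)
        if len(sample) == limit:
            break
    return sample


tokens = ["The", "um", "&well", "man", "xxx", "is", "running"]
print(countable_sample(tokens, limit=3))  # ['The', 'man', 'is']
```

Under these assumptions, the clinician's judgement is moved into transcription (marking fillers and unintelligible words), and the count itself becomes mechanical.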
Scoring procedures. A detailed list of scoring procedures is provided in the
Appendix. To determine %WR for the composite and conversational contexts,
the total number of words in each grammatical class (nouns and verbs) and the
number of word-finding errors per class were tallied. Word-finding errors were
defined broadly using criteria adapted from Crockford and Lesser (1994),
Dollaghan and Campbell (1992), and Pashek and Tompkins (2002). Specifically,
words were counted in error under the following conditions: (a) preceded
immediately by a 2+ second pause, (b) preceded or accompanied by comments
indicating difficulty, (c) self-corrections, and (d) obvious semantic, phonemic, or
unrelated paraphasias. Following the initial, objective identification of such
errors, transcripts were re-read comprehensively for identification of more subtle
instances of word retrieval difficulty such as deletions (Kemmerer & Tranel,
2000b) or indefinite terms (Hickin et al., 2001; Nicholas et al., 1985). Such
errors were identified via a careful analysis of contextual factors (e.g., if a
subject produced an indefinite term such as "thing", was a clear referent
available within the transcript?), with subjects being given the benefit of the
doubt in ambiguous situations. The number of successful word retrieval attempts
was divided by the total number of words in each class (i.e., correct attempts +
errors) and multiplied by 100 to yield the %WR score. The TAWF (i.e.,
confrontation naming context) was scored using percent correct measures (i.e.,
correct attempts ÷ total number of stimuli × 100) to promote comparable
measures across elicitation contexts.
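The %WR and percent-correct calculations described above reduce to simple ratios. The following Python sketch is illustrative only: the function names are hypothetical, it assumes word-finding errors have already been identified by a rater using the criteria listed above, and the counts in the usage lines are invented rather than taken from the study.

```python
def percent_word_retrieval(correct: int, errors: int) -> float:
    """%WR for one lexical class: correct attempts / (correct attempts + errors) x 100."""
    total = correct + errors
    if total == 0:
        raise ValueError("no retrieval attempts in this lexical class")
    return 100.0 * correct / total


def percent_correct(correct: int, n_stimuli: int) -> float:
    """Confrontation naming (TAWF) score: correct attempts / total stimuli x 100."""
    if n_stimuli <= 0:
        raise ValueError("number of stimuli must be positive")
    return 100.0 * correct / n_stimuli


# Hypothetical counts, for illustration only
noun_wr = percent_word_retrieval(correct=57, errors=3)  # 95.0
tawf_nouns = percent_correct(correct=30, n_stimuli=37)  # about 81.1 (37 noun items)
```

Expressing both scores on the same 0 to 100 scale is what allows them to be compared across elicitation contexts.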
Supplementary analyses. To provide supplementary information about word
retrieval during composite description and in conversation, two additional
analyses were undertaken: (1) proportion of substantive versus light verbs (%SV)
and (2) proportion of corrected versus uncorrected errors (%CERR). The
former (%SV) has been shown previously to address semantic complexity in
verb retrieval (Breedin et al., 1998). Briefly, verbs such as do, make, have, or go
may be conceived of as semantically simple, or primitive verbs, and in the
linguistic literature are referred to as light verbs. Other verbs, however, are
classified as heavy or substantive because they contain additional and more
specific semantic components (e.g., compare go with run), and are therefore
more complex (Breedin et al., 1998). Both composite naming and conversational
transcripts were analysed for the number of substantive verbs, which was divided
by the total number of verbs produced (i.e., substantive + light verbs) and
multiplied by 100 to yield %SV. The second supplementary measure, %CERR,
has been described as a means of gauging efficiency in word retrieval (Larfeuil &
Le Dorze, 1997). Any resolution of an episode of word-finding difficulty (e.g.,
after a 2+ second delay, revision) was noted for each subject with respect to
lexical class and naming context. Corrected errors were divided by the total
number of errors (corrected + unresolved) and multiplied by 100 to yield
%CERR.
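The two supplementary ratios can be sketched the same way. This is a hypothetical illustration, not the authors' scoring tool: it assumes verbs arrive already lemmatised, and the light-verb set contains only the four examples named in the text, whereas a real classification would follow Breedin et al. (1998). %CERR is left undefined when no errors occur, which mirrors the situation in which %CERR could not be applied to mildly aphasic subjects at ceiling.

```python
# Only the light verbs named in the text; a hypothetical, deliberately incomplete set.
LIGHT_VERBS = {"do", "make", "have", "go"}


def percent_substantive_verbs(verb_lemmas):
    """%SV: substantive verbs / (substantive + light verbs) x 100."""
    verbs = [v.lower() for v in verb_lemmas]
    if not verbs:
        raise ValueError("no verbs produced")
    substantive = sum(1 for v in verbs if v not in LIGHT_VERBS)
    return 100.0 * substantive / len(verbs)


def percent_corrected_errors(corrected, unresolved):
    """%CERR: corrected errors / (corrected + unresolved errors) x 100."""
    total = corrected + unresolved
    if total == 0:
        raise ValueError("no word-finding errors, so %CERR is undefined")
    return 100.0 * corrected / total


# Hypothetical data, for illustration only
sv = percent_substantive_verbs(["go", "run", "have", "eat"])  # 50.0
cerr = percent_corrected_errors(corrected=3, unresolved=1)    # 75.0
```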
RESULTS
Reliability and clinical feasibility
Approximately 20% of the data were randomly selected for reliability analyses.
Similar to the procedures of Oelschlaeger and Thorne (1999), raters (one
certified speech-language pathologist and two graduate students in speech and
hearing sciences) were provided with written instructions regarding the
application of %WR, %SV, and %CERR analyses to language samples. No
formal discussions of rules or rule interpretations were undertaken, allowing
raters to apply the scoring rules independently as written. Point-to-point
inter-rater agreement was calculated for %WR, %CERR, and %SV in composite
naming and conversational speech samples, and ranged from 80.5% to 100% for
%WR, 88.2% to 90% for %SV, and 88.9% to 100% for %CERR. Intra-judge
reliability was calculated for each measure on another randomly selected subset
of the data 1 week following initial data scoring, and ranged from 88.9% to
97.1% for %WR, 87.2% to 93% for %SV, and 82.0% to 100% for %CERR.
Although no formal time limit was given for inter-judge analyses, raters
reported requiring approximately 45 minutes to score each 300-word transcript.
This time requirement included that committed to learning and applying a set of
pre-constructed printed scoring rules. The time required for intra-judge
re-scoring (reflecting the first author's familiarity with the scoring standards), on the
other hand, was approximately 15 minutes per transcript.
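Point-to-point agreement compares two raters' decisions item by item. The sketch below assumes the commonly used definition (agreements divided by the total number of scored items, multiplied by 100), since the exact formula is not spelled out here; the function name and the "C"/"E" codes are hypothetical.

```python
def point_to_point_agreement(rater_a, rater_b):
    """Percentage of items on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and aligned item by item")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


# Hypothetical codes for 10 scored words: "C" = correct retrieval, "E" = word-finding error
first = ["C", "C", "E", "C", "C", "E", "C", "C", "C", "C"]
second = ["C", "C", "E", "C", "E", "E", "C", "C", "C", "C"]
agreement = point_to_point_agreement(first, second)  # 90.0
```

A single disagreement in ten items already lowers agreement to 90%, which shows how quickly short samples can approach an 80% criterion.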
Relationships among speaking contexts and grammatical class
Results of a three-way repeated measures ANOVA yielded significant main
effects of aphasia severity, F(1, 12) = 73.86, p < .001, and speaking context,
F(2, 24) = 20.61, p < .001, on word retrieval scores, with no significant interactions
between factors (see Figure 1). As expected, patients with mild aphasia outscored
those with moderate aphasia across all measures. Both groups exhibited superior
performance in composite naming and conversational speech contexts compared
to confrontation naming. Post-hoc paired t-tests, with p set to .017 using the
Bonferroni correction, confirmed this observation. That is, composite noun,
t(13) = 3.33, p = .005, and verb scores, t(13) = 2.98, p = .011, were significantly
higher than TAWF noun/verb subtest scores; conversational noun, t(13) = 3.44,
p = .004, and verb scores, t(13) = 4.71, p < .001, followed a similar pattern. A
comparison of word retrieval across the two connected speaking contexts,
composite naming and conversational speech, revealed a significant difference
for verbs, t(13) = 2.83, p = .014, but not for nouns, t(13) = .91, p = .38.
Although visual inspection of subjects' scores (see Figure 1) indicated a general
trend toward more accurate verb than noun retrieval across severity
groups and elicitation contexts, no main effect of lexical class was revealed,
F(1, 12) = 3.39, p = .091.
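The adjusted criterion of .017 is consistent with dividing α = .05 by three pairwise comparisons among the three speaking contexts. That count is an assumption made for this sketch. The code applies the criterion to the post-hoc p values reported above, entering p < .001 conservatively as .001.

```python
ALPHA = 0.05
N_COMPARISONS = 3  # assumption: three pairwise context comparisons per lexical class

adjusted_alpha = ALPHA / N_COMPARISONS  # 0.0166..., reported as .017

# Post-hoc p values as reported in the text (p < .001 entered as .001)
reported = {
    "composite vs TAWF, nouns": 0.005,
    "composite vs TAWF, verbs": 0.011,
    "conversation vs TAWF, nouns": 0.004,
    "conversation vs TAWF, verbs": 0.001,
    "composite vs conversation, verbs": 0.014,
    "composite vs conversation, nouns": 0.38,
}

significant = {name: p < adjusted_alpha for name, p in reported.items()}
for name, is_sig in significant.items():
    print(f"{name}: {'significant' if is_sig else 'not significant'}")
```

Under this criterion, every comparison except composite versus conversational nouns falls below the adjusted threshold, which matches the pattern described above.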
Despite the significant differences yielded through ANOVA analyses, all
measures of word retrieval (TAWF scores and %WR) were highly and
significantly correlated across elicitation contexts (see Table 2). That is, subjects
with low TAWF scores were likely to perform poorly on word retrieval measures
in composite description and conversational speech, and subjects who scored
well on the TAWF demonstrated relatively higher word retrieval scores across
contexts. When the mild and moderate groups were analysed separately,
however, this effect disappeared. No correlation was significant for the mild
group, and only one comparison (conversational nouns to composite nouns) was
significant for the moderate group.
% Substantive Verbs
Statistical analyses indicated no main effect of connected speaking context
(composite description vs conversation) on subjects' generation of substantive
verbs, F(1, 12) = 1.76, p = .21 (see Figure 2). There was, however, a significant
effect of severity, F(1, 12) = 17.28, p = .001, and a significant interaction
between context and severity, F(1, 12) = 5.92, p = .032, such that mild subjects
produced significantly more substantive verbs in composite description
than in conversation, whereas moderate subjects generated slightly more
substantive verbs in the conversational than in the composite condition. Accordingly,
%SV scores for the composite naming condition correlated more strongly with
other verb retrieval measures (TAWF and %WR) than did conversational
%SV scores (see Table 2); moreover, composite %SV scores differentiated
significantly between the two severity groups, F(1, 12) = 19.26, p = .001,
whereas conversational %SV scores did not, F(1, 12) = 4.07, p = .067.
% Corrected Errors
ANOVA results revealed a significant main effect of context, F(2, 14) = 4.73,
p = .027, indicating that subjects were more likely to self-correct word-finding errors
in discourse contexts than during confrontation naming (see Figure 3). Whereas
no main effect of grammatical class was noted, F(1, 7) = 0.47, p = .52, significant
interactions were found between context and class, F(2, 14) = 4.97, p = .023, and
among context, class, and severity, F(2, 14) = 11.33, p = .001. The nature of the
interactions was such that mild subjects tended to self-correct verbs more often
than nouns during composite naming and conversational speech, whereas moderate
subjects tended to do the opposite (increased self-correction of nouns compared
to verbs). No main effect of severity, F(1, 7) = 4.32, p = .08, was detected.

Figure 1. The percentage of correct word retrieval in composite description and
conversational speech (%WR scores), in comparison with single-word (TAWF) naming
scores.

TABLE 2 Correlations across groups (mild and moderate, n = 14) among TAWF scores,
%WR (composite description and conversation), and %SV (composite description and
conversation)

                     TAWF    TAWF    Comp    Comp    Comp    Conv    Conv
                     nouns   verbs   nouns   verbs   verbs   nouns   verbs
                                     (%WR)   (%WR)   (%SV)   (%WR)   (%WR)
TAWF verbs           .95**
Comp nouns (%WR)     .78**   .87**
Comp verbs (%WR)     .80**   .83**   .77**
Comp verbs (%SV)     .84**   .84**   .71**   .85**
Conv nouns (%WR)     .84**   .89**   .97**   .79**   .71**
Conv verbs (%WR)     .80**   .88**   .84**   .97**   .83**   .84**
Conv verbs (%SV)     .68     .69**   .56     .49     .54     .61     .56

Comp = composite description; Conv = conversation.
** Correlation is significant at p < .007 (Bonferroni correction).
DISCUSSION
Whereas several measures have been proposed to analyse various aspects of
connected speech, quantification of word-finding difficulties in this context has
often been overlooked. Given the centrality of such difficulties to aphasia (Boles,
1998; Larfeuil & Le Dorze, 1997), the development of a measure to analyse
lexical retrieval in natural contexts appears essential. This study examined
several such measures (i.e., %WR, %SV, and %CERR) in an attempt to describe
clinically useful patterns and bridge a likely gap between frequently used
single-word measures of word retrieval and more complex, connected speaking
paradigms. Findings from the current study demonstrated a significant effect of
context, with superior word retrieval in connected speech compared to
confrontation naming, and a nonsignificant trend towards lexical class effects,
with more accurate retrieval of verbs than nouns. Confrontation naming scores
were strongly related to aphasia severity classification, but did not robustly
predict composite or conversational word retrieval scores within each severity group.

Figure 2. Percent Substantive Verbs (%SV) across severity groups and connected
speaking contexts.
Clinical utility: Feasibility and reliability
Conversational data are difficult to quantify (Perkins, 1995), time-consuming
(Crockford & Lesser, 1994; Togher, 2001), of questionable reliability
(Brookshire & Nicholas, 1994; Oelschlaeger & Thorne, 1999; Osborne et al.,
1998), and confounded by the complex interaction of extraneous factors (Doyle,
Thompson, Oleyar, Wambaugh, & Jackson, 1994; Jordan et al., 1997). This
study proposed a simple measure, %WR, to quantify one aspect of an
extraordinarily complicated entity. Whereas the limited reliability measures
described herein do not purport to establish the psychometric properties of %WR,
they do underscore the clinical utility of this metric. Oelschlaeger and Thorne
(1999), in their application of %CIUs to conversation, noted that previously
published reliability standards for highly controlled clinical measures (e.g.,
standardised tests) may be unrealistic as applied to naturally occurring language
phenomena. On the other hand, clinical decision making that affects individuals'
lives must, by definition, meet fairly high reliability standards (Crockford &
Lesser, 1994; Oelschlaeger & Thorne, 1999). Although no firm guidelines with
respect to precise reliability standards have been established, the range of inter-
and intra-rater reliability scores obtained in this study was similar to that
considered acceptable by Oelschlaeger and Thorne (i.e., > 80%). Importantly, the
fact that our raters were able to apply written scoring rules with at least 80%
agreement in the absence of any formal training in or discussion of the measure
supports the straightforward and intuitive nature of %WR. It is also noteworthy,
given the ever-increasing pressure to increase clinical assessment efficiency, that
these data were obtained within feasible clinical time demands for both
initial testing and subsequent analysis (Crockford & Lesser, 1994). Finally, the
nature of %WR and the supplementary analyses lends itself to the possibility of
online data measurement (Hickin et al., 2001), a much-needed step in the
advancement of functional communication measures (Togher, 2001).

Figure 3. Percent Corrected Errors (%CERR) across elicitation contexts. Note this
measure was inapplicable to the single-word (TAWF) verb scores of the mildly aphasic
group due to ceiling effects on this subtest (i.e., few opportunities for corrected error
scores). The n from which each percentage was determined varied from subject to subject
according to the number of word-finding errors elicited per context, with a range of 1–16
in the mild group and 4–45 in the moderate group.
Relationships among contexts
Results of this study demonstrated a significant effect of context on word
retrieval, with enhanced performance of subjects during connected speaking
tasks compared to confrontation naming. Whereas initial analyses based on the
entire subject sample (n = 14) demonstrated a high correlation between lexical
retrieval in single-word and connected speaking tasks, subsequent separate
correlational analyses of the mild and moderately aphasic groups' data failed to
detect such effects. That is, the high correlation obtained for the entire subject
sample appeared to reflect broad inter-group score differences between the mild
and moderate groups (i.e., floor/ceiling effects), rather than a strong predictive
effect within each group. Because of the small n and relatively restricted range of
scores included in the separate group analyses, however, these results should be
interpreted with caution (i.e., within-group analyses may have reflected lower
statistical power to detect significant effects compared to the between-group
analysis). Nonetheless, single-word TAWF scores, although highly predictive of
aphasia severity (mild vs moderate), did not appear strongly related to lexical
retrieval in composite naming and conversational speech within each aphasic
group.
The current identification of divergence between single-word confrontation
naming and discourse-level word retrieval is consistent with previous data
(Pashek & Tompkins, 2002; Williams & Canter, 1987), and has important
implications in terms of how researchers and clinicians should assess and treat
their patients with aphasia. Contemporary cognitive neuropsychological
approaches that promote specific, model-based treatments for hypothesised
deficits (Hillis, 1998; Raymer & Gonzalez-Rothi, 2001) and that have become
the focus of much recent research, primarily address language deficits at the
single-word level. In fact, few cognitive neuropsychological model-based
treatment studies have addressed discourse-level tasks, and those that have (e.g.,
McNeil, Doyle, Spencer, Jackson-Goda, Flores, & Small, 1997) reported
minimal transfer of single-word therapy gains to discourse contexts. Other
aphasia treatment studies have similarly found that conversational gains may be
more resistant to treatment than less natural communicative behaviours (e.g.,
Larfeuil & Le Dorze, 1997; Murray & Karcher, 2000). Alternatively, work
utilising conversational analysis frameworks has demonstrated that aphasia
therapy targeted directly at conversational behaviours may create ecologically
valid change in daily interactions with communication partners (Hopper,
Holland, & Rewega, 2002; Lock et al., 2001; Wilkinson et al., 1998).
Collectively, the current and previous findings endorse incorporating
discourse-level tasks into aphasia assessment and treatment protocols.
Further data are needed to establish which of the two connected speaking
contexts utilised in this study best lends itself to accurate and valid assessment of
naming. Each context entails a number of benefits and caveats; for example,
composite description tasks have inherent limitations such as practice effects and
a lack of interactional opportunities (Shewan, 1988; Togher, 2001), but offer the
practical advantages of consistency and a priori targets (Hickin et al., 2001;
Shewan, 1988). Likewise, the interchange of natural communication is
theoretically the ideal setting for aphasia assessment and remediation;
nonetheless, difficulties associated with applying consistent analytical measures
to the conversational speech of aphasic patients are well recognised (Crockford &
Lesser, 1994; Edwards, 1998; Marshall & Pound, 1997). For example,
conversation may encourage various tactics (e.g., indefinite terms, anticipating/avoiding
difficult words) that "allow the patient to maintain a socially acceptable
level of fluency in the face of severe difficulty in finding the right word"
(Vermeulen et al., 1989, p. 262), thereby decreasing the likelihood of detecting
word-finding errors. In contrast, developing similar compensatory strategies has
been recommended as a desired and important therapeutic outcome (e.g.,
Holland, 1994). In such cases, %WR could nevertheless function as a critical
outcome measure: that is, albeit limited to the perspective of lexical-semantic
output, %WR could quantify patients' ability to acquire these strategies, and thus
to function appropriately and meaningfully in naturalistic communicative
conditions.
The type of discourse task (i.e., composite description vs conversation)
appeared to affect specific aspects of lexical retrieval (i.e., accuracy or %WR;
semantic complexity of verbs or %SV), a finding consistent with previous
reports of variation in word retrieval according to numerous task-related
variables (e.g., Cooper, 1990; Doyle et al., 1994). The current findings suggested
a general trend towards more accurate verb retrieval in conversational contexts
compared to composite description, at the expense of semantic complexity,
particularly for mild subjects. Thus, the choice of which or how many discourse-level tasks to incorporate into assessment and treatment (composite description
and/or conversational samples) may be, in part, a function of ultimate treatment
goals with respect to, for example, verb naming or sentence construction.


Effect of grammatical class


Previous research is inconsistent and inconclusive with respect to grammatical
class effects in aphasic naming, with patterns of better noun than verb retrieval
(Breedin et al., 1998; Edwards, 1998; Williams & Canter, 1987), as well as the
inverse noted (Pashek & Tompkins, 2002; Zingeser & Berndt, 1990). The
current findings tentatively support the data of Pashek and Tompkins (2002), in
that a trend towards superior retrieval of verbs compared to nouns was observed
across mild and moderate patients with aphasia. These results may reflect
predominant characteristics of our sample of participants (e.g., fluency, lesion
site, aphasia type). That is, superiority of noun over verb retrieval has been
associated with agrammatic aphasia and anterior left hemisphere lesions;
conversely, the opposite pattern has been associated typically with anomic
aphasia and middle/inferior left temporal lesions (Damasio & Tranel, 1993;
Hillis, Tuffiash, Wityk, & Barker, 2002; Zingeser & Berndt, 1990). Of the 14
participants in this study, only 3 were judged to be agrammatic. Although
subjective analysis of these subjects' data demonstrated a mild discrepancy in
patterns of noun/verb retrieval compared to the more fluent speakers (i.e.,
slightly higher noun than verb retrieval scores in the composite condition), the
fact that this discrepancy did not extend to the single-word and conversational
contexts is inconsistent with broad grammatical class differences among subjects
based on fluency/aphasia type alone. Rather, the nonsignificant grammatical
class trend in this study is more likely a function of a variety of stimulus factors
such as word length, frequency of occurrence, and semantic complexity (Berndt et
al., 1997a; Breedin et al., 1998; Pashek & Tompkins, 2002). Because this study's
intent was to explore a clinically feasible and ecologically valid discourse
measure, these factors were left to vary, and therefore, sound conclusions
regarding lexical organisation and processing cannot be drawn. A meaningful
implication of these results, however, is the inadequacy of limited grammatical
class inclusion (e.g., nouns only) when assessing naming in adults with aphasia.
Supplementary measures
The Percent Substantive Verbs measure (%SV) proved to be a relatively simple
means of extracting important lexical retrieval information from the language
samples. In fact, for several subjects, %SV was sensitive to possible deficits
undetected by %WR alone. For example, although two subjects, MI2 and MI5,
demonstrated 100% accurate verb retrieval in conversational speech, their %SV
scores of just 33% and 34%, respectively, indicated that these individuals relied
heavily on light verbs to communicate meaning (cf. Berndt et al., 1997a, 1997b;
Breedin et al., 1998). Thus, a decision to utilise %SV or %WR when assessing
word retrieval may be influenced by aphasia severity: %SV may be more useful
than, or an important complement to, %WR to gauge verb retrieval in mild
patients, given that many mild subjects were at ceiling with the latter measure.

That composite description elicited higher %SV compared to conversation for
mild subjects (see Figure 2) also has assessment and treatment ramifications. For
example, one strategy to encourage conversational substantive verb production
may involve eliciting specific verbs during picture description tasks. Conversely,
the goal with moderate patients may be to encourage word retrieval by whatever
means necessary; in that case, %WR may provide more valuable information
than %SV regarding initial status and measurement of treatment outcomes.
It is noteworthy that ideal %SV values for non-brain-damaged individuals
have yet to be established. Although previous research has reported
approximately 65–70% substantive verb production by control subjects (Berndt
et al., 1997b), these data were collected via sentence-level tasks and may be
inapplicable to expository and conversational speech samples. Future research,
therefore, should establish appropriate %SV norms by which to gauge the level of
lexical-semantic complexity of verbs produced by patients with aphasia.
The percentage of corrected word-finding episodes (%CERR) has previously
been utilised to look beyond lexical retrieval accuracy and examine lexical
retrieval efficiency (Larfeuil & Le Dorze, 1997). A caveat of the current study
was that a few mildly aphasic subjects had 100% accurate word retrieval in some
contexts, and consequently, %CERR could not be applied. Therefore, statistical
analyses were necessarily performed on a smaller data set, and thus, results
should be interpreted with caution. Nevertheless, the significant main effect of
context (i.e., higher %CERR in connected speaking vs single-word contexts)
dovetails with more accurate general lexical retrieval abilities, as measured by
%WR, in connected speaking than in confrontation naming tasks. That is,
connected speech facilitated not only general lexical retrieval (Pashek &
Tompkins, 2002), but also the efficiency of strategies to correct retrieval failures.
Conclusion
In summary, use of confrontation naming procedures to assess aphasia severity
and to demonstrate treatment progress is endorsed by a large number of
effectiveness studies, and reflects common, current clinical practice (Doesborgh
et al., 2002; Jordan et al., 1997). Clearly, single-word naming tests assess the
impairment level of functioning (i.e., structure/function limitations; World
Health Organisation, 2001), and yet are often used to make predictions about
daily communication abilities in patients with aphasia. Healthcare system
changes have fostered a growing awareness of our obligation to address directly
the functional (i.e., activity limitations) and personal (i.e., participation
restrictions) consequences of aphasia (World Health Organisation, 2001). The
issue has grown more complicated still, as the search for quantifiable
conversational measures has historically encountered numerous obstacles. Use of
a simple and easily quantifiable lexical retrieval measure, %WR, may be an
option to extend current assessment methodology and reconcile issues of
ecological validity and clinical feasibility. Although many questions regarding
the utility of this measure remain (e.g., feasibility of %WR for online
measurement, its sensitivity to treatment outcomes, performance of
non-brain-damaged individuals as measured by this system, effects of complex or
abstract conversational topics), the theoretical implications and clinical
ramifications of the current data provide a solid basis for further exploration of
%WR and related measures (e.g., %SV, %CERR) in our continual efforts to gauge
legitimately the strengths and needs of our patients with aphasia.

REFERENCES
Berndt, R. S., Mitchum, C. C., Haendiges, A. N., & Sandson, J. (1997a). Verb retrieval in
aphasia, 1. Characterizing single word impairments. Brain and Language, 56,
68–106.
Berndt, R. S., Mitchum, C. C., Haendiges, A. N., & Sandson, J. (1997b). Verb retrieval in
aphasia, 2. Relationship to sentence processing. Brain and Language, 56, 107–137.
Boles, L. (1998). Conversational discourse analysis as a method for evaluating progress in
aphasia: A case report. Journal of Communication Disorders, 31, 261–274.
Breedin, S. D., Saffran, E. M., & Schwartz, M. F. (1998). Semantic factors in verb
retrieval: An effect of complexity. Brain and Language, 63, 1–31.
Brookshire, R. H., & Nicholas, L. E. (1994). Test–retest stability of measures of
connected speech in aphasia. Clinical Aphasiology, 22, 119–133.
Brown, C. S., & Cullinan, W. L. (1981). Word-retrieval difficulty and disfluent speech in
adult anomic speakers. Journal of Speech and Hearing Research, 24, 358–365.
Cooper, P. V. (1990). Discourse production and normal aging: Performance on oral
picture description tasks. Journal of Gerontology, 45, 210–214.
Crockford, C., & Lesser, R. (1994). Assessing functional communication in aphasia:
Clinical utility and time demands of three methods. European Journal of Disorders
of Communication, 29, 165–182.
Damasio, A. R., & Tranel, D. (1993). Nouns and verbs are retrieved with differently
distributed neural systems. Proceedings of the National Academy of Sciences, 90,
4957–4960.
Doesborgh, S. J. C., van de Sandt-Koenderman, W. M. E., Dippel, D. W. J., van
Harskamp, F., Koudstaal, P. J., & Visch-Brink, E. G. (2002). The impact of linguistic
deficits on verbal communication. Aphasiology, 16(4/5/6), 413–423.
Dollaghan, C. A., & Campbell, T. F. (1992). A procedure for classifying disruptions in
spontaneous language samples. Topics in Language Disorders, 12, 56–68.
Doyle, P. J., Thompson, C. K., Oleyar, K., Wambaugh, J., & Jackson, A. (1994). The
effects of setting variables on conversational discourse in normal and aphasic adults.
Clinical Aphasiology, 22, 135–143.
Edwards, S. (1998). Single words are not enough: Verbs, grammar and fluent aphasia.
International Journal of Language and Communication Disorders, 33 (Supplement),
190–195.
German, D. J. (1990). The Test of Adolescent and Adult Word-Finding. Austin, TX: Pro-Ed.


Helm-Estabrooks, N. (1997). Treatment of aphasic naming problems. In H. Goodglass &
A. Wingfield (Eds.), Anomia (pp. 189–202). San Diego, CA: Academic Press.
Hickin, J., Best, W., Herbert, R., Howard, D., & Osborne, F. (2001). Treatment of word
retrieval in aphasia: Generalisation to conversational speech. International Journal
of Language and Communication Disorders, 36(Suppl.), 3–8.
Hillis, A. E. (1998). What's in a name? A model of the cognitive processes underlying
object naming. In E. G. Visch-Brink & R. Bastiaanse (Eds.), Linguistic levels in
aphasiology (pp. 35–48). San Diego, CA: Singular.
Hillis, A. E., Tuffiash, E., Wityk, R. J., & Barker, P. B. (2002). Regions of neural
dysfunction associated with impaired naming of actions and objects in acute stroke.
Cognitive Neuropsychology, 19(6), 523–534.
Holland, A. L. (1994). Cognitive neuropsychological theory and treatment for aphasia:
Exploring the strengths and limitations. Clinical Aphasiology, 22, 275–282.
Hopper, T., Holland, A., & Rewega, M. (2002). Conversational coaching: Treatment
outcomes and future directions. Aphasiology, 16(7), 745–761.
Ingles, J. L., Mate-Kole, C. C., & Connolly, J. F. (1996). Evidence for multiple routes of
speech production in a case of fluent aphasia. Cortex, 32(2), 199–219.
Jordan, F., Ward, K., & Cremona-Meteyard, S. (1997). Word-finding in the conversational
discourse of children with closed head injury. Aphasiology, 11(9), 877–888.
Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston naming test. Philadelphia:
Lea & Febiger.
Kemmerer, D., & Tranel, D. (2000a). Verb retrieval in brain-damaged subjects: 1.
Analysis of stimulus, lexical, and conceptual factors. Brain and Language, 73,
347–392.
Kemmerer, D., & Tranel, D. (2000b). Verb retrieval in brain-damaged subjects: 2.
Analysis of errors. Brain and Language, 73, 393–420.
Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.
Larfeuil, C., & Le Dorze, G. (1997). An analysis of the word-finding difficulties and the
content of the discourse of recent and chronic aphasic speakers. Aphasiology, 11(8),
783–811.
Lock, S., Wilkinson, R., Bryan, K., Maxim, J., Edmundson, A., Bruce, C. et al. (2001).
Supporting partners of people with aphasia in relationships and conversation
(SPPARC). International Journal of Language and Communication Disorders, 36
(Supplement), 25–30.
MacWhinney, B. (1995). The CHILDES project: Tools for analyzing talk (pp. 41–45).
Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Manning, L., & Warrington, E. K. (1996). Two routes to naming: A case study.
Neuropsychologia, 34(8), 809–817.
Marshall, J., & Pound, C. (1997). Difficulties with discourse. Aphasiology, 11(6),
625–629.
McNeil, M. R., Doyle, P. J., Spencer, K. A., Jackson-Goda, A., Flores, D., & Small, S. L.
(1997). A double-blind, placebo-controlled study of pharmacological and behavioral
treatment of lexical-semantic deficits in aphasia. Aphasiology, 11(4/5), 385–400.
Murray, L. L., & Karcher, L. (2000). A treatment for written verb retrieval and sentence
construction skills. Aphasiology, 14, 585–602.
Nicholas, L. E., & Brookshire, R. H. (1993a). A system for scoring main concepts in the
discourse of non-brain-damaged and aphasic speakers. Clinical Aphasiology, 21,
87–99.


Nicholas, L. E., & Brookshire, R. H. (1993b). A system for quantifying the
informativeness and efficiency of the connected speech of adults with aphasia.
Journal of Speech and Hearing Research, 36, 338–350.
Nicholas, M., Obler, L. K., Albert, M. L., & Helm-Estabrooks, N. (1985). Empty speech
in Alzheimer's disease and fluent aphasia. Journal of Speech and Hearing Research,
28, 405–410.
Oelschlaeger, M. L., & Thorne, J. C. (1999). Application of the Correct Information Unit
analysis to the naturally occurring conversation of a person with aphasia. Journal of
Speech, Language, and Hearing Research, 42, 636–648.
Osborne, F., Hickin, J., Best, W., & Howard, D. (1998). Treating word-finding
difficulties: Beyond picture naming. International Journal of Language and
Communication Disorders, 33 (Supplement), 208–213.
Pashek, G. V., & Tompkins, C. A. (2002). Context and word class influences on lexical
retrieval in aphasia. Aphasiology, 16(3), 261–286.
Penn, C. (2000). Paying attention to conversation. Brain and Language, 71, 185–189.
Perkins, L. (1995). Applying conversation analysis to aphasia: Clinical implications and
analytic issues. European Journal of Disorders of Communication, 30, 372–383.
Raymer, A. M., & Gonzalez-Rothi, L. J. (2001). Cognitive approaches to impairments of
word comprehension and production. In R. Chapey (Ed.), Language intervention
strategies in aphasia and related neurogenic communication disorders
(pp. 524–550). Philadelphia, PA: Lippincott, Williams, & Wilkins.
Schwartz, M. F., & Hodgson, C. (2002). A new multiword naming deficit: Evidence and
interpretation. Cognitive Neuropsychology, 19(3), 263–288.
Segalowitz, S. J., & Lane, K. C. (2000). Lexical access of function versus content words.
Brain and Language, 75(3), 376–389.
Shewan, C. M. (1988). The Shewan Spontaneous Language Analysis (SSLA) system for
aphasic adults: Description, reliability, and validity. Journal of Communication
Disorders, 21, 103–138.
Shewan, C. M., & Bandur, D. L. (1986). Treatment of aphasia: A language-oriented
approach (pp. 243–259). London: Taylor & Francis.
Snow, P., Douglas, J., & Ponsford, J. (1995). Discourse assessment following traumatic
brain injury: A pilot study examining some demographic and methodological issues.
Aphasiology, 9, 365–380.
Togher, L. (2001). Discourse sampling in the 21st century. Journal of Communication
Disorders, 34, 131–150.
Vermeulen, J., Bastiaanse, R., & Van Wageningen, B. (1989). Spontaneous speech in
aphasia: A correlational study. Brain and Language, 36, 252–274.
Wilkinson, R., Bryan, K., Lock, S., Bayley, K., Maxim, J., Bruce, C. et al. (1998).
Therapy using conversation analysis: Helping couples adapt to aphasia in
conversation. International Journal of Language and Communication Disorders, 33
(Supplement), 144–149.
Williams, S. E., & Canter, G. J. (1982). The influence of situational context on naming
performance in aphasic syndromes. Brain and Language, 17, 92–106.
Williams, S. E., & Canter, G. J. (1987). Action-naming performance in four syndromes of
aphasia. Brain and Language, 32, 124–136.
Wilshire, C. E., & McCarthy, R. A. (2002). Evidence for a context-sensitive word retrieval
disorder in a case of nonfluent aphasia. Cognitive Neuropsychology, 19(2), 165–186.


World Health Organisation (2001). ICIDH-2: International Classification of
Functioning, Disability and Health. Geneva, Switzerland: WHO.
Zingeser, L. B., & Berndt, R. S. (1990). Retrieval of nouns and verbs in agrammatism and
anomia. Brain and Language, 39(1), 14–32.

APPENDIX
%WR SCORING PROTOCOL
Count the first 300¹ words of the sample (Larfeuil & Le Dorze, 1997;
Vermeulen et al., 1989).
Count the number of nouns and verbs within the 300-word corpus, with the
following exceptions:
(a) Modalising speech² (cf. Larfeuil & Le Dorze, 1997) is excluded from
further analysis (i.e., noun/verb counts). This prevents artificial
inflation of a speaker's noun/verb output.
(b) Nouns and verbs that are part of circumlocutions, self-corrections, or
repetitions/stalling are counted only in their initial form (Larfeuil & Le Dorze,
1997; Vermeulen et al., 1989).
(c) If a word is repeated for emphasis (e.g., "shark, shark, shark!") or to
denote different items (e.g., "lifeguard, lifeguard" while pointing to one, then
the other), it is counted each time; for other instances of repetition (e.g.,
stalling), the word is counted just once (MacWhinney, 1995).
(d) The verb "to be" is not counted (Breedin et al., 1998).
(e) Pronouns and prepositions are not counted (Segalowitz & Lane, 2000).
(f) Numerals are not counted as nouns (Segalowitz & Lane, 2000).
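The counting exceptions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' scoring tool: it assumes the transcript has already been segmented into words and hand-coded, so that each token carries a part-of-speech label (the tag names are hypothetical) plus flags marking modalising speech and non-initial copies within circumlocutions, self-corrections, or stalling. Words repeated for emphasis or to denote different items are simply left unflagged, so they are counted each time, as rule (c) requires.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Forms of "to be", excluded under rule (d); illustrative, not exhaustive.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


@dataclass
class Token:
    word: str
    pos: str                    # hypothetical hand-coded tag: "NOUN", "VERB", "PRON", "PREP", "NUM", ...
    modalising: bool = False    # part of modalising speech, rule (a)
    stall_repeat: bool = False  # non-initial copy in a circumlocution, self-correction or stall, rules (b)-(c)


def count_nouns_verbs(tokens: List[Token], limit: int = 300) -> Tuple[int, int]:
    """Return (nouns, verbs) in the first `limit` words after exclusions (a)-(f)."""
    nouns = verbs = 0
    for t in tokens[:limit]:  # the corpus is the first `limit` words of the sample
        if t.modalising or t.stall_repeat:
            continue  # (a) modalising speech; (b)/(c) only the initial form counts
        if t.pos == "NOUN":
            nouns += 1  # (f) numerals carry their own tag, so they never land here
        elif t.pos == "VERB" and t.word.lower() not in BE_FORMS:
            verbs += 1  # (d) forms of "to be" are skipped
        # (e) pronouns, prepositions and all other tags fall through uncounted
    return nouns, verbs


sample = [
    Token("I", "PRON", modalising=True),
    Token("think", "VERB", modalising=True),
    Token("the", "DET"),
    Token("shark", "NOUN"),
    Token("shark", "NOUN"),  # repeated for emphasis: counted again
    Token("is", "VERB"),
    Token("chasing", "VERB"),
    Token("the", "DET"),
    Token("the", "DET", stall_repeat=True),
    Token("two", "NUM"),
    Token("swimmers", "NOUN"),
]
print(count_nouns_verbs(sample))  # (3, 1)
```

For samples shorter than 300 words, passing a smaller `limit` (e.g., 200) would follow the fallback described in footnote 1.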
Noun and verb word-finding episodes/errors include:
(a) Words immediately preceded by prolonged (filled or unfilled) hesitation
(2+ seconds; Crockford & Lesser, 1994; Dollaghan & Campbell, 1992;
Pashek & Tompkins, 2002); if the pause is utterance-initial, however, it is
ignored (due to the possibility of sentence construction deficits).³
(b) Words preceded or accompanied by comments indicating difficulty.
(c) Self-corrections (Pashek & Tompkins, 2002).
(d) Paraphasias (semantic, phonemic, or unrelated).
(e) Deletions (Kemmerer & Tranel, 2000b): e.g., obvious deletion of a
syntactic constituent (main verb, object noun) in a required context.⁴
(f) Overuse of indefinite terms (Nicholas et al., 1985).⁵
(g) Overuse of pronouns, or pronouns without antecedents (Hickin et al., 2001;
Nicholas et al., 1985).

1 If 300 words were not produced in the sample, the first 150–200 words were used
(Berndt et al., 1997; Brown & Cullinan, 1981).
2 Larfeuil and Le Dorze (1997) defined modalising speech as "all utterances in which
the speaker includes himself/herself in the discourse [e.g.], verbs and verbal phrases
whose function is not to express the speaker's feelings but rather to predicate, e.g., 'I
think that'" (p. 788).
3 Pashek and Tompkins (2002) noted possible confounds of the 2+ second rule for
indicating word-finding difficulty; they suggest that hesitations be analysed relative to
overall rate of speech. Therefore, language samples of patients whose response times or
speaking rates were judged clinically to be slow were analysed such that relative
hesitations (i.e., > 2 seconds longer than the average pause time between words in fluent
speech) were noted rather than 2+ second pauses, per se.
4 With the exception of obvious deletions, additional errors of grammaticality (e.g., "I had
going") were not considered errors of word retrieval if the subject demonstrated
evidence of having retrieved the correct lexical constituent.
5 The inclusion of indefinite terms [see (f)] is controversial in some respects, as normal
speakers have been noted to utilise such terms nearly as often as brain-injured speakers in
some cases (Snow, Douglas, & Ponsford, 1995). Therefore, the appropriateness or
inappropriateness of such terms in the context of the discourse (i.e., whether or not an
intended referent could be interpolated) was noted prior to definitive scoring. Subjects
were given the benefit of the doubt in ambiguous situations.
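Rule (a) of the word-finding criteria, together with the adjustment for slow speakers in footnote 3, amounts to a simple decision rule. The sketch below is illustrative only: it assumes pause durations (in seconds) and each slow speaker's average between-word pause have already been measured from the recording, and the function and parameter names are hypothetical.

```python
def is_hesitation_episode(pause_s: float, utterance_initial: bool,
                          slow_speaker: bool = False,
                          mean_pause_s: float = 0.0) -> bool:
    """Decide whether a pause before a noun or verb marks a word-finding episode.

    Default rule (a): a filled or unfilled pause of 2+ seconds counts.
    Footnote 3: for speakers judged clinically slow, the pause must instead
    exceed that speaker's average between-word pause by more than 2 seconds.
    Utterance-initial pauses are ignored in either case.
    """
    if utterance_initial:
        return False  # may reflect sentence construction rather than word retrieval
    if slow_speaker:
        return pause_s - mean_pause_s > 2.0
    return pause_s >= 2.0


print(is_hesitation_episode(2.5, utterance_initial=False))  # True
print(is_hesitation_episode(2.5, utterance_initial=True))   # False
print(is_hesitation_episode(2.5, False, slow_speaker=True, mean_pause_s=1.0))  # False
```

The same 2.5-second pause is scored differently depending on the speaker's baseline, which is the point of the relative criterion.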

Narrative and conversational discourse of adults with closed head injuries and
non-brain-injured adults: A discriminant analysis
Carl A. Coelho
University of Connecticut, and Hospital for Special Care, New
Britain, CT, USA
Kathleen M. Youse and Karen N. Le
University of Connecticut, USA
Richard Feinn
University of Connecticut Health Center, Farmington, USA
Background: Although there is general agreement regarding the
clinical utility of discourse analyses for detecting the often subtle
communicative impairments following closed head injuries (CHI),
there is little consensus regarding discourse elicitation or analysis
procedures. Consequently it has been difficult to compare findings
across studies.
Aims: In an effort to facilitate a movement towards the adoption of
a more consistent methodology for the assessment of discourse
abilities, the current study examined several commonly used
measures of discourse performance and the accuracy with which
these measures were able to distinguish individuals with CHI from
non-brain-injured (NBI) controls. Previous studies have suggested
that conversation is less demanding than narrative discourse because
such narratives require greater manipulation of extended units of
language while conversational discourse can be maintained with
minimal responses (Chapman, 1997; Galski, Tompkins, & Johnston,
1998). On the basis of these reports it was hypothesised that the
measures of narrative story performance would more accurately
discriminate the participant groups than conversational measures.
Methods & Procedures: Discourse samples were elicited from 32
adults with CHI and 43 NBI adults. Discourse samples included two
story narratives, generation and retelling, and 15 minutes of
conversation. A variety of discourse analyses were performed
including story narrative measures of grammatical complexity,
cohesive adequacy, and story grammar. Measures of conversation
included appropriateness and topic initiation. Discriminant function
analyses (DFA) were then employed to determine the accuracy of the
selected measures in classifying the participants into their respective
groups.
Outcomes & Results: Results of the DFA with only the story
narrative measures indicated that 70% of the cases, 64.5% of the
CHI group, and 74.4% of the NBI group were accurately classified.
This finding was not significant, suggesting that the story narrative
measures did not reliably discriminate the CHI from the NBI
participants. The DFA with the conversational measures correctly
classified over 77% of the cases, 78.1% of the CHI participants, and
72.1% of the NBI group. This finding was significant, which
suggests that the measures of conversational discourse were better
able to discriminate the participant groups. A third DFA was
performed, with all of the story narrative and conversational
discourse measures included, which revealed that the conversational
measures, comments and adequate plus responses, and the story
narrative measure, T-units within episode structure in the generation
task, made the greatest contributions to discriminating between the
groups. Overall, group membership was correctly classified by the
DFA in 81% of the cases, 84.4% of the CHI group, and 77.5% of the
NBI participants. This finding was significant, suggesting that these
three discourse measures discriminated the two participant groups
with the highest degree of reliability.
Conclusions: These findings did not support the hypothesis that
the narrative discourse measures would more accurately predict
group membership of the CHI and NBI participants than the
conversational measures. A variety of factors may account for these
findings including the interactive nature of conversation as well as
social factors which appear to make this genre more difficult for
individuals with CHI and a more sensitive index of their cognitive-communicative impairments.

Address correspondence to: Carl A. Coelho PhD, Communication
Sciences Dept, University of Connecticut, Unit 1085, Storrs,
Connecticut 06268-1085, USA. Email: coelho@uconn.edu
This project was supported by grants from the University of Connecticut
Research Foundation, and the Hospital for Special Care.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html
DOI: 10.1080/02687030344000111

NARRATIVE AND CONVERSATIONAL DISCOURSE 101

The clinical utility of including discourse analyses in the assessment procedures
for cognitive-communicative deficits secondary to closed head injury (CHI) in
adults has been documented by a variety of recent investigations (e.g., Coelho,
Liles, & Duffy, 1991, 1995; Hartley & Jensen, 1991; Mentis & Prutting, 1987;
Snow, Douglas, & Ponsford, 1995, 1997). Although there is general agreement
among these studies regarding the sensitivity of discourse analyses for detecting
the often subtle communicative impairments following CHI, there is little
consensus regarding discourse elicitation or analysis procedures. Consequently it
has been difficult to compare findings across studies. For example, two recent
studies compared the discourse performance of CHI and non-brain-injured (NBI)
controls. In the first study (Coelho, 2002) narratives were elicited in two story
tasks, retelling and generation, from two groups of adults, 55 CHI and 47 NBI.
Narratives were analysed at the levels of sentence production, cohesive adequacy,
and story grammar. Discourse performance was then compared across groups
and tasks. Results indicated that two measures distinguished the groups. The
CHI participants produced significantly fewer words per T-unit and fewer T-units within episode structure than the NBI group. In addition, significant
differences were noted for all five discourse measures (words per T-unit,
subordinate clauses per T-unit, cohesive adequacy, number of complete
episodes, and proportion of T-units within episode structure) across the story
tasks. All participants, CHI and NBI, produced longer and more grammatically
complex T-units in the story generation task than in story retelling. However,
cohesive adequacy and story grammar were better in the story retelling task as
compared to the story generation task. In the second study (Coelho, Youse, &
Le, 2002) samples of conversation were elicited from 32 individuals with CHI
and 43 NBI adults, and analysed for various dimensions of appropriateness and
topic initiation. Findings indicated that the CHI group produced significantly
fewer comments (i.e., utterances for which no response was explicitly
demanded) than the NBI participants. In addition, the CHI participants produced
significantly more adequate plus responses (i.e., providing more information than
was requested) than the NBI group.
The findings of these studies illustrate some of the difficulties involved in the
measurement of narrative ability, namely the pragmatic nature of the task.
Narrative performance may be influenced by a variety of contextual parameters
such as: listener characteristics, elicitation procedures, presentation medium,
complexity of content, structural complexity, social function, and manner of
textual coherence (Liles, Duffy, Merritt, & Purcell, 1995; Togher, 2001). In
addition to contextual influences, the measurement of narrative performance is
further complicated by the multiplicity of measures that have been applied. For
example, narrative ability may be described in terms of the speaker's social role,
cognitive organisation, linguistic structure of the text, and sentence-level
complexity (Liles et al., 1995; Togher, 2001).
In an effort to facilitate a movement towards the adoption of a more consistent
and clinically efficient methodology for the assessment of discourse abilities, the
current study examined several commonly used measures of discourse
performance in story narratives and conversation. Specifically, we were
interested in the accuracy with which these measures were able to distinguish
individuals with CHI from NBI controls. The measures applied to story
narratives included within-sentence analyses such as grammatical complexity
and between-sentence analyses such as cohesive adequacy and story grammar.
Conversational measures included dimensions of response appropriateness and
topic initiation. Discriminant function analyses were then employed to determine
which of the selected measures were most effective at classifying large groups of
adults with CHI and NBI adults.
Ylvisaker, Szekeres, and Feeney (2001) have suggested that discourse
proficiency involves an interaction of cognitive and linguistic organisational
processes. Story narratives were selected for study because they provide the
opportunity for the analysis of two separate sources of information related to
cognitive and linguistic levels of narrative organisation. The first level, macro-organisation, relates to story grammar. At this level "information is organised in
terms of how intentions and events are logically related in time or through
cause–effect relations reflected in general experience" (Liles et al., 1995, p. 423).
Because the interpretation of content is facilitated by the speaker/listener's
access to general cognitive schemata, this level of narrative organisation is
hypothesised to go beyond the content of a specific text (Mandler, 1982). The
second organisational level, micro-organisation, involves linguistic organisation
of the text, both within and across sentence boundaries. At this level the text is
processed as a closed unit (Liles et al., 1995).
The complexities of conversational discourse have been well described in a
number of studies (Doyle, Goda, & Spencer, 1995; Mackenzie, 2000; Togher,
2001; Togher, Hand, & Code, 1999; Wilkinson, 1999). Effective participation in
a conversation is dependent on a variety of factors such as: topic maintenance,
turn taking, appropriate referencing, sensitivity to the conversational partner, and
general cognitive abilities such as attention, vigilance, and memory. Somewhat
contrary to this view, other studies have suggested that conversation is less
demanding than narrative discourse, because such narratives require greater
manipulation of extended units of language while conversational discourse can
be maintained with minimal responses (Chapman, 1997; Galski et al., 1998).
In the present study we hypothesised that the measures of narrative story
performance would more accurately discriminate the participant groups than
conversational measures. It was predicted that because proficiency in story
narrative production is heavily dependent on cognitive and linguistic
organisational skills, this discourse genre would be more sensitive to the
cognitive-linguistic dysfunction associated with CHI than conversational
discourse. It was also hypothesised that entering all of the narrative and
conversational discourse measures into a single step-wise discriminant function
analysis would increase the accuracy with which the participant groups could be
discriminated.


METHOD
Participants
CHI. A total of 32 native speakers of English who had sustained a CHI were
studied. Participants were selected because they had recovered a high level of
functional language, that is, they had achieved fluent conversation and did not
demonstrate any significant deficits on traditional clinical language tests. In
addition, participants were recruited to represent a range of socioeconomic
backgrounds (see below).
All CHI participants met the following criteria: (a) no reported history of
substance abuse or psychiatric illness; (b) visual acuity and visual perceptual
abilities adequate to distinguish stimulus materials as determined by screening
procedures; (c) hearing acuity adequate to follow directions in each task as
determined by screening procedures; (d) an aphasia quotient (AQ) from the
Western Aphasia Battery (Kertesz, 1982) above 93; (e) no significant motor
speech disorder as determined by an experienced speech-language pathologist;
(f) Rancho Los Amigos Level of Cognitive Functioning (Hagen, Malkmus, &
Durham, 1980) of VII (automatic-appropriate) or above; (g) Galveston
Orientation and Amnesia Test (Levin, O'Donnell, & Grossman, 1979) score of 75
or above; and (h) a score of 120 or above on the Dementia Rating Scale (Mattis,
1976), a general screen of cognitive processing. The CHI group consisted of 8
females and 24 males ranging in age from 16 to 69 years (mean = 31.7 years). Four
members of this group were African-American and the remainder Caucasian.
Years of education for the CHI group ranged from 10 to 21 (mean = 13.2). The CHI
participants were also assigned to one of three socioeconomic groups:
Professional, Skilled Worker, or Unskilled Worker on the basis of the
Hollingshead rating (Hollingshead, 1972) (see Coelho et al., 2002 for a
description). The group consisted of 11 professionals, 10 skilled workers, and 11
unskilled workers. All of the CHI participants' injuries were rated as either
moderate (duration of coma less than 6 hours) or severe (duration of coma greater
than 6 hours) on the basis of criteria established by Lezak (1995). Time
post onset ranged from 1 to 99 months (mean = 12.8 months).
NBI. A total of 43 hospital employees, working in a variety of capacities, who
were native speakers of English made up the NBI group. No individual in this
group reported a history of neurologic or psychiatric disease, or substance abuse.
NBI participants were also selected on the basis of socioeconomic level.
Attempts were also made to match these individuals, as closely as possible, with
the CHI participants on the basis of age and gender. There were 30 males and 13
females; ages ranged from 16 to 63 years (mean = 31.9 years). Two
individuals from this group were African-American and 41 Caucasian. Level of
education ranged from 11 to 24 years (mean = 15.3). With regard to
socioeconomic status, the NBI group consisted of 15 professionals, 10 skilled
workers, and 18 unskilled workers.


Discourse elicitation procedures


Two genres of discourse were elicited from all participants: stories under two
conditions, Retelling and Generation, and conversation.
Story retelling task. Subjects were presented with the picture story, The Bear and
the Fly (Winter, 1976), by filmstrip projector on a 23 cm × 30.5 cm screen. The
picture story has 19 frames with no soundtrack. After viewing the filmstrip the
subjects were given the following instruction: "Tell me that story."
Story generation task. Subjects were presented with a copy of the Norman
Rockwell painting, The Runaway. The subjects were given the following
instruction: "Tell me a story about what you think is happening in this picture."
The picture remained in view of the examiner and subject until the task was
completed.
Conversation. Each of the individuals, CHI and NBI, was individually brought
into a quiet room by the examiner. He introduced himself to each participant and
stated that he was interested in learning more about conversational behaviour.
Each participant was then engaged in a 15-minute conversation. The examiner
and co-interactor in each of the conversations was a 42-year-old Caucasian male
with approximately 22 years of education working as a speech-language
pathologist. The examiner was essentially a stranger to all of the CHI and NBI
participants prior to the conversations. Most conversations were
initiated by the examiner with the question "Why are you here at the hospital
today?" Each conversation was audiotaped and each recording transcribed
verbatim with each utterance being assigned to one of the speakers (examiner or
participant).
Data collection
Each story and conversation was audiotaped and later transcribed verbatim.
Transcriptions of the stories were distributed into T-units (i.e., an independent
clause plus any subordinate clauses associated with it) prior to analysis,
following the conventions described by Liles (1985). For the conversations each
utterance was assigned to one of the speakers, examiner or participant. Any
discourse samples that were judged to be inconsistent with the intended elicited
genre (e.g., production of a narrative description instead of a story, or an
extended monologue instead of conversation) were excluded from analysis.
Analyses of story narratives
The narrative discourse analysis procedures, including reliability measures,
employed in the present study have been explained in detail elsewhere (see
Coelho, 2002). Therefore the analyses are only briefly described below.
Within-sentence. Two measures of sentence production were examined and
compared across tasks and groups:


(1) Number of Words per T-unit.
(2) Number of Subordinate Clauses per T-unit: the total number of subordinate
clauses in each story was obtained and divided by the total number of
T-units. The frequency of subordinate clause use may be considered a measure
of the complexity of sentence-level grammar.
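Both within-sentence measures are ratios over a story's T-units. The sketch below assumes the transcript has already been divided into T-units (following Liles, 1985) and that each T-unit has been hand-coded for its word count and its number of subordinate clauses; the tuple representation is a hypothetical convenience, not part of the published procedure.

```python
from typing import List, Tuple

# Each T-unit is hand-coded as (number of words, number of subordinate clauses).
TUnit = Tuple[int, int]


def sentence_measures(t_units: List[TUnit]) -> Tuple[float, float]:
    """Return (words per T-unit, subordinate clauses per T-unit) for one story."""
    if not t_units:
        raise ValueError("story contains no T-units")
    n = len(t_units)
    total_words = sum(words for words, _ in t_units)
    total_subordinate = sum(clauses for _, clauses in t_units)
    return total_words / n, total_subordinate / n


story = [(9, 1), (6, 0), (12, 2), (5, 0)]
print(sentence_measures(story))  # (8.0, 0.75)
```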
Between-sentence. Between-sentence measures included:
(1) Cohesive Adequacy: The measure of cohesive adequacy used in this study
was Percent Complete Ties out of Total Ties. Cohesive ties pertain to how
meaning is conjoined across sentences. A word is considered to be a
cohesive tie if the listener must search outside the sentence for the
completed meaning. Three categories of adequacy were used: complete,
incomplete, and erroneous.
(2) Story Grammar: Two measures of story grammar performance were
employed in this study: (a) Number of Total Episodes: number of complete
and incomplete episodes, considered to be a measure of content organisation;
and (b) Proportion of T-units Contained within Episode Structure (T-units in
episode structure/total T-units). An episode consists of (a) an initiating event
that prompts a character to formulate a goal, (b) an action, and (c) a direct
consequence marking attainment or nonattainment of the goal.
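Once coding is complete, the percentage-based and proportion-based between-sentence measures reduce to simple arithmetic. In this sketch the tie codes use the three adequacy categories above, and each T-unit is assumed to be labelled with the identifier of the (complete or incomplete) episode it belongs to, or `None` if it falls outside episode structure; that labelling scheme is an assumption made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple

TIE_CODES = {"complete", "incomplete", "erroneous"}


def percent_complete_ties(tie_codes: List[str]) -> float:
    """Cohesive adequacy: complete ties as a percentage of all cohesive ties."""
    counts = Counter(tie_codes)
    if set(counts) - TIE_CODES:
        raise ValueError("unexpected tie code")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("story contains no cohesive ties")
    return 100.0 * counts["complete"] / total


def episode_measures(episode_of_t_unit: List[Optional[Hashable]]) -> Tuple[int, float]:
    """Return (total episodes, proportion of T-units within episode structure)."""
    if not episode_of_t_unit:
        raise ValueError("story contains no T-units")
    inside = [e for e in episode_of_t_unit if e is not None]
    return len(set(inside)), len(inside) / len(episode_of_t_unit)


ties = ["complete"] * 17 + ["incomplete"] * 2 + ["erroneous"]
print(percent_complete_ties(ties))                       # 85.0
print(episode_measures([None, 1, 1, 1, 2, 2, None, 2]))  # (2, 0.75)
```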
Analyses of conversation
The procedures for the analysis of conversation, as well as reliability measures,
have been discussed elsewhere (see Coelho et al., 2002). These analyses are
summarised below. The middle 6 minutes of each conversation were analysed.
Two categories of analyses were employed with each transcribed conversation:
Appropriateness (Blank & Franklin, 1980) and Topic Initiation (Brinton &
Fujiki, 1989). Number of conversational turns was also tallied.
Appropriateness. Within the category of Appropriateness, each utterance was
categorised either as a Speaker-Initiation or a Speaker-Response.
Speaker-initiations. These were classified as Obliges (utterances containing
explicit requirements for a response from the listener) or Comments (utterances
not containing an explicit demand for a response). The total numbers of Obliges
and Comments produced by a subject or the examiner over the course of each
conversation were tallied.
Speaker-responses. These were classified in terms of adequacy. An Adequate
response was one that appropriately met the initiator's verbalisation. An
Adequate Plus response was relevant and elaborated the theme, providing more
information than was requested. The total numbers of Adequate Plus and
Adequate responses produced by each participant in each conversation were
tallied.

106 APHASIOLOGY

Topic initiation. Either a participant or the examiner could introduce topics.
Topics could be changed in one of three ways: (a) at the beginning of the
conversation, or by ending discussion of one topic and initiating another, referred
to as a Novel Introduction; (b) by means of a Smooth Shift, in which discussion
of one topic is subtly switched to another; or (c) by means of a Disruptive Shift,
in which discussion of one topic is abruptly or illogically switched to another topic.
The total numbers of Novel Introductions and shifts (Smooth, Disruptive)
produced by a participant over the course of each conversation were tallied.
Turns. An utterance was defined as an oral statement or response.
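Once each utterance in the analysed 6-minute window has been coded, the conversational measures are per-speaker tallies. The sketch below assumes a hypothetical transcript format of (speaker, appropriateness code, topic code) tuples with made-up label names. It also assumes that a new turn starts whenever the speaker changes. Neither assumption reproduces the study's actual coding sheet.

```python
from collections import Counter

# Hypothetical code labels; the study's own coding sheet is not reproduced here.
APPROPRIATENESS = {"OBLIGE", "COMMENT", "ADEQUATE", "ADEQUATE_PLUS"}
TOPIC_CHANGES = {"NOVEL", "SMOOTH", "DISRUPTIVE"}


def tally_conversation(utterances, speaker):
    """Tally one speaker's codes over a transcript.

    utterances: (speaker_id, appropriateness_code, topic_code) tuples in
    transcript order; topic_code is None when the topic does not change.
    """
    counts = Counter()
    previous_speaker = None
    for who, appropriateness, topic in utterances:
        if appropriateness not in APPROPRIATENESS:
            raise ValueError(f"unknown appropriateness code: {appropriateness}")
        if topic is not None and topic not in TOPIC_CHANGES:
            raise ValueError(f"unknown topic code: {topic}")
        # Assumption: a new turn begins whenever the speaker changes
        if who != previous_speaker and who == speaker:
            counts["TURNS"] += 1
        previous_speaker = who
        if who == speaker:
            counts[appropriateness] += 1
            if topic is not None:
                counts[topic] += 1
    return counts


transcript = [
    ("P", "COMMENT", "NOVEL"),
    ("E", "OBLIGE", None),
    ("P", "ADEQUATE_PLUS", None),
    ("P", "COMMENT", "SMOOTH"),
    ("E", "ADEQUATE", None),
]
participant_counts = tally_conversation(transcript, "P")
print(participant_counts)
```

Because `Counter` returns 0 for missing keys, codes a speaker never produced (here, Obliges for the participant) read as zero without special handling.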
Reliability of discourse measures
Inter- and intra-examiner reliability scores for all of the discourse measures
described in the present paper have been reported elsewhere (Coelho, 2002;
Coelho et al., 2002) and are therefore only summarised here. For the
measures of story narrative ability, inter- and intra-examiner reliability scores
ranged from 90% to 98%. Reliability scores for the conversation measures ranged
from 80% to 99%.
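This section does not say how the reliability percentages were computed. One common approach for coded discourse is point-by-point agreement: the number of units coded identically, divided by the number of units, times 100. The sketch below assumes that formula and uses made-up codes.

```python
def percent_agreement(coder_a, coder_b):
    """Point-by-point percent agreement between two codings of the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and the same length")
    agreements = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return 100.0 * agreements / len(coder_a)


first_pass = ["COMMENT", "OBLIGE", "ADEQUATE", "COMMENT", "ADEQUATE_PLUS"]
second_pass = ["COMMENT", "OBLIGE", "ADEQUATE_PLUS", "COMMENT", "ADEQUATE_PLUS"]
print(percent_agreement(first_pass, second_pass))  # 4 of 5 units agree
```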

RESULTS
In the present study data from 32 CHI and 43 NBI participants from the two
previous investigations described in the introduction (Coelho, 2002; Coelho et
al., 2002) were reanalysed using discriminant function analyses (DFA). The
intent of the present study was to investigate the accuracy with which group
membership (CHI versus NBI) could be predicted on the basis of discourse
performance. The measures selected for inclusion in the DFA included measures
of story narrative and conversational discourse. Results from each of these DFAs
are discussed below.
Story narrative measures
Five measures that sampled aspects of micro-organisation (i.e., words per T-unit
and subordinate clauses per T-unit) and macro-organisation (i.e., percentage of
complete cohesive ties to total cohesive ties, total episodes, and proportion of T-units within episode structure) in story retelling and story generation tasks
(10 variables in all) were entered into the DFA for narrative discourse. The DFA accurately classified 70%
of the cases, χ²(10) = 14.54, p = .15, 64.5% of the CHI group and 74.4% of the
NBI group (see Table 1). This finding was not significant, accounting for
approximately 20% of the explained variance, suggesting that the story narrative
measures did not reliably discriminate the CHI from the NBI participants. Of the
story narrative measures, the proportion of T-units within episode structure and
words per T-unit, both from the story generation task, had the highest correlations
with the discriminant function, .54 and .49 respectively (see Table 2).
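The paper does not state which software or options were used for the DFAs. For two groups, a standard DFA reduces to Fisher's linear discriminant with a pooled within-group covariance matrix. The sketch below assumes that model with equal prior probabilities. It fits the function, classifies cases, and computes the correlations between each measure and the discriminant function that Tables 2, 3, and 5 report, using pooled within-group correlations. All data here are synthetic.

```python
import numpy as np


def fit_two_group_lda(X, y):
    """Fisher's linear discriminant for two groups labelled 0 and 1.

    Returns the weight vector and a cut-off on the discriminant score
    that assumes equal prior probabilities for the two groups.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix
    within = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    pooled = within / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    threshold = w @ (m0 + m1) / 2
    return w, threshold


def classify(X, w, threshold):
    """Assign group 1 when the discriminant score exceeds the cut-off."""
    return (np.asarray(X, dtype=float) @ w > threshold).astype(int)


def structure_correlations(X, y, w):
    """Pooled within-group correlation of each measure with the discriminant score."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = X @ w
    dev_x = np.empty_like(X)
    dev_s = np.empty_like(scores)
    for group in np.unique(y):
        mask = y == group
        dev_x[mask] = X[mask] - X[mask].mean(axis=0)
        dev_s[mask] = scores[mask] - scores[mask].mean()
    numerator = dev_x.T @ dev_s
    denominator = np.sqrt((dev_x ** 2).sum(axis=0) * (dev_s ** 2).sum())
    return numerator / denominator


# Synthetic demo data: only the first of three measures differs between groups
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = np.repeat([0, 1], 40)
X[y == 1, 0] += 3.0
w, cut = fit_two_group_lda(X, y)
accuracy = (classify(X, w, cut) == y).mean()
loadings = structure_correlations(X, y, w)
print(f"resubstitution accuracy = {accuracy:.1%}", np.round(loadings, 2))
```

The sign of a discriminant function is arbitrary, so only the relative sizes of the correlations matter when ranking measures.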
Conversation measures
Seven measures of conversational performance (i.e., numbers of obliges,
comments, adequate and adequate plus responses, novel topic introductions,
smooth topic shifts, and turns) were included in the DFA for conversation. Of the
measures of conversational performance studied, number of comments and
adequate plus responses had the highest correlations with the discriminant
function, .91 and .67 respectively (see Table 3). This DFA correctly classified
over 77% of the cases, χ²(7) = 25.04, p = .001, 87.5% of the CHI participants and
69.8% of the NBI group (see Table 4). This finding was significant, accounting
for approximately 30% of the explained variance, which suggests that the
measures of conversational discourse were better able to discriminate the
participant groups.
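The reported χ² values and "explained variance" percentages are consistent with Bartlett's chi-square approximation for Wilks' lambda:

χ² = -[N - 1 - (p + g)/2] ln Λ, with df = p(g - 1)

For two groups, 1 - Λ equals the squared canonical correlation. The sketch below assumes this is how the statistics were obtained; the text does not say so. It takes N from each classification table and p as the number of variables entered, then back-calculates Λ from each reported χ².

```python
import math


def bartlett_chi2(wilks_lambda, n_cases, n_measures, n_groups=2):
    """Bartlett's chi-square approximation for Wilks' lambda, with its df."""
    factor = n_cases - 1 - (n_measures + n_groups) / 2
    return -factor * math.log(wilks_lambda), n_measures * (n_groups - 1)


def lambda_from_chi2(chi2, n_cases, n_measures, n_groups=2):
    """Invert Bartlett's approximation to recover Wilks' lambda."""
    factor = n_cases - 1 - (n_measures + n_groups) / 2
    return math.exp(-chi2 / factor)


# (analysis, reported chi-square, N in the classification table, variables entered)
REPORTED = [
    ("story narrative", 14.54, 70, 10),
    ("conversation", 25.04, 75, 7),
    ("stepwise", 32.23, 72, 3),
]

for label, chi2, n, p in REPORTED:
    lam = lambda_from_chi2(chi2, n, p)
    _, df = bartlett_chi2(lam, n, p)
    print(f"{label}: df = {df}, Wilks' lambda = {lam:.3f}, 1 - lambda = {1 - lam:.1%}")
```

Under these assumptions, 1 - Λ comes out near 21%, 30%, and 37.5%. These values line up with the "approximately 20%", "approximately 30%", and "over 37%" reported in the text, and the df match the reported 10, 7, and 3.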
TABLE 1 Classification results from discriminant function analysis of story narrative measures

                 Predicted group membership
Actual group     CHI           NBI           Total
CHI              20 (64.5%)    11 (35.5%)    31 (100.0%)
NBI              10 (25.6%)    29 (74.4%)    39 (100.0%)

70.0% of original grouped cases correctly classified.
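Each classification table converts to per-group hit rates and an overall accuracy in the same way. The diagonal cells are the correctly classified cases; each row is divided by its actual group total. A minimal sketch, checked against the Table 1 counts:

```python
def classification_summary(matrix, labels):
    """Per-group hit rates and overall accuracy from a classification table.

    matrix[i][j] is the number of cases actually in group labels[i] that the
    discriminant function assigned to group labels[j].
    """
    hit_rates = {}
    correct = total = 0
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        hit_rates[label] = 100.0 * matrix[i][i] / row_total
        correct += matrix[i][i]
        total += row_total
    return hit_rates, 100.0 * correct / total


# Counts from Table 1 (rows: actual CHI, NBI; columns: predicted CHI, NBI)
rates, overall = classification_summary([[20, 11], [10, 29]], ["CHI", "NBI"])
print({k: round(v, 1) for k, v in rates.items()}, round(overall, 1))
```

This reproduces the 64.5%, 74.4%, and 70.0% (49 of 70 cases) shown in the table.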


TABLE 2 Correlations between the story narrative measures and the discriminant function

Measure          Correlation
GENER-TUEPTR     .54
GENER-WDSTU      .49
RETELL-TUEPTR    .42
RETELL-SUBT      .39
GENER-COMTPC     .29
RETELL-COMTPC    .26
RETELL-WDSTU     .23
GENER-SUBT       .22
GENER-EPTOT      .15
RETELL-EPTOT     .03

The measures with the highest correlations contribute the most to discriminating between the groups.
GENER = story generation task, RETELL = story retelling task, WDSTU = words per T-unit, SUBT = subordinate clauses per T-unit, COMTPC = percent complete ties out of total ties, EPTOT = number of total episodes, TUEPTR = proportion of T-units within episode structure.

TABLE 3 Correlations between the conversation measures and the discriminant function

Measure                      Correlation
COMMENTS                     .91
ADEQUATE PLUS RESPONSES      .67
OBLIGES                      .28
ADEQUATE RESPONSES           .23
NOVEL TOPIC INTRODUCTIONS    .19
TURNS                        .09
SMOOTH TOPIC SHIFTS          .04

The measures with the highest correlations contribute the most to discriminating between the groups.

TABLE 4 Classification results from discriminant function analysis with conversation measures

                 Predicted group membership
Actual group     CHI           NBI           Total
CHI              28 (87.5%)    4 (12.5%)     32 (100.0%)
NBI              13 (30.2%)    30 (69.8%)    43 (100.0%)

77.3% of original grouped cases correctly classified.

Story narrative and conversation measures

In an effort to determine whether group classification could be improved by including all
17 measures of both the story narrative and conversational discourse, a stepwise
DFA was performed. In this procedure the measure providing the best
discrimination is entered first; then, from the remaining 16 measures, the measure
that adds the most to discriminating between the groups is added to the first
selected measure. This procedure continues until no remaining measure,
when added, significantly increases the capacity to discriminate above the
measures entered in previous steps. Results from the stepwise DFA revealed
that the conversational measures comments and adequate plus responses and the
story narrative measure T-units within episode structure in the generation task
made the greatest contributions to discriminating between the groups (see
Table 5). The combination of just these three measures discriminated the groups
as well as any other combination of the 17 story narrative and conversational measures. Overall,
group membership was correctly classified by the DFA in 81% of the cases, χ²(3) = 32.23, p < .001,
84.4% of the CHI group and 77.5% of the NBI participants
(see Table 6). This finding was significant and accounted for over 37% of the
explained variance, suggesting that these three discourse measures discriminate
the participant groups with the highest degree of reliability. However, it was the
conversational measures (i.e., comments and adequate plus responses) that had
the largest correlations with the discriminant function, .79 and .55, versus the
story narrative measure (i.e., T-units within episode structure in the generation
task) with a correlation of .40.
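The stepwise procedure described above can be sketched as forward selection on Wilks' lambda. At each step, the candidate measure that most lowers Λ is tested with a partial F-to-enter, and selection stops when that F falls below a threshold. This sketch makes two assumptions the paper does not state. First, the 3.84 threshold is assumed, borrowed from a common software default. Second, the removal re-check that full stepwise routines apply to already-entered measures is omitted. The data are synthetic.

```python
import numpy as np


def sscp(A):
    """Sums of squares and cross-products about the column means."""
    deviations = A - A.mean(axis=0)
    return deviations.T @ deviations


def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for the columns of X."""
    within = sum(sscp(X[y == g]) for g in np.unique(y))
    return np.linalg.det(within) / np.linalg.det(sscp(X))


def forward_stepwise(X, y, f_to_enter=3.84):
    """Forward selection by Wilks' lambda with a partial-F entry criterion."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    g = len(np.unique(y))
    selected, current = [], 1.0
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        # The candidate that most lowers Wilks' lambda adds the most discrimination
        lambdas = {j: wilks_lambda(X[:, selected + [j]], y) for j in candidates}
        best = min(lambdas, key=lambdas.get)
        q = len(selected)
        # Partial F for adding `best`, given the q measures already entered
        f_enter = (n - g - q) / (g - 1) * (current / lambdas[best] - 1)
        if f_enter < f_to_enter:
            break  # no remaining measure adds significantly
        selected.append(best)
        current = lambdas[best]
    return selected, current


rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = np.repeat([0, 1], 40)
X[y == 1, 2] += 2.5
chosen, final_lambda = forward_stepwise(X, y)
print("entered (in order):", chosen, "Wilks' lambda:", round(final_lambda, 3))
```

Because selection and evaluation use the same cases, the final set of measures inherits the optimism discussed below.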
DISCUSSION
Prior to discussing the findings of this study it is important to acknowledge a
limitation in the procedures employed for data analysis. If one estimates the
discriminant functions that may best predict group membership from a given
data set, one should not then use the same data set, as was done in this study, to
judge the accuracy of the prediction. Validation of a predicted discriminant
function requires testing of the function with another data sample, thereby
reducing the effect of chance on the predictive process. Replication of the
present study is needed. With that qualification in mind, the results of the present
study should be interpreted cautiously.
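The limitation above concerns "resubstitution": scoring a discriminant function on the same cases used to derive it. Leave-one-out classification does not replace an independent validation sample, but it is a common internal check. Each case is classified by a function fitted without that case. The sketch below assumes the two-group Fisher model with equal priors and pure-noise synthetic data. In this setting resubstitution accuracy can sit well above chance even though the groups do not differ.

```python
import numpy as np


def fit_two_group_lda(X, y):
    """Fisher's linear discriminant (groups 0/1, pooled covariance, equal priors)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    within = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(within / (len(X) - 2), m1 - m0)
    return w, w @ (m0 + m1) / 2


def resubstitution_accuracy(X, y):
    """Fit on all cases and score the same cases (the optimistic estimate)."""
    w, cut = fit_two_group_lda(X, y)
    return float(((X @ w > cut).astype(int) == y).mean())


def leave_one_out_accuracy(X, y):
    """Refit without each case in turn and classify the held-out case."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        w, cut = fit_two_group_lda(X[keep], y[keep])
        hits += int(int(X[i] @ w > cut) == y[i])
    return hits / len(X)


# Pure-noise demo: 10 measures, 20 cases per group, no true group difference
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 10))
y = np.repeat([0, 1], 20)
print("resubstitution:", resubstitution_accuracy(X, y))
print("leave-one-out: ", leave_one_out_accuracy(X, y))
```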
Results of the DFAs run with the discourse data from the CHI and NBI
participants indicate that the conversational measures were more accurate in
discriminating the groups. These findings did not support the hypothesis which
predicted that the narrative discourse measures would more accurately predict
group membership of the CHI and NBI participants than the conversational
measures. A variety of explanations may account for these findings.
TABLE 5 Correlations between selected story narrative and conversation measures and the discriminant function

Measure                      Correlation
COMMENTS                     .79
ADEQUATE PLUS RESPONSES      .55
GENER-TUEPTR                 .40

GENER = story generation task, TUEPTR = proportion of T-units within episode structure.

TABLE 6 Classification results from discriminant function analysis with selected story narrative and conversation measures

                 Predicted group membership
Actual group     CHI           NBI           Total
CHI              27 (84.4%)    5 (15.6%)     32 (100.0%)
NBI              9 (22.5%)     31 (77.5%)    40 (100.0%)

80.6% of original grouped cases correctly classified.

Galski and colleagues (1998) have commented that the success of an
individual's social, vocational, familial, and academic integration rests on the
recovery of effective communication. Although previous research has
demonstrated that individuals with CHI have difficulty with many narrative
discourse tasks (see Coelho, 1995), it may be that, because of its interactive
nature, conversation is a more difficult discourse genre for this population.
Consistent with this explanation, it has been reported that individuals with CHI
produced more discourse errors in conversation than in a structured referential
communication task. This may be attributed to social aspects, such as the
relationship between conversational partners (that is, familiarity, status, and role),
as well as the face-saving strategies used for politeness when communication
breakdowns occur (Prince, Haynes, & Haak, 2002). Such factors are extremely
difficult to simulate in other types of noninteractive discourse.
A second explanation pertains to the stylistic variation that can exist among
speakers within a specific genre. In other words, speakers may achieve the same
text macrostructure through many different patterns of microstructure
(Armstrong, 2002). Consequently, such variation in NBI speakers is important to
note when making judgements about what is normal and what is
disordered in the discourse of individuals with CHI. For example, in the
present study over 25% of the NBI participants were classified as CHI on the
story narrative tasks, and that figure rose to over 30% in conversation.
An additional explanation pertains to the potential cognitive factors that have
been suggested to be important for meaningful participation in conversation. For
example, topic maintenance and appropriate referencing require both selective
and sustained attention. Further, functional memory is required to recall what both the
speaker and the listener have said (Mackenzie, 2000). Similarly,
comprehension of sarcasm and implicit language may also influence the
effectiveness of a conversational participant. Individuals with CHI would be at
risk for demonstrating difficulty with any or all of these factors.
Finally, it is important to emphasise that although all of the CHI participants
studied had suffered moderate to severe injuries, they were selected on the basis
of having recovered fluent conversational speech. Therefore the present findings
may not be applicable to all individuals with CHI, particularly those with limited
discourse production capabilities. Although the DFAs reported in the present
study involved a variety of discourse measures, discriminant functions derived
from different measures of narrative and conversational discourse may have
yielded different results. A related issue pertains to the potential effects of the
measures and their interactions with the targeted discourse genres. For example, the
conversational measures are considered to be pragmatic in nature, while the story
narrative measures involve various aspects of cognitive-linguistic organisation. A
reasonable explanation for the present findings would be that pragmatic
measures are more sensitive to the communicative dysfunction displayed by
individuals with CHI than the more structurally focused narrative measures.
The findings from the present study did support the second hypothesis which
stated that if all of the discourse measures were entered into a DFA, the CHI and
NBI participants would be discriminated with a higher degree of accuracy than
with the conversational or story narrative measures alone. Previous
investigations of the discourse of individuals with CHI have documented an
array of impairments across discourse genres analysed at varied levels. The
likelihood of delineating the nature of discourse impairment secondary to CHI
with a single measure is poor given the broad array of cognitive, linguistic, and
psychosocial sequelae that characterise CHI. Therefore it is not surprising that,
as noted in the present study, a variety of discourse measures more accurately
discriminated the CHI and NBI participant groups. The study of discourse
following brain injury requires the use of multiple and varied elicitation tasks
and measures.
Regarding implications of these findings, it has been observed that discourse
represents a critical point of intersection between cognition and language, and
therefore is an important component in the management of individuals with CHI
(Ylvisaker et al., 2001). The present findings suggest that conversation may be
more sensitive than story narratives to the discourse impairments that
characterise individuals with CHI. None the less, ongoing research is needed to
develop discourse procedures that will not only be sensitive to subtle
impairments but clinically efficient as well.
REFERENCES
Armstrong, E. (2002). Variation in the discourse of non-brain-damaged speakers on a clinical task. Aphasiology, 16, 647–658.
Blank, M., & Franklin, E. (1980). Dialogue with pre-schoolers: A cognitively-based system of assessment. Applied Psycholinguistics, 1, 127–150.
Brinton, B., & Fujiki, M. (1989). Conversational management with language-impaired children. Rockville, MD: Aspen.
Chapman, S. B. (1997). Cognitive-communication abilities in children with closed head injury. American Journal of Speech-Language Pathology, 6, 50–58.
Coelho, C. A. (1995). Discourse production deficits following traumatic brain injury: A critical review of the recent literature. Aphasiology, 9, 409–429.
Coelho, C. A. (2002). Story narratives of adults with closed head injury and non-brain-injured adults: Influence of socioeconomic status, elicitation task, and executive functioning. Journal of Speech, Language, and Hearing Research, 45, 1232–1248.
Coelho, C. A., Liles, B. Z., & Duffy, R. J. (1991). Discourse analyses with closed head injured adults: Evidence for differing patterns of deficits. Archives of Physical Medicine and Rehabilitation, 72, 465–468.
Coelho, C. A., Liles, B. Z., & Duffy, R. J. (1995). Impairments of discourse abilities and executive functions in traumatically brain-injured adults. Brain Injury, 9, 471–477.
Coelho, C. A., Youse, K. M., & Le, K. N. (2002). Conversational discourse in closed-head-injured and non-brain-injured adults. Aphasiology, 16, 659–672.
Doyle, P. J., Goda, A. J., & Spencer, K. A. (1995). The communicative informativeness and efficiency of connected discourse by adults with aphasia under structured and conversational sampling conditions. American Journal of Speech-Language Pathology, 4, 130–134.
Galski, T., Tompkins, C., & Johnston, M. V. (1998). Competence in discourse as a measure of social integration and quality of life in persons with traumatic brain injury. Brain Injury, 12, 769–782.
Hagan, C., Malkmus, D., & Durham, P. (1980). Levels of cognitive functioning. In Rehabilitation of the head injured adult: Comprehensive physical management. Downey, CA: Professional Staff Association of Rancho Los Amigos Hospital.
Hartley, L. L., & Jensen, P. (1991). Narrative and procedural discourse after closed head injury. Brain Injury, 5, 267–285.
Hollingshead, A. (1972). Four factor index of social status. Unpublished manuscript. Yale University, New Haven, CT.
Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.
Levin, H. S., O'Donnell, V. M., & Grossman, R. G. (1979). The Galveston orientation and amnesia test: A practical scale to assess cognition after head injury. Journal of Nervous and Mental Disease, 167, 675–684.
Lezak, M. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Liles, B. Z. (1985). Narrative ability in normal and language disordered children. Journal of Speech and Hearing Research, 28, 123–133.
Liles, B. Z., Duffy, R. J., Merritt, D. D., & Purcell, S. L. (1995). Measurement of narrative discourse ability in children with language disorders. Journal of Speech, Language, and Hearing Research, 38(2), 415–425.
Mackenzie, C. (2000). Adult spoken discourse: The influences of age and education. International Journal of Language and Communication Disorders, 35(2), 269–285.
Mandler, J. M. (1982). An analysis of story grammars. In F. Klix, J. Hoffman, & E. van der Meer (Eds.), Cognitive Psychology, 9, 111–115.
Mattis, S. (1976). Mental status examination for organic mental syndrome in the elderly patient. In L. Bellak & T. B. Karasu (Eds.), Geriatric psychiatry. New York: Grune & Stratton.
Mentis, M., & Prutting, C. A. (1987). Cohesion in the discourse of normal and head-injured adults. Journal of Speech and Hearing Research, 30, 583–595.
Prince, S., Haynes, W. O., & Haak, N. J. (2002). Occurrence of contingent queries and discourse errors in referential communication and conversational tasks: A study of college students with closed head injury. Journal of Medical Speech-Language Pathology, 10, 19–39.
Snow, P., Douglas, J., & Ponsford, J. (1995). Discourse assessment following traumatic brain injury: A pilot study examining some demographic and methodological issues. Aphasiology, 9, 365–380.
Snow, P., Douglas, J., & Ponsford, J. (1997). Procedural discourse following traumatic brain injury. Aphasiology, 11, 947–967.
Togher, L. (2001). Discourse sampling in the 21st century. Journal of Communication Disorders, 34, 131–150.
Togher, L., Hand, L., & Code, C. (1999). Exchanges of information in the talk of people with traumatic brain injury. In S. McDonald, L. Togher, & C. Code (Eds.), Communication skills following traumatic brain injury (pp. 113–145). Hove, UK: Psychology Press.
Wilkinson, R. (1999). Sequentiality as a problem and resource for intersubjectivity in aphasic conversation: Analysis and implications for therapy. Aphasiology, 13, 327–343.
Winter, P. (1976). The bear and the fly. New York: Crown Publishers.
Ylvisaker, M., Szekeres, S. F., & Feeney, T. (2001). Communication disorders associated with traumatic brain injury. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 745–800). Philadelphia: Lippincott, Williams & Wilkins.

Relationship between discourse and Western Aphasia Battery performance in African Americans with aphasia
Hanna K. Ulatowska and Gloria Streit Olness
University of Texas at Dallas, USA
Robert T. Wertz
Department of Veterans Affairs Tennessee Valley Healthcare System and Vanderbilt University School of Medicine, Tennessee, USA
Agnes M. Samson, Molly W. Keebler, and Karen E. Goins
University of Texas at Dallas, USA
Background: There is a need for discourse research with African
Americans who have aphasia, highlighted by ethnic group
differences in stroke prevalence, and potential ethnic group
differences in dialect. Identification of ethnic dialect is critical to
differentiate communication changes associated with pathology from
normal communicative differences associated with ethnicity. Also,
preliminary research on adults with aphasia indicates an uncertain
relationship between discourse performance and standardised test
performance.
Aims: This study was designed to assess: (1) the relationship
between performance on a standardised language measure and
discourse performance, and (2) the use of ethnic dialect and
discourse features, in the narrative productions of African-American
adults with moderate aphasia on a variety of discourse tasks.
Methods & Procedures: We investigated the discourse of 12
African Americans with scores in the moderate severity range on the
Western Aphasia Battery, Aphasia Quotient (WAB-AQ). Each
subject produced a fable retell, a story derived from a picture
sequence, two stories derived from single pictures, and a topic-elicited personal narrative of a frightening experience. Analysis
consisted of ratings of discourse quality (coherence, reference, and
emplotment); a measure of discourse quantity (number of
propositions); and a tally of the presence or absence of ethnic dialect
and discourse features.
Outcomes & Results: The correlation between WAB-AQ and
discourse quality was statistically significant on the picture sequence
task and one single-picture task, but not on the other discourse tasks.
There was a significant relationship between WAB-AQ and overall
quality ratings of coherence, reference, and emplotment. The
correlation between WAB-AQ and discourse quantity was not
significant for any task, and discourse quality was not significantly
correlated with discourse quantity. Ethnic features appeared most
often on one single-picture task and the personal narrative. No ethnic
dialect features occurred on the fable retell.
Conclusions: These findings suggest the need to supplement
standardised assessment of aphasia with assessment of discourse
performance, using less structured discourse tasks, such as a
personal narrative task. Less structured discourse tasks may also be
optimal for eliciting natural ethnic patterns of communication. The
lack of relationship between narrative quantity and narrative quality
may not generalise to individuals with aphasia that is severe or mild.
This study contributes towards development of a discourse
assessment tool for culturally and linguistically diverse populations
that may supplement information provided by standardised testing.
Several factors point to the need for discourse research with African Americans
who have aphasia. The incidence of stroke, and hence the probability of aphasia,
is higher in African Americans than in Caucasians (Kittner, White, Losonczy,
Wolf, & Hebel, 1990). Moreover, many African Americans are speakers of a
distinct ethnic dialect. While ethnicity does not determine ethnic dialect use,
previous research has confirmed its presence in some African Americans with
aphasia on certain tasks (Ulatowska & Olness, 2001). Identification of ethnic
dialect is critical to differentiate communication change associated with
pathology from normal communicative differences associated with ethnicity
(Wolfram, 1992), especially when surface features of the ethnic dialect overlap
with features of aphasia. Unfortunately, clinical discourse research has
traditionally excluded African Americans or, when they are included, has not
differentiated subjects according to ethnicity (Ulatowska & Chapman, 1994;
Wallace, 1996).

Address correspondence to: Hanna K. Ulatowska, UTD/Callier Center for Communication Disorders, 1966 Inwood Road, Dallas, TX, 75235, USA. Email: hanna@utdallas.edu
Agnes M. Samson is now at Integrated Health Services (IHS) in Richardson, Texas. Molly W. Keebler is now at The Center for Brain Health, University of Texas at Dallas. This research was supported by the Department of Veterans Affairs Rehabilitation Research and Development Service, and Excellence in Education Funds from the Callier Center for Communication Disorders, University of Texas at Dallas.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/02687030344000102
Aphasia assessment for any ethnic group requires procedures that reflect an
individual's level of impairment as well as functional communication ability.
Impairment measures, such as the Western Aphasia Battery (WAB; Kertesz,
1982), provide profiles of basic language skills. In contrast, discourse measures
are thought to be a reflection of daily, functional language (Chapman, Highley, &
Thompson, 1998; Holland, 1983; Ulatowska & Chapman, 1994). Because
discourse tasks require skills close to those required in daily life, they often
display ethnic styles of communication (Ulatowska & Olness, 2001; Ulatowska,
Olness, Hill, Roberts, & Keebler, 2000). The relationship between discourse
performance and performance on the lexico-syntactic skills addressed by
impairment measures is not well understood (Ulatowska & Olness, 2000).
This study assesses the extent to which various discourse tasks provide
information that may either duplicate or supplement information gained from use
of standardised aphasia tests. The first purpose of this study was to examine the
relationship between discourse performance and standardised test performance in
African Americans with moderate aphasia. The second purpose was to determine
whether ethnic features of dialect and discourse (i.e., reflections of natural,
functional language) are present in the discourse of African Americans with
moderate aphasia and whether their presence differs among discourse tasks. The
standardised test used for this study was the Western Aphasia Battery (WAB,
Kertesz, 1982), and the discourse tasks were part of a larger seven-task battery
used to study narratives of African Americans and Caucasians with and without
aphasia (Wertz et al., 2000).
Research on adults with aphasia indicates that discourse performance and
standardised test performance are not significantly related for some discourse
tasks, and are significantly related for others. Two studies have examined the
relationships between discourse performance and standardised test performance
among African Americans with relatively mild aphasia. The first of these
(Ulatowska et al., 2001a) found a nonsignificant correlation between performance
on a personal narrative of a frightening experience and performance on the Western
Aphasia Battery, Aphasia Quotient (WAB-AQ). Neither the quantity of language
in the narrative task (measured in propositions) nor the quality of the narrative
(measured on a narrative quality scale) was correlated with WAB-AQ scores. A
second study (Ulatowska et al., 2001b) examined the relationship of WAB-AQ with
(1) interpretations of mini-narratives (in this case, interpretations of proverbs);
and (2) the ability to comprehend and express the lesson derived from didactic
narratives (in this case, fables). WAB-AQ was found to correlate with the overall
generalisation level, accuracy, and completeness of spontaneous interpretations
of proverbs, and with the generalisation level of lessons derived from an auditory
fable (Ulatowska et al., 2001b). The same study did not find significant
correlations between WAB-AQ and either (1) a lesson derived from a picture-sequence fable or (2) multiple-choice proverb interpretation.
A recent case series study (Ulatowska, Olness, Hill, Samson, & Goins, 2002)
suggests the importance of studying the discourse and standardised test
performance of individuals with moderate aphasia, in contrast to the previous
studies focusing on mild aphasia. Subjects for this study of individuals with
moderate aphasia were drawn from the larger comparative study previously cited
(Wertz et al., 2000). Individuals with aphasia in the moderate severity range, i.e.,
with similar aphasia severity as measured by the WAB, were found to vary
widely in their ability to produce stories that were coherent, referentially clear,
and contained all elements of the story or scenario. Individuals with moderate
aphasia appear to be an informative group for study because they have the ability
to produce discourse-length responses (unlike individuals with severe aphasia),
yet display various patterns of discourse disruption (unlike many individuals
with mild aphasia).
This study was designed to answer the following questions:
(1) What is the relationship between performance on a standardised language
measure and discourse performance on a variety of discourse tasks in
African-American adults with moderate aphasia?
(2) Are ethnic dialect and discourse features present in the discourse of
African-American adults with moderate aphasia, and if present, does their
appearance differ among discourse tasks?
METHODS
Subjects
Twelve African-American adults with aphasia subsequent to a left-hemisphere
stroke participated in the study. Participants were selected from a larger
discourse investigation (Wertz et al., 2000), based on standardised test scores in
the moderate aphasia severity range on the Western Aphasia Battery, Aphasia
Quotient (WAB-AQ) (Kertesz, 1979, 1982). Table 1 provides participant
demographics. Socioeconomic status (SES) was rated on a 1–7 scale (adapted
from Featherman & Stevens, 1980), where a higher number indicates lower SES.
All participants were native speakers of English and were raised in the southern
United States. Table 2 displays participants' WAB scores.
Discourse tasks
Five discourse tasks that varied in the type of demand placed on the speaker,
amount of information in the stimulus, and stimulus modality (visual or auditory)
were presented. All the tasks were designed to elicit narrative discourse through
the selection of stimuli that could be expressed as a sequence of events containing
some complicating action or situation. The five tasks, ordered by the degree to
which each inherently specified the structure and content of the participant's
response, from most specification to least, were: retell of a fable (Farmer and
Sons); tell a story based on a picture sequence (Apple Theft); tell a story
based on a single picture (Counting Money); tell a story based on a single
picture (Easter Morning); and a topic-elicited personal narrative of a
frightening experience. Five of the participants were interviewed by an African-American clinician, and seven were interviewed by a Caucasian clinician.
Analysis
Three measures (see Appendix) were used to rate the quality of participants'
discourse: coherence, reference, and emplotment. All three of these dimensions
reflect quality of information processing in the narrative and are not necessarily a
direct reflection of word- or sentence-level skills. Two of them are basic
properties of discourse in general, whereas emplotment is a quality specific to
the narrative discourse genre. Coherence is a cognitive-linguistic property of the
text, specifying how well the story makes sense as a whole. This core property of
narratives differentiates a coherent story from an unrelated sequence of
sentences. The coherence rating represented how well information was
connected in the stories. Reference signals the elements talked about in the story,
such as characters, locations, time, etc. The reference system appears to be more
vulnerable than other discourse systems in aphasia and other language disorders
(Chapman & Ulatowska, 1989). The reference rating represented how well
information elements were unambiguously signalled in the story. Emplotment is
the ability to express information about an event in a narrative structural form,
including all elements of the story or scenario. The emplotment rating
represented how well the information in a story formed a complete story. Each
aspect of narrative quality (coherence, reference, and emplotment) was measured
by rating on a 5-point (0–4) scale. Thus a maximum total quality score of 12 was possible
for each task. Rating systems were used because these fundamental dimensions
of narrative quality are not directly related to features of sentential structure, and
thus are difficult to associate empirically with any particular linguistic material
contained in the discourse (Patry & Nespoulous, 1990).
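The per-task quality score is the sum of the three 0–4 ratings, which gives the 0–12 range stated above. A minimal sketch follows; the dictionary layout is a hypothetical convenience, and the rating criteria themselves are in the Appendix, not reproduced here.

```python
DIMENSIONS = ("coherence", "reference", "emplotment")


def total_quality(ratings):
    """Sum the three 0-4 quality ratings for one task (maximum 12)."""
    total = 0
    for dimension in DIMENSIONS:
        score = ratings[dimension]
        if not isinstance(score, int) or not 0 <= score <= 4:
            raise ValueError(f"{dimension} must be an integer from 0 to 4")
        total += score
    return total


# Hypothetical ratings for one participant's picture-sequence story
print(total_quality({"coherence": 3, "reference": 2, "emplotment": 4}))
```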
Discourse quantity for each task was measured in number of propositions
(Mross, 1990). This measure represented the quantity of information contained in
a production, as a complement to the rating scales, which assessed the quality of
the information produced.
The presence or absence of ethnic dialect and discourse features (Mufwene,
Rickford, Baily, & Baugh, 1998; Ulatowska & Olness, 2001) on each task was
recorded. To identify ethnic dialect features, we focused on the verb system of
African-American Vernacular English (AAVE) for three reasons. First, the verb
system is one of the more complex and pivotal of the grammatical systems of
AAVE (Green, 1998; Wolfram & Fasold, 1974), distinctive from other dialects of
American English. Second, the verb carries the temporal and aspectual
information that forms the backbone of narratives, and narratives were elicited in
this study. Third, morpho-syntactic features of the verb are highly prone to
disruption in aphasia. Thus, the verb system was a natural choice of focus, for its
complexity, its distinctiveness, its importance in narratives, and its ability to reflect
aphasic disruptions. Example verb forms from the sample included habitual
aspect BE (My father said, "Don't be playing with those guns.") and
perfective aspect DONE (He said, "It kinda look like she done had a stroke.").
Discourse features of repetition and direct speech were also identified. Repetition
included both partial and full repetitions of previous portions of an utterance, and
instances of direct speech were reproductions of the speech produced by
characters in a narrative. These features are common in the oral storytelling
styles of many African Americans (Mitchel-Kernan, 1972; Ulatowska et al.,
2000) and act to highlight information and increase vividness in narratives.
Examples are: "I was laying on the hospital, can't walk, can't talk, can't move.
I couldn't walk, talk, nothing," and "They said, 'Oooh, girl, they gon get you
tonight, they gon get you.'"

TABLE 1 Demographic and clinical data
* Classification is based on performance on the Western Aphasia Battery (Kertesz, 1982).
Six raters, including two African-American clinicians, were trained to
discriminate the points on the rating scales. The stories were then rated, and
disagreements were resolved by group consensus. For each task, the relationship
between the discourse measures (quality and quantity) and WAB-AQ was
determined by computing correlations (Spearman and Pearson, respectively).
Alpha was adjusted to .01 to control for familywise error.
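The correlational analysis can be sketched in a few lines of Python. This is an illustration, not the authors' analysis code: Pearson's r is computed directly, Spearman's rs is Pearson's r applied to ranks (tied values receive the mean of their ranks), and the degrees of freedom reported in the Results are n − 2 (10 for 12 participants, 58 for 60 participant-task pairs). The article does not name its alpha-adjustment procedure; dividing .05 across the five tasks, as in a Bonferroni correction, is one way to arrive at .01 and is assumed here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def degrees_of_freedom(n_pairs):
    """Degrees of freedom reported with r and rs: number of pairs minus 2."""
    return n_pairs - 2


# Assumption: a Bonferroni-style split of .05 over the five discourse tasks.
ALPHA = 0.05 / 5
```

Spearman's rs suits the ordinal quality ratings, while Pearson's r was applied to the proposition counts, which is consistent with the pairing of tests described above.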
Reliability
Interrater reliability of the ratings for coherence, reference, and emplotment was
analysed by comparison of the original group ratings with ratings of an individual
rater on complete data from six of the twelve subjects. Point-by-point interrater
agreement was 90% for the coherence rating, 70% for the reference rating, and
75% for the emplotment rating. The final rating assigned to each response was
that of the original six raters, whose disagreements had been resolved by group
consensus.
RESULTS
Quality scores are shown in Table 3. With alpha adjusted for familywise error
(alpha = .01), the Spearman correlation between WAB-AQ and discourse quality was
statistically significant on the picture sequence, rs(10) = .82, p < .01, and the single
picture "Easter Morning," rs(10) = .83, p < .01. Correlations between WAB-AQ
and discourse quality were nonsignificant at an alpha level of .01 for the other
tasks: fable retell, rs(10) = .76; single picture "Counting Money," rs(10) = .74;
and personal narrative, rs(10) = .15.
Table 4 shows participants' combined quality ratings (coherence, reference,
and emplotment) across the five discourse tasks. The maximum possible score
in each cell of this table is 20 (maximum of 4 points per rating, multiplied by 5
different tasks). This score represents the individual's overall ability to produce
quality narratives, since coherence, reference, and emplotment are general
qualities of narrative, irrespective of task. Spearman correlations revealed a
significant relationship between WAB-AQ and each quality rating: coherence,
rs(10) = .91, p < .01; reference, rs(10) = .79, p < .01; and emplotment, rs(10) = .90,
p < .01.
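The per-task scores in Table 3 and the per-dimension sums in Table 4 are two ways of collapsing the same task-by-dimension grid of 0–4 ratings. The sketch below assumes a hypothetical nested-dictionary layout for one participant's ratings, with invented example values; because both views sum the same fifteen ratings, they necessarily share the same grand total.

```python
TASKS = ("fable retell", "picture sequence", "Counting Money",
         "Easter Morning", "personal narrative")
DIMENSIONS = ("coherence", "reference", "emplotment")
MAX_PER_DIMENSION = 4 * len(TASKS)  # 4 points x 5 tasks = 20


def combined_by_dimension(grid):
    """Table 4 view: sum each dimension across the five tasks (max 20 each).

    `grid[task][dimension]` holds one participant's 0-4 rating (hypothetical layout).
    """
    return {dim: sum(grid[task][dim] for task in TASKS) for dim in DIMENSIONS}


def totals_by_task(grid):
    """Table 3 view: sum the three dimensions within each task (max 12 each)."""
    return {task: sum(grid[task][dim] for dim in DIMENSIONS) for task in TASKS}


# Hypothetical participant: full marks everywhere except a weaker personal narrative.
example = {task: {"coherence": 4, "reference": 4, "emplotment": 4} for task in TASKS}
example["personal narrative"] = {"coherence": 2, "reference": 3, "emplotment": 1}

by_dim = combined_by_dimension(example)
by_task = totals_by_task(example)
print(by_dim)  # {'coherence': 18, 'reference': 19, 'emplotment': 17}
print(sum(by_dim.values()) == sum(by_task.values()))  # True
```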
Quantity of discourse (number of propositions) is shown in Table 5. Pearson
correlations revealed no significant relationship between WAB-AQ and quantity
of discourse on any task, with alpha at .01: fable retell, r(10) = .21; picture
sequence, r(10) = .001; single pictures "Counting Money," r(10) = .11, and
"Easter Morning," r(10) = .17; and personal narrative, r(10) = .20. Moreover,
a Spearman correlation between participants' discourse quality ratings and their
discourse quantity across tasks was not significant, with alpha at .01, rs(58) = .30.
Table 6 shows the presence or absence of ethnic dialect or discourse features
across discourse tasks. Ten of the twelve participants displayed at least one
ethnic feature on at least one task. Five of these ten were interviewed by an
African-American clinician, and five were interviewed by a Caucasian clinician.
Presence of ethnic features in subjects' responses did not differ by ethnicity of
the interviewer. Ethnic features appeared most often on one single-picture task
("Easter Morning") and the personal narrative. No ethnic dialect features occurred
for any participant on the fable retell.

TABLE 3 Quality scores

                          Discourse tasks
Participant   Father &    Boys &      Counting    Easter      Frightening
number*       Sons        Apples      Money       Morning     Experience
              (retell)    (picture    (single     (single     (personal
                          sequence)   picture)    picture)    narrative)
01            10          9           12          12          11
02            7           10          4           7           10
03            11          7           7           9           8
04            8           10          11          9           9
05            3           10          12          11          8
06            5           8           6           6           0
07            7           9           3           9           5
08            5           6           0           7           4
09            5           6           9           10          9
10            4           4           8           4           8
11            6           3           6           3           8
12            3           4           6           5           11
M             6.17        7.17        7.00        7.67        7.58
Range         3–11        3–10        0–12        3–12        0–11
SD            2.55        2.55        3.67        2.81        3.18

Sum of the three quality response scores (Coherence, maximum 4 points; Reference,
maximum 4 points; Emplotment, maximum 4 points) for the 12 participants'
responses on five discourse tasks.
* Participants are listed in order by decreasing WAB-AQ score.

TABLE 4 Combined quality ratings

                     Discourse quality measures
Participant number*  Coherence   Reference   Emplotment
01                   17          19          18
02                   11          16          11
03                   13          16          13
04                   13          15          15
05                   15          14          15
06                   8           9           8
07                   9           13          11
08                   6           9           7
09                   12          15          12
10                   9           10          9
11                   9           8           9
12                   8           11          10
M                    10.83       12.92       11.50
Range                6–17        8–19        7–18
SD                   3.24        3.48        3.26

Sum of each of the discourse quality measures for responses of the 12 participants across
five discourse tasks (retell, picture sequence, two single pictures, and personal
narrative), with maximum score of 20 points (4 points per measure across 5
tasks).
* Participants are listed in order by decreasing WAB-AQ score.

TABLE 5 Quantity scores

                          Discourse tasks
Participant   Father &    Boys &      Counting    Easter      Frightening
number*       Sons        Apples      Money       Morning     Experience
              (retell)    (picture    (single     (single     (personal
                          sequence)   picture)    picture)    narrative)
01            8           10          9           4           5
02            8           15          5           6           13
03            5           5           5           6           6
04            6           10          4           10          29
05            3           7           4           4           4
06            4           5           4           4           0
07            7           9           4           4           13
08            7           4           0           4           27
09            16          26          8           8           19
10            3           6           4           5           6
11            5           5           4           6           6
12            10          11          6           7           19
M             6.83        9.42        4.75        5.67        12.25
Range         3–16        4–26        0–9         4–10        0–29
SD            3.59        6.14        2.26        1.92        9.43

Quantity (number of propositions) in discourse responses of 12 African-American adults
with aphasia on five discourse tasks.
* Participants are listed in order by decreasing WAB-AQ score.

TABLE 6 Ethnic dialect and discourse features

                          Discourse tasks
Participant   Father &    Boys &      Counting    Easter      Frightening
number*       Sons        Apples      Money       Morning     Experience
              (retell)    (picture    (single     (single     (personal
                          sequence)   picture)    picture)    narrative)
01            (−)         (−)         (+)         (+)         (+)
02            (−)         (−)         (−)         (+)         (−)
03            (−)         (−)         (−)         (−)         (−)
04            (−)         (−)         (−)         (−)         (+)
05            (−)         (−)         (+)         (+)         (−)
06            (−)         (−)         (−)         (+)         n.r.*
07            (−)         (+)         (−)         (+)         (−)
08            (−)         (+)         n.r.*       (+)         (−)
09            (−)         (−)         (−)         (−)         (+)
10            (−)         (−)         (−)         (−)         (−)
11            (−)         (+)         (−)         (+)         (+)
12            (−)         (+)         (+)         (+)         (+)
Total +s      0           4           3           8           5

Presence (+) or absence (−) of ethnic dialect and discourse features in responses by 12
African-American adults with moderate aphasia on five discourse tasks.
* n.r. = no response.
DISCUSSION
The current study adds to our knowledge of what appears to be an inconsistent
pattern of relationship between performance on language impairment measures
and performance on discourse tasks (Ulatowska et al., 2001a, 2001b). Although
the findings suggest that overall dimensions of narrative quality (coherence,
reference, and emplotment) may be related to performance on the WAB-AQ, this
relationship may be task-specific. In particular, the quality of personal narratives
does not appear to be related to aphasia severity level as measured by the WAB,
at least among these individuals with moderate aphasia. Of the tasks in this
discourse battery, the personal narrative most closely reflects functional
communication, allowing subjects full latitude in task interpretation, story
evaluation, and creativity. Overall, this group of findings suggests that less
structured discourse tasks and the WAB-AQ assess different linguistic domains.
However, confirmation of these possibilities is beyond the power of correlational
analyses. Nevertheless, the absence of significant relationships may suggest
supplementing standardised aphasia tests with functional discourse measures in
aphasia assessment.
The findings also provide evidence that the more functional and open-ended
the discourse task, the more frequent the production of ethnic features of
communication. Almost all the participants in this study displayed ethnic dialect
or discourse features on one or more discourse tasks, irrespective of the ethnicity
of the interviewer. This was most common on one single-picture task ("Easter
Morning") and the personal narrative task and completely absent on the fable
retell. The tasks that most frequently elicited ethnic features did not require close
replication of a stimulus provided by the experimenter. Subjects incorporated more
interpretive material in responses on these tasks, and their personal involvement
in creating the response yielded more natural, ethnically marked language. For
example, two seemingly equivalent tasks, i.e., the two single-picture tasks
"Counting Money" and "Easter Morning," differed in how often they elicited
ethnic features. This difference may be accounted for, in part, by
differences in the pictures' effectiveness in evoking personal involvement on the
part of the responder. "Counting Money" depicts a dated scene of adult
characters counting their savings in the form of hard currency, while "Easter
Morning" depicts a more familiar scenario from personal life, i.e., family conflict over
church attendance. In contrast, the task that most closely constrained the content
of the subjects' responses and was least likely to engage the speaker
personally (i.e., fable retell) did not elicit any ethnic features.
The fable used for the fable retell task was presented in Standard American
English, and subjects rarely if ever incorporated interpretive information in the
responses on this task. In summary, it would appear that more functional tasks,
such as a personal narrative task, may supplement language impairment
measures, both in the cognitive-linguistic skills they require, and in the degree to
which they are able to elicit natural ethnic patterns of communication.
Because this article addresses the ways in which discourse testing may
complement standardised testing, a logical extension of the analysis would be to
examine responses to the WAB Picnic Picture for presence or absence of ethnic
features. This task is unlikely to evoke personal involvement on the part of the
speaker, or to reflect natural language, because subjects are instructed only to tell
the examiner what they see in the picture, i.e., to describe. Descriptive discourse
is less likely than narrative discourse to elicit narrative features (Olness,
Ulatowska, Wertz, Thompson, & Auther, 2002).
Another finding with potential clinical implications is the lack of significant
relationships between the WAB-AQ and discourse quantity (number of
propositions). Thus, severity of aphasia, as indicated by the WAB-AQ, does not
seem to predict the number of propositions a person with aphasia will produce on
a discourse task, at least for the group of individuals with moderate aphasia
considered here. In addition, discourse quality ratings were not significantly
related to discourse quantity. One must remember, however, that all the
subjects in this small sample had aphasia of moderate severity; interpretation of a
lack of relationship between narrative quantity and narrative quality cannot be
generalised to individuals with aphasia that is either severe or mild. One might
predict that individuals with severe aphasia produce discourse that is both short
and relatively poor, while individuals with mild aphasia produce discourse that is
relatively longer and higher in quality. Expansion of the aphasia severity range
might be predicted to yield a relationship between discourse quality and
discourse length.
In summary, this study provides performance information on those tasks that
may elicit natural language, with ethnic features, and those that may not. It also
expands our knowledge of performance profiles on a variety of discourse tasks,
such as retell, personal narrative, and narrative elicited with pictures. Thus, it
brings us one step closer to development of a discourse assessment tool for
culturally and linguistically diverse populations which may serve to supplement
information provided by standardised testing.
REFERENCES
Chapman, S. B., Highley, A. P., & Thompson, J. L. (1998). Discourse in fluent aphasia
and Alzheimer's disease: Linguistic and cognitive considerations. In M. Paradis
(Ed.), Pragmatics in neurogenic communication disorders (pp. 55–78). New York:
Elsevier.
Chapman, S. B., & Ulatowska, H. K. (1989). Discourse in aphasia: Integration deficits in
processing reference. Brain and Language, 36, 651–668.
Featherman, D. L., & Stevens, G. A. (1980). A revised socioeconomic index of
occupational status. Center for Demography and Ecology Working Paper 79-48.
Madison, WI: University of Wisconsin.
Green, L. (1998). Aspect and predicate phrases in African-American Vernacular English.
In S. S. Mufwene, J. R. Rickford, G. Baily, & J. Baugh (Eds.), African-American
English: Structure, history, and use (pp. 37–68). London: Routledge.
Holland, A. L. (1983). Non-biased assessment and treatment of adults who have
neurologic speech and language problems. Topics in Language Disorders, 3, 67–75.
Kertesz, A. (1979). Aphasia and related disorders: Taxonomy, localization, and recovery.
New York: Grune & Stratton.
Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.
Kittner, S. J., White, L. R., Losonczy, K. G., Wolf, P. A., & Rebel, J. R. (1990). Black-white
differences in stroke incidence in a national sample: The contribution of
hypertension and diabetes mellitus. Journal of the American Medical Association,
264, 1267–1270.
Mitchel-Kernan, C. (1972). Signifying, loud-talking, and marking. In T. Kochman (Ed.),
Rappin' and stylin' out: Communication in urban black America (pp. 315–335).
Urbana, IL: University of Illinois Press.
Mross, E. F. (1990). Text analysis: Macro- and microstructural aspects of discourse
processing. In Y. Joanette & H. H. Brownell (Eds.), Discourse ability and brain
damage: Theoretical and empirical perspectives (pp. 50–68). New York: Springer-Verlag.
Mufwene, S. S., Rickford, J. R., Baily, G., & Baugh, J. (Eds.). (1998). African-American
English: Structure, history and use. London: Routledge.
Olness, G. S., Ulatowska, H. K., Wertz, R. T., Thompson, J. L., & Auther, L. L. (2002).
Discourse elicitation with pictorial stimuli in African Americans and Caucasians
with and without aphasia. Aphasiology, 16, 623–633.
Patry, R., & Nespoulous, J. L. (1990). Discourse analysis in linguistics: Historical and
theoretical background. In Y. Joanette & H. H. Brownell (Eds.), Discourse ability and
brain damage: Theoretical and empirical perspectives (pp. 3–27). New York:
Springer-Verlag.
Ulatowska, H. K., & Chapman, S. B. (1994). Discourse macrostructure in aphasia. In
R. L. Bloom, L. K. Obler, S. DeSanti, & J. S. Ehrlich (Eds.), Discourse analyses and
applications: Studies in adult clinical populations (pp. 29–46). Hillsdale, NJ:
Lawrence Erlbaum Associates Inc.
Ulatowska, H. K., & Olness, G. S. (2000). Discourse revisited: Contributions of lexicosyntactic
devices. Brain and Language, 71, 249–251.
Ulatowska, H. K., & Olness, G. S. (2001). Dialectal variants of verbs in narratives of
African Americans with aphasia: Some methodological considerations. Journal
of Neurolinguistics, 14, 93–110.
Ulatowska, H. K., Olness, G. S., Hill, C. L., Roberts, J., & Keebler, M. W. (2000).
Repetition in narratives of African Americans: The effects of aphasia. Discourse
Processes, 30, 265–283.
Ulatowska, H. K., Olness, G. S., Hill, C. L., Samson, A., & Goins, K. (2002, April). The
influence of aphasia on narrative quality in African Americans. Paper presented at the
meeting of the National Black Association for Speech-Language and Hearing
(NBASLH), Raleigh, NC.
Ulatowska, H. K., Olness, G. S., Wertz, R. T., Thompson, J. L., Keebler, M. W., Hill, C.
L. et al. (2001a). Comparison of language impairment, functional communication,
and discourse measures in African-American aphasic and normal adults.
Aphasiology, 15, 1007–1016.
Ulatowska, H. K., Wertz, R. T., Chapman, S. B., Hill, C. L., Thompson, J. L., Keebler,
M. W. et al. (2001b). Interpretation of fables and proverbs by African Americans
with and without aphasia. American Journal of Speech-Language Pathology, 10,
40–50.
Wallace, G. L. (1996). Management of aphasic individuals from culturally and
linguistically diverse populations. In G.L.Wallace (Ed.), Adult aphasia rehabilitation.
Newton, MA: Butterworth-Heinemann.
Wertz, R. T., Ulatowska, H. K., Wallace, G., Payne, J. C., Chapman, S., Auther-Steffen,
L. L. et al. (2000, November). A comparison of aphasia in African Americans and
Caucasians. Paper presented at the meeting of the American Speech and Hearing
Association, Washington, DC.
Wolfram, W. (1992, September). The sociolinguistic model in speech and language
pathology. Keynote address at the International Conference on Inter-Disciplinary
Perspectives in Speech and Language Pathology, Dublin, Ireland. (ERIC
Document Reproduction Service No. ED 359 789.)
Wolfram, W., & Fasold, R. W. (1974). The study of social dialect in American English.
Englewood Cliffs, NJ: Prentice-Hall.

APPENDIX
QUALITY RATING SYSTEM
Coherence
4 All portions of the response are interconnected and clear
3 Most of the response is connected and clear, with some problems
2 Some of the elements of the response are connected
1 The discourse is not interpretable
0 No response
Reference
4 All referents and the relationship between them are clear
3 Some reference errors
2 Many reference errors
1 None of the referents, nor their relationship, is interpretable
0 No response
Emplotment
4 Full scenario is produced
3 Story or scenario is produced with some elements missing
2 Story or scenario is produced with many elements missing
1 Only brief mention of elements with no story or scenario observable
0 No response

The inter-rater reliability of the story retell procedure
William D.Hula, Malcolm R.McNeil, and Patrick J.Doyle
VA Pittsburgh Healthcare System Geriatric Research Education &
Clinical Center,
and University of Pittsburgh, USA
Hillel J.Rubinsky
University of Pittsburgh, USA
Tepanta R.D.Fossett
VA Pittsburgh Healthcare System Geriatric Research Education &
Clinical Center,
and University of Pittsburgh, USA
Background: McNeil, Doyle, Fossett, Park, and Goda (2001) have
presented the story retell procedure (SRP) as an efficient means of
assessing discourse in adults with aphasia, in part because it provides
reliable, valid, and sensitive indices of performance without the need
for time-consuming transcription of language samples.
Aims: The purpose of this study was to demonstrate that the SRP,
when scored without transcription by judges with minimal training,
produces a reliable measure of information transfer.
Methods & Procedures: Four judges with no prior experience using the SRP
scored audio-recorded language samples, produced by
four subjects with aphasia and eleven normal subjects, for percent
information units per minute (%IU/Min).
Outcomes & Results: The results demonstrate that the SRP has
high inter-rater reliability. Reliability coefficients ranged from .89 to
.995, and the standard error of measurement associated with inter-rater
scoring error ranged from .59 to 1.42 %IU/Min. Point-to-point
reliability in scoring individual information units ranged from 85–95%
and averaged 91% for both subject groups.
Conclusions: The SRP is a potentially useful tool for quantifying
connected language behaviour, and may be particularly valuable in
clinical and research settings where economy of assessment
procedures is essential.
In a series of recent publications, McNeil, Doyle, and colleagues have presented
information on a story retell procedure (SRP) used to elicit language samples
from persons with and without aphasia (Doyle et al., 2000; Doyle, McNeil,
Spencer, Goda, Cottrell, & Lustig, 1998; McNeil et al., 2001; McNeil, Doyle,
Park, Fossett, & Brodsky, 2002). The SRP consists of auditory presentation of
stories derived from Brookshire and Nicholas's (1993) Discourse
Comprehension Test to a subject or patient, followed by an immediate retell. The
stories can be presented with or without picture support, and likewise, picture
support can be provided for the retells, or not, depending on the patient. It has
been argued that the SRP possesses some distinct advantages over other
connected language sampling procedures described in the literature, including
conversational observation (Oelschlaeger & Thorne, 1999), scripted interviews
(Goodglass & Kaplan, 1983), on-line video narration (McNeil, Small,
Masterson, & Fossett, 1995), fable generation and storytelling (Berndt, Wayland,
Rochon, Saffran, & Schwartz, 2000; Ulatowska, Chapman, Highley, & Prince,
1998), picture description (Nicholas & Brookshire, 1993, 1995; Yorkston &
Beukelman, 1980), and procedural description (Nicholas & Brookshire, 1993,
1995).
From a language sampling perspective, it has been suggested that the
constrained nature of the SRP enables it to provide a well-standardised and
replicable sample of language formulation and production. Specifically, data
have been presented to support the internal validity of the SRP (Doyle et al.,
1998) and the linguistic equivalence of language samples generated by four
alternate forms of the procedure (Doyle et al., 2000).
In addition, a scoring metric was developed to quantify the information content
and communicative efficiency of the samples generated by the SRP. This metric,
labelled the information unit (IU), was derived from Nicholas and Brookshire's
(1993, 1995) correct information unit, and was defined as "an identified word,
phrase, or acceptable alternative from the stimulus story that is intelligible and
informative and conveys accurate and relevant information about the story"
(McNeil et al., 2001, p. 994). The primary virtue of the IU scoring metric used
with the SRP is that all possible IUs are known a priori and can be printed on score
sheets. This potentially allows scoring to be done directly from audio recordings,
eliminating the need for time-consuming transcription of lengthy language
samples. The IU scoring metric expressed as a percentage of total possible IUs
(%IU) has been demonstrated to be reliable across forms of the SRP and to have
good criterion validity (McNeil et al., 2001). Also, an efficiency measure
obtained by dividing %IUs by the time taken to produce them (%IU/Min) has
been demonstrated to be reliable across forms and to discriminate between
normal and aphasic performance with reasonable accuracy (McNeil et al., 2002).

Address correspondence to: William D. Hula, MS, Doctoral Fellow, Audiology &
Speech Pathology, VA Pittsburgh Healthcare System, 7180 Highland Drive,
Pittsburgh, PA 15206, USA. Email: william.hula@med.va.gov The authors
gratefully acknowledge the assistance of Stephanie Nixon and Joyce Poydence.
This research was supported by VA Rehabilitation Research and Development
Project # C8942RA.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html
DOI: 10.1080/02687030344000139

INTER-RATER RELIABILITY OF THE SRP 131
In addition to reporting on the validity and alternate form reliability of the %IU
metric, McNeil and colleagues (2001) demonstrated that it has good interobserver reliability. However, these data were obtained from scoring of printed
transcripts that had themselves already been subjected to reliability checks.
Furthermore, all of the data presented thus far on the SRP have been generated
by scorers who were themselves involved in the development of the IU measure.
If the SRP and its associated IU metrics are to be used to their fullest advantage,
particularly in a clinical setting, they must be demonstrated to have acceptable
reliability when scored directly from audio recordings by observers who have
received training comparable to what a practising clinician could be expected to
receive.
One final shortcoming of prior work done to demonstrate the psychometric
strength of the SRP concerns the distinction between IUs that were directly
stated in the stimulus stories (direct IUs) and IUs retold as synonyms of words
and phrases contained in the stimulus stories (alternate IUs). In a study
investigating memory demands of the SRP (Brodsky, McNeil, Park, Fossett,
Timm, & Doyle, 2000), a strong serial position effect was demonstrated for direct
IUs, but not for alternate IUs. Thus far, no data have been presented to
demonstrate that this distinction can be reliably scored.
The purpose of the current paper is to present additional information on the
inter-rater reliability of the SRP using procedures and raters more representative
of a clinical setting than have been used in the past. Inter-rater reliability
coefficients and standard errors of measurement (SEM) will be reported for the
%IU/Min score. Also, point-to-point reliability for identification of individual IUs
will be reported.
METHOD
Participants
Recordings of story retells by four persons with aphasia and eleven normal
individuals were used. All recordings were randomly drawn from the sample of
15 subjects with aphasia and 31 normal subjects reported by McNeil et al.
(2001). Descriptive statistics for the subjects with aphasia are presented in
Table 1. Judges were four individuals with varying amounts of experience with
aphasia and language transcription: a licensed psychologist, a master's student in
speech-language pathology, and two doctoral students who are also certified
speech-language pathologists. These judges were a convenience sample, as they
were all new employees in the second author's laboratory and required training
in scoring the SRP for their work. The two doctoral students both had 2–3 years
of work experience that involved transcription of language samples from clinical
populations. The psychologist had approximately 13 years of experience with
neuropsychological testing of rehabilitation patients, including patients with
aphasia, but little experience with language transcription per se. The master's
student had no experience with language transcription for research or clinical
purposes.

TABLE 1 Biographical and descriptive subject information for subjects with aphasia (N = 15)

Subjects chosen for reliability analysis in this study are marked with an (*).
MPO = months post onset; RTT = Revised Token Test (McNeil & Prescott, 1978),
percentile compared to adults with left-hemisphere damage; ABCD ratio = Arizona
Battery for Communication Disorders of Dementia (Bayles & Tomoeda, 1993) ratio,
determined by number of delayed recall items/number of immediate recall items x 10;
Raven's = Raven's Coloured Progressive Matrices (Raven, 1976), raw score out of a
possible 36; PICA = Porch Index of Communicative Ability (Porch, 1981), percentile
compared to adults with left-hemisphere damage; OA = overall percentile and VRB = verbal
percentile.

Procedures
Prior to scoring any of the story retells, each of the four judges read the IU
definition and examples published by McNeil et al. (2001), and practised scoring
IUs on six to eight stories from printed transcripts. These language samples were
drawn from the samples collected by Doyle et al. (2000) and McNeil et al.
(2001). After training, each judge scored the same SRP form for each of the four
persons with aphasia and eleven normal subjects. Each form consisted of three
separate stories as reported by Doyle et al. (2000). All scoring was done from
audio files using score sheets containing all possible direct and alternate IUs.
Judges listened to each story as many times as they wanted to and placed a check
on the score sheet wherever an IU was observed. Wherever an alternate IU (as
opposed to a direct IU) was observed, they made an additional mark to denote
which of the predetermined synonyms was produced. The %IU/Min for each
story was calculated and averaged across the appropriate three-story form to give
a total %IU/Min score for each subject. The total %IU/Min score was also
broken down into %direct IU/Min and %alternate IU/Min to allow for assessment
of inter-rater reliability on these more specific measures.
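The %IU/Min calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the published scoring procedure: the `StoryScore` fields and example numbers are hypothetical, %IU is taken as IUs credited divided by the total possible IUs on that story's score sheet, and the direct and alternate breakdowns are assumed to share that denominator, so that they add up to the total.

```python
from dataclasses import dataclass


@dataclass
class StoryScore:
    """Hypothetical tally for one retold story scored from an audio file."""
    direct_ius: int     # checks for IUs stated as in the stimulus story
    alternate_ius: int  # checks for predetermined synonyms
    possible_ius: int   # all IUs printed on the score sheet for this story
    seconds: float      # duration of the retell

    def percent_iu_per_min(self, kind: str = "total") -> float:
        """Percentage of possible IUs produced, divided by retell time in minutes."""
        counts = {"total": self.direct_ius + self.alternate_ius,
                  "direct": self.direct_ius,
                  "alternate": self.alternate_ius}
        if kind not in counts:
            raise ValueError(f"unknown IU kind: {kind}")
        percent_iu = 100.0 * counts[kind] / self.possible_ius
        return percent_iu / (self.seconds / 60.0)


def form_score(stories, kind="total"):
    """Average the per-story %IU/Min over the stories in one form."""
    return sum(s.percent_iu_per_min(kind) for s in stories) / len(stories)


form = [StoryScore(20, 10, 60, 60.0),   # 50% of IUs in 1 min -> 50.0
        StoryScore(30, 0, 60, 120.0),   # 50% in 2 min -> 25.0
        StoryScore(15, 15, 50, 90.0)]   # 60% in 1.5 min -> 40.0
print(round(form_score(form), 2))  # 38.33
```

Averaging per-story rates, rather than pooling IUs and time across the form, follows the description of a %IU/Min calculated for each story and then averaged across the three-story form.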
The judges all reported that it generally took 15–30 minutes for them to score
a single form (three stories) of the SRP for a single subject. Data on the time
spent scoring retells were kept for the least trained and experienced judge. Her
average time to complete a single form was 23 minutes (range = 12–29; SD = 4).
RESULTS
Inter-rater reliability coefficients were calculated separately for subjects with
aphasia and normal subjects using the %total, %direct, and %alternate IU/Min
scores generated by each of the four judges for each of the subjects. To
determine a reliability coefficient that would allow for generalisation to judges
beyond those in this study, absolute-agreement intraclass correlation coefficients
(ICCs) were calculated with both subjects and judges as random factors. The ICC
has been argued to be a more conservative measure of reliability than the Pearson
Product Moment Correlation (Denegar & Ball, 1993). The ICCs are presented in
Table 2. They ranged from .94 to .995 for the subjects with aphasia and from .89
to .99 for the normal subjects. The SEM associated with inter-judge scoring error
was also calculated for each metric. These results are presented in Table 3 and
they ranged from .59 to .95 %IU/Min for the subjects with aphasia and from .99
to 1.42 %IU/Min for the normal subjects.
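For readers who want to reproduce this kind of coefficient, the following Python sketch computes a two-way random-effects, absolute-agreement intraclass correlation from the ANOVA mean squares of a subjects-by-judges score matrix. The article does not state whether the single-rater or the average-of-raters form was used; the single-rater form, ICC(2,1) in Shrout and Fleiss's notation, is shown here as an assumption. The SEM helper uses the common formula SD × √(1 − ICC) with an SD pooled over all cells; this too is an assumption, since the article does not give its SEM formula.

```python
import math


def icc_2_1(scores):
    """Two-way random, absolute-agreement, single-rater ICC (Shrout & Fleiss ICC(2,1)).

    `scores` is a list of rows, one per subject, each holding one score per
    judge (n subjects x k judges, no missing cells).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_judges
    ms_subjects = ss_subjects / (n - 1)
    ms_judges = ss_judges / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Judge variance stays in the denominator, which is what makes it "absolute agreement".
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_judges - ms_error) / n)


def sem_from_icc(scores, icc):
    """SEM = SD * sqrt(1 - ICC), with SD pooled over all cells (an assumption)."""
    values = [x for row in scores for x in row]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))
    return sd * math.sqrt(max(0.0, 1.0 - icc))


# Shrout & Fleiss (1979) example data: 6 subjects rated by 4 judges.
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(sf), 2))  # 0.29
```

Treating judges as a random factor, as the article describes, is what allows the coefficient to generalise to judges beyond the four who took part.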
Point-to-point reliability between all six possible pairings of judges was
calculated separately for the four subjects with aphasia and for four of the
normal subjects. The formula used was [agreements/(agreements + disagreements)] × 100.
Point-to-point reliability averaged 91% (range = 85–95%) for both subject groups.

TABLE 2 Inter-rater reliability (intraclass) correlation coefficients for total, direct and
alternate %IU/Min

Subjects            Total     Direct    Alternate
Aphasic (n = 4)     0.995     0.986     0.944
Normal (n = 11)     0.993     0.979     0.885

All significant at p < .001.

TABLE 3 Inter-rater standard errors of measurement (SEM) for total, direct and alternate
%IU/Min

Subjects            Total     Direct    Alternate
Aphasic (n = 4)     0.69      0.95      0.59
Normal (n = 11)     0.99      1.42      1.04
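The point-to-point agreement formula, averaged over every pairing of judges, can be sketched as below. This is an illustrative sketch only: it assumes each judge's scoring has already been reduced to one code per scorable unit, listed in the same order for every judge (the codes "direct", "alternate" and "absent" and the helper names are hypothetical), since the procedure for aligning units across judges is not described in this section.

```python
from itertools import combinations


def percent_agreement(codes_a, codes_b):
    """Point-to-point agreement: agreements / (agreements + disagreements) x 100."""
    if not codes_a or len(codes_a) != len(codes_b):
        raise ValueError("both judges must code the same, non-empty list of units")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    disagreements = len(codes_a) - agreements
    return agreements / (agreements + disagreements) * 100


def pairwise_agreement(judges):
    """Mean, minimum and maximum agreement over every pair of judges."""
    scores = [percent_agreement(a, b) for a, b in combinations(judges, 2)]
    return sum(scores) / len(scores), min(scores), max(scores)


# Hypothetical unit-by-unit codes from four judges (4 judges -> 6 pairs).
judges = [
    ["direct", "alternate", "absent", "direct"],
    ["direct", "alternate", "absent", "direct"],
    ["direct", "alternate", "absent", "alternate"],
    ["direct", "alternate", "absent", "direct"],
]
mean_pct, low, high = pairwise_agreement(judges)
print(f"mean {mean_pct:.1f}% (range {low:.0f}-{high:.0f}%)")
```

With these made-up codes the third judge disagrees with each of the others on one unit in four, so the three pairs involving that judge score 75% and the other three score 100%, and the script prints `mean 87.5% (range 75-100%)`.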
DISCUSSION
The inter-rater reliability for the %IU/Min metric, when scored directly from audio
recordings by newly and minimally trained judges, was high, and scoring reliability
differed little among judges with different levels of professional experience.
The inter-rater SEMs were much lower than the SEMs reported by McNeil et al. (2002)
for the four alternate forms for subjects with aphasia (range = 4.8–5.6) and for
normal subjects (range = 3.2–4.7). The low SEMs suggest that measurement error
attributable to differences between raters is small relative to the measurement
error associated with the alternate story forms themselves.
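To make the SEM comparison concrete, one common reading of an SEM (an assumption here; the paper does not report confidence bands) is a band of about ±1.96 SEM around an observed score, assuming normally distributed error. The hypothetical helper `score_band` below computes that band for some of the SEMs quoted above; the observed score of 50.0 %IU/Min is invented purely for illustration.

```python
def score_band(observed, sem, z=1.96):
    """Approximate 95% band (observed +/- z * SEM), assuming normal error."""
    half_width = z * sem
    return observed - half_width, observed + half_width


# SEMs (%IU/Min) quoted in the text; the observed score of 50.0 is hypothetical.
reported_sems = {
    "inter-rater, aphasic, total": 0.69,
    "inter-rater, normal, total": 0.99,
    "alternate forms, aphasic (low end)": 4.8,
    "alternate forms, normal (low end)": 3.2,
}

for label, sem in reported_sems.items():
    low, high = score_band(50.0, sem)
    print(f"{label}: +/- {(high - low) / 2:.2f} %IU/Min")
```

Under this reading, rater disagreement alone moves a score by roughly ±1.4 %IU/Min for the subjects with aphasia, whereas the alternate-form SEMs correspond to bands of roughly ±9.4 %IU/Min or wider.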
Furthermore, the present data, scored to include the direct vs alternate IU
distinction, demonstrated point-to-point reliability that was high and comparable
to previously reported values obtained from printed transcripts. Finally, the
preliminary data on the time needed to score language samples elicited by the
SRP suggest that the procedure may be useful in clinical environments where
economy of assessment is essential.
REFERENCES
Bayles, K. A., & Tomoeda, C. K. (1993). Arizona Battery for Communication Disorders
of Dementia. Tucson, AZ: Canyonlands Publishing, Inc.
Berndt, R. S., Wayland, S., Rochon, E., Saffran, E., & Schwartz, M. (2000). Quantitative
Production Analysis (QPA). Philadelphia: Psychology Press.
Brodsky, M., McNeil, M., Park, G., Fossett, T., Timm, N., & Doyle, P. (2000). Auditory
memory for story retelling in normal male, female, young, and old adult subjects in
persons with aphasia. Poster presented to the Academy of Aphasia Conference,
Montreal, Canada.
Brookshire, R. H., & Nicholas, L. E. (1993). Discourse Comprehension Test. Tucson,
AZ: Communication Skill Builders.
Denegar, C. R., & Ball, D. W. (1993). Assessing the reliability and precision of
measurement: An introduction to the intraclass correlation and standard error of
measurement. Journal of Sport Rehabilitation, 2, 35–42.
Doyle, P. J., McNeil, M. R., Park, G., Goda, A., Rubenstein, E., Spencer, K., et al. (2000).
Linguistic validation of four parallel forms of a story retelling procedure.
Aphasiology, 14, 537–549.
Doyle, P. J., McNeil, M. R., Spencer, K. A., Goda, A. J., Cottrell, K., & Lustig, A. P.
(1998). The effects of concurrent picture presentations on retelling of orally
presented stories by adults with aphasia. Aphasiology, 12, 561–574.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders.
Philadelphia: Lea & Febiger.
McNeil, M. R., Doyle, P., Fossett, T., Park, G., & Goda, A. (2001). Reliability and
concurrent validity of an information unit scoring metric for the retelling procedure.
Aphasiology, 15, 991–1007.
McNeil, M. R., Doyle, P., Park, G., Fossett, T., & Brodsky, M. (2002). Increasing the
sensitivity of the Story Retell Procedure for the discrimination of normal elderly
subjects from persons with aphasia. Aphasiology, 16, 815–822.
McNeil, M. R., & Prescott, T. E. (1978). The Revised Token Test. Austin, TX: Pro-Ed.
McNeil, M. R., Small, S. L., Masterson, R. J., & Fossett, T. R. D. (1995). Behavioral and
pharmacological treatment of lexical-semantic deficits in a single patient with
primary progressive aphasia. American Journal of Speech-Language Pathology, 4,
76–87.
Nicholas, L. E., & Brookshire, R. H. (1993). A system for quantifying the informativeness
and efficiency of the connected speech of adults with aphasia. Journal of Speech and
Hearing Research, 36, 338–350.
Nicholas, L. E., & Brookshire, R. H. (1995). Presence, completeness, and accuracy of main
concepts in the connected speech of non-brain-damaged adults and adults with
aphasia. Journal of Speech and Hearing Research, 38, 145–156.
Oelschlager, M. L., & Thorne, J. C. (1999). Application of the correct information unit
analysis to the naturally occurring conversation of a person with aphasia. Journal of
Speech, Language, and Hearing Research, 42, 636–648.
Porch, B. E. (1981). Porch Index of Communicative Ability. Palo Alto, CA: Consulting
Psychologists Press.
Raven, J. C. (1976). Coloured Progressive Matrices. Oxford: Oxford Psychologists Press,
Ltd.
Ulatowska, H. K., Chapman, S. B., Highley, A. P., & Prince, J. (1998). Discourse in healthy
old-elderly adults: A longitudinal study. Aphasiology, 12, 619–633.
Yorkston, K. M., & Beukelman, D. R. (1980). An analysis of connected speech samples of
aphasic and normal speakers. Journal of Speech and Hearing Disorders, 45, 27–36.
