CONTENTS
Preface
Patrick J.Doyle
Papers
Right hemisphere syndrome is in the eye of the beholder
Margaret Lehman Blake, Joseph R.Duffy, Connie A.Tompkins, and
Penelope S.Myers
Editor
Patrick J.Doyle, Ph.D., Geriatric Research Education & Clinical Center, VA
Pittsburgh Healthcare System, and Department of Communication Science and
Disorders, University of Pittsburgh, Pittsburgh, PA, USA.
Associate Editors
Kirrie J.Ballard, Ph.D., Department of Speech Pathology and Audiology,
University of Iowa, Iowa City, IA, USA.
Annette Baumgaertner, Ph.D., Neurologische Universitaetsklinik, University
of Hamburg, Hamburg, Germany.
Mary Boyle, Ph.D., Department of Communication Sciences and Disorders,
Montclair State University, Upper Montclair, NJ, USA.
Carol Frattali, Ph.D., National Institutes of Health, Bethesda, MD, USA.
Michael E.Groher, Ph.D., Department of Communicative Disorders,
University of Florida Health Science Center, Gainesville, FL, USA.
Katherine Odell, Ph.D., Department of Communication, University of
Wisconsin, Madison, WI, USA.
Grace H.Park, Ph.D., National Institutes of Health, NIDCD, Language Section,
Bethesda, MD, USA.
Anastasia Raymer, Ph.D., Child Study Center, Old Dominion University,
Norfolk, VA, USA.
Linda Schuster, Ph.D., University of West Virginia, Morgantown, WV, USA.
Nina Simmons-Mackie, Ph.D., Department of Communication Science &
Disorders, Southeastern Louisiana University, Hammond, LA, USA.
Evy Visch-Brink, Ph.D., Department of Neuropsychology, Erasmus
University Rotterdam, Rotterdam, The Netherlands.
APHASIOLOGY
SUBSCRIPTION INFORMATION
Subscription rates to Volume 17, 2003 (12 issues) are as follows:
To individuals: UK £361.00; Rest of World $596.00
To institutions: UK £857.00; Rest of World $1414.00
A subscription to the print edition includes free access for any number of
concurrent users across a local area network to the online edition, ISSN 1464-5041.
Print subscriptions are also available to individual members of the British
Aphasiology Society (BAS), on application to the Society.
For a complete and up-to-date guide to Taylor & Francis Group's journals and
books publishing programmes, visit the Taylor and Francis website:
http://www.tandf.co.uk/
Aphasiology (USPS permit number 001413) is published monthly. The 2003
US Institutional subscription price is $1414.00. Periodicals postage paid at
Champlain, NY, by US Mail Agent IMS of New York, 100 Walnut Street,
Champlain, NY.
US Postmaster: Please send address changes to pAPH, PO Box 1518,
Champlain, NY 12919, USA.
Dollar rates apply to subscribers in all countries except the UK and the
Republic of Ireland where the pound sterling price applies. All subscriptions are
payable in advance and all rates include postage. Journals are sent by air to the
USA, Canada, Mexico, India, Japan and Australasia. Subscriptions are entered
on an annual basis, i.e. from January to December. Payment may be made by
sterling cheque, dollar cheque, international money order, National Giro, or
credit card (AMEX, VISA, Mastercard).
Orders originating in the following territories should be sent direct to the local
distributor.
India Universal Subscription Agency Pvt. Ltd, 101–102 Community Centre,
Malviya Nagar Extn, Post Bag No. 8, Saket, New Delhi 110017.
Japan Kinokuniya Company Ltd, Journal Department, PO Box 55, Chitose,
Tokyo 156.
USA, Canada and Mexico Psychology Press, a member of the Taylor &
Francis Group, 325 Chestnut St, Philadelphia, PA 19106, USA
UK and other territories Taylor & Francis Ltd, Rankine Road, Basingstoke,
Hampshire RG24 8PR.
The print edition of this journal is typeset by DP Photosetting, Aylesbury and
printed by Hobbs the Printer, Totton, Hants. The online edition of this journal is
hosted by Metapress at journalsonline.tandf.co.uk
Copyright © 2003 Psychology Press Limited. All rights reserved. No part
of this publication may be reproduced, stored, transmitted or disseminated,
in any form, or by any means, without prior written permission from
Psychology Press Ltd, to whom all requests to reproduce copyright material
should be directed, in writing.
Psychology Press Ltd grants authorization for individuals to photocopy
copyright material for private research use, on the sole basis that requests for
such use are referred directly to the requestor's local Reproduction Rights
Organization (RRO). In order to contact your local RRO, please contact:
International Federation of Reproduction Rights Organisations (IFRRO), rue
de Prince Royal, 87, B-1050 Brussels, Belgium; email: ifrro@skynet.be
Copyright Clearance Centre Inc., 222 Rosewood Drive, Danvers, MA 01923,
USA; email: info@copyright.com Copyright Licensing Agency, 90 Tottenham
Court Road, London, W1P 0LP, UK; email: cla@cla.co.uk This authorization
does not extend to any other kind of copying, by any means, in any form, and for
any purpose other than private research use.
Preface
The papers that appear in this special edition of Aphasiology were selected based
upon their theoretical importance, clinical relevance, and scientific merit, from
among the many platform and poster presentations comprising the 32nd Annual
Clinical Aphasiology Conference held in Ridgedale, Missouri in June of 2002.
Each paper was peer-reviewed by the Editorial Consultants and Associate
Editors acknowledged herein consistent with the standards of Aphasiology and
the rigours of merit review that represent this indexed, archival journal.
Patrick J.Doyle, Ph.D.
VA Pittsburgh Healthcare System
Pittsburgh, PA, USA
study indicated that the most commonly diagnosed deficits were in attention,
neglect, visuoperception, and learning/memory. Additionally, the deficit
categories of calculation, hyperaffectivity, and linguistics were not closely
related to any of the other deficits evaluated.
The current study examined the same group of patients to explore some
questions that remained unanswered after the initial study. Deficit categories
analysed in the original study were based on diagnoses made by four disciplines
combined: neurology/physiatry, neuropsychology, occupational therapy (OT),
and speech-language pathology (SLP). Because diagnoses from all four
professions were used, the picture of right hemisphere syndrome described in the
previous study (Lehman Blake et al., 2002) may not be reflective of the typical
caseload of RHD patients seen by a US speech-language pathologist, because
SLPs and other professionals may not recognise or identify the same deficits.
Thus, the first aim was to evaluate whether prevalence and deficit patterns differ
when diagnoses are made by SLPs versus other disciplines. Differences between
disciplines may provide insight into how cognitive/communicative deficits are
perceived by various medical professionals. The previous study also indicated
that while 94% of cases exhibited at least one cognitive/communicative deficit,
only 44% were referred for an SLP evaluation. Thus, the second aim was to
examine which deficits are likely to lead to a referral to SLP.
METHOD
Inpatient medical records were reviewed for patients with RHD consecutively
admitted to a US inpatient rehabilitation unit over a 3-year period. Diagnoses of
RHD were made by neurologists. For 88% of the cases, CT or MRI scans
confirmed the diagnosis. The initial list contained 246 cases. Seven of these were
excluded because the patients did not release their medical records for research
purposes. Another 117 cases were excluded due to incomplete charts, lesions
restricted to the cerebellum or brain stem, other neurological disease (e.g.,
dementia, Parkinson's disease), psychiatric disorder other than depression, and/
or bilateral cerebral lesions (see Lehman Blake et al., 2002, for complete details
of the chart review). This left a total of 122 cases available for group analyses.
Demographic and clinical data are provided in Table 1. Information about the
presence or absence of selected disorders and deficits was obtained from
inpatient neurology/physiatry, neuropsychology, OT, and SLP reports. As
detailed in Lehman Blake et al. (2002), the long list of diagnostic labels obtained
from the medical charts was reduced to 14 deficit categories based on broad
traditional classifications (e.g., linguistics, attention,
TABLE 1 Demographic and clinical information for cases with lesions restricted to the
right hemisphere

Demographic and clinical
variables                     n = 122                   n = 54
Sex                           71 male                   26 male
                              51 female                 28 female
Age (years)
  Mean (SD)                   68.6 (12.4)               68.6 (12.6)
  Range                       12–95                     15–94
Education (years)
  Mean (SD)                   12.0 (3.0)                12.0 (3.3)
  Range                       7–20                      7–20
Handedness                    87% right                 87% right
                              5% left                   9% left
                              1% ambidextrous           2% ambidextrous
                              [7% missing]              [2% missing]
                              86% RH stroke             87% RH stroke
                              14% other medical         13% other medical
                              condition*                condition*
                              81% no previous stroke    83% no previous stroke
                              19% prior RH stroke       17% prior RH stroke
                              13.9 (26.6)               13.7 (21.5)
                              0–240                     1–120

                              Prevalence diagnosed by
                              neurology/physiatry,      Prevalence diagnosed by
                              neuropsychology and OT    SLP only
neglect                       66.4%                     53.7%
attention                     63.9%                     35.2%
perception                    58.2%                     27.8%
learning/memory               58.2%                     24.1%
reasoning & problem solving   56.6%                     37.0%
other cognitive deficits      45.1%                     42.6%
orientation                   40.2%                     27.8%
awareness                     38.5%                     27.8%
hyperresponsive               36.1%                     18.5%
hyporesponsive                30.3%                     38.9%
calculation                   28.7%                     5.6%
hypoaffective                 24.6%                     18.5%
linguistic                    21.3%                     24.1%
hyperaffective                15.6%                     7.4%
aprosodia                     12.3%                     25.9%
interpersonal interactions    7.4%                      29.6%
pragmatic deficits in only 13% of those same patients. Aprosodia also was
diagnosed twice as often by SLPs (26%) as by the other professionals (12%).
In order to examine differences in patterns of co-occurrence related to
disciplines, hierarchical cluster analyses (SPSS, 1999) were performed. A cluster
analysis is an exploratory tool that identifies related groups or clusters within a
body of data (Aldenderfer & Blashfield, 1984). The two categories that co-occur
most often are linked to form a cluster, and the linking continues until all categories
fit into a specified number of clusters. For the current purposes, clusters were
based on how often deficit categories co-occurred across the sample of cases. Six
clusters were specified based on findings from the previous study (Lehman
Blake et al., 2002). Analyses were conducted first on the data from SLP
diagnoses, then on diagnoses from the other disciplines combined. As shown in
Table 3, the affective deficits (hypoaffective and hyperaffective) separated into
their own clusters when diagnosed by either SLP or other professionals. This
result indicates that these deficit categories are relatively dissimilar to all others,
regardless of who makes the diagnosis. No other obvious patterns were
identified.
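The linking procedure described above can be illustrated with a minimal sketch. The source does not state which similarity measure or linkage rule was used in SPSS, so the Jaccard co-occurrence index and average linkage below are assumptions, as are the function names and the toy data (which are not the study data).

```python
from itertools import combinations


def jaccard(x, y):
    """Co-occurrence of two deficits: cases with both / cases with either."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def cluster_categories(data, n_clusters):
    """Agglomerative clustering of deficit categories.

    data maps a category name to a list of 0/1 flags, one flag per case.
    The most similar pair of clusters is merged repeatedly until only
    n_clusters remain (average linkage is an assumption).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in data]

    def similarity(c1, c2):
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(jaccard(data[p], data[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical presence/absence flags for six cases (illustration only).
toy_data = {
    "neglect": [1, 1, 1, 0, 0, 1],
    "attention": [1, 1, 1, 0, 1, 1],
    "hyperaffective": [0, 0, 0, 1, 0, 0],
    "aprosodia": [0, 0, 0, 0, 1, 0],
}
toy_clusters = cluster_categories(toy_data, n_clusters=3)
```

In this toy example neglect and attention co-occur in most cases and are linked first, while the rarely co-occurring categories remain in clusters of their own, which mirrors how the affective categories separated in the analyses reported here.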
Phi correlation coefficients also were computed to evaluate similarities
between the diagnoses by SLPs versus other disciplines. Based on Cohen's rule
of thumb for evaluating correlation coefficients (Cohen, 1988), moderate to high
correlations were obtained for diagnoses of linguistics (phi = .70), attention (phi
= .46), and neglect (phi = .42). Small correlations were obtained for all other
deficits and deficit categories (phi = .15 to .29), with the exception of learning/
memory (phi = .09).
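For readers unfamiliar with the phi coefficient, the following sketch shows how it could be computed from a 2 x 2 table of present/absent diagnoses made by SLPs and by the other disciplines for the same cases. The function names and the example counts are hypothetical, and the labelling thresholds (.10, .30, .50) are the commonly cited Cohen (1988) conventions, assumed here rather than taken from the analysis itself.

```python
import math


def phi_coefficient(both, slp_only, other_only, neither):
    """Phi for a 2 x 2 table of binary diagnoses.

    both       = deficit diagnosed by SLP and by other disciplines
    slp_only   = diagnosed by SLP only
    other_only = diagnosed by other disciplines only
    neither    = diagnosed by neither
    """
    a, b, c, d = both, slp_only, other_only, neither
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def cohen_label(phi):
    """Rough size label using the conventional .10/.30/.50 cut-offs (assumed)."""
    size = abs(phi)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"


# Hypothetical counts for 54 cases (not the study data).
example_phi = phi_coefficient(both=20, slp_only=5, other_only=5, neither=24)
example_label = cohen_label(example_phi)
```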
To address the second aim, identifying which patients with RHD are referred
to SLP, chi-square cross-tabulation analyses (SPSS, 1999) were performed to
evaluate the association between referral to SLP and presence of deficits. First,
the relationship
TABLE 3 Results of cluster analyses for diagnoses by speech-language pathologists (a)
and other medical professionals (b)
(a) Cluster 1 to Cluster 6 [cluster membership not recoverable]
(b) Cluster 1 to Cluster 6 [surviving entries: orientation, reasoning, other cognitive, neglect]
                                     Deficit   n    Means (SD)   Test statistic   p
orientation–orientation*             absent    40   7.7 (.72)    2.98             .002
                                     present   23   6.8 (1.7)
attention–attention                  absent    22   6.0 (1.5)    0.16             .88
                                     present   41   6.1 (1.0)
recall–learning/memory*              absent    22   2.6 (1.4)    2.39             .010
                                     present   41   1.7 (1.4)
abstraction–reasoning/prob.solving   absent    23   2.5 (.90)    1.35             .09
                                     present   40   2.1 (2.1)
construction–visuoperception         absent    9    2.4 (1.4)    0.78             .22
                                     present   20   2.1 (1.2)
maths–calculation*                   absent    42   2.5 (1.3)    1.74             .04
                                     present   21   1.9 (1.6)
Present = deficit diagnosed as present by at least one of four disciplines; Absent = deficit
not present.
* Significantly different at p < .05.
from only one rehabilitation unit, and thus are influenced by sampling biases
present in that facility. Despite these limitations, broad clinical implications can
be drawn. One implication is that the characteristics of right hemisphere
syndrome may vary depending on who makes the diagnosis, as prevalence of
deficits may be a reflection of the biases of the professional conducting the
evaluation. There appears to be substantial overlap across disciplines in the
conceptualisation and recognition of attention, neglect, and linguistic deficits,
but much diversity across disciplines for other cognitive/communicative
disorders. The important question is not "who is right?" about the deficits that
occur after RHD; the data suggest that different professionals focus on different
deficits, which is appropriate given the training and expertise that characterise
various professions. The relevant question that arises from this study is "how can
we ensure that patients with RHD are appropriately referred to SLP when they
exhibit deficits that are not consistently recognised by those professionals who
make such referrals?"
There is no obvious explanation for which patients are referred to SLP. The
presence of some deficits (e.g., interpersonal interactions, aprosodia, neglect)
was associated with SLP referrals. This suggests that when other professionals
do identify communicative disorders (pragmatic deficits and aprosodia), they
refer those patients to SLP. However, the frequency analyses indicated that
neurologists, neuropsychologists, and OTs do not consistently identify
communicative deficits, or may not be as stringent in judging aspects of
communication, and thus many appropriate referrals are missed. Performance on
a general mental status screening test was not meaningfully related to referrals or
to higher-level deficit categories, and thus does not add much information about
how referral decisions are made. Several factors not taken into account here
include experience of the referring neurologist/physiatrist, and individual
referring preference. For example, some physicians are more likely to refer to
SLPs due to their approach to referring in general, without regard for patients'
specific deficits.
As discussed in the initial study (Lehman Blake et al., 2002), one important
weakness with current practices of diagnosis and treatment of adults with RHD
is the absence of a definition of right hemisphere syndrome. This study suggests
that different disciplines have their own criteria or expectations regarding what
deficits may occur after RHD, likely based on their professional expertise. While
it is appropriate that different disciplines focus on different disorders, some
patients may not receive proper referrals if deficits that can be treated by one
discipline are not recognised by another. Related to this problem is the lack of
consistent terminology, both within and across disciplines. A descriptive
definition of the deficits associated with RHD would benefit our discipline and
would be a step towards developing criteria for other disciplines to use when
making decisions about referral for SLP evaluation and management. Of course,
terminology or definitions alone cannot solve the problems associated with
diagnosis and treatment of right hemisphere syndrome, and it may be impossible
APPENDIX A
Deficits and deficit categories defined by Lehman Blake et al. (2002)
Category: Hyperaffective
  Description: heightened affective response
Category: Hypoaffective
  Description: dampened or restricted affective response
  Illustrative labels encompassed under category: flat affect
Category: Attention
  Description: ability to focus on stimuli; includes focused, sustained, and divided
  attention
  Illustrative labels encompassed under category: attention, concentration,
  distractible
Category: Awareness
  Description: awareness of, or insight into, deficits and consequences of the deficits
Category: Learning/Memory
  Description: ability to learn and retain new information
Category: Perception
  Description: visual and tactile perception and construction
Category: Hyperresponsive
  Description: heightened responsivity to stimuli
  Illustrative labels encompassed under category: verbosity, talkative, tangential,
Category: Hyporesponsive
  Description: dampened or restricted responsivity to stimuli
Category: Linguistic
Category: Orientation
Category: Calculation
  Description: mathematical skills
Category: Interpersonal Interactions
  Description: behavioural aspects of interpersonal communication
aprosodia
visuospatial neglect
APPENDIX B
The Short Test of Mental Status (Kokmen et al., 1987)
Orientation (8 points)
Full name, day, date, month, year, address, city, building name (1 point per
item)
Attention: forward digit span (7 points)
Repeat a string of numbers, starting with five digits, increasing to seven
Score is number of digits correctly repeated
Learning (4 points)
Patient repeats four words after all are presented (apple, Mr. Johnson, charity,
tunnel)
Examiner can repeat the words up to four times if needed for patient to learn
all of them.
Score is the total number of words, minus number of trials needed if more than
one
(e.g., if patient requires two trials, then score = 3; if patient requires only one
trial, score = 4)
Calculation (4 points)
multiply 5 by 13
subtract 7 from 65
divide 58 by 2
add 11 and 29
Abstraction: similarities (3 points)
orange/banana
horse/dog
table/bookcase
Information (4 points)
current president
first president
number of weeks in a year
define the word "island"
Construction (4 points)
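
The scoring rule for the Learning subtest above is easy to misread, so a minimal sketch may help. It follows the worked example in the text (two trials give a score of 3, one trial gives 4), that is, one point is deducted for each trial beyond the first. The function name, the argument checks, and the floor at zero are assumptions made for illustration.

```python
def learning_score(words_learned, trials_needed):
    """Score the Learning subtest (maximum 4 points).

    One point is deducted for every trial beyond the first, matching the
    example in the text. Flooring the score at 0 is an assumption.
    """
    if not 0 <= words_learned <= 4:
        raise ValueError("words_learned must be between 0 and 4")
    if not 1 <= trials_needed <= 4:
        raise ValueError("trials_needed must be between 1 and 4")
    return max(0, words_learned - (trials_needed - 1))


score_one_trial = learning_score(words_learned=4, trials_needed=1)
score_two_trials = learning_score(words_learned=4, trials_needed=2)
```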
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000085
Measure                                                   Score or result
AQ                                                        79
Aphasia Classification                                    Anomic
Subtests (AQ totals)
  Spontaneous Speech                                      16.5
  Comprehension                                           9.5
  Repetition                                              6.8
  Naming                                                  6.7
Test of Adolescent/Adult Word Finding (German, 1990)
  Total raw score (107 possible)                          19 (18%)
  Subtests
    Picture Naming: Nouns                                 4
    Sentence Completion Naming                            4
    Description Naming                                    1
    Picture Naming: Verbs                                 2
    Category Naming                                       8
  Spoken Word–Picture Matching
  (Comprehension) (88 possible)                           88 (100%)
Category Sorting (informal assessment)
  Total items correct (7 categories, 49 possible correct) 49 (100%)
PALPA Subtests (Kay, Lesser, & Coltheart, 1992)
  Auditory synonym judgements (#49)
    High imageability (30 possible)                       28 (93%)
    Low imageability (30 possible)                        21 (70%)
  Word semantic association (printed stimuli, #51)
    High imageability (15 possible)                       15 (100%)
    Low imageability (15 possible)                        15 (100%)
Confrontation Naming of Objects: Snodgrass and Vanderwart (1980)
items (1st administration)
  Total items correct (260 possible)                      189 (73%)
  Error analysis (71 errors)
    Semantic paraphasias                                  26 (37%)*
    Phonemic paraphasias                                  5 (7%)*
    Mixed semantic & phonemic paraphasias                 9 (13%)*
    Gestural response                                     3 (4%)*
    Unrelated, real-word response                         5 (7%)*
    Neologism                                             5 (7%)*
    Perseverative response                                1 (1%)*
    No response                                           17 (24%)*
* Calculated as percentage of total errors.
SCT items following five treatment sessions and reached 83% correct responding
for PCT items.
The participant's responses to untrained items (Sets 3 and 4) remained relatively
stable during the initial training phase. Specifically, there was an increase of one
additional item named correctly for the SCT set in comparison to the maximum
baseline level (i.e., an increase from 25% correct to 33% correct). Because of
this increase, additional probing was conducted to establish stability of
responding. Two additional probes (see sessions 9 and 10 on the lower graph)
indicated relative stability at 33% correct for Sets 3 and 4.
After the termination of treatment with Sets 1 and 2 and the extended probing
of Sets 3 and 4, treatment was then extended to the untreated sets: PCT was
applied to Set 3 and SCT was applied to Set 4. Increases in correct responses
were seen for both sets. Criterion was reached for Set 4 (SCT) after nine
treatment sessions. Correct naming of Set 3 (PCT) items reached 75%.
Probes of performance with Set 1 and 2 items during treatment of Sets 3 and 4
revealed initially strong maintenance (i.e., 100% and 83% accuracy for SCT and
PCT items, respectively) followed by a reduction in accuracy (i.e., 67% and 58%
accuracy for SCT and PCT, respectively). However, follow-up probing at 2 and 6
weeks following cessation of all treatment indicated maintenance of trained
behaviours at levels that approximated treatment probe performance for all sets:
PCT #1 (Set 1) = 83% and 75%; SCT #1 (Set 2) = 83% and 92%; PCT #2 (Set 3)
= 75% and 75%; SCT #2 (Set 4) = 92% and 92%.
CONCLUSIONS
The results of this investigation are in accord with findings by Visch-Brink et
al. (2002) and Wambaugh et al. (2001) in that both treatments produced positive
changes with this participant. Although the participant displayed superior
performance with SCT, he did respond positively to PCT as well.
The participant achieved higher levels of accuracy of naming with SCT for
two treatment comparisons (i.e., the ATD was replicated within the participant).
For both treatment comparisons, he correctly named SCT items at levels that
were approximately 20% higher than PCT items. This difference also remained at
6 weeks post-treatment. His greater success with SCT may be related to word
retrieval behaviours that were observed prior to the start of treatment. That is, he
often spontaneously used semantically related sentence cues and descriptions to
facilitate word retrieval. It is unknown whether this self-cueing strategy was self-initiated or was a result of previous therapy. In either case, the participant may
have been predisposed to favour semantic cues. It is also possible that SCT had a
more facilitative impact than PCT in effecting accurate lexical processing.
It should be noted that the participant received a limited number of treatment
sessions during both treatment phases (i.e., five applications of each treatment
during the first treatment phase and nine applications during the second phase).
Additional treatment sessions may have resulted in increased levels of accuracy
of responding to PCT items. That is, the maximal effects of PCT may not have
been observed with this participant.
The results of this investigation provide further support for the use of SCT and
PCT, in that both appear likely to be beneficial in promoting increased accuracy
of naming of trained items. Clinicians may consider the use of a period of trial
therapy in the form of an ATD to assist in treatment selection.
The use of an ATD to compare speech/language treatments is almost always
complicated by the issue of possible generalisation effects. Although the
measurement of additional, untreated behaviours is not a requisite in the
application of an ATD (Barlow & Hersen, 1984), the use of such measurements
may assist in the determination of the presence of potential generalisation
effects. However, in the case of treating a behaviour that may improve through
repeated exposure to probe stimuli (as in the case of word retrieval),
improvements in performance may be misinterpreted as generalisation. If
measuring untrained behaviours repeatedly, the investigator may wish to
measure other untreated behaviours at pre- and post-treatment intervals (i.e.,
limited repeated measurement) to compare the effects of repeated exposure on
untrained behaviours. If previous research has indicated that generalisation
effects can be expected to be minimal and the researcher's interest is in the
relative differences of treatments being administered concurrently, the researcher
may choose to forgo the repeated measurement of untreated behaviours and utilise
a more traditional ATD. Regardless of design specifics, the replication of the
observed effects is recommended both within and across speakers to strengthen
internal and external validity, respectively.
REFERENCES
Barlow, D. H., & Hersen, M. (1984). Single case experimental designs: Strategies for
studying behavior change (2nd ed.). New York: Pergamon Press.
Blomert, L. (1992). The Amsterdam-Nijmegen Everyday Language Test (ANELT). In
N. Steinbuchel & D. Y. von Cramon (Eds.), Neuropsychological rehabilitation
(pp. 121–127). Berlin: Springer Verlag.
Davis, A., & Pring, T. (1991). Therapy for word-finding deficits: More on the effects of
semantic and phonologic approaches to treatment with dysphasic patients.
Neuropsychological Rehabilitation, 1, 135–145.
German, D. J. (1990). Test of adolescent/adult word finding. Austin, TX: Pro-Ed.
Howard, D., Patterson, K., Franklin, S., Orchard-Lisle, V., & Morton, J. (1985). Treatment
of word retrieval deficits in aphasia: A comparison of two methods. Brain, 108,
817–829.
Kay, J., Lesser, R., & Coltheart, M. (1992). Psycholinguistic Assessment of Language
Processes in Aphasia (PALPA). Hove, UK: Lawrence Erlbaum Associates Ltd.
Kertesz, A. (1982). The Western Aphasia Battery. New York: Grune & Stratton.
Miceli, G., Amitrano, A., Capasso, R., & Caramazza, A. (1996). The treatment of anomia
resulting from output lexical damage: Analysis of two cases. Brain and Language,
52, 150–174.
Nickels, L., & Best, W. (1996). Therapy for naming disorders (Part I): Principles,
puzzles, and progress. Aphasiology, 10,
APPENDIX A
EXPERIMENTAL STIMULI
List 1 (PCT #1): axe, cannon, eagle, frog, ironing board, kettle, necklace, peanut,
sled, suitcase, toothbrush, wheel
List 2 (SCT #1): alligator, bowl, chisel, fence, lips, lobster, mitten, pumpkin,
rollerskate, ruler, thread, toaster
List 3 (PCT #2): basket, cigarette, couch, crown, donkey, garbage can, grapes,
mushroom, saw, skirt, traffic light, wagon
List 4 (SCT #2): bottle, bow, caterpillar, desk, doorknob, envelope, guitar,
kangaroo, lettuce, pliers, tennis racket, windmill
APPENDIX B
DESCRIPTION OF TREATMENTS
Semantic Cueing Treatment (SCT)
Prestimulation. The target item was presented in picture form with three picture
foils (two semantically related, one unrelated). The examiner provided a verbal
phrase corresponding to the item and asked the participant to point to the correct
picture.
Cueing hierarchy. The application of the steps of the hierarchy was response-contingent. The steps were applied sequentially until a correct naming response
was elicited. Then, the order of the steps was reversed, to elicit correct responses
at each of the preceding steps. In the event that an incorrect response occurred
during the hierarchy reversal, the order of hierarchy steps was again reversed
until a correct response was obtained.
(1) Picture of target item presented, naming response requested, verbal feedback
provided for correct or incorrect responses (7–8-second response time
allowed; same for following steps).
(2) Picture of target item presented along with a verbal description of target,
naming response requested, verbal feedback provided for correct or
incorrect responses (e.g., target = cow, "a farm animal that gives milk").
(3) Picture of target item presented along with a semantically nonspecific
sentence completion phrase, naming response requested, verbal feedback
provided for correct or incorrect responses (e.g., "The farmer fed the ___").
(4) Picture of target item presented along with a semantically loaded sentence
completion phrase, naming response requested, verbal feedback provided for
correct or incorrect responses (e.g., "The farmer went to the barn to milk
the ___").
(5) Picture of target item presented along with verbal model of target word,
repetition of target word requested.
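
The response-contingent logic of the hierarchy can be summarised in a short sketch. It assumes five steps, moves to the next (more supportive) step after an incorrect response, and moves back towards step 1 after a correct response. The function name, the `respond` callback, the cap on presentations, and the decision to stop if the final step fails are all assumptions introduced for illustration; the treatment description does not specify them.

```python
def run_cueing_hierarchy(respond, n_steps=5, max_presentations=50):
    """Simulate one item's pass through a response-contingent cueing hierarchy.

    respond(step) returns True for a correct response at that step.
    Returns the ordered list of (step, correct) presentations.
    """
    log = []
    step = 1
    while len(log) < max_presentations:  # cap is an assumption, avoids endless loops
        correct = respond(step)
        log.append((step, correct))
        if correct and step == 1:
            break  # correct naming with no cue: hierarchy complete
        if not correct and step == n_steps:
            break  # assumption: stop if even the final step fails
        # Correct: reverse towards step 1. Incorrect: add more cueing.
        step += -1 if correct else 1
    return log


# Hypothetical participant: fails steps 1 and 2 at first, succeeds at step 3,
# and then succeeds at every step that is presented a second time.
attempts = {}


def example_respond(step):
    attempts[step] = attempts.get(step, 0) + 1
    return step >= 3 or attempts[step] > 1


example_log = run_cueing_hierarchy(example_respond)
steps_presented = [step for step, _ in example_log]
```

For this hypothetical participant the steps are presented in the order 1, 2, 3, 2, 1: cueing increases until a correct response is obtained and is then withdrawn step by step.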
Phonologic Cueing Treatment (PCT)
Prestimulation. The target item was presented in picture form with three picture
foils (two phonetically related, one unrelated). The examiner provided a verbal
phrase corresponding to the item and asked the participant to point to the correct
picture.
Cueing hierarchy. The application of the steps of the hierarchy was the same
as above.
(1) Picture of target item presented, naming response requested, verbal feedback
provided for correct or incorrect responses (7–8-second response time
allowed; same for following steps).
(2) Picture of target item presented along with a verbal production of a non-real
word that rhymed with the target (e.g., target = pig, "it rhymes with chig").
(3) Picture of target item presented along with a verbal first sound cue (e.g., "it
starts with /p/").
(4) Picture of target item presented along with a sentence completion phrase
that included the rhyme and the sound cue, naming response requested,
verbal feedback provided for correct or incorrect responses (e.g., "The name
of this picture rhymes with chig, it is a /p/ ___").
(5) Picture of target item presented along with verbal model of target word,
repetition of target word requested.
Adults with aphasia present with word retrieval deficits during discourse
production. These deficits may present themselves in discourse through the
person's use of nonreferential terminology, pauses, filler terms, paraphasias, or
neologisms. Typically, adults with a nonfluent type of aphasia use pauses and
filler terms as they struggle with verbal output. By contrast, adults with a fluent
type of aphasia have little difficulty with verbal output, although they do produce
paraphasias and neologisms during verbal production.
Clearly, an important aspect of aphasia assessment is the analysis of discourse
production, especially given the fact that many of the above characteristics are
primarily detectable through the analysis of discourse. Several researchers have
assessed percent of information units provided by adults with aphasia when
stimuli are controlled (e.g., McNeil, Doyle, Fossett, Park, & Goda, 2001;
Nicholas & Brookshire, 1993). However, one important aspect of discourse that
has not been readily assessed in adults with aphasia is the lexical diversity of
their verbal production. Given the observation that many of the error types
observed in adults with aphasia appear to be, at least in part, lexical in nature, it
seems of particular importance to refine the tools we use in measuring aspects of
the lexical domain in discourse production.
One measure of lexical diversity in conversation has enjoyed particular
popularity in the child language literature for decades: type-token ratio (TTR).
TTR is a measure of conversational vocabulary and is defined as the ratio of the
total number of different words in a language sample to the total number of
http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/
02687030344000166
28 APHASIOLOGY
words in the sample (Miller, 1981; Templin, 1957). Ratios closer to 0 reflect less
diversity of vocabulary, whereas values closer to 1.0 reflect greater diversity. As
was identified early on, however, TTR measurements are sensitive to sample size
variations; larger samples tend to yield lower TTR values than smaller samples
(Fillenbaum, Jones, & Wepman, 1961; Hess, Sefton, & Landry, 1986). Wachal
and Spreen (1973) used TTR and a variety of TTR-based alternatives (i.e., mean
segmental TTR, Johnson, 1944; bilogarithmic TTR, Chotlos, 1944; Herdan,
1960), designed to account for differences in sample sizes across participants, to
measure vocabulary diversity in adults with aphasia and their non-brain-damaged
counterparts. They found that adults with aphasia presented with less lexical
diversity in conversation compared to adults with no brain damage, and that
several of the measures, including the original TTR calculation, mean segmental
TTR , bilogarithmic TTR, and root TTR (Guiraud, 1959), significantly
differentiated the two groups. The concern about the sensitivity of TTR to sample
size variation is, although perhaps to a lesser extent, also valid for many of the
various transformations of TTR. For example, mean segmental TTR is the
average TTR for several consecutive, equal-length segments of the sample. This
would allow for comparisons of different sample lengths, as long as equivalent
sample segment sizes were used. However, since segment size must be
controlled, this measure is still dependent on sample size.
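As a worked illustration of the measures described above, the following Python sketch computes TTR and three of its transformations for a tokenised sample. It is a minimal sketch and not the procedure used in any of the cited studies: the function names and the whitespace tokeniser are assumptions made here for illustration, and the bilogarithmic form is taken to be log(types) divided by log(tokens).

```python
import math


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately simple, assumed tokeniser)."""
    return text.lower().split()


def ttr(tokens):
    """Type-token ratio: number of different words divided by total words."""
    return len(set(tokens)) / len(tokens)


def mean_segmental_ttr(tokens, segment_size):
    """Average TTR over consecutive, equal-length segments.

    Trailing words that do not fill a whole segment are dropped.
    """
    n_segments = len(tokens) // segment_size
    if n_segments == 0:
        raise ValueError("sample is shorter than one segment")
    segments = [
        tokens[i * segment_size:(i + 1) * segment_size]
        for i in range(n_segments)
    ]
    return sum(ttr(segment) for segment in segments) / n_segments


def root_ttr(tokens):
    """Root TTR: different words divided by the square root of total words."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def bilogarithmic_ttr(tokens):
    """Bilogarithmic TTR: log(types) / log(tokens); needs more than one token."""
    return math.log(len(set(tokens))) / math.log(len(tokens))
```

Repeating a sample end to end leaves the number of different words unchanged while doubling the total, so `ttr` halves; this is the sample-size sensitivity discussed above in its simplest form.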
Other investigations involving the use of TTR as a measure of lexical diversity
in discourse production of adults with aphasia (e.g., Prins, Snow, & Wagenaar,
1978; Spreen & Wachal, 1973) have produced similar findings relative to the
productive vocabulary of adults with aphasia, but they highlight the central
weakness of TTR: its sensitivity to sample size variation. More recently, although
within the area of child language research, there have been some data to suggest
that TTR, when used in the diagnosis of specific language impairment, is not
sufficiently sensitive to separate the lexical performance of children with and
without language impairments (Watkins, Kelly, Harbers, & Hollis, 1995). In
particular, Watkins and colleagues found that, even when samples were truncated
to 50- and 100-utterance subsamples, TTR did not distinguish between the two
groups. It was only when samples were controlled for the number of words,
rather than utterances (i.e., calculating the number of different words occurring in
subsamples of 100 and 200 total words) that the measure differentiated between
the two groups. Thus, it appears rather critical to control sample size by
truncating samples to a common length. Given the findings of Watkins and
colleagues (1995), sample length should be determined by the number of words,
rather than the more common practice of determining length by the number of
utterances.
Relatedly, number of different words (NDW) has also been used to estimate
the diversity of conversational vocabulary across clinical populations (e.g.,
Ratner & Silverman, 2000; Watkins et al., 1995). Of importance, however,
although NDW has become the preferred measure in child language studies (e.g.,
Dollaghan et al., 1999; Goffman & Leonard, 2000), investigators who compute
Participant  Age  Gender  Education (in years)  M/p CVA  WAB AQ  Aud comp  Sample size
NF1          86   Female  14                    25       80.6    9.00      421
NF2          59   Female  14                    41       81.0    10.00     489
NF3          57   Female  20                    72       80.6    9.80      273
NF4          85   Female  14                    25       72.3    9.45      587
NF5          47   Female  19                    48       82.0    9.10      277
NF6          53   Female  18                    261      79.5    9.75      409
NF7          38   Male    16                    10       81.9    9.35      375
NF8          52   Female  14                    12       70.1    8.75      208
NF9          35   Female  11                    204      67.0    7.20      240
F1           67   Male    15                    15       71.0    8.70      392
F2           76   Male    12                    163      76.3    8.65      598
F3           54   Female  12                    6        85.6    9.70      396
F4           55   Female  13                    29       87.3    9.95      463
F5           76   Male    16                    12       87.6    9.80      438
F6           63   Male    16                    17       94.0    10.00     655
F7           76   Female  15                    6        85.4    9.30      427
F8           60   Female  13                    42       93.0    9.60      474
F9           83   Male    16                    8        92.0    8.20      458
The samples were audiorecorded, then transcribed and coded according to the
conventions of the Child Language Data Exchange System (CHILDES;
MacWhinney, 2000). The CHILDES system consists of a transcription protocol
(CHAT) and a series of language analysis programs (CLAN). Samples were first
transcribed verbatim and then coded for CLAN analysis. For inter-rater
agreement, the first author reviewed 22% of the audiotaped samples for
correspondence to the transcript. Word-by-word agreement was determined to be
98.5%. Revisions, direct repetitions, and fillers were coded for exclusion, so as
not to be counted in calculations of vocabulary diversity. This decision was made
because counting them essentially penalised participants who were more
disfluent, regardless of the lexical content of their language. In addition,
paraphasias that were recognisable English words were transcribed verbatim,
instead of attempting to discern intended lexical targets; that is, the transcriptions
are reflective only of the words and utterances actually produced.
Unrecognisable words and neologisms were coded for exclusion from the
transcripts. Each of the language samples obtained had at least 200 words; a
minimum sample length of 50 words is required to compute D. Sample size
ranged from 208 to 587 words for the NF group (M = 364.33; SD = 125.65) and
from 392 to 655 words (M = 477.89; SD = 89.92) for the F group; the F group
produced significantly more words than the NF group, t(8) = 2.63, p < .05.
Language analysis
Each sample was subjected to three measures of lexical diversity, each
performed by CLAN (MacWhinney, 2000): D, NDW, and TTR. Because TTR is
known to be sensitive to sample size variation, truncated samples of the middle
100 and 200 words were obtained, and each was subjected to the three analyses.
This procedure, similar to that of Watkins and colleagues (1995), allowed for a
more equitable comparison of TTR, NDW, and D. If these measures are each
performed as originally intended (restricting sample size for TTR and NDW, and
using whole samples for D), they should be strongly correlated with each other.
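The truncation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLAN implementation: the function names are hypothetical, the sample is assumed to be already tokenised with excluded material removed, and "middle" is taken to mean a window centred on the sample.

```python
def middle_words(tokens, n):
    """Return the n words centred on the middle of the sample."""
    if len(tokens) < n:
        raise ValueError("sample is shorter than the requested length")
    start = (len(tokens) - n) // 2
    return tokens[start:start + n]


def ndw(tokens):
    """Number of different words."""
    return len(set(tokens))


def ttr(tokens):
    """Type-token ratio."""
    return len(set(tokens)) / len(tokens)


def truncated_measures(tokens, lengths=(100, 200)):
    """NDW and TTR for the whole sample and for each middle-n-word subsample."""
    results = {"whole": {"NDW": ndw(tokens), "TTR": ttr(tokens)}}
    for n in lengths:
        subsample = middle_words(tokens, n)
        results[n] = {"NDW": ndw(subsample), "TTR": ttr(subsample)}
    return results
```

At any fixed length n, TTR is simply NDW divided by n, so the two measures carry the same information once samples are truncated to a common number of words.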
RESULTS
Relationships among D, NDW, and TTR
Pearson correlations were performed to determine the possible relationships
among D, NDW, and TTR. Because of the number of correlations performed, a
Bonferroni adjustment was applied to minimise the probability of Type I error.
Consequently, results with p values of < .001 were considered significant. As
shown in the correlation matrix in Table 2, when the samples were truncated to
100 and 200 words, the three measures correlated with each other for each
sample length. However, when whole samples were used, none of the three
measures was significantly related to the others at the p < .001 level. In addition,
as shown in the correlation matrix, although the D values for the three sample
sizes were significantly correlated with one another, this was not the case for the
other two measures; correlations between sample sizes were observed only
once for TTR (between TTR100 and TTR200) and twice for NDW (between
whole-sample NDW and NDW200, and between NDW100 and NDW200).
Finally, the relationships among lexical diversity measures, when used as
intended, were assessed. In particular, the relationships among D whole
samples, NDW 100- and 200-word samples, and TTR 100- and 200-word samples were
evaluated. The results indicated that each of these correlations was significant at
the p < .001 level.
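For readers who wish to reproduce this kind of analysis, a Pearson coefficient and a Bonferroni-adjusted threshold can be computed as in the sketch below. The text does not state the family-wise alpha or the number of correlations entering the adjustment; the usage note assumes .05 and the 36 pairwise correlations among nine variables, and the function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def n_pairs(n_variables):
    """Number of distinct pairwise correlations among n variables."""
    return n_variables * (n_variables - 1) // 2


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni adjustment."""
    return family_alpha / n_tests
```

Under the assumptions named above, `bonferroni_alpha(0.05, n_pairs(9))` is roughly .0014, which is close to the .001 criterion reported here.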
TABLE 2 Pearson correlation matrix for whole samples, 100-word samples, and 200-word
samples of NDW, TTR, and D
         NDW      NDW 100  NDW 200  TTR      TTR 100  TTR 200  D        D 100    D 200
NDW      1.0
NDW 100  .66      1.0
NDW 200  .79*     .92*     1.0
TTR      .06      .46      .46      1.0
TTR 100  .66      1.0*     .92*     .46      1.0
TTR 200  .79*     .91*     1.0*     .44      .91*     1.0
D        .69      .77*     .81*     .56      .77*     .80*     1.0
D 100    .63      .96*     .86*     .48      .96*     .85*     .79*     1.0
D 200    .78*     .88*     .95*     .40      .88*     .95*     .86*     .86*     1.0
DISCUSSION
The first objective of the investigation was to examine the relationships among D,
TTR and NDW across three sample lengths: whole sample, 100-word, and 200-word.
Since D is a relatively new measure of lexical diversity, one that had not
previously been used to assess the conversational vocabulary of adults with
aphasia, it was appropriate to determine the extent of the relationships among
this new measure and two other, well-established measures. Findings suggested
that the measures were all significantly correlated when samples were truncated
to 100 and 200 words, but that there were no significant relationships among any
of the measures when whole samples were used. This finding is likely
attributable to the fact that, in the present study, adults with fluent aphasia
produced (a) greater vocabulary diversity, but also (b) significantly longer
samples. Although longer samples should not impact D, longer samples would
be expected to negatively impact TTR while positively impacting NDW. With
adults with fluent and nonfluent aphasia presenting opposite patterns, then, it
appears there was a cancellation effect as a result of the uncontrolled length of
the samples.
It is clear from these data that, in general, the correlations are relatively
stronger, across analyses, for samples of the same length (e.g., greater for D100
and NDW100 than for NDW whole samples and NDW200). These stronger
correlations would appear to be a result of the fact that the same subset of
language sample data is used. Such findings would seem to suggest that, as long
as the decision is made to limit samples to a particular length, any of the three
analyses might be used to arrive at similar conclusions about conversational
vocabulary for adults with aphasia.
TABLE 3 D, NDW, and TTR means (standard deviations) for whole language samples,
100-word samples, and 200-word samples for fluent and nonfluent aphasia groups
Measure               NF Group (N = 9)  F Group (N = 9)
D whole sample        55.11 (19.35)     79.39 (12.14)
D 100-word sample     45.17 (15.82)     70.43 (19.12)
D 200-word sample     50.35 (20.04)     79.16 (11.03)
NDW whole sample      148.33 (43.93)    202.89 (29.86)
NDW 100-word sample   58.44 (6.46)      66.22 (5.09)
NDW 200-word sample   95.22 (13.06)     110.89 (7.98)
TTR whole sample      .41 (.05)         .43 (.04)
TTR 100-word sample   .58 (.06)         .66 (.05)
TTR 200-word sample   .48 (.08)         .56 (.04)
The finding that the D values for each sample length were significantly related
seems to provide evidence of the stability of this measure of vocabulary
measurement. In contrast, TTR and NDW did not demonstrate this same
consistency of results across sample sizes. Rather, the significance of the
correlations with these measures varied (respectively) across sample sizes. As a
whole, this finding seems to highlight the issue of the sample-size sensitivity of
TTR and NDW, compared to D. As discussed above, although NDW is also
somewhat sample-size sensitive, it is less so than TTR, the value of which
changes with every new word added to the sample.
In some sense, given sample sensitivity concerns, comparing TTR and NDW to
D using whole samples for each analysis is a rather unfair comparison, although
such analyses seemed to lead to a more complete understanding of both the
analyses and the lexical abilities of each group of adults with aphasia. Results of
the present study suggest that if each analysis is used as intended, with
equivalent samples for TTR and NDW and with whole samples for D, the
measures are each significantly related to one another, highlighting the
importance of using truncated sample data with TTR and NDW.
There is an equally important issue, however, related to the ecological validity
of using measures that require discarding language sample data. In collecting
language sample data, of course, there is an attempt to obtain as representative a
sample as possible. When language sample data are discarded because of the
constraints of a particular analysis, it is important to question whether the sample
is then less representative of the person's language abilities. In addition, arbitrary
decisions come into play, related to selecting the subset of words or utterances to
be included in the sample. In the present study, for example, we chose the middle
100 and 200 words for our truncated analyses; however, not all participants will
be at their best in the middle of the samples. Due to fatigue and/or frustration,
some might perform better earlier in the conversation. Still others, due to slow
rise time, might actually perform better later. Finally, in some cases, adults with
aphasia may not be capable of providing a sample of sufficient length to allow
the examiner the option of selecting 100 or 200 words for analysis. An a priori
intent to truncate samples to a specific predetermined length, then, could lead
either to misrepresentation of conversational abilities in cases in which a
substantial amount of language sample data is discarded, or to discarding
language sample data altogether if a client is not able to produce a sample of the
predetermined length.
The second objective of our study was to determine if these measures of
lexical diversity adequately differentiated adults with nonfluent and fluent
aphasia. Again, we were most interested in the use of whole language samples for
these analyses since it is our position that these will be most representative of the
abilities of individuals with aphasia. When whole language samples were used,
only D and NDW differentiated NF and F aphasia types. It is not surprising that
TTR did not reveal between-group differences, given its sensitivity to sample size
variation. The finding that NDW produced between-group differences on whole
samples, however, is initially somewhat surprising. In theory, it, too, is
sample-size sensitive. That is, if a person produces a 200-word language sample, there is
a greater opportunity to produce more different words than if the person only
produces a 100-word sample. Upon reflection, however, this finding might have
been expected; the adults with fluent aphasia tended to produce longer samples,
as well as to use more diverse vocabulary. The greater diversity of their
vocabulary can be seen in all three analyses when sample length was controlled.
With NDW, however, this between-groups difference is magnified by the fact that
the adults with fluent aphasia also produced more language, resulting in their
performance appearing even more different from those with nonfluent aphasia
than it actually was. Thus, our finding of group differences for whole samples
with NDW is in no way indicative that the measure is stable across sample-size
variation.
The question of sample-size sensitivity might also be raised with respect to D.
If D is sample-size sensitive, it also should inflate the values for adults with
fluent aphasia. To assess this possibility informally, D analyses were performed
on split halves of each of five samples selected randomly, such that for each
sample every other utterance was omitted from the analysis. The 10 half-sample
D values (even utterances and odd utterances) were then compared to the 5
whole-sample D values. In theory, if D were sample-size sensitive, D values for
the half samples would each fall below their respective D value for the whole
sample, demonstrating that fewer words yield lower values than more words.
The results of this informal analysis, however, were that six of the halves fell
below their whole D value, and four of the halves fell above their whole D
value. These results are consistent with those of McKee and colleagues (2000),
and suggest that D is not, in fact, sample-size sensitive, at least to the extent that
TTR and NDW are.
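The informal split-half check can be sketched in Python as below. This is an illustration under stated assumptions rather than the CLAN routine used in the study: D is estimated by fitting the curve TTR = (D/N)[(1 + 2N/D)^(1/2) - 1] to mean TTRs of random subsamples of 35 to 50 tokens, as D is described in the published literature on the measure; the ternary search, its bounds, the number of trials, and all function names are choices made here for illustration, and the search assumes the squared error has a single minimum.

```python
import math
import random


def ttr_model(n, d):
    """TTR expected for a sample of n tokens at a given value of D."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)


def mean_sampled_ttr(tokens, n, trials, rng):
    """Mean TTR over random subsamples of n tokens drawn without replacement."""
    total = 0.0
    for _ in range(trials):
        subsample = rng.sample(tokens, n)
        total += len(set(subsample)) / n
    return total / trials


def fit_d(points, lo=0.1, hi=500.0, iterations=100):
    """Least-squares D for (n, mean TTR) points, found by ternary search."""
    def sse(d):
        return sum((ttr_model(n, d) - t) ** 2 for n, t in points)

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sse(m1) < sse(m2):
            hi = m2  # minimum lies to the left of m2
        else:
            lo = m1  # minimum lies to the right of m1
    return (lo + hi) / 2


def estimate_d(tokens, rng, sizes=range(35, 51), trials=100):
    """Estimate D for a tokenised sample of at least max(sizes) words."""
    if len(tokens) < max(sizes):
        raise ValueError("sample is too short for the largest subsample size")
    points = [(n, mean_sampled_ttr(tokens, n, trials, rng)) for n in sizes]
    return fit_d(points)


def split_half_utterances(utterances):
    """Split a sample into its odd-numbered and even-numbered utterances."""
    return utterances[0::2], utterances[1::2]
```

Each half returned by `split_half_utterances` would be tokenised and passed to `estimate_d`, and the two values compared with the whole-sample estimate. Because the subsamples are random, repeated runs differ slightly unless the random generator is seeded.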
Taken as a whole, our results in relation to D are of interest because they
suggest that this analysis is appropriate for quantifying conversational
vocabulary performance of adults with aphasia. Moreover, our results add to the
growing literature regarding the utility of D as a measure of lexical diversity in
clinical populations (e.g., Malvern & Richards, 1997; Owens & Leonard, 2002).
With respect to NDW, our findings that truncated samples can be used to
distinguish groups of differing language skills corroborate those of Watkins and
colleagues (1995), although they found that, even using truncated samples, TTR
did not distinguish children with language impairment from normal-language
peers. The most likely reason for this difference is that Watkins and colleagues
truncated samples by utterances rather than words.
One limitation of the present study is that the language samples obtained for
analyses are relatively small. Hess et al. (1986), for example, have made the case
that a minimum of 350 words are needed for reliable computation of TTR, at least
for analyses of preschool children with normal language. Despite this limitation,
however, findings and implications of the present study are important for at least
two reasons. First, samples of the present study allow for comparison to other
work with similar samples (e.g., Watkins et al., 1995) and, arguably, are valuable
in that insight can be gained from evaluating the extent to which smaller samples
can be used with these analyses for this clinical population. Second, from our
perspective it is important to propose measurement procedures that are both
realistic and capable of implementation in a clinical setting. Although longer
samples are desirable from a research perspective, conclusions based on longer
samples are not as readily applied to clinical endeavours, given the nature of
fluent and nonfluent aphasias and the inherent time constraints of clinical work.
Conclusion and clinical implications
It appears that D is a rather promising tool for the analysis of lexical diversity in
the conversation of adults with aphasia. Its greatest strength is in its ability to
accommodate whole language samples while controlling for sample size in its
output. In contrast, TTR and NDW both require that language sample data be
discarded so that only samples of equivalent length in words are legitimately
compared. We have raised concerns about the ecological validity of procedures
that require the discarding of language sample data. It appears that D analysis
provides group separation between fluent and nonfluent aphasia samples,
suggesting perhaps its future use as an additional tool in the differential
diagnosis of aphasia. Future studies might further investigate the validity and
reliability of D as a measure of conversational vocabulary in adults with aphasia.
In addition, conversational vocabulary diversity among other populations with
acquired neurogenic disorders warrants exploration. Such work might increase
our confidence in the clinical utility of this new measure, as well as enhance our
understanding of the conversational lexical abilities of adults with a range of
acquired neurogenic disorders.
REFERENCES
Chotlos, J. W. (1944). Studies in language behavior. IV. A statistical and comparative
analysis of individual written language samples. Psychological Monographs, 56(2),
77–111.
Dollaghan, C. A., Campbell, T. F., Paradise, J. L., Feldman, H. M., Janosky, J. E.,
Pitcairn, D. N., et al. (1999). Maternal education and measures of early speech and
language. Journal of Speech, Language, and Hearing Research, 42, 1432–1443.
Fillenbaum, S., Jones, L. V., & Wepman, J. M. (1961). Some linguistic features of speech
from aphasic patients. Language and Speech, 4, 91–108.
Goffman, L., & Leonard, J. (2000). Growth of language skills in preschool children with
specific language impairment: Implications for assessment and intervention.
American Journal of Speech-Language Pathology, 9, 151–161.
Guiraud, P. (1959). Problèmes et méthodes de la statistique linguistique. Dordrecht:
D. Reidel.
http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/02687030344000157
of a client, the method of assessment, and the processes and skills requiring
assessment, need to be defined.
There have been three main approaches to assessing the gesture abilities of
speakers with aphasia. One approach has focused on assessing pantomime skills
through tests of limb apraxia (Duffy & Duffy, 1981; Goodglass & Kaplan, 1963;
Wang & Goodglass, 1992); the second approach is best described as a
trial-and-error method where gesture treatments are trialled and their success monitored
(Wertz et al., 1984); and the third approach has measured the use of
conversational gesture in natural settings (Behrmann & Penn, 1984; Hermann et
al., 1988; Le May, David, & Thomas, 1988). Confusion persists with respect to
the utility of these three approaches and exactly what information
speech-language pathologists require in considering candidacy for gesture-based
treatments.
There has been a long-held notion that testing for the presence of limb apraxia
in patients with aphasia provides information about treatment candidacy for
gesture-based interventions (Helm-Estabrooks et al., 1982), although evidence to
support this assumption is limited. Limb apraxia, which is commonly defined as
the inability to perform skilled, purposeful limb movements in the absence of
elementary sensorimotor disorders, intellectual deterioration, or comprehension
difficulties (Chainay & Humphreys, 2002), frequently co-occurs with aphasia
(De Renzi, Faglioni, Lodesani, & Vecchi, 1983; Goodglass & Kaplan, 1963;
Kertesz, Ferro, & Shewan, 1984). Limb apraxia is assessed by asking the
individual to produce limb movements to verbal command and/or to imitation.
Clients are asked to make socially regulated, intransitive gestures, such as
saluting, waving goodbye, and making an OK sign. Clients are also asked to
make transitive gestures displaying how objects are used, such as demonstrating
how to cut bread, either with or without the actual object present. In some
apraxia batteries, meaningless movements are also tested to imitation (Kimura &
Archibald, 1974). Three types of limb apraxia are currently recognised:
ideomotor apraxia (a disorder of temporal, sequential, and spatial organisation of
action), ideational apraxia (an incapacity to mentally evoke the action associated
with a sequence of objects), and conceptual apraxia (an incapacity to mentally
evoke the action associated with a single object) (Ochipa, Rothi, & Heilman,
1992).
Several seminal texts and papers have suggested that the presence of limb
apraxia in speakers with aphasia prevents them from learning a gesture system
for communication and/or from responding to gestural facilitation treatments
(Helm-Estabrooks et al., 1982; Rothi & Heilman, 1997). However, the empirical
evidence to support this assumption is extremely limited. Helm, Kaplan and
Vercruysse (as cited in Helm-Estabrooks et al., 1982) found that patients with
severe aphasia and limb apraxia did not produce verbal labels or representational
gestures to picture stimuli and they suggested that limb apraxia may prevent
patients from using representational gestures as a natural means of
communication. However, natural gesture use was not assessed in the Helm et al.
Figure 1. A model of praxis processing and its relation to semantics, naming, and word
and object recognition. From Apraxia: The neuropsychology of action, L. Rothi and
K. Heilman (Eds.), (1997), p. 45. Hove, UK: Psychology Press. Copyright by Psychology
Press. Reprinted with permission.
Figure 2. Krauss et al.'s (2000) model of cognitive architecture for the speech-gesture
production process. From Language and gesture, D. McNeill (Ed.), (2000), p. 261.
Cambridge: Cambridge University Press. Copyright by Cambridge University Press.
Reprinted with permission.
WAB AQ = Western Aphasia Battery Aphasia Quotient (Kertesz, 1982); BDAE Severity
Rating = Boston Diagnostic Aphasia Examination ranging from 0 (no useable speech) to
5 (minimal discernible handicap).
RESULTS
Two qualified speech-language pathologists acted as independent raters.
Following a 1-hour training session provided by the first investigator, the two
raters classified 50% of the corpus of each participant's gestures using Hermann
et al.'s four categories. Point-to-point percentage agreement was calculated as
90%. Where discrepancies emerged, the raters re-coded the item in contention by
consensus discussion. The remaining 50% of the corpus of gestures was rated by
the first investigator. In order to calculate intra-rater agreement, all the gestures
produced by participant JS were rated on two separate occasions by the first
investigator. The point-to-point percentage intra-rater agreement was 86%.
Where discrepancies in rating occurred, the first investigator and one of the
independent raters re-coded the items in contention by consensus discussion.
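Point-to-point percentage agreement of the kind reported above can be computed as in the following sketch. The function names are hypothetical and the category labels in the usage note are illustrative; the sketch assumes the two raters' codes are aligned item by item.

```python
def point_to_point_agreement(ratings_a, ratings_b):
    """Percentage of items to which two raters assigned the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return 100.0 * matches / len(ratings_a)


def items_in_contention(ratings_a, ratings_b):
    """Indices of the items on which the two raters disagreed."""
    return [i for i, (a, b) in enumerate(zip(ratings_a, ratings_b)) if a != b]
```

For example, if two raters code five gestures as `["pantomime", "codified", "descriptive", "codified", "pantomime"]` and `["pantomime", "codified", "descriptive", "pantomime", "pantomime"]`, agreement is 80% and the fourth item (index 3) is the one to be resolved by consensus discussion.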
The results of testing for each participant are presented in Table 2. All seven
participants demonstrated ideomotor limb apraxia ranging from severe for GC
through to mild for SA. Pantomime deficits (and conceptual apraxia), as
measured on a formal test of pantomime (TOLA subtest gestured pictures),
were also present in all seven participants. Two participants, SA and BO,
demonstrated performances within normal limits on imitation of non-meaningful
movements, while the remaining five participants had impaired performance.
Table 3 displays the absolute duration of all verbal and gesture elements used
by each participant. Four participants (SA, GC, RG, JS) spent longer time in
gestural than verbal expression, reflecting the severity of their verbal expression
deficits. Table 4 details the percentage and actual numbers of each gesture type
used by the participants. There was considerable variability in the types of
gesture used by the participants. Of particular note was the high mean percentage
use of descriptive, codified, and pantomimic gestures known to carry a high
meaning load (M = 73.4%, range 46–97%). This compares with a group of
normal speakers who produced a lower percentage of meaning-laden gestures
(iconics) (M = 56%) as compared to 44% non-meaning-laden gestures (beats,
metaphorics, deictics) in conversational settings (McNeill, 1992). Participants
GC and JS demonstrated similar distribution patterns for gesture type, with the
greatest use of pantomime and codified gestures. Similarly, 42.4% of SA's
gestures were codified or pantomime, again reflecting the severe verbal output
deficits and the attempts to enhance meaningful output through the gestural
modality.
Spearman rank-order correlation coefficients were computed to examine the
relationships between meaning-laden lexical gesture (codes and pantomimes)
and scores on the tests of ideomotor and conceptual limb apraxia (Table 5). No
significant relationships were found.
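A Spearman rank-order coefficient can be obtained by ranking each variable (giving tied values their average rank) and then computing a Pearson correlation on the ranks. The sketch below illustrates that calculation; the function names are hypothetical and it is not the software used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)
```

Because only rank order matters, any monotonic relationship yields a coefficient of 1 or -1, which suits percentile ranks and percentages from a group of seven participants better than an assumption of linearity would.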
TABLE 2 Test results on standard measures
Participant  Limb Apraxia  Pantomime  Kimura Test
SA           84            50         22
BO           75            84         22
KC           37            63         15*
WS           63            63         16*
GC           16            37         12*
RG           25            50         11*
JS           63            75         14*
Limb Apraxia, and Pantomime, from Test of Limb Apraxia (Helm-Estabrooks, 1992),
expressed as percentile ranks of performance from participants with brain
damage. Kimura Test: Movement Copying Test (Kimura & Archibald, 1982,
as cited in Corina et al., 1992) scored out of 24 points (*<19.24 = impaired
performance).
TABLE 3 Total duration of verbalisation and gesture production during 6-minute
conversation sample (seconds)
Participant  Verbal elements  Gesture elements
SA           72               77
BO           197              88
KC           288              66
WS           256              138
GC           0                89
RG           37               78
JS           28               79
TABLE 4 Percent distribution of the types of gesture behaviour for each participant
Participant  Speech-focused movements  Descriptive gesture  Codified gesture  Pantomime
SA           27.3 (9)                  30.3 (10)            30.3 (10)         12.1 (4)
BO           46.4 (13)                 25 (7)               18 (5)            10.6 (3)
KC           54 (19)                   23 (8)               9 (3)             14 (5)
WS           32.5 (13)                 35 (14)              15 (6)            17.5 (7)
GC           3 (1)                     15 (5)               12 (4)            70 (24)
RG           34 (12)                   37 (13)              26 (9)            3 (1)
JS           14 (5)                    25 (9)               19 (7)            42 (15)
Figures in brackets refer to actual number of gestures produced in the sample.
% Codified gestures
and pantomimes
Non-meaningful
movements
.00
.072
.291
DISCUSSION
As hypothesised, despite the presence of significant ideomotor and conceptual
limb apraxia, all seven participants were able to supplement their verbal
communication with lexical gesture or completely substitute gesture for
verbalisation. Most noticeable was GC, who demonstrated the most severe limb
apraxia and yet used the greatest number of meaningful gestures per minute of
communication time. Further, the presence of pantomimic deficits, as detected on
a formal test of pantomime ability, did not prevent the use of codified gesture
and pantomime in conversation. SA and GC achieved scores consistent with
significant impairment on the pantomime section of the TOLA and yet clearly
demonstrated considerable pantomime production in conversation. GC scored
very poorly on the TOLA pantomime test but produced a total of 24 pantomimes
during a 6-minute conversation sample. As a group, there were no significant
correlations found between the participants' scores on limb apraxia assessments
and their use of codes and pantomimes in conversation. With respect to the
severity of the aphasia and lexical gesture production, the data from these seven
participants suggest that it is possible, even for patients with severe global
aphasia, to use meaningful gestures (codes and pantomimes) in a natural
communication setting. It is argued that the models of lexical gesture (Krauss et
al., 2000) and limb praxis (Rothi & Heilman, 1997) are portraying different
processing and behaviours.
One source of the discrepancy between limb praxis and formal pantomime test
scores and the occurrence of meaning-laden gestures (codes and pantomimes) in
conversation in these seven participants may be the differences in the processing
demands of the tasks. The processing involved in creating a gestural response in
commonly used limb apraxia protocols is both highly conscious and removed
from the usual context. In generating a gesture following a verbal command, for
example, 'show me how to use a saw', one must comprehend the auditory label
'saw' and then consciously think about how to represent the features and
movements associated with the object 'saw' in spatial and dynamic parameters
that can be understood by the examiner. Similarly, in making a gesture following
a pictorial stimulus, one has to identify the salient features of the object that can
be represented in the movement, recall the movement patterns usually associated
with the object, and then demonstrate these in a way that the examiner can
interpret. Such processing involves a good deal of cognitive abstraction. In
McNeill, D., Levy, E., & Pedelty, L. (1990). Speech and gesture. In G. Hammond (Ed.),
Cerebral control of speech and limb movements (pp. 203–256). North Holland:
Elsevier.
Neiman, M., Duffy, R., Belanger, S., & Coehlo, C. (2000). The assessment of limb
apraxia: Relationship between performances on single- and multiple-object tasks by
left hemisphere damaged aphasic subjects. Neuropsychological Rehabilitation,
10(4), 429–448.
Ochipa, C., Rothi, L., & Heilman, K. (1992). Conceptual apraxia in Alzheimer's disease.
Brain, 115, 1061–1071.
Patterson, K., & Shewell, C. (1987). Speak and spell: Dissociations and word-class
effects. In M. Coltheart, R. Job, & G. Sartori (Eds.), The cognitive neuropsychology of
language. Hove, UK: Lawrence Erlbaum Associates Ltd.
Pett, M. (1997). Nonparametric statistics for health care research. California: Sage
Publications.
Rao, P. (1995). Drawing and gesture as communication options in a person with severe
aphasia. Topics in Stroke Rehabilitation, 2(1), 49–56.
Rao, P. (2001). Use of Amer-Ind code by persons with aphasia. In R. Chapey (Ed.),
Language intervention strategies in aphasia and related neurogenic communication
disorders (4th Edn., pp. 688–702). Maryland: Lippincott Williams & Wilkins.
Rose, M., & Douglas, J. (2001). The differential facilitatory effects of gesture and
visualisation processes on object naming in aphasia. Aphasiology, 15(10/11),
977–990.
Rose, M., Douglas, J., & Matyas, T. (2002). The comparative effectiveness of gesture and
verbal treatments for a specific phonologic naming impairment. Aphasiology,
16(10/11), 1001–1030.
Rothi, L., & Heilman, K. (Eds.). (1997). Apraxia: The neuropsychology of action. Hove,
UK: Psychology Press.
Skelly, M. L., Schinsky, L., Smith, R., & Fust, R. (1974). American Indian sign
(Amer-Ind) as a facilitator of verbalisation for the oral verbal apraxic. Journal of Speech and
Hearing Disorders, 34, 445–455.
Wang, L., & Goodglass, H. (1992). Pantomime, praxis, and aphasia. Brain and Language,
42, 402–18.
Wertz, T., LaPointe, L., & Rosenbek, J. (1984). Apraxia of speech in adults: The disorder
and its management. Orlando: Grune & Stratton Inc.
APPENDIX
List of conversation questions
What have you been doing today?
Can you tell me about your stroke?
What sort of work have you done in your life?
Can you tell me about your family?
http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/02687030344000094
present with intact written naming, but did show evidence of relatively stronger
access to partial lexical information in the written than verbal modality. This
provided the motivation for using written language as a means of targeting verbal
language production.
However, LN also presented with apraxia of speech, which we postulated would
have a negative impact on his ability to effectively use grapheme–phoneme
correspondences. In an effort to provide him with nonverbal cues that he could
use to tap into phonology, we decided to incorporate tactile cues into our
treatment. This method was chosen because tactile cues had been effective in
previous treatment sessions with LN. The tactile cueing system, which will be
described in greater detail below, was a modified version of a method described
by Bashir, Grahjones, and Bostwick (1984) and associates specific hand shapes
and positions with specific phonemes.
Another important factor in the effectiveness of naming treatments is the
frequency of treatment sessions. Hillis (1998) reported that intensity of treatment
was more important than the therapeutic approach in determining outcomes of
therapy. Yampolsky and Waters (2002) incorporated a family member as part of
the therapeutic team by training the patient's mother to conduct daily therapy
sessions. This approach is novel in that it increases the frequency of therapy in a
realistic manner. It is commonly accepted that more therapy is better, but this is
seldom possible in today's healthcare climate. Hence, intensive home practice
may be an important part of any therapy programme. In the present study, we
created a video version of the treatment programme to be used for home
practice.
In sum, the goal of the present study was to train a patient with aphasia and
apraxia of speech to cue himself to verbally name items by using a combination
of written naming and tactile cues. We hypothesised that this treatment approach
would result in generalisation to words beginning with targeted phonemes
because the patient would be able to use the tactile cues independently.
Following Best and colleagues (2000), we predicted that treatment effects would
be maintained 6 weeks following termination of treatment because the cueing
hierarchy, though primarily phonological, required a high level of participation
from the patient.
METHOD
Participant
LN was a 49-year-old man who suffered a stroke in the territory of the left
middle cerebral artery with subsequent haemorrhage into the left basal ganglia
approximately 4 years prior to the inception of the present study. He lived at
home with his wife and three children. Premorbidly, LN had no history of
PAL subtest
30/32 (94%)
a. 8/9 (88%)*
b. 7/9 (78%)*
16/20 (80%)*
15/20 (75%)*
27/40 (68%)*
a. 18/20 (90%)
b. 9/20 (45%)
19/32 (59%)*
16/20 (80%)*
The Apraxia Battery for Adults (Dabul, 1979) was administered 1 year following cessation
of the study. LN was classified in the moderate range of apraxia on the basis of that test.
Performance at least partly reflects his strong repetition skills.
Language processing
component
PAL subtest
5/32 (16%)*
31/40 (78%)*
27/32 (84%)*
39/48 (81%)*
12/20 (60%)*
22/40 (55%)
0/10**
1st Grapheme: 5/10 (50%)
0/10**
1st Grapheme: 6/10 (60%)
* > 2 standard deviations below the mean for age-matched non-brain-damaged controls.
Note that for the sentence comprehension subtest, norms are available for the
full test and not specific sentence types.
** Unable to compare to age-matched norms because the test was not administered in
full. Performance was severely impaired relative to normative sample, who
performed at > 95% on both writing tasks.
TABLE 2 Performance on language pretesting: BDAE subtests
BDAE subtest
13
36/114
1/6
0/6
1/2
4/6
3/6
1/6
2/6
BDAE subtest
2. Responsive Naming
3. Verbal Agility
4. Word Repetition
5. Repeating Phrases
6. Written Confrontation Naming
7. Writing to Dictation
13
18
18
18
13
16
18/30
7/14
9/10
5/16
4/10
3/10
Treatment phase
There were two parts to each treatment session. In the treatment portion of the
session, LN was presented with pictures of target items and asked to name the
pictured item. During treatment, LN was prompted to use self-cueing strategies
through application of the modified cueing hierarchy. The criterion for termination of
treatment was set at 80% accuracy in naming the target items.
In the end-of-session probes, LN was presented with pictures of target and
control items and asked to name the pictured items. LN was not prompted to use
any strategies, although he was permitted to do so. All responses, including
errors and the presence of gestures, were recorded verbatim. Final responses were
coded as correct or incorrect. No feedback regarding accuracy was provided
during probes. In the interest of time, items were split such that half of the items
were probed each week.
Follow-up and post-testing
Maintenance. Naming of all treated and control items was probed 6 weeks
following termination of treatment.
Generalisation. Generalisation of treatment effects was assessed in three
ways. First, naming of the untrained, control items was assessed repeatedly
across the course of the study. Second, a confrontation naming task using novel
stimuli was employed at the cessation of treatment. Specifically, two groups of
novel stimuli beginning with the same phonemes as the original target and
control items were presented to LN to probe response generalisation effects of
treatment. These items were matched for frequency of occurrence, t(38) = .34,
NS, and are presented in Appendix D. Third, the verbal and written naming
subtests of the PAL were re-administered to determine whether naming ability
had changed as measured by standardised tests.
Reliability
To test reliability, 15% of all baseline and probe sessions were re-scored by an
independent examiner. Verbal naming responses were scored as correct or
incorrect. Average point-to-point agreement between observers was 98%, with a
range of 96–100%.
Reliability of the independent variable, that is, of the determination of the
level of cueing required, was also calculated by an independent observer for 15%
of the treatment sessions. Average point-to-point agreement was 92%, with a
range of 88–96%.
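Point-to-point agreement, as used in the reliability checks above, can be illustrated with a minimal sketch. It assumes that agreement is the percentage of items on which two scorers assign the same code; the function name and the example scores are hypothetical and are not data from the study.

```python
def point_to_point_agreement(scores_a, scores_b):
    """Percentage of items on which two scorers assigned the same code."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both scorers must rate the same set of items")
    if not scores_a:
        raise ValueError("At least one scored item is required")
    # Count item-by-item matches between the two scorers.
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


# Hypothetical probe session: 1 = correct, 0 = incorrect (illustrative only).
examiner = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
independent_examiner = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]

agreement = point_to_point_agreement(examiner, independent_examiner)
print(f"Point-to-point agreement: {agreement:.0f}%")
```

The same function would apply to the cueing-level reliability check if each item were coded with the level of cue required rather than with correct or incorrect.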
RESULTS
Treatment
Figure 1 illustrates LN's performance on control and target items over time.
Table 3 summarises performance at baseline, immediately post-treatment, and 6
weeks post-treatment. At baseline, LN achieved an average of 4.2% correct for
items beginning with control phonemes and 6.8% for items beginning with target
phonemes. None of the items was named accurately more than once during the
baseline phase. Post-treatment results were obtained by averaging accuracy over
the final three treatment sessions. LN achieved 12.1% accuracy on control items
and 55.5% accuracy on target items. By the final three sessions, LN was also
able to write all of the target items with 100% accuracy. No data for written
naming were collected for control items.
Data from the treatment portion of each session were analysed to determine
how LN's responses changed over time. Figure 2 shows how the level of cue
required for LN to verbally name target items changed over time. Data were not
available from sessions 2 and 5. In the figure, "spontaneous" refers to situations in
which LN independently named items without use of any cueing strategies.
Instances in which LN named the item following application of the written
naming portion of the hierarchy (step 1 in the hierarchy) were coded as use of
graphemic cues. If LN verbally named the item after
being prompted to generate a tactile cue (step 2 in the hierarchy), the response
was coded as use of tactile cues. Verbal cues refer to phonemic cues ranging
from phonemes to word repetition (step 3 in the hierarchy). LN required fewer
verbal prompts over time, and provided more self-cues, either through writing
alone or by writing and using a tactile cue to generate a phonemic cue. Note that
he named items (when prompted to use strategies) with an average of 72%
accuracy over the last three sessions.
Figure 1. Percent correct of control and trained items named accurately at baseline (B1,
B2, B3), during end-of-session probes (1–13), and in post-treatment assessment of
maintenance (M) and generalisation (G).
TABLE 3 Percent correct on target and control items (end-of-session probes)
Word group    Baseline    Post-treatment    6 weeks post-treatment
Target        6.8%        55.5%             25%
Control       4.2%        12.1%             8.3%
Maintenance
Post-testing following the 6-week break showed some loss of treatment gains.
LN named 8.3% (2/24) of control and 25% (6/24) of target items. He wrote the
names of 21% of control words and 58% of target words. Items practised using
the home video were not more likely to be named verbally (1 of 6 targets named
correctly) or in writing (6 of 14 written correctly) than nonpractised items. No
spontaneous use of tactile cues was observed.
Figure 2. Percent correct of trained items accurately named spontaneously and when
prompted to use graphemic, tactile, and verbal cues.
Generalisation
There was some evidence of generalisation to naming ability in general, as LN's
performance improved on the verbal and written naming subtests of the PAL
(Table 4). He achieved 11/32 correct on the verbal naming test at time 2 compared to 5/32 at
time 1. On the written naming test, he achieved 5/32 correct compared to 0/10 at
pretesting, and successfully wrote the first letter on 3 additional items.
TABLE 4 Tasks administered at initial assessment and after 13 weeks of treatment
(percent correct)
                      Accuracy
Test                  Pretreatment    Post-treatment
PAL verbal naming     16% (N = 32)    34% (N = 32)
PAL written naming    0% (N = 10)     16% (N = 32)
LN's reliance on writing over tactile cues suggests a candidate explanation for
the lack of generalisation to novel stimuli beginning with targeted phonemes.
Hillis (1989) proposed that for patients with phonological impairments, the
effect of verbal naming treatments was to increase access to specific
phonological representations, thus limiting generalisation to untreated items. It
could be argued that Nickels (1992) overcame this limitation by providing a
strategy, that is, teaching grapheme-phoneme conversion to facilitate oral
reading, for verbally producing items that were not explicitly trained. In the
present study, we attempted to teach LN to use tactile cues as a strategy to
facilitate oral reading. His failure to use the tactile cues may have played a role
in the lack of generalisation to items beginning with targeted phonemes.
We had hoped that LN's partial lexical access in the written modality, in
combination with the tactile cues, would be sufficient to allow generalisation
beyond the trained items. We did not predict generalisation to written naming of
untrained items beginning with target phonemes, because their graphemic
representations would not necessarily be more accessible following treatment.
This may have limited the extent to which the tactile cues could be generalised to
novel items beginning with the targeted phonemes. However, verbal naming of
items whose graphemic representations were at least partially accessible to LN
may have shown improvement as a result of learning to use self-cueing
strategies. This is consistent with the evidence of generalisation to verbal naming
in general discussed above.
Along the same lines, the lack of a semantic component to the cueing
hierarchy may have limited the treatment programme's ability to produce more
robust generalisation (Nickels & Best, 1996). Although LN's primary impairment
appeared to be in the phonological output lexicon, he did show some mild
deficits in the semantic system. A treatment programme that also targeted
semantics may have resulted in generalisation to a novel set of stimuli
semantically related to the target items by improving access to phonological
representations through the semantic system.
It is unclear why LN showed little spontaneous use of the tactile cues, even for
trained items. Given their effectiveness in treatment sessions before and during
the present study, it is unlikely that LN found them not to be useful. LN's
resistance to alternative modes of communication may have been a factor.
Although writing is not typically used to facilitate verbal language, it is a
modality that most people use to communicate at least occasionally. In contrast,
tactile cues are not used by non-brain-damaged individuals. For this reason,
writing may have seemed more acceptable to LN than tactile cues. This raises the
possibility that patients who are more receptive to using tactile cues may show
greater treatment effects and generalisation.
Another aspect of the present study was the use of a home practice video to
increase the intensity of exposure to the treatment materials. Previous work has
suggested that trained family members can provide effective treatment, thereby
increasing intensity of treatment (Yampolsky & Waters, 2002). However, some
patients, such as LN, do not have family members who are able to take an active
role in treatment. The home video programme used in the present study provided
an opportunity for LN to have more intense exposure to treatment materials and
the self-cueing strategies. However, home practice was not effective in
maintaining treatment effects in the absence of structured treatment sessions.
Further research is necessary to investigate the frequency of structured sessions
necessary to maintain treatment gains. A possible clinical application would be
periodic treatment sessions with patients discharged from outpatient services to
review compensatory strategies and provide materials for home practice.
There are a number of limitations to the present study that limit the scope of
interpretation. First, the complete set of tactile cues was not explicitly taught to
LN prior to inception of the treatment programme. Specifically, he was not taught
the tactile cues associated with the control items. This may have limited his
ability to generalise the self-cueing strategies to the control items. As a result, our
ability to measure generalisation to untrained phoneme classes may have been
limited. Another method to assess generalisation would have been to include
nontrained items beginning with target phonemes in the verbal naming probes.
This would have allowed us to assess generalisation to nontrained stimuli during
the treatment period. Future research should address the issue of generalisation in
these ways.
A related issue is that we did not extend treatment to the control items in an
ABA design, due to limitations on the time course of the present study.
Replication of treatment effects with the control stimuli would have provided a
stronger demonstration of experimental control, strengthening the results of the
study.
Another limitation is that probes were conducted at the end of treatment
sessions, following exposure to the target items in the context of the modified
cueing hierarchy. Other researchers (e.g., Raymer et al., 1993; Wambaugh et al.,
2001) have conducted probes at the beginning of treatment sessions to avoid
inflating accuracy due to the recent exposure or deflating accuracy due to client
fatigue. Another issue related to exposure pertains to the greater number of times
that LN was exposed to target compared to control stimuli. Given these
confounds, it is difficult to isolate the source of observed treatment effects.
Future research should address these issues by probing performance on trained
and control stimuli prior to the treatment session and controlling for the number
of exposures to trained and control stimuli.
In sum, the present study suggests that a programme focusing on self-generation
of phonemic cues can be an effective treatment approach for anomia.
Functionally, this approach may be most effective in building a repertoire of
trained items, with some generalisation to verbal naming in general.
REFERENCES
Bashir, A. S., Grahjones, F., & Bostwick, R. Y. (1984). A touch cue method of therapy
for developmental verbal apraxia. Seminars in Speech and Language, 5(2), 127–137.
Best, W., Hickin, J., Herbert, R., Howard, D., & Osborne, F. (2000). Phonological
facilitation of aphasic naming and predicting the outcome of treatment for anomia.
Brain and Language, 74(3), 435–438.
Bruce, C., & Howard, D. (1988). Why don't Broca's aphasics cue themselves? An
investigation of phonemic cueing and tip of the tongue information.
Neuropsychologia, 26(2), 253–264.
Caplan, D. (1992). Language: Structure, processing, and disorders (pp. 403–441).
Cambridge, MA: MIT Press.
Dabul, B. L. (1979). Apraxia Battery for Adults. Austin, TX: Pro-Ed.
Davis, A., & Pring, T. (1991). Therapy for word-finding deficits: More on the effects of
semantic and phonological approaches to treatments with dysphasic patients.
Neuropsychological Rehabilitation, 1(2), 135–145.
Drew, R. L., & Thompson, C. K. (1999). Model-based semantic treatment for naming
deficits in aphasia. Journal of Speech Language and Hearing Research, 42,
972–989.
Francis, W. N., & Kucera, H. (1982). Frequency Analysis of English Usage. Boston, MA:
Houghton Mifflin.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders.
Philadelphia, PA: Lea & Febiger.
Hillis, A. (1989). Efficacy and generalization of treatment for aphasic naming errors.
Archives of Physical Medicine and Rehabilitation, 70, 632–636.
Hillis, A. (1998). Treatment of naming disorders: New issues regarding old therapies.
Journal of the International Neuropsychological Society, 4, 648–660.
Johns, D. F., & Darley, F. L. (1970). Phonemic variability in apraxia of speech. Journal
of Speech and Hearing Research, 13, 556.
Miceli, G., Amitrano, A., Capasso, R., & Caramazza, A. (1996). The treatment of anomia
resulting from output lexical damage: Analysis of two cases. Brain and Language,
52, 150–174.
Nicholas, M., & Elliott, S. (1999). C-Speak aphasia: A communication system for adults
with aphasia. Solana Beach, CA: Mayer-Johnson Co.
Nickels, L. (1992). The autocue? Self-generated phonemic cues in the treatment of a
disorder of reading and naming. Cognitive Neuropsychology, 9(2), 155–182.
Nickels, L., & Best, W. (1996). Therapy for naming disorders (part I): Principles, puzzles,
and progress. Aphasiology, 10(1), 21–47.
Raymer, A. M., Thompson, C. K., Jacobs, B., & Le Grand, H. R. (1993). Phonological
treatment of naming deficits in aphasia: Model based generalization analysis.
Aphasiology, 7(1), 27–53.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Martinez, A. L., Kalinyak-Fliszar, M.,
& Spencer, K. A. (2001). Effects of two cueing treatments on lexical retrieval in
aphasic speakers with different levels of deficit. Aphasiology, 15(10/11), 933–950.
Yampolsky, S., & Waters, G. (2002). Treatment of single word oral reading in an
individual with deep dyslexia. Aphasiology, 16(2), 455–471.
APPENDIX A
TRAINING & CONTROL STIMULI IN TREATMENT PHASE
Target words    Control words
Camera          Bank
Coat            Bill
Coffee          Body
College         Book
Couch           Business
Cup             Butter
Date            Garage
Desk            Garbage
Dinner          Garden
Doctor          Gas
Dollar          Girl
Door            Gum
Family          Park
Father          Pool
Finger          Pill
Fish            Paper
Food            Police
Foot            Popcorn
Table           Salt
Teacher         Sister
Time            Sock
Tire            Son
Tissue          Subway
Toast           Summer
APPENDIX B
MODIFIED CUEING HIERARCHY
(1) Present picture & produce written form
a. General prompt, e.g., "Can you write it?"
b. Choose first letter from field of three.
c. Fill in the blanks provided.
d. Clinician writes word.
(2) Generate tactile cue
a. General prompt, e.g., "What is the cue for that and what sound does it make?"
b. Picture of cue presented.
c. Clinician demonstrates cue.
(3) Verbal naming
a. General prompt, e.g., "What is it called?"
b. Phonemic cue.
c. Word provided.
APPENDIX C
DESCRIPTION OF TACTILE CUES
(1) /d/ Index finger bent and placed on upper lip. Thumb placed on the neck to
indicate voicing (and to distinguish from /t/).
(2) /t/ Index finger bent and placed on upper lip.
(3) /k/ Index finger placed at top of throat.
(4) /f/ Index finger bent and placed below lower lip.
APPENDIX D
STIMULI FOR GENERALIZATION PROBE
Target phonemes    Control phonemes
Cake               Bat
Candle             Belt
Card               Bird
Corn               Boat
Cow                Bottle
Deer               Gift
Dentist            Goat
Diamond            Golf
Dog                Guitar
Duck               Gun
Fan                Paint
Farm               Pie
Feather            Pillow
Fork               Police
Tail               Pot
Tape               Sink
Tent               Soap
Tie                Soldier
Tulip              Suitcase
http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/02687030344000148
based guidelines (Boles, 1998; Crockford & Lesser, 1994), by which 10 minutes
of connected speech (25 words and 43 words for composite description and
conversation, respectively) were utilised for analyses. The resultant sample size,
although small, was considered representative of this subject's typical daily
output.
Scoring procedures. A detailed list of scoring procedures is provided in the
Appendix. To determine %WR for the composite and conversational contexts,
the total number of words in each grammatical class (nouns and verbs), and the
number of word-finding errors per class were tallied. Word-finding errors were
defined broadly using criteria adapted from Crockford and Lesser (1994),
Dollaghan and Campbell (1992), and Pashek and Tompkins (2002). Specifically,
words were counted in error under the following conditions: (a) preceded
immediately by a 2+ second pause, (b) preceded or accompanied by comments
indicating difficulty, (c) self-corrections, and (d) obvious semantic, phonemic, or
unrelated paraphasias. Following the initial, objective identification of such
errors, transcripts were re-read comprehensively for identification of more subtle
instances of word retrieval difficulty such as deletions (Kemmerer & Tranel,
2000b) or indefinite terms (Hickin et al., 2001; Nicholas et al., 1985). Such
errors were identified via a careful analysis of contextual factors (e.g., if a
subject produced an indefinite term such as "thing", was a clear referent
available within the transcript?), with subjects being given the benefit of the
doubt in ambiguous situations. The number of successful word retrieval attempts
was divided by the total number of words in each class (i.e., correct attempts +
errors) and multiplied by 100 to yield the %WR score. The TAWF (i.e.,
confrontation naming context) was scored using percent correct measures (i.e.,
correct attempts ÷ total number of stimuli × 100) to promote comparable
measures across elicitation contexts.
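The %WR arithmetic described in the scoring procedures can be sketched as follows. This is a minimal illustration that assumes errors and successful retrievals have already been tallied by hand for one grammatical class; the function names and the counts are hypothetical and are not drawn from the study data.

```python
def percent_word_retrieval(successful, errors):
    """%WR = successful attempts / (successful attempts + errors) x 100."""
    total = successful + errors
    if total == 0:
        raise ValueError("No words of this class were produced")
    return 100.0 * successful / total


def percent_correct(correct, n_stimuli):
    """Percent correct for a confrontation naming test such as the TAWF."""
    if n_stimuli <= 0:
        raise ValueError("The test must contain at least one stimulus")
    return 100.0 * correct / n_stimuli


# Hypothetical tallies for one speaker (illustrative only).
noun_wr = percent_word_retrieval(successful=40, errors=10)
naming_pc = percent_correct(correct=18, n_stimuli=24)
print(f"%WR (nouns): {noun_wr:.1f}")
print(f"Confrontation naming: {naming_pc:.1f}% correct")
```

Expressing both scores on a 0 to 100 scale is what allows the connected speech and confrontation naming contexts to be compared directly.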
Supplementary analyses. To provide supplementary information about word
retrieval during composite description and in conversation, two additional
analyses were undertaken: (1) proportion of substantive versus light verbs
(%SV) and (2) proportion of corrected versus uncorrected errors (%CERR). The
former (%SV) has been shown previously to address semantic complexity in
verb retrieval (Breedin et al., 1998). Briefly, verbs such as "do", "make", "have", or "go"
may be conceived of as semantically simple, or primitive verbs, and in the
linguistic literature are referred to as light verbs. Other verbs, however, are
classified as heavy or substantive because they contain additional and more
specific semantic components (e.g., compare "go" with "run"), and are therefore
more complex (Breedin et al., 1998). Both composite naming and conversational
transcripts were analysed for the number of substantive verbs, which was divided
by the total number of verbs produced (i.e., substantive + light verbs) and
multiplied by 100 to yield %SV. The second supplementary measure, %CERR,
has been described as a means of gauging efficiency in word retrieval (Larfeuil &
Le Dorze, 1997). Any resolution of an episode of word-finding difficulty (e.g.,
after a 2+ second delay, revision) was noted for each subject with respect to
lexical class and naming context. Corrected errors were divided by the total
number of errors (corrected + unresolved) and multiplied by 100 to yield
%CERR.
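The two supplementary measures reduce to simple proportions, sketched below. The sketch assumes that verbs have already been extracted from the transcript in base form, and it uses only the four light verbs named in the text as an illustrative list; a working light-verb list, the function names, and the example data are all assumptions rather than part of the published procedure.

```python
# Light verbs named in the text; an actual scoring list may be longer.
LIGHT_VERBS = {"do", "make", "have", "go"}


def percent_substantive_verbs(verbs):
    """%SV = substantive verbs / (substantive + light verbs) x 100."""
    if not verbs:
        raise ValueError("No verbs were produced")
    substantive = sum(verb not in LIGHT_VERBS for verb in verbs)
    return 100.0 * substantive / len(verbs)


def percent_corrected_errors(corrected, unresolved):
    """%CERR = corrected errors / (corrected + unresolved errors) x 100."""
    total = corrected + unresolved
    if total == 0:
        raise ValueError("No word-finding errors were recorded")
    return 100.0 * corrected / total


# Hypothetical verb tokens from one transcript, in base form (illustrative only).
verbs = ["go", "run", "make", "cook", "have", "drive", "walk", "do"]
sv = percent_substantive_verbs(verbs)
cerr = percent_corrected_errors(corrected=6, unresolved=2)
print(f"%SV: {sv:.1f}")
print(f"%CERR: {cerr:.1f}")
```

Because %CERR is undefined when no errors occur, the function raises an error in that case, which mirrors the situation in which a speaker performs at ceiling and offers no opportunities for correction.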
RESULTS
Reliability and clinical feasibility
Approximately 20% of the data were randomly selected for reliability analyses.
Similar to the procedures of Oelschlaeger and Thorne (1999), raters (one
certified speech-language pathologist and two graduate students in speech and
hearing sciences) were provided with written instructions regarding the
application of %WR, %SV, and %CERR analyses to language samples. No
formal discussions of rules or rule interpretations were undertaken, allowing
raters to apply independently the scoring rules as written. Point-to-point
inter-rater agreement was calculated for %WR, %CERR, and %SV in composite
naming and conversational speech samples, and ranged from 80.5% to 100% for
%WR, 88.2% to 90% for %SV, and 88.9% to 100% for %CERR. Intra-judge
reliability was calculated for each measure on another randomly selected subset
of the data 1 week following initial data scoring, and ranged from 88.9% to
97.1% for %WR, 87.2% to 93% for %SV, and 82.0% to 100% for %CERR.
Although no formal time limit was given for inter-judge analyses, raters
reported requiring approximately 45 minutes to score each 300-word transcript.
This time requirement included that committed to learning and applying a set of
pre-constructed printed scoring rules. The time required for intra-judge
re-scoring (i.e., given the first author's familiarity with scoring standards), on the
other hand, was approximately 15 minutes per transcript.
Relationships among speaking contexts and grammatical class
Results of a three-way repeated measures ANOVA yielded significant main
effects of aphasia severity, F(1, 12) = 73.86, p < .001, and speaking context, F(2,
24) = 20.61, p < .001, on word retrieval scores, with no significant interactions
between factors (see Figure 1). As expected, patients with mild aphasia outscored
those with moderate aphasia across all measures. Both groups exhibited superior
performance in composite naming and conversational speech contexts compared
to confrontation naming. Post-hoc paired t-tests, with p set to .017 using the
Bonferroni correction, confirmed this observation. That is, composite noun, t(13)
= 3.33, p = .005, and verb scores, t(13) = 2.98, p = .011, were significantly
higher than TAWF noun/verb subtest scores; conversational noun, t(13) = 3.44, p
= .004, and verb scores, t(13) = 4.71, p < .001, followed a similar pattern. A
comparison of word retrieval across the two connected speaking contexts,
composite naming and conversational speech, revealed a significant difference
between verbs, t(13) = 2.83, p = .014, but not nouns, t(13) = .91, p = .38.
Although visual inspection of subjects' scores (see Figure 1) indicated a general
trend toward more accurate verb compared to noun retrieval across severity
groups and elicitation contexts, no main effect of lexical class was revealed, F(1,
12) = 3.39, p = .091.
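The Bonferroni-adjusted criterion of .017 used for the post-hoc tests can be reproduced with the short sketch below. It assumes a family-wise alpha of .05 divided across three pairwise context comparisons; neither value is stated explicitly in the text, and both are inferred from the reported threshold. The p values are those reported above, with "p < .001" entered as its upper bound.

```python
def bonferroni_threshold(family_alpha, n_comparisons):
    """Per-comparison alpha under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("At least one comparison is required")
    return family_alpha / n_comparisons


FAMILY_ALPHA = 0.05  # assumed family-wise alpha (not stated in the text)
threshold = bonferroni_threshold(FAMILY_ALPHA, 3)  # assumed three comparisons

# p values reported for the post-hoc paired t-tests.
reported_p = {
    "composite vs TAWF, nouns": 0.005,
    "composite vs TAWF, verbs": 0.011,
    "conversation vs TAWF, nouns": 0.004,
    "conversation vs TAWF, verbs": 0.001,  # reported as p < .001 (upper bound)
    "composite vs conversation, verbs": 0.014,
    "composite vs conversation, nouns": 0.38,
}

significant = {label: p < threshold for label, p in reported_p.items()}
print(f"Per-comparison alpha: {threshold:.3f}")
for label, is_significant in significant.items():
    print(f"{label}: {'significant' if is_significant else 'not significant'}")
```

Under these assumptions, every comparison except composite versus conversational nouns falls below the adjusted criterion, which matches the pattern reported in the text.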
Despite the significant differences yielded through ANOVA analyses, all
measures of word retrieval (TAWF scores and %WR) were highly and
significantly correlated across elicitation contexts (see Table 2). That is, subjects
with low TAWF scores were likely to perform poorly on word retrieval measures
in composite description and conversational speech, and subjects who scored
well on the TAWF demonstrated relatively higher word retrieval scores across
contexts. When the mild and moderate groups were analysed separately,
however, this effect disappeared. No correlation was significant for the mild
group, and only one comparison (conversational nouns to composite nouns) was
significant for the moderate group.
% Substantive Verbs
Statistical analyses indicated no main effect of connected speaking context
(composite description vs conversation) on subjects' generation of substantive
verbs, F(1, 12) = 1.76, p = .21 (see Figure 2). There was, however, a significant
effect of severity, F(1, 12) = 17.28, p = .001, and a significant interaction
between context and severity, F(1, 12) = 5.92, p = .032, such that mild subjects
produced significantly more substantive verbs in composite description
compared to conversation, whereas moderate subjects generated slightly more
substantive verbs in the conversational versus composite condition. Accordingly,
%SV scores for the composite naming condition correlated more strongly with
other verb retrieval measures (TAWF and %WR) compared to conversational
%SV scores (see Table 2); moreover, composite %SV scores differentiated
significantly between the two severity groups, F(1, 12) = 19.26, p = .001,
whereas conversational %SV scores did not, F(1, 12) = 4.07, p = .067.
% Corrected Errors
ANOVA results revealed a significant main effect of context, F(2, 14) = 4.73, p =
.027, indicating that subjects were more likely to self-correct word-finding errors
in discourse contexts than during confrontation naming (see Figure 3). Whereas
no main effect of
grammatical class was noted, F(1, 7) = 0.47, p = .52, significant interactions were
found between context and class, F(2, 14) = 4.97, p = .023, and context, class,
and severity, F(2, 14) = 11.33, p = .001. The nature of the interactions was such
that mild subjects tended to self-correct verbs more often than nouns during
composite naming and conversational speech, whereas moderate subjects tended
to do the opposite (increased self-correction of nouns compared to verbs). No main
effect of severity, F(1, 7) = 4.32, p = .08, was detected.
TABLE 2 Correlations across groups (mild and moderate, n = 14) among TAWF scores,
%WR (composite description and conversation), and %SV (composite description and
conversation)
                    TAWF     TAWF     Comp     Comp     Comp     Conv     Conv
                    nouns    verbs    nouns    verbs    %SV      nouns    verbs
TAWF nouns
TAWF verbs          .95**
Comp nouns (%WR)    .78**    .87**
Comp verbs (%WR)    .80**    .83**    .77**
Comp verbs (%SV)    .84**    .84**    .71**    .85**
Conv nouns (%WR)    .84**    .89**    .97**    .79**    .71**
Conv verbs (%WR)    .80**    .88**    .84**    .97**    .83**    .84**
Conv verbs (%SV)    .68      .69**    .56      .49      .54      .61      .56
** Correlation is significant at p < .007 (Bonferroni correction).
DISCUSSION
Whereas several measures have been proposed to analyse various aspects of
connected speech, quantification of word-finding difficulties in this context has
been often overlooked. Given the centrality of such difficulties to aphasia (Boles,
1998; Larfeuil & Le Dorze, 1997), the development of a measure to analyse
lexical retrieval in natural contexts appears essential. This study examined
several such measures (i.e., %WR, %SV, and %CERR) in an attempt to describe
clinically useful patterns and bridge a likely gap between frequently used
single-word measures of word retrieval versus more complex, connected speaking
paradigms. Findings from the current study demonstrated a significant effect of
context, with superior word retrieval in connected speech compared to
Figure 2. Percent Substantive Verbs (%SV) across severity groups and connected
speaking contexts.
Figure 3. Percent Corrected Errors (%CERR) across elicitation contexts. Note this
measure was inapplicable to the single-word (TAWF) verb scores of the mildly aphasic
group due to ceiling effects on this subtest (i.e., few opportunities for corrected error
scores). The n from which each percentage was determined varied from subject to subject
according to the number of word-finding errors elicited per context, with a range of 1–16
in the mild group and 4–45 in the moderate group.
respect to precise reliability standards have been established, the range of inter-
and intra-rater reliability scores obtained in this study was similar to that
considered acceptable by Oelschlaeger and Thorne (i.e., > 80%). Importantly, the
fact that our raters were able to apply written scoring rules with at least 80%
accuracy in the absence of any formal training in or discussion of the measure
supports the straightforward and intuitive nature of %WR. It also is noteworthy
that given the ever-increasing pressure to increase clinical assessment efficiency,
these data were obtained in the context of feasible clinical time demands for both
initial testing and subsequent analysis (Crockford & Lessser, 1994). Finally, the
nature of %WR and supplementary analyses lends itself to the possibility of online data measurement (Hickin et al., 2001), a much-needed step in the
advancement of functional communication measures (Togher, 2001).
Relationships among contexts
Results of this study demonstrated a significant effect of context on word
retrieval, with enhanced performance of subjects during connected speaking
tasks compared to confrontation naming. Whereas initial analyses based on the
entire subject sample (n = 14) demonstrated a high correlation between lexical
retrieval in single-word and connected speaking tasks, subsequent separate
correlational analyses of the mild and moderately aphasic groups' data failed to
detect such effects. That is, the high correlation obtained for the entire subject
sample appeared to reflect broad inter-group score differences between the mild
versus moderate groups (i.e., floor/ceiling effects), rather than a strong predictive
effect within each group. Because of the small n and relatively restricted range of
scores included in the separate group analyses, however, these results should be
interpreted with caution (i.e., within-group analyses may have reflected lower
statistical power to detect significant effects compared to the between-group
analysis). Nonetheless, single-word, TAWF scores, although highly predictive of
aphasia severity (mild vs moderate), did not appear strongly related to lexical
retrieval in composite naming and conversational speech within each aphasic
group.
The current identification of divergence between single-word confrontation
naming and discourse-level word retrieval is consistent with previous data
(Pashek & Tompkins, 2002; Williams & Canter, 1987), and has important
implications in terms of how researchers and clinicians should assess and treat
their patients with aphasia. Contemporary cognitive neuropsychological
approaches that promote specific, model-based treatments for hypothesised
deficits (Hillis, 1998; Raymer & Gonzalez-Rothi, 2001) and that have become
the focus of much recent research, primarily address language deficits at the
single-word level. In fact, few cognitive neuropsychological model-based
treatment studies have addressed discourse-level tasks, and those that have (e.g.,
McNeil, Doyle, Spencer, Jackson-Goda, Flores, & Small, 1997), reported
minimal transfer of single-word therapy gains to discourse contexts. Other
aphasia treatment studies have similarly found that conversational gains may be
more resistant to treatment than less natural communicative behaviours (e.g.,
Larfeuil & Le Dorze, 1997; Murray & Karcher, 2000). Alternatively, work
utilising conversational analysis frameworks has demonstrated that aphasia
therapy targeted directly at conversational behaviours may create ecologically
valid change in daily interactions with communication partners (Hopper,
Holland, & Rewega, 2002; Lock et al., 2001; Wilkinson et al., 1998).
Collectively, the current and previous findings endorse incorporating
discourse-level tasks into aphasia assessment and treatment protocols.
Further data are needed to affirm which of the two connected speaking
contexts utilised in this study best lends itself to accurate and valid assessment of
naming. Each context entails a number of benefits and caveats; for example,
composite description tasks have inherent limitations such as practice effects and
a lack of interactional opportunities (Shewan, 1988; Togher, 2001), but the
practical advantages of consistency and a priori targets (Hickin et al., 2001;
Shewan, 1988). Likewise, the interchange of natural communication is
theoretically the ideal setting for aphasia assessment and remediation;
nonetheless, difficulties associated with applying consistent analytical measures
to the conversational speech of aphasic patients are well recognised (Crockford &
Lesser, 1994; Edwards, 1998; Marshall & Pound, 1997). For example,
conversation may encourage various tactics (e.g., indefinite terms, anticipating/
avoiding difficult words) that allow the patient to maintain a socially acceptable
level of fluency in the face of severe difficulty in finding the right word
(Vermeulen et al., 1989, p. 262), thereby decreasing the likelihood of detecting
word-finding errors. In contrast, developing similar compensatory strategies has
been recommended as a desired and important therapeutic outcome (e.g.,
Holland, 1994). In such cases, %WR could nevertheless function as a critical
outcome measure: that is, albeit limited to the perspective of lexical-semantic
output, %WR could quantify patients' ability to acquire these strategies, and thus
function appropriately and meaningfully in naturalistic communicative
conditions.
The type of discourse task (i.e., composite description vs conversation)
appeared to affect specific aspects of lexical retrieval (i.e., accuracy or %WR;
semantic complexity of verbs or %SV), a finding consistent with previous
reports of variation in word retrieval according to numerous task-related
variables (e.g., Cooper, 1990; Doyle et al., 1994). The current findings suggested
a general trend towards more accurate verb retrieval in conversational contexts
compared to composite description, at the expense of semantic complexity,
particularly for mild subjects. Thus, the choice of which or how many
discourse-level tasks to incorporate into assessment and treatment (composite description
and/or conversational samples) may be, in part, a function of ultimate treatment
goals with respect to, for example, verb naming or sentence construction.
the utility of this measure remain (e.g., feasibility of %WR for online
measurement, its sensitivity to treatment outcomes, performance of
non-brain-damaged individuals as measured by this system, effects of complex or
abstract conversational topics), the theoretical implications and clinical
ramifications of the current data provide a solid basis for further exploration of
%WR and related measures (e.g., %SV, %CERR) in our continual efforts to gauge
legitimately the strengths and needs of our patients with aphasia.
REFERENCES
Berndt, R. S., Mitchum, C. C., Haendiges, A. N., & Sandson, J. (1997a). Verb retrieval in
aphasia, 1. Characterizing single word impairments. Brain and Language, 56,
68–106.
Berndt, R. S., Mitchum, C. C., Haendiges, A. N., & Sandson, J. (1997b). Verb retrieval in
aphasia, 2. Relationship to sentence processing. Brain and Language, 56, 107–137.
Boles, L. (1998). Conversational discourse analysis as a method for evaluating progress in
aphasia: A case report. Journal of Communication Disorders, 31, 261–274.
Breedin, S. D., Saffran, E. M., & Schwartz, M. F. (1998). Semantic factors in verb
retrieval: An effect of complexity. Brain and Language, 63, 1–31.
Brookshire, R. H., & Nicholas, L. E. (1994). Test-retest stability of measures of
connected speech in aphasia. Clinical Aphasiology, 22, 119–133.
Brown, C. S., & Cullinan, W. L. (1981). Word-retrieval difficulty and disfluent speech in
adult anomic speakers. Journal of Speech and Hearing Research, 24, 358–365.
Cooper, P. V. (1990). Discourse production and normal aging: Performance on oral
picture description tasks. Journal of Gerontology, 45, 210–214.
Crockford, C., & Lesser, R. (1994). Assessing functional communication in aphasia:
Clinical utility and time demands of three methods. European Journal of Disorders
of Communication, 29, 165–182.
Damasio, A. R., & Tranel, D. (1993). Nouns and verbs are retrieved with differently
distributed neural systems. Proceedings of the National Academy of Sciences, 90,
4957–4960.
Doesborgh, S. J. C., van de Sandt-Koenderman, W. M. E., Dippel, D. W. J., van
Harskamp, F., Koudstaal, P. J., & Visch-Brink, E. G. (2002). The impact of linguistic
deficits on verbal communication. Aphasiology, 16(4/5/6), 413–423.
Dollaghan, C. A., & Campbell, T. F. (1992). A procedure for classifying disruptions in
spontaneous language samples. Topics in Language Disorders, 12, 56–68.
Doyle, P. J., Thompson, C. K., Oleyar, K., Wambaugh, J., & Jackson, A. (1994). The
effects of setting variables on conversational discourse in normal and aphasic adults.
Clinical Aphasiology, 22, 135–143.
Edwards, S. (1998). Single words are not enough: Verbs, grammar and fluent aphasia.
International Journal of Language and Communication Disorders, 33 (Supplement),
190–195.
German, D. J. (1990). The Test of Adolescent and Adult Word-Finding. Austin, TX: Pro-Ed.
APPENDIX
%WR SCORING PROTOCOL
Count the first 300¹ words of the sample (Larfeuil & Le Dorze, 1997;
Vermeulen et al., 1989).
Count the number of nouns and verbs within the 300-word corpus, with the
following exceptions:
(a) Modalising speech² (cf. Larfeuil & Le Dorze, 1997) is excluded from
further analysis (i.e., noun/verb counts). This is to prevent artificial
inflation of a speaker's noun/verb output.
(b) Nouns and verbs that are part of circumlocutions, self-corrections, or
repetitions/stalling are counted only in initial form (Larfeuil & Le Dorze,
1997; Vermeulen et al., 1989).
(c) If a word is repeated for emphasis (e.g., "shark, shark, shark!") or to
denote different items (e.g., "lifeguard ... lifeguard" (pointing to one, then
the other)), it is counted each time; for other instances of repetition (e.g.,
stalling), the word is counted just once (MacWhinney, 1995).
(d) The verb "to be" is not counted (Breedin et al., 1998).
(e) Pronouns and prepositions are not counted (Segalowitz & Lane, 2000).
(f) Numerals are not counted as nouns (Segalowitz & Lane, 2000).
Noun and verb word-finding episodes/errors include:
(a) Words immediately preceded by prolonged (filled or unfilled) hesitation
(2+ seconds; Crockford & Lesser, 1994; Dollaghan & Campbell, 1992;
Pashek & Tompkins, 2002); if the pause is utterance-initial, however, it is
ignored (due to the possibility of sentence construction deficits).³
(b) Words preceded or accompanied by comments indicating difficulty.
(c) Self-corrections (Pashek & Tompkins, 2002).
(d) Paraphasias (semantic, phonemic, or unrelated).
(e) Deletions (Kemmerer & Tranel, 2000b): e.g., obvious deletion of a
syntactic constituent (main verb, object noun) in a required context.⁴
(f) Overuse of indefinite terms⁵ (Nicholas et al., 1985).
(g) Overuse of pronouns, or pronouns without antecedents (Hickin et al., 2001;
Nicholas et al., 1985).
1 If 300 words were not produced in the sample, the first 150–200 words were used
(Berndt et al., 1997; Brown & Cullinan, 1981).
2 Larfeuil and Le Dorze (1997) defined modalising speech as "all utterances in which
the speaker includes himself/herself in the discourse ... [e.g.], verbs and verbal phrases
whose function is not to express the speaker's feelings but rather to predicate, e.g., 'I
think that'" (p. 788).
3 Pashek and Tompkins (2002) noted possible confounds of the 2+ second rule for
indicating word-finding difficulty; they suggest that hesitations be analysed relative to
overall rate of speech. Therefore, language samples of patients whose response times or
speaking rates were judged clinically to be slow were analysed such that relative
hesitations (i.e., > 2 seconds longer than average pause time between words in fluent
speech) were noted rather than 2+ second pauses, per se.
4 With the exception of obvious deletions, additional errors of grammaticality (e.g., I had
going) were not considered errors of word retrieval if the subject demonstrated
evidence of having retrieved the correct lexical constituent.
5 The inclusion of indefinite terms [see (f)] is controversial in some respects, as normal
speakers have been noted to utilise such terms nearly as often as brain-injured speakers in
some cases (Snow, Douglas, & Ponsford, 1995). Therefore, the appropriateness or
inappropriateness of such terms in the context of the discourse (i.e., whether or not an
intended referent could be interpolated) was noted prior to definitive scoring. Subjects
were given the benefit of the doubt in ambiguous situations.
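The hesitation criterion in (a), together with the relative-pause adjustment described in footnote 3, can be sketched in Python. This is a minimal illustration only and not the authors' scoring procedure: the function name, the argument layout, and the choice of the arithmetic mean of fluent-speech pauses as the baseline are all assumptions made for the example.

```python
from statistics import mean

FIXED_THRESHOLD_S = 2.0  # the "2+ seconds" rule from the protocol


def flag_hesitations(pauses, utterance_initial, slow_speaker=False,
                     fluent_pauses=None):
    """Return indices of nouns/verbs preceded by a scorable hesitation.

    pauses            -- pause (in seconds) preceding each counted noun/verb
    utterance_initial -- True where that pause opens an utterance (ignored)
    slow_speaker      -- hypothetical flag: apply the relative rule of footnote 3
    fluent_pauses     -- pauses between words in fluent speech (baseline),
                         required only when slow_speaker is True
    """
    if slow_speaker:
        if not fluent_pauses:
            raise ValueError("fluent_pauses is required for the relative rule")
        # Relative rule: more than 2 s longer than the average fluent pause.
        threshold = mean(fluent_pauses) + FIXED_THRESHOLD_S

        def exceeds(p):
            return p > threshold
    else:
        # Fixed rule: a pause of 2 s or longer.
        def exceeds(p):
            return p >= FIXED_THRESHOLD_S

    return [i for i, (p, initial) in enumerate(zip(pauses, utterance_initial))
            if not initial and exceeds(p)]
```

For example, `flag_hesitations([0.5, 2.0, 3.1, 2.5], [False, False, False, True])` flags the second and third words; the fourth is skipped because its pause is utterance-initial.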
http://www.tandf.co.uk/journals/pp/02687038.html
DOI:10.1080/02687030344000111
METHOD
Participants
CHI. A total of 32 native speakers of English who had sustained a CHI were
studied. Participants were selected because they had recovered a high level of
functional language; that is, they had achieved fluent conversation and did not
demonstrate any significant deficits on traditional clinical language tests. In
addition, participants were recruited to represent a range of socioeconomic
backgrounds (see below).
All CHI participants met the following criteria: (a) no reported history of
substance abuse or psychiatric illness; (b) visual acuity and visual perceptual
abilities adequate to distinguish stimulus materials as determined by screening
procedures; (c) hearing acuity adequate to follow directions in each task as
determined by screening procedures; (d) an aphasia quotient (AQ) from the
Western Aphasia Battery (Kertesz, 1982) above 93; (e) no significant motor
speech disorder as determined by an experienced speech-language pathologist;
(f) Rancho Los Amigos Level of Cognitive Functioning (Hagen, Malkmus, &
Durham, 1980) of VII (automatic-appropriate) or above; (g) Galveston
Orientation and Amnesia Test (Levin, O'Donnell, & Grossman, 1979) score of 75
or above; and (h) a score of 120 or above on the Dementia Rating Scale (Mattis,
1976), a general screen of cognitive processing. The CHI group consisted of 8
females and 24 males ranging in age from 16–69 years (mean = 31.7 years). Four
members of this group were African-American and the remainder Caucasian.
Years of education for the CHI group ranged from 10–21 (mean = 13.2). The CHI
participants were also assigned to one of three socioeconomic groups
(Professional, Skilled Worker, or Unskilled Worker) on the basis of the
Hollingshead rating (Hollingshead, 1972) (see Coelho et al., 2002 for a
description). The group consisted of 11 professionals, 10 skilled workers, and 11
unskilled workers. All of the CHI participants' injuries were rated as either
moderate (duration of coma less than 6 hours) or severe (duration of coma greater
than 6 hours) on the basis of criteria established by Lezak (1995). Time
post onset ranged from 1–99 months (mean = 12.8 months).
NBI. A total of 43 hospital employees, working in a variety of capacities, who
were native speakers of English made up the NBI group. No individual in this
group reported a history of neurologic or psychiatric disease, or substance abuse.
NBI participants were also selected on the basis of socioeconomic level.
Attempts were also made to match these individuals, as closely as possible, with
the CHI participants on the basis of age and gender. There were 30 males and 13
females studied; ages ranged from 16–63 years (mean = 31.9 years). Two
individuals from this group were African-American and 41 Caucasian. Level of
education ranged from 11–24 years (mean = 15.3). With regard to
socioeconomic status, the NBI group consisted of 15 professionals, 10 skilled
workers, and 18 unskilled workers.
RESULTS
In the present study data from 32 CHI and 43 NBI participants from the two
previous investigations described in the introduction (Coelho, 2002; Coelho et
al., 2002) were reanalysed using discriminant function analyses (DFA). The
intent of the present study was to investigate the accuracy with which group
membership (CHI versus NBI) could be predicted on the basis of discourse
performance. The measures selected for inclusion in the DFA included measures
of story narrative and conversational discourse. Results from each of these DFAs
are discussed below.
Story narrative measures
Five measures that sampled aspects of micro-organisation (i.e., words per T-unit
and subordinate clauses per T-unit) and macro-organisation (i.e., percentage of
complete cohesive ties to total cohesive ties, total episodes, and proportion of
T-units within episode structure) in story retelling and story generation tasks were
entered into the DFA for narrative discourse. The DFA accurately classified 70%
of the cases, χ²(10) = 14.54, p = .15: 64.5% of the CHI group and 74.4% of the
NBI group (see Table 1). This finding was not significant, accounting for
approximately 20% of the explained variance, suggesting that the story narrative
measures did not reliably discriminate the CHI from the NBI participants. Of the
story narrative measures, the proportion of T-units within episode structure and
words per T-unit both from the story generation task had the highest correlations
with the discriminant function, .54 and .49 respectively (see Table 2).
Conversation measures
Seven measures of conversational performance (i.e., numbers of obliges,
comments, adequate and adequate plus responses, novel topic introductions,
smooth topic shifts, and turns) were included in the DFA for conversation. Of the
measures of conversational performance studied, number of comments and
adequate plus responses had the highest correlations with the discriminant
function, .91 and .67 respectively (see Table 3). This DFA correctly classified
over 77% of the cases, χ²(7) = 25.04, p = .001: 78.1% of the CHI participants and
72.1% of the NBI group (see Table 4). This finding was significant, accounting
for approximately 30% of the explained variance, which suggests that the
measures of conversational discourse were better able to discriminate the
participant groups.
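The classification percentages follow directly from the cell counts of a classification table. The short Python sketch below shows that arithmetic using the counts printed in the classification tables for the narrative and conversation analyses. It is an illustration only: the function and variable names are hypothetical, and it does not reproduce the discriminant function analysis itself.

```python
def classification_rates(confusion):
    """Per-group and overall percent correctly classified.

    confusion maps each actual group to a dict of {predicted group: count}.
    """
    per_group = {}
    correct = total = 0
    for actual, row in confusion.items():
        n = sum(row.values())          # cases actually in this group
        hits = row.get(actual, 0)      # cases predicted into their own group
        per_group[actual] = 100.0 * hits / n
        correct += hits
        total += n
    return per_group, 100.0 * correct / total


# Cell counts as printed in the two classification tables.
narrative = {"CHI": {"CHI": 20, "NBI": 11}, "NBI": {"CHI": 10, "NBI": 29}}
conversation = {"CHI": {"CHI": 28, "NBI": 4}, "NBI": {"CHI": 13, "NBI": 30}}

for name, table in (("narrative", narrative), ("conversation", conversation)):
    per_group, overall = classification_rates(table)
    print(name, {g: round(p, 1) for g, p in per_group.items()}, round(overall, 1))
```

The overall rate is the sum of the diagonal cells divided by the number of cases entered, so it is weighted by group size rather than being the simple average of the two per-group rates.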
TABLE 1 Classification results from discriminant function analysis of story narrative
measures
                Predicted group membership
Actual group    CHI           NBI           Total
CHI             20 (64.5%)    11 (35.5%)    31 (100.0%)
NBI             10 (25.6%)    29 (74.4%)    39 (100.0%)
Measure           Correlation
GENER-TUEPTR      .54
GENER-WDSTU       .49
RETELL-TUEPTR     .42
RETELL-SUBT       .39
GENER-COMTPC      .29
RETELL-COMTPC     .26
RETELL-WDSTU      .23
GENER-SUBT        .22
GENER-EPTOT       .15
RETELL-EPTOT      .03
The measures with the highest correlation contribute the most to discriminating between
the groups.
GENER = story generation task, RETELL = story retelling task, WDSTU = words per
T-unit, SUBT = subordinate clauses per T-unit, COMTPC = percent complete
ties out of total ties, EPTOT = number of total episodes, TUEPTR = proportion
of T-units within episode structure.
Measure                       Correlation
COMMENTS                      .91
ADEQUATE PLUS RESPONSES       .67
OBLIGES                       .28
ADEQUATE RESPONSES            .23
NOVEL TOPIC INTRODUCTIONS     .19
TURNS                         .09
SMOOTH TOPIC SHIFTS           .04
The measures with the highest correlation contribute the most to discriminating between
the groups.
                Predicted group membership
Actual group    CHI           NBI           Total
CHI             28 (87.5%)    4 (12.5%)     32 (100.0%)
NBI             13 (30.2%)    30 (69.8%)    43 (100.0%)
Measure                       Correlation
COMMENTS                      .79
ADEQUATE PLUS RESPONSES       .55
GENER-TUEPTR                  .40
TABLE 6 Classification results from discriminant function analysis with selected story
narrative and conversation measures
                Predicted group membership
Group           CHI           NBI           Total
CHI             27 (84.4%)    5 (15.6%)     32 (100.0%)
NBI             7 (22.5%)     31 (77.5%)    40 (100.0%)
Snow, P., Douglas, J., & Ponsford, J. (1995). Discourse assessment following traumatic
brain injury: A pilot study examining some demographic and methodological issues.
Aphasiology, 9, 365–380.
Snow, P., Douglas, J., & Ponsford, J. (1997). Procedural discourse following traumatic
brain injury. Aphasiology, 11, 947–967.
Togher, L. (2001). Discourse sampling in the 21st century. Journal of Communication
Disorders, 34, 131–150.
Togher, L., Hand, L., & Code, C. (1999). Exchanges of information in the talk of people
with traumatic brain injury. In S. McDonald, L. Togher, & C. Code (Eds.),
Communication skills following traumatic brain injury (pp. 113–145). Hove, UK:
Psychology Press.
Wilkinson, R. (1999). Sequentiality as a problem and resource for intersubjectivity in
aphasic conversation: Analysis and implications for therapy. Aphasiology, 13,
327–343.
Winter, P. (1976). The bear and the fly. New York: Crown Publishers.
Ylvisaker, M., Szekeres, S. F., & Feeney, T. (2001). Communication disorders associated
with traumatic brain injury. In R. Chapey (Ed.), Language intervention strategies in
aphasia and related neurogenic communication disorders (pp. 745–800).
Philadelphia: Lippincott, Williams & Wilkins.
task and one single-picture task, but not on the other discourse tasks.
There was a significant relationship between WAB-AQ and overall
quality ratings of coherence, reference, and emplotment. The
correlation between WAB-AQ and discourse quantity was not
significant for any task, and discourse quality was not significantly
correlated with discourse quantity. Ethnic features appeared most
often on one single-picture task and the personal narrative. No ethnic
dialect features occurred on the fable retell.
Conclusions: These findings suggest the need to supplement
standardised assessment of aphasia with assessment of discourse
performance, using less structured discourse tasks, such as a
personal narrative task. Less structured discourse tasks may also be
optimal for eliciting natural ethnic patterns of communication. The
lack of relationship between narrative quantity and narrative quality
may not generalise to individuals with aphasia that is severe or mild.
This study contributes towards development of a discourse
assessment tool for culturally and linguistically diverse populations
that may supplement information provided by standardised testing.
Several factors point to the need for discourse research with African Americans
who have aphasia. The incidence of stroke, and hence the probability of aphasia,
is higher in African Americans than in Caucasians (Kittner, White, Losonczy,
Wolf, & Hebel, 1990). Moreover, many African Americans are speakers of a
distinct ethnic dialect. While ethnicity does not determine ethnic dialect use,
previous research has confirmed its presence in some African Americans with
aphasia on certain tasks (Ulatowska & Olness, 2001). Identification of ethnic
dialect is critical to differentiate communication change associated with
pathology from normal communicative differences associated with ethnicity
(Wolfram, 1992), especially when surface features of the ethnic dialect overlap
http://www.tandf.co.uk/journals/pp/02687038.html
DOI:10.1080/02687030344000102
correlations between WAB-AQ and: (1) a lesson derived from a picture-sequence
fable; or (2) multiple-choice proverb interpretation.
A recent case series study (Ulatowska, Olness, Hill, Samson, & Goins, 2002)
suggests the importance of studying the discourse and standardised test
performance of individuals with moderate aphasia, in contrast to the previous
studies focusing on mild aphasia. Subjects for this study of individuals with
moderate aphasia were drawn from the larger comparative study previously cited
(Wertz et al., 2000). Individuals with aphasia in the moderate severity range, i.e.,
with similar aphasia severity as measured by the WAB, were found to vary
widely in their ability to produce stories that were coherent, referentially clear,
and contained all elements of the story or scenario. Individuals with moderate
aphasia appear to be an informative group for study because they have the ability
to produce discourse-length responses (unlike individuals with severe aphasia),
yet display various patterns of discourse disruption (unlike many individuals
with mild aphasia).
This study was designed to answer the following questions:
(1) What is the relationship between performance on a standardised language
measure and discourse performance on a variety of discourse tasks in
African-American adults with moderate aphasia?
(2) Are ethnic dialect and discourse features present in the discourse of
African-American adults with moderate aphasia, and if present, does their
appearance differ among discourse tasks?
METHODS
Subjects
Twelve African-American adults with aphasia subsequent to a left-hemisphere
stroke participated in the study. Participants were selected from a larger
discourse investigation (Wertz et al., 2000), based on standardised test scores in
the moderate aphasia severity range on the Western Aphasia Battery, Aphasia
Quotient (WAB-AQ) (Kertesz, 1979, 1982). Table 1 provides participant
demographics. Socioeconomic status (SES) was rated on a 1–7 scale (adapted
from Featherman & Stevens, 1980), where a higher number indicates lower SES.
All participants were native speakers of English and were raised in the southern
United States. Table 2 displays participants' WAB scores.
Discourse tasks
Five discourse tasks that varied in the type of demand placed on the speaker,
amount of information in the stimulus, and stimulus modality (visual or auditory)
were presented. All the tasks were designed to elicit narrative discourse, through
AAVE (Green, 1998; Wolfram & Fasold, 1974), distinctive from other dialects of
American English. Second, the verb carries the temporal and aspectual
information that forms the backbone of narratives, and narratives were elicited in
this study. Third, morpho-syntactic features of the verb are highly prone to
disruption in aphasia. Thus, the verb system was a natural choice of focus, for its
complexity, distinctiveness, its importance in narratives, and its ability to reflect
aphasic disruptions. Example verb forms from the sample included habitual
aspect BE ("My father said, 'Don't be playing with those guns.'"), and
perfective aspect DONE ("He said, 'It kinda look like she done had a stroke.'").
Discourse features of repetition and direct speech were also identified. Repetition
included both partial and full repetitions of previous portions of an utterance, and
instances of direct speech were reproductions of the speech produced by
characters in a narrative. These features are common in the oral storytelling
styles of many African Americans (Mitchel-Kernan, 1972; Ulatowska et al.,
2000) and act to highlight information and increase vividness in narratives.
Examples are: "I was laying on the hospital, can't walk, can't talk, can't move.
I couldn't walk, talk, nothing," and "They said, 'Oooh, girl, they gon' get you
tonight, they gon' get you.'"
Six raters, including two African-American clinicians, were trained to
discriminate the points on the rating scales. The stories were then rated, and
disagreements were resolved by group consensus. For each task, the relationship
between the discourse measures (quality and quantity) and WAB-AQ was
determined by computing correlations (Spearman and Pearson, respectively).
Alpha was adjusted to .01 to control for familywise error.
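For readers who want to see how a rank correlation of this kind is obtained, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks, which handles tied scores. It is a generic textbook formulation, not the authors' analysis script; the function names and the example values are hypothetical and are not data from this study. No significance test is included.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical severity scores and quality ratings for six speakers.
severity = [74.0, 71.5, 68.2, 65.0, 61.3, 58.9]
quality = [11, 12, 9, 9, 6, 4]
print(round(spearman(severity, quality), 2))
```

With 12 participants per task, each coefficient has n - 2 = 10 degrees of freedom, which is the value shown in parentheses when the correlations are reported.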
Reliability
Interrater reliability of the ratings for coherence, reference, and emplotment was
analysed by comparison of the original group ratings with ratings of an individual
rater on complete data from six of the twelve subjects. Point-by-point interrater
agreement was 90% for the coherence rating, 70% for the reference rating, and
75% for the emplotment rating. The final rating assigned to each response was
that of the original six raters, whose disagreements had been resolved by group
consensus.
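Point-by-point agreement is simply the percentage of rated items on which two sets of ratings match exactly. A minimal Python sketch follows; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
def point_by_point_agreement(ratings_a, ratings_b):
    """Percent of items given the identical rating in both sets."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical consensus ratings and one individual rater's ratings (0-4 scale).
group = [4, 3, 2, 1, 0, 4, 3, 2, 1, 4]
single = [4, 3, 2, 1, 0, 4, 3, 2, 2, 4]
print(point_by_point_agreement(group, single))
```

Because only exact matches count, a one-point disagreement on a five-point scale is treated the same as a four-point disagreement.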
RESULTS
Quality scores are shown in Table 3. A Spearman correlation, adjusted for
family-wise error (alpha = .01), between WAB-AQ and discourse quality was
statistically significant on the picture sequence, rs(10) = .82, p < .01, and single
picture "Easter Morning", rs(10) = .83, p < .01. Correlations between WAB-AQ
and discourse quality were nonsignificant at an alpha level of .01 for the other
tasks: fable retell, rs(10) = .76; single picture "Counting Money", rs(10) = .74;
and personal narrative, rs(10) = .15.
                                  Discourse tasks
Participant   Fable     Boys & Apples   Counting Money   Easter Morning   Frightening Experience
number*       retell    (picture        (single          (single          (personal
                        sequence)       picture)         picture)         narrative)
01            10        9               12               12               11
02            7         10              4                7                10
03            11        7               7                9                8
04            8         10              11               9                9
05            3         10              12               11               8
06            5         8               6                6                0
07            7         9               3                9                5
08            5         6               0                7                4
09            5         6               9                10               9
10            4         4               8                4                8
11            6         3               6                3                8
12            3         4               6                5                11
M             6.17      7.17            7.00             7.67             7.58
Range         3–11      3–10            0–12             3–12             0–11
SD            2.55      2.55            3.67             2.81             3.18
Sum of the three quality response scores (Coherence, maximum 4 points; Reference,
maximum 4 points; Emplotment, maximum 4 points) for the 12 participants'
responses on five discourse tasks.
* Participants are listed in order by decreasing WAB-AQ score.
TABLE 4 Combined quality ratings
                      Discourse quality measures
Participant number*   Coherence   Reference   Emplotment
01                    17          19          18
02                    11          16          11
03                    13          16          13
04                    13          15          15
05                    15          14          15
06                    8           9           8
07                    9           13          11
08                    6           9           7
09                    12          15          12
10                    9           10          9
11                    9           8           9
12                    8           11          10
M                     10.83       12.92       11.50
Range                 6–17        8–19        7–18
SD                    3.24        3.48        3.26
Sum of each of the discourse quality measures for responses of the 12 participants across
five discourse tasks (retell, picture sequence, two single pictures, and personal
narrative), with maximum score of 20 points (4 points per measure across 5
tasks).
* Participants are listed in order by decreasing WAB-AQ score.
                                  Discourse tasks
Participant   Fable     Boys & Apples   Counting Money   Easter Morning   Frightening Experience
number*       retell    (picture        (single          (single          (personal
                        sequence)       picture)         picture)         narrative)
01            8         10              9                4                5
02            8         15              5                6                13
03            5         5               5                6                6
04            6         10              4                10               29
05            3         7               4                4                4
06            4         5               4                4                0
07            7         9               4                4                13
08            7         4               0                4                27
09            16        26              8                8                19
10            3         6               4                5                6
11            5         5               4                6                6
12            10        11              6                7                19
M             6.83      9.42            4.75             5.67             12.25
Range         3–16      4–26            0–9              4–10             0–29
SD            3.59      6.14            2.26             1.92             9.43
                                  Discourse tasks
Participant   Fable     Boys & Apples   Counting Money   Easter Morning   Frightening Experience
number*       retell    (picture        (single          (single          (personal
                        sequence)       picture)         picture)         narrative)
01            (-)       (-)             (+)              (+)              (+)
02            (-)       (-)             (-)              (+)              (-)
03            (-)       (-)             (-)              (-)              (-)
04            (-)       (-)             (-)              (-)              (+)
05            (-)       (-)             (+)              (+)              (-)
06            (-)       (-)             (-)              (+)              n.r.*
07            (-)       (+)             (-)              (+)              (-)
08            (-)       (+)             n.r.*            (+)              (-)
09            (-)       (-)             (-)              (-)              (+)
10            (-)       (-)             (-)              (-)              (-)
11            (-)       (+)             (-)              (+)              (+)
12            (-)       (+)             (+)              (+)              (+)
Total +s      0         4               3                8                5
such as retell, personal narrative, and narrative elicited with pictures. Thus, it
brings us one step closer to development of a discourse assessment tool for
culturally and linguistically diverse populations which may serve to supplement
information provided by standardised testing.
REFERENCES
Chapman, S. B., Highley, A. P., & Thompson, J. L. (1998). Discourse in fluent aphasia
and Alzheimer's disease: Linguistic and cognitive considerations. In M. Paradis
(Ed.), Pragmatics in neurogenic communication disorders (pp. 55–78). New York:
Elsevier.
Chapman, S. B., & Ulatowska, H. K. (1989). Discourse in aphasia: Integration deficits in
processing reference. Brain and Language, 36, 651–668.
Featherman, D. L., & Stevens, G. A. (1980). A revised socioeconomic index of
occupational status. Center for Demography and Ecology Working Paper 79-48.
Madison, WI: University of Wisconsin.
Green, L. (1998). Aspect and predicate phrases in African-American Vernacular English.
In S. S. Mufwene, J. R. Rickford, G. Bailey, & J. Baugh (Eds.), African-American
English: Structure, history, and use (pp. 37–68). London: Routledge.
Holland, A. L. (1983). Non-biased assessment and treatment of adults who have
neurologic speech and language problems. Topics in Language Disorders, 3, 67–75.
Kertesz, A. (1979). Aphasia and related disorders: Taxonomy, localization, and recovery.
New York: Grune & Stratton.
Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.
Kittner, S. J., White, L. R., Losonczy, K. G., Wolf, P. A., & Hebel, J. R. (1990).
Black-white differences in stroke incidence in a national sample. The contribution of
hypertension and diabetes mellitus. Journal of the American Medical Association,
264, 1267–1270.
Mitchel-Kernan, C. (1972). Signifying, loud-talking, and marking. In T. Kochman (Ed.),
Rappin' and stylin' out: Communication in urban black America (pp. 315–335).
Urbana, IL: University of Illinois Press.
Mross, E. F. (1990). Text analysis: Macro- and microstructural aspects of discourse
processing. In Y. Joanette & H. H. Brownell (Eds.), Discourse ability and brain
damage: Theoretical and empirical perspectives (pp. 50–68). New York:
Springer-Verlag.
Mufwene, S. S., Rickford, J. R., Bailey, G., & Baugh, J. (Eds.). (1998). African-American
English: Structure, history and use. London: Routledge.
Olness, G. S., Ulatowska, H. K., Wertz, R. T., Thompson, J. L., & Auther, L. L. (2002).
Discourse elicitation with pictorial stimuli in African Americans and Caucasians
with and without aphasia. Aphasiology, 16, 623–633.
Patry, R., & Nespoulous, J. L. (1990). Discourse analysis in linguistics: Historical and
theoretical background. In Y. Joanette & H. H. Brownell (Eds.), Discourse ability and
brain damage: Theoretical and empirical perspectives (pp. 3–27). New York:
Springer-Verlag.
Ulatowska, H. K., & Chapman, S. B. (1994). Discourse macrostructure in aphasia. In
R.L.Bloom, L.K.Obler, S.DeSanti, & J.S.Ehrlich (Eds.), Discourse analyses and
APPENDIX
QUALITY RATING SYSTEM
Coherence
4 All portions of the response are interconnected and clear
3 Most of the response is connected and clear, with some problems
2 Some of the elements of the response are connected
1 The discourse is not interpretable
0 No response
Reference
4 All referents and the relationship between them are clear
3 Some reference errors
2 Many reference errors
1 None of the referents, nor their relationship, is interpretable
0 No response
Emplotment
4 Full scenario is produced
3 Story or scenario is produced with some elements missing
2 Story or scenario is produced with many elements missing
1 Only brief mention of elements with no story or scenario observable
0 No response
Spencer, Goda, Cottrell, & Lustig, 1998; McNeil et al., 2001; McNeil, Doyle,
Park, Fossett, & Brodsky, 2002). The SRP consists of auditory presentation of
stories derived from Brookshire and Nicholas's (1993) Discourse
Comprehension Test to a subject or patient, followed by an immediate retell. The
stories can be presented with or without picture support, and likewise, picture
support can be provided for the retells, or not, depending on the patient. It has
been argued that the SRP possesses some distinct advantages over other
connected language sampling procedures described in the literature, including
conversational observation (Oelschlaeger & Thorne, 1999), scripted interviews
(Goodglass & Kaplan, 1983), on-line video narration (McNeil, Small,
Masterson, & Fossett, 1995), fable generation and storytelling (Berndt, Wayland,
Rochon, Saffran, & Schwartz, 2000; Ulatowska, Chapman, Highley, & Prince,
1998), picture description (Nicholas & Brookshire, 1993, 1995; Yorkston &
Beukelman, 1980), and procedural description (Nicholas & Brookshire, 1993,
1995).
From a language sampling perspective, it has been suggested that the
constrained nature of the SRP enables it to provide a well-standardised and
replicable sample of language formulation and production. Specifically, data
have been presented to support the internal validity of the SRP (Doyle et al.,
1998) and the linguistic equivalence of language samples generated by four
alternate forms of the procedure (Doyle et al., 2000).
In addition, a scoring metric was developed to quantify the information content
and communicative efficiency of the samples generated by the SRP. This metric,
labelled the information unit (IU), was derived from Nicholas and Brookshire's
(1993, 1995) correct information unit, and was defined as "an identified word,
phrase, or acceptable alternative from the stimulus story that is intelligible and
informative and conveys accurate and relevant information about the story"
(McNeil et al., 2001, p. 994). The primary virtue of the IU scoring metric used
with the SRP is that all possible IUs are known a priori and can be printed on score
sheets. This potentially allows scoring to be done directly from audio recordings,
eliminating the need for time-consuming transcription of lengthy language
samples. The IU scoring metric expressed as a percentage of total possible IUs
(%IU) has been demonstrated to be reliable across forms of the SRP and to have
good criterion validity (McNeil et al., 2001). Also, an efficiency measure
obtained by dividing %IUs by the time taken to produce them (%IU/Min) has
been demonstrated to be reliable across forms and to discriminate between
normal and aphasic performance with reasonable accuracy (McNeil et al., 2002).
Address correspondence to: William D. Hula MS, Doctoral Fellow, Audiology &
Speech Pathology, VA Pittsburgh Healthcare System, 7180 Highland Drive,
Pittsburgh, PA 15206, USA. Email: william.hula@med.va.gov
The authors gratefully acknowledge the assistance of Stephanie Nixon and Joyce Poydence.
This research was supported by VA Rehabilitation Research and Development
Project # C8942RA.
© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html
DOI: 10.1080/02687030344000139
In addition to reporting on the validity and alternate form reliability of the %IU
metric, McNeil and colleagues (2001) demonstrated that it has good interobserver reliability. However, these data were obtained from scoring of printed
transcripts that had themselves already been subjected to reliability checks.
Furthermore, all of the data presented thus far on the SRP have been generated
by scorers who were themselves involved in the development of the IU measure.
If the SRP and its associated IU metrics are to be used to their fullest advantage,
particularly in a clinical setting, they must be demonstrated to have acceptable
reliability when scored directly from audio recordings by observers who have
received training comparable to what a practising clinician could be expected to
receive.
One final shortcoming of prior work done to demonstrate the psychometric
strength of the SRP concerns the distinction between IUs that were directly
stated in the stimulus stories (direct IUs) and IUs retold as synonyms (alternate
IUs) of words and phrases contained in the stimulus stories. In a study
investigating memory demands of the SRP (Brodsky, McNeil, Park, Fossett,
Timm, & Doyle, 2000), a strong serial position effect was demonstrated for direct
IUs, but not for alternate IUs. Thus far, no data have been presented to
demonstrate that this distinction can be reliably scored.
The purpose of the current paper is to present additional information on the
inter-rater reliability of the SRP using procedures and raters more representative
of a clinical setting than have been used in the past. Inter-rater reliability
coefficients and standard errors of measurement (SEM) will be reported for the
%IU/Min score. Also, point-to-point reliability for identification of individual IUs
will be reported.
METHOD
Participants
Recordings of story retells by four persons with aphasia and eleven normal
individuals were used. All recordings were randomly drawn from the sample of
15 subjects with aphasia and 31 normal subjects reported by McNeil et al.
(2001). Descriptive statistics for the subjects with aphasia are presented in
Table 1. Judges were four individuals with varying amounts of experience with
aphasia and language transcription: a licensed psychologist, a master's student in
speech-language pathology, and two doctoral students who are also certified
speech-language pathologists. These judges were a convenience sample, as they
were all new employees in the second author's laboratory and required training
in scoring the SRP for their work. The two doctoral students both had 2-3 years
of work experience that involved transcription of language samples from clinical
populations. The psychologist had approximately 13 years of experience with
neuropsychological testing of rehabilitation patients, including patients with
aphasia, but little experience with language transcription per se. The master's
student had no experience with language transcription for research or clinical
purposes.
TABLE 1 Biographical and descriptive subject information for subjects with aphasia (N = 15)
Subjects chosen for reliability analysis in this study are marked with an asterisk (*).
MPO = months post onset; RTT = Revised Token Test (McNeil & Prescott, 1978),
percentile compared to adults with left-hemisphere damage; ABCD ratio = Arizona
Battery for Communication Disorders of Dementia (Bayles & Tomoeda, 1993) ratio,
determined by number of delayed recall items/number of immediate recall items × 10;
Raven's = Raven's Coloured Progressive Matrices (Raven, 1976), raw score out of a
possible 36; PICA = Porch Index of Communicative Ability (Porch, 1981), percentile
compared to adults with left-hemisphere damage; OA = overall percentile and VRB = verbal
percentile.
Procedures
Prior to scoring any of the story retells, each of the four judges read the IU
definition and examples published by McNeil et al. (2001), and practised scoring
IUs on six to eight stories from printed transcripts. These language samples were
drawn from the samples collected by Doyle et al. (2000) and McNeil et al.
(2001). After training, each judge scored the same SRP form for each of the four
persons with aphasia and eleven normal subjects. Each form consisted of three
separate stories as reported by Doyle et al. (2000). All scoring was done from
audio files using score sheets containing all possible direct and alternate IUs.
Judges listened to each story as many times as they wanted to and placed a check
on the score sheet wherever an IU was observed. Wherever an alternate IU (as
opposed to a direct IU) was observed, they made an additional mark to denote
which of the predetermined synonyms was produced. The %IU/Min for each
story was calculated and averaged across the appropriate three-story form to give
a total %IU/Min score for each subject. The total %IU/Min score was also
broken down into %direct IU/Min and %alternate IU/Min to allow for assessment
of inter-rater reliability on these more specific measures.
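The score computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the tallies are made up, and the time base is assumed to be the duration of each retell in minutes, which the text does not define further.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StoryScore:
    """Tallies for one story retell (all field names are illustrative)."""
    possible_ius: int    # IUs printed on the score sheet for this story
    direct_ius: int      # IUs credited as stated in the stimulus story
    alternate_ius: int   # IUs credited as a predetermined synonym
    minutes: float       # assumed: duration of the retell in minutes


def pct_iu_per_min(observed_ius: int, story: StoryScore) -> float:
    """%IU (observed / possible x 100) divided by retell time in minutes."""
    pct_iu = 100.0 * observed_ius / story.possible_ius
    return pct_iu / story.minutes


def form_scores(stories: list[StoryScore]) -> dict[str, float]:
    """Average the per-story rates across the stories of one form."""
    return {
        "total": mean(pct_iu_per_min(s.direct_ius + s.alternate_ius, s) for s in stories),
        "direct": mean(pct_iu_per_min(s.direct_ius, s) for s in stories),
        "alternate": mean(pct_iu_per_min(s.alternate_ius, s) for s in stories),
    }


# Made-up tallies for a three-story form, for illustration only.
example_form = [
    StoryScore(possible_ius=50, direct_ius=20, alternate_ius=5, minutes=2.0),
    StoryScore(possible_ius=40, direct_ius=10, alternate_ius=10, minutes=1.0),
    StoryScore(possible_ius=60, direct_ius=30, alternate_ius=0, minutes=2.5),
]
scores = form_scores(example_form)
```

Because each story's total is the sum of its direct and alternate IUs over the same denominator and time, the averaged total equals the sum of the averaged direct and alternate scores in this sketch.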
The judges all reported that it generally took 15-30 minutes for them to score
a single form (three stories) of the SRP for a single subject. Data on the time
spent scoring retells were kept for the least trained and experienced judge. Her
average time to complete a single form was 23 minutes (range = 12-29; SD = 4).
RESULTS
Inter-rater reliability coefficients were calculated separately for subjects with
aphasia and normal subjects using the %total, %direct, and %alternate IU/Min
scores generated by each of the four judges for each of the subjects. To
determine a reliability coefficient that would allow for generalisation to judges
beyond those in this study, absolute-agreement intraclass correlation coefficients
(ICCs) were calculated with both subjects and judges as random factors. The ICC
has been argued to be a more conservative measure of reliability than the Pearson
Product Moment Correlation (Denegar & Ball, 1993). The ICCs are presented in
Table 2. They ranged from .94 to .995 for the subjects with aphasia and from .89
to .99 for the normal subjects. The SEM associated with inter-judge scoring error
was also calculated for each metric. These results are presented in Table 3 and
they ranged from .59 to .95 %IU/Min for the subjects with aphasia and from .99
to 1.42 %IU/Min for the normal subjects.
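For readers who wish to reproduce this kind of analysis, the following Python sketch computes an absolute-agreement ICC with subjects and judges as random factors, followed by an SEM. Several points are assumptions rather than details taken from the text: the single-rater form of the coefficient (often labelled ICC(2,1)) is used because the text does not say whether single or average measures were reported, the SEM is computed with the commonly used formulation SD x sqrt(1 - ICC) with the SD pooled over all scores, and the function names and example scores are hypothetical.

```python
from math import sqrt


def icc_absolute_agreement(scores):
    """Two-way random, absolute-agreement, single-rater ICC.

    scores: list of rows, one row per subject, one column per judge.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of judges
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between subjects
    ms_cols = ss_cols / (k - 1)                 # between judges
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC), with SD pooled over all scores (assumed)."""
    values = [x for row in scores for x in row]
    grand = sum(values) / len(values)
    sd = sqrt(sum((x - grand) ** 2 for x in values) / (len(values) - 1))
    return sd * sqrt(1.0 - icc)


# Hypothetical %IU/Min scores: four subjects (rows) by four judges (columns).
judge_scores = [
    [12.1, 12.4, 11.8, 12.0],
    [20.5, 21.0, 20.2, 20.9],
    [8.3, 8.0, 8.6, 8.1],
    [15.7, 15.2, 15.9, 15.5],
]
icc = icc_absolute_agreement(judge_scores)
sem = standard_error_of_measurement(judge_scores, icc)
```

The absolute-agreement form penalises systematic differences between judges (one judge scoring consistently higher than another), which a consistency-type coefficient or a Pearson correlation would not.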
TABLE 2 Inter-rater reliability (intraclass) correlation coefficients for total, direct and
alternate %IU/Min
Subjects           Total    Direct   Alternate
Aphasic (n = 4)    0.995    0.986    0.944
Normal (n = 11)    0.993    0.979    0.885
TABLE 3 Standard errors of measurement associated with inter-judge scoring error for
total, direct and alternate %IU/Min
Subjects           Total    Direct   Alternate
Aphasic (n = 4)    0.69     0.95     0.59
Normal (n = 11)    0.99     1.42     1.04
Point-to-point reliability between all six possible pairings of judges was
calculated separately for the four subjects with aphasia and for four of the
normal subjects. The formula used was [agreements/(agreements +
disagreements)] × 100. Point-to-point reliability averaged 91% (range =
85-95%) for both subject groups.
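The point-to-point calculation can be illustrated with a short Python sketch. It rests on assumptions the text does not spell out: every possible IU on the score sheet is treated as one scoring point, and two judges agree on a point when they made the same decision for it (not credited, credited as direct, or credited as alternate). The function names, judge labels, and marks are hypothetical.

```python
from itertools import combinations


def point_to_point(sheet_a, sheet_b, possible_ius):
    """Percent agreement: agreements / (agreements + disagreements) x 100.

    Each sheet maps an IU identifier to "direct" or "alternate";
    an IU that was not credited is simply absent from the mapping.
    """
    agreements = sum(
        1 for iu in possible_ius if sheet_a.get(iu) == sheet_b.get(iu)
    )
    disagreements = len(possible_ius) - agreements
    return 100.0 * agreements / (agreements + disagreements)


def pairwise_reliability(sheets, possible_ius):
    """Agreement for every pairing of judges (six pairings for four judges)."""
    return {
        (a, b): point_to_point(sheets[a], sheets[b], possible_ius)
        for a, b in combinations(sorted(sheets), 2)
    }


# Made-up score sheets for one story with four possible IUs.
possible = ["iu1", "iu2", "iu3", "iu4"]
sheets = {
    "judge1": {"iu1": "direct", "iu2": "alternate"},
    "judge2": {"iu1": "direct", "iu2": "direct", "iu3": "direct"},
    "judge3": {"iu1": "direct", "iu2": "alternate"},
    "judge4": {"iu1": "direct", "iu2": "alternate", "iu4": "direct"},
}
pairs = pairwise_reliability(sheets, possible)
mean_agreement = sum(pairs.values()) / len(pairs)
```

Counting a mismatch in the direct versus alternate mark as a disagreement makes this a stricter criterion than agreement on the mere presence of an IU.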
DISCUSSION
The inter-rater reliability for the %IU/Min metric, when scored directly from audio
recordings by newly and minimally trained judges, was high, with small
differences in scoring reliability among judges with differences in professional
experience. The SEMs were found to be much lower than the SEMs reported by
McNeil et al. (2002) for the four alternate forms for subjects with aphasia (range
= 4.8-5.6) and for the normal subjects (range = 3.2-4.7). The low SEMs suggest
that measurement error attributable to differences between raters is small relative
to the score variance due to the story forms themselves.
Furthermore, the present data, scored to include the direct vs alternate IU
distinction, demonstrated point-to-point reliability that was high and comparable
to previously reported values obtained from printed transcripts. Finally, the
preliminary data presented regarding the time needed to score language samples
elicited by the SRP suggest that it might be useful in clinical environments where
economy of assessment procedures is essential.
REFERENCES
Bayles, K. A., & Tomoeda, C. K. (1993). Arizona Battery for Communication Disorders
of Dementia. Tucson, AZ: Canyonlands Publishing, Inc.
Berndt, R. S., Wayland, S., Rochon, E., Saffran, E., & Schwartz, M. (2000). Quantitative
Production Analysis (QPA). Philadelphia: Psychology Press.
Brodsky, M., McNeil, M., Park, G., Fossett, T., Timm, N., & Doyle, P. (2000). Auditory
memory for story retelling in normal male, female, young, and old adult subjects in
persons with aphasia. Poster presented to the Academy of Aphasia Conference,
Montreal, CA.
Brookshire, R. H., & Nicholas, L. E. (1993). Discourse Comprehension Test. Tucson,
AZ: Communication Skill Builders.
Denegar, C. R., & Ball, D. W. (1993). Assessing the reliability and precision of
measurement: An introduction to the intraclass correlation and standard error of
measurement. Journal of Sports Rehabilitation, 2, 35-42.
Doyle, P. J., McNeil, M. R., Park, G., Goda, A., Rubenstein, E., Spencer, K., et al. (2000).
Linguistic validation of four parallel forms of a story retelling procedure.
Aphasiology, 14, 537-549.
Doyle, P. J., McNeil, M. R., Spencer, K. A., Goda, A. J., Cottrell, K., & Lustig, A. P.
(1998). The effects of concurrent picture presentations on retelling of orally
presented stories by adults with aphasia. Aphasiology, 12, 561-574.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders.
Philadelphia: Lea & Febiger.
McNeil, M. R., Doyle, P., Fossett, T., Park, G., & Goda, A. (2001). Reliability and
concurrent validity of an information unit scoring metric for the retelling procedure.
Aphasiology, 15, 991-1007.
McNeil, M. R., Doyle, P., Park, G., Fossett, T., & Brodsky, M. (2002). Increasing the
sensitivity of the Story Retell Procedure for the discrimination of normal elderly
subjects from persons with aphasia. Aphasiology, 16, 815-822.
McNeil, M. R., & Prescott, T. E. (1978). The Revised Token Test. Austin, TX: Pro-Ed.
McNeil, M. R., Small, S. L., Masterson, R. J., & Fossett, T. R. D. (1995). Behavioral and
pharmacological treatment of lexical-semantic deficits in a single patient with
primary progressive aphasia. American Journal of Speech-Language Pathology, 4,
76-87.
Nicholas, L. E., & Brookshire, R. H. (1993). A system for quantifying the informativeness
and efficiency of the connected speech of adults with aphasia. Journal of Speech and
Hearing Research, 36, 338-350.
Nicholas, L. E., & Brookshire, R. H. (1995). Presence, completeness, and accuracy of main
concepts in the connected speech of non-brain-damaged adults and adults with
aphasia. Journal of Speech and Hearing Research, 38, 145-156.
Oelschlaeger, M. L., & Thorne, J. C. (1999). Application of the correct information unit
analysis to the naturally occurring conversation of a person with aphasia. Journal of
Speech, Language, and Hearing Research, 42, 636-648.
Porch, B. E. (1981). Porch Index of Communicative Ability. Palo Alto, CA: Consulting
Psychologists Press.
Raven, J. C. (1976). Coloured Progressive Matrices. Oxford: Oxford Psychologists Press,
Ltd.
Ulatowska, H. K., Chapman, S. B., Highley, A. P., & Prince, J. (1998). Discourse in healthy
old-elderly adults: A longitudinal study. Aphasiology, 15, 619-633.
Yorkston, K. M., & Beukelman, D. R. (1980). An analysis of connected speech samples of
aphasic and normal speakers. Journal of Speech and Hearing Disorders, 45, 27-36.