STEPHANIE CAWTHON
The linguistic complexity of test items is one test format element that has been studied in the context of struggling readers and their participation in paper-and-pencil tests. The present article reports findings from an exploratory study on the potential relationship between linguistic complexity and test performance for deaf readers. A total of 64 students completed 52 multiple-choice items, 32 in mathematics and 20 in reading. These items were coded for the linguistic complexity components of vocabulary, syntax, and discourse. Mathematics items had higher linguistic complexity ratings than reading items, but there were no significant relationships between item linguistic complexity scores and student performance on the test items. The discussion addresses issues related to subject area, student proficiency levels in the test content, factors to look for in determining a linguistic complexity effect, and areas for further research on test item development and deaf students.
LINGUISTIC COMPLEXITY
CAWTHON IS AN ASSISTANT PROFESSOR, SCHOOL PSYCHOLOGY PROGRAM, DEPARTMENT OF EDUCATIONAL PSYCHOLOGY, UNIVERSITY OF TEXAS, AUSTIN.
Under current accountability reform frameworks such as those guided by the No Child Left Behind Act of 2001 (NCLB), evaluation of student knowledge and skill is almost exclusively conducted via large-scale, standardized assessments. Because of the accountability mechanisms (i.e., financial and programmatic sanctions) that follow poor student performance on assessments, a great deal of emphasis is placed on how to best measure proficiency in students who have not traditionally seen academic success. In almost all cases, these accountability assessments are in written English. Proficiency in reading is thus a gateway skill to accessing the content of
state assessments. For students who are not grade-level English users, including many deaf students, the written-English format of the assessment may be a barrier to demonstrating what they know. The challenge in accessing test content may compound other difficulties in measuring achievement when students are several years behind grade level. Deaf1 students, on average, show significant academic delays in all subject areas (Mitchell, 2008). Many of the early complaints about high-stakes testing under NCLB concerned the requirement that states use grade-level tests even for students whose instruction was 2 or more years behind the
proficiency (ranging from not proficient to proficient English-language users). Sato and colleagues found a consistent trend across all groups of better performance on mathematics content test items with lower linguistic complexity than on items with higher linguistic complexity. This difference was most striking for students who were ELL, in comparison with students who were no longer considered ELL but were not yet fully proficient in English language arts, or students who were fully English proficient. Reduction in linguistic complexity appears to give a specific benefit to students who are not yet proficient in English. As noted above, linguistic complexity includes features such as vocabulary and syntax. It may be that specific components of language contribute to the effects more than others. Shaftel, Belton-Kocher, Glasnapp, and Poggio (2006) looked at the linguistic components of typical test questions and whether they affected students' scores. In the study, students were from a variety of backgrounds; the sample included students with disabilities (SWD), students who were ELL, and students without either disabilities or ELL status. The researchers found that language, specifically mathematics vocabulary, affected item difficulty strongly in 4th grade (moderate effect) and had lesser effects on item difficulty in 10th grade. Additionally, they pointed out that mathematics items have linguistic components that may be difficult for all students regardless of SWD or ELL status. Most importantly, ambiguous or multiple-meaning words increased item difficulty for all students.

Implications for Deaf Students

Before findings about test item linguistic complexity can be generalized from students who are ELL to deaf students, it is important to further study
how test format features affect students within the latter population specifically (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). In their analysis of potential bias and adequacy in the development of tests for deaf students taking a high school graduation exam, Lollis and LaSasso (2009) reviewed previous literature on linguistic structures and students who are deaf, much of it from the 1970s and early 1980s. Their summary of test construction bias raised critical questions about how test developers consider the needs of students whose first language is not English, including deaf students. Much of the discussion about deaf students and access to standardized assessments centers on literacy development and academic achievement. The educational implications of prelingual or early childhood hearing loss can be significant if sustained interventions are not in place to give students access to a robust language model such as American Sign Language (ASL) or speech via amplification (e.g., Moores, 2009; Nussbaum, La Porta, & Hinger, 2003; Yoshinaga-Itano & Gravel, 2001). Even with early intervention, educational institutions historically have struggled to provide deaf students with opportunities for academic success (Harris & Bamford, 2001; Mutua & Elhoweris, 2002; Traxler, 2000). Part of the struggle has been in literacy development, which is often delayed. The three features of test item linguistic complexity targeted in the present study were vocabulary, syntax, and discourse. In order to have a sense of where test item features might interact with deaf students' language and literacy development, I review some of the findings in these three areas.
Although all three contribute to reading comprehension, there are some distinctions in how they contribute to literacy achievement in deaf students. Vocabulary, or semantics, is a core component of language development and is one of the earliest parts of language that children acquire (Bloom, 1991). Within the first few years of life, children begin to connect names to individual objects and, later, to groups of objects (Golinkoff et al., 2000). For hearing children, word acquisition builds upon acquisition of phonology and morphology, or the sounds and units of speech that make up English. The extent to which deaf children use phonology and morphology in early vocabulary acquisition is a debated topic (see Mayberry, del Giudice, & Lieberman, 2011, for a meta-analysis), but the ultimate goal for deaf readers is to be able to decode English, or to read and identify words in print. Mitchell (2008) provides summaries of two recent studies of vocabulary achievement with deaf students, first with the Woodcock-Johnson III (WJ) and second with the Stanford Achievement Test (SAT, 10th ed.). The advantage of these findings is that they are normed data and provide a sense of achievement relative to the general population. For the WJ data on letter-word identification, over 60% of students with hearing impairments were in the first quartile, or the lowest 25% of scores on the assessment. For the SAT, nearly 90% of deaf students were in the first quartile on a reading vocabulary assessment. Although the test samples were not exactly the same in their representativeness of the deaf student population, both sets of findings point toward a low level of achievement in English vocabulary relative to deaf students' hearing peers. The other two features of linguistic complexity are syntax and discourse.
Sample
One of the greatest challenges in research with deaf students is the low incidence of the disability. Consequently, my fellow researchers and I recruited participants from six schools for deaf students across the United States. When teachers and administrators sought clarification on who would be a suitable participant, we provided them with sample items and an example of the test format and asked them to judge whether a potential participant would be able to complete the research study tasks. Students with a severe to profound hearing loss but without disabilities that required additional test accommodations were included in the sample. In the end, the sample totaled 64 students, 29 boys and 35 girls, who were enrolled in fifth through eighth grades and ranged in age from 10 to 15 years. Students also completed a standardized measure of reading and mathematics proficiency, the Iowa Test of Basic Skills, or ITBS (Hoover et al., 2001). The ITBS reading test (parts 1 and 2) is a widely used assessment of a student's proficiency in vocabulary and reading comprehension. For mathematics we used the ITBS mathematics test, parts 1 and 3 (Hoover
et al., 2001), which included problem-solving and computation sections. Student scores in both reading and mathematics were converted into grade equivalencies based on norms provided by the ITBS (Cawthon et al., 2011). In the sample, students had grade equivalencies in reading ranging from 1.5 to 6.6 (M = 3.6). The overall ITBS mathematics grade equivalency ranged from 1.9 to 7.6 (M = 4.2). As a point of reference, the test items were selected to target fifth- and sixth-grade reading and mathematics skills. While there was a strong correlation between grade-level performance on the ITBS reading and mathematics assessments (r = .774, p < .01), there was not a significant relationship between academic grade level (i.e., grade enrolled at school) and ITBS grade-level performance in either reading (r = .193, ns) or mathematics (r = .125, ns). Finally, we asked students to share with us information about their experiences using language in both academic and social settings (Cawthon et al., 2011). While we did not ask the students to complete an ASL proficiency measure, we did gather information about the length of time they had used ASL, their preferences for language use across different settings, and whether they had family members who were Deaf. For half of the students, ASL was their first language; for those students who had acquired ASL as a second language, the average length of time they had been learning and using ASL was 5.4 years (SD = 3.3 years). Because our population was enrolled in the fifth through eighth grades, this is roughly parallel to the length of time many of these students had been in elementary school. Students predominantly received instruction in ASL (91%), followed by Signing Exact English (SEE; 20%) and spoken English (16%). (Students could respond with more than one language mode.) When asked what language they used at home, only 53% of students indicated ASL; 16% indicated SEE, and 50% indicated spoken language (47% English and 3% Spanish).
Test Items
Mathematics and reading tests were based on released practice problems from the fifth- and sixth-grade 2003 and 2004 Texas state assessments. There were a total of 32 mathematics items and 20 reading items. For mathematics, the questions were word problems that focused on a range of concepts such as number properties, proportions, and geometric properties. For reading, the passages were three to four paragraphs long and covered topics such as how scientists monitor penguins in the Antarctic, panda bears visiting the United States, training elephants in Africa, and weaving lace in Paraguay. Two example items, one in mathematics and one in reading, are shown in Appendix A.
The coding rubric drew on categories of linguistic demand and the criteria used within each category to identify potentially challenging areas (see Appendix B). In our coding scheme, each feature was defined and coded as described in the following sections for each item. (The codes for the examples in Appendix A are provided in Appendix C.)
Vocabulary
We counted the number of complex vocabulary words in each text unit. Complex vocabulary items were defined as words with multiple meanings, nonliteral usage, and manipulation of lexical forms (Martinello, 2008). For example, if an item included the word plane, this word would be counted as a complex vocabulary word because it has different meanings depending on the context of the sentence. Content words in mathematics, such as parallelogram or other vocabulary that was relevant to the problem, were not included in the vocabulary count even if they might otherwise have multiple meanings. Similarly, words that were introduced in the reading passages as target concepts were not included in the complex vocabulary count. If there were no complex words, the vocabulary complexity was given a rating of 0; if one to two words, the score was 1; if three to four words, the score was 2; an item with five or more complex words received a score of 3 for vocabulary.
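The vocabulary bucketing described above can be sketched as a small scoring function (a minimal illustration of the rubric; the function name and structure are ours, not part of the study's materials):

```python
def vocabulary_score(num_complex_words: int) -> int:
    """Map a count of complex vocabulary words to the 0-3 rating:
    0 words -> 0, 1-2 words -> 1, 3-4 words -> 2, 5+ words -> 3."""
    if num_complex_words <= 0:
        return 0
    elif num_complex_words <= 2:
        return 1
    elif num_complex_words <= 4:
        return 2
    else:
        return 3
```

Note that, per the coding rules above, domain-specific content words (e.g., parallelogram) would be excluded from the count before the score is assigned.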
Syntax
The syntax score is a composite of a checklist of the following items: atypical parts of speech, uncommon syntactic structures, complex syntax, academic syntactic form, long nominals, conditional clauses, relative clauses, and complex questions. The presence or absence of passive voice also was included in the syntax score. An item received 1 point for each syntax element present. The minimum raw syntax score was 0; following the rubric in Appendix B, raw scores of 0, 1, and 2 mapped directly to syntax scores of 0, 1, and 2, and raw scores of 3 or higher received a syntax score of 3.
Discourse
Complex discourse was defined in the present study as uncommon genre, the need for multiclausal processing, or the use of academic language (Abedi, Lord, & Plummer, 1997). Item discourse was also considered complex if students were required to synthesize information across sentences or to make clausal connections between concepts and sentences. Discourse was coded as a discrete variable (1 or 0) based on the presence or absence of one or more of these features. Using the linguistic complexity rubric, two raters convened to discuss and rate the test items used in the project. The raters first took part in a training session using examples from released state assessment items. The training period ended once the raters reached at least a 90% agreement rate. They then independently rated the project items. For reading, initial agreement was 91% for vocabulary, 64% for syntax, and 100% for discourse. For mathematics, initial agreement was 84% for vocabulary, 81% for syntax, and 100% for discourse. When there was a disagreement, we took the average of the two ratings. In some cases, this resulted in a score ending in .5, such as the average of 1 and 2 equaling 1.5. Abedi and colleagues (2005) give three levels of classification for the scores: not complex (0), moderately complex (1–2), and very complex (3 or higher). We therefore adapted the complexity ranges to accommodate our half-point scores, so that total score classifications were as follows.
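The rating procedure just described (averaging rater disagreements, summing the three component scores, and classifying the total) can be sketched as follows. This is a hedged reconstruction: the exact handling of half-point totals at the category boundaries (e.g., whether a total of 0.5 falls in the moderate band) is our assumption, since the adapted ranges are not fully spelled out in this excerpt.

```python
def average_ratings(rating_a: float, rating_b: float) -> float:
    """Resolve a rater disagreement by averaging the two ratings;
    this can yield half-point scores, e.g., (1 + 2) / 2 = 1.5."""
    return (rating_a + rating_b) / 2

def total_score(vocabulary: float, syntax: float, discourse: float) -> float:
    """Total linguistic complexity is the sum of the three component scores."""
    return vocabulary + syntax + discourse

def classify_total(total: float) -> str:
    """Classify a total score, adapted from Abedi et al. (2005):
    not complex (0), moderately complex (1-2), very complex (3+).
    Boundary handling of half-point scores is an assumption."""
    if total < 1:
        return "not linguistically complex"
    elif total < 3:
        return "moderately linguistically complex"
    else:
        return "very linguistically complex"
```

For example, raters who scored an item's syntax 1 and 2 would settle on 1.5, and an item with component scores of 1 (vocabulary), 1.5 (syntax), and 1 (discourse) would total 3.5 and be classified as very linguistically complex.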
Table 1
Table 2
Mathematics
A first limitation stems from the logistical challenges we faced in conducting the study. Requests for anonymity from school sites limited the range of information we could gather about each student while still allowing students to remain unidentifiable to project staff. There were also limitations in allocated research time, with study time needing to be balanced against instructional time and significant levels of testing for school and state accountability programs. Adding additional measures or asking for file reviews by school staff was not feasible under the time constraints that were in place. This said, the reduced demographic information resulted in only a very general idea of who the participating students were and how their individual characteristics may have influenced their approach to the test tasks. For example, we know that students at these sites tend to have severe to profound hearing loss, and that about half of the students came from families in which ASL was used at home. However, more precise information about individual students' degree of hearing impairment, use of a cochlear implant, or age at onset would provide a sense of their exposure to English, and of how their background may have interacted with their processing of the different language characteristics (e.g., vocabulary vs. syntax) of the passage. A second limitation is the technical properties of the adapted linguistic complexity scale. The findings of the present study could be conceptualized as part of the larger process of providing evidence for or against the external validity of the low, moderate, and high ratings for linguistic complexity. However, these categories have not been compared with other measures of item difficulty or levels of complexity that take similar approaches to item analysis. For example, Abedi (2006) has a holistic rating scale that gathers information about the relative strengths and weaknesses of an item. Instead of counting points where there are components of linguistic complexity, the holistic scale asks raters to score each item on a scale from 1 to 5, with higher scores representing higher levels of linguistic complexity. The scale provides sample features associated with each score level, with all of the sample features relating to the language demands made by the item. For instance, for a score of 3, which identifies a weak item, the scale includes features such as relatively unfamiliar or seldom-used words, long sentences, abstract concepts, complex sentences/conditional tense/adverbial clauses, and a few instances of passive voice or abstract or impersonal presentations. The holistic nature of this scale allows raters to attend to the overall gestalt of the item but also requires that the raters have experience with students who are ELL and a solid understanding of the difficulties involved in learning a new language. The measure used in the present study may not capture the features of an inaccessible item, or it may not have sufficient range to detect meaningful differences in effects on test takers.
tent). The mathematics word problems were opportunities for students to apply strategies they had learned to do particular kinds of inquiry (e.g., transforming objects across a grid, solving for a missing variable). Gaining access to the test item content is thus a very different process for reading comprehension items and mathematics word problems. Due to the different cognitive tasks required, levels of linguistic complexity for reading comprehension test items may have different effects on student performance than a similar level of linguistic complexity for mathematics items. How the item content is presented, including the linguistic complexity level, would appear to be even more critical when the student was connecting the demand of the test item with his or her own knowledge base without the additional support of a reading passage. In short, the question of access to test item content, with potential barriers within the test item format, needs to take into account the type of cognitive skills that are measured by the standardized assessment format in the content area (a finding echoed by Ansell & Pagliaro, 2006). Although our results did not show a difference in student performance between mathematics and reading, future research will need to investigate this finding on a larger scale, using larger item pools and controlling for item task cognitive difficulty while manipulating the level of linguistic complexity for the item pair(s).

Conclusion

When considering the potential impact of policies that guide alternate assessments and test item modifications for students with disabilities, it is important to identify which aspects of the test format are revised from the standard version. The assumption behind changing language components such as the syntactic structure or level of vocabulary is that students will find the simplified version easier to understand. In the definition of alternate assessments, modified assessments, and related special frameworks for testing students with significant cognitive disabilities, we have moved perhaps a step away from the concept of universal design of assessments that arose 10 or 15 years ago (Center for Universal Design, 1997; Thompson, Johnstone, & Thurlow, 2002). The continuing evolution of large-scale assessments to create different formats for eligible populations (i.e., for the "three percent") places a great deal of emphasis on the specific match between student characteristics and test format (Elliott & Roach, 2007; Weigert, 2009; Zigmond & Kloo, 2009). However, especially for heterogeneous populations such as deaf students, this match is very difficult to make and can lead to inconsistencies in assessment practices across schools, districts, and states. Issues of linguistic complexity, if they do arise as pertinent for deaf students, may be best addressed through assessments that look at comprehensive access features (instead of items with only modified language components). For example, researchers from Vanderbilt University (e.g., the Consortium for Alternate Assessment Validity and Experimental Studies, or CAAVES) take an aggregate approach to item modification. In this approach, a range of characteristics is evaluated as an overall accessibility construct. More specifically, in the CAAVES Accessibility Rating Matrix, characteristics such as the format of the test item or the use of graphics are summed in an aggregate manner to measure the extent to which access has been increased for students under the modified test condition (Beddow et al.,
2008). By taking into account both the visual and linguistic features of a test item, test developers may be able to create assessments that allow deaf students to access test items using multiple representations of information, not solely English text.

Note

1. The definition of deaf or hard of hearing varies by multiple factors, including hearing threshold and cultural identity. Deaf or hard of hearing may include people who are culturally Deaf, individuals who identify as audiologically deaf, sign language users, those with cochlear implants, those who wear hearing aids, and those who use a range of communication styles in a variety of settings. The term deaf is used in the present article to refer generally to students who have a severe to profound hearing loss, but does not specify other characteristics, such as whether they participate in the Deaf community or whether they may be categorized demographically as hard of hearing.

References
Abedi, J. (2002). Standardized achievement tests and English Language Learners: Psychometrics issues. Educational Assessment, 8, 231–257.

Abedi, J. (2006). Language issues in item development. In S. Downing & T. Haladyna (Eds.), Handbook of test development (pp. 377–399). Mahwah, NJ: Erlbaum.

Abedi, J., Bailey, A., Butler, F., Castellon-Wellington, M., Leon, S., & Mirocha, J. (2005). The validity of administering large-scale content assessments to English Language Learners: An investigation from three perspectives (CSE Report No. 663). Los Angeles, CA: Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Courtney, M., & Goldberg, J. (2000). Language modification of reading, science, and mathematics test items. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.

Abedi, J., Courtney, M., Mirocha, J., Leon, S., & Goldberg, J. (2005). Language accommodations for English Language Learners in large-scale assessments: Bilingual dictionaries and linguistic modification. Los Angeles, CA:
Schirmer, B. (2000). Language and literacy development in children who are deaf (2nd ed.). Needham Heights, MA: Allyn & Bacon.

Shaftel, J., Belton-Kocher, E., Glasnapp, D., & Poggio, J. (2006). The impact of language characteristics in mathematics test items on the performance of English Language Learners and students with disabilities. Educational Assessment, 11, 105–126.

Sireci, S. G., Scarpati, S. E., & Li, S. (2005). Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75(4), 457–490.

Spencer, P. E., & Marschark, M. (2010). Evidence-based practice in educating deaf and hard of hearing students. New York, NY: Oxford University Press.
Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large-scale assessments (Synthesis Report No. 44). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved from http://education.umn.edu/NCEO/OnlinePubs/Synthesis44.html

Traxler, C. (2000). The Stanford Achievement Test, ninth edition: National norming and performance standards for deaf and hard of hearing students. Journal of Deaf Studies and Deaf Education, 5(4), 337–348.

U.S. Department of Education. (2007a). Modified academic achievement standards: Non-regulatory guidance. Washington, DC: Author.

U.S. Department of Education. (2007b). Standards and assessments peer review guidance. Washington, DC: Author.

Weigert, S. (2009). Perspectives on the current state of alternate assessments based on modified academic achievement standards: Commentary on Peabody Journal of Education special issue. Peabody Journal of Education, 84, 585–594.

Yoshinaga-Itano, C., & Gravel, J. (2001). The evidence for universal newborn hearing screening. American Journal of Audiology, 10(2), 62–64.

Zigmond, N., & Kloo, A. (2009). The two percent students: Considerations and consequences of eligibility decisions. Peabody Journal of Education: Issues of Leadership, Policy, and Organization, 84(4), 478–495.
Mathematics
3. Marcy bought 6 apples priced at $0.35 each. She used a coupon worth $0.50 off the total cost. Which number sentence can be used to find how much money Marcy needed in order to buy the apples?

A (6 × 0.35) − 0.50 = 1.60
B (6 + 0.35) + 0.50 = 6.85
C (6 − 0.35) + 0.50 = 6.15
D (6 × 0.50) − 0.35 = 2.65
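As a quick arithmetic check of this item (a sketch; the operators in the answer choices are inferred from the stated totals), each choice can be evaluated against the amount Marcy actually needed, 6 × $0.35 − $0.50:

```python
# Amount Marcy needed: 6 apples at $0.35 each, minus a $0.50 coupon.
needed = 6 * 0.35 - 0.50  # $1.60

choices = {
    "A": (6 * 0.35) - 0.50,  # 1.60
    "B": (6 + 0.35) + 0.50,  # 6.85
    "C": (6 - 0.35) + 0.50,  # 6.15
    "D": (6 * 0.50) - 0.35,  # 2.65
}

# Only choice A matches the required amount (tolerance for float rounding).
matching = [label for label, value in choices.items()
            if abs(value - needed) < 1e-9]
```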
Reading

Why did scientists need to study penguins in the Antarctic before building the Penguin Encounter?

A The scientists were afraid the penguins' homes would be destroyed before the Penguin Encounter was finished.
B The scientists wanted to make the Penguin Encounter as much like the penguins' natural home as they could.
C The scientists wanted to see how the penguins reacted to global warming before taking them to sunny California.
D The scientists knew it would take many years to capture several hundred penguins for the Penguin Encounter.
PART 1: VOCABULARY
Description: uncommon usage, nonliteral usage, manipulation of lexical forms

List of uncommon words here: ______________________

Number of uncommon words | Vocabulary score
0                        | 0
1–2                      | 1
3–4                      | 2
5+                       | 3

Vocabulary score: ____

PART 2: SYNTAX
Description: atypical parts of speech, uncommon syntactic structures, complex syntax, academic syntactic form

Number of long nominals? ____
Does the passage use passive voice? (no = 0, yes = 1) ____
Number of conditional clauses? ____
Number of relative clauses? ____
Number of complex questions? ____

Total raw syntax score: ____

Raw syntax score | Syntax score
0                | 0
1                | 1
2                | 2
3+               | 3

Syntax score: ____
TEST ITEM LINGUISTIC COMPLEXITY

PART 3: DISCOURSE
Description: uncommon genre, need for multiclausal processing, academic language. Consider whether students are required to synthesize information across sentences.

Is the student required to make clausal connections between concepts and sentences?
YES (= 1)   NO (= 0)
TOTAL SCORE
Vocabulary score + Syntax score + Discourse score = Total score

Total score | Classification
0           | Not linguistically complex
1–2         | Moderately linguistically complex
3+          | Very linguistically complex
Using the items in Appendix A, here is the breakdown of their rating scores, with information about how the items were scored (these are raw scores):

Category   | Mathematics item: apples                                                                              | Reading item: penguin
Vocabulary |                                                                                                       | No uncommon or multiple-meaning words; raw score: 0; vocabulary score: 0
Syntaxa    | Passive tense ("can be used"); long nominal ("coupon worth $.50"); complex question (2nd sentence)    | Passive tense ("would be")
Discourse  |                                                                                                       |
Total      |                                                                                                       |

a. The raters disagreed on this category, and thus the score was averaged. The resulting syntax score is the same with either of the raters' scores (3 vs. 4).