
Applied Measurement in Education

THE REVISED SAT SCORE AND ITS MARGINAL PREDICTIVE VALIDITY

URL: http://mc.manuscriptcentral.com/hame Email: kgeisinger2@unl.edu

Journal: Applied Measurement in Education
Manuscript ID: HAME-2012-0095
Manuscript Type: Empirical Article
Keywords: Predictive Validity, SAT, College Admissions, Revised SAT


ABSTRACT

This paper explores the predictive validity of the Revised SAT (R-SAT) score as an alternative to the student SAT score. Freedle proposed this score for students who may potentially be harmed by the relationship between item difficulty and ethnic DIF observed in the test they took in order to apply to college. The R-SAT score is defined as the score a minority student would have received if only the hardest questions from the test had been considered, and was computed using formula scoring and an inverse regression approach. Predictive validity for short- and long-term academic outcomes is considered, as well as the potential effect on the overprediction and underprediction of grades among minorities. The predictive power of the R-SAT score was compared to the predictive capacity of the SAT score and to that of alternative Item Response Theory (IRT) ability estimates based on models that explicitly considered DIF and/or were based on the hardest test questions. We found no evidence of incremental validity in favor of the R-SAT score or of the IRT ability estimates.


THE REVISED SAT SCORE AND ITS RELATIVE PREDICTIVE VALIDITY

Introduction

One way that admission examinations are judged is by how well they are able to predict college outcomes. Predictive validity studies analyze the degree of association between admissions test scores and college outcomes, such as college grades and graduation rates. Academic outcomes are relatively easy to collect and are also related to the behavior that tests like the SAT are expected to predict: success in college. Some studies have also addressed the prediction of nonacademic outcomes such as earnings, leadership, job satisfaction, satisfaction with life and civic participation (Bowen & Bok, 1998; Allen, Robbins & Sawyer, 2010; Oswald, Schmitt, Kim, Ramsay & Gillespie, 2004; Willingham, 1985). In this study, we examine a measure of academic preparedness that has been proposed to complement the SAT. This measure, the Revised-SAT or R-SAT, was proposed by Roy Freedle (2003) with the goal of correcting the unfairness he found in SAT results for minorities through his application of the Standardization method for DIF (Dorans & Kulick, 1983, 1986; Dorans & Holland, 1992). The R-SAT was proposed as a score based on a subset of the SAT questions. We will judge its success using the results of predictive validity analyses of short- and long-term outcomes.

This article is divided into five sections. The first section summarizes previous research on the prediction of college outcomes. The research question for this investigation, the data sources and the methods are presented in the next three sections. Lastly, the results section presents the findings obtained when calculating the revised SAT score and using it to predict academic


outcomes. The predictive capacity of the R-SAT score will be compared to that of the original SAT score and three Item Response Theory (IRT) versions of the SAT score.

Prior Research on Prediction of College Outcomes

There is a substantial body of research on the validity of multiple variables to predict college outcomes in a wide range of dimensions: education, employment and social outcomes. This section presents a brief overview of this literature, with a particular focus on the role of high school grades and standardized test scores in the prediction of (i) college grades and (ii) graduation rates. Although these outcome indicators offer only a partial portrayal of students' educational achievement, the convenience of their collection and updating process makes them the outcomes most commonly used in predictive validity studies. A subsequent section describes recent studies examining the prediction of nonacademic college outcomes and the role of noncognitive predictors.

Freedle proposed computing a new score based on the hardest questions of the most widely taken standardized test in the US in order to compensate for the potentially unfair results of minority students he found when analyzing differential item functioning and its relationship to item difficulty. Details on the calculation of the R-SAT and Freedle's expectations of this index, as well as the criticisms made of it, are presented in this section as well.

College Grade Point Average

The relationship between high school grade point average, SAT scores and freshman grade point average has been widely examined by researchers at the College Board and research units within higher education institutions (Ramist, Lewis, & McCamley-Jenkins, 1994; Geiser & Studley, 2004). In general, College Board studies find that SAT scores make a substantial contribution to predicting cumulative college GPAs and that the combination of SAT scores and


high school records provides better predictions than either grades or test scores alone (Burton & Ramist, 2001; Hezlett, Kuncel, Vey, Ahart, Ones, Campbell & Camara, 2001). College Board researchers have studied the validity of the SAT mostly using correlational analysis and have taken into consideration the technical issues of range restriction, differences in grading across colleges, and the unreliability of college grades as a measure of success in college (Camara & Echternacht, 2000; Willingham, Lewis, Morgan & Ramist, 1990).1 Typical correlations between first-year grades and the SAT I (Verbal and Math scores combined) range between 0.3 and 0.6 depending on the characteristics of the studies, with an average of 0.4 (Ramist, Lewis & McCamley-Jenkins, 1994; Zwick, 2002). Bridgeman, Pollack and Burton (2004), for example, report a correlation between freshman grades and the SAT I score composite of 0.55; the SAT Verbal test score has a correlation of 0.50 with freshman grades, while the SAT Math correlates 0.52.2 On average the measurement error of the SAT I Math and Verbal sections is 30 points, and the correlation with the outcome criterion tends to be less strong when measurement error is considered (Zwick, 2002). Standardized tests allow all applicants the opportunity to perform in an environment with the same testing conditions, instructions, time constraints, opportunities to ask questions and procedures for scoring. Standardized test scores permit one to compare students who come from different schools in which grading standards can vary significantly. Zwick (2002), aware that SAT scores add little predictive power to high school grades, justifies the use of standardized test scores in admissions to large institutions by noting the cost of interviewing
1 Studies that do not adjust for range restriction and variations in grading standards tend to lower the observed correlations, underestimating the predictive power of the indexes used in admissions processes (Camara & Echternacht, 2000).

2 They also report a correlation between high school grades and first-year college grades of 0.58.


candidates or reviewing applications in elaborate detail. The cost for the school of collecting and processing the scores is minimal. In addition, standardized test scores help reduce the overprediction of African American college grades observed when using high school grades alone.3 In 2005 the SAT I was revised in a number of ways (Kobrin, Patterson, Shaw, Mattern & Barbuti, 2008): analogies were removed and replaced with more questions on reading passages, and the Verbal section was renamed the Critical Reading section. The Math section now includes items from more advanced courses and no longer includes quantitative comparison items. In addition, a third test was added, including multiple-choice items on grammar and a student-produced essay. Kobrin et al. report a correlation between test scores and first-year college grades similar to that from previous studies (unadjusted r=0.35, r adjusted for range restriction=0.53), conclude that the new writing test is the most predictive based on bivariate and multiple correlations (unadjusted r=0.36, r adjusted for range restriction=0.51), and encourage institutions to use both high school GPA and test scores when making admissions decisions since that maximizes predictability of first-year college grades (unadjusted r=0.46, r adjusted for range restriction=0.62).
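The adjusted and unadjusted correlations quoted above differ because enrolled students are a range-restricted sample of all test-takers. A minimal sketch of one standard correction (Thorndike's Case 2, assuming direct selection on the predictor; the standard deviations used below are hypothetical, not taken from the studies cited):

```python
import math

def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction: estimate the validity coefficient in the
    unrestricted applicant pool from the correlation observed in the
    selected (range-restricted) sample."""
    u = sd_unrestricted / sd_restricted          # u > 1 under restriction
    return (r_restricted * u) / math.sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))

# Hypothetical SDs: enrolled students are more homogeneous in SAT scores
# than the full pool of test-takers.
r_adjusted = correct_range_restriction(0.35, sd_unrestricted=110.0,
                                       sd_restricted=60.0)
```

With no restriction (equal standard deviations) the formula returns the observed correlation unchanged; the more homogeneous the selected sample, the larger the upward adjustment.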

Relative Predictive Validity of Different Academic Indicators

Previously, Geiser and Studley (2002), from the University of California, analyzed the relative contribution of high school GPA, SAT I and SAT II scores to the prediction of college success and found that SAT II scores were the best single predictor of first-year GPA, and that
3 Overprediction means that a group's average predicted first-year grade point average (GPA) is greater than its average actual first-year GPA. Although this problem is known to be present in the SAT I for African American and Hispanic students, Ramist et al. (1994) find it even more strongly when only high school GPA is used to predict first-year college grades.


the SAT I scores added little to the prediction once SAT II scores and high school GPA were already considered.4 After taking the SAT II and high school GPA into consideration, the SAT I scores improved the overall prediction rate by a negligible 0.1% (from 21.0% to 21.1%). The standardized coefficient of the SAT I, after controlling for SAT II and high school GPA, was 0.07, but statistically significant due to the large number of observations used. Geiser & Studley (2002) analyzed a sample of 80,000 freshmen who entered the University of California from fall 1996 to fall 1999 using regression analysis. Their results were confirmed by subsequent findings from College Board researchers (Ramist et al., 2001; Bridgeman, Burton & Cline, 2001; Kobrin, Camara & Milewski, 2002). For a more detailed review of these articles see Author (Year).

Based on the findings from multivariate analyses considering multiple academic predictors of college performance, such as the one conducted by Geiser & Studley (2002), the National Center for Fair and Open Testing (FairTest) has stated that the SAT I has little value in predicting future college performance (FairTest, 2003) and highlights the better performance of class rank, high school grades and SAT II scores. Others, however, have chosen to advocate for admissions tests that focus on achievement and that are based on standards and criteria (Atkinson & Geiser, 2009).

Prediction by Ethnic Group and Gender

Notable differences in the validity and predictive accuracy of SAT scores and high school grades by race and sex have been substantiated through numerous studies (Young, 2004). The accuracy of high school grades and SAT scores for predicting freshman grade point average is higher for women, Asian Americans and White students, and lower for men, African Americans


4 Geiser & Studley (2002) combined three SAT II scores into a single composite variable that weights each SAT II test equally; they did not analyze the predictive validity of separate test scores.


and Hispanics. Furthermore, these admissions variables often overpredict the grades of African American and Hispanic students, and underpredict those of women (Burton & Ramist, 2001). Ramist et al. (1994) report an overprediction of first-year GPA of -0.16 for African American students and of -0.13 for Hispanic students when considering HSGPA and SAT scores. Geiser & Studley (2002), on the other hand, found no significant overprediction for African Americans and an average overprediction of -0.04 for Hispanic students when including high school GPA and SAT I scores in the regression equation. Zwick, Brown & Sklar (2004) conducted the same type of analyses for each of the University of California campuses and for two merged cohorts (1996-1997 and 1998-1999). Their results vary significantly by the campus and merged cohort analyzed but were interpreted by the authors as supporting previous findings from the literature.5 There are a number of theories about the reasons for over- and underprediction; for details see Zwick, Brown & Sklar (2004), Zwick (2002) and Steele and Aronson (1998). More recently, researchers have looked at the differential prediction of test scores and high school grades among students from different language backgrounds (Zwick & Schemler, 2004; Zwick & Sklar, 2005) and from schools with different financial and teaching resources (Zwick & Himelfarb, 2011) as a way to investigate possible explanations for the issue of over- and underprediction. Results show a reduction of prediction error for Hispanic and African American students but not a complete elimination (from -0.15 to -0.08 and from -0.13 to -0.03, respectively) when using the second approach, but no change when considering first language:
5 Zwick, Brown & Sklar (2004) observed no significant differences in the overprediction of minorities whether the SAT IIs were considered instead of the SAT I and whether income and parental education were included in the regression equation. Geiser & Studley (2002) also reported no practical change in the overprediction of minority groups when examining the predictive power of using SAT II scores instead of SAT I scores, but the underprediction of African Americans grows to 0.03 and the overprediction for Hispanics grows to 0.08.


overprediction is still observed for African American and Hispanic students when considering first language.

Relative Predictive Validity Using Multivariate Regression Analysis and Considering Sociodemographic Variables

Parental income and education play a modest role in the prediction of college performance when controlling for additional academic indicators such as high school grades and standardized tests. Geiser & Studley (2002), for example, reported standardized coefficients that ranged between 0.03-0.04 and 0.05-0.06, respectively.6 The modest standardized coefficients associated with parental income and education were also reported by Bowen & Bok (1998) when using multivariate regression analysis to predict college performance.7 The consideration of sociodemographic variables in the predictive validity regression equation, however, is based on the results of Rothstein (2004), who finds that most of the SAT's predictive power comes from its correlation with unobserved variables such as high school sociodemographic characteristics.8 Rothstein's estimates show that the predictive contribution of the SAT I score is 60% lower than would be indicated by traditional methods.
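The over- and underprediction figures discussed above are mean prediction errors from a pooled regression: a group's grades are overpredicted when its average residual (actual minus predicted GPA) is negative. A minimal sketch using simulated data; the variable names and effect sizes are invented for illustration, not drawn from the studies cited:

```python
import numpy as np

def group_mean_residuals(X, y, groups):
    """Fit a pooled OLS of college GPA on the predictors, then return each
    group's mean residual (actual minus predicted GPA). A negative value
    means the pooled equation overpredicts that group's grades."""
    X1 = np.column_stack([np.ones(len(y)), X])        # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)     # pooled coefficients
    resid = y - X1 @ beta
    return {g: float(resid[groups == g].mean()) for g in np.unique(groups)}

# Simulated illustration: group "B" earns grades 0.15 below what the
# pooled HSGPA + SAT equation would predict.
rng = np.random.default_rng(0)
n = 200
hsgpa = rng.uniform(2.0, 4.0, n)
sat = rng.uniform(400.0, 800.0, n)
groups = np.array(["A"] * (n // 2) + ["B"] * (n // 2))
fygpa = 0.6 * hsgpa + 0.002 * sat + rng.normal(0.0, 0.3, n)
fygpa[groups == "B"] -= 0.15
residuals = group_mean_residuals(np.column_stack([hsgpa, sat]), fygpa, groups)
```

Because OLS residuals sum to zero, the mean residuals of equal-sized groups are mirror images: a negative mean for one group implies a positive mean for the other.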


6 The R2 of the regression equation that included high school GPA, SAT I and SAT II scores increased from 22.3 to 22.8 when considering parental income and education.

7 Performance in college was measured as percentile rank based on cumulative GPA of the entering cohort, rather than freshman grade point average, as a way to avoid school and major differences in grading philosophies and practices (pages 72 to 76). The book also looks at differences in economic outcomes (such as employment, wage and job satisfaction) and social outcomes (such as civic contribution, marital status and satisfaction with quality of life).

8 The student-level variables he included are individual race and gender. The demographic makeup of the school was described by the fraction of students who were Black, Hispanic and Asian; the fraction of students receiving subsidized lunches; and the average education of students' parents.
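Increments like the 22.3 to 22.8 change in R2 reported above come from comparing nested regression models. A minimal sketch of the computation; the simulated scores are illustrative only and are constructed so the added predictor largely duplicates one already in the model:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def incremental_r2(X_base, x_new, y):
    """Gain in R^2 from adding one predictor to a base model."""
    return r_squared(np.column_stack([X_base, x_new]), y) - r_squared(X_base, y)

# Simulated scores: "sat1" is mostly redundant with "sat2", so its
# incremental contribution over HSGPA + SAT II should be near zero.
rng = np.random.default_rng(1)
n = 500
hsgpa = rng.normal(0.0, 1.0, n)
sat2 = 0.7 * hsgpa + rng.normal(0.0, 1.0, n)
sat1 = 0.9 * sat2 + rng.normal(0.0, 0.4, n)
fygpa = 0.5 * hsgpa + 0.3 * sat2 + rng.normal(0.0, 1.0, n)
base = np.column_stack([hsgpa, sat2])
gain = incremental_r2(base, sat1, fygpa)
```

Adding a predictor can never lower the R2 of a nested OLS model, so the increment is bounded below by zero; a tiny gain, as here, is the pattern behind the 0.1% figure reported for the SAT I.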


Controversy mounted between Geiser & Studley (2002) and Zwick and her colleagues (Zwick et al., 2004; Zwick & Green, 2007) around the issue of whether the SAT I or the SAT IIs were more sensitive to sociodemographic characteristics. This argument fueled the discussion that prompted the modifications made to the SAT I in 2005, much in line with those suggested by the University of California (Atkinson & Pelfrey, 2004). The sensitivity of scores from different test types to sociodemographic characteristics has also been prominent in the discussion of whether general aptitude tests (like the SAT I) or curriculum-based tests (more like the SAT IIs) should be used for college admissions (Atkinson & Geiser, 2009). For more details about the controversy see Author (Year, pp. 107-109).

College Graduation

The ultimate goal of post-secondary education is college graduation, yet this goal remains elusive. According to Baum & Ma (2007), people with a college degree earned, on average, 62 percent more than individuals with only a high school diploma in 2005. According to the National Educational Longitudinal Study (NELS)9, 59% of those who started college earned bachelor's degrees by age 26 (Bowen, Chingos & McPherson, 2009). The National Center for Higher Education Management Systems (NCES, IPEDS, 2007) reports that only 77.4 percent of first-time, full-time students attending a four-year institution returned to that institution for their second year of college in 2005 (this information excludes students who transfer to another institution). Studies typically find that women are slightly more likely to graduate from college than men and that African Americans, Hispanics and Native Americans have a lower rate of graduation than White students (Astin, Tsui & Avalos, 1996; Bowen & Bok, 1998). In general, studies exploring the role of SAT scores and high school grades in college persistence and college graduation find a moderate relationship between these college outcomes

9 NELS surveyed students who were in eighth grade in 1988, most of whom graduated from high school in 1992.
and preadmission measures (Astin et al., 1996; Burton & Ramist, 2001; Mattern & Patterson, 2009, 2011a, 2011b). Although the traditional variables included in the multivariate regression models explain a small proportion of the variance, Author (Year) found high school grades to be the strongest predictor of college persistence, followed by the SAT II Writing scores. The importance of high school grades was corroborated by Zwick & Sklar (2005). Sociodemographic variables play a minor role in explaining college persistence and graduation (Author, Year); nevertheless, Bowen & Bok (1998) found these variables to be more important in the prediction of college outcomes for African American students than for White students. The lower correlation between college persistence and preadmission characteristics is to be expected, since persistence in college and ultimate graduation are more substantially influenced by nonacademic factors than is college GPA. Some of the variables that research has identified as playing an important role in determining persistence are finances, motivation, social adjustment, family and health problems, and the institution's selectivity and size (Reason, 2009; Bowen, Chingos & McPherson, 2009).10

Nonacademic Predictors of College Success

Recently a number of studies have looked into the importance of nonacademic variables to predict college success. These studies have called for the expansion of the definition of college success to include longer-term outcomes such as persistence and graduation, as well as less-researched outcomes such as leadership and civic participation, and have stressed the importance of nonacademic predictors (Camara & Kimmel, 2005; Robbins, Lauver, Le, Davis & Langley, 2004; Sternberg 1999, 2003; Kyllonen, 2008). Doing so makes it possible to predict college success more broadly and to avoid relying exclusively on cognitive criteria and predictors. This in

10 Wilson (1983) observes that the best predictors of college graduation are persistence to sophomore year and first-year GPA. This information is closest in time and in content to what is being predicted, and it is not available at admission.
light of universities' broader missions, which include social and personal outcomes for their students, and the reduced adverse impact that this consideration may have on the admission of traditional minority students (Oswald, Schmitt, Kim, Ramsay & Gillespie, 2004; Breland, Maxey, Gernard, Cumming & Trapani, 2001). Admissions decisions consider different dimensions of the applicant depending on the institutional mission and philosophy (Perfetto, 1999). Sinha, Oswald, Imus & Schmitt (2011) show that the adverse impact of admissions decisions can be reduced if colleges use a battery of cognitive and noncognitive predictors that are weighted according to the values institutional stakeholders place on an expanded performance criterion of student success.

Previous studies that looked into nonacademic measures of success (Bowen & Bok, 1998; Willingham, 1985) showed that the traditional academic predictors, such as test scores and high school records, have moderate to no relationship to nonacademic success. Sinha, Oswald, Imus & Schmitt (2011) confirmed the same type of results: SAT/ACT scores and high school GPA were more strongly correlated with college GPA than with noncognitive attributes.11

The Revised-SAT

Freedle proposed a methodology to correct the unfairness generated by the relationship he observed between item difficulty and differential item functioning in the SAT, known as the Freedle phenomenon: he observed that harder items showed DIF in favor of minority students while easier items showed DIF in favor of White students (Freedle, 2003).12 The


11 Allen, Robbins & Sawyer (2010), however, claim that noncognitive indicators and psychosocial factors can increase the marginal prediction of academic college outcomes beyond what is already explained by traditional predictors.

12 Differential item functioning (DIF) studies refer to how items function after differences in score distributions between groups have been statistically removed. The remaining differences indicate that the items function differently for the two groups. Typically, the groups examined are derived from classifications such as gender, race, ethnicity, or socioeconomic status. The performance of the group of interest (focal group) on a given test item is compared to that of a reference or comparison group. White examinees are often used as the reference group, while minority students are often the focal groups (Holland & Wainer, 1993).


proposed methodology focused on how students perform on the hard half of the SAT test and is called the Revised-SAT or R-SAT (Freedle, 2003). According to Freedle, the R-SAT would increase the SAT Verbal scores by as much as 200 to 300 points for individual minority test-takers, would reduce the mean score differences between White and minority test-takers by a third, and would produce a score that is a better indicator of the academic abilities of minority students.
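The Standardization method behind the Freedle phenomenon compares item performance for focal and reference examinees matched on total score. A minimal sketch of the core statistic (STD P-DIF), written as an illustration of the idea rather than a reproduction of Dorans & Kulick's operational implementation; the response data in the example are invented:

```python
import numpy as np

def std_p_dif(focal_item, ref_item, focal_total, ref_total):
    """Standardization DIF: difference in proportion correct on one item
    between focal and reference groups, computed within matched
    total-score levels and weighted by the focal group's score
    distribution. Positive values favor the focal group."""
    stat = 0.0
    n_focal = len(focal_total)
    for s in np.unique(focal_total):
        f = focal_total == s
        r = ref_total == s
        if not r.any():
            continue                      # no reference examinees at this level
        weight = f.sum() / n_focal
        stat += weight * (focal_item[f].mean() - ref_item[r].mean())
    return stat

# Invented example: at each matched score level the focal group answers
# the item correctly more often than the reference group.
focal_item = np.array([1, 0, 1, 1])
ref_item = np.array([0, 0, 1, 0])
focal_total = np.array([1, 1, 2, 2])
ref_total = np.array([1, 1, 2, 2])
dif = std_p_dif(focal_item, ref_item, focal_total, ref_total)
```

Matching on total score first is what distinguishes a DIF statistic from a simple difference in group means: only the within-level differences count.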

Freedle, citing the work of Diaz-Guerrero & Szalay (1991), interprets the difference between a student's R-SAT and his/her regular SAT score as a measure of the degree to which the examinee's cultural background diverges from White, middle-class culture. In his paper, Freedle recommends exploring the validity of the R-SAT index by comparing the correlation between the observed R-SAT index and college grades to that observed between the SAT score and college grades, and by looking at how many admissions decisions would change if we assume that SAT or R-SAT scores over 600 indicate students qualified for college.13 Freedle was strongly criticized by the College Board (Camara & Sathy, 2004), Dorans (2004) and Dorans and Zeller (2004a, 2004b). Some of the criticisms concerned the method used to study differential item functioning (the standardization approach) and the way Freedle implemented it. Those criticisms were addressed by Author (Year), whose results partially replicated Freedle's findings when the standardization approach was correctly implemented. However, the relationship between item difficulty and DIF was present only in the SAT Verbal test and only for African American students (Author, Year). When considering IRT methods to


13 Freedle recognizes that predictive validity analyses will necessarily be limited because many people who did not attend selective colleges might have matriculated at such schools if their R-SAT scores had been used in the admission process, but he nevertheless considers it relevant to examine the implications of using the measure he proposed.


study DIF and to model guessing, Freedle's findings were also observed for Hispanic students (Author, Year). Dorans (2004) and Dorans and Zeller (2004a) also criticized the methods Freedle used for calculating the necessary components of the R-SAT: the use of proportion correct rather than formula score, his consideration of different (ethnic) samples for the half-test, and his application of inverse regression. Furthermore, Dorans & Zeller (2004b) explored the fairness of Freedle's R-SAT using Score Equity Assessment (SEA), a new methodology presented as a complement to the existing procedures for fairness assessment, namely DIF analysis and differential prediction. Using SEA, Dorans and Zeller (2004b) found that the half-test to total-test linking may be population-dependent and that therefore the scores produced on the hard-half test cannot be used interchangeably with scores produced on the full-length SAT Verbal test. For a more comprehensive review of the criticisms Dorans (2004) and Dorans & Zeller (2004b) posed, see Author (Year, pp. 113-114).
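Two of the R-SAT components at issue in these criticisms, formula scoring and inverse regression, can be sketched generically as follows. This is an illustration of the general techniques, not a reconstruction of Freedle's or the present study's exact computation; all scores below are invented:

```python
import numpy as np

def formula_score(responses, n_options=5):
    """Formula score: rights minus wrongs/(k-1), where k is the number of
    answer options; omitted items (NaN) neither add nor subtract.
    responses: 1 = correct, 0 = wrong, NaN = omitted."""
    rights = (responses == 1).sum(axis=1)
    wrongs = (responses == 0).sum(axis=1)
    return rights - wrongs / (n_options - 1)

def inverse_regression_map(half_scores, total_scores):
    """Map hard-half scores onto the full-test scale by regressing the
    half-test score on the total score and inverting the fitted line."""
    slope, intercept = np.polyfit(total_scores, half_scores, 1)
    return lambda h: (h - intercept) / slope

# Invented responses for two examinees on a four-item hard half
# (five answer options, so each wrong answer costs 1/4 point).
resp = np.array([[1, 1, 0, np.nan],
                 [1, 0, 0, 0]])
hard_half = formula_score(resp)

# Invented calibration data relating hard-half scores to total scores.
to_full_scale = inverse_regression_map(
    np.array([10.0, 20.0, 30.0, 40.0]),     # hard-half scores
    np.array([400.0, 500.0, 600.0, 700.0])) # corresponding total scores
```

The population-dependence concern raised by Dorans & Zeller applies to the fitted line itself: if slope and intercept differ across ethnic groups, the inverted mapping assigns different full-scale scores to the same hard-half performance.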

Research Questions

The current paper provides evidence regarding the predictive validity of the R-SAT and aims to explore the validity of Freedle's measure using multivariate regression models. These models allow the exploration of the predictive validity of the R-SAT while controlling for the effect of other relevant measures influencing the academic outcomes achieved by students in college. The investigation starts by calculating the revised SAT score using Freedle's methodology while considering the methodological criticisms made by Dorans & Zeller of the way Freedle calculated the necessary components of the R-SAT (Dorans, 2004; Dorans & Zeller, 2004a). Once the R-SAT is calculated, we examine how beneficial it was for minority students

and how it fared in terms of predictive validity of college outcomes in comparison to the original SAT score. The predictive power of the R-SAT was also compared to the predictive capacity of alternative Item Response Theory (IRT) ability estimates. IRT methods consider students' ability to be a latent variable to be inferred from the data, and due to the invariance property these estimates are not dependent on the set of test items under analysis.14 The model used in this research, the Rasch model (Wu, Adams & Wilson, 1998), provides examinee ability estimates that are a direct transformation of the sum of correct responses and allowed us to include a parameter to consider DIF in the estimation of examinees' ability. The predictive power of the R-SAT score and the original SAT score will be compared to the predictive capacity of ability estimates from the Rasch model and the Rasch DIF model (Paek, 2002).15
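A Rasch ability estimate of the kind compared here can be sketched as a simple maximum-likelihood routine. The DIF handling below (a per-item difficulty shift applied to focal-group examinees) is only a schematic of the idea behind the Rasch DIF model, not Paek's estimation procedure, and the difficulties are invented:

```python
import math

def rasch_theta(responses, difficulties, dif_shift=None, focal=False,
                n_iter=50):
    """Newton-Raphson ML ability estimate under the Rasch model.
    If dif_shift is given and the examinee is in the focal group,
    item i has effective difficulty difficulties[i] + dif_shift[i].
    Assumes a mixed response pattern (not all right or all wrong)."""
    theta = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, x in enumerate(responses):
            b = difficulties[i]
            if dif_shift is not None and focal:
                b += dif_shift[i]                      # group-specific difficulty
            p = 1.0 / (1.0 + math.exp(-(theta - b)))   # P(correct | theta)
            grad += x - p                              # score function
            info += p * (1.0 - p)                      # Fisher information
        theta += grad / info                           # Newton step
    return theta
```

Because the Rasch likelihood in theta is strictly concave for mixed response patterns, the iterations converge; a focal-group examinee facing uniformly harder effective difficulties receives a correspondingly higher ability estimate for the same responses, which is how a DIF-aware model can revise the standing of the affected group.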

Data Sources

Since the analyses require information about the students' SAT test scores and college experience, the information was drawn from two primary sources: the University of California Corporate Data System and the College Board.

The College Board datafiles contained item-level performance and students' individual scores, as well as students' responses to the Student Data Questionnaire (forty-three questions), including self-reported demographic and academic information such as parents' education, family income, and high school grade point average.

The University of California Corporate Student Information System provides systemwide admissions and performance data. Through their applications to UC, students provide academic

14 Note that the invariance property holds only when the models used hold. This is tested using fit statistics.

15 For more details about this model and its application to the Freedle phenomenon see Author (Year).
and demographic information that is subsequently verified and standardized. For those students who enroll at UC, this information is complemented with their academic history, including college grades, number of courses and number of units completed, persistence, and graduation. Information about parental education level and family income is also provided. Information from the College Board and the UC system was complemented with an indicator of school performance on a state standardized test (the Academic Performance Index) from the California Department of Education.

This study was conducted using the subset of examinees from the College Board file who were juniors, came from California public high schools, took SAT forms DX and QI in 1994 or SAT forms IZ and VD in 1999, spoke English as their best language, and applied and enrolled at the University of California. Only UC-eligible students are admitted to the University of California. Although at the time there were several routes to become UC eligible, most students became eligible through the statewide eligibility path. This path required students to complete a certain number of courses by subject area and to achieve a certain test score depending on their high school grades. In general, the UC eligibility criteria were set with the ultimate goal of identifying the top 12.5% of high school graduates who, according to the California Master Plan for Higher Education, should be considered for the University of California. As a result of the eligibility criteria and of enrollment decisions, the sample used has a higher mean SAT score, higher high school grade point average, and higher family income and parents' education than the College Board sample of all high school juniors from California public high schools who took SAT forms DX and QI in 1994 and SAT forms IZ and VD in 1999 (see Table 1).

The difference in academic and demographic characteristics does not change the phenomenon originally described by Freedle and studied by Author (Year). The relationship between item difficulty and DIF estimates is still observed among high- and low-ability students when using the Rasch model to study DIF (Author, Year).16

INSERT TABLE 1 HERE

Methods

This section presents the details of how the R-SAT score was calculated, how one IRT version of the original SAT score and two IRT versions of the R-SAT were estimated, and how the relative predictive validity of these scores and ability estimates was assessed. Since a previous study found stronger evidence of the relationship between DIF estimates and item difficulty in the Verbal test than in the Quantitative test (Author, Year), the analyses presented in this paper focus exclusively on the Verbal test.

Calculation of the Revised SAT Score and Estimation of IRT Ability Parameters

The R-SAT scores were calculated and the IRT ability estimates were obtained for the specific SAT forms and ethnic subgroups for which previous studies (Author, Year) showed evidence of a relationship between DIF and item difficulty estimates, as defined by the standardization method (Dorans & Kulick, 1983) and/or the Item Response Theory approach to DIF (Camilli & Shepard, 1994). Table 2 presents a summary of the results obtained when using these two methodologies across forms and ethnic groups. Thus, the R-SAT was calculated and ability estimates were obtained for African Americans in forms IZ, QI and DX and for Hispanics in forms IZ and VD.

16 The Freedle phenomenon was analyzed among high- and low-ability students, and ability was defined as having a high SAT score. The Freedle phenomenon was not analyzed among enrolled and non-enrolled students, as this categorization is not exclusively based on ability but is also determined by financial considerations and personal preferences. In addition, the sample size would have been extremely small for minority students. See Author (Year) for more details.

INSERT TABLE 2 HERE

The R-SAT was obtained by calculating the corresponding formula score17 on the hardest half of the test for all students who took each test form and then assigning African American/Hispanic students the total score obtained by White students who performed similarly on the hard half of that specific test form. Specifically, in order to obtain the revised score that African American/Hispanic students should have received, a linear regression was estimated only among the White students who took each form, predicting their SAT scores from the formula scores obtained on the hard half of the test. A constant and a slope coefficient were estimated, and those parameter estimates were subsequently applied to the formula scores obtained by African American/Hispanic students on the hard half of the test.18 Although the R-SAT was calculated incorporating Dorans and Zeller's recommendations regarding the use of formula scores rather than the original proportion-correct scores (Dorans, 2004; Dorans & Zeller, 2004a), the methodology employed to obtain the R-SAT is still subject to criticism for its use of inverse regression and for combining results from different ethnic groups (Dorans & Zeller, 2004a, 2004b).
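The procedure just described can be sketched as follows. This is a minimal illustration, assuming five-option items for the guessing adjustment and White examinees as the reference group; variable and function names are hypothetical, not from the study.

```python
import numpy as np

# Minimal sketch of the R-SAT computation described above (names are
# ours). For five-option items, the formula score is R - W/4: number
# right minus a quarter point per wrong answer, which adjusts for
# random guessing (omitted items are not penalized).

def formula_score(n_right, n_wrong):
    return n_right - n_wrong / 4.0

def rsat(hard_fs_white, sat_white, hard_fs_minority):
    # Step 1: among White (reference-group) examinees, regress the
    # total SAT Verbal score on the hard-half formula score.
    slope, intercept = np.polyfit(hard_fs_white, sat_white, deg=1)
    # Step 2: apply the fitted constant and slope to minority
    # examinees' hard-half formula scores (the inverse-regression
    # step criticized by Dorans and Zeller).
    return intercept + slope * np.asarray(hard_fs_minority, dtype=float)
```

The regression is fit within one group and applied to another, which is exactly why the inverse-regression criticism arises: the reference-group relationship between hard-half and total scores is assumed to transfer.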

Hence, in addition to the R-SAT, ability estimates using IRT methodology were also obtained. Initially the Rasch and Rasch DIF models (Adams, Wilson & Wang, 1997; Moore, 1996) were estimated in each form and ethnic group for which there was evidence of the Freedle
17 Formula scoring adjusts scores for the possibility of random guessing (Frary, 1988; Rogers, 1999). This methodology, originally used by Freedle (2003), allowed expressing the number of correct responses (adjusted for random guessing) on a scale ranging from 200 to 800, just like the regular SAT Verbal score.

18 The scores of White students are used as the reference because they have been considered the reference group in all DIF analyses.

phenomenon, using all students from California public high schools who took the form (see Table 3). These models were estimated using ConQuest (Wu, Adams & Wilson, 1998), and the Rasch Model Ability Estimate and the Rasch DIF Model Ability Estimate were obtained, respectively. In addition, an IRT version of Freedle's revised SAT was estimated by considering only the hardest half of the items in each test form (the Hard Half Ability Estimate, using the Rasch model).19,20 In total, three IRT ability estimates were obtained for each African American or Hispanic student.

While the ability estimates obtained from the Rasch model are a direct (but non-linear) transformation of the sum of correct responses, they differ from the original SAT score in that the IRT ability estimates account for guessing through the use of formula scoring. The ability estimates obtained from the Rasch DIF model directly incorporate a parameter for DIF and therefore explicitly consider the phenomenon Freedle described in the ability estimation. The third IRT ability estimate, obtained from estimating the Rasch model on only the hard half of the test, attempts to adjust the ability estimate for the phenomenon described by Freedle following exactly the same logic behind the methodology he proposed, but using IRT methods instead. Since each of these models is directly estimated for a specific ethnic group comparison, the ability estimates generated are not subject to the concerns expressed by Dorans and Zeller (Dorans, 2004; Dorans & Zeller, 2004a, 2004b) regarding the use of inverse regression and the aggregation of estimates from different ethnic groups. Although IRT scaling tends to produce ability estimates that are linearly related to the underlying ability measured, they may be more useful than aggregated scores when examining the linear relationship between test scores and

external variables (e.g., outcome measures) because IRT ability estimates are less subject to the ceiling and/or floor effects observed in aggregated scores (Thissen & Orlando, 2001; Xu & Stone, 2011).

19 See Author (Year, Appendix 1) for the model fit statistics for the Rasch, Rasch DIF and Hard Half models.

20 The item difficulty estimates from the original Rasch DIF model were used to define the hardest half of the items.

Predictive Validity Analyses

The predictive power of the regular SAT Verbal score, the R-SAT score and the three IRT ability estimates was compared for African American, Hispanic and White students. Linear regression was used for GPA prediction and logistic regression was used for the prediction of graduation, because UC GPA is a continuous numerical variable and graduation is a dichotomous outcome variable.21 The ordinary least squares method was used for estimating the linear regressions and the maximum likelihood technique was implemented for the estimation of the logistic regressions. The college outcomes examined were the first- through fourth-year annual UC GPA, cumulative fourth-year UC GPA and whether students graduated by their fourth year at UC. The academic outcomes included in this study are of particular interest because they are not limited to grade point averages and they span four years of the college career of students taking the SAT in 1994 and 1999. Most research in this area has been limited to examining the predictive validity of standardized test scores and high school grades for short-term academic outcomes, especially grades.

21 Although Bridgeman, Pollack and Burton (2004) find evidence suggesting a potential non-linear relationship between college grades and test scores, Rothstein (2004) does not find evidence along this line. Exploratory analyses conducted in this research sample did not provide evidence to support a non-linear relationship between first-year college grades and SAT scores.

The analyses controlled for academic and sociodemographic variables found to be significant in previous college prediction research (Geiser & Studley, 2002; Author, Year; Rothstein, 2004; Zwick et al., 2004). The sociodemographic variables included parents' education and income level from the UC systemwide admissions and performance data. The academic variables included a weighted high school GPA, calculated with up to eight honors-level courses, the SAT Math score,22 and the school Academic Performance Index expressed as quintile ranks for students who took the SAT in 1999. The school Academic Performance Index information was not available for the students who took the SAT in 1994 because the index was calculated for the first time in 1998.23 Equations 1, 2 and 3 show the general regression models for the prediction of annual UC GPA, cumulative fourth-year UC GPA and fourth-year UC graduation, respectively.

UCGPAi = β1 + β2 APIQ + β3 Educ + β4 Inc + β5 HSGPA + β6 SATM + β7 Zi    (1)

CUMUCGPA4 = β1 + β2 APIQ + β3 Educ + β4 Inc + β5 HSGPA + β6 SATM + β7 Zi    (2)

LOGIT(GRAD4) = β1 + β2 APIQ + β3 Educ + β4 Inc + β5 HSGPA + β6 SATM + β7 Zi    (3)

where

UCGPAi is the grade point average that a student had in year i of college, where i ranges between 1 and 4;

CUMUCGPA4 refers to the cumulative grade point average at the fourth college year;

GRAD4 is a binary variable indicating graduation by the fourth year of college, where 1 indicates a student who has graduated and 0 indicates a student who has not graduated;

APIQ refers to the ranking of the school in the California Academic Performance Index;

Educ is the maximum number of years of education achieved by the parents, as reported in the UC application;

Inc refers to the family income (expressed in dollars), as reported in the UC application;

22 Different ability estimates/scores for the Verbal section were also included; the explanation is given below.

23 Regression models excluding API rank as explanatory variables are included in Author (Year, Appendix 5).


HSGPA is the weighted high school GPA considering up to eight honors-level courses;

SATM is the original score obtained on the SAT Math test; and

Zi refers to different indices of verbal ability.

For each of the three regression models there were five versions, which differed in the verbal ability index included. In the first version of each model (models 1.1, 2.1 and 3.1 in the tables) the verbal ability index is the SAT Verbal score. The second version of each model uses the original SAT score for White students and the higher of the revised SAT Verbal score and the original SAT score for minority students (models 1.2, 2.2 and 3.2 in the tables). The third and fourth versions of the models include the Verbal ability estimates from the Rasch model (models 1.3, 2.3 and 3.3 in the tables) and the Rasch DIF model (models 1.4, 2.4 and 3.4 in the tables), respectively. Lastly, the fifth version of the models considers the Verbal ability estimate obtained from estimating the Rasch model using only the hardest half of the Verbal items (models 1.5, 2.5 and 3.5 in the tables).

The models presented in the text include only SAT I Verbal and SAT I Math scores as explanatory variables, and not SAT II scores, because most higher education institutions require only the SAT I (or ACT) exam and results from these models will therefore be more generalizable to other institutions. Regressions including SAT II test scores as explanatory variables are included in Author (Year, Appendix 4) and do not offer stronger evidence in support of the R-SAT Verbal test score.
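As an illustration of the maximum likelihood estimation used for the graduation model in equation (3), the following sketch fits a logistic regression by Newton-Raphson (iteratively reweighted least squares) in plain NumPy. It is a stand-in for the statistical software actually used; the column layout is assumed for illustration, not taken from the study.

```python
import numpy as np

# Sketch of maximum-likelihood estimation for the logistic model in
# equation (3). X holds the predictor columns (e.g., APIQ, Educ, Inc,
# HSGPA, SATM and a verbal index Z); y is the 0/1 graduation flag.

def fit_logit(X, y, n_iter=25):
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))   # fitted probabilities
        grad = X1.T @ (y - p)                  # score vector
        W = p * (1.0 - p)                      # IRLS weights
        hess = X1.T @ (X1 * W[:, None])        # information matrix
        beta += np.linalg.solve(hess, grad)    # Newton step
    return beta
```

The returned coefficients are on the logit scale, matching the LOGIT link in equation (3).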

The analyses could not control for the effect of discipline or campus on the dependent variable due to the small sample sizes of minority groups (Brown & Zwick, 2006). Sample size also limited our ability to properly model the within- and between-school variation in high school GPA and API quintile (Zwick & Green, 2007). In addition, it is important to note that, as in most predictive validity studies, conclusions from this research are


necessarily limited because many people who did not attend selective colleges might have matriculated at such schools if their R-SAT Verbal scores had been used in the admission process.

The analyses compared the explained variance as well as the size and statistical significance of the standardized coefficients across models. The explained variance was measured by the adjusted R² statistic (Singer & Willett, 2003), an alternative to the R² which considers the number of variables included in the model. The adjusted R² statistic is presented below:

Adj R² = 1 - [(n - 1)/(n - p)](1 - R²)

where

n is the sample size, and
p refers to the number of parameters in the model.

In logistic regression there is no precise counterpart to the R² or adjusted R² used in linear regression. Several measures of goodness of fit have been proposed, and Nagelkerke's maximum-rescaled R², or R̃², is used here. The statistic, given below, can achieve a maximum value of 1:

R̃² = R² / R²max

where

R² = 1 - {L(0)/L(β̂)}^(2/n);

R² achieves a maximum of less than 1 for discrete models, where the maximum is given by R²max = 1 - {L(0)}^(2/n);
L(0) is the likelihood of the intercept-only model;
L(β̂) is the likelihood of the specified model; and
n is the sample size.

Standardized regression coefficients, or beta weights, show the relative strength of different predictor variables within a regression equation; the weights represent the number of standard deviations that an outcome variable changes for each one standard deviation change in any given predictor variable, all other variables held constant. A standardized regression coefficient is computed by dividing a parameter estimate by the ratio of the sample standard deviation of the dependent variable to the sample standard deviation of the regressor.
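The three quantities just defined can be computed directly from a fitted model's summary statistics. The sketch below is illustrative only; the function names are ours, and the log-likelihoods are assumed to come from whatever fitting routine is used.

```python
import numpy as np

# Sketch of the fit statistics described above (function names are ours).

def adjusted_r2(r2, n, p):
    # Adj R^2 = 1 - [(n - 1)/(n - p)](1 - R^2)
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)

def nagelkerke_r2(loglik_null, loglik_model, n):
    # Cox-Snell R^2 = 1 - {L(0)/L(beta-hat)}^(2/n), computed from
    # log-likelihoods, then rescaled by its maximum attainable value
    # R^2_max = 1 - {L(0)}^(2/n).
    cox_snell = 1.0 - np.exp(2.0 / n * (loglik_null - loglik_model))
    r2_max = 1.0 - np.exp(2.0 / n * loglik_null)
    return cox_snell / r2_max

def standardized_coef(b, sd_x, sd_y):
    # Dividing b by (sd_y / sd_x) expresses the effect in standard
    # deviation units of both variables.
    return b / (sd_y / sd_x)
```

A saturated logistic model (loglik_model = 0) yields a Nagelkerke R² of exactly 1, which is the rescaling's purpose.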

Results

This section presents the results of this research in three parts. The first two parts refer to the calculation of the R-SAT and its predictive validity compared to the SAT, including its performance on the issue of over- or underprediction. The third part presents the predictive validity findings related to the IRT ability estimates.

Freedle's Revised SAT Verbal Score

Table 3 shows the number of students from California public high schools who originally took each test form and for whom the adjusted scores were calculated. The adjusted scores were calculated for a total of 3,922 Hispanic examinees and 2,234 African American examinees.

INSERT TABLE 3 HERE

The R-SAT Verbal score mean is higher than the original mean SAT Verbal score in all ethnic groups and test forms (see Author (Year, Appendix 2) for details). On average, the R-SAT


Verbal score increases the mean performance of African American students from 382.5 to 407 (6.4%) and the mean performance of Hispanic students from 471.6 to 484.0 (2.6%). Table 4 displays in greater detail whether and how the R-SAT Verbal score benefits minority students. Note that the bottom three rows represent the students who benefit from the use of the R-SAT Verbal score. We observe that 68% (1,537 of 2,234) of African American examinees improve their scores when the R-SAT Verbal score is considered in place of the SAT Verbal score. The same occurs for 58% (2,271 of 3,922) of the Hispanic sample. In addition, the R-SAT Verbal score tends to benefit students at the low end of the original SAT Verbal score distribution. While most examinees increase their scores by between 0 and 50 points, the increment reaches as high as 202 points in a number of cases. On average, however, the score increase is not as large as Freedle described it to be.

INSERT TABLE 4 HERE

In order to assess the impact of the revised SAT score on the admissions decisions of minority students, Freedle estimated and compared the number of African American students who would be offered admission at competitive colleges under each score. Freedle hypothesized that receiving an R-SAT score of at least 600 would be sufficiently meritorious to interest many colleges in an applicant who received such a score.24 He found that by considering the revised SAT score instead of the original SAT score, the number of African Americans scoring over 600 in two of the forms he analyzed increased from 166 to 235 (Form 4I) and from 117 to 167 (Form OB023), which was equivalent to increases in admission to selective colleges of 342 percent and 334 percent, respectively.

The analyses reported here show an effect in the same direction as Freedle described; however, the impact on the number of African American students whose admissions are likely to have changed is more modest. When using the maximum of the SAT and R-SAT Verbal scores, the number of African American students scoring over 600 increases from 79 to 86. This represents an increase of 8.9% over the original number of African American students in the sample scoring over 600 (see Table 5), or an increase from 3.5% to 3.8% of all African Americans. When considering both African American and Hispanic students, the number of students scoring over 600 increases from 458 (7.4% of all minority students) to 516 (8.3% of all minority students), which is equivalent to an increase of 12.6%. Overall, 7.4% of minority examinees score over 600. In comparison, 3,889 White students, or 19.7% of all White examinees, score 600 or above and received an average score of 653.

24 Freedle chose to consider an SAT score of 600 or above as meritorious because students whose high school grade point average is between the 97th and 100th percentiles receive an average SAT Verbal score of 610 and, in addition, a score of 600 also reflects a level of test performance that only about 5 percent of the test-taking population achieves using the normal SAT scoring procedures (Freedle, 2003).

The consideration of a different cut-off score would result in a significant benefit for minorities only if the cut-off were drastically reduced. More than 60% of the African American and Hispanic students considered in this analysis would receive an R-SAT Verbal score below 450; therefore, only a cut-off score around or below this level would result in a different admission decision. Such a drastic reduction in the score level, however, does not seem consistent with the assumption of admission to highly competitive colleges.

INSERT TABLE 5 HERE


R-SAT Predictive Validity The analyses presented in Table 5 regarding the impact of Freedles R-SAT in admissions decisions and subsequent analyses looking at the R-SATs predictive validity consider the maximum score between the SAT Verbal score and R-SAT Verbal score for minority students, and not just the revised SAT score. This is done in consideration of Freedles own recommendations: the solution is to recognize that this is a pervasive phenomena that can be easily remedied by reporting two scores, the usual SAT and the R-SAT. (Freedle, 2003) Since Freedle recommends reporting both scores and interprets the difference between them as the difference between the White majoritys culture and the cultural background of minority groups, then the consideration of the maximum of the two scores represents the less disadvantageous scenario in which minority groups might compete for admission into selective colleges.

Predictive Validity of the Revised SAT Verbal Score

This section presents the results on the predictive capacity of the revised SAT Verbal score. Its capacity to predict short- and long-term academic outcomes is compared to that of the original SAT Verbal score by ethnic group and academic outcome. It is important to keep in mind that although the results are presented side by side for three ethnic groups, the main focus of this investigation was to compare the goodness-of-fit statistics and parameter estimates within ethnic groups, especially within minority groups. The results for the White student sample are presented as a comparison with minority students' results.

In order to increase the sample size, the R-SAT Verbal scores for all SAT forms were combined. This aggregation was possible because the performance in each form was previously scaled by ETS.25 The aggregation conducted also assumes that the four SAT forms were equated during test development.26 The inclusion of the school ranking in the model, however, meant that only students taking the 1999 forms (IZ and VD) were included in the analysis.27

Table 6 shows the adjusted R² for the multivariate models estimated within each ethnic group. The overall predictive power of the models examined varies depending on the academic outcome and ethnic group. In general, the models predict college grades better for White students than for minority students. While the capacity to predict annual college grades for all groups tends to decline over time, the overall prediction of cumulative fourth-year grade point average is unexpectedly high for White and Hispanic students. In addition, and only for White students, the prediction of fourth-year graduation is significantly weaker than the prediction of college grades. Interestingly, this is not the case for African American and Hispanic students. The models' capacity to predict long-term outcomes, such as fourth-year cumulative grade point average and four-year graduation, is surprising considering that these indices are measured four years into the students' college careers. Long-term outcomes are often assumed to be affected by variables different from those included here, such as financial aid and previous experience in college (Wilson, 1983; Reason, 2009).

25 Scaling refers to a psychometric process conducted to achieve comparability among test scores from different test forms.

26 Equating is a process different from scaling and aims to adjust for differences in difficulty among test forms. For an introduction to traditional scaling and equating methods see Kolen (1988).

27 The maximum score between the original SAT and the R-SAT Verbal score was used for minority students. Models using just the R-SAT Verbal score and excluding school ranking as explanatory variables are presented in Author (Year, Appendix 5) and result in findings similar to the ones displayed in this section. They do not provide stronger evidence in favor of the R-SAT score.

INSERT TABLE 6 HERE

In general, the adjusted R² values for Hispanic and White students are consistent with the results reported by similar studies (Author, Year; Author, Year; Geiser & Studley, 2002; Zwick et al., 2004). The power to predict college GPA for African American students, though, is below what has been reported by other studies and below the power to predict college GPA for the other two ethnic groups; we believe this is in part an artifact of the small sample size. Geiser & Studley (2002), for example, reported R² values closer to 10% for African American students (p. 15). When predicting graduation, however, the models predict better for African American students than for White and Hispanic students.

Table 6 shows that the capacity to predict college outcomes using the R-SAT Verbal score is close to, but slightly less than, the predictive capacity achieved when using the original SAT score. The R-SAT Verbal score predicts better than the original SAT score in only two cases, and just for the African American group: fourth-year college grade point average and fourth-year cumulative grade point average. The difference in predictive power, though, does not seem of large practical significance. It ranges between 0 and 1 percentage point, and the maximum increase in predictive capacity is only 0.59%.

The relatively weaker capacity to predict college outcomes associated with the use of the R-SAT can also be observed in Tables 1, 2 and 3 in Author (Year, Appendix 3), which show the standardized coefficient estimates and their statistical significance (p-values) when predicting first-year UC GPA, cumulative fourth-year UC GPA and fourth-year graduation by ethnic group. They also present the adjusted R² for each regression and its sample size. In Author (Year,


Appendix 3) we also discuss the results associated with the other explanatory variables included in the regression models, which are similar to the findings from previous literature.

Over- and Underprediction of Freshman Grades

Freedle suggested that the revised SAT score would help reduce the problem of over- and underprediction reported in the literature on the predictive validity of college admissions tests (Zwick et al., 2004; Zwick et al., 2002; Ramist et al., 1994; Ramist et al., 2001). This section assesses the potential improvement in over- and underprediction obtained from using the revised SAT score rather than the original SAT score. Under- or overprediction is usually assessed by fitting one general prediction model for college students from all ethnic groups and then summing the regression residuals for a particular ethnic group. To gauge the average individual over- or underprediction, the sum of residuals is then divided by the number of students in each ethnic group. In this case, regression models 1.1 and 1.2 were estimated and the average residuals by ethnic group compared. All explanatory variables included in these models were described in the previous section.
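As a concrete illustration of this residual-based computation, the sketch below fits one pooled prediction model and averages the residuals within each group. The data, predictor values, and group labels are simulated stand-ins for illustration only; they are not the UC sample analyzed in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: variable names mirror the paper's models,
# but the values are simulated.
n = 300
group = rng.choice(["White", "Hispanic", "African American"], size=n)
hsgpa = rng.normal(3.4, 0.4, size=n)
satv = rng.normal(550, 90, size=n)
gpa1 = 0.5 + 0.6 * hsgpa + 0.001 * satv + rng.normal(0, 0.3, size=n)

# One general OLS model fit to students from all ethnic groups (intercept included).
X = np.column_stack([np.ones(n), hsgpa, satv])
beta, *_ = np.linalg.lstsq(X, gpa1, rcond=None)
resid = gpa1 - X @ beta

# Average residual by group: positive = underprediction, negative = overprediction.
avg_resid = {g: resid[group == g].mean() for g in np.unique(group)}
for g, r in avg_resid.items():
    print(f"{g}: {r:+.3f}")
```

Because the intercept is included, the residuals sum to zero overall, so any group with a negative average residual (overprediction) is balanced by other groups with positive averages.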

1stYRGPA_i = β1 + β2 APIQ_i + β3 Educ_i + β4 Inc_i + β5 HSGPA_i + β6 SATM_i + β7 SATV_i + ε_i    (1.1)

1stYRGPA_i = β1 + β2 APIQ_i + β3 Educ_i + β4 Inc_i + β5 HSGPA_i + β6 SATM_i + β7 Max(SATV, R-SATV)_i + ε_i    (1.2)

Table 7 shows the regression output for regression models 1.1 and 1.2 for all first-year UC students. The results are similar to those presented in Table 6 for White students. This is not


surprising given that White students are the most numerous ethnic group included in the sample.28 We find underprediction of White students' grades (0.01) and overprediction of Hispanic (-0.025) and African American students' grades (-0.098) when using the SAT, just as previous research did (Ramist et al., 1994, 2001). On average, the overprediction is smaller than that reported by Ramist et al. (1994) for African American students (-0.16) and larger than that reported by Geiser & Studley (2002) and by Zwick et al. (2004) for African American students, except for the 1998-1999 UCLA mega-cohort for the African American group.29 For Hispanic students, the overprediction is smaller than that reported by Ramist et al. (2001) (-0.13) and similar to some of the results reported by Zwick et al. (2004) (see, for example, the Berkeley 1996-1997 mega-cohort, the Irvine 1998-1999 mega-cohort, and the San Diego 1996-1997 mega-cohort). We found no improvement in prediction accuracy from using the R-SAT Verbal score for minority groups. On the contrary, the prediction errors for minorities increased when using the maximum of the SAT and R-SAT Verbal scores, to 0.114 for African American students and 0.032 for Hispanic students, respectively.30


28 Although they also somewhat resemble the results obtained for the Hispanic subsample, the standardized coefficients associated with parents' education and income, as well as the overall R², are closer to those observed for the White students. See Author (Year, Appendix 3) for details.

29 We focused our attention on Zwick et al.'s model 6, which is the most similar to the analyses reported in this section.

30 The same analysis was conducted for fourth-year cumulative UC GPA, and the average underprediction for African American and Hispanic students increased as well (from 0.181 to 0.194 and from 0.033 to 0.040, respectively).


Predictive Validity of IRT Ability Estimates

This section presents the results regarding the predictive power of the IRT ability estimates and compares them to the predictive capacity of the R-SAT and original SAT Verbal scores. The IRT ability estimates include: (i) estimates obtained from fitting the Rasch model to all the test items, (ii) estimates obtained from fitting the Rasch DIF model to all the test items, and (iii) estimates obtained from fitting the Rasch model to only the hardest half of the items. These three ability estimates were obtained for all White, African American, and Hispanic students.31 The analyses were conducted separately for each combination of ethnic group, academic outcome, and test form in which the Freedle phenomenon was observed, which translated into reduced sample sizes. Table 2 of this paper shows the ethnic groups and forms in which the relationship between item difficulty and DIF estimates was observed. Test forms and ethnic groups could not be aggregated as in the R-SAT predictive validity analysis because the ConQuest estimation, especially that of the Rasch DIF model, generates one student ability estimate per ethnic comparison. In addition, ability estimates from different Rasch models, student samples, and test forms cannot be directly aggregated because they are on different scales. Even if we assumed that test forms were equated during test development, information about the difficulty parameters of the items used in equating is not available, preventing the use of a common scale for all ability estimates.
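To make the distinction among the three estimates concrete, the sketch below uses the Rasch model (log-odds of a correct response = θ − b), a DIF variant that shifts item difficulties for the focal group, and a refit on only the hardest half of the items. The item difficulties, DIF shifts, response pattern, and the simple Newton-Raphson estimator are illustrative assumptions; this is not the ConQuest estimation used in the study.

```python
import numpy as np

def rasch_p(theta, b):
    """P(correct) under the Rasch model: logistic in theta - b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def estimate_theta(responses, b, n_iter=50):
    """ML ability estimate for one examinee via Newton-Raphson."""
    theta = 0.0
    for _ in range(n_iter):
        p = rasch_p(theta, b)
        grad = np.sum(responses - p)   # score function of the log-likelihood
        hess = -np.sum(p * (1 - p))    # second derivative (always negative)
        theta -= grad / hess
    return theta

b = np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 2.0])    # hypothetical item difficulties
dif = np.array([0.0, 0.0, 0.0, 0.0, -0.3, -0.4])  # hypothetical group shift (DIF) on hard items
resp = np.array([1, 1, 1, 0, 1, 0])               # one examinee's responses

theta_all = estimate_theta(resp, b)               # (i) Rasch model, all items
theta_dif = estimate_theta(resp, b + dif)         # (ii) group-specific (DIF-adjusted) difficulties
hard = b >= np.median(b)                          # hardest half of the items
theta_hard = estimate_theta(resp[hard], b[hard])  # (iii) hardest-half estimate
```

Note that the three estimates are on different scales whenever the item sets or difficulty parameters differ, which is exactly why they cannot be aggregated across forms or comparisons without common equating items.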

Two of the five output tables for form 1999 IZ are presented here (see Tables 8 and 9). Both tables display summary statistics of the analyses conducted using one of the most recent forms analyzed (1999 IZ): (i) R² information for each of the models explaining a total of six dependent variables for the African American/White comparison, and (ii) R² information for each of the models explaining a total of six dependent variables for the Hispanic/White comparison. Form 1999 IZ has the largest sample size. The two tables presented here are representative of the results obtained for the other test forms and ethnic groups (Hispanic students taking Form 1999 VD, African American students taking Forms 1994 QI and 1994 DX). The remaining output tables are included in Author (Year, pp. 161-164). Although there are differences in the overall predictive capacity by ethnic group, academic outcome, and test form, the overall predictive validity results lead to the same conclusions as the findings presented in Tables 8 and 9.

31 This differs from the R-SAT analysis presented in the previous sections, in which the new score was computed only for minority students.

The predictive power of the multivariate regression models fares best when predicting the college grades of White students, and the performance of the models decreases over time, with the exception of cumulative fourth-year GPA. College grades of minority students, especially African American students, are not predicted in any meaningful way. Surprisingly, the models under study predict fourth-year graduation better for African American and Hispanic students than for White students, a trend already noted in the previous section. Negative adjusted R² values indicate very low explained variance in spite of the inclusion of a large number of parameters in the regression models.
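For reference, the adjusted R² penalizes the number of estimated parameters, which is how it can fall below zero when a model with several predictors explains almost nothing in a small sample. A toy computation with hypothetical values (not the paper's models):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Small sample, many predictors, weak fit: the adjustment drives the
# statistic below zero, as in some minority-group models above.
print(adjusted_r2(0.08, 65, 7))

# A large sample keeps adjusted R-squared close to the raw R-squared.
print(adjusted_r2(0.25, 1100, 7))
```

The small minority-group samples in Tables 8 and 9 (roughly 60-280 students) sit in exactly the regime where this penalty is consequential.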

Although the overall predictive power varies significantly by form, ethnic group, and academic outcome, within the same ethnic group and academic outcome there is no practically significant difference among the predictive capacities achieved when using any of the three IRT ability estimates. In addition, there is no clear trend, as measured by the R², in the superiority of any of the IRT ability estimates, the original SAT score, or the revised SAT score.


The small sample sizes and related instability of the results allow us to present only a tentative conclusion about the small practical difference observed in the overall predictive power associated with the different IRT ability estimates and how they fare in comparison to the original SAT score. In addition, there is some evidence suggesting that the Rasch and Rasch DIF ability estimates fare better in predicting short-term academic outcomes for minorities, while the original SAT score better predicts long-term outcomes for the same groups.

Discussion

The research presented in this article aimed to examine the predictive validity of the R-SAT score, addressing the methodological criticisms of the way Freedle obtained the different components used to calculate the R-SAT score (Dorans, 2004; Dorans & Zeller, 2004a, 2004b). We did so by using formula score rather than proportion correct as the basis for calculating the R-SAT score and by directly estimating students' ability using the Rasch and the Rasch DIF models, both with all items and with only the hardest half of the items. This latter approach addressed the issues of inverse regression and of aggregating estimates from different ethnic groups.
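The formula score referred to here is the classic correction for guessing: rights minus a fraction of wrongs, with omitted items contributing nothing. A minimal sketch for five-option multiple-choice items (the example counts are hypothetical):

```python
def formula_score(num_right, num_wrong, num_options=5):
    """Formula score R - W/(k-1); omitted items contribute nothing."""
    return num_right - num_wrong / (num_options - 1)

# 40 right and 20 wrong on five-option items: 40 - 20/4 = 35
print(formula_score(40, 20))
```

Unlike proportion correct, this scoring removes the expected contribution of blind guessing, which is why it was used as the basis for the recalculated scores here.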

Analyses presented above show that, in this sample, the R-SAT score helps minority students, although not as much as Freedle expected. On average, it increases scores by 24 points, or 6%, for African American students and by 12 points, or 2.5%, for Hispanic students. Using Freedle's assumptions, consideration of the R-SAT would change the admissions decisions of minority students admitted to selective colleges by about 10%. This is much less than Freedle's predicted increase of approximately 300%.32 The small increases in R-SAT scores are consistent with the magnitude of score increase reported by Dorans (2004) and Dorans & Zeller (2004a).
32 Freedle identified an increase of 342% for Form 4I and an increase of 334% for Form OB023.


In addition, the predictive validity analyses show no significant difference in the capacity to predict short- and long-term outcomes when using either the original or the revised SAT score. Results also show that the traditional problem of over- and underprediction would remain the same when using the revised SAT score. Results from using the IRT ability measures are somewhat less straightforward but also support the conclusion that there is little practical difference in the overall predictive power associated with the different IRT ability estimates and in how they fare in comparison to the original SAT and R-SAT scores.

This research has several limitations. Among them is the fact that the predictive validity analyses were conducted on a group of students who were already accepted to college and therefore present significant restriction of range in some of the explanatory variables. In addition, many students who did not attend selective colleges might have matriculated at such schools if their R-SAT scores had been used in the admissions process; this limitation, however, is also present in other predictive validity studies (Geiser & Studley, 2002; Zwick, 2002; Zwick, Brown & Sklar, 2004; Zwick & Sklar, 2005). These considerations limit to some extent the validity of our findings. The use of inverse regression and the aggregation of different ethnic groups in order to obtain the R-SAT scores (but not the IRT ability estimates) are still subject to Dorans and Zeller's original criticisms. Recent changes to the content of the SAT and the inclusion of a mandatory Writing test may limit the generalizability of the findings presented here, since they were based on somewhat older test forms. Larger sample sizes for each minority group would be desirable for future research, especially for African American students; however, that would require combining data from a number of colleges and universities that exceeds the overall and minority sample sizes of the nine campuses of the University of California combined.


Furthermore, despite the limited sample sizes of African American and Hispanic students, we were still able to observe results similar to those reported by previous research, such as the statistical significance and practical importance of high school grades for predicting college grades and graduation. These results provide support for the validity of our findings for these particular samples. We think it is important to highlight the consistency of the results obtained in the numerous and diverse analyses implemented across African American and Hispanic students: no strong evidence in favor of the R-SAT score is observed when (a) recalculating the scores using only the most difficult items for minorities, (b) using that R-SAT score to directly predict short- and long-term outcomes with models that did and did not consider SAT II scores, (c) using models that did not control for school quality and allowed us to have larger sample sizes, (d) evaluating the over- and underprediction problem for minorities, and (e) using IRT ability estimates (considering all items, all items plus a DIF parameter, and only the hardest half of the items) to predict short- and long-term outcomes. The findings presented in this article consistently reveal that there are minimal benefits associated with Freedle's R-SAT and suggest that, rather than using measures aimed to complement the SAT, efforts and energy should be directed to studying the phenomenon behind the systematic relationship between item difficulty and DIF estimates (Author, Year) and directly addressing those issues during test development.
The investigation of potential causes should include studies that examine Freedle's proposed explanation, the influence of academic versus home language (Freedle, 2010), including investigation of the cognitive processes of students while taking the test as well as quantitative analyses and modeling techniques (De Boeck, 2010). In addition, further research should investigate the sensitivity of Freedle's phenomenon to alternative forms of guessing, such as differential guessing strategies between White students and students from other ethnic groups. These results also suggest that alternative policy options should be considered if the goal is to increase the representation of minority groups in higher education, especially at highly selective institutions (Bowen, Chingos & McPherson, 2009).33 Those options may include the use of school quality indices as input in the admissions process (Zwick & Himelfarb, 2011) and/or explicitly considering nonacademic outcomes as desirable college goals and adjusting the weight of admissions indicators accordingly (Sinha, Oswald, Imus & Schmitt, 2011).


33 Bowen et al. (2009) use the term undermatching for the phenomenon by which students enroll in institutions that are less demanding than those they are qualified to attend. The phenomenon is described as most pronounced among well-qualified low-income and minority students, who enroll at two-year institutions or less-selective four-year institutions. Since college completion varies sharply with school selectivity, even after controlling for student characteristics, the phenomenon of undermatching results in minority students graduating from less-demanding colleges at lower rates than similar students at highly selective institutions.

Table 1: Descriptive Statistics. Overall Sample Taking SAT Forms and Subsamples of Students Who Applied to and Enrolled at UC.

Sample               Variable        N        Mean     Std. Dev.
Overall Sample       SAT Composite   28,860     958       224
                     HSGPA           28,367    3.23      0.45
                     Income          25,678   56,853   30,239
                     Max Ed Level    28,489    6.40      2.18
UC Applicant Sample  SAT Composite   11,155   1,067       206
                     HSGPA           11,016    3.47      0.36
                     Income           9,866   62,550   30,779
                     Max Ed Level    11,027    6.89      2.16
UC Enrolled Sample   SAT Composite    4,804   1,098       195
                     HSGPA            4,754    3.55      0.32
                     Income           4,253   63,250   30,938
                     Max Ed Level     4,749    6.93      2.19

Source: College Board

Table 2: Presence of the Freedle Phenomenon According to the Standardization and Rasch Model Across Methods, Forms and Ethnic Groups. Verbal Tests.*

Group                    Method                    1999 IZ   1999 VD   1994 QI   1994 DX
White, African American  Standardization Approach  YES       NO        YES       NO
White, African American  Rasch Model               NO        YES       YES       NO
White, Hispanics         Standardization Approach  NO        YES       NO        NO
White, Hispanics         Rasch Model               YES       NO        YES       NO

* Presence of the Freedle phenomenon is defined as a statistically significant and high (above 0.3) correlation.

Table 3: Number of Students for Whom the Revised Score Was Calculated and IRT Ability Parameters Estimated.

Group                        1999 IZ   1999 VD   1994 QI   1994 DX    Total
White Examinees                6,548     6,682     3,360     3,188   19,778
Hispanic Examinees             1,904     2,018         -         -    3,922
African American Examinees       854         -       671       709    2,234

Table 4: Distribution of Score Differences by Ethnic Group and Corresponding Mean SAT Verbal Score. Overall Sample.

Difference between R-SAT Verbal     African American Examinees          Hispanic Examinees
and SAT Verbal scores               Number   Percentage   Mean SAT      Number   Percentage   Mean SAT
(both endpoints included)                                 Score                               Score
[-106, -101]                             0        0%           -             2        0%        515.0
[-100,  -51]                            39        2%         433.6          95        2%        506.2
[ -50,   -1]                           658       29%         438.7       1,554       40%        518.4
[   0,   49]                           966       43%         396.2       1,704       43%        468.9
[  50,   99]                           452       20%         301.6         418       11%        370.0
[ 100,  210]                           119        5%         251.7         149        4%        276.1
TOTAL                                2,234      100%         382.5       3,922      100%        471.6

Table 5: Number of Examinees Scoring 600 or Above in the Sample and Their Mean Scores.

                                  Number of Students   Mean SAT   Number of Students Scoring   Mean of Max.      Total Number
                                  Scoring Over 600     Verbal     Over 600 when Considering    between SAT V     of Examinees
Ethnic Group                      Considering SAT V    Score      Max. of SAT V and R-SAT V    and R-SAT V       in the Sample
African American Students                  79            637                  86                    643               2,234
African American and
Hispanic Students                         458            645                 516                    648               6,156
White Students                          3,889            653                   -                      -              19,778

Table 6: Overall Predictive Power of the Original SAT Verbal Scores and the Maximum between the SAT Verbal Scores and the Revised SAT Verbal Scores. Multivariable Regression Models.

UCGPA 1st Year
                      African American   Hispanic   White
SAT V                        2.15%        15.36%    21.24%
Max [SATV or RSATV]          1.66%        15.00%       -
N                              78           597      2,253

UCGPA 2nd Year
                      African American   Hispanic   White
SAT V                        0.18%        13.16%    16.55%
Max [SATV or RSATV]          0.07%        12.40%       -
N                              73           540      2,120

UCGPA 3rd Year
                      African American   Hispanic   White
SAT V                       -4.39%         8.13%    12.92%
Max [SATV or RSATV]         -5.19%         7.27%       -
N                              67           497      1,964

UCGPA 4th Year
                      African American   Hispanic   White
SAT V                        4.81%         5.01%    13.11%
Max [SATV or RSATV]          4.94%         4.38%       -
N                              64           476      1,904

UC CUM GPA 4th Year
                      African American   Hispanic   White
SAT V                        0.12%        15.18%    20.68%
Max [SATV or RSATV]          0.71%        14.28%       -
N                              65           481      1,927

UC Graduation by 4th Year*
                      African American   Hispanic   White
SAT V                       15.97%        13.35%     6.91%
Max [SATV or RSATV]         15.08%        13.13%       -
N                              78           613      2,314

*Pseudo R² is reported for the logistic regression used to predict fourth-year graduation.


Table 7: Predictive Power for First-Year UC GPA: A Joint Regression Equation. Standardized Estimates and Statistical Significance (p-values in parentheses).

Regression   API        Parents'    Income     HS GPA     Math SAT   Verbal SAT or Max    Adjusted
Model        Quintile   Education                                    [SATV or R-SATV]     R²         N
1.1          0.039      0.102       -0.021     0.330      0.181      0.106                23.69%     2,928
             (0.028)    (<0.001)    (0.333)    (<0.001)   (<0.001)   (<0.001)
1.2          0.038      0.098       -0.015     0.327      0.191      0.102                23.85%     2,928
             (0.027)    (<0.001)    (0.465)    (<0.001)   (<0.001)   (<0.001)

Table 8: Adjusted R² for Different Academic Outcomes. SAT Form 1999 IZ. African American and White Examinees. Multivariate Regression Models.

                                           UCGPA 1st Year               UCGPA 2nd Year
                                           African American   White     African American   White
SAT V                                            2.15%        22.56%          0.18%        17.52%
Max [SATV or RSATV]                              1.66%        22.56%          0.07%        17.52%
Rasch Model Ability Estimate                     3.95%        23.20%         -0.33%        17.99%
Rasch DIF Model Ability Estimate                 3.93%        23.20%         -0.41%        17.99%
Hard Half Ability Estimate (Rasch Model)         3.49%        23.03%         -0.77%        17.77%
N                                                  78          1,153           73           1,099

                                           UCGPA 3rd Year               UCGPA 4th Year
                                           African American   White     African American   White
SAT V                                           -4.39%        15.93%          4.81%        16.54%
Max [SATV or RSATV]                             -5.19%        15.93%          4.94%        16.54%
Rasch Model Ability Estimate                    -3.83%        15.97%          4.80%        16.68%
Rasch DIF Model Ability Estimate                -3.68%        15.97%          4.80%        16.69%
Hard Half Ability Estimate (Rasch Model)        -3.03%        16.15%          5.57%        16.76%
N                                                  67          1,024           64             990

                                           UC CUM GPA 4th Year          UC Graduation by 4th Year*
                                           African American   White     African American   White
SAT V                                            0.12%        26.27%         15.97%         8.44%
Max [SATV or RSATV]                              0.71%        26.27%         15.08%         8.44%
Rasch Model Ability Estimate                     1.63%        26.72%         14.14%         8.44%
Rasch DIF Model Ability Estimate                 1.66%        26.73%         14.12%         8.44%
Hard Half Ability Estimate (Rasch Model)         3.54%        26.76%         14.38%         8.50%
N                                                  65          1,001           78           1,184

*Pseudo R² is reported for the logistic regression used to predict fourth-year graduation.

Table 9: Adjusted R² for Different Academic Outcomes. SAT Form 1999 IZ. Hispanic and White Examinees. Multivariate Regression Models.

                                           UCGPA 1st Year           UCGPA 2nd Year
                                           Hispanic   White         Hispanic   White
SAT V                                       12.72%    22.56%         25.06%    17.52%
Max [SATV or RSATV]                         12.07%    22.56%         23.27%    17.52%
Rasch Model Ability Estimate                13.39%    23.19%         25.29%    17.99%
Rasch DIF Model Ability Estimate            13.37%    23.20%         25.26%    17.99%
Hard Half Ability Estimate (Rasch Model)    12.11%    23.03%         20.95%    17.77%
N                                             272      1,153           248      1,099

                                           UCGPA 3rd Year           UCGPA 4th Year
                                           Hispanic   White         Hispanic   White
SAT V                                       14.67%    15.93%          5.44%    16.54%
Max [SATV or RSATV]                         12.42%    15.93%          5.88%    16.54%
Rasch Model Ability Estimate                13.99%    15.97%          6.38%    16.68%
Rasch DIF Model Ability Estimate            13.96%    15.97%          6.40%    16.69%
Hard Half Ability Estimate (Rasch Model)    11.57%    16.15%          5.13%    16.76%
N                                             229      1,024           221        990

                                           UC CUM GPA 4th Year      UC Graduation by 4th Year*
                                           Hispanic   White         Hispanic   White
SAT V                                       20.64%    26.27%         25.80%     8.44%
Max [SATV or RSATV]                         20.59%    26.27%         25.31%     8.44%
Rasch Model Ability Estimate                21.61%    26.72%         25.75%     8.44%
Rasch DIF Model Ability Estimate            21.59%    26.73%         25.76%     8.44%
Hard Half Ability Estimate (Rasch Model)    18.38%    26.76%         25.03%     8.50%
N                                             225      1,001           277      1,184

*Pseudo R² is reported for the logistic regression used to predict fourth-year graduation.


References

Adams, R. J., Wilson, M. R., & Wang, M. L. (1997). The Multidimensional Random Coefficients Multinomial Logit. Applied Psychological Measurement, 21 (1), 1-24.

Allen, J., Robbins, S., & Sawyer, R. (2010). Can Measuring Psychosocial Factors Promote College Success? Applied Measurement in Education, 23 (1), 1-22.

Astin, A. W., Tsui, L., & Avalos, J. (1996). Degree Attainment Rates at American Colleges and Universities: Effects of Race, Gender and Institutional Type. Los Angeles, CA: Higher Education Research Institute, Graduate School of Education & Information Studies.

Atkinson, R. C., & Geiser, S. (2009). Reflections on a century of college admissions tests. Educational Researcher, 38 (9), 665-676.

Atkinson, R. C., & Pelfrey, P. (2004). Rethinking Admissions: US Public Universities in the Post-Affirmative Action Age. Berkeley, CA: Center for Studies in Higher Education, University of California, Berkeley. Retrieved from http://cshe.berkeley.edu/publications/docs/ROP.Atkinson.Pelfrey.11.04.pdf

Baum, S., & Ma, J. (2007). Education pays: The benefits of higher education for individuals and society. New York, NY: The College Board.

Bowen, W., & Bok, D. (1998). The Shape of the River. Long-term Consequences of Considering Race in College and University Admissions. Princeton, NJ: Princeton University Press.


Bowen, W., Chingos, M., & McPherson, M. (2009). Crossing the Finish Line: Completing College at America's Public Universities. Princeton, NJ: Princeton University Press.

Breland, H., Maxey, J., Gernand, R., Cumming, T., & Trapani, C. (2001). Trends in college admission 2000: A report of a national survey of undergraduate admissions policies, practices, and procedures. Retrieved from http://www.semworks.net/aboutus/resources/docs/trends_in_college_admission.pdf

Bridgeman, B., Burton, N., & Cline, F. (2001). Substituting SAT II: Subject Tests for SAT I: Reasoning Test: Impact on Admitted Class Composition and Quality (No. 2001-03). New York, NY: College Board.

Bridgeman, B., Pollack, J., & Burton, N. (2004). Understanding What SAT Reasoning Test Scores Add to High School Grades: A Straightforward Approach (No. 2004-04). New York, NY: College Board.

Brown, T., & Zwick, R. (2006). Using Hierarchical Linear Models to Describe First-Year Grades at the University of California. Paper presented at annual meeting of the American Educational Research Association, San Francisco.

Burton, N., & Ramist, L. (2001). Predicting Success in College: SAT Studies of Classes Graduating Since 1980 (No. 2001-2). New York, NY: College Board.

Camara, W., & Echternacht, J. (2000). The SAT I and high school grades: utility in predicting success in college. (No. Research Report RN-10). New York, NY: College Entrance Examination Board.


Camara, W., & Sathy, V. (2004). College Board Response to Harvard Educational Review Article by Freedle. Retrieved August 17, 2006, from http://www.collegeboard.com/research/pdf/051425Harvard_050406.pdf#search=%22college%20board%20freedle%22

Camara, W. J. & Kimmel, E. W. (Eds.). (2005). Choosing Students: Higher Education Admissions Tools for the 21st Century. Mahwah, NJ: Lawrence Erlbaum.

Camilli, G., & Shepard, L. (1994). Methods for Identifying Biased Test Items (Vol. 4). London, New Delhi: Sage Publications.

Diaz-Guerrero, R., & Szalay, L. B. (1991). Understanding Mexicans and Americans: Cultural Perspectives in Conflict. New York, NY: Plenum Press.

De Boeck, P. (2010, October 26). Another look at bias in the SAT [Web log post]. Retrieved from http://www.hepg.org/blog/45.

Dorans, N., & Kulick, E. (1983). Assessing unexpected differential item performance of female candidates on SAT and TSWE forms administered in December 1977: An application of the standardization approach. (No. RR-83-9). Princeton, NJ: Educational Testing Service.

Dorans, N., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355-368.


Dorans, N., & Holland, P. (1992). DIF detection and description: Mantel-Haenszel and standardization (No. RR-92-10). Princeton, NJ: Educational Testing Service.

Dorans, N. (2004). Freedle's Table 2: Fact or fiction. Harvard Educational Review, 74 (1), 62-72.

Dorans, N., & Zeller, K. (2004a). Examining Freedle's Claims and His Proposed Solution: Dated Data, Inappropriate Measurement, and Incorrect and Unfair Scoring (No. RR-04-26). Princeton, NJ: Educational Testing Service.

Dorans, N., & Zeller, K. (2004b). Using Score Equity Assessment to Evaluate the Equatability of the Hardest Half of a Test to the Total Test (No. RR-04-43). Princeton, NJ: Educational Testing Service.

FairTest. (2003). SAT I: A Faulty Instrument for Predicting College Success. Retrieved from http://fairtest.org/facts/satvalidity.html

Frary, R. B. (1988). Formula Scoring of Multiple-Choice Tests (Correction for Guessing). Educational Measurement: Issues and Practice, 7(2), 33-38.

Freedle, R. (2003). Correcting the SAT's Ethnic and Social-Class Bias: A Method for Reestimating SAT Scores. Harvard Educational Review, 73(1), 1-43.

Freedle, R. (2010, Fall). On replicating ethnic test bias effects: The Santelices and Wilson study. Harvard Educational Review, 80, 394-404.

Author (Year)


Author (Year)

Hezlett, S. A., Kuncel, N. R., Vey, M., Ahart, A. M., Ones, D. S., Campbell, J. P., & Camara, W. J. (2001). The Effectiveness of the SAT in Predicting Success Early and Late in College: A Comprehensive Meta-analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle.

Holland, P., & Wainer, H. (1993). Concluding Remarks and Suggestions. In P. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 419-422). Hillsdale, NJ: Lawrence Erlbaum.

Kobrin, J., Camara, W., & Milewski, G. (2002). The Utility of the SAT I and SAT II for Admissions Decisions in California and the Nation (No. 2002-6). New York, NY: College Board.

Kobrin, J. L., Patterson, B. F., Shaw, E. J., Mattern, K. D., & Barbuti, S. M. (2008). Validity of the SAT for predicting first-year college grade point average (College Board Research Rep. No. 2008-5). New York, NY: The College Board.

Kolen, M. (1988). Traditional equating methodology. Educational Measurement: Issues and Practice, 7(4), 29-36.

Kyllonen, P. C. (2008). The research behind the ETS Personal Potential Index. Retrieved from the ETS website: http://www.ets.org/Media/Products/PPI/10411_PPI_bkgrd_report_RD4.pdf


Mattern, K., & Patterson, B. (2009). Is Performance on the SAT Related to College Retention? (College Board Research Report No. 2009-7). New York, NY: College Board.

Mattern, K., & Patterson, B. (2011a). The Relationship Between SAT Scores and Retention to the Third Year: 2006 SAT Validity Sample (College Board Statistical Report No. 2011-2). New York, NY: College Board.

Mattern, K., & Patterson, B. (2011b). The Relationship Between SAT Scores and Retention to the Fourth Year: 2006 SAT Validity Sample (College Board Statistical Report No. 2011-6). New York, NY: College Board.

Moore, S. (1996). Estimating Differential Item Functioning in the Polytomous Case with the Random Coefficients Multinomial Logits (RCML) model. In G. Englehard & M. Wilson (Eds.), Objective Measurement III: Theory into Practice (pp. 219-239). Norwood, NJ: Ablex.

NCES, IPEDS (2007). Retention Rates: First-Time College Freshmen Returning Their Second Year. Retrieved September 22, 2008, from http://www.higheredinfo.org/dbrowser/index.php?submeasure=223&year=2007&level=nation&mode=data&state=0

Oswald, F., Schmitt, N., Kim, B. H., Ramsay, L., & Gillespie, M. (2004). Developing a biodata measure and situational judgment inventory as predictors of college student performance. Journal of Applied Psychology, 89(2), 187-207.


Paek, I. (2002). Investigation of Differential Item Functioning: Comparisons Among Approaches, and Extension to a Multidimensional Context. Berkeley, CA: University of California, Berkeley.

Perfetto, G. (1999). Toward taxonomy of the admissions decision-making process: A public document based on the first and second College Board conferences on Admissions Models. New York, NY: College Board.

Ramist, L., Lewis, C., & McCamley-Jenkins, L. (2001). Using Achievement Tests/SAT II: Subject Tests to Demonstrate Achievement and Predict College Grades: Sex, Language, Ethnic, and Parental Education Groups (No. 2001-05). New York, NY: College Board.

Ramist, L., Lewis, C., & McCamley-Jenkins, L. (1994). Student Group Differences in Predicting College Grades: Sex, Language and Ethnic Groups (Report No. 93-1). New York, NY: College Entrance Examination Board.

Ramsey, P. (1993). Sensitivity Review: The ETS Experience as a Case Study. In P. Holland & H. Wainer (Eds.), Differential Item Functioning (pp. 367-388). Hillsdale, NJ: Lawrence Erlbaum Associates.

Reason, R. (2009). An examination of the persistence research through the lens of a comprehensive conceptual framework. Journal of College Student Development, 50(6), 659-682.


Robbins, S., Lauver, K., Le, H., Davis, D., & Langley, R. (2004). Do psychological and study skill factors predict college outcomes? A meta-analysis. Psychological Bulletin, 130(2), 261-288.

Rogers, H. (1999). Guessing in Multiple Choice Tests. In G. Masters & J. Keeves (Eds.), Advances in Measurement in Educational Research and Assessment (pp. 235-43). Amsterdam: Pergamon.

Rothstein, J. M. (2004). College performance predictions and the SAT. Journal of Econometrics, 121(1-2), 297-317.

Author (Year)

Singer, J., & Willett, J. (2003). Applied Longitudinal Data Analysis. New York, NY: Oxford University Press.

Sinha, R., Oswald, F., Imus, A., & Schmitt, N. (2011). Criterion-focused approach to reducing adverse impact in college admissions. Applied Measurement in Education, 24(2), 1-25.

Steele, C. M., & Aronson, J. (1998). How stereotypes influence the standardized test performance of talented African American students. In C. Jencks & M. Phillips (Eds.), Black-White test score differences (pp. 401-427). Harvard Press.

Sternberg, R. (1999). A triarchic approach to the understanding and assessment of intelligence in multicultural populations. Journal of School Psychology, 37(2), 145-159.


Sternberg, R. J. (2003). Our research program validating the triarchic theory of successful intelligence: Reply to Gottfredson. Intelligence, 31, 399-413.

Thissen, D., & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 73-137). London, England: Lawrence Erlbaum.

Willingham, W. (1985). Success in college: The role of personal qualities and academic ability. New York, NY: College Entrance Examination Board.

Willingham, W., Lewis, C., Morgan, R., & Ramist, L. (1990). Predicting College Grades: An Analysis of Institutional Trends Over Two Decades. New York, NY: College Entrance Examination Board and Educational Testing Service.

Wilson, K. (1983). A Review of Research on the Prediction of Academic Performance After the Freshman Year (No. 83-2). New York, NY: College Board.

Wu, M., Adams, R. J., & Wilson, M. (1998). ACER ConQuest. Hawthorn, Australia: ACER Press.

Xu, T., & Stone, C. (2012). Using IRT Trait Estimates Versus Summated Scores in Predicting Outcomes. Educational and Psychological Measurement, 72(3), 453-468.

Young, J. (2004). Differential Validity and Prediction: Race and Sex Differences in College Admissions Testing. In R. Zwick (Ed.), Rethinking the SAT: The Future of Standardized Testing in University Admissions (pp. 289-301). New York, NY: Routledge Falmer.


Zwick, R. (2002). Fair Game? The Use of Standardized Admissions Tests in Higher Education. New York, NY: Routledge Falmer.

Zwick, R., Brown, T., & Sklar, J. C. (2004). California and the SAT: A Reanalysis of University of California Admissions Data (Research & Occasional Paper Series: CSHE.8.04). Berkeley, CA: University of California, Berkeley. Retrieved from http://cshe.berkeley.edu/publications/publications.php?s=1

Zwick, R., & Schlemer, L. (2004). SAT Validity for Linguistic Minorities at the University of California, Santa Barbara. Educational Measurement: Issues and Practice, 23(1), 6-16.

Zwick, R., & Sklar, J. (2005). Predicting College Grades and Degree Completion Using High School Grades and SAT Scores: The Role of Student Ethnicity and First Language. American Educational Research Journal, 42(3), 439-464.

Zwick, R., & Green, J. (2007). New perspectives on the correlation of SAT scores, high school grades and socioeconomic factors. Journal of Educational Measurement, 44(3), 23-45.

Zwick, R., & Himelfarb, I. (2011). The Effect of High School Socioeconomic Status on the Predictive Validity of SAT Scores and High School Grade-Point Average. Journal of Educational Measurement, 48(2), 101-121.
