Manual
John Rust
888-298-6227 TalentLens.com
Copyright © 2006 by NCS Pearson, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the copyright owner. The Pearson and TalentLens logos, and Advanced Numerical Reasoning Appraisal, are trademarks, in the U.S. and/or other countries, of Pearson Education, Inc. or its affiliate(s). Portions of this work were previously published. Printed in the United States of America.
Table of Contents

Acknowledgements

Chapter 1  Introduction
    Numerical Reasoning and Critical Thinking

Chapter 2  History and Development of ANRA
    Description of the Test
    Adapting RANRA
    Development of RANRA

Chapter 3  Directions for Administration
    General Information
    Preparing for Administration
    Testing Conditions
    Answering Questions
    Administering the Test
    Scoring and Reporting
    Test Security
    Concluding Test Administration
    Administering ANRA and Watson-Glaser Critical Thinking Appraisal in a Single Testing Session
    Accommodating Examinees with Disabilities

Chapter 4  ANRA Norms Development
    Using ANRA as a Norm- or Criterion-Referenced Test
    Using Norms to Interpret Scores
    Converting Raw Scores to Percentile Ranks
    Using Standard Scores to Interpret Performance
        Converting z Scores to T Scores
    Using ANRA and Watson-Glaser Critical Thinking Appraisal Together

Chapter 5  Evidence of Reliability
    Reliability Coefficients and Standard Error of Measurement
    RANRA Reliability Studies
    ANRA Reliability Studies
        Evidence of Internal Consistency
        Evidence of Test-Retest Stability

Chapter 6  Evidence of Validity
    Face Validity
    Evidence Based on Test Content
    Evidence Based on Test-Criterion Relationships
        Correlations Between ANRA Test 1 and Test 2
    Evidence of Convergent and Discriminant Validity
        Correlations Between ANRA and Watson-Glaser Critical Thinking Appraisal Short Form
        Correlations Between ANRA and Other Tests

Chapter 7  Using ANRA as an Employment Selection Tool
    Employment Selection
    Using ANRA in Making a Hiring Decision
    Differences in Reading Ability, Including the Use of English as a Second Language
    Using ANRA as a Guide for Training, Learning, and Education
    Fairness in Selection Testing
        Legal Considerations
        Group Differences and Adverse Impact
        Monitoring the Selection System

References

Appendices
    Appendix A  Description of the Normative Sample
    Appendix B  ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group
    Appendix C  Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group

List of Tables and Figures
    Table 5.1   Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 85)
    Table 5.2   ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)
    Table 5.3   ANRA Test-Retest Stability (N = 73)
    Table 6.1   Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels
    Table 6.2   Correlations Between Watson-Glaser Critical Thinking Appraisal Short Form and ANRA (N = 452)
    Table 6.3   Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment Numerical Ability (DAT for PCA NA)
    Figure 4.1  The Relationship of Percentiles to T Scores
Acknowledgements
Pearson's Talent Assessment group would like to recognize and thank Professor John Rust, Director of the Psychometrics Center at the University of Cambridge, United Kingdom, for his seminal efforts that led to his development of the Rust Advanced Numerical Reasoning Appraisal (RANRA). This manual details our adaptation of RANRA for use in the United States: the Advanced Numerical Reasoning Appraisal (ANRA). We are indebted to numerous professionals and organizations for their assistance during several phases of our work: project design, data collection, statistical data analyses, editing, and publication. We acknowledge the efforts of Julia Kearney, Sampling Projects Coordinator; Jane McDonald, Sampling Recruiter; Terri Garrard, Study Manager; David Quintero, Clinical Handscoring Supervisor; Hector Solis, Sampling Manager; and Victoria Locke, Director, Field Research, in driving the data collection activities. Nishidha Goel helped to collate and prepare the data. We thank Zhiming Yang, PhD, Psychometrician, and JJ Zhu, PhD, Director of Psychometrics, Clinical Products. Dr. Yang's technical expertise in analyzing the data and Dr. Zhu's psychometric leadership ensured the high level of psychometric integrity of the results. Our thanks also go to Toby Mahan and Troy Beehler, Project Managers, for diligently managing the logistics of this project. Toby and Troy worked with several team members from the Technology Products Group, Pearson, to ensure the high quality and accuracy of the computer interface. These dedicated individuals included Paula Oles, Manager, Software Quality Assurance; Christina McCumber, Software Quality Assurance Analyst; Matt Morris, Manager, System Development; Maurya Buchanan, Technical Writer; and Alan Anderson, Director, Technology Products Group. Dawn Dunleavy, Senior Managing Editor; Konstantin Tikhonov, Project Editor; and Marion Jones, Director, Mathematics, provided editorial guidance. Mark Cooley assisted with the design of the cover.
Finally, we wish to acknowledge the leadership, guidance, support, and commitment of the following people through all the phases of this project: Jenifer Kihm, PhD, Senior Product Line Manager, Talent Assessment; John Toomey, Director, Talent Assessment; Paul McKeown, International Product Development Director; Judy Chartrand, PhD, Director, Test Development; Gene Bowles, Vice President, Publishing and Technology; Larry Weiss, PhD, Vice President, Psychological Assessment Products Group; and Aurelio Prifitera, PhD, Group President and CEO of Clinical Assessment/Worldwide.

Kingsley C. Ejiogu, PhD, Research Director
John Trent, MS, Research Director
Mark Rose, PhD, Research Director
Chapter 1
Introduction
The Advanced Numerical Reasoning Appraisal (ANRA) measures the ability to recognize, understand, and apply mathematical and statistical reasoning. Specifically, ANRA measures numerical reasoning abilities that involve deduction, interpretation, and evaluation. Numerical reasoning, as measured by ANRA, is operationally defined as the ability to correctly perform the domain of tasks represented by two sets of items: Comparison of Quantities and Sufficiency of Information. Both require the use of analytical skills rather than straightforward computational skills. The key attribute ANRA measures is an individual's ability to apply numerical reasoning to everyday problem solving in professional and business settings. Starkey (1992) describes numerical reasoning as comprising a set of abilities that are used to "operate upon or mentally manipulate representations of numerosity" (p. 94). Research suggests that numerical reasoning abilities exist even in infancy, before children begin to receive explicit instruction in mathematics in school (Brannon, 2002; Feigenson, Dehaene, & Spelke, 2004; Spelke, 2005; Starkey, 1992; Wynn, Bloom, & Chiang, 2002). As Spelke (2005) observed, children harness these core abilities when they learn mathematics, and adults use them to engage in mathematical and scientific thinking. Numerical reasoning skill is the foundation of all other numerical ability (Rust, 2002). This skill enables individuals to learn how to evaluate situations, select and apply problem-solving strategies, draw logical conclusions from numerical data, describe and develop solutions, and recognize when and how to apply those solutions. Eventually, one is able to reflect on solutions to problems and determine whether they make sense. The nature of work is changing significantly, and there is increased demand for a new kind of worker: the knowledge worker (Hunt, 1995).
As Facione (2006) observed, though the ability to think critically and make sound decisions does not absolutely guarantee a life of happiness and economic success, having this ability equips an individual to improve his or her future and contribute to society. As the Internet has transformed home life and leisure time, people have been deluged with data of ever-increasing complexity. They must select, interpret, digest, evaluate, learn, and apply information. Employers are typically interested in tests that measure candidates' ability to apply constructively and critically, rather than by rote, what they have learned. A person can be trained or educated to
engage in numerical reasoning; as a result, tests that measure the ability to use mathematical reasoning within the context of work have an important function in career development. Such tests enable an organization to identify candidates who may need to improve their skills to enhance their work effectiveness and career success.
Once individuals have acquired the critical thinking skill, they are able to apply it in a wide variety of circumstances. Critical thinking can involve proper language use, applied logic, and practical mathematics. Because ANRA items require higher-order numerical reasoning skills, rather than rote calculation, to solve, using the Watson-Glaser Critical Thinking Appraisal (a reliable and valid test of verbal critical thinking) in conjunction with ANRA provides a demanding, high-level measurement of numerical reasoning and verbal critical thinking skills, respectively. These two skills are important when recruiting in the competitive talent assessment market. In response to requests from Watson-Glaser Critical Thinking Appraisal customers in the United Kingdom, The Psychological Corporation (now Pearson) in the UK developed the Rust Advanced Numerical Reasoning Appraisal (RANRA) in 2000 as a companion numerical reasoning test for the Watson-Glaser Critical Thinking Appraisal. In 2006, Pearson adapted RANRA to enhance the suitability and applicability of the test in the United States. This manual contains detailed information on the U.S. adaptation, ANRA.
Chapter 2
Adapting RANRA
The Rust Advanced Numerical Reasoning Appraisal (RANRA) was adapted to reflect U.S. English and U.S. measurement units. Because RANRA measures reasoning more than computation, only the measurement units were changed and the original numbers were kept, except in cases where keeping them affected the realism of the situation. For example, 82 kilograms was changed to 82 pounds, even though 82 kg actually equals about 180.4 lb. Similarly, 5,000 British pounds sterling was changed to 5,000 U.S. dollars, even though 5,000 British pounds sterling does not equal 5,000 U.S. dollars. ANRA contains the original 32 RANRA items plus additional items for continuous test improvement purposes. All the items were reviewed by a group of 16 individuals: researchers in test development, financial analysts, business development professionals, industrial/organizational psychologists, and editors in test publishing. Sentence construction was modified in some items, based on input from these American reviewers.
Development of RANRA
In developing RANRA, Rust (2002) first conducted a conceptual analysis of the role of critical thinking in the use of mathematics. Through this analysis, he identified the two subdomains of comparison of quantities and sufficiency of information as the key concepts for an assessment of mathematical reasoning. Rust then constructed 80 items, had a panel of educators and psychologists evaluate and modify them, and generated the pilot version of RANRA. This pilot version was administered to 76 students and staff from diverse subject backgrounds within the University of London, and the data were subjected to detailed analysis at the item level. Distractor analysis led to the modification of some items. Item difficulty values were calculated for each item, based on the proportion of examinees passing it. A discrimination index was also calculated, and items shown to measure a common quality in numerical reasoning were identified and retained. This approach led to the development of the 32-item RANRA.
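The item-level statistics described above can be sketched in code. This is an illustrative reconstruction only, not Rust's actual analysis: the small response matrix and the upper-lower 27% rule for the discrimination index are assumptions chosen for demonstration.

```python
# Illustrative classical item analysis on a 0/1 response matrix
# (rows = examinees, columns = items). Values are made up for the demo.

def item_difficulty(responses):
    """Proportion of examinees passing each item (the p value)."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]

def discrimination_index(responses, item):
    """Upper-lower 27% discrimination index for one item."""
    ranked = sorted(responses, key=sum)          # rank examinees by total score
    k = max(1, round(0.27 * len(ranked)))        # size of each extreme group
    low, high = ranked[:k], ranked[-k:]
    p_high = sum(row[item] for row in high) / k
    p_low = sum(row[item] for row in low) / k
    return p_high - p_low

data = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
print(item_difficulty(data))          # → [0.6, 0.6, 0.2, 0.8]
print(discrimination_index(data, 0))  # high scorers pass item 0 more often
```

Items with moderate difficulty and a clearly positive discrimination index are the ones retained in an analysis of this kind.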
Chapter 3
Testing Conditions
The test administrator has a significant responsibility to ensure that the conditions under which the test is taken do not contain undesirable influences on the test performance of candidates. Such undesirable influences can either inflate or reduce the test scores of candidates. Poor administration of a test undermines the value of test scores and makes an accurate interpretation of results very difficult, if not impossible. It is important to ensure that the test is administered in a quiet, well-lit room. The following conditions are necessary for accurate scores and for maintaining the cooperation of the examinee: good lighting, comfortable seating, adequate desk or table space, comfortable positioning of the computer screen, keyboard and mouse, and freedom from noise and other distractions. Interruptions and distractions from outside should be kept to a minimum, if not eliminated.
Answering Questions
The test administrator may answer examinees' questions about the test before giving the signal to begin. To maintain standard testing conditions, answer such questions by re-reading the appropriate section of these directions. Do not volunteer new explanations or examples. The test administrator is responsible for ensuring that examinees understand the correct way to indicate their answers and what is required of the examinees. The question period should never be rushed or omitted. If any examinees have routine questions after the testing has started, try to answer them without disturbing the other examinees. However, questions about the test items should be handled by telling the examinee to do his or her best.
Once the examinee clicks the Start Your Test button, administration begins with the first page of questions. The examinee may review test items at the end of the test. Allow examinees as much time as they reasonably need to complete the test. Average completion time is about 45 minutes, and about 90% of candidates finish the test within 75 minutes. If an examinee's computer develops technical problems during testing, the test administrator should move the examinee to another suitable computer location. If the technical problems cannot be solved by moving to another computer location, the administrator should contact Pearson's Technical Support at 1-888-298-6227 for assistance.
Test Security
ANRA scores are confidential and should be stored in a secure location accessible only to authorized individuals. It is unethical and poor test practice to allow test-score access to individuals who do not have a legitimate need for the information. Storing test scores in a locked cabinet or password-protected file that can be accessed only by designated test administrators will help ensure the security of the test scores. The security of testing materials (e.g., access to online tests) and protection of copyright must also be maintained by authorized individuals. Avoid disclosure of test access information such as usernames or passwords, and only administer ANRA in proctored environments. All computer stations used in administering ANRA must be in locations that can be easily supervised and that provide an adequate level of security.
Administering ANRA and Watson-Glaser Critical Thinking Appraisal in a Single Testing Session
When administering the ANRA and the Watson-Glaser in a single testing session, administer the Watson-Glaser first. Just as ANRA is intended as a test of numerical reasoning power rather than speed, the Watson-Glaser is intended as a test of critical thinking power rather than speed. Both tests are untimed; administration of ANRA and the Watson-Glaser Short Form in one session should take about 1 hour and 45 minutes.
Chapter 4
employment selections. For optimal results in such decisions, the overall total score, rather than the subtest scores, should be used. Subtest scores represent fewer items and, therefore, are less stable than the total score. However, as a criterion-referenced measure, it is feasible to use subtest scores to analyze the numerical reasoning abilities of a class or larger group and to determine the types of numerical reasoning or critical thinking training that may be most appropriate. In norm-referenced situations, raw scores need to be converted before they can be compared. Though raw scores may be used to rank candidates in order of performance, little can be inferred from raw scores alone. There are two main reasons for this. First, raw scores cannot be treated as having equal intervals. For example, it would be incorrect to assume that the difference between raw scores of, say, 20 and 21 is of the same significance as the difference between raw scores of 30 and 31. Second, ANRA raw scores may not be normally distributed. Hence, they are not subject to the psychometric principles of parametric statistics required for the proper evaluation of validity.
normative sample, they do not show the amount of difference between scores. In a normal distribution of scores, percentile ranks tend to cluster around the 50th percentile. This clustering affects scores in the average range the most, because a difference of only one or two raw score points may produce a large change in percentile rank. Extreme scores are affected less; a change of one or two raw score points at the extremes typically does not produce a large change in percentile rank. These factors should be considered when interpreting percentile ranks.
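The clustering effect described above is easy to demonstrate numerically. The sketch below assumes normally distributed scores with a mean of 50 and a standard deviation of 10 (illustrative values, not ANRA norms) and compares the percentile shift produced by a one-point change near the mean versus the same change in the upper tail.

```python
# Demonstration: a one-point score change moves the percentile rank far
# more near the mean than in the tails of a normal distribution.
from math import erf, sqrt

def percentile(score, mean=50, sd=10):
    """Percentile rank under a normal distribution (cumulative %)."""
    z = (score - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

mid_shift = percentile(51) - percentile(50)   # one point near the mean
tail_shift = percentile(76) - percentile(75)  # one point in the upper tail
print(round(mid_shift, 1), round(tail_shift, 1))  # → 4.0 0.2
```

A single point near the average moves the percentile rank about four points here, while the same change far above the mean moves it by a fraction of a point.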
percentage scores, or of percentiles being interpreted as an interval scale. Standard scores avoid the unequal clustering of scores by adopting a scale based on standard deviation units. The basic type of standard score is the z score, which is a raw score converted to standard deviation units. Thus, a raw score that is 0.53 standard deviations below the mean score for the group receives a z score of -0.53. z scores generally fall in the range of -3.00 to +3.00. However, there are certain disadvantages in saying that a person has a score of -0.53 on a test: from the point of view of presentation, the decimal point and the negative sign are unappealing. Hence, transformation algorithms are used to give standard scores a more user-friendly form.
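One such transformation, used throughout this manual, converts z scores to T scores (mean 50, SD 10), removing the decimals and negative signs. A minimal sketch, using an illustrative raw score and the example z value of about -0.53 from the text (the mean and SD below are assumptions for the demo):

```python
# z-to-T conversion: T = 50 + 10z, rounded to a whole number.

def z_score(raw, mean, sd):
    """Express a raw score in standard deviation units."""
    return (raw - mean) / sd

def t_score(z):
    """Transform a z score to the T-score scale (mean 50, SD 10)."""
    return round(50 + 10 * z)

z = z_score(18.2, 21.3, 6.0)    # illustrative values only
print(round(z, 2), t_score(z))  # → -0.52 45
```

A z of roughly -0.5 becomes a T score of about 45: the same information, but expressed as a positive whole number.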
Figure 4.1  The Relationship of Percentiles to T Scores
Chapter 5
Evidence of Reliability
The reliability of a measurement instrument refers to the accuracy, consistency, and precision of test scores across situations (Anastasi & Urbina, 1997). Test theory posits that a test score is an estimate of an individual's hypothetical true score, or the score an individual would receive if the test were perfectly reliable. In actual practice, however, some measurement error is to be expected; a reliable test has relatively small measurement error. The methods most commonly used to estimate test reliability are test-retest (the stability of test scores over time), alternate forms (the consistency of scores across alternate forms of a test), and internal consistency of the test items (e.g., Cronbach's alpha coefficient; Cronbach, 1970). Decisions about the form of reliability to use in comparing tests depend on a consideration of the nature of the error involved in each form. Different types of error can operate at the same time, so reliability coefficients can be expected to differ across situations and across groupings and samplings of respondents. An appropriate estimate of reliability can be obtained from a large representative sample of the respondents to whom the test is generally administered.
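Cronbach's alpha, the internal consistency estimate reported for ANRA later in this chapter, can be computed directly from an item-score matrix. The sketch below is a minimal illustration on made-up data, not an ANRA computation.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance).
# Rows are examinees, columns are items; variances use the sample (n-1) form.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))  # → 0.8
```

Alpha rises when items covary strongly relative to their individual variances, which is why it serves as an internal consistency index.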
Repeated testing leads to some variation. Consequently, no single test event measures an examinee's actual ability with complete accuracy. Therefore, an estimate is needed of the possible amount of error present in a test score, or the amount that scores would probably vary if an examinee were tested repeatedly with the same test. This estimate of error is known as the standard error of measurement (SEM). The SEM decreases as the reliability of a test increases; a large SEM denotes less reliable measurement and less reliable scores. The standard error of measurement is calculated with the formula:
SEM = SD √(1 − rxx)
In this formula, SEM represents the standard error of measurement, SD represents the standard deviation of the distribution of obtained scores, and rxx represents the reliability coefficient of the test (Cascio, 1991, formula 7-11). The SEM is a quantity that is added to and subtracted from an examinee's standard test score to create a confidence interval, or band of scores, around the obtained standard score. The confidence interval is a score range that, in all likelihood, includes the examinee's hypothetical true score, which represents the examinee's actual ability. A true score is a theoretical score entirely free of error. Because the true score is a hypothetical value that can never be obtained (testing always involves some measurement error), the score obtained by an examinee on any test will vary somewhat from administration to administration. As a result, any obtained score is considered only an estimate of the examinee's true score. Approximately 68% of the time, the observed standard score will lie within +1.0 and -1.0 SEM of the true score; 95% of the time, within +1.96 and -1.96 SEM; and 99% of the time, within +2.58 and -2.58 SEM. Using the SEM means that standard scores are interpreted as bands or ranges of scores, rather than as precise points (Nunnally, 1978). To illustrate the use of the SEM with an example, assume a director candidate obtained a total raw score of 25 on ANRA, with SEM = 2.32. From the information in Table B.1, the standard score (T score) for this candidate is 57. We can, therefore, infer that if this candidate were administered a large number of alternate forms of ANRA, 95% of this candidate's T scores would lie within the range between 57 − (1.96 × 2.32) ≈ 52 T score points and 57 + (1.96 × 2.32) ≈ 62 T score points.
We can further infer that the expected average of this person's T scores from a large number of alternate forms of ANRA would be 57.
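The worked example above can be reproduced in a few lines. The SD and alpha used for the SEM check below are the Executives/Directors figures reported in Table 5.2; everything else follows the SEM formula and the 95% confidence band directly.

```python
# Reproducing the SEM and 95% confidence interval from the worked example:
# SEM = SD * sqrt(1 - rxx), band = score +/- 1.96 * SEM.
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement."""
    return sd * sqrt(1 - reliability)

def confidence_interval(score, sem_value, z=1.96):
    """Band of scores likely to contain the true score (95% by default)."""
    return (score - z * sem_value, score + z * sem_value)

print(round(sem(6.0, 0.85), 2))   # → 2.32 (SD = 6.0, alpha = .85, Table 5.2)
low, high = confidence_interval(57, 2.32)
print(round(low), round(high))    # → 52 62
```

The band of roughly 52 to 62 T-score points matches the inference drawn in the text: the obtained score of 57 is best read as the center of a range, not a precise point.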
Thinking in terms of score ranges serves as a check against overemphasizing small differences between scores. The SEM may be used to determine whether an individual's score is significantly different from a cut score, or whether the scores of two individuals differ significantly. One general rule of thumb is that the difference between two scores on the same test should not be interpreted as significant unless it is equal to at least twice the standard error of the difference (SED), where SED = SEM √2.
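The SED rule of thumb can be applied mechanically. The sketch below uses the ANRA total-score SEM of 2.32 from Table 5.2; the two candidate scores are made up for illustration.

```python
# SED rule of thumb: treat two scores on the same test as meaningfully
# different only when they differ by at least 2 * SED, with SED = SEM * sqrt(2).
from math import sqrt

def sed(sem_value):
    """Standard error of the difference between two scores on one test."""
    return sem_value * sqrt(2)

def scores_differ(score_a, score_b, sem_value):
    """Apply the twice-the-SED rule of thumb."""
    return abs(score_a - score_b) >= 2 * sed(sem_value)

print(round(sed(2.32), 2))          # → 3.28, so the threshold is about 6.6
print(scores_differ(57, 50, 2.32))  # → True (7-point gap exceeds 6.6)
```

With an SEM of 2.32, score differences smaller than about 6.6 points should not be read as meaningful under this rule.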
The RANRA score reported in Table 5.1 is a T score transformed from the total raw score, while the standard error of measurement reported in the table was based on the split-half reliability (Rust, 2002).
Table 5.2 ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)
ANRA Total Raw Score
Norm Group                               N    Mean   SD    SEM    Alpha
Executives/Directors                     91   21.3   6.0   2.32   .85
Managers                                 88   20.1   5.6   2.38   .82
Professionals/Individual Contributors   200   22.1   6.4   2.22   .88
Employees in Financial Occupations      198   21.9   6.4   2.22   .88

ANRA Test 1: Comparison of Quantities
Norm Group                               N    Mean   SD    SEM    Alpha
Executives/Directors                     91   10.9   3.4   1.63   .77
Managers                                 88   10.3   3.4   1.70   .75
Professionals/Individual Contributors   200   11.4   3.6   1.53   .82
Employees in Financial Occupations      198   11.3   3.5   1.57   .80

ANRA Test 2: Sufficiency of Information
Norm Group                               N    Mean   SD    SEM    Alpha
Executives/Directors                     91   10.4   3.3   1.60   .75
Managers                                 88    9.9   2.9   1.67   .67
Professionals/Individual Contributors   200   10.7   3.3   1.62   .76
Employees in Financial Occupations      198   10.6   3.3   1.58   .77
The values in Table 5.2 show that the ANRA total raw score possesses good internal consistency reliability. The ANRA subtests showed lower internal consistency reliability estimates than the ANRA total raw score. Consequently, the ANRA total score, not the subtest scores, should be used for optimal hiring results.
Table 5.3  ANRA Test-Retest Stability (N = 73)
First Testing: Mean = 50.1, SD = 9.2. Second Testing: Mean = 49.8, SD = 10.0. Reliability coefficients: .82 and .85. Standardized score difference: 0.03.
Chapter 6
Evidence of Validity
Validity refers to the degree to which specific data, research, or theory support the interpretation of test scores entailed by proposed uses of tests (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). Cronbach (1970) observed that validity is high if a test gives the information the decision maker needs. Several sources of validity evidence are discussed next in relation to ANRA.
Face Validity
Face validity refers to a test's appearance and what the test seems to measure, rather than what the test actually measures. Face validity is not validity in any technical sense and should not be confused with content validity; it refers to whether a test looks valid to candidates, administrators, and other observers. If test content does not seem relevant to the candidate, the result may be a lack of cooperation, regardless of the actual validity of the test. For a test to function effectively in practical situations, it must be not only objectively valid but also face valid. However, a test cannot be judged solely on whether it looks right: the appearance and graphic design of a test are no guarantee of quality, and face validity should not be considered a substitute for objectively determined validity. As mentioned in the chapter on the development of ANRA, ANRA items were reviewed by a group of individuals who provided feedback on the test. The reviewers commented on issues such as the clarity of the items, the extent to which the items appeared to measure numerical reasoning, the extent to which the test content appeared relevant to jobs that require numerical reasoning, and the extent to which they thought the test would yield useful information. From the responses of this group, it was evident that ANRA had high face validity, and participants recognized its relevance to the skills required of employees who deal with numbers or project planning. Although the item content of ANRA could not reflect every work situation for which the test would be appropriate, the operations and processes required in each subtest represent abilities that are valued and readily appreciated.
fail to finish a timed test. In any case, ANRA is not a speed test, and it is unlikely that anyone failing to complete the test within a reasonable amount of time would improve his or her score significantly if given extra time. In an employment setting, evidence of ANRA content-related validity should be established by demonstrating that the jobs in question require the numerical reasoning skills measured by ANRA. Content-related validity in instructional settings may be examined in terms of the extent to which ANRA measures a sample of the specified objectives of such instructional programs.
guidelines for interpreting validity coefficients: coefficients above .35 are considered very beneficial; .21 to .35, likely to be useful; .11 to .20, useful depending on the circumstances; and below .11, unlikely to be useful. It is important to point out that even relatively low validities (e.g., .20) may justify the use of a test in a selection program (Anastasi & Urbina, 1997), because the practical value of a test depends not only on its validity but also on other factors, such as the base rate for success on the job (i.e., the proportion of people who would be successful in the absence of any selection procedure). If the base rate for success on the job is low (i.e., few people would be successful on the job), tests with low validity can have considerable utility or value. When the base rate is high (i.e., most people selected at random would succeed on the job), even highly valid tests may not contribute significantly to the selection process. In addition to the practical value of validity coefficients, the statistical significance of coefficients should be noted. Statistical significance refers to the odds that a non-zero correlation could have occurred by chance. If the odds are 1 in 20 that a non-zero correlation could have occurred by chance, then the correlation is considered statistically significant. Some experts prefer even more stringent odds, such as 1 in 100, although the generally accepted odds are 1 in 20. In statistical analyses, these odds are designated by the lowercase p (probability), which signifies whether a non-zero correlation is statistically significant. When p is less than or equal to .05, the odds are presumed to be 1 in 20 (or less) that a non-zero correlation of that size could have occurred by chance; when p is less than or equal to .01, the odds are presumed to be 1 in 100 (or less).
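The significance of a correlation coefficient can be checked with the standard large-sample t approximation, t = r √((n − 2) / (1 − r²)). The sketch below applies it to one coefficient reported later in this chapter (r = .32 with N = 89, the Analysis and Problem Solving criterion); the critical value 1.98 is the approximate two-tailed .05 cutoff for this many degrees of freedom.

```python
# Significance check for a correlation via the t statistic
# t = r * sqrt((n - 2) / (1 - r^2)), df = n - 2.
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

t = t_statistic(0.32, 89)  # r and N from the criterion-validity study
print(round(t, 2))         # → 3.15
print(t > 1.98)            # → True: significant at p < .05 (two-tailed)
```

A t of about 3.15 comfortably exceeds both the .05 and .01 two-tailed cutoffs, consistent with the p < .01 flag on the coefficients in Table 6.1.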
In a study of ANRA criterion-related validity, we examined the relationship between ANRA scores and the on-the-job performance of job incumbents in various occupations (mostly finance-related occupations) and position levels (mainly professionals, managers, and directors). Job performance was defined as supervisory ratings on behaviors determined through research to be important to most professional, managerial, and executive jobs. The study found that ANRA scores correlated .32 with supervisory ratings on a dimension made up of Analysis and Problem Solving behaviors, and .36 with supervisory ratings on a dimension made up of Judgment and Decision Making behaviors (see Table 6.1). Furthermore, ANRA scores correlated .36 with supervisory ratings on a dimension composed of job behaviors dealing with Quantitative/Professional Knowledge and Expertise. Supervisory ratings from the sum of ratings on 24 job performance behaviors (Total Performance), as well as ratings on a single-item measure of Overall Potential, were also obtained. The ANRA scores correlated .44 with Total Performance and .31 with ratings of Overall Potential. The correlation between ANRA scores and a single-item supervisory rating of Overall Performance was .38.
Table 6.1 Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels

Criterion                                            N     r
Analysis and Problem Solving                         89    .32**
Judgment and Decision Making                         91    .36**
Quantitative/Professional Knowledge and Expertise    59    .36**
Total Performance (24 items)                         58    .44**
Overall Performance (single item)                    94    .38**
Overall Potential                                    94    .31**

** p < .01
In Table 6.1, the column entitled N shows the number of cases having valid supervisory ratings for every job behavior contained in the specified criterion. The means and standard deviations refer to the criterion ratings shown in the table, and the validity coefficients appear in the last column. The criterion-related validity coefficients reported in Table 6.1 apply to the specific sample of job incumbents described in the table. These validity coefficients indicate that ANRA is likely to be very beneficial as an indicator of the criteria shown in Table 6.1. However, test users should not automatically assume that these data constitute sole and sufficient justification for use of ANRA. Inferring validity for one group of employees or candidates from data reported for another group is not appropriate unless the organizations and job categories being compared are demonstrably similar. Careful examination of Table 6.1 can help test users make an informed judgment about the appropriateness of ANRA for their own organization. However, the data presented here are not intended to serve as a substitute for locally obtained validity data. Local validity studies, together with locally derived norms, provide a sound basis for determining the most appropriate use of ANRA. Hence, whenever technically feasible, test users should study the validity of ANRA, or any selection test, at their own location or organization. Sometimes it is not possible for a test user to conduct a local validation study. There may be too few incumbents in a particular job, an unbiased and reliable measure of job performance may not be available, or there may not be a sufficient range in the ratings of job performance to justify the computation of validity coefficients. In such circumstances, evidence of a test's validity reported elsewhere may be relevant, provided that the data refer to comparable jobs.
Correlations between ANRA total raw scores and Watson-Glaser Short Form scores were also examined for the following measures: Watson-Glaser Short Form Total Raw Score; Test 1: Inference; Test 2: Recognition of Assumptions; Test 3: Deduction; Test 4: Interpretation; Test 5: Evaluation of Arguments.

Note. For all the correlations, p < .001
Chapter 7
Employment Selection
Many organizations use testing as a component of their employment selection process. Employment selection programs typically use cognitive ability tests, aptitude tests, personality tests, basic skills tests, and work values tests to screen out unqualified candidates, to categorize prospective employees according to their probability of success on the job, or to rank order a group of candidates according to merit. ANRA was designed to assist in the selection of employees for jobs that require numerical reasoning. Many finance-related, project-management, and technical professions require the type of numerical reasoning ability measured by ANRA. The test is useful for assessing applicants for a variety of jobs, such as Accountant, Account Manager, Actuary, Banking Manager, Business Analyst, Business Development Manager, Business Unit Leader, Finance Analyst, Loan Officer, Project Manager, Inventory Planning Analyst, Procurement or Purchasing Manager, and leadership positions with financial responsibilities. It should not be assumed, however, that the type of numerical reasoning required in a particular job is identical to that measured by ANRA. Job analysis and local validation of ANRA for selection purposes should follow accepted human resource research procedures and conform to existing guidelines concerning fair employment practices. In addition, no single test score can capture all of the knowledge and skills necessary for success in a job.
For this reason, it is recommended that selection decisions be based on multiple job-relevant tools rather than on any single test (e.g., using only ANRA scores to make employment decisions). Human resource professionals can use the percentile rank that corresponds to a candidate's raw score in several ways. Candidates' scores may be rank ordered by percentile so that those with the highest scores are considered further. Alternatively, a cut score (e.g., the 50th percentile) may be established so that candidates who score below it are not considered further. In general, the higher the cut score is set, the greater the likelihood that a candidate who scores above it will be successful. However, the need to select high-scoring candidates typically must be balanced against situational factors, such as the need to keep jobs filled and the supply of talent in the local labor market. When interpreting ANRA scores, it is useful to know the specific behaviors that an applicant with a high ANRA score may be expected to exhibit. These behaviors, as rated by supervisors, were consistently found to be related to ANRA scores across different occupations requiring numerical reasoning. In general, candidates who score low on ANRA may find it challenging to demonstrate these behaviors effectively. Conversely, candidates who score high on ANRA are likely to display a higher level of competence in the following behaviors:

- Uses quantitative reasoning to solve job-related problems.
- Learns new numerical concepts quickly.
- Applies sound logic and reasoning when making decisions.
- Demonstrates knowledge of financial indicators and their implications.
- Breaks down information into essential parts or underlying principles.
- Readily integrates new information into problem-solving and decision-making processes.
- Recognizes differences and similarities in situations or events.
- Engages in a broad analysis of relevant information before making decisions.
- Probes deeply to understand the root causes of problems.
- Reviews financial statements, sales reports, and/or other financial data when planning.
- Accurately assesses the financial value of things (e.g., worth of assets) or people (e.g., credit worthiness).
Human resource professionals who use ANRA should document and examine the relationship between applicants' scores and their subsequent performance on the job. Using locally obtained criterion-related validity information provides the best foundation for interpreting scores and most effectively differentiating examinees who are likely to be successful from those who are not. Pearson does not establish or recommend a passing score for ANRA.
Legal Considerations
Governmental and professional regulations cover the use of all personnel selection procedures. Relevant source documents that the user may wish to consult include the Standards for Educational and Psychological Testing (AERA et al., 1999); the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 2003); and the federal Uniform Guidelines on Employee Selection Procedures (Equal Employment Opportunity Commission, 1978). For an overview of the statutes and types of legal proceedings that influence an organization's equal employment opportunity obligations, the user is referred to Cascio and Aguinis (2005) or the U.S. Department of Labor's (1999) Testing and Assessment: An Employer's Guide to Good Practices.
Evidence that ANRA (or any employment assessment tool) is equally predictive for protected subgroups, as defined by the Equal Employment Opportunity Commission, will assist in demonstrating the fairness of the test. For example, based on their review of 22 cases in U.S. Appellate and District Courts involving cognitive ability testing in class-action suits, Shoenfelt and Pedigo (2005, p. 6) reported that organizations that utilize professionally developed standardized cognitive ability tests that are validated and that set cutoff scores supported by the validation study data are likely to fare well in court.
References
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

American Institute of Certified Public Accountants (AICPA). (1999). Broad business perspective competencies. Retrieved February 27, 2006, from
http://www.aicpa.org/edu/bbfin.htm
Americans With Disabilities Act of 1990, Titles I & V (Pub. L. 101-336). United States Code, Volume 42, Sections 12101–12213.

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.

Brannon, E. M. (2002). The development of ordinal numerical knowledge in infancy. Cognition, 83, 223–240.

Cascio, W. F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice Hall.

Cascio, W. F., & Aguinis, H. (2005). Applied psychology in human resource management (6th ed.). Upper Saddle River, NJ: Prentice Hall.

Civil Rights Act of 1991. 102nd Congress, 1st Session, H.R. 1. Retrieved August 4, 2006, from http://usinfo.state.gov/usa/infousa/laws/majorlaw/civil91.htm

Cohen, B. H. (1996). Explaining psychological statistics. Pacific Grove, CA: Brooks & Cole.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.

Equal Employment Opportunity Commission. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38295–38309.

Facione, P. A. (2006). Critical thinking: What it is and why it counts: 2006 update. Retrieved July 28, 2006, from
http://www.insightassessment.com/pdf_files/what&why2006.pdf
Feigenson, L., Dehaene, S., & Spelke, E. (2004). Core systems of number. Trends in Cognitive Sciences, 8, 307–314.

Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53, 449–455.

Hill, W. H. (1959). Review of Watson-Glaser Critical Thinking Appraisal. In O. K. Buros (Ed.), The fifth mental measurements yearbook. Lincoln: University of Nebraska Press.
Hunt, E. (1995). Will we be smart enough? New York: Russell Sage Foundation.

Kealy, B. T., Holland, J., & Watson, M. (2005). Preliminary evidence on the association between critical thinking and performance in principles of accounting. Issues in Accounting Education, 20(1), 33–47.

Kosonen, P., & Winne, P. H. (1995). Effects of teaching statistical laws on reasoning about everyday problems. Journal of Educational Psychology, 87, 33–46.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575.

National Education Goals Panel. (1991). The national education goals report. Washington, DC: U.S. Government Printing Office.

Nijenhuis, J., & Flier, H. (2005). Immigrant-majority group differences on work-related measures: The case for cognitive complexity. Personality and Individual Differences, 38, 1213–1221.

Nisbett, R. E. (Ed.). (1993). Rules for reasoning. Hillsdale, NJ: Lawrence Erlbaum.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

O*Net OnLine. (2005). Skill searches for: Mathematics, Critical Thinking. Occupational Information Network: O*Net OnLine. Retrieved July 17, 2006, from http://online.onetcenter.org/skills/result?s=2.A.2.a&s=2.A.1.e&g=Go

Paul, R., & Nosich, G. M. (2004). A model for the national assessment of higher order thinking. Retrieved July 13, 2006, from
http://www.criticalthinking.org/resources/articles/a-model-nal-assessmenthot.shtml
Perkins, D. N., & Grotzer, T. A. (1997). Teaching intelligence. American Psychologist, 52, 1125–1133.

Rust, J. (2002). Rust Advanced Numerical Reasoning Appraisal manual. London: The Psychological Corporation.

Shoenfelt, E. L., & Pedigo, L. C. (2005, April). A review of court decisions on cognitive ability testing, 1992–2004. Poster presented at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA.

Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.

Spelke, E. S. (2005). Sex differences in intrinsic aptitude for mathematics and science? A critical review. American Psychologist, 60, 958-958.

Starkey, P. (1992). The early development of numerical reasoning. Cognition, 43, 93–126.

U.S. Department of Labor. (1999). Testing and assessment: An employer's guide to good practices. Washington, DC: Author.

Vandenbroucke, J. P. (1998). Clinical investigation in the 20th century: The ascendancy of numerical reasoning. The Lancet, 175(352), 1216.
Watson, G. B., & Glaser, E. M. (2006). Watson-Glaser Critical Thinking Appraisal Short Form manual. San Antonio, TX: Pearson.

Wynn, K., Bloom, P., & Chiang, W. (2002). Enumeration of collective entities by 5-month-old infants. Cognition, 83, B55–B62.
N = 88, Mean = 20.1, SD = 5.6
Industry Characteristics: Financial Services/Banking/Insurance = 38.6%; Government/Public Service/Defense = 19.3%; Professional Business Services/Consulting = 10.2%; Publishing/Printing = 12.5%; Real Estate = 2.3%; Retail/Wholesale = 1.1%; Other (unspecified) = 14.8%

N = 200, Mean = 22.1, SD = 6.4
Industry Characteristics: Financial Services/Banking/Insurance = 23.0%; Government/Public Service/Defense = 36.5%; Professional Business Services/Consulting = 12.5%; Publishing/Printing = 7.5%; Real Estate = 1.0%; Retail/Wholesale = 1.5%; Other (unspecified) = 16.5%
Appendix B
ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group
Table B.1 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Position Level

Executives/Directors (Raw Score Mean = 21.3, SD = 6.0, N = 91)
Percentile Rank: 99 99 96 92 87 81 76 67 58 54 48 43 40 34 30 26 23 18 13 9 7 6 5 4 3 2 1 1 1 1 1 1 1
T Score: 68 66 65 63 62 60 58 57 55 54 52 50 49 47 46 44 43 41 39 38 36 35 33 31 30 28 27 25 23 22 20 19 17
Table B.2 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores for Employees in Various Financial Occupations (see Table A.1 for a list of the occupations in this norm group)

ANRA Total Raw Score: 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
T Score: 68 66 65 63 62 60 58 57 55 54 52 50 49 47 46 44 43 41 39 38 36 35 33 31 30 28 27 25 23 22 20 19 17
Raw Score Mean = 21.9, Raw Score SD = 6.4, N = 198
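When reproducing norm tables like the one above, a linear T score is computed as T = 50 + 10 × (raw − mean) / SD, using the norm group's own mean and standard deviation. The sketch below uses the mean (21.9) and SD (6.4) reported for this norm group; note that published T scores may be normalized rather than strictly linear, so computed values need not match the tabled values exactly.

```python
def linear_t_score(raw, mean, sd):
    """Linear T score: the norm-group mean maps to 50; each SD maps to 10 points."""
    return 50 + 10 * (raw - mean) / sd

# Norm-group statistics reported alongside Table B.2.
MEAN, SD = 21.9, 6.4
t20 = linear_t_score(20, MEAN, SD)  # T score for a raw score of 20
```

For a raw score of 20 this yields roughly 47, close to (but not necessarily identical with) the corresponding tabled value.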
Appendix C
Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group
Table C.1 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks by Position Level

Combined T Score: 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110
Executives/Directors: 99 99 99 99 99 98 98 97 95 93 92 91 88 85 82 82 80 77 75 73 71 69 66 63 62 61
Managers: 99 99 99 99 99 99 99 99 98 97 97 96 94 91 89 88 87 86 86 84 81 79 78 78 77 75
Table C.1 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks by Position Level (continued)

Combined T Score: 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75
Executives/Directors: 60 58 56 55 53 51 48 46 43 42 41 40 38 37 34 31 29 27 27 25 22 21 20 20 19 18 17 16 16 16 16 14 11 10 9
Professionals/Individual Contributors: 73 71 70 67 64 61 60 57 55 54 52 51 48 44 42 39 36 34 32 29 27 26 25 24 23 21 19 18 16 15 14 14 13 12 12
Managers: 51 48 46 44 42 41 39 37 36 35 33 31 30 29 28 27 26 25 23 22 22 21 19 18 17 17 16 14 13 12 11 10 9 8 8
Table C.1 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks by Position Level (continued)

Combined T Score: 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50
Executives/Directors (N = 91): 8 7 6 6 6 5 4 4 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 1 1
Professionals/Individual Contributors (N = 88): 11 9 7 5 3 3 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Managers (N = 200): 8 8 7 7 7 7 6 5 5 5 4 4 3 3 2 2 2 2 2 <=1 <=1 <=1 <=1 <=1 <=1
Table C.2 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks for Employees in Various Financial Occupations (See Table A.1 for a list of the occupations in this group.)

Combined T Score: 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96
Table C.2 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks for Employees in Various Financial Occupations (continued)

Combined T Score: 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60
Percentile Rank: 35 34 32 31 29 28 27 27 26 25 24 22 20 19 18 17 16 14 12 10 9 8 7 5 4 4 4 4 3 3 2 2 1 1 1 1
Table C.2 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks for Employees in Various Financial Occupations (continued)

Combined T Score: 59 58 57 56 55 54 53 52 51 50
Percentile Rank: 1 1 1 1 1 1 1 1 1 1
N = 198