
Advanced Numerical Reasoning Appraisal™ (ANRA)

Manual
John Rust

888-298-6227 TalentLens.com


Copyright 2006 by NCS Pearson, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the copyright owner. The Pearson and TalentLens logos, and Advanced Numerical Reasoning Appraisal are trademarks, in the U.S. and/or other countries, of Pearson Education, Inc. or its affiliate(s). Portions of this work were previously published. Printed in the United States of America.

Table of Contents
Acknowledgements
Chapter 1  Introduction .... 1
  Numerical Reasoning and Critical Thinking .... 2
Chapter 2  History and Development of ANRA .... 4
  Description of the Test .... 4
  Adapting RANRA .... 4
  Development of RANRA .... 5
Chapter 3  Directions for Administration .... 6
  General Information .... 6
  Preparing for Administration .... 6
  Testing Conditions .... 7
  Answering Questions .... 7
  Administering the Test .... 7
  Scoring and Reporting .... 8
  Test Security .... 8
  Concluding Test Administration .... 8
  Administering ANRA and Watson-Glaser Critical Thinking Appraisal in a Single Testing Session .... 9
  Accommodating Examinees with Disabilities .... 9
Chapter 4  ANRA Norms Development .... 10
  Using ANRA as a Norm- or Criterion-Referenced Test .... 10
  Using Norms to Interpret Scores .... 11
  Converting Raw Scores to Percentile Ranks .... 12
  Using Standard Scores to Interpret Performance .... 12
    Converting z Scores to T Scores .... 13
  Using ANRA and Watson-Glaser Critical Thinking Appraisal Together .... 14


Chapter 5  Evidence of Reliability .... 15
  Reliability Coefficients and Standard Error of Measurement .... 15
  RANRA Reliability Studies .... 17
  ANRA Reliability Studies .... 17
    Evidence of Internal Consistency .... 18
    Evidence of Test-Retest Stability .... 20
Chapter 6  Evidence of Validity .... 20
  Face Validity .... 20
  Evidence Based on Test Content .... 21
  Evidence Based on Test-Criterion Relationships .... 22
  Correlations Between ANRA Test 1 and Test 2 .... 25
  Evidence of Convergent and Discriminant Validity .... 25
    Correlations Between ANRA and Watson-Glaser Critical Thinking Appraisal–Short Form .... 25
    Correlations Between ANRA and Other Tests .... 26
Chapter 7  Using ANRA as an Employment Selection Tool .... 27
  Employment Selection .... 27
  Using ANRA in Making a Hiring Decision .... 27
  Differences in Reading Ability, Including the Use of English as a Second Language .... 29
  Using ANRA as a Guide for Training, Learning, and Education .... 29
  Fairness in Selection Testing .... 30
    Legal Considerations .... 30
    Group Differences and Adverse Impact .... 30
    Monitoring the Selection System .... 31
References .... 32
Appendices
  Appendix A  Description of the Normative Sample .... 35
  Appendix B  ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group .... 37
  Appendix C  Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group .... 39


Tables

Table 5.1  Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 85) .... 17
Table 5.2  ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha) .... 18
Table 5.3  ANRA Test-Retest Stability (N = 73) .... 19
Table 6.1  Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels .... 24
Table 6.2  Correlations Between Watson-Glaser Critical Thinking Appraisal–Short Form and ANRA (N = 452) .... 25
Table 6.3  Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment–Numerical Ability (DAT for PCA–NA) .... 26

Figure

Figure 4.1  The Relationship of Percentiles to T Scores .... 14


Acknowledgements
Pearson's Talent Assessment group would like to recognize and thank Professor John Rust, Director of the Psychometrics Center at the University of Cambridge, United Kingdom, for his seminal work in developing the Rust Advanced Numerical Reasoning Appraisal (RANRA). This manual details our adaptation of RANRA for use in the United States: the Advanced Numerical Reasoning Appraisal (ANRA).

We are indebted to numerous professionals and organizations for their assistance during several phases of our work: project design, data collection, statistical data analyses, editing, and publication. We acknowledge the efforts of Julia Kearney, Sampling Projects Coordinator; Jane McDonald, Sampling Recruiter; Terri Garrard, Study Manager; David Quintero, Clinical Handscoring Supervisor; Hector Solis, Sampling Manager; and Victoria Locke, Director, Field Research, in driving the data collection activities. Nishidha Goel helped to collate and prepare the data.

We thank Zhiming Yang, PhD, Psychometrician, and JJ Zhu, PhD, Director of Psychometrics, Clinical Products. Dr. Yang's technical expertise in analyzing the data and Dr. Zhu's psychometric leadership ensured the high level of psychometric integrity of the results.

Our thanks also go to Toby Mahan and Troy Beehler, Project Managers, for diligently managing the logistics of this project. Toby and Troy worked with several team members from the Technology Products Group, Pearson, to ensure the high quality and accuracy of the computer interface. These dedicated individuals included Paula Oles, Manager, Software Quality Assurance; Christina McCumber, Software Quality Assurance Analyst; Matt Morris, Manager, System Development; Maurya Buchanan, Technical Writer; and Alan Anderson, Director, Technology Products Group. Dawn Dunleavy, Senior Managing Editor; Konstantin Tikhonov, Project Editor; and Marion Jones, Director, Mathematics, provided editorial guidance. Mark Cooley assisted with the design of the cover.

Finally, we wish to acknowledge the leadership, guidance, support, and commitment of the following people through all the phases of this project: Jenifer Kihm, PhD, Senior Product Line Manager, Talent Assessment; John Toomey, Director, Talent Assessment; Paul McKeown, International Product Development Director; Judy Chartrand, PhD, Director, Test Development; Gene Bowles, Vice President, Publishing and Technology; Larry Weiss, PhD, Vice President, Psychological Assessment Products Group; and Aurelio Prifitera, PhD, Group President and CEO of Clinical Assessment/Worldwide.

Kingsley C. Ejiogu, PhD, Research Director
John Trent, MS, Research Director
Mark Rose, PhD, Research Director


Chapter 1

Introduction
The Advanced Numerical Reasoning Appraisal (ANRA) measures the ability to recognize, understand, and apply mathematical and statistical reasoning. Specifically, ANRA measures numerical reasoning abilities that involve deduction, interpretation, and evaluation. Numerical reasoning, as measured by ANRA, is operationally defined as the ability to correctly perform the domain of tasks represented by two sets of items: Comparison of Quantities and Sufficiency of Information. Both require the use of analytical skills rather than straightforward computational skills. The key attribute ANRA measures is an individual's ability to apply numerical reasoning to everyday problem solving in professional and business settings.

Starkey (1992) describes numerical reasoning as comprising "a set of abilities that are used to operate upon or mentally manipulate representations of numerosity" (p. 94). Research suggests that numerical reasoning abilities exist even in infancy, before children begin to receive explicit instruction in mathematics in school (Brannon, 2002; Feigenson, Dehaene, & Spelke, 2004; Spelke, 2005; Starkey, 1992; Wynn, Bloom, & Chiang, 2002). As Spelke (2005) observed, children harness these core abilities when they learn mathematics, and adults use the core abilities to engage in mathematical and scientific thinking. Numerical reasoning skill is the foundation of all other numerical ability (Rust, 2002). This skill enables individuals to learn how to evaluate situations, select and apply problem-solving strategies, draw logical conclusions from numerical data, describe and develop solutions, and recognize when and how to apply those solutions. Eventually, one is able to reflect on solutions to problems and determine whether they make sense.

The nature of work is changing significantly, and there is an increased demand for a new kind of worker: the knowledge worker (Hunt, 1995). As Facione (2006) observed, although the ability to think critically and make sound decisions does not absolutely guarantee a life of happiness and economic success, having this ability equips an individual to improve his or her future and contribute to society. As the Internet has transformed home life and leisure time, people have been deluged with data of ever-increasing complexity. They must select, interpret, digest, evaluate, learn, and apply information. Employers are typically interested in tests that measure candidates' ability to apply what they have learned constructively and critically, rather than by rote. A person can be trained or educated to


engage in numerical reasoning; as a result, tests that measure the ability to use mathematical reasoning within the context of work have an important function in career development. Such tests enable an organization to identify candidates who may need to improve their skills to enhance their work effectiveness and career success.

Numerical Reasoning and Critical Thinking


In a skills search of the O*Net OnLine database for Mathematics (defined by O*Net OnLine as "using mathematics to solve problems") and Critical Thinking (defined by O*Net OnLine as "using logic and reasoning to identify the strengths and weaknesses of alternative solutions, conclusions, or approaches to problems"), both of these skills were rated as Very Important for as many as 99 occupations (accountant, actuary, auditor, financial analyst, government service executive, management analyst, occupational health and safety specialist, etc.). Numerical reasoning and critical thinking are essential parts of the cognitive complexity that is a basic factor for understanding group differences in work performance (Nijenhuis & Flier, 2005).

Both numerical reasoning and critical thinking are higher-order thinking skills: "fundamental skills that are essential to being a responsible, decision-making member of the workplace" (Paul & Nosich, 2004, p. 5). Paul and Nosich contrasted these higher-order thinking skills with such lower-order thinking skills as rote memorization and recall, and they noted that critical thinking can be applied to any subject matter and any situation where reasoning is relevant. Such a subject matter or situation could range from accounting (Kealy, Holland, & Watson, 2005; American Institute of Certified Public Accountants, 1999), through medicine (Vandenbroucke, 1998), to truck driving (Nijenhuis & Flier, 2005). As Paul and Nosich (2004) stated, "in any context where we are thinking well, we are thinking critically."

The enhancement of critical thinking in U.S. college students is a national priority (National Educational Goals Panel, 1991). In a paper commissioned by the United States Department of Education, Paul and Nosich (2004) highlighted what the National Council for Excellence in Critical Thinking Instruction regarded as a basic principle of critical thinking instruction as applied to subject-matter teaching: "to achieve knowledge in any domain, it is essential to think critically" (Paul & Nosich, p. 33).

Critical thinking is the skill required to increase the probability of desirable outcomes in our lives, such as making the right career choice, using money wisely, or planning our future. Such critical thinking is reasoned, purposeful, and goal directed. At the cognitive level, it involves solving problems, formulating inferences, calculating likely outcomes, and making decisions.


Once people have developed this critical thinking skill, they are able to apply it in a wide variety of circumstances. Critical thinking can involve proper language use, applied logic, and practical mathematics.

Because ANRA items require higher-order numerical reasoning skills rather than rote calculation to solve, using the Watson-Glaser Critical Thinking Appraisal (a reliable and valid test of verbal critical thinking) in conjunction with ANRA provides a demanding, high-level measurement of numerical reasoning and verbal critical thinking skills, respectively. These two skills are important when recruiting in the competitive talent assessment market. In response to requests from Watson-Glaser Critical Thinking Appraisal customers in the United Kingdom, The Psychological Corporation (now Pearson) in the UK developed the Rust Advanced Numerical Reasoning Appraisal (RANRA) in 2000 as a companion numerical reasoning test for the Watson-Glaser Critical Thinking Appraisal. In 2006, Pearson adapted RANRA to enhance the suitability and applicability of the test in the United States. This manual contains detailed information on the U.S. adaptation: ANRA.


Chapter 2

History and Development of ANRA

Description of the Test


ANRA consists of a set of two tests: Test 1, Comparison of Quantities, and Test 2, Sufficiency of Information. The candidate must apply his or her numerical reasoning skills to decisions that reflect the wide variety of numerical estimation and analytic tasks frequently encountered in everyday situations at work or in a learning environment. The two ANRA tests are designed to measure different, but interdependent, aspects of numerical reasoning. The tests require the candidate to consider alternatives (either by comparing quantities or by judging whether information is sufficient) in relation to given problems. The examinee's task is to study each problem and evaluate the appropriateness or validity of the alternatives. The maximum ANRA total raw score is 32.

Because ANRA is intended as a test of numerical reasoning power rather than speed, there is no rigid time limit for taking the test. Candidates should be given as much time as they reasonably need to finish. An individual typically completes the test in about 45 minutes, and about 90% of the 452 individuals in the normative group, who were employed in professional, management, and higher-level positions, completed the test within 75 minutes.

Adapting RANRA
The Rust Advanced Numerical Reasoning Appraisal (RANRA) was adapted to reflect U.S. English and U.S. measurement units. Because RANRA measures reasoning more than computation, only the measurement units were changed and the original numbers were kept, except in cases where keeping them affected the realism of the situation. For example, 82 kilograms was changed to 82 pounds, even though 82 kg is approximately 180.8 lb. Similarly, 5,000 British pounds sterling was changed to 5,000 U.S. dollars, even though 5,000 British pounds sterling does not equal 5,000 U.S. dollars. ANRA contains the original 32 RANRA items plus additional items included for continuous test improvement purposes. All the items were reviewed by a group of 16 individuals: researchers in test development, financial analysts, business development professionals, industrial/organizational psychologists, and editors in test publishing. Sentence construction was modified in some items, based on input from the American reviewers.


Development of RANRA
In developing RANRA, Rust (2002) first conducted a conceptual analysis of the role of critical thinking in the use of mathematics. Through this conceptual analysis, he identified the two subdomains of comparison of quantities and sufficiency of information as the key concepts for an assessment of mathematical reasoning. Rust then constructed 80 items, had a panel of educators and psychologists evaluate and modify them, and generated the pilot version of RANRA. This pilot version was administered to 76 students and staff from diverse subject backgrounds within the University of London. The data were subjected to detailed analysis at the item level. Distractor analysis led to the modification of some items. Item-difficulty values were calculated for each item, based on the proportion of examinees passing the item. A discrimination index was also calculated, and items shown to be measuring a common quality in numerical reasoning were identified and retained. This approach led to the development of the 32-item RANRA.
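For readers who want to see these item statistics concretely, the short Python sketch below computes item difficulty (the proportion of examinees passing an item) and a simple discrimination index (the corrected item-total correlation) for a small, invented response matrix. The data and variable names are hypothetical illustrations only; they are not RANRA item data or the analysis Rust performed.

```python
# Illustrative item analysis: difficulty (p-values) and discrimination
# (corrected item-total correlation). The data below are invented, not RANRA data.
from statistics import mean, stdev

# Rows = examinees, columns = items; 1 = correct, 0 = incorrect (hypothetical).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

n_items = len(responses[0])
for item in range(n_items):
    item_scores = [row[item] for row in responses]
    # Difficulty: proportion of examinees passing the item.
    difficulty = mean(item_scores)
    # Discrimination: correlation of the item with the total score
    # computed from the remaining items (corrected item-total).
    rest_totals = [sum(row) - row[item] for row in responses]
    discrimination = pearson_r(item_scores, rest_totals)
    print(f"Item {item + 1}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:.2f}")
```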


Chapter 3

Directions for Administration


General Information
ANRA is administered through the online testing platform at TalentLens.com, an Internet-based testing system designed by Pearson for the administration, scoring, and reporting of professional assessments. Instructions for administrators on how to order and access the test online, and for accessing ANRA interpretive reports, are provided at TalentLens.com. After a candidate has taken ANRA online, the test administrator can use the link Pearson provides to review the candidate's results in an interpretive report.

Preparing for Administration


Being thoroughly prepared before administering the test results in a more efficient administration session. Test administrators should take ANRA themselves, following the directions, before administering the test. Candidates are not allowed to use calculators or similar calculation devices while completing the test. Test administrators should provide candidates with pencils, an eraser, and a sheet of paper for their calculations if needed.

Test administration must comply with the code of practice of the testing organization, applicable government regulations, and the recommendations of the test publisher. Candidates should be informed before the testing session about the nature of the assessment, why the test is being used, the conditions under which they will be tested, and the nature of any feedback they will receive. Test administrators need to assure candidates that their test results will remain confidential.

The test administrator must obtain informed consent from the candidate before testing. The informed consent is a written statement, signed by the candidate, that explains the type of test to be administered, the purpose of the test, and who will have access to the test data. It is the responsibility of the test user to ensure that candidates understand the testing procedure. The test administrator should also ensure that all relevant background information from the candidate is collected and verified (e.g., name, gender, educational level, current employment, occupational history, and so on).


Testing Conditions
The test administrator has a significant responsibility to ensure that the conditions under which the test is taken do not introduce undesirable influences on the test performance of candidates. Such influences can either inflate or reduce candidates' test scores. Poor administration of a test undermines the value of test scores and makes accurate interpretation of results very difficult, if not impossible. It is important to ensure that the test is administered in a quiet, well-lit room. The following conditions are necessary for accurate scores and for maintaining the cooperation of the examinee: good lighting, comfortable seating, adequate desk or table space, comfortable positioning of the computer screen, keyboard, and mouse, and freedom from noise and other distractions. Interruptions and distractions from outside should be kept to a minimum, if not eliminated.

Answering Questions
The test administrator may answer examinees' questions about the test before giving the signal to begin. To maintain standard testing conditions, answer such questions by re-reading the appropriate section of these directions. Do not volunteer new explanations or examples. The test administrator is responsible for ensuring that examinees understand the correct way to indicate their answers and what is required of the examinees. The question period should never be rushed or omitted. If any examinees have routine questions after the testing has started, try to answer them without disturbing the other examinees. However, questions about the test items should be handled by telling the examinee to do his or her best.

Administering the Test


After the examinee is seated at the computer and the initial instruction screen for ANRA appears, say, "The on-screen directions will take you through the entire process, which begins with some demographic questions. After you have completed these questions, the test will begin. You will have as much time as you reasonably need to complete the test items. The test ends with a few additional demographic questions. Do you have any questions before starting the test?" Answer any questions, and then say, "Please begin the test."


Once the examinee clicks the Start Your Test button, administration begins with the first page of questions. The examinee may review test items at the end of the test. Allow examinees as much time as they reasonably need to complete the test. Average completion time is about 45 minutes, and about 90% of candidates finish the test within 75 minutes. If an examinee's computer develops technical problems during testing, the test administrator should move the examinee to another suitable computer location. If the technical problems cannot be solved by moving to another computer location, the administrator should contact Pearson's Technical Support at 1-888-298-6227 for assistance.

Scoring and Reporting


Scoring is automatic, and the report is typically available within a minute after the test is completed. A link to the report will be available on the online testing platform at TalentLens.com. Adobe Acrobat Reader is required to open the report. The test administrator may view, print, or save the candidate's report.

Test Security
ANRA scores are confidential and should be stored in a secure location accessible only to authorized individuals. It is unethical and poor test practice to allow test-score access to individuals who do not have a legitimate need for the information. Storing test scores in a locked cabinet or password-protected file that can be accessed only by designated test administrators will help ensure the security of the test scores. The security of testing materials (e.g., access to online tests) and protection of copyright must also be maintained by authorized individuals. Avoid disclosure of test access information such as usernames or passwords, and administer ANRA only in proctored environments. All computer stations used in administering ANRA must be in locations that can be easily supervised and that have an adequate level of security.

Concluding Test Administration


At the end of the testing session, thank the candidates for their participation and check the computer stations to ensure that the test is closed. ANRA can be a demanding test for some candidates. It may be constructive to clarify what part the test plays within the context of the selection or assessment procedures and to reassure candidates about the confidentiality of their test scores.


Administering ANRA and Watson-Glaser Critical Thinking Appraisal in a Single Testing Session
When administering the ANRA and the Watson-Glaser in a single testing session, administer the Watson-Glaser first. Just as ANRA is intended as a test of numerical reasoning power rather than speed, the Watson-Glaser is intended as a test of critical thinking power rather than speed. Both tests are untimed; administration of ANRA and the Watson-Glaser Short Form in one session should take about 1 hour and 45 minutes.

Accommodating Examinees With Disabilities


The Americans with Disabilities Act (ADA) of 1990 requires an employer to reasonably accommodate the known disability of a qualified applicant, provided such accommodation would not cause an undue hardship to the operation of the employer's business. The test administrator should provide reasonable accommodations to enable candidates with special needs to take the test comfortably. Reasonable accommodations may include, but are not limited to, modifications to the test environment (e.g., high desks) and test medium (e.g., having a reader read questions to the examinee, or increasing the font size of questions) (Society for Industrial and Organizational Psychology, 2003). In situations where an examinee's disability is not likely to impair his or her job performance but may hinder the examinee's performance on ANRA, the organization may want to consider waiving the test or de-emphasizing the score in favor of other application criteria. Interpretive data as to whether scores on ANRA are comparable for examinees who are provided reasonable accommodations are not available at this time, due to the small number of examinees who have requested such accommodations.


Chapter 4

ANRA Norms Development


Norms provide a basis for evaluating an individual's score relative to the scores of other individuals who took the same test. Norms allow for the conversion of raw scores to more useful comparative scores, such as percentile ranks. Typically, norms are constructed from the scores of a large sample of other individuals who took the test under similar conditions. This group of individuals is called the norm group. The characteristics of the sample used for preparing norms are critical in determining the usefulness of those norms.

For such purposes as selecting from among applicants to fill a particular job, normative information derived from a specific, relevant, well-defined group might be most useful. However, the composition of a sample of job applicants is influenced by a variety of situational factors, including the job demands and local labor market conditions. Because such factors can vary across jobs, locations, and time, the limitations on the usefulness of any set of published norms should be recognized. When a test is used to make employment decisions, the most appropriate norm group is one that is representative of those who will be taking the test in the local situation. It is best, whenever possible, to prepare local norms by accumulating the test scores of applicants, trainees, or employees.

One of the factors that must be considered in establishing norms is sample size. Data from small samples tend to be unstable, and the presentation of percentile ranks for all possible scores is imprecise. As a result, the use of in-house norms is recommended only when the sample is sufficiently large (about 100 or more people). Until a sufficient and representative number of cases has been collected, the test user should consider norms based on other similar groups rather than local data from a small sample. In the absence of adequate local norms, the norms provided in Appendixes B and C should be used to guide the interpretation of scores.

Using ANRA as a Norm- or Criterion-Referenced Test


ANRA may be used as a norm-referenced or as a criterion-referenced instrument. A norm-referenced test enables a human resource professional to interpret an individual's test performance in comparison to a particular normative group. An individual's performance on a criterion-referenced instrument can only indicate whether or not that individual meets certain predefined criteria.


It is appropriate to use ANRA as a norm-referenced instrument in the process of employment selection. For optimal results in such decisions, the overall total score, rather than the subtest scores, should be used. Subtest scores represent fewer items and, therefore, are less stable than the total score. However, as a criterion-referenced measure, it is feasible to use subtest scores to analyze the numerical reasoning abilities of a class or larger group and to determine the types of numerical reasoning or critical thinking training that may be most appropriate.

In norm-referenced situations, raw scores need to be converted before they can be compared. Though raw scores may be used to rank candidates in order of performance, little can be inferred from raw scores alone. There are two main reasons for this. First, raw scores cannot be treated as having equal intervals. For example, it would be incorrect to assume that the difference between raw scores of, say, 20 and 21 has the same significance as the difference between raw scores of 30 and 31. Second, ANRA raw scores may not be normally distributed and, hence, may not meet the assumptions of the parametric statistics required for the proper evaluation of validity.

Using Norms to Interpret Scores


The ANRA norms presented in Appendix B and Appendix C were derived from data collected from February 2006 through June 2006 from 452 adults in a variety of employment settings. The tables in Appendix B (Tables B.1 and B.2) show the ANRA total raw scores with corresponding percentile ranks and T scores for the identified norm groups. When using the norms tables in Appendix B, look for a group that is similar to the individual or group tested. For example, you would compare the test score of a person who applied for a Manager position with norms derived from the scores of other managers. When using the norms in Appendix B to interpret candidates' scores, keep in mind that norms are affected by the composition of the groups that participated in the normative study. Therefore, it is important to examine the specific position level and occupational characteristics of a norm group.

By comparing an individual's raw score to the data in a norms table, it is possible to determine the percentile rank corresponding to that score. The percentile rank indicates an individual's relative position in the norm group. Percentiles should not be confused with percentage scores, which represent the percentage of correct items. Percentiles are derived scores that are expressed in terms of the percentage of people in the norm group scoring equal to or below a given raw score. Percentiles have the advantage of being readily understood and universally applicable. However, although percentiles are useful for expressing an examinee's performance relative to other candidates, they have limitations. For example, percentile ranks do not have equal intervals.

While percentiles indicate the relative position of each candidate in relation to the normative sample, they do not show the amount of difference between scores. In a normal distribution of scores, percentile ranks tend to cluster around the 50th percentile. This clustering affects scores in the average range the most, because a difference of one or two raw score points may produce a large change in the percentile rank. Extreme scores are affected less; a change of one or two raw score points at the extremes typically does not produce a large change in percentile ranks. These factors should be considered when interpreting percentile ranks.

Converting Raw Scores to Percentile Ranks


To find the percentile rank of a candidate's raw score, locate the ANRA total raw score in Table B.1 or B.2 and read the corresponding percentile rank from the selected norm group column. For example, if a person applying for a job as a Director had a score of 25 on ANRA, it is appropriate to use the Executives/Directors norms in Table B.1 for comparison. In this case, the percentile rank corresponding to a raw score of 25 is 67. This percentile rank indicates that about 67% of the people in the norm group scored lower than or equal to 25 on ANRA, and about 33% scored higher than 25. The lowest raw score lies at the 1st percentile, the median raw score falls at the 50th percentile, and the highest raw score lies at the 99th percentile.

Each group's size (N), raw score mean, and raw score standard deviation (SD) are shown at the bottom of the norms tables. The group raw score mean, or average, is calculated by summing the raw scores and dividing the sum by the total number of examinees. The standard deviation indicates the amount of variation in a group of scores. In a normal distribution, approximately two-thirds (68.26%) of the scores fall within the range of 1 SD below the mean to 1 SD above the mean. These statistics are often used in describing a sample and setting cut scores. For example, a cut score may be set at one SD below the mean.

In compliance with the Civil Rights Act of 1991, Section 5 (a) (1), as amended, the norms provided in Appendix B and Appendix C combine data for males and females, and for white and minority candidates.
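As a minimal illustration of this lookup, the following Python sketch maps a raw score to a percentile rank using a small norms table. The table values are invented placeholders (except that raw score 25 is set to the 67th percentile to mirror the worked example above); they are not the published Appendix B data.

```python
# Minimal sketch of a raw-score-to-percentile lookup against a norms table.
# The table below is invented for illustration; it is NOT the published
# Appendix B data. Keys are ANRA total raw scores, values are percentile ranks.
EXECUTIVES_DIRECTORS_NORMS = {
    22: 53,
    23: 58,
    24: 62,
    25: 67,   # mirrors the worked example in the text: raw 25 -> 67th percentile
    26: 72,
}

def percentile_rank(raw_score: int, norms: dict[int, int]) -> int:
    """Return the percentile rank for a raw score from a norms table."""
    try:
        return norms[raw_score]
    except KeyError:
        raise ValueError(f"Raw score {raw_score} is not in this norms table")

print(percentile_rank(25, EXECUTIVES_DIRECTORS_NORMS))  # -> 67
```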

Using Standard Scores to Interpret Performance


Test results can be reported in many different formats. Examples of these formats include raw scores, percentiles, and various forms of standard scores. Standard scores express the score of each individual in terms of its distance from the mean. Examples of standard scores are z scores and T scores. Standard scores do not suffer from the drawbacks associated with percentiles. The advantage of percentiles is that they are readily understood and, therefore, immediately meaningful. As indicated above, however, there is a risk of percentiles being confused with


percentage scores, or of percentiles being interpreted as an interval scale. Standard scores avoid the unequal clustering of scores by adopting a scale based on standard deviation units. The basic type of standard score is the z score, which is a raw score converted to standard deviation units. Thus, a raw score that is 0.53 standard deviations below the mean score for the group receives a z score of -0.53. z scores generally fall in the -3.00 to +3.00 range. However, there are certain disadvantages in saying that a person has a score of -0.53 on a test. From the point of view of presentation, the use of decimal points and the negative sign is unappealing. Hence, transformation algorithms are available that give standard scores a more user-friendly form.

Converting z Scores to T Scores


To convert a z score to a T score, multiply the z score by 10 and add 50. Thus, a z score of -0.53 becomes a T score of 44.7, which is then rounded, as a matter of convention, to the nearest whole number, that is, 45. A set of T scores has a mean of 50 and a standard deviation of 10. Thus, a T score of 30 is two standard deviations below the mean, while a T score of 60 is one standard deviation above the mean. The T score transformation results in a scale that runs from 10 to 90, with each 10th interval coinciding with a standard deviation point. Appendix B shows ANRA T scores. Appendix C shows the sum of Watson-Glaser and ANRA T scores and their corresponding percentiles. Because the Watson-Glaser and ANRA do not measure identical constructs, their combined T scores must be derived by first transforming the separate Watson-Glaser and ANRA raw scores to their respective T scores, and then summing the T scores. Figure 4.1 illustrates the relationship between percentiles and T scores.


Figure 4.1  The Relationship of Percentiles to T Scores
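The following Python sketch shows the raw-score-to-z-score-to-T-score conversion described above. The raw score, mean, and SD values are invented for illustration and are not taken from the norms tables.

```python
# Sketch of the z score and T score conversions described in this chapter.
# The raw score, mean, and SD below are invented values for illustration.

def z_score(raw: float, mean: float, sd: float) -> float:
    """Express a raw score as a distance from the mean in SD units."""
    return (raw - mean) / sd

def t_score(z: float) -> int:
    """T = 10z + 50, rounded to the nearest whole number by convention."""
    return round(10 * z + 50)

raw, group_mean, group_sd = 18.0, 21.3, 6.2   # hypothetical values
z = z_score(raw, group_mean, group_sd)        # about -0.53
print(round(z, 2), t_score(z))                # -0.53 -> T of 45 (44.7 rounded)
```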

Using ANRA and Watson-Glaser Critical Thinking Appraisal Together


The ANRA and Watson-Glaser combined score covers a broader range of critical reasoning skills than would be obtained by the use of either test alone. Scores from ANRA and the Watson-Glaser can be combined by first converting each total raw score to a T score and then adding the two T scores together. The sum of the T scores can also be converted to percentile ranks. Appendix C (Tables C.1 and C.2) shows the percentile ranks of the sum of ANRA and Watson-Glaser Short Form T scores.

Another potential benefit of using ANRA and the Watson-Glaser together lies in the expected difference between scores on the two tests. This expected difference depends on the type of norm group to which the candidate belongs. Generally speaking, candidates in financial or scientific occupations are expected to score higher on ANRA than on the Watson-Glaser. On the other hand, managers, particularly in fields where critical thinking using language is a key skill, and employees in occupations that do not require a great deal of numeracy, are expected to perform better on the Watson-Glaser than on ANRA. By examining the difference between a candidate's Watson-Glaser and ANRA scores, the user can make appropriate development suggestions to the candidate.
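A minimal sketch of this combination step is shown below. The lookup tables stand in for the published norms tables and contain invented placeholder values (only the ANRA raw 25 to T 57 mapping mirrors the example used elsewhere in this manual).

```python
# Sketch of combining ANRA and Watson-Glaser scores via T scores.
# raw_to_t() stands in for a lookup against the published norms tables;
# the table values below are invented placeholders, not published data.

def raw_to_t(raw: int, table: dict[int, int]) -> int:
    """Look up the T score for a raw score in a (hypothetical) norms table."""
    return table[raw]

ANRA_T = {25: 57}            # placeholder: raw 25 -> T 57
WATSON_GLASER_T = {32: 54}   # placeholder value, not published data

combined = raw_to_t(25, ANRA_T) + raw_to_t(32, WATSON_GLASER_T)
print(combined)  # sum of T scores (111 here); Appendix C converts such sums
                 # to percentile ranks by norm group
```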


Chapter 5

Evidence of Reliability
The reliability of a measurement instrument refers to the accuracy, consistency, and precision of test scores across situations (Anastasi & Urbina, 1997). Test theory posits that a test score is an estimate of an individual's hypothetical true score, or the score an individual would receive if the test were perfectly reliable. In actual practice, however, some measurement error is to be expected. A reliable test has relatively small measurement error. The methods most commonly used to estimate test reliability are test-retest (the stability of test scores over time), alternate forms (the consistency of scores across alternate forms of a test), and internal consistency of the test items (e.g., Cronbach's alpha coefficient; Cronbach, 1970). Decisions about the form of reliability to be used in comparing tests depend on a consideration of the nature of the error involved in each form. Different types of error can operate at the same time, so reliability coefficients can be expected to differ across situations and across different groupings and samplings of respondents. An appropriate estimate of reliability can be obtained from a large, representative sample of the respondents to whom the test is generally administered.
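As an illustration of the internal consistency method, the following Python sketch computes Cronbach's alpha for a small, invented set of scored item responses; it is not ANRA data, only a demonstration of the formula.

```python
# Illustrative computation of Cronbach's alpha for a set of scored items.
# The response matrix is invented for illustration; it is not ANRA data.
from statistics import pvariance

# Rows = examinees, columns = items; 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(data[0])
    item_variances = [pvariance([row[i] for row in data]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

print(round(cronbach_alpha(responses), 2))  # about .67 for this toy data set
```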

Reliability Coefficients and Standard Error of Measurement


The reliability of a test is expressed as a correlation coefficient, which represents the consistency of scores that would be obtained if a test could be given an infinite number of times. Reliability coefficients are an estimate of the amount of error associated with test scores and can range from .00 to 1.00. The closer the reliability coefficient is to 1.00, the more reliable the test. A perfectly reliable test would have a reliability coefficient of 1.00 and no measurement error. A completely unreliable test would have a reliability coefficient of .00. The U.S. Department of Labor (1999) provides the following general guidelines for interpreting a reliability coefficient: above .89 is considered excellent, .80–.89 is good, .70–.79 is adequate, and below .70 may have limited applicability.


Repeated testing leads to some variation. Consequently, no single test event measures an examinee's actual ability with complete accuracy. Therefore, an estimate is needed of the amount of error present in a test score, or the amount that scores would probably vary if an examinee were tested repeatedly with the same test. This estimate of error is known as the standard error of measurement (SEM). The SEM decreases as the reliability of a test increases; a large SEM denotes less reliable measurement and less reliable scores. The standard error of measurement is calculated with the formula:
SEM = SD × √(1 - rxx)

In this formula, SEM represents the standard error of measurement, SD represents the standard deviation of the distribution of obtained scores, and rxx represents the reliability coefficient of the test (Cascio, 1991, formula 7-11). The SEM is a quantity that is added to and subtracted from an examinee's standard test score to create a confidence interval, or band of scores, around the obtained standard score. The confidence interval is a score range that, in all likelihood, includes the examinee's hypothetical true score, which represents the examinee's actual ability.

A true score is a theoretical score entirely free of error. Because the true score is a hypothetical value that can never be obtained (testing always involves some measurement error), the score obtained by an examinee on any test will vary somewhat from administration to administration. As a result, any obtained score is considered only an estimate of the examinee's true score. Approximately 68% of the time, the observed standard score will lie within +1.0 and -1.0 SEM of the true score; 95% of the time, the observed standard score will lie within +1.96 and -1.96 SEM of the true score; and 99% of the time, the observed standard score will lie within +2.58 and -2.58 SEM of the true score. Using the SEM means that standard scores are interpreted as bands or ranges of scores, rather than as precise points (Nunnally, 1978).

To illustrate the use of the SEM with an example, assume a director candidate obtained a total raw score of 25 on ANRA, with SEM = 2.32. From the information in Table B.1, the standard score (T score) for this candidate is 57. We can, therefore, infer that if this candidate were administered a large number of alternate forms of ANRA, 95% of this candidate's T scores would lie within the range from 57 - 1.96 x 2.32 ≈ 52 to 57 + 1.96 x 2.32 ≈ 62 T score points. We can further infer that the expected average of this person's T scores from a large number of alternate forms of ANRA would be 57.


Thinking in terms of score ranges serves as a check against overemphasizing small differences between scores. The SEM may be used to determine whether an individual's score is significantly different from a cut score, or whether the scores of two individuals differ significantly. One general rule of thumb is that the difference between two scores on the same test should not be interpreted as significant unless the difference is equal to at least twice the standard error of the difference (SED), where SED = SEM x √2 (Gulliksen, as cited in Cascio, 1991, p. 143).
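The Python sketch below reproduces these calculations: the SEM formula, the 95% confidence band from the worked example above, and the SED rule of thumb. The SD and alpha values are taken from Table 5.2 and the worked example in this chapter.

```python
# Sketch of the SEM, confidence interval, and SED calculations in this chapter.
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score: float, sem_value: float, z: float = 1.96):
    """Band of scores around an obtained score (z = 1.96 for 95%)."""
    return score - z * sem_value, score + z * sem_value

def sed(sem_value: float) -> float:
    """Standard error of the difference between two scores on the same test."""
    return sem_value * math.sqrt(2)

# Executives/Directors total score values from Table 5.2: SD = 6.0, alpha = .85.
print(round(sem(6.0, 0.85), 2))            # about 2.32
# Worked example from the text: T score 57 with SEM 2.32.
print(confidence_band(57, 2.32))           # about (52.5, 61.5), i.e., 52 to 62
print(round(sed(2.32), 2))                 # about 3.28
```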

RANRA Reliability Studies


Because ANRA is a U.S. adaptation of RANRA, the information on previous studies refers to RANRA. For the sample used in the initial development of RANRA in the United Kingdom (N = 1,546), Cronbach's alpha coefficient and split-half reliability were both .78 for the overall RANRA score (Rust, 2002). The reliability coefficients of RANRA for Test 1, Test 2, and the overall RANRA score are shown in Table 5.1.

Table 5.1 Coefficient Alpha, Odd-Even Split-Half Reliability, and Standard Error of Measurement (SEM) for RANRA (from Rust, 2002, p. 85)

                                      Alpha    Split-Half    SEM
Test 1: Comparison of Quantities       .63        .60        6.32
Test 2: Sufficiency of Information     .70        .71        5.39
RANRA Score                            .78        .78        4.69

The RANRA score reported in Table 5.1 is a T score transformed from the total raw score, while the standard error of measurement reported in the table was based on the split-half reliability (Rust, 2002).

ANRA Reliability Studies


Evidence of Internal Consistency
Cronbach's alpha and the standard error of measurement (SEM) were calculated for the sample used for the ANRA norm groups reported in this manual. The internal consistency reliability estimates for the ANRA total raw score and the ANRA subtests are shown in Table 5.2.


Table 5.2 ANRA Means, Standard Deviations (SD), Standard Errors of Measurement (SEM), and Internal Consistency Reliability Coefficients (Alpha)

ANRA Total Raw Score
Norm Group                                  N     Mean    SD     SEM    Alpha
Executives/Directors                        91    21.3    6.0    2.32    .85
Managers                                    88    20.1    5.6    2.38    .82
Professionals/Individual Contributors      200    22.1    6.4    2.22    .88
Employees in Financial Occupations         198    21.9    6.4    2.22    .88

ANRA Test 1: Comparison of Quantities
Norm Group                                  N     Mean    SD     SEM    Alpha
Executives/Directors                        91    10.9    3.4    1.63    .77
Managers                                    88    10.3    3.4    1.70    .75
Professionals/Individual Contributors      200    11.4    3.6    1.53    .82
Employees in Financial Occupations         198    11.3    3.5    1.57    .80

ANRA Test 2: Sufficiency of Information
Norm Group                                  N     Mean    SD     SEM    Alpha
Executives/Directors                        91    10.4    3.3    1.60    .75
Managers                                    88     9.9    2.9    1.67    .67
Professionals/Individual Contributors      200    10.7    3.3    1.62    .76
Employees in Financial Occupations         198    10.6    3.3    1.58    .77

The values in Table 5.2 show that the ANRA total raw score possesses good internal consistency reliability. The ANRA subtests showed lower internal consistency reliability estimates than the ANRA total raw score. Consequently, the ANRA total score, not the subtest scores, should be used for optimal hiring results.

Evidence of Test-Retest Stability


ANRA was administered on two separate occasions to determine the stability of performance on the test over time. A sample of 73 job incumbents representing various occupations and organizational levels took the test twice, with an average test-retest interval of two weeks. Test-retest stability was evaluated using the Pearson product-moment correlation of the standardized T scores from the first and second testing occasions. The test-retest correlation coefficient was corrected for the variability of the sample (Allen & Yen, 1979). Furthermore, the standard difference (i.e., effect size) was calculated as the mean score difference between the first and second testing occasions divided by the pooled standard deviation (Cohen, 1996, Formula 10.4). This difference (d), proposed by Cohen (1988), is useful as an index of the magnitude of the actual difference between two means. The corrected test-retest stability coefficient was .85, and the difference in mean scores between the first and second testings was small (d = 0.03). As the data in Table 5.3 indicate, ANRA demonstrates good test-retest stability over time.


Table 5.3 ANRA Test-Retest Stability (N = 73)

                             First Testing      Second Testing
                             Mean      SD       Mean      SD       r12    Corrected r12    Standard Difference (d)
ANRA Standardized T score    50.1      9.2      49.8      10.0     .82        .85                  0.03
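As a sketch of the effect-size computation reported in Table 5.3, the following Python snippet recovers the standard difference (d) from the summary statistics above. The underlying score-level data are not reproduced here, and the pooled-SD formula shown is one common convention.

```python
# Sketch of the standard difference (d) reported in Table 5.3, computed from
# the summary statistics above: d = mean difference / pooled SD.
import math

mean1, sd1, n1 = 50.1, 9.2, 73    # first testing
mean2, sd2, n2 = 49.8, 10.0, 73   # second testing

pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (mean1 - mean2) / pooled_sd
print(round(d, 2))  # about 0.03, matching the value in the table
```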


Chapter 6

Evidence of Validity
Validity refers to the degree to which specific data, research, or theory support the interpretation of test scores entailed by proposed uses of tests (American Educational Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education [NCME], 1999). Cronbach (1970) observed that validity is high if a test gives the information the decision maker needs. Several sources of validity evidence are discussed next in relation to ANRA.

Face Validity
Face validity refers to a test's appearance and what the test seems to measure, rather than what the test actually measures. Face validity is not validity in any technical sense and should not be confused with content validity. Face validity refers to whether or not a test looks valid to candidates, administrators, and other observers. If test content does not seem relevant to the candidate, the result may be a lack of cooperation, regardless of the actual validity of the test. For a test to function effectively in practical situations, it must not only be objectively valid but also appear valid. However, a test cannot be judged solely on whether it looks right. Appearance and graphic design are no guarantee of quality, and face validity should not be considered a substitute for objectively determined validity.

As mentioned in the chapter on the development of ANRA, ANRA items were reviewed by a group of individuals who provided feedback on the test. The reviewers provided feedback on issues such as the clarity of the items, the extent to which the items appeared to measure numerical reasoning, the extent to which test content appeared relevant to jobs that require numerical reasoning, and the extent to which they thought the test would yield useful information. From the responses of this group, it was evident that ANRA had high face validity, and participants recognized its relevance to the skills required by employees who deal with numbers or project planning. Although the item content of ANRA cannot reflect every work situation for which the test would be appropriate, the operations and processes required in each subtest represent abilities that are valued and readily appreciated.


Evidence Based on Test Content


Evidence based on the content of a test exists when the test includes a representative sample of tasks, behaviors, knowledge, skills, abilities, or other characteristics necessary to perform the job. Evidence of content validity is usually gathered through job analysis and is most appropriate for evaluating knowledge and skills tests. Evaluation of content-related evidence is usually a rational, judgmental process (Cascio & Aguinis, 2005). In employment settings, the principal concern is with making inferences about how well the test samples a job performance domain: a segment or aspect of the job performance universe that has been identified and about which inferences are to be made (Lawshe, 1975). Because most jobs have several performance domains, a standardized test generally applies only to one segment of the job performance universe (e.g., a typing test administered to a secretary applies to typing, one job performance domain in the job performance universe of a secretary). Thus, the judgment of whether content-related evidence exists depends on an evaluation of whether the same capabilities are required in both the job performance domain and the test (Cascio & Aguinis, 2005).

When considering content validity, it is important to recognize that a test attempts to sample the area of behavior being measured. It is rarely the purpose of a test to be exhaustive in assessing every possible manifestation of a domain. While content exhaustiveness may seem feasible in some highly specific areas of achievement, in other measurement situations it would simply not be possible. Aptitude, ability, and personality tests always aim to achieve representative sampling of the behaviors in question, and the evaluation of content validity relates to the degree to which this representation has been achieved.

Evidence of content validity is most easily shown with reference to achievement tests, where the relationship between the items and the expected manifestation of that ability in real-life situations is very clear. Achievement tests are designed to measure how well an individual has mastered a particular skill or course of study. From this perspective, it might seem that an informed inspection of the contents of a test would be sufficient to establish its validity for such a purpose. For example, a test of spelling should consist of spelling items. Nevertheless, a careful analysis of the domain is necessary to ensure that all the important features are covered by the test items and that the features are represented in the test according to their significance.

The effect of speed on test scores also needs to be checked. Participants may perform differently under the additional pressure of a timed test. There are also implications for test design and scoring arising from the interaction of speed and accuracy and from situations where candidates fail to finish a timed test.


In any case, ANRA is not a speed test, and it is unlikely that anyone failing to complete the test within a reasonable amount of time would improve his or her score significantly if given extra time. In an employment setting, evidence of ANRA content-related validity should be established by demonstrating that the jobs in question require the numerical reasoning skills measured by ANRA. Content-related validity in instructional settings may be examined in terms of the extent to which ANRA measures a sample of the specified objectives of such instructional programs.

Evidence Based on Test-Criterion Relationships


One of the primary reasons for using tests is to be able to make an informed prediction about an examinee's potential for future success. For example, selection tests are used to hire or promote individuals most likely to be productive employees. The rationale behind using selection tests is that the better an individual performs on the test, the better that individual will perform as an employee. Evidence of criterion-related validity addresses the inference that individuals who score better on tests will be more successful on some criterion of interest.

Criterion-related validity evidence indicates the statistical relationship (e.g., for a given sample of job applicants or incumbents) between scores on the test and one or more criteria, or between scores on the test and independently obtained measures of subsequent job performance. By collecting test scores and criterion scores (e.g., job performance results, grades in a training course, supervisor ratings), one can determine how much confidence may be placed in using test scores to predict job success. Typically, correlations between criterion measures and scores on the test serve as indicators of criterion-related validity evidence. Provided the conditions for a meaningful validity study have been met (e.g., sufficient sample size and adequate criteria), these correlation coefficients are important indicators of the utility of the test.

The conditions for evaluating criterion-related validity evidence are often difficult to fulfill in the ordinary employment setting. Studies of test-criterion relationships should involve a sufficiently large number of persons hired for the same job and evaluated for success using a uniform criterion measure. The criterion itself should be reliable and job-relevant, and should provide a wide range of scores. In order to evaluate the quality of studies of test-criterion relationships, it is essential to know at least the size of the sample and the nature of the criterion.

Assuming that the conditions for a meaningful evaluation of criterion-related validity evidence have been met, Cronbach (1970) characterized validity coefficients of .30 or better as having definite practical value. The U.S. Department of Labor (1999) provides the following general guidelines for interpreting validity coefficients: coefficients above .35 are considered very beneficial, coefficients of .21 to .35 are considered likely to be useful, coefficients of .11 to .20 depend on the circumstances, and coefficients below .11 are considered unlikely to be useful.

It is important to point out that even relatively low validities (e.g., .20) may justify the use of a test in a selection program (Anastasi & Urbina, 1997). This is because the practical value of a test depends not only on its validity, but also on other factors, such as the base rate for success on the job (i.e., the proportion of people who would be successful in the absence of any selection procedure). If the base rate for success on the job is low (i.e., few people would be successful on the job), tests with low validity can have considerable utility or value. When the base rate is high (i.e., selected at random, most people would succeed on the job), even highly valid tests may not contribute significantly to the selection process.

In addition to the practical value of validity coefficients, the statistical significance of coefficients should be noted. Statistical significance refers to the odds that a non-zero correlation could have occurred by chance. If the odds are 1 in 20 that a non-zero correlation could have occurred by chance, then the correlation is considered statistically significant. Some experts prefer even more stringent odds, such as 1 in 100, although the generally accepted odds are 1 in 20. In statistical analyses, these odds are designated by the lowercase p (probability) to signify whether a non-zero correlation is statistically significant. When p is less than or equal to .05, the odds are presumed to be 1 in 20 (or less) that a non-zero correlation of that size could have occurred by chance. When p is less than or equal to .01, the odds are presumed to be 1 in 100 (or less) that a non-zero correlation of that size occurred by chance.

In a study of ANRA criterion-related validity, we examined the relationship between ANRA scores and the on-the-job performance of job incumbents in various occupations (mostly finance-related occupations) and position levels (mainly professionals, managers, and directors). Job performance was defined as supervisory ratings on behaviors determined through research to be important to most professional, managerial, and executive jobs. The study found that ANRA scores correlated .32 with supervisory ratings on a dimension made up of Analysis and Problem Solving behaviors, and .36 with supervisory ratings on a dimension made up of Judgment and Decision Making behaviors (see Table 6.1). Furthermore, ANRA scores correlated .36 with supervisory ratings on a dimension composed of job behaviors dealing with Quantitative/Professional Knowledge and Expertise. Supervisory ratings from the sum of ratings on 24 job performance behaviors (Total Performance), as well as ratings on a single-item measure of Overall Potential, were also obtained. ANRA scores correlated .44 with Total Performance and .31 with ratings of Overall Potential. The correlation between ANRA scores and a single-item supervisory rating of Overall Performance was .38.


Table 6.1 Evidence of ANRA Criterion-Related Validity (Total Raw Score) of Job Incumbents in Various Finance-Related Occupations and Position Levels

Criterion                                              N      Mean     SD     r
Analysis and Problem Solving                           89     37.6     7.0    .32**
Judgment and Decision Making                           91     32.2     5.9    .36**
Quantitative/Professional Knowledge and Expertise      59     53.6     8.9    .36**
Total Performance (24 items)                           58    127.0    22.0    .44**
Overall Performance (single item)                      94      5.6     1.1    .38**
Overall Potential                                      94      3.4     1.1    .31**

** p < .01

In Table 6.1, the column entitled N gives the number of cases having valid supervisory ratings for every job behavior contained in the specified criterion. The means and standard deviations refer to the criterion ratings shown in the table. The validity coefficients appear in the last column.

The criterion-related validity coefficients reported in Table 6.1 apply to the specific sample of job incumbents described in the table. These validity coefficients indicate that ANRA is likely to be very beneficial as an indicator of the criteria shown in Table 6.1. However, test users should not automatically assume that these data constitute sole and sufficient justification for use of ANRA. Inferring validity for one group of employees or candidates from data reported for another group is not appropriate unless the organizations and job categories being compared are demonstrably similar. Careful examination of Table 6.1 can help test users make an informed judgment about the appropriateness of ANRA for their own organization. However, the data presented here are not intended to serve as a substitute for locally obtained validity data. Local validity studies, together with locally derived norms, provide a sound basis for determining the most appropriate use of ANRA. Hence, whenever technically feasible, test users should study the validity of ANRA, or any selection test, at their own location or organization.

Sometimes it is not possible for a test user to conduct a local validation study. There may be too few incumbents in a particular job, an unbiased and reliable measure of job performance may not be available, or there may not be a sufficient range in the ratings of job performance to justify the computation of validity coefficients. In such circumstances, evidence of a test's validity reported elsewhere may be relevant, provided that the data refer to comparable jobs.
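To make the mechanics of a local validity study concrete, the sketch below correlates test scores with supervisory ratings and reports the significance level. The data and sample size are invented for illustration only; a real study would need to meet the conditions described earlier in this chapter (an adequate sample, a reliable and job-relevant criterion, and a sufficient range of scores).

from scipy.stats import pearsonr

# Invented local validation data: test total raw scores and supervisory
# ratings of overall job performance for the same incumbents.
test_scores = [14, 18, 21, 22, 25, 27, 19, 23, 28, 16, 20, 24]
ratings     = [ 3,  4,  4,  5,  6,  6,  3,  5,  7,  3,  4,  6]

r, p = pearsonr(test_scores, ratings)
print(f"validity coefficient r = {r:.2f}, p = {p:.3f}")
# Using the guidelines cited above, r above .35 would be considered very
# beneficial, and p <= .05 would indicate statistical significance. A real
# local study would require a much larger sample than this illustration.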


Correlations Between ANRA Test 1 and Test 2


The correlation between Test 1 (Comparison of Quantities) and Test 2 (Sufficiency of Information) of ANRA was .71 (N = 452, p < .0001). This correlation is clearly significant, yet it is lower than the reliability of either test shown in Table 5.2 of Chapter 5, indicating that the two tests measure related but distinguishable abilities. This evidence suggests that ANRA effectively samples both of these reasoning domains within the broader conception of numerical reasoning (Rust, 2002).

Evidence of Convergent and Discriminant Validity


Convergent evidence is provided when scores on a test relate to scores on other tests or variables that purport to measure similar traits or constructs. Evidence of relations with other variables can involve experimental (or quasi-experimental) as well as correlational evidence (AERA et al., 1999). Discriminant evidence is provided when scores on a test do not relate closely to scores on tests or variables that measure different traits or constructs.

Correlations Between ANRA and the Watson-Glaser Critical Thinking Appraisal Short Form


Correlations between ANRA and the Watson-Glaser Critical Thinking Appraisal Short Form (see Table 6.2) suggest that the tests measure a common general ability. Evidence for the validity of the Watson-Glaser as a measure of critical thinking and reasoning appears in the Watson-Glaser Short Form Manual (Watson & Glaser, 2006). The data in Table 6.2 suggest that ANRA also measures reasoning ability. The fact that the correlations between ANRA and the Watson-Glaser Short Form tests are lower than the intercorrelation between the two ANRA tests suggests that ANRA also measures some distinct aspect of reasoning that is not measured by the Watson-Glaser (Rust, 2002).

Table 6.2 Correlations Between the Watson-Glaser Critical Thinking Appraisal Short Form and ANRA (N = 452)

Watson-Glaser Short Form                ANRA Test 1:                ANRA Test 2:                  ANRA Total
                                        Comparison of Quantities    Sufficiency of Information    Raw Score
Total Raw Score                         .65                         .61                           .68
Test 1: Inference                       .48                         .47                           .52
Test 2: Recognition of Assumptions      .40                         .36                           .41
Test 3: Deduction                       .53                         .51                           .56
Test 4: Interpretation                  .60                         .51                           .60
Test 5: Evaluation of Arguments         .35                         .36                           .39

Note. For all correlations, p < .001.


Correlations Between ANRA and Other Tests


In addition to the correlations with the Watson-Glaser, we also examined the correlations between ANRA and two other tests: the Miller Analogies Test for Professional Selection (MAT for PS; N = 67) and the Differential Aptitude Tests for Personnel and Career Assessment-Numerical Ability (DAT for PCA-NA; N = 80). As would be expected, ANRA correlated higher with the Numerical Ability test of the DAT for PCA (r = .70, p < .001) than with the MAT for PS (r = .57, p < .001). Details of these results, which suggest convergent as well as discriminant validity, are shown in Table 6.3.

Table 6.3 Correlations Between ANRA, the Miller Analogies Test for Professional Selection (MAT for PS), and the Differential Aptitude Tests for Personnel and Career Assessment-Numerical Ability (DAT for PCA-NA)

ANRA                                        MAT for PS (N = 67)    DAT for PCA-NA (N = 80)
ANRA Total Raw Score                        .57                    .70
ANRA Test 1: Comparison of Quantities       .50                    .69
ANRA Test 2: Sufficiency of Information     .50                    .57

Note. For all correlations, p < .001.
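The convergent and discriminant pattern summarized above can be examined directly whenever scores on the relevant measures are available for the same examinees: the test should correlate more strongly with measures of similar constructs than with measures of different constructs. The sketch below is a generic illustration using simulated scores; it is not a reanalysis of the samples reported in Table 6.3.

import numpy as np

# Simulated scores for one group of examinees (illustration only).
rng = np.random.default_rng(0)
similar_construct = rng.normal(size=100)                  # e.g., another numerical test
test_scores = 0.8 * similar_construct + rng.normal(scale=0.6, size=100)
different_construct = rng.normal(size=100)                # e.g., an unrelated measure

r_convergent = np.corrcoef(test_scores, similar_construct)[0, 1]
r_discriminant = np.corrcoef(test_scores, different_construct)[0, 1]
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
# Convergent evidence: the first correlation should be substantially larger
# than the second, mirroring the pattern reported for ANRA above.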


Chapter 7

Using ANRA as an Employment Selection Tool


ANRA was developed for use in adult employment selection. It may be used to predict success in jobs that require application of numerical reasoning skills. ANRA can also be useful in monitoring the effectiveness of numerical reasoning instruction and training programs, and in researching the relationship between numerical reasoning and other abilities or skills.

Employment Selection
Many organizations use testing as a component of their employment selection process. Employment selection programs typically use cognitive ability tests, aptitude tests, personality tests, basic skills tests, and work values tests to screen out unqualified candidates, to categorize prospective employees according to their probability of success on the job, or to rank order a group of candidates according to merit.

ANRA was designed to assist in the selection of employees for jobs that require numerical reasoning. Many finance-related, project-management, and technical professions require the type of numerical reasoning ability measured by ANRA. The test is useful to assess applicants for a variety of jobs, such as Accountant, Account Manager, Actuary, Banking Manager, Business Analyst, Business Development Manager, Business Unit Leader, Finance Analyst, Loan Officer, Project Manager, Inventory Planning Analyst, Procurement or Purchasing Manager, and leadership positions with financial responsibilities.

It should not be assumed that the type of numerical reasoning required in a particular job is identical to that measured by ANRA. Job analysis and local validation of ANRA for selection purposes should follow accepted human resource research procedures, and conform to existing guidelines concerning fair employment practices. In addition, no single test score can possibly suggest all of the requisite knowledge and skills necessary for success in a job.

Using ANRA in Making a Hiring Decision


It is ultimately the responsibility of the hiring authority to determine how it uses ANRA scores. We recommend that if the hiring authority establishes a cut score, examinees' scores should be considered in the context of appropriate measurement data for the test, such as the standard error of measurement and data regarding the predictive validity of the test. In addition, we recommend that selection decisions be based on multiple job-relevant tools rather than relying on any single test (e.g., using only ANRA scores to make employment decisions).

Human resource professionals can look at the percentile rank that corresponds to a candidate's raw score in several ways. Candidates' scores may be rank ordered by percentiles so that those with the highest scores are considered further. Alternatively, a cut score (e.g., the 50th percentile) may be established so that candidates who score below the cut score are not considered further. In general, the higher the cut score is set, the higher the likelihood that a given candidate who scores above that cut score will be successful. However, the need to select high-scoring candidates typically must be balanced with situational factors, such as the need to keep jobs filled and the supply of talent in the local labor market.

When interpreting ANRA scores, it is useful to know the specific behaviors that an applicant with a high ANRA score may be expected to exhibit. These behaviors, as rated by supervisors, were consistently found to be related to ANRA scores across different occupations requiring numerical reasoning. In general, candidates who score low on ANRA may find it challenging to demonstrate these behaviors effectively. Conversely, candidates who score high on ANRA are likely to display a higher level of competence in the following behaviors:

Uses quantitative reasoning to solve job-related problems.
Learns new numerical concepts quickly.
Applies sound logic and reasoning when making decisions.
Demonstrates knowledge of financial indicators and their implications.
Breaks down information into essential parts or underlying principles.
Readily integrates new information into problem-solving and decision-making processes.
Recognizes differences and similarities in situations or events.
Engages in a broad analysis of relevant information before making decisions.
Probes deeply to understand the root causes of problems.
Reviews financial statements, sales reports, and/or other financial data when planning.
Accurately assesses the financial value of things (e.g., worth of assets) or people (e.g., credit worthiness).
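As described above, one common workflow is to convert a candidate's total raw score to a percentile rank in the appropriate norm group and compare that rank to the organization's chosen cut score. The sketch below uses a few entries excerpted from Table B.2 (Financial Occupations norm group) purely to illustrate the lookup; the function and variable names are not part of any Pearson scoring tool, and the 50th-percentile cut score is only an example.

# Percentile ranks excerpted from Table B.2 (Financial Occupations norm group),
# listed here only to illustrate the lookup; see Appendix B for the full table.
percentile_by_raw_score = {20: 37, 21: 42, 22: 47, 23: 52, 24: 55, 25: 60}

cut_percentile = 50  # illustrative cut score (the 50th percentile)

def passes_screen(raw_score):
    # Convert the raw score to a percentile rank, then compare it to the cut.
    # In practice, scores near the cut should also be interpreted in light of
    # the standard error of measurement, as recommended above.
    return percentile_by_raw_score[raw_score] >= cut_percentile

print(passes_screen(24))  # True  (55th percentile)
print(passes_screen(21))  # False (42nd percentile)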


Human resource professionals who use ANRA should document and examine the relationship between applicants' scores and their subsequent performance on the job. Using locally obtained criterion-related validity information provides the best foundation for interpreting scores and most effectively differentiating examinees who are likely to be successful from those who are not. Pearson does not establish or recommend a passing score for ANRA.

Differences in Reading Ability, Including the Use of English as a Second Language


Though ANRA is a mathematical test, a level of reading proficiency in the English language is assumed and reflected in the items. Where ANRA is being used to measure the numerical reasoning capabilities of a group, for some of whom English is not their first language, reasonable precautions need to be taken. If a candidate experiences difficulty with the language or the reading level of the test, note this information and consider it when interpreting the test scores. In some cases, it may be more appropriate to test such individuals with another assessment procedure that fully accommodates their language of preference or familiarity.

Using ANRA as a Guide for Training, Learning, and Education


Critical thinking, numerical or otherwise, is trainable (Halpern, 1998; Paul & Nosich, 2004). Thus, when interpreting test scores on ANRA, it is important to bear in mind the extent to which training may have influenced the scores. The ability to think critically has long been recognized as a desirable educational objective, and studies conducted in educational settings demonstrate that critical thinking can be improved as a result of training directed to this end (Hill, 1959; Kosonen & Winne, 1995; Nisbett, 1993; Perkins & Grotzer, 1997).

Scores on ANRA are likely to be influenced by factors associated with training, and individuals will typically differ in the extent to which such training has been available to them. Although traditional classes in math and science in school are important, many of these classes involve computational arithmetic and other lower-order thinking skills, such as the rote application of rules that have been learned. Training in higher-order numerical reasoning during the school years will often have been indirect and largely dependent on the overall quality of education available to the individual. Consequently, this indirect training will likely depend on the amount of time spent in education or learning. Furthermore, the extent to which numerical reasoning skills are trainable will likely differ between individuals.


Fairness in Selection Testing


Fair employment regulations and their interpretation are continuously subject to changes in the legal, social, and political environments. Therefore, ANRA users should consult with qualified legal advisors and human resources professionals as appropriate.

Legal Considerations
Governmental and professional regulations cover the use of all personnel selection procedures. Relevant source documents that the user may wish to consult include the Standards for Educational and Psychological Testing (AERA et al., 1999); the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 2003); and the federal Uniform Guidelines on Employee Selection Procedures (Equal Employment Opportunity Commission, 1978). For an overview of the statutes and types of legal proceedings that influence an organization's equal employment opportunity obligations, the user is referred to Cascio and Aguinis (2005) or the U.S. Department of Labor's (1999) Testing and Assessment: An Employer's Guide to Good Practices.

Group Differences and Adverse Impact


Local validation is particularly important when a selection test may have adverse impact. According to the Uniform Guidelines on Employee Selection Procedures (Equal Employment Opportunity Commission, 1978), adverse impact is indicated when the selection rate for one group is less than 80% (or four-fifths) of the selection rate for another group. Adverse impact is likely to occur with cognitive ability tests such as ANRA. Although it is within the law to use a test with adverse impact (Equal Employment Opportunity Commission, 1978), the testing organization must be prepared to demonstrate that the selection test is job-related and consistent with business necessity.

The Civil Rights Act of 1991, as amended, defined business necessity to mean that, in the case of employment practices involving selection, the practice or group of practices must bear a significant relationship to successful performance of the job (Section 3 (o) (1) (A)). In deciding whether the standards for business necessity have been met, the Civil Rights Act of 1991 states that demonstrable evidence is required. The Act provides examples of demonstrable evidence as statistical reports, validation studies, expert testimony, prior successful experience, and other evidence as permitted by the Federal Rules of Evidence (Section 3 (o) (1) (B)).

A local validation study, in which ANRA scores are correlated with job performance indicators, can provide evidence to support the use of the test in a particular job context. An evaluation that demonstrates that ANRA (or any employment assessment tool) is equally predictive for protected subgroups, as outlined by the Equal Employment Opportunity Commission, will assist in demonstrating the fairness of the test. For example, from their review of 22 cases in U.S. Appellate and District Courts involving cognitive ability testing in class-action suits, Shoenfelt and Pedigo (2005, p. 6) reported that organizations that utilize professionally developed standardized cognitive ability tests that are validated, and that set cutoff scores supported by the validation study data, are likely to fare well in court.
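The 80% (four-fifths) comparison described above can be computed directly from applicant-flow data. The sketch below uses invented counts purely to illustrate the arithmetic; an impact ratio is a screening indicator, not by itself a legal determination of adverse impact.

# Invented applicant-flow counts for two groups (illustration only).
hired_a, applicants_a = 30, 60   # group A selection rate = 0.50
hired_b, applicants_b = 12, 40   # group B selection rate = 0.30

rate_a = hired_a / applicants_a
rate_b = hired_b / applicants_b
impact_ratio = rate_b / rate_a   # compare the lower rate to the higher rate

print(f"impact ratio = {impact_ratio:.2f}")  # 0.60
# Under the Uniform Guidelines' four-fifths rule, a ratio below 0.80 indicates
# adverse impact, which places the burden on the organization to show that the
# selection procedure is job-related and consistent with business necessity.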

Monitoring the Selection System


An organization's ability to evaluate selection strategies and to implement fair employment practices depends on its awareness of the demographic characteristics of applicants and incumbents. Monitoring these characteristics and accumulating test score data are clearly necessary for establishing the legal defensibility of a selection system, including systems that incorporate ANRA. The most effective use of ANRA is with a local norms database that is regularly updated and monitored. The hiring organization should ensure that its selection process is clearly job related and focuses on characteristics that are important to job success. Good tests that are appropriate to the job in question can contribute a great deal toward monitoring and minimizing the major sources of bias in the selection procedures.

ANRA is a reliable and valid instrument for the assessment of numerical reasoning. When used for the assessment of candidates or incumbents for work that requires this skill, ANRA can be useful in the selection of the better candidates. However, where candidates drawn from different subgroups of the population are more or less deficient in numerical reasoning skills as a result of a failure to provide the necessary educational environment during schooling, there is a risk of overlooking candidates who could develop this skill but who have not had the opportunity to do so. Employers can reasonably expect that candidates should have achieved all the necessary basic skills before applying for the job. However, in circumstances where adverse impact is manifest, an organization might wish to consider ways in which it can contribute to the reduction of adverse impact. This approach might take the form of providing training courses to employees in the deficient skill areas, of increasing involvement with the local community to identify ways in which the community might assist, or of re-evaluating recruitment strategy, for example, by advertising job positions more widely or through different media.


References
Allen, M.J., & Yen, W.M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.

American Institute of Certified Public Accountants (AICPA). (1999). Broad business perspective competencies. Retrieved February 27, 2006, from http://www.aicpa.org/edu/bbfin.htm

Americans With Disabilities Act of 1990, Titles I & V (Pub. L. 101-336). United States Code, Volume 42, Sections 12101-12213.

Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). New York: Macmillan.

Brannon, E.M. (2002). The development of ordinal numerical knowledge in infancy. Cognition, 83, 223-240.

Cascio, W.F. (1991). Applied psychology in personnel management (4th ed.). Englewood Cliffs, NJ: Prentice Hall.

Cascio, W.F., & Aguinis, H. (2005). Applied psychology in human resource management (6th ed.). Upper Saddle River, NJ: Prentice Hall.

Civil Rights Act of 1991. 102nd Congress, 1st Session, H.R. 1. Retrieved August 4, 2006, from http://usinfo.state.gov/usa/infousa/laws/majorlaw/civil91.htm

Cohen, B.H. (1996). Explaining psychological statistics. Pacific Grove, CA: Brooks/Cole.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Cronbach, L.J. (1970). Essentials of psychological testing (3rd ed.). New York: Harper & Row.

Equal Employment Opportunity Commission. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38295-38309.

Facione, P.A. (2006). Critical Thinking: What It Is and Why It Counts, 2006 Update. Retrieved July 28, 2006, from http://www.insightassessment.com/pdf_files/what&why2006.pdf

Feigenson, L., Dehaene, S., & Spelke, E. (2004). Core systems of number. Trends in Cognitive Sciences, 8, 307-314.

Halpern, D.F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53, 449-455.

Hill, W.H. (1959). Review of Watson-Glaser Critical Thinking Appraisal. In O.K. Buros (Ed.), The fifth mental measurements yearbook. Lincoln: University of Nebraska Press.


Hunt, E. (1995). Will we be smart enough? New York: Russell Sage Foundation.

Kealy, B.T., Holland, J., & Watson, M. (2005). Preliminary evidence on the association between critical thinking and performance in principles of accounting. Issues in Accounting Education, 20(1), 33-47.

Kosonen, P., & Winne, P.H. (1995). Effects of teaching statistical laws on reasoning about everyday problems. Journal of Educational Psychology, 87, 33-46.

Lawshe, C.H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.

National Education Goals Panel. (1991). The national education goals report. Washington, DC: U.S. Government Printing Office.

Nijenhuis, J., & Flier, H. (2005). Immigrant-majority group differences on work-related measures: The case for cognitive complexity. Personality and Individual Differences, 38, 1213-1221.

Nisbett, R.E. (Ed.). (1993). Rules for reasoning. Hillsdale, NJ: Lawrence Erlbaum.

Nunnally, J.C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

O*Net OnLine. (2005). Skill searches for: Mathematics, Critical Thinking. Occupational Information Network: O*Net OnLine. Retrieved July 17, 2006, from http://online.onetcenter.org/skills/result?s=2.A.2.a&s=2.A.1.e&g=Go

Paul, R., & Nosich, G.M. (2004). A Model for the National Assessment of Higher Order Thinking. Retrieved July 13, 2006, from http://www.criticalthinking.org/resources/articles/a-model-nal-assessmenthot.shtml

Perkins, D.N., & Grotzer, T.A. (1997). Teaching intelligence. American Psychologist, 52, 1125-1133.

Rust, J. (2002). Rust Advanced Numerical Reasoning Appraisal manual. London: The Psychological Corporation.

Shoenfelt, E.L., & Pedigo, L.C. (2005, April). A review of court decisions on cognitive ability testing, 1992-2004. Poster presented at the 20th Annual Conference of the Society for Industrial and Organizational Psychology, Los Angeles, CA.

Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.). Bowling Green, OH: Author.

Spelke, E.S. (2005). Sex differences in intrinsic aptitude for mathematics and science? A critical review. American Psychologist, 60, 950-958.

Starkey, P. (1992). The early development of numerical reasoning. Cognition, 43, 93-126.

U.S. Department of Labor. (1999). Testing and assessment: An employer's guide to good practices. Washington, DC: Author.

Vandenbroucke, J.P. (1998). Clinical investigation in the 20th century: The ascendancy of numerical reasoning. The Lancet, 352, 1216.


Watson, G.B., & Glaser, E.M. (2006). Watson-Glaser Critical Thinking Appraisal Short Form manual. San Antonio, TX: Pearson.

Wynn, K., Bloom, P., & Chiang, W. (2002). Enumeration of collective entities by 5-month-old infants. Cognition, 83, B55-B62.


Appendix A Description of the Normative Sample


The normative information provided below is based on data collected during the period of February 2006 through June 2006.

Table A.1 Description of the Normative Sample by Occupation

Occupation: Employees in Various Financial Occupations
Norms and sample characteristics: N = 198, Mean = 21.9, SD = 6.4

Occupations in the Financial Occupations norm group:
Accountants = 6.1%
Accounting Analysts = 1.5%
Actuaries = 32.3%
Auditors = 1.0%
Banking Supervisors/Managers = 5.1%
Billing Coordinators = 1.0%
Bookkeepers = 2.0%
Business Analysts = 3.5%
Business Specialists = 0.5%
Buyers = 2.5%
Chief Financial Officers = 2.5%
Claims Adjusters = 1.0%
Collections Supervisors/Managers = 1.0%
Comptrollers/Controllers = 2.0%
Finance Analysts/Managers = 17.7%
Finance or Budget Estimators = 0.5%
Financial Planners = 3.0%
Insurance Agents = 2.5%
Insurance Analysts = 0.5%
Insurance Brokers = 2.0%
Loan Officers = 2.0%
Procurement or Purchasing Officers/Managers = 9.6%


Table A.2 Description of the Normative Sample by Position Level

Position level: Executives/Directors (Executive- and Director-level positions within various industries)
Norms and characteristics: N = 91, Mean = 21.3, SD = 6.0
Industry characteristics:
Financial Services/Banking/Insurance = 53.9%
Government/Public Service/Defense = 7.7%
Professional Business Services/Consulting = 6.6%
Publishing/Printing = 12.1%
Real Estate = 1.1%
Retail/Wholesale = 2.2%
Other (unspecified) = 16.5%

Position level: Managers (Manager-level positions within various industries)
Norms and characteristics: N = 88, Mean = 20.1, SD = 5.6
Industry characteristics:
Financial Services/Banking/Insurance = 38.6%
Government/Public Service/Defense = 19.3%
Professional Business Services/Consulting = 10.2%
Publishing/Printing = 12.5%
Real Estate = 2.3%
Retail/Wholesale = 1.1%
Other (unspecified) = 14.8%

Position level: Professionals/Individual Contributors (Professional-level and individual-contributor positions within various industries)
Norms and characteristics: N = 200, Mean = 22.1, SD = 6.4
Industry characteristics:
Financial Services/Banking/Insurance = 23.0%
Government/Public Service/Defense = 36.5%
Professional Business Services/Consulting = 12.5%
Publishing/Printing = 7.5%
Real Estate = 1.0%
Retail/Wholesale = 1.5%
Other (unspecified) = 16.5%


Appendix B
ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Norm Group
Table B.1 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores by Position Level
ANRA Total Raw Score   Executives/Directors   Managers   Professionals/Individual Contributors   T Score
32                     99                     99         99                                       68
31                     99                     99         98                                       66
30                     96                     98         94                                       65
29                     92                     95         88                                       63
28                     87                     91         81                                       62
27                     81                     87         73                                       60
26                     76                     82         66                                       58
25                     67                     77         60                                       57
24                     58                     72         53                                       55
23                     54                     66         47                                       54
22                     48                     61         42                                       52
21                     43                     53         38                                       50
20                     40                     46         34                                       49
19                     34                     43         29                                       47
18                     30                     37         26                                       46
17                     26                     31         23                                       44
16                     23                     27         19                                       43
15                     18                     21         16                                       41
14                     13                     15         13                                       39
13                     9                      12         11                                       38
12                     7                      10         9                                        36
11                     6                      6          7                                        35
10                     5                      3          6                                        33
9                      4                      2          5                                        31
8                      3                      1          4                                        30
7                      2                      1          2                                        28
6                      1                      1          1                                        27
5                      1                      1          1                                        25
4                      1                      1          1                                        23
3                      1                      1          1                                        22
2                      1                      1          1                                        20
1                      1                      1          1                                        19
0                      1                      1          1                                        17

Raw Score Mean = 21.3, SD = 6.0, N = 91 (Executives/Directors); Mean = 20.1, SD = 5.6, N = 88 (Managers); Mean = 22.1, SD = 6.4, N = 200 (Professionals/Individual Contributors)
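The T scores in Tables B.1 and B.2 are standard scores with a mean of 50 and a standard deviation of 10, obtained from z scores (T = 50 + 10z). The sketch below shows the underlying arithmetic only; the mean and standard deviation plugged in here are illustrative stand-ins, so the results will not exactly reproduce the tabled values, which are based on the published normative data.

def raw_to_t_score(raw_score, norm_mean, norm_sd):
    # Convert a raw score to a z score relative to the norm group, then to a
    # T score (mean 50, SD 10): T = 50 + 10z.
    z = (raw_score - norm_mean) / norm_sd
    return 50 + 10 * z

# Illustrative parameters only; use the appropriate norm group's mean and SD.
print(round(raw_to_t_score(28, norm_mean=21.9, norm_sd=6.4)))  # about 60
print(round(raw_to_t_score(15, norm_mean=21.9, norm_sd=6.4)))  # about 39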


Table B.2 ANRA Total Raw Scores, Mid-Point Percentile Ranks, and T Scores for Employees in Various Financial Occupations (see Table A.1 for a list of the occupations in this norm group)

ANRA Total Raw Score   Percentile Rank (Employees in Financial Occupations)   T Score
32                     99                                                     68
31                     98                                                     66
30                     93                                                     65
29                     86                                                     63
28                     78                                                     62
27                     71                                                     60
26                     65                                                     58
25                     60                                                     57
24                     55                                                     55
23                     52                                                     54
22                     47                                                     52
21                     42                                                     50
20                     37                                                     49
19                     33                                                     47
18                     30                                                     46
17                     26                                                     44
16                     22                                                     43
15                     19                                                     41
14                     15                                                     39
13                     11                                                     38
12                     8                                                      36
11                     6                                                      35
10                     4                                                      33
9                      3                                                      31
8                      2                                                      30
7                      1                                                      28
6                      1                                                      27
5                      1                                                      25
4                      1                                                      23
3                      1                                                      22
2                      1                                                      20
1                      1                                                      19
0                      1                                                      17

Raw Score Mean = 21.9, SD = 6.4, N = 198


Appendix C
Combined Watson-Glaser and ANRA T Scores and Percentile Ranks by Norm Group
Table C.1 Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks by Position Level
Combined T Score   Executives/Directors   Managers   Professionals/Individual Contributors
135                99                     99         99
134                99                     99         99
133                99                     99         99
132                99                     99         99
131                99                     99         99
130                98                     99         99
129                98                     99         99
128                97                     99         99
127                95                     98         97
126                93                     97         95
125                92                     97         93
124                91                     96         90
123                88                     94         88
122                85                     91         84
121                82                     89         81
120                82                     88         77
119                80                     87         74
118                77                     86         72
117                75                     86         70
116                73                     84         67
115                71                     81         65
114                69                     79         63
113                66                     78         60
112                63                     78         58
111                62                     77         57
110                61                     75         54
109                60                     73         51
108                58                     71         48
107                56                     70         46
106                55                     67         44
105                53                     64         42
104                51                     61         41
103                48                     60         39
102                46                     57         37
101                43                     55         36
100                42                     54         35
99                 41                     52         33
98                 40                     51         31
97                 38                     48         30
96                 37                     44         29
95                 34                     42         28
94                 31                     39         27
93                 29                     36         26
92                 27                     34         25
91                 27                     32         23
90                 25                     29         22
89                 22                     27         22
88                 21                     26         21
87                 20                     25         19
86                 20                     24         18
85                 19                     23         17
84                 18                     21         17
83                 17                     19         16
82                 16                     18         14
81                 16                     16         13
80                 16                     15         12
79                 16                     14         11
78                 14                     14         10
77                 11                     13         9
76                 10                     12         8
75                 9                      12         8
74                 8                      11         8
73                 7                      9          8
72                 6                      7          7
71                 6                      5          7
70                 6                      3          7
69                 5                      3          7
68                 4                      2          6
67                 4                      2          5
66                 3                      2          5
65                 3                      2          5
64                 3                      2          4
63                 2                      1          4
62                 2                      1          3
61                 1                      1          3
60                 1                      1          2
59                 1                      1          2
58                 1                      1          2
57                 1                      1          2
56                 1                      1          2
55                 1                      1          <=1
54                 1                      1          <=1
53                 1                      1          <=1
52                 1                      1          <=1
51                 1                      1          <=1
50                 1                      1          <=1

Executives/Directors: N = 91; Managers: N = 88; Professionals/Individual Contributors: N = 200


Table C.2

Combined Watson-Glaser Short Form and ANRA T Scores and Percentile Ranks for Employees in Various Financial Occupations
(See Table A.1 for a list of the occupations in this group.)

Combined T Score   Percentile Rank (Employees in Financial Occupations)
135                99
134                99
133                99
132                99
131                99
130                99
129                99
128                97
127                95
126                92
125                89
124                87
123                84
122                80
121                77
120                74
119                73
118                71
117                69
116                67
115                65
114                64
113                62
112                60
111                60
110                59
109                58
108                57
107                55
106                54
105                52
104                50
103                48
102                47
101                45
100                43
99                 40
98                 38
97                 37
96                 37
95                 35
94                 34
93                 32
92                 31
91                 29
90                 28
89                 27
88                 27
87                 26
86                 25
85                 24
84                 22
83                 20
82                 19
81                 18
80                 17
79                 16
78                 14
77                 12
76                 10
75                 9
74                 8
73                 7
72                 5
71                 4
70                 4
69                 4
68                 4
67                 3
66                 3
65                 2
64                 2
63                 1
62                 1
61                 1
60                 1
59                 1
58                 1
57                 1
56                 1
55                 1
54                 1
53                 1
52                 1
51                 1
50                 1

N = 198
