
INTERNATIONAL JOURNAL OF SELECTION AND ASSESSMENT

Evaluations of Tests Used for Making Selection and Promotion Decisions


David A. Kravitz, Veronica Stinson and Tracy L. Chavez
Participants (N = 212) rated the fairness, job relevance, appropriateness, and invasiveness of 16 tests that could be used to select or promote people into production or management positions. Fairness, job relevance and appropriateness were highly correlated, and were combined to form a composite evaluation scale. Evaluations and invasiveness ratings varied among the 16 tests, with the most positive ratings given to interviews and work samples and the most negative ratings given to astrology, graphology and polygraphs. Evaluations of four tests were affected by the position (manager vs. production worker). Evaluations of 11 tests and invasiveness ratings of two tests were affected by respondent experience with the test. Respondents who had experienced the tests evaluated them more positively and considered them to entail a smaller invasion of privacy. Responses were not affected by whether the test was to be used for selection versus promotion decisions.

David A. Kravitz, Veronica Stinson and Tracy L. Chavez, Department of Psychology, Florida International University, North Miami Campus, 3000 NE 145 Street, North Miami, FL 33183-3600, USA. For correspondence - Present address 2079 NE 196 Terrace, North Miami Beach, FL 33179, USA.

An employee's first important contact with an organization typically occurs when he or she applies for a job. Because first impressions strongly influence future relations (Rynes, Heneman and Schwab 1980; Zunin and Zunin 1972), the employee's impression of the selection procedure may have long-term implications. For example, reactions to selection procedures may affect applicant attraction to the organization, motivation to join and work diligently for the organization, tendency to legally challenge organizational decisions and actions, willingness to recommend the organization to others, commitment to the organization and performance, as well as the validity and utility of the selection procedures themselves (Arvey 1992; Folger and Cropanzano in press; Gilliland and Honig 1994; Macan, Avedon, Paese and Smith 1994; Murphy 1986; Robertson, Iles, Gratton and Sharpley 1991; Smither, Reilly, Millsap, Pearlman and Stoffey 1993). In short, organizational climate, performance and profits can be impaired if applicants react negatively to the selection procedures. Consistent with this point, Schmitt and Gilliland (1992) call for research on applicant reactions to selection processes.

Reactions to selection procedures may vary along many dimensions, and these dimensions will be more or less strongly associated with overall evaluations. The dimension most commonly assessed is perceived job relevance/face validity (Fraser and Anderson 1989; Gilliland 1994; Macan et al. 1994; Robertson et al. 1991; Rynes et al. 1980; Smither et al. 1993). This research clearly reveals that job relevance affects evaluations of selection procedures and subsequent reactions (e.g. organizational commitment) and behaviours (e.g. performance).

A second important dimension is perceived fairness of the selection test. During the past 15 years, research has found that perceptions of fairness affect reactions to many organizational decisions (e.g. Brockner 1990; Folger and Greenberg 1985; Greenberg 1986, 1990; Konovsky and Cropanzano 1991; McFarlin and Sweeney 1992; Nacoste 1987; Sheppard and Lewicki 1987). In addition, Cropanzano has pointed out that valid selection tests are often considered unfair, and that tests considered fair may be invalid (Cropanzano and Hunsberger 1994; Folger and Cropanzano in press). Cropanzano and his colleagues refer to this inconsistency as the justice dilemma. The justice dilemma makes it difficult for human resource managers to develop valid selection procedures that applicants will consider fair. This dilemma could be resolved by replacing unfair selection tests with equally valid tests that are considered fair, but this requires information about the perceived fairness of selection tests. In short, both the justice dilemma and research on fairness in other domains highlight the need for research on perceived fairness of selection tests. Such work has recently been reported (Gilliland 1994; Gilliland and Honig 1994; Kluger and Rothstein 1993; Macan et al. 1994). This work has provided some information about which tests are considered fair, and has shown that perceptions of fairness are positively

Volume 4

Number 1 January 1996

© Blackwell Publishers Ltd. 1996, 108 Cowley Road, Oxford OX4 1JF, UK and 238 Main Street, Cambridge, MA 02142, USA.


associated with more general affective and behavioural reactions.

For some time, research on drug testing, honesty testing and polygraphs has dealt with the invasion of privacy (Garland, Giacobbe and French 1989; Karren 1989; Moore and Stewart 1989; Stone and Kotch 1989) and more recent work has assessed invasiveness of other selection procedures as well. This work has found that selection procedures vary in invasiveness, and that invasiveness is associated with more general evaluations (Kluger and Rothstein 1993; Rosse, Miller and Stecher 1994; Stone, Stone and Hyatt 1989).

Finally, perceptions of job relevance, fairness and invasiveness should determine judgements of appropriateness. That is, applicants should consider it appropriate for organizations to use tests that are job relevant, fair and non-invasive, and inappropriate for organizations to use tests that are irrelevant, unfair and invasive. Thus, appropriateness should represent a general evaluative judgement, and should vary with more basic judgements of job relevance, fairness and invasiveness.

In summary, research on applicant reactions to selection tests has focused on the dimensions of job relevance, fairness and invasiveness. A conceptualization that ties these three variables together, along with a number of other variables, has been provided by Gilliland (1993). Gilliland's justice model of applicant reactions to selection systems posits that selection practices, including choice of selection test, will affect applicant perceptions of whether rules of procedural and distributive justice are being met. These rules include job relatedness and invasiveness. Perceptions of rule satisfaction, in turn, will affect judgements of overall fairness. These fairness judgements will influence more general evaluations and reactions.

Both theoretical and empirical work clearly suggest that perceptions of the job relevance, fairness and invasiveness of selection tests will strongly affect more general evaluations of the tests. The first two hypotheses are consistent with this previous work, and especially with Gilliland's (1993) model:
Hypothesis 1: The type of selection test will influence perceptions of job relevance and invasiveness.
Hypothesis 2: Job relevance and invasiveness will be associated with perceived fairness of the test.

Gilliland (1993) hypothesized that prior experience with a selection procedure would increase the salience of procedural rules associated with those experiences. This implies that experience will lead to more extreme reactions to selection tests, either positive or negative, depending on whether the test violates or complies with the procedural justice rules. On the other hand, experience also serves to eliminate misperceptions, and thus the effect of experience may depend on whether pre-existing stereotypes of the selection test are more or less negative than the reality. Both of these perspectives suggest that experience may differentially affect reactions to different selection tests. Consistent with this suggestion, Fruhner, Schuler, Funke and Moser (1991, as cited in Schuler 1993) found that experience increased evaluations of interviews but not of psychological tests. Finally, other research indicates that familiarity leads to liking (Bornstein 1989; Zajonc 1968), so experience may result in more positive evaluations of all selection tests. It is not clear which of these processes will be most important, so one purpose of the present study was to obtain descriptive information about the effects of experience with specific selection tests on evaluations of those tests. Thus, the third hypothesis is nondirectional:
Hypothesis 3: Reactions to selection tests will be influenced by the respondent's experiences with the tests. This influence may be constant across tests or it may vary with the test.

Thornton (1993) pointed out that reactions to selection tests may vary with job level, and called for research to test this moderating effect. The authors know of no such research, so in the present study the effect of job level on reactions to selection tests is assessed. In general, it is expected that selection tests that focus on technical skills or abilities will be considered more appropriate for production workers than for managers, and tests that focus on personality traits or general behaviour will be considered more appropriate for managers than for production workers. However, these expectations are tentative, so the hypothesis is nondirectional:
Hypothesis 4: Reactions to selection tests will be influenced differentially by the position being filled.

Finally, most selection tests could also be used for making promotion decisions. Indeed, the use of internal recruitment implies that selection and promotion can sometimes be seen as identical (Bureau of National Affairs 1988). Alternatively, because internal applicants are known to the organization one might argue that promotions should be based solely on prior performance within the organization, with no other test being appropriate. Thus, the present study addresses reactions to the use of tests for two different decisions - selection and promotion. The final hypothesis refers to the moderating effect of the decision:


Hypothesis 5: Reactions to selection tests will be influenced by the decision (selection versus promotion) for which they are being used. This influence may be constant across tests or it may vary with the test.

In summary, previous research has revealed that reactions to different selection tests vary along such dimensions as job relevance, fairness and invasiveness. In the present study subjects evaluated 16 tests on four dimensions - job relevance, invasiveness, fairness and overall appropriateness. To test for moderating effects, position and decision were manipulated and experience with the test was assessed.

This research extends previous work in several ways. First, reactions to 16 different selection tests are obtained. With a few exceptions (Fraser and Anderson 1989; Rynes and Connerley 1993; Schuler 1993; Smither et al. 1993; Stone et al. 1989) previous studies have included no more than four tests at a time. Second, simultaneous assessment is made of job relevance, fairness and invasiveness; only Kluger and Rothstein (1993) have included all three of these key variables. Third, as suggested by Thornton (1993), the moderating effect of job level is assessed. Fourth, the moderating effect of decision (selection vs. promotion) is assessed. The authors do not know of any other research on the moderating effects of job level or decision. Finally, the moderating effect of experience with the test is assessed. Gilliland (1993) suggests that experience may moderate reactions to selection systems, but only Fruhner et al. (1991, as cited in Schuler 1993) have actually examined the effects of experience. Results of the present research will fill the gaps in the literature as described above. These results also will have implications for Gilliland's (1993) justice model of applicant reactions to selection tests, and may be useful for resolving the justice dilemma.

Method
Participants

Participants were 212 students at Florida International University, a mid-sized urban university with two campuses in the greater Miami area. They received class credit for participation in this study. Nine participants with more than one piece of invalid data in their evaluations of the tests were dropped from the sample. Fourteen participants with a single piece of invalid data were retained in the sample, so sample size varies slightly across analyses. Cell size ranged from 31 to 36 participants. The final sample included 52 males and 148 females. The sample was ethnically diverse: it included 77 White non-Hispanics, 24 Black non-Hispanics, 88 White Hispanics, and 14 people of other ethnicities. The highest level of education completed included high school (N = 86), A.A. college degree (N = 107), B.A. or B.S. college degree (N = 8) and graduate degree (N = 2). The respondents ranged in age from 17 to 45 (Mdn = 21) and had more work experience than traditional college students. They had worked for 0-27 years (Mdn = 3) at 0-14 jobs (Mdn = 3) and had been promoted 0-20 times (Mdn = 0; M = 1.4). They currently worked 0-64 hours per week (Mdn = 22). Sixteen respondents reported that they were not currently working. The most common jobs reported were student (N = 60), salesperson (N = 39), professional (N = 13), service worker (N = 12) and first-level manager (N = 12). It is clear that most respondents who reported their job as student were actually working part-time at some other job; this apparent inconsistency resulted from a poorly written question about current job.

An experienced subsample was created by selecting respondents on the basis of their job experience. All 74 participants in this subsample had been promoted at least once (M = 2.8) and were currently working at least part-time (M = 29.8 h). They had worked for a mean of 7.2 years at a mean of 4.9 jobs. All analyses were repeated on this experienced subsample, and results of these supplementary analyses are reported when they differ meaningfully from results of the primary analyses.

Design, materials and procedure

Position and decision type were manipulated between-subjects, and test type was manipulated within-subjects. The position was skilled production worker, second-level manager or the respondent's current position. The respondent's current position was included for exploratory purposes and to enhance generalizability of the results. The decision was either selection (hiring a person into the given position) or promotion (promoting an employee one step into the given position). The questionnaire included the following 16 tests: interview, reference letters, accomplishment record, job skills test, work sample, drug tests, criminal record, cognitive ability test, personality test, medical examination, polygraph, paper-and-pencil honesty test, physical ability test, mental health test, graphology, and astrology.

Dozens of selection tests could have been included in this research. To maximize the acquisition of information, the authors wanted to include as many tests as possible. However, the use of too many tests would result in a questionnaire of inordinate length. The final set of 16 tests was selected on the basis of a literature review, and the tests were intended to


vary in fairness, frequency of use, legality, validity, job relevance and intrusiveness.

The questionnaire began with a page of instructions that explained the task and guaranteed anonymity. Each of the 16 subsequent pages briefly described one of the tests and then asked the respondent to rate the test with regard to fairness, relevance, invasion of privacy and overall appropriateness. (Copies of these test descriptions can be obtained from the first author.) Fairness, job relevance and invasiveness were included to replicate previous research; in addition, they are incorporated in Gilliland's (1993) model. Appropriateness was included because it is a more general judgement, and thus ratings of appropriateness should depend on perceptions of fairness, job relevance and invasiveness. Ratings were given on nine-point Likert-format response scales with endpoints of 'strongly disagree' and 'strongly agree'. Due to the large number of selection tests the authors were forced to use single-item measures of these concepts. The items were as follows. Fairness: 'It is fair for Company X to use [selection procedure] when selecting [position].' Job relevance: 'Information provided by [selection procedure] is related to the ability to perform well as a [position].' Invasiveness: 'Using [selection procedure] would be an invasion of privacy.' Appropriateness: 'Overall, it is appropriate for Company X to use [selection procedure] to decide who to [decision] as a [position].' Respondents were then asked whether the test had been used when they had previously applied for jobs or promotions. Possible responses were: yes, no, don't know, don't remember, have never applied for a job/promotion. Ten different random orders of these 16 pages were used, and order was treated as a blocking factor.

On the last two pages of the questionnaire the respondent was asked to report his/her gender, ethnicity, education, current job, years in the workforce, number of previous jobs, hours worked per week, and number of promotions.

Responses were given on scan sheets. The entire session typically took less than 40 min.
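
The design described above (three positions crossed with two decisions between subjects, 16 test pages within subjects, ten random page orders) can be illustrated with a short sketch. This is not the authors' procedure: the article does not say how participants were assigned to cells or how the orders were generated, so the fixed seed, the function names `make_page_orders` and `assign`, and the rule of cycling participants through cells are all assumptions made for illustration.

```python
import random

# The 16 tests and the two manipulated factors, as listed in the Method section.
TESTS = [
    "interview", "reference letters", "accomplishment record", "job skills test",
    "work sample", "drug tests", "criminal record", "cognitive ability test",
    "personality test", "medical examination", "polygraph",
    "paper-and-pencil honesty test", "physical ability test",
    "mental health test", "graphology", "astrology",
]
POSITIONS = ["skilled production worker", "second-level manager", "own position"]
DECISIONS = ["selection", "promotion"]


def make_page_orders(n_orders=10, seed=0):
    """Return n_orders random permutations of the 16 test pages.

    The seed is an assumption, used only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return [rng.sample(TESTS, k=len(TESTS)) for _ in range(n_orders)]


def assign(participant_index, orders):
    """Hypothetical assignment rule: cycle through the 3 x 2 cells, then orders."""
    cells = [(p, d) for p in POSITIONS for d in DECISIONS]
    position, decision = cells[participant_index % len(cells)]
    order_id = (participant_index // len(cells)) % len(orders)
    return {
        "position": position,
        "decision": decision,
        "order_id": order_id,          # usable as a blocking factor
        "pages": orders[order_id],
    }


orders = make_page_orders()
first = assign(0, orders)
```

Recording `order_id` with each response is what would allow page order to be entered as a blocking factor in later analyses.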

Results
Correlations among dependent variables and scale development
Correlations among the four ratings are given in Table 1. Consistent with Hypothesis 2, ratings of fairness were strongly associated with judgements of relevance and invasiveness. Consistent with the greater emphasis on job relevance observed in the empirical literature and in Gilliland's (1993) model, fairness was more closely related to job relevance than to invasiveness. Similarly, appropriateness was highly correlated with fairness and job relevance. Correlations between these three variables and invasiveness were weaker but still substantial. The high intercorrelations among ratings of fairness, relevance and appropriateness imply that these measures represented a single construct. Thus, responses to these three items were combined to provide a measure of overall evaluation. This approach is consistent with previous research (Fraser and Anderson 1989; Kluger and Rothstein 1993; Robertson et al. 1991). Internal consistency reliabilities (Cronbach's alpha) of this three-item scale ranged from 0.72 to 0.92 when computed for each of the 16 tests. Invasiveness was not included in this evaluation scale because doing so would decrease the reliabilities by a mean of 0.07 (and up to 0.35). All subsequent analyses were performed on this measure of overall evaluation and on the invasiveness ratings.
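
Two computations mentioned in this section can be sketched briefly: Cronbach's alpha for a multi-item scale, and the averaging of correlation coefficients through Fisher's r to Z transformation described in the note to Table 1. The sketch below uses the standard textbook formulas; the ratings are invented purely for illustration and are not data from the study, and the function names are hypothetical.

```python
import math
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists; each inner list holds one rating per respondent,
    with respondents in the same order across items.
    """
    k = len(items)
    totals = [sum(ratings) for ratings in zip(*items)]   # scale score per respondent
    sum_item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def mean_correlation(rs):
    """Average correlations via Fisher's r to Z: transform, average, back-transform."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Invented nine-point ratings from six hypothetical respondents for one test.
fairness = [7, 8, 5, 3, 9, 6]
relevance = [6, 8, 5, 4, 9, 5]
appropriateness = [7, 9, 4, 3, 8, 6]

alpha = cronbach_alpha([fairness, relevance, appropriateness])
avg_r = mean_correlation([0.2, 0.8])   # invented coefficients
```

Because the transformation stretches large coefficients, the Fisher-based mean of 0.2 and 0.8 comes out somewhat above their arithmetic mean of 0.5.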

Table 1: Correlations among dependent variables

Variable          Fairness   Relevance   Appropriateness   Invasiveness
Fairness             -          0.95          1.00            -0.78
Relevance           0.70         -            0.96            -0.74
Appropriateness     0.73        0.72           -              -0.80
Invasiveness       -0.40       -0.33         -0.36              -

Note: High values indicate high levels of fairness, job relevance, appropriateness, and invasiveness. Correlation coefficients above the diagonal were obtained by averaging ratings over participants for each test, and then computing correlations over the 16 tests. Correlation coefficients below the diagonal were obtained by computing correlations over participants for each test, and then averaging the coefficients across the 16 tests using Fisher's r to Z transformation.

Multivariate analyses of variance on ratings
Evaluation scale. A Position (2) x Decision (2) x Test Type (16) multivariate analysis of variance (MANOVA) with repeated measures on the third factor was performed on the


evaluation scale. The own position condition was excluded from this analysis because the variety of jobs held by the respondents makes it impossible to interpret results including this condition. Significant effects were obtained for test type, Wilks' F(15, 117) = 177.14, p < 0.001, and the interaction of Position x Test Type, Wilks' F(15, 117) = 4.12, p < 0.001. Contrary to Hypothesis 5, no effects involving decision were statistically significant.

Table 2 displays mean ratings of the tests and results of post-hoc comparisons among the means. Supporting Hypothesis 1, evaluations varied considerably among tests. The most positive evaluations were given to interviews and work samples; the most negative evaluations were given to astrology, graphology and polygraphs. (To maximize power and generalizability of results these comparisons were computed over all respondents, including those in the own position condition. Exclusion of the own position condition would not appreciably alter these means.)

The interaction of Position x Test Type supports Hypothesis 4. To interpret this interaction t-tests were used to assess the effect of position on evaluations of the tests. As shown in Table 2, significant effects of position were obtained for four tests. Criminal records, honesty tests and personality tests were evaluated more positively when used for making decisions about the management position than when used for making decisions about the production position. Physical ability tests were evaluated more positively when used for making decisions about the production position than when used for making decisions about the management position.

When these analyses were repeated with the experienced subsample, additional differences were found to be significant. Accomplishment record, references, cognitive ability test, mental health assessment, criminal record and personality tests were considered more appropriate when used for making decisions about the management position than when making decisions about the production position. The use of physical ability tests was considered more appropriate for the production position than for the management position. In short, these results on the experienced subsample provide even more support for Hypothesis 4 than do the results based on the entire sample.

Table 2: Mean evaluations of tests

                                        Position(b)       Experience with the test(c)
Test                        Mean(a)    SPW     SLM        Yes     No
Interview                   7.6a       7.4     7.7        7.8     7.4*
Work sample                 7.3ab      7.5     7.3        7.9     7.1*
Accomplishment record       6.9bc      6.7     7.2        7.4     6.7**
Job skills test             6.8bc      6.7     7.3        7.6     6.6**
Drug test                   6.6bc      6.5     6.6        6.8     6.5
References                  6.4cd      6.2     6.6        6.9     6.1**
Cognitive ability test      5.9de      5.6     6.3        7.0     5.7**
Mental health assessment    5.?        5.4     5.9        7.1     5.5*
Criminal record             5.4efg     4.8     5.8**      5.3     5.4
Honesty test                5.?        4.6     6.0*       6.0     5.0
Personality test            4.8g       4.1     5.3**      6.1     4.6**
Medical examination         4.8g       4.9     4.8        6.5     4.6**
Physical ability test       4.7g       5.9     4.0**      5.9     4.6*
Polygraph                   3.2        3.2     3.6        4.5     3.1*
Graphology                  2.0        1.9     2.2
Astrology                   1.3        1.3     1.3

Note: High values indicate that use of the test is positively evaluated.
(a) Means with common subscripts do not differ at the p < 0.001 level by paired t-tests. These means and comparisons are based on data from all three position conditions.
(b) SPW, skilled production worker; SLM, second-level manager.
(c) No means are presented for graphology or astrology because nobody had experienced the former and only one participant had experienced the latter.
* p < 0.05, ** p < 0.01.

Invasiveness ratings. A Position (2) x Decision (2) x Test Type (16) MANOVA was performed on invasiveness ratings. The own position condition was excluded from this analysis because the


variety of jobs held by the respondents precludes interpretation of results including this condition. This analysis revealed main effects of test type, Wilks' F(15, 113) = 29.59, p < 0.001, and decision, F(1, 127) = 4.15, p < 0.05, and an interaction of Position x Test Type, Wilks' F(15, 113) = 2.18, p < 0.05.

Table 3 displays the mean invasiveness ratings and results of post-hoc comparisons among the means. Supporting Hypothesis 1, ratings of invasiveness varied considerably among tests. For the most part these judgements are similar to the evaluations, but there are some interesting differences. Mental health assessments were evaluated in the middle of the group but received the second highest invasiveness ratings. The opposite was true of physical ability tests, which received low evaluations but only intermediate ratings of invasiveness. (To maximize power and generalizability these comparisons were computed over all respondents, including those in the own position condition. Exclusion of the own position condition would not appreciably alter these means.)

The interaction of Position x Test Type was interpreted by using t-tests to assess the effects of position on invasiveness ratings of the tests. Significant effects of position were observed for only two tests. The use of physical ability tests was considered more invasive when used for the management position than when used for the production position. The use of personality tests was considered more invasive when used for the production position than when used for the management position. The interaction of Position x Test Type was not significant when the analysis was repeated on the experienced subsample. Thus, the invasiveness ratings provided less support for Hypothesis 4 than did the general evaluations.

The main effect of decision occurred because the tests were considered more invasive when used for selection decisions (M = 4.1) than when used for promotion decisions (M = 3.7). This provides weak support for Hypothesis 5, which predicted either a main effect of decision or an interaction of Decision x Test in reactions to selection tests.
Table 3: Mean invasiveness ratings of tests

Test                        Mean(a)
Interview                   2.0a
Work sample test            2.1a
Job skills test             2.2a
Accomplishment record       2.2ab
References                  2.9bc
Cognitive ability test      3.4cd
Physical ability test       3.8de
Drug test                   4.3ef
Criminal record             4.3ef
Honesty test                4.3ef
Personality test            4.6efg
Medical examination         5.0fg
Graphology                  5.0fg
Astrology                   5.0fg
Mental health assessment    5.1g
Polygraph                   6.1

Note: High values indicate that use of the test is considered an invasion of privacy.
(a) Within this column, means with common subscripts do not differ at the p < 0.001 level by paired t-tests. These means and comparisons are based on data from all three position conditions.

Effects of experience

Experiences with the 16 tests varied considerably. Five tests had been experienced by at least 20% of the sample: interviews (57%), references (43%), accomplishment records (33%), job skills test (26%) and work sample tests (24%). On the other hand, nobody had experienced graphology and only one respondent had experienced astrology. Among respondents in the experienced subsample, six tests had been experienced by at least 30% of the sample: interviews (86%), references (68%), accomplishment records (58%), job skills test (41%), work sample (39%) and drug test (32%).

To test Hypothesis 3, regarding the moderating effects of personal experience, additional analyses were performed. Because relatively few respondents had experienced some of these tests it was not possible to incorporate the decision and position variables in these analyses. Graphology and astrology were dropped because they had not been experienced. For each of the 14 remaining tests two groups of participants were created: those who had experienced the test, and all others. Supporting Hypothesis 3, t-tests on general evaluations revealed significant differences for 11 of the tests - all except drug test, criminal record and honesty test. As can be seen in Table 2, in each case respondents who had experienced the test evaluated it more positively than did those who had not experienced it. Parallel t-tests on the experienced subsample revealed significant effects of test experience for seven tests: work sample, accomplishment record, cognitive ability test, mental health assessment, personality test, medical examination and polygraph. In each case respondents who had experienced the test evaluated it more positively than did those who had not experienced it.

Similar t-tests on invasiveness ratings revealed significant effects only for the cognitive ability test and medical examination. Respondents who


had taken cognitive ability tests (M = 2.3) and medical examinations (M = 3.8) considered them less invasive than did respondents who had not taken them (M = 3.5 and 5.2, respectively). Parallel t-tests on the experienced subsample also revealed significant effects on cognitive ability tests, medical examinations and work samples. In each case experience was associated with lower ratings of invasiveness.

These effects of experience suggest that the results based on the entire sample of participants might need to be qualified. Thus, additional analyses limited to respondents who reported experiencing each test were performed. These analyses excluded mental health assessment, graphology and astrology because so few respondents had experienced them. The results of these analyses are summarized in Table 4. The number of respondents who reported experiencing each test is given in the main diagonal of Table 4, and the mean evaluation provided by those respondents is given in the second column. The ranking of these means is quite similar to the ranking of overall means given in Table 2. Only two tests have ranks that differ by more than two places: medical examinations are ranked eighth by those who experienced them but twelfth by the entire sample, and criminal record is ranked twelfth by those who experienced it but ninth by the entire sample. Thus, there is little need to qualify the previous results; to a substantial extent they should generalize to more experienced respondents.

Table 4: Mean evaluations provided by respondents who had experienced each test

Test(a)  Mean   WS    IN     JS    AR      COG   REF     DT    MED    PER     HON    PA     CR      POL
WS       7.9    48    46     32    25***   17    33***   15    15**   13**    8      14     13**    13**
IN       7.8          115    49    63**    26*   83***   31*   21**   24***   19**   19**   27***   17***
JS       7.6                 53    28      19    39*     20    15     14      11*    13     16*     13*
AR       7.4                       67      15    52*     18    11     18*     11     12     17*     10*
COG      7.0                               29    19*     11    6      7       9      7      10      8
REF      6.9                                     87      26    17     23**    18     13     20      12*
DT       6.8                                             33    12     8       8      12*    12      13**
MED      6.5                                                   22     7       ?      ?      7       9
PER      6.1                                                          25      ?      ?      5       3
HON      6.0                                                                  22     5      8       5
PA       5.9                                                                         19     7       8
CR       5.3                                                                                30      9
POL      4.5                                                                                        18

Note: The number of participants who had experienced each test is given in the diagonal. Their mean ratings are given in the second column. The number of participants who had experienced each pair of tests is given above the diagonal; the significance level of the t-test comparing the two means is indicated with asterisks (* p < 0.05, ** p < 0.01, *** p < 0.001). Such tests were not computed if fewer than 10 respondents had experienced both tests.
(a) Abbreviations are as follows: WS, work sample; IN, interview; JS, job skills test; AR, accomplishment record; COG, cognitive ability test; REF, references; DT, drug test; MED, medical examination; PER, personality test; HON, honesty test; PA, physical ability test; CR, criminal record; POL, polygraph.

Table 4 also reports the results of paired comparisons among the tests based on data from respondents who had experienced both tests. The number of such respondents is given above the diagonal, and the statistical significance of the t-test is indicated with asterisks. Thus, for example, 25 respondents had experienced both work samples and accomplishment records, and they preferred work samples at the p < 0.001 level. The t-test was not performed when fewer than 10 respondents had experienced both tests. These results reveal three overlapping sets of selection tests. The most positively evaluated tests are work samples, interviews and job skills tests. The next set includes job skills, accomplishment records and cognitive ability tests. The third set includes references, drug tests and medical examinations. The remaining tests were experienced too infrequently to provide stable comparisons.
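
The pairwise procedure just described can be sketched as follows: for each pair of tests, keep only respondents who experienced both, skip the comparison when fewer than 10 remain, and otherwise compute a paired t statistic on their evaluations. This is a minimal illustration using the standard paired t formula, not the authors' code; the function name, the dictionary layout and the ratings are assumptions invented for the example.

```python
import math
from statistics import mean, stdev

MIN_OVERLAP = 10  # threshold stated in the text


def paired_t(ratings_a, ratings_b, min_overlap=MIN_OVERLAP):
    """Paired t statistic over respondents who experienced both tests.

    ratings_a, ratings_b: dicts mapping respondent id -> evaluation, holding
    only respondents who reported experiencing that test.
    Returns (n, t, df); t and df are None when the comparison is skipped.
    """
    shared = sorted(set(ratings_a) & set(ratings_b))
    n = len(shared)
    if n < min_overlap:
        return n, None, None            # too few respondents, no test computed
    diffs = [ratings_a[p] - ratings_b[p] for p in shared]
    sd = stdev(diffs)
    if sd == 0:
        return n, None, None            # t is undefined when all differences are equal
    t = mean(diffs) / (sd / math.sqrt(n))
    return n, t, n - 1


# Invented evaluations: respondents 0-11 experienced test A, 1-14 experienced test B.
test_a = {i: 7 + (i % 3) for i in range(12)}
test_b = {i: 6 for i in range(1, 15)}

n, t, df = paired_t(test_a, test_b)
n_small, t_small, df_small = paired_t(test_a, {0: 5, 1: 6, 2: 4})
```

Turning `t` and `df` into a significance level requires the t distribution, which a statistics package would supply; that step is left out here to keep the sketch dependency-free.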

Discussion
Review of results and implications for hypotheses
Consistent with Hypothesis 2, perceptions of fairness were strongly related to ratings of job relevance and moderately related to ratings of invasiveness. Of course, these intercorrelations may have been inflated by common method variance, and additional research is needed to provide a stronger test of Hypothesis 2. Job relevance, fairness and appropriateness were so


closely related that for the remaining analyses they were combined into a single evaluation scale. There were strong inverse relations between evaluations of tests and judgements of invasiveness, but there were also two interesting inconsistencies. First, two of the tests were ranked differently on the two dimensions. Mental health assessments received moderate evaluations but were considered among the most invasive; physical ability tests received relatively low evaluations but were not considered especially invasive. Second, experience rarely affected invasiveness ratings but consistently led to more positive evaluations. The effects of experience support Hypothesis 3.

Consistent with previous research and with Hypothesis 1, evaluations and judgements of invasiveness varied from test to test. As predicted by Hypothesis 4, evaluations of several tests were affected by the position. There was a tendency for tests of physical ability to be more highly evaluated when used for the production worker position, and for tests of attitudes and general behaviour to be more highly evaluated when used for the manager position. Position had minimal effects on ratings of invasiveness. Finally, Hypothesis 5 received almost no support. As a group the tests were considered less invasive when used for promotion decisions than when used for selection decisions, but there were no other significant effects involving the decision.

Theoretical implications
These results both support and extend Gilliland's (1993) model. The effects of test type on ratings of job relevance and invasiveness are predicted by the model, as are the correlations between these variables and perceived fairness. These correlations do not provide a strong test of the model, however, because the model refers to fairness of the entire selection procedure whereas the authors assessed fairness of the test. Similarly, the model posits that experience will moderate the relation between rule compliance and fairness judgements, whereas the authors found that experience moderated the relation between test type and evaluation of the test. (This lack of parallelism between the present study and Gilliland's model exists because the authors had completed data collection before learning of the model.) It was also found that the position being filled served a similar moderating role, and although Gilliland (1993) did not include position as a possible moderating variable, it could easily be incorporated in his model. Finally, the lack of significant interactions involving the decision variable suggests that Gilliland's (1993) model might safely be generalized from selection to promotion decisions, although this suggestion is very tentative. While the positive effects of experience were not observed on evaluations of all selection tests, in no case was a significant negative effect observed. Thus, these results are consistent both with Gilliland's (1993) conceptualization and with the mere exposure hypothesis of Zajonc (1968; Bornstein 1989). Additional research is needed to provide more precise tests among the possible explanations for the effects of experience.

Practical implications
Because applicant reactions to selection tests will probably affect additional reactions to the organization, the differences in evaluations have considerable practical importance. The most positive evaluations were given to interviews, work samples, accomplishment records, job skills tests, drug tests and references. With the exception of drug tests, these tests also received the lowest invasiveness ratings. Previous research on applicant reactions to multiple tests has also observed relatively positive reactions to interviews and work samples (Fraser and Anderson 1989; Schuler 1993; Smither et al. 1993; Stone et al. 1989). Previous work on evaluations of drug tests, however, has revealed considerable disagreement among respondents (e.g. Labig 1992; Murphy, Thornton and Reynolds 1990). These results, therefore, imply that applicants will respond positively to organizations that use interviews, work samples, accomplishment records, job skills tests and references. Of course, reactions to all these tests will vary with the details; for example, some recent work has revealed different reactions to different types of interviews (Rynes and Connerley 1993).

These data are relevant for resolution of the justice dilemma (Cropanzano and Hunsberger 1994; Folger and Cropanzano in press), which would involve replacing unfair tests with equally valid tests that are considered fair. Three of the most positively evaluated tests (work samples, accomplishment records via biodata, and job skills tests) are among the most valid (Hunter and Hunter 1984; Muchinsky 1993). Furthermore, research has recently revealed that interviews can be quite valid, especially if they are structured and/or situational (Huffcutt and Arthur 1994; McDaniel, Whetzel, Schmidt and Maurer 1994). Thus, the careful development and use of interviews, work samples, job skills tests and accomplishment record/biodata tests can result in selection tests that predict performance and are positively evaluated by applicants. It is interesting to note how these selection


tests compare in actual frequency of use. Research in the United States, Europe and Australia has revealed that interviews, references and application forms are used most frequently (Bureau of National Affairs 1988; Di Milia, Smith and Brown 1994; Shackleton and Newell 1994) and, along with work samples, are evaluated most positively by HR professionals (Harris, Dworkin and Park 1990; Shackleton and Newell 1994). With the exception of 'normal' interviews and references, these procedures have good predictive validity and are liked by applicants. These results imply that the justice dilemma may not be particularly severe, although there is clearly room for improvement.

The most negative evaluations were in response to the use of astrology, graphology and polygraphs. These three tests, along with mental health assessments, received the highest ratings of invasiveness. The extremely negative reactions to polygraphs and graphology are consistent with previous research (Fraser and Anderson 1989; Rynes and Connerley 1993; Stone et al. 1989), and suggest that these procedures should be avoided when possible. The use of polygraphs is constrained by law, so few organizations will be tempted to use them. However, despite limited evidence of validity, graphology is commonly used in France and the French section of Belgium, and is becoming increasingly common in the United States (Fowler 1991; Neter and Ben-Shakhar 1989; Shackleton and Newell 1994; Smith and Abrahamsen 1992, as cited in Smith, Farr and Schuler 1993). Even if they are unconcerned with the lack of validity, organizations should eschew the use of graphology to avoid negative reactions from their applicants and employees. Astrology is almost never used for selection purposes (Di Milia, Smith and Brown 1994; Shackleton and Newell 1994).

Previous experience with the 16 tests varied dramatically. Over half the respondents had been given interviews, and at least 20% had experienced references, accomplishment records, job skills tests and work sample tests. It is encouraging to note that these tests received the lowest invasiveness ratings, and they are five of the six tests with the most positive evaluations. In recent years a great deal of attention has been devoted to cognitive ability tests, largely because they appear to be valid for a great variety of jobs (Schmidt and Hunter 1981). It is interesting to note that only 14% of the sample had been given cognitive ability tests, and these tests are not positively evaluated. Finally, the positive effect of experience on evaluations has favourable implications for organizations; it suggests that applicants may be more accepting of many tests after having experienced them.

Research implications

The nonsignificant interaction of decision with test type merits further attention. These results imply an ability to generalize research and theory from selection to promotion situations, and selection has received considerably more attention than has promotion. Doubtless not all selection research will generalize to promotion situations, but it would be worthwhile to discover the limits of this generalization.

The moderating effect of position implies a need to qualify the results of past research on reactions to selection procedures in which the position was not specified. In addition, of course, it implies that in future research the position for which a selection test is to be used must be clearly specified. Similarly, the effects of experience on evaluations suggest a possible need to qualify previous results in which experience was not assessed. Such qualification, however, should be minor because experience had little effect on the rankings of the tests.

It is interesting to note that experience increased evaluations of all tests except those that implied a distrust of the employee (drug test, criminal record, paper-and-pencil honesty test). The only other test that implied such a distrust is the polygraph, on which the effect of experience was small but significant. Future research might explore the stability of this finding and, if it is stable, possible reasons for it.

The present study included only 16 tests, and they were described in general terms. Future research should incorporate other types of tests. Furthermore, each of these tests could be developed in many ways. For example, interviews can be unstructured, structured or situational. Cognitive ability tests can use abstract questions or business-related questions. Such differences may affect evaluations of the tests (cf. Kluger and Rothstein 1993) and future research should explore the parameters of such effects.

Finally, it would be useful to discover whether providing respondents with accurate information about test validity affects their reactions to tests. Will people be less enthusiastic about traditional interviews if they are told that such interviews have limited validity? Will people evaluate cognitive ability tests more positively if they are told that such tests have good predictive validity?

Potential limitations

Like all research, the present study has limitations. Some readers may be concerned by the sample, which consisted of college students. The authors do not believe the sample poses a problem for several reasons. First, these were


nontraditional college students with an unusual amount of work experience. Second, results of supplementary analyses on a subsample of experienced workers were consistent with results of the complete sample. Third, job experience and experience with selection tests are not identical. A 50-year-old employee who was hired 30 years ago and has continued to work for the same organization may well have experienced fewer selection tests than a 21-year-old who has worked at three different jobs over the previous three years (as is true of the median respondent in this study). In short, the authors believe their sample of respondents was appropriate for the questions being asked.

A greater concern is the use of scenarios. Rather than actually taking the tests, the respondents evaluated them on the basis of brief written descriptions. The use of scenarios cannot be avoided if one wishes to compare reactions to a large number of tests, especially if one wishes to include tests that are rarely used. Thus, the methodology was driven by the research question. The weakness resulting from the use of scenarios was further ameliorated by the analyses based solely on participants who had experienced specific tests. Results of these supplementary analyses imply that there is no need to qualify the primary results.

Final note
The present study is more descriptive than explanatory. Results reveal that people respond differently to different tests, and that these responses are moderated by the job for which the test is being used and by the respondent's experience with the test. These results have theoretical, practical and methodological implications; but the data do not explain why these results occur. Why do people favour some tests and oppose others? Perceptions of fairness, job relevance and invasiveness presumably play important roles, but they must be manipulated if causal conclusions are to be drawn. Along the same lines, why does the job moderate evaluations? Jobs vary in many ways, including complexity, status and required tasks. What characteristics of jobs determine evaluations of tests used to select or promote people into those jobs? Finally, why does experience with tests moderate evaluations of those tests? Is it mere familiarity with the test, or knowledge about the test? In summary, while the present study describes effects of test type, experience and position, future research is needed to explain these effects.

Acknowledgements
We would like to thank Richard Arvey, Bill Balzer, Scott Fraser, Stephen Gilliland, Barbara Martin, Juan Sanchez and two anonymous reviewers for the helpful suggestions they made at various stages of this research project. A preliminary report of this work was presented at the 1994 meeting of the Society for Industrial and Organizational Psychology.

References
Arvey, R.D. (1992) Fairness and ethical considerations in employee selection. In D.M. Saunders (ed.), New Approaches to Employee Management: Fairness in Employee Selection (vol. 1, pp. 1-19). Greenwich, CT: JAI Press.
Bornstein, R.F. (1989) Exposure and affect: overview and meta-analysis of research, 1968-1987. Psychological Bulletin, 106, 265-289.
Brockner, J. (1990) Scope of justice in the workplace: how survivors react to co-worker layoffs. Journal of Social Issues, 46, 95-106.
Bureau of National Affairs (1988) Recruiting and Selection Procedures (Personnel Policies Forum Survey No. 146, May). Washington, DC: Bureau of National Affairs.
Cropanzano, R. and Hunsberger, H. (1994) The justice dilemma in employee selection: some reflections on the trade-offs between social justice and statistical validity. In S.W. Gilliland (Chair), Selection from the Applicant's Perspective: Justice and Employee Selection Procedures. Symposium conducted at the meeting of the Society for Industrial and Organizational Psychology, Nashville, TN, April 1994.
Di Milia, L., Smith, P.A. and Brown, D.F. (1994) Management selection in Australia: a comparison with British and French findings. International Journal of Selection and Assessment, 2(2), 80-90.
Folger, R. and Cropanzano, R. (in press) Organizational Justice and Human Resource Management. Beverly Hills, CA: Sage.
Folger, R. and Greenberg, J. (1985) Procedural justice: an interpretive analysis of personnel systems. In K. Rowland and G. Ferris (eds), Research in Personnel and Human Resources Management (vol. 3, pp. 141-183). Greenwich, CT: JAI Press.
Fowler, A. (1991) An even-handed approach to graphology. Personnel Management, 23(3), 40-43.
Fraser, S.L. and Anderson, M. (1989) Perceived accuracy and acceptability of selection procedures. Paper presented at the annual meeting of the Southeastern Psychological Association, Washington, DC, March 1989.
Garland, H., Giacobbe, J. and French, J.L. (1989) Attitudes toward employee and employer rights in the workplace. Employee Responsibilities and Rights Journal, 2, 49-59.
Gilliland, S.W. (1993) The perceived fairness of selection systems: an organizational justice perspective. Academy of Management Review, 18, 694-734.
Gilliland, S.W. (1994) Effects of procedural and distributive justice on reactions to a selection system. Journal of Applied Psychology, 79, 691-701.
Gilliland, S.W. and Honig, H. (1994) The perceived fairness of employee selection systems as a predictor of attitudes and self-concept. In S.W. Gilliland (Chair), Selection from the Applicant's Perspective: Justice and Employee Selection Procedures. Symposium conducted at the meeting of the Society for Industrial and Organizational Psychology, Nashville, TN, April 1994.
Greenberg, J. (1986) Organizational performance appraisal procedures: what makes them fair? In R.J. Lewicki, B.H. Sheppard and M.H. Bazerman (eds), Research on Negotiations in Organizations (vol. 1, pp. 25-41). Greenwich, CT: JAI Press.
Greenberg, J. (1990) Organizational justice: yesterday, today, and tomorrow. Journal of Management, 16, 399-432.
Harris, M.M., Dworkin, J.B. and Park, J. (1990) Preemployment screening procedures: how human resource managers perceive them. Journal of Business and Psychology, 4, 279-292.
Huffcutt, A.I. and Arthur, W., Jr (1994) Hunter and Hunter (1984) revisited: interview validity for entry-level jobs. Journal of Applied Psychology, 79, 184-190.
Hunter, J.E. and Hunter, R.F. (1984) Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.
Karren, R.J. (1989) An analysis of the drug testing decision. Employee Responsibilities and Rights Journal, 2, 27-37.
Kluger, A.N. and Rothstein, H.R. (1993) The influence of selection test type on applicant reactions to employment testing. Journal of Business and Psychology, 8, 3-25.
Konovsky, M.A. and Cropanzano, R. (1991) Perceived fairness of employee drug testing as a predictor of employee attitudes and job performance. Journal of Applied Psychology, 76, 698-707.
Labig, C.E., Jr (1992) Supervisory and nonsupervisory employee attitudes about drug testing. Employee Responsibilities and Rights Journal, 5, 131-141.
McDaniel, M.A., Whetzel, D.L., Schmidt, F.L. and Maurer, S.D. (1994) The validity of employment interviews: a comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599-616.
McFarlin, D.B. and Sweeney, P.D. (1992) Distributive and procedural justice as predictors of satisfaction with personal and organizational outcomes. Academy of Management Journal, 35, 626-637.
Macan, T.H., Avedon, M.J., Paese, M. and Smith, D.E. (1994) The effects of applicants' reactions to cognitive ability tests and an assessment center. Personnel Psychology, 47, 715-738.
Moore, R.W. and Stewart, R.M. (1989) Evaluating employee integrity: moral and methodological problems. Employee Responsibilities and Rights Journal, 2, 203-215.
Muchinsky, P.M. (1993) Psychology Applied to Work (4th edn.). Pacific Grove, CA: Brooks/Cole.
Murphy, K.R. (1986) When your top choice turns you down: effect of rejected offers on the utility of selection tests. Psychological Bulletin, 99, 133-138.
Murphy, K.R., Thornton, G.C., III and Reynolds, D.H. (1990) College students' attitudes toward employee drug testing programs. Personnel Psychology, 43, 615-631.
Nacoste, R.W. (1987) But do they care about fairness? The dynamics of preferential treatment and minority interest. Basic and Applied Social Psychology, 8, 177-191.
Neter, E. and Ben-Shakhar, G. (1989) The predictive validity of graphological inferences: a meta-analytic approach. Personality and Individual Differences, 10, 737-745.
Robertson, I.T., Iles, P.A., Gratton, L. and Sharpley, D. (1991) The impact of personnel selection and assessment methods on candidates. Human Relations, 44, 963-982.
Rosse, J.G., Miller, J.L. and Stecher, M.D. (1994) A field study of job applicants' reactions to personality and cognitive ability testing. Journal of Applied Psychology, 79, 987-992.
Rynes, S.L. and Connerley, M.L. (1993) Applicant reactions to alternative selection procedures. Journal of Business and Psychology, 7, 261-277.
Rynes, S.L., Heneman, H.G., III and Schwab, D.P. (1980) Individual reactions to organizational recruiting: a review. Personnel Psychology, 33, 529-542.
Schmidt, F.L. and Hunter, J.E. (1981) Employment testing: old theories and new research findings. American Psychologist, 36, 1128-1136.
Schmitt, N. and Gilliland, S.W. (1992) Beyond differential prediction: fairness in selection. In D.M. Saunders (ed.), New Approaches to Employee Management: Fairness in Employee Selection (vol. 1, pp. 21-46). Greenwich, CT: JAI Press.
Schuler, H. (1993) Social validity of selection situations: a concept and some empirical results. In H. Schuler, J.L. Farr and M. Smith (eds), Personnel Selection and Assessment: Individual and Organizational Perspectives (pp. 11-26). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shackleton, V. and Newell, S. (1994) European management selection methods: a comparison of five countries. International Journal of Selection and Assessment, 2, 91-102.
Sheppard, B.H. and Lewicki, R.J. (1987) Toward general principles of managerial fairness. Social Justice Research, 1, 161-176.
Smith, M., Farr, J.L. and Schuler, H. (1993) Individual and organizational perspectives on personnel procedures: conclusions and horizons for future research. In H. Schuler, J.L. Farr and M. Smith (eds), Personnel Selection and Assessment: Individual and Organizational Perspectives (pp. 333-351). Hillsdale, NJ: Lawrence Erlbaum Associates.
Smither, J.W., Reilly, R.R., Millsap, R.E., Pearlman, K. and Stoffey, R.W. (1993) Applicant reactions to selection procedures. Personnel Psychology, 46, 49-76.
Stone, D.L. and Kotch, D.A. (1989) Individuals' attitudes toward organizational drug testing policies and practices. Journal of Applied Psychology, 74, 518-521.
Stone, E.F., Stone, D.L. and Hyatt, D. (1989) Personnel selection procedures and invasion of privacy. In R.M. Guion (Chair), Privacy in Organizations: Personnel Selection, Physical Environment, and Legal Issues. Symposium conducted at the meeting of the Society for Industrial and Organizational Psychology, Boston, MA, April 1989.
Thornton, G.C., III (1993) The effect of selection practices on applicants' perceptions of organizational characteristics. In H. Schuler, J.L. Farr and M. Smith (eds), Personnel Selection and Assessment: Individual and Organizational Perspectives (pp. 57-69). Hillsdale, NJ: Lawrence Erlbaum Associates.
Zajonc, R.B. (1968) Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, Monograph Supplement, 9 (2, Part 2), 1-27.
Zunin, L. and Zunin, N. (1972) Contact - The First Four Minutes. Los Angeles: Nash Publishing.

Volume 4, Number 1, January 1996. © Blackwell Publishers Ltd 1996
