
Item properties and the convergent validity of personality assessment: A peer rating study

Article in Personality and Individual Differences August 2017


DOI: 10.1016/j.paid.2017.01.051

Personality and Individual Differences 111 (2017) 96–105


Rachel A. Plouffe, Sampo V. Paunonen, Donald H. Saklofske
Department of Psychology, The University of Western Ontario, Canada

Article info

Article history:
Received 14 September 2016
Received in revised form 25 January 2017
Accepted 28 January 2017
Available online xxxx

Keywords:
Test construction
Self-peer agreement
Convergent validity
Content saturation
Social desirability
Mean responses
Personality
Self-report

Abstract

The current research evaluated the impact of personality questionnaire item content saturation, item social desirability, and mean item responses on the overall convergent validity of three well-known personality measures. Archival data representing groups of same-sex undergraduate roommate dyads were used for this research. Results demonstrated that content saturation, measured using item-total correlations, was the most consistent predictor of item convergent validity, measured using self-peer item response correlations. In order to predict outcome variables in education, clinical, and vocational contexts using scores on personality questionnaires, it is important for researchers to employ item selection procedures that take into account the item properties that affect the test's convergent validity.
© 2017 Elsevier Ltd. All rights reserved.

This research was partially supported by Social Sciences and Humanities Research Council of Canada Research Grant 410-98-1555 awarded to the second author.
Corresponding author at: Department of Psychology, University of Western Ontario, London N6A 5C2, Canada.
E-mail address: rplouffe@uwo.ca (R.A. Plouffe).
Deceased 29 December 2015.
http://dx.doi.org/10.1016/j.paid.2017.01.051
0191-8869/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

The most common method for assessing personality and individual differences for more than a century has been the self-report personality questionnaire (Jackson & Paunonen, 1980; Paunonen & Hong, 2015; Paunonen & O'Neill, 2010). Traditional methods of administering questionnaires include the completion of paper-based rating scales, in which participants indicate how representative a specific trait label, behaviour tendency, or attitude is to them (Holden & Troister, 2009). Modern technology allows for scales to be computerized and tailored to individual respondents, making them arguably one of the most efficient indicators of personality. Regardless of how they are administered, self-report questionnaires are an expedient way to assess an individual's attitudes and behaviours (Jackson & Paunonen, 1980; Paunonen & O'Neill, 2010).

The premise underlying self-report measures of personality is that individuals possess enough insight into their own psychological processes and past behaviours to make accurate judgments about their personality characteristics (John & Benet-Martinez, 2000; Paunonen & O'Neill, 2010). However, personality theory and assessment faced a paradigm crisis when a wealth of evidence introduced by critics of self-report personality testing revealed that individuals' responses to personality test items demonstrated little cross-situational consistency. That is, there appeared to be little stability of reported behaviour across time and situations (e.g., Mischel, 1968; Shrauger & Schoeneman, 1979). Furthermore, correlations between personality trait measures and relevant behaviours seldom surpassed a ceiling of 0.30 (Epstein, 1983; Jackson & Paunonen, 1985). The fundamental assumption underlying personality testing, which maintains that the characteristics and behaviours of individuals remain stable enough across diverse situations to classify them as enduring personality traits, was thus undermined (Epstein & O'Brien, 1985).

The person-situation debate provided the basis for a wealth of research concerning the improvement of traditional methods of personality test construction and assessment (Paunonen, 1984). Such improvements involve two fundamental psychometric requirements for sound personality measurement: the establishment of the measure's reliability and validity (Clark & Watson, 1995; Loevinger, 1957). Many of the apparent inconsistencies in personality test scores across time reported in some studies can be explained by the use of scales lacking in these psychometric properties (Jackson & Paunonen, 1985). One notable problem with past measures, for example, has been the use of self-report questionnaire items that tend to elicit socially desirable responding (Jackson, 1984).

Personality scales, even within the same omnibus questionnaire, typically do not have identical reliabilities or validities. The items on
two scales might look similar in style and format, be written by the same item writers, and be the result of the same statistical item selection strategy, yet the scales have different validities. One reason could be differential desirability in the scales' items, but there are other item properties that could be at work, including an item's difficulty, wording, direction of keying, face validity, content saturation, and more. The primary purpose of this study was to evaluate some of these psychometric item properties in terms of their contribution to the convergent validity of self-report measures of personality.

1.1. Evaluating the convergent validity of self-report personality measures

There are a number of ways to evaluate the validity of individuals' total scores on self-report personality inventories. The current study evaluates the convergent validity for a series of measures, which we do by comparing different methods for assessing the same construct and looking for agreement (Campbell & Fiske, 1959; Nunnally & Bernstein, 1994).

One way to compute convergent validity coefficients involves having an individual who is well acquainted with the target complete the same questionnaires in a peer rating format, and then to correlate the peer responses with the target responses to items (Holden & Troister, 2009; Paunonen, 1984; Paunonen & O'Neill, 2010). This well-acquainted individual could be a parent, friend, significant other, teacher, or sibling. High correlations between self- and peer ratings provide evidence for the measure's convergent validity (Campbell & Fiske, 1959; Foster & Cone, 1995). Using self-peer response convergence as a means to assess convergent validity has been successfully applied in a number of previous studies (e.g., Costa & McCrae, 1992; Funder, 1987; Funder & Colvin, 1988; Jackson, 1984; Paunonen, 1984; Paunonen & Kam, 2014). It is assumed that the peer rater will have been exposed to a number of behavioural cues about the target's personality, and that as these relevant cues increase, so too will self-peer agreement or convergence (Paunonen & O'Neill, 2010). This method of establishing convergent validity was employed for the purpose of the current research. Specifically, in the current study, convergent validity for a series of personality measures was established by correlating self-report responses to personality questionnaire items with roommate responses to the questionnaire items. Convergent validity, in this case, is defined as the self-peer correlations on individual personality questionnaire items. Higher self-peer convergence was indicative of higher convergent validity.

1.2. Item properties affecting convergent validity

Various psychometric item properties can have a demonstrable effect on the convergent validity of a personality questionnaire. These include, for example, item content saturation, item means, and item desirability. Many of the putative shortcomings of personality assessment described in the literature, such as the absence of findings of test-behaviour predictability, can be said to be due to a lack of consideration for these item characteristics (Jackson & Paunonen, 1980).

1.2.1. Content saturation

A goal in the construction of theoretical, construct-based measures of personality is to establish construct validity, defined as the degree to which it measures some trait which really exists in some sense (Loevinger, 1957; p. 685). More recently, Borsboom, Mellenbergh, and van Heerden (2004) have argued that a personality questionnaire can only possess validity if an attribute exists and if trait variation causally produces variation in test scores. An important consideration in establishing personality questionnaire validity is the extent to which items are content saturated. Content saturated items contain trait-relevant content, and the best ones are the most prototypical representations of their content domains (Paunonen, 1984). A general assumption in conventional personality scale construction is that single scales should measure single, unitary personality constructs. Thus, a highly content saturated scale will have high scale homogeneity, with all items representing trait-relevant content and not trait-irrelevant content.

One method for constructing personality questionnaires reflecting high content saturation involves employing factor analytic procedures in order to maximize item homogeneity and internal consistency (e.g., Briggs & Cheek, 1986; Paunonen, 1984; Paunonen, 1987). Items with the highest loadings on the largest factor underlying the scale's item intercorrelation matrix are inferred to be most content saturated. These items correlate more highly among themselves than do those with low loadings on the factor, which is likely due to their relation to a common theme, that is, the trait being measured (assuming irrelevant homogenizing factors such as desirability can be ruled out). Paunonen (1984) constructed ad hoc Personality Research Form (PRF; Jackson, 1984) subscales of varying length with varying content saturation by retaining items with high to low loadings on the first unrotated principal component extracted from the scale's items. Those scales constructed to reflect maximum content saturation (i.e., having the highest factor loadings) were more highly correlated with criteria such as peer ratings than were scales simply constructed to maximize items' contributions to the prediction of a relevant trait criterion.

Maximizing content saturation may also involve computing item-total correlations. Here, responses to individual items are correlated with total scale scores, and higher item-total correlations are then assumed to reflect more content saturation. This index is clearly linked to the above-mentioned factor loading index. Paunonen (1987) demonstrated that item loadings in a multiple group factor analysis, where each scale's items were assigned to their own factor, correlated in excess of 0.99 with item-total correlations.

Construct-based scale items that are most saturated with trait relevant content can be more valid than even criterion-based scale items, on which the development and selection of items is based primarily on how well they contribute to the prediction of a criterion variable (John & Benet-Martinez, 2000). Paunonen (1984) argued that such higher correlations between peer ratings and self-ratings on the more content saturated PRF items in his study were attributable to those items being most prototypical of the trait. In other words, content saturated items are highly salient and likely to be highly representative of concrete trait-relevant behaviours. In contrast, items low in content saturation might measure multidimensional content or ambiguous content, which might be difficult for respondents, be they selves or peers, to interpret consensually.

1.2.2. Item means

It is generally proposed that the optimal items to select in test construction are those with moderate means or p-values (popularity or probability of endorsement values). Items with moderate mean endorsement levels (e.g., around 0.50 on a binary True/False response scale, or 3 on a 5-point Likert scale) can demonstrate maximal observed score variance and respondent discrimination (i.e., how well the item distinguishes between respondents on the measured trait). On the other hand, items with extreme p-values (i.e., values close to 0.0 or 1.0, or to 1 or 5 on the 5-point scale) fail to differentiate between individuals because of the restricted variance of item responses. Furthermore, items with extreme p-values impose limits on the strength of the correlations that the measure can demonstrate with criterion variables, thus attenuating indices of validity (Epstein, 1983).

Holden, Fekken, and Jackson (1985) examined the relationship between absolute endorsement frequency of 80 binary PRF items and criterion validity. Their results demonstrated a significant correlation of −0.29 between extreme endorsement levels and criterion validity. Thus, items that are endorsed by many respondents or by few respondents hinder the criterion validity of a measure (Nunnally, 1978). This does not mean that items with moderate means are definitely more valid, but rather such items do not have the same statistical constraint on validity.
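
The item-level quantities described in Sections 1.1 through 1.2.2 (self-peer correlations, item-total correlations, and item means) can be illustrated with a minimal sketch. It is not the authors' analysis code. The function names, the simulated data, and the `corrected` option are all assumptions introduced here for illustration; the text does not state whether item-total correlations were computed with the item removed from its own total, and real questionnaire data would be Likert or True/False responses rather than the continuous scores simulated below.

```python
import numpy as np


def columnwise_pearson(x, y):
    """Pearson r between matching columns of two (respondents x items) arrays."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    return (xc * yc).sum(axis=0) / denom


def self_peer_convergence(self_ratings, peer_ratings):
    """Item-level convergent validity: r between self and peer responses per item."""
    return columnwise_pearson(self_ratings, peer_ratings)


def item_total_correlations(responses, corrected=False):
    """Content saturation index: r between each item and the scale total.

    corrected=True removes the item from its own total (an optional
    variant added for illustration, not something the text specifies).
    """
    totals = responses.sum(axis=1, keepdims=True)
    if corrected:
        totals = totals - responses  # per-item "rest" scores
    else:
        totals = np.broadcast_to(totals, responses.shape)
    return columnwise_pearson(responses, totals)


def binary_item_variance(p):
    """Variance of a True/False item with endorsement proportion p."""
    return p * (1.0 - p)


# Simulated data (hypothetical): one trait, four items of decreasing saturation.
rng = np.random.default_rng(0)
n_pairs = 500
loadings = np.array([0.9, 0.7, 0.4, 0.0])
trait = rng.normal(size=(n_pairs, 1))
unique = np.sqrt(1.0 - loadings ** 2)
self_ratings = trait * loadings + rng.normal(size=(n_pairs, 4)) * unique
peer_ratings = trait * loadings + rng.normal(size=(n_pairs, 4)) * unique

convergence = self_peer_convergence(self_ratings, peer_ratings)
saturation = item_total_correlations(self_ratings)
item_means = self_ratings.mean(axis=0)

for i in range(4):
    print(f"item {i + 1}: mean={item_means[i]:.2f}, "
          f"item-total r={saturation[i]:.2f}, self-peer r={convergence[i]:.2f}")
```

In this simulation the item with the highest loading shows both the highest item-total correlation and the highest self-peer correlation. That pattern is built into the simulated data, so it illustrates the computation rather than providing evidence for the relationship. The `binary_item_variance` helper shows why moderate endorsement matters for True/False items: the variance p(1 − p) peaks at p = 0.50 and shrinks toward zero at the extremes.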

1.2.3. Item desirability

Socially desirable responding (SDR) is a response bias in which individuals endorse response options to items on personality measures in order to present a favourable image of the self and to prevent negative perceptions from others (Paulhus, 2002; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). This tendency has problematic effects on personality assessment validity. Not only might the respondent be grossly misrepresenting his or her true level of trait, which compromises the measure's construct validity, but SDR can affect mean levels of responding, thus altering relationships between the test and variables such as validation criteria (Podsakoff et al., 2003; but cf. Paunonen & LeBel, 2012).

Personality test items differ in their tendency to elicit SDR. One method used to determine an item's level of social desirability is to compute its social desirability scale value (SDSV) by asking judges to read the item and rate how socially desirable or undesirable they consider it to be as applied to others (Edwards, 1969; Paunonen, 2015). Evidence has consistently demonstrated that level of item endorsement is a strong linear function of item SDSV, such that items with high SDSVs are endorsed more frequently than items with low SDSVs (Berg, 1967; Edwards, 1969; Edwards, 1970).

Another method used to determine the social desirability of an item is to correlate respondent scores on the item with their scores on a social desirability scale (e.g., Kam, 2013). Social desirability scales assess whether or not an individual has a tendency to respond in an overly favourable manner (Paulhus, 2002). Such scales typically represent items that are heterogeneous with respect to trait content, but homogeneous in desirability. A strong correlation (positive or negative) between a sample's responses to a particular test item and the respondents' social desirability scale scores indicates that the item has a preferred response option by those who are motivated to engage in SDR; in other words, it is susceptible to desirability bias (Paulhus, 2002).

Items that are neutral in desirability are generally preferred in personality assessment. Neutral items are likely to elicit the most accurate representations of an individual because there is no mechanism for misrepresentation by the target, even if so inclined (Paunonen & LeBel, 2012). But what about the use of such items if peer ratings on those items form the basis of a validation criterion? From one point of view, desirable items are less likely to elicit desirability biases in the peer, as the peer might not be as motivated to inflate target ratings as they are to inflate self-ratings (Paunonen & O'Neill, 2010). Consistent with this notion is research that has found a curvilinear relationship between item SDSVs and self-peer agreement on 76 unipolar personality trait adjectives (John & Robins, 1993). Traits that were rated by judges as being neutral in social desirability elicited more self-peer agreement than did traits that were rated as being high or low in desirability (John & Robins, 1993). The researchers found a significant correlation of r = −0.53 between absolute evaluativeness and self-peer agreement (John & Robins, 1993). Thus, peers making personality judgments are less likely to agree with a target when the traits are highly evaluative (positive or negative), and more likely to agree with a target when the traits are neutral, possibly because the target's self-ratings are wrong with the former ratings.

1.3. Objective

The purpose of the current study was to extend findings from previous literature by evaluating which properties of personality test items make the most important contributions to overall convergent validity of a number of personality questionnaires. The convergent validity of these personality scales was estimated by correlating self-ratings of personality with peer ratings of personality, with the inference being that the personality questionnaires higher in convergent validity are those in which self- and peer ratings of personality are highly correlated (e.g., Holden & Troister, 2009; Paunonen, 1984; Paunonen & O'Neill, 2010). Of the specific properties described earlier, those under consideration for the purpose of this study are thought to be among the most important considerations in writing and selecting personality test items: (a) item social desirability, (b) item difficulty (i.e., mean item responses), and (c) item content saturation.

1.3.1. Hypotheses

1. Item social desirability scale values (SDSVs) will curvilinearly predict self-peer agreement on a series of personality measures. To the extent that items have high or low SDSVs, this will elicit low self-peer agreement. Items with neutral SDSVs will elicit relatively high self-peer agreement.
2. Correlations between scores on a socially desirable responding (SDR) measure and item responses on a personality measure will linearly predict self-peer agreement. To the extent that there are strong positive (or negative) correlations between personality items and SDR measures, this will elicit low self-peer agreement, in contrast to weak correlations that will elicit high self-peer agreement.
3. Item difficulty will predict self-peer agreement in a curvilinear manner. Items with extreme high and low mean endorsement values will elicit low self-peer agreement, and moderate mean endorsement values will elicit high self-peer agreement.
4. Content saturation, measured using item-total correlations, will linearly predict self-peer agreement. Items exhibiting higher content saturation will result in higher self-peer agreement than will items exhibiting lower content saturation.

2. Method

2.1. Participants

Archival data representing self- and peer personality ratings were used in the current study. The studies were conducted in 1981 (20 men, 70 women; see Paunonen, 1982), 1997 (46 men, 95 women; see Paunonen & Ashton, 2001), and 2004 (42 men, 82 women). The data represent various groups of same-sex undergraduate roommate dyads living in a dormitory at a large Canadian university. Students received cash compensation in return for their participation.

The samples used for the current study comprised mostly undergraduate students in their first year of university (see Table 1). Each of the studies was conducted in the second to last month of the academic year to ensure that participants had substantial time to become acquainted with their roommates. Sample means of participants' reported duration of time acquainted with their respective roommates ranged from 14.43 months (SD = 21.59) to 28.59 months (SD = 71.08).

Self-peer convergence and predictor statistics were computed on different data sets so as to prevent variables from being inextricably linked in analyses. For example, if convergent validity and reliability were computed on the same data sets, this could spuriously inflate the relationship between the variables, as reliability is a precursor to validity. Consequently, the Supernumerary Personality Inventory (SPI) data, collected in 2004, were used to compute self-peer correlations and item correlations with the PRF Desirability scale. The remaining SPI computations, including mean item responses and content saturation (i.e., item-total correlations), came from SPI normative data (n = 537; Paunonen, 2002). Responses to items on the NEO-Personality Inventory-Revised were drawn from studies conducted in 1997 and 2004. The NEO-PI-R data collected in 1997 were used to compute item-total correlations and item correlations with the PRF Desirability scale. The data collected in 2004 were used to compute self-peer correlations and mean item responses. Responses to items on the Personality Research Form (PRF) were drawn from studies conducted in 1981 and 1997. An earlier version of the PRF was used for the data collected in 1981 (see Jackson, 1974). These PRF data were used to compute self-peer correlations and mean item responses on a 9-point scale ranging from 1 = extremely uncharacteristic to 9 = extremely characteristic (Paunonen, 1982). The data collected in 1997 were used to compute item-total correlations
and item correlations with the PRF Desirability scale using the standard true/false scale.

Table 1
Demographic variables by sample.

Variable                                       1981 (N = 90)   1997 (N = 141)   2004 (N = 124)
Age (years)
  Mean                                         19.18           19.20            18.79
  SD                                           .84             .66              .69
  Minimum                                      18              17               17
  Maximum                                      22              22               20
Gender (N)
  Male                                         20 (22%)        46 (33%)         42 (34%)
  Female                                       70 (78%)        95 (67%)         82 (66%)
Number of months acquainted with roommate
  Mean                                         14.43           28.59            18.72
  SD                                           21.59           71.08            26.59
  Minimum                                      1               3                4
  Maximum                                      96              720              120
Roommate acquaintanceship ratings (a)
  Mean                                         5.82            7.12             6.94
  SD                                           1.00            1.39             1.12
  Minimum                                      3               3                3
  Maximum                                      7               9                9
Year of study
  1                                            NA              126 (89%)        122 (98%)
  2                                            NA              13 (9%)          1 (1%)
  3                                            NA              0                1 (1%)
  4                                            NA              1 (1%)           0

Note. Ns refer to total number of subjects in each sample (i.e., N = 90 indicates 45 pairs and 90 total participants). NA indicates that no data were available. One missing case for 1997 year of study.
(a) Roommate acquaintanceship ratings in 1981 were measured on a 7-point scale. Remaining roommate acquaintanceship ratings were on a 9-point scale.

2.2. Procedure

Participants were recruited using posters displayed in a university dormitory. Those interested would sign up to participate, and were subsequently contacted regarding the nature of the study.

The roommate pairs in each of the three samples were assessed at two time points separated by one week. In the first testing session, both roommates arrived together at a classroom on campus, where they completed a series of paper-and-pencil self-report personality questionnaires. In the second testing session, participants filled out the same measures, but instead they completed peer ratings of their roommates' personality characteristics.

2.3. Personality measures

2.3.1. Supernumerary Personality Inventory (SPI; Paunonen, 2002)

This 150-item inventory contains 10 personality trait scales: Conventionality, Seductiveness, Manipulativeness, Thriftiness, Humorousness, Integrity, Femininity, Religiosity, Risk-Taking, and Egotism. Participants responded to items on a 5-point Likert scale based on how much they agree or disagree with the statement (1 = strongly disagree, 5 = strongly agree). The SPI has adequate internal consistency reliability, with coefficient alpha values ranging from 0.66 (Conventionality) to 0.95 (Religiosity), and a mean of 0.80 in a normative university sample (Paunonen, 2002).

2.3.2. NEO-Personality Inventory-Revised (NEO-PI-R; Costa & McCrae, 1992)

This questionnaire comprises 240 items assessing Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. Item responses were measured on a 5-point Likert scale, where 1 = strongly disagree and 5 = strongly agree. The NEO-PI-R's psychometric data are sound, with coefficient alpha values ranging from 0.86 (Agreeableness) to 0.92 (Neuroticism) in an adult normative sample (mean coefficient alpha = 0.89; Costa & McCrae, 1992).

2.3.3. Personality Research Form (PRF; Jackson, 1974, 1984)

The 352-item PRF is available in five forms, but the most commonly used is Form-E. The 20 true/false trait scales are: Abasement, Achievement, Affiliation, Aggression, Autonomy, Change, Cognitive Structure, Defendance, Dominance, Endurance, Exhibition, Harmavoidance, Impulsivity, Nurturance, Order, Play, Sentience, Social Recognition, Succorance, and Understanding. Two additional scales measure infrequency and desirability. Jackson (1984) has reported strong internal consistency for the PRF, with Kuder-Richardson 20 reliability values ranging from 0.78 (Defendance) to 0.94 (Order), and a median KR-20 value of 0.93 (Jackson, 1970).

2.4. Desirability measures

The socially desirable response tendencies of the roommate raters were assessed with one questionnaire-based measure. The desirability levels of the individual personality inventory items were assessed using standard rating procedures.

2.4.1. Personality Research Form Desirability Scale (PRF desirability; Jackson, 1984)

The PRF Desirability scale comprises 16 items of the larger 352-item questionnaire. This scale was designed as a content-heterogeneous and internally consistent measure of one's tendency to respond desirably. This scale demonstrates adequate internal consistency (coefficient alpha = 0.70; Jackson, 1984). In this study, participants' scores on the PRF Desirability scale were correlated with their item responses to the personality questionnaires to evaluate item desirability. Responses to items on the PRF Desirability scale were drawn from studies conducted in 1997 and 2004.

2.4.2. Social desirability scale values (SDSVs; Edwards, 1970)

Item SDSVs were evaluated in three different samples. Specifically, a group of 149 undergraduate students (67 males, 82 females) were presented with items of the SPI, and a second group of 27 undergraduate students (8 males, 19 females) were presented with items of the NEO-PI-R. Social desirability scale values for the PRF were drawn from Helmes, Reed, & Jackson, 1997 (N = 98). Participants in all samples were asked to rate item SDSVs by considering each statement and estimating how socially desirable or undesirable the behaviour or belief would be if characterizing other people in general (Edwards, 1970, p. 89). An item's SDSV was equal to the mean of all participants' SDSV ratings for that item. Item SDSVs were assessed using a 9-point scale, where 1 = extremely undesirable and 9 = extremely desirable.

3. Results

3.1. Preliminary analyses

Prior to carrying out the main analyses, background analyses were conducted. These analyses included item property descriptive statistics and bivariate correlations.

3.1.1. Item property descriptive statistics

Descriptive statistics for all item properties were evaluated, including self-peer convergence (i.e., self-peer correlations on individual items), item-total correlations, item-desirability scale correlations, item SDSVs, and mean item responses for all personality measures (see Table 2). For all scales, mean convergent validity was moderate,
with mean self-peer response correlations ranging from 0.20 (NEO) to 0.27 (SPI).

Mean item-total correlations were adequate, ranging from 0.45 (PRF) to 0.57 (NEO), and the use of infrequency subscales likely attenuated these values, specifically in the PRF.

The questionnaires had undergone rigorous statistical testing in order to eliminate items that were heavily desirable in content (Costa & McCrae, 1992; Jackson, 1974, 1984; Paunonen, 2002). As such, mean item correlations with the PRF Desirability scale were low for all questionnaires, ranging from −0.03 (SPI) to 0.00 (NEO). Similarly, mean item SDSVs were moderate-to-low for each scale, with values ranging from 5.20 (PRF) to 5.47 (NEO), both measured on a 9-point scale. Additionally, mean item responses were acceptable, with values ranging from 3.07 on the 5-point SPI scale to 5.19 on the 9-point PRF scale.

Table 2
Descriptive statistics of item properties by personality questionnaire.

Item property by scale        N     Minimum   Maximum   M       SD
SPI
  Self-peer convergence       150   .09       .64       .27     .14
  Item-total correlation      150   .28       .87       .53     .12
  SDSV                        150   2.68      7.33      5.29    .97
  SDSV²                       150   7.17      53.71     28.86   10.22
  Desirability correlation    150   −.42      .38       −.03    .15
  Desirability correlation²   150   .00       .18       .02     .03
  Mean                        150   1.91      4.12      3.07    .52
  Mean²                       150   3.66      17.00     9.71    3.20
NEO-PI-R
  Self-peer convergence       240   .09       .59       .20     .12
  Item-total correlation      240   .26       .83       .57     .11
  SDSV                        240   2.07      8.61      5.47    1.69
  SDSV²                       240   4.29      74.11     32.78   18.47
  Desirability correlation    240   −.47      .41       .00     .19
  Desirability correlation²   240   .00       .22       .04     .04
  Mean                        240   1.76      4.36      3.20    .58
  Mean²                       240   3.09      18.97     10.56   3.70
PRF
  Self-peer convergence       352   .15       .64       .25     .15
  Item-total correlation      352   .06       .75       .45     .13
  SDSV                        352   1.39      7.76      5.20    1.25
  SDSV²                       352   1.93      60.22     28.60   12.87
  Desirability correlation    352   −.52      .41       −.02    .15
  Desirability correlation²   352   .00       .27       .02     .04
  Mean                        352   1.11      8.91      5.19    1.24
  Mean²                       352   1.23      79.41     28.49   12.73

Note. Squared variables were used to test curvilinear associations. Ns refer to total number of items. Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.

3.1.2. Item property bivariate correlations

Item property bivariate correlations for each questionnaire are presented in Tables 3 through 5. As anticipated, strong correlations emerged between convergent validity (i.e., item self-peer correlations) and item-total correlations across all questionnaires. Interestingly, significant negative correlations emerged between item social desirability and convergent validity in the SPI, and significant positive correlations emerged between item social desirability and convergent validity in the PRF. Strong positive correlations emerged between item social desirability and mean item responses in all questionnaires, indicating that participants tended to positively endorse items high in desirability. Mean item responses were uncorrelated with convergent validity.

Table 4
NEO Personality Inventory item property bivariate correlations (N = 240).

Item property                 1      2      3      4      5
1. Convergent validity
2. Item-total correlation     .29
3. SDSV                       .03    .07
4. Desirability correlation   .02    .02    .83
5. Mean                       .01    .14    .70    .50

Note. N refers to total number of items. Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.
p < 0.01.
p < 0.001.

3.2. Main analyses

A total of three hierarchical multiple regression analyses were carried out to test the four main hypotheses.

3.2.1. Supernumerary Personality Inventory

For the first step of the hierarchical regression, SPI convergent validity, measured using self-peer correlations, was regressed onto item SDSVs, item correlations with the PRF Desirability scale, item-total correlations, and mean item responses to test our four hypotheses (see Table 6). In Step 2, to assess curvilinear components of the relationships between convergent validity and SPI item properties, convergent validity was regressed onto the squared values of item social desirability and mean item responses. In the final model, item properties accounted for a significant amount of the variance in convergent validity, R² = 0.25, R² change = 0.01, F(7, 142) = 6.89, p < 0.001. A significant beta weight for the final model was associated with item-total correlations, β = 0.43, t(142) = 5.83, p < 0.001 (see Fig. 1).

3.2.2. NEO Personality Inventory

For the first step of the hierarchical regression, NEO convergent validity was regressed onto item SDSVs, item correlations with the PRF Desirability scale, item-total correlations, and mean item responses (see Table 7). In Step 2, to assess curvilinear components of the relationships between convergent validity and NEO item properties, convergent validity was regressed onto the squared values of item social desirability and mean item responses. In the final model, item properties accounted for a significant amount of the variance in convergent validity, R² = 0.11, R² change = 0.02, F(7, 232) = 3.88, p < 0.001. A significant beta

Table 3
Supernumerary Personality Inventory item property bivariate correlations (N = 150).

Item property                   1     2     3     4     5
1. Convergent validity
2. Item-total correlation      .47
3. SDSV                        .18   .13
4. Desirability correlation    .20   .12   .59
5. Mean                        .15   .11   .83   .39

Note. N refers to total number of items.
Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.
p < 0.01.
p < 0.001.

Table 5
Personality Research Form item property bivariate correlations (N = 352).

Item property                   1     2     3     4     5
1. Convergent validity
2. Item-total correlation      .35
3. SDSV                        .11   .03
4. Desirability correlation    .11   .00   .66
5. Mean                        .04   .02   .78   .49

Note. N refers to total number of items.
Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.
p < 0.01.
p < 0.001.
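
As a rough illustration of how item-level properties such as those correlated in Tables 3 and 5 can be derived, the sketch below computes one self-peer correlation, one item-total correlation, and one mean per item, then correlates those properties across items. It is not the authors' code: the data are simulated, the function names are hypothetical, all items are treated as belonging to a single scale, and the use of a corrected item-total correlation (item removed from the total) is an assumption, since the text does not say which variant was used.

```python
import numpy as np

def columnwise_pearson(a, b):
    """Pearson r between matching columns of two (respondents x items) arrays."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

def corrected_item_total(resp):
    """Correlate each item with the sum of the remaining items (assumed variant)."""
    total = resp.sum(axis=1, keepdims=True)
    return columnwise_pearson(resp, total - resp)

# Simulated stand-in data, not the archival roommate data.
rng = np.random.default_rng(0)
n_targets, n_items = 500, 20
loadings = np.linspace(0.1, 0.9, n_items)      # hypothetical content saturation
trait = rng.normal(size=(n_targets, 1))
self_resp = trait * loadings + rng.normal(size=(n_targets, n_items))
peer_resp = trait * loadings + rng.normal(size=(n_targets, n_items))

convergent_validity = columnwise_pearson(self_resp, peer_resp)  # one r per item
item_total = corrected_item_total(self_resp)                    # one r per item
mean_response = self_resp.mean(axis=0)                          # one mean per item

# Correlations among item properties, with items as the unit of analysis.
item_properties = np.vstack([convergent_validity, item_total, mean_response])
property_correlations = np.corrcoef(item_properties)
```

Here `property_correlations[0, 1]` plays the role of the convergent validity by item-total correlation entry; in the simulation it is positive by construction, because both quantities grow with the hypothetical loadings.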

Table 6
Final model standardized and unstandardized regression coefficients: prediction of SPI convergent validity from SPI item properties.

Item property                  b     Standard error   β      t      p
Item-total correlation        .48    .08              .43    5.83   .001
SDSV                          .06    .15              .42    .40    .69
SDSV²                         .01    .01              .38    .36    .72
Desirability correlation      .14    .09              .15    1.53   .13
Desirability correlation²     .43    .38              .11    1.14   .26
Mean                          .23    .30              .87    .76    .45
Mean²                         .04    .05              .89    .77    .44

Note. Squared variables were used to test curvilinear associations.
Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.

Table 7
Final model standardized and unstandardized regression coefficients: prediction of NEO convergent validity from NEO item properties.

Item property                  b     Standard error   β      t      p
Item-total correlation        .32    .07              .30    4.40   .001
SDSV                          .06    .04              .84    1.55   .12
SDSV²                         .01    .003             .78    1.48   .14
Desirability correlation      .02    .07              .03    .26    .80
Desirability correlation²     .13    .22              .05    .61    .54
Mean                          .09    .14              .43    .62    .54
Mean²                         .02    .02              .54    .78    .44

Note. Squared variables were used to test curvilinear components.
Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.
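
The two-step models summarized in Tables 6 and 7 can be sketched as a pair of nested ordinary least squares fits: Step 1 enters the four item properties, Step 2 adds the three squared terms, and the difference in R² is the R² change. The example below is a minimal sketch on simulated item-level data; the variable names, value ranges, and the decision not to center predictors before squaring are assumptions rather than details taken from the article.

```python
import numpy as np

def r_squared(predictors, outcome):
    """R^2 from an ordinary least squares fit that includes an intercept."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefs
    centered = outcome - outcome.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)

# Simulated item-level data; each row is one questionnaire item.
rng = np.random.default_rng(1)
n_items = 150                                      # same count as the SPI item set
sdsv = rng.uniform(1, 9, n_items)                  # hypothetical desirability scale values
desirability_r = rng.uniform(-0.5, 0.5, n_items)   # hypothetical item-Desirability correlations
item_total = rng.uniform(0.1, 0.8, n_items)
mean_resp = rng.uniform(1, 5, n_items)
convergence = 0.5 * item_total + rng.normal(scale=0.1, size=n_items)

# Step 1: linear terms only. Step 2: add squared desirability and mean terms.
step1 = np.column_stack([sdsv, desirability_r, item_total, mean_resp])
step2 = np.column_stack([step1, sdsv ** 2, desirability_r ** 2, mean_resp ** 2])

r2_step1 = r_squared(step1, convergence)
r2_step2 = r_squared(step2, convergence)
r2_change = r2_step2 - r2_step1
df_residual = n_items - step2.shape[1] - 1         # 150 - 7 - 1 = 142
```

Because the Step 1 model is nested in the Step 2 model, `r2_change` cannot be negative (up to floating-point error); the residual degrees of freedom match the F(7, 142) reported for the SPI model.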

weight for the final model was associated with item-total correlations, β = 0.30, t(232) = 4.40, p < 0.001 (see Fig. 2).

3.2.3. Personality Research Form
For the first step of the hierarchical regression, PRF convergent validity was regressed onto item SDSVs, item correlations with the PRF Desirability scale, item-total correlations, and mean item responses (see Table 8). In Step 2, to assess curvilinear components of the relationships between convergent validity and PRF item properties, convergent validity was regressed onto the squared values of item social desirability and mean item responses. In the final model, item properties accounted for a significant amount of the variance in convergent validity, R² = 0.18, R² change = 0.03, F(7, 344) = 10.40, p < 0.001. Significant beta weights for the final model were associated with item-total correlations, β = 0.29, t(344) = 5.35, p < 0.001 (see Fig. 3), mean item responses, β = 0.75, t(344) = 2.45, p < 0.01, and squared values of the mean item responses, β = 0.90, t(344) = 2.95, p < 0.003. The latter result suggests that there was a curvilinear relationship between mean item responses and self-peer correlations, such that moderate means elicited higher self-peer correlations than did low or high mean item responses (see Fig. 4). However, although the R² change was found to be statistically significant, the additional explanatory accuracy afforded by such a small effect is negligible.

4. Discussion

The current study sought to evaluate several key properties of personality test items that can affect overall convergent validity of a series of personality questionnaires, including (a) item social desirability, (b) item difficulty (i.e., mean item responses), and (c) item content saturation. Convergent validity of the scales was estimated by correlating self-ratings on personality questionnaire items with peer ratings on the same items, with the hypothesis that the most accurate measures are those in which self- and peer ratings of personality are highly correlated (e.g., Holden & Troister, 2009; Paunonen, 1984; Paunonen & O'Neill, 2010).
The current study used three personality questionnaires to provide robust tests of the hypotheses: the SPI, the NEO-PI-R, and the PRF. The socially desirable response tendencies of the roommate raters were assessed using one self-report questionnaire-based measure: the PRF Desirability scale.
Results revealed that item content saturation, measured using item-total correlations, was the most salient linear predictor of item self-peer agreement across all three personality questionnaires. Results also revealed that mean item responses curvilinearly predicted self-peer response convergence in the PRF. Neither item SDSVs nor personality questionnaire item correlations with an SDR measure predicted self-peer response convergence on any of the three personality questionnaires.
Consistent with prediction and with past literature (e.g., Ashton & Goldberg, 1973; Hase & Goldberg, 1967; Jackson, 1975; Paunonen, 1984), participants responded to content saturated items infused with trait-relevant content more accurately than items that were only tangentially related to the trait in question. In other words, more content saturated items on all three personality questionnaires elicited greater convergence on self- and peer responses than less content saturated items. This finding is in accordance with past research that has empirically compared construct-oriented scales, on which items are developed to be representative and salient exemplars of the trait (Holden et al., 1985; Paunonen, 1984; Paunonen & Jackson, 1985), to criterion-keyed

Fig. 1. Scatterplot of 2004 Supernumerary Personality Inventory (SPI) self-peer convergent validity correlations (N = 124) on item-total correlations (normative sample N = 537).

Fig. 2. Scatterplot of 2004 NEO Personality Inventory - Revised (NEO-PI-R) self-peer convergent validity correlations (N = 124) on 1997 item-total correlations (N = 141).

Table 8
Final model standardized and unstandardized regression coefficients: prediction of PRF convergent validity from PRF item properties.

Item property                  b     Standard error   β      t      p
Item-total correlation        .32    .06              .29    5.35   .001
SDSV                          .04    .05              .29    .68    .50
SDSV²                         .01    .01              .48    1.09   .28
Desirability correlation      .10    .07              .11    1.52   .13
Desirability correlation²     .39    .22              .10    1.73   .09
Mean                          .09    .04              .75    2.45   .02
Mean²                         .01    .01              .90    2.95   .003

Note. Squared variables were used to test curvilinear components.
Desirability correlation refers to personality questionnaire item correlation with PRF Desirability scale.
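
To see why a small R² change can still reach statistical significance with 352 items, the standard F statistic for an R² increment in nested regression models can be worked through with the rounded PRF values reported in the text. This is a back-of-the-envelope sketch, not a reanalysis: the function name is hypothetical, and it assumes that Step 2 added exactly the three squared terms listed in Table 8.

```python
def f_for_r2_change(r2_full, r2_change, n_obs, k_full, k_added):
    """F statistic for the R^2 gained when k_added predictors join a nested OLS model."""
    df_error = n_obs - k_full - 1
    f_value = (r2_change / k_added) / ((1.0 - r2_full) / df_error)
    return f_value, (k_added, df_error)

# Rounded values reported for the PRF model: R^2 = 0.18, R^2 change = 0.03,
# 352 items, 7 predictors in the final model, 3 of which are squared terms.
f_change, dfs = f_for_r2_change(0.18, 0.03, n_obs=352, k_full=7, k_added=3)
```

With these rounded inputs the statistic is roughly 4.2 on (3, 344) degrees of freedom; the exact value depends on the unrounded R² figures, which the article does not report.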

scales, on which items are selected on the basis of their ability to predict a particular criterion (John & Benet-Martinez, 2000). For example, Paunonen (1984) found that ad hoc PRF scales constructed to reflect maximum content saturation elicited higher correlations with predictive criteria (i.e., peer ratings, adjective trait ratings, and nonverbal stimuli) than did criterion-oriented scales that were constructed to optimize the prediction of mean roommate ratings for a PRF trait measure.
Ashton and Goldberg (1973) found that the PRF trait scales developed based on psychological theory (i.e., more content saturated) outperformed the empirically-derived California Psychological Inventory (CPI; Gough, 1957) on convergent validity measured by correlating self- and peer item responses. Similarly, the construct-oriented trait scales of the Jackson Personality Inventory were more highly correlated with both self-ratings and peer ratings on adjective scales representing the traits in question than were the criterion-oriented scales of the CPI (Jackson, 1975). Researchers have contended that construct-based items, developed to be more internally consistent, prototypical, and content saturated, contribute more to personality questionnaire validity than criterion-based items because they are highly salient and representative of the trait. In contrast, criterion-oriented scales developed without consideration of psychological theory appear to allow for greater ambiguity in responses, guessing in interpretations, and a varying focus on differential aspects of item content (Paunonen, 1984). Thus, a theory-based method of test construction, in which more salient and content saturated items are selected, is generally preferred over the criterion-based method in order to elicit accurate responding. It should be noted, however, that items should not be developed to be so internally consistent that they narrow or constrict the meaning of the construct being assessed by the items.

Fig. 3. Scatterplot of 1981 Personality Research Form (PRF) self-peer convergent validity correlations (N = 90) on 1997 item-total correlations (N = 141).

Fig. 4. Scatterplot of 1981 Personality Research Form (PRF) self-peer convergent validity correlations (N = 90) on 1981 9-point scale mean item responses (N = 90).

Consistent with prediction, mean item endorsement levels predicted self-peer item response convergence in the PRF. Specifically, items with extreme low or high mean endorsement levels elicited less accurate responding than did items with moderate mean endorsement levels. However, the magnitude of the effect (R² change = 0.03) was so small as to be theoretically and practically valueless in terms of incremental predictive accuracy. This prediction was derived on the basis of restriction of range effects. Based on past literature, it was predicted that extreme mean item responses would result in a restricted variance of item responses that would attenuate the relationship between two variables, or criterion validity (e.g., Nunnally & Bernstein, 1994; Sackett, Lievens, Berry, & Landers, 2007). To elaborate, if everybody endorsed the same response to an item, the item would be ineffective in assessing individual differences. Additionally, this restricted variance would place upper limits on the strength of the correlations that could exist between the item and criterion variables, such as self-peer response correlations (Epstein, 1983). Items with moderate mean responses, on the other hand, maximize observed score variance, allowing for maximal correlations between item responses and a criterion (e.g., Feldt, 1993).
On the SPI and NEO-PI-R, mean item endorsement levels had no effect on the correlations between self- and peer ratings of personality, and a relatively small effect on correlations between self- and peer ratings on the PRF. One explanation is that items with extreme endorsement levels had been eliminated from the questionnaires following the rigorous validation procedures, and the relationship between mean item responses and convergent validity was subsequently attenuated. In line with this explanation, it was observed that the majority of the mean item endorsement levels were moderate for each of the questionnaires. Jackson (1970) eliminated items from the PRF that only a small percentage, or almost all, of the individuals endorsed, thereby classifying such items as unreliable. Likewise, throughout the test construction process and preliminary item analysis procedures for the SPI, Paunonen (2002) discarded items representing extremely rare and extremely popular behaviours, and items for which variance values were not acceptable. Therefore, the lack of items reflecting extreme levels of responding may have minimized potentially larger correlations between mean item responses and convergent validity.
Contrary to prediction, neither item SDSVs nor personality questionnaire item correlations with SDR scale scores predicted self-peer item response convergence across any of the three personality questionnaires while controlling for item-total correlations and mean item responses. Additionally, when analyses were conducted with only item social desirability as predictor variables, item SDR still did not significantly predict self-peer item response convergence. The basis of the original prediction was that items infused with desirability bias have the potential to elicit misrepresentation from the respondent on his or her

true level of trait, which, in turn, may compromise the measure's construct validity. Specifically, the respondent may endorse response options to items on personality measures in order to present a more favourable image of the self than is warranted (Paulhus, 2002; Paunonen & LeBel, 2012; Podsakoff et al., 2003). Not only might the construct validity of these scales be compromised, but items eliciting SDR can also affect mean levels of responding (Ganster, Hennessey, & Luthans, 1983; Podsakoff et al., 2003), which may alter relationships between the test and variables such as validation criteria. However, items infused with desirable content are less likely to elicit the same desirability biases in the peer, as the peer might not be as motivated to inflate target ratings as they are to inflate self-ratings (John & Robins, 1993; Paunonen & O'Neill, 2010). We hypothesized that this discrepancy between self- and peer responses to extremely (un)desirable items would subsequently attenuate validity coefficients.
Our findings are in contrast with past research that has identified a curvilinear relationship between item social desirability levels and self-peer agreement (e.g., John & Robins, 1993). Specifically, traits that were rated by college students as being neutral in social desirability elicited more self-peer agreement than did traits that were rated as being high or low in desirability (John & Robins, 1993). However, these findings have been inconsistent across studies. Paunonen and Kam (2014) did not find significant relationships between item SDSVs and self-peer convergence in either the SPI items (r = 0.04, p > 0.05) or the NEO-PI-R items (r = 0.01, p > 0.05). Similarly, Holden et al. (1985) found no relationship between absolute item desirability and item criterion validity (r = 0.11, p > 0.05) in the PRF items. Interestingly, in a study using personality adjectives, Funder (1980) reported that more socially desirable items elicited greater self-peer agreement than did socially undesirable or neutral items, with a significant correlation of 0.30 between item SDSVs and self-peer agreement.
One possible explanation for these inconsistent findings is that some personality questionnaire items may possess both high levels of social desirability and high levels of content saturation. In fact, it is nearly impossible to select only items high in content saturation that are also entirely neutral in desirability (Jackson, 1970). If items reflecting any (un)desirable content were removed, this would also result in removing valid content variance from the scale, which would compromise its construct and content validity. Thus, items that have some social desirability in their content may elicit accurate responding, so long as the levels of item content saturation remain higher than levels of item desirability.
An additional explanation for the current results involves the link between item desirability levels and construct validity (cf. criterion validity). Specifically, Paunonen and LeBel (2012) investigated the effect of social desirability bias on the criterion validity of simulated responses to bipolar personality trait adjectives using Monte Carlo procedures. These bipolar trait adjectives represented relatively desirable and undesirable traits (e.g., honesty-dishonesty poles; Paunonen & LeBel, 2012). The results revealed that although adding large components of social desirability to test scores altered observed trait scores drastically from their true level of the trait, criterion validity remained relatively unaffected by the intrusion of desirability variance. The authors suggested that SDR did not act as a moderator of criterion validity because their linear data transformations did not systematically change the rankings of respondents' simulated personality trait scores depending on their levels of SDR. However, even if criterion validity had not been compromised, the intrusion of extremely desirable content into test items is still highly problematic for the construct validity of the measures because observed scores were drastically altered from true scores on the traits (Paunonen & LeBel, 2012). Perhaps in the current study, although extreme desirable or undesirable items did not alter self-peer personality test item response correlations, these items still may have elicited SDR tendencies from respondents, thus causing discrepancies between individuals' obtained and true scores on the traits in question. Thus, researchers constructing personality tests should take into consideration that highly desirable and undesirable items have an effect on the construct validity of their scales, and it is most likely prudent to avoid items that are extremely evaluative in nature.
Finally, another possible explanation for the findings of this study concerns the nature of the items' levels of social desirability. It could be that some items were not infused with enough desirable content to have had a substantial effect on the convergent validity of the measures. As stated previously, the questionnaires had already undergone extensive validation procedures that eliminated extreme desirable or undesirable items from their respective item pools. During initial item analysis phases of test construction for the PRF, Jackson (1976) used the Differential Reliability Index (DRI), a statistical indicator of content saturation, to ensure that items selected for the scales did not reflect high desirability components. The other questionnaires underwent similar validation procedures. For example, original SPI items that correlated more highly with a measure of desirability than with the other items in the scale were discarded during scale construction (Paunonen, 2002). Additionally, the authors of the NEO-PI-R have argued that socially desirable responding does not pose a threat to the validity of their scale (Costa & McCrae, 1988; Costa & McCrae, 1992) based on scale correlations with social desirability. Our findings corroborated these assertions. Social desirability scale values (SDSVs) in our samples ranged from 1.39 (PRF) to 8.61 (NEO-PI-R) on a 9-point scale, with means all approximating 5.00. Similarly, our personality questionnaire item-PRF Desirability scale correlations ranged from −0.52 (PRF) to 0.41 (PRF), with all means approximating zero. Therefore, it is evident that the personality questionnaire items under investigation were largely free of content extreme in desirability, which may have minimized the effects of item desirability levels on convergent validity.

4.1. Limitations and future directions

Some limitations of the current study should be noted. First, the three samples used in the present research were convenience samples comprising undergraduate students. Also, the groups were predominantly first-year students with a mean age of 19 years, and a large number of the recruits were female. Thus, it is unclear whether the current results would generalize to a wider demographic. However, three different undergraduate samples collected over a time span of 23 years were used for the current research, so it is probable that the results would generalize, at the very least, to undergraduate samples across time.
Participants varied widely in the duration of time that they were acquainted with their roommates. Sample means of participants' reported duration of time acquainted with their respective roommates ranged from 14.43 months (SD = 21.59) to 28.59 months (SD = 71.08). Additionally, although each of the three studies was carried out during the second to last month of the academic year to ensure that participants had substantial time to become familiar with their roommates, 6.7% of participants in the sample collected in 1981 reported having known their roommate for fewer than three months. It is well-established in the person perception literature that self-peer response convergence increases as a function of acquaintanceship (e.g., Funder & Colvin, 1988; Norman & Goldberg, 1966; Paunonen, 1989). Therefore, the length of time of roommate-peer acquaintanceship may have influenced our results, such that relevant cues regarding criterion behaviours available to the peer varied as a function of the number of interpersonal encounters (Paunonen, 1989). As such, participants' responses to personality questionnaire items may have correlated to a greater degree with their roommates' responses when the two were better acquainted. However, the number of participants with less than three months of acquaintanceship is small, with the majority (> 93%) reporting high degrees of familiarity with their roommates. Future research should ensure that participants have been sufficiently acquainted with their roommates for a minimum duration of time so as to avoid subsequent biases in the results.
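
The restriction-of-range reasoning that the Discussion invokes (for items with extreme means and for item pools already screened during test construction) can be illustrated with a small simulation. The sketch below is generic rather than a model of the study's data: the population correlation of 0.5, the sample size, and the cut-off used to restrict the range are all arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
# y is built so that the population correlation with x is 0.5.
y = 0.5 * x + np.sqrt(1 - 0.5 ** 2) * rng.normal(size=n)

r_full = np.corrcoef(x, y)[0, 1]

# Keep only cases from the middle of the x distribution (restricted range).
keep = np.abs(x) < 0.5
r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]
```

The relationship between `x` and `y` is identical in both subsets, yet `r_restricted` is smaller than `r_full` because less of the variance in `x` is available to covary with `y`.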

This study only used scales that have undergone extensive and rigorous validation procedures. Each of the scales used for the current analyses is a widely used measure of general personality traits and has been validated across many cultures and communities. Thus, items that are extremely low in content saturation, high in desirability, and extreme in mean responses were most likely eliminated from their item pools in the early scale construction process. It is possible, for instance, that the relationship between item content saturation and self-peer convergence may be attenuated due to restriction of range effects if only items high in content saturation are selected. It may be beneficial in future research to investigate the effects of these item properties on convergent validity using self-report personality questionnaires that have not undergone the same exhaustive validation procedures (e.g., a newly developed measure). This type of study could also serve as a validation tool for new self-report measures of personality.
Finally, it should be noted that more recent validity literature purports that a questionnaire is only valid if the trait exists and if trait variation causally produces variation in test scores (e.g., Borsboom, Cramer, Kievit, Scholten, & Franic, 2009; Borsboom et al., 2004). Thus, instead of assessing validity by correlating two measures designed to assess the same construct (e.g., self-peer response correlations), future research could assess the validity of these personality questionnaires using such procedures as item response theory to delineate the mechanisms underlying item response selection.

4.2. Concluding remarks

The primary purpose of the current study was to evaluate which item properties contribute to the convergent validity of a measure. This study is the first, to the authors' knowledge, that has investigated the effects of the specific item properties (cf. overall scale properties) on the convergent validity of three widely-used self-report personality questionnaires. The samples constituted a diverse group of individuals tested over a 23-year time span, which provides strong support for the generalizability of the reported results. Furthermore, diverse means of measuring the item properties investigated were employed in the current study. For example, item social desirability was evaluated in two different ways (i.e., item SDSVs and item-PRF Desirability scale correlations), and item-total correlations were used to measure content saturation as an alternative to strategies previously employed, such as factor analytic procedures (e.g., Paunonen, 1984).
This study has enhanced our current understanding of some of the observed inconsistencies in personality testing over the years. The field of personality psychology faced a great deal of scrutiny in the 1960s when critics of self-report personality testing, such as Fiske (1973) and Mischel (1968), argued that personality questionnaire score correlations with objective measures of behaviour did not exceed a ceiling of 0.30 (Epstein & O'Brien, 1985). This led researchers to conclude that scores on self-report personality questionnaires were not attributable to the individual's stable, enduring traits, but instead, were attributable only to the particular situation at hand (Mischel, 1968; Mischel & Peake, 1982).
This debate spurred a wealth of research conducted by personality theorists concerning the improvement of traditional methods of personality test construction and assessment (Epstein & O'Brien, 1985; Jackson & Paunonen, 1985; Paunonen, 1984; Paunonen & Jackson, 1985). At present, it is generally agreed that many of the apparent inconsistencies in personality across time and situations reported in these studies can be explained by the use of scales lacking in such basic, yet fundamental psychometric principles as reliability and validity (Epstein & O'Brien, 1985; Jackson & Paunonen, 1985). Based on the results of the current study, it is evident that in order to maximize the convergent validity of a self-report personality measure, it is necessary to write items high in content saturation, and to write items that elicit moderate mean responses in order to achieve item-criterion correlations in excess of 0.30.

References

Ashton, S. G., & Goldberg, L. R. (1973). In response to Jackson's challenge: The comparative validity of personality scales constructed by the external (empirical) strategy and scales developed intuitively by experts, novices, and laymen. Journal of Research in Personality, 7, 1–20.
Berg, I. A. (1967). Response set in personality assessment. Chicago: Aldine.
Borsboom, D., Cramer, A. O. J., Kievit, R. A., Scholten, A. Z., & Franic, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135–170) (Chapter 7).
Borsboom, D., Mellenbergh, G. J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071.
Briggs, S. R., & Cheek, J. M. (1986). The role of factor analysis in the development and evaluation of personality scales. Journal of Personality, 54, 106–148.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Clark, L. A., & Watson, D. (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319.
Costa, P. T., Jr., & McCrae, R. R. (1988). From catalog to classification: Murray's needs and the five-factor model. Journal of Personality and Social Psychology, 55, 258–265.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO personality inventory (NEO-PI-R) and NEO five-factor inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
Edwards, A. L. (1969). Trait and evaluative consistency in self-description. Educational and Psychological Measurement, 29, 737–752.
Edwards, A. L. (1970). The measurement of personality traits by scales and inventories. New York: Holt, Rinehart, & Winston.
Epstein, S. (1983). Aggregation and beyond: Some basic issues on the prediction of behavior. Journal of Personality, 51, 360–392.
Epstein, S., & O'Brien, E. J. (1985). The person-situation debate in historical and current perspective. Psychological Bulletin, 98, 513–537.
Feldt, L. S. (1993). The relationship between the distribution of item difficulties and test reliability. Applied Measurement in Education, 6, 37–48.
Fiske, D. W. (1973). Can a personality construct be validated empirically? Psychological Bulletin, 80, 89–92.
Foster, S. L., & Cone, J. D. (1995). Validity issues in clinical assessment. Psychological Assessment, 7, 248–260.
Funder, D. C. (1980). On seeing ourselves as others see us: Self-other agreement and discrepancy in personality ratings. Journal of Personality, 48, 473–493.
Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin, 101, 75–90.
Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55, 149–158.
Ganster, D. C., Hennessey, H., & Luthans, F. (1983). Social desirability response effects: Three alternative models. Academy of Management Journal, 26, 321–331.
Gough, H. G. (1957). Manual for the California psychological inventory. Palo Alto, CA: Consulting Psychologist Press.
Hase, H. D., & Goldberg, L. R. (1967). Comparative validity of different strategies of constructing personality inventory scales. Psychological Bulletin, 67, 231–248.
Helmes, E., Reed, P. L., & Jackson, D. N. (1997). Desirability and frequency scale values and endorsement proportions for items of Personality Research Form-E. Psychological Reports, 41, 435–444.
Holden, R. R., & Troister, T. (2009). Developments in the self-report assessment of personality and psychopathology in adults. Canadian Psychology, 50, 120–130.
Holden, R. R., Fekken, G. C., & Jackson, D. N. (1985). Structured personality test item characteristics and validity. Journal of Research in Personality, 19, 386–394.
Jackson, D. N. (1970). A sequential system for personality scale development. In C. D. Spielberger (Ed.), Current topics in clinical and community psychology (pp. 61–96). New York: Academic Press.
Jackson, D. N. (1974). Personality research form manual. Port Huron, MI: Research Psychology Press.
Jackson, D. N. (1975). The relative validity of scales prepared by naive item writers and those based on empirical methods of personality scale construction. Educational and Psychological Measurement, 35, 361–370.
Jackson, D. N. (1976). Jackson personality inventory manual. Port Huron, MI: Research Psychologists Press.
Jackson, D. N. (1984). Personality research form manual. Port Huron, MI: Research Psychology Press.
Jackson, D. N., & Paunonen, S. V. (1980). Personality structure and assessment. Annual Review of Psychology, 31, 503–551.
Jackson, D. N., & Paunonen, S. V. (1985). Construct validity and the predictability of behavior. Journal of Personality and Social Psychology, 49, 554–570.
John, O. P., & Benet-Martinez, V. (2000). Measurement: Reliability, construct validation, and scale construction. In H. T. Reis, & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 339–369). New York, NY: Cambridge University Press.
John, O. P., & Robins, R. W. (1993). Determinants of interjudge agreement on personality traits: The big five domains, observability, evaluativeness, and the unique perspective of the self. Journal of Personality, 61, 521–551.
Kam, C. (2013). Probing item social desirability by correlating personality items with balanced inventory of desirable responding (BIDR): A validity examination. Personality and Individual Differences, 54, 513–518.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694.
Mischel, W. (1968). Personality and assessment. New York: Wiley.
Mischel, W., & Peake, P. K. (1982). Beyond déjà vu in the search for cross-situational consistency. Psychological Review, 89, 730–755.
Norman, W. T., & Goldberg, L. R. (1966). Raters, ratees, and randomness in personality structure. Journal of Personality and Social Psychology, 4, 681–691.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 67–88). Hillsdale, NJ: Erlbaum.
Paunonen, S. V. (1982). Behavioral consistency and individual differences in predictive structure (Doctoral dissertation). Retrieved from ProQuest Dissertations Publishing. (Accession No. NK54116).
Paunonen, S. V. (1984). Optimizing the validity of personality assessments: The importance of aggregation and item content. Journal of Research in Personality, 18, 411–431.
Paunonen, S. V. (1987). Test construction and targeted factor solutions derived by multiple group and procrustes methods. Multivariate Behavioral Research, 22, 437–455.
Paunonen, S. V. (1989). Consensus in personality judgments: Moderating effects of target-rater acquaintanceship and behavior observability. Journal of Personality and Social Psychology, 56, 823–833.
Paunonen, S. V. (2002). Design and construction of the supernumerary personality inventory (research bulletin 763). London, Ontario: University of Western Ontario.
Paunonen, S. V. (2015). Sex differences in judgments of social desirability. Journal of Personality, 84, 423–432.
Paunonen, S. V., & Ashton, M. C. (2001). Big five factors and facets and the prediction of behavior. Journal of Personality and Social Psychology, 81, 524–539.
Paunonen, S. V., & Hong, R. Y. (2015). On the properties of personality traits. In M. Mikulincer, & P. R. Shaver (Eds.), APA handbook of personality and social psychology. Personality processes and individual differences, Vol. 4. (pp. 233–259). Washington, DC: American Psychological Association.
Paunonen, S. V., & Jackson, D. N. (1985). The validity of formal and informal personality assessments. Journal of Research in Personality, 19, 331–342.
Paunonen, S. V., & Kam, C. (2014). The accuracy of roommate ratings of behaviors versus beliefs. Journal of Research in Personality, 52, 55–67.
Paunonen, S. V., & LeBel, E. P. (2012). Socially desirable responding and its elusive effects on the validity of personality assessments. Journal of Personality and Social Psychology, 103, 158–175.
Paunonen, S. V., & O'Neill, T. A. (2010). Self-reports, peer ratings and construct validity. European Journal of Personality, 24, 189–206.
Podsakoff, P., MacKenzie, S., Lee, J., & Podsakoff, N. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.
Sackett, P. R., Lievens, F., Berry, C. M., & Landers, R. N. (2007). A cautionary note on the effects of range restriction on predictor intercorrelations. Journal of Applied Psychology, 92, 538–544.
Shrauger, J. S., & Schoeneman, T. J. (1979). Symbolic interactionist view of self-concept: Through the looking glass darkly. Psychological Bulletin, 86, 549–573.
