Assessment Center: A Critical Mechanism for Assessing HRD Effectiveness and Accountability
Hsin-Chih Chen
The problem and the solution. Assessment center (AC), driven by job analysis or competency development, has long been shown to have strong content-related and criterion-related validities and some construct-related validity for assessing behavior, which is one important dimension for demonstrating training effectiveness and human resource development (HRD) accountability. Yet, AC has not been effectively utilized in HRD research and practice for such purposes. This article reviews research and practice of AC, discusses validity issues of ACs, identifies conceptual and evidence-based factors that affect AC validity, and discusses strengths, weaknesses, threats, and opportunities if AC is applied in HRD. The author suggests broader adoption of AC in HRD practice. However, to make AC most useful for HRD and able to integrate with other HR functions, HRD researchers should focus on how to improve the construct-related validity of ACs, particularly through the design and development aspects of ACs and through the unified concept of construct validity. Keywords: assessment center; training effectiveness; competency-based training; validity; HRD effectiveness

Competency-based training has been a major human resource development (HRD) activity. A current research trend in HRD is to assess outcomes of competency-based training to determine and document the effectiveness and the accountability of HRD.

This article is the author's independent work and is not funded by the author's current and former employers, Amedisys, Inc. and Louisiana State University, respectively. The opinions expressed in the article are the author's and do not necessarily reflect the views of the author's employers. Correspondence concerning this article should be addressed to Dr. Hsin-Chih Chen, 11100 Mead Road, #300, Baton Rouge, LA 70816; e-mail: hsinchihchen@gmail.com. Advances in Developing Human Resources Vol. 8, No. 2 May 2006 247-264 DOI: 10.1177/1523422305286155 Copyright 2006 Sage Publications

HRD reflective practitioners and thought-provoking researchers have devoted considerable attention and effort to the area of HRD effectiveness (e.g., return on investment, transfer of learning, on-the-job performance improvement) in recent years. However, relatively few have focused on mechanisms such as the assessment center (AC), which has long been shown to have strong content-related and criterion-related validities, as a means of assessing one important dimension of training outcomes: behavior change. Thornton and Rupp (2004) defined AC as "a method of evaluating performance in a set of assessment techniques at least one of which is a simulation" (p. 319). A common practice in ACs is job analysis or competency development and/or modeling to identify dimensions, which are often conceptualized as equivalent to competencies. It is through these dimensions (or competencies) that behavioral indicators are determined and then assessed through various simulation tactics (e.g., in-basket, leaderless group, oral presentation, fact-finding, etc.). As a result, AC is different from simulation itself because it involves a complex process of job analysis and competency development. AC has been empirically (Arthur, Day, McNelly, & Edens, 2003) and conceptually (Thornton, 1992) linked to various human resource (HR) functions (e.g., selection, promotion, training and development, and performance feedback) in three HRD-related fields: public administration, industrial-organizational psychology, and HR management. Empirical AC research and practice has largely served selection and promotion purposes, whereas the use of AC in training and development (or HRD), commonly termed the developmental AC or development center, is primarily in its conceptual stage (Spychalski, Quinones, Gaugler, & Pohley, 1997; Woehr & Arthur, 2003). It is surprising that AC has scarcely been utilized for assessing training effectiveness (Halman & Fletcher, 2000). To the author's knowledge, and within the literature the author could access, it has recently been used in some universities (e.g., the California State University Fullerton Business School) and in corporate settings, such as Development Dimensions International (DDI; Mayes, Belloli, Riggio, & Aguirre, 1997; Riggio, Aguirre, Mayes, Belloli, & Kubiak, 1997).

Purposes and Objectives


The sparse use of AC in HRD, along with the solidly grounded AC research documented in other fields, has led the author to inquire how HRD researchers and practitioners can utilize AC to demonstrate HRD effectiveness and accountability. Therefore, the purpose of this article is to conduct a critical review of the AC literature and to draw implications from that review for HRD research and practice. Specifically, this article seeks to answer the following questions: (a) How have research and practice of the AC mechanism been historically used in organizations around the world? (b) What validity (content-related, construct-related, and criterion-related) issues have been discussed in the AC literature?
(c) What are the major conceptual and evidence-based factors that support or hinder AC validity in organizations? and (d) What implications can be drawn from this study to help the field of HRD become more effective and accountable?

Method
A literature search was conducted through two electronic databases, Academic Premier and Business Source Premier, to collect relevant information. The keyword "assessment center" was paired or combined with other keywords such as survey, practice, training and development, HR development, training effectiveness, training evaluation, competency-based training, content validity, construct validity, and criterion validity through advanced searches in the databases. The search field for these keyword searches was set to "AB abstract or author-supplied abstract." Additional references were collected through secondary sources as cited in relevant literature found in the databases.

AC Practice and Research


AC apparently derives from, and is closely related to, the theory of performance tests, which focuses on assessing behavior change. Two classic works on AC, Thornton and Byham (1982) and Kraut (1973), vividly described the history and development of AC around the world. They found that performance tests were used as early as the early 1900s to assess differences in individual behaviors and, in some cases, to predict job performance. Several key features of AC (e.g., multiple assessors, complex realistic situations, and measurement of individual characteristics) as identified in the Guidelines and Ethical Considerations for Assessment Center Operations (Joiner, 2000) emerged in military settings around World War I, and the German government used these features to select capable military leaders in the 1930s. Similar procedures were conducted to assess military personnel's leadership potential in Great Britain (War Office Selection Boards) and in the United States (Office of Strategic Services and Veterans Administration Clinical Psychology Studies) in World War II, and such procedures, along with other tests (e.g., intellectual and personality tests), were used in Australia and Canada in the 1970s. In nonmilitary settings, as described in the two above-mentioned classic texts, AC procedures of the type conducted in the Harvard Psychological Clinic were used to assess the effects of individual characteristics and environmental factors on individual behavior in 1938. In 1948, an Australian manufacturing plant conducted group observations to select executive trainees, and in 1950 the British Civil Service Commission conducted ACs in selecting civil servants for all middle- or high-level jobs. In 1964, American Telephone and Telegraph Company (AT&T) conducted a large-scale, longitudinal study (the Management Progress Study) using multiple assessment procedures to study developmental
processes in consideration of both characteristics of individuals and organizational settings. This study became a milestone and sparked the success of AC development in nonmilitary settings. Since AT&T's study, research and practice of AC in nonmilitary settings have grown in the United States and around the world. Several leading industrial companies applied AC techniques for selection and promotion purposes. These companies included Caterpillar Tractor, Eastman Kodak, Ford Motor Company, General Electric, General Motors, International Business Machines, J. C. Penney, Olin, Sears, Shell Oil, Standard Oil of Ohio, Syntex, Unilever, Union Carbide, and many other organizations in Australia, Brazil, Great Britain, Denmark, Germany, France, Finland, Japan, Mexico, the Netherlands, Taiwan, and the United States (Alexander, 1979; Cook & Herche, 1994; Kraut, 1973; Lievens, Harris, van Keer, & Bisqueret, 2003; Lin & Wang, 2000; Shackleton & Newell, 1991; Woodruffe, 1993). At present, the most globally recognized HR consulting firm specializing in AC design, development, and implementation is DDI. Over the past two decades, the use of AC in industry has accelerated, particularly in Great Britain and the United States. In Great Britain, only 7% of organizations used AC in 1973 (Gill, Ungerson, & Thakur, 1973). However, the use of AC increased to 21% in 1986 (Robertson & Makin, 1986) and 59% in 1991 (Shackleton & Newell, 1991). In the United States, 44% of metropolitan police and fire departments used AC for promotion purposes in the 1980s (Fitzgerald & Quaintance, 1982). In 1997, 74% of organizations used AC for selection, promotion, or development purposes (Spychalski et al., 1997). Understandably, the practice and operation of AC vary across disciplines and settings, and AC practice was not unified until 1975, when the International Congress on the Assessment Center Method, held in Quebec, Canada, formed an international task force to develop guidelines for AC practice. The result was the well-known Guidelines and Ethical Considerations for Assessment Center Operations (Joiner, 2000), which attempted to incorporate existing best practices of AC to guide practice. Over the years, these guidelines have been revised three times, and the International Public Management Association published the latest version of the guidelines for HR in 2000 (Joiner, 2000). According to the latest guidelines, 10 key features of AC were identified: (1) conduct a job analysis of relevant behaviors; (2) classify behaviors into meaningful and relevant dimensions or competencies; (3) establish a link between the classified dimensions or competencies and assessment techniques; (4) conduct multiple assessments; (5) develop and implement job-related simulations to elicit behaviors related to the classified dimensions or competencies;

TABLE 1: Sample Dimensions-Exercise Matrix (an X in the original matrix marks each dimension assessed in a given exercise)
Exercises: In-Basket, Leaderless Group, Oral Presentation, Fact-Finding, Role-Playing
Dimensions: Oral communication, Interpersonal skills, Problem solving, Conflict management, Team building, Decisiveness, Goal setting, Analytic skills, Adaptability, Coaching skills, Customer service, Workflow design

(6) use multiple assessors to observe and evaluate each participant; (7) train assessors with performance standards; (8) record specific behavior observations; (9) report observations made during each exercise before the integration discussion; and (10) integrate data through a statistical integration process validated in accordance with professionally accepted standards.
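
The last feature, statistical integration, can be pictured with a minimal sketch. The example below is illustrative only and is not an algorithm prescribed by the guidelines; the assessors, exercises, dimensions, and the 1-5 rating scale are hypothetical. It simply averages post-exercise dimension ratings into overall dimension scores, one common form of mechanical integration.

```python
from collections import defaultdict
from statistics import mean

# Post-exercise dimension ratings: (assessor, exercise, dimension) -> rating on a 1-5 scale.
# Names and values are hypothetical, for illustration only.
ratings = {
    ("assessor_1", "in_basket", "problem_solving"): 4,
    ("assessor_2", "in_basket", "problem_solving"): 3,
    ("assessor_1", "leaderless_group", "problem_solving"): 4,
    ("assessor_1", "leaderless_group", "oral_communication"): 4,
    ("assessor_2", "leaderless_group", "oral_communication"): 5,
}

def integrate(ratings):
    """Mechanically integrate ratings into an overall score per dimension
    by averaging across assessors and exercises."""
    by_dimension = defaultdict(list)
    for (_assessor, _exercise, dimension), score in ratings.items():
        by_dimension[dimension].append(score)
    return {dim: round(mean(scores), 2) for dim, scores in by_dimension.items()}

print(integrate(ratings))
# {'problem_solving': 3.67, 'oral_communication': 4.5}
```

In practice, integration may instead rely on consensus discussion or statistically derived weights; the point of the sketch is only that dimension-level scores are built up from behavior observed and rated exercise by exercise.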

Validity Issues in AC Literature


The AC literature investigating validity issues has focused on three streams: content-related, criterion-related, and construct-related validities. Content-related and criterion-related validity issues of AC were raised and discussed by researchers in the late 1970s. Norton (1977) alleged that AC could appropriately be utilized to select candidates for managerial positions even when empirical validation is absent. He argued that behaviors elicited in AC simulations are samples rather than signs; a sample is validated through content-related validity, whereas a sign is validated through criterion-related validity. Dreher and Sackett (1981) stated that sign and sample are not mutually exclusive and contended that AC should be considered both a sample of behavior and a sign of future job performance. In terms of sources of contamination of content-related validity, Dreher and Sackett (1981) and Sackett and Dreher (1982) argued that the sources originated from inadequate job analysis or lack of fidelity (a close match between job activities and AC dimensions and exercises), whereas Norton (1981) contended they were the result of poor design and implementation of AC. Debates about construct-related validities, such as convergent and discriminant validities, have centered on issues of design and implementation of AC.

As widely discussed in the AC literature, one of the key AC design steps is to structure a dimensions-exercise matrix. Table 1 shows a sample dimensions-exercise matrix. In terms of the design aspect, dimensions are built into a variety of exercises so that assessors can evaluate AC participants' behaviors in demonstrating these dimensions or competencies. A dimension is often assessed through different exercises, and an exercise includes several different dimensions to be assessed. Accordingly, ratings for the same dimension across exercises are expected to correlate highly (convergent validity), whereas ratings of different dimensions within an exercise are expected to correlate weakly or not at all (discriminant validity). For instance, using the problem-solving dimension in Table 1 as an example, to have construct-related validity one would expect the ratings of problem solving in the in-basket, leaderless group, fact-finding, and role-playing exercises to be highly correlated. On the other hand, the rating of problem solving should correlate less with the other dimensions (i.e., oral communication, interpersonal skills, conflict management, team building, decisiveness, and coaching skills) within the same exercise (i.e., leaderless group). Over the years, considerable research investigating construct-related validity has consistently found low to moderate convergent validity of AC dimensions across exercises and weak discriminant validity of dimensions within exercises (Archambeau, 1979; Bycio, Alvares, & Hahn, 1987; Fleenor, 1996; Gorham, 1978; Herriot, 1986; Highhouse & Harris, 1993; Jackson, Stillman, & Atkins, 2005; Joyce, Thayer, & Pond, 1994; Kauffman, Jex, Love, & Libkuman, 1993; Kleinmann & Koller, 1997; Klimoski & Brickner, 1987; Lance et al., 2000; Lance, Lambert, Gewin, Lievens, & Conway, 2004; Lowry, 1995, 1997; Robertson, Gratton, & Sharpley, 1987; Sackett & Dreher, 1982; Sackett & Harris, 1988; Schneider & Schmitt, 1992; Silverman, Dalessio, Woods, & Johnson, 1986; Turnage & Muchinsky, 1982). These findings have put construct-related validity in doubt, and many have suggested that AC ratings should be based on exercises instead of dimensions. In contrast, some recent research has reported encouraging findings that run counter to the traditional evidence of failed construct-related validity in AC (Arthur, Woehr, & Maldegen, 2000; Kudisch, Ladd, & Dobbins, 1997; Lievens, 2001; Lievens & Conway, 2001; Reilly, Henry, & Smither, 1990; Russell & Domm, 1995; Thornton, Tziner, Dahan, Clevenger, & Meir, 1997; Woehr & Arthur, 2003). These studies pointed out that construct-related validity may not be as troubling an issue as it has been perceived and suggested that issues of development, implementation, design, and methods together contribute to the construct-related validity of AC.
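
The convergent and discriminant expectations described above can be checked directly from rating data. The sketch below is a simplified, hypothetical illustration, not the analysis used in the cited studies (which typically rely on multitrait-multimethod or confirmatory factor models); the dimension names, exercise names, and scores are invented. It computes the average correlation of the same dimension across exercises versus different dimensions within the same exercise.

```python
import numpy as np
from itertools import combinations

# Hypothetical ratings: rows are participants, columns are (dimension, exercise) pairs.
columns = [("problem_solving", "in_basket"), ("problem_solving", "leaderless_group"),
           ("decisiveness", "in_basket"), ("decisiveness", "leaderless_group")]
scores = np.array([  # one row per participant, 1-5 scale, invented numbers
    [4, 4, 3, 3],
    [2, 3, 4, 4],
    [5, 4, 2, 3],
    [3, 3, 5, 4],
    [4, 5, 3, 2],
])

def mean_correlation(pairs):
    rs = [np.corrcoef(scores[:, i], scores[:, j])[0, 1] for i, j in pairs]
    return float(np.mean(rs))

# Convergent evidence: same dimension rated in different exercises.
convergent = [(i, j) for i, j in combinations(range(len(columns)), 2)
              if columns[i][0] == columns[j][0]]
# Discriminant evidence: different dimensions rated within the same exercise (should be lower).
discriminant = [(i, j) for i, j in combinations(range(len(columns)), 2)
                if columns[i][0] != columns[j][0] and columns[i][1] == columns[j][1]]

print("convergent r:", round(mean_correlation(convergent), 2))
print("discriminant r:", round(mean_correlation(discriminant), 2))
```

With these invented numbers the convergent mean is about .68 and the discriminant mean is negative, the pattern construct-related validity would require; the studies cited above report that real AC data often show the opposite pattern.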

Issues related to the criterion-related validity of AC appear less complicated than those of construct-related validity. Specifically, evidence shows that AC has consistently proved capable of effectively predicting various outcome criteria (e.g., selection, promotion, and development). For example, Gaugler, Rosenthal, Thornton, and Benton (1987) conducted a meta-analysis of AC validity to examine the predictive (criterion-related) validity of AC. The study meta-analyzed relationships between AC and its outcome variables from 109 research references. The average validity coefficient was .37, which was calculated from the overall AC rating and corrected for sampling error, restriction of range, and criterion unreliability. More recently, Arthur et al. (2003) conducted a similar meta-analysis of the criterion-related validity of AC dimensions. The study examined 179 research articles, including published (87%) and unpublished (13%) references. In contrast to Gaugler et al. (1987), who used the overall AC rating as a single predictor, Arthur et al. (2003) collapsed 168 dimensions into six overarching constructs as predictors. The results showed a range of estimated criterion-related validities from .25 to .39. Although their methods varied, both studies synthesized a large body of knowledge in establishing criterion-related validity and evidently demonstrated the generalizability of AC.
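
The corrections mentioned above follow standard psychometric formulas. The sketch below is a worked illustration with invented numbers, not a reproduction of Gaugler et al.'s (1987) computation: it adjusts an observed coefficient for direct range restriction and for criterion unreliability (order and exact procedures vary across meta-analytic methods).

```python
from math import sqrt

def correct_for_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction,
    where u = SD(unrestricted) / SD(restricted) on the predictor."""
    return (u * r) / sqrt(1 + r**2 * (u**2 - 1))

def correct_for_criterion_unreliability(r, ryy):
    """Disattenuate an observed validity r for unreliability in the criterion (ryy)."""
    return r / sqrt(ryy)

# Invented illustration: observed coefficient .25, SD ratio 1.3, criterion reliability .70.
r_obs = 0.25
r_rr = correct_for_range_restriction(r_obs, u=1.3)
r_full = correct_for_criterion_unreliability(r_rr, ryy=0.70)
print(round(r_rr, 3), round(r_full, 3))
```

With these assumed values, the observed .25 rises to roughly .38 after both corrections, which is why corrected meta-analytic estimates such as the .37 reported above sit higher than typical raw observed validities.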

Factors Affecting the Validity of AC


The validity issues discussed above can assist in identifying factors that support or hinder the success of AC, where success is understood as effective planning, design, development, and implementation that together demonstrate the modern, unified concept of construct validity. Two types of factors are included in this section: conceptual factors and evidence-based factors.

Conceptual Factors

Caldwell, Thornton, and Gruys (2003) identified 10 common assessment errors. The errors include (a) poor planning, (b) inadequate job analysis, (c) weakly defined dimensions, (d) poor exercises, (e) no pretest evaluations, (f) unqualified assessors, (g) inadequate assessor training, (h) inadequate candidate preparation, (i) sloppy behavior documentation and scoring, and (j) misuse of results. These common errors are well documented in the Guidelines and Ethical Considerations for Assessment Center Operations. In addition, Thornton (1992) found that the number of dimensions to be assessed in an AC can also affect assessors' accuracy in determining participants' performance scores. The rationales behind these errors (factors) are summarized below.

Poor planning. Planning AC initiatives requires dedicated work, sufficient resources, and available experts to ensure that effective AC practice is carried out. Poor planning may occur when key upper management is not involved in, or committed to supporting, preliminary AC planning.

Inadequate job analysis. Job analysis is a prerequisite for AC design. In practice, many organizations, particularly within the public sector, rely solely on the content-related validity of AC because of limited resources and time constraints. This makes job analysis extremely critical in producing
effective AC. Without adequate job analysis, the process carried out by an AC will not be justifiable.

Weakly defined dimensions. Sound definition of the dimensions to be assessed in an AC is essential, because it requires articulating the performance conditions and levels of effectiveness for the behaviors expected to be demonstrated on the target job.

Poor exercises. Design, development, and administration of AC exercises are sophisticated processes that require carefully crafted skills in all phases. In the design and development phases, it is essential to build work samples into the exercises and to ensure that the exercises are designed so that participants' behaviors can be elicited to reflect the defined dimensions. In the administration phase, carefully standardized exercises and clear specification of the roles individuals should play will make the process effective, as interactive simulations are often utilized in AC.

No pretest evaluation. The design of AC should be valid for both content and construct measures. Specifically, not only should the measurement instruments appropriately match the work domain of the position, but the knowledge and skills to be observed should also be valid. Pretest evaluation should not be based solely on prior experience. Rather, it should closely align with procedures such as test construction, job analysis, job validation, statistics, and technical factors.

Unqualified assessors. Assessor selection should be based on technical knowledge and expertise in the profession being assessed. It is inevitable that managers may play the assessor role, but assessors who are also managers of participants must be able to demonstrate objective judgment in their ratings.

Inadequate assessor training. Adequate assessor training has been confirmed to have a dramatic impact on the validity of AC because assessor training is critical to ensure that the proper classification of dimensions and behaviors is well understood by assessors. The contents of adequate assessor training should include context information for effective judgments, detailed information, examples of effective and ineffective performance, and knowledge of assessment techniques, evaluation, policies, rating procedures, and feedback procedures.

Inadequate candidate preparation. Candidates' preparation or readiness to attend an AC is also a factor for effective AC. Orientation given to AC candidates and/or participants helps them grasp the how and the what of the process being carried out. Contents to assist candidate preparation may include what the objective is, how individuals are selected, what options candidates may have, what key staff is involved, how the materials and
results will be used, when and what kind of feedback will be given, who will have access to the reports, and who will be the contact person.

Sloppy behavior documentation and scoring. The process of behavior documentation is vital because the description of behavior serves as a foundation for subsequent behavioral assessment. The process of behavior scoring is equally critical because it underpins the accuracy with which observed behaviors are assessed. AC validity can be achieved when both schema-driven and behavior-driven approaches are taken into account in behavior documentation and scoring, where the schema-driven approach assists in evaluating knowledge globally and the behavior-driven approach facilitates evidence-based judgment.

Misuse of results. Misuse of AC results has been a practical concern of AC researchers. AC practitioners should communicate effectively with participants about how the results will be used and consistently adhere to the standards and criteria set forth at the beginning of AC implementation. By so doing, positive responses to AC can be expected, so the AC process can be continually implemented and sustained.

Number of dimensions to be assessed in an exercise. The mental constraints of assessors have been a great concern in the AC literature, which proposes that too many dimensions to be assessed in an AC may limit an assessor's ability to accurately classify the dimensions candidates exhibit (Lievens & Klimoski, 2001). In this case, the assessor may demonstrate inconsistency in rating dimensions across exercises because of such constraints. In addition, some literature (e.g., Sackett & Hakel, 1979) has consistently found that many of the dimensions AC designers originally identify are highly intercorrelated, so they can subsequently be collapsed into a smaller number of global dimensions. Although suggestions vary for the appropriate number of dimensions to be assessed in an AC, the rule of thumb is to minimize the dimensions if possible; five to 10 dimensions assessed in an exercise are deemed adequate (Thornton, 1992).

Evidence-Based Factors

The above-mentioned factors are mostly administrative, descriptive, or conceptual in nature. Literature that empirically examines AC issues can also substantially contribute to our understanding of the factors affecting AC success or validity. Mostly, the empirical literature on AC has focused on design, development, and implementation aspects to improve AC validity. These factors include (a) rating approach, (b) transparent dimensions, (c) assessor training strategy, (d) assessing technique, and (e) variety of exercises.

Rating approach. Robie, Osburn, Morris, Etchegaray, and Adams (2000) examined the effect of the rating process on the construct-related validity of
AC. In their study, AC dimensions were rated using either a within-exercise rating process or a within-dimension rating process; in the former, all dimensions are rated within one exercise, and in the latter, one dimension is rated across all exercises. Their findings suggested that the within-exercise rating process results in exercise factors, whereas the within-dimension rating process results in dimension factors, implying that the within-dimension rating approach could enhance the construct-related validity of AC.

Transparent dimensions. Revealing dimensions to candidates before the AC is conducted can increase candidates' readiness for AC assessment. Kolk, Born, and Van der Flier (2003) examined the effect of transparent dimensions through two independent studies, one with a student sample and the other with actual job applicants. The results showed a significant improvement in construct-related validity for the transparent group with actual job applicants.

Assessor training strategy. Although the importance of careful assessor training has been repeatedly stressed in the AC literature, relatively limited research has empirically examined the effect of assessor training strategies. Schleicher, Day, Mayes, and Riggio (2002) used frame-of-reference assessor training to examine the construct validity of AC. Frame-of-reference training aims to eliminate rater bias by establishing a common frame of reference for rating. The results of their study showed that frame-of-reference assessor training is effective in improving the reliability, accuracy, construct-related validity, and criterion-related validity of assessment ratings.

Assessing technique. The cognitive demands placed on assessors may limit their ability to classify dimension ratings in an AC. Similar to controlling the number of dimensions to be assessed in an AC, using assessing techniques such as behavior checklists can reduce assessors' cognitive demands and improve the construct validity of AC. Some research results suggested that the behavior checklist technique can improve construct-related validity (Donahue, Truxillo, Cornwell, & Gerrity, 1997; Reilly et al., 1990); a small sketch at the end of this section illustrates the idea.

Variety of exercises. Assessing human behavior is a complex undertaking; therefore, the dimensions to be assessed in an AC are inherently complicated. Determining the number of exercises to be used in an AC can be a contingent decision. However, Gaugler et al. (1987) found that AC shows stronger criterion-related validity as a greater number of different types of exercises are used. The finding suggested that a single exercise, or a limited number of exercises, may not result in solid criterion-related validity.
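
As the small illustration of the behavior checklist idea promised above, the following sketch uses hypothetical dimension names and behavior statements (not drawn from any cited instrument) to show how pre-specified observable behaviors can be tallied into dimension ratings, so that the assessor records what was actually seen rather than holding every dimension definition in memory.

```python
# Hypothetical behavior checklist: each dimension lists concrete, observable behaviors.
CHECKLIST = {
    "problem_solving": [
        "identified the core issue before proposing options",
        "weighed at least two alternatives against stated criteria",
        "anticipated an obstacle and adjusted the plan",
    ],
    "oral_communication": [
        "summarized the group's position clearly",
        "checked that listeners understood before moving on",
        "kept statements concise and on topic",
    ],
}

def score_from_checklist(observed: set[str], max_rating: int = 5) -> dict[str, float]:
    """Convert the behaviors an assessor ticked into a per-dimension rating
    proportional to how many of that dimension's behaviors were observed."""
    scores = {}
    for dimension, behaviors in CHECKLIST.items():
        hits = sum(1 for b in behaviors if b in observed)
        scores[dimension] = round(max_rating * hits / len(behaviors), 1)
    return scores

observed = {
    "identified the core issue before proposing options",
    "summarized the group's position clearly",
    "checked that listeners understood before moving on",
}
print(score_from_checklist(observed))
# {'problem_solving': 1.7, 'oral_communication': 3.3}
```

A proportional score like this is only one possible scoring rule; the design point is that the checklist narrows the assessor's task to recognizing concrete behaviors.
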
Figure 1 summarizes the factors affecting AC validity reviewed in this article. An overarching factor, systematic planning, drives the whole process of AC. The individual factors are categorized into four groups: assessment-related, design-related, development-related, and implementation-related factors.

FIGURE 1: Factors Affecting Validity of Assessment Center
Systematic Planning (overarching factor), driving:
- Assessment-Related Factors: Job Analysis/Competency Development; Dimension Definitions
- Design-Related Factors: Pre-Test; Exercise Design; Rating Approach; Assessing Techniques; Variety of Exercises
- Development-Related Factors: Exercise Development; Assessor Training Strategies; Assessor Selection
- Implementation-Related Factors: Exercise Implementation; Candidate Preparation; Purposes Articulation and Result Reporting; Behavior Documentation and Scoring
All of these feed into the assessment center validities (content-related, construct-related, and criterion-related).

Implications for HRD Research and Practice: Strengths, Weaknesses, Opportunities, and Threats (SWOT) Analysis
The author adopts a SWOT analysis, which has often been used for organizational strategic planning, to develop implications of adopting AC for HRD research and practice. SWOT refers to strengths, weaknesses, opportunities, and threats. Although the four components are interrelated, strengths and weaknesses are often used to diagnose the internal environment of an organization, whereas opportunities and threats are associated with the external environment. In the context of this article, the scope of the SWOT analysis is not focused on using AC in a single organization. Rather, it is conceptualized to understand how AC can affect HRD research and practice, particularly the adoption of AC to assess competency-based training outcomes or the effectiveness of HRD interventions.

Strengths of Using AC in HRD

As this article shows, the existing AC literature from other fields is informative, allowing HRD practitioners and researchers to benefit from its accumulated lessons (e.g., known models, wide adoption across disciplines and around the globe, historical development, ethical guidelines, issues of validity, and key factors influencing AC success) in demonstrating HRD effectiveness. The author offers six key points explaining why AC has strong
implications for assessing competency-based training and HRD effectiveness and accountability. First, AC is a behavior-based assessment, which is conceptually a more reliable indicator than cognitive assessment for measuring performance. Second, AC is a valid mechanism, as its content-related and criterion-related validities have been well established, with partial support from construct-related validity. Third, implementation of AC is more controllable because one can conduct pre- and post-AC evaluations to compare learning gains through exhibited behavior. Fourth, AC can help demonstrate the unique characteristics of an organization's human capital and match organizational needs because the AC development process is rooted in job analysis and/or competency development. Fifth, AC simulations commonly use work samples, so the behaviors demonstrated in an AC are closely related to real job performance. Last but not least, AC can measure competencies at the individual (e.g., oral presentation), process (e.g., in-basket), and group and organization levels (e.g., leaderless group discussion). HRD practice has exercised a multidimensional approach (e.g., on-the-job performance, return on investment, organizational results) to assessing the effectiveness of HRD interventions. AC is conceptually instrumental in integrating with other dimensions of HRD effectiveness, particularly learning transfer, and could be used as a moderator linking these dimensions. For example, AC participant ratings can be evidence of short-term learning transfer, whereas on-the-job performance is related to long-term learning transfer. On the other hand, because AC has the potential to assess HRD interventions across different levels (e.g., individual, group, and organization), it can be used as an independent dimension representing HRD effectiveness.

Weaknesses of Using AC in HRD

A key weakness of AC is the issue of construct-related validity, which is only partially supported by the literature and has been highly debated over the years. This is particularly an issue if one conducts competency development and in turn develops a competency-based AC from a system perspective. Specifically, most of the AC literature assessing construct-related validity has found only low to moderate convergent validity of dimensions across exercises and low discriminant validity within each exercise. The weakness of the construct-related validity of AC has somewhat challenged the use of AC for multiple purposes, even if doing so is conceptually sound (Thornton et al., 1997). For instance, as this review shows, research examining whether AC ratings should be based on dimensions or exercises has not come to a firm conclusion. The fact is that mainstream research results favor the exercise-rating approach. Yet such practice is not suitable for HRD, nor does it have the capacity to integrate HR-related functions. If AC ratings are based on exercises, the AC loses its ability to identify strengths and weaknesses in the dimensions or competencies
that individuals demonstrate in the AC. This is because exercise ratings can only provide information on how well individuals perform in an AC; they cannot determine which dimensions are effective predictors of job performance. Following this logic, it is also clear that adopting the AC exercise-rating approach is inadequate for linking to other HRD-related functions. A further example: suppose a leaderless group simulation is designed to assess communication, decision-making, and negotiation competencies for promotion purposes. If an individual scores higher than the standard in the simulation, he or she will be promoted. Conversely, if the individual does not achieve the desired performance level, he or she will require competency training. When an exercise-rating approach is adopted in this case, the training need becomes undetermined, because the exercise-rating approach is unable to provide adequate information. Moreover, if an exercise-rating approach were best suited for AC, job analysis/competency development in AC design would become an irrelevant or unnecessary procedure, because job analysis/competency development is the process of identifying competencies and dimensions rather than exercises.

Threats to and Opportunities for Using AC in HRD

Understandably, the weakness of AC will become a major threat once AC is adopted in HRD practice. Nevertheless, a modern conceptualization of these various validities has been articulated as a unified, comprehensive concept of construct validity. Messick (1995) stated, "This comprehensive view of [construct] validity integrates content, criterion, and consequences into a construct framework for empirically testing rational hypotheses about score meaning and utility" (p. 742). He contended that the unified concept of construct validation is more than an attempt to define or understand the construct; rather, construct validation should attend to meaningful measurement and its utility. This unified concept of construct validity has recently been adopted to explain the construct-related paradox in the AC literature. For example, Russell and Domm (1995) asserted that although the construct-related validity of assessment has been questionable, the consistent criterion-related validity of AC implies that some valid construct must exist; the problem is that we do not know what the construct is. In addition, Woehr and Arthur (2003) found that research examining construct-related and criterion-related validities has been done separately, and only a few studies have assessed both simultaneously. In the few studies that examined both, a lack of construct-related validity was accompanied by a lack of criterion-related validity. In other words, when construct-related and criterion-related validities are examined together, the findings of both tend to be consistent. Following this logic, the modern conceptualization of unified construct validity seems to be a reasonable explanation as to why conventional evidence is lacking for the construct-related validity
of AC, because, to some extent, the three types of validities have historically been considered in isolation. It is possible that research showing a lack of construct-related validity of AC would also show a lack of criterion-related validity if both were examined together. Moreover, as this review shows, the sources of the construct-related validity issues of AC are related to design, development, and implementation. Because HRD has a strong tradition of dealing with these issues, the challenge of the AC construct-related validity issue becomes an opportunity for HRD to contribute to AC research and practice. Improving the construct-related validity of AC not only helps demonstrate the effectiveness of HRD interventions but also has the potential to put HRD in a key organizational position for designing competency-based assessment systems (e.g., competency modeling, performance appraisal, competency-based selection, etc.), further establishing the accountability of HRD.

Contributions to HRD and Future Research Direction


The proven content-related and criterion-related validities and some construct-related validity documented in the literature have confirmed that AC is a valid mechanism for assessing behaviors. Unfortunately, research and practice in HRD have not paid enough attention to this important area. Perhaps the complexity of the AC process and concerns about cost-effectiveness have caused AC to be ignored or adopted only hesitantly in HRD. However, as discussed, if AC can enhance the effectiveness and accountability of HRD, such concerns turn out to be unnecessary. The author hopes that, by documenting the wide use of AC around the world, demonstrating evidence of AC validity, identifying factors that influence AC validity, and identifying the strengths, weaknesses, threats, and opportunities of using AC in HRD, this article will encourage reflective HRD practitioners to adopt AC and provoke HRD researchers to pursue research initiatives in AC in order to enhance HRD effectiveness and accountability. Finally, to make AC most appropriate for use in HRD, future research, as in other fields, should continue to focus on how to improve the construct-related validity of AC, particularly through the design and development perspectives of AC and through the unified concept of construct validity.

References
Alexander, L. D. (1979). An exploratory study of the utilization of assessment center results. Academy of Management Journal, 22(1), 152-157.
Archambeau, D. J. (1979). Relationships among skill ratings in an assessment center. Journal of Assessment Center Technology, 2, 7-20.
Arthur, W., Jr., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related validity of assessment center dimensions. Personnel Psychology, 56, 125-154.

Arthur, W., Jr., Woehr, D. J., & Maldegen, R. (2000). Convergent and discriminant validity of assessment center dimensions: An empirical re-examination of the assessment center construct-related validity paradox. Journal of Management, 26, 813-835.
Bycio, P., Alvares, K. M., & Hahn, J. (1987). Situation specificity in assessment center ratings: A confirmatory analysis. Journal of Applied Psychology, 72, 463-474.
Caldwell, C., Thornton, G. C., III, & Gruys, M. L. (2003). Ten classic assessment center errors: Challenges to selection validity. Public Personnel Management, 32, 73-88.
Cook, R., & Herche, J. (1994). Assessment centers: A contrast of usage in diverse environments. The International Executive, 36(5), 645-656.
Donahue, L. M., Truxillo, D. M., Cornwell, J. M., & Gerrity, M. J. (1997). Assessment center validity and behavioral checklists: Some additional findings. Journal of Social Behavior and Personality, 12, 85-108.
Dreher, G. F., & Sackett, P. R. (1981). Some problems with applying content validity evidence to assessment center procedures. The Academy of Management Review, 6, 551-560.
Fitzgerald, L. F., & Quaintance, M. K. (1982). Survey of assessment center use in state and local government. Journal of Assessment Center Technology, 5, 9-21.
Fleenor, J. W. (1996). Constructs and developmental assessment centers: Further troubling empirical findings. Journal of Business and Psychology, 3, 319-333.
Gaugler, B., Rosenthal, D., Thornton, G., & Benton, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 492-510.
Gill, D., Ungerson, B., & Thakur, M. (1973). Performance appraisal in perspective: A survey of current practice. London: Institute of Personnel Management.
Gorham, W. A. (1978). Federal executive agency guidelines and their impact on the assessment center method. Journal of Assessment Center Technology, 1, 2-8.
Halman, F., & Fletcher, C. (2000). The impact of development centre participation and the role of individual differences in changing self-assessments. Journal of Occupational and Organizational Psychology, 73, 423-442.
Herriot, P. (1986). Assessment centers revisited. Guidance and Assessment Review, 2, 7-8.
Highhouse, S., & Harris, M. M. (1993). The measurement of assessment center situations. Journal of Applied Social Psychology, 23, 140-155.
Jackson, D. J., Stillman, J. A., & Atkins, S. G. (2005). Rating tasks versus dimensions in assessment centers: A psychometric comparison. Human Performance, 18(3), 213-241.
Joiner, D. A. (2000). Guidelines and ethical considerations for assessment center operations: International task force on assessment center guidelines. Public Personnel Management, 29, 315-331.
Joyce, L. W., Thayer, P. W., & Pond, S. B., III. (1994). Managerial functions: An alternative to traditional assessment center dimensions? Personnel Psychology, 47, 109-121.
Kauffman, J. R., Jex, S. M., Love, K. G., & Libkuman, T. M. (1993). The construct validity of assessment center performance dimensions. International Journal of Selection and Assessment, 1, 213-223.
Kleinmann, M., & Koller, O. (1997). Construct validity of assessment centers: Appropriate use of confirmatory factor analysis and suitable construction principles. Journal of Social Behavior and Personality, 12, 65-84.
Klimoski, R., & Brickner, M. (1987). Why do assessment centers work? The puzzle of assessment center validity. Personnel Psychology, 40, 243-260.
Kolk, N. J., Born, M. P., & Van der Flier, H. (2003). The transparent assessment center: The effect of revealing dimensions to applicants. Applied Psychology: An International Review, 52(4), 648-668.

Kraut, A. (1973). Management assessment in international organizations. Industrial Relations, 12(2), 172-182.
Kudisch, J. D., Ladd, R. T., & Dobbins, G. H. (1997). New evidence on the construct validity of diagnostic assessment centers: The findings may not be so troubling after all. Journal of Social Behavior and Personality, 12, 129-144.
Lance, C. E., Lambert, T. A., Gewin, A. G., Lievens, F., & Conway, J. M. (2004). Revised estimates of dimension and exercise variance components in assessment center post-exercise dimension ratings. Journal of Applied Psychology, 89, 377-385.
Lance, C. E., Newbolt, W. H., Gatewood, R. D., Foster, M. R., French, N. R., & Smith, D. E. (2000). Assessment center exercise factors represent cross-situational specificity, not method bias. Human Performance, 13(4), 323-353.
Lievens, F. (2001). Assessors and use of assessment center dimensions: A fresh look at a troubling issue. Journal of Organizational Behavior, 22, 203-221.
Lievens, F., & Conway, J. M. (2001). Dimension and exercise variance in assessment center scores: A large-scale evaluation of multitrait-multimethod studies. Journal of Applied Psychology, 86, 1202-1222.
Lievens, F., Harris, M. M., van Keer, E., & Bisqueret, C. (2003). Predicting cross-cultural training performance: The validity of personality, cognitive ability, and dimensions measured by an assessment center and a behavior description interview. Journal of Applied Psychology, 88, 476-489.
Lievens, F., & Klimoski, R. (2001). Understanding of assessment center process: Where are we now? In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 16). New York: John Wiley.
Lin, T.-Y. S., & Wang, S.-M. (2000, September). The application of assessment center method on assessing technological and vocational teachers. Paper presented at the International Conference of Scholars on Technology Education, Braunschweig, Germany.
Lowry, P. E. (1995). The assessment center process: Assessing leadership in the public sector. Public Personnel Management, 24, 443-450.
Lowry, P. E. (1997). The assessment center process: New directions. Journal of Social Behavior and Personality, 12, 53-62.
Mayes, B. T., Belloli, C. A., Riggio, R. E., & Aguirre, M. (1997). Assessment centers for course evaluations: A demonstration. Journal of Social Behavior and Personality, 12, 303-320.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performance as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Norton, S. D. (1977). The empirical and content validity of assessment centers vs. traditional methods for predicting managerial success. Academy of Management Review, 2, 442-453.
Norton, S. D. (1981). The assessment center process and content validity: A reply to Dreher and Sackett. Academy of Management Review, 6, 561-566.
Reilly, R. R., Henry, S., & Smither, J. W. (1990). An examination of the effects of using behavior checklists on the construct validity of assessment center dimensions. Personnel Psychology, 43, 71-84.
Riggio, R. E., Aguirre, M., Mayes, B. T., Belloli, C., & Kubiak, C. (1997). The use of assessment center methods for students' outcome assessment. Journal of Social Behavior and Personality, 12, 273-288.
Robertson, I. T., Gratton, L., & Sharpley, D. (1987). The psychometric properties of managerial assessment centers: Dimensions into exercises won't go. Journal of Occupational Psychology, 60, 187-195.
Robertson, I. T., & Makin, P. J. (1986). Management selection in Britain: A survey and critique. Journal of Occupational Psychology, 59, 45-57.
Robie, C., Osburn, H. G., Morris, M. A., Etchegaray, J. M., & Adams, K. A. (2000). Effects of the rating process on the construct validity of assessment center dimension evaluations. Human Performance, 13(4), 355-370.
Russell, C. J., & Domm, D. R. (1995). Two field tests of an explanation of assessment center validity. Journal of Occupational and Organizational Psychology, 68, 25-47.
Sackett, P. R., & Dreher, G. F. (1981). Some misconceptions about content-oriented validation: A rejoinder to Norton. Academy of Management Review, 6, 567-568.
Sackett, P. R., & Dreher, G. F. (1982). Constructs and assessment center dimensions: Some troubling empirical findings. Journal of Applied Psychology, 67, 401-410.
Sackett, P. R., & Hakel, M. D. (1979). Temporal stability and individual differences in using assessment center information to form overall ratings. Organizational Behavior and Human Performance, 23, 120-137.
Sackett, P. R., & Harris, M. (1988). A further examination of the constructs underlying assessment center ratings. Journal of Business and Psychology, 3, 214-229.
Schleicher, D. J., Day, C. V., Mayes, B. T., & Riggio, R. E. (2002). A new frame for frame-of-reference training: Enhancing the construct validity of assessment centers. Journal of Applied Psychology, 87, 735-746.
Schneider, J. R., & Schmitt, N. (1992). An exercise design approach to understanding assessment center dimension and exercise constructs. Journal of Applied Psychology, 77, 32-41.
Silverman, W. H., Dalessio, A., Woods, S. B., & Johnson, R. L., Jr. (1986). Influence of assessment center methods on assessors' ratings. Personnel Psychology, 39, 565-578.
Shackleton, V., & Newell, S. (1991). Management selection: A comparative survey of methods used in top British and French companies. Journal of Occupational Psychology, 64, 23-36.
Spychalski, A. C., Quinones, M., Gaugler, B. B., & Pohley, K. (1997). A survey of assessment centre practices in organizations in the United States. Personnel Psychology, 50, 71-90.
Thornton, G. C., III. (1992). Assessment centers in human resource management. Reading, MA: Addison-Wesley.
Thornton, G. C., III, & Byham, W. C. (1982). Assessment centers and managerial performance. New York: Academic Press.
Thornton, G. C., III, & Rupp, D. E. (2004). Simulations and assessment centers. In M. Hersen (Ed.), Comprehensive handbook of psychological assessment: Vol. 4. Industrial and organizational assessment (J. C. Thomas, Vol. Ed., pp. 319-344). Hoboken, NJ: John Wiley.
Thornton, G. C., III, Tziner, A., Dahan, M., Clevenger, J. P., & Meir, E. (1997). Construct validity of assessment center judgments: Analyses of the behavioral reporting method. Journal of Social Behavior and Personality, 12, 109-128.
Turnage, J. J., & Muchinsky, P. M. (1982). Transitional variability in human performance within assessment centers. Organizational Behavior and Human Performance, 30, 174-200.
Woehr, D. J., & Arthur, W., Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231-258.
Woodruffe, C. (1993). Assessment center: Identifying and developing competence. London: Institute of Personnel Management.

Hsin-Chih Chen, PhD, is a statistician/research analyst at Amedisys, Inc., a leading provider of home health care services, where he conducts data-driven research on quality of services, market analyses, and corporate strategies across all levels. Prior to joining Amedisys, Inc., he served as a postdoctoral researcher at Louisiana State University. He has published a number of research articles in peer-reviewed human resource development journals and currently serves as associate editor for the 2006 International Conference Proceedings of the Academy of Human Resource Development. His recent research interests include competency-based development, assessment centers, transfer of learning, and the effectiveness, strategy, and philosophy of human resource development. His doctorate was completed in human resource development at Louisiana State University.

Chen, H.-C. (2006). Assessment center: A critical mechanism for assessing HRD effectiveness and accountability. Advances in Developing Human Resources, 8(2), 247-264.
