Definition of a Model Based on Bibliometric Indicators
for Assessing Applicants to Academic Positions
Elizabeth S. Vieira
REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto,
Rua do Campo Alegre, 687, 4169-007 Porto, Portugal; Departamento Engenharia Industrial e Gestão,
Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal.
E-mail: elizabeth.vieira@fc.up.pt
José A.S. Cabral
INESC-TEC, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto,
Portugal. E-mail: jacabral@fe.up.pt
José A.N.F. Gomes
REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do
Campo Alegre, 687, 4169-007 Porto, Portugal. E-mail: jfgomes@fc.up.pt
A model based on a set of bibliometric indicators is proposed for predicting the ranking of applicants to an academic position as produced by a committee of peers. The results show that a very small number of indicators may lead to a robust prediction of about 75% of the cases.
We start with 12 indicators and build a few composite indicators by factor analysis. Following a discrete choice model, we arrive at 3 comparatively good predictive models. We conclude that these models have surprisingly good predictive power and may help peers in their selection process.
Introduction
Bibliometric indicators have been widely used for assessing the scientific performance of a given research body. The design of indicators has attracted a lot of attention in the last few years, as national authorities, funding bodies, and institutional leaders show a growing interest in indicators that can automatically rate the performance of their institutions.
The rankings published by the Centre for Science and Technology Studies (CWTS), SCImago, the Performance Ranking of Scientific Papers for World Universities (Taiwan), and the Academic Ranking of World Universities (Shanghai Jiao Tong University) use bibliometric indicators to describe the scientific performance of institutions.
Received: January 3, 2013; revised March 28, 2013; accepted March 28, 2013
© 2014 ASIS&T. Published online 7 January 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22981
These indicators are becoming increasingly important, and this may be explained by their advantages relative to peer review:
• The indicators provide objective information about scientific performance;
• The use of indicators allows quick and cheap updates;
• It is possible to assess a large number of documents;
• The time and implementation costs of a method based on bibliometric indicators are much lower than those of peer review.
Currently, several countries use a combination of peer review and bibliometric indicators to assess the research performance of higher education institutions and to allocate funding. Examples of this mixed strategy are the Excellence in Research for Australia (ERA) and the Italian Valutazione Triennale della Ricerca (VTR). The British Research Excellence Framework (REF) follows from the Research Assessment Exercise (RAE), which was used for many years and was based solely on peer review. In the REF, expert review is still the primary means of assessing the submitted outputs, but peers can now use citation data as additional input in their assessment. In Finland, the number of international publications is one of the measures used to allocate funding to universities. In Germany, the impact factor of the publications is used in performance-based funding systems. In Norway, the performance-based reallocation system applies a bibliometric indicator that considers publications at two levels (OECD, 2010).
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 65(3):560–577, 2014
Peer review is still the gold standard for research evaluation but the pressure for more frequent and extensive
assessment of the performance of researchers, research
groups, and research institutions makes bibliometry attractive and, in fact, the only method available in practice in
many situations. It is therefore very important to benchmark
bibliometric indicators against traditional peer assessment in
real situations. We know that bibliometric indicators are
more accepted in the assessment of large research bodies,
but they are still used frequently for individuals and it is
important to clarify what should be expected of them in this
practice.
Some studies have been carried out in recent years with the main goal of finding a relation between the two methods. These studies consider peer-review judgments at several levels: (a) the national level; (b) research programs, research groups, and departments; and (c) the individual level. The next section summarizes some of the most relevant published results on this topic.
Benchmarking of Indicators Against Peer Review
National or institutional assessment exercises provide
invaluable collections of data that may be used for some sort
of quality control. Several studies have been made to explore
how the results of the RAE in Britain and the VTR in Italy
compare with simple bibliometric indicators.
Reale, Barbara, and Costantini (2007) explored the relationship between the results of peer review evaluations in three disciplines (biology, chemistry, and economics) at the VTR and the impact factors of the journals in which the documents submitted for evaluation were published. They found that the impact factor correlates positively and significantly with the decisions of peers, but these correlations are weak: the Spearman rank correlation coefficients vary between 0.44 and 0.48.
Abramo, D'Angelo, and Caprasecca (2009) studied how the results of the VTR evaluations made by peers for the hard sciences relate to the results obtained using the impact factor of the journals and the number of documents published. They found correlations between 0.336 and 0.876 for the scientific fields analyzed. Franceschet and Costantini (2011) undertook a similar study but considered more bibliometric indicators: the mean number of citations per document, the impact factor of the journal, and the h-index of a set of publications. A positive and significant correlation was found between the mean number of citations per document and the decisions made by peers for most of the scientific fields analyzed (Spearman's coefficients varied from 0.32 to 0.81). The same was found for the impact factor (Spearman's coefficients varied from 0.29 to 0.85). The h-index was shown to be a reasonable indicator for discriminating among the documents classified by peers at a given level (Excellent, Good, Acceptable, Limited).
Several studies of the peer review results in the RAE found a correlation between peer judgments and bibliometric indicators. Norris and Oppenheim (2003) showed that there is a correlation between citation counts and the RAE results. High correlations were obtained between the 2008 RAE results and several quantitative indicators (Taylor, 2011).
At the level of the research group or department, there are some studies considering different fields. Rinia, van Leeuwen, van Vuren, and van Raan (1998) studied the relation between bibliometric indicators and the results of peer review for research programs in physics in the Netherlands. They used a set of eight indicators covering the quantity and impact of the documents. The authors found that the number of publications does not correlate significantly with the results obtained in peer review. The impact indicators correlate positively and significantly with peer review ratings, with Spearman rank correlation coefficients between 0.47 and 0.68.
Aksnes and Taxt (2004) calculated a set of bibliometric indicators based on the scientific production of the departments in the Faculty of Mathematics and Natural Sciences of Bergen University and compared them with the ratings obtained by peer review. They found that the correlations, despite being positive and significant, were weak: the Pearson correlations obtained were between 0.28 and 0.48.
Van Raan (2006) used the results of an evaluation of 147 university chemistry research groups in the Netherlands to test for correlations between the peer ratings and both the h-index and the CWTS field-normalized citation impact indicator.
Just a few studies have been made at the individual level.
On the one hand, this may be explained by the small number
of documents associated with most individual researchers
and the shortcomings of the statistical analysis when the
assumptions of most statistical methods are not met. On
the other hand, bibliometric indicators for measuring impact
are normally based on the citations and the distribution of
the citations tends to be skewed. When normalized indicators are used for large institutions like universities, this
effect may not affect the results substantially, but in the case
of low numbers of published documents, this may bias the
results.
The relation between bibliometric indicators and the results obtained through peer review at this level was studied by Nederhof and van Raan (1987). They considered candidates for a doctoral degree in physics and found that the 13% who received an honors degree labeled cum laude were the most productive and obtained more citations in the 2 years after completing the doctorate.
Bornmann and Daniel (2006) studied the relation between peer review decisions and citation counts for the applicants to the Boehringer Ingelheim Fonds (BIF) fellowship. The authors concluded that the expected number of citations for articles published by approved applicants is higher than for articles published by rejected applicants.
Bornmann and Daniel (2007) analyzed the scientific production of applicants to a long-term fellowship using some
bibliometric indicators and compared the data with the decisions of the peers in the selection process. They observed
that those awarded the fellowship published more documents than those rejected and that their documents obtained
more citations. These observations support those obtained
by Nederhof and van Raan (1987).
Bornmann, Wallon, and Ledin (2008) considered the applicants to the Long-Term Fellowship and the Young Investigator Programme of the European Molecular Biology Organization and studied the relation between the scientific performance of the applicants and the decisions made by peers. The authors found that, on average, approved applicants perform better scientifically than rejected applicants when quantitative and impact indicators are used to describe performance.
Some general findings were:
• Positive and statistically significant correlations were observed between peer review and selected bibliometric indicators.
• The use of indicators as supporting instruments in peer review appears to be justified by these studies.
Only a few papers have been published on peer review as a validation tool for bibliometric indicators. However, factors other than quantity or impact may influence the evaluation undertaken by peers. Several studies have shown that the lack of agreement among peers is one of the limitations of this method (Bornmann & Daniel, 2005; Jayasinghe, Marsh, & Bond, 2001, 2003, 2006; Reale et al., 2007; Wood, Roberts, & Howell, 2004). Potential biases were also identified in peer review approaches. In grant proposals, for example, the academic title, the status of the university, the age of the applicant, and other variables were identified as potential biases (Jayasinghe et al., 2003; Marsh, Bonds, & Jayasinghe, 2007; Reale et al., 2007). These studies do not allow general conclusions about what constitutes potential bias. We can say that potential biases exist, but the way they influence the final rating depends on factors such as the scientific area and the scientific culture of the system being evaluated. In peer review of grant proposals, the number of proposals assessed by each peer was found to be important: peers who evaluated three or more proposals provided more reliable and valid results (Jayasinghe et al., 2003). Studies whose main goal is to validate bibliometric indicators against the outcomes of peer review should take these limitations of peer review into account.
In this article we consider peer decisions in a collection of academic job openings in Portuguese universities and look for ways to design a model based on bibliometric indicators that may emulate these results. The peer decision takes the form of a ranking of the candidates, and we examine the ability of bibliometric indicators to reproduce this ranking. Given the weight that scientific performance has in these selection processes, we looked for a set of indicators that may be thought to be implicit in the judgments made by peers. The ranking of candidates in this type of job opening cannot be performed with a simple model alone, without peers. However, we consider that if a model based on bibliometric indicators is feasible, it can serve as a complementary instrument to help peers evaluate the applicants and formulate the final decisions.
We select a relatively large number of indicators that may be calculated for each applicant to find which indicator or combination of indicators best predicts the peer decisions. The major questions addressed are as follows:
• Is it possible to define a model based on bibliometric indicators to predict the final decisions made by peers?
• Is the model robust with respect to the particular set of cases used to parameterize it?
• What is the influence of each indicator on the final decisions made by peers?
• How well does this model describe the final decision of the peers?
This paper is organized as follows: the next section, Material and Methods, describes the indicators and the method later used to build the model; the section Results and Discussion describes all steps of the development of the model and its characteristics; and Conclusions summarizes the main findings of our study. A detailed description of the statistical methodologies and results is provided in Appendices A, B, and C.
Material and Methods
Data Set
A set of selection processes that took place between 2007 and 2011 for Associado and Catedrático Professor positions at the universities of Porto, Lisbon, Technical University of Lisbon, New University of Lisbon, Coimbra, and Minho was used in this study. The areas of chemistry, physics, biology, mathematics, mechanics, geology, and computer science were considered. Our work involves a total of 27 selection processes with 174 candidates who published a total of 7,654 documents indexed in the Web of Science (WoS) in the 10 years up to their application.
In Portugal, peer review is used to select, among several candidates, the one or ones thought to be best suited for the vacancy(ies). The panel members are required to assess different dimensions of performance, but scientific production is generally assumed to be dominant.
We considered publications indexed in WoS in the last 10 years before the application for two reasons: first, the assumption that most panel members pay more attention to the most recent activity of the candidate and, second, all 174 applicants had at least 10 years of activity after their doctorate.
Normalized indicators were used in the study. The normalization required large sets of data. All the information was retrieved from the WoS using the instruments available online. The whole set of documents with addresses in the EU_15 countries was used. Some of the indicators considered here require data that the major databases do not currently provide in an easy way. However, most bibliometric analysts resort to this type of method for comparisons across different scientific areas, and it appears that no alternative exists. Perhaps in the near future the databases will provide these types of data, just as they now include the h-index to make life easier for the individual user.
Data Collection
For each applicant, all documents indexed in the WoS in the 10 years up to the application were retrieved. Search queries based on the different names used by the candidates to author the documents, their institutional affiliations in the 10 years considered, and other information about the publications were developed and used to retrieve the data from the WoS. The information was then compared with that given by the candidates in their curricula vitae at the time of the application.
The WoS was chosen because of its broader coverage of sources in the relevant scientific areas. The Scopus database could also have been used, but our method for searching the publications of a given candidate is based on the author's address(es), and Scopus has several limitations in this respect, as has been demonstrated (Vieira & Gomes, 2009).
Selection of Indicators
Peers are asked to evaluate the scientific performance of the applicants, but very little is normally said about the criteria to be used. They are then expected to apply personal criteria based on their own interpretation of international standards. In assessing the quantity and quality of the scientific production, each panel member may be expected to draw on his or her personal experience and expectations. In approaching this problem, we tried to select indicators that describe different dimensions of the scientific performance of a given researcher. Many indicators have been developed over the years, and we were forced to limit our choice for this study. All indicators present advantages and limitations, which makes the selection process even more complex. Some studies of this kind have been undertaken, but they normally use only a few indicators. Our alternative was to choose a set of bibliometric indicators and to show their potential as a complementary instrument in peer review. We are looking for indicators that are implicit in peer judgments, that may be used to describe scientific performance, and that allow a fair comparison of researchers. Indicators such as the total number of documents and citations were not used, as they are not adequate for comparing researchers in different areas.
Quantity of scientific production. Indicators that quantify the scientific production of a given researcher are based on the number of documents indexed in WoS (or in other databases), possibly weighted by the type of document or the number of authors. There is a growing literature on special techniques for counting documents beyond normal counting, in which each author of the publication gets one publication. In first-author counting, only the first author gets a full publication (Cole & Cole, 1973); in proportional counting, a fraction of the publication is attributed to each author, taking into account his or her position in the author list (Hagen, 2008; Hodge & Greenberg, 1981; Van Hooydonk, 1997); in fractional counting, each author gets 1/a publications, a being the number of authors of the publication (Burrell & Rousseau, 1995). We adopted fractional counting to compensate for the frequently perceived trend of adding to the author lists the names of individuals who contributed very little to the work being reported. This should not be seen as penalizing scientific collaboration, as each collaborating team and each collaborating researcher still gets a fair share of the publication credit, plus the premium of counting the publications that only the joint work made possible.
Number of Documents Fractioned (NDF)
A document j with a_j authors is counted as 1/a_j, and the total production of the author is computed by fractional counting as:
$$NDF = \sum_{j=1}^{P} \frac{1}{a_j} \tag{1}$$
where P is the total number of documents indexed in the WoS that carry the name of one particular researcher.
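A minimal Python sketch of Equation (1), set beside the normal and first-author counting schemes described above, shows how the three differ on the same documents. The input layout is an assumption for illustration: one author count per document, and the researcher's 1-based position in each author list. Proportional counting is left out because the cited authors weight positions differently.

```python
from fractions import Fraction


def normal_count(author_counts):
    """Normal counting: each document counts as one full publication."""
    return len(author_counts)


def first_author_count(positions):
    """First-author counting: only documents where the researcher is listed
    first (position 1 in a 1-based author list) are counted."""
    return sum(1 for pos in positions if pos == 1)


def ndf(author_counts):
    """Equation (1): fractional counting, where document j contributes 1/a_j."""
    if any(a < 1 for a in author_counts):
        raise ValueError("every document needs at least one author")
    return sum((Fraction(1, a) for a in author_counts), Fraction(0))


# Four documents with 1, 2, 4 and 5 authors.
counts = [1, 2, 4, 5]
print(normal_count(counts))              # 4
print(float(ndf(counts)))                # 1.95
print(first_author_count([1, 2, 1, 3]))  # 2
```

Exact fractions avoid rounding drift when many 1/a_j terms are summed. Converting to float is only needed for display.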
Impact of the scientific production. The impact that research activities have on the scientific community is normally used as a proxy for quality, and we adopt the same concept here. Several very different indicators have been developed to measure impact. Here we adopted the following.
The Normalized Indicator for Researchers (NIR)
The NIR is a normalized average number of citations per document. It takes into account how the number of citations varies with the subject category of the journal in which each document is published.
Consider a researcher who, in a certain period, has authored P documents. Consider that document j, of type x, was published in year y in a journal that belongs to subject categories i = 1, 2, ..., N (in the Thomson Reuters classification of documents and journals). If in a given period the document obtained C_j(xyi) citations, this number is normalized to obtain a corrected number of citations:
$$C^{n}_{j}(xyi) = C_j(xyi)\,\frac{I_{xy}}{\frac{1}{N}\sum_{k=1}^{N} I_{xyk}} \tag{2}$$
where I_xyk is the average number of citations of the documents of type x published in year y in all journals of subject category k, and I_xy is the average number of citations of all documents of type x published in year y.
The quantity I_xy may be calculated as:
$$I_{xy} = \frac{\sum_{k} M_{xyk}\, I_{xyk}}{M_{xy}} \tag{3}$$
where M_xyk is the number of documents of type x and subject category k that were published in year y, and M_xy is the number of documents of type x published in year y in journals of all subject categories k.
Indicator NIR is then calculated as:
$$NIR = \frac{1}{P}\sum_{j=1}^{P} C^{n}_{j}(xyi) \tag{4}$$
For the normalization just described, we considered only documents with at least one address in one of the European Union's 15 countries (EU_15). As shown by King (2004), the culture of publication differs markedly among regions. This may be due to different institutional contexts. The institutional contexts in North America, Europe, and East Asia are very different, and consequently the parameters that affect the scientific careers of researchers also differ. We apply this model to Portugal, a small country with a very open research system, about 70% of whose international coauthorship is with the EU_15. Of course, we can easily check that 22% of the EU_15 documents are coauthored with someone from another region, creating a link, arguably weak, with those other cultures. However, we suggest that the choice of this benchmarking universe is irrelevant for the goals of this paper, as the novel method proposed here is independent of it.
Articles, reviews, proceedings papers, letters, meeting abstracts, and editorial material were the document types used. Because we consider the publications of the last 10 years before the application, a decreasing citation window was used. That is, for an opening that took place in 2007, we considered the publications of each applicant between 1998 and 2007. For documents published in 1998, we counted the citations between 1998 and 2007; for documents published in 1999, citations were counted between 1999 and 2007; and so on.
Other normalized indicators exist, such as the CPP/FCSm (the CWTS field-normalized citation impact indicator) and the MNCS (mean normalized citation score). These are certainly the best known. The CPP/FCSm was not used because its normalization is not made document by document. In the calculation of the MNCS the normalization is undertaken document by document, but the indicator only gives information about the relative position of the documents with respect to the world average. In the NIR the normalization is made document by document, treating each document separately, as it is in fact an independent piece of work, and the average number of normalized citations per document is given. The normalization adopted also gives due weight to documents published in subject categories characterized by a small number of citations per document. In these situations the impact is low because the citation culture is characterized by low citation traffic, not because the documents lack quality.
The hnf index
Indicators such as the NIR might be influenced by the presence of documents with a large number of citations or of a considerable number of documents with zero citations. The hnf index is immune to these documents and can constitute a good alternative to the NIR. There are several variants of the h-index, but only the hI (Batista, Campiteli, Kinouchi, & Martinez, 2006) and the IQp (Antonakis & Lalive, 2008) take field dependence into account. In the hI, the h-index of a given author is divided by the average number of authors per document in the h-core. However, this indicator is strongly affected by the presence of outliers. The normalization of the IQp considers only the three subject categories in which the author gets the most citations, and the normalization is not made document by document. To be fair to individual researchers, a performance index should compensate for the citation cultures of different scientific fields, and the original h-index is recognized to fail in this respect. The hnf index compensates for this by normalizing the citation count. It is well known that the average number of authors per paper is growing rapidly (Ioannidis, 2008; Papatheodorou, Trikalinos, & Ioannidis, 2008; Weeks, Wallace, & Kimberly, 2004), probably because of the pressure to publish. Another factor in this growth is probably increased national and international collaboration, which generates synergies that raise both the number and the impact of publications. Fractioning methods have been used to remove the incentive for artificial or pseudo-collaborations without introducing disincentives to true collaboration. The h-variant adopted here, hnf, allows fractional counting after normalization to the EU_15 average. This normalization is performed for each document, comparing the performance of each document with the average performance of the documents that belong to the same subject category, document type, and year of publication. Each document is treated separately, as it is in fact an independent piece of work. The normalization in the hnf is the same as in the NIR. A detailed description of this indicator can be found in Vieira and Gomes (2011).
For both the hnf and the NIR, self-citations were not removed, as this would require a complex analysis of the whole data set.
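The decreasing citation window, the normalization in Equations (2) to (4), and an hnf-style index can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The reference tables (I_xyk, M_xyk, M_xy) are assumed to be precomputed from the EU_15 documents and passed in as plain dictionaries, and the `Document` record and all function names are hypothetical. The `hnf` function follows one reading of the description above, in which each document adds 1/a_j to the rank after normalization. The exact definition is given by Vieira and Gomes (2011) and may differ in detail.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Document:
    doc_type: str            # x, e.g. "article" (hypothetical record layout)
    year: int                # y, publication year
    categories: list         # subject categories i = 1..N of the journal
    n_authors: int           # a_j
    citing_years: list = field(default_factory=list)  # years of citing documents


def windowed_citations(doc, application_year):
    """Decreasing citation window: count citations from the publication year
    up to and including the application year."""
    return sum(1 for cy in doc.citing_years if doc.year <= cy <= application_year)


def overall_average(avg, count, total, x, y):
    """Equation (3): I_xy as the M_xyk-weighted average of I_xyk.
    avg[(x, y, k)] = I_xyk, count[(x, y, k)] = M_xyk, total[(x, y)] = M_xy."""
    weighted = sum(count[key] * avg[key] for key in avg if key[:2] == (x, y))
    return weighted / total[(x, y)]


def normalized_citations(doc, citations, avg, count, total):
    """Equation (2): scale C_j by I_xy over the mean I_xyk of the doc's categories."""
    i_xy = overall_average(avg, count, total, doc.doc_type, doc.year)
    category_mean = mean(avg[(doc.doc_type, doc.year, k)] for k in doc.categories)
    return citations * i_xy / category_mean


def nir(normalized):
    """Equation (4): average number of normalized citations per document."""
    return mean(normalized)


def hnf(normalized, n_authors):
    """ASSUMED reading of hnf: rank documents by normalized citations; each one
    adds 1/a_j to the effective rank, and hnf is the last effective rank r at
    which the document still has at least r normalized citations."""
    ranked = sorted(zip(normalized, n_authors), key=lambda t: t[0], reverse=True)
    rank, h = 0.0, 0.0
    for cn, a in ranked:
        rank += 1.0 / a
        if cn < rank:
            break
        h = rank
    return h


# Hypothetical reference values for articles published in 2005.
avg = {("article", 2005, "A"): 4.0, ("article", 2005, "B"): 2.0}
count = {("article", 2005, "A"): 100, ("article", 2005, "B"): 300}
total = {("article", 2005): 400}
doc = Document("article", 2005, ["A"], 2, [2005, 2006, 2008])
c = windowed_citations(doc, 2007)  # 2: the 2008 citation falls outside the window
print(normalized_citations(doc, c, avg, count, total))  # 1.25
```

In this example I_xy = (100 × 4 + 300 × 2) / 400 = 2.5. Two citations in a category averaging 4 therefore become 2 × 2.5 / 4 = 1.25. The same two citations in category B would become 2.5.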
Percentage of Documents Cited (PDC)
The PDC gives the fraction of the author's P documents that have already been cited, the citations being collected between the year of publication of each document and the application date.
$$PDC = \frac{P_c}{P} \times 100 \tag{5}$$
where P_c is the number of documents cited.
This indicator gives an idea of the proportion of documents that contribute to the global impact.
Percentage of Citing Documents (CD)
The citation count used as an indicator of the impact of a paper can be influenced by the authors themselves, an effect that may be avoided by discounting individual self-citations. A study by Bonzi and Snyder (1991) showed that an author's reasons for citing his or her own work are similar to those for other citations, self-citation being a natural part of scientific communication. The study by Aksnes (2003) showed that self-citations represent a high percentage of the total citations in the first years after publication. For the data set used, about 45,000 documents, the self-citations peaked in the third year after publication and their percentage decreased in the following years. In practice, when indicators with short citation windows are used, self-citation may influence the results. At the individual level, self-citation may be more problematic, and this is why we use the CD.
Considering the number, TCD, of documents that cite the author's P documents, we count the number, CD_ws, of those citing documents that do not carry the author's name and compute the percentage CD as:
$$CD = \frac{CD_{ws}}{TCD} \times 100 \tag{6}$$
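Equation (6) can be sketched in Python as below. How self-citing documents are detected is an assumption here. A citing document is treated as a self-citation if any of its author strings matches one of the researcher's name variants after case-folding, a simplification of the name-based checks described under Data Collection. The function name and input layout are hypothetical.

```python
def cd_percentage(citing_author_lists, author_name_variants):
    """Equation (6): CD = CD_ws / TCD * 100.

    citing_author_lists: one list of author strings per citing document (TCD of them).
    author_name_variants: the name forms used by the researcher.
    Returns None when no document cites the researcher (CD is then undefined).
    """
    names = {name.casefold() for name in author_name_variants}
    tcd = len(citing_author_lists)
    if tcd == 0:
        return None
    # CD_ws: citing documents that carry none of the researcher's name variants
    cd_ws = sum(
        1
        for authors in citing_author_lists
        if names.isdisjoint(a.casefold() for a in authors)
    )
    return 100.0 * cd_ws / tcd


citing = [["Vieira ES", "Gomes JANF"], ["Smith J"], ["Doe A", "Roe B"], ["Gomes JANF"]]
print(cd_percentage(citing, ["Vieira ES", "Vieira E"]))  # 75.0
```

Only the first citing document carries the researcher's name, so three of the four citing documents count toward CD_ws.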
Percentage of Highly Cited Documents (HCD)
For the set of documents with at least one author's address in Portugal, we retrieved from the WoS the minimum number of citations of the documents in the top 10% most-cited documents of each subject category. The type of document and the year of publication were taken into account. This set was used as our reference to determine the HCD of each applicant. This choice was made to limit the universe of documents used and does not invalidate our results.
For openings that took place in 2007, the scientific production of the applicant between 1998 and 2007 was considered. The top 10% was calculated from the documents published between 1998 and 2005 and the citations they obtained between the year of publication and 2006. The documents published in 2006 and 2007 were not considered, as the period for counting citations is too short. With this information we counted, for each candidate, the number, P_10, of documents in the top 10% of each subject category and calculated the percentage:
$$HCD = \frac{P_{10}}{P_1} \times 100 \tag{7}$$
where P_1 is the number of documents published in the 10 years minus the documents published in the last 2 years before the application.
This indicator deserves attention as publication rates rise and competition among researchers increases. It is considered to be an indicator of excellence.
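A Python sketch of Equation (7) makes the steps concrete under three stated assumptions. First, the top 10% threshold of a reference group (one subject category, document type, and year) is taken as the citation count of the document at rank ceil(n/10), so ties at the boundary are included. Second, a document listed in several subject categories counts toward P_10 if it reaches the threshold in at least one of them. Third, following the 2007 example, only documents published at least 2 years before the application enter P_1. The tuple layout and function names are hypothetical.

```python
def top10_threshold(reference_citations):
    """Minimum citations of a document in the top 10% most cited of one
    (category, document type, year) reference group."""
    if not reference_citations:
        raise ValueError("empty reference group")
    ranked = sorted(reference_citations, reverse=True)
    cutoff_rank = (len(ranked) + 9) // 10  # ceil(n / 10) in integer arithmetic
    return ranked[cutoff_rank - 1]


def hcd(docs, thresholds, application_year):
    """Equation (7): HCD = P_10 / P_1 * 100.

    docs: (citations, doc_type, year, categories) tuples for one applicant.
    thresholds: {(category, doc_type, year): top-10% threshold}.
    Returns None when no document is old enough to be evaluated.
    """
    # P_1: drop documents from the last 2 years before the application
    eligible = [d for d in docs if d[2] <= application_year - 2]
    if not eligible:
        return None
    p10 = sum(
        1
        for cites, x, y, cats in eligible
        if any(cites >= thresholds[(k, x, y)] for k in cats)
    )
    return 100.0 * p10 / len(eligible)


print(top10_threshold(list(range(1, 21))))  # 19
```

The integer form of the ceiling avoids floating-point surprises when computing 10% of the group size.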
Impact of the publication source. The effort required to publish the results of research activities in a given journal depends on the prestige and perceived quality of that journal. Journals considered to be the best in their field or subject category tend to have lower acceptance rates. This justifies taking into account the prestige and impact of the journals in which the candidates published their results. Several indicators have been developed to this end. Here, we used the source normalized impact per paper (SNIP) (Moed, 2009) and the SCImago Journal Rank (SJR) (SCImago, 2007) of the journals. Both account for the different cultures of publication and citation among fields, but the SJR goes further and also considers the impact of the citing journal. The SNIP and SJR values were taken from Scopus, whose coverage of sources differs from that of WoS, but this does not invalidate the final results. The journal impact factor (IF) (Reuters, 2013) available in the Journal Citation Reports (JCR) was not considered here because of its weaknesses (Bornmann, Marx, Gasparyan, & Kitas, 2012). Obviously, there is no perfect indicator, and SNIP and SJR also have some limitations, but their normalization by subject category allows comparison across scientific domains.
The SNIP and SJR were retrieved from Scopus for the year of publication of each document. As their distributions are skewed, the median value for each candidate was determined:
$$SNIP_m = \mathrm{Md}(SNIP) \tag{8}$$
$$SJR_m = \mathrm{Md}(SJR) \tag{9}$$
Prestige of the affiliation institution. The prestige of the institution with which the applicant is affiliated may influence the decision of the selection panel. As a proxy for prestige, we use two indicators taken from the SCImago Institutions Ranking: the Normalized Impact (NI) and the High Quality Journal indicator (Q1) (SCImago, 2011). Other indicators now available in SCImago could also be tested.
Collaboration. In this last dimension, we attempt to assess
the practice of collaboration of the candidates by the number
of coauthors in the papers and by the fraction of the documents with international collaboration as measured by the
presence of more than one country in the address.
Median number of authors per document (NAm)

DOI: 10.1002/asi
565
The NAm is a normalized median number of authors per

document, considering the variability of the number of
authors in the subject categories.
Consider a researcher who, in a certain period, has authored P documents, with numbers of authors $a_1, a_2, \ldots, a_j, \ldots, a_P$. Let us assume that document j of type x was published in year y in a journal that belongs to subject categories i = 1, 2, ..., N (the Thomson Reuters classification of journals and document types was applied). The number of authors, $NA_j(xyi)$, is normalized as:
$$NA^{n}_{j}(xyi) = NA_{j}(xyi)\,\frac{MNA_{xy}}{\frac{1}{N}\sum_{k=1}^{N} MNA_{xyk}} \tag{10}$$
where $MNA_{xyk}$ is the average number of authors of the documents of type x published in year y in all journals of subject category k, and $MNA_{xy}$ is the average number of authors of all documents of type x published in year y. If $M_{xyk}$ is the number of documents of type x published in year y in all journals of subject category k, and $M_{xy}$ is the number of documents of type x published in year y, then
$$MNA_{xy} = \frac{\sum_{k} M_{xyk}\, MNA_{xyk}}{M_{xy}} \tag{11}$$
and the median NAm is obtained as:
$$NA_m = \mathrm{Md}\left(NA^{n}_{j}(xyi)\right) \tag{12}$$
To normalize NAm, we considered all the documents of all the candidates in our data set. This approach was adopted to limit the universe of documents to be analyzed and will not affect the validity of the arguments and conclusions. The normalization was performed document by document.
Percentage of documents with international collaboration (DIC)
$$DIC = \frac{PIC}{P} \times 100 \tag{13}$$
where PIC is the number of documents with at least one international collaboration and P is the total number of documents of the applicant.
The increasing specialization and technical sophistication of research activities require researchers to seek collaborations. Peers might see collaboration as an effort to develop research of high quality, and this is why we selected NAm and DIC.
Theoretical Background of the Statistical Method
As stated earlier in the Introduction section, several authors have attempted to find a relationship between peer review and bibliometric indicators. A simple analysis of Spearman's coefficient has been the most frequently used measure of the correlation between peer judgment and indicators. Contingency tables and Pearson's correlations have also been applied. The study by Reale et al. (2007) used ordered logistic regression.
In Portuguese universities, academic panels are required to rank the applicants for a given position. The goal of this study is to design a model based on bibliometric indicators and validate it against recorded peer review. This model should have some predictive potential: if two or more candidates apply to a position, we should be able to predict the result with a given degree of accuracy.
Given the features of our data set, we expect discrete choice models, more precisely rank-ordered logistic regression (ROLR), to perform best. Our starting point is a set of rankings with a varying number of candidates per academic opening, in different areas. Each ranking was defined by its own panel, so comparisons between positions in different rankings (for different openings) are not allowed. This precludes the application of the methods used in previous studies.
The basic concept of the ROLR is that the panels rank a set of applicants taking into account the so-called utility associated with each applicant. Peers choose first the applicant with the highest utility. Utility is defined by:
$$U_{nj} = V_{nj} + \varepsilon_{nj} \tag{14}$$
where $V_{nj} = \beta \cdot x_{nj}$ is a linear function of the observed explanatory indicators ($x_{nj}$) and $\beta$ is the vector of coefficients. The $\varepsilon_{nj}$ represent the factors that influence the utility but are unknown to the analyst. The probability that applicant i is placed first by peer panel n, in a contest whose applicants are indexed by j, is given by:
$$P_{ni} = \frac{e^{V_{ni}}}{\sum_{j} e^{V_{nj}}} \tag{15}$$
A detailed description of the ROLR is given in Appendix A.
As will be shown in the Results and Discussion section, the selected indicators are not independent, and the coefficients estimated in the ROLR would be affected by multicollinearity. Several strategies were adopted to deal with this problem. One of them was factor analysis, which yields a set of independent variables that were then used as input data in the ROLR. A short description of factor analysis and of the steps carried out is given in Appendix A.
Robustness of the Models
We used a set of 27 job openings with 2 to 11 applicants each. Because of the collinearity of some of the variables considered earlier, a reduced set of variables was picked to define each model, as described in the Results and Discussion section. Once a set of variables is adopted, their coefficients are estimated by maximizing the likelihood of the observed rankings in the 27 openings.
The model may be tested on how well it reproduces the multiple choices in the 27 rankings. A better test would be to apply the model to predict the results of new openings. As this is not possible, a resampling method, the jackknife, was adopted to study the robustness of the model. The idea behind this method is to determine the required parameters (in our case, the coefficients of each independent variable with significant impact) after removing one observation (the ranking of 1 of the 27 openings) at a time from the initial data set. In each instance, the parameters were estimated from the remaining 26 rankings, and the refitted model was then used to predict the probability that each applicant in the omitted opening would be ranked first. This was done for all 27 openings. The probabilities obtained for each candidate with this method were compared with those obtained when the whole data set is used. This permits an evaluation of the robustness and stability of the model.
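The leave-one-opening-out procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Stata implementation. It assumes a single indicator x with $V = \beta x$, fits the rank-ordered logit by Newton-Raphson, and codes the unranked ex-aequo tail as rank 0. The toy openings are made up. All function names are hypothetical.

```python
import math


def ranking_stages(opening):
    """Split one ranking into the successive choices of a rank-ordered logit.

    `opening` is a list of (x, rank) pairs: rank 1, 2, ... for ranked applicants
    and 0 for the unranked (ex-aequo) tail. Yields the x of the applicant chosen
    at each stage and the x values of all applicants still available.
    """
    order = sorted((i for i, (_, r) in enumerate(opening) if r > 0),
                   key=lambda i: opening[i][1])
    remaining = list(range(len(opening)))
    for i in order:
        yield opening[i][0], [opening[k][0] for k in remaining]
        remaining.remove(i)


def fit_beta(openings, iters=100, tol=1e-10):
    """Newton-Raphson maximum likelihood for a one-coefficient ROLR."""
    beta = 0.0
    for _ in range(iters):
        grad = hess = 0.0
        for opening in openings:
            for x_chosen, xs in ranking_stages(opening):
                w = [math.exp(beta * x) for x in xs]
                total = sum(w)
                mean = sum(wi * x for wi, x in zip(w, xs)) / total
                second = sum(wi * x * x for wi, x in zip(w, xs)) / total
                grad += x_chosen - mean          # score contribution
                hess -= second - mean * mean     # minus the conditional variance
        step = grad / hess                       # fails only if every stage has one value of x
        beta -= step
        if abs(step) < tol:
            break
    return beta


def first_place_probs(opening, beta):
    """Eq. (15) with V = beta * x: probability of each applicant being ranked first."""
    w = [math.exp(beta * x) for x, _ in opening]
    total = sum(w)
    return [wi / total for wi in w]


def jackknife(openings):
    """Refit without each opening in turn and predict that opening's first-place probabilities."""
    beta_full = fit_beta(openings)
    full, loo = [], []
    for o, opening in enumerate(openings):
        beta_o = fit_beta(openings[:o] + openings[o + 1:])
        full += first_place_probs(opening, beta_full)
        loo += first_place_probs(opening, beta_o)
    return beta_full, full, loo


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical toy data: five openings, (indicator value, peer rank).
openings = [
    [(2.0, 1), (1.0, 2), (0.5, 0)],
    [(1.5, 2), (3.0, 1), (0.2, 0), (1.0, 0)],
    [(0.5, 1), (1.2, 2)],
    [(2.5, 1), (0.3, 2), (1.8, 0)],
    [(1.0, 1), (2.0, 2), (0.1, 0)],
]
beta, full, loo = jackknife(openings)
print(round(beta, 3), round(pearson(full, loo), 3))
```

The correlation between the full-sample and leave-one-out probabilities plays the role of the Pearson coefficients reported for Figure 3. The toy data deliberately include some rankings that contradict the indicator, so that the maximum likelihood estimate stays finite.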
Dependence on Significant Bibliometric Indicators
Small variations in the most important indicators produce changes in the utility that can be measured in a simple way. Specifically, we studied the influence of the indicators with significant impact using predictive margins and marginal effects.
The predictive margins were calculated for several
situations:
For the indicators set at their average value.
For the indicators set at the value of the percentiles (10%;
25%; 50%; 75%; 90%; 99%).
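Marginal effects were calculated as:
The marginal change in $V_{nj}$ for a change in $x_{nj}$:
$$\frac{\partial V_{nj}}{\partial x_{nj}}(\,) = \frac{\partial V_{nj}}{\partial x_{nj}} \tag{16}$$
The relative (or percentage) change in $V_{nj}$ due to a relative (or percentage) change in $x_{nj}$:
$$\frac{eV_{nj}}{ex_{nj}}(\,) = \frac{\partial V_{nj}}{\partial x_{nj}}\,\frac{x_{nj}}{V_{nj}} \tag{17}$$
A proportional change in $V_{nj}$ for a change in $x_{nj}$:
$$\frac{eV_{nj}}{\partial x_{nj}}(\,) = \frac{\partial V_{nj}}{\partial x_{nj}}\,\frac{1}{V_{nj}} \tag{18}$$
A change in $V_{nj}$ for a proportional change in $x_{nj}$:
$$\frac{\partial V_{nj}}{ex_{nj}}(\,) = \frac{\partial V_{nj}}{\partial x_{nj}}\,x_{nj} \tag{19}$$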
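For a linear deterministic component, the four measures in Eqs. (16) to (19) reduce to simple expressions in the coefficient, the indicator value and V. A minimal Python sketch follows. The keys "dy/dx", "ey/ex", "ey/dx" and "dy/ex" are hypothetical ASCII names for ∂y/∂x(), ey/ex(), ey/∂x() and ∂y/ex(). The evaluation point is illustrative, not a sample statistic. Only the Model 2 coefficients come from the paper.

```python
def marginal_effects(coefs, x, var):
    """Eqs. (16)-(19) for a linear deterministic component V = sum_k b_k * x_k.

    coefs: coefficient of each indicator; x: indicator values at which to
    evaluate (e.g. sample means or medians); var: the indicator being varied.
    """
    v = sum(b * x[k] for k, b in coefs.items())
    dv_dx = coefs[var]  # for a linear V, dV/dx is simply the coefficient
    return {
        "dy/dx": dv_dx,                 # (16) marginal change
        "ey/ex": dv_dx * x[var] / v,    # (17) elasticity
        "ey/dx": dv_dx / v,             # (18) proportional change in V per unit of x
        "dy/ex": dv_dx * x[var],        # (19) change in V for a proportional change in x
    }


model2 = {"hnf": 0.40, "HCD": 0.032}   # Model 2 coefficients (Table 3)
at = {"hnf": 3.0, "HCD": 15.0}         # illustrative values, not sample statistics
me = marginal_effects(model2, at, "hnf")
print(me)
```

Measures (17) and (18) divide by V, so they become unstable when V is close to zero. This is why Model 1 shows an extreme value at the median in Table 7.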
Relation Between Peer Review Judgments and the Model Forecasts
For the set of openings considered, we determined the number of times the candidate placed in the first position by peers was the one with the highest probability of being chosen first, that is, the one with the highest utility.
Because our models provide more information than the identity of the first-placed applicant, we also compared the model forecasts with the actual results of the peer selection process for each opening, as follows. For each pair of applicants in an opening, the model forecasts which of the two will be placed higher. For our data on 27 openings, we considered 426 pairs of applicants.
Note that the relative positions of the candidates ranked fifth and below (when more than four candidates apply) are not considered in the evaluation of the model, as it has been suggested that panels pay less attention to the ranking as they go down a large set of applicants. A detailed description of this point, and of how our dependent variable was defined, is given in Results and Discussion. This is why Figure 1 shows only the pairs formed by each of the candidates 1 to 4 with all other applicants. In the example given in Figure 1, we get 14 pairs.
[Figure 1: for a ranking of six applicants, the pairs counted are 1-2, 1-3, 1-4, 1-5, 1-6, 2-3, 2-4, 2-5, 2-6, 3-4, 3-5, 3-6, 4-5, and 4-6.]
FIG. 1. Illustration of the counting of pairs of applicants for a ranking obtained in a given position opening.
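The pair-counting rule of Figure 1 and the pairwise success measure can be sketched in Python. This is a minimal illustration. It assumes that the model ranks i above j when $V_i > V_j$ and that ties count as misses. The function names and the utilities in the example are hypothetical.

```python
from itertools import combinations


def counted_pairs(n_applicants, top=4):
    """Pairs (i, j) of peer positions, i < j, with at least one of the two in the top `top`."""
    return [(i, j) for i, j in combinations(range(1, n_applicants + 1), 2) if i <= top]


def pairwise_hits(v_in_peer_order, top=4):
    """Count the counted pairs whose order the model reproduces.

    v_in_peer_order[k] is the model utility V of the applicant placed k+1 by
    the panel; the model ranks i above j when V_i > V_j (ties count as misses).
    """
    pairs = counted_pairs(len(v_in_peer_order), top)
    hits = sum(v_in_peer_order[i - 1] > v_in_peer_order[j - 1] for i, j in pairs)
    return hits, len(pairs)


print(len(counted_pairs(6)))                          # 14, as in Figure 1
print(pairwise_hits([3.0, 2.5, 2.9, 1.0, 0.5, 0.7]))  # (13, 14) with these hypothetical utilities
```

Summing the hits and pair counts over all openings gives the pairwise success rate, computed over 426 pairs in this study.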
Results and Discussion
TABLE 2. Factor loadings and communality values obtained in the factor analysis. Only factor loadings above |0.3| are shown.
The results obtained in the different steps needed to define our model are now presented and discussed. The descriptive statistics of our data are presented in Table B1 in Appendix B.
Definition of the Model
Multicollinearity occurs frequently in the data set. When this happens, it is difficult to determine which of the independent variables best explains the dependent variable. In the presence of multicollinearity, the estimated coefficients may vary widely with small changes in the model or data. Multicollinearity can be evaluated using the Variance Inflation Factor (VIF) and the Condition Index (CI) (Maroco, 2007). Both parameters were calculated; the VIF values for the 12 variables considered are presented in Table 1.
Table 1 shows that the variables NDF and hnf have the highest VIF. A cutoff value of 10 is suggested in the literature (Pestana & Gageiro, 1998). The variable NDF has a VIF not far from 10, and the variable hnf has a VIF larger than 10, suggesting that multicollinearity is present. The value obtained for the CI was 8.12, lower than the cutoff point (15) suggested in the literature (Pestana & Gageiro, 1998). The VIF results lead us to conclude that multicollinearity can be a problem in our analysis. To deal with the problem, three strategies were considered:
To group the indicators using factor analysis (Strategy 1).
To eliminate the variable hnf from our set of indicators (Strategy 2).
To eliminate the variable NDF from our set of indicators (Strategy 3).
For each of these cases, one model was defined and fully evaluated.
TABLE 1. Value obtained for the variance inflation factor (VIF).
Variable   VIF      Variable   VIF
NDF         8.09    SNIPm      1.51
HCD         2.47    SJRm       1.28
CD          1.17    NI         2.19
PDC         2.10    Q1         2.08
hnf        10.62    DIC        1.26
NIR         3.06    NAm        1.39
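VIF values like those in Table 1 can be obtained from the indicator data alone, using the identity $\mathrm{VIF}_j = 1/(1-R_j^2) = [R^{-1}]_{jj}$, the j-th diagonal element of the inverse correlation matrix. The sketch below assumes small data sets, since it uses a plain Gauss-Jordan inversion. It is not the software used by the authors, and the two-variable example is made up.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[k][k] for k, name in enumerate(names)}


# Two made-up variables with correlation 0.8, so each VIF is 1 / (1 - 0.64), about 2.78.
print(vif({"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}))
```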
Output of Factor Analysis

Factor analysis resulted in three dimensions that explain
71.4% of the total variance in the data set. Three variables
were excluded from the analysis. A detailed discussion is
presented in Appendix B.
                 Factor loadings
Variable   Factor 1   Factor 2   Factor 3   Communality after extraction
hnf        0.929                            0.901
NIR        0.855                            0.772
NDF        0.796      -0.333                0.773
PDC        0.758      -0.300                0.672
HCD        0.675                            0.511
Q1                    0.917                 0.844
NI                    0.893                 0.818
CD                               0.739      0.553
SJRm                             0.718      0.579
Factor 1 explains 32.7% of the total variance, and hnf, NIR, NDF, PDC, and HCD are the variables with the highest factor loadings (the correlation between each variable and the factor). Factor 2 explains 18.9% of the total variance, and the variables with the highest factor loadings in this factor are Q1 and NI. Factor 3 explains 15.3%, and CD and SJRm are the variables with the highest factor loadings in this factor. This can be observed in Table 2.
Having defined the structure, we can now attempt to label our factors:
The indicators with the highest factor loadings in Factor 1 are, essentially, related to the contribution of the candidate to scientific knowledge. This dimension can be labeled Personal Contribution (PC).
The most important indicators in Factor 2 are related to the visibility of the institutions where the scientific activities of the researcher were performed. This dimension can be labeled Prestige of the Institution (PI).
The indicators with the highest factor loadings in Factor 3 are related to the scientific community that shares the same interests as the researcher. This dimension can be labeled Scientific Community (SC).
SNIPm, DIC, and NAm were excluded by the factor analysis because they did not meet some of its criteria. However, these indicators were used, together with the output of the factor analysis, as input data in the ROLR.
With the strategies adopted, multicollinearity was eliminated. The results are presented in Table B3 in Appendix B.
For ROLR to be applicable, the ratio of the probabilities of any two alternatives must be independent of all other alternatives; this so-called independence of irrelevant alternatives (IIA) property must be verified. From the discussion in the Material and Methods section, it follows that:
$$\frac{P_{ni}}{P_{nj}} = \frac{e^{V_{ni}}}{e^{V_{nj}}} \tag{20}$$
This means that the ratio is independent of alternatives other than i and j, that is, independent of the irrelevant alternatives.
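The IIA property follows directly from the logit form in Eq. (15), and a short numerical check makes this visible. The sketch below uses hypothetical utilities for three applicants. It shows that removing applicant C leaves the ratio $P_A/P_B = e^{V_A - V_B}$ unchanged. It illustrates the model property only and is not a test on the study's data.

```python
import math


def logit_probs(v):
    """Conditional-logit probabilities P_i = exp(V_i) / sum_j exp(V_j), as in eq. (15)."""
    m = max(v.values())                      # shift by the maximum for numerical stability
    w = {k: math.exp(x - m) for k, x in v.items()}
    total = sum(w.values())
    return {k: x / total for k, x in w.items()}


v = {"A": 1.2, "B": 0.7, "C": -0.4}          # hypothetical utilities
full = logit_probs(v)
without_c = logit_probs({"A": v["A"], "B": v["B"]})
print(full["A"] / full["B"], without_c["A"] / without_c["B"], math.exp(0.5))
```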
This property must be checked for our data to avoid inconsistent coefficients; for this purpose, we adopt the Hausman test (Hausman, 1978). This test compares the coefficients estimated using different choice sets. One choice set considers all the available alternatives, and the coefficients estimated from it are consistent and efficient if the null hypothesis holds. The coefficients are then estimated using a choice set from which some alternatives are deleted. The Hausman test computes the differences between the coefficients estimated for the two data sets and checks whether these differences are systematic. If the differences are not systematic (null hypothesis), the IIA property is considered to be satisfied.
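The general Hausman statistic is $H = (b_r - b_f)^\top (V_r - V_f)^{-1} (b_r - b_f)$, asymptotically chi-square with as many degrees of freedom as coefficients compared. The sketch below covers only the single-coefficient special case, so no matrix inversion is needed. It uses $P(\chi^2_1 > H) = \mathrm{erfc}(\sqrt{H/2})$. The estimates in the example are hypothetical, and the function name is illustrative.

```python
import math


def hausman_1d(b_reduced, var_reduced, b_full, var_full):
    """Hausman statistic for one coefficient and its chi-square(1) p-value.

    b_full/var_full: estimate and variance from the full choice set (efficient under H0);
    b_reduced/var_reduced: the same from the choice set with some alternatives removed.
    """
    dv = var_reduced - var_full
    if dv <= 0:
        raise ValueError("expected Var(reduced) > Var(full)")
    h = (b_reduced - b_full) ** 2 / dv
    p = math.erfc(math.sqrt(h / 2.0))   # P(chi2_1 > h) = P(|Z| > sqrt(h))
    return h, p


h, p = hausman_1d(0.45, 0.02, 0.40, 0.01)   # hypothetical estimates
print(round(h, 3), round(p, 3))              # p > .05: differences not systematic
```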
To test the IIA hypothesis, we started with the whole data set, that is, the complete rankings for all known openings, and compared the coefficients obtained for this full data set with those obtained when it is reduced by eliminating some of the applicants. We observed that the variables with significant impact in the subsets differ from those found with the whole data set (see Appendix C, Table C1). Two reasons may explain this behavior:
It is reasonable to assume that the peers in the selection committee put more effort into ranking the applicants they consider better than into ranking those they expect to be relegated to lower positions. After all, only some of these candidates (normally one or two) are to be accepted. The lower section of the ranking is irrelevant for the institution, although the applicants in those positions may value their individual result highly. This suggests giving a higher weight in the statistical regression to the higher positions in the rankings.
The number of applicants per opening varies widely. In the present study we have five openings with 10 applicants and 21 openings with at least three applicants. As the number of openings with many applicants is small, this may lead to a loss of efficiency in the method. The distribution of the applicants in the 27 openings is given in Table C2 in Appendix C.
To overcome this problem, the dependent variable (ranking) was changed. Initially we have several contests with different numbers of candidates (maximum 11 candidates), each candidate ranked according to the judgments made by peers, so our dependent variable takes values in the range 1-11. We changed the dependent variable in different ways:
In each ranking, the first position is labeled 1 and the remaining positions are labeled 0. This is the same as asking the peers to pick only the preferred applicant for each opening. Let us call this the B1 dependent variable.
In each ranking, the first and second positions are labeled 1 and 2, respectively, and the remaining positions are labeled 0 to indicate their ex-aequo position. The peers in the selection committee are now asked to select two applicants and to rank them first and second. This is the B2 dependent variable.
The same process is repeated, adding one new position each time, to produce the dependent variables B3 to BT.
To decide which of the dependent variables is most appropriate for our analysis, we calculated the likelihood ratio index (Train, 2009). This index measures how well the model fits the data and is calculated as:
$$\rho = 1 - \frac{\log l(\hat{\beta})}{\log l(0)} \tag{21}$$
where $\log l(\hat{\beta})$ is the log-likelihood function evaluated at the estimated parameters and $\log l(0)$ is the log-likelihood function of the null model (without parameters, i.e., with all coefficients equal to zero). The value of $\rho$ varies between 0 and 1, and values close to 1 mean that the estimated parameters can be used to represent the choices made by the peers. The results are presented in Figure 2.
FIG. 2. The likelihood ratio index when different dependent variables (B1-BT) are used.
When a person is asked to rank a set of alternatives, the parameters of the model can be estimated more efficiently than when the person is asked to choose only the most preferred alternative (Fok, Paap, & Van Dijk, 2012). This may explain why the index $\rho$ increases from B1 to B2. The index is higher for B2 than for any other dependent variable. It decreases slowly up to B4 and then remains reasonably stable, at much lower values, for B5 to B11, even though the number of variables with significant impact is higher in some of these cases. As we go from B2 to B4, the number of openings decreases markedly, and this may be the reason for the small decrease observed in the index. The overall picture confirms the expectation raised above that panel members put more effort into deciding the first few positions in the ranking. The marked decrease of the likelihood ratio index beyond B4 suggests that we should fully consider the rankings up to the fourth position and give all other applicants an ex-aequo position.
After choosing the most appropriate dependent variable, we performed the Hausman test, which showed that the IIA hypothesis holds (p > .05). Given that the IIA holds, the parameters of each model were estimated; they are listed in Table 3.
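The B-type recoding and the likelihood ratio index in Eq. (21) can be sketched together in Python. This is a minimal illustration, not the authors' code. It assumes that rank 1 is best, that the 0-coded ex-aequo tail is treated as unranked in a rank-ordered (exploded) logit likelihood, and that the null model sets every utility to zero. The indicator values are hypothetical; only the coefficient 0.40 is borrowed from Model 2.

```python
import math


def to_b(ranks, k):
    """Recode a full ranking (1 = best) as the B_k dependent variable: ranks above k become 0."""
    return [r if r <= k else 0 for r in ranks]


def rolr_loglik(openings, utility):
    """Log-likelihood of (partial) rankings under a rank-ordered logit.

    openings: list of openings, each a list of (features, rank) with rank 0 for the
    unranked ex-aequo tail; utility: function mapping an applicant's features to V.
    """
    ll = 0.0
    for opening in openings:
        v = [utility(f) for f, _ in opening]
        remaining = list(range(len(opening)))
        ranked = sorted((i for i, (_, r) in enumerate(opening) if r > 0),
                        key=lambda i: opening[i][1])
        for i in ranked:
            # probability that i is chosen among the applicants still available
            ll += v[i] - math.log(sum(math.exp(v[k]) for k in remaining))
            remaining.remove(i)
    return ll


def likelihood_ratio_index(ll_model, ll_null):
    """Eq. (21): rho = 1 - log l(beta) / log l(0)."""
    return 1.0 - ll_model / ll_null


ranks = to_b([1, 2, 3, 4, 5], 2)               # -> [1, 2, 0, 0, 0]
hnf = [9.0, 7.0, 8.0, 3.0, 2.0]                # hypothetical indicator values
opening = list(zip(hnf, ranks))
ll0 = rolr_loglik([opening], lambda x: 0.0)    # null model: log(1/5) + log(1/4)
ll1 = rolr_loglik([opening], lambda x: 0.40 * x)
print(ranks, round(likelihood_ratio_index(ll1, ll0), 3))
```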
TABLE 3. Parameters estimated for each model.
Model   Independent variable   Coefficient   Standard error   Z      P > |Z| (c)   LR chi2 (a)   Prob > chi2 (b)
1       PC                     1.06          0.21             5.06   <0.001*       33.38         <0.001
2       hnf                    0.40          0.09             4.61   <0.001*       40.61         <0.001
        HCD                    0.032         0.01             2.45   0.014*
3       NDF                    0.11          0.02             4.58   <0.001*       40.99         <0.001
        HCD                    0.044         0.01             3.47   0.001*
        NAm                    0.19          0.09             2.07   0.038*
Note. (a) LR chi2 is the likelihood ratio chi-square test that at least one independent variable's coefficient in the model is not equal to zero.
(b) Prob > chi2 is the probability of obtaining the LR chi2 if the independent variables have no effect. It is compared with a specified alpha level, the accepted type I error, which is set at 0.05 or 0.1.
(c) Z and P > |Z| are the test statistic and p-value, respectively, for the null hypothesis that an individual independent variable's coefficient is zero.
*Significant at the 0.05 level.
The likelihood ratio is similar for Models 2 and 3 and is lowest for Model 1. For all three models, this statistic is significant and allows us to reject the null hypothesis that the coefficients of all the independent variables are equal to zero.
Table 3 shows that all the variables with significant impact have a positive effect on the probability of a given candidate being ranked first, meaning that the probability increases as these variables increase.
We found that the NI and Q1 indicators do not have a significant impact on peer judgments. However, this result should be considered with caution, as we are dealing with only a few institutions: in the set of applicants analyzed, 50% are affiliated with Universidade do Porto. A larger set of applicants, more heterogeneous with respect to affiliation, will be needed to clarify the real importance of the prestige of the affiliation in peer decisions.
Taking into account the estimated parameters, we can determine the probability that each candidate will be ranked first. For each model, this probability was determined using the following expressions:
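Model 1.
$$P_{ni} = \frac{e^{1.06\,PC_{ni}}}{\sum_j e^{1.06\,PC_{nj}}} \tag{22}$$
Model 2.
$$P_{ni} = \frac{e^{0.40\,hnf_{ni} + 0.032\,HCD_{ni}}}{\sum_j e^{0.40\,hnf_{nj} + 0.032\,HCD_{nj}}} \tag{23}$$
Model 3.
$$P_{ni} = \frac{e^{0.11\,NDF_{ni} + 0.044\,HCD_{ni} + 0.19\,NAm_{ni}}}{\sum_j e^{0.11\,NDF_{nj} + 0.044\,HCD_{nj} + 0.19\,NAm_{nj}}} \tag{24}$$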
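Eqs. (22) to (24) can be evaluated for the applicants to one opening with a few lines of Python. In the sketch below, only the coefficients come from Table 3. The dictionary layout, the function name and the two applicants are hypothetical.

```python
import math

# Coefficients reported in Table 3, i.e. eqs. (22)-(24).
MODELS = {
    "Model 1": {"PC": 1.06},
    "Model 2": {"hnf": 0.40, "HCD": 0.032},
    "Model 3": {"NDF": 0.11, "HCD": 0.044, "NAm": 0.19},
}


def first_place_probabilities(model, applicants):
    """P_ni for every applicant i in one opening under the chosen model."""
    coefs = MODELS[model]
    v = [sum(b * a[name] for name, b in coefs.items()) for a in applicants]
    m = max(v)
    w = [math.exp(x - m) for x in v]   # subtracting max(V) leaves P_ni unchanged
    total = sum(w)
    return [x / total for x in w]


# Two hypothetical applicants to the same opening.
applicants = [{"hnf": 10, "HCD": 5}, {"hnf": 8, "HCD": 30}]
print(first_place_probabilities("Model 2", applicants))
```

Under Model 2, these two hypothetical applicants have the same deterministic utility (4.16), so each has a probability of one half of being ranked first. Here, 25 additional units of HCD offset 2 units of hnf, since 25 × 0.032 = 2 × 0.40.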
Having finally established our models for the prediction of the rankings, we go on to study some of their characteristics.
The Robustness of the Models

The models proposed were obtained from our data set of 27 openings. To test their robustness, we used the jackknife method. For each applicant, we compare the probability of attaining first position calculated with the full data set with that obtained after eliminating from the data set the opening to which the applicant applied. This is shown in Figure 3.
The two probabilities are very similar, and the Pearson's correlation coefficients are high and significant (p < .001). The final conclusion is that the models are robust. This is a very impressive result if we consider that our data set includes job openings in different scientific fields, in different universities, and for two of the top levels of the academic career system.
Choice of the Model That Best Explains the Decisions Made by Peers
The ROLR is fitted by the maximum likelihood criterion. For each model, the value of the log-likelihood function can be used to calculate the Akaike Information Criterion (AIC) (Wagenmakers & Farrell, 2004). The AIC is a model selection procedure that takes the form of a penalized likelihood: minus twice the log-likelihood plus a penalty term, AIC = -2 log L + 2k for a model with k estimated parameters. Among several models, the best approximating model is the one with the lowest AIC. We can also calculate the Akaike weights, which can be interpreted as the probability that each model is the best among all the candidate models.
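The Akaike weights follow from the AIC differences $\Delta_i = \mathrm{AIC}_i - \min \mathrm{AIC}$ as $w_i = e^{-\Delta_i/2} / \sum_k e^{-\Delta_k/2}$. The Python sketch below applies this standard formula to the AIC values of Table 4. The function name is illustrative. Because the published AIC values are rounded, the weights match Table 4 only to within about 0.1 percentage points.

```python
import math


def akaike_weights(aic):
    """w_i = exp(-Delta_i / 2) / sum_k exp(-Delta_k / 2), with Delta_i = AIC_i - min AIC."""
    best = min(aic.values())
    rel = {m: math.exp(-(a - best) / 2.0) for m, a in aic.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


w = akaike_weights({"Model 1": 258.30, "Model 2": 253.06, "Model 3": 254.68})
print({m: round(100 * x, 1) for m, x in w.items()})
print(w["Model 2"] / w["Model 3"], w["Model 2"] / w["Model 1"])  # evidence ratios
```

The two evidence ratios, about 2.2 and 13.7, correspond to the statements that Model 2 is about twice as likely as Model 3, and about 14 times as likely as Model 1, to be the best model.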
The results presented in Table 4 suggest that Model 2 has the highest probability of being the best among the models considered. Model 2 is about twice as likely to give the best explanation of the decisions made by peers as Model 3, and 14 times as likely as Model 1. The conclusion that Model 2 is likely to be better than Model 1 was expected from Figure 2, which shows that the likelihood ratio index for Model 1 is the lowest among the three models. For Models 2 and 3, the likelihood ratio index is the same, so the value of the log-likelihood function is almost the same for our dependent variable (B4). When fitting a model to a data set by the maximum likelihood criterion, the goal is to obtain the best model with the smallest possible number of independent variables, and this is what the AIC gives us. As stated earlier, this measure has a penalty term that grows with the number of independent variables (estimated parameters) in the model. In our case, the two models have almost the same value of the likelihood function, but Model 3 has more independent variables than Model 2. This explains the different AIC values obtained for Models 2 and 3.
Dependence on the Significant Indicators
Having chosen the model, it is now important to study how changes in the bibliometric indicators influence the results. This can be studied using predictive margins and marginal effects, as mentioned in the Material and Methods section. This allows us to determine the influence of the indicators on the deterministic component, which is given for each model by:
Model 1.
$$V_{ni} = 1.06\,PC_{ni} \tag{25}$$
Model 2.
$$V_{ni} = 0.40\,hnf_{ni} + 0.032\,HCD_{ni} \tag{26}$$
Model 3.
$$V_{ni} = 0.11\,NDF_{ni} + 0.044\,HCD_{ni} + 0.19\,NAm_{ni} \tag{27}$$
FIG. 3. Correlation between the probabilities obtained with the jackknife method and the probabilities obtained using the whole sample.
TABLE 4. Values obtained for the AIC and the Akaike weights.
Model     AIC      Akaike weight (%)
Model 2   253.06   65.8
Model 3   254.68   29.4
Model 1   258.30    4.8
We consider the universe of 171 applicants to discuss the relative effect of the variations observed in the individual indicators. Table 5 shows the values of the deterministic component, $V_{ni}$, for certain values of the variables; "percentile 10" means that all variables take their percentile-10 values. For example, an applicant A with a PC value at percentile 10 will get a deterministic utility of -1.67, and another applicant B with a PC value at the median will have a utility of -0.03. This corresponds to a probability of 16% that A is ranked above B (see expression A.12 in Appendix A).
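The 16% figure can be reproduced with the binary logit form $e^{V_A}/(e^{V_A} + e^{V_B})$. Expression A.12 in Appendix A is not shown here, so this form is an assumption, although it matches the reported value. A minimal Python sketch with a hypothetical function name:

```python
import math


def prob_above(v_a, v_b):
    """Binary-logit probability that applicant A is ranked above applicant B."""
    return 1.0 / (1.0 + math.exp(v_b - v_a))   # equals e^{V_A} / (e^{V_A} + e^{V_B})


p = prob_above(-1.67, -0.03)   # Model 1: PC at percentile 10 vs. PC at the median (Table 5)
print(round(p, 3))
```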
Tables 6 and 7 present the results for the marginal effects, as discussed in the Material and Methods section, determined for the average and median values of each indicator, respectively.
To help the reader with Tables 6 and 7, we explain the meaning of each value for Model 2 and the variable hnf in Table 6.
∂y/∂x() = 0.40: the deterministic component increases with hnf; assuming a constant rate, it increases by 0.40 when hnf increases by 1 unit.
TABLE 5. Values obtained for the deterministic component, V_ni.
          Percentile
Model     10      25      50      75     90     99     Average
Model 1   -1.67   -0.77   -0.03   0.74   1.44   2.62   0.16
Model 2   0.47    1.15    1.82    2.72   3.84   5.43   2.02
Model 3   1.07    2.05    2.97    4.12   5.55   8.71   3.23
TABLE 6. Marginal effects determined using the average value of each indicator.
Model     Variable   ∂y/∂x()   ey/ex()   ey/∂x()   ∂y/ex()
Model 1   PC         1.06      1.00      6.44      0.16
Model 2   hnf        0.40      0.68      0.20      1.38
          HCD        0.032     0.32      0.016     0.64
Model 3   NDF        0.11      0.36      0.033     1.18
          HCD        0.044     0.27      0.014     0.88
          NAm        0.19      0.36      0.058     1.17

TABLE 7. Marginal effects determined using the median value of each indicator.
Model     Variable   ∂y/∂x()   ey/ex()   ey/∂x()   ∂y/ex()
Model 1   PC         1.06      1.00      -33.11    -0.03
Model 2   hnf        0.40      0.68      0.22      1.24
          HCD        0.032     0.32      0.02      0.58
Model 3   NDF        0.11      0.32      0.036     0.95
          HCD        0.044     0.27      0.015     0.80
          NAm        0.19      0.41      0.063     1.22
ey/ex() = 0.68: the deterministic component increases with hnf; it increases by 68% when hnf doubles (i.e., increases by 100%, to twice the average value).
ey/∂x() = 0.20: the deterministic component increases by 20% when hnf increases by 1 unit.
∂y/ex() = 1.38: the deterministic component increases by 1.38 when hnf doubles (i.e., increases by 100%, to twice the average value).
We should stress that the coefficients in our models cannot be compared with one another, as the variables are not standardized; in fact, only PC is. This is why it is relevant to discuss the marginal effects, as we did above, to understand how the model reacts to variations in the performance of the individuals.
How Good is this Model to Predict the Decisions of Peers?
The final goal of this exercise is to predict the decisions of peers or, in other words, to understand what individual characteristics of applicants, as measured by certain bibliometric indicators, peers value and how they value them.
So far we have discussed the steps taken to define certain models, their robustness, and the influence of each indicator on the deterministic component of the model. A first, simplistic assessment of the success of this project can be made by comparing the predictions with the actual decisions of peers in the selection committees. The applicants placed in the first position by peers were those with the highest probability of being chosen first in 56%, 52%, and 48% of the openings for Model 1, Model 2, and Model 3, respectively. This is a good result: choosing an applicant at random in each opening would, on average, match the panel's first choice in only 23% of the openings. Our model predicts the probability that any candidate will be placed first, but this is not the most informative measure of the performance of the model, as the number of applicants per opening varies from 2 to 11.
The other option to assess the predictive power of our models is to consider all the pairs of applicants in our sample, as described in the Material and Methods section. Of the 426 pairs of applicants that can be identified in the 27 openings considered, Model 1 (described in detail earlier) correctly predicted 75% of the orderings. Model 2 gives exactly the same result, and Model 3 is marginally better, with 76% success. These results far exceed our best expectations, as the data are very heterogeneous and the decision process of the selection committees is more complex than assumed here.
To find the indicators that are implicit in the peer evaluations, we fitted the models to the final decisions. However, several studies have shown that peer review is subject to potential bias (Marsh et al., 2007; Marsh, Jayasinghe, & Bond, 2008). We wished to design a method that is immune to these potential biases. Fitting our models to peer review results may introduce some bias, but this is still the best that can be done, as peer review has a long history and is well accepted by the scientific community, despite all its limitations.
Conclusions
Starting with a set of 12 indicators that may be assumed to describe major aspects of the scientific performance of a researcher, we built three models that, we hoped, would have some power to predict peer decisions. We are well aware that human decisions, and especially peer evaluation of scientific merit, cannot be reduced to a mathematical algorithm. Given the enormous effort and cost of peer assessment in all types of situations and with very different goals, it is important to find out how far mathematical modeling can go in helping peer evaluation.
In this paper we proposed three models.
Model 1 is defined by a composite variable that encompasses the nine indicators.
Model 2 is defined by two indicators, hnf and HCD, both related to the impact of the published documents.
Model 3 is defined by three indicators, NDF, HCD, and NAm, representing quantity, impact, and collaboration.
Among the three models, Model 2 is the one with the highest probability (65.8%) of being the best model.
All the models are shown to be robust with respect to data accretion or subtraction.
The applicants placed in the first position by peers were those with the highest probability of being chosen first in 56%, 52%, and 48% of the openings for Model 1, Model 2, and Model 3, respectively.
The predictive success of these models is around 75% if we count success using pairs of applicants.
Of the three models, Model 1 is probably the one to be discarded: it uses nine indicators, yet its predictive power is close to that of Models 2 and 3. The AIC values also suggest that this model is probably the worst at explaining peer judgments. If the AIC is used to choose one model, the results show that Model 2 should be preferred.
The sample used in this analysis is small (only 27 rankings), which is probably the main limitation of the study. However, we consider that these results open the way for studies at the individual level with larger sets, other indicators, and other databases. We suggest using these models to complement peer review evaluations, not as a substitute for them; there are several aspects that only peers can assess. A simplistic measure of the success of these models was used here; studies exploring the predictive power of these models in more detail are under way.
Acknowledgments
We wish to acknowledge the financial support of the FCT (Foundation for Science and Technology), Portugal, through grant No. SFRH/BD/75190/2010. We would like to thank Professor Paulo Guimarães for help with the discrete choice models and the implementation of the resampling methodology in Stata.
References
Abramo, G., D'Angelo, C. A., & Caprasecca, A. (2009). Allocative efficiency in public research funding: Can bibliometrics help? Research Policy, 38(1), 206-215. doi: 10.1016/j.respol.2008.11.001
Aksnes, D. W. (2003). A macro study of self-citation. Scientometrics, 56(2), 235-246.
Aksnes, D. W., & Taxt, R. E. (2004). Peer reviews and bibliometric indicators: A comparative study at a Norwegian university. Research Evaluation, 13(1), 33-41. doi: 10.3152/147154404781776563
Antonakis, J., & Lalive, R. (2008). Quantifying scholarly impact: IQp versus the Hirsch h. Journal of the American Society for Information Science and Technology, 59(6), 956-969. doi: 10.1002/asi.20802
Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). Wiley.
Batista, P. D., Campiteli, M. G., Kinouchi, O., & Martinez, A. S. (2006). Is it possible to compare researchers with different scientific interests? Scientometrics, 68(1), 179-189.
Bonzi, S., & Snyder, H. W. (1991). Motivations for citation: A comparison of self citation and citation to others. Scientometrics, 21(2), 245-254. doi: 10.1007/bf02017571
Bornmann, L., & Daniel, H. D. (2005). Selection of research fellowship recipients by committee peer review. Reliability, fairness and predictive validity of Board of Trustees' decisions. Scientometrics, 63(2), 297-320. doi: 10.1007/s11192-005-0214-2
Bornmann, L., & Daniel, H. D. (2006). Selecting scientific excellence through committee peer review: A citation analysis of publications previously published to approval or rejection of post-doctoral research fellowship applicants. Scientometrics, 68(3), 427-440. doi: 10.1007/s11192-006-0121-1
Bornmann, L., & Daniel, H. D. (2007). Convergent validation of peer review decisions using the h index: Extent of and reasons for type I and type II errors. Journal of Informetrics, 1(3), 204-213. doi: 10.1016/j.joi.2007.01.002
Bornmann, L., Marx, W., Gasparyan, A. Y., & Kitas, G. (2012). Diversity, value and limitations of the journal impact factor and alternative metrics. Rheumatology International, 32(7), 1861-1867. doi: 10.1007/s00296-011-2276-1
Bornmann, L., Wallon, G., & Ledin, A. (2008). Does the Committee Peer Review Select the Best Applicants for Funding? An Investigation of the Selection Process for Two European Molecular Biology Organization Programmes. PLoS One, 3(10), e3480. doi: 10.1371/journal.pone.0003480
Burrell, Q., & Rousseau, R. (1995). Fractional counts for authorship attribution: A numerical study. Journal of the American Society for Information Science, 46(2), 97-102.
Cole, J. R., & Cole, S. (1973). Social stratification in science. Chicago: The University of Chicago Press.
Field, A. (2005). Discovering Statistics Using SPSS (2nd ed.). SAGE Publications.
Fok, D., Paap, R., & Van Dijk, B. (2012). A rank-ordered logit model with unobserved heterogeneity in ranking capabilities. Journal of Applied Econometrics, 27(5), 831-846. doi: 10.1002/jae.1223
Franceschet, M., & Costantini, A. (2011). The first Italian research assessment exercise: A bibliometric perspective. Journal of Informetrics, 5(2), 275-291. doi: 10.1016/j.joi.2010.12.002
Hagen, N. T. (2008). Harmonic Allocation of Authorship Credit: Source-Level Correction of Bibliometric Bias Assures Accurate Publication and Citation Analysis. PLoS One, 3(12), e4021. doi: 10.1371/journal.pone.0004021
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate Data Analysis (7th ed.). Prentice Hall.
Hausman, J. A. (1978). Specification tests in econometrics. Econometrica, 46(6), 1251-1271. doi: 10.2307/1913827
Hodge, S. E., & Greenberg, D. A. (1981). Publication credit. Science, 213(4511), 950.
Ioannidis, J. P. A. (2008). Measuring Co-Authorship and Networking-Adjusted Scientific Impact. PLoS One, 3(7), e2778. doi: 10.1371/journal.pone.0002778
Jayasinghe, U. W., Marsh, H. W., & Bond, N. (2001). Peer review in the funding of research in higher education: The Australian experience. Educational Evaluation and Policy Analysis, 23(4), 343-364. doi: 10.3102/01623737023004343
Jayasinghe, U. W., Marsh, H. W., & Bond, N. (2003). A multilevel cross-classified modelling approach to peer review of grant proposals: The effects of assessor and researcher attributes on assessor ratings. Journal of the Royal Statistical Society Series A: Statistics in Society, 166, 279-300. doi: 10.1111/1467-985x.00278
Jayasinghe, U. W., Marsh, H. W., & Bond, N. (2006). A new reader trial approach to peer review in funding research grants: An Australian experiment. Scientometrics, 69(3), 591-606.
King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311-316. doi: 10.1038/430311a
Maroco, J. (2007). Análise Estatística: Com Utilização do SPSS (3rd ed.). Edições Sílabo, Lda.

DOI: 10.1002/asi
Marsh, H. W., Bond, N. W., & Jayasinghe, U. W. (2007). Peer review process: Assessments by applicant-nominated referees are biased, inflated, unreliable and invalid. Australian Psychologist, 42(1), 33-38. doi: 10.1080/00050060600823275
Marsh, H. W., Jayasinghe, U. W., & Bond, N. W. (2008). Improving the peer-review process for grant applications: Reliability, validity, bias, and generalizability. American Psychologist, 63(3), 160-168. doi: 10.1037/0003-066x.63.3.160
Moed, H. F. (2009). Measuring contextual citation impact of scientific journals. arXiv:0911.2632v1. Retrieved 6 February 2013.
Nederhof, A. J., & van Raan, A. F. J. (1987). Peer review and bibliometric indicators of scientific performance: A comparison of cum laude doctorates with ordinary doctorates in physics. Scientometrics, 11(5-6), 333-350.
Norris, M., & Oppenheim, C. (2003). Citation counts and the Research Assessment Exercise V: Archaeology and the 2001 RAE. Journal of Documentation, 59(6), 709-730.
OECD. (2010). Performance-based Funding for Public Research in Tertiary Education Institutions: Workshop Proceedings.
Papatheodorou, S. I., Trikalinos, T. A., & Ioannidis, J. P. A. (2008). Inflated numbers of authors over time have not been just due to increasing research complexity. Journal of Clinical Epidemiology, 61(6), 546-551. doi: 10.1016/j.jclinepi.2007.07.017
Pestana, M. H., & Gageiro, J. N. (1998). Análise de dados para ciências sociais: A complementaridade do SPSS (1st ed.). Lisboa: Edições Sílabo, Lda.
Reale, E., Barbara, A., & Costantini, A. (2007). Peer review for the evaluation of academic research: Lessons from the Italian experience. Research Evaluation, 16(3), 216-228. doi: 10.3152/095820207x227501
Rinia, E. J., van Leeuwen, T. N., van Vuren, H. G., & van Raan, A. F. J. (1998). Comparative analysis of a set of bibliometric indicators and central peer review criteria: Evaluation of condensed matter physics in the Netherlands. Research Policy, 27(1), 95-107.
SCImago. (2007). SJR: SCImago Journal & Country Rank. Retrieved February 6, 2013, from http://www.scimagojr.com
SCImago. (2011). SIR: SCImago Institutions Rankings. Retrieved November 2011, from http://www.scimagoir.com/index.php#
Taylor, J. (2011). The Assessment of Research Quality in UK Universities: Peer Review or Metrics? British Journal of Management, 22(2), 202-217. doi: 10.1111/j.1467-8551.2010.00722.x
Thomson Reuters. (2013). Journal Citation Reports. Retrieved 6 February 2013, from http://admin-apps.webofknowledge.com/JCR/JCR?SID=W1NlEmi49NF%40i5eMEDL
Train, K. (2009). Discrete Choice Methods with Simulation (2nd ed.). Cambridge University Press.
Van Hooydonk, G. (1997). Fractional counting of multiauthored publications: Consequences for the impact of authors. Journal of the American Society for Information Science, 48(10), 944-945.
Van Raan, A. F. J. (2006). Comparison of the Hirsch-index with standard bibliometric indicators and with peer judgment for 147 chemistry research groups. Scientometrics, 67(3), 491-502. doi: 10.1556/Scient.67.2006.3.10
Vieira, E. S., & Gomes, J. A. N. F. (2009). A comparison of Scopus and Web of Science for a typical university. Scientometrics, 81(2), 587-600. doi: 10.1007/s11192-009-2178-0
Vieira, E. S., & Gomes, J. A. N. F. (2011). An impact indicator for researchers. Scientometrics, 89(2), 607-629. doi: 10.1007/s11192-011-0464-0
Wagenmakers, E. J., & Farrell, S. (2004). AIC model selection using Akaike weights. Psychonomic Bulletin & Review, 11(1), 192-196. doi: 10.3758/bf03206482
Weeks, W. B., Wallace, A. E., & Kimberly, B. C. S. (2004). Changes in authorship patterns in prestigious US medical journals. Social Science & Medicine, 59(9), 1949-1954. doi: 10.1016/j.socscimed.2004.02.029
Wood, M., Roberts, M., & Howell, B. (2004). The reliability of peer reviews of papers on information systems. Journal of Information Science, 30(1), 2-11. doi: 10.1177/016551504041673
Appendix A: Methodology
This appendix describes the statistical methodologies. For factor analysis, we give only a brief description of the required assumptions and of the steps needed to carry out the analysis. The underlying algebra is complex and is not presented in this paper. For the rank ordered logistic regression, we give a more detailed description, because this methodology has not previously been applied in this type of study. It is therefore important to explain how it works and what results its application yields.
Factor Analysis. Factor analysis is an interdependence technique used to verify whether an underlying structure exists among variables. It works by looking for correlations among variables and grouping the variables into sets. Each set, also called a factor, represents a dimension within the data. How the output of factor analysis is finally used depends on the goal of each study. If the goal is data reduction (reducing the number of variables), the dimensions obtained can be used to develop composite indicators for further analysis. If, on the other hand, there is a conceptual basis for understanding the relationships among variables, the dimensions can represent concepts that a single variable cannot describe precisely.
Here we used factor analysis to eliminate multicollinearity. The variables were grouped into composite scores
(factor scores) and these were used as input data in the rank
ordered logistic regression.
Several assumptions are required in factor analysis:
The methodology uses the correlation matrix as input data. Factor analysis can be used if several correlations in the matrix are higher than 0.3.
Partial correlations should be analyzed. A significant number of small partial correlations should be observed in the data.
The Bartlett test of sphericity should be significant. This test examines the null hypothesis that the correlation matrix is an identity matrix.
The measure of sampling adequacy (MSA) should be evaluated. This index varies between 0 and 1. The higher its value, the better each variable is predicted by the other variables, and values close to 1 mean that factor analysis is appropriate. The MSA can also be computed for individual variables. Variables with an MSA lower than 0.5 should be excluded from the analysis.
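The study ran these checks in Stata. As an illustration only, the Bartlett statistic and the Kaiser-Meyer-Olkin MSA (overall and per variable) can be sketched in NumPy. The Bartlett statistic uses the standard formula $\chi^2 = -[(n-1) - (2p+5)/6]\ln|R|$. The MSA compares squared correlations with squared partial correlations, which are obtained from the inverse of $R$. The function names and the equicorrelated example matrix are hypothetical.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R from n observations.

    H0: R is an identity matrix. Returns the chi-square statistic and its degrees of
    freedom. The p-value is the upper tail of a chi-square distribution with df d.o.f.
    """
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)              # log|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return stat, df

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy (overall and per variable)."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)               # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)         # mask selecting off-diagonal entries
    r2 = np.where(off, R ** 2, 0.0)
    q2 = np.where(off, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + q2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    return overall, per_variable

# Hypothetical 3 x 3 correlation matrix with all correlations equal to 0.5.
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
stat, df = bartlett_sphericity(R, n=100)
overall_msa, msa = kmo(R)
to_drop = np.where(msa < 0.5)[0]                  # variables failing the 0.5 MSA rule
```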
Selection of the Factor Extraction Method. Factors were extracted with the principal components method. Some of the indicators do not have a Gaussian distribution, and the principal components method does not require one. The Kaiser criterion was used to determine the number of factors to extract: all factors with eigenvalues higher than one were retained.
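Principal components extraction with the Kaiser criterion can be sketched as an eigen-decomposition of the correlation matrix. The sketch keeps the components whose eigenvalues exceed one and scales each eigenvector by the square root of its eigenvalue to obtain loadings. This is a minimal NumPy illustration, not the Stata routine used in the study. The function name and the block-structured example matrix are hypothetical.

```python
import numpy as np

def kaiser_pca_loadings(R):
    """Principal-component loadings retained by the Kaiser criterion (eigenvalue > 1).

    Returns the p x k loading matrix and all eigenvalues in decreasing order.
    """
    eigval, eigvec = np.linalg.eigh(R)            # R is symmetric
    order = np.argsort(eigval)[::-1]              # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0                           # Kaiser criterion
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    return loadings, eigval

# Hypothetical correlation matrix: two blocks of two variables correlated at 0.8.
block = np.array([[1.0, 0.8], [0.8, 1.0]])
R = np.block([[block, np.zeros((2, 2))], [np.zeros((2, 2)), block]])
loadings, eigval = kaiser_pca_loadings(R)
```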
Rotation of Factors. The Varimax rotation was used. As mentioned above, factor analysis was used with the aim of eliminating multicollinearity. Varimax is an orthogonal rotation, so the retained factors are uncorrelated.
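The varimax rotation can be sketched with the common SVD-based iteration. At each step, the gradient of the varimax criterion is projected onto the nearest orthogonal matrix. This sketch omits Kaiser row normalization. Whether the study normalized rows is not stated, so that choice, the function names, and the example loadings are assumptions. The rotation is orthogonal, so each variable's communality is unchanged. Only the way the explained variance is spread across factors changes.

```python
import numpy as np

def varimax(loadings, max_iter=1000, tol=1e-5):
    """Orthogonal varimax rotation of a p x k loading matrix (no Kaiser normalization).

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotated loadings.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                         # nearest orthogonal matrix
        previous, criterion = criterion, s.sum()
        if criterion < previous * (1 + tol):      # stop when improvement is negligible
            break
    return loadings @ rotation, rotation

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = loadings.shape[0]
    sq = loadings ** 2
    return ((sq ** 2).sum(axis=0) / p - (sq.sum(axis=0) / p) ** 2).sum()

# Hypothetical example: a simple structure rotated away by 0.5 rad.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])
c, s = np.cos(0.5), np.sin(0.5)
unrotated = simple @ np.array([[c, -s], [s, c]])
rotated, T = varimax(unrotated)
```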
Factorial Structure Analysis. The factorial structure was analyzed in detail. Factor loadings were examined, and the indicators that contribute most to the definition of each factor were identified.
The percentage of the total variance of each variable explained by the factorial structure was analyzed. In our study, we required that the factorial structure account for at least 50% of the total variance of each variable (also known as its communality).
The total variance explained by the retained factors is also important. Small values of this percentage are not satisfactory, because the aim is to obtain a new set of variables while losing only a small amount of information.
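Both quantities follow directly from the loading matrix. A variable's communality is the sum of its squared loadings across the retained factors. The share of total variance explained by a factor is the sum of its squared loadings divided by the number of standardized variables. The NumPy sketch below uses a hypothetical loading matrix and flags the variables that fall below the 0.5 communality rule.

```python
import numpy as np

def communalities(loadings):
    """Communality of each variable: sum of its squared loadings on the retained factors."""
    return (loadings ** 2).sum(axis=1)

def percent_variance_explained(loadings):
    """Percentage of the total variance of p standardized variables explained by each factor."""
    return 100 * (loadings ** 2).sum(axis=0) / loadings.shape[0]

# Hypothetical loadings for four variables on two factors.
L = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.6], [0.1, 0.5]])
h2 = communalities(L)
low = np.where(h2 < 0.5)[0]          # variables below the 50% communality rule
total = percent_variance_explained(L).sum()
```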
Determination of Factor Scores. After the factorial structure had been analyzed, the factor scores (the new variables) were determined. Factor scores were calculated using the regression method. These factor scores, together with the indicators excluded from the factor analysis, were used as input data in the regression analysis.
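The regression method estimates the factor scores as $F = Z R^{-1} L$. Here $Z$ holds the standardized indicators, $R$ is their correlation matrix, and $L$ is the loading matrix. The sketch below is a NumPy illustration with simulated data and hypothetical names, not the study's Stata computation. With principal-component loadings, the resulting scores have zero mean, unit variance, and zero correlation.

```python
import numpy as np

def regression_factor_scores(X, loadings):
    """Factor scores by the regression method: F = Z R^{-1} L.

    X: n x p data matrix. loadings: p x k loading matrix (rotated or not).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each indicator
    R = np.corrcoef(X, rowvar=False)
    return Z @ np.linalg.solve(R, loadings)          # solve() avoids forming R^{-1}

# Hypothetical data: two pairs of correlated indicators for 200 applicants.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0], base[:, 0] + 0.5 * rng.normal(size=200),
    base[:, 1], base[:, 1] + 0.5 * rng.normal(size=200),
])
eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
top = np.argsort(eigval)[::-1][:2]
loadings = eigvec[:, top] * np.sqrt(eigval[top])     # unrotated PCA loadings
scores = regression_factor_scores(X, loadings)
```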
Rank Ordered Logistic Regression (ROLR). In discrete choice models, a decision maker n (in our case, one of the panels) chooses among a set of J alternatives (in our case, the candidates), taking into account the utility of each alternative to the decision maker. This utility is represented as $U_{nj}$, where n is the decision maker and $j = 1, 2, \ldots, J$ indexes the alternatives. The decision maker chooses as the best alternative the one with the highest utility: decision maker n ranks i first if and only if $U_{ni} > U_{nj}$ for all $j \neq i$. In reality, the analyst cannot observe the total utility, which is known only to the decision maker. The analyst observes only some features of the alternatives and of the decision makers. Let us denote the observed features of the alternatives as $x_{nj}$ and those of the decision makers as $s_n$. The function that relates these observed features to the utility is called the representative utility:
$$V_{nj} = V(x_{nj}, s_n) \quad \forall j \tag{A.1}$$
As mentioned earlier, there may be aspects that the analyst cannot observe, so that $U_{nj} \neq V_{nj}$. The utility of alternative j to decision maker n is the sum of a systematic (deterministic) term, $V_{nj}$, and a stochastic term, $\varepsilon_{nj}$. The stochastic term contains the aspects that the decision maker takes into account but that are hidden from the analyst:
$$U_{nj} = V_{nj} + \varepsilon_{nj} \tag{A.2}$$
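The stochastic component, $\varepsilon_{nj}$, is treated as a random variable with probability density $f(\varepsilon_{nj})$. It is assumed to follow an independently and identically distributed (iid) extreme value distribution:
$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}} \tag{A.3}$$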
The cumulative distribution function is:
$$F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}} \tag{A.4}$$
The probability that a decision maker n chooses the alternative i from a set of alternatives is:
$$P_{ni} = P(U_{ni} > U_{nj}\ \forall j \neq i) \tag{A.5}$$
$$P_{ni} = P(V_{ni} - V_{nj} > \varepsilon_{nj} - \varepsilon_{ni}\ \forall j \neq i) \tag{A.6}$$
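Conditional on $\varepsilon_{ni}$, and using the density $f(\varepsilon_{nj})$, the probability that alternative i is preferred to a given alternative $j \neq i$ can be written as:
$$P(U_{ni} > U_{nj} \mid \varepsilon_{ni}) = \int I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj})\, f(\varepsilon_{nj})\, d\varepsilon_{nj} = F(\varepsilon_{ni} + V_{ni} - V_{nj}) \tag{A.7}$$
where $I(\cdot)$ is the indicator function, which is one if its argument is true and zero otherwise.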
Because the $\varepsilon_{nj}$ are assumed to be independent, the conditional probability that i is preferred to all $j \neq i$ is the product of the individual cumulative distributions:
$$P_{ni} \mid \varepsilon_{ni} = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \tag{A.8}$$
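Because the value of $\varepsilon_{ni}$ is unknown, the probability $P_{ni}$ is obtained by integrating $P_{ni} \mid \varepsilon_{ni}$, weighted by the density of $\varepsilon_{ni}$, over the whole domain of $\varepsilon_{ni}$:
$$P_{ni} = \int \left( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \right) e^{-\varepsilon_{ni}}\, e^{-e^{-\varepsilon_{ni}}}\, d\varepsilon_{ni} \tag{A.9}$$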
Evaluating the integral gives the choice probability in closed form (Train, 2009):
$$P_{ni} = \frac{e^{V_{ni}}}{\sum_j e^{V_{nj}}} \tag{A.10}$$
The relation between the probability and the (deterministic part of the) utility is sigmoid.
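As a numerical check on the step from (A.9) to (A.10), the integral can be evaluated on a fine grid and compared with the closed form. The sketch is illustrative only. The function names, the integration limits, the grid size, and the utilities in the usage lines are arbitrary choices, not part of the study.

```python
import numpy as np

def logit_probabilities(v):
    """Closed-form choice probabilities (A.10): exp(V_i) / sum_j exp(V_j)."""
    v = np.asarray(v, dtype=float)
    e = np.exp(v - v.max())               # shifting by the max does not change the ratios
    return e / e.sum()

def integrated_probability(v, i, lo=-10.0, hi=40.0, n=200_001):
    """Evaluate the integral (A.9) for alternative i with the trapezoid rule."""
    v = np.asarray(v, dtype=float)
    eps = np.linspace(lo, hi, n)                       # grid for eps_ni
    density = np.exp(-eps) * np.exp(-np.exp(-eps))     # f(eps_ni), Equation (A.3)
    conditional = np.ones_like(eps)
    for j, vj in enumerate(v):
        if j != i:
            # F(eps_ni + V_ni - V_nj): one factor of the product in (A.8)
            conditional *= np.exp(-np.exp(-(eps + v[i] - vj)))
    y = conditional * density
    dx = eps[1] - eps[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

V = [1.0, 0.5, -0.2]   # hypothetical representative utilities
closed_form = logit_probabilities(V)
numerical = [integrated_probability(V, i) for i in range(len(V))]
```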
The systematic component, $V_{nj}$, is normally modeled as a linear function of the properties of the alternatives:
$$V_{nj} = \beta' x_{nj} \tag{A.11}$$
where $x_{nj}$ is a vector of observed variables for alternative j and $\beta$ is the vector of coefficients of these variables.
The choice probability can then be written as the so-called logit expression:
$$P_{ni} = \frac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}} \tag{A.12}$$
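The random-utility interpretation behind (A.11) and (A.12) can also be checked by simulation. The sketch draws iid standard Gumbel errors, which have the density (A.3) and correspond to NumPy's `Generator.gumbel` with `loc=0` and `scale=1`. It adds them to $\beta' x_{nj}$ and counts how often each alternative has the highest utility. The indicator values, the coefficients, and the function names are hypothetical.

```python
import numpy as np

def logit_choice_probabilities(X, beta):
    """Logit expression (A.12): softmax of the linear utilities beta' x_nj."""
    v = X @ beta
    e = np.exp(v - v.max())                 # shift by the max for numerical stability
    return e / e.sum()

def simulated_choice_shares(X, beta, n_draws=200_000, seed=0):
    """Share of simulated draws in which each alternative has the highest utility.

    Utilities follow U_nj = beta' x_nj + eps_nj with iid standard Gumbel eps_nj.
    """
    rng = np.random.default_rng(seed)
    v = X @ beta
    u = v + rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, len(v)))
    return np.bincount(u.argmax(axis=1), minlength=len(v)) / n_draws

# Hypothetical panel: three applicants described by two indicators each.
X = np.array([[1.0, 0.2], [0.5, 0.8], [0.0, 0.1]])
beta = np.array([1.0, 0.5])
exact = logit_choice_probabilities(X, beta)
simulated = simulated_choice_shares(X, beta)
```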
The ranking of the J alternatives can be considered as a product of logit expressions. Suppose a decision maker is required to rank alternatives 1, 2, ..., J, and the unobserved components of their utilities are iid extreme value. The probability of the ranking 1 > 2 > 3 > ... is then:
$$P_n(1 > 2 > 3 > \cdots) = \frac{e^{\beta' x_{n1}}}{\sum_{j=1,2,3,\ldots} e^{\beta' x_{nj}}} \cdot \frac{e^{\beta' x_{n2}}}{\sum_{j=2,3,\ldots} e^{\beta' x_{nj}}} \cdot \frac{e^{\beta' x_{n3}}}{\sum_{j=3,\ldots} e^{\beta' x_{nj}}} \cdots \tag{A.13}$$
The description of the model theory is based on Train (2009). Stata was used to carry out all the statistical work.
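The estimation itself was done in Stata. For illustration only, the ranking probability (A.13) can be sketched in Python. Summing its logarithm over decision makers gives the log-likelihood that is maximized to estimate $\beta$. Each panel's ranking is represented as a matrix whose rows are the applicants' indicator vectors, ordered from first to last. The function names, the plain gradient-ascent optimizer (chosen for brevity), and the toy rankings are assumptions.

```python
import numpy as np

def rank_log_probability(X, beta):
    """Log of the ranking probability (A.13).

    X: J x K matrix of indicators, rows ordered from first- to last-ranked applicant.
    """
    v = X @ beta
    # Stage j: the j-th ranked applicant is "chosen" among those not yet ranked.
    return sum(v[j] - np.logaddexp.reduce(v[j:]) for j in range(len(v)))

def log_likelihood(rankings, beta):
    """Sum of log ranking probabilities over decision makers (panels)."""
    return sum(rank_log_probability(X, beta) for X in rankings)

def fit_rank_ordered_logit(rankings, n_iter=500, step=0.5):
    """Maximize the concave log-likelihood by plain gradient ascent (illustrative)."""
    beta = np.zeros(rankings[0].shape[1])
    for _ in range(n_iter):
        grad = np.zeros_like(beta)
        for X in rankings:
            v = X @ beta
            for j in range(len(v)):
                w = np.exp(v[j:] - np.logaddexp.reduce(v[j:]))   # stage-j logit weights
                grad += X[j] - w @ X[j:]
        beta = beta + step * grad / len(rankings)
    return beta

# Hypothetical rankings from two panels, one indicator per applicant.
rankings = [np.array([[1.0], [0.0], [0.5]]), np.array([[0.2], [0.8], [0.0]])]
beta_hat = fit_rank_ordered_logit(rankings)
```

As a sanity check, the probabilities (A.13) of all J! possible orderings of one panel's applicants sum to one.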
A model is defined by the vector $\beta$, the set of linear parameters that multiply the independent variables $x_{nj}$. This vector is estimated by maximizing the likelihood function $L(\beta)$, as summarized here. For a set of N decision makers, with each decision maker's alternatives indexed in rank order (j = 1 ranked first), the log-likelihood function is:
$$\log L(\beta) = \sum_{n=1}^{N} \sum_{j=1}^{J} \beta' x_{nj} - \sum_{n=1}^{N} \sum_{j=1}^{J} \log \sum_{k=j}^{J} e^{\beta' x_{nk}} \tag{A.14}$$
This function is concave in $\beta$, so the likelihood has a unique maximum. Maximizing the likelihood provides the estimates of the model parameters, $\beta$. The maximized value of the likelihood is the probability that the estimated model assigns to the observed rankings.
Appendix B: Results and Discussion
This appendix provides a detailed discussion of the factor analysis. The main body of the paper presents only the final results.
To carry out the factor analysis, we used 174 candidates and 12 indicators. Our data set is large enough, because it has been recommended (Hair, 2010) that a data set have at least five times as many observations as variables.
Outliers were analyzed using box plots and the Mahalanobis distance (Barnett & Lewis, 1994). From this analysis, we concluded that three candidates should be excluded. The descriptive statistics of our data set are given in Table B1.
By direct inspection of Table B2, we can see that 23% of the correlations are higher than 0.3. Only the correlation matrix is shown, because it is the starting point of the factor analysis.
TABLE B1. The descriptive statistics of our data set.

| Variable | Average | Median | Standard deviation | Variation coefficient | Maximum | Minimum | Skewness |
|---|---|---|---|---|---|---|---|
| NDF | 10.84 | 8.78 | 7.79 | 0.719 | 36.71 | 0.45 | 1.345 |
| hnf | 3.42 | 3.09 | 2.01 | 0.587 | 8.56 | 0.20 | 0.615 |
| NIR | 4.45 | 3.76 | 2.82 | 0.635 | 14.44 | 0.23 | 1.328 |
| HCD | 20 | 18 | 13 | 0.660 | 66 | 0 | 0.612 |
| SNIPm | 1.049 | 1.010 | 0.347 | 0.331 | 2.135 | 0.210 | 0.292 |
| SJRm | 0.165 | 0.120 | 0.152 | 0.923 | 0.891 | 0.039 | 2.663 |
| DIC | 44 | 40 | 26 | 0.594 | 96 | 0 | 0.212 |
| CD | 91 | 93 | 6 | 0.068 | 100 | 61 | -2.129 |
| NI | 1.2 | 1.2 | 0.1 | 0.095 | 1.6 | 0.8 | 0.560 |
| Q1 | 51.1 | 50.8 | 4.3 | 0.084 | 68.8 | 38.4 | 0.409 |
| NAm | 6.22 | 6.47 | 1.78 | 0.286 | 11.50 | 0.48 | -0.754 |
| PDC | 62 | 66 | 17 | 0.277 | 100 | 14 | -0.868 |
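The outlier screening with the Mahalanobis distance, described above, can be sketched as follows. This is a minimal NumPy illustration on simulated data, not the authors' procedure. The text does not state the cutoff used to declare an outlier, so the sketch only ranks candidates by distance. A chi-square quantile with p degrees of freedom is a common cutoff, but using it here would be an assumption. The function name and data are hypothetical.

```python
import numpy as np

def mahalanobis_squared(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

# Hypothetical data: 100 ordinary candidates plus one planted extreme candidate.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(100, 3)), [[8.0, -8.0, 8.0]]])
d2 = mahalanobis_squared(X)
candidates_to_inspect = np.argsort(d2)[::-1][:3]   # largest distances first
```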
TABLE B2. Correlation matrix for the set of indicators used.

| | NDF | NIR | hnf | HCD | NI | Q1 | PDC | SJRm | SNIPm | DIC | CD | NAm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NDF | 1.00 | 0.50 | 0.92 | 0.33 | 0.23 | 0.17 | 0.39 | 0.04 | 0.17 | 0.00 | -0.14 | -0.22 |
| NIR | 0.50 | 1.00 | 0.71 | 0.66 | 0.15 | 0.06 | 0.67 | 0.21 | 0.41 | 0.21 | 0.10 | 0.17 |
| hnf | 0.92 | 0.71 | 1.00 | 0.48 | 0.22 | 0.11 | 0.60 | 0.13 | 0.30 | 0.09 | -0.06 | -0.07 |
| HCD | 0.33 | 0.66 | 0.48 | 1.00 | 0.14 | 0.15 | 0.43 | 0.25 | 0.13 | 0.11 | -0.03 | 0.22 |
| NI | 0.23 | 0.15 | 0.22 | 0.14 | 1.00 | 0.66 | 0.11 | 0.03 | 0.10 | 0.17 | 0.00 | -0.16 |
| Q1 | 0.17 | 0.06 | 0.11 | 0.15 | 0.66 | 1.00 | -0.06 | -0.05 | -0.02 | 0.13 | -0.04 | -0.09 |
| PDC | 0.39 | 0.67 | 0.60 | 0.43 | 0.11 | -0.06 | 1.00 | 0.35 | 0.43 | 0.28 | 0.08 | 0.13 |
| SJRm | 0.04 | 0.21 | 0.13 | 0.25 | 0.03 | -0.05 | 0.35 | 1.00 | 0.26 | 0.20 | 0.22 | 0.05 |
| SNIPm | 0.17 | 0.41 | 0.30 | 0.13 | 0.10 | -0.02 | 0.43 | 0.26 | 1.00 | 0.15 | 0.20 | -0.11 |
| DIC | 0.00 | 0.21 | 0.09 | 0.11 | 0.17 | 0.13 | 0.28 | 0.20 | 0.15 | 1.00 | 0.12 | 0.11 |
| CD | -0.14 | 0.10 | -0.06 | -0.03 | 0.00 | -0.04 | 0.08 | 0.22 | 0.20 | 0.12 | 1.00 | 0.05 |
| NAm | -0.22 | 0.17 | -0.07 | 0.22 | -0.16 | -0.09 | 0.13 | 0.05 | -0.11 | 0.11 | 0.05 | 1.00 |
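The 23% figure quoted for Table B2 can be reproduced from the 66 off-diagonal correlations with a few lines of NumPy. The sketch counts both |r| ≥ 0.3 and a strict |r| > 0.3. The first count reproduces the reported share, which suggests (as our assumption) that 0.30 was treated as meeting the threshold.

```python
import numpy as np

# Upper triangle of Table B2 (diagonal omitted), row by row in the order
# NDF, NIR, hnf, HCD, NI, Q1, PDC, SJRm, SNIPm, DIC, CD, NAm.
upper = [
    [0.50, 0.92, 0.33, 0.23, 0.17, 0.39, 0.04, 0.17, 0.00, -0.14, -0.22],
    [0.71, 0.66, 0.15, 0.06, 0.67, 0.21, 0.41, 0.21, 0.10, 0.17],
    [0.48, 0.22, 0.11, 0.60, 0.13, 0.30, 0.09, -0.06, -0.07],
    [0.14, 0.15, 0.43, 0.25, 0.13, 0.11, -0.03, 0.22],
    [0.66, 0.11, 0.03, 0.10, 0.17, 0.00, -0.16],
    [-0.06, -0.05, -0.02, 0.13, -0.04, -0.09],
    [0.35, 0.43, 0.28, 0.08, 0.13],
    [0.26, 0.20, 0.22, 0.05],
    [0.15, 0.20, -0.11],
    [0.12, 0.11],
    [0.05],
]
r = np.concatenate([np.asarray(row) for row in upper])
n_at_least = int((np.abs(r) >= 0.3).sum())   # |r| >= 0.3
n_strict = int((np.abs(r) > 0.3).sum())      # |r| > 0.3
share = n_at_least / r.size
```

This gives 15 of 66 pairs (about 23%) with |r| ≥ 0.3, and 14 (about 21%) with a strict inequality.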

TABLE B3.
Values of variance inflation factor (VIF) and condition index (CI).

VIF of Variables
Strategy 1
Strategy 2
Strategy 3
PC
PI
SC
NDF
HCD
CD
PDC
hnf
NIR
SNIPm
SJRm
NI
QI
DIC
NAm
CI
1.17
1.05
1.24
1.49
2.4
2.43
1.14
1.15
2
2.09
2.69
2.88
1.31
1.46
1.45
1.28
1.28
2.19
2.18
2.05
2.04
1.14
1.25
1.23
1.12
1.34
1.25
1.89
1.75
3.83
1.96
The common variance (communality) is above 0.5 for all the variables. This means that the three factors identified account for at least half of the variance of each variable.
Once a structure has been found, it is important to evaluate its internal consistency, or reliability. This allows us to verify whether the individual variables all measure the same construct and are thus highly correlated. Cronbach's α is the most widely used measure, and a value in the range 0.7 to 0.8 is considered acceptable (Field, 2005). For Factor 1, Factor 2, and Factor 3, values of 0.7, 0.8, and 0.4 were obtained, respectively. Factor 3 has a Cronbach's α outside the suggested range. This was expected, however, given the diversity (heterogeneity) of the construct measured by this dimension.
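Cronbach's α for the indicators that load on a factor can be computed as sketched below. The indicators are on different scales, so the sketch standardizes them first. The text does not say whether the study did so, so this step is an assumption, as are the function names and the toy data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an n x k array (rows: observations, columns: items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the summed scale
    return k / (k - 1) * (1 - sum_item_var / total_var)

def standardize(X):
    """z-scores per column, so indicators on different scales get equal weight."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical toy data: three observations of two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```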
Once the output of the factor analysis had been obtained, multicollinearity was analyzed for the three strategies defined. The results are presented in Table B3.
For Model 1, we used as variables the three composite indicators obtained in the factor analysis and the variables excluded from it. The results show that multicollinearity was eliminated, so the rank ordered logistic regression can be applied.
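Variance inflation factors and a condition index like those reported in Table B3 can be obtained from the correlation matrix of the predictors. The sketch uses the identity $\mathrm{VIF}_j = 1/(1 - R_j^2) = [R^{-1}]_{jj}$. It defines the condition index as the square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix. The text does not say which convention was used for Table B3, so this definition, the function names, and the example matrix are assumptions.

```python
import numpy as np

def vif(R):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(R))

def condition_index(R):
    """Condition index of a correlation matrix: sqrt(largest / smallest eigenvalue)."""
    eigval = np.linalg.eigvalsh(R)
    return float(np.sqrt(eigval.max() / eigval.min()))

# Hypothetical correlation matrix of three predictors, all correlated at 0.5.
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
vifs = vif(R)
ci = condition_index(R)
```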
Factor analysis groups variables that are correlated with one another. It is therefore important to consider the part of each variable's variance that is shared with the other variables. This shared variance is called the communality. In our analysis, we required communalities of at least 0.5, as suggested in the literature (Hair, 2010). SNIPm and DIC did not meet this criterion and were excluded. NAm did not have a significant correlation with the other variables and was also excluded. The analysis was carried out with the remaining variables.
The variables have low partial correlations, with the exception of NDF and hnf, which have a partial correlation of 0.93. However, these two variables also have partial correlations with other variables that cannot be disregarded, so they were kept in the analysis. The Bartlett test is significant (p < .001), the overall MSA is 0.671, and all the variables have an individual MSA equal to or higher than 0.5.
Hair (2010) discusses the sample size needed to obtain significant factor loadings. For a sample size of 150 (or 200), factor loadings should be at least 0.45 (or 0.40) to be considered significant. Our results show that the variables that are most important in the definition of a given factor have significant factor loadings.
Appendix C: Tables
TABLE C1. Variables with significant impact retained for each model.

| Dependent variable | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| B1 | PC | hnf | NDF |
| B2 | PC | hnf | NDF, HCD |
| B3 | PC | hnf, HCD | NDF, HCD |
| B4 | PC | hnf, HCD | NDF, HCD, NAm |
| B5 | PC | hnf | NDF, HCD, NAm |
| B6 | PC | hnf, NAm | NDF, HCD, NAm |
| B7 | PC | hnf, NAm | NDF, HCD, NAm |
| B8 | PC | hnf, NAm, SJRm | NDF, SJRm, NAm |
| B9 | PC | hnf, NAm, SJRm | NDF, HCD, SJRm, NI, QI, NAm |
| B10 | PC | hnf, NAm, SJRm | NDF, HCD, SJRm, NAm |
| B11 | PC | hnf, NAm, SJRm | NDF, HCD, SJRm, NI, QI, NAm |

TABLE C2. Distribution of the number of applicants in the 27 openings.

| Number of applicants | Number of openings |
|---|---|
| 1 or more | 27 |
| 2 or more | 27 |
| 3 or more | 21 |
| 4 or more | 19 |
| 5 or more | 18 |
| 6 or more | 17 |
| 7 or more | 14 |
| 8 or more | 12 |
| 9 or more | 10 |
| 10 or more | 5 |
| 11 or more | 1 |
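The cumulative counts in Table C2 can be converted into the number of openings with exactly k applicants and the total number of applicants. If success is counted over pairs of applicants within each opening, they also give the number of pairwise comparisons available. The sketch below only performs this arithmetic on the table. Forming pairs within each opening, not across openings, is an assumption.

```python
from math import comb

# Table C2: number of openings with at least k applicants, for k = 1, ..., 11.
at_least = [27, 27, 21, 19, 18, 17, 14, 12, 10, 5, 1]

# Openings with exactly k applicants: N(>= k) - N(>= k + 1).
exactly = {
    k: at_least[k - 1] - (at_least[k] if k < len(at_least) else 0)
    for k in range(1, len(at_least) + 1)
}
n_openings = sum(exactly.values())
n_applicants = sum(k * m for k, m in exactly.items())
# Assumption: pairs are formed only between applicants to the same opening.
n_pairs = sum(comb(k, 2) * m for k, m in exactly.items())
```

This gives 171 applicants and 586 within-opening pairs. The applicant total is numerically consistent with the 174 candidates minus the three outliers mentioned in Appendix B.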

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, March 2014
