Elizabeth S. Vieira
REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto,
Rua do Campo Alegre, 687, 4169-007 Porto, Portugal; Departamento Engenharia Industrial e Gestão,
Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal.
E-mail: elizabeth.vieira@fc.up.pt
José A.S. Cabral
INESC-TEC, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto,
Portugal. E-mail: jacabral@fe.up.pt
José A.N.F. Gomes
REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do
Campo Alegre, 687, 4169-007 Porto, Portugal. E-mail: jfgomes@fc.up.pt
Introduction
Bibliometric indicators have been widely used for assessing the scientific performance of a given research body. The
design of indicators has attracted a lot of attention in the last
few years as national authorities, funding bodies, and institutional leaders show a growing interest in indicators that
can automatically rate the performance of their institutions.
The rankings published by the Centre for Science and
Technology Studies (CWTS), SCImago, the Performance
Ranking of Scientific Papers for World Universities
(Taiwan), and The Academic Ranking of World Universities
Received: January 3, 2013; revised March 28, 2013; accepted March 28, 2013
© 2014 ASIS&T. Published online 7 January 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.22981
JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 65(3):560–577, 2014
the performance-based reallocation system applies a bibliometric indicator that considers publications at two levels
(OECD, 2010).
Peer review is still the gold standard for research evaluation but the pressure for more frequent and extensive
assessment of the performance of researchers, research
groups, and research institutions makes bibliometry attractive and, in fact, the only method available in practice in
many situations. It is therefore very important to benchmark
bibliometric indicators against traditional peer assessment in
real situations. We know that bibliometric indicators are
more accepted in the assessment of large research bodies,
but they are still used frequently for individuals and it is
important to clarify what should be expected of them in this
practice.
Some studies have been carried out in recent years with
the main goal of finding a relation between the two methods.
These studies consider peer-review judgments at
several levels: (a) the national level; (b) research programs,
research groups, and departments; and (c) the individual level.
The next section summarizes some of the most relevant
published results on this topic.
Benchmarking of Indicators Against Peer Review
National or institutional assessment exercises provide
invaluable collections of data that may be used for some sort
of quality control. Several studies have been made to explore
how the results of the RAE in Britain and the VTR in Italy
compare with simple bibliometric indicators.
Reale, Barbara, and Costantini (2007) explored the relationship between the results of peer review evaluations in
three disciplines (biology, chemistry, and economics) at the
VTR and the impact factors of the journals where the
documents submitted for evaluation were published. They
found that the impact factor correlates positively and significantly with the decisions of peers, but these correlations are
weak. The Spearman rank correlation coefficients vary
between 0.44 and 0.48.
Abramo, D'Angelo, and Caprasecca (2009) studied how
the results of the VTR evaluations made by peers for the
hard sciences are related to the results obtained using the
impact factor of the journals and the number of documents
published. They found correlations between 0.336 and 0.876
for the scientific fields analyzed. Franceschet and Costantini
(2011) undertook a similar study, but considered more
bibliometric indicators. They used the mean number of citations per document, the impact factor of the journal, and the
h-index of a set of publications. A positive and significant
correlation was found between the mean number of citations
per document and the decisions made by peers for most of
the scientific fields analyzed (Spearman's coefficients varied
from 0.32 to 0.81). The same was found for the impact factor
(Spearman's coefficients varied from 0.29 to 0.85). The
h-index was shown to be a reasonable indicator to discriminate among the documents classified in a given level (Excellent, Good, Acceptable, Limited) by peers.
bibliometric indicators and compared the data with the decisions of the peers in the selection process. They observed
that those awarded the fellowship published more documents than those rejected and that their documents obtained
more citations. These observations support those obtained
by Nederhof and van Raan (1987).
Bornmann, Wallon, and Ledin (2008) considered the
applicants to the Long-Term Fellowship and Young
Investigator programs of the European Molecular Biology Organization and studied the relation between the scientific performance of the applicants and the decisions made by peers.
The authors found that, on average, the approved applicants
have better scientific performance than rejected applicants
when quantitative and impact indicators are used to describe
performance.
Some general findings were:
• Positive and statistically significant correlations were
observed between peer review and selected bibliometric
indicators.
• The use of indicators as supporting instruments in peer review
appears to be justified by these studies.
Selection of Indicators
Peers are asked to evaluate the scientific performance of
the applicants, but normally very little is said about the
criteria to be used. They are then expected to use personal
criteria based on their own interpretation of international
standards. In assessing the quantity and quality of the scientific production, each panel member may be expected to
use personal criteria based on his or her personal experience
and expectations. In approaching this problem, we tried to
select those indicators that describe different dimensions of
the scientific performance of a given researcher. Several
indicators have been developed over the years and we were
forced to limit our choice for this study. All indicators
present advantages and limitations, making the selection
process even more complex. In fact, some studies of this
kind have been undertaken, but just a few indicators are
normally used. The alternative was to choose a set of bibliometric indicators and to show the potential of those bibliometric indicators as a complementary instrument in peer
review. We are looking for indicators that are implicit in peer
judgments, that may be used to describe scientific performance, and that allow a fair comparison of researchers. Indicators such as the total number of documents and citations
were not used as they were not adequate to compare
researchers in different areas.
Quantity of scientific production. Indicators that quantify
the scientific production of a given researcher are based on
the number of documents indexed in WoS (or in other databases), possibly weighted by the type of document
or the number of authors. There is a growing literature on
special techniques for counting documents beyond normal
counting, where each author of the publication gets one publication. In first author counting, only the first author gets a
full publication (Cole & Cole, 1973); in proportional counting, a fraction of the publication is attributed to each author
taking into account his or her position in the author list
(Hagen, 2008; Hodge & Greenberg, 1981; Van Hooydonk,
1997); in fractional counting, each author gets 1/a publications, a being the number of authors in the publication
(Burrell & Rousseau, 1995). We adopted fractional counting
to compensate for the frequently perceived trend of adding
to the author lists the names of individuals who contributed very
little to the work being reported. This should not be seen as
penalizing scientific collaboration, as each collaborating
team and each collaborating researcher still gets
a fair share of the publication credit and the premium
from counting the publications that only the joint work
allowed.
Number of Documents Fractioned (NDF)
$$\mathrm{NDF} = \sum_{j=1}^{P} \frac{1}{a_{j}} \tag{1}$$
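As a minimal sketch of fractional counting, the snippet below sums 1/a over a researcher's documents, a being the number of authors of each document. The function name `ndf` and the example author counts are hypothetical and serve only as illustration.

```python
from fractions import Fraction


def ndf(author_counts):
    """Fractional document count: a document with a authors contributes 1/a."""
    if any(a < 1 for a in author_counts):
        raise ValueError("each document needs at least one author")
    return sum(Fraction(1, a) for a in author_counts)


# Hypothetical researcher with four documents having 1, 2, 4 and 4 authors.
example = ndf([1, 2, 4, 4])
print(float(example))  # 1 + 1/2 + 1/4 + 1/4
```

Exact fractions are used so that the credits of all co-authors of a document add up to exactly one publication.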
$$C^{n}_{j}(xyi) = C_{j}(xyi)\,\frac{I_{xy}}{\frac{1}{N}\sum_{k=1}^{N} I_{xyk}} \tag{2}$$
where $I_{xyk}$ is the average number of citations of the documents of type x, published in year y in all journals of subject
category k, and $I_{xy}$ is the average number of citations of all
documents of type x published in year y.
The quantity $I_{xy}$ may be calculated as:
$$I_{xy} = \frac{\sum_{k} M_{xyk}\,I_{xyk}}{M_{xy}} \tag{3}$$
where $M_{xyk}$ is the number of documents of type x and subject category k that were published in year y and $M_{xy}$ is the
number of documents of type x published in year y in all
journals of subject categories k.
Indicator NIR is then calculated as:
$$\mathrm{NIR} = \frac{1}{P}\sum_{j=1}^{P} C^{n}_{j}(xyi) \tag{4}$$
The hnf index
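The normalization behind the NIR can be sketched as follows. This is an illustration only: it assumes that Equation (2) rescales the citations of each document by the overall average divided by the mean of the category averages over the N subject categories of its journal, and all category names and numbers are hypothetical.

```python
def overall_average(m_xyk, i_xyk, m_xy):
    """Eq. (3): document-weighted mean of the category averages."""
    return sum(m_xyk[k] * i_xyk[k] for k in m_xyk) / m_xy


def normalized_citations(c_j, categories, i_xyk, i_xy):
    """Eq. (2), as read here: citations times I_xy over the mean category average."""
    mean_category_average = sum(i_xyk[k] for k in categories) / len(categories)
    return c_j * i_xy / mean_category_average


def nir(documents, i_xyk, i_xy):
    """Eq. (4): mean normalized citation count over the P documents."""
    values = [normalized_citations(c, cats, i_xyk, i_xy) for c, cats in documents]
    return sum(values) / len(values)


# Hypothetical data: two subject categories with different citation cultures.
m_xyk = {"chemistry": 100, "mathematics": 300}
i_xyk = {"chemistry": 8.0, "mathematics": 4.0}
i_xy = overall_average(m_xyk, i_xyk, m_xy=400)

# Each document: (citations, subject categories of its journal).
documents = [(8, ["chemistry"]), (4, ["mathematics"]), (12, ["chemistry", "mathematics"])]
print(i_xy, nir(documents, i_xyk, i_xy))
```

In this example a chemistry paper with 8 citations and a mathematics paper with 4 citations receive the same normalized value, which is the kind of field compensation the indicator is meant to provide.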
Indicators such as the NIR might be influenced by the presence of documents with a large number of citations or by a
considerable number of documents with zero citations. The
hnf index is immune to these documents and can constitute a
good alternative to the NIR. In fact, there are several variants of the h-index, but only the hI (Batista, Campiteli,
Kinouchi, & Martinez, 2006) and the IQp (Antonakis &
Lalive, 2008) take into account field dependence. In the hI,
the h-index of a given author is divided by the average
number of authors per document determined for the documents in the h-core. However, this indicator is strongly
affected by the presence of outliers. Normalization of the
IQp considers only the three subject categories where the
authors get more citations, and the normalization is not
made document by document. To be fair to individual
researchers, the performance index should compensate
for the citation cultures of different scientific fields, and
this is a recognized failure of the original h-index. The hnf
index compensates for this by the normalization of the citation count. It is well known that the average number
of authors in papers is growing rapidly (Ioannidis, 2008;
Papatheodorou, Trikalinos, & Ioannidis, 2008; Weeks,
Wallace, & Kimberly, 2004), probably because of the
pressure to publish. The other factor for this growth is probably the increased national and international collaboration,
which generates synergies that raise both the number and
impact of the publications. Fractioning methods have been
used to withdraw the incentives for artificial or pseudo-collaborations without introducing disincentives to true
collaboration. The h-variant adopted here, hnf, allows fractional counting, after the normalization to the EU_15
average. This normalization is performed for each document, thus comparing the performance of each document
with the average performance of the documents that belong
to the same subject category, document type, and year of
publication. Each document is treated separately as it is in
fact an independent piece of work. The normalization in the
hnf is the same as in the NIR. A detailed description of this
indicator can be found in Vieira and Gomes (2011).
For the hnf and the NIR we did not remove self-citations, as this would require a complex analysis of the whole
set of data.
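Percentage of Documents Cited (PDC)
$$\mathrm{PDC} = \frac{P_{c}}{P} \times 100 \tag{5}$$
$$\mathrm{CD} = \frac{CD_{ws}}{TCD} \times 100 \tag{6}$$
$$\mathrm{HCD} = \frac{P_{10}}{P_{1}} \times 100 \tag{7}$$
$$\mathrm{SNIPm} = \mathrm{Md}(\mathrm{SNIP}) \tag{8}$$
$$\mathrm{SJRm} = \mathrm{Md}(\mathrm{SJR}) \tag{9}$$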
$$\frac{MNA_{xy}}{\frac{1}{N}\sum_{k=1}^{N} MNA_{xyk}} \tag{10}$$
where $MNA_{xyk}$ is the average number of authors of the documents of type x, published in year y in all journals of subject
category k, and $MNA_{xy}$ is the average number of authors of all
documents of type x published in year y. If $M_{xy}$ is the
number of documents of type x published in year y in all
journals of subject categories k, then
$$MNA_{xy} = \frac{\sum_{k} M_{xyk}\,MNA_{xyk}}{M_{xy}} \tag{11}$$
$$\mathrm{NAm} = \mathrm{Md}\left(NA^{n}_{j}(xyi)\right) \tag{12}$$
$$\mathrm{DIC} = \frac{P_{IC}}{P} \times 100 \tag{13}$$
$$U_{nj} = V_{nj} + \varepsilon_{nj} \tag{14}$$
where $V_{nj} = \beta x_{nj}$ is a linear function of the observed explanatory indicators ($x_{nj}$) and $\beta$ is the vector of coefficients. The $\varepsilon_{nj}$
represent the factors that influence the utility but are
unknown to the analyst. The probability that an applicant, i,
is selected in first place by the peer panel, n, within each
contest with j applicants is given as:
$$P_{ni} = \frac{e^{\beta x_{ni}}}{\sum_{j} e^{\beta x_{nj}}} \tag{15}$$
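The choice probability in Equation (15) can be illustrated with a short sketch. The indicator values and coefficients below are hypothetical, and the subtraction of the largest utility is a standard numerical safeguard added here, not something the paper describes.

```python
import math


def choice_probabilities(x, beta):
    """Probability that each applicant is ranked first in one contest.

    x    -- one row of indicator values per applicant
    beta -- one coefficient per indicator
    """
    utilities = [sum(b * v for b, v in zip(beta, row)) for row in x]
    largest = max(utilities)  # shift to avoid overflow in exp()
    weights = [math.exp(u - largest) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical contest with three applicants and two indicators.
applicants = [[3.0, 20.0], [2.0, 10.0], [3.0, 20.0]]
probabilities = choice_probabilities(applicants, beta=[0.40, 0.032])
print(probabilities)
```

Applicants with identical indicator values receive identical probabilities, and the probabilities within a contest always sum to one.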
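$$\frac{\mathrm{d}V_{nj}}{\mathrm{d}x_{nj}}(\cdot) = \frac{\partial V_{nj}}{\partial x_{nj}} \tag{16}$$
$$\frac{\mathrm{e}V_{nj}}{\mathrm{e}x_{nj}}(\cdot) = \frac{\partial V_{nj}}{\partial x_{nj}}\,\frac{x_{nj}}{V_{nj}} \tag{17}$$
$$\frac{\mathrm{e}V_{nj}}{\mathrm{d}x_{nj}}(\cdot) = \frac{\partial V_{nj}}{\partial x_{nj}}\,\frac{1}{V_{nj}} \tag{18}$$
$$\frac{\mathrm{d}V_{nj}}{\mathrm{e}x_{nj}}(\cdot) = \frac{\partial V_{nj}}{\partial x_{nj}}\,x_{nj} \tag{19}$$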
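For a linear utility, the four quantities in Equations (16) to (19) reduce to simple expressions in the coefficient, the indicator value, and the utility. The sketch below evaluates them for Model 2, using the coefficients of Table 3 and the sample averages of Table B1 (hnf = 3.42, HCD = 20); that these are exactly the evaluation points used for the published tables is an assumption, and the function name is hypothetical.

```python
def marginal_effects(beta, x, k):
    """Marginal effect and elasticities of V = sum(beta * x) for indicator k."""
    v = sum(b * xi for b, xi in zip(beta, x))
    if v == 0:
        raise ValueError("elasticities are undefined when V is zero")
    slope = beta[k]  # dV/dx_k for a linear utility
    return {
        "dy/dx": slope,             # Eq. (16): change in V per unit change in x
        "ey/ex": slope * x[k] / v,  # Eq. (17): % change in V per % change in x
        "ey/dx": slope / v,         # Eq. (18): % change in V per unit change in x
        "dy/ex": slope * x[k],      # Eq. (19): change in V per % change in x
    }


beta_model2 = [0.40, 0.032]   # hnf and HCD coefficients (Table 3)
averages = [3.42, 20.0]       # hnf and HCD averages (Table B1)
hnf_effects = marginal_effects(beta_model2, averages, k=0)
hcd_effects = marginal_effects(beta_model2, averages, k=1)
print(hnf_effects)
print(hcd_effects)
```

Because the utility is linear without an intercept, the ey/ex values of all indicators in a model sum to one.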
[Figure 1: a ranking of applicants and the pairs counted: 1-2, 1-3, 1-4, 1-5, 1-6, 2-3, 2-4, 2-5, 2-6, 3-4, 3-5, 3-6, 4-5, 4-6.]
FIG. 1. Illustration of the counting of pairs of applicants for a ranking obtained in a given position opening. [Color figure can be viewed in the online issue,
which is available at wileyonlinelibrary.com.]
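The pair counting illustrated in Figure 1 can be sketched as below. The figure lists 14 pairs for six ranked positions and omits the pair 5-6, which is consistent with counting only pairs in which at least one applicant sits within the first four positions; this cut-off rule is an assumption inferred from the listed pairs, and the function name is hypothetical.

```python
from itertools import combinations


def counted_pairs(n_applicants, ranked_positions):
    """Pairs (a, b) with a < b where a lies within the fully ranked positions."""
    positions = range(1, n_applicants + 1)
    return [(a, b) for a, b in combinations(positions, 2) if a <= ranked_positions]


pairs = counted_pairs(n_applicants=6, ranked_positions=4)
print(len(pairs), pairs)
```

With all six positions treated as ranked, the same function returns the full set of 15 pairs.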
For each of these cases one model was dened and fully
evaluated.
TABLE 1.
Variable   VIF      Variable   VIF
NDF        8.09     SNIPm      1.51
HCD        2.47     SJRm       1.28
CD         1.17     NI         2.19
PDC        2.10     Q1         2.08
hnf        10.62    DIC        1.26
NIR        3.06     NAm        1.39
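A variance inflation factor (VIF) can be obtained as the corresponding diagonal element of the inverse of the correlation matrix of the explanatory variables. The sketch below uses a small pure-Python matrix inversion and a two-variable example with a correlation of 0.92 (the NDF and hnf correlation in Table B2); it does not reproduce the full calculation behind Table 1, and the helper names are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        divisor = aug[col][col]
        aug[col] = [v / divisor for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(correlation_matrix):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix)
    return [inverse[i][i] for i in range(len(correlation_matrix))]


# Two variables correlated at 0.92: VIF = 1 / (1 - 0.92**2) for both.
print(vif([[1.0, 0.92], [0.92, 1.0]]))
```

Uncorrelated variables give a VIF of 1, and the value grows without bound as a variable becomes a linear combination of the others.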
Factor loadings
Variable   Factor 1   Factor 2   Factor 3   Communality after extraction
hnf        0.929                            0.901
NIR        0.855                            0.772
NDF        0.796                            0.773
PDC        0.758                            0.672
HCD        0.675                            0.511
Q1                    0.917                 0.844
NI                    0.893                 0.818
CD                               0.739      0.553
SJRm                             0.718      0.579
Two cross-loadings, -0.333 and -0.300, are also reported; their row and factor positions are not recoverable here.
The SNIPm, DIC, and NAm were excluded when factor analysis was applied as they do not meet some of the
criteria. However, these indicators were used together with the output
from factor analysis as input data in the ROLR.
With the strategies adopted, multicollinearity was
eliminated. The results are presented in Table B3 in Appendix B.
For ROLR to be applicable, the relative probabilities of
two alternatives must be independent of all other alternatives; that is, the so-called independence of irrelevant alternatives
(IIA) must be verified. From the discussion in the method
section it follows that:
$$\frac{P_{ni}}{P_{nj}} = \frac{e^{V_{ni}}}{e^{V_{nj}}} \tag{20}$$
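The IIA property can be checked numerically on a toy example: under the logit formula, the ratio of the probabilities of two alternatives does not change when a third alternative is removed. The utilities below are hypothetical.

```python
import math


def logit_probabilities(utilities):
    """Logit choice probabilities: exp(V_i) divided by the sum of exp(V_j)."""
    weights = [math.exp(v) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def odds(utilities, i, j):
    """Ratio of the choice probabilities of alternatives i and j."""
    p = logit_probabilities(utilities)
    return p[i] / p[j]


v = [1.2, 0.4, -0.3]             # hypothetical utilities of three applicants
ratio_all = odds(v, 0, 1)        # applicants 0 and 1, third applicant present
ratio_pair = odds(v[:2], 0, 1)   # same pair, third applicant removed
print(ratio_all, ratio_pair, math.exp(v[0] - v[1]))
```

All three printed values coincide, since the ratio depends only on the difference between the two utilities.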
ratio index (Train, 2009). This index measures how well the
model fits the data and is calculated as:
$$\rho = 1 - \frac{\log l(\hat{\beta})}{\log l(0)} \tag{21}$$
The $\log l(\hat{\beta})$ is the log likelihood function with the parameters estimated and the $\log l(0)$ is the log likelihood function
of the null model (without parameters). The value varies
between 0 and 1, and values close to 1 mean that the estimated parameters can be used to represent the choices made by
the peers. The results are presented in Figure 2.
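A minimal sketch of the likelihood ratio index follows. The log-likelihood values in the example are hypothetical and are not taken from the paper.

```python
def likelihood_ratio_index(loglik_model, loglik_null):
    """Likelihood ratio index: 1 - log l(beta_hat) / log l(0)."""
    if loglik_null >= 0:
        raise ValueError("the null log likelihood must be negative")
    return 1.0 - loglik_model / loglik_null


# Hypothetical values: the fitted model halves the null log likelihood.
print(likelihood_ratio_index(loglik_model=-50.0, loglik_null=-100.0))
```

A model that does no better than the null model gives an index of 0, while a model that predicts every observed choice with probability close to one gives an index close to 1.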
When a person is asked to rank a set of alternatives, the
parameters of the model and the choices can be estimated
more efficiently than when asked to choose only the most
preferred alternative (Fok, Paap, & Van Dijk, 2012). This
may explain why the index ρ increases from B1 to B2. The
index is higher for B2 than for any other dependent variable.
It decreases slowly up to B4 and then it is reasonably stable
with much lower values for B5 to B11, even when the
number of variables with significant impact is higher for
some of the cases. As we go from B2 to B4 the number of
openings decreases markedly and this may be the reason for
the small decrease observed in the index. The overall picture
of the situation confirms the expectation raised above that
panel members put more effort into the decision about the first
few positions in the ranking. The marked decrease of the
likelihood index as we go beyond B4 suggests that we
should fully consider the rankings up to the fourth position
and give all other applicants an ex-aequo position.
After choosing the most appropriate dependent variable,
the Hausman test was performed, showing that the IIA
hypothesis is verified (p > .05). Given that the
IIA holds, the parameters of each model were estimated and
are listed in Table 3.
TABLE 3.
Model   Variable   Coefficient   Standard deviation   Z(c)    P > |Z|(c)   LR chi2(a)   Prob > chi2(b)
1       PC         1.06          0.21                 5.06    <0.001*      33.38        <0.001
2       hnf        0.40          0.09                 4.61    <0.001*      40.61        <0.001
        HCD        0.032         0.01                 2.45    0.014*
3       NDF        0.11          0.02                 4.58    <0.001*      40.99        <0.001
        HCD        0.044         0.01                 3.47    0.001*
        NAm        0.19          0.09                 2.07    0.038*
Note. (a) LR chi2 is the likelihood ratio chi-square test that at least one independent variable's coefficient is not equal to zero in the model.
(b) Prob > chi2 is the probability of obtaining the LR chi2 if there is no effect of the independent variables. This is compared to a specific alpha level, the accepted
type I error, which is set at 0.05 or 0.1.
(c) Z and P > |Z| are the test statistic and p-value, respectively, related to the null hypothesis that an individual independent variable's coefficient is zero.
*Significant at the 0.05 level.
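Model 1.
$$P_{ni} = \frac{e^{1.06\,PC_{ni}}}{\sum_{j} e^{1.06\,PC_{nj}}} \tag{22}$$
Model 2.
$$P_{ni} = \frac{e^{0.40\,hnf_{ni} + 0.032\,HCD_{ni}}}{\sum_{j} e^{0.40\,hnf_{nj} + 0.032\,HCD_{nj}}} \tag{23}$$
Model 3.
$$P_{ni} = \frac{e^{0.11\,NDF_{ni} + 0.044\,HCD_{ni} + 0.19\,NAm_{ni}}}{\sum_{j} e^{0.11\,NDF_{nj} + 0.044\,HCD_{nj} + 0.19\,NAm_{nj}}} \tag{24}$$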
(25)
(26)
(27)
Model 2.
Model 3.
TABLE 4.
Model
Model 2   253.06   65.8
Model 3   254.68   29.4
Model 1   258.30   4.8
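The column headings of Table 4 are not available here. As a hedged illustration, if the first numeric column is read as an information criterion such as the AIC (an assumption), the second column is close to the corresponding Akaike weights expressed as percentages. The sketch below computes such weights; the function name is hypothetical.

```python
import math


def akaike_weights(criteria):
    """Relative support for each model given its information criterion value."""
    best = min(criteria)
    relative = [math.exp(-0.5 * (c - best)) for c in criteria]
    total = sum(relative)
    return [r / total for r in relative]


# Values listed for Model 2, Model 3 and Model 1 in Table 4.
weights = akaike_weights([253.06, 254.68, 258.30])
print([round(100 * w, 1) for w in weights])
```

The computed percentages agree with the tabulated 65.8, 29.4, and 4.8 to within about 0.1, which is the size of difference that rounding of the criterion values would produce.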
TABLE 5.
Model     10      25      50      75      90      99      Average
Model 1   -1.67   -0.77   -0.03   0.74    1.44    2.62    0.16
Model 2   0.47    1.15    1.82    2.72    3.84    5.43    2.02
Model 3   1.07    2.05    2.97    4.12    5.55    8.71    3.23
TABLE 6. [...] indicator.
Model     Variable   dy/dx   ey/ex   ey/dx    dy/ex
Model 1   PC         1.06    1.00    6.44     0.16
Model 2   hnf        0.40    0.68    0.20     1.38
          HCD        0.032   0.32    0.016    0.64
Model 3   NDF        0.11    0.36    0.033    1.18
          HCD        0.044   0.27    0.014    0.88
          NAm        0.19    0.36    0.058    1.17
TABLE 7. [...] indicator.
Model     Variable   dy/dx   ey/ex   ey/dx    dy/ex
Model 1   PC         1.06    1.00    -33.11   -0.03
Model 2   hnf        0.40    0.68    0.22     1.24
          HCD        0.032   0.32    0.02     0.58
Model 3   NDF        0.11    0.32    0.036    0.95
          HCD        0.044   0.27    0.015    0.80
          NAm        0.19    0.41    0.063    1.22
Appendix A: Methodology
A description of the statistical methodologies is given in
this appendix. For factor analysis, only a brief description of
the required assumptions and of the steps needed to undertake the analysis is provided. The algebra behind factor
analysis is very complex and we chose not to present it in
this paper. For the rank ordered logistic regression, a more
detailed description is given, as this methodology has not
been applied in this type of study. It is important to know
how this methodology works and the results that we obtain
from its application.
Factor Analysis. Factor analysis is an interdependency
technique that is used to verify whether an underlying structure
among variables exists. This is accomplished by looking for
correlations among variables and grouping the variables into
sets. Each set, also labeled a factor, represents a dimension
within the data. The final application of the output of factor
analysis depends on the goal of each study. If the goal of
the study is the reduction of data (reduction of variables), the
dimensions obtained can be used in the development of
composite indicators that can be applied to further analysis.
On the other hand, if we have a conceptual basis for understanding the relationship among variables, the dimensions
can represent concepts that cannot be described precisely
using only one variable.
Here we used factor analysis to eliminate multicollinearity. The variables were grouped into composite scores
(factor scores) and these were used as input data in the rank
ordered logistic regression.
Several assumptions are required in factor analysis:
• This methodology uses the correlation matrix as input data. If
several correlations in the matrix are higher than 0.3, then factor
analysis can be used.
• Partial correlations should be analyzed. In the data a significant
number of small partial correlations should be observed.
• The Bartlett test of sphericity should be significant. This is used
to test the null hypothesis that the correlation matrix is an identity
matrix.
• The measure of sampling adequacy (MSA) should be evaluated.
This index varies between 0 and 1. The higher this value, the
better the prediction of each variable by the other variables.
Values close to 1 mean that factor analysis is appropriate. The
MSA can also be applied to individual variables with the effect
that variables with MSA lower than 0.5 should be excluded from
the analysis.
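Two of the checks listed above can be sketched in a few lines: the share of off-diagonal correlations above 0.3 and the Bartlett sphericity statistic, computed with the usual formula -(n - 1 - (2p + 5)/6) ln|R| on p(p - 1)/2 degrees of freedom. The correlation matrix and the sample size below are hypothetical, and the statistic would still need to be compared with a chi-square table or a statistics package to obtain a p-value.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det


def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom of the Bartlett test."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0:
        raise ValueError("correlation matrix must be positive definite")
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


def share_above(corr, threshold=0.3):
    """Fraction of off-diagonal correlations larger than the threshold in size."""
    p = len(corr)
    values = [abs(corr[i][j]) for i in range(p) for j in range(i + 1, p)]
    return sum(v > threshold for v in values) / len(values)


# Hypothetical 3 x 3 correlation matrix and sample size.
corr = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
print(share_above(corr), bartlett_sphericity(corr, n_obs=100))
```

For an identity correlation matrix the determinant is 1 and the statistic is 0, which is the situation the null hypothesis of the test describes.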
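$$V_{nj} = V(x_{nj}, s_{n}) \quad \forall j \tag{A.1}$$
$$U_{nj} = V_{nj} + \varepsilon_{nj} \tag{A.2}$$
$$f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\,e^{-e^{-\varepsilon_{nj}}} \tag{A.3}$$
$$F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}} \tag{A.4}$$
The probability that a decision maker n chooses the alternative i from a set of alternatives is:
$$P_{ni} = P(U_{ni} > U_{nj},\ \forall j \neq i) \tag{A.5}$$
(A.6)
Using the probability density function $f(\varepsilon_{nj})$, this probability can be rewritten as:
(A.7)
$$P_{ni} \mid \varepsilon_{ni} = \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \tag{A.8}$$
$$P_{ni} = \int \left( \prod_{j \neq i} e^{-e^{-(\varepsilon_{ni} + V_{ni} - V_{nj})}} \right) e^{-\varepsilon_{ni}}\,e^{-e^{-\varepsilon_{ni}}}\,d\varepsilon_{ni} \tag{A.9}$$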
The choice probability is determined using the expression below, after calculation of the integral (Train, 2009):
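$$P_{ni} = \frac{e^{V_{ni}}}{\sum_{j} e^{V_{nj}}} \tag{A.10}$$
$$V_{nj} = \beta x_{nj} \tag{A.11}$$
$$P_{ni} = \frac{e^{\beta x_{ni}}}{\sum_{j} e^{\beta x_{nj}}} \tag{A.12}$$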
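The step from extreme value errors to the closed-form logit probability can be checked by simulation: adding independent draws with distribution function exp(-exp(-e)) to fixed utilities and recording which alternative has the highest total utility should reproduce the logit shares. The utilities, the number of draws, and the seed below are hypothetical choices for illustration.

```python
import math
import random


def gumbel_draw(rng):
    """Draw from the extreme value distribution F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_choice_shares(v, n_draws=20000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    wins = [0] * len(v)
    for _ in range(n_draws):
        utilities = [vj + gumbel_draw(rng) for vj in v]
        wins[utilities.index(max(utilities))] += 1
    return [w / n_draws for w in wins]


def logit_probabilities(v):
    """Closed-form choice probabilities exp(V_i) / sum_j exp(V_j)."""
    weights = [math.exp(vj) for vj in v]
    total = sum(weights)
    return [w / total for w in weights]


v = [1.0, 0.0, -1.0]  # hypothetical observed utilities
print(simulated_choice_shares(v))
print(logit_probabilities(v))
```

The simulated shares differ from the closed-form values only by sampling noise, which shrinks as the number of draws grows.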
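to rank 1, 2, . . ., J alternatives and the utility of each alternative is distributed as iid extreme value, the probability of
ranking 1 > 2 > 3 > . . . is given as:
$$\frac{e^{\beta x_{n1}}}{\sum_{j=1,2,3,\ldots} e^{\beta x_{nj}}} \cdot \frac{e^{\beta x_{n2}}}{\sum_{j=2,3,\ldots} e^{\beta x_{nj}}} \cdot \frac{e^{\beta x_{n3}}}{\sum_{j=3,\ldots} e^{\beta x_{nj}}} \cdots \tag{A.13}$$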
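The ranking probability can be read as a sequence of logit choices: the first-ranked alternative is chosen from the full set, the second from the remaining ones, and so on. A minimal sketch with hypothetical utilities follows.

```python
import math
from itertools import permutations


def ranking_probability(ranked_utilities):
    """Probability of observing a ranking, utilities listed from best to worst."""
    probability = 1.0
    for position in range(len(ranked_utilities)):
        remaining = ranked_utilities[position:]
        denominator = sum(math.exp(u) for u in remaining)
        probability *= math.exp(ranked_utilities[position]) / denominator
    return probability


utilities = [0.5, 0.1, -0.3]  # hypothetical values of beta * x for three applicants
total = sum(ranking_probability(list(order)) for order in permutations(utilities))
print(ranking_probability(utilities), total)
```

Summing the probability over all possible orderings of the same alternatives gives one, as expected of a probability distribution over rankings.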
A model is defined by vector β, the set of linear parameters that multiply the independent variables $x_{nj}$. This vector
will be calculated by the maximization of the likelihood
function, l(β), as summarized here. For a set of N decision
makers the log likelihood function is given by:
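$$\log l(\beta) = \sum_{n=1}^{N}\sum_{j=1}^{J} \beta x_{nj} - \sum_{n=1}^{N}\sum_{j=1}^{J} \log \sum_{m=j}^{J} e^{\beta x_{nm}}$$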
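The log likelihood of a set of observed rankings is the sum, over decision makers, of the logarithm of the ranking probability. The sketch below assumes a single explanatory variable and replaces a proper optimizer with a coarse grid search; both simplifications, the function names, and the example data are assumptions for illustration only.

```python
import math


def log_likelihood(beta, contests):
    """Rank-ordered logit log likelihood for one explanatory variable.

    contests -- list of rankings; each ranking lists the x values of the
                applicants from first place to last place.
    """
    total = 0.0
    for ranking in contests:
        for position in range(len(ranking)):
            remaining = ranking[position:]
            log_denominator = math.log(sum(math.exp(beta * x) for x in remaining))
            total += beta * ranking[position] - log_denominator
    return total


def grid_search_beta(contests, low=-5.0, high=5.0, steps=1001):
    """Value of beta on an evenly spaced grid with the highest log likelihood."""
    grid = [low + (high - low) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda b: log_likelihood(b, contests))


# Hypothetical rankings: higher x tends to be ranked first, with one exception.
contests = [[3.0, 2.0, 1.0], [1.0, 2.0, 0.5]]
beta_hat = grid_search_beta(contests)
print(beta_hat, log_likelihood(beta_hat, contests))
```

With beta equal to zero every ordering of J applicants is equally likely, so a single ranking of three applicants contributes log(1/6) to the log likelihood.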
TABLE B1.
Variable   Average   Median   Standard deviation   Variation coefficient   Maximum   Minimum   Skewness
NDF        10.84     8.78     7.79                 0.719                   36.71     0.45      1.345
hnf        3.42      3.09     2.01                 0.587                   8.56      0.20      0.615
NIR        4.45      3.76     2.82                 0.635                   14.44     0.23      1.328
HCD        20        18       13                   0.660                   66        0         0.612
SNIPm      1.049     1.010    0.347                0.331                   2.135     0.210     0.292
SJRm       0.165     0.120    0.152                0.923                   0.891     0.039     2.663
DIC        44        40       26                   0.594                   96        0         0.212
CD         91        93       6                    0.068                   100       61        -2.129
NI         1.2       1.2      0.1                  0.095                   1.6       0.8       0.560
Q1         51.1      50.8     4.3                  0.084                   68.8      38.4      0.409
NAm        6.22      6.47     1.78                 0.286                   11.50     0.48      -0.754
PDC        62        66       17                   0.277                   100       14        -0.868
TABLE B2.
        NDF     NIR     hnf     HCD     NI      Q1      PDC     SJRm    SNIPm   DIC     CD      NAm
NDF     1.00    0.50    0.92    0.33    0.23    0.17    0.39    0.04    0.17    0.00    -0.14   -0.22
NIR     0.50    1.00    0.71    0.66    0.15    0.06    0.67    0.21    0.41    0.21    0.10    0.17
hnf     0.92    0.71    1.00    0.48    0.22    0.11    0.60    0.13    0.30    0.09    -0.06   -0.07
HCD     0.33    0.66    0.48    1.00    0.14    0.15    0.43    0.25    0.13    0.11    -0.03   0.22
NI      0.23    0.15    0.22    0.14    1.00    0.66    0.11    0.03    0.10    0.17    0.00    -0.16
Q1      0.17    0.06    0.11    0.15    0.66    1.00    -0.06   -0.05   -0.02   0.13    -0.04   -0.09
PDC     0.39    0.67    0.60    0.43    0.11    -0.06   1.00    0.35    0.43    0.28    0.08    0.13
SJRm    0.04    0.21    0.13    0.25    0.03    -0.05   0.35    1.00    0.26    0.20    0.22    0.05
SNIPm   0.17    0.41    0.30    0.13    0.10    -0.02   0.43    0.26    1.00    0.15    0.20    -0.11
DIC     0.00    0.21    0.09    0.11    0.17    0.13    0.28    0.20    0.15    1.00    0.12    0.11
CD      -0.14   0.10    -0.06   -0.03   0.00    -0.04   0.08    0.22    0.20    0.12    1.00    0.05
NAm     -0.22   0.17    -0.07   0.22    -0.16   -0.09   0.13    0.05    -0.11   0.11    0.05    1.00
TABLE B3.
Strategy 1
Strategy 2
Strategy 3
PC
PI
SC
NDF
HCD
CD
PDC
hnf
NIR
SNIPm
SJRm
NI
QI
DIC
NAm
CI
1.17
1.05
1.24
1.49
2.4
2.43
1.14
1.15
2
2.09
2.69
2.88
1.31
1.46
1.45
1.28
1.28
2.19
2.18
2.05
2.04
1.14
1.25
1.23
1.12
1.34
1.25
1.89
1.75
3.83
1.96
Dependent variable   Model 1   Model 2          Model 3
B1                   PC        hnf              NDF
B2                   PC        hnf              NDF, HCD
B3                   PC        hnf, HCD         NDF, HCD
B4                   PC        hnf, HCD         NDF, HCD, NAm
B5                   PC        hnf              NDF, HCD, NAm
B6                   PC        hnf, NAm         NDF, HCD, NAm
B7                   PC        hnf, NAm         NDF, HCD, NAm
B8                   PC        hnf, NAm, SJRm   NDF, SJRm, NAm
B9                   PC        hnf, NAm, SJRm   NDF, HCD, SJRm, NI, Q1, NAm
B10                  PC        hnf, NAm, SJRm   NDF, HCD, SJRm, NAm
B11                  PC        hnf, NAm, SJRm   NDF, HCD, SJRm, NI, Q1, NAm
TABLE C2.
Number of applicants   Number of openings
1 or more              27
2 or more              27
3 or more              21
4 or more              19
5 or more              18
6 or more              17
7 or more              14
8 or more              12
9 or more              10
10 or more             5
11 or more             1