This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation
information: DOI 10.1109/TPDS.2015.2457924, IEEE Transactions on Parallel and Distributed Systems
2 RELATED WORK
As an emerging distributed computing model, crowdsourcing has become an important and active research field in recent years [19], [20], [21]. Crowdsourcing arises in many forms, such as citizen science, peer production/co-creation, the wisdom of crowds, collective intelligence, and so on [22]. In the past few years, a large number of crowdsourcing platforms have been set up and used in many fields [23], [24], [25], [26]. Amazon Mechanical Turk (AMT) is one of the most prominent crowdsourcing platforms today. It is a crowdsourcing Internet marketplace that enables workers and crowdsourcers to coordinate the use of human intelligence to perform tasks that computers are currently unable to do. Most of these crowdsourcing systems rely on offline or manual worker quality control and evaluation, or simply ignore quality control altogether. At present, an increasing number of studies focus on the quality control issue [27], [28]. E. Kamar [29] presents a model that enables the system to balance the expected benefit against the cost of hiring a worker. D. Vakharia [22] studies the issue of quality assurance and control as an important part of their research. The multiple crowdsourcing platforms discussed in [22] use different methods to implement worker quality control. These methods more or less require human intervention, which places a burden on crowdsourcers. Worker quality control has become a bottleneck that affects the development of crowdsourcing systems [30]. The core issue of worker quality control is worker quality evaluation [31], [32], and online worker quality evaluation is attracting more and more attention.
J. M. Rzeszotarski [33] distinguishes the quality of different workers by analysing their behaviours. However, this method requires the crowdsourcing system to provide the workers' behaviour logs. The studies of R. Snow, J. Whitehill, V. C. Raykar, and X. Liu [34], [35], [36], [37] are mainly based on the EM algorithm [38], [39], [40], [41], calculating the accuracy of each worker and mining the potential quality of the worker from the answer matrix. These studies all focus on the determination of a single label. M. Joglekar [14] studies worker quality evaluation based on the frequency of disagreement among workers' results. It also uses confidence intervals to evaluate worker accuracy, which improves the evaluation accuracy. However, this study only applies to Boolean problems, and there are constraints on the quality of the worker to be evaluated (>0.5). A. Ramesh [42] mainly studies the dynamic control of worker behaviours during evaluation; its research on worker quality evaluation is slightly weak, and its evaluation model is simple. P. Welinder [43] studies the evaluation of workers using an EM-based algorithm. P. G. Ipeirotis [44] analyses workers' preferences through worker quality evaluation. The studies above mostly focus on the traditional architecture and do not take the big data environment into consideration; thus their practicability and extensibility are insufficient.
In recent years, with the continuous development of cloud computing and the explosive growth of data sizes, data-driven approaches have become the focus of attention for enterprises. The crowdsourcing model also faces the challenge of big data [45]. Unlike the existing research, ours is the first paper that considers worker quality evaluation in a big data environment. Above all, this paper proposes a general crowdsourcing worker evaluation algorithm, which we implement on the Hadoop platform using the MapReduce programming model.
$$T_{ij} = \sum_{u=1}^{N} X_{iju}$$
1045-9219 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
D.DANG ET AL.: A CROWDSOURCING WORKER QUALITY EVALUATION ALGORITHM ON MAPREDUCE FOR BIG DATA APPLICATIONS
$Q_{ij}$ is the expectation of the Bernoulli trial, that is, the probability that the responses of $w_i$ and $w_j$ will agree with each other:

$$Q_{ij} = \frac{T_{ij}}{N}$$
There is a wide range of tasks on a crowdsourcing platform. Different tasks have different M values and their own characteristics. Newly released tasks may have completely different modes compared with historical tasks. Moreover, newly released tasks have no pre-developed answers, and the order of the options is unpredictable. Therefore, it is difficult to predict which option is more likely to be the correct answer and which options a worker prefers. To provide a general solution for crowdsourcing worker quality evaluation, we assume that, for each problem, every worker has the same probability of selecting each wrong option. According to the idea of the M-1 algorithm and the definitions above, we can obtain the following equation.
$$Q_{ij} = A_i A_j + \left(\frac{1}{M-1}(1-A_i)\right)\left(\frac{1}{M-1}(1-A_j)\right)(M-1) \qquad (1)$$
Herein, $A_i$ represents the probability that $w_i$ chooses the correct option, while $\frac{1}{M-1}(1-A_i)$ represents the probability that $w_i$ chooses a particular wrong option. Accordingly, $\left(\frac{1}{M-1}(1-A_i)\right)\left(\frac{1}{M-1}(1-A_j)\right)(M-1)$ represents the probability that $w_i$ and $w_j$ choose the same wrong option.
For three workers, equation (1) yields one equation per pair, e.g. $Q_{13} = A_1 A_3 + \frac{(1-A_1)(1-A_3)}{M-1}$ and analogously for $Q_{12}$ and $Q_{23}$, giving three equations in the three unknowns $A_1$, $A_2$, $A_3$.
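As a sanity check on equation (1), the following Python sketch (the function names and Monte Carlo setup are ours, not from the paper) compares the closed-form agreement probability with a simulated agreement rate under the equal-wrong-option assumption:

```python
import random

def agreement_prob(a_i, a_j, m):
    # Eq. (1): workers agree when both pick the correct option,
    # or both pick the same one of the M-1 wrong options
    return a_i * a_j + ((1 - a_i) / (m - 1)) * ((1 - a_j) / (m - 1)) * (m - 1)

def simulated_agreement(a_i, a_j, m, n, rng):
    # option 0 is correct; wrong options 1..m-1 are chosen uniformly,
    # matching the paper's equal-wrong-option assumption
    agree = 0
    for _ in range(n):
        s_i = 0 if rng.random() < a_i else rng.randrange(1, m)
        s_j = 0 if rng.random() < a_j else rng.randrange(1, m)
        agree += (s_i == s_j)
    return agree / n

rng = random.Random(42)
q = agreement_prob(0.8, 0.6, 4)
q_hat = simulated_agreement(0.8, 0.6, 4, 200_000, rng)
```

With enough trials the empirical rate converges to the closed form, which supports the Bernoulli-trial reading of $Q_{ij}$.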
7    for each problem u
8        if S_iu == S_ju
9            set X_iju to 1
10       else
11           set X_iju to 0
12       end if
13   end for
14   Q_ij = A_i A_j + ((1/(M-1))(1-A_i)) ((1/(M-1))(1-A_j)) (M-1)
15 end for
16 calculate the accuracy rate of each worker using the three equations obtained in row 14
17 feedback the results to the crowdsourcer
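Row 16 solves the three pairwise agreement equations for the accuracies. One closed-form way to do this is sketched below; this derivation is our own (the paper does not spell out the solution), and it assumes each accuracy exceeds the random-guess level so the square root is well defined:

```python
import math

def solve_accuracies(q12, q13, q23, m):
    # Substitute u_i = (1+c)*A_i - c with c = 1/(M-1); then
    # Q_ij = A_i*A_j + c*(1-A_i)*(1-A_j) becomes u_i*u_j = (1+c)*(Q_ij - c) + c^2,
    # a product system solvable pairwise like a three-worker voting scheme.
    c = 1.0 / (m - 1)
    r = lambda q: (1 + c) * (q - c) + c * c
    r12, r13, r23 = r(q12), r(q13), r(q23)
    u1 = math.sqrt(r12 * r13 / r23)
    u2, u3 = r12 / u1, r13 / u1
    return tuple((u + c) / (1 + c) for u in (u1, u2, u3))
```

Feeding in agreement probabilities generated from known accuracies recovers those accuracies, e.g. (0.9, 0.7, 0.6) with M = 4.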
(Figure: starting from the circle of k workers, the grouping proceeds over Step 1, Step 2, ..., Step k.)
Input: TaskId;
  N problems P_1, P_2, ..., P_N;
  K workers W_1, W_2, ..., W_K
Output: accuracy rate of each worker
1 K workers are arranged in a circle
2 Define i = 0
3 while i < K
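A plausible sketch of the grouping over the circle of workers (the triple-of-adjacent-workers grouping is our assumption, chosen so that each worker obtains three accuracy estimates per task):

```python
def circle_groups(worker_ids):
    # each worker is grouped with its two successors on the circle,
    # so every worker (for K >= 3) appears in exactly three overlapping triples
    k = len(worker_ids)
    return [(worker_ids[i],
             worker_ids[(i + 1) % k],
             worker_ids[(i + 2) % k]) for i in range(k)]
```

For example, `circle_groups([1, 2, 3, 4])` yields the triples (1,2,3), (2,3,4), (3,4,1) and (4,1,2).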
The M-X algorithm focuses on multiple-choice problems. For multiple-choice problems, the answers of different workers to the same problem tend to differ widely, so we can hardly use the M-1 algorithm to evaluate worker quality directly. Therefore, based on the M-1 algorithm, we propose a multiple-choice problem-oriented worker quality evaluation algorithm, named the M-X algorithm. The idea of the M-X algorithm is as follows. First, we divide a problem into M sub-problems according to its M value, where the M value represents the number of options of the multiple-choice problem. Each option O_j (j = 1, 2, ..., M) is treated as a single-choice problem with two options, representing whether the option is chosen or not. Thus, every multiple-choice problem with M options is converted into M single-choice problems. Second, we treat each option dimension as a sub-task consisting of N single-choice problems; in this way, each task is divided into M sub-tasks. Then we use Algorithm 2 to calculate the workers' accuracy on each option dimension respectively.
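The conversion step can be sketched as follows (the `pid`/`selected` encoding is hypothetical, not the paper's data format):

```python
def expand_multiple_choice(pid, selected, m):
    # one multiple-choice problem with M options becomes M single-choice
    # (chosen / not chosen) sub-problems; option j maps to sub-problem
    # "<pid>-<j>" with response 1 iff the worker selected option j
    return {f"{pid}-{j}": int(j in selected) for j in range(1, m + 1)}
```

A worker who chose options 1 and 3 of a 4-option problem thus contributes responses 1, 0, 1, 0 to the four option dimensions.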
4 IMPLEMENTATION ON MAPREDUCE
To cope with the challenges brought by big data and to improve the efficiency of the crowdsourcing worker quality evaluation algorithm at scale, we use the MapReduce parallel computing framework to implement a general algorithm, called the MRM-X algorithm, based on the algorithm proposed in section 3. MapReduce is a parallel programming model and computing framework for processing massive data, which solves scalability, fault tolerance and other issues at the system level. By accepting a user-written Map function and Reduce function, it can automatically execute in parallel on scalable large-scale clusters; thus, it can process and analyse large-scale data sets [46], [47].
In actual crowdsourcing platforms, one task may contain different problem types, including single choice and multiple-choice. According to the idea of the M-X algorithm, we first need to convert multiple-choice problems into single-choice problems, and then calculate the workers' accuracy according to the multi-worker evaluation scheme of the M-1 algorithm. Considering the characteristics of the MapReduce programming model, we design three MapReduce tasks in this section. Task one is mainly responsible for data pre-processing, including problem type conversion and classifying workers. Task two is mainly responsible for using the M-1 algorithm to calculate the accuracy of the workers. Task three calculates the average accuracy of the workers. Fig. 2 illustrates the process of the MRM-X algorithm.
The original data format is <Wid, Tid, Pid, Ptype, Sid>, which represents the worker id, task id, problem id, problem type and worker's response, respectively.
1) Task One
As described above, task one first pre-processes the initial data by Ptype to obtain a data set that can be processed by the multi-worker evaluation scheme of the M-1 algorithm. It then groups the workers who are involved in the same task.
Map-1 processes the initial data and pre-processes problems according to Ptype. If a problem is a multiple-choice problem, we translate its M options into M single-choice problems, which are numbered sequentially. After this operation, we combine the id of each single-choice problem with the original Pid as a new Pid. Moreover, we set the worker's response Sid to one if the worker selects the option; otherwise, we set it to zero. If it is a single-choice problem, we skip this step. Then, the algorithm regards <Tid+Pid> as a key.
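Map-1 can be sketched as a generator over input records; field encodings such as the "|"-separated Sid for multiple-choice responses and the per-task M lookup are our assumptions, not the paper's:

```python
def map1(record, m_values):
    # record: (Wid, Tid, Pid, Ptype, Sid); m_values: assumed M value per task id
    wid, tid, pid, ptype, sid = record
    if ptype == "multiple":
        chosen = {int(x) for x in sid.split("|")}
        for j in range(1, m_values[tid] + 1):
            # new Pid = original Pid + sub-problem number; Sid becomes 0/1
            yield (f"{tid}+{pid}-{j}", (wid, 1 if j in chosen else 0))
    else:
        # single-choice problems pass through unchanged, keyed by <Tid+Pid>
        yield (f"{tid}+{pid}", (wid, sid))
```

Emitting <Tid+Pid> as the key lets the shuffle phase collect all workers' responses to the same (sub-)problem on one Reducer.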
(Fig. 2 depicts the Mapper/Reducer pipeline over HDFS: the stages exchange records of the forms <Wi+Wj+Wk, Tid+Pid+Ptype+Si+Sj+Sk>, <Wi+Wj+Wk+Tid, Pid+Ptype+Si+Sj+Sk>, and <Wid+Tid, avgAid>.)
Fig. 2. The flow chart of the crowdsourcing worker quality evaluation for the MRM-X algorithm.
Map-3 takes <Wid+Tid> as the key to shuffle, assigning the same worker's three accuracies for one task to the same Reducer.
Reduce-3 receives the output of Map-3 as input and calculates the average accuracy of each worker. The output is in the form <Wid+Tid, avgAid>, and avgAid is the final result.
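The averaging step of Reduce-3 can be sketched minimally as follows (the record layout is assumed):

```python
from collections import defaultdict

def reduce3(records):
    # records: ((Wid, Tid), accuracy) pairs grouped by Map-3's shuffle key;
    # emit <Wid+Tid, avgAid>, the mean of that worker's per-sub-task accuracies
    sums = defaultdict(lambda: [0.0, 0])
    for key, acc in records:
        sums[key][0] += acc
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```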
5 EXPERIMENTAL RESULTS
In experiment one of this section, we recruit 10 workers to participate in the same task and then use the proposed worker quality evaluation algorithm to calculate each worker's accuracy, preliminarily verifying the effectiveness of our algorithm. To verify the accuracy and effectiveness of the algorithm in a wide variety of big data scenarios more thoroughly, we further conduct a series of simulation experiments in this section to analyse and evaluate the performance of the worker quality evaluation algorithm. The experiments are conducted on the Hadoop platform with simulation data. For one task, we first randomly generate the answers to the problems according to the problem types (Boolean, single choice, multiple-choice and so on) in the task. Then we randomly generate workers with different levels; a worker's level is mainly determined by the accuracy, which lies between 0 and 1. Finally, we generate each worker's responses to each problem according to the worker accuracy that we generated.
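The simulation data can be generated along these lines; this is a sketch under our own assumptions (the paper does not give its generator), taking option 0 as the correct answer w.l.o.g.:

```python
import random

def generate_responses(n_problems, accuracies, m, seed=0):
    # each worker answers correctly with probability equal to his accuracy,
    # otherwise picks one of the m-1 wrong options uniformly at random
    rng = random.Random(seed)
    responses = {}
    for w, acc in enumerate(accuracies):
        responses[w] = [0 if rng.random() < acc else rng.randrange(1, m)
                        for _ in range(n_problems)]
    return responses
```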
The scale of the data set is [10000*100*(20|50|100|200|500)], which means that 10,000 workers participate in 100 tasks. When each task includes a different number of problems, such as [20, 50, 100, 200 and 500], we run the algorithm to observe the results.
TABLE 1
THE DISTRIBUTION OF THE PROBLEMS

Problem Type      M Value   Number
Boolean           2         20
Single choice     3         20
Single choice     4         20
Single choice     5         20
Multiple-choice   4         20
Fig. 4. The normal Q-Q chart for the expectation of worker accuracy.
Fig. 5. The normal Q-Q chart for the deviation of worker accuracy.
(Figure: accuracy (0 to 0.9) of workers 1-10 under the real accuracy, our algorithm, and the vote-based algorithm.)
Fig. 3. The comparison between the worker accuracy values based on different methods and the workers' real accuracy.
TABLE 2
SINGLE SAMPLE KOLMOGOROV-SMIRNOV TEST (K=1)

Problems per task               20       50       100      200      500
N                               100      100      100      100      100
Normal parameters(a,b):
  average                       .6431    .6818    .6626    .6694    .6661
  standard deviation            .11847   .08880   .06160   .03810   .02888
Most extreme differences:
  absolute value                .051     .072     .064     .046     .051
  positive                      .036     .052     .064     .046     .048
  negative                      -.051    -.072    -.060    -.044    -.051
Kolmogorov-Smirnov Z            .506     .716     .644     .460     .514
Asymp. sig. (2-tailed)          .960     .684     .802     .984     .954
TABLE 3
SINGLE SAMPLE KOLMOGOROV-SMIRNOV TEST (K=10)

Worker                 1       2       3       4       5       6       7       8       9       10
N                      100     100     100     100     100     100     100     100     100     100
Real accuracy          .8867   .6716   .8395   .8741   .7652   .7714   .6753   .6069   .9131   .6714
Normal parameters(a,b):
  average              .8972   .6894   .8553   .8885   .7780   .7850   .6865   .6171   .9233   .6912
  standard deviation   .04073  .05488  .05411  .04854  .04619  .06413  .07314  .07366  .03844  .05881
Most extreme differences:
  absolute value       .020    .038    .026    .029    .038    .020    .061    .034    .020    .032
  positive             .011    .027    .026    .029    .024    .014    .044    .029    .020    .020
  negative             -.020   -.038   -.017   -.015   -.038   -.020   -.061   -.034   -.013   -.032
Kolmogorov-Smirnov Z   .623    1.188   .823    .911    1.209   .634    1.942   1.091   .621    1.016
Asymp. sig. (2-tailed) .832    .119    .507    .378    .108    .817    .201    .185    .835    .254
TABLE 4
SINGLE SAMPLE KOLMOGOROV-SMIRNOV TEST

Worker                 1       2       3       4       5       6       7       8       9       10
N                      10      10      10      10      10      10      10      10      10      10
Normal parameters(a,b):
  average              .0136   .0155   .0133   .0142   .0141   .0132   .0162   .0113   .0089   .0172
  standard deviation   .00334  .00562  .00492  .00517  .00614  .00515  .00607  .00652  .00547  .00495
Most extreme differences:
  absolute value       .166    .199    .217    .169    .201    .173    .170    .184    .128    .160
  positive             .166    .199    .156    .169    .110    .121    .170    .184    .124    .160
  negative             -.153   -.126   -.217   -.129   -.201   -.173   -.162   -.141   -.128   -.154
Kolmogorov-Smirnov Z   .526    .628    .688    .535    .636    .548    .536    .581    .405    .506
Asymp. sig. (2-tailed) .945    .825    .732    .937    .813    .925    .936    .888    .997    .960
TABLE 5
APPROXIMATE MATRIX OF EUCLIDEAN DISTANCE BETWEEN DIFFERENT WORKERS' ACCURACY (K=10)

Euclidean
Distance    1      2      3      4      5      6      7      8      9      10
1         .000   6.915  2.470  1.844  4.235  4.304  7.179  9.243  1.925  6.873
2         6.915  .000   5.757  6.724  3.589  4.024  2.789  3.725  7.703  2.536
3         2.470  5.757  .000   2.505  3.316  3.508  6.041  8.082  2.975  5.768
4         1.844  6.724  2.505  .000   4.113  4.145  6.952  9.028  2.208  6.661
5         4.235  3.589  3.316  4.113  .000   2.524  3.978  5.755  4.966  3.614
6         4.304  4.024  3.508  4.145  2.524  .000   4.376  6.132  4.961  4.022
7         7.179  2.789  6.041  6.952  3.978  4.376  .000   3.857  7.935  3.027
8         9.243  3.725  8.082  9.028  5.755  6.132  3.857  .000   10.022 3.739
9         1.925  7.703  2.975  2.208  4.966  4.961  7.935  10.022 .000   7.670
10        6.873  2.536  5.768  6.661  3.614  4.022  3.027  3.739  7.670  .000
In this experiment, we still use the 10 workers randomly selected in experiment 4. From Table 5, we can observe that the larger the difference between two workers' real accuracies is, the larger their Euclidean distance is (e.g., worker 8 and worker 9); the smaller the difference between two workers' real accuracies is, the smaller their Euclidean distance is (e.g., worker 1 and worker 4), which also shows that a worker's quality remains similar across multiple tasks. Therefore, the algorithm can distinguish and reflect different workers' quality.
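The entries of Table 5 are plain Euclidean distances between two workers' per-task accuracy vectors, e.g.:

```python
import math

def euclidean(u, v):
    # distance between two workers' accuracy vectors over the same tasks
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For instance, accuracy vectors [0.9, 0.8] and [0.6, 0.4] lie at distance 0.5.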
Experiment 6: This experiment focuses on the effect of MapReduce on the performance of the algorithm. The algorithm involves three variables: the number of workers, the number of tasks and the number of problems in each task. The scale of the problem is N^3. The proposed worker quality evaluation algorithm can be divided into three steps:
(1) Sort and group the workers who are involved in each task. The time complexity of existing sorting algorithms can reach O(NlogN), so the time complexity of this step is O(N^2 logN).
(2) Calculate the worker accuracy. For N tasks, each task
(Figures: execution time in ms versus the number of threads and the number of nodes, for data scales 10000*100*100 and 20000*100*100.)
Even accounting for the communication time in a distributed computing environment, the execution performance of the algorithm is significantly improved; moreover, the larger the dataset is, the more obvious the acceleration becomes.
From Experiments 6.1 and 6.2, we can see that for computation-intensive tasks, the algorithm on a single machine cannot solve the performance problem, whereas distributed computing can, and the MapReduce parallel framework is a good choice. Moreover, MapReduce only distributes the computing tasks over the cluster and does not change the process of the algorithm; therefore, it improves performance without compromising the accuracy of the algorithm. The MapReduce cluster also has horizontal scalability: the computing performance of MapReduce grows approximately linearly with the number of nodes. With the expansion of the data size, the algorithm shows sustained effectiveness.
6 CONCLUSION
In this paper, we first proposed a general worker quality evaluation algorithm, which applies to crowdsourcing tasks without pre-developed answers. Then, to satisfy the demand of parallel evaluation for a multitude of workers in a big data environment, we implemented the proposed algorithm on the Hadoop platform using the MapReduce programming model. The experimental results show that the algorithm is accurate and achieves high efficiency and performance in a big data environment.
In our future studies, we will further consider other factors that affect worker quality, such as answer time and task difficulty. These factors will help realize a comprehensive evaluation of worker quality that adapts worker quality evaluation to different situations of the crowdsourcing mode in a big data environment.
ACKNOWLEDGMENT
D. Dang is the corresponding author of this paper. This paper is supported by the National Natural Science Foundation of China under Grant No. 60940032, No. 61073034, and No. 61370064; the Program for New Century Excellent Talents in University of the Ministry of Education of China under Grant No. NCET-10-0239; and the Science Foundation of the Ministry of Education of China and China Mobile Communications Corporation under Grant No. MCM20130371.
REFERENCES
[1] D.C. Brabham, "Crowdsourcing as a Model for Problem Solving: An Introduction and Cases," Convergence: The International Journal of Research Into New Media Technologies, vol. 14, no. 1, pp. 75-90, 2008.
[2] M. Allahbakhsh, B. Benatallah, A. Ignjatovic, et al., "Quality Control in Crowdsourcing Systems: Issues and Directions," IEEE Internet Computing, vol. 17, no. 2, pp. 76-81, 2013.
[3] A. Doan, R. Ramakrishnan, and A.Y. Halevy, "Crowdsourcing Systems on the World-Wide Web," Communications of the ACM, vol. 54, no. 4, pp. 86-96, 2011.
[4] P. Clough, M. Sanderson, J. Tang, et al., "Examining the Limits of Crowdsourcing for Relevance Assessment," IEEE Internet Computing, vol. 17, no. 4, pp. 32-38, 2013.
[5] B. Carpenter, "Multilevel Bayesian Models of Categorical Data Annotation," unpublished, 2008.
[6] A. Brew, D. Greene, and P. Cunningham, "Using Crowdsourcing and Active Learning to Track Sentiment in Online Media," Proc. 6th Conf. on Prestigious Applications of Intelligent Systems, 2010.
[7] J. Howe, "The Rise of Crowdsourcing," Wired Magazine, vol. 14, no. 14, pp. 176-183, 2006.
[8] V.C. Raykar, S. Yu, L.H. Zhao, et al., "Learning From Crowds," Journal of Machine Learning Research, vol. 11, pp. 1297-1322, 2010.
[9] J. Manyika, M. Chui, B. Brown, et al., "Big Data: The next fron-