DOI 10.1007/s11761-012-0122-2
Received: 31 March 2012 / Revised: 22 July 2012 / Accepted: 25 October 2012 / Published online: 11 November 2012
Springer-Verlag London 2012
1 Introduction
Unemployment rate prediction has become critically important, particularly during economic recessions, because it not only helps governments make decisions and design policies but also gives practitioners a better understanding of future economic trends. In recent years, unemployment rate forecasting has attracted much attention from governments, organizations, research institutes, and scholars, and a great number of methods have been proposed. Traditional univariate time series models have been applied to unemployment rate prediction [3,13,20,22]. For example, a time deformation model was applied to US unemployment data, and the experimental results indicate that it performs better than better-known models such as the autoregressive integrated moving average (ARIMA) model [22]. Similarly, an autoregressive fractionally integrated moving average (ARFIMA) model was used to analyze the US unemployment trend, and the results show that it forecasts better than the threshold autoregressive (TAR) and symmetric ARFIMA models [13].
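[Figure: neural network structure with a hidden layer and an output layer]

\[ w_{ji}(t) = \eta\,(a_j - y_j)\,f'(\cdot)\,y_i + w_{ji}(t-1) \tag{2} \]

\[ f(x) = \frac{2}{1 + e^{-2x}} - 1 \tag{3} \]

\[ f(x) = e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{4} \]

\[ f(x) = e^{-\frac{x^2}{2}} \cos(1.75x) \tag{5} \]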
In Eq. (4), μ represents the mean of the Gaussian distribution and σ² stands for its variance. Spread is a parameter that controls the width of the RBF and hence how smooth the fitted function is. A larger spread gives a smoother approximation: too large a spread means that many neurons are required to fit a fast-changing function, while too small a spread means that many neurons are needed to fit a smooth function.
Similarly, for the wavelet neural network (WNN), the wavelet function embedded in the hidden layer is regarded as the activation function; it is given in Eq. (5).
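The three hidden-layer activation functions discussed in this section can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the Gaussian is assumed to take the common form exp(-(x - mu)^2 / (2 sigma^2)), and the wavelet is assumed to be the Morlet-type function cos(1.75x) exp(-x^2 / 2).

```python
import math


def tanh_activation(x):
    """Hyperbolic tangent written as 2 / (1 + e^(-2x)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def gaussian_rbf(x, mu=0.0, sigma2=1.0):
    """Gaussian RBF with mean mu and variance sigma2 (assumed form)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


def morlet_wavelet(x):
    """Morlet-type wavelet cos(1.75x) * exp(-x^2 / 2) (assumed form)."""
    return math.cos(1.75 * x) * math.exp(-(x ** 2) / 2.0)
```

With this form, a larger variance makes the Gaussian response decay more slowly away from its centre, which is the smoothing effect attributed to a larger spread.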
(6)
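\[ \min \; C \sum_{i=1}^{n} L(f(x_i), y_i) + \frac{1}{2}\|w\|^2 \tag{7} \]

\[ \min \; C \sum_{i=1}^{n} (\xi_i + \xi_i^{*}) + \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i - f(x_i) \le \varepsilon + \xi_i, \;\; f(x_i) - y_i \le \varepsilon + \xi_i^{*}, \;\; \xi_i, \xi_i^{*} \ge 0 \tag{9} \]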
Because selecting ε in the ε-insensitive loss function of ε-SVR is difficult, ν-SVR is designed to overcome this problem by introducing another parameter ν ∈ (0, 1] that controls the number of support vectors. In ν-SVR, the optimization objective is

\[ \min \; C\left(\nu\varepsilon + \frac{1}{n}\sum_{i=1}^{n}(\xi_i + \xi_i^{*})\right) + \frac{1}{2}\|w\|^2 \tag{10} \]
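To make the roles of C, ν, and ε in Eq. (10) concrete, the following minimal sketch evaluates the ν-SVR objective for a given weight vector and set of predictions. It is not a solver. The function names are hypothetical, and the slack term for each sample is assumed to equal its ε-insensitive loss, which is the value the slack variables take at the optimum.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def nu_svr_objective(w, y_true, y_pred, eps, C, nu):
    """C * (nu * eps + mean slack) + 0.5 * ||w||^2."""
    n = len(y_true)
    mean_slack = sum(
        eps_insensitive_loss(a, p, eps) for a, p in zip(y_true, y_pred)
    ) / n
    squared_norm = sum(wi * wi for wi in w)
    return C * (nu * eps + mean_slack) + 0.5 * squared_norm
```

Because ε enters the objective through the term νε, widening the tube is itself penalized, which is how ν-SVR avoids fixing ε by hand.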
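\[ f(x) = \sum_{i=1}^{N_{sv}} (\alpha_i - \alpha_i^{*})\,K(x, x_i) + b \tag{11} \]

\[ K(x, x_i) = x^{T} x_i \tag{12} \]

\[ K(x, x_i) = (\gamma\, x^{T} x_i + r)^{d} \tag{13} \]

\[ K(x, x_i) = \exp(-\gamma \|x - x_i\|^2) \tag{14} \]

\[ K(x, x_i) = \tanh(\gamma\, x^{T} x_i + r) \tag{15} \]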
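The kernel-based prediction function of Eq. (11) and the four kernels can be illustrated with plain Python. This is a sketch under stated assumptions: the kernels are written in the usual γ, r, d parameterization, the function names are hypothetical, and `coeffs` stands for the already-fitted differences between the two dual coefficients of each support vector.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def poly_kernel(u, v, gamma, r, d):
    return (gamma * dot(u, v) + r) ** d


def rbf_kernel(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma, r):
    return math.tanh(gamma * dot(u, v) + r)


def svr_predict(x, support_vectors, coeffs, b, kernel):
    """f(x) = sum_i coeff_i * K(x, x_i) + b over the support vectors."""
    return sum(c * kernel(x, sv) for c, sv in zip(coeffs, support_vectors)) + b
```

A kernel with extra parameters can be passed in by fixing them first, for example `lambda u, v: rbf_kernel(u, v, gamma=0.1)`.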
[Fig. 2: flowchart with the blocks Training set, Testing set, Feature Selection, Methods/Models, Evaluation (Yes/No branch), and The Unemployment Rate Prediction]
As can be seen from Fig. 2, the main process of the proposed framework can be decomposed into the following four steps.

Step 1: Data collection. Both the search engine query data and the unemployment data are collected to build the model. As suggested in [4], two types of query data, Local/Jobs and Society/Social Services/Welfare & Unemployment, are supposed to be related to unemployment queries. Weekly counts for the query data are available from 2004 onward at Google Insights for Search (http://www.google.com/insights/#), and the unemployment data are available from the US Department of Labor (http://www.ows.doleta.gov/unemploy/claims.asp).

Step 2: Feature selection. Some of the query data collected in the first step correlate only weakly with the prediction target. To exclude these weakly related features and improve the performance of the model, the Pearson correlation coefficient between each feature and the predicted target is calculated [9]. By correlating the search engine query data with the unemployment data, the top 100 features (see Appendix) with the highest correlation values are chosen as the original feature set.

Step 3: Modeling. Different data mining tools are tested to measure the fit between the search engine query data and the unemployment rate data. The details are described in Sect. 3.2.
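The correlation-based filter of Step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the dictionary layout (query keyword mapped to its weekly series) are assumptions, and features are ranked by the signed correlation value, as the text describes, rather than by its absolute value.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant series carries no linear signal
    return cov / (norm_x * norm_y)


def top_k_features(features, target, k=100):
    """Return the names of the k features most correlated with the target."""
    ranked = sorted(
        features.items(),
        key=lambda item: pearson(item[1], target),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

For example, `top_k_features(query_series, weekly_claims, k=100)` would return the 100 keywords whose weekly counts track the claims series most closely, where both arguments are hypothetical variables holding the collected data.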
[Figure: flowchart of the GA-based optimization. Blocks: Randomly initializing GA populations; Parameter Set (P1 ... Pm); Feature Set (F1 ... Fn); Dataset; Training set; Testing set; Fitness Function (RMSE); Selection; Crossover; Maximal Generation? (Yes/No)]
For the support vector regression models, ε-SVR and ν-SVR are implemented with four different kernel functions: linear, polynomial, RBF, and sigmoid. To construct the fitness function, a five-fold cross-validation is carried out: the data are divided evenly into five folds, and each time four folds are used to train the neural networks or support vector regressions, while the remaining fold is used as the testing set to validate the performance of the data mining models. The average RMSE over the five folds is then calculated, and 1/RMSE is chosen as the value of the fitness function.
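The fitness computation described above can be sketched as a generic routine that accepts any model through a pair of `fit` and `predict` callables. The names are hypothetical, the folds are assumed to be contiguous blocks (the text only says the data are divided evenly), and the mean predictor at the end is a stand-in for a neural network or SVR.

```python
import math


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def cv_fitness(X, y, fit, predict, k=5):
    """Fitness = 1 / (average RMSE over k cross-validation folds)."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(rmse([y[i] for i in test_idx], preds))
    return 1.0 / (sum(scores) / k)


def fit_mean(X_train, y_train):
    """Stand-in model: remember the mean of the training targets."""
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    """Stand-in prediction: return the stored mean for every sample."""
    return [model] * len(X_test)
```

Taking the reciprocal turns error minimization into fitness maximization, so a lower average RMSE yields a fitter individual. The sketch does not guard against an average RMSE of exactly zero.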
4 Empirical analysis
4.1 Data description and evaluation criteria
The US government releases its unemployment rate report to the public only once a month. To improve the prediction performance, instead of forecasting the unemployment rate itself, the Unemployment Initial Claims (UIC) series is used in our experiments. UIC is a leading indicator of the US labor market that is used to estimate the unemployment rate, and it is issued weekly by the US Department of Labor. Thus, the weekly initial claims data are collected from the Web site of the US Department of Labor.
On the other hand, as proposed in [4], two types of query data, Local/Jobs and Society/Social Services/Welfare & Unemployment, are supposed to be related
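\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(A_i - P_i)^2} \tag{16} \]

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|A_i - P_i| \tag{17} \]

\[ \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{|A_i - P_i|}{A_i} \tag{18} \]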
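The three evaluation criteria in Eqs. (16) to (18) translate directly into code. In this sketch the function names are hypothetical, `actual` and `predicted` are assumed to hold the values A_i and P_i, and MAPE is returned as a ratio that can be multiplied by 100 to obtain a percentage.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, Eq. (16)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, Eq. (17)."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error as a ratio, Eq. (18)."""
    n = len(actual)
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n
```

MAPE divides by each actual value, so it is only defined when no actual value is zero.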
Model   Activation/kernel function   Parameters
NN      Hyperbolic tangent           Learning rate
NN      RBF                          Spread
NN      Wavelet                      Learning rate 1 (for adjusting the weights of the network) and learning rate 2 (for adjusting the scale factor and displacement factor)
ε-SVR   Linear                       ε in Eqs. (8), (9), (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ε-SVR   Poly                         γ in Eqs. (13), (14), (15); d in Eq. (13); r in Eqs. (13), (15); ε in Eqs. (8), (9), (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ε-SVR   RBF                          γ in Eqs. (13), (14), (15); ε in Eqs. (8), (9), (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ε-SVR   Sigmoid                      γ in Eqs. (13), (14), (15); r in Eqs. (13), (15); ε in Eqs. (8), (9), (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ν-SVR   Linear                       ν in Eq. (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ν-SVR   Poly                         γ in Eqs. (13), (14), (15); d in Eq. (13); r in Eqs. (13), (15); ν in Eq. (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ν-SVR   RBF                          γ in Eqs. (13), (14), (15); ν in Eq. (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
ν-SVR   Sigmoid                      γ in Eqs. (13), (14), (15); r in Eqs. (13), (15); ν in Eq. (10); C in Eqs. (7), (9), (10); and e, the value of the stopping condition for training
Model   Activation/kernel function  Iteration 1  Iteration 2  Iteration 3  Iteration 4  Iteration 5      Average
NN      Hyperbolic tangent            77,957.73    79,619.08    76,909.18    88,358.48    79,610.98    80,491.09
NN      RBF                          106,410.07   136,737.73   255,692.73   177,538.48   126,402.07   160,556.22
NN      Wavelet                      164,882.24   144,489.20   218,489.45   180,275.99   196,644.55   180,956.29
ε-SVR   Linear                        53,194.09    56,290.26    55,020.55    54,680.86    57,409.90    55,319.13
ε-SVR   Poly                          55,193.32    55,788.99    53,336.77    59,073.74    56,444.20    55,967.40
ε-SVR   RBF                           67,840.33    57,772.81    57,925.41    57,730.18    55,707.13    59,395.17
ε-SVR   Sigmoid                      100,893.90   336,514.17   110,264.82   121,147.52   114,994.74   156,763.03
ν-SVR   Linear                        53,691.49    51,854.42    54,903.67    55,957.82    51,358.71    53,553.22
ν-SVR   Poly                          52,578.54    51,799.83    52,961.33    55,934.42    50,330.03    52,720.83
ν-SVR   RBF                           54,326.95    56,733.23    50,505.49    51,385.02    52,649.24    53,119.98
ν-SVR   Sigmoid                      119,182.46   102,275.04   111,150.78   112,942.50    99,708.12   109,051.78
Model   Activation/kernel function  Iteration 1  Iteration 2  Iteration 3  Iteration 4  Iteration 5      Average
NN      Hyperbolic tangent            58,010.78    60,887.08    56,901.86    66,254.51    56,909.93    59,792.83
NN      RBF                           64,166.28    87,371.29   110,593.98    99,175.90    76,442.86    87,550.06
NN      Wavelet                      145,274.03   124,797.69   200,542.05   161,490.04   175,488.29   161,518.42
ε-SVR   Linear                        41,401.69    44,171.93    41,412.31    41,702.17    44,037.61    42,545.14
ε-SVR   Poly                          42,718.27    41,770.96    41,214.89    44,187.66    44,018.46    42,782.05
ε-SVR   RBF                           53,664.54    43,167.90    43,324.22    43,446.21    42,147.97    45,150.17
ε-SVR   Sigmoid                       79,626.90   205,544.91    93,198.05    94,959.17    93,480.45   113,361.90
ν-SVR   Linear                        38,918.37    39,442.63    40,536.70    41,618.78    38,218.97    39,747.09
ν-SVR   Poly                          38,353.66    39,649.21    39,951.25    40,081.89    37,886.79    39,184.56
ν-SVR   RBF                           38,638.26    39,689.48    36,305.30    36,687.78    37,753.21    37,814.81
ν-SVR   Sigmoid                       93,814.35    88,568.73    78,217.22    84,028.40    77,205.05    84,366.75
Model   Activation/Kernel function  Iteration 1  Iteration 2  Iteration 3  Iteration 4  Iteration 5      Average
NN      Hyperbolic tangent                14.82        15.84        14.30        16.95        14.24        15.23
NN      RBF                               16.32        21.26        28.21        24.46        18.36        21.72
NN      Wavelet                           43.78        35.20        60.92        49.14        53.58        48.53
ε-SVR   Linear                            10.96        11.55        10.82        10.77        11.61        11.14
ε-SVR   Poly                              11.34        11.14        11.00        11.31        11.96        11.35
ε-SVR   RBF                               14.68        11.53        11.50        11.40        11.04        12.03
ε-SVR   Sigmoid                           21.17        50.59        26.36        23.96        25.69        29.56
ν-SVR   Linear                             9.91        10.22        10.18        10.72         9.76        10.16
ν-SVR   Poly                               9.74        10.63        10.70        10.04        10.06        10.24
ν-SVR   RBF                                9.89        10.10         9.18         9.29         9.46         9.58
ν-SVR   Sigmoid                           23.86        24.95        17.84        20.63        19.17        21.29
Parameter           Value
γ                   0.12503
ν                   0.56357
C                   2.3622
e                   0.10823
Selected features   No. 5, 8, 12, 13, 16, 19, 22, 24, 25, 29, 30, 31, 32, 35, 36, 38, 39, 41, 44, 45, 50, 51, 52, 53, 59, 60, 61, 62, 67, 69, 70, 73, 75, 76, 77, 78, 80, 81, 82, 85, 87, 88, 89, 91, 93, 95, 97, 99, and 100
21
unemployment claims
71
22
72
unemployment insurance
benefit
23
73
unemployment dol
24
unemployment ca
74
unemployment info
25
unemployment services
75
unemployment commission
26
unemployment security
76
michigan unemployment
benefits
27
unemployment
77
weekly unemployment
insurance
28
to file unemployment
78
weekly unemployment
benefits
31
29
unemployment benefits
79
30
80
green jobs
81
82
unemployment rate
83
unemployment insurance
benefits
34
unemployment benefits
pa
84
unemployment weekly
benefits
85
online unemployment
application
32
33
No.
Key words
No.
Key words
35
unemployment benefit
filing unemployment
51
36
86
unemployment rate ny
52
unemployment ny
37
87
jobs in usa
unemployment office
53
unemployment
compensation
38
state unemployment
benefit
connecticut
unemployment benefits
88
54
unemployment in az
55
unemployment state
56
unemployment insurance
claim
39
dept of unemployment
89
40
90
police jobs
41
for unemployment
benefits
uimn.org
91
dc unemployment
92
unemployment in kansas
unemployment in
michigan
unemployment benefit
claim
unemployment payment
93
94
unemployment online
95
unemployment in florida
unemployment in
colorado
apply for unemployment
online
96
97
benefits of unemployment
insurance
unemployment benefits
insurance
application for
unemployment
benefits unemployment
insurance
98
unemployment eligibility
99
construction jobs
100
state of unemployment
57
unemployment department
of labor
insurance unemployment
58
department of labor
unemployment
washington
unemployment
59
labor department
unemployment
45
10
unemployment file
60
unemployment check
46
11
unemployment insurance
61
unemployment for mn
12
unemployment apply
62
unemployment in indiana
13
department of
unemployment
unemployment website
63
unemployment in california
14
64
snag a job
unemployment
application
unemployment new york
65
unemployment grants
66
unemployment in
pennsylvania
17
washington state
unemployment
67
unemployment benefit
insurance
18
Wisconsin unemployment
benefits
insurance for
unemployment
apply for unemployment
68
69
70
security jobs
15
16
19
20
42
43
44
47
48
49
50
References
1. Askitas N, Zimmermann KF (2009) Google econometrics and unemployment forecasting. Appl Econom Q 55(2):107–120
2. Blasco N, Corredor P, Del Rio C, Santamaria R (2005) Bad news and Dow Jones make the Spanish stocks go round. Eur J Oper Res 163(1):253–275
3. Chen CI (2008) Application of the novel nonlinear grey Bernoulli model for forecasting unemployment rate. Chaos Solitons Fractals 37(1):278–287
4. Choi H, Varian H (2009) Predicting initial claims for unemployment benefits. Google technical report
5. Choi H, Varian H (2009) Predicting the present with Google trends. Google technical report
6. D'Amuri F (2009) Predicting unemployment in short samples with internet job search query data. MPRA paper no. 18403:1–17
7. D'Amuri F, Marcucci J (2009) Google it! forecasting the US unemployment rate with a Google job search index. MPRA Paper No. 18248:1–52
8. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS (2009) Detecting influenza epidemics using search engine query data. Nature 457(19):1012–1014
9. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
10. Harvill JL, Ray BK (2005) A note on multi-step forecasting with functional coefficient autoregressive models. Int J Forecast 21(4):717–727
11. Keilis-Borok VI, Soloviev AA, Allegre CB, Sobolevskii AN (2005) Patterns of macroeconomic indicators preceding the unemployment rise in Western Europe and the USA. Pattern Recogn 38(3):423–435
12. Krolzig HM, Marcellino M (2002) A Markov-switching vector equilibrium correction model of the UK labour market. Empir Econ 27:233–254
13. Lahiani A, Scaillet O (2009) Testing for threshold effect in ARFIMA models: application to US unemployment rate data. Int J Forecast 25(2):418–428