
2009 Third International Symposium on Intelligent Information Technology Application Workshops

Stock data analysis based on BP neural network

Jie Zhang
Information Engineering College of Qingdao University
QingDao, China
Fage19840721@163.com

Fengjing Shao
Information Engineering College of Qingdao University
QingDao, China
sfj@qdu.edu.cn

Abstract: In this paper, we apply data mining technology to the Chinese stock market in order to study price trends, with the aim of predicting the future trend of the stock market and the fluctuation of prices. The paper points out the shortcomings of traditional statistical analysis of stocks, then uses the BP neural network algorithm to predict the stock market by establishing a three-layer neural network consisting of an input layer, a hidden layer and an output layer. After the data are pre-processed for mining, a number of widely used stock market technical indicators, such as the KD indicator, the Moving Average Convergence/Divergence (MACD) and the Relative Strength Index (RSI), are introduced into the model. Finally, we obtain a better predictive model that improves forecast accuracy.

Keywords: stock market forecasting; data mining algorithm; technical indicators; BP neural network

I. INTRODUCTION
The stock market reflects the fluctuation of the market economy and has attracted the attention of ten million investors since its initial development. The stock market is characterized by high risk and high yield, so investors are concerned with analyzing it and try to forecast its trend. However, the stock market is affected by politics, the economy and many other factors, and its internal laws are complex: price (stock index) changes are non-linear and share data are very noisy. As a result, traditional mathematical and statistical forecasting techniques have not yielded satisfactory results. Neural networks can approximate any complex non-linear relation and are robust and fault-tolerant. Therefore, they are very suitable for the analysis of stock data.

II. NEURAL NETWORKS
An artificial neural network [1-2] is a large network of connected processing units (neurons). It is an abstraction, simplification and simulation of the human brain, and reflects the basic characteristics of the human brain. Generally, a neural network has a multi-layered topology that includes an input layer, a hidden layer and an output layer. Of the dozens of neural network models that have been put forward, researchers most often use the Hopfield network and the BP network.
The Hopfield network [3] is the most typical feedback network model and is one of the most commonly studied models. It is a single layer made up of identical neurons, and is a symmetrically connected associative network without a learning function. It can implement constrained optimization and associative memory.
The BP network is the back-propagation network [4]. It is a multi-layer feed-forward network that learns by minimizing the mean square error. It is one of the most widely used networks and can be applied in fields such as language integration, identification and adaptive control. The BP network uses supervised learning.
An artificial neural network must first learn according to a learning criterion before it can work. The criterion can be stated as follows: if the result yielded by the network is wrong, then the network should, through learning, reduce the possibility of making the same mistake next time.

III. DATA COLLECTION AND PRETREATMENT
In this paper, we study stock data exported from the stock analysis software named Big Wisdom and stored in the file zong.xls; the data form a 4255*27 matrix $A_{ij}$. Part of the output data is shown in Table 1.

TABLE I. PART OF THE RAW DATA PARAMETER TABLE
Stock Name      Time       KDJ.K
A Share Index   20081017   12.9427232742309
A Share Index   20081016   16.9152851104736
A Share Index   20081015   24.2198543548583
A Share Index   20081014   27.9921245574951
A Share Index   20081013   31.305269241333
A Share Index   20081010   29.2756900787353
A Share Index   20081009   38.862247467041
A Share Index   20081008   55.4674110412597
A Share Index   20081007   74.7675552368164

978-0-7695-3860-0/09 $26.00 © 2009 IEEE


DOI 10.1109/IITAW.2009.54
We will use the following typical technical indicators from economics as inputs of the BP neural network.

A. Moving average (MA)
The moving average line is a statistical mean: it sums the stock price over a certain number of days, takes the average, and connects the averages into a line in order to observe the price trend. The moving average gives the average cost during a certain period, and we can use the average cost curve together with the movement of the daily closing price to analyze the change between bull and bear markets and to determine possible changes in a stock.
In order to ensure the accuracy of the model, we collected six kinds of MA data from June 10, 1991 to October 17, 2008 as the initial input. They are MA1 (5-day average), MA2 (10-day average), MA3 (20-day average), MA4 (30-day average), MA5 (60-day average) and MA6 (120-day average).

B. Random indicator (KDJ)
The random indicator of a stock consists of three lines, namely the K line, the D line and the J line. The random indicator considers not only the highest and lowest prices in the calculation period, but also the random amplitude in the course of the fluctuation of the stock price. Therefore, researchers generally consider that the random indicator reflects the volatility of the stock price more truly, and it plays an important role in prompting.

$$K = \frac{2K_{t-1}}{3} + \frac{RSV}{3}$$
$$D = \frac{2D_{t-1}}{3} + \frac{K}{3}$$
$$RSV = 100 \cdot \frac{C_n - L_n}{H_n - L_n}$$
In the formula: C represents the daily closing price, L means the lowest price that day; H stands for the highest price that day.
$$J = 3K - 2D$$

C. Moving Average Convergence/Divergence (MACD)
The principle of MACD is to use the signs of convergence and divergence between a fast moving average and a slow moving average, together with a double smoothing operation, in order to study and determine the timing of buying and selling.
Set 12 days as the fast moving average line (12-day EMA); set 26 days as the slow moving average line (26-day EMA).
Set the 12-day EMA smoothing factor as 0.1538; set the 26-day EMA smoothing factor as 0.0741.
The difference DIF equals 12-day EMA - 26-day EMA.
12-day EMA = 0.1538 * (the average index of today - the average index of 12 days) + the average index of 12 days.
26-day EMA = 0.0741 * (the average index of today - the average index of 26 days) + the average index of 26 days.
The MACD is calculated as: MACD = smoothing factor (0.2) * (the difference today - yesterday's average difference) + yesterday's average difference.

D. Relative Strength Index (RSI)
The Relative Strength Index is an indicator that compares the average of closing gains with the average of closing losses. We can use it to analyze the strength of the market in order to forecast the future market trend.
RSI = [average of increase / (average of increase + average of decline)] * 100
The average of increase is the increase over the averaging cycle, and the average of decline is the decline over the same cycle. In order to ensure the accuracy of the model, we collected the data from January 16, 1991 to October 17, 2008, which include all RSI1 (6-day line), RSI2 (12-day line) and RSI3 (24-day line) data, as the initial input.

E. OnBalanceVolume (OBV)
OBV measures how active investors are in the stock market. If there are many buyers and sellers, stock prices and trading volume rise and the atmosphere of the stock market is warm, which is commonly known as a bull market; if there are few buyers and sellers, stock prices and trading volume decline. OBV therefore reflects the joint rise and fall of stock prices and trading volume. Trading volume and stock price can also reflect the rise and fall of the popularity of a stock.

F. BIAS
BIAS is the ratio of the deviation of the index from its moving average to the moving average. Based on BIAS, we can observe the degree to which the stock price deviates from the moving average price in order to decide whether to buy or sell.
BIAS = [(the stock market closing price of today - N-day moving average) / N-day moving average] * 100%

G. Increase scope
Increase scope = (the stock market closing price of today - the stock market opening price of today) / the stock market opening price of today.

In order to prevent data with larger magnitudes from having a larger influence on the results than data with smaller magnitudes, we normalized the original data. The normalization formula is as follows:
$$\frac{A_{ij} - \mathrm{Min}(A_j)}{\mathrm{Max}(A_j) - \mathrm{Min}(A_j)}$$
The data were normalized to between 0 and 1.
Using the formula editor provided by Excel, we obtain the data file data2.xls, which contains the normalized data. Part of the data is shown in Table 2.
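The indicator formulas and the min-max normalization of this section can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' Excel workflow: the function names, the 9-day KDJ window, the seed value of 50 for K and D, the seeding of each EMA with its first value, and the reading of "average of increase" as the mean rise per day are all assumptions that the paper does not state.

```python
def ema(prices, n):
    """Exponential moving average with smoothing factor 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)            # n=12 gives 0.1538, n=26 gives 0.0741
    out = [float(prices[0])]         # assumption: seed with the first price
    for p in prices[1:]:
        out.append(alpha * (p - out[-1]) + out[-1])
    return out


def macd(prices, fast=12, slow=26, factor=0.2):
    """Return (DIF, smoothed DIF) as described in subsection C."""
    dif = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    smooth = [dif[0]]                # assumption: seed with the first DIF
    for d in dif[1:]:
        smooth.append(factor * (d - smooth[-1]) + smooth[-1])
    return dif, smooth


def kdj(closes, highs, lows, n=9, seed=50.0):
    """K, D and J lines; the window n and the seed are assumed values."""
    k, d = seed, seed
    ks, ds, js = [], [], []
    for t in range(len(closes)):
        start = max(0, t - n + 1)
        lo, hi = min(lows[start:t + 1]), max(highs[start:t + 1])
        # RSV = 100 * (C - L) / (H - L); fall back to 50 on a flat window
        rsv = 100.0 * (closes[t] - lo) / (hi - lo) if hi > lo else 50.0
        k = 2.0 * k / 3.0 + rsv / 3.0
        d = 2.0 * d / 3.0 + k / 3.0
        ks.append(k)
        ds.append(d)
        js.append(3.0 * k - 2.0 * d)
    return ks, ds, js


def rsi(closes, n=6):
    """RSI over n daily changes; one value per day once n changes exist."""
    out = []
    for t in range(n, len(closes)):
        changes = [closes[i] - closes[i - 1] for i in range(t - n + 1, t + 1)]
        up = sum(c for c in changes if c > 0) / n
        down = sum(-c for c in changes if c < 0) / n
        out.append(100.0 * up / (up + down) if up + down > 0 else 50.0)
    return out


def bias(close, moving_average):
    """Percentage deviation of the close from its N-day moving average."""
    return (close - moving_average) / moving_average * 100.0


def increase_scope(close, open_price):
    """Relative change between today's opening and closing price."""
    return (close - open_price) / open_price


def min_max(column):
    """Scale one column of the data matrix to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: constant column maps to 0
    return [(x - lo) / (hi - lo) for x in column]
```

Each column of the raw matrix would be passed through `min_max` separately, so that every indicator ends up on the same 0 to 1 scale regardless of its original magnitude.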
TABLE II. NORMALIZED PARAMETER TABLE
Stock Index     Time       KDJ.K         KDJ.D         KDJ.J
A Share Index   20081017   0.093457615   0.173176288   0.130972246
A Share Index   20081017   0.134824588   0.2260067     0.144359518
A Share Index   20081017   0.21088833    0.283976933   0.216667157
A Share Index   20081017   0.250169631   0.331812086   0.232218188
A Share Index   20081017   0.284669979   0.383362093   0.234265366
A Share Index   20081017   0.263535621   0.442943266   0.122024424
A Share Index   20081017   0.3633621     0.543184644   0.187505566
A Share Index   20081017   0.536274538   0.64220513    0.391765599
A Share Index   20081017   0.737250275   0.701805507   0.696736622

IV. THE ANALYSIS OF BP NEURAL NETWORK
The BP neural network algorithm is a supervised learning algorithm. Its main idea is as follows: given the training samples, the back-propagation algorithm adjusts the weights and biases of the network by repeated training so that the output vector is as close as possible to the expected vector. When the sum of squared errors of the network output layer is less than a specified value, training is complete and the weights and biases of the network are saved.
The specific algorithm steps are as follows:
a) Initialization: assign the connection weights $[w]$, $[v]$ and the thresholds $\theta_j$, $r_t$ at random. Calculate the unit outputs of the hidden layer and the output layer of the network from a given pair of input-output patterns.
$$b_j = f\Big(\sum_{i=1}^{n} w_{ij} a_i - \theta_j\Big), \qquad c_t = f\Big(\sum_{j=1}^{p} v_{jt} b_j - r_t\Big)$$
In the formula: $b_j$ is the actual output of the $j$-th neuron in the hidden layer; $c_t$ is the actual output of the $t$-th neuron in the output layer; $w_{ij}$ is the connection weight from the input layer to the hidden layer; $v_{jt}$ is the connection weight from the hidden layer to the output layer.
$$d_t^k = (y_t^k - c_t)\, c_t (1 - c_t), \qquad e_j^k = \Big[\sum_{t=1}^{q} d_t^k v_{jt}\Big] b_j (1 - b_j)$$
In the formula: $d_t^k$ is the adjustable error of the output layer; $e_j^k$ is the adjustable error of the hidden layer.
b) Compute the new connection weights and new thresholds. The formulas are as follows:
$$v_{jt}(n+1) = v_{jt}(n) + \alpha\, d_t^k b_j, \qquad w_{ij}(n+1) = w_{ij}(n) + \beta\, e_j^k a_i^k$$
$$r_t(n+1) = r_t(n) + \alpha\, d_t^k, \qquad \theta_j(n+1) = \theta_j(n) + \beta\, e_j^k$$
In the formula: $\alpha$, $\beta$ are learning coefficients ($0 < \alpha < 1$, $0 < \beta < 1$).
c) Select the next pair of input patterns and train the network repeatedly according to the steps above until the output error meets the training requirements.
In this paper, there are 21 input nodes in the input layer, namely: KDJ.K, KDJ.D, KDJ.J, Volume, turnover, MA.MA1, MA.MA2, MA.MA3, MA.MA4, MA.MA5, MA.MA6, MACD.DIFF, MACD.DEA, MACD.MACD, RSI.RSI1, RSI.RSI2, RSI.RSI3, OBV, BIAS.BIAS1, BIAS.BIAS2, BIAS.BIAS3; there is only one node, named Increase Scope, in the output layer. The transfer function of the hidden layer is the hyperbolic tangent function tansig(n), and the transfer function of the output layer is the linear function purelin(n). The resulting network topology is shown in Figure 1.

[Figure: network diagram with an input layer, a hidden layer and an output layer]
Figure 1. BP neural network topology.

There is only one hidden layer, which contains three nodes. Table 3 shows the final modeling results.
From Table 3, we can see that the input data RSI.RSI1, BIAS.BIAS1, BIAS.BIAS2 and Volume have the largest impact on the output. This leads to a prediction model that describes the specific relationship between input and output, ((X1, X2, X3, X4) | Y). That is, ((RSI.RSI1 (6), BIAS.BIAS1 (6), BIAS.BIAS2 (12), Volume) | increase scope).

TABLE III. THE CORRELATION OF THE INPUT PARAMETERS AND OUTPUT
RSI.RSI1        0.284907
BIAS.BIAS1      0.278854
BIAS.BIAS2      0.108904
Volume          0.0795058
KDJ.J           0.0745735
MACD.MACD       0.0721896
turnover        0.0708421
RSI.RSI2        0.0684929
RSI.RSI3        0.0629037
MA.MA3          0.0502497
MA.MA5          0.0498825
KDJ.D           0.0453716
BIAS.BIAS3      0.0445624
MACD.DEA        0.0293132
KDJ.K           0.0264227
MA.MA6          0.02208
MA.MA4          0.0219186
OBV             0.0105659
MACD.DIFF       0.00946149
MA.MA2          0.00769159
MA.MA1          0.00730739

V. THE ANALYSIS AND EVALUATION OF THE MODEL
To test the accuracy of the neural network model, we selected some representative stocks (600839 Sichuan Changhong, 600547 Shandong Gold, 000800 FAW Car and so on) and collected their data from February 16, 2007 to May 17, 2009. The results show that the model can predict the future price trend well. Results are shown in Table 4.

VI. CONCLUSION
The BP neural network model shows good prediction results: the relative error and absolute error are less than 1%. How to improve and optimize the model to expand the scope of the forecast (for example, by finding a more suitable network, selecting more relevant input and output parameters, or choosing better network parameters) is worthy of further study and exploration.

REFERENCES
[1] Fengjing Shao, Zhongqing Yu. Principle and algorithm of data mining. Sinohydro Press, 2003.
[2] Huadong Qin. Based on neural network forecasting stock market trends. Chinese dissertation database, 2005.
[3] Zhijun Peng. Some new data mining method and its application in Chinese securities market. Chinese dissertation database, 2005.
[4] Yong Liao. Based on Gene Expression Programming and the time series analysis of the price of stock. Chinese dissertation database, 2005.
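As a supplement to Section IV, training steps a) to c) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the logistic sigmoid for both layers (the form implied by the c(1 - c) and b(1 - b) factors in the error terms) rather than tansig and purelin, it adds the thresholds to the net input as biases so that the "+" update rules perform gradient descent, and the function names, toy samples, seed and epoch count are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bp(samples, n_hidden=3, alpha=0.5, beta=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network; return a prediction function."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Step a) random connection weights w, v and thresholds theta, r
    w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    v = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]
    theta = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    r = [rng.uniform(-1, 1) for _ in range(n_out)]

    def forward(a):
        # Assumption: thresholds are added as biases (see the lead-in)
        b = [sigmoid(sum(w[i][j] * a[i] for i in range(n_in)) + theta[j])
             for j in range(n_hidden)]
        c = [sigmoid(sum(v[j][t] * b[j] for j in range(n_hidden)) + r[t])
             for t in range(n_out)]
        return b, c

    for _ in range(epochs):
        for a, y in samples:          # Step c) next input-output pattern
            b, c = forward(a)
            # Adjustable errors of the output layer (d) and hidden layer (e)
            d = [(y[t] - c[t]) * c[t] * (1 - c[t]) for t in range(n_out)]
            e = [sum(d[t] * v[j][t] for t in range(n_out)) * b[j] * (1 - b[j])
                 for j in range(n_hidden)]
            # Step b) new connection weights and thresholds
            for j in range(n_hidden):
                for t in range(n_out):
                    v[j][t] += alpha * d[t] * b[j]
            for t in range(n_out):
                r[t] += alpha * d[t]
            for i in range(n_in):
                for j in range(n_hidden):
                    w[i][j] += beta * e[j] * a[i]
            for j in range(n_hidden):
                theta[j] += beta * e[j]

    return lambda a: forward(a)[1]


def sse(predict, samples):
    """Sum of squared errors of the output layer over all samples."""
    return sum((y[t] - c) ** 2
               for a, y in samples
               for t, c in enumerate(predict(a)))


if __name__ == "__main__":
    # Hypothetical toy patterns with inputs and targets already in [0, 1]
    toy = [([0.0, 0.0], [0.1]), ([0.0, 1.0], [0.5]),
           ([1.0, 0.0], [0.5]), ([1.0, 1.0], [0.9])]
    print("SSE before:", sse(train_bp(toy, epochs=0), toy))
    print("SSE after: ", sse(train_bp(toy), toy))
```

In the paper's setting, each sample would hold the 21 normalized indicator values as input and the increase scope as the single target; a stopping rule based on `sse` falling below a chosen value would replace the fixed epoch count used here.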
TABLE IV. THE TABLE OF FORECAST THE RESULTS

In Table 4 all the parameters are normalized.

Forecast Date  Stock Name                  RSI.RSI1     BIAS.BIAS1   BIAS.BIAS2   Volume       Forecast growth  Actual growth
2009/2/12      Sichuan Changhong           0.839232043  0.083747894  0.166373273  0.134738828  +3.0             +3.29
2009/2/12      Shandong gold               0.14558762   0.276729071  0.195829407  0.427022     +5.1             +5.53
2009/2/12      FAW Car                     0.125907137  0.245447926  0.180759404  0.42245      +7.0             +6.8
2009/2/12      Aluminum Co. of China Ltd   0.533272941  0.473452864  0.16101735   0.152381225  +5.44            +5.64
2009/2/12      Dong-E E-Jiao               0.43319235   0.119950476  0.147171098  0.31889076   +7.1             +6.81
2009/2/12      China Shipbuilding          0.593508154  0.1173336    0.184561322  0.1722405    +9.99            +10.01
2009/2/12      Beijing Urban Construction  0.246793743  0.131826315  0.18131229   0.351252708  +9.72            +9.99
2009/2/12      Qingdao Haier               0.833591976  0.176967836  0.250644348  0.355755692  +7.1             +6.52
2009/2/12      QingDao beer                0.285308362  0.153540384  0.17085949   0.461547399  +4.52            +4.31
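The gap between forecast and actual growth in Table 4 can be checked with a short Python sketch such as the one below. The growth values are copied from the table; treating them as percentages and measuring the error as their absolute difference are assumptions, and the variable and function names are hypothetical.

```python
# (stock name, forecast growth, actual growth) copied from Table 4
rows = [
    ("Sichuan Changhong", 3.0, 3.29),
    ("Shandong gold", 5.1, 5.53),
    ("FAW Car", 7.0, 6.8),
    ("Aluminum Co. of China Ltd", 5.44, 5.64),
    ("Dong-E E-Jiao", 7.1, 6.81),
    ("China Shipbuilding", 9.99, 10.01),
    ("Beijing Urban Construction", 9.72, 9.99),
    ("Qingdao Haier", 7.1, 6.52),
    ("QingDao beer", 4.52, 4.31),
]


def absolute_errors(table):
    """Absolute difference between forecast and actual growth per stock."""
    return {name: abs(forecast - actual) for name, forecast, actual in table}


errors = absolute_errors(rows)
worst = max(errors, key=errors.get)          # stock with the largest gap
mean_error = sum(errors.values()) / len(errors)

if __name__ == "__main__":
    for name, err in errors.items():
        print(f"{name:28s} {err:.2f}")
    print("largest gap:", worst, f"{errors[worst]:.2f}")
    print("mean gap:   ", f"{mean_error:.2f}")
```

Under these assumptions, the largest gap in the table is 0.58 (Qingdao Haier, forecast +7.1 against actual +6.52), so every absolute difference is below one unit of growth.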
