
ELSEVIER

Journal of Materials Processing Technology 60 (1996) 399-404

Development of prediction model for mechanical properties of batch annealed thin steel strip by using artificial neural network modelling
P. Myllykoski, J. Larkiola, J. Nylander*

Helsinki University of Technology, Laboratory of Materials Processing and Heat Treatment, Vuorimiehentie 2 A, 02150 Espoo, Finland. *Rautaruukki Oy, Thin Sheet Division, 13300 Hämeenlinna, Finland.

Abstract

It is important to predict the mechanical properties of thin steel strip when it is produced by rolling. For this purpose we have used artificial neural networks (ANN) and the data acquisition system of Rautaruukki Hämeenlinna Works. This paper shows that it is possible to predict accurately the mechanical properties of thin steel strip by using ANN modelling and the measurement data of the process. The prediction model can be used in different ways. For example, the effects of process parameters can be evaluated and the controllable parameters can be changed in such a way that the process yields the desired mechanical properties.

Keywords: Artificial neural networks, modelling, mechanical properties, thin steel strip.

1. Introduction

High product quality is an essential requirement in all industrial processes. To achieve that, a thorough understanding of the manufacturing process is essential. Physical models form the basis according to which parts of the process can be comprehended. In spite of the general nature of physical models, some statistical correction factors are usually needed to fit the model to a particular process. Furthermore, it is very difficult to form a physical model of a very complex and multidimensional process. Good results have been reported on the application of neural computing to the modelling of complex processes [1, 2]. The ability to model an arbitrary process from examples is arguably the most important aspect of ANNs. An ANN is able to form an internal representation of the data given to it by adjusting parameters that control the mapping between inputs and outputs. This paper describes the development process of an ANN model for the prediction of mechanical properties of batch annealed thin steel strips. In what follows, the formulation of the model and the steps taken to achieve the desired performance are described.

0924-0136/96/$15.00 © 1996 Elsevier Science S.A. All rights reserved
PII 0924-0136(96)02361-8

2. Experimental procedures

Measured data of the process at Rautaruukki's Hämeenlinna and Raahe Works are collected continuously. At the moment the measurements are averaged and focused to a coil of cold rolled strip. The process flow starts at the Raahe Works, where the steelmaking and hot rolling take place. The hot rolled coils are transported to Hämeenlinna Works to be pickled, cold rolled, batch annealed and temper rolled. The hot dip galvanised grades take a different production route at Hämeenlinna but they are omitted in this study. A test sample from each coil for the uniaxial tension test is taken at the temper mill, or at one of the following cutting lines. The data for this study were collected from a period of approximately 6 months. The measured data from the manufacturing process are focused to one uniaxial tension test sample. They included the concentrations of 14 alloying elements, two temperature measurements of the hot rolling stage, strip dimensions, the reduction taken at the cold rolling stage, batch annealing parameters and a code describing the sample location within a strip. The desired outputs of the model were yield strength, tensile strength and tensile strain.

2.1. Neural network model

A multilayer perceptron (MLP) type network was used as a neural network simulator. The MLP used the back propagation learning rule to find an optimum mapping between inputs and outputs. The theory behind the MLP with the back propagation learning rule is well documented in numerous publications [3, 4] and will be omitted here.



However, it may be useful to present schematically the basic ideas and the function of such an ANN to stress its simplicity, Figure 1 and Figure 2.

[Diagram: inputs x_1, ..., x_K and a bias unit feed the input layer, followed by the hidden layer and the output layer.]

Figure 1. Schematic illustration of a three layer perceptron network.

$$\mathrm{In} = \sum_{i=1}^{K} x_i \omega_i, \qquad \mathrm{Out} = \left(1 + e^{-\mathrm{In}}\right)^{-1}$$

Figure 2. The function of one perceptron unit during the calculation of outputs.

The information needed to calculate outputs from the inputs is stored in the weight matrix $[\omega_i]$ of the MLP. Theory does not limit the amount of hidden units or the number of hidden layers, Figure 1. However, it has been proven that one hidden layer is sufficient to represent an arbitrary mapping between inputs and outputs [5]. The MLP simulator (MEFNET by MEFOS, Luleå, Sweden) used in this study was restricted to one hidden layer.

3. Results and discussion

The measure of an ANN's performance is the ability to generalise. That is tested by feeding the network with data not included in the training data. The test error is typically reduced as the training proceeds. However, if the training data are measurements from a real process, they are always distorted by measurement errors. Hence, if the training is continued too far, the ANN begins to learn the distortions of the training data. An immediate result is an increase in the test error. So to ensure the best possible generalisation, the training must be interrupted when the generalisation is at its best [6]. The ANN developer has to choose the number of hidden layers and nodes. There are some rules of thumb (1), but experimentation with different topologies is advisable. The network topology is dictated by the available data. A lot of measurement samples are required if large networks are needed.

$$N = \frac{M}{S} \qquad (1)$$

N is the number of samples, M is the number of weights and S is the desired accuracy.

3.1. The effect of hidden neuron amount on the prediction accuracy

The selection of the number of hidden neurons is very significant, considering the calculation time during training and prediction. The learning rate and momentum term also have a strong effect on the convergence during training. The weight terms are selected randomly before training. Thus the starting point on the error surface in the hyperspace determined by the weights is always unique, and the search for an error minimum is likely to result in a different prediction error each time. To determine the difference between topologies that differ in the amount of hidden neurons, one has to train them from many different initial weights and evaluate the results statistically. One such experiment was done for the prediction of yield strength. Four network topologies with different hidden neuron amounts were used. The control parameters of network learning were kept constant and 200 iterations were used on all occasions. Training and testing with identical data sets were done from 10 different initial weights. The results are presented in Figure 3.

[Plot: mean RMS error versus number of hidden nodes (10 to 35).]

Figure 3. The effect of hidden neuron amount on the mean prediction accuracy of yield strength, with error bars indicating standard deviation of RMS error.
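The statistical comparison of topologies described in Section 3.1 can be sketched in a few lines. The following is only an illustration under stated assumptions, not the MEFNET simulator used in the study: it is a plain one-hidden-layer perceptron with logistic units and online back propagation, trained on synthetic toy data. The function names (`train`, `compare_topologies`, `make_toy_data`), the learning rate, the weight initialisation range and the toy target function are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def sigmoid(z):
    # Logistic activation, clamped so that math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # One hidden layer; the last weight of every unit acts on a constant bias input.
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, out


def rms_error(data, w_hidden, w_out):
    squared = [(forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data]
    return math.sqrt(sum(squared) / len(squared))


def train(data, n_hidden, seed, iterations=200, lr=0.5):
    # Online back propagation starting from random initial weights (set by seed).
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(iterations):
        for x, t in data:
            hidden, out = forward(x, w_hidden, w_out)
            d_out = (out - t) * out * (1.0 - out)
            # Hidden deltas are computed before the output weights change.
            d_hid = [d_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                     for j in range(n_hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] -= lr * d_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * d_hid[j] * v
    return w_hidden, w_out


def compare_topologies(train_set, test_set, hidden_sizes, n_restarts=10, iterations=200):
    # Mean and standard deviation of the test RMS error for each topology.
    summary = {}
    for n_hidden in hidden_sizes:
        errors = []
        for seed in range(n_restarts):
            w_hidden, w_out = train(train_set, n_hidden, seed, iterations)
            errors.append(rms_error(test_set, w_hidden, w_out))
        summary[n_hidden] = (statistics.mean(errors), statistics.stdev(errors))
    return summary


def make_toy_data(n, seed):
    # Synthetic stand-in for process data: two inputs in [0, 1], target inside (0, 1).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        target = 0.5 + 0.3 * math.sin(3.0 * x[0]) * x[1] + rng.gauss(0.0, 0.02)
        data.append((x, target))
    return data


train_set = make_toy_data(30, seed=1)
test_set = make_toy_data(30, seed=2)
# Fewer restarts and iterations than in the paper, only to keep the demo quick.
summary = compare_topologies(train_set, test_set, [2, 4, 8], n_restarts=5, iterations=50)
for n_hidden, (mean_rms, std_rms) in summary.items():
    print(f"{n_hidden:2d} hidden nodes: RMS {mean_rms:.4f} +/- {std_rms:.4f}")
```

The defaults of `compare_topologies` mirror the 10 initial weight sets and 200 iterations mentioned in the text. Reporting the mean together with the standard deviation over restarts is what makes two topologies comparable, since a single run mostly reflects its random starting point.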


The flexibility of the MLP is determined by the amount of hidden neurons. Large networks are able to model more complex and multidimensional vector spaces with higher accuracy than those with few hidden nodes. However, the error surface becomes more complex with an increasing number of hidden nodes. The "landscape" of the error surface is probably more wrinkled, and the training is less likely to find a global minimum or even a very good local minimum of the RMS error. Still, given enough training cycles, a complex MLP may find a solution that surpasses simple architectures in prediction accuracy. On the other hand, the calculation times for training become very long if the number of hidden nodes is high. Simple MLPs are also more certain to reach a relatively low error minimum.

3.2. The evolution of prediction accuracy during training

The RMS error for the training data is reduced as the training continues, though the improvement of one iteration becomes very small after a while, and sometimes the RMS error may begin to increase, Figure 4.

[Plot: horizontal axis, number of training cycles.]

Figure 4. The effect of training iterations on the test error of yield strength.

However, the prediction accuracy for independent data does not improve with training accuracy indefinitely in any case. The test error starts to increase after some number of iterations, even if the training error continues to decrease. Very different amounts of iterations to achieve the best prediction accuracy have been proposed [1, 2]. The range of the optimum number of training cycles is between 10 and 10^4. As Figure 4 shows, the prediction error may have more than one error minimum (case 2). Sometimes a very poor prediction accuracy that oscillates between error minimums and maximums in the beginning of the training may later drop to a considerably lower error level.

There are no definite rules as to how the training and prediction errors correlate. However, it is generally accepted that stopping the training before the prediction error for independent data begins to rise is a good way to ensure good generalisation. Thus, rather than observing the evolution of the training error, one should evaluate the prediction error for independent data during training. Since the prediction error may have several error minimums, the training should be continued as far as possible. The available time and desired accuracy are the most important factors when sufficient training times are considered.

3.3. Learning rate

The learning rate factor controls the training process by affecting the amount of change in weights in each weight update. The learning rate has a strong effect on the optimum number of training cycles. The higher the learning rate, the sooner the optimal level of generalisation is reached, Figure 5.

[Plot: horizontal axis, training iterations.]

Figure 5. The effect of learning rate on the optimum number of training iterations. Mixed denotes different learning rates for each layer of weights. The other shown learning rates were used for all layers in the three cases.

The learning rate cannot be set to too high a level, or the learning becomes unstable. A low learning rate ensures a smoother and more stable learning behaviour, but it also requires long training times. Experimentation with different learning rates is usually needed to achieve a fair learning performance. The use of different learning rates for each layer of weights was found to be the best solution: a large value in the upstream of the calculation and a low value in the downstream. The test error oscillated on some occasions, thus leading to a large number of training iterations (~5000) before an acceptable error level was reached. However, the mixed learning rates produced the best performance on average.
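The practice discussed in Sections 3.2 and 3.3, namely evaluating the error for independent data after every training iteration while using a separate learning rate for each layer of weights, might be sketched as follows. This is a minimal illustration under assumptions, not the simulator used in the study. The names `train_monitored` and `make_toy_data`, the toy data and all parameter values are hypothetical. "Upstream" is interpreted here as the input-to-hidden weights and "downstream" as the hidden-to-output weights, which is an assumption about the authors' wording.

```python
import copy
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so that math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # The last weight of every unit acts on a constant bias input.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, out


def rms_error(data, w_hidden, w_out):
    squared = [(forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data]
    return math.sqrt(sum(squared) / len(squared))


def train_monitored(train_set, test_set, n_hidden, lr_hidden, lr_out,
                    momentum=0.5, max_iters=100, seed=0):
    rng = random.Random(seed)
    n_in = len(train_set[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    # Previous weight changes, needed for the momentum term.
    prev_hidden = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    prev_out = [0.0] * (n_hidden + 1)
    history = []
    best = {"iteration": 0, "test_rms": float("inf"), "weights": None}
    for iteration in range(1, max_iters + 1):
        for x, t in train_set:
            hidden, out = forward(x, w_hidden, w_out)
            d_out = (out - t) * out * (1.0 - out)
            d_hid = [d_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                     for j in range(n_hidden)]
            for j, v in enumerate(hidden + [1.0]):
                prev_out[j] = -lr_out * d_out * v + momentum * prev_out[j]
                w_out[j] += prev_out[j]
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    prev_hidden[j][i] = (-lr_hidden * d_hid[j] * v
                                         + momentum * prev_hidden[j][i])
                    w_hidden[j][i] += prev_hidden[j][i]
        # Evaluate on independent data after every pass and keep the best weights.
        test_rms = rms_error(test_set, w_hidden, w_out)
        history.append(test_rms)
        if test_rms < best["test_rms"]:
            best = {"iteration": iteration, "test_rms": test_rms,
                    "weights": copy.deepcopy((w_hidden, w_out))}
    return best, history


def make_toy_data(n, seed):
    # Synthetic stand-in for process data: two inputs in [0, 1], target inside (0, 1).
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        target = 0.5 + 0.3 * math.sin(3.0 * x[0]) * x[1] + rng.gauss(0.0, 0.02)
        data.append((x, target))
    return data


train_set = make_toy_data(30, seed=1)
test_set = make_toy_data(30, seed=2)
best, history = train_monitored(train_set, test_set, n_hidden=4,
                                lr_hidden=0.8, lr_out=0.2)
print(f"lowest test RMS {best['test_rms']:.4f} at iteration {best['iteration']}")
```

Training runs for the full `max_iters` rather than stopping at the first rise of the test error, because the text notes that the prediction error may have several minima. The weights at the lowest test error seen so far are stored, so the best model can be recovered afterwards.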


3.4. Analysis of prediction accuracy

The product mixture of Rautaruukki Hämeenlinna Works is very unevenly distributed between different grades. Two grades of the cold rolled products account for 70% of all. Yet the prediction accuracy should be as good for every grade. It may seem a very difficult task, but fortunately the MLP seems to be able to produce virtually equal prediction accuracies even for the grades with only a few samples available, without external help in terms of favouring the rarer grades, Figure 6. This is probably due to the relative similarity of the grades and the flexibility of the MLP model.

[Plot: average error and standard deviation of error versus amount of samples (logarithmic axis).]

Figure 6. The gradewise prediction accuracy as a function of sample amount.

The cold rolled grades may be divided into two categories, carbon-manganese (C-Mn) and micro alloyed steels. Micro alloyed grades are produced mainly with niobium and titanium. The two categories are very different from the metallurgical point of view. However, both kinds of grades were included in the training and testing data, without significant differences in the prediction error level.

Figure 7. The prediction accuracy for the test data, when sorted according to the measured yield strength. Insufficient training data.

Figure 7 illustrates an interesting behaviour of the MLP, considering the predictability of C-Mn and micro alloyed steels with the same model. It is a common tendency for fitting functions to overestimate the low end and to underestimate the high end of the predicted range. The prediction errors in Figure 7 are sorted according to the desired yield strength in ascending order. Three clusters of prediction errors seem to form. Each cluster is aligned in the same orientation with respect to the zero prediction error. The prediction error grows with the desired yield strength inside the cluster. The only difference between the clusters is the amount of available samples and the range of prediction errors.

This behaviour would suggest that the MLP has interpreted the data space to consist of three separable regions. Each region seems to have a separate submodel inside the one MLP. The rightmost cluster in Figure 7 consists of the high strength micro alloyed steels, but it is not quite clear what the difference is between the left and centre clusters. However, this means that the MLP has been able to formulate three separate models inside of one from the examples presented in the training data. Considering the sufficiency of one model to represent the whole input space, it may be concluded that one model is sufficiently accurate.

Figure 8. The prediction accuracy for the test data, when sorted according to the measured yield strength. Sufficient training data.

Figure 8 shows a similar case to Figure 7, only with more representative data used in the training and testing. The middle cluster from Figure 7 has disappeared, and the right cluster is just visible. The correlation between prediction error and desired yield strength has almost diminished, and the overall prediction accuracy is better.
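The relation between prediction error and measured yield strength that Figures 7 and 8 display could be checked numerically along the following lines. This is a hedged sketch only. The error is taken here as measured minus predicted, which is an assumption chosen so that underestimation at the high end gives a positive error. The function names and the example numbers are invented for illustration and are not data from the study.

```python
import math


def sorted_errors(measured, predicted):
    # Pairs of (measured value, prediction error), ascending by measured value.
    # Error convention (an assumption): measured minus predicted.
    pairs = [(m, m - p) for m, p in zip(measured, predicted)]
    return sorted(pairs, key=lambda pair: pair[0])


def pearson(xs, ys):
    # Pearson correlation coefficient of two equally long sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented yield strengths (MPa) and a model that pulls predictions towards the mean,
# i.e. it overestimates the low end and underestimates the high end.
measured = [300.0, 180.0, 320.0, 200.0, 220.0]
mean_strength = sum(measured) / len(measured)
predicted = [mean_strength + 0.8 * (m - mean_strength) for m in measured]

pairs = sorted_errors(measured, predicted)
r = pearson([m for m, _ in pairs], [e for _, e in pairs])
print(f"correlation between measured strength and error: {r:.3f}")
```

A coefficient close to zero would correspond to the situation described for Figure 8, while a clearly positive value indicates the tilted error clusters of Figure 7.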

Figure 9. The effects of data division on the prediction accuracy of one steel grade (5022). "Others" shows the case with 5022s removed from both training and testing. The rest have 5022s in testing and training data as labelled.

Another result that supports the conclusion to favour one model for all grades is presented in Figure 9. One of the two most produced grades seemed to be clearly more inaccurately predictable than the other grades on average. It is a mildly micro alloyed formable construction steel. The training and testing data were divided in several ways to investigate the possibilities to improve the prediction accuracy of that grade. First the 5022s were extracted to form separate training and testing files, but compared with the case where training included all grades the prediction accuracy was practically unchanged. Next the original training and testing data sets were divided into C-Mn and micro alloyed steels with a criterion of 0.01% content of niobium or titanium. However, practically the same level of prediction accuracy was obtained for both sets of 5022s. The division of data did not improve the prediction accuracy for the grade 5022.

The reason for the inaccuracy of the model in the case of grade 5022 stems from the sample taking practice, which is dictated by the production route of the product. This rule applies to all grades. The samples for tensile tests are taken from either end of the strip or from the middle. A sample taken from the middle of the strip is most representative, considering the average mechanical properties of the strip. A middle sample can only be taken if the coil is divided from the middle or if it is cut into sheets. Unfortunately a middle sample is a rarity in the case of grade 5022, Figure 10. The mechanical properties of the strip's beginning may be very different from the rest of the strip, due to overheating during annealing. Since other inputs for the MLP are averages of the whole strip, it is natural that the predictability of mechanical properties is less accurate for the grade 5022.
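The division of the data into C-Mn and micro alloyed steels by the 0.01% niobium or titanium criterion mentioned above could be expressed as follows. This is only a sketch: the dictionary keys, the sample values and the choice of "greater than or equal to" at the threshold are assumptions, since the text does not state how the boundary case was handled.

```python
def is_micro_alloyed(nb_pct, ti_pct, threshold=0.01):
    # Assumed rule: micro alloyed if niobium OR titanium reaches the threshold (in %).
    return nb_pct >= threshold or ti_pct >= threshold


def split_by_alloying(samples, threshold=0.01):
    # Returns (c_mn, micro_alloyed) lists; each sample is a dict with "Nb" and "Ti" in %.
    c_mn, micro_alloyed = [], []
    for sample in samples:
        if is_micro_alloyed(sample["Nb"], sample["Ti"], threshold):
            micro_alloyed.append(sample)
        else:
            c_mn.append(sample)
    return c_mn, micro_alloyed


# Invented compositions, for illustration only.
samples = [
    {"coil": "A", "Nb": 0.000, "Ti": 0.002},
    {"coil": "B", "Nb": 0.030, "Ti": 0.001},
    {"coil": "C", "Nb": 0.002, "Ti": 0.015},
]
c_mn, micro_alloyed = split_by_alloying(samples)
print([s["coil"] for s in c_mn], [s["coil"] for s in micro_alloyed])
```

Each of the two resulting sets would then be used to train and test its own model, which is the comparison reported for grade 5022.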
Table 1. Absolute prediction accuracy of the mechanical properties for all steel grades.

Average prediction error         Absolute    Relative
Yield strength [MPa/%]             8.57        4.31
Tensile strength [MPa/%]           6.06        1.86
Tensile strain (A80) [%/%]         1.40        3.43
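The paper does not spell out how the absolute and relative average errors of Table 1 were computed. One plausible reading, given here purely as an assumption, is the mean absolute difference between measured and predicted values, and the mean of that difference relative to the measured value, expressed in percent. The function name and the numbers below are invented for the example.

```python
def average_errors(measured, predicted):
    # Returns (mean absolute error, mean relative error in percent).
    # Assumed definitions; the source does not state its formulas.
    abs_errors = [abs(p - m) for m, p in zip(measured, predicted)]
    rel_errors = [e / abs(m) for e, m in zip(abs_errors, measured)]
    n = len(abs_errors)
    return sum(abs_errors) / n, 100.0 * sum(rel_errors) / n


# Invented yield strength values in MPa, for illustration only.
measured = [200.0, 400.0]
predicted = [210.0, 380.0]
mean_abs, mean_rel = average_errors(measured, predicted)
print(f"absolute: {mean_abs:.2f} MPa, relative: {mean_rel:.2f} %")
```

Under this reading the absolute column carries the unit of the property itself, and the relative column is dimensionless, which matches the paired units in the table header.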

[Bar chart: proportion of samples (0 to 100) by sample type 1, 3 and 5.]

Figure 10. Proportions of the tensile test sample types for the grade 5022 and the rest of the grades. "1" denotes beginning, "3" middle and "5" end of the strip.

When prediction accuracies are evaluated in cases where the desired responses of the model are distorted by deviations in measurement data, it is difficult to draw definite conclusions about the model's validity. The model may be more accurate than the difference between calculated and measured results would suggest. "Bad" data may be sufficient to form simple models, but the modelling of complex dependencies probably requires more accurate data. The prediction accuracy has been shown to be dependent on many parameters of the MLP as well as the parameters of the process. Many combinations have been tested. The best (so far) prediction results are summarised in Table 1.

4. Conclusions

As a result of this study, it was possible to determine the optimal parameter settings for the training of the MLP. Development times for new models with new data are probably greatly shortened. It was discovered that experimentation with different settings is useful practice. Each case to be trained is unique, and general rules of thumb give only a starting point. Available time sets the limits for training practice. Conservative parameter settings, that is a low learning rate and few hidden neurons, ensure a stable learning behaviour. However, if development time is not a limiting factor, experimentation with a higher learning rate and a larger network may result in a more accurate prediction model.

The results showed that one MLP is able to model inhomogeneous data with good accuracy. It forms submodels to represent data clusters that are logically different. A clear example of this was the extraction of submodels for C-Mn and micro alloyed steels. The MLP probably divides the data space into subspaces that are modelled individually. Behaviour of the model is thus predictable only within and close to these subspaces. The areas between subspaces but within the whole data space form a grey area where the response of the model is unpredictable.

A prediction model can give information on the mechanical properties along the process chain, that is, what the properties are likely to be if the next phases go as usual. This information can be utilised to correct the following process parameters to obtain the desired properties. External knowledge has to be available to do the corrections, because the present MLP model is not able to assist the decisions. The model can be used to calculate the expected mechanical properties, but it is not yet meant to work as an optimisation model.

5. Future work

The next step will be the formulation of an optimisation model, which uses the prediction model to find a better combination of process parameters. The previously described behaviour of the MLP to form submodels has to be taken into account. Some kind of grey area detection has to be available to give credibility to the predictions. Intensive research is being done to solve the problem with the aid of self organising maps (SOM).

References

[1] Parlos, A., Chong, K., Atiya, A., Application of Recurrent Multilayer Perceptron in Modeling Complex Process Dynamics, IEEE Transactions on Neural Networks, Vol. 5, No. 2, March 1994.

[2] Boger, Z., Experience in Developing and Analyzing Models of Industrial Plants by Large Scale Artificial Neural Networks, Proceedings of ICANN '95, October 9-13, 1995, Paris, France.

[3] Freeman, J., Skapura, D., Neural Networks: Algorithms, Applications and Programming Techniques, Addison-Wesley Publishing Company Inc., USA, 1991.

[4] Rumelhart, D., Hinton, G. and Williams, R., Learning Internal Representations by Error Propagation, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, 1986.

[5] Cybenko, G., Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2, pp. 303-314, 1989.
