dieter.haerle @k-ai.at
Department of Computing, Imperial College London, UK
Abstract: We introduce CompNN, a compositional method for the construction of a neural network (NN) capturing the dynamic behavior of a complex analog multiple-input multiple-output (MIMO) system. CompNN first learns, for each input/output pair (i, j), a small-sized nonlinear auto-regressive neural network with exogenous input (NARX) representing the transfer function h_ij. The training dataset is generated by varying input i of the MIMO only. Then, for each output j, the transfer functions h_ij are combined by a time-delayed neural network (TDNN) layer, f_j. The training dataset for f_j is generated by varying all MIMO inputs. The final output is f = (f_1, ..., f_n). The NNs' parameters are learned using the Levenberg-Marquardt back-propagation algorithm. We apply CompNN to learn an NN abstraction of a CMOS band-gap voltage-reference circuit (BGR). First, we learn the NARX NNs corresponding to the trimming, load-jump and line-jump responses of the circuit. Then, we recompose the outputs by training the second-layer TDNN structure. We demonstrate the performance of our learned NN in the transient simulation of the BGR by reducing the simulation time by a factor of 17 compared to transistor-level simulations. CompNN allows us to map particular parts of the NN to specific behavioral features of the BGR. To the best of our knowledge, CompNN is the first method to learn the NN of an analog integrated circuit (MIMO system) in a compositional fashion.

I. INTRODUCTION

One challenging issue in the pre-silicon verification process of recently produced analog integrated circuits (ICs) is the development of high-performance models for carrying out time-efficient simulations. Transistor-level fault simulations of a single analog IC can take up to one or two weeks to complete. As a result, over the past years, several attempts to develop fast behavioral models of analog ICs have been investigated. Examples include SystemC, Verilog HDL, Verilog AMS and Verilog-A models, which in principle can realize very accurate models [1]-[4]. However, the development of such models is not automated, and the associated human effort is considerable [1]. Moreover, this approach is unlikely to scale up to large libraries of existing analog components. Another example is real number modeling (RNM). In this method, analog parts of a mixed-signal IC are functionally modeled by real values, which are used in top-level system-on-chip verification [5]. RNMs are fast and cover a large range of circuits. However, for analog circuits including continuous-time feedback or detailed RC filter effects, RNM is not recommended [5]. Moreover, RNM is not appropriate for circuits that are sensitive to nonlinear input-output (I/O) impedance interaction.

In this paper we propose an alternative machine-learning approach for automatically deriving neural-network (NN) abstractions of integrated circuits, up to a prescribed tolerance of the behavioral features. NN modeling of electronic circuits has recently been used in electromagnetic compatibility (EMC) testing, where the authors modeled a band-gap reference circuit (BGR) by utilizing an echo-state neural network [6]. The developed NN model has shown a reasonable time performance in transient simulations; however, since the model is coded in Verilog-A, the simulation speed-up is limited. In [7], the authors used a novel nonlinear auto-regressive neural network with exogenous input (NARX) for modeling the power-up behavior of a BGR. They demonstrated attractive improvements in the time performance of transient simulations of the analog circuit within the Cadence AMS simulator by using this NARX model.

In the present study, we employ a compositional approach, CompNN, for learning the overall time-domain behavior of a complex multiple-input multiple-output (MIMO) system. In a first step, CompNN learns, for each input i and each output j, a small-sized nonlinear auto-regressive NN with exogenous inputs (NARX) representing the transfer function h_ij from i to j. The learning dataset for h_ij is generated by varying only input i of the MIMO system and keeping all the other inputs constant. In a second step, for each output j, the transfer functions h_ij learned in Step 1, one for each input i, are combined by a (possibly nonlinear) function f_j, which is learned by employing another NN layer. The training dataset in this case is generated by applying all the inputs at the same time to the MIMO system. Once we have constructed f_j for each output j, the overall output function is obtained as f = (f_1, ..., f_n). We evaluate our approach by modeling the main time-domain behavioral features of a CMOS band-gap voltage-reference circuit. We initially extract such features from the BGR circuit by using our I/O decomposition method. Consequently, we define trimming, load jump and line jump as the main behavioral features of the circuit to be modeled. Individual small-sized NARX networks are designed and trained in order to model the BGR output responses. We recompose the
As a consequence, modeling of the time-domain features requires powerful nonlinear system identification techniques and solutions. A nonlinear auto-regressive neural network with exogenous input (NARX NN) appears to be a suitable framework for deriving approximations of the BGR, up to a prescribed maximum error. It has been previously demonstrated that a recurrent NARX NN topology consisting of only seven neurons and three input and output time-delay components is able to precisely reproduce the turn-on behavior of the circuit [7].
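The two-step CompNN scheme described in the introduction, per-pair models h_ij recombined into per-output functions f_j, can be sketched as follows. This is a minimal Python illustration with hypothetical stand-in models, not the authors' MATLAB implementation; the gains and the sum used as f_1 are arbitrary choices for the example.

```python
# Illustrative CompNN composition (hypothetical stand-in models): each h_ij
# maps the history of input i to a contribution for output j; each f_j then
# recombines those contributions into output j.

def make_h(gain):
    """Stand-in for a trained per-pair NARX model h_ij."""
    return lambda x_hist: gain * x_hist[-1]  # uses only the latest sample

def compose(h_row, f_j):
    """Output j = f_j(h_1j(x_1 history), ..., h_mj(x_m history))."""
    return lambda histories: f_j([h(x) for h, x in zip(h_row, histories)])

# Two inputs, one output: h_11 and h_21 recombined by a simple sum as f_1.
f1 = compose([make_h(0.5), make_h(2.0)], sum)
result = f1([[1.0, 2.0], [0.0, 3.0]])  # 0.5*2.0 + 2.0*3.0 = 7.0
print(result)
```

In the paper the recombiner f_j is itself a trained TDNN layer; the plain sum above only shows where such a function slots into the composition.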
In this paper, we use the NARX architecture to model, in addition, the trimming, load-jump and line-jump behaviors of the BGR. The output of the network is constructed from the time-delayed components of the input signal X(t) and output signal Y(t) (see for example [9]):

Y(t) = f(X(t-1), X(t-2), ..., X(t-n_x), Y(t-1), Y(t-2), ..., Y(t-n_y)).   (1)

The factors n_x and n_y define the input and output delays, that is, the number of discrete time steps within the input and output histories that the component has to remember in order to properly predict the next value of the output [10]. n = n_x + n_y is the number of input nodes. The size of the hidden layer is highly dependent on the num-

Fig. 2. NARX neural network architecture. Note that the network realizes a recurrent topology where the output is fed back into the input layer and causes further refinements on the predicted output signal Y(t).

TABLE II
TRANSIENT SIMULATIONS PERFORMED FOR THE TRAINING-DATA COLLECTION

Simulation | Simulation time | CPU time | Input           | Output  | # of samples
Trimming   | 100 µs          | 1.4 s    | Trimming inputs | Vout,1V | 695
Load jump  | 540 µs          | 1.3 s    | Load profile    | Vout,1V | 433
Line jump  | 200 µs          | 1 s      | VDD             | Vout,1V | 501
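A one-step NARX predictor in the sense of Eq. (1) can be sketched as follows. This is a minimal Python/NumPy illustration; the randomly initialized seven-neuron tanh layer is a hypothetical stand-in for the trained map f, not the paper's fitted network.

```python
import numpy as np

# One-step NARX prediction per Eq. (1): Y(t) is computed from the last n_x
# input and n_y output samples. The 7-neuron hidden layer with random
# weights is a stand-in for the trained map f.
rng = np.random.default_rng(0)
n_x, n_y = 3, 3
W = rng.normal(size=(7, n_x + n_y))  # hidden layer: 7 neurons
v = rng.normal(size=7)               # linear output layer

def narx_step(x_hist, y_hist):
    """Predict Y(t) from X(t-1..t-n_x) and Y(t-1..t-n_y)."""
    z = np.concatenate([x_hist[-n_x:], y_hist[-n_y:]])
    return float(v @ np.tanh(W @ z))

# Closed-loop simulation: each prediction is fed back into the output
# history, realizing the recurrent topology of Fig. 2.
x = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))
y = [0.0] * n_y                      # seed output history
for t in range(n_x, len(x)):
    y.append(narx_step(x[:t], np.array(y)))
print(len(y))
```

Feeding the prediction back into the output history, rather than using measured outputs, is what makes the trained block usable as a self-contained simulation model.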
(Fig. 3 plots: training MSE over 23, 16 and 32 epochs, respectively; error histograms; model responses and per-sample errors.)
Fig. 3. Network performance of the trimming, load-jump and line-jump NARX behavioral models. A, B and C display the performance of the NARX neural-network model of trimming, load jump and line jump, respectively, throughout the training process. The MSE is reduced drastically by each training step. In all three cases, the process terminated as soon as the validation-dataset error stopped descending for 6 consecutive epochs. D, E and F show the error histogram of the training samples for the NARX model of the trimming, load-jump and line-jump behavior, respectively. Note that most of the instances' errors are close to the zero-error line in each case. G, H and I represent the output of the band-gap circuit together with its neural-network response for the trimming, load-jump and line-jump behaviors, respectively. They also show the generated output error per sample.
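The early-stopping rule mentioned in the caption, terminating once the validation error has not decreased for 6 consecutive epochs, can be sketched as follows. The error values below are synthetic, for illustration only, not taken from the paper's training runs.

```python
# Early-stopping rule: terminate once the validation error has failed to
# decrease for `patience` consecutive epochs.
def stop_epoch(val_errors, patience=6):
    best, stall = float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, stall = err, 0    # improvement: reset the stall counter
        else:
            stall += 1              # no improvement this epoch
            if stall >= patience:
                return epoch        # terminate training here
    return len(val_errors) - 1      # ran out of epochs without triggering

# Error descends for 4 epochs, then plateaus: training stops at epoch 9,
# i.e. 6 epochs after the last improvement at epoch 3.
errors = [1.0, 0.5, 0.3, 0.2] + [0.25] * 8
print(stop_epoch(errors))
```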
method, which is a modified version of the Gauss-Newton training algorithm, results in fast convergence of the gradient to its minimum, since it does not require calculation of the Hessian matrix. We initially define a cost function as follows:

E(w, b) = (1/2) Σ_{k∈K} (f(w, b)_k − t_k)²,   (3)

where E(w, b) stands for the error rate as a function of the weights w and bias values b, f(w, b)_k is the output generated by the neural network and t_k are the target outputs. We then try to minimize the error function at each training iteration with respect to the synaptic weights. The weight update Δw is calculated by the LM method and is given by:

Δw = −[J^T(w) J(w) + µI]^{−1} J^T(w) (f(w) − t),   (4)

where J(w) is the Jacobian matrix comprising the first-order derivatives of the error function with respect to the weight values. Accordingly, the updated value of the weights is computed as:

w_new = w + Δw.   (5)

The damping parameter µ is the key to the fast convergence [15]. When this parameter is zero, the LM method realizes the common Gauss-Newton algorithm. If the error increases throughout the training process, µ is multiplied by an increase value; on the contrary, when a training step results in a decrease of the error, µ is reduced by a decrease value. As a result, the cost function moves quickly towards error reduction within each training epoch. The initial values and descriptions of the parameters employed within the LM training algorithm are summarized in Table III.

To start the training process, the collected samples are randomly divided into three data subsets:

- Training set (70%): This dataset is employed during the training process.
- Validation set (15%): This dataset is used for generalization and validation purposes. It also plays a role in the termination of the training process.
- Test set (15%): This dataset provides an additional evaluation test after the training phase. It is not deployed
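The LM update of Eqs. (4)-(5) can be sketched on a toy problem as follows. This Python example fits a straight line rather than NN weights, and the increase/decrease factors of 10 for µ are assumed for illustration; the paper's actual parameter values are those of Table III.

```python
import numpy as np

# Levenberg-Marquardt step per Eqs. (4)-(5) on a toy linear least-squares
# problem y = w0 + w1*x. The damping parameter mu is relaxed after a
# successful step and increased after a failed one (assumed factor of 10).
def lm_fit(x, t, w, mu=1e-3, factor=10.0, iters=50):
    for _ in range(iters):
        e = (w[0] + w[1] * x) - t                    # residuals f(w) - t
        J = np.stack([np.ones_like(x), x], axis=1)   # Jacobian of f w.r.t. w
        dw = -np.linalg.solve(J.T @ J + mu * np.eye(2), J.T @ e)  # Eq. (4)
        w_new = w + dw                               # Eq. (5)
        e_new = (w_new[0] + w_new[1] * x) - t
        if e_new @ e_new < e @ e:                    # error decreased: accept
            w, mu = w_new, mu / factor               # and relax the damping
        else:
            mu *= factor                             # error grew: damp harder
    return w

x = np.array([0.0, 1.0, 2.0, 3.0])
t = 1.0 + 2.0 * x                                    # exact targets on a line
w = lm_fit(x, t, w=np.array([0.0, 0.0]))
print(np.round(w, 3))  # converges close to [1, 2]
```

With µ near zero the step reduces to Gauss-Newton, as stated in the text; with µ large it approaches a small gradient-descent step, which is what makes the method robust far from the minimum.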
Fig. 4. Linear regression and error auto-correlation function (ACF) representation of the NARX behavioral models. A, B and C show the regression analysis performed on the behavioral features, respectively for the trimming, load jump and line jump. On the left-hand-side axes of each regression plot, the fitting-line function of the NARX output and the selected target values is computed. Note that R stands for the regression coefficient. D, E and F demonstrate the error ACF calculated for our NARX models. Blue bars represent the correlation distribution of the lagged errors and the red lines are the 95% confidence bounds (the limit lines are located at an error correlation corresponding to ±2 standard errors (SE)). For an ideal model, the error ACF would be a single bar at lag zero, while for a reliable model most of the lagged error components are located within the confidence boundaries.
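The error ACF and the ±2 SE confidence band shown in Fig. 4D-F can be computed as follows. This Python sketch uses Bartlett's standard-error formula as an assumed stand-in for the toolbox computation; the white-noise residuals are synthetic.

```python
import numpy as np

# Error auto-correlation per lag and an approximate 95% confidence band at
# +/- 2 SE (Bartlett's formula, assumed here).
def error_acf(e, max_lag=20):
    e = e - e.mean()
    denom = e @ e
    return np.array([e[i:] @ e[:e.size - i] for i in range(max_lag + 1)]) / denom

rng = np.random.default_rng(1)
e = rng.normal(size=500)                   # white-noise-like residuals
rho = error_acf(e)                         # rho[0] is 1 by construction
se = np.sqrt((1 + 2 * np.cumsum(rho[1:] ** 2)) / e.size)  # SE per lag i >= 1
inside = np.abs(rho[1:]) < 2 * se          # lags within the 95% band
print(rho[0], int(inside.sum()), "of", inside.size, "lags inside the band")
```

For uncorrelated residuals, nearly all lagged correlations should fall inside the band, which is the acceptance criterion the paper applies to its trained models.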
Fig. 5. Time response of the trained neural networks. A, B and C represent the input and output response of the NARX networks resembling trimming, load jump and line jump, after the training process, where a Simulink block of the network is generated. Training input data is applied to the network and its corresponding output is recorded. In B, we applied two load profiles, one to the 0.49V output and the other to the 1V output. Since the 0.49V output of the BGR is created by using a resistor division on the 1V output pin, at the 1V pin we see the effect of the load connected to the 0.49V output as well. D and E depict two different input sets that are applied to the trained trimming neural network, in order to check the behavior of the NARX network for input patterns unlike the training input pattern. The same is checked for the load-jump network in F. Note that the network generated a reasonable response to dissimilar input patterns in both cases.
layer of the neural network. The regression performance of the NARX network for each individual behavioral feature is shown in Figures 4A-C. The regression coefficients R are calculated to be close enough to R = 1, which is the case for an ideal model. Moreover, the fitting-line function between the output of the NARX and the target values is computed for each network.

In order to assess the efficiency of the network and the training process, we calculate the error auto-correlation function (ACF) in each case. The ACF explains how the output errors are correlated in time [16]. Let the output error time-series, e(t), be the difference between the generated output of the NARX network, Y(t), and the target values, T(t): e(t) = Y(t) − T(t). The error correlation rate for lag i, ρ_i, is computed as follows:

ρ_i = [Σ_{t=i+1}^{T} (e_t − ē)(e_{t−i} − ē)] / [Σ_{t=1}^{T} (e_t − ē)²],   (6)

where T is the number of lags in time, which in our case is set to 20, and ē stands for the average of the output error time-series. Ideally, the ACF comprises a single bar at lag zero and the correlation rates of the other lagged-error components are zero. For a reliable model, we set a 95% confidence limit equal to ±2 SE, where SE is the standard error, for checking the importance of the i-th lag for the autocorrelation, ρ_i, and
Fig. 6. Two-layer neural network structure. A) Four NARX behavioral models are fed into the second-layer network. B) Cadence schematic environment prepared to perform the co-simulation of the Simulink model in Cadence AMS Designer. C) Response of the BGR (solid red line) and its model (dashed blue line) to the training data. D) Response of the circuit and the model to the test pattern.
it is roughly calculated as follows:

SE_i = sqrt[(1 + 2 Σ_{j=1}^{i} ρ_j²) / T].   (7)

Figures 4D-F show the error ACF plots for our trimming, load-jump and line-jump networks, respectively. The horizontal red lines are the 95% confidence bounds. Note that in all cases most of the error-autocorrelation samples are within the confidence limits. This underlines the accuracy of the model.

Furthermore, in order to observe the behavior of the trained NARX models after the training process, we perform validation simulations by applying training datasets and datasets different from the training sets to the network. Figures 5A-F show the applied input profiles together with the time response of the trimming, load-jump and line-jump networks. We observe that the neural networks' output reasonably follows its target values in all cases. Based on the specification of our BGR, the acceptable error rate at the 1V output is 5%. Our neural-network models generate a response for different input datasets (Figures 5D-F) which satisfies this condition.

Note that once the training process is terminated, the simulation of the trained neural network is very fast. The CPU time recorded by MATLAB to perform our validation simulations is on average in the range of a few milliseconds. Our learned models show an improvement in time performance by a factor of 17 when compared to their analog counterparts during transient simulations. We experimentally verify these results in the following.

V. RECOMPOSITION FUNCTION: A TIME-DELAYED NEURAL NETWORK LAYER

In this section we select a recomposition function f, as described in Section II, for combining behavioral models of the BGR, including the power-up behavior. By using the LM back-propagation algorithm, we train a time-delayed neural network (TDNN) comprised of three input delay elements and 200
hidden-layer neurons, to be able to take the generated outputs of the four pre-trained NARX models and to predict the correct 1V output pin of the BGR. The structure is selected with the same approach as that of the NARX models. Figure 6A represents the structure of the two-layer network. The network response to the training and test datasets is shown in Figures 6B and 6C, respectively. The MATLAB CPU time for executing the simulation of the network is approximately 50 ms.

VI. CO-SIMULATION OF MATLAB/SIMULINK MODELS AND ANALOG DESIGN ENVIRONMENT

Here we utilize the Cadence AMS Designer/MATLAB co-simulation interface in order to evaluate the performance of the designed neural-network model within the Analog Design Environment (ADE) of the Cadence software, where we execute analog IC fault simulations [8]. Inside the co-simulation platform, a coupling module is provided in order to link the Simulink and Cadence schematic environments. Figures 6A and 6B show the simulation environments in Simulink and Cadence schematics, respectively. We apply inputs to the neural-network block in Simulink and simultaneously run a transient simulation in the Cadence ADE. Figures 6C and 6D depict the results of the co-simulation for the training input dataset and the test input dataset, respectively. The total CPU time for such transient simulations is calculated as 1.07 s, while the same simulation of the transistor-level BGR takes 17.8 s to complete. As a result, we gain a simulation speed-up by a factor of 17.

VII. CONCLUSIONS

We employed a new neural-network modeling approach for complex MIMO systems (CompNN). We modeled individual I/O behavioral functions of the system by training NARX neural networks. We then merged the overall behavioral features by training a second-layer TDNN. CompNN enabled us to define a one-to-one mapping from specific behavioral features of the system to certain parts of the model. We illustrated the performance of our modeling approach by designing behavioral NN models for a CMOS band-gap voltage-reference circuit. Individual, small-sized NARX networks were designed and trained to imitate the trimming, load-jump and line-jump responses of the BGR. Such pre-trained networks, together with the power-up behavior, were fed into a second time-delayed network in order to generate a single block representing the BGR.

The performance of the constructed networks was qualitatively and quantitatively analyzed by carrying out linear regression analysis, computing the error auto-correlation function and calculating the error histogram for each model. We confirmed the level of generalization and the accuracy of such predictive neural networks by illustrating the output response of the models to various input patterns different from the training patterns. We subsequently created a single neural-network block by adding the second layer for merging the behavioral features and training the network. Finally, we employed the designed network in a transient simulation and achieved a sensible enhancement in the time performance of the simulation.

For future work, we intend to exploit our NARX models in the verification of analog integrated circuits, where the instantaneous response of the network, together with its high level of accuracy, results in significant improvements in the performance of pre-silicon analog fault simulations.

ACKNOWLEDGMENTS

We would like to thank Infineon for training, mentoring and provision of the tool landscape. This work was jointly funded by the Austrian Research Promotion Agency (FFG, Project No. 854247) and the Carinthian Economic Promotion Fund (KWF, contract KWF-1521/28101/40388). Part of this research work was carried out while the first author was visiting Imperial College London in 2016.

REFERENCES

[1] R. Narayanan, N. Abbasi, M. Zaki, G. Al Sammane, and S. Tahar, "On the simulation performance of contemporary AMS hardware description languages," in 2008 International Conference on Microelectronics. IEEE, 2008, pp. 361-364.
[2] M. Shokrolah-Shirazi and S. G. Miremadi, "FPGA-based fault injection into synthesizable Verilog HDL models," in Secure System Integration and Reliability Improvement, 2008. SSIRI '08. Second International Conference on. IEEE, 2008, pp. 143-149.
[3] F. Pecheux, C. Lallement, and A. Vachoux, "VHDL-AMS and Verilog-AMS as alternative hardware description languages for efficient modeling of multidiscipline systems," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 2, pp. 204-225, 2005.
[4] W. Zhao and Y. Cao, "New generation of predictive technology model for sub-45 nm early design exploration," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2816-2823, 2006.
[5] S. Balasubramanian and P. Hardee, "Solutions for mixed-signal SoC verification using real number models," Cadence Design Systems, 2013.
[6] M. Magerl, C. Stockreiter, O. Eisenberger, R. Minixhofer, and A. Baric, "Building interchangeable black-box models of integrated circuits for EMC simulations," in Electromagnetic Compatibility of Integrated Circuits (EMC Compo), 2015 10th International Workshop on the. IEEE, 2015, pp. 258-263.
[7] R. M. Hasani, D. Haerle, and R. Grosu, "Efficient modeling of complex analog integrated circuits using neural networks," in 2016 12th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME). IEEE, 2016, pp. 1-4.
[8] Cadence, "Cadence Virtuoso AMS Designer Simulator, cosimulation of mixed-signal systems with MATLAB and Simulink." [Online]. Available: http://www.mathworks.com/products/
[9] H. T. Siegelmann, B. G. Horne, and C. L. Giles, "Computational capabilities of recurrent NARX neural networks," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 27, no. 2, pp. 208-215, 1997.
[10] S. A. Billings, Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. John Wiley & Sons, 2013.
[11] J. Heaton, Introduction to Neural Networks with Java. Heaton Research, Inc., 2008.
[12] C.-W. Hsu, C.-C. Chang, C.-J. Lin et al., "A practical guide to support vector classification," 2003.
[13] H. Demuth, M. Beale, and M. Hagan, "Neural Network Toolbox 8.4," User's Guide, 2015.
[14] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, pp. 431-441, 1963.
[15] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994.
[16] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.