1 School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
2 School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
3 VLSI Design and Education Center (VDEC), University of Tokyo, Tokyo, Japan
s.ghandali@ut.ac.ir, b.alizadeh@ut.ac.ir, fujita@ee.t.u-tokyo.ac.jp, navabi@ut.ac.ir
… original forms is taken into consideration. Then common coefficients and common cubes are extracted using the kernel/co-kernel extraction technique from [7], and common sub-expressions are determined using the algebraic division technique. This method is applicable only to polynomials in which linear blocks exist explicitly.
Abstract
This paper describes a system-level approach to improving the area and delay of datapath designs that perform polynomial computations over $\mathbb{Z}_{2^m}$, which are used in many application domains such as computer graphics and digital signal processing. The approach reduces the number of arithmetic operations needed to implement multivariate polynomial systems by optimizing them at the system level, prior to high-level synthesis. It uses univariate functional decomposition of polynomial expressions and the canonical form of polynomial functions over $\mathbb{Z}_{2^m}$. We use the GAUT high-level synthesis tool to generate RTL datapath architectures for the optimized polynomials. Experimental results on a set of benchmark applications with polynomial expressions show that this method outperforms conventional methods. In speed-optimization mode, it reduces the area of the sequential datapath architectures by 25.81% on average; it also reduces the required number of clock cycles by 23.48% and 38.24% on average in the speed-optimization and area-optimization modes, respectively.
Keywords
High-level synthesis, system-level transformations, register
transfer level (RTL), polynomial datapath, univariate
functional decomposition, canonization form
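1. Introduction

2. Preliminaries
… [9]. Let $f_1, f_2, \ldots, f_p$ be p given polynomial functions from $\mathbb{Z}_{2^{n_1}} \times \mathbb{Z}_{2^{n_2}} \times \cdots \times \mathbb{Z}_{2^{n_d}}$ to $\mathbb{Z}_{2^m}$ as the specification, where $\mathbf{x} = \langle x_1, x_2, \ldots, x_d \rangle$ is a vector of d input variables and $n_1, n_2, \ldots, n_d$ denote the sizes of the corresponding variables. $\mathbb{Z}_{2^n}$ represents the finite set of integers $\{0, 1, \ldots, 2^n - 1\}$, and m is the size of the output bit-vector f.

Theorem 1: Let f be a polynomial function from $\mathbb{Z}_{2^{n_1}} \times \mathbb{Z}_{2^{n_2}} \times \cdots \times \mathbb{Z}_{2^{n_d}}$ to $\mathbb{Z}_{2^m}$. Then, according to [9], f can be uniquely represented in the canonical form

$$f(\mathbf{x}) = \sum_{K} c_K \, Y_K(\mathbf{x}), \qquad Y_K(\mathbf{x}) = \prod_{i=1}^{d} Y_{k_i}(x_i), \qquad 0 \le c_K < \frac{2^m}{\gcd\left(2^m, \prod_{i=1}^{d} k_i!\right)} \qquad (1)$$

where $c_K \in \mathbb{Z}$ ($\mathbb{Z}$ denotes the ring of integers), $K = \langle k_1, k_2, \ldots, k_d \rangle$ with $k_i = 0, 1, 2, \ldots, \mu_i$ for each i, and $\mu_i = \min\left(2^{n_i} - 1,\ SF(2^m) - 1\right)$. $Y_k$ is the falling factorial of degree k, defined as $Y_0(x) = 1$, $Y_1(x) = x$, and $Y_k(x) = x(x-1)\cdots(x-k+1)$ for $k \ge 2$. SF(n) is the least k such that n divides k!, and SF denotes the Smarandache function [10]. gcd(x, y) computes the greatest common divisor of x and y.

3. Motivational Example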
Figure 1: (a) Datapath architecture of the polynomials implemented using factorization; (b) datapath architecture of the polynomials implemented using our proposed method.
$$f_2(x) = g \circ h = (x^3 + x) \circ (x^2 + x) = (x^2 + x)^3 + x^2 + x = h^3 + h = h(h^2 + 1) = h(f_1 + 1).$$
The optimized polynomial system requires only 3 multiplications and 2 additions. We have used GAUT as a high-level synthesis tool to generate datapath architectures for the polynomial systems. The GAUT tool has been used in many academic projects, and its HLS algorithms for binding, allocation, and scheduling are well documented [12].
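The rewriting above can be sanity-checked numerically. The sketch below assumes, as the identity $f_2 = h(f_1 + 1)$ implies, that $f_1 = h^2 = (x^2 + x)^2$, and compares the expanded forms with the shared evaluation over every input of an assumed 8-bit datapath (the width `M` is an illustrative parameter, not a value from the paper). The shared evaluation uses exactly the 3 multiplications and 2 additions counted above.

```python
M = 8                   # assumed datapath width in bits (illustrative only)
MASK = (1 << M) - 1     # reduction modulo 2**M

def reference(x):
    """Expanded forms: f1 = (x^2 + x)^2 and f2 = (x^2 + x)^3 + x^2 + x, mod 2**M."""
    f1 = (x**4 + 2 * x**3 + x**2) & MASK
    f2 = (x**6 + 3 * x**5 + 3 * x**4 + x**3 + x**2 + x) & MASK
    return f1, f2

def optimized(x):
    """Shared evaluation h = x*x + x, f1 = h*h, f2 = h*(f1 + 1).

    Multiplications: x*x, h*h, h*(f1 + 1); additions: (+ x), (f1 + 1)."""
    h = (x * x + x) & MASK
    f1 = (h * h) & MASK
    f2 = (h * (f1 + 1)) & MASK
    return f1, f2

def mismatches():
    """All inputs in Z_{2^M} on which the two forms disagree."""
    return [x for x in range(1 << M) if optimized(x) != reference(x)]

if __name__ == "__main__":
    print(mismatches())   # -> []
```

Because the identity holds over the integers, it also holds after reduction modulo any power of two, so an empty mismatch list is expected for every width.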
We have used GAUT to generate datapath architectures in two modes: speed optimization and area optimization. In the area-optimization mode, only one functional unit is allocated for each operation type present in the design. The datapath architecture of the polynomials implemented using factorization in the speed-optimization mode is shown in Fig. 1(a), and the datapath architecture of the polynomials implemented using our proposed method is shown in Fig. 1(b).
The results reported by GAUT for the polynomials implemented using factorization and using our proposed method are shown in Table 1. We have used the notch library provided by GAUT and set the clock cycle to 20. The table reports the area and the number of clock cycles, registers, multiplexers, and functional units (adders, subtracters, and multipliers) in the datapath architectures of the factored …
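Table 1
Method          | Mode               | Cycles | Registers | Muxes | Adders (+) | Subtracters (−) | Multipliers (×) | Area
Factorization   | Speed optimization | 6      | 14        | 160   | 4          | 0               | 6               | 530
Factorization   | Area optimization  | 22     | 13        | 320   | 1          | 0               | 1               | 91
Proposed Method | Speed optimization | 6      | 4         | 32    | 1          | 0               | 1               | 91
Proposed Method | Area optimization  | 6      | 4         | 32    | 1          | 0               | 1               | 91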
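$$f(\mathbf{x}) = \sum f_{e_1,\ldots,e_{i-1},e_{i+1},\ldots,e_d}(x_i)\; x_1^{e_1} \cdots x_{i-1}^{e_{i-1}} \, x_{i+1}^{e_{i+1}} \cdots x_d^{e_d} \qquad (2)$$
where each $f_{e_1,\ldots,e_{i-1},e_{i+1},\ldots,e_d}$ is a univariate polynomial in $x_i$ …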
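$$f_1 = y^4(x^3 + 2x^2) + y^2(x^6 + 2x^5 - x^4 - x^3 + 2x^2 - x) + y(5x^6 + 10x^5 - 5x^4 - 5x^3 + 10x^2 - 5x),$$
so … $= \{f_1(x) = 5x^6 + 10x^5 - 5x^4 - 5x^3 + 10x^2 - 5x,\; f_2(x) = x^6 + 2x^5 - x^4 - x^3 + 2x^2 - x,\; f_4(x) = x^3 + 2x^2\}$.
$f_2$ based on the variable y is represented as
$$f_2 = x^6(y^3) + x^4(y^4 - 2y^3 + 2y^2 - 2y) + x^2(y^3) + x(y^4 + 2y^2 - 2y),$$
so … $= \{y^3,\; y^4 - 2y^3 + 2y^2 - 2y,\; y^4 + 2y^2 - 2y\}$.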
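Representing a polynomial based on one variable and collecting the distinct coefficient polynomials into a set can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes bivariate polynomials are stored as sparse dictionaries mapping exponent pairs `(ex, ey)` to integer coefficients, and the helper names `based_on` and `coefficient_set` are hypothetical. On the example, f1 based on x has coefficient polynomials for the powers y, y^2, and y^4, and f2 based on y yields only three distinct members because x^6 and x^2 share the coefficient y^3.

```python
from collections import defaultdict

# Sparse bivariate polynomial: {(ex, ey): coefficient}; e.g. 3*x^2*y is {(2, 1): 3}.
X, Y = 0, 1

def based_on(poly, var):
    """Rewrite poly as sum_e C_e * w**e, where w is the *other* variable and each
    C_e is a univariate polynomial in `var`, stored as {exponent: coefficient}."""
    other = 1 - var
    groups = defaultdict(dict)
    for exps, c in poly.items():
        e, k = exps[other], exps[var]
        groups[e][k] = groups[e].get(k, 0) + c
    result = {}
    for e, cp in groups.items():
        cp = {k: c for k, c in cp.items() if c != 0}  # drop cancelled terms
        if cp:
            result[e] = cp
    return result

def coefficient_set(poly, var):
    """Distinct coefficient polynomials C_e (duplicates collapse to one member)."""
    return {frozenset(cp.items()) for cp in based_on(poly, var).values()}

# f1 = y^4(x^3+2x^2) + y^2(x^6+2x^5-x^4-x^3+2x^2-x) + y(5x^6+10x^5-5x^4-5x^3+10x^2-5x)
f1 = {(3, 4): 1, (2, 4): 2,
      (6, 2): 1, (5, 2): 2, (4, 2): -1, (3, 2): -1, (2, 2): 2, (1, 2): -1,
      (6, 1): 5, (5, 1): 10, (4, 1): -5, (3, 1): -5, (2, 1): 10, (1, 1): -5}
# f2 = x^6(y^3) + x^4(y^4-2y^3+2y^2-2y) + x^2(y^3) + x(y^4+2y^2-2y)
f2 = {(6, 3): 1,
      (4, 4): 1, (4, 3): -2, (4, 2): 2, (4, 1): -2,
      (2, 3): 1,
      (1, 4): 1, (1, 2): 2, (1, 1): -2}

if __name__ == "__main__":
    print(sorted(based_on(f1, X)))      # powers of y with a coefficient -> [1, 2, 4]
    print(len(coefficient_set(f2, Y)))  # x^6 and x^2 share y^3 -> 3
```

A frozenset of `(exponent, coefficient)` pairs is used as a hashable, order-independent key, so identical coefficient polynomials deduplicate automatically.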
… (3)

… $A\mathbf{b} = \mathbf{a}$ …

So h is obtained as $h = x^3 + x^2 - x$.

So g is obtained as $g = x^2 + x$.
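Univariate decomposition of this kind can be illustrated with the standard coefficient-matching technique. The sketch below is not necessarily the procedure used in this paper, and it works over the rationals (Python's `fractions.Fraction`) rather than $\mathbb{Z}_{2^m}$, where dividing by r may be impossible. It assumes a monic f of degree n = r·s and looks for a monic right factor h of degree s with zero constant term: the coefficients of degrees n−1 down to n−s+1 of f come only from h^r, which fixes h one coefficient at a time, and g then follows from the h-adic expansion of f. The helper names (`decompose`, `divmod_monic`, `compose`) are hypothetical. Applied to x^6 + 2x^5 − x^4 − x^3 + 2x^2 − x with s = 3, it recovers h = x^3 + x^2 − x and g(y) = y^2 + y.

```python
from fractions import Fraction

# Polynomials are coefficient lists, lowest degree first: [c0, c1, ..., cn].

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def mul(p, q):
    """Product of two coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def power(p, k):
    """p ** k by repeated multiplication."""
    out = [Fraction(1)]
    for _ in range(k):
        out = mul(out, p)
    return out

def divmod_monic(f, h):
    """Long division f = q*h + rem for a monic divisor h."""
    rem = [Fraction(c) for c in f]
    dh = len(h) - 1
    q = [Fraction(0)] * max(len(rem) - dh, 1)
    for k in range(len(rem) - 1 - dh, -1, -1):
        coef = rem[k + dh]          # h is monic, so no division is needed
        q[k] = coef
        for j, c in enumerate(h):
            rem[k + j] -= coef * c
    return trim(q), trim(rem[:dh] or [Fraction(0)])

def decompose(f, s):
    """Try to write monic f of degree n = r*s as g(h), h monic, deg h = s, h(0) = 0.

    Returns (g, h) as coefficient lists, or None if no such decomposition exists."""
    f = trim([Fraction(c) for c in f])
    n = len(f) - 1
    if n % s != 0 or f[-1] != 1:
        return None
    r = n // s
    h = [Fraction(0)] * s + [Fraction(1)]
    # Degrees n-1 .. n-s+1 of f come only from h**r (lower powers of h have
    # degree <= n - s), so each of them fixes one coefficient of h.
    for j in range(1, s):
        trial = power(h, r)         # coefficient h[s - j] is still 0 here
        h[s - j] = (f[n - j] - trial[n - j]) / r
    # h-adic expansion f = c0 + c1*h + c2*h**2 + ...; every remainder must be constant.
    g, rest = [], f
    while rest != [0]:
        rest, rem = divmod_monic(rest, h)
        if len(rem) > 1:
            return None
        g.append(rem[0])
    return trim(g), h

def compose(g, h):
    """Evaluate g(h) with Horner's rule over coefficient lists."""
    out = [Fraction(0)]
    for c in reversed(g):
        out = mul(out, h)
        out[0] += c
    return trim(out)

if __name__ == "__main__":
    f = [0, -1, 2, -1, -1, 2, 1]    # x^6 + 2x^5 - x^4 - x^3 + 2x^2 - x
    g, h = decompose(f, 3)
    print([int(c) for c in g], [int(c) for c in h])   # -> [0, 1, 1] [0, -1, 1, 1]
```

The sketch rejects non-monic inputs; for instance 5x^6 + 10x^5 − 5x^4 − 5x^3 + 10x^2 − 5x is 5 times the polynomial above, so it would first need its constant factor divided out.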
… (4)

Step 1 (k = 1): …
Step 2 (k = 2): …

… (5)

where … is a new right decomposition factor of $f_{e_1,\ldots,e_{i-1},e_{i+1},\ldots,e_d}$, … is a new indecomposable part of $f_{e_1,\ldots,e_{i-1},e_{i+1},\ldots,e_d}$, and … is a member of … . Please note …
Metric          | Original | Optimized
Cycles          | 15       | 8
Registers       | 21       | 12
Muxes           | 320      | 224
Adders (+)      | 3        | 2
Subtracters (−) | 1        | 1
Multipliers (×) | 12       | 4
Area            | 1028     | 356

5. Experimental Results
Speed optimization mode (each cell lists Cycles / Registers / Muxes / Adders (+) / Subtracters (−) / Multipliers (×) / Area)
Benchmark       | Horner                            | Technique in [6]                  | Technique in [1]                  | Our approach
DIRU, PSK, Quad | 17 / 22 / 384 / 2 / 1 / 11 / 937  | 12 / 19 / 272 / 3 / 1 / 6 / 530   | 15 / 16 / 243 / 2 / 1 / 5 / 439   | 13 / 17 / 320 / 2 / 1 / 4 / 356
DIRU, PSK, SG2  | 17 / 24 / 352 / 2 / 1 / 13 / 1103 | 15 / 37 / 368 / 3 / 1 / 21 / 1775 | 15 / 15 / 256 / 2 / 1 / 5 / 439   | 13 / 19 / 336 / 2 / 1 / 5 / 439
DIRU, Quad, SG2 | 16 / 20 / 324 / 3 / 0 / 7 / 605   | 17 / 21 / 306 / 3 / 0 / 7 / 605   | 11 / 20 / 304 / 4 / 0 / 7 / 613   | 11 / 20 / 288 / 2 / 0 / 7 / 597
DIRU, MVCS, SG2 | 16 / 20 / 336 / 4 / 0 / 7 / 613   | 13 / 24 / 304 / 4 / 0 / 7 / 613   | 13 / 26 / 288 / 4 / 0 / 15 / 1277 | 11 / 22 / 272 / 3 / 0 / 7 / 605
Improvement (%) | 0.00                              | 13.42                             | 18.32                             | 27.39
Area optimization mode (each cell lists Cycles / Registers / Muxes / Adders (+) / Subtracters (−) / Multipliers (×) / Area)
Benchmark       | Horner                          | Technique in [6]                 | Technique in [1]                | Our approach
DIRU, PSK, Quad | 72 / 13 / 592 / 1 / 1 / 1 / 99  | 52 / 16 / 672 / 1 / 1 / 1 / 99   | 36 / 12 / 411 / 1 / 1 / 1 / 99  | 41 / 16 / 624 / 1 / 1 / 1 / 99
DIRU, PSK, SG2  | 86 / 15 / 656 / 1 / 1 / 1 / 99  | 91 / 27 / 1056 / 1 / 1 / 1 / 99  | 36 / 11 / 432 / 1 / 1 / 1 / 99  | 48 / 17 / 704 / 1 / 1 / 1 / 99
DIRU, Quad, SG2 | 68 / 13 / 560 / 1 / 0 / 1 / 91  | 72 / 17 / 800 / 1 / 0 / 1 / 91   | 55 / 18 / 752 / 1 / 0 / 1 / 91  | 49 / 18 / 720 / 1 / 1 / 0 / 91
DIRU, MVCS, SG2 | 86 / 16 / 672 / 1 / 0 / 1 / 91  | 71 / 22 / 832 / 1 / 0 / 1 / 91   | 65 / 19 / 832 / 1 / 0 / 1 / 91  | 52 / 18 / 752 / 1 / 1 / 0 / 91
Improvement (%) | 0.00                            | 8.38                             | 37.92                           | 38.68
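The percentage row of the area-optimization results (0.00, 8.38, 37.92, 38.68) appears consistent with the mean, over the four benchmark systems, of each method's per-benchmark reduction in clock cycles relative to Horner. The sketch below checks that reading; the interpretation of the row and the column order (Horner, Technique in [6], Technique in [1], Our approach) are assumptions about the flattened layout, and `mean_improvement` is a hypothetical helper name.

```python
# Area-optimization clock cycles per benchmark system, read from the table above,
# in the assumed column order (Horner, Technique in [6], Technique in [1], Our approach).
CYCLES = {
    "DIRU, PSK, Quad": (72, 52, 36, 41),
    "DIRU, PSK, SG2": (86, 91, 36, 48),
    "DIRU, Quad, SG2": (68, 72, 55, 49),
    "DIRU, MVCS, SG2": (86, 71, 65, 52),
}

def mean_improvement(table, baseline=0):
    """Per method: mean over benchmarks of (1 - cycles / baseline_cycles), in percent."""
    n_methods = len(next(iter(table.values())))
    result = []
    for m in range(n_methods):
        ratios = [1 - row[m] / row[baseline] for row in table.values()]
        result.append(round(100 * sum(ratios) / len(ratios), 2))
    return result

if __name__ == "__main__":
    print(mean_improvement(CYCLES))   # -> [0.0, 8.38, 37.92, 38.68]
```

Averaging per-benchmark ratios (rather than dividing total cycles) weights every benchmark system equally, which is what reproduces the reported figures here.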
6. Conclusion
References
[1] S. Ghandali, B. Alizadeh, Z. Navabi, and M. Fujita, "Polynomial Datapath Synthesis and Optimization Based on Vanishing Polynomial over $\mathbb{Z}_{2^m}$ and Algebraic Techniques," 10th ACM-IEEE Conference on Formal Methods and Models for Co-Design (MEMOCODE), 2012, pp. 65-74.
[2] B. Alizadeh and M. Fujita, "Modular Datapath Optimization and Verification Based on Modular-HED," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 29, pp. 1422-1435, 2010.
[3] B. Alizadeh and M. Fujita, "Improved heuristics for finite word-length polynomial datapath optimization," ACM-IEEE International Conference on Computer-Aided Design - Digest of Technical Papers (ICCAD), 2009, pp. 739-744.
[4] O. Sarbishei, B. Alizadeh, and M. Fujita, "Polynomial datapath optimization using partitioning and compensation heuristics," Design Automation Conference (DAC), 2009, pp. 931-936.
[5] B. Alizadeh and M. Fujita, "Modular-HED: A Canonical Decision Diagram for Modular Equivalence Verification of Polynomial Functions," Fifth Workshop on Constraints in Formal Verification (CFV), 2008, pp. 22-40.
[6] S. Gopalakrishnan and P. Kalla, "Algebraic techniques to enhance common sub-expression elimination for polynomial system synthesis," Design, Automation & Test in Europe (DATE) Conference, 2009, pp. 1452-1457.
[7] A. Hosangadi, F. Fallah, and R. Kastner, "Optimizing polynomial expressions by algebraic factorization and common subexpression elimination," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 25, pp. 2012-2022, 2006.
[8] A. Hosangadi, F. Fallah, and R. Kastner, "Factoring and eliminating common subexpressions in polynomial expressions," in Proc. ACM-IEEE International Conference on Computer-Aided Design (ICCAD), 2004, pp. 169-174.
[9] Z. Chen, "On polynomial functions from $\mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2} \times \cdots \times \mathbb{Z}_{n_r}$ to $\mathbb{Z}_m$," Discrete Math., vol. 162, no. 1-3, pp. 67-76, 1996.
[10] F. Smarandache, "A function in number theory," Analele Univ. Timisoara, Fascicle 1, vol. XVII, pp. 79-88, 1980.
[11] MATLAB version 8.2, (2013), (computer software),
The MathWorks Inc., Natick, Massachusetts.
[12] P. Coussy, et al., "GAUT: A High-Level Synthesis
Tool for DSP Applications," High-Level Synthesis: