
IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 11, NO. 1, JANUARY 2012


Low Complexity Design of Ripple Carry and Brent-Kung Adders in QCA


Vikramkumar Pudi and K. Sridharan, Senior Member, IEEE

Abstract: The design of adders on quantum-dot cellular automata (QCA) has been of recent interest. While few designs exist, investigations on reduction of QCA primitives (majority gates and inverters) for various adders are limited. In this paper, we present a number of new results on majority logic. We use these results to present efficient QCA designs for the ripple carry adder (RCA) and various prefix adders. We derive bounds on the number of majority gates for n-bit RCA and n-bit Brent-Kung, Kogge-Stone, Ladner-Fischer, and Han-Carlson adders. We further show that the Brent-Kung adder has lower delay than the best existing adder designs as well as other prefix adders. In addition, signal integrity and robustness studies show that the proposed Brent-Kung adder is fairly well-suited to changes in time-related parameters as well as temperature. Detailed simulations using QCADesigner are presented.

Index Terms: Area, Brent-Kung adder, cell count, Han-Carlson adder, inverters, Kogge-Stone adder, Ladner-Fischer adder, majority gates, quantum-dot cellular automata, ripple carry adder.

I. INTRODUCTION

Contemporary microprocessors and application-specific integrated circuits are largely based on the complementary metal oxide semiconductor (CMOS) technology. It is believed that the performance of various circuits in current CMOS-based architectures is close to reaching the limit. When the feature size of transistors is reduced to a nanometer, quantum effects such as tunneling take place [1]. Further, when device scaling takes place, the interconnections do not scale automatically due to the effects of wire resistance and capacitance [2]. Alternatives to conventional CMOS technology are therefore being explored. These include the quantum-dot cellular automata (QCA) [3] and the single electron transistor (SET) [4]. The work described in this paper concerns designs in QCA.

QCA is based on electron confinement in dots. The basic element in QCA, namely the QCA cell, represents a bit through the configuration of charge in the cell. The key aspect of QCA is that interaction between cells is purely Coulombic and there is no transport of charge between cells [5]. The cellular automata (CA) notion is due to the fact that the state of a given cell at a particular time depends on the state of its neighbors during the previous clock cycle. Logic primitives in the QCA model are the majority gate and the inverter.

The motivation for the research described in this paper stems from studies on AND-OR gate realizations of Boolean functions in combinational logic and the minimal realizations for three-variable functions in QCA described in [6]. Prior studies suggest that design complexity can be substantially influenced by the majority gate count. The number of majority gates (and to an extent inverters) also indirectly determines the cell count due to QCA wires in a design. It is therefore of interest to design methods that involve systematic reduction of majority gates and inverters. To the best of our knowledge, there has been no prior work on deriving simplified expressions involving majority gates and inverters for various types of multibit adders.

The contributions of this paper are as follows. We present a number of new results on majority logic reduction and apply the same to multibit adders. In particular, we present efficient QCA designs for the ripple carry adder (RCA) and various prefix adders. We derive bounds on the number of majority gates for n-bit RCA and n-bit Brent-Kung, Kogge-Stone, Ladner-Fischer, and Han-Carlson adders. We further show that the Brent-Kung adder has lower delay than the best existing adder designs as well as other prefix adders. In addition, signal integrity and robustness studies have been performed. The Brent-Kung adder yields the correct output for temperature varying from 1 to 23 K. Simulations using QCADesigner [7] support the theory presented.

The rest of this paper is organized as follows. Basic notations pertaining to QCA are presented in the next section. Section III presents new results on majority gates. Section IV presents results on the RCA. The efficient QCA design of the Brent-Kung prefix adder is presented in Section V. Section VI presents majority gate results for Kogge-Stone, Han-Carlson, and Ladner-Fischer prefix adders. Section VII presents simulation results for the proposed adders. Section VIII presents comparisons of various adders including the best existing adder designs. Conclusions are presented in Section IX.

Manuscript received February 2, 2010; revised January 23, 2011 and May 5, 2011; accepted May 9, 2011. Date of publication May 27, 2011; date of current version January 11, 2012. The review of this paper was arranged by Associate Editor D. Hammerstrom. The authors are with the Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai-600036, India (e-mail: happyvikramm@gmail.com; sridhara@iitm.ac.in). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNANO.2011.2158006

II. TERMINOLOGY AND PRIOR WORK

A. Basics of QCA

Fig. 1 depicts QCA cells, each with four quantum dots. Each QCA cell is occupied by two electrons. The locations of the electrons determine the binary states. It is worth noting that Fig. 1 shows only a conceptual four-dot cell (the implementation technology is not fixed here).

1536-125X/$26.00 © 2011 IEEE


Fig. 1. QCA cells with electrons indicating possible polarizations.

Fig. 2. QCA clock zones.

Fig. 3. Interdot barriers in a clocking zone.

Fig. 4. Array of QCA cells with four clocks applied functions like a D-latch [8].

Fig. 5. QCA inverter and majority gate.

QCA cells are clocked using a four-phase clocking scheme. Four clocks with a phase difference of 90° are shown in Fig. 2. Phases in QCA clocking, namely switch, hold, release, and relax [9], are depicted in Fig. 3. From an unpolarized state, a cell is polarized during the switch phase depending on the state of the neighboring cells. In the hold phase, a cell retains its polarization. During the release and relax phases, QCA cells are unpolarized. In view of the polarization and unpolarization, each clock zone behaves like a D-latch. Hence, when a sequence of clocks is applied to an array of QCA cells as shown in Fig. 4, the clocking information can be conveyed through a numbered D-latch system [8]. The convention followed for the clocking in Fig. 4 is the same as in [8], [10], [11]. The numbering distinguishes one clock zone from another via appropriate subscripts. D0, D1, D2, and D3 represent the D-latches of clock zones 0, 1, 2, and 3, respectively. During the switch phase of clock zone 0 (D0 latch), electrons in the QCA cell polarize as per the input polarization. In the hold phase of clock zone 0, clock zone 1 (D1 latch) will be in the switch phase, so electrons in the QCA cell polarize as per clock zone 0; similarly for clock zone 2 (D2 latch) and clock zone 3 (D3 latch). Among the four clock zones, only one zone will be in the hold state and has valid data at a time. In the implementation, 16 cells are assumed per clock zone (to allow comparisons with prior work [11]).

Primitives in the QCA model consist of a wire, an inverter, and a majority gate and are depicted in Fig. 5. A QCA design permits two options for crossover, termed coplanar crossover and multilayer crossover. While the coplanar crossover uses only one layer but involves usage of two cell types (termed regular and rotated), the multilayer crossover uses more than one layer of cells (analogous to multiple metal layers in a conventional IC). The multilayer crossover is used in this paper for wire crossings since we can effectively cross signals over on another layer and the extra layers of QCA can be used as active components of the circuit [8]. Further, multilayer QCA circuits can potentially consume much less area as compared to planar circuits [8]. Moreover, some studies [12] indicate that coplanar crossover is difficult to fabricate in the molecular implementation.

B. Prior Work

1) Majority Logic Algebraic Manipulation and Synthesis: Research on binary majority decision elements motivated by the development of devices such as parametrons and Esaki diodes has been reported as early as 1960 [13]. The author in [13] describes how ordinary Boolean algebra can be augmented to include a majority decision operator. Realization of a one-bit adder using three majority gates and two inverters is shown in [13]. An extension to this work is presented in [14] where a new algebra for the logical design with majority decision elements is derived and applied to the one-bit adder design. An approach based on decomposition and rearrangement, to majority element-based synthesis of networks of components having limited fan-in, is presented in [15]. Strategies of [14] and [15] are equally elegant with respect to synthesis of a broad class of functions. Extensions to multibit adders are not, however, discussed in these works. A distributive law pertaining to majority logic is described in [16]. While [14] presents an algebraic method for the logical design, the authors in [17] present a geometric method that uses Veitch diagrams (and extensions) for synthesis using i-input majority gates of a variety of n-argument switching functions.


An approach based on the notion of logically passive self-dual (LPSD) has been presented in [18]. An extension to this work is presented in [19]. All of these are interesting from a theoretical point of view and offer valuable insight for the majority logic-based design. However, their focus seems to be more on simplification of Boolean expressions rather than on special adder designs (which is of primary interest to us in this paper).

2) QCA-based Digital Design: A simulation tool for QCA has been reported in [7] while a synthesis tool is described in [20]. The authors in [6] present an evaluation of several three-variable Boolean functions and conclude with study of a 1-bit full adder. A performance comparison of some quantum-dot CA adders is presented in [21]. The modular design of conditional sum adders has been studied in [22]. The authors in [23] have presented a QCA design methodology based on the traditional CMOS circuits design flow and a SPICE model. Probabilistic modeling of adder circuits using Bayesian networks is presented in [24]. Cell count, area, and latency have been studied for multibit adders (especially RCA and CLA) in [10]. Robust QCA adder designs that exploit proper clocking schemes are proposed in [25] but they do not study special adders such as prefix adders. Probabilistic analysis of molecular quantum-dot cellular adders is presented in [26]. Reliability of magnetic QCA adders and electrostatic QCA adders is studied via probabilistic transfer matrices in [27]. Robust adders based on QCA are described in [28]. Energy dissipation per clock cycle in QCA adder circuits is studied in [29]. An interesting extension of [10] to a new type of adder called CFA is presented in [11]. A recent text on design and test of digital circuits in QCA is [30].

III. NEW RESULTS ON MAJORITY GATES

We present three new lemmas on reduction of the number of majority gates which are directly of interest in obtaining efficient designs of prefix adders.
Given three binary inputs, $a$, $b$, and $c$, the majority voting logic function can be expressed in terms of fundamental Boolean operators as $M(a, b, c) = ab + bc + ac$ [3], [6].

Lemma 1: If $a$, $b$, and $c$ are three binary inputs, then $M(a, b, M(a, b, c)) = M(a, b, c)$.

Proof:
$$M(a, b, M(a, b, c)) = ab + b(ab + bc + ac) + a(ab + bc + ac) = ab + bc + ac + abc = M(a, b, c)$$
Q.E.D.

As a consequence of Lemma 1, we have $M(a, M(a, b, c), c) = M(a, b, c)$ and $M(M(a, b, c), b, c) = M(a, b, c)$.

Lemma 2: Let $a$, $b$, and $c$ be three binary inputs. Then $M(a, b, M(\bar{a}, \bar{b}, c)) = M(a, b, c)$.

Proof:
$$M(a, b, M(\bar{a}, \bar{b}, c)) = ab + a(\bar{a}\bar{b} + \bar{b}c + \bar{a}c) + b(\bar{a}\bar{b} + \bar{b}c + \bar{a}c)$$
$$= ab + a\bar{b}c + \bar{a}bc = a(b + \bar{b}c) + b(a + \bar{a}c) = a(b + c) + b(a + c) = M(a, b, c)$$
Q.E.D.

As a consequence of Lemma 2, we have $M(a, M(\bar{a}, b, \bar{c}), c) = M(a, b, c)$ and $M(M(a, \bar{b}, \bar{c}), b, c) = M(a, b, c)$.

Lemma 3: Let $f_1$, $f_2$, and $f_3$ be three Boolean functions such that $f_1$ and $f_2$ satisfy $f_1 f_2 = f_1$ and $f_1 + f_2 = f_2$. Then $M(f_1, f_2, f_3) = f_1 + f_2 f_3$.

Proof:
$$M(f_1, f_2, f_3) = f_1 f_2 + f_1 f_3 + f_2 f_3 = f_1 + (f_1 + f_2) f_3 = f_1 + f_2 f_3$$
Q.E.D.

IV. RIPPLE CARRY ADDER DESIGN IN QCA

An interesting consequence of Lemma 2 is a new Lemma 4 described in the following. This lemma establishes that carry generation requires one majority gate and sum generation requires just two majority gates plus one inverter in a one-bit full adder. Let $a_i$, $b_i$, and $c_i$ be inputs to a full adder and let $s_i$ and $c_{i+1}$ be its outputs.

Lemma 4: A 1-bit full adder can be realized using three majority gates and one inverter.

Proof: From the results in [6], we have
$$c_{i+1} = M(a_i, b_i, c_i). \tag{1}$$
The sum $s_i$ can be expressed in three equivalent formats as given by (3), (4), and the following:
$$s_i = M(\overline{M(a_i, b_i, c_i)},\; M(a_i, b_i, \bar{c}_i),\; c_i). \tag{2}$$
If $b_i$ in (2) is complemented instead of $c_i$, we have
$$s_i = M(\overline{M(a_i, b_i, c_i)},\; M(a_i, \bar{b}_i, c_i),\; b_i). \tag{3}$$
Similarly,
$$s_i = M(\overline{M(a_i, b_i, c_i)},\; M(\bar{a}_i, b_i, c_i),\; a_i). \tag{4}$$
Using Lemma 2 on (4), and noting that $M(\bar{a}_i, \bar{b}_i, \bar{c}_i) = \overline{M(a_i, b_i, c_i)}$, we have
$$s_i = M(\overline{M(a_i, b_i, c_i)},\; M(\overline{M(a_i, b_i, c_i)}, b_i, c_i),\; a_i). \tag{5}$$

Q.E.D. A one-bit full adder that incorporates appropriate clocking is shown in Fig. 6. The D-latch convention presented in [8] enables us to obtain the total circuit delay. One D-latch (namely, D0) is used to indicate that one-quarter of a clock is required to apply the inputs to the majority logic. One-fourth clock zone delay is assumed when a majority gate is immediately followed by an inverter or vice-versa (D1 is introduced at the output of inverter that follows the majority gate [8]). Proceeding this way, we have a total circuit delay of 1 clock (4 clock zones) for generating Si as well as Ci+1 for a 1-bit adder.
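The lemmas above are small enough to check exhaustively. The following Python sketch is an illustration, not part of the original design flow: it assumes the complemented forms of Lemma 2 and of (5) (the overbars are taken from the algebra of the proofs), and the helper names `maj`, `inv`, `full_adder`, and `check_lemmas` are hypothetical.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority M(a, b, c) = ab + bc + ac on bits 0/1."""
    return (a & b) | (b & c) | (a & c)


def inv(x):
    """Inverter on a single bit."""
    return 1 - x


def full_adder(a, b, c):
    """Full adder in the form of (1) and (5): three majority gates, one inverter."""
    carry = maj(a, b, c)          # majority gate 1, eq. (1)
    n_carry = inv(carry)          # the only inverter; its output is used twice
    inner = maj(n_carry, b, c)    # majority gate 2
    s = maj(n_carry, inner, a)    # majority gate 3, eq. (5)
    return s, carry


def check_lemmas():
    """Exhaustively check Lemmas 1-4 over all 0/1 inputs."""
    for a, b, c in product((0, 1), repeat=3):
        m = maj(a, b, c)
        assert maj(a, b, m) == m                        # Lemma 1
        assert maj(a, b, maj(inv(a), inv(b), c)) == m   # Lemma 2 (assumed overbars)
        if (a & b) == a and (a | b) == b:               # Lemma 3 hypothesis, f1=a, f2=b
            assert m == a | (b & c)
        s, carry = full_adder(a, b, c)
        assert 2 * carry + s == a + b + c               # Lemma 4: correct 1-bit addition
    return True


print(check_lemmas())
```

Because every identity involves at most three binary variables, eight input combinations suffice; no sampling is needed.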


Fig. 7. 4-bit RCA critical path composed of seven D-latches (including one for input and one for each majority gate).

Fig. 6. Full adder realization using three majority gates and one inverter; numbered D-latches enable delay determination.

The result presented in Lemma 4 improves upon a result in [6] that requires two inverters. We can use the result on a 1-bit full adder to derive the following for an n-bit RCA.

Corollary 1: An n-bit RCA requires at most $3n$ majority gates and $n$ inverters.

From Corollary 1, we see that a 4-bit RCA requires 12 majority gates and four inverters. Note that the path from input to the last output contains seven clock zones as shown in Fig. 7. So we have a total delay of $1\tfrac{3}{4}$ clocks. While the RCA is simple, the delay increases (linearly) as the size of the adder increases. This suggests study of other types of adders.

V. BRENT-KUNG PREFIX ADDER IN QCA

Prefix adders constitute an interesting class of parallel adders [31]. They are based on reducing carry computation to a prefix computation. Among the various adders reported in the literature [32]-[35], the Brent-Kung adder has been chosen first for efficient QCA realization in view of the (relatively) small growth in the number of associative operations as a function of the adder size. This is exemplified via the prefix graph of a 16-bit Brent-Kung adder depicted in Fig. 8. The small shaded circles in Fig. 8 represent the associative operator defined in (6). The prefix graph indicates 4, 11, and 26 associative operations for Brent-Kung adders of sizes 4, 8, and 16 bits, respectively. In general, for an n-bit Brent-Kung adder, the number of levels (of the corresponding prefix graph) is $2\log_2(n) - 1$ [31] while the cost in terms of the number of associative operations is $2n - \log_2(n) - 2$ (this can be inferred as the sum of associative operations at each level in the prefix graph). A substantially larger number of associative operations is required for other prefix adders (as indicated in Table II) although other prefix adders too have $O(\log_2 n)$ levels. The number of associative operations influences the amount of majority logic (as established later via Theorems 1, 2, 3, and 4).

Fig. 8. 16-bit Brent-Kung adder prefix graph.

We begin by presenting a general formulation of prefix adders in terms of the associative operator $\circ$ defined as follows [34] (note that $\circ$ is also the fundamental carry operation [31]):
$$(G_i, P_i) \circ (G_j, P_j) = (G_i + P_i G_j,\; P_i P_j). \tag{6}$$

In particular, (7), (8), (9), and (10) apply to all forms of prefix adders. Square brackets are used in (8)-(10) to indicate the previous step in recursion. Let $g_i = a_i b_i$ and $p_i = a_i + b_i$:
$$(c_1, 0) = (g_0, p_0) \circ (c_0, 0) \tag{7}$$
$$(c_2, 0) = (g_1, p_1) \circ [(g_0, p_0) \circ (c_0, 0)] \tag{8}$$
$$(c_3, 0) = (g_2, p_2) \circ [(g_1, p_1) \circ (g_0, p_0) \circ (c_0, 0)] \tag{9}$$
$$(c_4, 0) = [(g_3, p_3) \circ (g_2, p_2)] \circ [(g_1, p_1) \circ (g_0, p_0) \circ (c_0, 0)]. \tag{10}$$
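To make the role of the associative operator concrete, the sketch below evaluates (6) in Python and checks the carries of (7)-(10) against ordinary integer addition for all 4-bit operands. It is only an illustration under stated assumptions: operands are taken as little-endian bit lists, and the names `op`, `to_bits`, `prefix_carries`, and `check` are hypothetical.

```python
def op(left, right):
    """Associative operator of (6): (Gi, Pi) o (Gj, Pj) = (Gi + Pi*Gj, Pi*Pj)."""
    gi, pi = left
    gj, pj = right
    return (gi | (pi & gj), pi & pj)


def to_bits(x, n):
    """Little-endian list of the n low-order bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def prefix_carries(x, y, n, c0=0):
    """Carries [c1, ..., cn] obtained by chaining (6) serially, as in (7)-(9)."""
    a, b = to_bits(x, n), to_bits(y, n)
    gp = [(ai & bi, ai | bi) for ai, bi in zip(a, b)]   # (g_i, p_i)
    acc, out = (c0, 0), []
    for pair in gp:
        acc = op(pair, acc)       # (c_{i+1}, 0) = (g_i, p_i) o (c_i, 0)
        out.append(acc[0])
    return out, gp


def check(n=4):
    """Compare prefix carries with integer addition; n must be at least 4."""
    for x in range(2 ** n):
        for y in range(2 ** n):
            for c0 in (0, 1):
                carries, gp = prefix_carries(x, y, n, c0)
                for i, c_next in enumerate(carries):
                    mask = (1 << (i + 1)) - 1
                    # Reference: carry out of the low i+1 bits of x + y + c0.
                    assert c_next == ((x & mask) + (y & mask) + c0) >> (i + 1)
                # Regrouping of (10): the operator is associative.
                low = op(gp[1], op(gp[0], (c0, 0)))
                high = op(gp[3], gp[2])
                assert op(high, low)[0] == carries[3]
    return True


print(check())
```

The serial loop corresponds to the ripple-style evaluation; the regrouped form of (10) gives the same carry, which is what allows prefix adders to evaluate subterms in parallel.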


TABLE I. BRENT-KUNG PREFIX ADDER EQUATIONS

Fig. 9. Eight-bit Brent-Kung adder prefix graph.

To develop the equations for the Brent-Kung prefix adder, we define the following:
$$(g_{i:j}, p_{i:j}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_{j+1}, p_{j+1}) \circ (g_j, p_j). \tag{11}$$
Suppose $m$ is an integer defined as $j < m < i$; then we can rewrite (11) as shown in (12):
$$(g_{i:j}, p_{i:j}) = (g_{i:m}, p_{i:m}) \circ (g_{m-1:j}, p_{m-1:j}). \tag{12}$$
Using (12), we can rewrite (8) as
$$c_2 = (g_{1:0}, p_{1:0}) \circ (c_0, 0).$$
In general, $c_{i+1}$ can be expressed as
$$c_{i+1} = (g_{i:0}, p_{i:0}) \circ (c_0, 0). \tag{13}$$
If the initial carry $c_0 = 0$, then $c_{i+1} = g_{i:0}$.

A. Majority Logic Optimization for an 8-bit Brent-Kung Adder

The prefix graph for an 8-bit Brent-Kung adder is given in Fig. 9. The associative operators are indicated by filled circles with the labels below indicating the corresponding outputs. The equations corresponding to the 8-bit Brent-Kung adder prefix graph (see Fig. 9) are given in Table I. This assumes the initial carry $c_0$ is 0 (and is in line with the assumptions in [34]). The generate and propagate phase is not labelled explicitly as a stage.

We now present a lemma (Lemma 5) on the majority logic that will be used to obtain an efficient QCA design for the Brent-Kung adder. Let $g_i = a_i b_i$ and $p_i = a_i + b_i$. Lemma 5 shows that the calculation of the sum requires just two majority gates and one inverter using $g_i$, $p_i$, $c_i$, and $c_{i+1}$.

Lemma 5: At most two majority gates and one inverter are required for obtaining the sum at each stage (when $g_i$ and $p_i$ are incoming) of a Brent-Kung adder.

Proof: From (1) and (2), we have
$$s_i = M(\overline{M(a_i, b_i, c_i)},\; M(a_i, b_i, \bar{c}_i),\; c_i).$$
However, $M(a_i, b_i, c_i) = a_i b_i + (a_i + b_i) c_i = g_i + p_i c_i = M(g_i, p_i, c_i)$ (from Lemma 3). Hence, $c_{i+1} = M(a_i, b_i, c_i) = M(g_i, p_i, c_i)$. Similarly, $M(a_i, b_i, \bar{c}_i) = M(g_i, p_i, \bar{c}_i)$. From Lemma 2, $M(g_i, p_i, \bar{c}_i) = M(g_i, p_i, M(\bar{g}_i, \bar{p}_i, \bar{c}_i)) = M(g_i, p_i, \bar{c}_{i+1})$. Therefore
$$s_i = M(\bar{c}_{i+1},\; M(g_i, p_i, \bar{c}_{i+1}),\; c_i).$$
Q.E.D.
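Lemma 5 can also be confirmed by enumeration. The Python sketch below assumes the complemented form $s_i = M(\bar{c}_{i+1}, M(g_i, p_i, \bar{c}_{i+1}), c_i)$ (the overbars follow from the proof and are an assumption about the garbled source); `sum_from_gp` and `check_lemma5` are hypothetical helper names.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority on bits 0/1."""
    return (a & b) | (b & c) | (a & c)


def sum_from_gp(g, p, c, c_next):
    """Sum bit in the form of Lemma 5: two majority gates and one inverter."""
    n_c_next = 1 - c_next                    # the single inverter
    return maj(n_c_next, maj(g, p, n_c_next), c)


def check_lemma5():
    """Check the carry and sum expressions for every (a, b, c)."""
    for a, b, c in product((0, 1), repeat=3):
        g, p = a & b, a | b                  # generate and propagate
        c_next = maj(g, p, c)                # Lemma 3: M(g, p, c) = g + p*c
        assert c_next == maj(a, b, c)
        assert sum_from_gp(g, p, c, c_next) == a ^ b ^ c
    return True


print(check_lemma5())
```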

In the case of an 8-bit Brent-Kung adder in QCA, at most 51 majority gates and eight inverters are required. This can be explained as follows. The computation of $g_i$, $p_i$, $i = 0, \ldots, 7$ requires 16 majority gates. $c_1$ in Table I is $g_0$ and therefore does not require any majority gate. $c_2$, given by $g_{1:0} = g_1 + p_1 g_0$, requires one majority gate (from Lemma 3). $c_3 = g_{2:0} = g_2 + p_2 g_{1:0}$ also requires one majority gate (Lemma 3). $c_4 = g_{3:0} = g_{3:2} + p_{3:2}\, g_{1:0}$ requires four majority gates since $g_{3:2}$ requires one, $p_{3:2}$ requires one, the AND of $p_{3:2}$ and $g_{1:0}$ requires one, while the overall OR requires one (here, Lemma 3 does not apply). $c_5 = g_{4:0} = g_4 + p_4 g_{3:0}$ requires one majority gate (using Lemma 3). $c_6 = g_{5:0} = g_{5:4} + p_{5:4}\, g_{3:0}$ requires four majority gates (similar to the explanation for $c_4$). $c_7 = g_{6:0} = g_6 + p_6 g_{5:0}$ requires one majority gate. Finally, $c_8 = g_{7:0} = g_{7:4} + p_{7:4}\, g_{3:0}$ requires seven majority gates (since $g_{7:4} = g_{7:6} + p_{7:6}\, g_{5:4}$ requires four majority gates, one each for $g_{7:6}$ and $p_{7:6}$, one for AND and one for OR; $p_{7:4} = p_{7:6}\, p_{5:4}$ requires one majority gate; the AND and OR in the overall expression for $c_8$ require one each). Sixteen majority gates and eight inverters are required for the sum and hence the comment.

Fig. 10 shows the majority gate diagram for primarily carry generation of an 8-bit Brent-Kung adder with numbered D-latches. This assumes that the $g_i$'s and $p_i$'s are available. Further, $p_0$ is not required for generation of the output carries and hence it is not shown in the diagram. The total (maximum) delay is 2.5 clocks and can be obtained by counting the D-latches along the path that leads to sum $S_7$.
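The gate tally above can be reproduced with a small behavioural model. The Python sketch below is an assumption-laden illustration rather than the QCADesigner layout: it realizes AND and OR as majority gates with one input tied to 0 or 1, follows the carry expressions quoted in the paragraph, uses the Lemma 5 form for the sums (with the complements assumed), and ignores clocking and wiring. The function name `brent_kung_8` is hypothetical.

```python
def brent_kung_8(x, y):
    """Add two 8-bit integers (c0 = 0); return (result, majority gates, inverters)."""
    count = {"maj": 0, "inv": 0}

    def M(a, b, c):
        count["maj"] += 1                    # every call models one majority gate
        return (a & b) | (b & c) | (a & c)

    def AND(a, b):
        return M(a, b, 0)

    def OR(a, b):
        return M(a, b, 1)

    a = [(x >> i) & 1 for i in range(8)]
    b = [(y >> i) & 1 for i in range(8)]
    g = [AND(a[i], b[i]) for i in range(8)]  # 8 gates
    p = [OR(a[i], b[i]) for i in range(8)]   # 8 gates

    c = [0] * 9
    c[1] = g[0]                              # no gate needed
    c[2] = M(g[1], p[1], c[1])               # Lemma 3, 1 gate
    c[3] = M(g[2], p[2], c[2])               # Lemma 3, 1 gate
    g32, p32 = M(g[3], p[3], g[2]), AND(p[3], p[2])
    c[4] = OR(g32, AND(p32, c[2]))           # 4 gates in total for c4
    c[5] = M(g[4], p[4], c[4])               # 1 gate
    g54, p54 = M(g[5], p[5], g[4]), AND(p[5], p[4])
    c[6] = OR(g54, AND(p54, c[4]))           # 4 gates in total for c6
    c[7] = M(g[6], p[6], c[6])               # 1 gate
    g76, p76 = M(g[7], p[7], g[6]), AND(p[7], p[6])
    g74, p74 = OR(g76, AND(p76, g54)), AND(p76, p54)
    c[8] = OR(g74, AND(p74, c[4]))           # 7 gates in total for c8

    s = []
    for i in range(8):                       # 2 gates + 1 inverter per sum bit
        count["inv"] += 1
        n_c = 1 - c[i + 1]
        s.append(M(n_c, M(g[i], p[i], n_c), c[i]))

    result = sum(bit << i for i, bit in enumerate(s)) | (c[8] << 8)
    return result, count["maj"], count["inv"]


# Strided check over the operand space: correct sums, 51 gates, 8 inverters.
for x in range(0, 256, 7):
    for y in range(0, 256, 5):
        assert brent_kung_8(x, y) == (x + y, 51, 8)
```

The counts per carry (0, 1, 1, 4, 1, 4, 1, 7) add up to 19, which together with 16 gates for the $g_i$, $p_i$ and 16 for the sums gives the 51 stated in the text.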


B. Generalization to an n-bit BrentKung Adder In this section, we present a bound on the number of majority gates required, denoted by I(n), for QCA realization of an n-bit BrentKung adder (we focus on majority gates since we know that n inverters are required for an n-bit adder). I(n) corresponds to the sum of Ic (n) (for generating n-carries), Is (n) (for generating n-sums) and Ig p (n) (for obtaining the gi s and pi s). To obtain Ic (n), we present a recursive formulation. In particular, we show that Ic (n) = Ic (n : n + 1) + Ic ( n ), where 2 2 Ic (n : n + 1) is the number of majority gates required for gen2 erating the carries from n to n + 1 while Ic ( n ) is the number of 2 2 majority gates required for generating carries of an n -bit adder. 2 To derive the general formula for Ic (n), we make some observations on the prex graph shown in Fig. 8. In this case, n = 16. Let the number of majority gates required for a direct method d (that does not use Lemma 3) be denoted by Ic (16). Calculation d of Ic (16) involves following steps. (i) Consider the left-half corresponding to generation of C9 to C16 in Fig. 8. Stages 1, 2, and 3 involve four, two, and one associative operations respectively. Each associative operator requires three majority gates. Hence, the total number of majority gates is 21 (73). (ii) Stage 4 involves one associative operation while Stages 5, 6, and 7 involve one, two, and four associative operations respectively. From Stage 4 to Stage 7, only generation of carry takes place so there is no need for majority logic for propagate here. Therefore, each associative operation in Stages 4 to 7 requires two majority gates. Hence, a total of 16 (82) is required. d We therefore have Ic (16 : 9) = 21 + 16 = 37. (iii) For generating the remaining carries, namely C1 to C8 , we consider the left-half corresponding to C8 C5 . 
In this case, two and one associative operations are present in Stages 1 and 2, respectively, one associative operation in Stage 3 and one and two associative operations in Stages 4 and 5, respectively. d Ic (8 : 5) = 3 3 + 4 2 = 17. d (iv) This way, we can proceed recursively. Ic (16) can be d d d d d expressed as Ic (16 : 9) + Ic (8), Ic (8) as Ic (8 : 5) + Ic (4), d d d d d and Ic (4) as Ic (4 : 3) + Ic (2). Ic (2) is 2 and Ic (4 : 3) requires d seven majority gates. Hence, Ic (16) = 37 + 17 + 7 + 2 = 63. d (v) Ic (16) is considerably less than Ic (16) since Lemma 3 can be applied at Stage 1 and 7 leading to a saving of 15 majority gates. That is, n 1 majority gates are saved here. Hence, Ic (16) = 48. We now present a bound on the number of majority gates required for the n-bit case via Theorem 1. Theorem 1: The number of majority gates required for an n-bit BrentKung adder is given by I(n) = 8n 3 log2 (n) 4 Proof : Computation of carries of an n-bit BrentKung adder requires 2 log2 (n) 1 stages (assuming that n is a power of 2). The count for majority gates for an n-bit adder can be obtained using a recursive formulation noting that the lower order carries (namely, carries from 1 to n ) are those of an n -bit adder. 2 2 We therefore obtain a general formula for calculating the majority gates for carries from n + 1 to n. The stages from 1 to 2 n (log2 n 1) contain n , n , 16 , . . ., 2 l o gn ( n ) number of associative 2 4 8

Fig. 10. Brent-Kung 8-bit adder majority gate diagram for carry generation assuming gs and ps; the maximum delay of 2.5 clocks is for sum S 7 .

Fig. 11.

Sixteen-bit KoggeStone adder prex graph.

Remark 1: c0 has been assumed to be 0 in the derivation of BrentKung adder (in line with the description in [34]). If c0 is non-zero, then c1 , given by M (x0 , y0 , c0 ), is calculated along with g0 and p0 . For calculation of remaining carries, namely c2 to c8 , we simply can replace g0 by c1 . As a consequence, there is no increase in the total delay. Further, only one extra majority gate is required. In the absence of Lemma 3, the requirement of majority gates would increase by 7 (1 more for c2 , c3 , c4 , c5 , c6 , c7 , and c8 ) for an 8-bit adder.

PUDI AND SRIDHARAN: LOW COMPLEXITY DESIGN OF RIPPLE CARRY AND BRENTKUNG ADDERS IN QCA

111

operations respectively for carries from n to n + 1. Each as2 sociative operation requires three majority gates (in a direct method). So total number of majority gates (for the left half), denoted by t1 , is given as t1 = 3 n n n n + + + + log (n ) 4 8 16 2 2

1 [denoted by Ir ( n )]: 2 Ir (n) = Ir n : = n n + 1 + Ir 2 2

= 3(2log 2 (n )2 + 2log 2 (n )3 + 2log 2 (n )4 + + 2 + 1) = 3(2log 2 (n )1 1) =3 n 1 . 2

n n + Ir 2 2 n n = + + + 4 + 2 + Ir (2) 2 4 = n 2 + 1 [Ir (2) = 1] = n 1. Therefore,


d Ic (n) = Ic (n) Ir (n)

Stages log2 (n) to 2 log2 (n) 1 contain 1,1,2,. . ., n , n number 8 4 of associative operations respectively and each requires two majority gates. The total number of majority required for this, denoted by t2 , is given by

= 4n 3 log2 (n) 4. For an n-bit adder, gi and pi require one majority gate each, hence Ig p (n) = 2n majority gates. Also, each si requires two majority gates, so Is (n) = 2n majority gates. So overall majority gate requirement is I(n) = Ic (n) + Ig p (n) + Is (n)

t2 = 2 1 + 1 + 2 + +

n n + 8 4
log 2 (n )3

= 2(1 + 1 + 2 + + 2 = 2(1 + 2 = 2(2 =2


log 2 (n ) log 2 (n )1

+2

log 2 (n )2

= 8n 3 log2 (n) 4

Q.E.D.

1)

log 2 (n )1

=n
d Ic (n : n 2)

is then expressed as the sum of t1 and t2 . So n +1 2 =3 n n 1 +n=5 2 2 3

d Ic n :

d Remark 2: From Theorem 1, we note that Ic (n) is given by 5n 3 log2 (n) 5 while application of Lemma 3 leads to Ic (n) given by 4n 3 log2 (n) 4. This corresponds to savings of approximately 20%. Further, the direct BrentKung adder yields d Id (n) = Ic (n) + 4n while application of Lemma 3 leads to I(n) = Ic (n) + 4n. This corresponds to majority gate savings of roughly 11%. In addition, using Theorem 1, we can infer that when n = 32 at most 237 majority gates and 32 inverters are required while for n = 64, at most 490 majority gates and 64 inverters are required for BrentKung prex adder.

Hence, for a direct solution, we have

VI. OTHER PREFIX ADDERS In this section, we develop majority-gate based designs for other prex adders and compare with the BrentKung prex adder. The number of majority gates required for an n-bit Kogge Stone adder is given by Theorem 2. Its prex adder is shown in Fig. 11. A. KoggeStone Adder Theorem 2: The number of majority gates required for an n-bit KoggeStone adder is given by I(n) = n(3 log2 n 1) + 5. Proof: The computation of carries of an n-bit KoggeStone adder requires log2 n stages. The number of majority gates required for an n-bit KoggeStone adder is also obtained via a recursive formulation (just like the BrentKung adder). The general formula for calculating the number of majority gates for carries from n + 1 to n is obtained as follows. Each stage in the 2

d Ic (n) = 5

n 2

d 3 + Ic

n 2

=5

n n d + + + 2 3(log2 (n) 1) + Ic (2) 2 4

d = 5(n 2) 3 log2 (n) + 3 + 2 [Ic (2) = 2]

= 5n 3 log2 (n) 5. To obtain a reduction in majority gates in each stage, we can apply Lemma 3 to two stages, namely Stage 1 and Stage (2 log2 (n) 1) (Lemma 3 cannot be applied to other stages). This will lead to a reduction of one majority gate at each associative operator from carry n + 1 to n (i.e., n + n = n ). So 2 4 4 2 total reduction of majority gates, denoted by Ir (n), is sum of reduction of majority gates from carry n + 1 to n (denoted by 2 Ir (n : n + 1)) and reduction of majority gates from carry n to 2 2

112

IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 11, NO. 1, JANUARY 2012

KoggeStone adder requires n associative operations (namely 2 carries from n + 1 to n). Except the last stage, each associative 2 operator in all other stages requires three majority gates. In the last stage, each associative operation requires only two majority gates (since there is no calculation of the propagate term): n +1 2 n n (log2 n 1) + 2 2 2

d Ic n :

=3 =

n [3 log2 n 1] 2 n d d n Ic (n) = [3 log2 n 1] + Ic 2 2 =3 n n n log2 n + log2 + + 2 2 2 4 2 n n d + + 2 + Ic (2) 2 4


d [ Ic (2) = 2].

Fig. 12.

LadnerFischer 16-bit adder prex graph.

Proof (Outline): The direct calculation of the carries, denoted d by Ic (n), is given by
d Ic (n) =

n d n [3 log2 n + 1] + Ic 4 2 3 n n n log2 n + log2 + + 2 2 2 2 4 2 n n d + + 2 + 1 + Ic (2) 4 8

= 3[2n log2 n (log2 n + 1)n] [n 2] + 2 = 3n log2 n 4n + 4

Lemma 3 can be applied to Stage 1 of each associative operator. We will have a reduction of one majority gate each and a total of n majority gates: 2 Ir (n) = n n + Ir 2 2

= n 1. Hence, Ic (n) is given as


d Ic (n) = Ic (n) Ir (n)

3 [2n log2 n (log2 n + 1)n] [n 2] + 2 2 3 d = n log2 n n + 1 [ Ic (2) = 2]. 2 We can apply Lemma 3 to associative operations in Stage 1 and stage log2 n. This leads to a reduction of majority gates, denoted Ir (n), given by = Ir (n) = n n + Ir 2 2 =n1

= 3n log2 n 5n + 5. For an n-bit adder, gi and pi require one majority gate each; hence, Ig p (n) = 2n majority gates. Also, each si requires two majority gates, so Is (n) = 2n majority gates. So overall majority gate requirement is given by I(n) = Ic (n) + Ig p (n) + Is (n) = n(3 log2 n 1) + 5 B. LadnerFischer Adder Another prex adder reported in the literature is the Ladner Fischer adder [33]. This also has O(log2 n) stages. Its prex graph is shown in Fig. 12. A bound on the number of majority gates required for an n-bit LadnerFischer is given in Theorem 3. The details of the proof are given briey here (More details on the LadnerFischer adder itself are available in [36]). Theorem 3: The number of majority gates required for an n-bit LadnerFischer adder is given by I(n) = n (3 log2 n + 4) + 2. 2 Q.E.D.

$$I_c(n) = I_c^d(n) - I_r(n) = \frac{3}{2}\,n\log_2 n - 2n + 2.$$

The overall majority gate requirement is given by

$$I(n) = I_c(n) + I_{gp}(n) + I_s(n) = \frac{n}{2}(3\log_2 n + 4) + 2.$$

Q.E.D.

C. Han–Carlson Adder

A fourth prefix adder reported in the literature is the Han–Carlson adder [35]. Its prefix graph is shown in Fig. 13.

Theorem 4: The number of majority gates required for an n-bit Han–Carlson adder is given by

$$I(n) = \frac{n}{2}(3\log_2 n + 4) + 2.$$

Proof: The computation of carries of an n-bit Han–Carlson adder requires $\log_2 n + 1$ stages. As before, we use a recursive formulation, and the general formula for calculating the majority gates for carries from $(\frac{n}{2} + 1)$ to $n$ is derived as follows. Each stage in the Han–Carlson adder requires $\frac{n}{4}$ associative operations (namely, for carries from $(\frac{n}{2} + 1)$ to $n$). In the stages labelled 1 to $\log_2 n - 1$, each associative operator requires three majority gates. However, in the last two stages, each associative operator requires only two majority gates (since there is no calculation of the propagate term):

$$I_c^d\left(n : \frac{n}{2} + 1\right) = 3\cdot\frac{n}{4}(\log_2 n - 1) + 2\cdot\frac{n}{4}\cdot 2 = \frac{n}{4}\left[3\log_2 n + 1\right]$$

$$I_c^d(n) = \frac{n}{4}\left[3\log_2 n + 1\right] + I_c^d\left(\frac{n}{2}\right)$$

$$= \frac{3}{2}\left[\frac{n}{2}\log_2 n + \frac{n}{4}\log_2\frac{n}{2} + \cdots + 2\log_2 4\right] + \left[\frac{n}{4} + \frac{n}{8} + \cdots + 2 + 1\right] + I_c^d(2)$$

$$= \frac{3}{2}\left[2n\log_2 n - (\log_2 n + 1)n\right] + \frac{n - 2}{2} + 2$$

$$= \frac{3}{2}\,n\log_2 n - n + 1 \qquad \left[I_c^d(2) = 2\right].$$

We can apply Lemma 3 to associative operations in Stage 1 and Stage $\log_2 n + 1$. For each associative operation, there is a saving of one majority gate, which implies a total reduction of $\frac{n}{2}$ majority gates (for the half from $\frac{n}{2} + 1$ to $n$):

$$I_r(n) = \frac{n}{2} + I_r\left(\frac{n}{2}\right) = n - 1.$$

Hence, $I_c(n)$ is given as

$$I_c(n) = I_c^d(n) - I_r(n) = \frac{3}{2}\,n\log_2 n - 2n + 2.$$

For an n-bit adder, $g_i$ and $p_i$ require one majority gate each; hence, $I_{gp}(n) = 2n$ majority gates. Also, each $s_i$ requires two majority gates, so $I_s(n) = 2n$ majority gates. So the overall majority gate requirement is

$$I(n) = I_c(n) + I_{gp}(n) + I_s(n) = \frac{n}{2}(3\log_2 n + 4) + 2.$$

Q.E.D.

Fig. 13. Sixteen-bit Han–Carlson adder prefix graph.

TABLE II. NUMBER OF LEVELS, ASSOCIATIVE OPERATIONS AND MAJORITY GATES FOR VARIOUS PREFIX ADDERS

Fig. 14. Plot of majority gates versus adder size for various prefix adders.

Table II summarizes the number of levels, associative operations and majority gates for each of the four prefix adders (in terms of the adder size, denoted by n). Graphs showing how majority gates accumulate for different prefix adders as the adder size grows are shown in Fig. 14. From the graph (Fig. 14), we observe that the Brent–Kung prefix adder supports a very efficient solution (in terms of smaller growth in the number of majority gates). This is not unexpected, however, given that the Brent–Kung graph has a smaller number of operators (as expressed via Table II) and thus gates.

VII. QCA IMPLEMENTATION

In this section, we present results of simulation in QCADesigner [7]. We also present area and time complexity results for various adders.
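As a sanity check on the Section VI derivations, the following Python sketch evaluates the Han–Carlson recursion for the carries and compares the resulting total with the closed form of Theorem 4. It is only an illustration: the function names are hypothetical, n is assumed to be a power of two, and the base case I_r(1) = 0 is an assumption chosen so that the recursion reproduces the stated result I_r(n) = n − 1.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for n a power of two (assumed throughout)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def hc_carries_direct(n: int) -> int:
    """I_c^d(n): Han-Carlson carry gates before applying Lemma 3."""
    if n == 2:
        return 2  # base case stated in the proof: I_c^d(2) = 2
    m = log2_exact(n)
    # Upper half (carries n/2+1 .. n): n/4 operators per stage,
    # 3 gates each in stages 1 .. log2(n)-1, 2 gates each in the last two.
    upper_half = 3 * (n // 4) * (m - 1) + 2 * (n // 4) * 2
    return upper_half + hc_carries_direct(n // 2)


def hc_reduction(n: int) -> int:
    """I_r(n) = n/2 + I_r(n/2); base case I_r(1) = 0 is an assumption."""
    if n == 1:
        return 0
    return n // 2 + hc_reduction(n // 2)


def han_carlson_total(n: int) -> int:
    """I(n) = I_c(n) + I_gp(n) + I_s(n), with I_gp = I_s = 2n."""
    return hc_carries_direct(n) - hc_reduction(n) + 2 * n + 2 * n


def theorem4_closed_form(n: int) -> int:
    """(n/2)(3 log2 n + 4) + 2, as stated in Theorems 3 and 4."""
    return n * (3 * log2_exact(n) + 4) // 2 + 2


def kogge_stone_closed_form(n: int) -> int:
    """n(3 log2 n - 1) + 5, the Kogge-Stone bound of Section VI."""
    return n * (3 * log2_exact(n) - 1) + 5


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        assert han_carlson_total(n) == theorem4_closed_form(n)
        print(n, kogge_stone_closed_form(n), theorem4_closed_form(n))
```

For n = 16, the sketch gives 130 majority gates for the Han–Carlson bound (the same expression as the Ladner–Fischer bound of Theorem 3) against 181 for the Kogge–Stone bound.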

A. Design Rules

Cells for our design are assumed to have a height of 18 nm and a width of 18 nm, while the quantum dots have a diameter of 5 nm (same as the assumptions of [11]). Further, the cells are placed on a grid with a cell center-to-center distance of 20 nm. A maximum of 16 cells per clock zone is used (as in [11]; it is to be noted that the number of cells per clock zone affects the overall delay).

B. Simulation Engine

The coherence vector engine has been used for simulations. The options for the simulation were as follows (and are in agreement with the suggestions in [37]): temperature: 1 K; relaxation time: $1 \times 10^{-15}$ s; time step: $1 \times 10^{-16}$ s; total simulation time: $7 \times 10^{-11}$ s; radius of effect: 80 nm; relative permittivity: 12.9; layer separation: 11.5 nm; Euler method; randomize simulation order.

C. Layout Level Implementation of RCA and Prefix Adders

Fig. 15 shows the QCADesigner layout for an 8-bit ripple carry adder. The layout is labelled to indicate majority gates as well as the outputs, namely $S_0, S_1, \ldots, S_7$ and $C_8$. A labelled QCADesigner layout for the proposed Brent–Kung adder (8-bit) is shown in Fig. 16. Figs. 17–19 show the layout of an 8-bit Kogge–Stone adder, Ladner–Fischer adder, and Han–Carlson adder, respectively. The 16-bit versions are available at http://www.ee.iitm.ac.in/ sridhara/16 bit layouts/index.html.

Fig. 15. QCADesigner layout for 8-bit ripple carry adder.

Fig. 16. QCADesigner layout for 8-bit Brent–Kung adder.

Fig. 17. QCADesigner layout for 8-bit Kogge–Stone adder.

Fig. 18. QCADesigner layout for 8-bit Ladner–Fischer adder.

Fig. 19. QCADesigner layout for 8-bit Han–Carlson adder.
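When reading the layouts, it can help to translate grid extents into physical dimensions using the design rules of Section VII-A (18 nm cells on a 20 nm pitch, at most 16 cells per clock zone). The Python sketch below does this arithmetic. The helper names are hypothetical, the bounding-box area is a simple geometric estimate that may differ from how the areas in Table III were obtained, and the clock-zone count is only the lower bound implied by the 16-cell rule for a single straight wire.

```python
CELL_SIZE_NM = 18.0            # cell height and width (Section VII-A)
CELL_PITCH_NM = 20.0           # center-to-center distance on the grid
MAX_CELLS_PER_CLOCK_ZONE = 16  # design-rule limit used in the paper


def span_nm(cells: int) -> float:
    """Physical length of a row (or column) of `cells` adjacent grid cells."""
    if cells <= 0:
        return 0.0
    # (cells - 1) pitches between centers, plus one full cell width.
    return (cells - 1) * CELL_PITCH_NM + CELL_SIZE_NM


def bounding_box_area_um2(cols: int, rows: int) -> float:
    """Area of the bounding box of a cols x rows grid, in square micrometers."""
    return span_nm(cols) * span_nm(rows) / 1e6


def min_clock_zones(wire_cells: int) -> int:
    """Fewest clock zones a straight wire needs under the 16-cell rule."""
    return -(-wire_cells // MAX_CELLS_PER_CLOCK_ZONE)  # ceiling division


if __name__ == "__main__":
    # The grid sizes below are made-up inputs, not values from the paper.
    print(span_nm(MAX_CELLS_PER_CLOCK_ZONE))   # longest wire in one zone, nm
    print(bounding_box_area_um2(100, 50))      # hypothetical 100 x 50 grid
    print(min_clock_zones(40))                 # hypothetical 40-cell wire
```

Under these rules, the longest straight wire that fits in one clock zone is 15 pitches plus one cell width, i.e., 318 nm.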

Fig. 20 gives the layout for the 16-bit Brent–Kung adder. Fig. 21 shows the simulation results of a 16-bit Brent–Kung adder (the complete sum output also includes the top carry out). The first set of inputs for the simulation shown in Fig. 21 corresponds to A[15:0] = 0; B[15:0] = 0 (since the initial carry is set to 0 as in [34], this is retained for simulations). The output, SUM[15:0] = 0, appears after four clock delays (this is also reflected in the delay column in Table III). The second set of inputs corresponds to A[15:0] = 1024; B[15:0] = 512. The output is SUM[15:0] = 1536 (this includes the carry as well).

Fig. 20. QCADesigner layout for a 16-bit Brent–Kung adder.

Fig. 21. Simulation results for 16-bit Brent–Kung adder.

Fig. 22. Polarization versus relaxation time for a Brent–Kung adder with respect to: (a) C4; (b) S2; and (c) S0.
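The test vectors reported for Fig. 21 can be reproduced with a small bit-level reference model built only from majority gates and inverters. The Python sketch below is a behavioral illustration, not the authors' layout: it chains full adders in ripple-carry fashion rather than modelling the Brent–Kung prefix graph, it checks arithmetic only (no clocking, delay, or polarization), and the particular gate wiring is one construction that meets the count of three majority gates and one inverter per full adder stated in the Conclusions; whether it matches the paper's exact circuit is an assumption.

```python
def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate on single bits."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder from three majority gates and one inverter."""
    cout = maj(a, b, cin)          # majority gate 1: carry out
    not_cout = cout ^ 1            # the single inverter
    inner = maj(a, b, not_cout)    # majority gate 2
    s = maj(not_cout, cin, inner)  # majority gate 3: sum bit
    return s, cout


def add(a: int, b: int, width: int = 16, cin: int = 0) -> int:
    """Ripple the carry through `width` full adders; the result includes
    the final carry out as bit `width`."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total | (carry << width)


if __name__ == "__main__":
    # Exhaustive check of the 1-bit cell against ordinary addition.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cout = full_adder(a, b, c)
                assert s + 2 * cout == a + b + c
    # The two input sets described for Fig. 21.
    print(add(0, 0), add(1024, 512))
```

Because the model returns the carry out as the top bit, it can also be used to cross-check the expected outputs for the 4-bit test inputs used in the robustness studies by calling `add(x, y, width=4)`.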

D. Study of Signal Integrity as a Function of Relaxation Time

Given the advantages of the Brent–Kung adder in the QCA model with respect to majority gate requirements, it is of interest to explore other aspects that are equally important in the QCA paradigm. In this section, we study the robustness of the Brent–Kung adder. For this purpose, the coherence vector simulation engine is used since it allows for the verification of functionality of QCA layouts as time-related parameters and temperature are varied. The clock frequency for the studies is obtained using the coherence vector engine total simulation time and the length of the simulation input vector. In our case, this is 100 GHz. The polarization of various outputs as a function of the relaxation time is studied for a 4-bit Brent–Kung adder for three different temperature settings. Thresholds are set as follows [38]: polarization < −0.5 = logic 0; polarization > 0.5 = logic 1; otherwise the state is indeterminate.

The range for the relaxation time is chosen as follows. The coherence vector engine time step has been fixed at the default value ($1 \times 10^{-16}$ s) and this determines a value for the relaxation time where the design breaks (i.e., gives incorrect polarization of the outputs). The point where the design breaks at the other end is obtained by systematically increasing the relaxation time from the default value (of $1 \times 10^{-15}$ s) and observing the outputs for each case. The plots of polarization of three of the outputs, namely C4, S2, and S0, as a function of relaxation time when the inputs are A = 15, B = 15, and Cin = 0 are shown in Figs. 22(a)–(c). (The plots are limited to these three due to space constraints.) The X-axis is plotted on a logarithmic scale (base 10) to facilitate handling a large range for the relaxation time (from $9.9 \times 10^{-17}$ to $3.0 \times 10^{-12}$ s in the case of C4 and slightly less in the case of S2 and S0).

The design breaks for each of the output bits when the relaxation time is $9.9 \times 10^{-17}$ s (i.e., just less than the time step). On the higher side, the design breaks for C4 when the relaxation time is $9 \times 10^{-14}$ s for 1 Kelvin and for somewhat higher values as the temperature is increased ($3 \times 10^{-13}$ s for 10 K and $1 \times 10^{-12}$ s for 20 K). The design breaks for S2 at $9 \times 10^{-13}$ s for 1 Kelvin and around the same value as the temperature is increased. Further, the design breaks for S0 when the relaxation time is $3 \times 10^{-13}$ s at 1 Kelvin and at the same value as the temperature is increased to 10 Kelvin. These are illustrated in Figs. 22(a)–(c) (note that the X-axis values are in natural logarithm scale).

VIII. COMPARISON STUDIES

A. Cell Count, Area, and Delay for Various Adders

Table III gives the details of cell count, area and delay for the proposed RCA, Brent–Kung, Kogge–Stone, Ladner–Fischer, and Han–Carlson adders. Comparisons with prior work are also presented in Table III. The comparisons are primarily with the results reported in [10] and [11] since they appear to be the most recent on RCA, CLA, and other adders. The delay value indicated for the proposed 8-bit Brent–Kung adder corresponds to the delay for output of sum S7 in Fig. 10.

TABLE III. CELL COUNT, AREA, OVERALL SIZE, DELAY, AND NUMBER OF TOTAL CLOCK PHASES FOR DIFFERENT METHODS; PROP: PROPOSED; BK: BRENT–KUNG; HC: HAN–CARLSON; KS: KOGGE–STONE; LF: LADNER–FISCHER

It is worth noting that the proposed Brent–Kung adder has the lowest delay among all existing adders, and this can be attributed to the optimization of majority logic (as well as wires in the design). Plots of cell count, delay and area as a function of adder size are given in Figs. 23, 24, and 25. These plots are based on the analysis of the QCA layouts.

We present time and space estimates for various adders using the order notation followed by the authors of [10] (this is based on examining the growth of cell count, delay, etc. as the adder size doubles). From the statistics, cell counts for a QCA adder with n-bit operands are roughly $O(n^{1.33})$ for Brent–Kung, $O(n^{1.49})$ for Han–Carlson, $O(n^{1.56})$ for Kogge–Stone, $O(n^{1.42})$ for Ladner–Fischer, $O(n^{1.24})$ for the proposed RCA, $O(n^{1.21})$ for the carry flow adder (CFA) [11] and $O(n^{1.35})$ for the RCA of [10]. Area estimates are $O(n^{1.39})$ for Brent–Kung, $O(n^{1.41})$ for Han–Carlson, $O(n^{1.53})$ for Kogge–Stone, $O(n^{1.48})$ for Ladner–Fischer, $O(n^{1.56})$ for the proposed RCA, $O(n^{1.42})$ for CFA [11] and $O(n^{1.72})$ for the RCA of [10]. Delay estimates are given by $O(n^{0.68})$ for Brent–Kung, $O(n^{0.83})$ for Han–Carlson, $O(n^{0.78})$ for Kogge–Stone, $O(n^{0.7})$ for Ladner–Fischer, $O(n^{0.83})$ for the proposed RCA, $O(n^{0.87})$ for CFA [11] and $O(n^{0.97})$ for the RCA of [10]. From the order results (using the Big Oh notation), we note that Brent–Kung has, in general, lower complexity than other adders in the QCA model. We further note that a feature of the Kogge–Stone adder in the QCA domain is that the growth (ratio) in delay as well as area (when the adder width is doubled) remains nearly constant.

Fig. 23. Complexity in terms of cell count for various adders.

Fig. 24. Time complexity in terms of delay for various adders.

Fig. 25. Area complexity for various adders.

B. Polarization Versus Temperature Studies

In this section, we study the robustness of the proposed Brent–Kung adder. We report studies on polarization as a function of temperature for two different adders, namely the proposed Brent–Kung adder and the carry flow adder (which has the least cell count and delay [11] among existing adders). The coherence vector simulation in QCADesigner has been performed with the temperature set to various values starting from 1 Kelvin. Default settings (Euler Method and Randomize Simulation Order option) have been chosen for simulation of the Brent–Kung as well as the carry flow adder. The clock frequency for the studies is set to 100 GHz as before. We interpret the waveforms resulting from simulation in QCADesigner using a simple threshold system as suggested in [38]: polarization < −0.5 = logic 0; polarization > 0.5 = logic 1; otherwise the state is indeterminate.

The results presented are for the 4-bit version of each of the two adders. The change in polarization has been studied for sum bits $S_i$, $i = 0, \ldots, 3$, and carry bit C4 for two different inputs. The first set of input values (referred to as input 1 in figure captions) is X = 15; Y = 15; Cin = 0. Plots of polarization versus temperature for S0 and C4 for this input set are shown respectively in Figs. 26(a) and (b). (Plots for the remaining outputs are omitted due to space constraints.) The second set of input values (referred to as input 2 in figure captions) is X = 7; Y = 7; Cin = 0. Plots of polarization versus temperature for S0 and S3 for this input set are shown respectively in Figs. 27(a) and (b). Since negative polarization (−1) corresponds to a binary value of 0, the min value is taken in the results of simulation in QCADesigner for both C4 and S0 (for the remaining output bits, the max value is taken since these bits are 1).

From the plots, we can infer that the proposed Brent–Kung adder has better performance than the CFA with respect to output polarization. S0 is in error (it is 0) at higher temperatures (beyond 21 Kelvin) in the case of CFA, leading to an error in the overall sum for inputs given by X = 15, Y = 15, Cin = 0. All the outputs have the correct value up to 23 Kelvin for the proposed Brent–Kung adder. For the input given by X = 7, Y = 7, Cin = 0, S3 is in error for temperatures exceeding 6 Kelvin in the case of CFA (the design breaks at this temperature as shown in Fig. 27(b), and S3 is the first output that is broken), while accuracy of all outputs is maintained up to 23 Kelvin in the case of the Brent–Kung adder.

Fig. 26. Polarization versus temp. with input 1: (a) S0 and (b) C4.

Fig. 27. Polarization versus temp. with input 2: (a) S0 and (b) S3.

IX. CONCLUSIONS

In this paper, we have considered primitives in QCA and developed several results pertaining to majority logic optimization. We have also shown that a 1-bit full adder can be realized using at most three majority gates and one inverter. Using the new results on majority logic optimization, we have presented an efficient QCA design for an n-bit ripple carry adder and various prefix adders. We have shown that the proposed ripple carry adder has substantially lower area and delay than existing RCA designs. We have also shown that the Brent–Kung adder has lower delay than all other adder designs studied here (and in prior work) for large word sizes. Further, the Brent–Kung adder performs best among the prefix adders in terms of delay and area. The Brent–Kung adder design has also been studied for robustness. It is observed that the Brent–Kung adder is fairly robust for considerable variation in relaxation time as well as temperature. Simulation results using QCADesigner with the coherence vector engine confirm the advantages of the Brent–Kung prefix adder in the QCA domain.

REFERENCES

[1] G. W. Hanson, Fundamentals of Nanoelectronics. Englewood Cliffs, NJ: Prentice-Hall, 2008.
[2] W. Porod, "Quantum-dot devices and quantum-dot cellular automata," J. Franklin Inst., vol. 334B, no. 5/6, pp. 1147–1175, 1997.
[3] P. D. Tougaw and C. S. Lent, "Logical devices implemented using quantum cellular automata," J. Appl. Phys., vol. 75, no. 3, pp. 1818–1825, 1994.
[4] M. A. Kastner, "The single electron transistor," Rev. Modern Phys., vol. 64, no. 3, pp. 849–858, 1992.
[5] J. Timler and C. S. Lent, "Power gain and dissipation in quantum-dot cellular automata," J. Appl. Phys., vol. 91, no. 2, pp. 823–831, 2002.
[6] R. Zhang, K. Walus, W. Wang, and G. A. Jullien, "A method of majority logic reduction for quantum cellular automata," IEEE Trans. Nanotechnol., vol. 3, no. 4, pp. 443–450, Dec. 2004.
[7] K. Walus, T. Dysart, G. Jullien, and R. Budiman, "QCADesigner: A rapid design and simulation tool for quantum-dot cellular automata," IEEE Trans. Nanotechnol., vol. 3, no. 1, pp. 26–29, Mar. 2004.
[8] K. Walus and G. A. Jullien, "Design tools for an emerging SOC technology: Quantum-dot cellular automata," Proc. IEEE, vol. 94, no. 6, pp. 1225–1244, Jun. 2006.
[9] C. S. Lent and P. D. Tougaw, "A device architecture for computing with quantum dots," Proc. IEEE, vol. 85, no. 4, pp. 541–557, Apr. 1997.
[10] H. Cho and E. E. Swartzlander, "Adder designs and analyses for quantum-dot cellular automata," IEEE Trans. Nanotechnol., vol. 6, no. 3, pp. 374–383, May 2007.
[11] H. Cho and E. E. Swartzlander, "Adder and multiplier designs in quantum-dot cellular automata," IEEE Trans. Comput., vol. 58, no. 6, pp. 721–727, Jun. 2009.
[12] A. Chaudhary, D. Z. Chen, X. S. Hu, M. T. Niemier, R. Ravichandran, and K. Whitton, "Fabricatable interconnect and molecular QCA circuits," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 11, pp. 1978–1991, Nov. 2007.

[13] R. Lindaman, "A theorem for deriving majority-logic networks within an augmented Boolean algebra," IEEE Trans. Electron. Comput., vol. EC-9, no. 3, pp. 338–342, Sep. 1960.
[14] M. Cohn and R. Lindaman, "Axiomatic majority-decision logic," IEEE Trans. Electron. Comput., vol. EC-10, no. 1, pp. 17–21, Mar. 1961.
[15] F. Miyata, "Realization of arbitrary logical functions using majority elements," IEEE Trans. Electron. Comput., vol. EC-12, no. 3, pp. 183–191, Jun. 1963.
[16] S. B. Akers, "On the algebraic manipulation of majority logic," IEEE Trans. Electron. Comput., vol. EC-10, no. 4, pp. 779–779, Dec. 1961.
[17] H. S. Miller and R. O. Winder, "Majority-logic synthesis by geometric methods," IEEE Trans. Electron. Comput., vol. EC-11, no. 1, pp. 89–90, Feb. 1962.
[18] S. B. Akers, "Synthesis of combinational logic using three-input majority gates," in Proc. 3rd Annu. Symp. Switch. Circuit Theory Logical Design, 1962, pp. 149–158.
[19] E. M. Riseman, "A realization algorithm using three-input majority elements," IEEE Trans. Electron. Comput., vol. EC-16, no. 4, pp. 456–462, Aug. 1967.
[20] R. Zhang, P. Gupta, and N. K. Jha, "Synthesis of majority and minority networks and its applications to QCA-, TPL-, and SET-based nanotechnologies," in Proc. Int. Conf. VLSI Design, 2005, pp. 229–234.
[21] R. Zhang, K. Walus, W. Wang, and G. A. Jullien, "Performance comparison of quantum-dot cellular automata adders," in Proc. IEEE Int. Symp. Circuits Syst., 2005, pp. 2522–2526.
[22] H. Cho and E. E. Swartzlander, "Modular design of conditional sum adders using quantum-dot cellular automata," in Proc. 6th IEEE Conf. Nanotechnol. (IEEE-NANO 2006), pp. 363–366.
[23] R. Tang, F. Zheng, and Y.-B. Kim, "QCA-based nano circuits design [adder design example]," in Proc. IEEE Int. Symp. Circuits Syst., 2005, pp. 2527–2530.
[24] S. Bhanja and S. Sarkar, "Probabilistic modeling of QCA circuits using Bayesian networks," IEEE Trans. Nanotechnol., vol. 5, no. 6, pp. 657–670, Nov. 2006.
[25] K. Kim, K. Wu, and R. Karri, "The robust QCA adder designs using composable QCA building blocks," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 1, pp. 176–183, Jan. 2007.
[26] T. J. Dysart and P. M. Kogge, "Probabilistic analysis of a molecular quantum-dot cellular automata adder," in Proc. IEEE Int. Symp. Defect Fault-Tolerance VLSI Syst., 2007, pp. 478–486.
[27] T. J. Dysart and P. M. Kogge, "Analyzing the inherent reliability of moderately sized magnetic and electrostatic QCA circuits via probabilistic transfer matrices," IEEE Trans. Very Large Scale Integrat. Syst., vol. 17, no. 4, pp. 507–516, Apr. 2009.
[28] I. Hanninen and J. Takala, "Robust adders based on quantum-dot cellular automata," in Proc. IEEE Int. Conf. Appl.-Specific Syst., Architect. Processors, 2007, pp. 391–396.
[29] S. Srivastava, S. Sarkar, and S. Bhanja, "Estimation of upper bound of power dissipation in QCA circuits," IEEE Trans. Nanotechnol., vol. 8, no. 1, pp. 116–127, Jan. 2009.
[30] F. Lombardi and J. Huang, Design and Test of Digital Circuits by Quantum-Dot Cellular Automata. Norwood, MA: Artech House, 2007.
[31] I. Koren, Computer Arithmetic Algorithms. Natick, MA: A.K. Peters Ltd., 2002.
[32] P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrent equations," IEEE Trans. Comput., vol. C-22, no. 8, pp. 786–793, Aug. 1973.
[33] R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. Assoc. Comput. Mach., vol. 27, no. 4, pp. 831–838, Oct. 1980.
[34] R. P. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Comput., vol. C-31, no. 3, pp. 260–264, Mar. 1982.
[35] T. Han and D. A. Carlson, "Fast area-efficient VLSI adders," in Proc. 8th IEEE Symp. Comput. Arithmet., 1987, pp. 49–56.
[36] V. Pudi and K. Sridharan, "Efficient design of a hybrid adder in quantum-dot cellular automata," IEEE Trans. VLSI Syst., 2010, to be published.
[37] K. Walus, G. Schulhof, and G. Jullien, "Implementation of a simulation engine for clocked molecular QCA," in Proc. IEEE Can. Conf. Electr. Comput. Eng., May 2006, pp. 2128–2131.
[38] G. Schulhof, K. Walus, and G. A. Jullien, "Simulation of random cell displacements in QCA," ACM J. Emerg. Technol. Comput. Syst., vol. 3, no. 1, pp. 1–14, 2007.

Vikramkumar Pudi received the B.Tech. degree in electronics and communication engineering from Narayana Engineering College, Nellore, India, in 2008. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering at the Indian Institute of Technology Madras, Chennai, India. His research interests include quantum-dot cellular automata, VLSI architectures, and hardware realization of DSP algorithms.

K. Sridharan (S'84–M'96–SM'01) received the Ph.D. degree from Rensselaer Polytechnic Institute, Troy, NY, in 1995. He was an Assistant Professor at the Indian Institute of Technology (IIT), Guwahati, from 1996 to 2001. Since June 2001, he has been with IIT Madras, where he is presently a Professor. He was a visiting staff member at Nanyang Technological University, Singapore, in 2000–2001 and 2006–2008. He has supervised three Ph.D. degrees and holds two patents. He is an author of a book published by Springer in 2008 on hardware-efficient algorithms for robotics. He has also authored or coauthored approximately 70 papers in various journals and conferences. He received the Computer Engineering Division Prize for a paper published in the Journal of I.E. (India) in 2002. He is also the recipient of the 2009 Vikram Sarabhai Research Award for his work in algorithms, architectures, and FPGA-based designs for robotics.
