
Racetrack Memory Based Reconfigurable Computing

Weisheng Zhao1, 2*, Nesrine Ben Romdhane1, 2, Yue Zhang1, 2, Jacques-Olivier Klein1, 2, and Dafiné Ravelosona1, 2
1. IEF, Univ. Paris-Sud, Orsay, France-91405
2. CNRS, UMR8622, Orsay, France-91405
*Contact: weisheng.zhao@u-psud.fr

Abstract: Reconfigurable computing provides a number of
advantages, such as low R&D cost and design flexibility, compared
with application-specific logic circuits; however, its low power
efficiency and logic density greatly limit its wide application. One
of the major reasons for this shortcoming is the SRAM-based
configuration memory, which occupies a large die area and
consumes high static power. The latter is becoming more severe
because of rapidly increasing leakage currents, which are intrinsic
and worsen as the fabrication node shrinks.
Racetrack memory is one of the emerging non-volatile memory
technologies under intense investigation and promises ultra-high
density, non-volatility and low power. In this invited paper, we
present the design of racetrack memory based reconfigurable
computing. Using a racetrack memory compact model and a
28 nm design kit, mixed simulation results show its high density
and low power compared with conventional SRAM-based
reconfigurable computing.
Keywords: Racetrack Memory; Non-Volatility; Low Power;
Magnetic Domain Wall Motion; Instant On/Off

I. INTRODUCTION

Conventional reconfigurable computing logic circuits such as
field programmable gate arrays (FPGAs) have been the object
of intense development over the last twenty years thanks to their
evident advantages in terms of R&D cost and flexibility [1].
However, their market share remains far below that of
application-specific logic circuits (ASICs). The major reasons are
their low power efficiency and logic density, caused by
configuration storage based on Static Random Access Memory
(SRAM). SRAM provides fast data access, which ensures both
reconfiguration speed and computing speed; however, it is
volatile. All the functions have to be programmed again at each
power-up, and external non-volatile PROM memories must
accompany the chip, either in the same package or at the
Printed Circuit Board (PCB) level. When a power failure
occurs, all the data in long-running computations are lost. The
static power needed to retain the configuration data becomes
high because of the intrinsic, rapidly increasing leakage currents
below the 40 nm technology node [2]. Besides, the basic SRAM
cell is composed of 6 transistors, which limits the logic density.
Internal Flash memory is designed to address these issues by
replacing both the external memory and the SRAM. However, it
has drawbacks such as slow reprogramming and sensing
operations and a low number of write cycles (up to 10^6), which
limit its lifetime as well as its reconfiguration and computing
speed [3].

These shortcomings of classical reconfigurable computing
circuits have drawn significant interest in new memories for
the configuration data. Such memories should provide
non-volatility to reduce standby power, fast data access
(e.g. 1 ns) to ensure fast reconfiguration and computing, and a
small cell area (e.g. 4F^2) to improve logic density. For these
purposes, a number of non-volatile memory technologies are
under intense investigation for integration in reconfigurable
computing circuits, such as spin transfer torque magnetic RAM
(STT-MRAM) [4-6], phase change memory (PCM) [7] and oxide
resistive RAM (R-RAM) [8]. Racetrack memory, based on
current-induced magnetic domain wall (DW) motion, is an
emerging approach that combines all the properties required
for reconfigurable computing [9-11]. Recently, a first prototype
based on a 90 nm node was demonstrated, proving its
technological feasibility and CMOS integration [12]. Besides, its
operation is similar to that of shift registers, which are widely
used as configuration memory in classical reconfigurable
computing [13-14].
The authors wish to acknowledge financial support of the French national
project ANR-DIPMEM and European program FP7 MAGWIRE.

978-1-4673-6104-0/13/$31.00 ©2013 IEEE

Fig.1. (a) Vertical Magnetic Tunnel Junction (MTJ) structure: it is composed
of an oxide barrier and two ferromagnetic (FM) layers. The magnetization of
one FM layer is fixed, while that of the other is free. Depending on the
relative configuration (Parallel or Anti-Parallel) of the two FM layers, the
MTJ shows a low or high resistance. (b) Racetrack memory based on
current-induced DW motion, composed of one write head (MTJ0), one read head
(MTJ1) and one magnetic nano-stripe. Iwrite nucleates data, i.e. magnetic
domains, in the magnetic stripe through the spin-transfer torque (STT)
approach [6]; Ishift induces DW motion along the magnetic stripe; and Iread
detects the magnetization direction through the Tunnel MagnetoResistance
(TMR) effect [7].

As shown in Fig.1, a Racetrack Memory (RM) cell is
composed of two magnetic tunnel junctions (MTJs), for DW
nucleation (MTJ0) and detection (MTJ1) respectively, and one
magnetic nano-stripe for data storage [11-13]. Data are stored
as the magnetization direction of multiple magnetic
domains, which are separated by DWs that can be shifted
along the magnetic stripe (see Fig.2). Data propagation
induced by Ishift requires a short delay that depends on the
distance W (e.g. 20-40 nm) between two neighboring DWs.
Note that the speed of DW motion can exceed 100 m/s
under a relatively low current in the Ta/CoFeB/MgO
structure [16]. Artificial potentials or constrictions are used to
allow synchronous propagation and to pin the DWs when no
Ishift is applied.
The write (Iwrite) and read (Iread) currents follow different
paths, allowing independent write and read operations. The DW
nucleation and propagation current sources are shared along
each magnetic stripe, which can store multiple bits along its
length (e.g. a few µm). Therefore, RM is expected to bring
ultra-high area efficiency (e.g. 1 F^2) [15]. The programming
delay for a 64-bit racetrack memory word can be lower than
100 ns at the 28 nm node, and the sensing latency of the MTJ
read heads is as low as ~200 ps, which ensures fast
configuration (e.g. <1 µs) and a high computing speed
(e.g. 3 GHz).
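The relation between DW spacing, DW velocity and shifting delay mentioned above can be illustrated with a rough estimate. The sketch below is only a back-of-the-envelope calculation: it assumes a constant DW velocity and ignores nucleation time and pulse rise/fall times, and the function names are hypothetical, not taken from the paper's model.

```python
def per_bit_shift_delay_s(dw_spacing_m: float, dw_velocity_m_s: float) -> float:
    """Time for the domain walls to advance by one bit position.

    Assumption: the walls move at a constant velocity over the spacing W.
    """
    return dw_spacing_m / dw_velocity_m_s


def word_shift_delay_s(n_bits: int, dw_spacing_m: float, dw_velocity_m_s: float) -> float:
    """Time to shift a whole word, one bit position per shift step."""
    return n_bits * per_bit_shift_delay_s(dw_spacing_m, dw_velocity_m_s)


if __name__ == "__main__":
    velocity = 100.0  # m/s, the order of magnitude quoted in the text
    for spacing in (20e-9, 40e-9):  # W = 20-40 nm
        per_bit = per_bit_shift_delay_s(spacing, velocity)
        word = word_shift_delay_s(64, spacing, velocity)
        print(f"W = {spacing * 1e9:.0f} nm: {per_bit * 1e9:.2f} ns per bit, "
              f"{word * 1e9:.1f} ns for 64 bits")
```

With W = 40 nm and 100 m/s this gives 0.4 ns per bit and about 25.6 ns of pure shifting for 64 bits, which is compatible with the sub-100 ns programming delay quoted above once nucleation is added.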

The peripheral circuits of racetrack memory have been
investigated in recent years [13]. Fig.4a shows the
bi-directional current source used for magnetization nucleation in
the magnetic stripe. Fig.4b shows a high-performance sense
amplifier (SA) that detects the state of an MTJ nanopillar
through its resistance difference [17]. It is a critical part of the
RM-LUT, as it translates the resistance state into a logic level
(e.g. RAP corresponds to logic 1). It also determines the
computing speed, power and data reliability. The structure
shown in Fig.4b is based on the pre-charge principle and
offers ultra-low power (~1 fJ/operation), short sensing
latency (~100 ps) and high robustness against process variation
[17]. The peripheral circuits contain a number of transistors,
but they can be globally shared within each RM-LUT.
Thanks to recent progress towards high Tunnel
MagnetoResistance (TMR) ratios in perpendicular magnetic
anisotropy (PMA) MTJs (e.g. 200%) [18], the logic density can be
increased by avoiding complementary structures for data
sensing while keeping nearly the same reliability [19-20].
Another shortcoming of complementary cells is their
sensitivity to stochastic domain wall motion, which can
cause programming errors.

Fig.2. Kerr image of perfectly round magnetic domains (white and black
indicate opposite magnetization directions) in a crystallized Ta (5 nm)/CoFeB
(1 nm)/MgO structure. The typical propagation field is as low as 0.5 mT, one
order of magnitude lower than in conventional ferromagnetic films with
perpendicular magnetic anisotropy.

In this paper, we present the first racetrack memory based
reconfigurable computing circuits and architectures. Using
an accurate compact model and a 28 nm industrial CMOS design
kit [16], we performed mixed simulations to demonstrate its logic
functionality and to evaluate its power, speed and logic
density. The rest of the paper is organized as follows: Section II
describes the detailed design of the reconfigurable logic
circuits based on racetrack memory; Section III presents
functional simulations and performance analysis; finally,
Section IV discusses conclusions and perspectives.

II. STRUCTURE OF RACETRACK MEMORY BASED CIRCUITS FOR RECONFIGURABLE COMPUTING

A look-up table (LUT) is a basic element of reconfigurable
computing circuits and can be used to generate any logic
function [1]. A racetrack memory based LUT (RM-LUT) is
composed of a conventional logic tree, a magnetic stripe and a
high-performance sense amplifier (S.A), as shown in Fig.3a.
Unlike the conventional racetrack memory structure (see
Fig.1b), the magnetic stripe carries several read heads (e.g. 8 for
3-bit logic, or 2^N for N-bit logic), which replace the SRAM
storage cells or shift registers used for logic configuration. The
diameter of the MTJ read heads, M, depends on the minimum
fabrication node (e.g. 28 nm); the distance between two
constrictions for domain wall pinning, W, is larger than M, for
instance W = 2M. Instead of the 6 transistors of each SRAM
cell, racetrack memory needs only one read head per stored
bit. This allows a much higher logic density.
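The behavior of such a LUT can be sketched in software: the 2^N bits held under the read heads form the truth table, and the logic tree selects one read head from the inputs. The model below is purely behavioral and illustrative; the function names and the choice of A as the most significant address bit are assumptions, not details given in the paper.

```python
from typing import Callable, List


def program_lut(func: Callable[..., int], n_inputs: int) -> List[int]:
    """Return the 2**n_inputs configuration bits, one per read head.

    Assumption: bits are ordered by address, first input = most significant bit.
    """
    bits = []
    for address in range(2 ** n_inputs):
        inputs = [(address >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        bits.append(int(func(*inputs)) & 1)
    return bits


def read_lut(bits: List[int], *inputs: int) -> int:
    """Model of the logic tree: the inputs select which stored bit is sensed."""
    address = 0
    for value in inputs:
        address = (address << 1) | (value & 1)
    return bits[address]


if __name__ == "__main__":
    and3 = program_lut(lambda a, b, c: a & b & c, 3)
    xor3 = program_lut(lambda a, b, c: a ^ b ^ c, 3)
    print("AND configuration:", and3)
    print("XOR configuration:", xor3)
    print("XOR(1, 0, 1) =", read_lut(xor3, 1, 0, 1))
```

Reconfiguring the function only changes the stored bit pattern; the selection logic stays the same, which is why a single shared logic tree and sense amplifier suffice.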

Fig.3. (a) The hybrid racetrack memory and CMOS circuit used to build a
non-volatile look-up table for reconfigurable computing, which is composed of
a configuration part, a multiplexing part and a data sensing amplifier (S.A)
circuit. W is the distance between constrictions and M is the diameter of the
MTJ nanopillars. MTJ1-8 are the read heads associated with the magnetic
nanowire. (b) The symbol of this RM look-up table.

Fig.4. (a) Writing circuit: depending on the "input" state, two transistors
are open while the two others are closed, which creates a bi-directional
current that passes through the write head MTJ and nucleates the
magnetization in the magnetic nanowire. (b) Pre-Charge Sense Amplifier (PCSA)
circuit, composed of seven transistors (MN0-2 and MP0-3). MTJx is the read
head selected by the logic configuration of A, B and C; MTJref is an MTJ
nanopillar with a resistance midway between the RAP and RP of the read-head
MTJs. A transistor tree should be added to balance the resistance of the
selection transistors in the left branch.

It is important to mention that the output data are
intrinsically synchronous thanks to the CLK signal of the PCSA
[19]. A Flip-Flop therefore does not need to be associated with
the LUT to build a configurable logic block (CLB) or semi-slice
[1]. This simplifies the whole structure and allows a higher
speed. With a sensing latency as low as 100 ps, the computing
speed of a CLB can reach ~10 GHz. This additional
advantage could overcome the speed bottleneck of the
conventional semi-slice composed of one LUT plus one
Flip-Flop. Other S.A circuits without an integrated
synchronization signal can be used for asynchronous logic
computing [21]. In this case, a magnetic Flip-Flop (MFF) is
needed for non-volatile storage; MFFs have been studied in
depth by both academia and industry in recent years [22-23].

III. MIXED SIMULATION AND PERFORMANCE ANALYSIS

The logic reconfiguration power is relatively low thanks to
the local non-volatile storage of the RM and the small currents
needed to control domain walls with PMA (i.e. nucleation
and propagation). Some power-saving techniques have also been
integrated. For instance, a magnetization nucleation is performed
only if the logic value changes; otherwise, only a domain wall
propagation pulse is applied for the logic reconfiguration. In the
simulation shown in Fig.5, the energy for the AND logic
configuration is ~1.37 pJ (including both domain wall
propagation and nucleation dissipation) and the energy for the
XOR logic reconfiguration is ~1.95 pJ, which is much lower than
for a conventional SRAM-LUT. The energy for computing is as
low as ~28.6 fJ per operation. Note that there is no standby
power thanks to the non-volatility of the RM, whereas standby
power is one of the major dissipation sources of the SRAM-LUT.
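The "nucleate only when the logic value changes" technique described above can be sketched as a simple pulse count. This is an illustrative model only: it assumes one shift pulse per written bit, bits written in address order, and a write head initially in state 0; the function name is hypothetical and no energy figures are derived from it.

```python
from typing import List, Tuple


def count_write_pulses(bits: List[int], initial_state: int = 0) -> Tuple[int, int]:
    """Return (nucleation_pulses, shift_pulses) needed to program `bits`.

    A nucleation pulse is issued only when the bit to write differs from the
    magnetization currently under the write head; every bit is followed by
    one shift pulse (assumption).
    """
    nucleations = 0
    head_state = initial_state
    for bit in bits:
        if bit != head_state:
            nucleations += 1
            head_state = bit
    shifts = len(bits)
    return nucleations, shifts


if __name__ == "__main__":
    and3 = [0, 0, 0, 0, 0, 0, 0, 1]  # 3-input AND truth table
    xor3 = [0, 1, 1, 0, 1, 0, 0, 1]  # 3-input XOR truth table
    print("AND:", count_write_pulses(and3))
    print("XOR:", count_write_pulses(xor3))
```

Under these assumptions, a pattern with few transitions needs fewer nucleation pulses than one that alternates often, which is the saving the technique targets.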

A. SPICE-compatible simulation model of racetrack memory


In order to simulate racetrack memory based
reconfigurable computing circuits, we developed a
SPICE-compatible model written in the Verilog-A language. It
includes the basic physics (i.e. magnetization nucleation, the
different regimes of current-induced domain wall motion, the
tunneling effect, etc.) and a number of experimental parameters
[24]. The magnetic stripe is a Ta/CoFeB/MgO nanowire with PMA
[13, 25]; the read/write heads are CoFeB/MgO/CoFeB nanopillars,
in which the upper thick CoFeB layer is used as the reference
layer. Table I shows the essential parameters of the simulation
model.
TABLE I
PARAMETERS AND VARIABLES PRESENT IN THE SIMULATION MODEL
Parameter     Description                                      Default Value
P             Spin polarization of the tunnel current          0.56
Ms            Saturation magnetization                         4560×10^3 A/m
WNW           Magnetic nanowire width                          28 nm
TNW           Magnetic nanowire thickness                      6 nm
MMTJ          MTJ read head diameter                           28 nm
WDW           Domain wall distance                             56 nm
TMR(0)        TMR ratio with 0 Vbias                           200%
Vddwrite      Writing and shifting voltage                     3 V
ρNW           Material resistivity of nanowire                 4.8e-7 Ω·m
Vddread       Reading voltage                                  1.0 V
JNucleation   STT critical current density for DW nucleation   5.7×10^5 A/cm^2
Jp            STT critical current density for DW motion       6.2×10^7 A/cm^2
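To give a feel for the magnitudes behind the Table I parameters, the sketch below converts the critical current density for DW motion into a current and estimates the nanowire resistance per stored bit. It is an illustration under stated assumptions, not part of the authors' compact model: the current is taken to flow uniformly through a rectangular cross-section (width × thickness), the resistivity is assumed to be in Ω·m, and the function names are hypothetical.

```python
def current_from_density_a(j_a_per_cm2: float, width_m: float, thickness_m: float) -> float:
    """Current through a rectangular cross-section for a given current density."""
    j_a_per_m2 = j_a_per_cm2 * 1e4  # 1 cm^2 = 1e-4 m^2
    return j_a_per_m2 * width_m * thickness_m


def segment_resistance_ohm(resistivity_ohm_m: float, length_m: float,
                           width_m: float, thickness_m: float) -> float:
    """Resistance of a uniform wire segment: R = rho * L / (W * T)."""
    return resistivity_ohm_m * length_m / (width_m * thickness_m)


if __name__ == "__main__":
    width, thickness = 28e-9, 6e-9  # nanowire width and thickness (Table I)
    dw_distance = 56e-9             # domain wall distance (Table I)
    i_shift = current_from_density_a(6.2e7, width, thickness)
    r_bit = segment_resistance_ohm(4.8e-7, dw_distance, width, thickness)
    print(f"Critical shift current: {i_shift * 1e6:.1f} uA")
    print(f"Nanowire resistance per bit: {r_bit:.0f} ohm")
```

With the table values this gives a critical shift current of roughly 104 µA and about 160 Ω of nanowire resistance per bit, so the resistance seen by the shift current source grows linearly with the number of bits stored along the stripe.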

B. Circuit simulation and power analysis


Using the 28 nm CMOS design kit and this compact model,
we successfully simulated the reconfigurable logic circuit
based on racetrack memory. Fig.5 shows an example of a
three-input logic circuit (signals A, B and C), which is first
programmed as AND logic and then dynamically reconfigured
as XOR logic. This reconfiguration can be very fast: 170 ns
instead of milliseconds in conventional reconfigurable
logic circuits. In this simulation, we integrated an MFF at the
output of the racetrack memory LUT to store the intermediate
data in a non-volatile state. The computing latency is ~4 ns if
the delay of this non-volatile storage is taken into account (i.e.
Out_S). It is ~500 ps if the output is not protected (i.e.
Out), for high-speed purposes (Fig.6). This suggests that the
computing frequency of this LUT can reach the GHz range.
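The link between the latencies quoted above and the achievable computing frequency can be checked with a one-line estimate. The sketch assumes one operation per latency period, with no pipelining and no additional timing margin; the function name is hypothetical.

```python
def max_frequency_hz(latency_s: float) -> float:
    """Upper bound on the operation rate if each operation takes `latency_s`."""
    return 1.0 / latency_s


if __name__ == "__main__":
    for label, latency in (("Out (unprotected)", 500e-12),
                           ("Out_S (with MFF)", 4e-9)):
        print(f"{label}: {max_frequency_hz(latency) / 1e9:.2f} GHz")
```

Under this assumption the unprotected output corresponds to about 2 GHz and the non-volatile output to about 250 MHz.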

Fig.5. Functional simulation of the RM-LUT: A, B and C are the input signals;
En_Com is the control signal that enables the computing operation and is set
to 0 during logic function (re)configuration; Iwrite is the current passing
through the write head MTJ0 for domain wall nucleation; Ishift is the
current pulse that moves the domain walls in the ferromagnetic nanowire (see
also Fig.1); Out is the output of the LUT, i.e. the computed logic result.

Fig.6. Zoom of the transient simulation during the computing operation: (a)
CLK signal to drive the computing, (c) output of the LUT.

C. Circuit layout and area analysis


Fig.7 shows the layout implementation of this circuit. The
racetrack memory and MTJs are implemented above the
CMOS circuits, yielding a much smaller footprint. This example
of three-input logic occupies an area of 5.047 µm × 2.535 µm,
20% less than that of the conventional SRAM-LUT. According to
the theoretical estimation (see Fig.8), the area gain increases
for complex logic gates with more than six inputs.

Fig.7. (a) The full layout of this RM-LUT; (b) the implementation flow of
racetrack memory integration; (c) an example of racetrack memory with eight
read-head MTJs, implemented in the layout of the CMOS circuits.

[Plot: number of transistors versus number of function inputs, for SRAM_LUT and RM_LUT]
Fig.8. The area gain of the RM-LUT becomes larger for complex logic
gates, owing to the global sharing of the peripheral circuits.

IV. CONCLUSION AND PERSPECTIVES

In this invited paper, we presented racetrack memory based
reconfigurable computing. We analyzed the physics and device
feasibility, designed the circuit blocks and performed circuit
simulations based on the 28 nm technology node. The performance
of this new solution is promising: for instance, it offers zero
standby power and low dynamic dissipation for function
(re)configuration. The logic density can also be greatly improved
thanks to 3D back-end process integration and the ultra-low
storage area of RM.
Nevertheless, it should be noted that racetrack memory
must overcome a number of technological issues before
practical use, such as reducing the resistivity of the ferromagnetic
material to reach a higher density, optimizing domain wall
motion under low current, and finding an efficient solution for
domain wall pinning [26-27]. A demonstrator of racetrack
memory on the 40 nm node is under development in the
framework of the European project Magwire [28]. System-level
integration and new interconnection structures are also
research topics to explore, benefiting from these powerful
energy-efficient reconfigurable logic circuits.

REFERENCES
[1] S. Brown, R. Francis, J. Rose, and Z. Vranesic, "Field Programmable Gate
Arrays", Kluwer Academic Publishers, 1992.
[2] N.S. Kim, "Leakage current: Moore's law meets the static power", IEEE
Computer, pp. 68-74, 2003.
[3] ProASIC, www.actel.com/
[4] C. Chappert, A. Fert, and F. N. Van Dau, "The emergence of spin
electronics in data storage", Nat. Materials, vol. 6, pp. 813-23, 2007.
[5] C.J. Lin, et al., "45nm Low power CMOS logic compatible embedded
STT MRAM utilizing a reverse-connection 1T/1MTJ cell", in Proc. of
IEEE IEDM, pp. 279-282, 2009.
[6] W.S. Zhao, E. Belhaire, C. Chappert and P. Mazoyer, "Spin Transfer
Torque (STT)-MRAM based Run Time Reconfiguration FPGA
circuit", ACM Transactions on Embedded Computing Systems, Vol. 9,
No. 2, article 14, 2009.
[7] H S P. Wong, et al., "Phase Change Memory", Proceedings of the
IEEE, Vol. 98, pp. 2201-2227, 2010.
[8] M. Kund, et al., Proc. of IEEE IEDM, USA, pp. 754-757, 2005.
[9] S. S. P. Parkin, M. Hayashi, and L. Thomas, "Magnetic domain-wall
racetrack memory", Science, vol. 320, no. 5873, pp. 190-4, 2008.
[10] C. Burrowes et al., "Non-adiabatic spin-torques in narrow magnetic
domain walls", Nature Physics, vol. 6, pp. 17-21, 2010.
[11] W. S. Zhao, J. Duval, D. Ravelosona, J. O. Klein, J. V. Kim, "A
compact model of domain wall propagation for logic and memory
design", Journal of Applied Physics, 109, 07D501, 2011.
[12] A.J. Annunziata et al., "Racetrack memory cell array with integrated
magnetic tunnel junction readout", Proc. of IEEE IEDM, USA,
pp. 24.2.1-24.2.4.
[13] Y. Zhang, W.S. Zhao, D. Ravelosona, J-O. Klein, J.V. Kim and C.
Chappert, "Perpendicular-magnetic-anisotropy CoFeB racetrack
memory", Journal of Applied Physics, vol. 111, 093925, 2012.
[14] M. Hayashi, L. Thomas, R. Moriya, C. Rettner, and S. S. P. Parkin,
"Current-controlled magnetic domain-wall nanowire shift register",
Science, vol. 320, no. 5873, pp. 209-11, Apr. 2008.
[15] International Roadmap for semiconductor (ITRS), 2010 ERD Update.
[16] STMicroelectronics, 28 nm Design Kit Design Manual, 2012.
[17] W.S. Zhao et al., "Design Considerations and Strategies for
High-Reliable STT-MRAM", Microelectronics Reliability, Vol. 51,
pp. 1454-1458, October 2011.
[18] H. Yoda et al., "Progress of STT-MRAM Technology and the Effect on
Normally-off Computing Systems", Session 11.3, IEDM 2012.
[19] W.S Zhao, C. Chappert, V. Javerliac and J-P. Noizière, "High speed,
high stability and low power sensing amplifier for MTJ/CMOS hybrid
logic circuits", IEEE Trans. Magn., Vol. 45, pp. 3784-3787, 2009.
[20] W.S. Zhao, D. Ravelosona, J-O Klein and C. Chappert, "Domain Wall
Shift Register based reconfigurable logic", IEEE Transactions on
Magnetics, Vol. 47, pp. 2966-2969, October 2011.
[21] K. Ryu et al., "A Magnetic Tunnel Junction Based Zero Standby
Leakage Current Retention Flip-Flop", IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 20, pp. 2044-2053, 2012.
[22] W.S. Zhao, E. Belhaire, C. Chappert, "Spin-MTJ based Non-Volatile
Flip-Flop", Proc. of IEEE International Conference on Nanotechnology
(IEEE-NANO), pp. 399-402, 2007.
[23] N. Sakimura, T. Sugibayashi, R. Nebashi, and N. Kasai, "Nonvolatile
Magnetic Flip-Flop for Standby-Power-Free SoCs", IEEE Journal of
Solid-State Circuits, vol. 44, no. 8, pp. 2244-2250, 2009.
[24] S. Fukami, et al., "Current-induced domain wall motion in
perpendicularly magnetized CoFeB nanowire", Appl. Phys. Lett. 98,
082504, 2011.
[25] Spintronics device library: Spinlib www.ief.u-psud.fr/~zhao/spinlib.html
[26] N. Lei et al., "Strain-controlled magnetic domain wall propagation in
hybrid piezoelectric/ferromagnetic structures", Nature Communications,
vol. 4, 1378, 2013.
[27] J.H. Franken, H.J.M. Swagten and B. Koopmans, "Shift registers based
on magnetic domain wall ratchets with perpendicular anisotropy", Nature
Nanotechnology, vol. 7, pp. 499-503, 2012.
[28] FET FP7 European project Magwire:
http://pages.ief.upsud.fr/magwire/Magwire/Homepage.html
