
Beyond Scaling - Realizing Value Through the Integration of Memory and Autonomic Chip Features
Subramanian S. Iyer
IBM Corporation, Systems & Technology Group,
2070 Route 52, Hopewell Junction, NY 12533, USA.
email: ssiyer@us.ibm.com

1-4244-0182-8/06/$20.00 ©2006 IEEE

ABSTRACT

As we scale to 45nm and beyond, the expectations of increased functional value per unit cost may only be met with a more holistic technology integration based on systems needs. We consider two approaches: leveraging embedded dense memory, where we examine the scalability and performance issues of embedded memory; and the autonomic chip, where a judicious use of circuit innovation, material and physical phenomena is employed to yield self diagnosing, self healing and perhaps even self reconfiguring chips, extending both the lifetime and scope of the chip.

INTRODUCTION

The technological challenges of scaling into the 45nm node are well known. These include the lithographic challenges being addressed by hyper NA immersion lithography, device performance challenges being met through the use of advanced strain engineering, and not least the challenges of interconnect performance, which dominate at the chip level, especially for complex System-on-Chip (SOC) applications. We address two issues that promise to deliver increased value at both the chip and system level. One has to do with integrating dense dynamic memory on chip, and the other has to do with building autonomic features where self diagnosis and self repair are possible at various times during the chip's lifetime.

Embedded Memory

From a functional perspective, the trend to integrate increasingly large amounts of memory on chip (embedded memory or eMemory) continues unabated. eMemory dominates the area of mid- and high-end processor chips as well as most Application Specific Integrated Circuits (ASICs), and in many cases memory can occupy up to 70% of the die area. In the high end applications, the key attributes of on-chip memory are performance, area, power, soft error rate (SER) and cost. eMemory is also an important attribute in consumer ICs, especially for portable applications. Here power (especially standby power), area, cost and SER are important considerations.

Static RAMs (SRAMs) have been the workhorse of embedded memory applications but are becoming increasingly difficult to scale. As SRAMs scale, the close proximity of N- and P-channel devices makes isolation more complex. Additionally, given the need for performance (as measured by read current), the inherent threshold mismatches (dominated by statistical dopant fluctuation) cause read instabilities that manifest themselves as an increasing number of fails as the operating voltage is decreased (see reference 1). Standby power in SRAMs is also not scaling because of the leakier nature of the devices used. Furthermore, SER has reached very high values as both capacitance and stored charge at the internal SRAM nodes have decreased as a result of scaling. Finally, for large memory sizes, time of flight (TOF) is a significant delay component that detracts from performance.

DRAMs address several of the issues above. IBM has pioneered the use of deep trench embedded DRAM (eDRAM) for logic applications for over four logic generations. The trench based eDRAM is extremely logic friendly and guarantees logic equivalency since the DRAM specific processes are established well before the logic devices are fabricated. There is no impact on the middle of the line and backend processing, ensuring timing and library equivalency. DRAM cells are typically about 5 to 8 times smaller than corresponding SRAM cells. At the usable memory macro level this advantage is eroded by various circuit overheads, but eDRAM is still about 3.5 to 4 times as dense as SRAM, with the advantage increasing with memory size. From a power perspective, DRAMs containing only one low leakage device typically have about 6 to 8 times lower standby power per Mb compared to 6 transistor SRAMs. Finally, the larger capacitance of the DRAM cell makes it largely immune to SER events, and SER fail rates are typically 5000 times better compared to SRAMs.

Scaling of eDRAM in logic technology has two goals. One is to reduce size consistent with the scaling factor, and trench based eDRAM has met this approximately 50% scaling per generation rather well. The other is to improve performance; in this it differs from the more conventional commodity DRAM. Performance improvement comes from three factors: logic performance improvements, cell performance improvements and finally DRAM architecture improvements. As a result of these three factors, it is expected that eDRAMs in the 45 nm generation will have latencies of the order of 1-2 ns and random cycle times of about 2-4 ns. This, coupled with their smaller size and consequent lower TOF delays, will make eDRAMs an excellent replacement for the large on chip level 2/level 3 caches on processors. This has special significance in the context of multi-core processor chips, where processor speed has saturated and performance is expected to come from the multi-core architecture and an optimized memory subsystem with large caches that are electrically close and dedicated to the processors.

Finally, we must address the issue of eDRAM cost and complexity adders. The trench based eDRAM adds 3-4 extra mask levels to a typical logic flow. Two to three of these masks are used to optimize the standard I/O device for a low leakage transfer device and are block level masks. The other is a critical deep trench level. The overall complexity adder in 90nm is about 15% and becomes a smaller fraction as the device menu increases and backend interconnect complexity increases. A simple calculation shows that if a chip allocates about 25% of the die area to SRAM and this SRAM can be replaced functionally by eDRAM, the resultant die will in fact be more economical. Clearly, the more complex technologies in 45 nm and certainly Silicon-on-insulator (SOI) technology make this tradeoff more favorable for eDRAM. There will obviously also be gains in power and SER resistance. From an area scaling perspective, replacing standard SRAM with eDRAM is almost equivalent to scaling the memory an additional two generations! Finally, from a performance perspective, improvements in eDRAM cell and circuit architecture as well as judicious optimization of the memory sub-system to leverage the larger eDRAM cache will allow for a system level performance improvement of over one generation.

The reader is encouraged to consult reference (2) which explores these ideas in greater detail.

Autonomic Chips

Another important development is the building of autonomic capabilities into the chip. Most advanced SOCs employ Built-in Self Test (BIST). BIST engines are dedicated engines of increasing versatility that can apply a wide range of test vectors to functionally test a wide variety of IP blocks including memory, logic and analog functions. In the case of memory, BIST can be combined with Redundancy Allocation Logic (RAL) to fix the defects using redundant elements. Advances in the development of electrical fuses (eFUSE) that can be programmed on chip allow us to combine BIST, RAL and eFUSE to form a Built-in Self Repair (BISR) system. Additional sophistication is possible through the use of fuse string compression and hierarchical repair, where repair solutions are augmented at wafer level, at module level, and even through multiple instances of field repair.

Realization of this BISR requires the use of a reliable fuse methodology. We have employed electromigration, hitherto a reliability problem, constructively to build a highly reliable fuse. Unlike rupture mode fuses, electromigration does not create debris and thus fuse healing does not occur (reference 3). This system has been employed not just for repair but even more extensively for chip reconfiguration, radio frequency (rf) tuning, and die by die yield optimization and, more innovatively, to store on chip the chip's power-frequency characteristics, temperature-frequency characteristics, etc. These can be effectively used at the system level to manage performance and power.

Summary

While technology scaling is now more challenging, there are some relatively easy ways to extract more value from the technology. We have described the use of embedded DRAMs and eFUSEs to leverage additional functional value beyond scaling alone.

REFERENCES

1. C. Wann et al., "SRAM cell design for stability methodology", Proc. VLSI-TSA (2005), p. 21
2. S.S. Iyer et al., "eDRAM: the technology platform for the Blue Gene/L chip", IBM J. Res. Dev. 49 (2/3), p. 333 (2005)
3. C. Kothandaraman et al., "Electrically programmable fuse (eFUSE) using electromigration in silicides", IEEE Electron Device Letters 23, p. 523 (2002)
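
The "simple calculation" mentioned in the Embedded Memory section is not spelled out in the text. The sketch below is one plausible reading of it, not the author's own model. It assumes that die cost is proportional to die area times a per-area process cost, that the 15% complexity adder applies as a uniform per-area cost increase, and that yield effects can be ignored. The function names are hypothetical; the 25% SRAM fraction, the 3.5 to 4 times macro density gain and the 15% adder are taken from the text.

```python
def relative_die_cost(sram_fraction, density_gain, process_adder):
    """Cost of the eDRAM die relative to the original SRAM die (1.0 = break-even).

    Assumption: cost scales with area times per-area process cost; yield is ignored.
    """
    logic_area = 1.0 - sram_fraction           # non-memory area is unchanged
    edram_area = sram_fraction / density_gain  # memory area after replacement
    return (logic_area + edram_area) * (1.0 + process_adder)


def break_even_sram_fraction(density_gain, process_adder):
    """SRAM area fraction at which the eDRAM die costs the same as the original."""
    return (process_adder / (1.0 + process_adder)) / (1.0 - 1.0 / density_gain)


if __name__ == "__main__":
    for gain in (3.5, 4.0):
        cost = relative_die_cost(0.25, gain, 0.15)
        print(f"density gain {gain}: relative die cost {cost:.3f}")
    print(f"break-even SRAM fraction: {break_even_sram_fraction(3.5, 0.15):.3f}")
```

Under these assumptions the eDRAM die comes out at roughly 0.94 (3.5 times denser) to 0.93 (4 times denser) of the original die cost, and the break-even point sits near 18% SRAM area, which is consistent with the claim that a chip with about 25% SRAM becomes more economical.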
