CHAPTER-I

INTRODUCTION

Field Programmable Gate Arrays (FPGAs) are a step in the continuing evolution of
Integrated Circuits (ICs). FPGAs are reprogrammable silicon chips and belong to the
family of Programmable Logic Devices (PLDs), which can be configured to implement the
customized hardware functionality of any digital circuit. Due to their flexibility,
programmability, suitability for various applications and short product cycle, FPGAs are
highly desirable for the implementation of digital circuits. The main difference between
FPGAs and conventional fixed logic implementations, such as Application Specific
Integrated Circuits (ASICs), is that the designer can program the FPGA on-site. Using an
FPGA instead of a fixed logic implementation eliminates the non-recurring engineering
(NRE) costs and significantly reduces time-to-market.

The adoption of FPGA chips across all industries is driven by the fact that FPGAs combine
the best parts of ASICs and processor-based systems. These reprogrammable silicon chips
have the same flexibility as software running on a processor-based system, but they are
not limited by the number of processing cores available. The software tools provide the
programming environment, whereas the FPGA circuitry is truly a “hard” implementation of
program execution.

This chapter covers background information about FPGAs and the architecture of a general
FPGA and of some commercial FPGAs, followed by market trends and applications. It
then describes the issues and challenges in FPGA-based digital system design,
followed by the motivation and objectives of the research work. The organization of the
remaining chapters of the thesis is covered at the end of this chapter.

1.1 Introduction

FPGAs are reprogrammable silicon chips that can be configured to implement
customized hardware functionality by using their prebuilt logic blocks and programmable
routing resources, without even picking up a breadboard or soldering iron. They are based
around a matrix of Configurable Logic Blocks (CLBs) connected through programmable
interconnects.

FPGAs share a common history with most PLDs. PLDs are divided into three basic
architecture types: Simple Programmable Logic Devices (SPLDs), Complex
Programmable Logic Devices (CPLDs) and FPGAs. The first device of this kind was
the Programmable Read Only Memory (PROM). Philips invented the Field-Programmable
Logic Array (FPLA) in the 1970s, driven by the need to implement logic circuits
specifically. It consisted of two planes, a programmable wired AND-plane and a wired
OR-plane, that could implement functions in the Sum of Products form. However, both
devices (PROM and FPLA) lack sequential logic elements, such as flip-flops, that are
needed to create synchronous designs or state machines.

To overcome difficulties of cost and speed, the first real PLD, called Programmable
Array Logic (PAL), was developed by Monolithic Memories Inc. (MMI) in 1978 [Birkner
and Chua (1978)]. This was the first kind of PLD architecture. PALs, which fall under the
category of SPLDs, had only one programmable AND-plane feeding fixed OR gates. PALs,
Programmable Logic Arrays (PLAs) and Generic Array Logics (GALs), along with other
variants, are grouped as SPLDs. An SPLD-architecture based device consists of two or
more vendor-specific macro-cells to realize logic functions. The architecture of a basic
simplified SPLD is shown in Figure 1.1. The following are the significant
characteristics of SPLDs:

• One macro-cell per output
• Minimum two macro-cells per device
• Typically all macro-cells are identical
• One product term per macro-cell
• Product term typically generated by AND-matrix and OR-matrix, out of which
  minimum one matrix is programmable
• Dedicated one flip-flop per macro-cell

Figure 1.1: Architecture of simplified SPLD macro-cell

In order to cater to growing technological demands, several SPLDs were integrated onto a
single chip, called a Complex Programmable Logic Device (CPLD), that contains many
SPLD-like devices interconnected via a programmable switch matrix. The SPLD-like devices
are called logic blocks, and each contains many SPLD-like macro-cells. Some PLD vendors
developed their own logic block or switch matrix architectures and gave them
vendor-specific names such as Erasable Programmable Logic Devices (EPLD), Electrically
Erasable Programmable Logic Devices (EEPLD), Segmented Programmable Logic Devices
(SPLD) and Expanded Programmable Logic Devices (XPLD). The architecture of a basic
CPLD is shown in Figure 1.2. The following are the significant characteristics of
CPLDs:

• Product terms in programmable macro-cells
• Many macro-cells per logic block, and typically all logic blocks are identical
• Typically one dedicated flip-flop per macro-cell
• Minimum two logic blocks per device
• Routing between logic blocks is via a global switch matrix

Figure 1.2: Architecture of CPLD

Simulation and prototyping have been a very important part of the electronics industry
for a very long time. Before proceeding to the actual fabrication of dedicated
hardware, designers want to be sure that what they are making will work the way
they want it to. For many years, electronics companies offered dedicated
hardware in their products, and it was not possible for end users to reconfigure it to
their own needs. This need led to the growth of a new market segment of
customer-configurable integrated circuits called Field Programmable Gate Arrays (FPGAs),
where transistors gave way to logic blocks and the customization could now be performed
by the user in the field and not in the manufacturing lab.

1.2 FPGA-Architecture

The FPGA architecture consists of many logic modules placed in an array structure.
Because these modules are configurable on site, they are called
Configurable Logic Blocks (CLBs). The channels between the CLBs are used for routing.
The array of CLBs is surrounded by programmable I/O modules and connected via
programmable interconnects. There are two subclasses of FPGA architecture, depending
on the granularity of the CLBs: coarse-grained and fine-grained FPGAs.

Figure 1.3: General Architecture of FPGA

The coarse-grained FPGAs have very large logic modules/CLBs, sometimes with two or
more sequential logic elements, whereas the fine-grained FPGAs have very simple logic
modules. A conventional, island-style FPGA can be viewed as an array of CLBs
connected by programmable interconnects, i.e. switchboxes (SBoxes) and connection boxes
(CBoxes), as shown in Figure 1.3 [Parvez and Mehrez (2011)].

As shown in Figure 1.4a, N programmable lookup tables (LUTs) are connected together
using internal interconnect inside the CLB. The LUT is simply a circuit that selects the
output of a Static Random Access Memory (SRAM) cell based on the LUT’s k inputs: by
programming appropriate values in the SRAM cells, the LUT can implement any k-input
function. Figure 1.4b shows the architecture of a 3-input LUT. All programmable
interconnect is implemented using a simple, unidirectional switch that takes several
inputs and selects one output. The circuit consists of an input buffer for each input,
which serves to isolate the switch, a multiplexer, and an output buffer that drives a long
wire segment. Each of the input and output buffers can be built using single or multiple
staged inverters as described by Parvez and Mehrez (2011). Figure 1.5a [Farooq, et al.
(2012)] shows how to assemble these switches into a switchbox and Figure 1.5b shows the
basic 3-input switch circuit.
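
The LUT behaviour described above can be illustrated with a short behavioural sketch. This is not a description of any vendor's circuit: the class name `LUT`, the method names and the bit ordering (first input as the most significant address bit) are assumptions made for illustration. The model simply stores 2^k SRAM bits and uses the k inputs as the address of the cell to read.

```python
from itertools import product


class LUT:
    """Behavioural model of a k-input LUT: 2**k SRAM cells read through a selector."""

    def __init__(self, k, sram_bits):
        if len(sram_bits) != 2 ** k:
            raise ValueError(f"a {k}-input LUT needs {2 ** k} SRAM bits")
        self.k = k
        self.sram = [int(bool(b)) for b in sram_bits]

    @classmethod
    def program(cls, k, func):
        """Fill the SRAM cells with the truth table of func (first input = MSB)."""
        bits = [func(*inputs) for inputs in product((0, 1), repeat=k)]
        return cls(k, bits)

    def read(self, *inputs):
        """Use the k input bits as an address and return the selected SRAM cell."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.sram[address]


# Hypothetical example: program a 3-input LUT as a majority function.
majority = LUT.program(3, lambda a, b, c: (a & b) | (b & c) | (a & c))
print(majority.sram)           # SRAM contents, address 000 first
print(majority.read(1, 0, 1))  # two of the three inputs are high
```

Reprogramming the same structure with a different truth table yields a different function, which is the sense in which a k-input LUT can implement any k-input function.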

Figure 1.4 a: CLB with 5 look-up tables (LUTs) Figure 1.4 b: 3 input LUT

While the CLB provides the logic capability, flexible interconnect routing carries the
signals between CLBs and to and from I/Os. Routing comes in several flavors, from
local interconnect between CLBs, to fast horizontal and vertical long lines spanning
the device, to global low-skew routing for clocking and other global signals. The design
software hides the interconnect routing task from the user unless specified
otherwise, thus significantly reducing design complexity.

Today’s FPGAs provide support for dozens of I/O standards, thus providing an ideal
interface bridge in the system. I/O in FPGAs is grouped in banks, with each bank
independently able to support different I/O standards. The leading FPGAs provide over a
dozen I/O banks, thus allowing flexibility in I/O support.

The CLBs can be configured to implement any logic function, and the
interconnects provide the flexibility to connect any signal in the design to any logic
resource. The programming technology for the logic and interconnect resources can be
SRAM, flash memory, or antifuse. SRAM-based FPGAs offer in-circuit reconfigurability
at the expense of being volatile, while antifuse-based FPGAs are write-once devices.
Flash-based FPGAs provide an intermediate alternative by offering reconfigurability as
well as non-volatility. The most popular programming technology in state-of-the-art
FPGAs is SRAM.

a) Unidirectional switchbox b) 3-input switch


Figure 1.5 FPGA switchbox and switch circuits

Traditionally, FPGAs consist of input/output pads, logic resources, and routing resources.
However, state-of-the-art FPGAs usually include embedded memory, Digital Signal
Processing (DSP) blocks, Phase-Locked Loops (PLLs), embedded processors, and other
special feature blocks.

The credit for developing the first commercially viable FPGA goes to Xilinx co-founders
Ross Freeman and Bernard Vonderschmitt. The XC2064, invented in 1985, consisted
of 64 Configurable Logic Blocks, each with two 3-input Look-Up Tables. By the end of
1990, considerable competition had sprung up in FPGA manufacturing and Xilinx’s market
share started to decline. As FPGAs started gaining acceptance in applications like Digital
Signal Processing and Telecommunications, players like Actel, Altera, Lattice, QuickLogic,
Cypress, Lucent and SiliconBlue entered the world FPGA market alongside Xilinx.

1.3 Commercial FPGAs

The architecture of FPGAs differs from vendor to vendor and is characterized by
the structure and content of the logic blocks and by the routing resources. The generic
Xilinx architecture consists of:

• Symmetric array: the array consists of CLBs with LUTs and D flip-flops
• N-input LUTs that can implement any N-input Boolean function
• Array embedded within the periphery of I/O blocks
• Array elements interleaved with routing resources (wire segments, switch
  matrix and single connection points)
• SRAM programming technology

The elementary programmable logic block in Xilinx FPGAs is called a slice. The Xilinx
Virtex-4 FPGA slice, as detailed in the Xilinx Virtex-4 User Guide and shown in Figure 1.6,
includes:

• Two 4-input LUTs that can implement any 4-input Boolean function, used as
combinational function generators (here one LUT is marked “F”, the other one is marked
“G”).

Figure 1.6 Xilinx Virtex-4 FPGA slice [Xilinx Virtex-4 User Guide]

• Two dedicated user-controlled multiplexers for combinational logic (shown as MUXF5
and MUXFX). MUXF5 can be used to combine the outputs of the slice's LUTs and to
implement a 5-input combinational circuit. MUXFX is used to combine the outputs of
other MUXF5 and MUXFX multiplexers (from other slices).

• Dedicated arithmetic logic (two 1-bit adders, carry chain and two dedicated AND gates
for fast and efficient multiplication).

• Two 1-bit registers that can be configured either as flip-flops or as latches. The input to
these registers is selected by YMUX and XMUX multiplexers. Note that these
multiplexers aren't user-controlled: the path is selected during FPGA programming.
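
The way a MUXF5-style multiplexer turns two 4-input LUTs into one 5-input function can be sketched with a Shannon expansion on the fifth input. The sketch below is a behavioural illustration only: the helper names are hypothetical, and which LUT is selected for which value of the select input is an assumption rather than a statement about the Virtex-4 hardware.

```python
from itertools import product


def program_lut4(func):
    """Return the 16 SRAM bits of a 4-input LUT implementing func(a, b, c, d)."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=4)]


def read_lut4(sram, a, b, c, d):
    """Read the SRAM cell addressed by the four input bits (a is the MSB)."""
    return sram[(a << 3) | (b << 2) | (c << 1) | d]


def build_5_input(func5):
    """Split func5(s, a, b, c, d) into two 4-input LUTs joined by a 2:1 multiplexer.

    Shannon expansion: f = (not s) * f(0, a, b, c, d) + s * f(1, a, b, c, d).
    """
    lut_f = program_lut4(lambda a, b, c, d: func5(0, a, b, c, d))  # cofactor s = 0
    lut_g = program_lut4(lambda a, b, c, d: func5(1, a, b, c, d))  # cofactor s = 1

    def slice_output(s, a, b, c, d):
        f_out = read_lut4(lut_f, a, b, c, d)
        g_out = read_lut4(lut_g, a, b, c, d)
        return g_out if s else f_out  # assumed role of the MUXF5-style multiplexer

    return slice_output


# Hypothetical example: 5-input parity built from two 4-input LUTs.
def parity5(s, a, b, c, d):
    return s ^ a ^ b ^ c ^ d


parity_slice = build_5_input(parity5)
print(parity_slice(1, 0, 1, 1, 0))
```

The same construction extends upward: combining two such 5-input outputs with a further multiplexer gives a 6-input function, which is the role the text assigns to MUXFX.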

The slice of the next-generation Xilinx Virtex-5 FPGA [Virtex-5 User Guide], shown in
Figure 1.7, includes:

Figure 1.7 Xilinx Virtex-5 FPGA slice [Xilinx Virtex-5 User Guide]

• Four LUTs that can be configured either as 6-input LUTs with a 1-bit output or as
5-input LUTs with a 2-bit output.

• Three dedicated user-controlled multiplexers for combinational logic (shown as
F7AMUX, F7BMUX and F8MUX). The multiplexers F7AMUX and F7BMUX combine the
outputs of the slice's LUTs to implement 7-input combinational circuits. The multiplexer
F8MUX is used to combine the outputs of F7AMUX and F7BMUX.

• Dedicated arithmetic logic (two 1-bit adders and a carry chain).

• Four 1-bit registers that can be configured either as flip-flops or as latches. The input to
these registers is selected by AMUX – DMUX multiplexers. Note that these multiplexers
aren't user-controlled: the path is selected during FPGA programming.

The Stratix, as described in the Stratix Device Family Data Sheet, is an FPGA device from
another FPGA vendor, Altera. It consists of five major components: Logic Array Blocks
(LABs) to implement arbitrary logic functions, memory blocks to store data, DSP blocks to
speed up multiply and accumulate operations, PLL modules to alter the phase and the
frequency of the input clock, and I/O pads to access the outside world. All of these
components are interconnected by a programmable routing network.

Each LAB consists of 10 Logic Elements (LEs) that operate in the normal or the dynamic
arithmetic mode. In the normal mode, the LE is configured as a single 4-input lookup
table (LUT) and a register. The output of an LE is either the output of the LUT or the
output of the register whose data input comes from the LUT. This mode is useful when
implementing arbitrary logic functions. The functional schematic of an LE in this mode is
shown in Figure 1.8.

The Altera Stratix II device is the second generation of the Altera Stratix. The most
noticeable difference is the redesigned LAB that implements arbitrary logic functions.
The Stratix II LAB consists of 8 Adaptive Logic Modules (ALMs). A high level
schematic of an ALM is shown in Figure 1.9. An ALM can operate in one of four modes:
normal, extended LUT, arithmetic and shared arithmetic. In the normal mode, the ALM
can produce four outputs. Two outputs come from the registers in the
ALM, while the other two outputs are generated by combinational logic that can be
configured in several different ways: a pair of 4-input LUTs, a 5-input and a 3-input LUT,
a 5-input and a 4-input LUT, a pair of 5-input LUTs or a pair of 6-input LUTs. In the
latter three cases, some of the input signals are shared between the LUTs. Alternatively, a
single-output function of 6 inputs can be implemented. Also, a subset of 7-input functions
can be implemented in a single ALM using the extended LUT mode.

Figure 1.8 Stratix LE in normal mode [Stratix Device Family Data Sheet]

Figure 1.9 High level diagram of the Stratix II ALM [Stratix Device Family Data Sheet]

State-of-the-art FPGAs usually include embedded memory, Digital Signal Processing
(DSP) blocks, Phase-Locked Loops (PLLs), embedded processors, and other special
feature blocks.

1.4 Advantages of FPGA Based Systems

The development of FPGA applications and software development are similar in
that both use high abstraction level programming/description languages and the design
process is heavily dependent on software tools. In both cases, designing complex systems
is relatively easy, but verifying that those systems function correctly is not. On the other
hand, a significant difference is that an FPGA application does not need an operating
system, hardware drivers, or another similar platform. Rather, the required computing
environment can be configured directly onto a device with sufficient capacity, because
FPGAs use parallel processing with dedicated hardware for each function instead of
executing a program one instruction at a time. This parallelism is the distinct difference
between an FPGA application and a microprocessor application running software: an FPGA
evaluates all of its logic on each clock cycle, whereas a microprocessor executes one
program instruction at a time. FPGAs can therefore provide fast response times, with
dedicated hardware for each task and fewer concerns about tasks interfering with each
other. Cyber security issues that affect software are also considered to be less severe
with FPGAs. They can contain all of the memory and functions on-chip, and altering them is
very difficult or, for certain technologies, impossible.

ASICs are similar to FPGAs in the sense that the function is pure hardware
without additional layers of platform software (operating system, drivers, etc.), but unlike
FPGAs, the functions of ASICs are fixed and determined at production. Only the
necessary circuitry is placed on the chip. Because the manufacturing process cannot be
used to mass produce “blank” devices that are then configured for the applications, the
NRE costs (in particular, setting up the manufacturing process) are huge compared to
FPGAs. However, an ASIC is a good choice if the production quantities are in the millions
of devices.
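
The volume argument above can be made concrete with a simple break-even calculation. The sketch below assumes a linear cost model (one-off NRE plus a constant per-unit cost); the function names and every figure in it are hypothetical and chosen only to illustrate the shape of the trade-off, not taken from the text or from any vendor.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost of producing `volume` devices: one-off NRE plus per-unit cost."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume above which the ASIC becomes cheaper than the FPGA."""
    if fpga_unit <= asic_unit:
        raise ValueError("no break-even: ASIC unit cost must be below FPGA unit cost")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical figures for illustration only.
ASIC_NRE, ASIC_UNIT = 2_000_000.0, 5.0
FPGA_NRE, FPGA_UNIT = 0.0, 25.0

crossover = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
print(f"break-even volume: {crossover:,.0f} units")
for volume in (1_000, 100_000, 1_000_000):
    fpga = total_cost(FPGA_NRE, FPGA_UNIT, volume)
    asic = total_cost(ASIC_NRE, ASIC_UNIT, volume)
    print(volume, "FPGA cheaper" if fpga < asic else "ASIC cheaper or equal")
```

Under these assumed numbers the FPGA is cheaper for small production runs, and the ASIC only pays back its NRE once the volume passes the break-even point.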

Dedicated separate hardware for each function provides the advantage of computational
efficiency, but from the reliability point of view, a more important aspect is the
separation of functions. There is no need to allocate shared resources such as memory,
processor time, or data transfer on a bus. This reduces the risk of functions interfering
with each other or with the operating system or other platform functions. As
described by Salaun (2009), the standardized configurable hardware components provide
the advantages, but not the problems, of both conventional hardware and microprocessor
technology. Because it contains only the necessary components and functions, conventional
hardware is considered to be the least complex implementation and a conventional
microprocessor implementation the most complex. An FPGA falls in between the two.
The capabilities of the technologies correspond to their complexity, with increased
complexity offering increased capability to implement more varied functions and
alternative solutions. Because accessing or altering the functions is more
difficult, security issues are also seen to be less severe with FPGAs than with
microprocessor-based systems.

FPGAs now deliver ASIC-like density and performance, while their flexibility and
operational characteristics offer distinct advantages over their ASIC counterparts.
Designers are turning more towards FPGAs for new system on chip (SoC) designs as
innovative architectures with embedded processors, memory blocks and Digital Signal
Processors (DSPs) are emerging in FPGAs. If the design time with an FPGA is 9
months, the same design takes approximately 2-3 years as an ASIC.
Moreover, the high initial ASIC cost is recovered only in very high volume products. The
use of FPGAs as a percentage of the logic market has increased from 10% to 22% in the past
3-4 years. FPGAs (or programmable logic) are the fastest growing segment of the
semiconductor industry.

The advantages of FPGA-based design can thus be summarized with the following five
indicators:

• Performance: By breaking the paradigm of sequential execution and
accomplishing more per clock cycle, FPGAs exceed the computing power of
DSPs by taking advantage of hardware parallelism. Berkeley Design
Technology, Inc. (BDTI), a noted analyst and benchmarking firm, released
benchmarks showing how FPGAs can deliver many times the processing power
per dollar of a DSP solution in some applications. Moreover, controlling inputs
and outputs (I/O) at the hardware level gives FPGAs faster response
times and specialized functionality to closely match application requirements.

• Time to market: In the face of increased time-to-market concerns, FPGA
technology offers flexibility and rapid prototyping capabilities. Without going
through the long fabrication process of custom ASIC design, one can test an idea
or concept, verify it in hardware, and then implement incremental changes
and iterate on an FPGA design within a very short period. Commercial off-the-shelf
(COTS) hardware is also available with different types of I/O already connected
to a user-programmable FPGA chip.

• Cost: The NRE expense of FPGA-based hardware solutions is much lower than that of
custom ASIC design. The large initial investment in ASICs is easy to
justify for Original Equipment Manufacturers (OEMs) shipping thousands of
chips per year, but many end users need custom hardware functionality for
a small number (tens to hundreds) of systems in development. As system
requirements often keep changing over time, the cost of making incremental
changes to FPGA designs is negligible compared to the large expense of
re-spinning an ASIC.

• Reliability: The software tools provide the programming environment, and the
FPGA circuitry is truly a “hard” implementation of program execution.
Processor-based systems often involve several layers of abstraction to help schedule
tasks and share resources among multiple processes. The driver layer controls hardware
resources and the Operating System (OS) manages memory and processor
bandwidth. For any given processor core, only one instruction can execute at a
time, and processor-based systems are continually at risk of time-critical tasks
preempting one another. As FPGAs do not use an OS, reliability concerns are
minimized by their true parallel execution and deterministic hardware dedicated to
every task.

• Long-term maintenance: FPGA chips are field-upgradable and do not
require the time and expense involved in an ASIC redesign. They can keep up with
modifications that may become necessary over time. Digital communication
protocols, for example, have specifications that can change over time, and
ASIC-based interfaces may cause maintenance and forward-compatibility challenges.

1.5 Market Trend of FPGAs

FPGA technology continues to gain momentum, and the worldwide FPGA market, which
was valued at USD 5.08 billion in 2012, is expected to grow to USD 8.95 billion by 2019,
growing at a Compound Annual Growth Rate (CAGR) of 8.5% from 2013 to 2019, as per
the market report published by Transparency Market Research (2013). Since their
invention by Xilinx in 1984, FPGAs have gone from being simple glue logic chips to
actually replacing custom ASICs and processors for signal processing and control
applications. Increasing adoption of 3G and Long Term Evolution (LTE) technologies
and high demand for bandwidth in wireless networks are expected to drive the FPGA
market in the coming years. Furthermore, FPGAs are used in flat panel displays;
therefore, the growing popularity of smart phones, tablets, and phablets with advanced
touch screen functionality is expected to contribute to market growth over the forecast
period. Rising electronic content in automobiles and the advent of electric and hybrid
vehicles, which employ FPGAs to a large extent, are key growth drivers for the industry.

Altera and Xilinx continue to dominate the market for general purpose programmable
logic, as per the Source Tech 411 (2013) report on the top FPGA companies for 2013, dated
28th April 2013. These two companies hold approximately 90% of the market (Xilinx 47%,
Altera 41%) in 2013, with combined revenues in excess of $4.5B and a market cap of over
$20B. The market shares of the other FPGA companies, Actel, Lattice, Lucent, QuickLogic
and Cypress, in 2012 are shown in Figure 1.10.

Figure 1.10: FPGA market share-2012


1.6 Applications of FPGAs

Due to their field-programmability, FPGAs are an ideal fit for many different markets.
Many industry leaders in the field of FPGAs, especially Xilinx, provide comprehensive
solutions consisting of FPGA devices, advanced software, and configurable, ready-to-use
Intellectual Property (IP) cores for various applications and markets such as:

• Aerospace and Defense: FPGA vendors provide radiation-tolerant FPGAs along
with intellectual property for image processing, waveform generation, and partial
reconfiguration for SDRs.
• ASIC Prototyping: Prototyping with FPGAs enables fast and accurate SoC system
modeling and verification of embedded software.
• Audio: Xilinx FPGAs and targeted design platforms enable higher degrees of
flexibility, faster time-to-market, and lower overall non-recurring engineering
costs (NRE) for a wide range of audio, communications, and multimedia
applications.
• Automotive Systems: Xilinx and other vendors provide automotive silicon and IP
solutions for gateway and driver assistance systems, comfort, convenience, and
in-vehicle infotainment.
• Broadcast: FPGAs adapt to changing requirements faster and lengthen product life
cycles with Broadcast Targeted Design Platforms, and provide solutions for
high-end professional broadcast systems.
• Consumer Electronics: FPGA-based designs provide cost-effective solutions enabling
next-generation, full-featured consumer applications, such as converged handsets,
digital flat panel displays, information appliances, home networking, and residential
set-top boxes.
• Data Centers: FPGA-based systems are used in high-bandwidth, low-latency servers,
networking, and storage applications to bring higher value into cloud deployments.
• High Performance Computing and Data Storage: FPGA-based systems provide
solutions for Network Attached Storage (NAS), Storage Area Network
(SAN), servers, and storage appliances.
• Industrial: Most industrial imaging and surveillance systems and industrial
automation systems make use of FPGAs.
• Medical: The Virtex and Spartan FPGA families can be used to meet a range of
processing, display, and I/O interface requirements in medical imaging equipment
for diagnostic, monitoring, and therapy applications.
• Video and Image Processing: Xilinx FPGAs and targeted design platforms enable
higher degrees of flexibility, faster time-to-market, and lower overall NRE for a
wide range of video and imaging applications.
• Wired and Wireless Communications: FPGAs are used in wired communication for
end-to-end solutions for reprogrammable networking line card packet processing,
framer/MAC, serial backplanes, and more. FPGA-based systems also provide RF,
baseband, connectivity, transport and networking solutions for wireless equipment,
addressing standards such as WCDMA, HSDPA, WiMAX and others.

1.7 Issues and Challenges in FPGA Based Digital Systems

FPGAs have become an attractive implementation solution in modern digital systems
due to their reconfigurable architecture, ease of design and flexibility, better performance
and low NRE cost. However, the following are significant factors and issues of concern in
FPGA-based systems:
• Area
• Delay
• Power Consumption
• Radiation Effects on FPGAs
• Process Variation
Brown, et al. (1992), Zuchowski, et al. (2002) and Wilton, et al. (2005) have made
various comparisons between FPGAs or similar devices and ASICs in the past. More recently,
a more thorough comparison was performed by Kuon and Rose (2007). In that study,
a 90 nm FPGA, the Altera Stratix II, was compared to an ASIC created using an
STMicroelectronics 90 nm process. The approach used was an empirical one that
compared the area, performance, and power consumption of multiple benchmark circuits
in both technologies. The results of this comparison are summarized in Table 1.1. The
table lists the geometric average of the ratio of the FPGA measurement to the ASIC
measurement across all benchmark circuits for specific metrics. Each row gives the
comparison for a particular metric: area consumed, critical path delay and dynamic power.

Table 1.1: FPGA:ASIC Gap [Kuon and Rose (2007)]

Metric \ Benchmark category | Pure Soft Logic | Soft Logic with DSP Arithmetic Computations | Soft Logic with Memory Blocks | Soft Logic mixed with DSP and Memory Blocks
Area                        | 35              | 25                                          | 33                            | 18
Delay                       | 3.4             | 3.5                                         | 3.5                           | 3.0
Dynamic Power               | 14              | 12                                          | 14                            | 7.1
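
Since each entry of Table 1.1 is a geometric average of per-benchmark FPGA/ASIC ratios, a small sketch may help show how such an entry is formed and why a geometric rather than an arithmetic mean is the natural choice for ratios. The per-benchmark values below are hypothetical and are not the data of Kuon and Rose (2007).

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed in the log domain."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical FPGA/ASIC area ratios for three benchmark circuits.
area_ratios = [10.0, 40.0, 160.0]

geo = geometric_mean(area_ratios)
arith = sum(area_ratios) / len(area_ratios)
print(f"geometric mean:  {geo:.1f}")
print(f"arithmetic mean: {arith:.1f}")
```

With these assumed values the arithmetic mean is pulled towards the largest ratio, whereas the geometric mean treats a benchmark that is four times worse and one that is four times better symmetrically.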

The issues concerning FPGA-based systems are described in the following sections.

1.7.1 Area

The average ratio of the area consumed by a 90 nm FPGA using just soft logic to that of a
90 nm ASIC is 35. This is clearly a significant difference that severely limits the size of
circuits that can be handled on a modern FPGA. The large area gap is one of the primary
contributors to the higher cost of FPGAs relative to ASICs, and it directly affects the
volumes at which FPGAs are no longer price competitive. The hard logic blocks that are now
employed in FPGAs reduce this area gap and bring the ratio down. The benchmarks that
make use of the DSP blocks and soft logic are only 25 times larger on average when
implemented in an FPGA. When both memory and DSP blocks were used, the ratio of
area consumed by a 90 nm FPGA to a 90 nm ASIC came down to 18. These
measurements are somewhat optimistic because only the hard blocks that are used are
included in the FPGA area measurement, rather than some proportion of unused blocks.

As more hard blocks are used, the ratio of FPGA to ASIC area is reduced because
the implementation area of the hard blocks is similar to that of an ASIC implementation,
assuming that the logic dominates the area and not the routing. Kuon and Rose (2007)
report that with full utilization of all the hard blocks available on the Stratix II,
the area gap could potentially shrink to as low as 4.7 times.

Due to the limited supply of programmable routing resources, it is well accepted that
100% logic utilization of an FPGA is frequently impractical. The amount of interconnect
needed to complete device routing is determined by the individual nature of the specific
logic design. If the routing allocated to a device is high relative to its available logic,
the design is logic-limited and the unused routing area is wasted. On the other hand, if
the level of routing resources is low relative to the available logic, the device will be
routing-limited. In that case, in order to successfully complete place and route, the user
has to select an FPGA with a larger amount of routing and logic resources. This leads to
wasted logic area because the additional logic resources will likely be unused. By
matching the routing resources to a given logic capacity, an area-efficient FPGA family
can be designed so that area wastage across a collection of designs with similar amounts
of logic is minimized and the mapping for most designs is balanced.

1.7.2 Delay

Compared to the area gap, the average delay gap of 3.4 times for the soft logic case
is not as large. The DSP blocks, which dramatically improved the FPGA vs. ASIC
area gap, have little benefit for the FPGA vs. ASIC delay gap. The following reasons are
cited for this small effect:

• The net gain is not as dramatic as expected because a hard block may only speed
  up a portion of the critical path, with the remainder still relatively slow.
• The gain is limited because a hard block may speed up some number of critical
  paths in the FPGA, but there may be other near-critical paths that do not speed up.
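
Both reasons can be illustrated numerically. The sketch below uses a simple Amdahl-style model that is an assumption of this illustration, not the method of the cited studies: only a fraction of a path's delay passes through the hard block, and the circuit delay is the maximum over all paths. All delays and speedup factors are hypothetical.

```python
def path_delay_with_hard_block(path_delay_ns, accelerated_fraction, block_speedup):
    """Delay of one path when only a fraction of it is sped up by a hard block."""
    accelerated = path_delay_ns * accelerated_fraction / block_speedup
    remainder = path_delay_ns * (1.0 - accelerated_fraction)
    return accelerated + remainder


# Reason 1: the hard block covers only part of the critical path.
# Hypothetical: 10 ns path, 40% of it maps to a hard block that is 4x faster.
critical_before = 10.0
critical_after = path_delay_with_hard_block(critical_before, 0.4, 4.0)
print(f"critical path: {critical_before} ns -> {critical_after} ns")

# Reason 2: a near-critical path in soft logic is not sped up at all.
near_critical = 9.0  # hypothetical soft-logic-only path
circuit_before = max(critical_before, near_critical)
circuit_after = max(critical_after, near_critical)
print(f"circuit delay: {circuit_before} ns -> {circuit_after} ns")
print(f"overall speedup: {circuit_before / circuit_after:.2f}x")
```

In this assumed case a block that is four times faster improves the circuit delay by only about 11%, because the untouched near-critical path becomes the new critical path.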

Similarly, the use of memory blocks in the FPGA had a negligible impact on the
delay gap between FPGAs and ASICs, and their main benefit also appears to be area
savings only. Ho, et al. (2006) suggest that if a full re-timing pass is done on the circuit
with the hard embedded blocks in place, larger gains from hard blocks can be obtained,
particularly for highly pipelined circuits.

1.7.3 Power Consumption

Compared to customized circuits, the programmability of FPGAs results in a more heavily
loaded interconnection network. The pass transistors, signal buffers and other
programmable switching structures increase the capacitive load of signal networks compared
to dedicated metal wires. Because of this flexibility, an FPGA consumes significantly more
power than other processing/implementation units that have fixed architectures and
interconnections.

As stated by Shang, et al. (2002), Wilton, et al. (2004) and Degalahal and Tuan (2005),
there are two primary types of power consumption in FPGAs: static and dynamic.
Static power is consumed due to transistor leakage, whereas dynamic power is consumed
by toggling nodes and is mainly a function of voltage, frequency, and effective
capacitance. It is important to understand both types and their variation under various
conditions so that they can be properly optimized to meet the design’s power budget.

As described by Sedcole and Cheung (2006), static power is mainly caused by leakage
current between power supply and ground and consists of subthreshold leakage,
reverse-biased PN junction current, gate-induced drain leakage and gate tunneling. The
leakage current starts to be fairly significant at 90 nm for both ASICs and FPGAs and
becomes even more challenging at 65 nm. Static power due to transistor leakage varies with
process, voltage and temperature. Static power increases dramatically as transistor size
shrinks, and the thermal characteristics are also affected significantly as design features
shrink. To obtain higher performance from the transistor, its threshold voltage needs to
be lowered, which also increases the leakage. Core voltage also influences the static power
and leakage, which vary approximately with the square and cube of core voltage. An
increase of only 5% in core voltage increases the static power by approximately 15%.
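
The 5% to 15% figure can be checked with a short calculation. The sketch below assumes that static power scales with the cube of core voltage, which is one reading of the statement above and is consistent with the quoted numbers; the function name is illustrative only.

```python
# Assumption: static power scales with the cube of the core voltage.

def static_power_ratio(voltage_increase):
    """Relative static power after a fractional core-voltage increase."""
    return (1.0 + voltage_increase) ** 3

ratio = static_power_ratio(0.05)            # 5% higher core voltage
increase_percent = (ratio - 1.0) * 100.0    # roughly 15 to 16 percent
print(f"{increase_percent:.1f}% more static power")
```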

To reduce transistor leakage in FPGAs, Xilinx IC designers adopted a third gate-oxide
thickness (triple oxide) in the transistors of the 90 nm Virtex-4 FPGAs. The third,
medium-thickness oxide (midox) and the higher threshold voltage in a portion of the
transistors of Virtex-4 FPGAs allow a dramatic reduction in overall leakage compared to
other FPGAs. Virtex-5 FPGAs continue to deploy the triple-oxide technology at the 65
nm process node, which enables significantly lower leakage current and static power.

Dynamic power is the power consumed during switching events in the core or I/Os of an
FPGA. It is caused by signal transitions at device transistors, and the frequency of signal
transitions is related to the clock frequency. Because an FPGA is programmable, its
dynamic power consumption is design dependent. Factors such as the switching
activity of resources, the effective capacitance of resources and resource utilization are
design dependent and contribute to dynamic power. Switching activity represents the
average number of signal transitions in a clock cycle. The effective capacitance corresponds
to the sum of parasitic effects due to interconnection wires and transistors. An FPGA
architecture usually offers more resources than are actually required to implement a
particular design, which means that some resources are not used after the final chip
configuration and do not consume dynamic power; this is captured by resource utilization.
Taking all these factors into account, as per Shang et al. (2002), Lee et al. (2003) and
Degalahal and Tuan (2005), the total dynamic power consumption of the device is generally
modeled as:

$$P = V^{2} \, f \sum_{n} S \, C \, U$$
Where:
n is the number of toggling nodes,
V is supply voltage,
f is clock/toggle frequency (presumed to be fixed for each resource) and
S, C, and U correspond to switching activity of resources, effective capacitance of
resource and resource utilization respectively.
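
As a minimal sketch, the model above can be evaluated as follows. The per-node values of S, C and U, the 1.0 V supply and the 100 MHz frequency are hypothetical numbers chosen only to show how the terms combine.

```python
def dynamic_power(voltage, frequency, nodes):
    """Evaluate P = V^2 * f * sum(S * C * U) over the toggling nodes.

    nodes: iterable of (S, C, U) tuples, i.e. switching activity,
    effective capacitance in farads, and resource utilization.
    """
    return voltage ** 2 * frequency * sum(s * c * u for s, c, u in nodes)

# Hypothetical values, for illustration only.
nodes = [
    (0.25, 2e-12, 1.0),  # used resource, low switching activity
    (0.50, 1e-12, 1.0),  # used resource, higher switching activity
    (0.50, 1e-12, 0.0),  # unused resource: contributes no dynamic power
]

p = dynamic_power(voltage=1.0, frequency=100e6, nodes=nodes)  # in watts
print(p)
```

Because the voltage term is squared in this model, doubling the supply voltage with everything else fixed quadruples the result.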

All nodes in the FPGA consume power through a combination of charging transistor
parasitic capacitance and metal interconnect capacitance. The latter depends on the length
of the route in the FPGA, while the number of transistors that are switching determines the
node capacitance. The dynamic power can be reduced by reducing the number of switching
transistors and minimizing routing lengths through tighter packing. The Virtex-5 FPGAs
have lower gate capacitance and shorter interconnect traces, which contribute to
lowering the node capacitance by about 15% and hence lower the dynamic power.
Moreover, dynamic power is reduced by approximately 17% in Virtex-5 FPGAs simply
by decreasing the core voltage from 1.2 V to 1.0 V. Research efforts to reduce the power
consumption and improve the power efficiency of reconfigurable FPGAs have
intensified in the recent past.

1.7.4 Radiation Effects on FPGAs

Increased density and the corresponding shrinkage of process geometry have made FPGA
devices more susceptible to failures due to external radiation. Earlier this was an
issue for space-based systems, but it is now becoming an issue for terrestrial systems in
elevated radiation environments and for commercial avionics as well. Radiation effects on
a single FPGA have system-level consequences and need to be addressed in current and
future designs. As described in the Xilinx TMR user guide, version 6.2.3 of September 2006,
there are two main categories of radiation effects that are relevant for SRAM FPGAs in
space: Total-Dose Effects and Single-Event Effects (SEEs).

Total-Dose Effects are similar to sunburn in humans and depend on the amount of
radiation and how long it took to accumulate the total dose. These are cumulative effects
that induce degradation of electrical parameters at the device, circuit and system levels.
They are induced by the total amount of ionizing energy deposited by photons or particles
such as electrons, protons, or heavy ions.

SEEs are induced by the passage of a single high-energy proton or heavy ion through a
device or a sensitive region of a microcircuit. SEEs can be either destructive, e.g.
Single-Event Latch-up (SEL), or non-destructive, such as the occurrence of transient faults
in combinational and sequential logic.

The main reliability issues in radiation environments are SEL, Single-Event Upset (SEU)
and Multiple-Bit Upset (MBU):

• Single-Event Latch-up (SEL) occurs when one of the parasitic bipolar transistors
created as a byproduct of the CMOS fabrication process is activated by a charged
particle.
• Single-Event Upsets (SEUs) occur when a RAM cell’s state is changed due to
exposure to energetic particles. The effect is determined by the function of the
particular RAM cell and it can alter one or more of the following:

- logic content
- user logic state
- logic configuration
- routing.
Alteration in logic content is the most straightforward effect and results in a
flip-flop transitioning to an incorrect state.
• Multiple-Bit Upset (MBU) is an SEU that results in more than one adjacent bit
flipping due to an oblique-angle strike. Its probability steadily increases as
geometries shrink. The maximum MBU distance observed is useful for
determining the block RAM interleaving required so that even MBUs can be
corrected by the Error Correcting Code (ECC).
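
The link between MBU distance and interleaving can be sketched with a toy model. It assumes that physical bit column i belongs to logical word i modulo the interleave factor and that the ECC corrects one bit per word; both are illustrative assumptions, not a description of any specific device.

```python
from collections import Counter

def errors_per_word(first_bit, mbu_width, interleave):
    """Count flipped bits per logical word for one multiple-bit upset.

    Assumption: physical column i belongs to logical word i % interleave.
    """
    return Counter((first_bit + k) % interleave for k in range(mbu_width))

def correctable(first_bit, mbu_width, interleave, ecc_correctable_bits=1):
    """True if no word receives more errors than the ECC can correct."""
    counts = errors_per_word(first_bit, mbu_width, interleave)
    return max(counts.values()) <= ecc_correctable_bits

# A 3-bit MBU starting at column 5 (hypothetical numbers).
print(correctable(5, 3, interleave=4))  # each word sees at most one flip
print(correctable(5, 3, interleave=2))  # one word sees two flips
```

In this model an interleave factor at least as large as the maximum MBU width keeps every upset within the single-bit correction capability of each word.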

As described by Adell and Allen (2008), the following mitigation techniques can be applied
to FPGAs to minimize radiation effects:

a) Triple Modular Redundancy (TMR)

b) Configuration Scrubbing

c) TMR Tool

Triple modular redundancy (TMR) is an effective technique for creating fault-tolerant logic.
In TMR, the logic of the design is simply triplicated, with redundant voters on the
output. In order to recover smoothly from logic upsets, the internal state of the design
must be restored to the repaired logic. It is a common hardening technique that can be
implemented via design synthesis to reduce the SEE susceptibility of FPGA parts. Since
Xilinx FPGAs have a larger gate count available than other FPGAs, they serve as a good
candidate for TMR methods.
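
The voting step can be sketched in a few lines. This is a software model of a bitwise 2-of-3 majority voter, not the hardware implementation; the `module` callable, the 8-bit adder and the `upset_mask` parameter are hypothetical names introduced for the example.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (b & c) | (a & c)

def tmr(module, x, upset_mask=0):
    """Run three copies of `module`; XOR `upset_mask` into one copy's output
    to model a single upset in that copy."""
    outputs = [module(x), module(x) ^ upset_mask, module(x)]
    return majority_vote(*outputs)

def adder(x):
    """Hypothetical 8-bit logic function used as the protected module."""
    return (x + 1) & 0xFF

print(tmr(adder, 41))                     # no upset
print(tmr(adder, 41, upset_mask=0b1010))  # one copy corrupted, still masked
```

The voter masks an upset in any one copy. If two copies are corrupted in the same bit position, the wrong value wins the vote.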

The use of hardware redundancy by itself is not sufficient to avoid errors in the FPGA; it
is mandatory to reload the bitstream constantly to avoid the accumulation of faults. The
continuous reload of the bitstream is called “scrubbing.” Scrubbing, as explained by
Xilinx, allows a system to repair bit-flips in the configuration memory, which includes the
memory cells that configure the LUTs, the routing, and the CLB customization, without
disrupting its operation. Configuration scrubbing prevents the build-up of multiple
configuration faults and reduces the time in which an invalid circuit configuration is
allowed to operate.
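
A toy model of scrubbing is sketched below. It treats the configuration memory as a list of integer frames and the golden bitstream as a second list; the frame values and the function name are hypothetical, and a real scrubber works through the device's configuration interface instead.

```python
def scrub(config_frames, golden_frames):
    """Overwrite every configuration frame with its golden copy.

    Returns the number of frames that differed, i.e. were repaired.
    """
    repaired = 0
    for i, golden in enumerate(golden_frames):
        if config_frames[i] != golden:
            repaired += 1
        config_frames[i] = golden
    return repaired

golden = [0xA5, 0x3C, 0xFF, 0x00]   # hypothetical golden bitstream frames
config = list(golden)
config[2] ^= 0x08                   # model a single bit-flip in frame 2

repaired = scrub(config, golden)
print(repaired, config == golden)
```

Running the scrub periodically bounds how long a flipped configuration bit can persist, which is what keeps faults from accumulating in the redundant copies.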

Implementing TMR manually is very difficult. A special software tool
(TMRTool) has been developed that fits within the Xilinx design flow. This tool
eliminates half-latches (weak keepers), which are also sensitive to SEUs. It has
been evaluated in several radiation tests, but more effort will be required to ensure that it
is completely effective.

TMR does not come without a price. A TMR design is at least three times as large as the
equivalent non-TMR design and suffers from speed degradation as well. In particular, feedback
TMR degrades the speed of operation by introducing a longer feedback path. Power
consumption is also tripled along with the logic. The underlying assumption of TMR is
that only one upset will occur within a given logic block.

1.7.5 Process Variation

Post-manufacturing variability in transistor properties (such as Vt, oxide thickness and
doping concentrations) is, like soft errors, an issue faced by all integrated circuit
designers. This variation can have a significant effect on performance and power
consumption. For FPGAs, Sedcole and Cheung (2006) estimated that within-die variation
in the speed of a logic element causes a speed reduction of 5.7% in 90 nm technology,
and that this may grow to 22.4% in 22 nm technology.

To reduce the impact of process variation, architectural changes to FPGAs have been
suggested by Wong et al. (2005) and Nabaa et al. (2006). The most straightforward is to
select the logic block architecture parameters to minimize this variation; LUT size is
found to be particularly important for reducing variation in either timing or leakage. Body
biasing has been suggested by Nabaa et al. (2006) as an alternative approach to
adaptively compensate for any variation. The inherent regularity and reconfigurability of
FPGAs make it possible to include a characterization unit that can test each logic block
in the FPGA and store an appropriate body-bias setting. Slow blocks are set to a body
bias voltage that decreases their threshold voltage, thereby increasing the block’s speed.
This scheme incurs an area penalty of the order of 1%–2% while decreasing delay
variability by 30% and leakage variability by 78%.

Other approaches for handling process variability in FPGAs rely on CAD-level changes.
Proposals include:

- introducing statistical static timing analysis (SSTA) to FPGA CAD tools to
improve delays by avoiding the margins that are necessary for traditional
static timing analysis (Sivaswamy and Bazargan, 2007),
- testing multiple logically equivalent configurations of the FPGA to find one
that is functional at the desired speed (Sedcole and Cheung, 2007),
- generating critical paths that will be more robust in the face of variation
(Matsumoto et al., 2007) and
- customizing the implementation on the FPGA for the variations of each
specific device (Katsuki et al., 2005; Cheng et al., 2006).
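
The margin that SSTA avoids can be illustrated with a simplified comparison. The sketch assumes independent, Gaussian stage delays and a 3-sigma bound, and the stage values are hypothetical; it is not the algorithm of the cited work.

```python
import math

def corner_delay(stages, k=3.0):
    """Traditional margin: every stage is assumed k sigma slow at once."""
    return sum(mean + k * sigma for mean, sigma in stages)

def statistical_delay(stages, k=3.0):
    """k-sigma bound on the path delay for independent Gaussian stages."""
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean + k * sigma

# Four identical stages: (mean_ns, sigma_ns), hypothetical values.
stages = [(1.0, 0.1)] * 4

worst_case = corner_delay(stages)        # sums the per-stage margins
statistical = statistical_delay(stages)  # sigmas add in quadrature
print(worst_case, statistical)
```

Under these assumptions the statistical bound is tighter than the corner-based one because independent variations partly cancel along the path.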

1.8 Motivation

FPGAs are well suited to many different markets because of their field
programmability, and they provide comprehensive and advanced solutions through
configurable, ready-to-use IP cores for various applications. They are replacing ASICs in
many applications because the cost of developing a single ASIC chip rises
steeply with each successive technology generation, whereas the relative prices of FPGA
architectures are becoming more and more attractive. Other advantages of FPGAs,
such as prototyping and field reprogrammability, make them viable alternatives
in many current ASIC applications. Even though the final product is a hardware
component, the design and implementation of an FPGA-based system has strong
similarities with software-based systems. Hardware description languages
(HDLs) or higher-level languages rather similar to software programming languages are
typically used. This, together with automated design tools, makes it easier to
define even very complex functions.

However, these advantages of FPGAs are in many cases offset by their lower speed, larger
area and higher power consumption. The ever-growing demand for low-power, battery-operated
portable communication and computer systems is motivating new low-power techniques,
especially for FPGAs, which dissipate significantly more power than fixed-logic
implementations. FPGA architectural developments have improved the area
efficiency, performance, and power consumption of FPGAs. Nevertheless, the overhead
incurred to make FPGAs both general purpose and field-programmable often prevents their
use for some applications. It is well understood that FPGAs suffer in terms of area,
performance, and power consumption relative to ASICs. The area penalty of using an
FPGA over an ASIC at high volume results in a significantly higher unit price. These
issues of FPGA-based designs need to be addressed to make them more
efficient in general and with respect to area, speed and power in particular.

The advantages and applications of FPGA-based systems described above motivated this
work on optimal FPGA-based digital system design. Following a thorough study of the
literature on FPGA-based digital design and of the methods and techniques that have been
explored to improve the overall performance of FPGA-based designs, the aim is to use and
apply a novel, easy and efficient approach to optimal FPGA-based digital system
design with respect to resource utilization, delay/speed and power consumption.

1.9 Objectives of Research

The research work on the FPGA-based Low Power Optimal Digital System Design has
been taken up with the following objectives in mind:

1. To conduct an exhaustive literature review in the area of FPGA-based system
designs
2. To identify the areas of improvement in FPGA-based digital systems
3. To design and develop a comprehensive FPGA-based digital system
4. To study and analyse the reports of utilization, timing summary and power
consumed for the FPGA-based digital system designed in objective 3
5. To use and apply a novel approach for easily and efficiently optimizing those
parameters
6. To study and analyse the same reports of utilization, timing summary and
power consumed as in objective 4 for the optimized digital system that
results from applying the proposed approach
7. To compare the reports of the initially designed FPGA-based digital system and
the optimized FPGA-based digital system that results from applying the
proposed approach, in order to validate the improvement in the critical parameters of
FPGA-based digital system designs, i.e. area, delay/speed and power.

1.10 Organization of Thesis

The remainder of the thesis is organized chapter-wise as follows:

Chapter-II covers the detailed literature survey and review of the research papers,
technical papers and manuals of FPGA vendors regarding FPGA-based designs in
general and the improvement of their overall performance in particular. It then
summarizes the observations and proposals given by the contributors in the area to
improve the performance of FPGAs, along with concluding remarks.

Chapter-III describes the basic development stages of FPGA design. It then covers the
basics of 32-bit floating point representation and its arithmetic operations, on the basis
of which the comprehensive digital system, a 32-bit floating point arithmetic unit (FPAU),
is designed on FPGA as per the objectives set in Chapter I. The algorithms and flow charts
for the design of the 32-bit floating point arithmetic unit, along with the detailed VHDL,
are presented in this chapter. This design is taken as the base digital system for the
further work.

Chapter-IV first summarizes the flow of steps in the FPGA methodology and presents the
complete work flow for the further research work. It then briefly describes the
software tools developed by Xilinx that are used for synthesis and analysis of the
32-bit FPAU digital system presented in Chapter III. It also presents
the reports of the synthesized design through some snapshots. It then describes the
power estimator tool used to estimate the power consumed by the designed FPGA-based
digital system. It also presents the reports of verification of the 32-bit FPAU design on
a practical FPGA platform. All the reports of the designed system in respect of resource
utilization, timing summary and estimated power consumed are given at the end
of this chapter.

Chapter-V presents a novel approach of linking ModelSim with Simulink in Matlab to
load the VHDL design, which is used to cross-verify the results of all
arithmetic operations (addition/subtraction, division and multiplication) by simulation in
the ModelSim wave window launched from Matlab after creating the Simulink model and
subsystem. It describes the complete steps for linking Matlab and ModelSim.
ModelSim is an easy-to-use yet versatile VHDL/(System)Verilog/SystemC simulator by
Mentor Graphics. It supports behavioural, Register Transfer Level (RTL) and gate-level
modeling. The Simulink model and subsystem are generated and launched to simulate and
verify the VHDL-coded design of the 32-bit FPAU described in Chapters III and IV. The
simulation results of all the floating point arithmetic operations in the ModelSim window
are presented at the end of the chapter.

Chapter-VI is devoted to describing all the steps required to create the
test bench for generating optimized code for the designed 32-bit FPAU system using
the Simulink model presented in Chapter V, followed by the optimization
process. The next section presents the reports of resources utilized and the timing summary
obtained after synthesis of the optimized VHDL design of the 32-bit FPAU using the same
Xilinx ISE tool and the same target FPGA platform/device as
were used for the initial VHDL code of the FPAU. It then presents the power
estimation reports of the optimized VHDL design of the 32-bit FPAU using the same
version/release of XPower as was used for the initial VHDL code of the FPAU. The results of
the initial VHDL design of the 32-bit FPAU and the optimized VHDL code obtained using
the proposed approach in Simulink are compared, in respect of FPGA resources utilized,
timing summary and estimated power consumed, at the end of this chapter.

Chapter-VII summarizes the concluding remarks drawn from all the chapters to
consolidate the research work and finally states the scope for future work.
