
FPGA Architecture

Presentation Overview

Available choices for the digital designer
FPGA: a detailed look
  Interconnection framework
  FPGAs and CPLDs
  Field programmability and programming technologies
    SRAM, anti-fuse, EPROM and EEPROM
Design steps
Commercially available devices
  Xilinx XC4000
  Altera MAX 7000

Fixed Versus Programmable Logic

The circuits in a fixed logic device are permanent: they perform one
function or set of functions and, once manufactured, cannot be changed.

Programmable logic devices (PLDs) are standard, off-the-shelf parts that
offer customers a wide range of logic capacity, features, speed, and
voltage characteristics, and these devices can be changed at any time to
perform any number of functions.

Classifications
PLA: a Programmable Logic Array (PLA) is a relatively small FPD that
contains two levels of logic, an AND-plane and an OR-plane, where both
levels are programmable.
PAL: a Programmable Array Logic (PAL) is a relatively small FPD that has
a programmable AND-plane followed by a fixed OR-plane.
SPLD: refers to any type of Simple PLD, usually either a PLA or a PAL.
CPLD: a more Complex PLD that consists of an arrangement of multiple
SPLD-like blocks on a single chip.
FPGA: a Field-Programmable Gate Array is an FPD featuring a general
structure that allows very high logic capacity.

Definitions
Field Programmable Device (FPD):
a general term that refers to any type of integrated
circuit used for implementing digital hardware, where the
chip can be configured by the end user to realize
different designs.
Programming of such a device often involves placing
the chip into a special programming unit, but some chips
can also be configured in-system. Another name for
FPDs is programmable logic devices (PLDs).

Designer's Choice

The digital designer has various options:

SSI (small-scale integration) or MSI (medium-scale integration) components
  Difficulties arise as design size increases
  Interconnections grow with complexity, resulting in a prolonged testing phase

Simple programmable logic devices
  PALs (programmable array logic)
  PLAs (programmable logic arrays)
  Architecture is not scalable; power consumption and delays limit how far
  the architecture can be extended to complex designs
  Implementation of larger designs leads to the same difficulties as with
  discrete components

Simple Programmable Logic Devices

Simple two-level structure
  PAL and PLA
Allow high-speed implementations of circuits

Drawbacks
  Suited only to small logic circuits
  Modest number of product terms
  The interconnection structure grows impractically large as the number of
  product terms increases

MPGAs

PLA
[Figure: PLA with a programmable AND plane generating the product terms
XY, YZ, XZ and XYZ, and a programmable OR plane generating the outputs
XY+YZ and XZ+XYZ]
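
The two-plane structure of the PLA can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real device: the product terms and the two outputs are taken as they appear in the figure text (XY, YZ, XZ, XYZ and XY+YZ, XZ+XYZ, all literals uncomplemented), while the names `AND_PLANE`, `OR_PLANE`, `F1`, `F2` and `evaluate_pla` are assumptions introduced here.

```python
from itertools import product

# AND plane: each product term lists its literals as (input name, polarity).
# Polarity True means the input is used uncomplemented.
AND_PLANE = {
    "XY":  [("X", True), ("Y", True)],
    "YZ":  [("Y", True), ("Z", True)],
    "XZ":  [("X", True), ("Z", True)],
    "XYZ": [("X", True), ("Y", True), ("Z", True)],
}

# OR plane: each output lists the product terms connected to its OR gate.
# The output names F1 and F2 are hypothetical.
OR_PLANE = {
    "F1": ["XY", "YZ"],    # F1 = XY + YZ
    "F2": ["XZ", "XYZ"],   # F2 = XZ + XYZ
}

def evaluate_pla(inputs):
    """Evaluate the AND plane, then the OR plane, for one input assignment."""
    terms = {
        name: all(inputs[var] == polarity for var, polarity in literals)
        for name, literals in AND_PLANE.items()
    }
    return {out: any(terms[t] for t in used) for out, used in OR_PLANE.items()}

if __name__ == "__main__":
    # Print the full truth table of both outputs.
    for x, y, z in product([False, True], repeat=3):
        outs = evaluate_pla({"X": x, "Y": y, "Z": z})
        print(int(x), int(y), int(z), "->", int(outs["F1"]), int(outs["F2"]))
```

"Programming" the PLA in this model means editing the two dictionaries; a PAL would correspond to keeping `OR_PLANE` fixed and changing only `AND_PLANE`.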

PLA
[Figure: PLA with programmable AND and OR planes, inputs X and Y, and
outputs O1 to O4. Legend: a programmable node is either un-programmed,
connected, or disconnected.]

PAL
[Figure: PAL with a programmable AND plane and a fixed OR plane, outputs
O1 to O4]

PAL with Logic Expanders
[Figure: PAL with a programmable AND plane, a fixed OR plane, and logic
expanders]

PLA vs. PAL

PLAs are more flexible than PALs, since both the AND and OR planes are
programmable in PLAs.
Because both planes are programmable, PLAs are expensive to fabricate and
have a large propagation delay.
By using fixed OR gates, PALs are cheaper and faster than PLAs.
Logic expanders increase the flexibility of PALs, but result in
significant propagation delay.
PALs usually contain D flip-flops connected to the outputs of the OR gates
to implement sequential circuits.
PLAs and PALs are usually referred to as SPLDs.

Programmable Logic

Programmable digital integrated circuit
  Standard off-the-shelf parts
  Desired functionality is implemented by configuring on-chip logic blocks
  and interconnections
Advantages (compared to an ASIC):
  Low development costs
  Short development cycle
  Device can (usually) be reprogrammed
Types of programmable logic:
  Complex PLDs (CPLD)
  Field Programmable Gate Arrays (FPGA)

A Generic CPLD Structure

Multiple PLDs can be combined on a single chip by using programmable
interconnect structures. These devices are called CPLDs.

CPLD
Architecture and Examples

PLD: Sum of Products
Programmable AND array followed by fixed fan-in OR gates
[Figure: AND plane with inputs A, B and C, a programmable switch or fuse
at each crosspoint, and outputs f1 and f2 formed as sums of products]

PLD - Macrocell
Can implement combinational or sequential logic
[Figure: macrocell with AND plane, D flip-flop (Clock), MUX (Select), and
Enable, producing output f1]

CPLD Structure
Integration of several PLD blocks with a programmable interconnect on a
single chip
[Figure: four PLD blocks, each with its own I/O block, connected through a
central interconnection matrix]

High Density Logic Overview

High-density or complex PLDs (HDPLD or CPLD)
  Large logic building blocks
  PLD-like architectures
  Centralized interconnect
  Fast, predictable performance
  Good at wide gating functions
    State machines
    Counters

Altera MAX CPLD
[Figure: Altera MAX chip with logic array blocks (LABs) and I/O cells
joined by a chip-wide interconnect. Each LAB contains a local array (LA)
and macrocells.]
Each LAB contains 16 macrocells

CPLD Example - Altera MAX7000

EPM7000 Series Block Diagram

CPLD Example - Altera MAX7000

MAX 7000 architecture includes
  Logic array blocks (LABs)
  Macrocells
  Expander product terms (shareable and parallel)
  Programmable interconnect array (PIA)
  I/O control blocks
Performance
  High-performance, flexible LABs are linked through the PIA
  The PIA is a global bus fed by all dedicated inputs, I/O pins and
  macrocells
Macrocells
  Can be configured for combinational or sequential logic operation

CPLD Example - Altera MAX7000

EPM7000 Series Device Macrocell

CPLD Example - Altera MAX7000

Macrocell
Logic array
  Combinational logic is implemented in the array
  Provides five product terms per macrocell
Product-term select matrix
  Allocates product terms as
    Primary logic inputs to the OR and XOR gates (combinational logic)
    Secondary inputs to the register: clear, preset, clock and clock-enable
    control functions
Programmable register
Logic expanders
  Each LAB has 16 shareable expanders
    Inverted product terms fed back into the logic array
  Parallel expanders
    Product terms borrowed from adjacent macrocells

CPLD Example - Altera MAX7000

Registered functions

Macrocell flip-flop (FF)
  Can be configured for D, T, JK or SR operation
  Programmable clock control
  Software selects the most efficient FF operation for each registered
  function to optimize resource utilization

The register can be clocked in different modes
  By a global clock signal
    Fastest clock-to-output performance
  By a global clock signal and an active-high clock enable
    Provides an enable on each FF
    Keeps the fast clock-to-output performance of the global clock
  By an array clock implemented with a product term
    FF is clocked by signals from buried macrocells or I/O pins

FF supports asynchronous preset and clear functions
  The product-term select matrix provides product terms to control these
  functions
  Control signals are active high; active-low control can be obtained by
  inverting the signal within the logic array

Device power-up
  FF is cleared upon power-up

CPLD Example - Altera MAX7000

I/O pins
  Fast input path to the macrocell register, bypassing the PIA and
  combinational logic

Complex logic functions
  Each macrocell provides five product terms
  Most logic functions can be designed using five product terms
  For more complex functions, another macrocell can be used to supply the
  required logic resources

Expander product terms
  Shareable and parallel expander product terms
  Provide additional product terms to any macrocell within the same LAB


Performance
CPLDs have wide fan-in
A single logic level allows high frequency and low latency
Very small functions burn (waste) macrocell logic

Design Methodologies

Custom ICs are created using unique masks for all layers during the
manufacturing process
  Require highly skilled and competent designers
  Lengthy development time
  High cost of design and testing

Mask Programmable Gate Arrays
  Generic masks for all layers except metallization
  The generic masks define an array of modular functional blocks
  Modules are rows of transistors separated by fixed-width channels
  Designer's expertise is less critical
  Shorter development time and lower development costs
  Channel-less gate arrays: sea-of-gates

Standard Cells
  Modules or standard cells are picked from a database, then placed in
  rows and interconnected
  Placement and routing are done automatically (removing the designers
  from the physical design process)
  Designs are less efficient in size and performance

Gate arrays

A highly standardized means of implementing digital integrated circuit
designs
Manufactured as regular arrays of patterned blocks of transistors which
can be interconnected to form logic elements such as gates, flip-flops and
multiplexers
The manufacturer can pre-produce gate array wafers without
interconnections in high volume
These are then configured in an additional process step in the factory
Once a customer provides a definition of the logic block interconnections,
one or more layers of metal are added to form these connections
Collectively known as MPGAs (Mask-Programmable Gate Arrays)
Sea-of-gates structures: the added metal interconnects have to be placed
over particular transistors, rendering them unusable
Regular gate arrays: blank routing space is provided at regular intervals
in the transistor array
As process technologies advance and feature sizes get smaller, it is
becoming increasingly expensive to configure such devices

Masked Programmable Logic Devices

MPGA
  Rows of transistors
  User-specified interconnections
    Within the rows, to implement basic logic gates
    Between the rows, to connect basic gates together
  I/O circuitry
  Mask layers are predefined by the manufacturer, except the final metal
  layers
  Metal layers are customized to implement the desired circuit

MPGA

MPGA

Drawbacks
  Large non-recurring engineering (NRE) cost
    Need to generate the metal mask layers
    Need to manufacture the chip
  Longer time to market

Advantage
  General structure allows much larger circuits to be implemented
    Due to the scalable interconnection structure
    Interconnect scales proportionally with the logic

Field Programmable Logic Devices

FPGAs combine
  The programmability of a PLD
  The scalable interconnection structure of an MPGA

Designer's Choice
Quest for high capacity: two choices available

MPGA (Masked Programmable Logic Devices)
  Customized during fabrication
  Expensive at low volume
  Prolonged time to market and high financial risk

FPGA (Field Programmable Logic Devices)
  Customized by the end user
  Implements multi-level logic functions
  Fast time to market and low risk

Designer's Choice
FPGAs vs. MPGAs

Disadvantages
  Low speed of operation
    Programmable switches
    Significant resistance and capacitance in the connections between
    logic blocks
  Low logic density
    Programmable switches and programming circuitry
    Require more area than an MPGA to implement the same amount of logic
    circuitry
    Fewer chips per wafer

FPGAs vs. MPGAs

Standard Cell-based Design
[Figure: standard-cell layout methodology, showing rows of logic cells
and feedthrough cells separated by routing channels, plus functional
modules (RAM, multiplier, ...)]
Routing channel requirements are reduced by the presence of more
interconnect layers

Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells separated by routing channels]

Why FPGAs?

Advantages of FPGAs
  Replacement of SSI and MSI chips
  Availability of parts off the shelf
  Rapid turnaround
  Low risk
  Reprogrammability

Limitations
  PLDs will operate faster than FPGAs for the same design implemented in
  both
  For FPGAs the circuit delay depends on the design implementation tools
  FPGAs are less dense and operate at lower speed than conventional gate
  arrays

FPGA: A Quick Look

Two-dimensional array of customizable logic blocks placed in an
interconnect array
Like PLDs, programmable at the user's site
Like MPGAs, implements thousands of gates of logic in a single device
  Employs a logic and interconnect structure capable of implementing
  multi-level logic
  Scalable in proportion with logic, removing many of the size limitations
  of the PLD-derived two-level architecture

FPGAs offer the benefits of both MPGAs and PLDs!

FPGA: A Detailed Look

Based on the principle of functional completeness
FPGA: functionally complete elements (logic blocks) placed in an
interconnect framework
Interconnection framework
  Comprises wire segments and switches
  Provides a means to interconnect logic blocks
Circuits are partitioned to logic block size, mapped and routed

Basic FPGA Architecture

FPGA Architecture

(With Multiplexer As Functionally Complete Cell)


Basic building block
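
To illustrate why a multiplexer can serve as a functionally complete cell, the sketch below builds NOT, AND and OR from a single 2:1 multiplexer by tying its data inputs to constants or to other signals. The function names (`mux2`, `not_gate`, `and_gate`, `or_gate`) are assumptions made for this example and do not come from the slides.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def not_gate(a):
    # Select between the constants 1 and 0 using the input itself.
    return mux2(a, 1, 0)

def and_gate(a, b):
    # When a is 0 the output is 0, otherwise it follows b.
    return mux2(a, 0, b)

def or_gate(a, b):
    # When a is 1 the output is 1, otherwise it follows b.
    return mux2(a, b, 1)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "NOT a:", not_gate(a),
                  "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```

Because NOT together with AND (or OR) is sufficient to express any Boolean function, an array of such cells plus interconnect can in principle implement any logic circuit.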

Interconnection Framework
Granularity and interconnection structure have caused a split in the
industry

FPGA
  Fine grained
  Variable-length interconnect segments
  Programmable switches
  Timing in general is not predictable; timing is extracted after place
  and route

Interconnection Framework

CPLD
  Coarse grained (SPLD-like blocks)
  Programmable crossbar interconnect structure
  Interconnect structure uses continuous metal lines
  The switch matrix may or may not be fully populated
  Timing is predictable if fully populated
  Architecture does not scale well

Field Programmability

Field programmability is achieved through switches (transistors
controlled by memory elements or fuses)
Switches control the following aspects
  Interconnection among wire segments
  Configuration of logic blocks
The distributed memory elements controlling the switches and the
configuration of logic blocks are together called the configuration memory

Technology of Programmable Elements

Vary from vendor to vendor. All share a common property: each element is
configurable in one of two positions, ON or OFF
Can be classified into three categories:
  SRAM based
  Fuse based
  EPROM/EEPROM/Flash based
Desired properties:
  Minimum area consumption
  Low ON resistance; high OFF resistance
  Low parasitic capacitance to the attached wire
  Reliability in volume production

SRAM Programming Technology

Employs SRAM (static RAM) cells to control pass transistors and/or
transmission gates
SRAM cells control the configuration of the logic blocks as well
Volatile
  Needs external storage
  Needs a power-on configuration mechanism
In-circuit re-programmable
Shorter configuration time
Occupies relatively larger area

Anti-fuse Programming Technology

Though implementations differ, all anti-fuse programming elements share a
common property
  They use materials which normally reside in a high-impedance state
  But can be fused irreversibly into a low-impedance state by applying a
  high voltage

Anti-fuse Programming Technology

Very low ON resistance (faster implementation of circuits)
Small size of anti-fuse elements; interconnects occupy relatively less
area
  Offset: larger transistors are needed for programming
One-time programmable
  Cannot be re-programmed (design changes are not possible)
Retains configuration after power-off

EPROM, EEPROM or Flash Based Programming Technology

EPROM Programming Technology
  Two gates: floating and select
  Normal mode:
    No charge on the floating gate
    Transistor behaves as a normal n-channel transistor
  Floating gate is charged by applying a high voltage
    Threshold of the transistor (as seen by the select gate) increases
    Transistor remains turned off until the charge is removed
  Re-programmable by exposing to UV radiation

EPROM Programming Technology
  Used as pull-down devices
  Consumes static power

EPROM Programming Technology
  No external storage mechanism needed
  Re-programmable (not all!)
  Not in-system re-programmable
  Re-programming is a time-consuming task

EEPROM Programming Technology

Two gates: floating and select
Functionally equivalent to EPROM; construction and structure differ
Electrically erasable: re-programmable by applying a high voltage (no
exposure to UV radiation!)
When un-programmed, the threshold (as seen by the select gate) is
negative!

EEPROM Programming Technology

EEPROM Programming Technology
Re-programmable; in general, in-system re-programmable
Re-programming takes less time than with EPROM technology
Multiple voltage sources may be required
Area occupied is twice that of EPROM!

Programming Technologies

Basic architectures of FPGAs

To allow the implementation of practically any logic circuit, an FPGA
device requires an area trade-off between
  a sufficient number of flexible, configurable logic cells and
  enough interconnect resources to allow all connections between these
  cells.

The majority of circuits use only a small portion of the routing and
logic resources,
  resulting in a loss in speed (signals pass through redundant routing
  elements) and
  in density of logic when compared to the same circuit implemented in
  dedicated logic.

Different FPGA devices with related architectures are grouped into a
family.
  Each member of a family is physically tailored to a certain class of
  application architecture, for example by replacing the switches in
  certain routes with hard shorts, or by hard-wiring the logic cells
  internally in a certain manner.
  Such a member may implement certain circuits more efficiently, but its
  reduced flexibility means that some circuits may not fit onto the device
  at all.

Implementation of a circuit is now a question of choosing the right
device from the FPGA family.

Programming Skills vs. FPGAs

CPU Model
  Single-threading
    No synchronization
    for/if/switch control
  Incremental execution
    One instruction at a time
    Results are immediate
  Common parallelization
    Large units of work
    Costly communication

FPGA Model
  Massive parallelism
    Visible timing relations
    State machine/hardwired
  Pipelined execution
    All operations active
    Visible dependencies
  Parallelism model
    Fine grain: one ALU op
    Cheap on-chip communication

An Example

Modulo-4 counter: Specification

Modulo-4 counter: Logic Implementation

FPGA Implementation of Modulo-4 Counter
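
The specification and logic diagrams for the counter did not survive as text, so the following is only a sketch under stated assumptions: a 2-bit up-counter (0, 1, 2, 3, 0, ...) built from two D flip-flops, with next-state equations D0 = NOT Q0 and D1 = Q1 XOR Q0. The function names `next_state` and `run` are introduced here for illustration.

```python
def next_state(q1, q0):
    """Combinational next-state logic of an assumed 2-bit up-counter."""
    d0 = q0 ^ 1      # D0 = NOT Q0: the low bit toggles every clock
    d1 = q1 ^ q0     # D1 = Q1 XOR Q0: the high bit toggles when Q0 is 1
    return d1, d0

def run(cycles, q1=0, q0=0):
    """Clock the two flip-flops and record the count seen in each cycle."""
    counts = []
    for _ in range(cycles):
        counts.append(2 * q1 + q0)
        q1, q0 = next_state(q1, q0)   # both flip-flops update on the clock edge
    return counts

if __name__ == "__main__":
    print(run(6))   # [0, 1, 2, 3, 0, 1]
```

In an FPGA, each of the two next-state equations would typically occupy one LUT, with the LUT output registered by the flip-flop in the same logic block.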

Design Steps Involved in Designing With FPGAs

Understand and define design requirements
Design description
Behavioural simulation (source code interpretation)
Synthesis
Functional or gate-level simulation
Implementation
  Fitting
  Place and route
Timing or post-layout simulation
Programming, test and debug

Commercially Available Devices
Architecture differs from vendor to vendor
Characterized by
  Structure and content of the logic block
  Structure and content of the routing resources
To examine, look at some of the available devices
  FPGA: Xilinx (XC4000)
  CPLD: Altera (MAX 5K)

Xilinx FPGAs

Generic Xilinx Architecture

Symmetric-array based; the array consists of CLBs with LUTs and D
flip-flops
An n-input LUT can implement any n-input Boolean function
Array embedded within the periphery of I/O blocks
Array elements interleaved with routing resources (wire segments, switch
matrix and single connection points)
Employs SRAM technology

What is an FPGA?
FPGAs contain the building blocks necessary to design a custom integrated
circuit without having to turn to an outside foundry:
  Logic blocks
  Interconnects
  I/O blocks
All of these can be programmed to perform a particular function. The
programming technology is either
  memory-based (SRAM or flash EEPROM) or
  anti-fuse
A designer needs to develop a special program and have that program
uploaded to the FPGA.
FPGA design could be considered more of a software development effort
than a hardware development effort.
Intellectual property (IP) placed inside the FPGA can either be developed
by the designer or obtained from a third party.

Design Flow Approaches

Schematic capture: the most intuitive and visual, but the least flexible
Hardware description language (HDL): more portable


FPGA manufacturers
Xilinx (http://www.xilinx.com): SRAM-based FPGAs (tens of thousands to
millions upon millions of gates)
Altera (http://www.altera.com): SRAM-based FPGAs
Lattice Semiconductor (http://www.latticesemi.com)
Actel (http://www.actel.com)
QuickLogic (http://www.quicklogic.com)

Classification by Granularity

Logic block size correlates with the granularity of a device, which in
turn relates to the effort required to complete the wiring between the
blocks (routing channels)
  Fine granularity (sea-of-gates architecture)
  Medium granularity (FPGA)
  Large granularity (CPLD)

Underlying FPGA Fabric
  Large numbers of relatively simple programmable logic block islands
  embedded in a sea of programmable interconnect

Fine-grained architecture
  Each logic block can be used to implement only a very simple function
    A 3-input function or a storage element
  Suited to glue logic and state machines
  Requires a large number of connections into and out of each block

Coarse-grained architecture
  Each logic block contains a relatively large amount of logic
  A logic block might contain four 4-input LUTs, four muxes, four
  D-latches, and some fast carry logic

Mux vs. LUT-based logic blocks

Assume that the LUT is formed from SRAM cells (but it could be formed
using antifuses, E2PROM, or Flash cells)

MUX-based logic block

Multiplexer-based CLB (configurable logic block)

Multiplexer-based CLB
  Example from Actel 40MK
  8-input, 1-output cell
  Implements basic logic functions (AND, OR, NOR, ...) with 2, 3, or 4
  inputs

LUT based CLB

A commonly used technique is to use the inputs to select the desired SRAM
cell using a cascade of transmission gates
If a transmission gate is enabled (active), it passes the signal seen on
its input through to its output. But if the gate is disabled, its output
is electrically disconnected from the wire it is driving.
4-input LUTs are generally considered to offer the optimal balance of
tradeoffs.
The recently introduced Virtex-5 family from Xilinx features 6-input LUTs
Altera has a fabric that combines two 4-LUTs and four 3-LUTs. In addition
to allowing designers to form a 6-LUT, this also allows you to make a
5-LUT and a 3-LUT, and many other combinations.

Look-up table (LUT) based CLB

LUT based CLB: depending on the combination of the input words, a
predefined output value is assigned

Memory implementation:
input values = address of memory
predefined values = content of memory
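
The "inputs are the address, contents are the function" idea can be sketched as follows. This is a minimal software model, not vendor code: `make_lut` fills a 2^n-entry table (the configuration step), and `lut_read` looks up one entry (the run-time behaviour). Both names, and the example majority function, are assumptions made for illustration.

```python
def make_lut(func, n_inputs):
    """Configuration: evaluate func for every input combination and store
    the results in a table with 2**n_inputs entries."""
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]  # bit i = input i
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Operation: the input values form the address of the memory."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]

# Example: a 4-input LUT configured as "at least three inputs are 1".
majority = make_lut(lambda a, b, c, d: int(a + b + c + d >= 3), 4)

if __name__ == "__main__":
    print(majority)                      # the 16 stored configuration bits
    print(lut_read(majority, 1, 1, 1, 0))
```

Changing the implemented function only changes the 16 stored bits; the lookup hardware stays the same, which is why any function of the inputs fits in the same cell.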

Registered output based CLB

The output of the LUT may be registered or not, depending on the
functional description (selection is implemented via multiplexers)

Look up Tables

Configuration memory holds the outputs of the truth table
Internal signals connect to the control signals of the multiplexers to
select the value of the truth table for any given input value

Synthesis mode
  Any logic function of up to 4 variables, in its registered or direct
  form.

Arithmetic mode
  The LUT is split to provide any two logic functions of the same 3
  variables. In the arithmetic mode, the inputs A, B, C are the addends
  and the Carry-in, whilst the output functions are the Sum and the
  Carry-out.

Multiplier mode
  This mode also implements an adder, with the addends this time being
  partial products and the Carry-in from the previous bit position.
  The partial product of A and B may be implemented with an AND gate.
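
The arithmetic mode can be modelled as one 16-bit LUT split into two 8-entry halves that share the same three inputs, one half holding the Sum function and the other the Carry-out. The sketch below is an assumption-laden illustration (the names `SUM_TABLE`, `CARRY_TABLE`, `full_adder` and `add` are hypothetical); it also chains the cell bit by bit to show how a multi-bit adder would be formed.

```python
def build_table(func):
    """Fill an 8-entry table for a function of the three inputs (a, b, c)."""
    return [func(addr & 1, (addr >> 1) & 1, (addr >> 2) & 1) for addr in range(8)]

# The two halves of the split LUT share the same address (a, b, carry_in).
SUM_TABLE = build_table(lambda a, b, c: a ^ b ^ c)
CARRY_TABLE = build_table(lambda a, b, c: (a & b) | (a & c) | (b & c))

def full_adder(a, b, carry_in):
    """One cell in arithmetic mode: returns (sum, carry_out)."""
    addr = a | (b << 1) | (carry_in << 2)
    return SUM_TABLE[addr], CARRY_TABLE[addr]

def add(x, y, width=4):
    """Ripple the carry through `width` cells; returns (result, carry_out)."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

if __name__ == "__main__":
    print(add(5, 6))   # (11, 0)
    print(add(9, 8))   # (1, 1): 17 does not fit in 4 bits, so the carry is set
```

Multiplier mode would use the same adder tables, with each addend produced by an AND of one bit of A and one bit of B.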

CLB with registered output

Counter mode
  The LUT provides two logic functions (counter Output and Carry-out) of
  the same 2 variables, which are a Carry-in and the previous Output. The
  feedback loop to use this output as an input is normally provided for
  within the CLC; this could also be implemented externally by connecting
  appropriate routes.

Multiplexer (2:1) mode
  The LUT is configured to provide a logic function of 3 variables, where
  one selects one of the other two inputs. As an example, the case where C
  is the select line for A and B will be considered

CLB with registered output


LUT based CLB

Example from Actel Varicore CLC:
  LUT based
  Multiplexer to decrease LUT size
  Registered output selectable via multiplexer

LUT output based CLB

Example from Xilinx XC3000:
  Dual-output complex CLB
  Registers selectable
  Large combinational function with two outputs

Xilinx logic cell (LC)

An LC comprises a 4-input LUT (which can also act as a 16 × 1 RAM or a
16-bit shift register), a multiplexer and a register
  The register can be configured (programmed) to act as a flip-flop or as
  a latch.
  The polarity of the clock (rising-edge-triggered or
  falling-edge-triggered) can be configured, as can the polarity of the
  clock enable and set/reset signals (active-high or active-low)

Highly-simplified view of a Xilinx logic cell (LC).

The equivalent core "building block" in an FPGA from Altera is called a
logic element (LE).

A multi-faceted LUT
A "slice" containing two logic cells.

The next level up is what Xilinx call a configurable logic block (CLB)
and what Altera refer to as a logic array block (LAB).
Some Xilinx FPGAs have two slices in each CLB, while others have four.
There is also fast programmable interconnect within the CLB. This
interconnect is used to connect neighboring slices.

Logic-block hierarchy
LC, then slice (with two LCs), then CLB (with four slices)
This is complemented by an equivalent hierarchy in the interconnect. Thus,
there is fast interconnect between the LCs in a slice, then slightly
slower interconnect between slices in a CLB, followed by the interconnect
between CLBs.
The idea is to achieve the optimum tradeoff between making it easy to
connect things together and avoiding excessive interconnect-related
delays.

A CLB containing four slices

All of the LUTs within a CLB can be configured together to implement the
following:
  Single-port 16 × 8-bit RAM
  Single-port 32 × 4-bit RAM
  Single-port 64 × 2-bit RAM
  Single-port 128 × 1-bit RAM
  Dual-port 16 × 4-bit RAM
  Dual-port 32 × 2-bit RAM
  Dual-port 64 × 1-bit RAM
Each 4-input LUT can be used as a 16-bit shift register
The LUTs within a single CLB can be configured together to implement a
shift register containing up to 128 bits, as required
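
The capacity figures follow from the hierarchy described earlier: four slices × two LCs × one 16-bit LUT gives 8 × 16 = 128 bits per CLB. The sketch below models a LUT used as a 16-bit shift register and chains eight of them; it is a behavioural illustration only, and the class and function names (`ShiftRegisterLUT`, `clock_chain`) are assumptions, not vendor primitives.

```python
class ShiftRegisterLUT:
    """A 16-cell LUT whose contents shift by one position per clock."""
    DEPTH = 16

    def __init__(self):
        self.cells = [0] * self.DEPTH   # cells[0] holds the newest bit

    def clock(self, data_in):
        """Shift in one bit and return the bit that falls off the end."""
        shifted_out = self.cells[-1]
        self.cells = [data_in & 1] + self.cells[:-1]
        return shifted_out

    def read(self, address):
        """The 4 LUT inputs address one cell, giving a selectable delay."""
        return self.cells[address]

def clock_chain(luts, data_in):
    """Clock a cascade of LUTs: each one feeds its shifted-out bit onward."""
    bit = data_in
    for lut in luts:
        bit = lut.clock(bit)
    return bit

if __name__ == "__main__":
    chain = [ShiftRegisterLUT() for _ in range(8)]   # 8 x 16 = 128 bits
    outputs = [clock_chain(chain, 1 if i == 0 else 0) for i in range(129)]
    print(outputs.index(1))   # 128: the bit emerges after 128 stored positions
```

Reading a cell through the address inputs rather than waiting for the bit to fall off the end is what allows a single LUT to act as a programmable delay of 1 to 16 clocks.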

Fast carry chains

A key feature is the special logic and interconnect required to implement
fast carry chains.
In the context of the CLBs, each logic cell (LC) contains special carry
logic.
This is complemented by dedicated interconnect between the two LCs in each
slice, between the slices in each CLB, and between the CLBs themselves.
This special carry logic and dedicated routing boosts the performance of
logical functions such as counters and arithmetic functions such as
adders.
The availability of these fast carry chains, in conjunction with features
like the shift register incarnations of LUTs and the embedded multipliers,
makes FPGAs well suited to applications such as DSP.

FPGA families

          Low-cost                             High-performance
Xilinx    Spartan 3, Spartan 3E, Spartan 3L    Virtex 4 LX / SX / FX, Virtex 5 LX
Altera    Cyclone II                           Stratix II, Stratix II GX

Xilinx FPGA Families

Old families
  XC3000, XC4000, XC5200
  Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern
  designs.
Low-cost families
  Spartan/XL: derived from XC4000
  Spartan-II: derived from Virtex
  Spartan-IIE: derived from Virtex-E
  Spartan-3, Spartan-3E, Spartan-3L
High-performance families
  Virtex (220 nm)
  Virtex-E, Virtex-EM (180 nm)
  Virtex-II, Virtex-II PRO (130 nm)
  Virtex-4 (90 nm)
  Virtex-5 (65 nm)

Xilinx XC3000
CLB

Granularity of FPGAs

Selection of an FPGA

An FPGA can be viewed as an evolution of PALs where size is increased by
an order of magnitude, or as a refinement of mask-programmed gate arrays
where the reprogramming time and cost are drastically reduced
Architectural choices: anti-fuse versus reprogrammable configuration,
block-structured versus channel-structured routing, and lookup table (LUT)
versus multiplexer versus sum-of-products logic.
The technology and architecture of the routing fabric is the most
important factor in determining the effectiveness of an FPGA for a
particular application.

Selecting an FPGA

Size
  FPGA vendors measure density or size in different ways. Nonetheless, a
  designer will need a ballpark understanding of what type of FPGA product
  they require.

I/O pins
  A designer needs to know how many pins they must share with the circuit
  outside of the FPGA. For example, serialization and de-serialization of
  signals can use up many pins.

Performance
  If fast computations are essential, then higher-performance FPGAs would,
  of course, become mandatory, and a tradeoff against cost.

Unit price
  A large FPGA may be able to squeeze in all of your required IP, but the
  resultant cost might break the project's budget. It may make more sense
  to only incorporate certain IP into the FPGA and use off-the-shelf
  components for the rest of the design.

Power consumption
  This is critical for applications particularly sensitive to heat
  dissipation, and for those that require batteries.

FPGA routing enables (almost) arbitrary connection among logic blocks,
but at the cost of tying up more area and incurring more delay than is
present in a mask-programmed part.

Likewise, the logic architectures of FPGAs are larger and slower than
mask-defined gates, since their functionality must be programmable. But in
comparison to the routing fabric, their delays tend to be more predictable
and less of a limiting factor.

FPGAs

Architecture
Gate Density
Routing Resources
Programming method

Xilinx Logic Cell Array (LCA)


Actel Configurable Technology (ACT)

FPGA Capacity comparisons

Some FPGAs offer dedicated adder blocks.
One operation that is very common in DSP applications is called a
multiply-and-accumulate (MAC): this function multiplies two numbers
together and adds the result into a running total stored in an
accumulator.

The majority of designs make use of microprocessors in one form or
another.
High-end FPGAs contain one or more embedded microprocessor cores

The core functions forming a MAC
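
As a small worked illustration of the multiply-and-accumulate operation, the sketch below applies one MAC step per sample pair to compute a dot product, which is the inner loop of many DSP filters. The function names `mac` and `dot_product` are chosen here for illustration; in an FPGA the multiplier, adder and accumulator register would be dedicated hardware rather than software.

```python
def mac(accumulator, a, b):
    """One MAC step: multiply two numbers and add the product to the total."""
    return accumulator + a * b

def dot_product(xs, ys):
    """Repeated MAC steps over two equally long sequences."""
    acc = 0                      # the accumulator starts at zero
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)     # one multiplier, one adder, one register
    return acc

if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
```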

A hard microprocessor core is one that is implemented as a dedicated,
predefined block.
  Moving all of the tasks that used to be performed by the external
  microprocessor into the internal core makes the board smaller and
  lighter.
Two main approaches for integrating such a core into the FPGA
  Locate it in a strip to the side of the main FPGA fabric
  Embed one or more microprocessor cores directly into the main FPGA
  fabric

Soft microprocessor cores
  Configure a group of programmable logic blocks to act as a
  microprocessor
  Simpler (more primitive) and slower than hard cores
  You only need to implement a core if you need it, and you can
  instantiate as many cores as you require

Bird's-eye view of chip with embedded core outside of the main fabric

Clock trees

All of the synchronous elements inside an FPGA (the registers configured
to act as flip-flops inside the programmable logic blocks) need to be
driven by a clock signal.
Such a clock signal typically originates in the outside world, comes into
the FPGA via a special clock input pin, and is then routed through the
device and connected to the appropriate registers.
Clock tree
  The main clock signal branches again and again
  The flip-flops can be considered to be the "leaves" on the end of the
  branches
  This ensures that all of the flip-flops see the clock signal as close
  together as possible.
Skew
  If the clock were distributed as a single long track driving all of the
  flip-flops one after another, then the flip-flop closest to the clock
  pin would see the clock signal much sooner than the one at the end of
  the chain.
The clock tree is implemented using special tracks and is separate from
the general-purpose programmable interconnect.
  Multiple clock pins
  Multiple clock domains (clock trees)
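
The difference between a single long clock track and a balanced tree can be illustrated with a deliberately idealised delay model. The assumptions are loud ones: every track segment and every tree branch has the same fixed delay, the tree is a perfectly balanced binary tree, and loading effects are ignored. The function names are hypothetical.

```python
import math

def chain_arrival_times(n_flops, segment_delay):
    """Single long track: flip-flop i sees the clock after i + 1 segments."""
    return [(i + 1) * segment_delay for i in range(n_flops)]

def tree_arrival_times(n_flops, branch_delay):
    """Balanced binary tree: every leaf lies the same number of branches
    away from the clock pin."""
    levels = math.ceil(math.log2(n_flops)) if n_flops > 1 else 0
    return [levels * branch_delay] * n_flops

def skew(arrivals):
    """Skew: spread between the earliest and latest clock arrival."""
    return max(arrivals) - min(arrivals)

if __name__ == "__main__":
    print(skew(chain_arrival_times(8, 1)))   # 7 delay units along the chain
    print(skew(tree_arrival_times(8, 1)))    # 0 in this idealised tree
```

In this model the chain's skew grows linearly with the number of flip-flops, while the tree's latency grows only logarithmically and its skew stays at zero; a real clock tree has small but non-zero skew.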

Clock manager

The daughter clocks generated by the clock manager may be used to drive
internal clock trees or external output pins that can be used to provide
clocking services to other devices on the host circuit board. Each family
of FPGAs has its own type of clock manager (there may be multiple clock
manager blocks in a device), where different clock managers may support
only a subset of the following features:
  Jitter removal
  Frequency synthesis
  Phase shifting


Design Entry
Involves capturing the design using a high-level description
language like Verilog or VHDL.
Alternatively, a schematic editor is used to enter the design at the basic
logic level, or by making use of generic blocks which in turn are
described by high-level languages.
Another option is entry of the design using state diagrams.
The CAD software provided by FPGA manufacturers includes
libraries of standard circuits or macro-functions to quickly
implement common circuits of varying complexity.
The schematic or VHDL description is then translated into a
netlist describing the circuit in terms of logic gates and sequential
elements.

Logic Synthesis
Optimizes the circuit by regrouping logic functions and/or removing
redundancies.
Optimization is done according to design constraints or rules, which
could be minimizing area or maximizing speed.
Once the optimized netlist is obtained, it has to be mapped onto
the logical cell of the FPGA (LUT / flip-flop, PLA ...).
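
The idea of removing redundancies can be illustrated with a small equivalence check. The sketch below is not how a synthesis tool works internally (real tools use far more sophisticated algorithms); it only shows, by exhaustive truth-table comparison, that a redundant expression and its simplified form compute the same function. The example expression and function names are made up for illustration.

```python
from itertools import product


def original(a, b, c):
    # Redundant form as a designer might enter it
    return (a and b) or (a and not b) or (a and c)


def optimized(a, b, c):
    # Every term requires a, and the first two terms cover both values of b
    return a


def equivalent(f, g, num_inputs):
    """Compare two Boolean functions on every input combination."""
    return all(
        bool(f(*bits)) == bool(g(*bits))
        for bits in product([False, True], repeat=num_inputs)
    )


print(equivalent(original, optimized, 3))
```

Exhaustive comparison is only practical for a handful of inputs, which is also roughly the size of function that fits in a single LUT.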

Floorplanning

The circuit to be designed is now divided into partitions,
each of which is adjusted to be implemented in a particular
area on an FPGA device.
A partition usually corresponds to a large section of the
circuit which has a particular functionality, e.g. a multiplier,
filter bank etc.
The total number of FPGA devices required is also
determined.

FPGA Design Flow

Place and Route

A logic partition is now mapped onto an FPGA device by
means of the placement tool, which assigns a physical place in
the array of CLCs to each function (LUT / flip-flop, PLA ...).
Typical placement algorithms aim to minimize the total length of
the interconnections in the final design, with the objective of
maximizing the speed of the device.
Routing algorithms configure the routing elements to provide
the required connections between logic elements.
The primary aim of any routing algorithm is to ensure that
100% of the required routes can be realized.
Other goals of routing algorithms include finding the shortest
paths possible between elements. Because of restricted
interconnection resources, this step is the most restrictive.

Layout Verification

This step involves extracting the physical layout of the design
and simulating it using commercial simulators to obtain timing
data and to check design rules (DRC).
If the delays associated with the interconnections within the
prototype fulfill the delay constraints imposed by the design
specifications, then the device may be programmed; otherwise
the placement and routing steps have to be repeated until a
satisfactory configuration is found.
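
The placement objective described above (minimizing total interconnect length) can be sketched with a toy example. The code below measures total Manhattan wirelength for two-pin nets and applies a simple greedy pairwise-swap pass. This is a stand-in chosen for brevity, not the algorithm used by any particular vendor tool, and the cell names, grid coordinates, and netlist are hypothetical.

```python
import itertools


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total


def improve_by_swaps(placement, nets):
    """Greedy pass: keep swapping two cells while it shortens the wiring."""
    placement = dict(placement)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(sorted(placement), 2):
            before = wirelength(placement, nets)
            placement[a], placement[b] = placement[b], placement[a]
            if wirelength(placement, nets) < before:
                improved = True
            else:
                # Undo the swap if it did not help
                placement[a], placement[b] = placement[b], placement[a]
    return placement


# Hypothetical 2x2 array of logic cells and a small netlist
nets = [("lut0", "ff0"), ("lut1", "ff1")]
initial = {"lut0": (0, 0), "ff0": (1, 1), "lut1": (1, 0), "ff1": (0, 1)}
final = improve_by_swaps(initial, nets)
print(wirelength(initial, nets), "->", wirelength(final, nets))
```

Because a swap is kept only when it strictly reduces the wirelength, the loop always terminates, although a greedy pass like this can stall in a local minimum on larger designs.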

Macro Integration

This involves the provision of all the necessary files and data
formats for integrating the macro in the design flow of the
whole chip.
Once the circuit has been verified, the design
configuration is output in a format which is readable as an input
to the FPGA device which is to be programmed.
Programming the device may take only a few minutes.

The Design Cycle

ASIC Design Methodology

The design is verified by simulation at each stage of refinement.
Accurate simulators are slow.
Fast simulators trade away simulation accuracy.
ASIC designers use a battery of simulators across the speed-accuracy
spectrum in an attempt to verify the design.

An FPGA designer can replace simulation with in-circuit
verification, simulating the circuitry in real time with a
prototype.
The path from design to prototype is short, allowing a designer
to verify operation over a wide range of conditions at high speed
and high accuracy.
Proof-of-concept prototype designs can be built easily.
Designs can be verified by trial rather than reduction to first principles
or by mental execution.
The designer can verify that the design works in the real system, not merely in a
potentially erroneous simulation model of the system.

The Design Cycle


1. Entering the design in the form of schematic,
   netlist, logic expressions, or HDLs
2. Simulating the design for functional
   verification
3. Mapping the design into the FPGA
   architecture
4. Placing and Routing the FPGA design
5. Extracting delay parameters of the routed
   design
6. Resimulating for timing verification
7. Generating the FPGA device configuration
   format
8. Configuring or Programming the device
9. Testing the product for undesirable functional
   behavior

FPGA Configuration

In the case of re-programmable devices, activation or deactivation of interconnects
is implemented by means of pass transistors or tri-state buffers.
Memory units also store the configuration of LUTs and static multiplexers in the
CLC. If the type of memory used is EEPROM, the device is non-volatile, but the
difficult mechanism of re-configuration imposes limitations on the application of the
system.
SRAM memory, on the other hand, loses the configuration once power is removed
from the device (volatile), but it is simple and quick to configure. The use of SRAM
allows for dynamic re-configuration of the device even during real-time operation.
Small local SRAM blocks may also be used to store several configuration bits.
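
To show how stored configuration bits define a LUT's behavior, here is a minimal software model. The class and method names are hypothetical, and a Python list stands in for the SRAM cells; the point is only that the same cell computes a different function once its bits are rewritten.

```python
class LUT:
    """A k-input lookup table: 2**k configuration bits select the output."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.bits = [0] * (2 ** num_inputs)  # stands in for the SRAM cells

    def configure(self, func):
        """Load the truth table of `func` into the configuration memory."""
        for address in range(len(self.bits)):
            inputs = [(address >> i) & 1 for i in range(self.num_inputs)]
            self.bits[address] = int(bool(func(*inputs)))

    def read(self, *inputs):
        """The inputs form an address into the stored bits."""
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]


lut = LUT(2)
lut.configure(lambda a, b: a ^ b)  # behave as XOR
xor_result = lut.read(1, 0)
lut.configure(lambda a, b: a & b)  # reconfigure the same cell as AND
and_result = lut.read(1, 0)
print(xor_result, and_result)
```

Rewriting `bits` while the model is in use mirrors, in a very simplified way, the dynamic re-configuration that SRAM makes possible.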

FPGA Capacity comparisons

SRAM FPGA -- EEPROM FPGA

Despite this, however, most FPGAs still use
SRAM for reasons of simplicity (when you
need to reprogram it, it's easier to re-encode
a small ROM chip than to reprogram
a large FPGA chip), so count on having to
use a separate boot ROM for the FPGA.
Use of an FPGA is broadly divided into two
main stages:
Configuration mode
the mode in which the FPGA is when you first power it
up. Configuration mode is, as you may have guessed,
where you configure the FPGA;

Product FPGA vs ASIC

Comparison:
FPGA benefits vs ASICs:
- Design time: 9 month design cycle vs 2-3 years
- Cost: No $3-5 M upfront (NRE) design cost.
  No $100-500K mask-set cost
- Volume: High initial ASIC cost recovered only in very high volume products

Due to Moore's law, many ASIC market requirements are now met by FPGAs
- E.g. Virtex II Pro has 4 processors, 10 Mb memory, IO

Resulting Market Shift:

Dramatic decline in number of ASIC design starts:
- 11,000 in '97
- 1,500 in '02

FPGAs as a % of Logic market:
- Increase from 10 to 22% in past 3-4 years

FPGAs (or programmable logic) are the fastest growing segment of the
semiconductor industry!

FPGA/ASIC Crossover Changes

[Figure: Cost versus Production Volume. Curves: 150nm / 200mm ASICs,
90nm / 300mm ASICs, 150nm / 200mm FPGAs, 90nm / 300mm FPGAs.
Regions marked: FPGA Cost Advantage, ASIC Cost Advantage.]
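
The crossover between FPGA and ASIC cost can be approximated with a simple linear model: total cost equals up-front cost plus unit cost times volume. The sketch below uses the low end of the $3-5 M NRE figure from the FPGA vs ASIC comparison; the unit prices are purely hypothetical placeholders, and the function names are assumptions for illustration.

```python
def total_cost(nre, unit_cost, volume):
    """Up-front engineering cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def crossover_volume(asic_nre, asic_unit, fpga_unit, fpga_nre=0):
    """Volume at which ASIC and FPGA total costs are equal."""
    if fpga_unit <= asic_unit:
        raise ValueError("no crossover: FPGA unit cost must exceed ASIC unit cost")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


ASIC_NRE = 3_000_000  # low end of the $3-5 M NRE range from the comparison slide
ASIC_UNIT = 10        # hypothetical unit price, USD
FPGA_UNIT = 50        # hypothetical unit price, USD

breakeven = crossover_volume(ASIC_NRE, ASIC_UNIT, FPGA_UNIT)
print(f"break-even at {breakeven:,.0f} units")
```

Below the break-even volume the FPGA has the lower total cost; above it the ASIC's lower unit price recovers the initial outlay. Changing the NRE or unit prices moves the crossover point along the volume axis.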
