FPGA-Arch CPLD Design April 2012

FPGA Architecture
Presentation Overview
Available choices for the digital designer
FPGA: A detailed look
Interconnection Framework
FPGAs and CPLDs
Field programmability and programming technologies
SRAM, Anti-fuse, EPROM and EEPROM
Design steps
Commercially available devices
Xilinx XC4000
Altera MAX 7000
Fixed Versus Programmable Logic
The circuits in a fixed logic device are permanent: they perform one function or set of functions, and once manufactured they cannot be changed.
Programmable logic devices (PLDs) are standard, off-the-shelf parts that offer customers a wide range of logic capacity, features, speed, and voltage characteristics, and these devices can be changed at any time to perform any number of functions.
Classifications
PLA: a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable
PAL: a Programmable Array Logic (PAL) is a relatively small FPD that has a programmable AND-plane followed by a fixed OR-plane
SPLD: refers to any type of Simple PLD, usually either a PLA or PAL
CPLD: a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip
FPGA: a Field-Programmable Gate Array is an FPD featuring a general structure that allows very high logic capacity
Definitions
Field Programmable Device (FPD):
a general term that refers to any type of integrated
circuit used for implementing digital hardware, where the
chip can be configured by the end user to realize
different designs.
Programming of such a device often involves placing
the chip into a special programming unit, but some chips
can also be configured in-system. Another name for
FPDs is programmable logic devices (PLDs).
Designer's Choice
The digital designer has various options
SSI (small-scale integrated circuit) or MSI (medium-scale integrated circuit) components
Difficulties arise as design size increases
Interconnections grow with complexity, resulting in a prolonged testing phase
Simple programmable logic devices
PALs (programmable array logic)
PLAs (programmable logic arrays)
Architecture is not scalable; power consumption and delays limit extending the architecture to complex designs
Implementing larger designs leads to the same difficulties as with discrete components
Simple Programmable Logic Devices
Simple two-level structure
PAL and PLA
Allow high-speed implementations of circuits
Drawback
Suited only to small logic circuits
Modest number of product terms
Interconnection structure grows impractically large as the number of product terms increases
MPGAs
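PLA
[Figure: PLA with a programmable AND plane and a programmable OR plane. Product terms YZ, XZ, XYZ, XY; outputs XY+YZ and XZ+XYZ.]
PLA
[Figure: PLA with inputs X and Y and their complements, four product terms in X and Y, a programmable OR plane, and outputs O1 to O4. Legend for programmable nodes: un-programmed, connect, disconnect.]
PAL
[Figure: PAL with a fixed OR plane and outputs O1 to O4.]
PAL with Logic Expanders
[Figure: PAL with a fixed OR plane and logic expanders.]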
PLA vs. PAL
PLAs are more flexible than PALs since both the AND and OR planes are programmable in PLAs.
Because both the AND and OR planes are programmable, PLAs are expensive to fabricate and have a large propagation delay.
By using fixed OR gates, PALs are cheaper and faster than PLAs.
Logic expanders increase the flexibility of PALs, but result in significant propagation delay.
PALs usually contain D flip-flops connected to the outputs of the OR gates to implement sequential circuits.
PLAs and PALs are usually referred to as SPLDs.
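The two programmable planes of a PLA can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real device: the names AND_PLANE, OR_PLANE and evaluate_pla are hypothetical, the product terms (YZ, XZ, XYZ, XY) and the two sums (XY+YZ, XZ+XYZ) are taken from the PLA figure, and the output names O1 and O2 are assumed.

```python
from itertools import product

# AND plane: each product term lists the input literals it is connected to.
# 1 = true input connected, 0 = complemented input connected,
# a missing key = programmable node left unconnected.
AND_PLANE = {
    "YZ":  {"Y": 1, "Z": 1},
    "XZ":  {"X": 1, "Z": 1},
    "XYZ": {"X": 1, "Y": 1, "Z": 1},
    "XY":  {"X": 1, "Y": 1},
}

# OR plane: each output lists the product terms it is connected to.
# In a PAL this table would be fixed by the manufacturer.
OR_PLANE = {
    "O1": ["XY", "YZ"],   # O1 = XY + YZ
    "O2": ["XZ", "XYZ"],  # O2 = XZ + XYZ
}

def evaluate_pla(inputs, and_plane=AND_PLANE, or_plane=OR_PLANE):
    """Evaluate every output for one assignment of the inputs."""
    terms = {
        name: all(inputs[var] == literal for var, literal in conns.items())
        for name, conns in and_plane.items()
    }
    return {out: int(any(terms[t] for t in names))
            for out, names in or_plane.items()}

if __name__ == "__main__":
    for x, y, z in product((0, 1), repeat=3):
        print(x, y, z, evaluate_pla({"X": x, "Y": y, "Z": z}))
```

Reprogramming the device corresponds to editing the two dictionaries; the evaluation function itself never changes.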
Programmable Logic
Programmable digital integrated circuit

Standard off-the-shelf parts
Desired functionality is implemented by
configuring on-chip logic blocks and
interconnections
Advantages (compared to an ASIC):
Low development costs

Short development cycle
Device can (usually) be reprogrammed
Types of programmable logic:
Complex PLDs (CPLD)

Field programmable Gate Arrays (FPGA)
A Generic CPLD Structure
Multiple PLDs can be combined on a single chip by using programmable interconnect structures. These devices are called CPLDs.
CPLD
Architecture and Examples
PLD: Sum of Products
Programmable AND array followed by fixed fan-in OR gates
[Figure: AND plane with inputs A, B, C; each crossing is a programmable switch or fuse; the OR gates produce the sum-of-products outputs f1 and f2.]
PLD - Macrocell
Can implement combinational or sequential logic
[Figure: macrocell with AND plane, output f1, D flip-flop with Clock, MUX with Select, and output Enable.]
CPLD Structure
Integration of several PLD blocks with a
programmable interconnect on a single chip
[Figure: four PLD blocks and four I/O blocks arranged around a central interconnection matrix.]
High Density Logic Overview
High-Density or Complex PLDs (HDPLD or CPLD)
Large logic building blocks
PLD-like architectures
Centralized interconnect
Fast, predictable performance
Good at wide gating functions
State machines
Counters
Altera MAX CPLD
[Figure: Altera MAX chip with I/O cells and six LABs (Logic Array Blocks) joined by a chip-wide interconnect; each LAB contains macrocells and a local array (LA).]
Each LAB contains 16 macrocells
CPLD Example - Altera MAX7000
EPM7000 Series Block Diagram
MAX 7000 architecture includes
Logic array blocks
Macrocells
Expander product terms (shareable and parallel)
Programmable interconnect array
I/O control blocks
Performance
High-performance, flexible LABs are linked through the programmable interconnect array (PIA)
The PIA is a global bus fed by all dedicated inputs, I/O pins, and macrocells
Macrocells
Individually configured for combinational or sequential logic operation
EPM7000 Series Device Macrocell
Macrocell
Logic array
Combinational logic is implemented in the array
Provides five product terms per macrocell
Product-term select matrix
Allocates the product terms either as
Primary logic inputs to the OR and XOR gates (combinational logic)
Secondary inputs to the register: clear, preset, clock, and clock enable control functions
Programmable register
Logic expanders
Each LAB has 16 shareable expanders: inverted product terms fed back into the logic array
Parallel expanders: product terms borrowed from adjacent macrocells
Registered functions
Macrocell FF can be configured for D, T, JK or SR operation
Software selects the most efficient FF operation for each registered function to optimize resource utilization
Programmable clock control: the register can be clocked in different modes
By a global clock signal: fastest clock-to-output performance
By a global clock signal with an active-high clock enable: provides an enable on each FF while keeping the fast clock-to-output performance of the global clock
By an array clock implemented with a product term: the FF can be clocked by signals from buried macrocells or I/O pins
FF supports asynchronous preset and clear functions
Product-term select matrix provides product terms to control these functions
Control signals are active high; active-low control can be obtained by inverting the signal within the logic array
Device power-up
FF is cleared upon power-up
I/O pins
Fast input path from an I/O pin to the macrocell register, bypassing the PIA and combinational logic
Complex logic functions
Each macrocell provides five product terms
Most logic functions can be designed using five product terms
For more complex functions, another macrocell can be used to supply the required logic resources
Expander product terms
Shareable and parallel expander product terms
Provide additional product terms to any macrocell within the same LAB
EPM7000 Series Device Macrocell
Performance
CPLDs have wide fan-in
A single logic level allows high frequency and low latency
Very small functions waste macrocell logic
Design Methodologies
Custom ICs are created using unique masks for all layers during the manufacturing process
Require highly skilled and competent designers
Lengthy development time
High cost of design and testing
Mask Programmable Gate Arrays
Generic masks for all layers except metallization
Generic masks define an array of modular functional blocks
Modules are rows of transistors separated by fixed-width channels
Designer's expertise is less critical
Shorter development time and low development costs
Channel-less gate arrays: Sea-of-Gates
Standard Cells
Modules, or standard cells, are picked from a database and then placed in rows and interconnected
Placement and routing are done automatically (removing the designers from the physical design process)
Designs are less efficient in size and performance
Gate arrays
A highly standardized means to implement digital integrated circuit designs
Manufactured as regular arrays of patterned blocks of transistors which can be interconnected to form logic elements such as gates, flip-flops and multiplexers
The manufacturer can pre-produce gate array wafers without interconnections in high volume
These are then configured in an additional process step in the factory
Once a customer provides a definition of the logic block interconnections, one or more layers of metal are added to form these connections
Collectively known as MPGAs (Mask-Programmable Gate Arrays)
Sea-of-gates structures: added metal interconnects have to be placed over particular transistors, rendering them unusable
Regular gate arrays: blank routing space is provided at regular intervals in the transistor array
As process technologies advance and sizes get smaller, it is becoming increasingly expensive to configure such devices
Masked Programmable Logic Devices
MPGA
Rows of transistors
User-specified interconnections
Within the rows: to implement basic logic gates
Between the rows: to connect basic gates together
I/O circuitry
Mask layers are predefined by the manufacturer, except for the final metal layers
Metal layers are customized to implement the desired circuit
MPGA
Drawback
Large NRE cost
Need to generate metal mask layers
Need to manufacture the chip
Longer time to market
Advantage
General structure allows much larger circuits to be implemented
Due to their scalable interconnection structure
The interconnect scales proportionally with the logic
Field Programmable Logic Devices
FPGAs combine
Programmability of a PLD
Scalable interconnection structure of an MPGA
Designer's Choice
Quest for high capacity: two choices available
MPGA (Masked Programmable Logic Devices)
Customized during fabrication
Expensive at low volume
Prolonged time-to-market and high financial risk
FPGA (Field Programmable Logic Devices)
Customized by the end user
Implements multi-level logic functions
Fast time-to-market and low risk
Designer's Choice
FPGAs vs MPGA
Disadvantages
Low speed of operation
Programmable switches
Significant resistance and capacitance in the connections between logic blocks
Low logic density
Programmable switches and programming circuitry
Require more area than an MPGA to implement the same amount of logic circuitry
Fewer chips per wafer
FPGAs vs MPGA
Standard Cell-based Design
[Figure: standard-cell layout with rows of cells, feedthrough cells, logic cells, routing channels, and functional modules (RAM, multiplier, ...).]
Standard-cell layout methodology
Routing channel requirements are reduced by the presence of more interconnect layers
Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells separated by routing channels.]
Why FPGAs?
Advantages of FPGAs
Replacement of SSI and MSI chips

Availability of parts off the shelf
Rapid Turnaround
Low risk
Reprogrammability
Limitations
PLDs will operate faster than FPGAs for the same design implemented in both
For FPGAs, the circuit delay depends on the design implementation tools
FPGAs are less dense and operate at lower speed when compared to conventional gate arrays
FPGA: A Quick Look
Two-dimensional array of customizable logic blocks placed in an interconnect array
Like PLDs, programmable at the user's site
Like MPGAs, implement thousands of gates of logic in a single device
Employ logic and interconnect structures capable of implementing multi-level logic
Scalable in proportion with logic, removing many of the size limitations of PLD-derived two-level architectures
FPGAs offer the benefits of both MPGAs and PLDs!
FPGA: A Detailed Look
Based on the principle of functional completeness
FPGA: functionally complete elements (logic blocks) placed in an interconnect framework
Interconnection framework
Comprises wire segments and switches
Provides a means to interconnect logic blocks
Circuits are partitioned to logic block size, mapped, and routed
Basic FPGA Architecture
FPGA Architecture (With Multiplexer As Functionally Complete Cell)
Basic building block
Granularity and interconnection structure have caused a split in the industry
FPGA
Fine grained
Variable-length interconnect segments
Programmable switches
Timing in general is not predictable; timing is extracted after place and route
CPLD
Coarse grained (SPLD-like blocks)
Programmable crossbar interconnect structure
Interconnect structure uses continuous metal lines
The switch matrix may or may not be fully populated
Timing is predictable if fully populated
Architecture does not scale well
Field Programmability
Field programmability is achieved through switches (transistors controlled by memory elements or fuses)
Switches control the following aspects
Interconnection among wire segments
Configuration of logic blocks
Distributed memory elements controlling the switches and the configuration of logic blocks are together called the configuration memory
Technology of Programmable Elements
Varies from vendor to vendor. All share a common property: configurable in one of two positions, ON or OFF
Can be classified into three categories:
SRAM based
Fuse based
EPROM/EEPROM/Flash based
Desired properties:
Minimum area consumption
Low ON resistance; high OFF resistance
Low parasitic capacitance to the attached wire
Reliability in volume production
SRAM Programming Technology
Employs SRAM (static RAM) cells to control pass transistors and/or transmission gates
SRAM cells control the configuration of logic blocks as well
Volatile
Needs external storage
Needs a power-on configuration mechanism
In-circuit re-programmable
Shorter configuration time
Occupies a relatively larger area
Anti-fuse Programming Technology
Though implementations differ, all anti-fuse programming elements share a common property
They use materials which normally reside in a high-impedance state
But can be fused irreversibly into a low-impedance state by applying a high voltage
Anti-fuse Programming Technology
Very low ON resistance (faster implementation of circuits)
Small size of anti-fuse elements; interconnects occupy relatively less area
Offset: larger transistors are needed for programming
One-time programmable
Cannot be re-programmed (design changes are not possible)
Retains configuration after power-off
EPROM, EEPROM or Flash Based Programming Technology
EPROM Programming Technology
Two gates: floating and select
Normal mode:
No charge on the floating gate
Transistor behaves as a normal n-channel transistor
Floating gate is charged by applying a high voltage
Threshold of the transistor (as seen by the select gate) increases
Transistor stays turned off until the charge is removed
Re-programmable by exposing to UV radiation
EPROM Programming Technology
Used as pull-down devices
Consumes static power
EPROM Programming Technology
No external storage mechanism
Re-programmable (not all!)
Not in-system re-programmable
Re-programming is a time-consuming task
EEPROM Programming Technology
Two gates: floating and select
Functionally equivalent to EPROM; construction and structure differ
Electrically erasable: re-programmable by applying a high voltage (no UV exposure needed)
When un-programmed, the threshold (as seen by the select gate) is negative
EEPROM Programming Technology
Re-programmable; in general, in-system re-programmable
Re-programming takes less time compared to EPROM technology
Multiple voltage sources may be required
Area occupied is twice that of EPROM
Programming Technologies
Basic architectures of FPGAs
For an FPGA device, allowing the implementation of practically any logic circuit requires an area trade-off between a sufficient number of flexible configurable logic cells and enough interconnect resources to allow all connections between these cells.
The majority of circuits use only a small portion of the routing and logic resources, resulting in a loss in speed (signals pass through redundant routing elements) and in logic density when compared to the same circuit implemented in dedicated logic.
This motivates the grouping of different FPGA devices with related architecture into a family.
Each member in a family would be physically tailored to a certain class of application architecture, for example by replacing the switches in certain routes by hard shorts, or by hard-wiring the logic cells internally in a certain manner.
This member may now implement certain circuits more efficiently, but its reduced flexibility means that some circuits may not fit onto the device at all.
Implementation of a circuit is now a question of choosing the right device from the FPGA family.
Programming Skills vs. FPGAs

CPU Model
Single-threading
No synchronization
for/if/switch control
Incremental execution
One instruction at a time
Results are immediate
Common parallelization
Large units of work
Costly communication
FPGA Model
Massive parallelism
Visible timing relations
State machine/hardwired
Pipelined execution
All operations active
Visible dependencies
Parallelism model
Fine grain: one ALU op
Cheap on-chip communication
An Example
Modulo-4 counter:
Specification
Modulo-4 counter: Logic

Implementation
FPGA Implementation of
Modulo-4 Counter
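A modulo-4 counter mapped onto LUTs and flip-flops can be simulated with a short Python sketch. The slides give the specification only as figures, so this assumes a free-running 2-bit up counter (0, 1, 2, 3, 0, ...) with no enable or reset; the names LUT_D0, LUT_D1, clock_tick and run are hypothetical.

```python
# Each next-state bit comes from a 2-input LUT addressed by the current
# state (q1, q0); the LUT outputs feed the D inputs of two flip-flops.
# Address = (q1 << 1) | q0.
LUT_D0 = [1, 0, 1, 0]  # next q0 = NOT q0
LUT_D1 = [0, 1, 1, 0]  # next q1 = q1 XOR q0

def clock_tick(q1, q0):
    """Return the flip-flop contents after one rising clock edge."""
    addr = (q1 << 1) | q0
    return LUT_D1[addr], LUT_D0[addr]

def run(cycles, q1=0, q0=0):
    """Return the sequence of counter values seen over a number of cycles."""
    states = []
    for _ in range(cycles):
        states.append((q1 << 1) | q0)
        q1, q0 = clock_tick(q1, q0)
    return states

if __name__ == "__main__":
    print(run(6))
```

Only the two LUT tables encode the counter's behaviour; loading different tables into the same structure yields a different state machine.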
Design Steps Involved in Designing With FPGAs
Understand and define design requirements
Design description
Behavioural simulation (source code interpretation)
Synthesis
Functional or gate-level simulation
Implementation
Fitting
Place and route
Timing or post-layout simulation
Programming, test and debug
Commercially Available Devices
Architecture differs from vendor to vendor
Characterized by
Structure and content of the logic block
Structure and content of the routing resources
To examine, look at some of the available devices
FPGA: Xilinx (XC4000)
CPLD: Altera (MAX 5K)
Xilinx FPGAs
Generic Xilinx Architecture
Symmetric array based; the array consists of CLBs with LUTs and D flip-flops
An n-input LUT can implement any n-input Boolean function
Array embedded within the periphery of I/O blocks
Array elements interleaved with routing resources (wire segments, switch matrix and single connection points)
Employs SRAM technology
What is an FPGA?
FPGAs contain the building blocks necessary to design a custom integrated circuit without having to turn to an outside foundry:
Logic blocks
Interconnects
I/O blocks
All of these can be programmed to do a particular function
Programming technologies: memory-based (SRAM or flash EEPROM) and anti-fuse
A designer needs to develop a special program and have that program uploaded to the FPGA.
FPGA design could be considered more of a software development effort than a hardware development effort.
Intellectual property (IP) placed inside the FPGA can either be developed by the designer or obtained from a third party.
Design Flow Approaches
Schematic capture: the most intuitive and visual, but the least flexible
Hardware Description Language: more portable
FPGA manufacturers
Xilinx (http://www.xilinx.com): SRAM-based FPGAs (tens of thousands to millions upon millions of gates)
Altera (http://www.altera.com): SRAM-based FPGAs
Lattice Semiconductor (http://www.latticesemi.com)
Actel (http://www.actel.com)
QuickLogic (http://www.quicklogic.com)
Classification by Granularity
Logic block size correlates to the granularity of a device, which relates to the effort required to complete the wiring between the blocks (routing channels)
Fine granularity (sea-of-gates architecture)
Medium granularity (FPGA)
Large granularity (CPLD)
Underlying FPGA Fabric
Large numbers of relatively simple programmable logic block "islands" embedded in a "sea" of programmable interconnect
Fine-grained architecture
Each logic block can be used to implement only a very simple function
A 3-input function or a storage element
Glue logic and state machines
A large number of connections into and out of each block
Coarse-grained architecture
Each logic block contains a relatively large amount of logic
A logic block might contain four 4-input LUTs, four muxes, four D-latches, and some fast carry logic
Mux vs LUT-based logic blocks
Assume that the LUT is formed from SRAM cells (but it could be formed using antifuses, E2PROM, or Flash cells)
MUX-based logic block
Multiplexer based CLB (configurable logic block)
Multiplexer based CLB
Example from Actel 40MK
8-input, 1-output cell
Implements basic logic functions (AND, OR, NOR, ...) with 2, 3, or 4 inputs
LUT based CLB
A commonly used technique is to use the inputs to select the desired SRAM cell using a cascade of transmission gates
If a transmission gate is enabled (active), it passes the signal seen on its input through to its output. But if the gate is disabled, its output is electrically disconnected from the wire it is driving.
4-input LUTs have commonly been regarded as offering the best balance of tradeoffs.
The recently introduced Virtex-5 family from Xilinx features 6-input LUTs
Altera has a fabric that combines two 4-LUTs and four 3-LUTs. In addition to allowing designers to form a 6-LUT, this also allows a 5-LUT and a 3-LUT, and many other combinations.
Look-up table (LUT) based CLB
LUT based CLB: depending on the combination of the input values, a predefined output value is assigned
Memory implementation:
input values = address of memory
predefined values = content of memory
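The memory view of a LUT can be made concrete with a small Python sketch: the truth table of a function becomes the memory contents, and the input values form the read address. The function names program_lut and read_lut are hypothetical, and treating the first input as the most significant address bit is an arbitrary assumption made for the illustration.

```python
from itertools import product

def program_lut(func, n_inputs):
    """Return the 2**n configuration bits (memory contents) for an
    n-input Boolean function, in address order."""
    return [int(bool(func(*bits)))
            for bits in product((0, 1), repeat=n_inputs)]

def read_lut(contents, *inputs):
    """Use the input values as the memory address and return the stored bit."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit  # first input is the most significant bit
    return contents[addr]

if __name__ == "__main__":
    majority = program_lut(lambda a, b, c: a + b + c >= 2, 3)
    print(majority)                     # 8 stored bits for a 3-input LUT
    print(read_lut(majority, 1, 0, 1))  # look up inputs a=1, b=0, c=1
```

Because every one of the 2**n addresses holds an independent bit, any n-input Boolean function can be stored this way, which is why an n-input LUT is a universal logic element.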
Registered output based CLB
The output of the LUT may be registered or not, depending on the functional description (selection is implemented via multiplexers)
Look up Tables
Configuration memory holds the outputs of the truth table
Internal signals connect to the control signals of multiplexers to select the truth-table value for any given input value
Synthesis mode
Any logic function of up to 4 variables in its registered or direct form.
Arithmetic mode
The LUT is split to provide any two logic functions of the same 3 variables. In the arithmetic mode, the inputs A, B, C are the addends and the Carry-in, whilst the output functions are the Sum and the Carry-out.
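The arithmetic mode amounts to a full adder stored in two 8-entry tables that share the same three inputs. The Python sketch below illustrates this under stated assumptions: the names SUM_LUT, CARRY_LUT, arithmetic_cell and ripple_add are hypothetical, and the address bit ordering (A, B, Carry-in from most to least significant) is chosen only for the example.

```python
# One 4-input LUT split into two 3-input halves addressed by
# (a << 2) | (b << 1) | carry_in.
SUM_LUT   = [0, 1, 1, 0, 1, 0, 0, 1]  # a XOR b XOR carry_in
CARRY_LUT = [0, 0, 0, 1, 0, 1, 1, 1]  # majority(a, b, carry_in)

def arithmetic_cell(a, b, carry_in):
    """One logic cell in arithmetic mode: returns (sum, carry_out)."""
    addr = (a << 2) | (b << 1) | carry_in
    return SUM_LUT[addr], CARRY_LUT[addr]

def ripple_add(x, y, width):
    """Chain `width` cells through their carries to add two integers."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = arithmetic_cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

if __name__ == "__main__":
    print(ripple_add(5, 3, 4))   # 4-bit addition
    print(ripple_add(15, 1, 4))  # overflow shows up in the final carry
```

Chaining cells through the carry signal is what dedicated carry logic in an FPGA accelerates.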
Multiplier mode
This mode also implements an adder, with the addends this time being partial products and the Carry-in from the previous bit position.
The partial product of A and B may be implemented with an AND gate.
CLB with registered output
Counter mode
The LUT provides two logic functions (counter Output and Carry-out) of the same 2 variables, which are a Carry-in and the previous Output. The feedback loop to use this output as an input is normally provided for within the CLC; this could also be implemented externally by connecting appropriate routes.
Multiplexer (2:1) mode
The LUT is configured to provide a logic function of 3 variables, where one selects one of the other two inputs. As an example, consider the case where C is the select line for A and B.
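The LUT contents for this 2:1 multiplexer case can be derived mechanically, as in the Python sketch below. The select polarity is an assumption (C = 0 passes A, C = 1 passes B), since the slides do not state it, and the names MUX_LUT and mux_cell are hypothetical.

```python
# Contents of a 3-input LUT acting as a 2:1 multiplexer.
# Address = (a << 2) | (b << 1) | c, with c as the select line.
# Assumed polarity: c == 0 selects a, c == 1 selects b.
MUX_LUT = [(b if c else a)
           for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def mux_cell(a, b, c):
    """Read the multiplexer output from the LUT."""
    return MUX_LUT[(a << 2) | (b << 1) | c]

if __name__ == "__main__":
    print(MUX_LUT)
    print(mux_cell(1, 0, 0), mux_cell(1, 0, 1))
```

Swapping the select polarity only changes the stored bits, not the structure of the cell.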
LUT based CLB
Example from Actel Varicore CLC:
LUT based
Multiplexer to decrease LUT size
Registered output selectable via multiplexer
LUT output based CLB
Example from Xilinx XC3000:
Dual-output complex CLB
Selectable registers
Large combinational function with two outputs
Xilinx logic cell (LC)
An LC comprises a 4-input LUT (which can also act as a 16 × 1 RAM or a 16-bit shift register), a multiplexer and a register
The register can be configured (programmed) to act as a flip-flop or as a latch.
The polarity of the clock (rising-edge-triggered or falling-edge-triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low)
[Figure: Highly simplified view of a Xilinx logic cell (LC).]
The equivalent core "building block" in an FPGA from Altera is called a logic element (LE).
A multi-faceted LUT
[Figure: A "slice" containing two logic cells.]
One step up the hierarchy is what Xilinx calls a configurable logic block (CLB) and what Altera refers to as a logic array block (LAB).
Some Xilinx FPGAs have two slices in each CLB while others have four
There is fast programmable interconnect within the CLB. This interconnect is used to connect neighboring slices.
Logic-block hierarchy
LC, then slice (with two LCs), then CLB (with four slices)
This is complemented by an equivalent hierarchy in the interconnect. Thus, there is fast interconnect between the LCs in a slice, then slightly slower interconnect between slices in a CLB, followed by the interconnect between CLBs.
The idea is to achieve the optimum tradeoff between making it easy to connect things together and incurring excessive interconnect-related delays.
A CLB containing four slices
All of the LUTs within a CLB can be configured together to implement the following:
Single-port 16 × 8 bit RAM
Dual-port 16 × 4 bit RAM
Each 4-input LUT can be used as a 16-bit shift register
The LUTs within a single CLB can be configured together to implement a shift register containing up to 128 bits as required
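The 128-bit figure follows from the hierarchy described in these slides: four slices of two logic cells each give eight LUTs per CLB, and each LUT holds 16 bits. The Python sketch below models the 16 storage cells of one LUT in both roles and chains eight of them; the class LutCells and the function chain_shift are hypothetical names, and the model ignores clocking and tap-selection details of any real device.

```python
class LutCells:
    """The 16 configuration cells of one 4-input LUT, reused as storage."""

    def __init__(self):
        self.cells = [0] * 16

    def ram_write(self, addr, bit):
        """16 x 1 RAM mode: the four LUT inputs act as the address."""
        self.cells[addr] = bit

    def ram_read(self, addr):
        return self.cells[addr]

    def shift(self, bit_in):
        """Shift-register mode: push one bit in, return the bit pushed out."""
        bit_out = self.cells[-1]
        self.cells = [bit_in] + self.cells[:-1]
        return bit_out


def chain_shift(luts, bit_in):
    """Clock a chain of LUT shift registers once; return the final output bit."""
    for lut in luts:
        bit_in = lut.shift(bit_in)
    return bit_in


LUTS_PER_CLB = 4 * 2  # four slices, two logic cells (one LUT each) per slice

if __name__ == "__main__":
    clb = [LutCells() for _ in range(LUTS_PER_CLB)]
    print(LUTS_PER_CLB * 16)  # total shift-register length in bits
    outputs = [chain_shift(clb, 1 if i == 0 else 0) for i in range(129)]
    print(outputs.index(1))   # number of shifts before the bit reappears
```

A single 1 pushed into the chain reappears at the output only after passing through all 128 storage cells.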
Fast carry chains
A key feature - the special logic and interconnect required to implement fast carry chains.
In the context of the CLBs, each logic cell (LC) contains special carry logic.
This is complemented by dedicated interconnect between the two LCs in each slice,
between the slices in each CLB, and between the CLBs themselves.
This special carry logic and dedicated routing boosts the performance of logical functions
such as counters and arithmetic functions such as adders.
The availability of these fast carry chains in conjunction with features like the shift
register incarnations of LUTs and the embedded multipliers.
FPGA families
         Low-cost                              High-performance
Xilinx   Spartan 3, Spartan 3E, Spartan 3L     Virtex 4 LX / SX / FX, Virtex 5 LX
Altera   Cyclone II                            Stratix II, Stratix II GX
Xilinx FPGA Families
Old families
XC3000, XC4000, XC5200
Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
Low Cost Family
Spartan/XL derived from XC4000
Spartan-II derived from Virtex
Spartan-IIE derived from Virtex-E
Spartan-3, Spartan 3E, Spartan 3L
High-performance families
Virtex (220 nm)
Virtex-E, Virtex-EM (180 nm)
Virtex-II, Virtex-II PRO (130 nm)
Virtex-4 (90 nm)
Virtex 5 (65 nm)
Xilinx XC3000
CLB
Granularity of FPGAs
Selection of an FPGA
FPGAs can be viewed as an evolution of PALs where size is increased by an order of magnitude, or as a refinement of mask-programmed gate arrays, where the reprogramming time and cost are drastically reduced
Architectural choices include anti-fuse versus reprogrammable configuration, block-structured versus channel-structured routing, and lookup table (LUT) versus multiplexer versus sum-of-products logic.
The technology and architecture of the routing fabric is the most important factor in determining the effectiveness of an FPGA for a particular application.
Selecting an FPGA
Size
FPGA vendors measure density or size in different ways. Nonetheless, a designer will need a ballpark understanding of what type of FPGA product they require.
I/O pins
A designer needs to know how many pins they must share with the circuit outside of the FPGA. For example, serialization and de-serialization of signals can use up many pins.
Performance
If fast computations are essential, then higher-performance FPGAs would, of course, become mandatory, at a tradeoff in cost.
Unit price
A large FPGA may be able to squeeze in all of your required IP, but the resultant cost might break the project's budget. It may make more sense to incorporate only certain IP into the FPGA and use off-the-shelf components for the rest of the design.
Power consumption
This is critical for applications particularly sensitive to heat dissipation, and for those that run on batteries.
FPGA routing enables (almost) arbitrary connection among logic blocks, but at the cost of tying up more area and incurring more delay than is present in a mask-programmed part.
Likewise, the logic architectures of FPGAs are larger and slower than mask-defined gates, since their functionality must be programmable. But in comparison to the routing fabric, their delays tend to be more predictable and less of a limiting factor.
FPGAs
Architecture
Gate Density
Routing Resources
Programming method
Xilinx Logic Cell Array (LCA)

Actel Configurable Technology (ACT)
FPGA Capacity comparisons
Some FPGAs offer dedicated adder blocks.
One operation that is very common in DSP applications is called a multiply-and-accumulate (MAC). This function multiplies two numbers together and adds the result into a running total stored in an accumulator.
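The MAC operation is small enough to express directly. The Python sketch below is a behavioural illustration only, with hypothetical names mac and dot_product; it ignores the fixed bit widths and pipelining that a hardware MAC block would have.

```python
def mac(accumulator, a, b):
    """One multiply-and-accumulate step: multiply two numbers and add
    the product to the running total."""
    return accumulator + a * b

def dot_product(samples, coefficients):
    """Repeated MAC steps, the core loop of DSP operations such as FIR filters."""
    acc = 0
    for x, c in zip(samples, coefficients):
        acc = mac(acc, x, c)
    return acc

if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

In hardware, the multiplier, adder and accumulator register operate on every clock cycle, so one MAC block completes one loop iteration per clock.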
[Figure: The core functions forming a MAC.]
The majority of designs make use of microprocessors in one form or another.
High-end FPGAs contain one or more embedded microprocessors, referred to as microprocessor cores
A hard microprocessor core is one that is implemented as a dedicated, predefined block.
Moving all of the tasks that used to be performed by the external microprocessor into the internal core makes the board smaller and lighter.
Two main approaches for integrating such a core into the FPGA
Locate it in a strip to the side of the main FPGA fabric
Embed one or more microprocessor cores directly into the main FPGA fabric
Soft microprocessor cores
Configure a group of programmable logic blocks to act as a microprocessor
Simpler (more primitive) and slower than hard cores
You only need to implement a core if you need it, and you can instantiate as many cores as you require
[Figure: Bird's-eye view of a chip with an embedded core outside of the main fabric.]
Clock trees
All of the synchronous elements inside an FPGA (the registers configured to act as flip-flops inside the programmable logic blocks) need to be driven by a clock signal.
Such a clock signal typically originates in the outside world, comes into the FPGA via a special clock input pin, and is then routed through the device and connected to the appropriate registers.
Clock tree
The main clock signal branches again and again
The flip-flops can be considered to be the "leaves" at the ends of the branches
This ensures that all of the flip-flops see the clock signal as close together in time as possible.
Skew
If the clock were distributed as a single long track driving all of the flip-flops one after another, then the flip-flop closest to the clock pin would see the clock signal much sooner than the one at the end of the chain.
The clock tree is implemented using special tracks and is separate from the general-purpose programmable interconnect.
Multiple clock pins
Multiple clock domains (clock trees)
Clock manager
Daughter clocks may be used to drive internal clock trees or external output pins that can be used to provide clocking services to other devices on the host circuit board. Each family of FPGAs has its own type of clock manager (there may be multiple clock manager blocks in a device), where different clock managers may support only a subset of the following features:
Jitter removal
Frequency synthesis
Phase shifting
Design Entry
Involves capturing the design using a high-level description
language like Verilog or VHDL.
Alternatively, a schematic editor is used to enter the design at basic
logic level, or by making use of generic blocks which in turn are
described by high-level languages.
Another option is entry of the design using state diagrams.
The CAD software provided by FPGA manufacturers includes
libraries of standard circuits or macro-functions to quickly
implement common circuits of varying complexity.
Logic Synthesis
The schematic or VHDL description is then translated into a
netlist describing the circuit in terms of logic gates and sequential
elements.
Synthesis optimizes the circuit by regrouping logic functions and/or removing
redundancies, according to design constraints or rules, which could be minimizing
area or maximizing speed.
Once the optimized netlist is obtained, it has to be mapped onto
the logical cell of the FPGA (LUT / flip-flop, PLA ...).
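Mapping a logic function onto a LUT amounts to storing its truth table and using the inputs as an address. The sketch below is a minimal model of that idea, and it also shows how a redundant expression and its simplified form produce identical LUT contents. The helper names and the example function are hypothetical and are not taken from any vendor tool.

```python
from itertools import product

def lut_contents(func, num_inputs):
    """Truth table of func as a list of bits, indexed by the input pattern.

    Entry i holds the output for the inputs whose binary encoding is i,
    with the first input as the most significant bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)]

def lut_read(contents, *inputs):
    """Evaluate a LUT by using the inputs as an address into its contents."""
    address = 0
    for bit in inputs:
        address = (address << 1) | int(bool(bit))
    return contents[address]

# Hypothetical function with a redundant term: (a AND b) OR (a AND NOT b).
original = lambda a, b: (a and b) or (a and not b)
# After removing the redundancy the function is simply a.
simplified = lambda a, b: a

table = lut_contents(original, 2)
same_after_optimization = table == lut_contents(simplified, 2)
print(table, same_after_optimization)
```

Because both forms yield the same table, either one occupies a single 2-input LUT; the optimization matters before mapping, when it reduces the number of gates that have to be packed into cells.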
Floorplanning
The circuit to be designed is now divided into partitions,
each of which is adjusted to be implemented in a particular
area on an FPGA device.
A partition usually corresponds to a large section of the
circuit which has a particular functionality, e.g. a multiplier,
filter bank etc.
The total number of FPGA devices required is also
determined.
FPGA Design Flow
Place and Route
A logic partition is now mapped onto an FPGA device by
means of the placement tool, which assigns a physical place in
the array of CLCs to each function (LUT / flip-flop, PLA ...).
Typical placement algorithms aim to minimize the total length of
the interconnections in the final design, with the objective of
maximizing the speed of the device.
Routing algorithms configure the routing elements to provide
the required connections between logic elements.
The primary aim of any routing algorithm is to assure that
100% of the required routes may be realized.
Other goals of routing algorithms include finding the shortest
paths possible between elements. Because of restricted
interconnection resources, this step is the most restrictive.
Layout Verification
This step involves extracting the physical layout of the design
and simulating it using commercial simulators to obtain timing
data and checking design rules (DRC).
If the delays associated with the interconnections within the
prototype fulfill the delay constraints imposed by the design
specifications, then the device may be programmed; otherwise
the placement and routing steps have to be repeated until a
satisfactory configuration is found.
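The placement objective described above, minimizing total interconnect length, needs some way to score a candidate placement. The slides do not name a cost function; the sketch below assumes half-perimeter wirelength, a commonly used estimate, and compares two hypothetical placements of the same small netlist. Cell names and coordinates are made up for illustration.

```python
def half_perimeter_wirelength(placement, nets):
    """Sum over nets of the half-perimeter of each net's bounding box.

    placement: dict mapping cell name -> (column, row) in the logic array
    nets: list of nets, each a list of cell names that must be connected
    """
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Hypothetical netlist: one two-pin net and one three-pin net.
nets = [["lut0", "ff0"], ["lut0", "lut1", "ff1"]]

# Two candidate placements of the same four cells.
spread_out = {"lut0": (0, 0), "ff0": (3, 3), "lut1": (0, 3), "ff1": (3, 0)}
compact = {"lut0": (0, 0), "ff0": (0, 1), "lut1": (1, 0), "ff1": (1, 1)}

cost_spread = half_perimeter_wirelength(spread_out, nets)
cost_compact = half_perimeter_wirelength(compact, nets)
print(cost_spread, cost_compact)
```

A placement tool would evaluate a cost of this kind many times while moving cells, keeping moves that lower it; the router then has to realize every net within the available tracks.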
Macro Integration
This involves the provision of all the necessary files and data
formats for integrating the macro in the design flow of the
whole chip.
Once the circuit has been verified, the design
configuration is output in a format which is readable as an input
to the FPGA device which is to be programmed.
Programming the device may take only a matter of minutes.
FPGA Design Flow
The Design Cycle
ASIC Design Methodology
The design is verified by simulation at each stage of refinement.
Accurate simulators are slow.
Fast simulators trade away simulation accuracy.
ASIC designers use a battery of simulators across the
speed-accuracy spectrum in an attempt to verify the design.
An FPGA designer can replace simulation with in-circuit
verification, simulating the circuitry in real time with a
prototype.
The path from design to prototype is short, allowing a designer
to verify operation over a wide range of conditions at high speed
and high accuracy.
proof-of-concept prototype designs easily
Designs can be verified by trial rather than reduction to first principles
or by mental execution.
A designer can verify that the design works in the real system, not merely in a
potentially erroneous simulation model of the system.
The Design Cycle

1. Entering the design in the form of schematic,
   netlist, logic expressions, or HDLs
2. Simulating the design for functional
   verification
3. Mapping the design into the FPGA
   architecture
4. Placing and Routing the FPGA design
5. Extracting delay parameters of the routed
   design
6. Resimulating for timing verification
7. Generating the FPGA device configuration
   format
8. Configuring or Programming the device
9. Testing the product for undesirable functional
   behavior
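The steps of extracting delay parameters and re-checking timing can be pictured as finding the slowest register-to-register path and comparing it with the clock period. The sketch below is a simplified model under stated assumptions: the netlist, the per-connection delays in nanoseconds, and the 10 ns constraint are all hypothetical, and real timing tools also account for setup time, clock skew, and more.

```python
from functools import lru_cache

def critical_path_delay(edges, sinks):
    """Longest accumulated delay from any source node to any of the sinks.

    edges: dict mapping node -> list of (successor, delay_ns) pairs; the
           graph must be acyclic (combinational logic between registers).
    sinks: iterable of end nodes, e.g. the data inputs of flip-flops.
    """
    # Build the reverse adjacency so each node knows its predecessors.
    preds = {}
    for src, outs in edges.items():
        for dst, delay in outs:
            preds.setdefault(dst, []).append((src, delay))

    @lru_cache(maxsize=None)
    def arrival(node):
        # Arrival time is 0 at a source, else the worst predecessor path.
        return max(
            (arrival(src) + delay for src, delay in preds.get(node, [])),
            default=0.0,
        )

    return max(arrival(sink) for sink in sinks)

# Hypothetical routed design with extracted delays in ns.
edges = {
    "ff_a": [("lut1", 1.2)],
    "ff_b": [("lut1", 0.8), ("lut2", 2.5)],
    "lut1": [("lut2", 1.5)],
    "lut2": [("ff_out", 0.9)],
}
clock_period_ns = 10.0  # assumed timing constraint
worst = critical_path_delay(edges, ["ff_out"])
meets_timing = worst <= clock_period_ns
print(worst, meets_timing)
```

Here the slowest path runs ff_a, lut1, lut2, ff_out at roughly 3.6 ns, so the assumed constraint is met; if it were not, placement and routing would be repeated.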
FPGA Configuration
In the case of re-programmable devices, activation or deactivation of interconnects
is implemented by means of pass transistors or tri-state buffers.
Memory units also store the configuration of LUTs and static multiplexers in the
CLC. If the type of memory used is EEPROM, the device is non-volatile, but the
difficult mechanism of re-configuration imposes limitations on the application of the
system.
SRAM memory, on the other hand, loses the configuration once power is removed
from the device (volatile), but it is simple and quick to configure. The use of SRAM
allows for dynamic re-configuration of the device even during real-time operation.
Small local SRAM blocks may also be used to store several configuration bits.
FPGA Capacity comparisons
SRAM FPGA -- EEPROM FPGA
Despite this, however, most FPGAs still use
SRAM for reasons of simplicity (when you
need to reprogram it, it's easier to re-encode
a small ROM chip than to reprogram
a large FPGA chip), so count on having to
use a separate boot ROM for the FPGA.
Use of an FPGA is broadly divided into two
main stages:
Configuration mode
the mode the FPGA is in when you first power it
up. Configuration mode is, as you may have guessed,
where you configure the FPGA;
Product FPGA vs ASIC
Comparison:
FPGA benefits vs ASICs:
- Design time: 9 month design cycle vs 2-3 years
- Cost: No $3-5 M upfront (NRE) design cost.
  No $100-500K mask-set cost
- Volume: High initial ASIC cost recovered only in very high volume products
Due to Moore's law, many ASIC market requirements are now met by FPGAs
- E.g. Virtex II Pro has 4 processors, 10 Mb memory, IO
Resulting Market Shift:
Dramatic decline in number of ASIC design starts:
- 11,000 in '97
- 1,500 in '02
FPGAs as a % of Logic market:
- Increase from 10 to 22% in past 3-4 years
FPGAs (or programmable logic) is the fastest growing segment of the
semiconductor industry!!
FPGA/ASIC Crossover Changes
[Figure: cost versus production volume for FPGAs and ASICs at 150nm / 200mm and 90nm / 300mm, with the FPGA cost advantage and ASIC cost advantage regions marked]
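The crossover between FPGA and ASIC cost can be estimated with a simple linear model: total cost equals upfront cost plus unit cost times volume. The sketch below is an illustration under stated assumptions. The ASIC upfront figure of $3M NRE plus a $500K mask set is taken from the ranges quoted earlier in the comparison, while the unit prices ($50 per FPGA, $10 per ASIC) are hypothetical.

```python
import math

def crossover_volume(asic_upfront, asic_unit, fpga_unit, fpga_upfront=0.0):
    """Smallest production volume at which the ASIC total cost is no higher.

    Model: total = upfront + unit_price * volume for each technology.
    Returns None when the FPGA unit price is not above the ASIC unit price,
    because the ASIC then never recovers its upfront cost.
    """
    if fpga_unit <= asic_unit:
        return None
    volume = math.ceil((asic_upfront - fpga_upfront) / (fpga_unit - asic_unit))
    return max(volume, 0)

# Upfront cost from the quoted ranges; unit prices are assumed values.
volume = crossover_volume(
    asic_upfront=3_000_000 + 500_000,
    asic_unit=10,
    fpga_unit=50,
)
print(volume)
```

Below the computed volume the FPGA is cheaper overall; above it the ASIC wins. Higher ASIC upfront costs at newer process nodes push the crossover toward larger volumes.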
