
UNIT 3 Programmable Logic Devices

R.B.Ghongade

Key Terms
Field-Programmable Device (FPD): a general term for any type of integrated circuit used for implementing digital hardware, where the chip can be configured by the end user to realize different designs. Programming such a device often involves placing the chip into a special programming unit, but some chips can also be configured in-system. Another name for FPDs is programmable logic devices (PLDs); although PLDs encompass the same types of chips as FPDs, we prefer the term FPD because historically the word PLD has referred to relatively simple types of devices.
PLA: a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable.

Key Terms
PAL: a Programmable Array Logic (PAL) is a relatively small FPD that has a programmable AND-plane followed by a fixed OR-plane.
SPLD: refers to any type of Simple PLD, usually either a PLA or a PAL.
CPLD: a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip. Alternative names sometimes adopted for this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and others.
FPGA: a Field-Programmable Gate Array is an FPD featuring a general structure that allows very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs (AND planes), FPGAs offer narrower logic resources. FPGAs also offer a higher ratio of flip-flops to logic resources than CPLDs do.

Key Terms
HCPLDs: high-capacity PLDs, a single acronym that refers to both CPLDs and FPGAs. This term was coined in trade literature to provide an easy way to refer to both types of devices.
Interconnect: the wiring resources in an FPD.
Programmable Switch: a user-programmable switch that can connect a logic element to an interconnect wire, or one interconnect wire to another.
Logic Block: a relatively small circuit block that is replicated in an array in an FPD. When a circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits that can each be mapped into a logic block. The term logic block is mostly used in the context of FPGAs, but it could also refer to a block of circuitry in a CPLD.

Key Terms
Logic Capacity: the amount of digital logic that can be mapped into a single FPD. This is usually measured in units of the equivalent number of gates in a traditional gate array. In other words, the capacity of an FPD is measured by the size of the gate array that it is comparable to. In simpler terms, logic capacity can be thought of as a number of 2-input NAND gates.
Logic Density: the amount of logic per unit area in an FPD.
Speed-Performance: the maximum operable speed of a circuit when implemented in an FPD. For combinational circuits, it is set by the longest delay through any path; for sequential circuits, it is the maximum clock frequency at which the circuit functions properly.

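Digital VLSI Chips - classification

[Figure: classification tree of digital VLSI chips, with the label ASIC shown twice (generic usage and typical usage).
- FPD (Field Programmable Device), in order of increasing complexity: SPLD, CPLD, FPGA. SPLD types shown: PLA, PAL, GAL, PROM, EPLD, E2PLD.
- ASIC (typical usage), in order of increasing complexity: GATE ARRAY, STANDARD CELL, FULL CUSTOM.]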

General Programmable Logic Device

[Figure: inputs (logic variables) feed a block of logic gates and programmable switches, which drives the outputs (logic functions).]

A general PLD consists of a set of inputs (the logic variables) and a set of outputs (the logic functions). The designer's job is simply to program the switches and thereby configure the logic gates to perform the desired function.

AND-OR realization of logic functions


A B C | Y
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 0

Y = A.B.C' + A.B'.C + A'.B.C

[Figure: AND-OR gate network with inputs A, B, C and output Y.]

Thus, given a logic function in SOP form, it can be implemented using AND and OR arrays. This is the basic working principle of programmable logic devices.
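The correspondence between the truth table and its AND-OR form can be checked mechanically. The sketch below is only an illustration: it takes the SOP expression to be Y = A.B.C' + A.B'.C + A'.B.C, which is the form the Y column of the table implies, and the function names are made up for this example.

```python
from itertools import product

# Y column of the truth table, listed for ABC = 000, 001, ..., 111.
TRUTH_TABLE_Y = [0, 0, 0, 1, 0, 1, 1, 0]


def sop(a, b, c):
    """AND-OR realization: three AND gates feeding one OR gate."""
    not_a, not_b, not_c = 1 - a, 1 - b, 1 - c
    p1 = a & b & not_c      # A.B.C'
    p2 = a & not_b & c      # A.B'.C
    p3 = not_a & b & c      # A'.B.C
    return p1 | p2 | p3


def sop_matches_table():
    """Compare the AND-OR form against every row of the truth table."""
    rows = product((0, 1), repeat=3)  # same row order as the table
    return all(sop(a, b, c) == y for (a, b, c), y in zip(rows, TRUTH_TABLE_Y))
```

Calling sop_matches_table() walks all eight input combinations, which is practical here because the function has only three inputs.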

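General form of programmable function device

[Figure: each input (a, ...) is routed in true and complemented form through links that can be programmed; the gate inputs are tied to Logic 1 through pull-up resistors.]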


Inputs are available in both their true and inverted (complementary) forms. This is an important development, since every input polarity is readily available. The user can now place links to construct the desired function. Placing a link (or removing it) is called programming the device.

Programming technologies
The type of link gives rise to two different technologies: fusible link and antifuse.

Fusible Link Technology


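[Figure: un-programmed device, with inputs a and b connected in true and complemented form through intact fuses and pulled up to Logic 1; programmed device, with selected fuses blown so that y = a.b'.]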

Fusible link technologies


Devices based on fusible-link technologies are said to be one-time programmable, or OTP, because once a fuse has been blown, it cannot be replaced. This places a severe limitation on the usage of the device.
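A small behavioural model may help to show why blowing fuses selects a product term. The sketch below is an assumption-laden illustration, not a description of any real device: it treats each AND-gate input as either connected to its signal through an intact fuse or pulled up to logic 1 once the fuse is blown. The names and_with_fuses, device, UNPROGRAMMED and PROGRAMMED are hypothetical.

```python
def and_with_fuses(signals, intact):
    """AND gate whose inputs each pass through a fuse.

    signals: dict mapping a signal name to 0 or 1.
    intact:  set of signal names whose fuse is still intact.
    A blown fuse leaves the gate input pulled up to logic 1, so that
    signal no longer influences the product.
    """
    result = 1
    for name, value in signals.items():
        result &= value if name in intact else 1
    return result


def device(a, b, intact):
    """Two-input device offering a, a', b and b' to a single AND gate."""
    signals = {"a": a, "a'": 1 - a, "b": b, "b'": 1 - b}
    return and_with_fuses(signals, intact)


UNPROGRAMMED = {"a", "a'", "b", "b'"}  # every fuse intact
PROGRAMMED = {"a", "b'"}               # fuses for a' and b blown: y = a.b'
```

With every fuse intact the gate sees both a and a', so its output is always 0; blowing the fuses for a' and b leaves y = a.b'. Because the model has no way to restore a blown fuse, it also reflects the one-time-programmable nature of the technology.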

Antifuse technologies
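[Figure: un-programmed device, with inputs in true and complemented form separated from the gate by unprogrammed antifuse links and pulled up to Logic 1; programmed device, with selected antifuse links programmed so that y = a.b'.]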

Simplified Antifuse
An antifuse is a microscopic column of amorphous (noncrystalline) silicon linking two metal tracks. In its unprogrammed state, the amorphous silicon acts as an insulator with a very high resistance, in excess of one billion ohms.

[Figure: antifuse in the un-programmed and programmed states.]

Types of antifuse technologies


There are two classes of antifuse technologies:
- Poly-diffusion Anti-fuse (used by Actel)
- Metal-Metal Anti-fuse (used by Quicklogic)

Poly-diffusion Anti-fuse

An Oxide-Nitride-Oxide (ONO) dielectric normally prevents current from flowing between the diffusion and poly-silicon layers. When a programming pulse is applied, the dielectric melts and a circuit is formed between the diffusion and the poly-silicon.

Metal-Metal Anti-fuse

The link is an alloy of tungsten, titanium and silicon. The conductive link usually forms at the corner of the via, where the electric field is highest during programming.

Programming!
Programming this particular element effectively grows a link, known as a via, by converting the insulating amorphous silicon into conducting polysilicon. Devices based on antifuse technologies are OTP, because once an antifuse has been grown, it cannot be removed. This is again a severe limitation of the technology, but antifuse devices have found their way into space applications because of their high reliability.

PLD Notation
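[Figure: PLD notation. Input lines a, b, c, d cross the product lines, and each crossing is drawn as a link, no link, or a non-programmable connection. The example product lines realize a.b'.d and a'.c', shown once with programmable links and once with non-programmable connections.]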

Programmable Logic Array (PLA)


An AND array and an OR array can be put together to form a Programmable Logic Array (PLA). We have already explored the technique of realizing any logic expression using AND and OR gates; this is the underlying principle of the PLA. In a PLA, both the AND array and the OR array are programmable. PLAs are specified in terms of:
- Number of inputs (n)
- Number of outputs (m)
- Number of product terms (p)

PLA
[Figure: PLA with inputs a, b, c, d feeding a programmable AND array, whose product terms feed a programmable OR array with outputs f1, f2, f3.]

PLA programmed for various logic expressions


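[Figure: PLA with inputs a, b, c, d programmed as follows.]

Product terms (AND array): a.b.c'.d', a'.b'.c', b'.c

Outputs (OR array):
f1 = a.b.c'.d' + a'.b'.c'
f2 = a'.b'.c' + b'.c
f3 = a.b.c'.d' + b'.c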
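The programmed PLA can be mimicked with two small tables, one per plane. This is a minimal sketch under stated assumptions: the data layout (AND_PLANE, OR_PLANE) and the function evaluate_pla are invented for illustration, and the three sums are assigned to f1, f2 and f3 in the order they appear.

```python
from itertools import product

# AND plane: one entry per product term. Each entry maps an input name to
# the value it must have (1 = true literal, 0 = complemented literal).
# Inputs that do not appear in an entry have no link to that product line.
AND_PLANE = [
    {"a": 1, "b": 1, "c": 0, "d": 0},  # a.b.c'.d'
    {"a": 0, "b": 0, "c": 0},          # a'.b'.c'
    {"b": 0, "c": 1},                  # b'.c
]

# OR plane: each output lists the product terms linked to its OR gate.
OR_PLANE = {"f1": [0, 1], "f2": [1, 2], "f3": [0, 2]}


def evaluate_pla(values):
    """Evaluate all outputs for a dict such as {"a": 1, "b": 0, "c": 1, "d": 0}."""
    terms = [
        int(all(values[name] == wanted for name, wanted in term.items()))
        for term in AND_PLANE
    ]
    return {
        out: int(any(terms[i] for i in linked))
        for out, linked in OR_PLANE.items()
    }
```

Note that each product term is generated once and shared by two outputs, which is possible only because the OR plane is programmable.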

PLA QP82S100
n = 16, m = 8, p = 48

- Consists of 16 dedicated inputs and 8 dedicated outputs.
- Each output can be actively controlled by any or all of the 48 product terms.
- The True, Complement, or Don't Care condition of each of the 16 inputs can be ANDed together to form one product term.
- All 48 product terms can be selectively ORed to each output.

PLA QP82S100

Programmable Array Logic (PAL)

Many applications do not require both the AND and the OR arrays to be programmable. Programmable links are also slower than permanent links, owing to the considerable resistance of the fusible material. This gives design engineers another option: the AND array is kept programmable, as in a PLA, but the OR array is not programmable. Only permanent connections are available in the OR array, so the sum terms are pre-defined. This reduces flexibility but greatly improves speed and reduces manufacturing cost.

PAL
[Figure: PAL with inputs a, b, c, d feeding a programmable AND array, whose product terms feed a fixed OR array with outputs f1, f2, f3.]
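The cost of a fixed OR array can be illustrated by implementing f1 = a.b.c'.d' + a'.b'.c', f2 = a'.b'.c' + b'.c and f3 = a.b.c'.d' + b'.c on a PAL-style structure. The sketch assumes, purely for illustration, that each OR gate is permanently wired to two AND gates of its own; PAL_AND_PLANE, evaluate_pal and and_gates_used are hypothetical names.

```python
# Each output owns a fixed group of AND gates (two per output is an
# assumption made for this example). Only the AND gates are programmable.
PAL_AND_PLANE = {
    "f1": [{"a": 1, "b": 1, "c": 0, "d": 0}, {"a": 0, "b": 0, "c": 0}],
    "f2": [{"a": 0, "b": 0, "c": 0}, {"b": 0, "c": 1}],
    "f3": [{"a": 1, "b": 1, "c": 0, "d": 0}, {"b": 0, "c": 1}],
}


def evaluate_pal(values):
    """Evaluate every output; each OR gate sees only its own AND gates."""
    return {
        out: int(any(
            all(values[name] == wanted for name, wanted in term.items())
            for term in terms
        ))
        for out, terms in PAL_AND_PLANE.items()
    }


def and_gates_used():
    """Count programmed AND gates; shared terms must be duplicated."""
    return sum(len(terms) for terms in PAL_AND_PLANE.values())
```

Because product terms cannot be shared between outputs, each of the three distinct terms has to be programmed twice, so six AND gates are used in this model.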

Additional Features
Tri-state outputs
- Give programmable bi-directional pins
- Save pin count

Registered outputs
- Enable the use of the PAL in finite state machines
- Increase the versatility of the device

Macrocell

PAL16L8A

Specifications
- Part Number = PAL16L8A
- Description = Programmable array logic device
- Fuse type = titanium-tungsten
- Manufacturer = Texas Instruments
- Number of Inputs = Up to 16
- Prod. Terms Max. = 64
- No. of Outputs = Up to 8
- Nom. Supp (V) = 5.0
- Package = DIP, LEADLESS CERAMIC CHIP CARRIER (FK)
- Pins = 20
- Technology = Advanced Low-Power Schottky
- Bi-directional pins = 6

Programming the PLD


Programming a traditional PLD is easy because there are computer programs and associated tools specially created for the task. The user first creates a computer file, known as a PLD source file, containing a textual description of the required functionality. In addition to Boolean equations, the PLD source file may also support truth tables, state tables, and other constructs, all in textual format. A program then processes the source file and generates a JEDEC file for the target device. The program may also select a device automatically on a variety of criteria, such as the speed, cost, and power consumption of the devices, and it may be used to partition a large design across several devices, in which case it will output a separate JEDEC file for each device. Finally, the designer takes a new device of the appropriate type and places it in a socket on a special tool, which may be referred to as a programmer, blower, or burner. The main computer passes the JEDEC file to the programmer, which uses the contents of the file to determine which fuses to blow.

JEDEC: Joint Electron Device Engineering Council

Setup for programming

Reprogrammable PLDs
The basic (and most severe) limitation of fusible-link and antifuse technologies is that the device cannot be re-programmed. This can be a serious shortcoming, especially during the development phases of a system.

Technologies for re-programmable PLD


- EPROM (Erasable Programmable Read-Only Memory)
- E2PROM (Electrically Erasable Programmable Read-Only Memory)
- FLASH
- SRAM (Static Random Access Memory)

EPROM
An EPROM transistor has the same basic structure as a standard MOS transistor, but with the addition of a second polysilicon floating gate isolated by layers of oxide.

[Figure: cross-sections of a MOS transistor and an EPROM transistor, each labelled with source, control gate and drain terminals; the EPROM transistor adds a floating gate beneath the control gate.]

EPROM
In its un-programmed state, the floating gate is uncharged and does not affect the normal operation of the control gate. To program the transistor, a relatively high voltage, in the order of 12 V, is applied between the control gate and drain terminals. This causes the transistor to be turned hard on, and excited electrons push through the oxide into the floating gate in a process known as hot (high energy) electron injection. When the programming signal is removed, a negative charge remains on the floating gate. This charge is very stable and will not dissipate for more than a decade under normal operating conditions. The stored charge on the floating gate inhibits the normal operation of the control gate, and thus distinguishes those cells that have been programmed from those which have not.

EPROM

E2PROM
An E2PROM cell is approximately 2.5 times larger than an EPROM cell because it contains two transistors. One of the transistors is similar to that of an EPROM transistor in that it contains a floating gate, but the insulating oxide layers surrounding the floating gate are very much thinner. The second transistor can be used to erase the cell electrically, and E2PROM devices can typically be erased and reprogrammed on a word-by-word basis.

E2PROM

FLASH
The name FLASH was originally coined to reflect the technology's rapid erasure times compared to EPROM. These devices can be electrically erased, but only by erasing the whole device or a large portion of it. Some architectures have a two-transistor cell, very similar to that of an E2PROM cell, which allows them to be erased and reprogrammed on a word-by-word basis.

FLASH

SRAM
The cell consists of two cross-coupled inverters and two access transistors. The SRAM cell drives the gates of other transistors on the chip, turning them either ON to make a connection or OFF to break it. The access transistors are connected to the READ/WRITE line at their gate terminals and to the DATA lines at their source/drain terminals. The READ/WRITE line is used to select the cell, while the DATA lines are used to perform read or write operations on it. Internally, the cell holds the stored value on one side and its complement on the other. To store data, READ/WRITE is set to 1 (5 V), and the NMOS access transistor passes the data from one side of the transistor to the other. After the data stabilizes around the two NOT gates, READ/WRITE is set to 0, and the data is held in the loop for as long as power is applied. Note that the lower NOT gate is labeled WEAK, meaning it has weaker transistors, so that when new data is written the STRONG NOT gate can override the WEAK one if the logic level has to change.

SRAM Cell

SRAM
SRAM cells are used for the following:
1. They can store a logic value of 0 or 1.
2. They can store a value of an LUT.
3. They configure the interconnection switches of the FPGA.

Type          Re-programmable        Volatile  Technology  Associated with
Fusible link  No                     No        Bipolar     SPLD
Anti-fuse     No                     No        CMOS        FPGA
EPROM         Yes (out of circuit)   No        UVCMOS      SPLD & CPLD
E2PROM        Yes (in-circuit)       No        EECMOS      SPLD & CPLD
SRAM          Yes (in-circuit)       Yes       CMOS        FPGA
FLASH         Yes (in-circuit)       No        CMOS        FPGA

[Symbol column: graphical symbols for each link type.]

Comparison between programming technologies


Technology: EPROM
Advantages:
- Mainstream technology
- Reprogrammable
- 100% testable
- Non-volatile
- Software is simple
Limitations:
- Requires high voltage (1 generation below SRAM)
- Requires programmer
- Requires socket
- High impedance: 80 uA per minimum gate (12K ohm)
- Impact ionization limits voltage across the device

Technology: SRAM
Advantages:
- Base logic process, so it uses leading-edge processing
- Re-programmable
- 100% testable
- No programmer
- No socket
Limitations:
- Largest area element, using 5 to 6 transistors plus switch = 30u per node @ 0.25u
- Switch is medium impedance: 3k ohms per square (500 uA/micron)
- High capacitance: 1.6 fF per micron per node @ 0.25u
- Volatile
- Requires external memory to load designs
- Easily copied
- Dead until loaded
- Software is difficult

Technology: Antifuse
Advantages:
- Highest density: a mere cross point, 10X the density of SRAM
- Lowest switch resistance: 25 ohms
- Very low capacitance: 1 fF per node, approaching the metal line capacitance
- Non-volatile
- Nearly impossible to reverse engineer
- Radiation hard
- Live within 1 millisecond of the power supply reaching spec voltage
- Software is easy to place and route
Limitations:
- Requires programmer
- Requires a socket: a problem for devices with > 200 pins, solved with BGA
- Those who design by test will throw out a lot of parts
- Requires one to two transistors per wire for programming: ~10 mA for metal antifuses; ONO antifuses require less, only 5 mA, so they can be programmed from the edge
- Some antifuse defects are not testable until programming, hence only 98% to 99% programming yield, but 100% functional
- Requires high voltages

Technology: FLASH
Advantages:
- Re-programmable in the board
- No socket
- Non-volatile
- One transistor instead of 6 for routing control, i.e. denser parts
- Passes full Vcc without pump
- Live at power up
- Difficult to reverse engineer
Limitations:
- About the same speed as SRAM
- Radiation hardness is expected to behave similarly to EPROM; has not been tested yet

CPLD

Key Terms
CPLD: Complex programmable logic device. A programmable logic device consisting of several interconnected programmable blocks.
Logic Array Block (LAB): A group of macrocells that share common resources in a CPLD.
Programmable Interconnect Array (PIA): An internal bus with programmable connections that link together the Logic Array Blocks of a CPLD.
Buried logic: Logic circuitry in a PLD that has no connection to the input or output pins of the PLD, but is used solely as internal logic.
I/O Control Block: A circuit in a CPLD that controls the type of tri-state switching used in a macrocell output.

Key Terms
Parallel logic expanders: Product terms that are borrowed from neighbouring macrocells in the same LAB.
Shared logic expanders: Product terms that are inverted and fed back into the programmable AND matrix of an LAB for use by any other macrocell in the LAB.
Specifications: There are several performance specifications for complex programmable logic devices:
- Internal frequency is the speed at which CPLDs can perform operations or transfer data internally.
- Propagation delay is the time interval between the application of an input signal and the occurrence of the corresponding output in a logic circuit.
- Speed grade indicates the delay in nanoseconds (ns) through a macrocell in the CPLD. For example, a CPLD with a speed grade of 10 has a delay of 10 ns through a macrocell. CPLDs with low speed grade numbers run faster than devices with high speed grade numbers.

CPLD
The term complex PLD (CPLD) is generally taken to refer to a class of devices that
- contain a number of simple PLA or PAL functions, generically referred to as simple PLDs (SPLDs), and
- share a common programmable interconnection matrix.

Thus CPLDs consist of multiple SPLD-like blocks on a single chip. However, CPLD products are much more sophisticated than SPLDs, even at the level of their basic SPLD-like blocks. While each manufacturer has a different variation, in general they are all similar in that they consist of function blocks, input/output blocks, and an interconnect matrix. The devices are programmed using programmable elements that, depending on the technology of the manufacturer, can be
- EPROM cells
- EEPROM cells
- Flash EPROM cells

Generic building blocks


- PLD blocks (also called Function Blocks)
- Interconnection matrix
- I/O blocks

Altera MAX7000S Complex PLD

Some tricks!
Using XOR gate as programmable NOT gate

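[Figure: the output of a LOGIC CIRCUIT passes through an XOR gate whose second input is a programmable bit; with the bit at 0 the signal passes unchanged, and with the bit at 1 it is inverted.]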

Some tricks!
Using MUX as programmable switch

[Figure: a 4:1 MUX whose select lines are driven by programmable cells, which choose the input that is routed to the output.]
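Both tricks reduce to very small functions, sketched below for illustration. The names programmable_inverter and mux4 are invented here, and the stored configuration bits are simply passed in as arguments rather than held in programmable cells.

```python
def programmable_inverter(signal, invert_bit):
    """XOR used as a programmable NOT gate.

    invert_bit = 0 passes the signal through unchanged;
    invert_bit = 1 inverts it.
    """
    return signal ^ invert_bit


def mux4(inputs, select_bits):
    """4:1 MUX used as a programmable switch.

    inputs:      sequence of four signal values.
    select_bits: (s1, s0) pair held in two programmable cells.
    The selected input is the only one connected to the output.
    """
    s1, s0 = select_bits
    return inputs[(s1 << 1) | s0]
```

For example, mux4([0, 1, 0, 0], (0, 1)) routes input 1 to the output, and changing the two stored bits reroutes the connection without altering any wiring.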

Packages

PQFP: Plastic Quad Flat Package

PLCC: Plastic Leaded Chip Carrier

TQFP: Thin Quad Flat Pack

PGA: Pin Grid Array

Device number
EPM7128SLC84

EPM7: MAX7000 family
128: Number of macrocells
S: In-system programmable
LC84: 84-pin PLCC package
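The fields of the device number can be pulled apart with a short parser. This is only a sketch based on the single example EPM7128SLC84: the pattern, the function decode_part_number and the returned field names are assumptions, and any package code other than LC is rejected rather than guessed.

```python
import re

# Pattern derived from the one example above: family prefix "EPM7",
# macrocell count, optional "S" (in-system programmable), package code
# "LC" (PLCC) and pin count. Other package codes are not covered here.
PART_PATTERN = re.compile(r"^EPM7(\d+)(S?)LC(\d+)$")


def decode_part_number(part):
    """Split a device number such as 'EPM7128SLC84' into its fields."""
    match = PART_PATTERN.match(part)
    if match is None:
        raise ValueError(f"unsupported part number: {part}")
    macrocells, isp_flag, pins = match.groups()
    return {
        "family": "MAX 7000",
        "macrocells": int(macrocells),
        "in_system_programmable": isp_flag == "S",
        "package": f"{pins}-pin PLCC",
    }
```

Calling decode_part_number("EPM7128SLC84") reports 128 macrocells, in-system programmability, and an 84-pin PLCC package.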

MAX 7000 family


Features
- Advanced CMOS technology
- EEPROM-based
- Provides 600 to 5,000 usable gates
- In-System Programmable
- Pin-to-pin delays as low as 5 ns
- Counter speeds of up to 175.4 MHz

Architecture
The MAX 7000 architecture includes the following elements:
- Logic array blocks (LAB)
- Macrocells
- Expander product terms (shareable and parallel)
- Programmable interconnect array
- I/O control blocks

CLOCK & RESET pins


The MAX7000S family has four pins that can be configured as control signals or inputs. GCLK1 is a global clock that is common to all macrocells in the device and can be used to synchronously clock all registers. OE1 is an output enable that can globally activate or disable the tristate outputs of the device macrocells. GCLRn is an active- LOW global clear function. The fourth control pin can be configured as an input, as can the other three pins, or as a second global clock (GCLK2) or output enable (OE2). If the control functions are not used, these pins add four inputs to the available total.

Global Clock

Active- LOW Global Clear

Architecture

Logic Array Block


- LABs consist of 16-macrocell arrays.
- Multiple LABs are linked together via the programmable interconnect array (PIA), a global bus that is fed by all dedicated inputs, I/O pins, and macrocells.
- Each LAB in a MAX7000S device has from 6 to 16 I/O pins.
- For the EPM7128SLC84 there are only 60 I/Os available.

Macrocell

Macrocell
The macrocell is similar to that of a GAL or Universal PAL in that it provides a sum-of-products function with active- HIGH or -LOW options and the choice of registered or combinational output. Registered outputs can be clocked with one of two global clocks or by a product term from the AND matrix. The register can be cleared globally or by a product term and preset with a product term. The macrocell has five dedicated product terms, which is fewer than found in the PAL and GAL. This is generally sufficient to implement most logic functions. If more terms are required, they can be supplied by a set of shared logic expanders or parallel logic expanders.

Shareable Expanders

Shareable Expanders
Shared logic expanders do not add more product terms to a given macrocell. They do make the programming of the entire LAB more efficient by allowing a product term to be programmed once and used in several macrocells of the same LAB. One product term per macrocell is inverted and fed back into the shared expander pool of product terms. Since there are 16 macrocells per LAB, the shared logic expander pool has up to 16 product terms

Parallel Expanders

Parallel Expanders
Parallel logic expanders allow a macrocell to borrow up to 15 product terms from its three lower-numbered neighbours (5 product terms per neighbouring macrocell). For example, macrocell 4 can borrow up to 5 terms each from macrocells 3, 2, and 1. By using its 5 dedicated product terms and the maximum number of parallel expanders, a macrocell can have up to 20 product terms at its disposal. These borrowed terms are not usable by the macrocell from which they were borrowed. The parallel expanders are set up so that a lower-numbered cell lends product terms to a higher-numbered cell, so the number of available terms depends on how close to the end of a chain a macrocell is.
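The dependence on chain position can be made concrete with a little arithmetic. The sketch below assumes a simplified model in which positions are counted from 1 at the start of a chain and each lower-numbered neighbour lends all 5 of its terms; the name max_product_terms is hypothetical, and the model does not track the terms the lending macrocells give up.

```python
DEDICATED_TERMS = 5         # product terms owned by every macrocell
MAX_LENDING_NEIGHBOURS = 3  # at most three lower-numbered neighbours lend


def max_product_terms(position_in_chain):
    """Upper bound on product terms for a macrocell.

    position_in_chain = 1 is the first macrocell of a chain, which has
    no lower-numbered neighbour to borrow from.
    """
    if position_in_chain < 1:
        raise ValueError("positions are counted from 1")
    lenders = min(position_in_chain - 1, MAX_LENDING_NEIGHBOURS)
    return DEDICATED_TERMS + lenders * DEDICATED_TERMS
```

Under this model the first macrocell in a chain is limited to its own 5 terms, while the fourth and later macrocells reach the full 5 + 15 = 20.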

Programmable Interconnect Array

PIA
Logic is routed between LABs via the programmable interconnect array (PIA). This global bus is a programmable path that connects any signal source to any destination on the device. All MAX 7000 dedicated inputs, I/O pins, and macrocell outputs feed the PIA, which makes the signals available throughout the entire device. Only the signals required by each LAB are actually routed from the PIA into the LAB. An EEPROM cell controls one input to a 2-input AND gate, which selects a PIA signal to drive into the LAB. While the routing delays of channel-based routing schemes in masked gate arrays or FPGAs are cumulative, variable, and path-dependent, the MAX 7000 PIA has a fixed delay. The PIA thus eliminates skew between signals and makes timing performance easy to predict.

I/O Block

I/O Block
The I/O control block allows each I/O pin to be individually configured for input, output, or bidirectional operation. All I/O pins have a tri-state buffer that is individually controlled by one of the global output enable signals or directly connected to ground or VCC. The I/O control block of EPM7032, EPM7064, and EPM7096 devices has two global output enable signals that are driven by two dedicated active-low output enable pins (OE1 and OE2). The I/O control block of MAX 7000E and MAX 7000S devices has six global output enable signals that are driven by the true or complement of two output enable signals, a subset of the I/O pins, or a subset of the I/O macrocells

I/O Control

I/O Block
When the tri-state buffer control is connected to ground, the output is tri-stated (high impedance) and the I/O pin can be used as a dedicated input. When the tri-state buffer control is connected to VCC, the output is enabled. The MAX 7000 architecture provides dual I/O feedback, in which macrocell and pin feedbacks are independent. When an I/O pin is configured as an input, the associated macrocell can be used for buried logic

Output Configuration
MultiVolt I/O Interface
MAX 7000 device outputs can be programmed to meet a variety of system-level requirements. MAX 7000 devices, except 44-pin devices, support the MultiVolt I/O interface feature, which allows them to interface with systems that have differing supply voltages. The 5.0-V devices in all packages can be set for 3.3-V or 5.0-V I/O pin operation. These devices have one set of VCC pins for internal operation and input buffers (VCCINT), and another set for I/O output drivers (VCCIO).

Output Configuration
Open-Drain Output Option (MAX 7000S Devices Only)
This open-drain output enables the device to provide system-level control signals (e.g., interrupt and write enable signals) that can be asserted by any of several devices. It can also provide an additional wired-OR plane

Output Configuration
Slew-Rate Control
The output buffer for each MAX 7000E and MAX 7000S I/O pin has an adjustable output slew rate that can be configured for low-noise or high-speed performance. A faster slew rate provides high-speed transitions for high-performance systems. However, these fast transitions may introduce noise transients into the system. A slow slew rate reduces system noise, but adds a nominal delay of 4 to 5 ns.

Xilinx XC95XX/XC95XXX Complex PLD

PLD like blocks called as FUNCTION BLOCKS

Available packages Xilinx CPLD


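Number of I/O pins by device and package:

Package        XC9536  XC9572  XC95108  XC95144  XC95216  XC95288
44-Pin VQFP    34      -       -        -        -        -
44-Pin PLCC    34      34      -        -        -        -
48-Pin CSP     34      -       -        -        -        -
84-Pin PLCC    -       69      69       -        -        -
100-Pin TQFP   -       72      81       81       -        -
100-Pin PQFP   -       72      81       81       -        -
160-Pin PQFP   -       -       108      133      133      -
208-Pin HQFP   -       -       -        -        166      168
352-Pin BGA    -       -       -        -        166      192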

More packages

VQFP: Very Fine Pitch Quad Flat Pack/ Very Thin Quad Flat Package

CSP: Chip Scale Package

HQFP: Heat-sinked Quad Flat Pack

BGA: Ball Grid Array

Device marking

Features
- High-performance: 5 ns pin-to-pin logic delays on all pins, fCNT to 125 MHz
- Large density range: 36 to 288 macrocells with 800 to 6,400 usable gates
- 5V in-system programmable: endurance of 10,000 program/erase cycles
- Enhanced pin-locking architecture
- Flexible 36V18 Function Block: 90 product terms drive any or all of 18 macrocells within the Function Block; global and product term clocks, output enables, set and reset signals
- Extensive IEEE Std 1149.1 boundary-scan (JTAG) support
- Slew rate control on individual outputs
- User programmable ground pin capability
- Extended pattern security features for design protection
- High-drive 24 mA outputs
- 3.3V or 5V I/O capability
- Advanced CMOS 5V FLASH technology
- Supports parallel programming of multiple XC9500 devices

XC9500 Architecture

CLOCK ,RESET, TRI-STATE pins


The pins labeled GCK (three), GSR (one) and GTS (two or four) can be used for special purposes:
- GCK: global clock
- GSR: global set/reset
- GTS: global three-state controls

Function Blocks

Function Blocks
The AND plane still exists as shown by the crossing wires. The AND plane can accept inputs from the I/O blocks, other function blocks, or feedback from the same function block. The terms are then ORed together using a fixed number of OR gates, and terms are selected via a large multiplexer. The outputs of the mux can then be sent straight out of the block, or through a clocked flip-flop. This particular block includes additional logic such as a selectable exclusive OR and a master reset signal, in addition to being able to program the polarity at different stages

Function Blocks
Each Function Block comprises 18 independent macrocells, each capable of implementing a combinatorial or registered function. The FB also receives global clock, output enable, and set/reset signals. The FB generates 18 outputs that drive the Fast CONNECT switch matrix. These 18 outputs and their corresponding output enable signals also drive the IOB. Logic within the FB is implemented using a sum-of-products representation. Thirty-six inputs provide 72 true and complement signals into the programmable AND-array to form 90 product terms. Any number of these product terms, up to the 90 available, can be allocated to each macrocell by the product term allocator.
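The numbers quoted for the Function Block can be turned into a quick feasibility check. The sketch below is a deliberately simplified assumption: it only tests that a requested allocation stays within 18 macrocells and 90 product terms in total, and it ignores whatever routing rules the real product term allocator applies. The names and_array_signals and allocation_fits are hypothetical.

```python
FB_INPUTS = 36       # inputs entering the Function Block
PRODUCT_TERMS = 90   # product terms formed by the AND-array
MACROCELLS = 18      # macrocells per Function Block


def and_array_signals():
    """Each input is offered in true and complement form."""
    return 2 * FB_INPUTS


def allocation_fits(requested):
    """Check a list of per-macrocell product-term requests.

    Simplified budget check only: at most 18 macrocells, no negative
    requests, and no more than 90 product terms in total.
    """
    if len(requested) > MACROCELLS:
        return False
    if any(count < 0 for count in requested):
        return False
    return sum(requested) <= PRODUCT_TERMS
```

Under this budget model, an even split of 5 terms across all 18 macrocells exactly uses the 90 available terms, while a single macrocell may also claim all 90 if the others need none.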

XC9500 macrocell
Set control Programmable inversion or XOR product term Up to 5 product terms Global clock or product-term clock Reset control

OE control

Macrocell Clock and Set/Reset Capability

Product term allocator

IOB

Switch matrix

Xilinx CoolRunner-II CPLD Family


Features
- Optimized for 1.8V systems: low power CPLD, densities from 32 to 512 macrocells
- 0.18 micron CMOS CPLD: optimized architecture for effective logic synthesis, multi-voltage I/O operation (1.5V to 3.3V)
- Advanced system features: fast in-system programming, On-The-Fly Reconfiguration (OTF), boundary scan test, multiple I/O banks on all devices, low-power management
- External signal control, flexible clocking modes
- Clock divider (2, 4, 6, 8, 10, 12, 14, 16)
- Global signal options with macrocell control: multiple global clocks with phase selection per macrocell, multiple global output enables, global set/reset
- Abundant product term clocks, output enables and set/resets
- Efficient control term clocks, output enables and set/resets for each macrocell and shared across function blocks
- Advanced design security
- Open-drain output option for Wired-OR and LED drive
- Optional bus-hold, 3-state or weak pullup on select I/O pins
- Optional configurable grounds on unused I/Os
- Mixed I/O voltages compatible with 1.5V, 1.8V, 2.5V, and 3.3V logic levels on all parts
- Wide package availability including fine pitch: Chip Scale Package (CSP) BGA, Fine Line BGA, TQFP, PQFP, VQFP, PLCC, and QFN packages
- Guaranteed 1,000 program/erase cycles
- Guaranteed 20 year data retention

CoolRunner-II CPLD Architecture

Coolrunner-II family Function Block

Macrocell

New control signals


Control Terms (CT) are available to be shared for key functions within the FB, and are generally used whenever the exact same logic function would be repeatedly created at multiple macrocells. The CT product terms are available for FB clocking (CTC), FB asynchronous set (CTS), FB asynchronous reset (CTR), and FB output enable (CTE).

Advanced Interconnect Matrix


The Advanced Interconnect Matrix (AIM) is a highly connected, low-power, rapid switch. The AIM is directed by the software to deliver a set of up to 40 signals to each FB for the creation of logic. Results from all FB macrocells, as well as all pin inputs, circulate back through the AIM and are available for connection to all other FBs, as dictated by the design software. The AIM minimizes both propagation delay and power as it makes attachments to the various FBs.

I/O blocks

Output Banking: The output pins are grouped in large banks which allow easy interfacing to 3.3V, 2.5V, 1.8V, and 1.5V in a single part. Thus these CPLDs can be widely used as voltage interface translators

DataGate

DataGate
DataGATE is used for power reduction. Each I/O pin has a series switch that can block the arrival of free-running signals that are not of interest. Signals that serve no use may increase power consumption, and can be disabled. Users are free to do their design, then choose sections to participate in the DataGATE function. DataGATE is a logic function that drives an assertion rail threaded through the medium and high-density CoolRunner-II CPLD parts. Designers can select inputs to be blocked under the control of the DataGATE function, effectively blocking controlled switching signals so they do not drive internal chip capacitances. Output signals that do not switch are held by the bus hold feature. Any set of input pins can be chosen to participate in the DataGATE function.

Choice of CPLD
When considering a CPLD for use in a design, the following issues should be taken into account :
1. The programming technology
- EPROM, EEPROM, or Flash EPROM? This will determine the equipment needed to program the devices and whether they can be programmed only once or many times.
2. The function block capability
- How many function blocks are there in the device?
- How many product and sum terms can be used?
- What are the minimum and maximum delays through the logic?
- What additional logic resources are there, such as XNORs, ALUs, etc.?
- What kind of register controls are available (e.g., clock enable, reset, preset, polarity control)?
- How many are local inputs to the function block and how many are global, chip-wide inputs?
- What kind of clock drivers are in the device and what is the worst-case skew of the clock signal on the chip? This will help determine the maximum frequency at which the device can run.
3. The I/O capability
- How many I/Os are independent, usable for any function, and how many are dedicated for clock input, master reset, etc.?
- What is the output drive capability in terms of voltage levels and current?
- What kind of logic is included in an I/O block that can be used to increase the functionality of the design?

FPGA
R.B.Ghongade

Key terms
Look-up table (LUT): A circuit that implements a combinational logic function by storing a list of output values that correspond to all possible input combinations.
CLB: Configurable Logic Block is the name for the programmable logic block in an FPGA.
Logic element (LE): A circuit internal to an FPGA used to implement a logic function as a look-up table.
Cascade chain: A circuit in an FPGA that allows the input width of a Boolean function to expand beyond the width of one logic element.
Carry chain: A circuit in an FPGA that is optimized for efficient operation of carry functions between logic elements.
DCM: Digital clock manager, a circuit that offers various clock management functions in an FPGA.
Clock trees: Distribution of clock signal lines along the FPGA architecture.

Field Programmable Gate Arrays


Structure much like a gate array ASIC
Visualized as islands of programmable logic in a sea of programmable interconnect
Closer to programmable ASICs
Can be scaled to large sizes
Large emphasis is laid on interconnection routing
Timing performance is difficult to predict

Generic FPGA architecture


An FPGA contains the following blocks:
Programmable logic blocks
I/O blocks
Programmable interconnect

In addition, the FPGA has:
Clock distribution circuit
Embedded memory blocks
Special-purpose blocks:
  DSP blocks: hardware multipliers, adders and registers
  Embedded microprocessors/microcontrollers
  High-speed serial transceivers

FPGA architecture
[Figure: FPGA architecture, showing the programmable logic blocks and the programmable interconnect]

The FPGA is often described in terms of its fabric, which means the underlying structure of the device.

Programming
FPGAs can use any one of the following programming technologies:
SRAM
Antifuse
FLASH
Hybrid FLASH-SRAM

FPGA fabric

[Figure: FPGA fabric, an array of programmable logic blocks joined by programmable interconnects]

Types of architectures
Fine grained
Each programmable logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as a primitive logic gate (AND, OR, NAND, etc.) or a storage element (D-type flip-flop, D-type latch, etc.). Fine-grained architectures are said to be particularly efficient when executing systolic algorithms (functions that benefit from massively parallel implementations). Fine-grained implementations require a relatively large number of connections into and out of each block compared to the amount of functionality that those blocks can support.

Coarse grained
In a coarse-grained architecture, each logic block contains a relatively large amount of logic compared to its fine-grained counterparts. For example, a logic block might contain four 4-input LUTs, four multiplexers, four D-type flip-flops, and some fast carry logic. As the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support.

Logic realization techniques


There are two fundamental methods employed by vendors for the programmable logic blocks used to form the medium-grained architectures referenced in the previous section:
MUX (multiplexer) based
LUT (lookup table) based

MUX-based
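This is based on Shannon's decomposition theorem, which states: let $f(a) = f(a_1, a_2, \ldots, a_n)$ be a switching function of n variables. Then $f(a)$ can be factored with respect to any variable $a_i$ as

$$f(a) = \bar{a}_i f_1 + a_i f_2$$

where $f_1$ and $f_2$ are $f$ evaluated with $a_i = 0$ and $a_i = 1$ respectively. Written out for $a_1$:

$$f(a_1, a_2, \ldots, a_n) = \bar{a}_1 f(0, a_2, \ldots, a_n) + a_1 f(1, a_2, \ldots, a_n)$$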
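As a quick numerical check of the expansion, the following sketch fixes one input of an arbitrary Boolean function to 0 and to 1, recombines the two results as the theorem describes, and compares against the original function over all input combinations. The helper names (cofactor, shannon_expand, equivalent) and the majority example are illustrative assumptions, not part of the slides.

```python
from itertools import product

def cofactor(f, i, value):
    """Return f with input number i forced to the given value."""
    def g(*bits):
        fixed = list(bits)
        fixed[i] = value
        return f(*fixed)
    return g

def shannon_expand(f, i):
    """Rebuild f as (NOT a_i AND f with a_i=0) OR (a_i AND f with a_i=1)."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    def g(*bits):
        a_i = bits[i]
        return ((1 - a_i) & f0(*bits)) | (a_i & f1(*bits))
    return g

def equivalent(f, g, n):
    """Compare two n-input functions on every input combination."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n))

# Hypothetical test function: 3-input majority.
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

print(all(equivalent(majority, shannon_expand(majority, i), 3) for i in range(3)))
```

The expansion holds whichever variable is chosen, which is what lets a multiplexer with that variable on its select line implement the function.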

Example (MUX implementation)

Consider a 3-input function $y = ab + c$:

a b c | y
0 0 0 | 0
0 0 1 | 1
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1

For a = 0 (upper half of the table) the output is $y_1 = c$.
For a = 1 (lower half of the table) the output is $y_2 = b + c$.

Using Shannon's decomposition theorem we can write y as

$$y = \bar{a}(c) + a(b + c)$$

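Example

The sub-function $y_2 = b + c$ can be decomposed again, this time with respect to b:

b c | y2
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

For b = 0 the output is $y_3 = c$, and for b = 1 the output is $y_4 = 1$, so

$$y_2 = \bar{b}\,y_3 + b\,y_4 = \bar{b}\,c + b \cdot 1$$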
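The two decomposition steps of this example map directly onto a tree of 2:1 multiplexers. The sketch below models that tree behaviourally and checks it against $y = ab + c$; the function names mux2, y_mux and y_reference are made up for illustration.

```python
from itertools import product

def mux2(sel, d0, d1):
    """2:1 multiplexer: passes d0 when sel is 0 and d1 when sel is 1."""
    return d1 if sel else d0

def y_mux(a, b, c):
    y1 = c                  # cofactor for a = 0
    y2 = mux2(b, c, 1)      # cofactor for a = 1: b'c + b.1 = b + c
    return mux2(a, y1, y2)  # a selects between the two cofactors

def y_reference(a, b, c):
    return (a & b) | c

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, y_mux(a, b, c))
```

Only two multiplexers are needed here because one cofactor is a plain input (c) and one data input is tied to a constant.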

MUX implementation

Another possible implementation

LUT-based
The key property of an n-input LUT is that it can implement any possible n-input combinational function. The underlying concept behind a LUT is relatively simple. A group of input signals is used as an index (pointer) into a lookup table. The contents of this table are arranged such that the cell pointed to by each input combination contains the desired output value.
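A minimal software sketch of the same idea: the table is filled once by evaluating the desired function on every input combination, and afterwards the inputs are used only as an index. The names build_lut and lut_read are illustrative, and treating the first input as the most significant index bit is an assumption of this sketch.

```python
def build_lut(func, n):
    """Fill a 2**n-entry table by evaluating func on every input combination."""
    table = []
    for index in range(2 ** n):
        # Unpack the index into n bits, first input = most significant bit.
        bits = [(index >> (n - 1 - k)) & 1 for k in range(n)]
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Use the input bits as a pointer into the table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]

# The 3-input example function y = ab + c stored as 8 table cells.
lut = build_lut(lambda a, b, c: (a & b) | c, 3)
print(lut)
```

Changing the function means changing only the stored bits; the read logic stays the same, which is why a single LUT structure can cover every n-input function.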

LUT implementation

Using pass transistors

Using transmission gates

# of LUT inputs?
It has been statistically concluded that a 4-input LUT is best for FPGA devices. One additional advantage of a LUT-based programmable block is that the SRAM cells forming the LUT can be used as a small block of RAM (the 16 cells forming a 4-input LUT, for example, could be used as a 16 x 1 RAM). This is referred to as distributed RAM. In addition, all the SRAM cells are effectively connected in a chain in order to facilitate programming, which offers the further possibility of using this chain as a shift register. Because of all these advantages, the majority of today's FPGA architectures are LUT based.
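The three uses of the same 16 cells can be sketched behaviourally as follows. This is a simplified model under stated assumptions: the class name Lut4Cells, the method names, and the shift direction (new bit enters cell 0, oldest bit leaves cell 15) are choices made for illustration and do not describe any particular device.

```python
class Lut4Cells:
    """Behavioural model of the 16 storage cells behind a 4-input LUT."""

    def __init__(self):
        self.cells = [0] * 16

    def read(self, addr):
        """LUT / RAM read: the 4 inputs form a 4-bit address."""
        return self.cells[addr]

    def write(self, addr, bit):
        """Distributed-RAM mode: 16 x 1 RAM write."""
        self.cells[addr] = bit & 1

    def shift_in(self, bit):
        """Shift-register mode: push one bit in, return the bit pushed out."""
        out = self.cells[-1]
        self.cells = [bit & 1] + self.cells[:-1]
        return out

ram = Lut4Cells()
ram.write(5, 1)
print(ram.read(5), ram.read(4))
```

In shift-register mode a bit written with shift_in reappears at the output 16 shifts later, so the structure behaves as a 16-bit delay line.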

Major FPGA Vendors


SRAM-based FPGAs: Xilinx, Inc.; Altera Corp.; Atmel; Lattice Semiconductor
Flash & antifuse FPGAs: Actel Corp.; QuickLogic Corp.

Xilinx FPGA Devices


Old families
  XC3000, XC4000, XC5200
  Old 0.5 µm, 0.35 µm and 0.25 µm technology (not recommended for modern designs)

Low-cost family
  Spartan/XL: derived from XC4000
  Spartan-II: derived from Virtex
  Spartan-IIE: derived from Virtex-E
  Spartan-3

High-performance families
  Virtex (0.22 µm)
  Virtex-E, Virtex-EM (0.18 µm)
  Virtex-II, Virtex-II PRO (0.13 µm)
  Virtex-4 (0.09 µm)
  Virtex-5 (0.065 µm)

Virtex-5 flavours
  LX: Logic
  LXT: Logic/Serial
  SXT: DSP/Serial
  FXT: Embedded/Serial

Xilinx Device Complexity
[Figure: Xilinx device complexity against year of introduction, 1985 to 2006, annotated with process geometries from 0.35 µm down to 65 nm. The per-device values recoverable from the chart are listed below.]

Device          Clock      Gates
XC2000          50 MHz     1K
XC3000          85 MHz     7.5K
XC5200          50 MHz     23K
XC4000          100 MHz    250K
Spartan         80 MHz     40K
Spartan-II      200 MHz    200K
Spartan-3       326 MHz    5M
Virtex          200 MHz    1M
Virtex-E        240 MHz    4M
Virtex-II       450 MHz    8M
Virtex-II Pro   450 MHz    8M*
Virtex-4        500 MHz    16M*
Virtex-5        550 MHz    24M*

Xilinx FPGA devices


All Xilinx FPGAs contain the same basic resources:
Logic cells (LCs), grouped into slices, which are grouped into Configurable Logic Blocks (CLBs)
  Contain combinatorial logic and register resources
I/O blocks
  Interface between the FPGA and the outside world
Programmable interconnect
Other resources
  Memory
  Multipliers
  Global clock buffers
  Boundary scan logic

Xilinx logic cell (LC)


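[Figure: Xilinx logic cell. A 4-input LUT (inputs a, b, c, d; output y), which can also act as a 16 x 1 RAM or a 16-bit shift register, together with a multiplexer and a flip-flop that has clock, clock enable and set/reset inputs]

The core building block in a modern FPGA from Xilinx is called a logic cell (LC).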

Logic Cell
The register can be configured to act as a flip-flop or as a latch. The polarity of the clock (rising-edge triggered or falling-edge triggered) can be configured, as can the polarity of the clock enable and set/reset signals (active-high or active-low). In addition to the LUT, MUX, and register, the LC also contains other elements, including some special fast carry logic for use in arithmetic operations.
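To make the roles of the LUT and the register concrete, here is a deliberately simplified behavioural sketch of a logic cell. Several things are assumptions of the sketch rather than statements about the real device: the class and method names, the omission of the multiplexer and carry logic, active-high control signals, and a synchronous reset that takes priority over the clock enable.

```python
class LogicCell:
    """Simplified model: a 4-input LUT feeding a register with CE and reset."""

    def __init__(self, lut_bits):
        assert len(lut_bits) == 16
        self.lut = list(lut_bits)
        self.q = 0  # register contents

    def lut_out(self, a, b, c, d):
        """Combinational output: the four inputs index the 16 LUT cells."""
        return self.lut[(a << 3) | (b << 2) | (c << 1) | d]

    def clock_edge(self, a, b, c, d, ce=1, sr=0):
        """One active clock edge (assumed active-high CE and synchronous reset)."""
        if sr:
            self.q = 0
        elif ce:
            self.q = self.lut_out(a, b, c, d)
        return self.q

# Hypothetical configuration: LUT programmed as a 4-input AND gate.
cell = LogicCell([0] * 15 + [1])
print(cell.clock_edge(1, 1, 1, 1))
```

With the clock enable low the register simply holds its previous value, which is how one clock can be shared by registers that update at different rates.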

The Slice

A slice contains two LCs. Each logic cell's LUT, MUX, and register have their own data inputs and outputs, while the slice has one set of clock, clock enable, and set/reset signals common to both logic cells.

Configurable Logic Block (CLB)


Xilinx FPGAs can have two or four slices in each CLB. There is also some fast programmable interconnect within the CLB, which is used to connect neighboring slices.

Why the hierarchy?


The reason for having this type of logic-block hierarchy, from LC to slice (with two LCs) to CLB (with four slices), is that it is complemented by an equivalent hierarchy in the interconnect. Thus, there is fast interconnect between the LCs in a slice, then slightly slower interconnect between slices in a CLB, followed by the interconnect between CLBs. The aim is to achieve the best trade-off between making it easy to connect things together and avoiding excessive interconnect-related delays.

Fast carry chains


A key feature of modern FPGAs is that they include the special logic and interconnect required to implement fast carry chains. Each LC contains special carry logic. This is complemented by dedicated interconnect between the two LCs in each slice, between the slices in each CLB, and between the CLBs themselves. This special carry logic and dedicated routing boost the performance of logical functions such as counters and arithmetic functions such as adders. The availability of these fast carry chains, in conjunction with features like the shift-register use of LUTs and embedded multipliers, is useful when FPGAs are to be used for applications like DSP.
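The sketch below shows, in software, one common way an adder can be split between the LUT and the carry logic: the LUT computes a per-bit "propagate" signal, and the carry passes from bit to bit through a multiplexer. This split is an assumption used for illustration, and the function name add_with_carry_chain is made up; the slides do not specify the internal carry circuit.

```python
def add_with_carry_chain(x, y, width):
    """Add two unsigned numbers bit by bit, returning (sum, carry_out)."""
    carry = 0
    result = 0
    for k in range(width):
        a = (x >> k) & 1
        b = (y >> k) & 1
        propagate = a ^ b                   # computed in the LUT
        sum_bit = propagate ^ carry         # dedicated XOR on the carry path
        carry = carry if propagate else a   # carry multiplexer: pass or generate
        result |= sum_bit << k
    return result, carry

print(add_with_carry_chain(5, 3, 4))
```

Because the carry only passes through one multiplexer per bit on dedicated routing, the delay per bit is far smaller than if each carry had to travel through general-purpose interconnect.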

Embedded RAM
A lot of applications require the use of memory, so FPGAs may include relatively large chunks of embedded RAM called block RAM. Depending on the architecture of the component, these blocks might be positioned around the periphery of the device, scattered across the face of the chip in relative isolation, or organized in columns. Each block of RAM can be used independently, or multiple blocks can be combined to implement larger blocks. These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, first-in first-out (FIFO) functions, and state machines.

Embedded multipliers, adders, MACs

Some functions, like multipliers, are inherently slow if they are implemented by connecting a large number of programmable logic blocks together. Since these functions are required by a lot of applications, many FPGAs incorporate special hardwired multiplier blocks. These are typically located in close proximity to the embedded RAM blocks because these functions are often used in conjunction with each other. Similarly, some FPGAs offer dedicated adder blocks. One operation that is very common in DSP-type applications is called a multiply-and-accumulate (MAC). As its name suggests, this function multiplies two numbers together and adds the result to a running total stored in an accumulator.
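In software terms, a MAC is the inner step of a dot product. The following sketch (function name mac and the sample values are illustrative) applies one multiply-and-accumulate per pair of operands, as a dedicated MAC block would do once per clock cycle.

```python
def mac(samples, coeffs):
    """Multiply each sample by its coefficient and accumulate the products."""
    acc = 0                      # the accumulator holding the running total
    for x, h in zip(samples, coeffs):
        acc += x * h             # one multiply-and-accumulate step
    return acc

print(mac([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

A filter with N coefficients needs N such steps per output sample, which is why hardwired multiplier and MAC blocks matter for DSP throughput.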

Embedded processor cores


Some functions, such as reading switch positions and flashing light-emitting diodes (LEDs), require low-speed counters. Slowing the hardware down to implement this sort of function (using huge counters to generate delays, for example) is often impracticable. Thus, it is often better to implement these tasks with a microprocessor. High-end FPGAs may contain one or more embedded microprocessors, which are typically referred to as microprocessor cores. In this case, it often makes sense to move all of the tasks that used to be performed by an external microprocessor into the internal core. This provides a number of advantages:
saves the cost of having two devices
eliminates large numbers of tracks, pads, and pins on the circuit board
makes the board smaller and lighter

Types of microprocessor cores


There are two types of microprocessor cores:
Hard microprocessor core: implemented as a dedicated, predefined block.
Soft microprocessor core: a group of programmable logic blocks configured to act as a microprocessor. These are typically called soft cores, but they may be more precisely categorized as either soft or firm, depending on the way in which the microprocessor's functionality is mapped onto the logic blocks.

Clock trees
All of the synchronous elements inside an FPGA (for example, the registers configured to act as flip-flops inside the programmable logic blocks) need to be driven by a clock signal. Such a clock signal typically originates in the outside world, comes into the FPGA via a special clock input pin, and is then routed through the device and connected to the appropriate registers.

Clock managers

Some FPGA clock managers are based on phase-locked loops (PLLs), while others are based on digital delay-locked loops (DLLs).

Clock manager functions
Jitter removal
Skew correction
Digital frequency synthesis and phase shifting

General-purpose I/O
The general-purpose I/O pins are grouped into banks, and each bank can be configured individually to support a particular I/O standard. This allows the FPGA to work with devices using multiple I/O standards. The FPGA can also be used to interface between different I/O standards (and to translate between different protocols that may be based on particular electrical standards).

Configurable I/O impedances


Modern FPGA output signals with fast edge rates require termination to prevent reflections and maintain signal integrity. High pin count packages (especially ball grid arrays) cannot accommodate external termination resistors, so a Digitally Controlled Impedance (DCI) circuit is employed. DCI eliminates the need for external resistors and improves signal integrity. The DCI feature can be used on any IOB by selecting one of the DCI I/O standards. When applied to inputs, DCI provides input parallel termination. When applied to outputs, DCI provides controlled impedance drivers (series termination) or output parallel termination. DCI operates independently on each I/O bank.

Core versus I/O supply voltages


Over time, the geometries of the structures on silicon chips became smaller because smaller transistors have lower costs, higher speed, and lower power consumption. However, these processes demanded lower supply voltages, which have continued to fall over the years. This supply (which is actually provided using large numbers of power and ground pins) is used to power the FPGA's internal logic. For this reason, it is known as the core voltage. However, different I/O standards may use signals with voltage levels significantly different from the core voltage, so each bank of general-purpose I/Os can have its own additional supply pins.

Core voltages

Gigabit transceivers
The traditional way to move large amounts of data between devices is to use a bus, a collection of signals that carry similar data and perform a common function. Buses grew to 16 bits in width, then 32 bits, then 64 bits, and so forth. The problem is that this requires a lot of pins on the device and a lot of tracks connecting the devices together. Routing these tracks so that they all have the same length and impedance becomes increasingly difficult as boards grow in complexity. Furthermore, it becomes increasingly difficult to manage signal integrity issues (such as susceptibility to noise) when dealing with large numbers of bus-based tracks.

Today's high-end FPGAs include special hard-wired gigabit transceiver blocks. These blocks use one pair of differential signals (a pair of signals that always carry opposite logical values) to transmit (TX) data and another pair to receive (RX) data.

Interconnect and routing


A programmable switch matrix (PSM) forms the heart of the interconnect in an FPGA.

[Figure: grid of CLBs with a programmable switch matrix (PSM) at each intersection of the routing channels between them]

The Switch
The actual switching matrix employs a structure of six pass transistors per cross point. Connectivity is established by controlling which of these transistors are turned on.
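One way to see where the figure of six comes from, assuming each cross point joins four wire ends (one arriving from each direction): a transistor between every pair of ends gives 4-choose-2 = 6 transistors. The sketch below enumerates those pairs and checks which ends become connected for a given set of enabled transistors. The names ENDS, PASS_TRANSISTORS and connected are illustrative.

```python
from itertools import combinations

ENDS = ("north", "east", "south", "west")

# One pass transistor for every unordered pair of wire ends.
PASS_TRANSISTORS = [frozenset(pair) for pair in combinations(ENDS, 2)]

def connected(enabled, start, goal):
    """True if start and goal are joined through the enabled transistors."""
    reached, frontier = {start}, [start]
    while frontier:
        end = frontier.pop()
        for switch in enabled:
            if end in switch:
                (other,) = switch - {end}
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return goal in reached

print(len(PASS_TRANSISTORS))
```

Enabling a single transistor makes a straight-through or corner connection; enabling several at once fans one signal out to more than one direction.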

Various types of connections

Single lines: used to connect a CLB to another CLB that is one hop away. These wires have to go through a programmable switch, which adds delay.
Double lines: these wires travel past two CLBs before reaching a switch, so they provide shorter delays for longer connections.
Long lines: these wires do not go through any programmable switch at all; instead they travel all the way across or down a row or column and are driven by three-state drivers near the CLB.
Direct connect lines: CLB outputs that are directly connected to the CLBs immediately below and to the right.
Global clock lines: these lines are optimized for use as clock inputs to the CLB, providing short delay and minimal skew.


FPGA II
R.B.Ghongade

Spartan-II FPGA Family Features


Second generation ASIC replacement technology
- Densities as high as 5,292 logic cells with up to 200,000 system gates
- Streamlined features based on Virtex architecture
- Unlimited re-programmability
- Very low cost
- 0.18 micron process
System level features
- SelectRAM+ hierarchical memory:
  - 16 bits/LUT distributed RAM
  - Configurable 4K bit block RAM
  - Fast interfaces to external RAM
- Fully PCI compliant
- Low-power segmented routing architecture

Spartan-II FPGA Family Features


Full readback ability for verification/observability
Dedicated carry logic for high-speed arithmetic
Efficient multiplier support
Cascade chain for wide-input functions
Abundant registers/latches with enable, set, reset
Four dedicated DLLs for advanced clock control
Four primary low-skew global clock distribution nets
IEEE 1149.1 compatible boundary scan logic
Versatile I/O and packaging
- Low-cost packages available in all densities
- Family footprint compatibility in common packages
- 16 high-performance interface standards
- Hot swap Compact PCI friendly
- Zero hold time simplifies system timing

Spartan II family
Device    Logic   System Gates      CLB Array  Total   Maximum Available  Total Distributed  Total Block
          Cells   (Logic and RAM)   (R x C)    CLBs    User I/O           RAM Bits           RAM Bits
XC2S15    432     15,000            8 x 12     96      86                 6,144              16K
XC2S30    972     30,000            12 x 18    216     92                 13,824             24K
XC2S50    1,728   50,000            16 x 24    384     176                24,576             32K
XC2S100   2,700   100,000           20 x 30    600     176                38,400             40K
XC2S150   3,888   150,000           24 x 36    864     260                55,296             48K
XC2S200   5,292   200,000           28 x 42    1,176   284                75,264             56K
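The columns of the family table are related by simple arithmetic, which the following sketch checks. The 16 bits per LUT comes from the feature list; the figures of four LUTs per CLB and 4.5 logic cells per CLB are assumptions inferred from the table values, and the names `SPARTAN_II` and `derive` are hypothetical.

```python
# Rows copied from the Spartan-II family table:
# device: (logic cells, CLB rows, CLB columns, total CLBs, distributed RAM bits)
SPARTAN_II = {
    "XC2S15": (432, 8, 12, 96, 6_144),
    "XC2S30": (972, 12, 18, 216, 13_824),
    "XC2S50": (1_728, 16, 24, 384, 24_576),
    "XC2S100": (2_700, 20, 30, 600, 38_400),
    "XC2S150": (3_888, 24, 36, 864, 55_296),
    "XC2S200": (5_292, 28, 42, 1_176, 75_264),
}

BITS_PER_LUT = 16          # "16 bits/LUT distributed RAM" in the feature list
LUTS_PER_CLB = 4           # assumption: two slices per CLB, two LUTs per slice
LOGIC_CELLS_PER_CLB = 4.5  # assumption: ratio implied by the table itself


def derive(rows, cols):
    """Derive (total CLBs, logic cells, distributed RAM bits) from the array size."""
    clbs = rows * cols
    logic_cells = int(clbs * LOGIC_CELLS_PER_CLB)
    ram_bits = clbs * LUTS_PER_CLB * BITS_PER_LUT
    return clbs, logic_cells, ram_bits


for device, (cells, rows, cols, clbs, ram_bits) in SPARTAN_II.items():
    # Every row of the table agrees with the derived values.
    assert derive(rows, cols) == (clbs, cells, ram_bits), device
    print(device, derive(rows, cols))
```

All six rows satisfy the relations, which makes the sketch a quick way to sanity-check a transcribed table.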

Available packages
Available User I/O According to Package Type
Device    Maximum    VQ100/   TQ144/   CS144/   PQ208/   FG256/   FG456/
          User I/O   VQG100   TQG144   CSG144   PQG208   FGG256   FGG456
XC2S15    86         60       86       -        -        -        -
XC2S30    92         60       92       92       -        -        -
XC2S50    176        -        92       -        140      176      -
XC2S100   176        -        92       -        140      176      -
XC2S150   260        -        -        -        140      176      260
XC2S200   284        -        -        -        140      176      284

Spartan II FPGA architecture

Slice

BUFT
Each Spartan-II CLB contains two 3-state drivers (BUFTs) that can drive on-chip busses. Each Spartan-II BUFT has an independent 3-state control pin and an independent input pin.

Block RAM
Spartan-II FPGAs incorporate several large block RAM memories. These complement the distributed RAM Look-Up Tables (LUTs) that provide shallow memory structures implemented in CLBs. Block RAM memory blocks are organized in columns. All Spartan-II devices contain two such columns, one along each vertical edge. These columns extend the full height of the chip. Each memory block is four CLBs high, and consequently, a Spartan-II device eight CLBs high will contain two memory blocks per column, and a total of four blocks.
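The block count follows directly from the CLB array height. The sketch below applies the rule stated above (two columns, each block four CLBs high) together with the 4K-bit block size from the family feature list; the function name `block_ram` is hypothetical.

```python
BLOCK_RAM_COLUMNS = 2    # one column along each vertical edge
CLBS_PER_BLOCK = 4       # each memory block is four CLBs high
BITS_PER_BLOCK = 4096    # "Configurable 4K bit block RAM"


def block_ram(clb_rows):
    """Return (number of blocks, total block RAM bits) for a device height in CLBs."""
    blocks = BLOCK_RAM_COLUMNS * (clb_rows // CLBS_PER_BLOCK)
    return blocks, blocks * BITS_PER_BLOCK


# CLB array heights of the six family members, XC2S15 through XC2S200.
for rows in (8, 12, 16, 20, 24, 28):
    blocks, bits = block_ram(rows)
    print(f"{rows} CLBs high: {blocks} blocks, {bits // 1024}K bits")
```

For the eight-row device this gives four blocks and 16K bits, matching both the text above and the Total Block RAM Bits column of the family table.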

Programmable Routing Matrix


Five levels of hierarchy are used for routing in the Spartan-II family:
Local
General purpose
I/O
Dedicated
Global

Local Routing
Local routing resources provide the following three types of connections:
Interconnections among the LUTs, flip-flops, and General Routing Matrix (GRM)
Internal CLB feedback paths that provide high-speed connections to LUTs within the same CLB, chaining them together with minimal routing delay
Direct paths that provide high-speed connections between horizontally adjacent CLBs, eliminating the delay of the GRM

General Purpose Routing


Most Spartan-II signals are routed on the general purpose routing, and consequently, the majority of interconnect resources are associated with this level of the routing hierarchy. The general routing resources are located in horizontal and vertical routing channels associated with the rows and columns of CLBs. The general-purpose routing resources are listed below.
Adjacent to each CLB is a General Routing Matrix (GRM). The GRM is the switch matrix through which horizontal and vertical routing resources connect, and is also the means by which the CLB gains access to the general purpose routing.
24 single-length lines route GRM signals to adjacent GRMs in each of the four directions.
96 buffered Hex lines route GRM signals to other GRMs six blocks away in each one of the four directions. Organized in a staggered pattern, Hex lines may be driven only at their endpoints. Hex-line signals can be accessed either at the endpoints or at the midpoint (three blocks from the source). One third of the Hex lines are bidirectional, while the remaining ones are unidirectional.
12 Long lines are buffered, bidirectional wires that distribute signals across the device quickly and efficiently. Vertical Long lines span the full height of the device, and horizontal ones span the full width of the device.

I/O Routing
Spartan-II devices have additional routing resources around their periphery that form an interface between the CLB array and the IOBs. This additional routing, called the VersaRing, facilitates pin-swapping and pin-locking, such that logic redesigns can adapt to existing PCB layouts. Time-to-market is reduced, since PCBs and other system components can be manufactured while the logic design is still in progress.

Dedicated Routing
Some classes of signal require dedicated routing resources to maximize performance. In the Spartan-II architecture, dedicated routing resources are provided for two classes of signal.
Horizontal routing resources are provided for on-chip 3-state busses. Four partitionable bus lines are provided per CLB row, permitting multiple busses within a row.
Two dedicated nets per CLB propagate carry signals vertically to the adjacent CLB.

Global Routing
Global routing resources distribute clocks and other signals with very high fanout throughout the device. Spartan-II devices include two tiers of global routing resources, referred to as primary and secondary global routing resources.
The primary global routing resources are four dedicated global nets with dedicated input pins that are designed to distribute high-fanout clock signals with minimal skew. Each global clock net can drive all CLB, IOB, and block RAM clock pins. The primary global nets may only be driven by global buffers. There are four global buffers, one for each global net.
The secondary global routing resources consist of 24 backbone lines, 12 across the top of the chip and 12 across the bottom. From these lines, up to 12 unique signals per column can be distributed via the 12 long lines in the column. These secondary resources are more flexible than the primary resources since they are not restricted to routing only to clock pins.

Spartan II clock distribution scheme

Input/Output Block

I/O Banking

Boundary Scan

Spartan-II devices support all the mandatory boundary-scan instructions specified in IEEE Standard 1149.1. A Test Access Port (TAP) and registers are provided that implement the EXTEST, SAMPLE/PRELOAD, and BYPASS instructions.

Virtex IV family
Contain the same basic resources
Slices (grouped into CLBs)
- Contain combinatorial logic and register resources
IOBs
- Interface between the FPGA and the outside world
Programmable interconnect
Other resources
- Memory
- Multipliers
- Global clock buffers
- Boundary scan logic

Overview of Virtex IV
The Virtex-4 Family is a new generation FPGA from Xilinx. The innovative Advanced Silicon Modular Block (ASMBL) column-based architecture is unique in the programmable logic industry. The Virtex-4 family comprises three platforms: LX, FX, and SX. A wide array of hard-IP core blocks completes the system solution. These cores include the PowerPC processors, Tri-Mode Ethernet MACs, 622 Mb/s to 10+ Gb/s serial transceivers, dedicated DSP slices, high-speed clock management circuitry, and source-synchronous interface blocks. The basic Virtex-4 building blocks are an enhancement of those found in the popular Virtex-based product families (Virtex, Virtex-E, Virtex-II, Virtex-II Pro, and Virtex-II Pro X), allowing upward compatibility of existing designs. Virtex-4 devices are produced on a 90-nm copper process, using 300 mm (12 inch) wafer technology.

Features of Virtex IV family


Three families: LX/SX/FX
- Virtex-4 LX: High-performance logic applications solution
- Virtex-4 FX: High-performance, full-featured solution for embedded platform applications
- Virtex-4 SX: High-performance solution for Digital Signal Processing (DSP) applications
Xesium Clock Technology
- Digital Clock Manager (DCM) blocks
- Additional Phase-Matched Clock Dividers (PMCD)
- Differential Global Clocks
XtremeDSP Slice
- 18x18, two's complement, signed multiplier
- Optional pipeline stages
- Built-in accumulator (48 bits) and adder/subtracter

Features of Virtex IV family


Smart RAM Memory Hierarchy
- Distributed RAM
- Dual-port 18-Kbit RAM blocks
  - Optional pipeline stages
  - Optional programmable FIFO logic: automatically remaps RAM signals as FIFO signals
- High-speed memory interface support: DDR and DDR-2 SDRAM, QDR-II, and RLDRAM-II
SelectIO Technology
- 1.5 to 3.3 V I/O operation
- Built-in ChipSync source-synchronous technology
- Digitally controlled impedance (DCI) active termination
- Fine-grained I/O banking (configuration in one bank)
Flexible Logic Resources
Secure Chip AES Bitstream Encryption
90-nm copper CMOS process
1.2V core voltage
Flip-Chip Packaging
RocketIO 622 Mb/s to 10+ Gb/s Multi-Gigabit Transceivers (MGT) (FX only)
IBM PowerPC RISC Processor Core (FX only)
- PowerPC 405 (PPC405) Core
- Auxiliary Processor Unit Interface (User Coprocessor)
Multiple Tri-Mode Ethernet MACs (FX only)

Virtex Architecture
[Figure: Virtex architecture, showing the block SelectRAM resource, I/O Blocks (IOBs), programmable interconnect, dedicated multipliers, Configurable Logic Blocks (CLBs), and clock management (DCMs, BUFGMUXes)]

Slices and CLBs

Simplified Slice Structure


Each slice has four outputs
- Two registered outputs, two non-registered outputs
Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
Carry logic runs vertically, up only
- Two independent carry chains per CLB
[Figure: simplified structure of Slice 0, showing the LUTs, the carry logic, and two flip-flops with D, CE, PRE, CLR, and Q pins]

Detailed Slice Structure

SLICEM & SLICEL


The elements common to both slice pairs (SLICEM and SLICEL) are two logic-function generators (or look-up tables), two storage elements, wide-function multiplexers, carry logic, and arithmetic gates. These elements are used by both SLICEM and SLICEL to provide logic, arithmetic, and ROM functions. SLICEM supports two additional functions:
- storing data using distributed RAM
- shifting data with 16-bit registers

SLICEM represents a superset of elements and connections found in all slices.

Logic Resources in One CLB


[Table: logic resources in one CLB, with columns Slices, LUTs, Flip-Flops, MULT_ANDs, Arithmetic & Carry-Chains, Distributed RAM (64 bits), and Shift Registers (64 bits)]

MULT_AND gate

[Figure: one multiplier bit cell. Two LUTs and a MULT_AND gate form the partial product A x B; the LUT output drives the S input of CY_MUX, whose other pins are DI, CI, and CO, and CY_XOR produces the sum]

A new feature introduced in the Virtex family is the MULT_AND gate for efficient multiply-and-add implementation. Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition. The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit.
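A behavioural sketch may help show why one LUT per bit is enough. It assumes the usual arrangement in which the LUT computes the partial-product bit XORed with the running sum, while the MULT_AND gate supplies the same partial-product bit to the DI input of the carry multiplexer. The function names are hypothetical, and the model is an unsigned shift-and-add multiplier rather than a description of any vendor netlist.

```python
def mult_and_row(partial, a, b_bit, width):
    """Add (a if b_bit else 0) to partial using one LUT, MULT_AND, and carry cell per bit."""
    result, ci = 0, 0
    for i in range(width):
        a_bit = (a >> i) & 1
        p_bit = (partial >> i) & 1
        product = a_bit & b_bit      # MULT_AND output, wired to DI
        s = product ^ p_bit          # LUT output, wired to S
        result |= (s ^ ci) << i      # CY_XOR forms the sum bit
        ci = ci if s else product    # CY_MUX: CO = CI when S = 1, else DI
    return result | (ci << width)    # carry out becomes the top bit of the row


def multiply(a, b, width):
    """Unsigned width x width multiply built from successive mult_and_row passes."""
    acc = 0
    for j in range(width):
        row = mult_and_row(acc >> j, a, (b >> j) & 1, width)
        acc = (acc & ((1 << j) - 1)) | (row << j)
    return acc


print(multiply(13, 11, 4))
```

Without the MULT_AND gate, the AND for DI would need a second LUT per bit, which is the saving the paragraph above describes.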

Look-Up Table (LUT)


Virtex-4 function generators are implemented as 4-input look-up tables (LUTs). There are four independent inputs for each of the two function generators in a slice (F and G). The function generators are capable of implementing any arbitrarily defined four-input Boolean function. The propagation delay through a LUT is independent of the function implemented. Signals from the function generators can:
- exit the slice (through the X or Y output)
- enter the dedicated XOR gate
- enter the select line of the carry-logic multiplexer
- feed the D input of the storage element
- go to the MUXF5
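A 4-input LUT can be thought of as a 16-bit truth table indexed by its inputs, which is also why the delay does not depend on the function. The sketch below illustrates that idea; the function names and the choice of the first input as the least significant index bit are assumptions made for the example.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    table = 0
    for index in range(16):
        a1, a2, a3, a4 = ((index >> bit) & 1 for bit in range(4))
        if func(a1, a2, a3, a4):
            table |= 1 << index
    return table


def lut4_read(table, a1, a2, a3, a4):
    """Look up one output bit; the same operation whatever function is stored."""
    index = a1 | (a2 << 1) | (a3 << 2) | (a4 << 3)
    return (table >> index) & 1


# Two very different functions use exactly the same 16-bit storage.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)

print(f"{xor4:#06x}", f"{and4:#06x}")
print(lut4_read(xor4, 1, 0, 0, 0), lut4_read(and4, 1, 1, 1, 1))
```

Programming the FPGA amounts to loading such tables; evaluating the function is then a single memory read.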

Connecting LUTs
MUXF5 combines the two LUTs in each slice
MUXF6 combines slices S0 and S1
MUXF6 combines slices S2 and S3
MUXF7 combines the two MUXF6 outputs
MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
[Figure: CLB with slices S0 to S3, showing the F5, F6, F7, and F8 multiplexers]
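Each multiplexer level adds one input to the function that can be built from 4-input LUTs. The following sketch models that tree under stated assumptions: LUT k holds truth-table bits 16k to 16k+15, and input bits 4, 5, 6, and 7 drive the MUXF5, MUXF6, MUXF7, and MUXF8 select lines in that order. The names `mux_tree` and `evaluate` are hypothetical.

```python
def mux_tree(lut_outputs, select_bits):
    """Reduce pairs of signals level by level: MUXF5 first, then MUXF6, MUXF7, MUXF8."""
    level = list(lut_outputs)
    for sel in select_bits:
        level = [level[i + 1] if sel else level[i] for i in range(0, len(level), 2)]
    return level[0]


def evaluate(tables, inputs):
    """Evaluate a function of 4 + log2(len(tables)) inputs stored in 16-bit LUT tables."""
    low4 = inputs & 0xF
    lut_outputs = [(table >> low4) & 1 for table in tables]
    levels = len(tables).bit_length() - 1      # 1 for MUXF5 only, 4 up to MUXF8
    select_bits = [(inputs >> (4 + k)) & 1 for k in range(levels)]
    return mux_tree(lut_outputs, select_bits)


# Example: 8-input parity spread over 16 LUTs (two CLBs joined by MUXF8).
PARITY4, NOT_PARITY4 = 0x6996, 0x9669
tables = [NOT_PARITY4 if bin(k).count("1") % 2 else PARITY4 for k in range(16)]
print(evaluate(tables, 0b1011_0001))
```

With two tables the tree reduces to a single MUXF5 and a 5-input function; with sixteen tables it spans two CLBs and gives an 8-input function.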

Fast Carry Logic


Simple, fast, and complete arithmetic logic
- Dedicated XOR gate for single-level sum completion
- Uses dedicated routing resources
- All synthesis tools can infer carry logic
[Figure: the two carry chains in a CLB, one through slices S0 and S1 and one through slices S2 and S3, each running from CIN to COUT; each COUT feeds the CIN of the corresponding slice (S0 or S2) of the next CLB]
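The way a LUT, the carry multiplexer, and the dedicated XOR gate cooperate in an adder can be sketched as follows. The signal names S, DI, CI, and CO are used as assumed labels for the carry multiplexer pins, and `carry_chain_add` is a hypothetical behavioural model, not a timing-accurate one.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Ripple through one LUT + carry mux + XOR cell per bit; return (sum, carry out)."""
    total, ci = 0, carry_in
    for bit in range(width):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        s = a_bit ^ b_bit              # LUT output: the "propagate" signal
        di = a_bit                     # value to pass on when the carry is not propagated
        total |= (s ^ ci) << bit       # dedicated XOR completes the sum in one level
        ci = ci if s else di           # carry mux: CO = CI when S = 1, else DI
    return total, ci


print(carry_chain_add(5, 3, 4))
print(carry_chain_add(15, 1, 4))
```

Only the propagate signal passes through a LUT; the carry itself stays on the dedicated vertical path, which is what makes the chain fast.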

Flexible Sequential Elements


Either flip-flops or latches
Two in each slice; eight in each CLB
Inputs come from LUTs or from an independent CLB input
Separate set and reset controls
- Can be synchronous or asynchronous
All controls are shared within a slice
Control signals can be inverted locally within a slice
[Figure: storage element primitives FDRSE (D, CE, R, S, Q), FDCPE (D, PRE, CE, CLR, Q), and LDCPE (D, PRE, CE, G, CLR, Q)]

Shift Register LUT (SRL16CE)


Dynamically addressable serial shift registers
- Maximum delay of 16 clock cycles per LUT (128 per CLB)
- Cascadable to other LUTs or CLBs for longer shift registers
  - Dedicated connection from Q15 to D input of the next SRL16CE
- Shift register length can be changed asynchronously by toggling address A
[Figure: SRL16CE built from a LUT. Inputs D, CE, and CLK feed a chain of D flip-flops with clock enable; address A[3:0] selects the output tap, and Q15 is the cascade output]
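The behaviour listed above can be sketched with a small model. The class below is a hypothetical illustration: the 16 bits of LUT storage act as a shift register, address A picks the tap asynchronously, and Q15 is always the last stage.

```python
class SRL16CE:
    """Behavioural model of a 16-bit shift register held in one LUT."""

    def __init__(self, init=0):
        self.bits = init & 0xFFFF      # bit 0 holds the most recently shifted-in value

    def clock(self, d, ce=1):
        """Rising clock edge: shift in d when the clock enable is high."""
        if ce:
            self.bits = ((self.bits << 1) | (d & 1)) & 0xFFFF

    def q(self, addr):
        """Asynchronous tap: the value shifted in addr + 1 clock cycles ago."""
        return (self.bits >> (addr & 0xF)) & 1

    @property
    def q15(self):
        """Cascade output feeding the D input of the next SRL16CE."""
        return (self.bits >> 15) & 1


srl = SRL16CE()
srl.clock(1)                 # shift in a single 1 ...
for _ in range(4):
    srl.clock(0)             # ... followed by four 0s
print(srl.q(4), srl.q(3))    # the 1 is now at tap 4, not tap 3
```

Changing the address changes the delay without any clock edge, which is what "dynamically addressable" refers to.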

IOB Element
Input path
- Two DDR registers
Output path
- Two DDR registers
- Two 3-state enable DDR registers
Separate clocks and clock enables for I and O
Set and reset signals are shared
[Figure: IOB with register pairs on the input, output, and 3-state paths, DDR MUXes on the output and 3-state paths, input clocks ICK1 and ICK2, output clocks OCK1 and OCK2, and the PAD]

SelectIO Standard
Allows direct connections to external signals of varied voltages and thresholds
- Optimizes the speed/noise tradeoff
- Saves having to place interface components onto your board
Differential signaling standards
- LVDS, BLVDS, ULVDS
- LDT
- LVPECL
Single-ended I/O standards
- LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
- PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
- GTL, GTLP

Digital Controlled Impedance (DCI)


DCI provides
- Output drivers that match the impedance of the traces
- On-chip termination for receivers and transmitters
DCI advantages
- Improves signal integrity by eliminating stub reflections
- Reduces board routing complexity and component count by eliminating external resistors
- Eliminates the effects of temperature, voltage, and process variations by using an internal feedback circuit

Distributed SelectRAM Resources


Uses a LUT in a slice as memory
Synchronous write
Asynchronous read
- Accompanying flip-flops can be used to create synchronous read
RAM and ROM are initialized during configuration
- Data can be written to RAM after configuration
Emulated dual-port RAM
- One read/write port
- One read-only port
[Figure: distributed RAM primitives RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO), each mapped onto the LUTs of a slice]
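The synchronous-write, asynchronous-read behaviour of the emulated dual-port RAM can be sketched as below. The class is a hypothetical behavioural model named after the RAM16X1D primitive; the pin names follow the figure labels, and the method names are assumptions.

```python
class RAM16X1D:
    """Behavioural model of a 16 x 1 distributed RAM with one extra read-only port."""

    def __init__(self, init=0):
        self.mem = init & 0xFFFF            # contents loaded during configuration

    def clock(self, d, we, a):
        """Rising WCLK edge: write bit d at address a only when WE is high."""
        a &= 0xF
        if we:
            self.mem = (self.mem & ~(1 << a)) | ((d & 1) << a)

    def spo(self, a):
        """Read/write port output: asynchronous read at address A."""
        return (self.mem >> (a & 0xF)) & 1

    def dpo(self, dpra):
        """Read-only port output: asynchronous read at address DPRA."""
        return (self.mem >> (dpra & 0xF)) & 1


ram = RAM16X1D(init=0x0001)
ram.clock(d=1, we=1, a=5)      # write a 1 to address 5
ram.clock(d=1, we=0, a=6)      # WE low: address 6 is left unchanged
print(ram.spo(5), ram.dpo(0), ram.dpo(6))
```

Both read outputs change as soon as their address changes; registering them in the accompanying flip-flops would give the synchronous read mentioned in the list.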

Block SelectRAM Resources


Up to 3.5 Mb of RAM in 18-kb blocks
- Synchronous read and write
True dual-port memory
- Each port has synchronous read and write capability
- Different clocks for each port
Supports initial values
Synchronous reset on output latches
Supports parity bits
- One parity bit per eight data bits
[Figure: 18-kb block SelectRAM memory. Port A: DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA. Port B: DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB]
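The 18-kb figure follows from the parity rule: one parity bit per eight data bits means 16 kb of data plus 2 kb of parity. The sketch below works this out and derives the depth for a given port width. The list of port widths is an assumption added for illustration (the slide does not list them), and the function names are hypothetical.

```python
BLOCK_BITS = 18 * 1024                 # one block SelectRAM
DATA_BITS = BLOCK_BITS * 8 // 9        # eight data bits out of every nine
PARITY_BITS = BLOCK_BITS - DATA_BITS   # one parity bit per eight data bits


def data_width(port_width):
    """Data bits in a port of the given total width (widths of 9 and up carry parity)."""
    return port_width - port_width // 9


def depth(port_width):
    """Number of addressable words for a given port width."""
    return DATA_BITS // data_width(port_width)


print(DATA_BITS, PARITY_BITS)
# Assumed port widths, for illustration only.
for width in (1, 2, 4, 9, 18, 36):
    print(f"{depth(width)} x {width}")
```

Narrow ports have no room for parity, so only the data portion of the block is addressable in those configurations.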

Dedicated Multiplier Blocks


18-bit two's complement signed operation
Optimized to implement Multiply and Accumulate functions
Multipliers are physically located next to block SelectRAM memory
[Figure: 18 x 18 multiplier with 18-bit inputs Data_A and Data_B and a 36-bit output; supports 4 x 4, 8 x 8, 12 x 12, and 18 x 18 signed multiplication]
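A short sketch of the arithmetic may make the port widths concrete: two 18-bit two's complement operands always produce a product that fits in 36 bits. The function names `to_signed` and `mult18x18` are hypothetical, and the model ignores pipelining and timing.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult18x18(data_a, data_b):
    """Multiply two 18-bit two's complement operands; return the 36-bit result pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


result = mult18x18(0x3FFFF, 2)               # 0x3FFFF is -1 in 18 bits
print(hex(result), to_signed(result, 36))
```

Narrower signed operands, such as 8 x 8, can use the same block once they are sign-extended to 18 bits.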

Global Clock Routing Resources


Sixteen dedicated global clock multiplexers
- Eight on the top-center of the die, eight on the bottom-center
- Driven by a clock input pad, a DCM, or local routing
Global clock multiplexers provide the following:
- Traditional clock buffer (BUFG) function
- Global clock enable capability (BUFGCE)
- Glitch-free switching between clock signals (BUFGMUX)
Up to eight clock nets can be used in each clock region of the device
- Each device contains four or more clock regions

Digital Clock Manager (DCM)


Up to twelve DCMs per device
- Located on the top and bottom edges of the die
- Driven by clock input pads
DCMs provide the following:
- Delay-Locked Loop (DLL)
- Digital Frequency Synthesizer (DFS)
- Digital Phase Shifter (DPS)
Up to four outputs of each DCM can drive onto global clock buffers
- All DCM outputs can drive general routing

Clock Regions

I/O Tile

SelectIO Resources
All Virtex-4 FPGAs have configurable high-performance SelectIO drivers and receivers, supporting a wide variety of standard interfaces. The robust feature set includes programmable control of output strength and slew rate, and on-chip termination using Digitally Controlled Impedance (DCI). All banks can support 3.3V I/O. Each IOB contains input, output, and 3-state SelectIO drivers. These drivers can be configured to various I/O standards. Differential I/O uses the two IOBs grouped together in one tile.
- Single-ended I/O standards (LVCMOS, LVTTL, HSTL, SSTL, GTL, PCI)
- Differential I/O standards (LVDS, LDT, LVPECL, BLVDS, CSE Differential HSTL and SSTL)

I/O Block

I/O Banking

Virtex 5 features
Four platforms
- Virtex-5 LX: High-performance general logic applications
- Future platforms will be optimized for advanced serial connectivity, signal processing applications, and embedded systems
Most advanced, high-performance, optimal-utilization FPGA fabric
- True 6-input look-up table (LUT) technology
- Dual 5-LUT option
- Improved reduced-hop routing
- 64-bit distributed RAM option
Powerful clock management tile (CMT) clocking
- Digital Clock Manager (DCM) blocks for zero delay buffering, frequency synthesis, and clock phase shifting
- PLL blocks for input jitter filtering, zero delay buffering, frequency synthesis, and phase-matched clock division
Advanced DSP48E slices
- 25 x 18, two's complement, signed multiplication
- Optional adder/accumulator
- Optional pipelining
- Optional bitwise logical functionality
- Dedicated cascade connections

Virtex 5 features
36-Kbit block RAM/FIFOs
- True dual-port RAM blocks
- Enhanced optional programmable FIFO logic
- Programmable
  - True dual-port widths up to x36
  - Simple dual-port widths up to x72
- Built-in optional error-correction circuitry with scrubbing
- Optionally program each block as two independent 18-Kbit blocks
High-performance parallel SelectIO technology
- 1.2 to 3.3V I/O operation
- Source-synchronous interfacing using ChipSync technology
- Digitally controlled impedance (DCI) active termination
- Flexible fine-grained I/O banking
- High-speed memory interface support
Flexible configuration options
- SPI-4 Parallel FLASH interface
- Multi-bitstream support with dedicated fallback reconfiguration logic
- Auto bus width detection capability
65-nm copper CMOS process technology
1.0V core voltage
