
L12: Reconfigurable Logic Architectures

Acknowledgements:
Materials in this lecture are courtesy of the following sources and are used with permission.
Frank Honore
Prof. Randy Katz (United Microelectronics Corporation Distinguished Professor in Electrical Engineering and Computer Science at the University of California, Berkeley) and Prof. Gaetano Borriello (University of Washington Department of Computer Science & Engineering). From Chapter 2 of R. Katz, G. Borriello, Contemporary Logic Design, 2nd ed., Prentice-Hall/Pearson Education, 2005.
L12: 6.111 Spring 2006

Introductory Digital Systems Laboratory

History of Computational Fabrics

Discrete devices: relays, transistors (1940s-50s)

Discrete logic gates (1950s-60s)

Integrated circuits (1960s-70s)
  e.g. TTL packages: Data Book for 100s of different parts

Gate Arrays (IBM 1970s)
  Transistors are pre-placed on the chip, and Place and Route software puts the chip together automatically; only the interconnect is programmed (mask programming)

Software Based Schemes (1970s-present)
  Run instructions on a general purpose core

Programmable Logic (1980s to present)
  A chip that can be reprogrammed after it has been fabricated
  Examples: PALs, EPROM, EEPROM, PLDs, FPGAs
  Excellent support for mapping from Verilog

ASIC Design (1980s to present)
  Turn Verilog directly into layout using a library of standard cells
  Effective for high-volume and efficient use of silicon area


Reconfigurable Logic

Logic blocks
  To implement combinational and sequential logic

Interconnect
  Wires to connect inputs and outputs to logic blocks

I/O blocks
  Special logic blocks at periphery of device for external connections

Key questions:
  How to make logic blocks programmable? (after chip has been fabbed!)
  What should the logic granularity be?
  How to make the wires programmable? (after chip has been fabbed!)
  Specialized wiring structures for local vs. long distance routes?
  How many wires per logic block?

[Figure: a logic block with n inputs and m outputs, containing logic and a D flip-flop, set up by configuration bits.]

Programmable Array Logic (PAL)

Based on the fact that any combinational logic can be realized as a sum-of-products

PALs feature an array of AND-OR gates with programmable interconnect

[Figure: input signals feed the AND array (programming of product terms), which feeds the OR array (programming of sum terms), which drives the output signals.]
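To make the sum-of-products idea concrete, here is a minimal sketch that writes a 1-bit full adder as product terms (the AND array) followed by ORs of those terms (the OR array). The module name, signal names, and the choice of a full adder are assumptions made for illustration; they are not taken from the lecture or from any PAL data sheet.

```verilog
// Hypothetical sketch: a 1-bit full adder in sum-of-products form,
// the shape a PAL's AND array and OR array implement directly.
// All names here are illustrative.
module pal_sop_sketch (
    input  wire a, b, cin,
    output wire sum, cout
);
    // AND array: each product term ANDs inputs or their complements.
    wire p0 = ~a & ~b &  cin;
    wire p1 = ~a &  b & ~cin;
    wire p2 =  a & ~b & ~cin;
    wire p3 =  a &  b &  cin;
    wire p4 =  a &  b;
    wire p5 =  a &  cin;
    wire p6 =  b &  cin;

    // OR array: each output ORs its own subset of product terms.
    assign sum  = p0 | p1 | p2 | p3;
    assign cout = p4 | p5 | p6;
endmodule
```

In a real PAL the choice of literals in each product term is what gets programmed; here that choice is simply written out in the source.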

Inside the 22V10 PAL

Each input pin (and its complement) is sent to the AND array

OR gates for each output can take 8-16 product terms, depending on the output pin

Macrocell block provides additional output flexibility...

Image removed due to copyright restrictions.


Cypress PAL CE22V10


From Lattice Semiconductor

Image removed due to copyright restrictions.


Images courtesy of Lattice Semiconductor Corporation. Used with permission.

Outputs may be registered or combinational, positive or inverted

Anti-Fuse-Based Approach (Actel)

Rows of programmable logic building blocks + rows of interconnect

Anti-fuse Technology: Program Once

Use anti-fuses to build up long wiring runs from short segments

8 input, single output combinational logic blocks

FFs constructed from discrete cross coupled gates

[Figure: rows of Logic Modules separated by Wiring Tracks, surrounded on all four sides by I/O Buffers, Programming and Test Logic.]


Actel Logic Module


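Combinational block does not have the output FF

[Figure: Example Gate Mapping (inputs A, B, C, D, E and GND; select codes 00, 01, 10, 11) and an S-R Flip-Flop (inputs S, R, GND, VDD; select codes 00, 01, 10, 11), both built from the logic module.]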


Actel Routing & Programming


[Figure: Routing. Logic Module inputs and outputs connect through Input Segments and Output Segments to the Horizontal Channel and Long Vertical Tracks.]

[Figure: Programming an Antifuse. Labels: Precharge Phase, Vpp/2, Vpp, Gnd, Antifuse shorted.]

Programming is Permanent (one time)

Courtesy of Actel. Used with permission.

RAM Based Field Programmable Logic - Xilinx

[Figure: an array of Configurable Logic Blocks (CLBs) joined by Programmable Interconnect and Switch Matrices, with I/O Blocks (IOBs) at the periphery.]

[Figure: IOB detail. Labels: Pad, Input Buffer, Delay, Output Buffer, Slew Rate Control, Passive Pull-Up/Pull-Down, Vcc, input and output flip-flops (D, Q).]

[Figure: CLB detail. Labels: G Func. Gen. (inputs G1-G4), F Func. Gen. (inputs F1-F4), H Func. Gen., control inputs C1-C4 (H1, DIN, S/R, EC), S/R Control, flip-flop pins SD, RD, EC, and signals F', G', H', DIN.]

Courtesy of Xilinx. Used with permission.


The Xilinx 4000 CLB


Courtesy of Xilinx. Used with permission.


Two 4-input Functions, Registered Output, and a Two Input Function
Courtesy of Xilinx. Used with permission.


5-input Function, Combinational Output


Courtesy of Xilinx. Used with permission.


LUT Mapping

N-LUT: direct implementation of a truth table; any function of N inputs.

N-LUT requires 2^N storage elements (latches)

N inputs select one latch location (like a memory)

Why Latches and Not Registers?

Latches set by configuration bitstream

[Figure: 4-LUT example, with Inputs selecting the latch that drives Output.]

Courtesy of Xilinx. Used with permission.
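A behavioral sketch of a 4-LUT may help here: 2^4 = 16 stored bits, with the four inputs used as an address. The module name, the `INIT` parameter, and its default value (a 4-input AND) are assumptions for illustration; in the device the bits are latches loaded by the configuration bitstream, not a Verilog parameter.

```verilog
// Hypothetical behavioral model of a 4-input LUT.
module lut4_sketch #(
    parameter [15:0] INIT = 16'h8000  // assumed example contents: 4-input AND
) (
    input  wire [3:0] in,
    output wire       out
);
    // 16 truth-table bits; bit k is the output when in == k.
    wire [15:0] truth = INIT;

    // The inputs act as an address selecting one stored bit.
    assign out = truth[in];
endmodule
```

Changing `INIT` changes the function without changing the structure, which is the point of the "any function of N inputs" claim above.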

Configuring the CLB as a RAM


Memory is built using latches, not FFs

16x2

Read is the same as a LUT function!

Courtesy of Xilinx. Used with permission.
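The 16x2 memory can be sketched behaviorally as follows. This is only an illustration under stated assumptions: the module and port names are hypothetical, and the write is modeled as edge-triggered for simplicity, whereas the slide notes that the storage is built from latches. The read path is the part that mirrors the LUT: the address selects a stored word combinationally.

```verilog
// Hypothetical behavioral sketch of a 16x2 memory built from LUT storage.
module ram16x2_sketch (
    input  wire       clk,
    input  wire       we,     // write enable
    input  wire [3:0] addr,   // same 4 lines that would be the LUT inputs
    input  wire [1:0] din,
    output wire [1:0] dout
);
    reg [1:0] mem [0:15];     // 16 words of 2 bits

    // Write: modeled as synchronous here (an assumption of this sketch).
    always @(posedge clk)
        if (we) mem[addr] <= din;

    // Read: combinational lookup, the same operation a LUT performs.
    assign dout = mem[addr];
endmodule
```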



Xilinx 4000 Interconnect

Courtesy of Xilinx.
Used with permission.


Xilinx 4000 Interconnect Details

Wires are not ideal!

Courtesy of Xilinx.
Used with permission.


Xilinx 4000 Flexible IOB


Adjust Transition Time
Outputs through FF or bypassed
Adjust the Sampling Edge

Courtesy of Xilinx. Used with permission.

Add Bells & Whistles


[Figure: blocks added around the logic fabric. Labels: Hard Processor, Gigabit Serial, Multiplier (18 Bit x 18 Bit inputs, 36 Bit output), BRAM, Clock Mgmt, I/O with VCCIO, Programmable Termination and Impedance Control (Z).]

Courtesy of David B. Parlour, ISSCC 2004 Tutorial, "The Reality and Promise of Reconfigurable Computing in Digital Signal Processing," and Xilinx. Used with permission.

The Virtex II CLB (Half Slice Shown)

Courtesy of Xilinx.
Used with permission.


Adder Implementation
[Figure: half-slice adder datapath. Labels: Cin, Cout, LUT, dedicated carry logic.]

LUT: A ⊕ B

Y = A ⊕ B ⊕ Cin

Dedicated carry logic

1 half-Slice = 1-bit adder

Courtesy of Xilinx. Used with permission.
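One way to read the half-slice adder is sketched below: the LUT computes A xor B, one XOR forms the sum with Cin, and a 2-to-1 multiplexer steers the carry. The multiplexer form of the carry logic and all names are assumptions of this sketch rather than details confirmed by the slide.

```verilog
// Hypothetical sketch of a 1-bit adder in the LUT + carry-logic style.
module adder_bit_sketch (
    input  wire a, b, cin,
    output wire y, cout
);
    // LUT output: propagate signal, A xor B.
    wire p = a ^ b;

    // Sum: Y = A xor B xor Cin.
    assign y = p ^ cin;

    // Carry: if A and B differ, pass Cin along; otherwise A (== B) is the carry.
    assign cout = p ? cin : a;
endmodule
```

The carry expression is equivalent to the usual majority function (A·B + A·Cin + B·Cin), but it keeps the carry path to a single multiplexer per bit, which is why chains of such bits can be fast.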

Carry Chain
1 CLB = 4 slices = two 4-bit adders

64-bit adder: 16 CLBs

[Figure: A[63:0] and B[63:0] are split into 4-bit groups across CLB0 (A[3:0], B[3:0], Y[3:0]), CLB1 (A[7:4], B[7:4], Y[7:4]), up to CLB15 (A[63:60], B[63:60], Y[63:60]); the carry runs up the column and produces Y[64].]

CLBs must be in the same column

Courtesy of Xilinx. Used with permission.
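The chain in the figure can be sketched structurally: sixteen 4-bit blocks, each standing in for the slice of the adder one CLB contributes, with the carry out of block i feeding block i+1. Module names, the `generate` loop, and the behavioral body of the 4-bit block are assumptions for illustration; a synthesis tool would normally infer this structure from `a + b` on its own.

```verilog
// Hypothetical 4-bit adder block, one per CLB in the figure.
module add4_block (
    input  wire [3:0] a, b,
    input  wire       cin,
    output wire [3:0] y,
    output wire       cout
);
    // 5-bit result: carry out plus 4 sum bits.
    assign {cout, y} = a + b + cin;
endmodule

// Hypothetical 64-bit adder built as a chain of 16 blocks.
module adder64_chain (
    input  wire [63:0] a, b,
    output wire [64:0] y
);
    wire [16:0] c;            // c[i] is the carry into block i
    assign c[0] = 1'b0;

    genvar i;
    generate
        for (i = 0; i < 16; i = i + 1) begin : clb
            add4_block blk (
                .a   (a[4*i +: 4]),
                .b   (b[4*i +: 4]),
                .cin (c[i]),
                .y   (y[4*i +: 4]),
                .cout(c[i+1])
            );
        end
    endgenerate

    assign y[64] = c[16];     // final carry out
endmodule
```

Because each block's carry input depends on the block below it, the blocks form one long chain, which matches the slide's note that the CLBs must sit in the same column.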



Virtex II Features

Double Data Rate registers

Digital Clock Manager

Embedded Multiplier

Block SelectRAM

Courtesy of Xilinx. Used with permission.

The Latest Generation: Virtex-II Pro


FPGA Fabric

Embedded memories

Embedded PowerPC

Hardwired multipliers
High-speed I/O
Courtesy of Xilinx. Used with permission.


FPGA Evolution Summary [Parlour04]


[Figure: Transistors (x 10^6; axis ticks 0.1, 10, 1000) versus year (1980, 1985, 1990, 1995, 2000, 2005). Era labels: Glue Logic, Core Functionality, Logic Platform, System Platform, Domain Specific Platform. Feature labels: Logic + FF, Distributed RAM, Arithmetic Support, Block RAM, Hard MAC, Hard CPU, High Speed Serial IO, DSP System Design Tools.]

Courtesy of Xilinx. Used with permission.


Design Flow - Mapping

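Technology Mapping: Schematic/HDL to Physical Logic units

Compile functions into basic LUT-based groups (function of target architecture)

[Figure: logic on inputs a, b, c, d with a D flip-flop, mapped to a LUT driving a D flip-flop.]

always @(posedge Clock or negedge Reset)
begin
  if (!Reset)
    q <= 0;                       // asynchronous, active-low reset
  else
    q <= (a & b & c) | (b & d);   // function of 4 inputs: fits one 4-LUT
end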
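As a worked illustration of the mapping step, the function (a & b & c) | (b & d) can be written out as a 16-bit truth table. Indexing the table by {d, c, b, a}, the function is 1 at indices 7, 10, 11, 14 and 15, which gives 16'hCC80. The module names, the index ordering, and the checker below are assumptions of this sketch; a real tool chooses its own input ordering and table encoding.

```verilog
// Hypothetical mapped form: one 4-LUT truth table feeding one flip-flop.
module mapped_lut_ff (
    input  wire Clock, Reset,
    input  wire a, b, c, d,
    output reg  q
);
    // Truth table of (a & b & c) | (b & d), indexed by {d, c, b, a}.
    wire [15:0] truth = 16'hCC80;

    always @(posedge Clock or negedge Reset)
        if (!Reset) q <= 1'b0;
        else        q <= truth[{d, c, b, a}];
endmodule

// Simulation-only check that the table matches the original expression.
module mapped_lut_ff_check;
    wire [15:0] truth = 16'hCC80;
    reg a, b, c, d;
    integer i, errors;

    initial begin
        errors = 0;
        for (i = 0; i < 16; i = i + 1) begin
            {d, c, b, a} = i[3:0];
            #1;  // let the table wire and inputs settle
            if (truth[i[3:0]] !== ((a & b & c) | (b & d)))
                errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

Running the checker in a simulator is intended to confirm that the table and the expression agree on all 16 input combinations.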


Design Flow - Placement & Route

Placement: assign each logic unit a location on a particular device

[Figure: LUTs placed at locations on the device.]

Routing: iterative process to connect CLB inputs/outputs and IOBs. Optimizes critical path delay; can take hours or days for large, dense designs

Iterate placement if timing not met

Satisfy timing? Generate bitstream to configure device

Challenge! Cannot use full chip for reasonable speeds (wires are not ideal). Typically no more than 50% utilization.

Example: Verilog to FPGA

module adder64 (a, b, sum);
  input  [63:0] a, b;
  output [63:0] sum;

  assign sum = a + b;
endmodule

Synthesis, Tech Map, Place & Route

[Figure: 64-bit Adder Example on a Virtex II XC2V2000.]

Courtesy of Xilinx. Used with permission.


How are FPGAs Used?

Prototyping
  Ensemble of gate arrays used to emulate a circuit to be manufactured
  Get more/better/faster debugging done than with simulation

Reconfigurable hardware
  One hardware block used to implement more than one function

Special-purpose computation engines
  Hardware dedicated to solving one problem (or class of problems)
  Accelerators attached to general-purpose computers (e.g., in a cell phone!)


Summary

FPGAs provide a flexible platform for implementing digital computing

A rich set of macros and I/Os supported (multipliers, block RAMs, ROMs, high-speed I/O)

A wide range of applications from prototyping (to validate a design before ASIC mapping) to high-performance spatial computing

Interconnects are a major bottleneck (physical design and locality are important considerations)

"College students will study concurrent programming instead of C as their first computing experience."
-- David B. Parlour, ISSCC 2004 Tutorial