
George Mason University

FPGA Memories

ATHENa - Automated Tool for
Hardware EvaluatioN

ECE 545
Lecture 10
Recommended reading
Spartan-6 FPGA Block RAM Resources: User Guide
Google search: UG383
Spartan-6 FPGA Configurable Logic Block: User Guide
Google search: UG384
Xilinx FPGA Embedded Memory Advantages: White Paper
Google search: WP360
Recommended reading
XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices
Chapter 7, HDL Coding Techniques
Sections:
RAM HDL Coding Techniques
ROM HDL Coding Techniques
ISE In-Depth Tutorial, Section: Creating a CORE Generator
Tool Module
Memory Types
RAM vs. ROM
Single port vs. Dual port
With asynchronous read vs. With synchronous read
Memory Types specific to Xilinx FPGAs
Distributed (MLUT-based) vs. Block RAM-based (BRAM-based)
Inferred vs. Instantiated
Instantiated: Manually vs. Using CORE Generator
FPGA Distributed Memory
Location of Distributed RAM
[Figure: FPGA floorplan showing logic resources (CLB slices), RAM blocks, and multipliers/DSP units; device size is characterized by (#Logic resources, #Multipliers/DSP units, #RAM_blocks). Graphics based on The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows, ISBN 0750676043, Copyright 2004 Mentor Graphics Corp. (www.mentor.com)]
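Three Different Types of Slices
[Figure: Spartan-6 slices come in three types: SLICEX (about 50% of all slices), SLICEL (about 25%), and SLICEM (about 25%); only SLICEM LUTs can be used as distributed RAM or shift registers]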
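Spartan-6 Multipurpose LUT (MLUT)
[Figure: a traditional 4-input LUT can serve as a 4-input LUT, a 16 x 1 RAM, or a 16-bit shift register (SR) (graphics from The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows, ISBN 0750676043, Copyright 2004 Mentor Graphics Corp., www.mentor.com); the Spartan-6 MLUT can serve as a 64 x 1 ROM (logic), a 64 x 1 RAM, or a 32-bit SR]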
Single-port 64 x 1-bit RAM
Memories Built of Neighboring MLUTs
Memories built of 2 MLUTs:
  Single-port 128 x 1-bit RAM: RAM128x1S
  Dual-port 64 x 1-bit RAM: RAM64x1D
Memories built of 4 MLUTs:
  Single-port 256 x 1-bit RAM: RAM256x1S
  Dual-port 128 x 1-bit RAM: RAM128x1D
  Quad-port 64 x 1-bit RAM: RAM64x1Q
  Simple-dual-port 64 x 3-bit RAM: RAM64x3SDP (one address for read, one address for write)

Dual-port 64 x 1 RAM
[Figure: MLUT-based implementations of the dual-port 64 x 1-bit RAM (64x1D) and the single-port 128 x 1-bit RAM (128x1S)]
Total Size of Distributed RAM
FPGA Block RAM
Location of Block RAMs
[Figure: FPGA floorplan highlighting the RAM blocks among the logic resources (CLB slices) and multipliers/DSP units. Graphics based on The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows, ISBN 0750676043, Copyright 2004 Mentor Graphics Corp. (www.mentor.com)]
Spartan-6 Block RAM Amounts
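Block RAM can have various configurations (port aspect ratios)
[Figure: the same block RAM organized as 16k x 1 (addresses 0 to 16,383), 8k x 2 (0 to 8,191), 4k x 4 (0 to 4,095), 2k x (8+1) (0 to 2,047), and 1024 x (16+2) (0 to 1,023); in the last two configurations the +1 and +2 bits are parity bits]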
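The configurations trade depth for width at essentially constant capacity. The arithmetic below checks this for the five configurations listed, counting the parity bits of the x(8+1) and x(16+2) configurations separately; the number of address bits is log2 of the depth.

$$
\begin{aligned}
16\text{K}\times 1 &= 8\text{K}\times 2 = 4\text{K}\times 4 = 16\,384\ \text{bits} && (14,\ 13,\ 12\ \text{address bits})\\
2\text{K}\times(8+1) &= 1\text{K}\times(16+2) = 16\,384\ \text{data bits} + 2\,048\ \text{parity bits} = 18\,432\ \text{bits} && (11,\ 10\ \text{address bits})
\end{aligned}
$$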
Block RAM Port Aspect Ratios
Block RAM Interface
Block RAM Ports
Block RAM Waveforms READ_FIRST mode
Block RAM with synchronous read in Read-First Mode
Block RAM Waveforms WRITE_FIRST mode
Block RAM Waveforms NO_CHANGE mode
Features of Block RAMs in Spartan-6 FPGAs
Inference vs. Instantiation
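The difference between inference and instantiation is easiest to see by contrasting the inferred RAM code later in this lecture with an explicit instantiation of a primitive. A minimal sketch, assuming the UNISIM library shipped with Xilinx ISE and its RAM64X1S primitive (single-port 64 x 1-bit distributed RAM with ports O, A0 to A5, D, WCLK, WE and generic INIT); the wrapper entity ram64x1_inst is a hypothetical name.

```vhdl
-- Sketch: manual instantiation of the Xilinx RAM64X1S primitive
-- (single-port 64 x 1-bit distributed RAM). Wrapper name is hypothetical.
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;   -- Xilinx primitive component declarations

entity ram64x1_inst is
  port (clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(5 downto 0);
        di   : in  std_logic;
        do   : out std_logic);
end ram64x1_inst;

architecture structural of ram64x1_inst is
begin
  u_ram : RAM64X1S
    generic map (INIT => X"0000000000000000")  -- initial contents: all zeros
    port map (O    => do,                        -- asynchronous read output
              A0   => addr(0), A1 => addr(1), A2 => addr(2),
              A3   => addr(3), A4 => addr(4), A5 => addr(5),
              D    => di,                        -- data written on WCLK edge
              WCLK => clk,
              WE   => we);
end structural;
```

Inference, as in the raminfr examples, leaves the choice of primitives to XST; instantiation fixes the primitive but ties the code to one vendor's library.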
Using CORE Generator
CORE Generator
Generic Inferred ROM
Distributed ROM with asynchronous read
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

entity rominfr is
  generic ( w : integer := 12;   -- number of bits per ROM word
            r : integer := 3);   -- 2^r = number of words in ROM
  port (addr : in  std_logic_vector(r-1 downto 0);
        dout : out std_logic_vector(w-1 downto 0));
end rominfr;

Distributed ROM with asynchronous read
architecture behavioral of rominfr is
  type rom_type is array (2**r-1 downto 0)
       of std_logic_vector (w-1 downto 0);
  -- descending index range: the first aggregate element is stored at the
  -- highest address, e.g., ROM_array(7) = "000011000100"
  constant ROM_array : rom_type :=
    ("000011000100",
     "010011010010",
     "010011011011",
     "011011000010",
     "000011110001",
     "011111010110",
     "010011010000",
     "111110011111");
begin
  dout <= ROM_array(conv_integer(unsigned(addr)));
end behavioral;
Distributed ROM with asynchronous read
architecture behavioral of rominfr is
  type rom_type is array (2**r-1 downto 0)
       of std_logic_vector (w-1 downto 0);
  -- same contents as the binary version, written as 12-bit hex bit-string literals
  constant ROM_array : rom_type :=
    (X"0C4",
     X"4D2",
     X"4DB",
     X"6C2",
     X"0F1",
     X"7D6",
     X"4D0",
     X"F9F");
begin
  dout <= ROM_array(conv_integer(unsigned(addr)));
end behavioral;
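Because block RAM supports only synchronous read, a ROM intended for a BRAM needs a registered output. A minimal sketch, assuming the same contents as the ROM above; the entity name rominfr_sync and the en input are hypothetical additions. Whether XST actually places this ROM in a block RAM rather than in LUTs depends on its size and on synthesis settings such as the XST ROM_STYLE constraint.

```vhdl
-- Sketch: ROM with synchronous (registered) read, the form a block RAM
-- can implement. Entity name rominfr_sync and input en are hypothetical.
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

entity rominfr_sync is
  generic ( w : integer := 12;   -- number of bits per ROM word
            r : integer := 3);   -- 2^r = number of words in ROM
  port (clk  : in  std_logic;
        en   : in  std_logic;    -- read enable
        addr : in  std_logic_vector(r-1 downto 0);
        dout : out std_logic_vector(w-1 downto 0));
end rominfr_sync;

architecture behavioral of rominfr_sync is
  type rom_type is array (2**r-1 downto 0)
       of std_logic_vector (w-1 downto 0);
  -- contents copied from the asynchronous example; the table fixes
  -- w = 12 and r = 3, so the generics must keep their default values
  constant ROM_array : rom_type :=
    (X"0C4", X"4D2", X"4DB", X"6C2", X"0F1", X"7D6", X"4D0", X"F9F");
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (en = '1') then
        -- registered output: dout changes only on a rising clock edge
        dout <= ROM_array(conv_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end behavioral;
```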
Generic Inferred RAM
Distributed versus Block RAM Inference
Examples:
1. Distributed single-port RAM with asynchronous read

2. Distributed dual-port RAM with asynchronous read

3. Block RAM with synchronous read (block RAM supports only synchronous read, so there is no asynchronous-read version)

Many more excellent RAM examples are given in the XST coding guidelines (XST User Guide, Chapter 7, RAM HDL Coding Techniques).
Distributed single-port RAM with asynchronous read
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

entity raminfr is
  generic ( w : integer := 32;   -- number of bits per RAM word
            r : integer := 6);   -- 2^r = number of words in RAM
  port (clk : in  std_logic;
        we  : in  std_logic;
        a   : in  std_logic_vector(r-1 downto 0);
        di  : in  std_logic_vector(w-1 downto 0);
        do  : out std_logic_vector(w-1 downto 0));
end raminfr;

Distributed single-port RAM with asynchronous read
architecture behavioral of raminfr is
  type ram_type is array (2**r-1 downto 0)
       of std_logic_vector (w-1 downto 0);
  signal RAM : ram_type;
begin
  -- synchronous write
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(a))) <= di;
      end if;
    end if;
  end process;
  -- asynchronous read: do follows the address without waiting for a clock edge
  do <= RAM(conv_integer(unsigned(a)));
end behavioral;
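The examples in this lecture use the Synopsys std_logic_arith package, which is not an IEEE standard. A minimal sketch of the same distributed single-port RAM written with the IEEE numeric_std package (to_integer instead of conv_integer); the entity name raminfr_ns is hypothetical and the ports are otherwise identical to raminfr.

```vhdl
-- Sketch: the distributed single-port RAM rewritten with ieee.numeric_std.
-- Entity name raminfr_ns is hypothetical.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity raminfr_ns is
  generic ( w : integer := 32;   -- number of bits per RAM word
            r : integer := 6);   -- 2^r = number of words in RAM
  port (clk : in  std_logic;
        we  : in  std_logic;
        a   : in  std_logic_vector(r-1 downto 0);
        di  : in  std_logic_vector(w-1 downto 0);
        do  : out std_logic_vector(w-1 downto 0));
end raminfr_ns;

architecture behavioral of raminfr_ns is
  type ram_type is array (2**r-1 downto 0) of std_logic_vector(w-1 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        RAM(to_integer(unsigned(a))) <= di;   -- synchronous write
      end if;
    end if;
  end process;
  do <= RAM(to_integer(unsigned(a)));         -- asynchronous read
end behavioral;
```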
Distributed dual-port RAM with asynchronous read
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;

entity raminfr is
  generic ( w : integer := 32;   -- number of bits per RAM word
            r : integer := 6);   -- 2^r = number of words in RAM
  port (clk  : in  std_logic;
        we   : in  std_logic;
        a    : in  std_logic_vector(r-1 downto 0);   -- write and read address
        dpra : in  std_logic_vector(r-1 downto 0);   -- dual-port read address
        di   : in  std_logic_vector(w-1 downto 0);
        spo  : out std_logic_vector(w-1 downto 0);   -- output addressed by a
        dpo  : out std_logic_vector(w-1 downto 0));  -- output addressed by dpra
end raminfr;

Distributed dual-port RAM with asynchronous read
architecture syn of raminfr is
  type ram_type is array (2**r-1 downto 0)
       of std_logic_vector (w-1 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(a))) <= di;
      end if;
    end if;
  end process;
  -- two independent asynchronous read ports
  spo <= RAM(conv_integer(unsigned(a)));
  dpo <= RAM(conv_integer(unsigned(dpra)));
end syn;
Block RAM with synchronous read in Read-First Mode
Block RAM Waveforms READ_FIRST mode
Block RAM with synchronous read
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

entity raminfr is
  generic ( w : integer := 32;   -- number of bits per RAM word
            r : integer := 9);   -- 2^r = number of words in RAM
  port (clk  : in  std_logic;
        we   : in  std_logic;
        en   : in  std_logic;
        addr : in  std_logic_vector(r-1 downto 0);
        di   : in  std_logic_vector(w-1 downto 0);
        do   : out std_logic_vector(w-1 downto 0));
end raminfr;

Block RAM with synchronous read
Read-First Mode - cont'd
architecture behavioral of raminfr is
  type ram_type is array (2**r-1 downto 0) of
       std_logic_vector (w-1 downto 0);
  signal RAM : ram_type;

begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (en = '1') then
        -- read-first: do receives the old contents, because signal RAM
        -- is updated only after the process suspends
        do <= RAM(conv_integer(unsigned(addr)));
        if (we = '1') then
          RAM(conv_integer(unsigned(addr))) <= di;
        end if;
      end if;
    end if;
  end process;
end behavioral;
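A short simulation makes the read-first behavior concrete. A minimal sketch, assuming the block RAM entity raminfr above (defaults w = 32, r = 9) has been compiled into library work together with its Read-First architecture; raminfr_tb is a hypothetical name. With the Write-First or No-Change architecture instead, the first assertion fails, because do then shows the new data or keeps its previous value, respectively.

```vhdl
-- Sketch: testbench for the block RAM entity raminfr (w = 32, r = 9)
-- compiled with its Read-First architecture. raminfr_tb is hypothetical.
library ieee;
use ieee.std_logic_1164.all;

entity raminfr_tb is
end raminfr_tb;

architecture sim of raminfr_tb is
  signal clk, we, en : std_logic := '0';
  signal addr        : std_logic_vector(8 downto 0) := (others => '0');
  signal di, do      : std_logic_vector(31 downto 0) := (others => '0');
begin
  uut : entity work.raminfr
    port map (clk => clk, we => we, en => en, addr => addr, di => di, do => do);

  stim : process
    -- one clock cycle: rising edge now, falling edge 5 ns later
    procedure tick is
    begin
      clk <= '1'; wait for 5 ns;
      clk <= '0'; wait for 5 ns;
    end procedure;
  begin
    en <= '1'; we <= '1'; addr <= "000000101";
    di <= X"00000001"; wait for 5 ns;
    tick;                                -- write 1 to address 5
    di <= X"00000002";
    tick;                                -- write 2 to address 5
    assert do = X"00000001"
      report "do should hold the old data (READ_FIRST)" severity error;
    we <= '0';
    tick;                                -- plain read of address 5
    assert do = X"00000002"
      report "do should hold the stored data" severity error;
    report "raminfr_tb finished";
    wait;                                -- no further events: simulation ends
  end process;
end sim;
```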
Block RAM Waveforms WRITE_FIRST mode
Block RAM with synchronous read
Write-First Mode - cont'd
architecture behavioral of raminfr is
  type ram_type is array (2**r-1 downto 0) of
       std_logic_vector (w-1 downto 0);
  signal RAM : ram_type;

begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (en = '1') then
        if (we = '1') then
          RAM(conv_integer(unsigned(addr))) <= di;
          do <= di;   -- write-first: the new data appears on the output
        else
          do <= RAM(conv_integer(unsigned(addr)));
        end if;
      end if;
    end if;
  end process;
end behavioral;
Block RAM Waveforms NO_CHANGE mode
Block RAM with synchronous read
No-Change Mode - cont'd
architecture behavioral of raminfr is
  type ram_type is array (2**r-1 downto 0) of
       std_logic_vector (w-1 downto 0);
  signal RAM : ram_type;

begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (en = '1') then
        if (we = '1') then
          -- no-change: do keeps its previous value during a write
          RAM(conv_integer(unsigned(addr))) <= di;
        else
          do <= RAM(conv_integer(unsigned(addr)));
        end if;
      end if;
    end if;
  end process;
end behavioral;
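Spartan-6 block RAMs can also be used in a simple dual-port arrangement, with one port only writing and the other only reading. A minimal sketch of how such a RAM might be described for inference, assuming a single clock; the entity raminfr_sdp and its waddr/raddr ports are hypothetical names, and whether XST maps this description to a BRAM in simple dual-port mode is not guaranteed.

```vhdl
-- Sketch: simple dual-port RAM (one write address, one read address) with
-- synchronous read. Entity and port names are hypothetical.
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;

entity raminfr_sdp is
  generic ( w : integer := 32;   -- number of bits per RAM word
            r : integer := 9);   -- 2^r = number of words in RAM
  port (clk   : in  std_logic;
        we    : in  std_logic;
        waddr : in  std_logic_vector(r-1 downto 0);   -- write address
        raddr : in  std_logic_vector(r-1 downto 0);   -- read address
        di    : in  std_logic_vector(w-1 downto 0);
        do    : out std_logic_vector(w-1 downto 0));
end raminfr_sdp;

architecture behavioral of raminfr_sdp is
  type ram_type is array (2**r-1 downto 0) of std_logic_vector(w-1 downto 0);
  signal RAM : ram_type;
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (we = '1') then
        RAM(conv_integer(unsigned(waddr))) <= di;
      end if;
      -- synchronous read; if raddr = waddr during a write, do receives
      -- the old contents (signal assignment semantics)
      do <= RAM(conv_integer(unsigned(raddr)));
    end if;
  end process;
end behavioral;
```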
Criteria for Implementing Inferred RAM in BRAMs
ATHENa
Resources
ATHENa website
http://cryptography.gmu.edu/athena

ATHENa - Automated Tool for Hardware EvaluatioN
Supported in part by the National Institute of Standards & Technology (NIST)
ATHENa Team
Venkata "Vinny", MS CpE student
Ekawat "Ice", PhD CpE student
Marcin, PhD ECE student
Rajesh, PhD ECE student
Michal, PhD exchange student from Slovakia
John, MS CpE student
ATHENa - Automated Tool for Hardware EvaluatioN
An open-source benchmarking tool, written in Perl, aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms
Currently under development at George Mason University.
http://cryptography.gmu.edu/athena
Why Athena?
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena, Goddess of Wisdom, was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest."
from "Athena, Greek Goddess of Wisdom and Craftsmanship"
Basic Dataflow of ATHENa
[Figure: dataflow with steps numbered 0 to 6 among the ATHENa Server, the Designer (HDL + FPGA Tools), FPGA Synthesis and Implementation, and the User; labels: Interfaces + Testbenches; Download scripts and configuration files; HDL + scripts + configuration files; Result Summary + Database Entries; Database Entries; Database query; Ranking of designs]
[Figure: ATHENa inputs (synthesizable source files, configuration files, testbench, constraint files) and outputs (result summary, user-friendly; database entries, machine-friendly)]
ATHENa Major Features (1)
synthesis, implementation, and timing analysis in batch mode
support for devices and tools of multiple FPGA vendors
generation of results for multiple families of FPGAs of a given vendor
automated choice of a best-matching device within a given family
ATHENa Major Features (2)
automated verification of designs through simulation in batch mode
support for multi-core processing
automated extraction and tabulation of results
several optimization strategies aimed at finding
  optimum options of tools
  best target clock frequency
  best starting point of placement
Generation of Results Facilitated by ATHENa
batch mode of FPGA tools
ease of extraction and tabulation of results
  Text Reports, Excel, CSV (Comma-Separated Values)
optimized choice of tool options
  GMU_optimization_1 strategy
Relative Improvement of Results from Using ATHENa
[Figure: bar chart for Virtex 5, 256-bit variants of hash functions, showing for Area, Throughput (Thr), and Throughput/Area (Thr/Area) the ratios of results obtained using ATHENa-suggested options vs. default options of FPGA tools (y-axis from 0 to 2.5)]
Other (Somewhat) Similar Tools
ExploreAhead (part of PlanAhead)

Design Space Explorer (DSE)

Boldport Flow

EDAx10 Cloud Platform
Distinguishing Features of ATHENa
Support for multiple tools from multiple vendors

Optimization strategies aimed at the best possible
performance rather than design closure

Extraction and presentation of results

Seamless integration with the ATHENa database of results
How To Start Working With ATHENa? One-Time Tasks
Download and unzip ATHENa
http://cryptography.gmu.edu/athena/
Read the Tutorial!
Install the Required Tools
(see Tutorial - Part 1 Tools Installation)
Run ATHENa_setup
How To Start Working With ATHENa? Repetitive Tasks
Prepare or modify your source files & source_list.txt
Modify design.config.txt + possibly other configuration files
Run ATHENa
design.config.txt
Your Design
# directory containing synthesizable source files for the project
SOURCE_DIR = <examples/sha256_rs>

# A file list containing list of files in the order suitable for synthesis and implementation
# low level modules first, top level entity last
SOURCE_LIST_FILE = source_list.txt

# project name
# it will be used in the names of result directories
PROJECT_NAME = SHA256

# name of top level entity
TOP_LEVEL_ENTITY = sha256

# name of top level architecture
TOP_LEVEL_ARCH = rs_arch

# name of clock net
CLOCK_NET = clk
design.config.txt
Timing Formulas
#formula for latency
LATENCY = TCLK*65

#formula for throughput
THROUGHPUT = 512/(TCLK*65)
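Read literally, these formulas describe a design that processes one 512-bit block every 65 clock cycles. A worked example for a hypothetical clock period TCLK = 10 ns, assuming TCLK is expressed in ns as in the timing report:

$$
\begin{aligned}
\text{LATENCY} &= 65 \cdot T_{CLK} = 65 \cdot 10\ \text{ns} = 650\ \text{ns}\\
\text{THROUGHPUT} &= \frac{512\ \text{bits}}{65 \cdot T_{CLK}} = \frac{512\ \text{bits}}{650\ \text{ns}} \approx 0.788\ \text{bit/ns} \approx 788\ \text{Mbit/s}
\end{aligned}
$$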
design.config.txt
Application & Optimization Target
# OPTIMIZATION_TARGET = speed | area | balanced
OPTIMIZATION_TARGET = speed

# OPTIONS = default | user
OPTIONS = default

# APPLICATION = single_run | exhaustive_search | placement_search | frequency_search |
# GMU_Optimization_1 | GMU_Xilinx_optimization_1
APPLICATION = single_run

# TRIM_MODE = off | zip | delete
TRIM_MODE = zip
design.config.txt
FPGA Families
# commenting the next line removes all families of Xilinx
FPGA_VENDOR = xilinx

#commenting the next line removes a given family
FPGA_FAMILY = spartan3
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_SYN_FREQ = 120
REQ_IMP_FREQ = 100
MAX_SLICE_UTILIZATION = 0.8
MAX_BRAM_UTILIZATION = 0.8
MAX_MUL_UTILIZATION = 1
MAX_PIN_UTILIZATION = 0.9
END FAMILY

END VENDOR
design.config.txt
FPGA Families
# commenting the next line removes all families of Altera
FPGA_VENDOR = altera

#commenting the next line removes a given family
FPGA_FAMILY = Stratix III
# FPGA_DEVICES = <list of devices> | best_match | all
FPGA_DEVICES = best_match
SYN_CONSTRAINT_FILE = default
IMP_CONSTRAINT_FILE = default
REQ_IMP_FREQ = 120
MAX_LOGIC_UTILIZATION = 0.8
MAX_MEMORY_UTILIZATION = 0.8
MAX_DSP_UTILIZATION = 0
MAX_MUL_UTILIZATION = 0
MAX_PIN_UTILIZATION = 0.8
END FAMILY

END VENDOR
Library Files
device_lib/xilinx_device_lib.txt
device_lib/altera_device_lib.txt

Files created during ATHENa setup

Characterize FPGA families and devices available in the version of
Xilinx and Altera tools installed on your computer

Currently supported tool versions:
Xilinx WebPACK 9.1, 9.2, 10.1, 11.1, 11.5, 12.1, 12.2, 12.3, 12.4, 13.1, 13.2, 13.3,
14.1, 14.2, 14.3
Xilinx Design Suite 11.1, 12.1, 12.2, 12.3, 12.4, 13.1, 13.2, 13.3, 14.1, 14.2, 14.3
Altera Quartus II Web Edition 8.1, 8.2, 9.0, 9.1, 10.0, 10.1, 11.0, 11.1, 12.0, 12.1
Altera Quartus II Subscription Edition 9.1, 10.0, 10.1, 11.0, 11.1, 12.0, 12.1

If a library for a given version is not available yet, use the library from the closest available version.
Library Files
device_lib/xilinx_device_lib.txt

VENDOR = Xilinx
#Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins
ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO
FAMILY = spartan3
xc3s50pq208-5, 768, 4, 0, 4, 124
xc3s200ft256-5, 1920, 12, 0, 12, 173
xc3s400fg456-5, 3584, 16, 0, 16, 264
xc3s1000fg676-5, 7680, 24, 0, 24, 391
xc3s1500fg676-5, 13312, 32, 0, 32, 487
END_FAMILY

FAMILY = virtex5
xc5vlx30ff676-3, 4800, 32, 32, 0, 400
xc5vfx30tff665-3, 5120, 68, 64, 0, 360
xc5vlx30tff665-3, 4800, 36, 32, 0, 360
xc5vlx50ff1153-3, 7200, 48, 48, 0, 560
xc5vlx50tff1136-3, 7200, 60, 48, 0, 480
END_FAMILY
Result Files
report_resource_utilization.txt

xilinx : spartan3
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+
| default | xc3s200ft256-5* | 1 | 142 | 3 | 74 | 3 | 4 | 33 | 7 | 58 | 0 | 0 | 20 | 11 |
+---------+-----------------+-----+------+---+--------+---+-------+----+-------+----+------+---+----+----+

xilinx : spartan6
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+
| default | xc6slx9csg324-3* | 1 | 41 | 1 | 22 | 1 | 4 | 6 | 0 | 0 | 9 | 56 | 20 | 10 |
+---------+------------------+-----+------+---+--------+---+-------+---+-------+---+------+----+----+----+

xilinx : virtex5
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+
| default | xc5vlx20tff323-2* | 1 | 101 | 1 | 56 | 1 | 4 | 15 | 0 | 0 | 9 | 37 | 20 | 11 |
+---------+-------------------+-----+------+---+--------+---+-------+----+-------+---+------+----+----+----+

xilinx : virtex6
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
| default | xc6vlx75tff784-3* | 1 | 44 | 1 | 21 | 1 | 4 | 1 | 0 | 0 | 9 | 3 | 20 | 5 |
+---------+-------------------+-----+------+---+--------+---+-------+---+-------+---+------+---+----+---+
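The device choice in the spartan3 row can be checked by hand. A sketch, assuming that best_match selects the smallest device listed in device_lib/xilinx_device_lib.txt whose utilization ratios all stay within the MAX_*_UTILIZATION limits, and applying the spartan3 limits from the FPGA Families example (0.8 for slices and BRAMs, 1 for multipliers, 0.9 for pins) to the resource counts in the spartan3 row above (74 slices, 4 BRAMs, 7 multipliers, 20 I/O):

$$
\begin{aligned}
\text{xc3s50:}\quad & \tfrac{74}{768} \approx 0.10,\quad \tfrac{4}{4} = 1.00 > 0.8,\quad \tfrac{7}{4} = 1.75 > 1,\quad \tfrac{20}{124} \approx 0.16 \;\Rightarrow\; \text{rejected}\\
\text{xc3s200:}\quad & \tfrac{74}{1920} \approx 0.039,\quad \tfrac{4}{12} \approx 0.33,\quad \tfrac{7}{12} \approx 0.58,\quad \tfrac{20}{173} \approx 0.116 \;\Rightarrow\; \text{accepted}
\end{aligned}
$$

Under these assumptions, xc3s50 fails on BRAMs and multipliers and xc3s200 is the first device that fits; the report's percentage columns (3, 33, 58, 11) match these ratios truncated to whole percent.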
Result Files
report_timing.txt

REQ SYN FREQ - Requested synthesis clock frequency [MHz]
SYN FREQ - Achieved synthesis clock frequency [MHz]
REQ SYN TCLK - Requested synthesis clock period [ns]
SYN TCLK - Achieved synthesis clock period [ns]
REQ IMP FREQ - Requested implementation clock frequency [MHz]
IMP FREQ - Achieved implementation clock frequency [MHz]
REQ IMP TCLK - Requested implementation clock period [ns]
IMP TCLK - Achieved implementation clock period [ns]
LATENCY - Latency [ns]
THROUGHPUT - Throughput [Mbits/s]
TP/Area - Throughput/Area [(Mbits/s)/CLB slices]
Latency*Area - Latency*Area [ns*CLB slices]

xilinx : spartan3
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc3s200ft256-5* | 1 | default | 207.370 | default | 4.822 | default | 112.448 | default | 8.893 | 17.786 | 449.792 | 6.078 | 1316.164 |
+---------+-----------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+



xilinx : spartan6
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6slx9csg324-3* | 1 | default | 75.751 | default | 13.201 | default | 78.119 | default | 12.801 | 25.602 | 312.476 | 14.203 | 563.244 |
+---------+------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+


xilinx : virtex5
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc5vlx20tff323-2* | 1 | default | 156.347 | default | 6.396 | default | 126.952 | default | 7.877 | 15.754 | 507.808 | 9.068 | 882.224 |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+



xilinx : virtex6
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
| default | xc6vlx75tff784-3* | 1 | default | 158.053 | default | 6.327 | default | 135.410 | default | 7.385 | 14.770 | 541.638 | 25.792 | 310.170 |
+---------+-------------------+-----+--------------+----------+--------------+----------+--------------+----------+--------------+----------+---------+------------+------------+--------------+
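The last two columns are derived from the throughput and latency columns and from the slice count in report_resource_utilization.txt. For the spartan3 row (74 slices):

$$
\begin{aligned}
\text{TP/Area} &= \frac{449.792\ \text{Mbit/s}}{74\ \text{slices}} \approx 6.078\ \frac{\text{Mbit/s}}{\text{slice}}\\
\text{Latency}\cdot\text{Area} &= 17.786\ \text{ns} \times 74\ \text{slices} = 1316.164\ \text{ns}\cdot\text{slices}
\end{aligned}
$$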
Result Files
report_options.txt

xilinx : spartan3
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc3s200ft256-5* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed | -w -ol std |
+---------+-----------------+-----+------------+------------------------------+-------------------------+--------------+

xilinx : spartan6
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6slx9csg324-3* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b | -w -ol std |
+---------+------------------+-----+------------+------------------------------+---------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+
| default | xc5vlx20tff323-2* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed | -w -ol std |
+---------+-------------------+-----+------------+------------------------------+-------------------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
| default | xc6vlx75tff784-3* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b | -w -ol std |
+---------+-------------------+-----+------------+------------------------------+---------------+--------------+
COST TABLE - parameter determining the starting point of placement
Synthesis Options - options of the synthesis tool
Map Options - options of the mapping tool
PAR Options - options of the place & route tool
Result Files
report_execution_time.txt

xilinx : spartan3
+---------+-----------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-----------------+-----+----------------+---------------------+--------------+
| default | xc3s200ft256-5* | 1 | 0d 0h:0m:12s | 0d 0h:0m:36s | 0d 0h:0m:48s |
+---------+-----------------+-----+----------------+---------------------+--------------+

xilinx : spartan6
+---------+------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+------------------+-----+----------------+---------------------+--------------+
| default | xc6slx9csg324-3* | 1 | 0d 0h:0m:21s | 0d 0h:1m:13s | 0d 0h:1m:34s |
+---------+------------------+-----+----------------+---------------------+--------------+

xilinx : virtex5
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc5vlx20tff323-2* | 1 | 0d 0h:0m:39s | 0d 0h:1m:50s | 0d 0h:2m:29s |
+---------+-------------------+-----+----------------+---------------------+--------------+

xilinx : virtex6
+---------+-------------------+-----+----------------+---------------------+--------------+
| GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time |
+---------+-------------------+-----+----------------+---------------------+--------------+
| default | xc6vlx75tff784-3* | 1 | 0d 0h:0m:22s | 0d 0h:3m:22s | 0d 0h:3m:44s |
+---------+-------------------+-----+----------------+---------------------+--------------+
Synthesis Time - Time of Synthesis
Implementation Time - Time of Implementation
Elapsed Time - Total Time
design.config.txt
Functional Simulation (1)
# FUNCTIONAL_VERIFICATION_MODE = <on | off>
FUNCTIONAL_VERIFICATION_MODE = <off>

# directory containing source files of the testbench
VERIFICATION_DIR = <examples/sha256_rs/tb>

# A file containing a list of testbench files in the order suitable for compilation;
# low level modules first, top level entity last.
# Test vector files should be located in the same directory and listed
# in the same file, unless fixed path is used. Please refer to tutorial for more detail.
VERIFICATION_LIST_FILE = <tb_srcs.txt>

# name of testbench's top level entity
TB_TOP_LEVEL_ENTITY = <sha_tb>

# name of testbench's top level architecture
TB_TOP_LEVEL_ARCH = <behavior>
design.config.txt
Functional Simulation (2)
# MAX_TIME_FUNCTIONAL_VERIFICATION = <$time $unit>
# supported units are: ps, ns, us, and ms
# if blank, simulation runs until it finishes, i.e., until no signals change
# (the clock is stopped and no more inputs are coming in).
MAX_TIME_FUNCTIONAL_VERIFICATION = <>

# Perform only verification (synthesis and implementation parameters are ignored)
# VERIFICATION_ONLY = <ON | OFF>
VERIFICATION_ONLY = <off>
ATHENa Database of Results
ATHENa Database
http://cryptography.gmu.edu/athenadb
ATHENa Database Result View
Algorithm parameters
Design parameters
Optimization target
Architecture type
Datapath width
I/O bus widths
Availability of source code
Platform
Vendor, Family, Device
Timing
Maximum clock frequency
Maximum throughput
Resource utilization
Logic blocks (Slices/LEs/ALUTs)
Multipliers/DSP units
Tools
Names & versions
Detailed options
Credits
Designers & contact information
ATHENa Database Compare Feature
Matching fields in grey
Non-matching fields in red and blue
Possible Future Customizations
The same basic database can be customized
and adapted for other domains, such as
Digital Signal Processing
Bioinformatics
Communications
Scientific Computing, etc.
ATHENa Website
http://cryptography.gmu.edu/athena/
Download of ATHENa Tool
Links to related tools
SHA-3 Competition in FPGAs & ASICs
Specifications of candidates
Interface proposals
RTL source codes
Testbenches
ATHENa database of results
Related papers & presentations

GMU Source Codes and Block Diagrams
GMU source codes for all Round 3 SHA-3 candidates & SHA-2 are available at the ATHENa website:
http://cryptography.gmu.edu/athena

Included in this release:
Basic architectures
Folded architectures
Unrolled architectures
Each code supports two variants: with 256-bit and 512-bit output.
Each source code is accompanied by comprehensive hierarchical block diagrams.
ATHENa Result Replication Files
Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations)
Automatically created by ATHENa for all results generated using ATHENa
Stored in the ATHENa Database
In the same spirit of Reproducible Research as:
Patrick Vandewalle (EPFL), Jelena Kovacevic (CMU), and Martin Vetterli (EPFL), "Reproducible research in signal processing - what, why, and how," IEEE Signal Processing Magazine, May 2009, http://rr.epfl.ch/17/
J. Claerbout (Stanford University), "Electronic documents give reproducible research a new meaning," in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992, http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible:seg92
Benchmarking Goals Facilitated by ATHENa
Comparing multiple:
1. cryptographic algorithms
2. hardware architectures or implementations of the same cryptographic algorithm
3. hardware platforms, from the point of view of their suitability for the implementation of a given algorithm (e.g., choice of an FPGA device or FPGA board)
4. tools and languages, in terms of the quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v. 13.1 vs. ISE v. 12.3)
Modern FPGA Families
ECE 448 FPGA and ASIC Design with VHDL
Major FPGA Vendors
SRAM-based FPGAs
  Xilinx, Inc. (~51% of the market)
  Altera Corp. (~34% of the market)
  (Xilinx and Altera together: ~85%)
  Lattice Semiconductor
  Atmel
  Achronix
  Tabula
Flash & antifuse FPGAs
  Actel Corp. (Microsemi SoC Products Group)
  QuickLogic Corp.
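Xilinx FPGA Devices
+------------+------------+-------------------+
| Technology | Low-cost   | High-performance  |
+------------+------------+-------------------+
| 220 nm     | Spartan II | Virtex            |
| 120/150 nm |            | Virtex II, II Pro |
| 90 nm      | Spartan 3  | Virtex 4          |
| 65 nm      |            | Virtex 5          |
| 45 nm      | Spartan 6  |                   |
| 40 nm      |            | Virtex 6          |
| 28 nm      | Artix 7    | Virtex 7          |
+------------+------------+-------------------+

Altera FPGA Devices
+------------+-------------+-----------+------------------+
| Technology | Low-cost    | Mid-range | High-performance |
+------------+-------------+-----------+------------------+
| 130 nm     | Cyclone     |           | Stratix          |
| 90 nm      | Cyclone II  |           | Stratix II       |
| 65 nm      | Cyclone III | Arria I   | Stratix III      |
| 40 nm      | Cyclone IV  | Arria II  | Stratix IV       |
| 28 nm      | Cyclone V   | Arria V   | Stratix V        |
+------------+-------------+-----------+------------------+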
Resources
Xcell Journal
available for FREE on line @
http://www.xilinx.com/about/xcell-publications/xcell-journal.html

Electronic Engineering Journal
available for FREE by e-mail after subscribing @
http://www.eejournal.com/subscribe
or on the web @
http://www.eejournal.com/design/fpga
Follow-up Courses
[Figure: ECE Department programs, MS in Electrical Engineering (MS EE) and MS in Computer Engineering (MS CpE), and their specializations: Communications & Networking; Signal Processing; Control & Robotics; Microelectronics/Nanoelectronics; System Design; Bioengineering; Digital Signal Processing; Digital Systems Design; Computer Networks; Microprocessors & Embedded Systems; Network & System Security]
DIGITAL SYSTEMS DESIGN

1. ECE 545 Digital System Design with VHDL (Fall)
K. Gaj, project, FPGA design with VHDL, Aldec/Synplicity/Xilinx/Altera

2. ECE 645 Computer Arithmetic (Spring)
K. Gaj, project, FPGA design with VHDL or Verilog,
Aldec/Synplicity/Xilinx/Altera

3. ECE 586 Digital Integrated Circuits (Spring)
D. Ioannou

4. ECE 681 VLSI Design for ASICs (Fall)
H. Homayoun, project/lab, front-end and back-end ASIC design with
Synopsys tools

5. ECE 682 VLSI Test Concepts (Spring)
T. Storey, homework

6. ECE 699 Digital Signal Processing Hardware Architectures (Spring)
A. Cohen, project, FPGA design with VHDL or Verilog

DIGITAL SIGNAL PROCESSING

Concentration advisors: Aaron Cohen, Kris Gaj, Ken Hintz, Jill Nelson,
Kathleen Wage

1. ECE 535 Digital Signal Processing
L. Griffiths, J. Nelson, Matlab

2. ECE 545 Digital System Design with VHDL
K. Gaj, project, FPGA design with VHDL

3. ECE 645 Computer Arithmetic
K. Gaj, project, FPGA design with VHDL

4. ECE 699 Digital Signal Processing Hardware Architectures
A. Cohen, project, FPGA design with VHDL and Matlab/Simulink

5a. ECE 537 Introduction to Digital Image Processing
K. Hintz
5b. ECE 738 Advanced Digital Signal Processing
K. Wage


Possible New Graduate Computer
Engineering Courses

5xx Digital System Design with Verilog
6xx Reconfigurable Computing

(looking for instructors)


NETWORK AND SYSTEM SECURITY


1. ECE 542 Computer Network Architectures and Protocols
(Fall, Spring)
S.-C. Chang, et al.

2. ECE 646 Cryptography and Computer Network Security (Fall)
K. Gaj, J-P. Kaps lab, project: software/hardware/analytical

3. ECE 746 Advanced Applied Cryptography (every 2nd Spring, 2015)
K. Gaj, J-P. Kaps lab, project: software/hardware/analytical

4. ECE 699 Cryptographic Engineering (every 2nd Spring, 2014)
J-P. Kaps lectures + student/invited guests seminars

5. ISA 656 Network Security (Fall, Spring)
A. Stavrou
ECE 645
Computer Arithmetic
Instructor: Dr. Kris Gaj
Advanced digital circuit design course covering:
  addition and subtraction
  multiplication
  division and modular reduction
  exponentiation
Efficient architectures for:
  integers: unsigned and signed
  real numbers: fixed point; single and double precision floating point
  elements of the Galois field $\mathrm{GF}(2^n)$: polynomial basis
Course Objectives
At the end of this course you should be able to:
Understand mathematical and gate-level algorithms for computer addition, subtraction, multiplication, division, and exponentiation
Understand the tradeoffs among performance, area, latency, scalability, etc. involved in choosing between different arithmetic architectures
Synthesize and implement computer arithmetic blocks on FPGAs
Be comfortable with different number systems, and have familiarity with floating-point and Galois field arithmetic for future study
Understand sources of error in computer arithmetic and the basics of error analysis

This knowledge will come about through homework, a project, and practice exams.
Lecture topics
INTRODUCTION
1. Applications of computer arithmetic algorithms. Initial discussion of project topics.

ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting

2. Addition in Xilinx and Altera FPGAs

3. Carry-lookahead, carry-select, and hybrid adders

4. Adders based on Parallel Prefix Networks

5. Pipelined Adders

6. Modular addition and subtraction

MULTIOPERAND ADDITION
1. Carry-save adders

2. Wallace and Dadda Trees

3. Adding multiple unsigned and signed numbers
NUMBER REPRESENTATIONS
Unsigned Integers
Signed Integers
Fixed-point real numbers
Floating-point real numbers
Elements of the Galois Field $\mathrm{GF}(2^n)$
LONG INTEGER ARITHMETIC
1. Modular Exponentiation

2. Montgomery Multipliers and Exponentiation Units
MULTIPLICATION
1. Tree and array multipliers

2. Sequential multipliers

3. Multiplication of signed numbers and squaring

4. Multiplication in Xilinx and Altera FPGAs
- using distributed logic
- using embedded multipliers
- using DSP blocks

5. Multiple clock systems
DIVISION
1. Basic restoring and non-restoring
sequential dividers

2. SRT and high-radix dividers

3. Array dividers

4. Division by Convergence
FLOATING POINT
AND
GALOIS FIELD ARITHMETIC
1. Floating-point units

2. Galois Field $\mathrm{GF}(2^n)$ units
