
COMPE 470 Lab

Amir Alimohammad
Department of Electrical and Computer Engineering, San Diego State University
Office: Fourth floor, Engineering, 403E
Email: aalimohammad@mail.sdsu.edu
Office Hours: Mondays and Fridays, 4-5 PM
Class Schedule: Mondays and Fridays, 5:30-6:45 PM
Class Location: Geology, Mathematics & Computer Science (GMCS) 308

Single-cycle fixed-point processor design

Please do not distribute


Binary system
Assume a binary number X is an ordered sequence of binary digits (bits): $X = x_{n-1} x_{n-2} \ldots x_1 x_0$
Upper-case letters represent numerical values or sequences of digits
Lower-case letters, usually indexed, represent individual digits

Integer value of unsigned X: $X = \sum_{i=0}^{n-1} x_i 2^i$, where $2^i$ is the weight of binary digit $x_i$


The numerical value X of the representation $(x_{n-1}, x_{n-2}, \ldots, x_0)$ in two's complement can be calculated by adding up the power-of-two weights of the "one" bits, but with a negative weight for the most significant (sign) bit

$X = -2^{n-1} x_{n-1} + \sum_{i=0}^{n-2} 2^i x_i$
The range of representable numbers with n bits is from $-2^{n-1}$ to $2^{n-1} - 1$
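As a small worked instance of the two's complement value formula (the 4-bit pattern 1011 is chosen here purely for illustration), the sign bit contributes a negative weight and the remaining bits contribute positive weights:

```latex
X = -2^{3}\cdot 1 + 2^{2}\cdot 0 + 2^{1}\cdot 1 + 2^{0}\cdot 1 = -8 + 2 + 1 = -5
```

With n = 4, the representable range is -8 to 7.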

Two's complement system

Asymmetric range of $[-2^{n-1}, 2^{n-1} - 1]$: one more negative number than positive numbers
$-2^{n-1}$, represented by $10\ldots0$, does not have a positive equivalent
Negation: complement all the bits (bit-wise complement) and add 1
$\mathrm{2sc}(Y) = 2^n - Y = ((2^n - 1) - Y) + 1 = {\sim}Y + 1$
Recall that inverting the bits of a number X gives the value $(2^n - 1) - X$ (the ones' complement of X). Adding 1 to this gives $2^n - X$, which is the two's complement of X
Note that the two's complement of zero is zero: inverting gives all ones, and adding one changes the ones back to zeros
Another method: starting with the least significant bit, copy all of the bits up to and including the first 1 bit, and then complement the remaining bits
Recall that negation can result in an overflow (for the most negative number, $-2^{n-1}$)
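The negation rule can be sketched in Verilog as follows. This is a minimal illustration rather than part of the lab code: the module name Negate, the parameter WL, and the overflow output are assumptions introduced here, and WL is assumed to be at least 2.

```verilog
// Hypothetical helper (not from the lab handout): two's complement negation.
module Negate #(parameter WL = 8)            // word length, assumed >= 2
  (input  signed [WL-1:0] in,
   output signed [WL-1:0] out,
   output                 overflow);

  // Complement all the bits and add 1
  assign out = ~in + 1'b1;

  // The most negative value, 100...0, has no positive counterpart,
  // so negating it overflows (the result is the same bit pattern)
  assign overflow = (in == {1'b1, {(WL-1){1'b0}}});
endmodule
```

For WL = 4, an input of 4'b0101 (5) gives 4'b1011 (-5), while 4'b1000 (-8) gives 4'b1000 again with overflow set.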


Two's complement addition

You can add two's complement numbers without worrying about their signs
A carry-out is not necessarily an indication of an overflow: the carry-out is discarded and does not by itself indicate overflow
In general, if X and Y have opposite signs, no overflow can occur regardless of whether there is a carry-out or not
If X and Y have the same sign and the result has a different sign, overflow occurs

Examples with n = 5:
  01001 (9) + 00111 (7) = 0 10000 (-16 = 16 mod 32): no carry-out, but overflow
  10111 (-9) + 10111 (-9) = 1 01110 (14 = -18 mod 32): carry-out and overflow
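The overflow rule can be sketched as a small combinational adder. The module name AddOvf and its port names are assumptions made for this illustration; the overflow flag follows the same-sign-in, different-sign-out rule stated above, and the carry-out is exposed only to show that it is a separate signal.

```verilog
// Hypothetical sketch (not from the lab handout): n-bit two's complement
// adder that reports carry-out and overflow separately.
module AddOvf #(parameter WL = 5)
  (input  signed [WL-1:0] x, y,
   output signed [WL-1:0] sum,
   output                 carry_out,
   output                 overflow);

  // Zero-extend by one bit so the carry out of the MSB is visible
  wire [WL:0] full = {1'b0, x} + {1'b0, y};

  assign sum       = full[WL-1:0];
  assign carry_out = full[WL];

  // Overflow: operands share a sign and the result's sign differs
  assign overflow  = (x[WL-1] == y[WL-1]) && (sum[WL-1] != x[WL-1]);
endmodule
```

With WL = 5, adding 01001 and 00111 gives sum 10000 with carry_out 0 and overflow 1, while adding 10111 and 10111 gives sum 01110 with carry_out 1 and overflow 1, matching the two examples on this slide.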



Overflow solution and sign extension


In case of overflow, use the carry-out bit as the sign bit of the result
An efficient solution to overflow: any two n-bit numbers can be added without overflow by first sign-extending both of them to n+1 bits and then adding as above
The (n+1)-bit result is large enough to represent any possible sum of two n-bit numbers
When turning a two's complement number with a certain number of bits into one with more bits, the sign bit must be repeated in all the extra bits
Similarly, when a two's complement number is shifted to the right, the sign bit must be maintained. However, when it is shifted to the left, a 0 is shifted into the least significant bit (LSB)
For subtraction X - Y, perform X + (~Y + 1)
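Sign extension and the subtraction identity can be combined in one short sketch. The module name Sub and its ports are assumptions for illustration only; both operands are sign-extended to n+1 bits so the difference can never overflow.

```verilog
// Hypothetical sketch (not from the lab handout): overflow-free subtraction
// using sign extension and X - Y = X + (~Y + 1).
module Sub #(parameter WL = 4)
  (input  signed [WL-1:0] x, y,
   output signed [WL:0]   diff);          // one extra bit for the result

  // Repeat the sign bit in the extra position
  wire signed [WL:0] xe = {x[WL-1], x};
  wire signed [WL:0] ye = {y[WL-1], y};

  // Invert the extended subtrahend and add 1
  assign diff = xe + ~ye + 1'b1;
endmodule
```

For WL = 4, x = 1000 (-8) and y = 0111 (7) give diff = 10001 (-15), which would not fit in 4 bits.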


Representation of non-integer numbers


A sequence of n bits in a register does not necessarily represent an integer
It can represent a number with a fractional part and an integer part
The n bits are partitioned into two parts: k bits in the integer part and m bits in the fractional part (k + m = n), separated by the binary point: $X = x_{k-1} \ldots x_0 \,.\, x_{-1} \ldots x_{-m}$

If m = 0, the number is an integer, and if k = 0, the number is purely fractional
The value of an unsigned number is $X = \sum_{i=-m}^{k-1} x_i 2^i$
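To make the partition concrete, here is one worked value, assuming an unsigned 8-bit word with k = 4 integer bits and m = 4 fraction bits (the bit pattern is chosen here for illustration). Each bit $x_i$ carries the weight $2^i$, with negative indices to the right of the binary point:

```latex
0101.1100_2 = 2^{2} + 2^{0} + 2^{-1} + 2^{-2} = 4 + 1 + 0.5 + 0.25 = 5.75_{10}
```

The same eight bits read as an integer (m = 0) would be 92, which is 5.75 scaled by $2^4$.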


Base conversion
Use the remainders of repeated division by 2 for the integer part
Example: Convert $23.375_{10}$ to base 2. Start by converting the integer portion, then convert the fractional portion.

Putting it all together, $23.375_{10} = 10111.011_2$
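The individual steps of this conversion can be written out as follows. The integer part uses repeated division by 2, reading the remainders from last to first; the fractional part is handled here by repeated multiplication by 2, reading the integer digits from first to last (the standard companion method, stated here as an assumption about the omitted slide figure).

```latex
\begin{aligned}
23 \div 2 &= 11 \text{ remainder } 1 \\
11 \div 2 &= 5 \text{ remainder } 1 \\
 5 \div 2 &= 2 \text{ remainder } 1 \\
 2 \div 2 &= 1 \text{ remainder } 0 \\
 1 \div 2 &= 0 \text{ remainder } 1 &&\Rightarrow\; 23_{10} = 10111_2 \\[4pt]
0.375 \times 2 &= 0.75 \;\rightarrow\; 0 \\
0.75 \times 2 &= 1.5 \;\rightarrow\; 1 \\
0.5 \times 2 &= 1.0 \;\rightarrow\; 1 &&\Rightarrow\; 0.375_{10} = 0.011_2
\end{aligned}
```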


Non-terminating base 2 fraction


We can't always convert a terminating base 10 fraction into an equivalent terminating base 2 fraction.
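One illustration (the value 0.2 is chosen here as an example and may differ from the one on the original slide): repeated multiplication by 2 returns to the starting fraction after four steps, so the bit pattern repeats forever.

```latex
\begin{aligned}
0.2 \times 2 &= 0.4 \;\rightarrow\; 0 \\
0.4 \times 2 &= 0.8 \;\rightarrow\; 0 \\
0.8 \times 2 &= 1.6 \;\rightarrow\; 1 \\
0.6 \times 2 &= 1.2 \;\rightarrow\; 1 \\
0.2 \times 2 &= 0.4 \;\rightarrow\; 0 \quad (\text{the cycle repeats}) \\[4pt]
0.2_{10} &= 0.\overline{0011}_2 = 0.001100110011\ldots_2
\end{aligned}
```

Any fixed number of fraction bits therefore stores only an approximation of 0.2.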


Arithmetic operations for fixed-point numbers


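The addition or subtraction of two fixed-point numbers can be performed by a regular adder or subtractor if the binary points of the two numbers are aligned
The binary point remains in the same position in the result

[Figure: S = B +/- A with aligned binary points and carry c, where $B = b_n b_{n-1} \ldots b_0 \,.\, b_{-1} \ldots b_{-m}$, $A = a_n a_{n-1} \ldots a_0 \,.\, a_{-1} \ldots a_{-m}$, and $S = s_n s_{n-1} \ldots s_0 \,.\, s_{-1} \ldots s_{-m}$; a second diagram gives bit-width labels (in terms of n, m, k, and l) for the operands and the truncated product]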

For the convenience of hardware implementation, we prefer the product of a multiplication to have the same length as the multiplicand or the multiplier (assuming they have the same length). To achieve this, we normally truncate the least significant bits of the product
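A small unsigned example (values chosen here for illustration) shows how the product format grows and what truncation costs. Two operands with 2 integer bits and 2 fraction bits each give a full product with 4 integer bits and 4 fraction bits:

```latex
01.10_2 \times 10.01_2 = 1.5 \times 2.25 = 3.375 = 0011.0110_2
```

Keeping only the two most significant fraction bits (and the two low-order integer bits, since the upper two are zero here) gives $11.01_2 = 3.25$, so the truncation error in this case is 0.125.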

Signed adder in Verilog

// The output is one bit wider than the wider of the two inputs
`define WIDTH (WL1 > WL2 ? (WL1+1) : (WL2+1))

module Add #(parameter WL1 = 4,    // first input length
                       WL2 = 4)    // second input length
  (input  signed [WL1-1:0] in1,    // adder inputs
   input  signed [WL2-1:0] in2,
   input  CLK,
   output reg signed [`WIDTH-1:0] out);   // adder output

  // Sign-extend both inputs to `WIDTH bits, then add
  always @(posedge CLK)
    out <= { { (WL2>WL1 ? (WL2-WL1)+1 : 1){in1[WL1-1]} }, in1}
         + { { (WL1>WL2 ? (WL1-WL2)+1 : 1){in2[WL2-1]} }, in2};
endmodule


Fixed-point multiplier
module Mul #(parameter WI1 = 4,    // first input, integer length
                       WF1 = 4,    // first input, fraction length
                       WI2 = 8,    // second input, integer length
                       WF2 = 8,    // second input, fraction length
                       WIO = 12,   // output format, integer length
                       WFO = 12)   // output format, fraction length
  (input  signed [WI1+WF1-1:0] in1,    // multiplier inputs
   input  signed [WI2+WF2-1:0] in2,
   input  CLK,
   output signed [WIO+WFO-1:0] out);   // multiplier output

  reg signed [WI1+WF1+WI2+WF2-1:0] temp;   // full-precision product

  always @(posedge CLK)
    temp <= in1 * in2;

  // Keep the WIO low-order integer bits and the WFO high-order fraction bits
  // (assumes WIO <= WI1+WI2 and WFO <= WF1+WF2)
  assign out = {temp[WF1+WF2+WIO-1 : WF1+WF2], temp[WF1+WF2-1 : WF1+WF2-WFO]};
endmodule

Modify the above code to take care of the sign bit


Vectors, arrays and memories


Verilog supports three similar data structures: vectors, arrays and memories
Vectors are used to represent multi-bit busses
  reg  [7:0] MultiBitWord1;        // 8-bit reg vector with MSB=7, LSB=0; little endian
  wire [0:7] MultiBitWord2;        // 8-bit wire vector with MSB=0, LSB=7; big endian
  reg  [3:0] bitslice;
  reg  a;                          // single-bit vector, often referred to as a scalar
  a = MultiBitWord1[3];            // assigns bit 3 (the 4th bit) of MultiBitWord1 to a
  bitslice = MultiBitWord1[3:0];   // returns bits 3 down to 0 of MultiBitWord1

Arrays are used to hold several objects of the same type
  integer i[3:0];       // integer array with a length of 4; four integer values
  integer [3:0] i;      // What is this?
  time    x[20:1];      // time array with a length of 20
  reg     r[7:0];       // scalar reg array with a length of 8

  c = r[3];             // the reg value at index 3 of array r is assigned to c
  integer count[1:5];   // 5 integers
  reg var[-15:16];      // 32 1-bit regs


Memories
Memories are arrays of vectors defined using reg variables
  reg [msb:lsb] memory1 [upper:lower];
  reg [7:0] myMem [3:0];   // defines a memory with 4 locations; each location contains 8-bit data

  Bit        7  6  5  4  3  2  1  0
  myMem[0]
  myMem[1]
  myMem[2]
  myMem[3]

  reg [7:0]   ram[0:4095];       // 4096 memory cells that are 8 bits wide
  reg         r[0:4];            // an array of 5 1-bit registers
  reg [7:0]   memdata[0:255];    // 256 8-bit registers
  reg [8*6:1] strings[1:10];     // 10 6-byte strings
  reg         membits[1023:0];   // 1024 1-bit registers

A memory element (a word, or row, of memory) can be accessed by a memory index as mem[index], similar to a bit-select
  reg  [7:0] array1 [0:255];
  wire [7:0] out1 = array1[address];

ROMs using BlockRAM resources


In some device families, either the address or the output has to be registered for ROM code to be inferred
// ROM with registered output
module romSync #(parameter addWidth = 3, dataWidth = 4)
  (input clk, en,
   input [addWidth-1:0] addr,
   output reg [dataWidth-1:0] data);

  always @(posedge clk) begin
    if (en)
      case (addr)
        3'b000: data <= 4'b0010;
        3'b001: data <= 4'b0010;
        3'b010: data <= 4'b1110;
        3'b011: data <= 4'b0010;
        3'b100: data <= 4'b0100;
        3'b101: data <= 4'b1010;
        3'b110: data <= 4'b1100;
        3'b111: data <= 4'b0000;
        default: data <= 4'bXXXX;
      endcase
  end
endmodule

XST can use block RAM resources to implement ROMs with synchronous outputs or address inputs

Dual-port RAM

module dual_port_ram_single_clock
  #(parameter DATA_WIDTH = 8,
    parameter ADDR_WIDTH = 6)
  (input  [DATA_WIDTH-1:0] dina, dinb,    // dinb is unused: only port A has a write enable
   input  [ADDR_WIDTH-1:0] addra, addrb,
   input  WEA, CLK,
   output reg [DATA_WIDTH-1:0] douta, doutb);

  reg [DATA_WIDTH-1:0] ram [2**ADDR_WIDTH-1:0];   // declare the RAM variable

  always @(posedge CLK) begin   // Port A: write when WEA is set, read every cycle
    if (WEA)
      ram[addra] <= dina;
    douta <= ram[addra];        // returns the old contents during a write (read-first)
  end

  always @(posedge CLK) begin   // Port B: read only
    doutb <= ram[addrb];
  end
endmodule


Initializing BlockRAMs
You can use an initial block to initialize the contents of an inferred memory
Initialization is only valid for block RAM resources. If you attempt to initialize distributed RAM, XST ignores the initialization and issues a warning message
Block RAM initial contents can be specified by initializing the signal that describes the memory array in your HDL code
You can do this directly in your HDL code, or you can specify a file containing the initialization data
You can initialize block RAMs and ROMs using the $readmemh and $readmemb system tasks
The initialization works identically in synthesis and simulation
  reg [7:0] ram[0:15];
  initial begin
    $readmemb("ram.txt", ram);
  end
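The same idea can be applied to a ROM with a registered output, using $readmemh instead of $readmemb. This is a sketch only: the module name RomFromFile, its parameters, and the file name rom_init.hex are hypothetical, and the file is assumed to hold one hexadecimal word per line.

```verilog
// Hypothetical sketch (not from the lab handout): ROM whose contents are
// loaded from a hex file at elaboration/simulation start.
module RomFromFile #(parameter ADDR_WIDTH = 4,
                               DATA_WIDTH = 8)
  (input                       clk,
   input      [ADDR_WIDTH-1:0] addr,
   output reg [DATA_WIDTH-1:0] data);

  reg [DATA_WIDTH-1:0] rom [0:2**ADDR_WIDTH-1];

  // "rom_init.hex" is a placeholder file name
  initial begin
    $readmemh("rom_init.hex", rom);
  end

  // Registered (synchronous) read so the array can map to block RAM
  always @(posedge clk)
    data <= rom[addr];
endmodule
```

Because nothing ever writes to the array, a synthesis tool can treat it as a ROM.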


Levels of representation
High-level language program (programmer's view)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  -> Compiler
Assembly language program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  -> Assembler
Machine language program (computer's view)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  -> Machine interpretation
Control signal spec
    ALUOP[0:3] <= InstReg[9:11] & MASK


Processor execution cycle


Instruction (and operands) fetch: obtain the instruction from program storage and calculate the next instruction address
Instruction decode: determine required actions
Execute: compute result value or status
Result store: deposit results in a memory for later use

Sequence of instruction execution is controlled by a program counter register
The next instruction to be executed is typically implied, as instructions are executed sequentially

[Figure: Instruction 1, Instruction 2, Instruction 3 stored and executed in sequence]



Unified or separate instruction and data memories?


Princeton (von Neumann) architecture
  Data and instructions mixed in the same unified memory
  Single memory interface
Harvard architecture
  Data and instructions in separate memories
  Has advantages in certain high-performance implementations
  Can optimize each memory


Register-to-register load-store architecture


No memory addresses in the ALU operations
Typically three-operand ALU operations
Most RISC (reduced instruction set computer) architectures use this scheme


Memory-to-memory architecture
All ALU operands come from memory
Example: VAX machines


Instruction set design


What data types are supported? Signed integer
What operations (and how many) should be provided? ADD, MUL, SUB, NEG, JMP (more will be discussed later)
How many explicit operands? Up to three (e.g., ADD C, A, B; NEG Z, B;)
How are operands and results addressed? We use a memory-to-memory architecture
  Example: ADD C, A, B; means C <- A + B;
  In this mode we use direct memory addresses for A, B, and C
What is the instruction format?
  | opcode | opd | ops1 | ops2 | NOT USED NOW |

where opcode indicates the operation of the instruction, and opd, ops1, and ops2 are the destination and source register specifiers, respectively


How to design a single cycle processor?


1. Analyze the instruction set and select a set of datapath components. The datapath must support each instruction
2. Implement the datapath components and establish a clocking methodology
3. Assemble the datapath to meet the instruction set architecture (ISA) requirements
4. Analyze the implementation of each instruction to determine the set of control signals that effect the register transfer level (RTL) implementation
5. Design the control logic and connect it to the datapath


Start!
Assume that our processor supports three instructions: ADD, SUB, and MUL
These arithmetic operations are implemented using the ALU that you designed
The addition (subtraction) is performed in two's complement format with overflow detection
No carry output is required
Let's implement (A-B) x (C+D) x (A+D)
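One possible way to break the expression into three-operand ADD, SUB, and MUL instructions is shown below as a behavioral Verilog model, with the corresponding instruction in each comment. The temporaries T1 to T4, the result R, and the operand values are hypothetical, chosen only to make the sketch runnable; other instruction orderings work equally well.

```verilog
// Hypothetical behavioral model (not from the lab handout) of the
// instruction sequence for (A-B) x (C+D) x (A+D).
module ExprSteps;
  integer A, B, C, D;          // operands, as they would sit in data memory
  integer T1, T2, T3, T4, R;   // temporaries and final result

  initial begin
    A = 7; B = 3; C = 2; D = 1;   // example values only

    T1 = A - B;     // SUB T1, A, B
    T2 = C + D;     // ADD T2, C, D
    T3 = T1 * T2;   // MUL T3, T1, T2
    T4 = A + D;     // ADD T4, A, D
    R  = T3 * T4;   // MUL R,  T3, T4

    $display("R = %0d", R);   // (7-3) * (2+1) * (7+1) = 96
  end
endmodule
```

Five instructions are needed, and each one reads two memory locations and writes one, which matches the memory-to-memory format described earlier.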


Instruction fetch unit


The instruction fetch unit performs two RTL operations
  Fetch the current instruction from the instruction memory addressed by the program counter (PC): mem[PC]
  Update the PC
Since we do not support jump or branch instructions yet, the program counter is updated by PC <- PC + 1

[Figure: the PC register, clocked by CLK and updated by the next-address logic, supplies the Address to the Instruction Memory, which outputs the Instruction Word]
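The two RTL operations of the fetch unit can be sketched as follows. This is an illustration under stated assumptions, not the lab's reference design: the module name FetchUnit, the synchronous reset rst, and the internal memory array are introduced here, and the default sizes (1024 rows of 18 bits) follow the memory parameters given later in these notes.

```verilog
// Hypothetical sketch (not from the lab handout): instruction fetch unit
// with PC <- PC + 1 as the only next-address rule.
module FetchUnit #(parameter ADDR_WIDTH  = 10,   // 2**10 = 1024 rows
                             INSTR_WIDTH = 18)   // bits per instruction word
  (input                    clk,
   input                    rst,     // assumed synchronous reset
   output [INSTR_WIDTH-1:0] instr);

  reg [ADDR_WIDTH-1:0]  pc;
  reg [INSTR_WIDTH-1:0] imem [0:2**ADDR_WIDTH-1];   // instruction memory

  // Next-address logic: no jumps or branches yet
  always @(posedge clk)
    if (rst) pc <= {ADDR_WIDTH{1'b0}};
    else     pc <= pc + 1'b1;

  // Fetch: mem[PC]
  assign instr = imem[pc];
endmodule
```

The read here is combinational for simplicity; with a block RAM the read would be registered, which adds a cycle between the address and the instruction word.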


An abstract view of the processor


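[Figure: block diagram with the Instruction Memory (Instruction Address in, Instruction out), the PC with next-address logic (WE, din, CLK), the Controller (Control Signals out, Conditions in), and the Datapath, which contains the Data Memory (WEA, din, douta, doutb, CLK) and the ALU (din1, din2, ctrl, Result); the instruction fields opd, ops1, and ops2 go to the datapath]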


Memory modules
Use the provided parametric memory module (the number of rows and the number of columns are parameters)
Use 1024 as the default value for the number of rows and 18 as the number of columns
Use one memory block for the data memory and one memory block for the instructions
The operands of the instructions are stored in the data memory
We will discuss the controller design in the next lab
