Isa

COMPE 470 Lab
Amir Alimohammad
Department of Electrical and Computer Engineering, San Diego State University
Office: Fourth floor, Engineering, 403E
Email: aalimohammad@mail.sdsu.edu
Office Hours: Mondays and Fridays, 4-5 PM
Class Schedule: Mondays and Fridays, 5:30-6:45 PM
Class Location: Geology, Mathematics & Computer Science (GMCS) 308
Single-cycle fixed-point processor design
Please do not distribute
Binary system
Assume a binary number X is an ordered sequence of binary digits (bits): $X = x_{n-1} x_{n-2} \ldots x_1 x_0$
Upper case letters represent numerical values or sequences of digits.
Lower case letters, usually indexed, represent individual digits.
The integer value of unsigned X is $X = \sum_{i=0}^{n-1} x_i 2^{i}$, where $2^{i}$ is the weight of binary digit $x_i$.

The numerical value X of the representation $(x_{n-1}, x_{n-2}, \ldots, x_0)$ in two's complement can be calculated by adding up the power-of-two weights of the "one" bits, but with a negative weight for the most significant (sign) bit:
$X = -2^{n-1} x_{n-1} + \sum_{i=0}^{n-2} 2^{i} x_i$
The range of representable numbers with n bits is from $-2^{n-1}$ to $2^{n-1} - 1$.
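As a quick check of the weighted-sum formula, the following worked example evaluates a 4-bit pattern. The pattern 1011 is chosen here only for illustration and does not come from the slides.

```latex
\[
X = -2^{3}\cdot 1 + 2^{2}\cdot 0 + 2^{1}\cdot 1 + 2^{0}\cdot 1 = -8 + 2 + 1 = -5,
\qquad -2^{3} = -8 \;\le\; X \;\le\; 2^{3}-1 = 7 .
\]
```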
Two's complement system

The range $[-2^{n-1},\ 2^{n-1}-1]$ is slightly asymmetric: there is one more negative number than there are positive numbers.
$-2^{n-1}$, represented by 10...0, does not have a positive equivalent.
Negation: complement all the bits (bit-wise complement) and add 1:
$\mathrm{2sc}(Y) = 2^{n} - Y = ((2^{n} - 1) - Y) + 1 = {\sim}Y + 1$
Recall that inverting the bits of the number X gives the value $(2^{n} - 1) - X$ (the ones' complement of X). Adding 1 to this gives $2^{n} - X$, which is the two's complement of X.
Note that the two's complement of zero is zero: inverting gives all ones, and adding one changes the ones back to zeros.
Another method: starting with the least significant bit, copy all of the bits up to and including the first 1 bit, and then complement the remaining bits.
Recall that negation can result in an overflow (for the most negative number, $-2^{n-1}$).
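The negation rule can be sketched in Verilog as follows. This is only an illustrative sketch: the module name `negate`, its ports, and the 4-bit test values are assumptions made for this example and are not part of the lab material.

```verilog
// Illustrative sketch: two's complement negation with an overflow flag.
// Module and signal names are hypothetical.
module negate #(parameter N = 4)
  (input  [N-1:0] y,
   output [N-1:0] neg_y,
   output         overflow);
  assign neg_y    = ~y + 1'b1;                     // complement all bits, then add 1
  assign overflow = (y == {1'b1, {(N-1){1'b0}}});  // y = -2^(N-1) has no positive equivalent
endmodule

// Self-checking testbench for N = 4.
module negate_tb;
  reg  [3:0] y;
  wire [3:0] neg_y;
  wire       overflow;

  negate #(.N(4)) dut (.y(y), .neg_y(neg_y), .overflow(overflow));

  task check(input [3:0] exp_neg, input exp_ovf);
    begin
      #1;  // let the combinational outputs settle
      if (neg_y !== exp_neg || overflow !== exp_ovf)
        $display("FAIL: y=%b neg_y=%b overflow=%b", y, neg_y, overflow);
      else
        $display("ok:   y=%b neg_y=%b overflow=%b", y, neg_y, overflow);
    end
  endtask

  initial begin
    y = 4'b0101; check(4'b1011, 1'b0);  //  5 -> -5
    y = 4'b0000; check(4'b0000, 1'b0);  //  0 ->  0
    y = 4'b1000; check(4'b1000, 1'b1);  // -8 -> overflow, bit pattern unchanged
  end
endmodule
```

The last case shows why negation can overflow: negating 1000 returns the same bit pattern.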
Two's complement addition

You can add two's complement numbers without worrying about their signs.
A carry-out is not necessarily an indication of an overflow: a carry-out that is simply discarded does not indicate overflow.

In general, if X and Y have opposite signs, no overflow can occur, regardless of whether there is a carry-out or not.
If X and Y have the same sign and the result has a different sign, overflow occurs.
Examples with n = 5:
  01001 (9) + 00111 (7) = 0 10000 (-16 = 16 mod 32): no carry-out, but overflow
  10111 (-9) + 10111 (-9) = 1 01110 (14 = -18 mod 32): carry-out and overflow
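The sign-based overflow rule can be expressed directly in hardware. The sketch below is a minimal illustration, with a hypothetical module name `add_ovf` and hypothetical port names. The first two test cases are the 5-bit examples from the slide, and the third (-1 + 2) is added here to show a carry-out without overflow.

```verilog
// Illustrative sketch: n-bit two's complement adder with overflow detection.
// Names are hypothetical.
module add_ovf #(parameter N = 5)
  (input  [N-1:0] x, y,
   output [N-1:0] sum,
   output         carry_out,
   output         overflow);
  assign {carry_out, sum} = x + y;  // (N+1)-bit result: carry-out plus N-bit sum
  // Overflow: operands have the same sign, but the sum has a different sign.
  assign overflow = (x[N-1] == y[N-1]) && (sum[N-1] != x[N-1]);
endmodule

// Self-checking testbench for N = 5.
module add_ovf_tb;
  reg  [4:0] x, y;
  wire [4:0] sum;
  wire       carry_out, overflow;

  add_ovf #(.N(5)) dut (.x(x), .y(y), .sum(sum),
                        .carry_out(carry_out), .overflow(overflow));

  task check(input [4:0] exp_sum, input exp_c, input exp_v);
    begin
      #1;  // let the combinational outputs settle
      if (sum !== exp_sum || carry_out !== exp_c || overflow !== exp_v)
        $display("FAIL: %b + %b -> c=%b sum=%b ovf=%b", x, y, carry_out, sum, overflow);
      else
        $display("ok:   %b + %b -> c=%b sum=%b ovf=%b", x, y, carry_out, sum, overflow);
    end
  endtask

  initial begin
    x = 5'b01001; y = 5'b00111; check(5'b10000, 1'b0, 1'b1);  //  9 +  7: no carry-out, overflow
    x = 5'b10111; y = 5'b10111; check(5'b01110, 1'b1, 1'b1);  // -9 + -9: carry-out and overflow
    x = 5'b11111; y = 5'b00010; check(5'b00001, 1'b1, 1'b0);  // -1 +  2: carry-out, no overflow
  end
endmodule
```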

Overflow solution and sign extension

In case of overflow, use the carry-out bit as the sign bit of the result.
An efficient solution to overflow: any two n-bit numbers can be added without overflow by first sign-extending both of them to n+1 bits and then adding as above.
The (n+1)-bit result is large enough to represent any possible sum of two n-bit numbers.
When turning a two's complement number with a certain number of bits into one with more bits, the sign bit must be repeated in all the extra bits.
Similarly, when a two's complement number is shifted to the right, the sign bit must be maintained. However, when it is shifted to the left, a 0 is shifted into the LSB.
For the subtraction X - Y, you perform X + (~Y + 1).
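Sign extension and subtraction by adding the two's complement can be combined in one small module. The following is a sketch under assumptions: the module name `sub_ext`, its ports, and the 4-bit test values are hypothetical and chosen only to exercise the extreme cases.

```verilog
// Illustrative sketch: overflow-free subtraction using sign extension.
// Names are hypothetical.
module sub_ext #(parameter N = 4)
  (input  [N-1:0] x, y,
   output [N:0]   diff);           // N+1 bits: large enough for any X - Y
  wire [N:0] xe = {x[N-1], x};     // sign extension: repeat the sign bit
  wire [N:0] ye = {y[N-1], y};
  assign diff = xe + (~ye + 1'b1); // X - Y = X + (~Y + 1)
endmodule

// Self-checking testbench for N = 4.
module sub_ext_tb;
  reg  [3:0] x, y;
  wire [4:0] diff;

  sub_ext #(.N(4)) dut (.x(x), .y(y), .diff(diff));

  task check(input [4:0] exp_diff);
    begin
      #1;  // let the combinational output settle
      if (diff !== exp_diff)
        $display("FAIL: %b - %b -> %b", x, y, diff);
      else
        $display("ok:   %b - %b -> %b", x, y, diff);
    end
  endtask

  initial begin
    x = 4'b0111; y = 4'b1000; check(5'b01111);  //  7 - (-8) =  15
    x = 4'b1000; y = 4'b0111; check(5'b10001);  // -8 -   7  = -15
    x = 4'b0011; y = 4'b0011; check(5'b00000);  //  3 -   3  =   0
  end
endmodule
```

Both extreme results (15 and -15) would overflow a 4-bit result but fit in the 5-bit output.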
Representation of non-integer numbers

A sequence of n bits in a register does not necessarily represent an integer.
It can represent a number with a fractional part and an integer part.
The n bits are partitioned into two parts: k bits in the integer part and m bits in the fractional part (k + m = n), separated by the binary point: $x_{k-1} x_{k-2} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-m}$
If m = 0, the number is an integer, and if k = 0, the number is purely fractional.
The value of an unsigned number is $X = \sum_{i=-m}^{k-1} x_i 2^{i}$
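As an illustration of the integer/fraction partition, consider an unsigned pattern with k = 3 integer bits and m = 2 fractional bits. The pattern 101.11 is a hypothetical example chosen for this sketch.

```latex
\[
101.11_2 = 1\cdot 2^{2} + 0\cdot 2^{1} + 1\cdot 2^{0} + 1\cdot 2^{-1} + 1\cdot 2^{-2}
         = 4 + 1 + 0.5 + 0.25 = 5.75_{10}
\]
```

The same five bits read as an integer (m = 0) would be 10111 = 23, which is 5.75 scaled by $2^{2}$.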
Base conversion
Use the remainders of repeated division by 2 for the integer part.
Example: convert $23.375_{10}$ to base 2. Start by converting the integer portion, then convert the fractional portion.
Putting it all together, $23.375_{10} = 10111.011_2$
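The individual conversion steps are not preserved in this copy of the slides. The following is a reconstruction using the standard method (repeated division by 2 for the integer part, repeated multiplication by 2 for the fraction), and it agrees with the stated result.

```latex
\begin{align*}
23 \div 2 &= 11 \text{ remainder } 1 \\
11 \div 2 &= 5  \text{ remainder } 1 \\
 5 \div 2 &= 2  \text{ remainder } 1 \\
 2 \div 2 &= 1  \text{ remainder } 0 \\
 1 \div 2 &= 0  \text{ remainder } 1
   && \Rightarrow\ 23_{10} = 10111_2 \text{ (remainders read from last to first)} \\[1ex]
0.375 \times 2 &= 0.75 && \text{integer part } 0 \\
0.75  \times 2 &= 1.5  && \text{integer part } 1 \\
0.5   \times 2 &= 1.0  && \text{integer part } 1
   \ \Rightarrow\ 0.375_{10} = 0.011_2 \text{ (integer parts read from first to last)}
\end{align*}
```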
Non-terminating base 2 fraction

We can't always convert a terminating base 10 fraction into an equivalent terminating base 2 fraction.
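The original example is not preserved in this copy. As an illustration, $0.2_{10}$ (a value chosen here, not taken from the slides) can be converted by repeated multiplication by 2:

```latex
\begin{align*}
0.2 \times 2 &= 0.4 && \text{integer part } 0 \\
0.4 \times 2 &= 0.8 && \text{integer part } 0 \\
0.8 \times 2 &= 1.6 && \text{integer part } 1 \\
0.6 \times 2 &= 1.2 && \text{integer part } 1 \\
0.2 \times 2 &= 0.4 && \text{integer part } 0 \quad \text{(the fraction 0.2 has reappeared, so the pattern repeats)}
\end{align*}
\[
0.2_{10} = 0.0011\,0011\,0011\ldots_2 = 0.\overline{0011}_2
\]
```

Any finite register therefore stores only an approximation of 0.2.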
Arithmetic operations for fixed-point numbers

The addition or subtraction of two fixed-point numbers can be performed by a regular adder or subtractor if the binary points of the two numbers are aligned.
The binary point stays in the same position in the result.
[Figure: S = B +/- A on aligned operands $b_n b_{n-1} \ldots b_0 . b_{-1} \ldots b_{-m}$ and $a_n a_{n-1} \ldots a_0 . a_{-1} \ldots a_{-m}$, with carry c, producing $s_n s_{n-1} \ldots s_0 . s_{-1} \ldots s_{-m}$.]
[Figure: multiplication. An n-bit operand with n-k integer bits and k fraction bits, multiplied by an m-bit operand with m-l integer bits and l fraction bits, gives a product with n+m-k-l integer bits and k+l fraction bits.]
For convenience of hardware implementation, we prefer the product of a multiplication to have the same length as the multiplicand or the multiplier (assuming they have the same length). To achieve this, we normally truncate the least significant bits of the product.
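A small simulation makes the bit-width rule and the truncation concrete. This sketch rests on several assumptions that are not in the slides: unsigned operands, 8-bit words with 4 integer and 4 fraction bits, and an output kept in the same 4.4 format by dropping the 4 least significant fraction bits and the 4 most significant integer bits. All names are hypothetical.

```verilog
// Illustrative sketch: unsigned 4.4 x 4.4 fixed-point multiplication,
// with the 8.8 product cut back to 4.4. Names and formats are assumptions.
module qmul_demo;
  reg  [7:0]  a, b;             // 4 integer bits, 4 fraction bits each
  wire [15:0] full = a * b;     // full product: 8 integer bits, 8 fraction bits
  wire [7:0]  q44  = full[11:4];// keep 4 integer bits and the 4 MS fraction bits

  initial begin
    a = 8'b0010_1000;           // 2.5
    b = 8'b0001_0100;           // 1.25
    #1;                         // let the combinational logic settle
    // 2.5 * 1.25 = 3.125 = 00000011.00100000 in 8.8 and 0011.0010 in 4.4
    if (full !== 16'b0000_0011_0010_0000 || q44 !== 8'b0011_0010)
      $display("FAIL: full=%b q44=%b", full, q44);
    else
      $display("ok:   full=%b q44=%b", full, q44);
  end
endmodule
```

Here the discarded bits are all zero, so the result is exact. In general the dropped fraction bits introduce a truncation error, and dropping integer bits is only safe when the product fits in the retained integer field.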
Signed adder in Verilog

`define WIDTH (WL1 > WL2 ? (WL1+1) : (WL2+1))

module Add #(parameter WL1 = 4,   // first input length
                       WL2 = 4)   // second input length
  (input  signed [WL1-1:0]        in1,   // adder inputs
   input  signed [WL2-1:0]        in2,
   input                          CLK,
   output reg signed [`WIDTH-1:0] out);  // adder output

  // Sign-extend both inputs to `WIDTH bits, then add
  always @(posedge CLK)
    out <= { { (WL2 > WL1 ? (WL2-WL1)+1 : 1){in1[WL1-1]} }, in1 }
         + { { (WL1 > WL2 ? (WL1-WL2)+1 : 1){in2[WL2-1]} }, in2 };
endmodule
Fixed-point multiplier
module Mul #(parameter WI1 = 4,    // first input, integer length
                       WF1 = 4,    // first input, fraction length
                       WI2 = 8,    // second input, integer length
                       WF2 = 8,    // second input, fraction length
                       WIO = 12,   // output format, integer length
                       WFO = 12)   // output format, fraction length
  (input  signed [WI1+WF1-1:0] in1,   // multiplier inputs
   input  signed [WI2+WF2-1:0] in2,
   input                       CLK,
   output signed [WIO+WFO-1:0] out);  // multiplier output

  reg signed [WI1+WF1+WI2+WF2-1:0] temp;  // full-precision product

  always @(posedge CLK)
    temp <= in1 * in2;

  // Keep WIO integer bits and the WFO most significant fraction bits.
  // The fraction slice is WFO bits wide, so the concatenation matches the
  // WIO+WFO output width. Assumes WIO <= WI1+WI2 and WFO <= WF1+WF2.
  assign out = { temp[WF1+WF2+WIO-1 : WF1+WF2],
                 temp[WF1+WF2-1     : WF1+WF2-WFO] };
endmodule
Modify the above code to take care of the sign bit
Vectors, arrays and memories

Verilog supports three similar data structures: vectors, arrays and memories.
Vectors are used to represent multi-bit buses.

reg  [7:0] MultiBitWord1;  // 8-bit reg vector with MSB=7, LSB=0; little endian
wire [0:7] MultiBitWord2;  // 8-bit wire vector with MSB=0, LSB=7; big endian
reg  [3:0] bitslice;
reg        a;              // single-bit vector, often referred to as a scalar
a        = MultiBitWord1[3];    // assigns bit 3 (the 4th bit) of MultiBitWord1 to a
bitslice = MultiBitWord1[3:0];  // assigns bits 3 down to 0 of MultiBitWord1 to bitslice
Arrays are used to hold several objects of the same type

integer i[3:0];      // integer array with a length of 4; four integer values
integer [3:0] i;     // What is this?
time    x[20:1];     // time array with a length of 20
reg     r[7:0];      // scalar reg array with a length of 8
c = r[3];            // the reg value at index 3 of array r is assigned to c
integer count[1:5];  // 5 integers
reg     var[-15:16]; // 32 1-bit regs
Memories
Memories are arrays of vectors defined using reg variables
reg [msb:lsb] memory1 [upper:lower];
reg [7:0] myMem [3:0];  // defines a memory with 4 locations;
                        // each location contains 8 bits of data
[Figure: memory map with one row per location, myMem[0] to myMem[3], and bit columns 7 down to 0.]

reg [7:0]   ram[0:4095];     // 4096 memory cells that are 8 bits wide
reg         r[0:4];          // an array of 5 1-bit registers
reg [7:0]   memdata[0:255];  // 256 8-bit registers
reg [8*6:1] strings[1:10];   // 10 6-byte strings
reg         membits[1023:0]; // 1024 1-bit registers

A memory element (a word, or a row of the memory) can be accessed by a memory index as mem[index], similar to a bit-select:

reg  [7:0] array1 [0:255];
wire [7:0] out1 = array1[address];
ROMs using BlockRAM resources

In some device families, either the address or the output has to be registered for the code to be inferred as a ROM.

// ROM with registered output
module romSync #(parameter addWidth = 3, dataWidth = 4)
  (input                      clk, en,
   input      [addWidth-1:0]  addr,
   output reg [dataWidth-1:0] data);

  always @(posedge clk) begin
    if (en)
      case (addr)
        3'b000:  data <= 4'b0010;
        3'b001:  data <= 4'b0010;
        3'b010:  data <= 4'b1110;
        3'b011:  data <= 4'b0010;
        3'b100:  data <= 4'b0100;
        3'b101:  data <= 4'b1010;
        3'b110:  data <= 4'b1100;
        3'b111:  data <= 4'b0000;
        default: data <= 4'bXXXX;
      endcase
  end
endmodule
XST can use block RAM resources to implement ROMs with synchronous outputs or address inputs
Dual-port RAM
module dual_port_ram_single_clock
  #(parameter DATA_WIDTH = 8,
              ADDR_WIDTH = 6)
  (input      [DATA_WIDTH-1:0] dina, dinb,   // dinb is unused: port B is read-only here
   input      [ADDR_WIDTH-1:0] addra, addrb,
   input                       WEA, CLK,
   output reg [DATA_WIDTH-1:0] douta, doutb);

  reg [DATA_WIDTH-1:0] ram [2**ADDR_WIDTH-1:0];  // declare the RAM variable

  always @(posedge CLK) begin  // Port A: write when WEA is set, read every cycle
    if (WEA)
      ram[addra] <= dina;
    douta <= ram[addra];       // read-first: returns the old contents during a write
  end

  always @(posedge CLK) begin  // Port B: read only
    doutb <= ram[addrb];
  end
endmodule
Initializing BlockRAMs
You can use an initial block to initialize the contents of an inferred memory.
Initialization is only valid for block RAM resources. If you attempt to initialize distributed RAM, XST ignores the initialization and issues a warning message.
Block RAM initial contents can be specified by initializing the signal that describes the memory array in your HDL code.
You can do this directly in your HDL code, or you can specify a file containing the initialization data.
You can initialize block RAMs and ROMs using the $readmemh and $readmemb system tasks.
The initialization works identically in synthesis and simulation.

reg [7:0] ram [0:15];
initial begin
  $readmemb("ram.txt", ram);
end
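For comparison with the file-based form, the memory can also be initialized directly in the HDL code with a loop inside the initial block. The following is a sketch only: the module name `rom_init_demo`, its ports, and the contents written (three times the address) are hypothetical, and whether a given synthesis tool accepts a loop in an initial block should be checked against its documentation.

```verilog
// Illustrative sketch: initializing an inferred memory directly in HDL.
// Names and contents are hypothetical.
module rom_init_demo
  (input            clk,
   input      [3:0] addr,
   output reg [7:0] data);

  reg [7:0] ram [0:15];
  integer i;

  initial begin
    for (i = 0; i < 16; i = i + 1)
      ram[i] = i * 3;         // placeholder contents: 0, 3, 6, ..., 45
  end

  always @(posedge clk)       // registered (synchronous) read
    data <= ram[addr];
endmodule
```

With $readmemb, the same contents would instead come from a plain text file such as the "ram.txt" named above, holding one binary word per line.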
Levels of representations
High-level language program (programmer's view):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
Compiler
Assembly language program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
Assembler
Machine language program (computer's view):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
Machine interpretation
Control signal specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK
Processor execution cycle

Instruction (and operands) fetch: obtain the instruction from program storage.
Instruction decode: determine the required actions.
Execute: compute the result value or status.
Result store: deposit results in a memory for later use.
Calculate the next instruction address.
The sequence of instruction execution is controlled by a program counter register.
The next instruction to be executed is typically implied, as instructions are executed sequentially: Instruction 1, Instruction 2, Instruction 3, and so on.

Unified or separate instruction and data memories?

Princeton (von Neumann) architecture:
  Data and instructions are mixed in the same unified memory
  Single memory interface
Harvard architecture:
  Data and instructions are in separate memories
  Has advantages in certain high-performance implementations
  Can optimize each memory
Register-to-register load-store architecture

No memory addresses in the ALU operations.
Typically three-operand ALU operations.
Most RISC (reduced instruction set computer) architectures use this scheme.
Memory-to-memory architecture
All ALU operands come from memory.
Example: VAX machines.
Instruction set design

What data types are supported? Signed integer.
What operations (and how many) should be provided? ADD, MUL, SUB, NEG, JMP (more will be discussed later).
How many explicit operands? Up to three (e.g., ADD C, A, B; NEG Z, B;).
How are operands and results addressed? We use a memory-to-memory architecture. Example: ADD C, A, B; means C <- A + B. In this mode we use direct memory addresses for A, B, and C.
What is the instruction format?
  | opcode | opd | ops1 | ops2 | NOT USED NOW |
where opcode indicates the operation of the instruction, and opd, ops1, and ops2 are the destination and source operand specifiers, respectively.
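Splitting an instruction word into these fields is a matter of bit slicing. The slides do not give the field widths, so the widths in this sketch are assumptions made purely for illustration: an 18-bit instruction word, a 3-bit opcode, and 4-bit operand fields, which leaves 3 unused low-order bits. The module name `decode` and the test instruction are likewise hypothetical.

```verilog
// Illustrative sketch: slicing an instruction word into its fields.
// IW, OPW and AW are assumed widths, not values from the lab handout.
module decode #(parameter IW = 18, OPW = 3, AW = 4)
  (input  [IW-1:0]  instr,
   output [OPW-1:0] opcode,
   output [AW-1:0]  opd, ops1, ops2);
  assign opcode = instr[IW-1          -: OPW];  // top OPW bits
  assign opd    = instr[IW-OPW-1      -: AW];   // destination specifier
  assign ops1   = instr[IW-OPW-AW-1   -: AW];   // first source specifier
  assign ops2   = instr[IW-OPW-2*AW-1 -: AW];   // second source specifier
  // The remaining low-order bits are the "not used now" field.
endmodule

// Self-checking testbench with the default (assumed) widths.
module decode_tb;
  reg  [17:0] instr;
  wire [2:0]  opcode;
  wire [3:0]  opd, ops1, ops2;

  decode dut (.instr(instr), .opcode(opcode), .opd(opd), .ops1(ops1), .ops2(ops2));

  initial begin
    instr = 18'b010_0011_0001_0010_000;  // opcode=2, opd=3, ops1=1, ops2=2, unused=0
    #1;
    if (opcode !== 3'd2 || opd !== 4'd3 || ops1 !== 4'd1 || ops2 !== 4'd2)
      $display("FAIL: opcode=%b opd=%b ops1=%b ops2=%b", opcode, opd, ops1, ops2);
    else
      $display("ok:   opcode=%b opd=%b ops1=%b ops2=%b", opcode, opd, ops1, ops2);
  end
endmodule
```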
How to design a single cycle processor?

1. Analyze the instruction set and select a set of datapath components. The datapath must support each instruction.
2. Implement the datapath components and establish the clocking methodology.
3. Assemble the datapath to meet the instruction set architecture (ISA) requirements.
4. Analyze the implementation of each instruction to determine the set of control signals that effects the register transfer level (RTL) implementation.
5. Design the control logic and connect it to the datapath.
Start!
Assume that our processor supports three instructions: ADD, SUB, and MUL.
These arithmetic operations are implemented using the ALU that you designed.
The addition (subtraction) is performed in two's complement format with overflow detection.
No carry output is required.
Let's implement (A - B) x (C + D) x (A + D).
Instruction fetch unit

The instruction fetch unit performs two RTL operations:
  Fetch the current instruction from the instruction memory addressed by the program counter (PC): mem[PC]
  Update the PC
Since we do not support jump or branch instructions yet, the program counter is updated by PC <- PC + 1.
[Figure: the PC, clocked by CLK and updated by the next-address logic, supplies the address to the instruction memory, which outputs the instruction word.]
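The two RTL operations of the fetch unit can be sketched as below. This is a minimal sketch under assumptions: the module name `fetch`, the synchronous reset `rst`, and the asynchronous (combinational) memory read are choices made for this example. The default sizes (1024 words of 18 bits) follow the memory defaults given later in these slides. The lab's provided memory module should be used in the actual design.

```verilog
// Illustrative sketch of an instruction fetch unit. Names and the reset
// input are hypothetical.
module fetch #(parameter AW = 10,   // address width: 2^10 = 1024 instruction words
                         IW = 18)   // instruction word width
  (input               clk,
   input               rst,         // synchronous reset (assumed for this sketch)
   output reg [AW-1:0] pc,
   output     [IW-1:0] instr);

  reg [IW-1:0] imem [0:(1<<AW)-1];  // instruction memory

  assign instr = imem[pc];          // fetch: mem[PC]

  always @(posedge clk)             // next-address logic: PC <- PC + 1
    if (rst) pc <= {AW{1'b0}};
    else     pc <= pc + 1'b1;
endmodule
```

In simulation the instruction memory would still need contents, for example loaded with $readmemb, before `instr` carries meaningful values.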
An abstract view of the processor

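[Figure: block diagram of the processor. Blocks: PC (din, WE, CLK), Next Address Logic, Instruction Memory (Instruction Address in, Instruction out), Controller (Control Signals out, Conditions in), and Datapath containing the Data Memory (WEA, CLK, douta, doutb) and the ALU (din1, din2, ctrl, dout, Result). The opd, ops1 and ops2 fields of the instruction are supplied to the datapath.]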
Memory modules
Use the provided parametric memory module (the numbers of rows and columns are parameters).
Use 1024 as the default value for the number of rows and 18 as the default number of columns.
Use one memory block for the data memory and one memory block for the instructions.
The operands of the instructions are stored in the data memory.
We will discuss the controller design in the next lab.