
IMPLEMENTATION OF DIRECT MAPPED CACHE IN BEHAVIORAL VERILOG

ECE254 ASSIGNMENT 1 NEERAJ DHOTRE (perm:5483615)

1. Introduction: A memory hierarchy is essential in today's highly pipelined and superscalar architectures. Because main memory access is much slower than the other tasks in the pipeline, data and instructions are kept close to the processor in a small and comparatively fast memory called a cache. The main aspects of cache design are cache size, memory mapping function, write policy and replacement algorithm. In a direct mapped cache each block of memory is mapped to one particular line in the cache. The mapping function is simple to implement, but the performance of this type of cache is not the best. Synchronizing the cache with main memory and handling read misses, write misses and so on make the direct mapped cache a good candidate for this assignment, whose aim is to learn Verilog modeling and design simulation with ModelSim.

2. Cache Design:

2.1 Assumptions: The cache is designed with the following assumptions.
The cache lies between the processor and the main memory.
The cache and the processor run on the same fast clock.
Main memory is a single port synchronous DRAM running on a slower clock (4 times slower).
The processor sends a physical address to the cache.
The processor sends/requests one word of data (32 bits wide) at a time.
The cache implements a write-through write policy with no write allocate, i.e. on a write miss data is written only to main memory.

2.2 Cache and Main Memory Size: To keep the cache size, memory size, data width etc. flexible for any instance of the cache module, parameters are used. Parameters define constants which can be changed during instantiation. The main parameters are listed below with the values used for the simulations in this assignment.
parameter ADDR_SIZE = 8; meaning an 8 bit address and 2^8 = 256 main memory locations.
parameter DATA_SIZE = 32; meaning the processor is 32 bit and each main memory location is 32 bit, making it a 256 x 4 byte or 1024B memory.
parameter LINE_BITS = 2; meaning 4 lines in the cache.
parameter LINES = 1 << LINE_BITS;
parameter WORD_BITS = 2; meaning 4 words per line in the cache, making it a 64B memory.
parameter WORDS = 1 << WORD_BITS;
With these sizes, the address is broken up as shown in Figure 1 for direct mapping in the cache.

Memory Address
[ TAG (4 bits) | LINE (2 bits) | WORD (2 bits) ]

Figure 1. Address break up for direct mapping.
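The address break up in Figure 1 can be sketched as three part-selects on the address. This is only an illustration under the parameter values listed in section 2.2; the module names addr_split and addr_split_tb are hypothetical and do not appear in the appendices.

```verilog
// Sketch only: addr_split and addr_split_tb are hypothetical names,
// not modules from the report. Defaults match section 2.2.
module addr_split #(
    parameter ADDR_SIZE = 8,
    parameter LINE_BITS = 2,
    parameter WORD_BITS = 2,
    parameter TAG_SIZE  = ADDR_SIZE - LINE_BITS - WORD_BITS
) (
    input  wire [ADDR_SIZE-1:0] addr,
    output wire [TAG_SIZE-1:0]  tag,
    output wire [LINE_BITS-1:0] line,
    output wire [WORD_BITS-1:0] word
);
    // Lowest bits select the word inside a line.
    assign word = addr[WORD_BITS-1:0];
    // Middle bits select the cache line.
    assign line = addr[WORD_BITS+LINE_BITS-1:WORD_BITS];
    // Remaining upper bits form the tag.
    assign tag  = addr[ADDR_SIZE-1:WORD_BITS+LINE_BITS];
endmodule

module addr_split_tb;
    reg  [7:0] addr;
    wire [3:0] tag;
    wire [1:0] line;
    wire [1:0] word;

    addr_split dut (.addr(addr), .tag(tag), .line(line), .word(word));

    initial begin
        addr = 8'd120;  // 8'b0111_1000
        #1 $display("addr=%0d tag=%b line=%0d word=%0d", addr, tag, line, word);
        // prints: addr=120 tag=0111 line=2 word=0
        addr = 8'd122;  // 8'b0111_1010
        #1 $display("addr=%0d tag=%b line=%0d word=%0d", addr, tag, line, word);
        // prints: addr=122 tag=0111 line=2 word=2
        $finish;
    end
endmodule
```

Addresses 120 to 123 share the same tag and line, which is why they end up in the same cache line in the test cases of section 3.2.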

2.3 Block Diagram: The block diagram is shown in Figure 2. Behavioral models are written for the cache and main memory blocks. The signals from the processor are supplied as stimulus in the test bench. Table 1 lists the ports of the cache.

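[Block diagram: Processor, Cache (cache memory lines, each holding a tag and words 1 to 4) and Main Memory. Processor-side signals: addr, data, rd_en, data_valid, busy. Memory-side signals: mem_addr, mem_data, mem_wr, chip_select, rd_done, wr_done. Common inputs to the cache: clk, reset.]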

Figure 2. Block Diagram showing signal connections.

Port             Direction      Description
clk              input          Common clock between cache and processor
reset            input          Synchronous reset to the cache
rd_en            input          HIGH for read from cache, LOW for write to cache
data[31:0]       bidirectional  Data from/to processor; direction determined by rd_en
addr[7:0]        input          Address from the processor
data_valid       output         Active-high signal indicating that output data to the processor is valid
busy             output         Active-high signal indicating that the cache is busy; the processor will not send another request while the cache is busy
mem_addr[7:0]    output         Address bus to main memory
mem_wr           output         HIGH for write to main memory, LOW for read from main memory (as in the Appendix A and B code)
chip_select      output         Signal to enable main memory access
mem_data[31:0]   bidirectional  Data from/to main memory
rd_done          input          Signal from main memory that the requested read operation is done
wr_done          input          Signal from main memory that the requested write operation is done

Table 1. Ports of cache with direction and descriptions.
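Table 1 describes data as a bidirectional bus whose direction is determined by rd_en. A minimal sketch of how such a shared bus can be modeled with tri-state drivers is given here. The names bidir_demo, cache_out and cpu_out are illustrative stand-ins, not signals from the report, although the two assign statements follow the same pattern as the assigns on data in Appendices A and C.

```verilog
// Sketch only: bidir_demo, cache_out and cpu_out are hypothetical names.
module bidir_demo;
    reg         rd_en;
    reg  [31:0] cache_out;  // stand-in for the cache's registered output
    reg  [31:0] cpu_out;    // stand-in for the processor's write data
    wire [31:0] data;       // the shared bidirectional bus

    // Cache side drives the bus only on reads, otherwise releases it.
    assign data = rd_en  ? cache_out : 32'bz;
    // Processor side drives the bus only on writes, otherwise releases it.
    assign data = !rd_en ? cpu_out   : 32'bz;

    initial begin
        cache_out = 32'd58;
        cpu_out   = 32'd60;
        rd_en = 1'b1;
        #1 $display("read:  data=%0d", data);  // cache drives: prints 58
        rd_en = 1'b0;
        #1 $display("write: data=%0d", data);  // processor drives: prints 60
        $finish;
    end
endmodule
```

Because exactly one driver is active for each value of rd_en, the bus never sees two conflicting values.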

Register              Size    Description
cache_hit_reg         1 bit   Indicates a tag match, meaning the requested address is present in the cache
line                  2 bit   To store the line index to the cache from the input address
tag                   4 bit   To store the location tag from the input address
count                 2 bit   To keep track of the number of main memory reads in case of a read miss
data_out              32 bit  Registered data out before driving it onto the bidir data bus to the processor
mem_data_out          32 bit  Registered data out before driving it onto the bidir data bus to memory
mem_data_reg0 to 3    32 bit  4 registers to store data words read in from main memory

Table 2. Internal registers used in the behavioral model

3. Verilog Implementation

3.1 Verilog code: The Verilog code for the cache is given in Appendix A. The design is implemented in 6 always blocks which execute concurrently. There are 2 combinational blocks and 4 sequential blocks. These blocks perform the following logical tasks and together model the cache behavior.

3.1.1 Combinational Blocks:
I. Tag comparison: This block continuously checks whether the tag stored in the cache line selected by the input address matches the tag field of that address. It sets cache_hit_reg if there is a tag match, irrespective of whether the operation is a read or a write.
II. Memory select: This block controls the enabling of main memory. Main memory needs to be enabled only when data has to be transferred to or from it. This gives better control over the rd_done and wr_done signals given out by main memory.

3.1.2 Sequential Blocks:
I. Cache Hit: This block executes, and does the required data manipulation, only if the tag comparison is successful.
II. Cache Miss: This block executes, and does the required data manipulation, only if the tag comparison is unsuccessful.
III. Data Synchronizing from Memory: There are two blocks, one running on posedge clk and the other on posedge rd_done. These are required to synchronize the reads from memory in case of a read miss, as the cache and memory run on different asynchronous clocks.

3.2 Test bench: The test bench code is given in Appendix C. The test bench runs 4 test cases to test the functionality of the direct mapped cache. The clk signal is given a period of 10 ns and mem_clk a period of 40 ns.
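The two clock periods quoted for the test bench can be checked with a small stand-alone sketch. It assumes a time unit of 1 ns (the report does not state a timescale), and clock_ratio_tb and the edge counters are hypothetical names used only for this illustration.

```verilog
// Sketch only: assumes 1 ns time units; clock_ratio_tb is a hypothetical name.
`timescale 1ns/1ns
module clock_ratio_tb;
    reg clk;
    reg mem_clk;
    integer clk_edges;
    integer mem_clk_edges;

    initial begin
        clk = 0;
        mem_clk = 0;
        clk_edges = 0;
        mem_clk_edges = 0;
    end

    // Same generators as in the test bench: 10 ns and 40 ns periods.
    always #5  clk = !clk;
    always #20 mem_clk = !mem_clk;

    // Count rising edges of each clock.
    always @(posedge clk)     clk_edges     = clk_edges + 1;
    always @(posedge mem_clk) mem_clk_edges = mem_clk_edges + 1;

    initial begin
        #400;
        $display("clk edges = %0d, mem_clk edges = %0d", clk_edges, mem_clk_edges);
        $finish;
    end
endmodule
```

Over the same interval clk should show four times as many rising edges as mem_clk, which matches the assumption in section 2.1 that main memory runs 4 times slower than the cache.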

1) Write Miss: Initially there is nothing in the cache or memory. The processor issues 4 writes to consecutive memory locations, all of which result in a cache write miss. The data is written only to main memory. As seen in the waveform in Figure 3, data 56, 57, 58 and 59 were written to memory locations 120, 121, 122 and 123 respectively. The cache_hit_reg signal stayed low throughout, indicating a cache miss, and proper busy pulses were given to the processor for every write.

Figure 3. Wave forms showing Cache write miss test case.

2) Read Miss: Now the test bench requests the data written in the previous step. This results in a read miss and cache brings the data from main memory. In this case as the memory has only one word at each location, cache has to do 4 reads to get a block of data and replace a line. As seen in the waveform in figure 4 the processor requests data at location 120 resulting in a read miss. This triggers 4 reads from main Memory. Required data is given to processor with data_valid and the cache line 2 is written with 4 words (56,57,58,59).

Figure 4. Wave forms showing Cache read miss test case
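The four main memory reads triggered by a read miss cover every word of the block that contains the requested address. A minimal sketch of that address calculation is given here, assuming the parameter values of section 2.2. The names block_fill_tb and fetch_addr are hypothetical, and the loop visits the words in ascending order, whereas the Appendix A code starts at the requested word and steps the low two bits of mem_addr with wrap-around.

```verilog
// Sketch only: block_fill_tb and fetch_addr are hypothetical names.
module block_fill_tb;
    parameter ADDR_SIZE = 8;
    parameter WORD_BITS = 2;
    parameter WORDS     = 1 << WORD_BITS;

    reg [ADDR_SIZE-1:0] addr;
    reg [ADDR_SIZE-1:0] fetch_addr;
    integer i;

    initial begin
        // 122 is used here (instead of 120) so that the alignment to the
        // start of the block is visible in the output.
        addr = 8'd122;
        for (i = 0; i < WORDS; i = i + 1) begin
            // Keep the tag and line bits, replace the word offset with i.
            fetch_addr = {addr[ADDR_SIZE-1:WORD_BITS], i[WORD_BITS-1:0]};
            $display("read %0d: mem_addr = %0d", i, fetch_addr);
        end
        // prints mem_addr = 120, 121, 122, 123
        $finish;
    end
endmodule
```

These are the same four locations that were written in the write miss test case, so the line filled on the read miss holds 56, 57, 58 and 59.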

3) Read Hit: The processor again requests data from the same block. This time it is a cache hit, as the data was brought into the cache in the previous step. The data requested was at location 122 and, as seen in Figure 5, data 58 was correctly returned.

4) Write Hit: Now the processor writes a word to the cache at the same address it read from in the last step. This results in a cache hit and the data is written properly. The data 60 is requested to be written at location 122. As seen in the waveform in Figure 5, 60 is correctly written to the cache. In keeping with the write-through policy, this data is written to main memory too.

Figure 5. Wave forms showing Cache read and write hit test case

4. APPENDIX A

Cache Verilog code.

/*######################################################################
  ------------------ SIMPLE DIRECT MAPPED CACHE ------------------
  Input address is broken like this
  [----TAG----|-----LINE------|---WORD----]
  TAG  --> cache tag
  LINE --> index for the line in cache. 2^line = number of lines in cache
  WORD --> bits to address word in cache line.
           2^word = number of data words in cache line.
######################################################################*/
module cache (
    clk,          //clock. same as cpu clock.
    addr,         //address from cpu.
    rd_en,        //HIGH for read from cache. LOW for write to cache.
    data,         //bidir data from/to CPU
    mem_addr,     //address to main Memory.
    mem_wr,       //HIGH for write to Memory. LOW for read from Memory
    mem_data,     //bidir data to/from main Memory
    rd_done,      //read done signal from main Memory
    wr_done,      //write done signal from main Memory
    data_valid,   //Signal telling CPU data is valid to read.
    busy,         //telling CPU cache busy when read miss.
    reset,        //reset to cache.
    chip_select   //select signal to enable memory.
);

parameter ADDR_SIZE   = 8;
parameter LINE_BITS   = 2;
parameter LINES       = 1 << LINE_BITS;
parameter WORD_BITS   = 2;
parameter WORDS       = 1 << WORD_BITS;
parameter DATA_SIZE   = 32;
parameter TAG_SIZE    = ADDR_SIZE - LINE_BITS - WORD_BITS;
parameter LINE_WIDTH  = DATA_SIZE * 4 + TAG_SIZE;
parameter TAG_INDEX_1 = ADDR_SIZE - TAG_SIZE;
parameter TAG_INDEX_2 = LINE_WIDTH - TAG_SIZE;

input                   clk;
input                   reset;
input  [ADDR_SIZE-1:0]  addr;
input                   rd_en;
input                   wr_done;
input                   rd_done;
inout  [DATA_SIZE-1:0]  data;
inout  [DATA_SIZE-1:0]  mem_data;
output                  chip_select;
output [ADDR_SIZE-1:0]  mem_addr;
output                  mem_wr;
output                  data_valid;
output                  busy;

reg                     chip_select;
reg                     data_valid;
reg                     busy;
reg    [1:0]            count;
reg    [ADDR_SIZE-1:0]  mem_addr;
reg                     mem_wr;
reg    [DATA_SIZE-1:0]  data_out;
reg    [DATA_SIZE-1:0]  mem_data_out;
reg    [DATA_SIZE-1:0]  mem_data_reg_0;
reg    [DATA_SIZE-1:0]  mem_data_reg_1;
reg    [DATA_SIZE-1:0]  mem_data_reg_2;
reg    [DATA_SIZE-1:0]  mem_data_reg_3;
reg                     cache_hit_reg;
reg    [LINE_WIDTH-1:0] memory [LINES-1:0];
reg    [LINE_WIDTH-1:0] line;
reg    [LINE_WIDTH-1:0] tag;
wire   [LINE_BITS-1:0]  line_index;

assign line_index = addr[(WORD_BITS+LINE_BITS)-1:WORD_BITS];
assign data = (rd_en) ? data_out : {DATA_SIZE{1'bz}};
assign #5 mem_data = (!rd_en && chip_select) ? mem_data_out : {DATA_SIZE{1'bz}};

//memory select.
always @ (rd_done or wr_done or count)
begin
    if (rd_done || wr_done && count == 2'b00) begin
        chip_select = 1'b0;
        busy = 1'b0;
    end
    else if (!rd_done && !wr_done && (count != 2'b00)) begin
        chip_select = 1'b1;
        busy = 1'b1;
    end
end

//storing data read from main memory.
always @ (posedge rd_done)
begin
    if (count == 2'b11) begin
        count <= 2'b00;
    end
    else
        count <= count + 1;

    case (mem_addr[1:0])
        2'b00: begin
            mem_data_reg_0 <= mem_data;
            mem_addr[1:0] <= 2'b01;
        end
        2'b01: begin
            mem_data_reg_1 <= mem_data;
            mem_addr[1:0] <= 2'b10;
        end
        2'b10: begin
            mem_data_reg_2 <= mem_data;
            mem_addr[1:0] <= 2'b11;
        end
        2'b11: begin
            mem_data_reg_3 <= mem_data;
            mem_addr[1:0] <= 2'b00;
        end
        default: mem_data_reg_0 = mem_data;
    endcase
end

//BLOCK to handle data in cache miss
always @ (posedge clk)
begin
    //if read miss get data from main Memory.
    if (rd_en && !cache_hit_reg && count == 2'b01) begin
        if (rd_done) begin
            data_out <= mem_data;
            data_valid <= 1'b1;
        end
        else
            data_valid <= 1'b0;
    end

    //writing four words read from main memory.
    if (rd_en && !cache_hit_reg && (count == 2'b00)) begin
        memory[line_index] <= { addr[ADDR_SIZE-1:TAG_INDEX_1],
                                mem_data_reg_0, mem_data_reg_1,
                                mem_data_reg_2, mem_data_reg_3 };
        busy <= 1'b0;
    end

    //if write miss, write data to memory. After data written tell cpu
    //that CACHE not busy
    if (wr_done && !rd_en && !cache_hit_reg) begin
        //busy <= 1'b0;
        mem_wr <= 1'b0;
        chip_select <= 1'b0;
    end
end

// BLOCK FOR TAG MATCH. CACHE HIT
always @ (posedge clk or reset)
begin
    if (reset) begin
        line <= {LINE_WIDTH{1'b0}};
        tag <= {LINE_WIDTH{1'b0}};
        busy <= 1'b0;
        count <= 1'b0;
        chip_select <= 1'b0;
    end
    else begin
        //check tag every cycle
        tag <= memory[line_index];

        //read request cache hit. return data.
        if (rd_en && cache_hit_reg) begin
            line <= memory[line_index];
            case (addr[1:0])
                0: data_out <= line[LINE_WIDTH-1:(DATA_SIZE*3)];
                1: data_out <= line[(DATA_SIZE*3)-1:(DATA_SIZE*2)];
                2: data_out <= line[(DATA_SIZE*2)-1:DATA_SIZE];
                3: data_out <= line[DATA_SIZE-1:0];
                default: data_out <= {DATA_SIZE{1'b0}};
            endcase
            data_valid <= 1'b1;
        end
        //write request cache hit write data to cache and memory
        else if (!rd_en && cache_hit_reg) begin
            line <= memory[line_index];
            case (addr[1:0])
                0: line[LINE_WIDTH-1:(DATA_SIZE*3)] <= data;
                1: line[(DATA_SIZE*3)-1:(DATA_SIZE*2)] <= data;
                2: line[(DATA_SIZE*2)-1:DATA_SIZE] <= data;
                3: line[DATA_SIZE-1:0] <= data;
                default: line <= {DATA_SIZE{1'b0}};
            endcase
            memory[line_index] <= {addr[ADDR_SIZE-1:TAG_INDEX_1], line};
            mem_addr <= addr;
            mem_wr <= 1'b1;
            chip_select <= 1'b1;
            mem_data_out <= data;
        end
    end
end

// BLOCK FOR TAG MISMATCH. CACHE MISS
always @ (posedge clk)
begin
    //Read miss, read from main memory. Send data to CPU later.
    if (!rd_done && rd_en && !cache_hit_reg && !busy && count == 2'b00) begin
        chip_select <= 1'b1;
        mem_addr <= addr;
        mem_wr <= 1'b0;
        busy <= 1'b1;
    end

    //Write miss, write to main memory.
    if (!wr_done && !rd_en && !cache_hit_reg && !busy) begin
        chip_select <= 1'b1;
        mem_addr <= addr;
        mem_wr <= 1'b1;
        mem_data_out <= data;
        busy <= 1'b1;
    end
end

//tag comparison.
always @ (addr or tag or count or rd_done or wr_done)
begin
    if ((addr[ADDR_SIZE-1:TAG_INDEX_1] === tag[LINE_WIDTH-1:TAG_INDEX_2])
        && count == 2'b00 && !busy && !rd_done && !wr_done) begin
        cache_hit_reg = 1'b1;
    end
    else begin
        cache_hit_reg = 1'b0;
    end
end

endmodule

5. APPENDIX B

Memory Verilog code.

//single port memory.
//simple behavioral description.
module dram (
    clk,
    addr,
    data,
    wr_en,        //high for write, low for read
    rd_done,
    wr_done,
    chip_select
);
//busy signal?

parameter ADDR_SIZE = 8;
parameter DATA_SIZE = 32;
parameter MEM_SIZE  = 1 << ADDR_SIZE;

input                   clk;
input                   wr_en;
input  [ADDR_SIZE-1:0]  addr;
input                   chip_select;
inout  [DATA_SIZE-1:0]  data;
output                  rd_done;
output                  wr_done;

wire   [1:0]            word_loc;
wire   [ADDR_SIZE-1:2]  line_index;
reg    [DATA_SIZE-1:0]  data_out;
reg    [DATA_SIZE-1:0]  memory [MEM_SIZE-1:0];
reg                     rd_done;
reg                     wr_done;

assign data = (!wr_en && rd_done) ? data_out : {DATA_SIZE{1'bz}};

always @ (posedge clk)
begin
    if (wr_en && chip_select) begin
        memory[addr] <= data;
        wr_done <= 1'b1;
    end
    else
        wr_done <= 1'b0;
end

always @ (posedge clk)
begin
    if (!wr_en && chip_select) begin
        data_out <= memory[addr];
        rd_done <= 1'b1;
    end
    else
        rd_done <= 1'b0;
end

endmodule

6. APPENDIX C

Test bench

`include "memory.v"
`include "cache.v"

module cache_tb;

parameter ADDR_SIZE = 8;
parameter DATA_SIZE = 32;

reg                   clk;
reg                   mem_clk;
reg                   reset;
reg  [ADDR_SIZE-1:0]  addr;
reg                   rd_en;
reg  [DATA_SIZE-1:0]  data_in;
wire                  chip_select;
wire                  rd_done;
wire                  wr_done;
wire [DATA_SIZE-1:0]  data;
wire [DATA_SIZE-1:0]  mem_data;
wire [ADDR_SIZE-1:0]  mem_addr;
wire                  mem_wr;
wire                  data_valid;
wire                  busy;

initial begin
    clk = 0;
    mem_clk = 0;
    reset = 1;
    #20 reset = 0;

    //WRITE MISS
    //write 4 times to addresses 120 to 123.
    addr = 120;
    #2 rd_en = 0;
    data_in = 56;
    #60 addr = 121;
    data_in = 57;
    #60 addr = 122;
    data_in = 58;
    #60 addr = 123;
    data_in = 59;

    //READ MISS
    #175 rd_en = 1'b1;
    //#25 rd_en = 1'b1;
    addr = 120;

    //READ HIT
    #280 addr = 122;

    //WRITE HIT
    #50 rd_en = 1'b0;
    data_in = 60;
    #400 rd_en = 1'bz;
    //$readmemh("dram.list",
end

assign data = (!rd_en) ? data_in : {DATA_SIZE{1'bz}};

always #5  clk = !clk;
always #20 mem_clk = !mem_clk;

cache direct_mapped(
    .clk(clk),
    .reset(reset),
    .addr(addr),
    .rd_en(rd_en),
    .data(data),
    .mem_addr(mem_addr),
    .mem_wr(mem_wr),
    .mem_data(mem_data),
    .rd_done(rd_done),
    .wr_done(wr_done),
    .data_valid(data_valid),
    .busy(busy),
    .chip_select(chip_select)
);

dram main_memory(
    .clk(mem_clk),
    .addr(mem_addr),
    .data(mem_data),
    .wr_en(mem_wr),
    .rd_done(rd_done),
    .wr_done(wr_done),
    .chip_select(chip_select)
);

//initial
//#100 $finish;

endmodule

7. REFERENCES

http://www.faculty.iu-bremen.de/birk/lectures/PC101-2003/07cache/cache%20memory.htm
http://www.asic-world.com/verilog/index.html
