Wyland, Vice President of Engineering
Quality Semiconductor, Inc.
851 Martin Avenue
Santa Clara, CA 95050-2903
Tel: (408) 986-8326  Fax: (408) 496-0591

ABSTRACT

Burst mode memories improve cache design by improving refill time on cache misses. Burst mode RAMs allow refill of a four word cache line in five clock cycles at 50 MHz rather than the eight clock cycles that would be required for a conventional SRAM. Burst mode RAMs also have clock synchronous interfaces, which make them easier to design into systems, particularly at clock rates of 25 MHz and above.

Figure 2: Burst RAM Read Timing
A burst mode RAM provides high speed transfer of a block of sequential words, called a burst. A block diagram of a burst mode SRAM is shown in Figure 1. A burst mode RAM consists of a conventional SRAM plus an address counter, a read/write flip flop and a write register. Read and write timing is controlled by a clock in combination with the address counter load and read/write signals. In this configuration, random access to a word in the SRAM requires two clock cycles, with successive words being read or written at one clock cycle per word. This is shown in the timing diagrams of Figures 2 and 3.

Figure 1: Burst RAM Block Diagram
For write operations, the first word of data to be written is clocked into the write register at the same time the address counter and the read/write flip flop are loaded, as shown in Figure 3. Data from the write register is written into the SRAM during the second clock cycle. At the end of the second clock cycle, new data is clocked into the write register and the address counter is incremented to the next location to write the next sequential word.

Figure 3: Burst RAM Write Timing
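The write pipeline described above can be sketched as a behavioral model in Python. This is an illustrative sketch only, not the actual QS8811 interface; the class and method names are ours. It shows how the write register delays each data word by one clock, so the SRAM write of word N overlaps the arrival of word N+1.

```python
# Behavioral sketch (hypothetical names, not a real part interface) of the
# burst write path: an address counter, a one-word write register, and the
# SRAM array. The write register buffers each word for one clock cycle.

class BurstWritePath:
    def __init__(self, depth=16):
        self.sram = [None] * depth
        self.addr = 0            # address counter
        self.write_reg = None    # word to be written on the next clock

    def load(self, addr, data):
        """Cycle 1: load the address counter and clock in the first word."""
        self.addr = addr
        self.write_reg = data

    def clock(self, next_data=None):
        """Later cycles: write the buffered word, increment, buffer the next."""
        self.sram[self.addr] = self.write_reg
        self.addr += 1
        self.write_reg = next_data

ram = BurstWritePath()
ram.load(4, "w0")                 # cycle 1: address + first word captured
for word in ("w1", "w2", "w3"):   # cycles 2-4: one word written per clock
    ram.clock(word)
ram.clock()                       # cycle 5: last buffered word written
print(ram.sram[4:8])              # -> ['w0', 'w1', 'w2', 'w3']
```

Note that the four-word burst completes in five clocks (one load cycle plus four write cycles), matching the figure quoted in the abstract.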
In the read timing diagram of Figure 2, the first clock cycle is used to load the address counter and the read/write flip flop for random access to the first word. Read data comes out of the SRAM before the end of the second clock cycle. The address counter is incremented at the end of the second clock cycle, and the next word is read from the SRAM. This allows one clock cycle per successive word read following the initial random access.
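The cycle counts implied by this timing can be written out explicitly. The sketch below assumes, as the abstract does, that a conventional SRAM pays the full two-cycle random access latency on every word, while the burst RAM pays it only once.

```python
# Cycle-count sketch of the read timing above (assumptions as stated in
# the text: 2 clocks for a random access, 1 clock per subsequent burst word).

def burst_read_cycles(words):
    # 1 clock to load the address counter + 1 clock per word read
    return 1 + words

def conventional_read_cycles(words, cycles_per_access=2):
    # every word pays the full random-access latency
    return words * cycles_per_access

print(burst_read_cycles(4))        # -> 5 clocks for a four-word line
print(conventional_read_cycles(4)) # -> 8 clocks with a conventional SRAM
```

These are the five-versus-eight clock figures quoted in the abstract for a four-word cache line refill.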
Authorized licensed use limited to: IEEE Xplore. Downloaded on April 8, 2009 at 11:36 from IEEE Xplore. Restrictions apply.
The burst mode memory is capable of high speed operation after the initial access because the sequential addresses are generated internally by the address counter. This greatly reduces the read and write cycle times for sequential data following the first access. Clock speeds of up to 50 MHz are possible in a TTL system, making the burst mode memory particularly well suited to the newer generations of high speed RISC and CISC chips.
A direct mapped cache for a 32-bit processor is shown in Figure 4. A direct mapped cache consists of a cache tag RAM, a cache data RAM and a small amount of logic to control events when a cache hit or a cache miss occurs. A cache hit is said to occur if a requested word is found in the cache. A miss occurs when the word is not found in the cache.

Figure 4: Cache Block Diagram
The cache stores copies of words read from main memory in the cache data RAM and stores the locations these words are read from in the cache tag RAM. In the direct mapped cache, the least significant bits of the address bus are sent to both the tag and data RAMs, while the most significant bits are stored in the tag RAM when data is stored in the cache data RAM. In the example shown, both the tag and data RAMs are 8K words deep. When a read request is made to main memory, the least significant bits of the address are used to select one of the 8K words in both memories. The most significant bits of the address are compared against the bits stored in the tag RAM. If there is a match between the two, then the data stored in the data RAM is a copy of the data at the requested location and can be immediately supplied to the processor. This is a cache hit. If the upper address bits do not match, the data stored came from a different location. This is a cache miss.

Direct mapped caches work because most accesses to main memory are typically to a small cluster of a few thousand words located somewhere in the memory space. If the cache is larger than this cluster size, most of the read data will be provided by the cache. The least significant bits of the address bus are used to index within this cluster of words, and the most significant bits identify the region of memory that they came from. (Cache theory is a little more subtle than this: it treats the least significant bits of the address as a hashing function for a hash indexed buffer.)
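The tag/index lookup described above can be sketched as a small model. This is an illustrative sketch, not the paper's hardware: the class and names are ours, and the 13-bit index corresponds to the 8K-word tag and data RAMs in the example.

```python
# Illustrative model of a direct mapped cache lookup: the least significant
# address bits index an 8K-entry array; the most significant bits are
# compared against the stored tag to detect a hit or miss.

INDEX_BITS = 13          # 8K words -> 13 index bits

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * (1 << INDEX_BITS)
        self.data = [None] * (1 << INDEX_BITS)

    def read(self, addr, main_memory):
        index = addr & ((1 << INDEX_BITS) - 1)   # least significant bits
        tag = addr >> INDEX_BITS                 # most significant bits
        if self.tags[index] == tag:
            return self.data[index], True        # cache hit
        # cache miss: fetch from main memory and refill this entry
        word = main_memory[addr]
        self.tags[index], self.data[index] = tag, word
        return word, False

memory = {0x12345: "hello"}
cache = DirectMappedCache()
print(cache.read(0x12345, memory))  # -> ('hello', False)  first access misses
print(cache.read(0x12345, memory))  # -> ('hello', True)   repeat access hits
```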
The design of Figure 6 uses one QS8813 8Kx18 Tag RAM and two QS8811 8Kx18 Burst Mode RAMs for the tag and data memories respectively. The QS8813 is an 8Kx18 Tag SRAM with built-in match enable logic that allows it to directly drive the BRDY input of the 80486. This eliminates the need for additional logic in the propagation delay path between the Tag SRAM and the microprocessor. This can save five or more nanoseconds in match time. Only 2K of the 8K words are used; however, the QS8813 provides a single chip design solution for the Tag RAM. The complete design requires only three RAM chips.

Figure 7: 80486 128K Byte Cache Block Diagram
The design of Figure 7 uses one QS8813 8Kx18 Tag RAM and four QS8839 32Kx9 Burst Mode RAMs for the tag and data memories respectively. The full 8K words of the 8813 are used to support the 32K words of the 8839. Both the 8811 and 8839 Burst Mode RAM chips provide an on-chip address counter and logic for burst mode operation. The address counter provides for bursts of up to four words using the 80486 address counting algorithm. Also, the burst counter on the 8811 can count in either binary or 80486 counting mode, pin selectable.
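The two counting modes can be sketched as follows. The XOR-based sequence below reflects Intel's published 80486 burst ordering for a four-word line (the next word address is the starting word XORed with 1, 2, 3 in turn); the function names are ours, and this is a sketch of the counting algorithm rather than the 8811's pin-level behavior.

```python
# Sketch of the two burst counting modes: the 80486 order (start XOR 1, 2, 3)
# versus plain binary counting that wraps within the four-word line.

def i486_burst_order(start_word):
    """Word sequence for a 4-word burst beginning at start_word (0-3)."""
    return [start_word ^ i for i in range(4)]

def binary_burst_order(start_word):
    """Plain binary counting, wrapping within the 4-word line."""
    return [(start_word + i) % 4 for i in range(4)]

print(i486_burst_order(2))    # -> [2, 3, 0, 1]  same as binary here
print(i486_burst_order(1))    # -> [1, 0, 3, 2]  differs from binary
print(binary_burst_order(1))  # -> [1, 2, 3, 0]
```

The two orders agree only when the burst starts on an even word, which is why a burst RAM intended for the 80486 needs the non-binary mode.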
CONCLUSION
Burst mode memories provide a performance improvement for the cache systems used in high speed CISC and RISC systems which use multiple words per cache line. They are particularly useful at CPU clock speeds above 25 MHz due to their higher performance and simpler interface. Because of these advantages, burst mode memories are becoming a standard component for cache design of high speed systems.
CACHE PERFORMANCE VS RELOAD TIME

Cache performance is defined by miss rate and reload time. Miss rate is the percentage of accesses that miss, and reload time is the number of wait states required to get the data for the processor and reload the cache on a miss. The miss rate of a cache is a function of cache size, cache organization and the statistics of the program running on the processor. Miss rates are like EPA gas mileage estimates: with different programs, your miss rate will vary from benchmark estimates. Generally, caches range from 16 KBytes to 256 KBytes in size, with larger caches having lower miss rates. Target miss rates are in the 2-20% range.

Cache reload time for the cache in Figure 4 is the time to access one word out of main memory. This may require three wait states in a conventional access and four wait states with a cache. The cache system has an extra wait state because one clock cycle is required to determine if the data is in the cache before the main memory access can be started on a miss.

A FOUR WORD PER LINE CACHE

Cache refill performance can be improved by loading more than one word on a miss. A cache using this approach is shown in Figure 5. In this design, the data cache is four times as deep as the cache tag memory. The two least significant bits of the address bus go to the cache data memory but do not go to the tag memory. On a cache miss, four words are loaded into the cache data memory, and a single tag - the common tag for the four locations - is written at the same time. This is called a four word per line cache memory, where a line refers to the amount of data fetched on a cache miss.

Figure 5: Four Word/Line Cache Block Diagram
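The interaction of miss rate and reload time above can be reduced to a simple weighted average. The sketch below assumes the numbers given in the text: zero wait states on a hit, and a four-wait-state miss penalty (three main memory wait states plus the one-cycle tag check); the function name is ours.

```python
# Back-of-the-envelope sketch: average wait states per access, assuming
# 0 wait states on a hit and a miss penalty of 3 main-memory wait states
# plus 1 extra cycle for the tag check, as described in the text.

def avg_wait_states(miss_rate, miss_penalty=4, hit_penalty=0):
    """Weighted average of hit and miss costs for a given miss rate."""
    return hit_penalty * (1 - miss_rate) + miss_penalty * miss_rate

for miss_rate in (0.02, 0.10, 0.20):   # the 2-20% target range quoted above
    print(f"{miss_rate:.0%} miss rate -> "
          f"{avg_wait_states(miss_rate):.2f} wait states per access")
```

Even at the high end of the quoted miss-rate range, the average cost per access stays below one wait state, which is why reducing the per-miss reload time (the subject of the sections that follow) matters most for multi-word lines.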