1. Introduction
Memory of various sizes and types is an integral part of every microprocessor system. The term "memory" covers many forms, but there are really only two fundamental kinds:

1. Read-Only Memory (ROM)
2. Read-Write Memory (SRAM and DRAM)

To add to the confusion, read-write memory is most often referred to as RAM (random access memory), which does not fully describe its characteristics. ROM is also random access, which refers to the ability of the memory device to access internal data in a fixed, guaranteed maximum time, regardless of the location, or address, of the data. When using the term RAM, we usually imply memory that is volatile (i.e. it does not retain its contents when power is removed). In contrast, ROM retains its contents even when there is no power supplied (non-volatile). This is what makes ROM suitable for storing program code that does not need to change, such as in an embedded system, the BIOS in a PC workstation, or images taken with a digital camera.

ROM comes in many well-known forms including PROM, EPROM, EEPROM, and flash. Several of these have been successfully marketed under additional trademarks such as USB flash drives (UFD), Thumb Drives, CompactFlash (CF), Memory Sticks, Secure Digital/MultiMediaCard (SD/MMC), xD-Picture Card and SmartMedia. This has been driven in recent years, of course, by consumer markets in PDAs, cell phones and digital cameras. But all of these are read-only memory technologies, despite the fact that we can write data into them, because of the way the writing (i.e. programming) occurs. Most flash-type memory requires block-based erasure and writing, and this is very slow when compared to RAM-type devices, which can read and/or write individual locations.

Read-Write Memory includes both static RAM (SRAM) and dynamic RAM (DRAM), which are both forms of volatile memory.
DRAM is by far the dominant technology in the world's memory market and is used as main memory in the vast majority of PC workstations. DRAM is the cheapest (per bit of storage), consumes less power and is physically smaller than SRAM. However, SRAM is used in cache memory, for example, because its access times are considerably faster than DRAM's. DRAM has evolved over the past few years, giving us many acronyms that arise from the strategies used to improve the efficiency of accessing data, including synchronous DRAM (SDRAM), single data rate SDRAM (SDR-SDRAM), double data rate SDRAM (DDR-SDRAM), and quad data rate SDRAM (QDR-SDRAM). For the remainder of this document we will use the common convention of referring to non-volatile memory as ROM and volatile memory as RAM.
Dr. D. Capson
Organization vs Capacity
A generalized model using a block symbol for ROM and RAM memory devices is shown in Figure 1:
Figure 1
The organization of a memory device specifies its logical operation (but not necessarily its internal architecture). For example: 2^k x n, such as 64K x 8 for k=16, n=8. The capacity of a memory device is the total number of bits of storage. For example, for k=16, n=8, the capacity is 2^19 bits. The organization tells us how the memory device looks from the outside; the internal logical organization can be implemented in various ways, as we will see below. Knowledge of the external organization is required when building larger memory systems comprising multiple smaller devices that need to be logically combined.
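The relationship between organization and capacity can be sketched as a one-line calculation (the helper name is ours, for illustration only):

```python
# Capacity from organization: a 2^k x n memory has n * 2^k bits of storage.
def capacity_bits(k: int, n: int) -> int:
    """Total bits in a memory organized as 2^k locations, each n bits wide."""
    return n * (2 ** k)

# 64K x 8 (k=16, n=8): 2^16 locations of 8 bits = 2^19 bits.
print(capacity_bits(16, 8))  # 524288
```

Note that two devices with the same capacity (for example, 64x1 and 8x8, both 64 bits) can have entirely different organizations.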
2.
Figure 2 Summation of decoder outputs to construct a ROM: (a) each decoder output is connected to one input of an 8-input OR gate to form an 8x1 ROM; (b) two outputs d1 and d0 form an 8x2 ROM; (c) the same 8x2 ROM as in part (b), but using a single vertical connection to the OR gates for graphical simplicity; (d) an 8x4 ROM
Dr. D. Capson Department of Electrical and Computer Engineering, McMaster University
In Figure 2(b), we extend this idea to include 2 outputs (d1 and d0). Each summation gate associated with an output is simply connected to the decoder outputs as required to implement the desired sum of products. In Figure 2(b), two combinational outputs are generated. It is easy to imagine this extending to more outputs.

If the decoder select lines are interpreted to be address lines and the outputs from the OR gates are considered to be data lines, then we have implemented a 2^3 x 2 read-only memory, with contents specified by the connections we have chosen to make at the connection points.

As a graphical simplification, we can redraw the circuit of Figure 2(b) as shown in Figure 2(c). The set of eight inputs to each OR gate is redrawn as a single vertical line. In Figure 2(d), the idea is extended to 4 outputs (d3-d0) and thus implements a 2^3 x 4 ROM.

In Figure 3, two examples of ROMs, each with 64 bits of capacity, are shown, but with different organizations. In Figure 3(a), the organization of the ROM is 2^6 x 1, which arises from the use of the 6-line to 64-line decoder and a single summation output. Note that the OR gate will have 64 inputs in this case! As a memory, this may be viewed as a device with 64 locations, each 1 bit in width.

In Figure 3(b), the organization is 2^3 x 8, using a 3-line to 8-line decoder and 8 summation gates corresponding to 8 data outputs. That is, we have only 8 locations, each 8 bits in width. In this case, each OR gate has 8 inputs.

The potential connections between the horizontal decoder output lines and the vertical OR gate input lines can be viewed as a connection matrix or a programming matrix that specifies the contents of the ROM. For small ROM sizes as we have shown here, it is easy to imagine making the connections manually, but of course real ROMs are much, much larger and programming their contents requires software and hardware assistance! ROMs are programmed with various means of making the connections electronically at each intersection, throughout the array.

The problem with this simple structure is that it is difficult to build larger (real!) devices. Larger decoders require a large number of NAND gates for their implementation. For example, to build a 128-location ROM, we would need a 7-to-128 decoder, and that requires 128 7-input NAND gates plus other logic!

In the next section, we see how to extend these basic ideas to build larger devices.
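The decoder-plus-OR-gate structure can be sketched behaviorally: a one-hot decoder output feeds OR gates through a programmable connection matrix. The function names and the example contents below are illustrative, not taken from Figure 2:

```python
# Sketch of an 8 x 2 ROM: a 3-to-8 decoder feeding two OR (summation) gates.
def decoder(a: int, select_bits: int = 3):
    """One-hot decoder: output line 'a' goes HI, all others stay LO."""
    return [1 if i == a else 0 for i in range(2 ** select_bits)]

def rom_read(address: int, connections):
    """Each data bit is the OR of the decoder lines wired to its gate."""
    lines = decoder(address)
    return [int(any(lines[i] for i in conn)) for conn in connections]

# Arbitrary contents: d1 is HI for addresses {0, 3, 5}; d0 is HI for {1, 3}.
connections = [{0, 3, 5}, {1, 3}]
print(rom_read(3, connections))  # [1, 1] -- both gates see line 3
print(rom_read(5, connections))  # [1, 0] -- only d1 is wired to line 5
```

The `connections` sets play the role of the programming matrix: changing the contents of the ROM means changing only which intersections are connected.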
Figure 3 Two examples of a 64-bit memory (a) organized as 64x1 (b) organized as 8x8
Two-level decoding
The organization of a memory may also be implemented with the use of a MUX. A multiplexer can be used on the array to select groups of bits to be routed to the output. Two examples are shown in Figure 4. This is known as two-level decoding, two-dimensional decoding, or coincident selection (even though the logic that is used is actually a decoder horizontally and a multiplexer vertically!). Each of the eight OR gates in Figure 4(a) has 8 inputs. In Figure 4(b), there are 16 OR gates, each with 4 inputs.
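Two-level decoding for a 64 x 1 memory can be sketched as a split of the address into a row part (decoder) and a column part (multiplexer). The contents array and the function name below are illustrative:

```python
# Two-level decoding sketch: the upper 3 address bits select a row via a
# 3-to-8 decoder; the lower 3 select a column via an 8-to-1 multiplexer.
bits = [(i * 7) % 2 for i in range(64)]  # arbitrary demo contents, 64 x 1

def read_two_level(address: int) -> int:
    row = (address >> 3) & 0b111   # row decoder select (a5..a3)
    col = address & 0b111          # column multiplexer select (a2..a0)
    return bits[row * 8 + col]

# Externally indistinguishable from directly indexing a flat 64 x 1 array:
assert all(read_two_level(a) == bits[a] for a in range(64))
```

This is the key point of two-level decoding: the external organization is unchanged; only the internal selection logic differs.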
Figure 4 Two examples of a 64-bit memory with two-level decoding (a) organized as 64x1 (b) organized as 16x4
Address      d
a5 - a0
000000       1
000001       0
000010       1
000011       0
   :         :
111111       1

Table 1
The programming connections would be as shown in Figure 5. Externally, both ROMs appear to be organized as 64x1, and both have a capacity of 64 bits, but internally the logic array is different. Note we have not shown the OR gates for graphical simplicity!
Figure 5 Programming connections corresponding to Table 1 (a) 64x1 ROM of Figure 3(a) (b) 64x1 ROM of Figure 4(a)
For the example in Figure 3(b), the organization is 8x8, therefore an 8-line truth table is required, such as the following example in Table 2 (again, the data d7 - d0 for each row is arbitrarily chosen for demonstration).
Address      d7 - d0
a2 - a0
000          01101001
001          11110000
010          00000000
011          00111100
100          10000000
101          00011110
110          00000001
111          11111111

Table 2
The programming connections would be as shown in Figure 6. Externally, the ROM appears to be organized as 8x8, and it also has a capacity of 64 bits. Only one decoding level is used. Again, note we have not shown the OR gates for graphical simplicity!
And finally, for the example of Figure 4(b), the organization is 16x4, therefore a 16-line truth table is required such as the following example in Table 3.
Address      d3 - d0
a3 - a0
0000         0110
0001         1001
0010         0000
0011         1111
0100         0000
0101         0000
0110         0000
0111         0000
1000         1111
1001         1111
1010         1111
1011         1111
1100         0101
1101         1010
1110         1111
1111         0000

Table 3
Larger Arrays
An example of a larger memory is shown in Figure 8, obtained by simply expanding the size of the decoder and the multiplexer to form a 32K x 1 memory.
The logic of Figure 8 can be further expanded as shown in Figure 9 to construct a 32K x 8 = 256 Kbit memory.
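For a 32K x 1 array, the address split between the row decoder and the column multiplexer can be sketched as below. The 256 x 128 split is an assumption for illustration; the actual split used in Figure 8 may differ:

```python
# One possible row/column split for a 32K x 1 array (assumed, not taken
# from Figure 8): 256 rows by 128 columns, since 256 * 128 = 32768 = 32K.
N_ROWS, N_COLS = 256, 128

def split_address(address: int):
    row = address // N_COLS   # upper bits drive the row decoder
    col = address % N_COLS    # lower bits drive the column multiplexer
    return row, col

print(split_address(32767))  # (255, 127) -- the last cell in the array
```

Any factorization of 32K into rows x columns works externally; the choice is an internal layout decision.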
To provide the control lines OE and CS (as shown in Figure 1), we need to add the output control logic to the outputs as shown in Figure 10. Note that the tri-state output capability is controlled by both the CS and OE inputs and that CS also serves to enable the array - this has the effect of shutting down the power to the memory when the device is not selected. This can be a substantial advantage in power savings for battery-powered systems where power consumption is a major design issue!
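The output control behavior described above can be sketched as a small truth function (active-LO controls, as in the text; the function name is ours):

```python
# Sketch of the output control of Figure 10: the data pin drives the bus
# only when both CS and OE are asserted (LO); otherwise it is tri-stated.
def output_pin(data_bit: int, cs: int, oe: int):
    """Return the data bit when CS=0 and OE=0, else 'Z' (high impedance)."""
    if cs == 0 and oe == 0:
        return data_bit
    return "Z"

print(output_pin(1, cs=0, oe=0))  # 1 -- selected and enabled, pin drives
print(output_pin(1, cs=1, oe=0))  # Z -- deselected chip releases the bus
```

The 'Z' state is what allows many devices to share one data bus: only the selected, output-enabled device drives it.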
3.
Figure 11 Single-bit storage binary cell for the static RAM (SRAM)
The SRAM is based on construction of arrays of binary cells with decoding logic similar to the structure of the ROM. In the simple example of Figure 12, a 4x4 array is shown with a single level of decoding. An address applied to the select inputs of the decoder enables an entire row of cells (for example, row 1, shown in yellow for an address a1,a0 = 01 in the example of Figure 12). The outputs of each cell in a column (dout) are ORed together to produce each data bit on the output (dout3, dout2, dout1, dout0). Data input (din3, din2, din1, din0) is provided through four lines (shown in blue in Figure 12) that are connected to the data input (din) of each binary cell in a column. Since only one cell in the column will be selected by the decoder output, only one will actually store the incoming data.

To provide the control lines OE and CS (as shown in Figure 1), we need to add the output control logic to the outputs as shown in Figure 13. Note that the tri-state output capability is controlled by both the CS and OE inputs, just as in the case of the ROM. The WE input is combined with the CS and is connected to the wr input of each binary cell.
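The behavior of the 4 x 4 array can be sketched as a tiny model: the decoded row is the only one that stores write data, and a read returns the selected row's cell outputs. The class name is ours, for illustration:

```python
# Behavioral sketch of the 4 x 4 SRAM array of Figure 12. A write stores
# din only into the decoder-selected row; a read returns that row, which
# models the ORing of column outputs when only one row drives at a time.
class Sram4x4:
    def __init__(self):
        self.cells = [[0] * 4 for _ in range(4)]  # 4 rows of 4 binary cells

    def write(self, address: int, din):
        self.cells[address] = list(din)  # only the selected row stores din

    def read(self, address: int):
        return list(self.cells[address])

ram = Sram4x4()
ram.write(0b01, [1, 0, 1, 1])   # row 1 selected by a1,a0 = 01
print(ram.read(0b01))           # [1, 0, 1, 1]
```

Writing to one row leaves the other rows untouched, exactly because their cells never see an asserted row-select line.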
To complete our design, it is necessary to remove the requirement to have separate lines for data input and data output (this conforms to the block symbol for the SRAM presented in Figure 1 that shows bi-directional lines, used for both input and output). This is easily implemented as an extra set of buffers as shown in Figure 14. Note that the IO logic is changed slightly from that in Figure 13 such that the WE input is also used in the logic to drive the output tri-state buffers. This ensures that the data output from the binary cells is isolated from the data input lines during a write operation. The data input and output lines are now common and labeled (IO3, IO2, IO1, IO0).
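The condition under which the output buffers may drive the shared IO pins can be sketched as a one-line truth function (active-LO CS, OE and WE, as in the text; the name is ours):

```python
# Sketch of the IO gating of Figure 14: the output tri-state buffers drive
# the bidirectional IO pins only during a read, so incoming write data
# never fights the cell outputs on the same lines.
def drive_io(cs: int, oe: int, we: int) -> bool:
    """True when the output buffers should drive the IO pins."""
    return cs == 0 and oe == 0 and we == 1  # WE de-asserted => read

print(drive_io(cs=0, oe=0, we=1))  # True  -- read: outputs drive the pins
print(drive_io(cs=0, oe=0, we=0))  # False -- write: outputs are isolated
```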
Figure 14 4x4 SRAM memory with common input and output data lines (bidirectional)
4. Memory Timing
The coordination of the input and output signals for the ROM shown in Figure 10 can be described in the timing diagram of Figure 15. There are many important timing specs normally provided with any memory device, but we will concentrate on a few key timing parameters, as shown, that indicate the performance of the device:

tAA (access time from address) - The access time from address refers to the delay from the application of a stable address (i.e. not changing) until the appearance of stable data at the data out lines, assuming that the chip has been selected (CS=0) and that the outputs have been enabled (OE=0). This is a main indicator of the speed of the memory and is typically 25-100 ns.

tACS (access time from chip select) - The access time from chip select refers to the delay from the point in time when chip select is dropped LO until the appearance of stable data at the data out lines, assuming that a stable address is present and that OE=0. This time includes the time required to drive the output buffers out of tri-state, since CS controls the tri-state enable together with OE.
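The interaction of these two parameters can be sketched numerically: the data out lines are guaranteed stable only after both paths have settled. The values below are illustrative, not from a specific datasheet:

```python
# Rough read-timing sketch: data is valid only after the address has been
# stable for tAA AND chip select has been LO for tACS, whichever is later.
tAA, tACS = 70, 30  # ns -- illustrative values within the stated ranges

def data_valid_time(t_addr_stable: int, t_cs_low: int) -> int:
    """Time (ns) at which the data out lines are guaranteed stable."""
    return max(t_addr_stable + tAA, t_cs_low + tACS)

# Address stable at t=0, CS dropped at t=10 ns: the address path dominates.
print(data_valid_time(0, 10))  # 70
```

If the decode logic asserts CS late (for example at t=50 ns), the chip-select path dominates instead, which is why both parameters matter to the system designer.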
The coordination of the input and output signals for the SRAM shown in Figure 14 can be described in the timing diagrams of Figures 16 and 17. Figure 16 shows the read timing for the SRAM, which can be seen to be similar to the timing of the ROM, with the addition of the WE line, which is held HI to initiate a read operation. The same access time parameters as described for the ROM apply to the read operation of the SRAM:

tACC (access time from address) - The access time from address refers to the delay from the application of a stable address to the appearance of stable data at the data out lines, assuming that the chip has been selected (CS=0) and that the outputs have been enabled (OE=0). This is a main indicator of the speed of the memory; it is typically faster than a ROM's and is in the 10-25 ns range.

tACS (access time from chip select) - The access time from chip select refers to the delay from the point in time when chip select is dropped LO until the appearance of stable data at the data out lines, assuming that a stable address is present and that OE=0. This time includes the time required to drive the output buffers out of tri-state, since CS controls the tri-state enable together with OE.

tOE (output enable time) - The output enable time refers to the delay from the point in time when both CS and OE are LO until the outputs have been taken out of tri-state. This is usually much less than the access time.

tOZ (output to tri-state time) - The output to tri-state time specifies how long it takes the data outputs to become tri-stated when either CS or OE is de-asserted (HI in our examples).
The timing for a write operation for our SRAM is shown in Figure 17. Writes can be initiated in two ways, as shown in the diagram: WE-controlled or CS-controlled. Most systems will use WE-controlled write operations. To write data to an SRAM, a stable address is applied to the address inputs. This is followed by assertion of CS (CS=LO), followed by a LO-going write pulse on the WE input. The trailing edge (rising edge) of the WE pulse completes the storage of the data that has been applied to the data inputs.

tDS (data setup time) - The data setup time is the minimum time required for the input data to be stable (not changing) before the rising edge of the WE pulse. This is similar to the setup time requirement of any latch, whose data must be held constant for a minimum time before being latched to guarantee reliable operation!

tDH (data hold time) - Data must also be held constant on the input for a minimum time after the trailing edge of the WE pulse, to ensure reliable operation.

For a CS-controlled write operation, a stable address is applied, the WE input is held LO, and the data on the inputs is stored on the trailing edge of the CS pulse.
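Checking setup and hold against the trailing edge of WE can be sketched as a small predicate. The parameter values are illustrative, not from a specific datasheet:

```python
# Sketch: verifying data setup (tDS) and hold (tDH) around the rising
# (trailing) edge of the WE pulse for a WE-controlled write.
tDS, tDH = 10, 5  # ns, illustrative minimum setup and hold times

def write_data_ok(t_data_stable: int, t_data_released: int,
                  t_we_rise: int) -> bool:
    """True if data is stable tDS before WE rises and held tDH after."""
    setup_met = (t_we_rise - t_data_stable) >= tDS
    hold_met = (t_data_released - t_we_rise) >= tDH
    return setup_met and hold_met

print(write_data_ok(t_data_stable=40, t_data_released=60, t_we_rise=55))  # True
print(write_data_ok(t_data_stable=50, t_data_released=60, t_we_rise=55))  # False
```

The second call fails because the data became stable only 5 ns before the WE edge, violating the 10 ns setup requirement.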
5.
It is the role of the decode logic to ensure that the correct memory device is accessed and that it supplies data on the data bus for this cycle. The basic memory read machine cycle timing is shown in Figure 19. Once the microprocessor supplies a stable address, the decode logic generates a corresponding chip select signal (or more than one for multiple memory devices in a system) according to the timing as shown.
Figure 19
For a write operation, the microprocessor provides an address on the address bus, provides data on the data bus and then issues a write pulse /WR. This is called a memory write machine cycle. It is also the role of the decode logic to ensure that the correct memory device is accessed and that it takes data from the data bus at the end of this cycle. The basic memory write machine cycle timing is shown in Figure 20.
Figure 20
All microprocessors use some form of this basic mechanism; it is a fundamental concept to understand the interfacing of memory and I/O devices to a microprocessor bus. The read and write machine cycles are a major factor in determining the overall performance of a system.
Many strategies have been developed by manufacturers of microprocessors and memory components to enhance this basic mechanism for improving performance.
Memory Maps
It is useful to graphically portray the entire range of possible addresses that can be generated by a microprocessor (also called the address space) in the form of a memory map as in the following examples of Figure 21.
Figure 21
Consider a memory space with 16 address bits (i.e. n=16), as shown in the middle memory map above. Many small systems used in embedded applications use no more than 16 bits. This then defines an address space of 64K. Note that all of the addresses are shown in hexadecimal. We assume the individual bits of the address bus are labelled A15 (the msb) through A0 (the lsb). The minimum address is 0x0000 (16 binary 0s) and the maximum address is 0xFFFF (16 binary 1s), which is 2^16 - 1. The fact that the maximum address is shown at the top of the diagram is completely arbitrary and has no bearing on our discussion.

The first address of the top half of the memory map is 0x8000. Every address from this point upward to 0xFFFF has A15 = 1. Every address in the bottom half, that is, 0x7FFF down to 0x0000, has A15 = 0. In general, the most significant address bit (An-1 for an n-bit address bus) divides the memory exactly in half. In the 16-bit example, every address from 0xC000 upward to 0xFFFF has address bit A14 (the second msb) equal to 1. Therefore all addresses in the top quarter of the memory map have A15 = 1 and A14 = 1. In general, the values of the two msbs An-1 and An-2 of any n-bit address thus identify the quarter of the memory map in which it belongs. Similarly, the memory map is divided into eighths by the first three msbs of the address, and so on. Try a few values to convince yourself that this is true.

A typical microprocessor memory system will have a limited amount of memory as well as empty space in which there is no memory available. The memory map shows the relative space occupied by all memory devices (and any empty space) in proportion to the total address space. The memory map may also indicate starting and ending addresses for each of these regions, as well as I/O device addresses if memory-mapped I/O is being used (however, we will not consider memory-mapped I/O in this discussion).

In most real implementations, the address space is rarely completely filled with memory devices. For example, in a typical PC, the Pentium processor has a 32-bit address bus, which provides a 4G address space. For a desktop system with, say, 500 Mbytes of memory, this is only 1/8 of the possible address space. That is, 7/8 (87.5%) of the address space is empty! With 1 Gbyte of memory, you are using only 25%.
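The way the top address bits index map regions can be verified with a couple of shifts (function names are ours, for a 16-bit space):

```python
# For a 16-bit address space, A15 picks the half of the memory map and
# the pair A15:A14 picks the quarter.
def half(addr: int) -> int:
    return (addr >> 15) & 1        # 0 = bottom half, 1 = top half

def quarter(addr: int) -> int:
    return (addr >> 14) & 0b11     # 0..3, counting up from the bottom

print(half(0x7FFF), half(0x8000))  # 0 1 -- the half boundary at 0x8000
print(quarter(0xC000))             # 3   -- top quarter: A15=1, A14=1
```

The same shift-by-(n-1), shift-by-(n-2) pattern generalizes to any n-bit address bus, and extending to three bits gives the eighths of the map.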
Think about this:
Next generation microprocessors will have 64-bit address buses. Estimate the cost, at today's prices, to fully populate a 64-bit address space with DRAM.
Figure 22
Figure 23
With this circuit, as many as 8 memory devices, each up to a maximum size of 8K, could be connected to the address and data buses. The address range in which each device responds is, of course, determined by which of the decoder outputs is used to drive its chip select input. The regions of the memory map have been labelled with the corresponding names of the decoder outputs. Note that we are showing only the address decoding logic; we have omitted the connections to the data bus, since they play no role in the decoding of the address space in this discussion.
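With eight 8K blocks in a 64K space, the decoder output that fires is simply the top three address bits. This sketch assumes the decoder selects on A15..A13, as an eight-way split of 64K implies:

```python
# With a 3-to-8 decoder on A15..A13, each output Y0..Y7 spans one 8K block
# of the 64K map; the active output is just the top 3 address bits.
def decoder_output(addr: int) -> str:
    return f"Y{(addr >> 13) & 0b111}"

print(decoder_output(0x0000))  # Y0 -- block 0x0000-0x1FFF
print(decoder_output(0x5ABC))  # Y2 -- block 0x4000-0x5FFF
print(decoder_output(0xFFFF))  # Y7 -- block 0xE000-0xFFFF
```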
Figure 24
Similarly, if both A15 and A14 are used as enables to the decoder, the bottom quarter of the address space is divided into eight 2K blocks with addresses as shown in Figure 25:
Figure 25
Figure 26
Non-exhaustive decoding
That is, there are two addresses, one with A15 = 0 and one with A15 = 1, that activate the same decoder output. For example, address 0x0000 and address 0x8000 each activate output Y0. Similarly, addresses 0x1000 and 0x9000 both activate output Y1, and so on. The result is that half of the memory addresses are wasted. In this example, there are only 32K unique addresses (0x0000 to 0x7FFF). Addresses 0x8000 to 0xFFFF are not available for other uses. This is known as non-exhaustive decoding. Of course, A15 still exists on the address bus and is still being driven by the microprocessor. We haven't eliminated it; we are only ignoring A15 in the decode logic. Although the entire address space is spanned, there are only 8 unique address ranges, for a total of 32K.
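The aliasing caused by ignoring A15 can be seen by masking it off (the helper name is ours):

```python
# Non-exhaustive decoding sketch: when the decode logic ignores A15, two
# addresses that differ only in A15 select the same decoder output.
def effective_address(addr: int) -> int:
    return addr & 0x7FFF  # mask off the ignored A15

print(hex(effective_address(0x8000)))  # 0x0 -- same as address 0x0000
print(hex(effective_address(0x9000)))  # 0x1000 -- same as address 0x1000
```

Each physical location therefore answers to exactly two addresses, one in each half of the map.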
Figure 27
Exhaustive decoding
The RAM and ROM are each 8K x 8 devices (13 address lines and 8 data lines). The output enable (/OE) of each device is connected to the /RD signal so that the memory places its value onto the data bus at the proper time in the memory read machine cycle. The /WE input of the RAM is controlled by the /WR signal so that the memory takes a value from the data bus at the proper time in the memory write machine cycle. The address range to which each memory device responds is determined by the choice of connection from the decoder output to its chip select input. As shown, the RAM responds to addresses 0x0000 to 0x1FFF and the ROM responds to addresses 0x4000 to 0x5FFF. This is easily observed in the memory map. By changing its chip select connection to other unused decoder outputs, a memory device may be seen to reside in any of the 8K address blocks. If the microprocessor reads or writes with any address in an 8K range in which no memory device is enabled, then no valid data is transferred. Up to eight 8K x 8 devices could be supported with this circuit. With the two memory devices as shown, the percentage of the address space that is populated is: (8K + 8K)/64K = 25%. There is decoding logic for the entire 64K space.
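The dispatch performed by the decode logic can be sketched as a lookup on the top three address bits. The stated address ranges put the RAM on the first 8K block and the ROM on the third; the decoder-output numbering below is an assumption consistent with those ranges:

```python
# Sketch of the decoding of Figure 27: a 3-to-8 decoder on A15..A13, with
# the RAM's chip select on block 0 (0x0000-0x1FFF) and the ROM's on
# block 2 (0x4000-0x5FFF); all other blocks are empty.
def select_device(addr: int):
    block = (addr >> 13) & 0b111
    return {0: "RAM", 2: "ROM"}.get(block)  # None: no chip selected

print(select_device(0x1FFF))  # RAM
print(select_device(0x4000))  # ROM
print(select_device(0x2000))  # None -- empty block, no valid data
```

A read or write into an empty block completes on the bus, but no device responds, which is exactly the "no valid data is transferred" case described above.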
If the memory device is smaller than the decoded block, then regions of redundant addresses are generated. For example, using the same decode logic as above, consider Figure 28 in which 4K x 8 memory devices are used in place of the 8K memory devices:
Figure 28
Since address bit A12 is effectively not used, the memory map contains duplicate regions of addresses which access the same physical memory device! For example, address 0x0000 accesses the same location as address 0x1000, address 0x4FFF accesses the same physical memory location as address 0x5FFF, and so on. Two 8K blocks are occupied in the address space, but there is only 4K + 4K = 8K of physical memory available. So the percentage of populated space is: (4K + 4K)/64K = 12.5%. Address bit A12 is still driven by the microprocessor on the address bus but is ignored externally, and again this leads to wasted addresses in the memory map, as in the case of non-exhaustive decode logic.

In general, logically replicated areas occur in the memory map when k + p < n or k + q < n (refer to Figure 18). Redundant areas of duplicated address space, as shown in this example, are sometimes called foldback memory. Foldback memory regions are transparent to the microprocessor, since it operates independently of any external decode logic and memory devices. The microprocessor executes memory read and write cycles and relies on the design of the external logic to ensure that reads and writes from/to memory take place correctly.
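The foldback aliasing can be checked the same way as before, this time by masking the ignored A12 down to the 4K device's 12 address lines (helper name is ours):

```python
# Foldback sketch: an 8K block decoded for a 4K device ignores A12, so
# only A11..A0 reach the device and each location appears twice per block.
def device_offset(addr: int) -> int:
    return addr & 0x0FFF  # the 12 address lines that reach the 4K device

print(hex(device_offset(0x1000)))  # 0x0 -- same cell as address 0x0000
print(hex(device_offset(0x5FFF)))  # 0xfff -- same cell as address 0x4FFF
```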
If instead of 4K devices as above, 2K devices were used with this decode logic, what would be the effect in the memory map?
Conclusion
This design of decode logic as shown here is valid for any size of address space, that is, any value of n. Decode logic is generally not dependent on the size of the data bus. There are many ways to implement decode logic; one of the most common uses of the early PAL devices was to implement decode logic for microprocessor memory systems.