
PROCESSOR AND MEMORY ORGANIZATION

[UNIT-II]

V.V.C.E.T

EMBEDDED SYSTEMS
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

UNIT - II: PROCESSOR AND MEMORY ORGANIZATION
Structural units in a processor - Selection of processor - Memory devices - DMA - Memory management - Cache mapping techniques, dynamic allocation - Fragmentation - Interfacing processor, memory and I/O units.

CASE STUDY: Required memory devices for an automatic washing machine, chocolate vending machine, digital camera and voice recorder.

Prepared by M. Sujith, Lecturer, Department of Electrical and Electronics Engineering, Vidyaa Vikas College of Engineering and Technology.


EMBEDDED SYSTEM PROCESSOR CHIP OR CORE

MICROPROCESSOR

What is a Microprocessor? A microprocessor is a multipurpose, clock-driven, register-based electronic device that reads binary instructions from a storage device called memory, accepts binary data as input, processes the data according to the instructions given, and provides the results as output. Ex: 8085, 8086, Z80, 6800, Pentium processors, etc.

Why do we use microprocessors?
- Efficiently implements digital systems (executes programs very efficiently)
- Easier to design families of products that can be extended to meet rapidly changing market demands (optimized design in terms of achieving greater speed, i.e. processing power)


MICROCONTROLLER

What is a Microcontroller? A microcontroller is essentially an entire computer on a single chip. Ex: Intel's 8051, 8096, Motorola M68HC11xx / M68HC12xx, PIC 16XX series, etc. It is an essential component of a control or communication unit.


Other Processors


STRUCTURAL UNITS IN A PROCESSOR

BUSES 1) Internal and external buses interconnect the processor internal units with the external system memories, I/O devices and all other system elements 2) Address, data and control buses

MDR, MAR, BIU, PC and SP 3) MDR (memory data register) holds the accessed byte or word 4) MAR (memory address register) holds the address 5) BIU (Bus Interface Unit) 6) Program Counter or Instruction Pointer and 7) Stack Pointer registers


8) ARS (Application Register Set): a set of on-chip registers for use in the application program. A register set is also called a register file and associates with an ALU or FLPU. 9) Register window: a subset of registers, with each subset storing the static variables and status words of a task or program thread. Changing windows helps in fast context switching in a program.

ALU, FLPU 10) ALU and FLPU (Arithmetic and Logic operations Unit and Floating Point operations Unit). The FLPU associates a FLP register set for its operations.

CACHES 12) Instruction, Data and Branch Target caches and the associated PFCU (Prefetch Control Unit) for pre-fetching the instructions, data and next branch target instructions, respectively. Multi-way cache example: a 16 kB, 32-way instruction cache with 32-byte blocks and a 16 kB data cache in ARM. A cache block enables simultaneous caching of several memory locations of a set of instructions.

AOU 13) AOU (Atomic Operations Unit): an instruction is broken into a number of processor instructions called atomic operations (AOs). The AOU finishes the AOs before an interrupt of the process occurs, which prevents problems arising out of incomplete processor operations on shared data in the programs.

FEATURES IN MOST PROCESSORS
- Fixed instruction cycle time
- RISC processor core
- 32-bit internal bus width to facilitate the availability of arithmetic operations on 32-bit operands in a single cycle. The 32-bit bus is a necessity for signal processing and control system instructions.
- Program Counter (PC) bits and its reset value
- Stack Pointer bits and its initial reset value

- Instruction, Branch Target and Data caches
- Memory Management Unit (MMU)
- Floating Point Processing Unit
- System register set
- Floating point register set
- Pre-fetch Control Unit for data into the I- and D-caches
- Instruction level parallelism units: (i) multistage pipeline

(ii) multi-line superscalar processing

RISC architecture executes most instructions in a single clock cycle (by hardwired implementation of instructions), uses multiple register sets, register windows or register files, and greatly reduces the ALU's dependency on external memory accesses for data, due to the reduced number of addressing modes provided. RISC uses a load-and-store architecture: before ALU operations, the operands are loaded into the registers, and similarly the write-back result is in a register and is then stored at the external memory addresses.

CYCLES On cycle 1, the first instruction I0 enters the instruction fetch (IF) stage of the pipeline and stops at the pipeline latch (buffer) between the instruction fetch and instruction decode (ID) stages of the pipeline. On cycle 2, the second instruction I1 enters the instruction fetch stage, while instruction I0 proceeds to the instruction decode stage. On cycle 3, instruction I0 enters the register (inputs) read (RR) stage, instruction I1 is in the instruction decode stage, and instruction I2 enters the instruction fetch stage. Instructions proceed through the pipeline at one stage per cycle until they reach the register (result) write-back (WB) stage, at which point execution of instruction I0 is complete. On cycle 6 in the example, instructions I1 through I5 are in the pipeline, while instruction I0 has completed and is no longer in the pipeline. The pipelined processor is still executing instructions at a rate (throughput) of one instruction

per cycle, but the latency of each instruction is now 5 cycles instead of 1. However, each cycle period is now about one-fifth or less of the cycle period without pipelining.

Thus processing performance can improve about five (or more) times with a five-stage pipeline.
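As a rough illustration (not from the original notes), the ideal speedup of a k-stage pipeline over non-pipelined execution of n instructions is n*k / (k + n - 1); the 5-stage, ideal-case numbers below are assumptions for illustration only.

#include <stdio.h>

/* Ideal pipeline model: a non-pipelined processor needs k long cycles per
 * instruction, while a k-stage pipeline needs (k + n - 1) short cycles for
 * n instructions, each short cycle being about 1/k of the long cycle. */
int main(void)
{
    const int k = 5;          /* pipeline stages: IF, ID, RR, EX, WB    */
    const int n = 1000;       /* number of instructions (assumed)       */

    double non_pipelined = (double)n * k;      /* in short-cycle units   */
    double pipelined     = (double)k + n - 1;  /* fill + drain overhead  */

    printf("Speedup ~ %.2f (approaches %d for large n)\n",
           non_pipelined / pipelined, k);
    return 0;
}

For n = 1000 this prints a speedup of about 4.98, which is why the text says performance improves roughly five times for a five-stage pipeline.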

Instruction level parallelism (ILP): executing several instructions in parallel. Two or more instructions execute in parallel as well as in a pipeline; for example, two parallel pipelines in a processor, with two instructions In and In+1 executing in parallel at separate execution units.


PROCESSOR PERFORMANCE Performance of a processor is measured in terms of the following metrics:
MIPS: the measure of the processing speed of a processor in million instructions per second.
MFLOPS: the measure of the processing speed of a processor or DSP in million floating-point operations per second.
Dhrystones per second: Dhrystone is a benchmarking program developed by Reinhold P. Weicker in 1984 that measures a processor's performance for processing integers and strings. The benchmark program is available in C, Pascal or Java and benchmarks the CPU, not the performance of I/O or OS calls. This metric measures the number of times the program can run in a second. 1 MIPS = 1757 Dhrystones/sec.
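A small, hypothetical calculation (the benchmark score and clock value are illustrative, not from the notes) showing how a measured Dhrystone result is converted to VAX MIPS (DMIPS) using the 1757 Dhrystones/sec figure quoted above:

#include <stdio.h>

/* Convert a measured Dhrystone score to DMIPS using the
 * 1 MIPS = 1757 Dhrystones/sec reference (VAX 11/780). */
int main(void)
{
    double dhrystones_per_sec = 500000.0;   /* assumed benchmark result */
    double clock_mhz          = 100.0;      /* assumed CPU clock        */

    double dmips = dhrystones_per_sec / 1757.0;
    printf("DMIPS       : %.1f\n", dmips);
    printf("DMIPS / MHz : %.2f\n", dmips / clock_mhz);
    return 0;
}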


The EDN Embedded Microprocessor Benchmark Consortium (EEMBC) proposed benchmark program suites for six different areas of application of embedded systems:
- Telecommunication (modems, xDSL)
- Consumer electronics (digital cameras)
- Automotive and industrial electronics
- Networking (network processors)
- Office automation (printers, plotters)
- Digital entertainment (PDAs, cell phones)

ESSENTIAL CHARACTERISTICS OF PROCESSOR STRUCTURE

Superscalar Processing
A superscalar processor has the capacity to fetch (instructions from memory), decode (instructions) and execute more than one instruction in parallel at any instant. Superscaling allows (two or more) instructions to be processed in parallel (full overlapping). Multiple units are provided for instruction processing. It supports pipelining.

PowerPC MPC601 (RISC, first PowerPC, 66 MHz, 132 MIPS):
- 3 execution units: 1 branch unit (branching), 1 integer unit, 1 floating point unit
- can dispatch up to 2 instructions and process 3 every clock cycle


In the Pentium, two 5-stage pipelines execute two instructions per clock cycle, whereas the Pentium II has a single pipeline but multiple functional units.

MICROCODE AND HARDWIRED

Microcode: Inside a CPU, the instructions are decoded to a sequence of microcode instructions, which in turn call a sequence of nanocode commands that control the sequencing and the ALU. The instructions do not operate directly on the internal resources. Neither the microcode nor the nanocode is available to the programmer. The steps (known as microcodes) for processing an instruction in the CPU involve the following:
o Instruction fetch from memory (IF)
o Decode instruction (ID)
o Load operands from memory (OL)
o Execute instruction (EX)
o Store results in memory (OS)
Note: (1) The fastest instruction will have all operands in the CPU (as in RISC) so that a single clock cycle is needed to execute it. (2) Microcoding requires multiple cycles to load sequencers etc. and therefore cannot easily be used to implement a single-cycle execution unit.

Hardwired: In some processors (RISC) all the execution units are hardwired, i.e. instructions are directly executed by hardware and there won't be any microcoding for processing. Hence instructions will be executed in a single cycle.

Pipelining
Pipelining means dividing the ALU circuit into n substages. All common steps (IF, ID, OL, EX, OS) involved in instruction processing by the CPU can be pipelined. Each major step of the instruction processing is assigned to and handled independently by a separate subunit of the CPU pipeline.
Pipeline Stall is a disadvantage of pipelining and is caused when any stage within the pipeline cannot complete its allotted task at the same time as its peers. This can occur when (i) wait states are inserted into an external memory access,

(ii) instructions use iterative techniques, or (iii) there is a change in program flow (due to branching, etc.).
Branch Penalty: the time required for re-processing the instructions which had become redundant (executed in part at preceding stages) due to the execution of a branching instruction in a multistage pipeline.
Data Dependency Penalty: the waiting time of an instruction for further execution when it is dependent on the data output of another instruction. This happens due to improper alignment of the two instructions.

CACHING Caches are small, fast memories that hold copies of some of the contents of main memory. They provide higher-speed access for the CPU. A cache controller mediates between the CPU and the main memory.
Cache hit: the requested location is available in the cache.
Cache miss: the requested location is not available in the cache, resulting in a cache miss penalty (the extra time needed to access the missing memory location).


A cache miss can occur for various reasons:
- Compulsory miss (cold miss): the first time a location is used (not referenced before)
- Capacity miss: the program's working set is too large for the cache
- Conflict miss: two particular memory locations are fighting for the same cache line
The behavior of several programs running concurrently must be examined to accurately estimate performance.

CPU POWER CONSUMPTION
Power: energy consumption per unit time.
- more consumption -> more heat generation
- battery life -> depends on energy consumption
- both energy and power consumption matter in the design

CMOS circuits are used to build all kinds of digital systems.
- Voltage drops: power consumption is proportional to V² (so reduce the power supply voltage)
- Toggling: more power is consumed when changing states (output values). To reduce consumption, reduce the circuit's operating speed and avoid unnecessary changes to the inputs of a CMOS circuit (this eliminates unnecessary glitches at the output)
- Leakage: some charge leaks through the substrate even in the inactive state of the CMOS circuit (removing the power supply eliminates leakage, but more time is then needed to reconnect the supply)

Power Saving Strategies
- Use the CPU at reduced voltage levels (e.g. reducing the supply from 5 V to 3.3 V reduces power consumption by a factor of 5² / 3.3² = 2.29)
- Operate the CPU at lower clock rates -> may reduce power consumption (but not energy consumption)
- Disable certain functional units that are not currently needed (reduces energy consumption)
- Allow part of the CPU to be totally disconnected from the power supply (eliminates leakage current)
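The voltage figure above (5 V to 3.3 V giving a factor of about 2.3) follows from dynamic CMOS power being proportional to C*V²*f; a minimal sketch of that scaling, with the capacitance and frequency values assumed purely for illustration:

#include <stdio.h>

/* Dynamic CMOS power: P = C * V^2 * f (switching power only, leakage
 * ignored). The C and f values below are assumptions for illustration. */
static double dynamic_power(double c_farads, double v_volts, double f_hz)
{
    return c_farads * v_volts * v_volts * f_hz;
}

int main(void)
{
    double c = 100e-12;              /* 100 pF total switched capacitance */
    double f = 50e6;                 /* 50 MHz clock                      */

    double p5v  = dynamic_power(c, 5.0, f);
    double p3v3 = dynamic_power(c, 3.3, f);

    printf("P at 5.0 V : %.3f W\n", p5v);
    printf("P at 3.3 V : %.3f W\n", p3v3);
    printf("Reduction  : %.2f x\n", p5v / p3v3);  /* = 5^2 / 3.3^2 = 2.29 */
    return 0;
}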


Static power management: invoked by the user, e.g. power-down mode activated by executing an instruction. To come out of this mode an interrupt or some other event is needed; no instruction is available for exiting from this mode.
Dynamic power management: driven by the dynamic activity of the CPU, e.g. turning off certain sections of the CPU when the currently executing instruction does not need that particular unit or section.

SELECTING PROCESSORS FOR EMBEDDED APPLICATIONS
- Instruction set
- Maximum bits in an operand (8, 16 or 32) in a single arithmetic or logical operation
- Clock frequency in MHz
- Processing speed in MIPS / MFLOPS / Dhrystones
- Processor's ability to solve complex algorithms while meeting deadlines

================================================================================

Processor Organisation
A processor has an ALU. The processor circuit does sequential operations, and a clock guides these. It has a program counter and a stack pointer, which point to the instruction to be fetched and to the top of the data pushed onto the stack, respectively. Certain processors have an on-chip memory management unit (MMU).

Registers General-purpose registers are organized on a common internal bus of the processor. A register is of 32, 16 or 8 bits depending on whether the ALU performs, at an instance, a 32-, 16- or 8-bit operation.

CISC A processor may have a CISC (Complex Instruction Set Computer) or RISC (Reduced Instruction Set Computer) architecture, and this may affect the system design. CISC has the ability to process complex instructions and complex data sets with fewer registers, as it provides a large number of addressing modes.


RISC Simpler instructions, all executing in a single cycle per instruction. New RISC processors, such as the ARM7 and ARM9, also provide a few of the most useful CISC instructions. CISC converges to a RISC implementation because most instructions are hardwired and execute in a single clock cycle.

Interrupts The processor provides inputs for external interrupts so that external circuits can send interrupt signals. It may possess an internal interrupt controller (handler) to program the service routine priorities and to allocate vector addresses.

Memory
Most modern computer systems have been designed on the basis of an architecture called the von Neumann architecture. The so-called von Neumann architecture is a model for a computing machine that uses a single storage structure to hold both the set of instructions on how to perform the computation and the data required or generated by the computation. Such machines are also known as stored-program computers. The separation of storage from the processing unit is implicit in this model. By treating the instructions in the same way as the data, a stored-program machine can easily change the instructions; in other words, the machine is reprogrammable. One important motivation for such a facility was the need for a program to increment or otherwise modify the address portion of instructions. This became less important when index registers and indirect addressing became customary features of machine architecture.

The memory stores the instructions as well as the data, and nothing in the memory itself distinguishes an instruction from data. The CPU has to be directed to the address of the instruction codes.


The memory is connected to the CPU through the following lines 1. Address 2. Data 3. Control

In a memory read operation the CPU loads the address onto the address bus. In most cases these lines are fed to a decoder which selects the proper memory location. The CPU then sends a read control signal, and the data stored in that location is transferred to the processor via the data lines. In a memory write operation, after the address is loaded, the CPU sends the write control signal followed by the data to the requested memory location. Memory can be classified in various ways, i.e. based on location, power consumption, way of data storage, etc.
The memory at the basic level can be classified as 1. Processor Memory (Register Array) 2. Internal on-chip Memory 3. Primary Memory 4. Cache Memory 5. Secondary Memory

Processor Memory (Register Array)


Most processors have some registers associated with the arithmetic logic unit. They store the operands and the result of an instruction. The data transfer rates are much faster, without needing any additional clock cycles. The number of registers varies from processor to processor; the larger the number, the faster the instruction execution. But the complexity of the architecture puts a limit on the amount of processor memory.


Internal on-chip Memory In some processors there may be a block of memory locations. They are treated in the same way as external memory; however, this memory is very fast.

Primary Memory This is the one which sits just outside the CPU. It can also be on the same chip as the CPU. These memories can be static or dynamic.

Cache Memory This is situated between the processor and the primary memory. It serves as a buffer for the immediate instructions or data which the processor anticipates. There can be more than one level of cache memory.

Secondary Memory These are generally treated as input/output devices. They are much cheaper mass-storage and slower devices connected through input/output interface circuits. They are generally magnetic or optical memories, such as hard disk and CDROM devices.

The memory can also be divided into volatile and non-volatile memory.
Volatile Memory The contents are erased when the power is switched off. Semiconductor Random Access Memories fall into this category.
Non-volatile Memory The contents are intact even if the power is switched off. Magnetic memories (hard disks), optical disks (CDROMs) and Read Only Memories (ROM) fall under this category.


Data Storage
An m x n memory stores m words of n bits each. One word is located at one address; therefore, to address m words we need k = log2(m) address input signals, or k address lines can address m = 2^k words.

Example: a 4,096 x 8 memory stores 32,768 bits and requires 12 address input signals and 8 input/output data signals.
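A small check of the k = log2(m) relation for the 4,096 x 8 example above (a sketch; the loop simply counts how many address bits are needed):

#include <stdio.h>

/* Number of address lines needed to select one of m words: k = ceil(log2(m)). */
static unsigned address_lines(unsigned long words)
{
    unsigned k = 0;
    while ((1UL << k) < words)
        k++;
    return k;
}

int main(void)
{
    unsigned long m = 4096;   /* words         */
    unsigned      n = 8;      /* bits per word */

    printf("Memory: %lu x %u = %lu bits\n", m, n, m * n);      /* 32768 bits */
    printf("Address input signals: %u\n", address_lines(m));   /* 12         */
    printf("Data I/O signals     : %u\n", n);                  /* 8          */
    return 0;
}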

Memory access
A memory location can be accessed by placing its address on the address lines. The read/write control line selects read or write. Some memory devices are multi-port, i.e. they allow multiple accesses to different locations simultaneously.


Memory Specifications
The specifications of a typical memory are as follows.
The storage capacity: the number of bits/bytes or words it can store.
The memory access time (read access and write access): how long the memory takes to load the data onto its data lines after it has been addressed, or how fast it can store the data supplied through its data lines. The reciprocal of the memory access time is known as the memory bandwidth.
The power consumption and voltage levels: power consumption is a major factor in embedded systems; the lower the power consumption, the higher the packing density.
Size: size is directly related to the power consumption and data storage capacity.

Four generations of RAM chips. There are two important specifications for memory as far as real-time embedded systems are concerned: write ability and storage permanence.


Write ability
It is the manner and speed at which a particular memory can be written.
Ranges of write ability:
- High end: the processor writes to memory simply and quickly, e.g. RAM
- Middle range: the processor writes to memory, but slower, e.g. FLASH, EEPROM (Electrically Erasable and Programmable Read Only Memory)
- Lower range: special equipment, a programmer, must be used to write to the memory, e.g. EPROM, OTP ROM (One Time Programmable Read Only Memory)
- Low end: bits are stored only during fabrication, e.g. mask-programmed ROM
In-system programmable memory can be written to by a processor in the embedded system using the memory; these are the memories in the high end and middle range of write ability.

Storage permanence
It is the ability to hold the stored bits. Range of storage permanence:
- High end: essentially never loses bits, e.g. mask-programmed ROM
- Middle range: holds bits days, months, or years after the memory's power source is turned off, e.g. NVRAM
- Lower range: holds bits as long as power is supplied to the memory, e.g. SRAM
- Low end: begins to lose bits almost immediately after being written, e.g. DRAM
Nonvolatile memory holds bits after power is no longer supplied: the high end and middle range of storage permanence.

Common Memory Types Read Only Memory (ROM)


This is a nonvolatile memory. It can only be read from, not written to, by a processor in an embedded system. Traditionally it is written to (programmed) before being inserted into the embedded system.


Uses:
- Store the software program for a general-purpose processor (program instructions can be one or more ROM words)
- Store constant data needed by the system
- Implement combinational circuits

EPROM: Erasable programmable ROM


This is known as erasable programmable read only memory. The programmable component is a MOS transistor with a floating gate surrounded by an insulator. Negative charges form a channel between source and drain, storing a logic 1. A large positive voltage at the gate causes negative charges to move out of the channel and get trapped in the floating gate, storing a logic 0. (Erase) Shining UV rays on the surface of the floating gate causes the negative charges to return to the channel from the floating gate, restoring the logic 1. An EPROM package has a quartz window through which UV light can pass. The EPROM has:
- Better write ability: can be erased and reprogrammed thousands of times
- Reduced storage permanence: a program lasts about 10 years but is susceptible to radiation and electrical noise
It is typically used during design development.

EEPROM
EEPROM is otherwise known as Electrically Erasable and Programmable Read Only Memory. It is typically erased by using a higher than normal voltage. It can program and erase individual words, unlike EPROMs where exposure to UV light erases everything. It has


- Better write ability: can be in-system programmable, with a built-in circuit to provide the higher than normal voltage; a built-in memory controller is commonly used to hide details from the memory user; writes are very slow due to erasing and programming (a busy pin indicates to the processor that the EEPROM is still writing); can be erased and programmed tens of thousands of times
- Similar storage permanence to EPROM (about 10 years)
- Far more convenient than EPROMs, but more expensive

Flash Memory
It is an extension of EEPROM. It has the same floating-gate principle and the same write ability and storage permanence. It can be erased at a faster rate, i.e. large blocks of memory are erased at once, rather than one word at a time. The blocks are typically several thousand bytes large. Writes to single words may be slower: the entire block must be read, the word updated, and then the entire block written back. It is used in embedded systems storing large data items in nonvolatile memory, e.g. digital cameras, TV set-top boxes, cell phones.

RAM: Random-access memory


Typically volatile memory: bits are not held without a power supply. It is read and written easily by the embedded system during execution. Its internal structure is more complex than ROM:
- a word consists of several memory cells, each storing 1 bit
- each input and output data line connects to each cell in its column
- rd/wr is connected to every cell
- when a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates write, or outputs the stored bit when rd/wr indicates read


Basic types of RAM


SRAM: Static RAM

- Memory cell uses a flip-flop to store a bit
- Requires 6 transistors
- Holds data as long as power is supplied

DRAM: Dynamic RAM
- Memory cell uses a MOS transistor and a capacitor to store a bit


- More compact than SRAM
- Refresh required due to capacitor leakage; a word's cells are refreshed when read
- Typical refresh interval: 15.625 microseconds
- Slower to access than SRAM

RAM variations
PSRAM: Pseudo-static RAM
- DRAM with a built-in memory refresh controller
- Popular low-cost, high-density alternative to SRAM

NVRAM: Nonvolatile RAM
- Holds data after external power is removed
- Battery-backed RAM: SRAM with its own permanently connected battery; writes as fast as reads; no limit on the number of writes, unlike nonvolatile ROM-based memory
- SRAM with EEPROM or flash: stores the complete RAM contents on EEPROM or flash before power is turned off

Composing memory
Memory size needed often differs from size of readily available memories

- When the available memory is larger, simply ignore the unneeded high-order address bits and higher data lines
- When the available memory is smaller, compose several smaller memories into one larger memory:
  - connect side-by-side to increase the width of words
  - connect top to bottom to increase the number of words; an added high-order address line selects the smaller memory containing the desired word, using a decoder (see the sketch after this list)
  - combine techniques to increase both the number and the width of words
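As an illustration of the "added high-order address line selects the smaller memory" idea, here is a sketch with sizes assumed by me (not taken from the notes): two 32K x 8 devices composed top-to-bottom into a 64K x 8 memory, where address bit A15 acts as the decoder.

#include <stdio.h>
#include <stdint.h>

/* Two 32K x 8 memories composed top-to-bottom into one 64K x 8 memory.
 * The high-order address bit (A15) selects which chip is enabled;
 * the remaining 15 bits address a word inside the selected chip. */
static uint8_t chip0[0x8000];   /* 0x0000 - 0x7FFF */
static uint8_t chip1[0x8000];   /* 0x8000 - 0xFFFF */

static uint8_t read_composed(uint16_t addr)
{
    uint16_t offset = addr & 0x7FFF;    /* low 15 address lines */
    if (addr & 0x8000)                  /* A15 = chip select    */
        return chip1[offset];
    return chip0[offset];
}

int main(void)
{
    chip0[0x0010] = 0xAA;
    chip1[0x0010] = 0x55;

    printf("0x0010 -> 0x%02X (chip 0)\n", read_composed(0x0010));
    printf("0x8010 -> 0x%02X (chip 1)\n", read_composed(0x8010));
    return 0;
}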


Memory Hierarchy
The objective is to use inexpensive, fast memory.
- Main memory: large, inexpensive, slow memory; stores the entire program and data
- Cache: small, expensive, fast memory; stores copies of likely-accessed parts of the larger memory; there can be multiple levels of cache

Cache
- Usually designed with SRAM: faster but more expensive than DRAM
- Usually on the same chip as the processor: space is limited, so the cache is much smaller than off-chip main memory; faster access (1 cycle vs. several cycles for main memory)
Cache operation: a request for main memory access (read or write) arrives.


First, check the cache for a copy:
- cache hit: the copy is in the cache, quick access
- cache miss: the copy is not in the cache; read the address and possibly its neighbors into the cache
There are several cache design choices: cache mapping, replacement policies, and write techniques.

Cache Mapping
is necessary because there are far fewer available cache addresses than memory addresses. Are the contents of an address in the cache? Cache mapping is used to assign a main memory address to a cache address and to determine hit or miss. Three basic techniques:
- Direct mapping
- Fully associative mapping
- Set-associative mapping
Caches are partitioned into indivisible blocks or lines of adjacent memory addresses, usually 4 or 8 addresses per line.

DIRECT MAPPING The main memory address is divided into fields:
- Index: contains the cache address; the number of bits is determined by the cache size
- Tag: compared with the tag stored in the cache at the address indicated by the index; if the tags match, check the valid bit
- Valid bit: indicates whether the data in the slot has been loaded from memory
- Offset: used to find the particular word in the cache line
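A minimal sketch (cache geometry assumed by me: a 2 KB direct-mapped cache with 8-byte lines) of how a main-memory address is split into the tag, index and offset fields described above:

#include <stdio.h>
#include <stdint.h>

/* Assumed geometry: 2 KB direct-mapped cache, 8-byte lines.
 * => 256 lines, 3 offset bits, 8 index bits; remaining bits form the tag. */
#define OFFSET_BITS 3
#define INDEX_BITS  8

static void split_address(uint32_t addr)
{
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    printf("addr 0x%08X -> tag 0x%X, index %u, offset %u\n",
           addr, tag, index, offset);
}

int main(void)
{
    split_address(0x00001234);   /* the cache compares the stored tag at  */
    split_address(0x0000A234);   /* 'index'; same index, different tag => */
    return 0;                    /* a conflict in a direct-mapped cache   */
}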


Fully Associative Mapping


- The complete main memory address is stored in each cache address
- All addresses stored in the cache are simultaneously compared with the desired address
- Valid bit and offset are the same as in direct mapping

Set-Associative Mapping
Compromise between direct mapping and fully associative mapping

Index same as in direct mapping



But each cache address contains the content and tags of 2 or more memory address locations. The tags of that set are simultaneously compared, as in fully associative mapping. A cache with set size N is called N-way set-associative; 2-way, 4-way and 8-way are common.

Cache-Replacement Policy
Technique for choosing which block to replace

- when a fully associative cache is full
- when a set-associative cache's line is full
(a direct-mapped cache has no choice)
Policies:
- Random: the block to replace is chosen at random
- LRU (least-recently used): replace the block not accessed for the longest time (see the sketch after this list)
- FIFO (first-in-first-out): push a block onto a queue when accessed; choose the block to replace by popping the queue
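A minimal sketch of the LRU policy for one cache set; the 4-way set size is assumed, and a simple access counter stands in for the age bits real hardware would keep.

#include <stdio.h>
#include <stdint.h>

#define WAYS 4

/* One 4-way set: on a miss, the way with the oldest last_used time
 * (or an invalid way) is chosen as the victim. */
struct way { uint32_t tag; int valid; unsigned last_used; };

static struct way set[WAYS];
static unsigned now;                      /* access counter = "time" */

static void access_set(uint32_t tag)
{
    int victim = 0;
    now++;
    for (int i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) {        /* hit */
            set[i].last_used = now;
            printf("tag 0x%X: hit in way %d\n", tag, i);
            return;
        }
        if (!set[i].valid || set[i].last_used < set[victim].last_used)
            victim = i;                                 /* track LRU way */
    }
    printf("tag 0x%X: miss, replacing way %d\n", tag, victim);
    set[victim].tag = tag;
    set[victim].valid = 1;
    set[victim].last_used = now;
}

int main(void)
{
    uint32_t refs[] = { 1, 2, 3, 4, 1, 5 };   /* 5 evicts tag 2, the LRU one */
    for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++)
        access_set(refs[i]);
    return 0;
}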


Cache Write Techniques


When written to, the data cache must update main memory.
- Write-through: write to main memory whenever the cache is written to; easiest to implement; the processor must wait for the slower main memory write; potential for unnecessary writes
- Write-back: main memory is only written when a dirty block is replaced; an extra dirty bit for each block is set when the cache block is written to; reduces the number of slow main memory writes

Cache Impact on System Performance


Most important parameters in terms of performance:

- Total size of cache: the total number of data bytes the cache can hold (tag, valid and other housekeeping bits are not included in the total)
- Degree of associativity
- Data block size
Larger caches achieve lower miss rates but higher access cost, e.g.:
- 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles; average cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
- 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost unchanged; average cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
- 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost unchanged; average cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles
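The averages above follow from avg = (1 - miss_rate) * hit_cost + miss_rate * miss_cost; a small sketch reproducing the three cases:

#include <stdio.h>

/* Average memory-access cost = (1 - miss_rate) * hit_cost + miss_rate * miss_cost */
static double avg_cost(double miss_rate, double hit_cost, double miss_cost)
{
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}

int main(void)
{
    /* miss rate and hit cost per cache size; miss cost fixed at 20 cycles */
    printf("2 KB: %.4f cycles\n", avg_cost(0.15,    2, 20));  /* 4.7    */
    printf("4 KB: %.4f cycles\n", avg_cost(0.065,   3, 20));  /* 4.105  */
    printf("8 KB: %.4f cycles\n", avg_cost(0.05565, 4, 20));  /* 4.8904 */
    return 0;
}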

Cache Performance Trade-Offs


Improving the cache hit rate without increasing size:
- Increase line size
- Change set-associativity

Advanced RAM
DRAMs commonly used as main memory in processor based embedded systems

- High capacity, low cost
- Many variations of DRAMs have been proposed to keep pace with processor speeds:
  - FPM DRAM: fast page mode DRAM
  - EDO DRAM: extended data out DRAM
  - SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM
  - RDRAM: Rambus DRAM

Basic DRAM
Address bus multiplexed between row and column components

- Row and column addresses are latched in, sequentially, by strobing the ras (row address strobe) and cas (column address strobe) signals, respectively
- Refresh circuitry can be external or internal to the DRAM device; it strobes consecutive memory addresses periodically, causing the memory contents to be refreshed
- Refresh circuitry is disabled during read or write operations


Fast Page Mode DRAM (FPM DRAM)


Each row of the memory bit array is viewed as a page; a page contains multiple words, and individual words are addressed by the column address. Timing diagram: the row (page) address is sent once, then 3 words are read consecutively by sending a column address for each. An extra cycle is eliminated on each read/write of words from the same page.


Extended data out DRAM (EDO DRAM)


Improvement of FPM DRAM

An extra latch before the output buffer allows strobing of cas before the data read operation is completed, reducing read/write latency by an additional cycle.

Synchronous (S) and Enhanced Synchronous (ES) DRAM


- SDRAM latches data on the active edge of the clock
- Eliminates the time to detect the ras/cas and rd/wr signals
- A counter is initialized to the column address, then incremented on the active edge of the clock to access consecutive memory locations
- ESDRAM improves on SDRAM: added buffers enable overlapping of column addressing; faster clocking and lower read/write latency are possible


Rambus DRAM (RDRAM)


More of a bus interface architecture than DRAM architecture

- Data is latched on both the rising and falling edges of the clock
- Broken into 4 banks, each with its own row decoder, so 4 pages can be open at a time
- Capable of very high throughput

DRAM Integration Problem
- SRAM is easily integrated on the same chip as the processor; DRAM is more difficult
- Different chip-making processes are used for DRAM and for conventional logic
- Goal of conventional logic (IC) designers: minimize parasitic capacitance to reduce signal propagation delays and power consumption
- Goal of DRAM designers: create capacitor cells to retain stored information
- Integration processes are beginning to appear

Memory Management Unit (MMU) Duties of the MMU:
- Handles DRAM refresh, bus interface and arbitration
- Takes care of memory sharing among multiple processors
- Translates logical memory addresses from the processor to physical memory addresses of the DRAM
Modern CPUs often come with the MMU built in; single-purpose processors can also be used.

DMA Controller
Introduction
Direct Memory Access (DMA) allows devices to transfer data without subjecting the processor to a heavy overhead. Otherwise, the processor would have to copy each piece of data from the source to the destination. This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is generally slower than normal system RAM. During this time the processor would be unavailable for other tasks involving processor bus access, though it can continue with any work which does not require bus access.


DMA transfers are essential for high performance embedded systems where large chunks of data need to be transferred from the input/output devices to or from the primary memory.

DMA Controller
A DMA controller is a device, usually peripheral to a CPU that is programmed to perform a sequence of data transfers on behalf of the CPU. A DMA controller can directly access memory and is used to transfer data from one memory location to another, or from an I/O device to memory and vice versa. A DMA controller manages several DMA channels, each of which can be programmed to perform a sequence of these DMA transfers. Devices, usually I/O peripherals, that acquire data that must be read (or devices that must output data and be written to) signal the DMA controller to perform a DMA transfer by asserting a hardware DMA request (DRQ) signal. A DMA request signal for each channel is routed to the DMA controller. This signal is monitored and responded to in much the same way that a processor handles interrupts. When the DMA controller sees a DMA request, it responds by performing one or many data transfers from that I/O device into system memory or vice versa. Channels must be enabled by the processor for the DMA controller to respond to DMA requests. The number of transfers performed, transfer modes used, and memory locations accessed depends on how the DMA channel is programmed. A DMA controller typically shares the system memory and I/O bus with the CPU and has both bus master and slave capability. Fig. shows the DMA controller architecture and how the DMA controller interacts with the CPU. In bus master mode, the DMA controller acquires the system bus (address, data, and control lines) from the CPU to perform the DMA transfers. Because the CPU releases the system bus for the duration of the transfer, the process is sometimes referred to as cycle stealing. In bus slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's internal registers to set up DMA transfers. The internal registers consist of source and destination address registers and transfer count registers for each DMA channel, as well as control and status registers for initiating, monitoring, and sustaining the operation of the DMA controller.


DMA ARCHITECTURE


DMA Transfer Types and Modes


DMA controllers vary as to the type of DMA transfers and the number of DMA channels they support. The two types of DMA transfers are flyby DMA transfers and fetch-and-deposit DMA transfers. The three common transfer modes are single, block, and demand transfer modes. These DMA transfer types and modes are described in the following paragraphs. The fastest DMA transfer type is referred to as a single-cycle, singleaddress, or flyby transfer. In a flyby DMA transfer, a single bus operation is used to accomplish the transfer, with data read from the source and written to the destination simultaneously. In flyby operation, the device requesting service asserts a DMA request on the appropriate channel request line of the DMA controller. The DMA controller responds by gaining control of the system bus from the CPU and then issuing the preprogrammed memory address. Simultaneously, the DMA controller sends a DMA acknowledge signal to the requesting device. This signal alerts the requesting device to drive the data onto the system data bus or to latch the data from the system bus, depending on the direction of the transfer. In other words, a flyby DMA transfer looks like a memory read or write cycle with the DMA controller supplying the address and the I/O device reading or writing the data. Because flyby DMA transfers involve a single memory cycle per data transfer, these transfers are very efficient. Fig. shows the flyby DMA transfer signal protocol.

The second type of DMA transfer is referred to as a dual-cycle, dual-address, flow-through, or fetch-and-deposit DMA transfer. As these names imply, this type of transfer involves two memory or I/O cycles. The data being transferred is first read from the I/O device or memory into a temporary data register internal to the DMA controller. The data is then written to the memory or I/O device in the next cycle. FIG .shows the


fetch-and-deposit DMA transfer signal protocol. Although inefficient because the DMA controller performs two cycles and thus retains the system bus longer, this type of transfer is useful for interfacing devices with different data bus sizes. For example, a DMA controller can perform two 16-bit read operations from one location followed by a 32-bit write operation to another location. A DMA controller supporting this type of transfer has two address registers per channel (source address and destination address) and bus-size registers, in addition to the usual transfer count and control registers. Unlike the flyby operation, this type of DMA transfer is suitable for both memory-to-memory and I/O transfers.

Single, block, and demand are the most common transfer modes. Single transfer mode transfers one data value for each DMA request assertion. This mode is the slowest method of transfer because it requires the DMA controller to arbitrate for the system bus with each transfer. This arbitration is not a major problem on a lightly loaded bus, but it can lead to latency problems when multiple devices are using the bus. Block and demand transfer modes increase system throughput by allowing the DMA controller to perform multiple DMA transfers when the DMA controller has gained the bus. For block mode transfers, the DMA controller performs the entire DMA sequence as specified by the transfer count register at the fastest possible rate in response to a single DMA request from the I/O device. For demand mode transfers, the DMA controller performs DMA transfers at the fastest possible rate as long as the I/O device asserts its DMA request. When the I/O device unasserts this DMA request, transfers are held off.


DMA Controller Operation For each channel, the DMA controller saves the programmed address and count in the base registers and maintains copies of the information in the current address and current count registers, as shown in Fig.16.1. Each DMA channel is enabled and disabled via a DMA mask register. When DMA is started by writing to the base registers and enabling the DMA channel, the current registers are loaded from the base registers. With each DMA transfer, the value in the current address register is driven onto the address bus, and the current address register is automatically incremented or decremented. The current count register determines the number of transfers remaining and is automatically decremented after each transfer. When the value in the current count register goes from 0 to -1, a terminal count (TC) signal is generated, which signifies the completion of the DMA transfer sequence. This termination event is referred to as reaching terminal count. DMA controllers often generate a hardware TC pulse during the last cycle of a DMA transfer sequence. This signal can be monitored by the I/O devices participating in the DMA transfers. DMA controllers require reprogramming when a DMA channel reaches TC. Thus, DMA controllers require some CPU time, but far less than is required for the CPU to service device I/O interrupts. When a DMA channel reaches TC, the processor may need to reprogram the controller for additional DMA transfers. Some DMA controllers interrupt the processor whenever a channel terminates. DMA controllers also have mechanisms for automatically reprogramming a DMA channel when the DMA transfer sequence completes. These mechanisms include auto initialization and buffer chaining. The auto initialization feature repeats the DMA transfer sequence by reloading the DMA channel's current registers from the base registers at the end of a DMA sequence and re-enabling the channel. Buffer chaining is useful for transferring blocks of data into noncontiguous buffer areas or for handling double-buffered data acquisition. With buffer chaining, a channel interrupts the CPU and is programmed with the next address and count parameters while DMA transfers are being performed on the current buffer. Some DMA controllers minimize CPU intervention further by having a chain address register that points to a chain control table in memory. The DMA controller then loads its own channel parameters from memory. Generally, the more sophisticated the DMA controller, the less servicing the CPU has to perform. A DMA controller has one or more status registers that are read by the CPU to determine the state of each DMA channel. The status register typically indicates whether a DMA request is asserted on a channel and whether a channel has reached TC. Reading the status register often clears the terminal count information in the register, which leads to problems when multiple programs are trying to use different DMA channels. Steps in a Typical DMA cycle Device wishing to perform DMA asserts the processors bus request signal. 1. Processor completes the current bus cycle and then asserts the bus grant signal to the device. 2. The device then asserts the bus grant ack signal.

3. The processor senses the change in the state of the bus grant ack signal and starts listening to the data and address bus for DMA activity. 4. The DMA device performs the transfer from the source to the destination address. 5. During these transfers, the processor monitors the addresses on the bus and checks if any location modified during the DMA operations is cached in the processor. If the processor detects a cached address on the bus, it can take one of two actions: o the processor invalidates the internal cache entry for the address involved in the DMA write operation, or o the processor updates the internal cache when a DMA write is detected. 6. Once the DMA operations have been completed, the device releases the bus by asserting the bus release signal. 7. The processor acknowledges the bus release and resumes its bus cycles from the point it left off.
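To make the sequence concrete, here is a hedged sketch of how firmware might program a generic memory-mapped DMA channel for a block transfer. The register names, layout and bits below are invented for illustration; they do not describe the 82C37A discussed later or any real controller.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical register block of one DMA channel (all names/bits invented). */
struct dma_channel {
    volatile uint32_t src;      /* source address register           */
    volatile uint32_t dst;      /* destination address register      */
    volatile uint32_t count;    /* transfer count register           */
    volatile uint32_t ctrl;     /* control register: bit 0 = enable  */
    volatile uint32_t status;   /* status register: bit 0 = TC       */
};

#define DMA_ENABLE (1u << 0)
#define DMA_TC     (1u << 0)

/* Program a block transfer: the base values written here would be copied
 * into the controller's current registers when the channel is enabled. */
static void dma_start(struct dma_channel *ch,
                      uint32_t src, uint32_t dst, uint32_t words)
{
    ch->src   = src;
    ch->dst   = dst;
    ch->count = words;
    ch->ctrl  = DMA_ENABLE;
}

static void dma_wait(struct dma_channel *ch)
{
    /* Poll terminal count; a real driver could instead do non-bus work
     * here or take an end-of-transfer interrupt. */
    while (!(ch->status & DMA_TC))
        ;
}

int main(void)
{
    static struct dma_channel fake;            /* stand-in for a real device */

    dma_start(&fake, 0x20000000, 0x20010000, 256);
    fake.status |= DMA_TC;                     /* hardware would set TC here */
    dma_wait(&fake);
    puts("transfer complete (simulated)");
    return 0;
}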


Signal Description
VCC: the +5V power supply pin.
GND: Ground.

CLK: CLOCK INPUT: The Clock Input is used to generate the timing signals which control 82C37A operations. CS: CHIP SELECT: Chip Select is an active low input used to enable the controller onto the data bus for CPU communications. RESET: This is an active high input which clears the Command, Status, Request, and Temporary registers, the First/Last Flip-Flop, and the mode register counter. The Mask register is set to ignore requests. Following a Reset, the controller is in an idle cycle. READY: This signal can be used to extend the memory read and write pulses from the 82C37A to accommodate slow memories or I/O devices. HLDA: HOLD ACKNOWLEDGE: The active high Hold Acknowledge from the CPU indicates that it has relinquished control of the system busses. DREQ0-DREQ3: DMA REQUEST: The DMA Request (DREQ) lines are individual asynchronous channel request inputs used by peripheral circuits to obtain DMA service. In Fixed Priority, DREQ0 has the highest priority and DREQ3 has the lowest priority. A request is generated by activating the DREQ line of a channel. DACK will acknowledge the recognition of a DREQ signal. Polarity of DREQ is programmable. RESET initializes these lines to active high. DREQ must be maintained until the corresponding DACK goes active. DREQ will not be recognized while the clock is stopped. Unused DREQ inputs should be pulled High or Low (inactive) and the corresponding mask bit set. DB0-DB7: DATA BUS: The Data Bus lines are bidirectional three-state signals connected to the system data bus. The outputs are enabled in the Program condition during the I/O Read to output the contents of a register to the CPU. The outputs are disabled and the inputs are read during an I/O Write cycle when the CPU is programming the 82C37A control registers. During DMA cycles, the most significant 8-bits of the address are output onto the data bus to be strobed into an external latch by ADSTB. In memory-to-memory operations, data from the memory enters the 82C37A on the data bus during the read-from-memory transfer, then during the write-to-memory transfer, the data bus outputs write the data into the new memory location. IOR: READ: I/O Read is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to read the control registers. In the Active cycle, it is an output control signal used by the 82C37A to access data from the peripheral during a DMA Write transfer.

IOW: WRITE: I/O Write is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to load information into the 82C37A. In the Active cycle, it is an output control signal used by the 82C37A to load data to the peripheral during a DMA Read transfer. EOP: END OF PROCESS: End of Process (EOP) is an active low bidirectional signal. Information concerning the completion of DMA services is available at the bidirectional EOP pin. The 82C37A allows an external signal to terminate an active DMA service by pulling the EOP pin low. A pulse is generated by the 82C37A when terminal count (TC) for any channel is reached, except for channel 0 in memory-to-memory mode. During memory-to-memory transfers, EOP will be output when the TC for channel 1 occurs. The EOP pin is driven by an open drain transistor on-chip, and requires an external pull-up resistor to VCC. When an EOP pulse occurs, whether internally or externally generated, the 82C37A will terminate the service, and if auto-initialize is enabled, the base registers will be written to the current registers of that channel. The mask bit and TC bit in the status word will be set for the currently active channel by EOP unless the channel is programmed for autoinitialize. In that case, the mask bit remains clear. A0-A3: ADDRESS: The four least significant address lines are bidirectional three-state signals. In the Idle cycle, they are inputs and are used by the 82C37A to address the control register to be loaded or read. In the Active cycle, they are outputs and provide the lower 4-bits of the output address. A4-A7: ADDRESS: The four most significant address lines are three-state outputs and provide 4-bits of address. These lines are enabled only during the DMA service. HRQ: HOLD REQUEST: The Hold Request (HRQ) output is used to request control of the system bus. When a DREQ occurs and the corresponding mask bit is clear, or a software DMA request is made, the 82C37A issues HRQ. The HLDA signal then informs the controller when access to the system busses is permitted. For standalone operation where the 82C37A always controls the busses, HRQ may be tied to HLDA. This will result in one S0 state before the transfer. DACK0-DACK3: DMA ACKNOWLEDGE: DMA acknowledge is used to notify the individual peripherals when one has been granted a DMA cycle. The sense of these lines is programmable. RESET initializes them to active low. AEN: ADDRESS ENABLE: Address Enable enables the 8-bit latch containing the upper 8 address bits onto the system address bus. AEN can also be used to disable other system bus drivers during DMA transfers. AEN is active high. ADSTB: ADDRESS STROBE: This is an active high signal used to control latching of the upper address byte. It will drive directly the strobe input of external transparent octal latches, such as the 82C82. During block


operations, ADSTB will only be issued when the upper address byte must be updated, thus speeding operation through elimination of S1 states. ADSTB timing is referenced to the falling edge of the 82C37A clock. MEMR: MEMORY READ: The Memory Read signal is an active low three-state output used to access data from the selected memory location during a DMA Read or a memory-to-memory transfer. MEMW MEMORY WRITE: The Memory Write signal is an active low three-state output used to write data to the selected memory location during a DMA Write or a memory-to-memory transfer. NC: NO CONNECT: Pin 5 is open and should not be tested for continuity.

Functional Description The 82C37A direct memory access controller is designed to improve the data transfer rate in systems which must transfer data from an I/O device to memory, or move a block of memory to an I/O device. It will also perform memory-to-memory block moves, or fill a block of memory with data from a single location. Operating modes are provided to handle single byte transfers as well as discontinuous data streams, which allows the 82C37A to control data movement with software transparency. The DMA controller is a state-driven address and control signal generator, which permits data to be transferred directly from an I/O device to memory or vice versa without ever being stored in a temporary register. This can greatly increase the data transfer rate for sequential operations, compared with processor move or repeated string instructions. Memory-to-memory operations require temporary internal storage of the data byte between generation of the source and destination addresses, so memory-to-memory transfers take place at less than half the rate of I/O operations, but still much faster than with central processor techniques. The block diagram of the 82C37A is shown in Fig.16.6. The timing and control block, priority block, and internal registers are the main components. The timing and control block derives internal timing from clock input, and generates external control signals. The Priority Encoder block resolves priority contention between DMA channels requesting service simultaneously.

Memory Allocation To Program Segments and Blocks


Functions, processes, data and stacks at the various segments of memory: segment-wise memory allocation in four segments, namely Code, Data, Stack and Extra (for example, image, string).


Different Data Structures at the Various Memory Blocks
1) Stacks: return addresses on nested calls, sets of LIFO (Last In First Out) retrievable data, saved contexts of tasks
2) Arrays: one-dimensional or multidimensional
3) Queues: sets of FIFO (First In First Out) retrievable data; circular queue (example: a printer buffer); block queue (example: a network stack)
4) Table
5) Look-up table: the first column of a look-up-table row points to another memory block holding a data structure
6) List: in a list element, the data structure of an item also points to the next item
7) Process Control Block


Fig.: Different structures of stacks at memory blocks. Each stack pointer points to the top of the stack, where the processor can read and write. A data word is always retrieved in LIFO mode from a stack.

Below fig.: (a) An array at a memory block with one pointer for its base, the first element having index = 0; a data word can be retrieved from any element by defining the pointer index. (b) A queue at a memory block with two pointers pointing to its two ends, the front and the back; a data word is always retrieved in FIFO mode from a queue. (c) A circular queue at a memory block with two pointers pointing to the front and the back. (d) A memory block for a pipe, with the front and back pointers used by two different tasks.
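A minimal sketch of the circular queue in (c), with two indices for front and back, as it might be used for a printer buffer; the buffer size and names are my own choices for illustration.

#include <stdio.h>

#define QSIZE 8                     /* capacity assumed for illustration */

/* Circular queue at a memory block with two pointers (front and back).
 * Data is always retrieved in FIFO order. */
static char buf[QSIZE];
static int front, back, count;

static int enqueue(char c)
{
    if (count == QSIZE) return -1;            /* full  */
    buf[back] = c;
    back = (back + 1) % QSIZE;                /* wrap around the block */
    count++;
    return 0;
}

static int dequeue(char *c)
{
    if (count == 0) return -1;                /* empty */
    *c = buf[front];
    front = (front + 1) % QSIZE;
    count--;
    return 0;
}

int main(void)
{
    char c;
    for (const char *s = "PRINT"; *s; s++)
        enqueue(*s);                          /* producer (e.g. a task)      */
    while (dequeue(&c) == 0)
        putchar(c);                           /* consumer (e.g. printer ISR) */
    putchar('\n');
    return 0;
}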


MEMORY MAP A map showing the program and data allocation of addresses to ROM, RAM, EEPROM or Flash in the system.

PRINCETON ARCHITECTURE 80x86 processors and the ARM7 have Princeton architecture for main memory (8051-family microcontrollers have Harvard architecture). Vectors and pointers, variables, program segments and memory blocks for data and stacks have different addresses in the program in Princeton memory architecture.

HARVARD ARCHITECTURE The address spaces for data and for program are distinct. This suits handling streams of data that are required to be accessed in single-instruction multiple-data type instructions and DSP instructions. Separate data buses ensure simultaneous accesses for instructions and data. Program segments and memory blocks for data and stacks have separate sets of addresses in Harvard architecture. Control signals and read-write instructions are also separate for accessing the program memory and the data memory.


Harvard and Princeton Memory Organizations


Memory map for an exemplary embedded system, smart card needing 2 kB memory

Memory map for an exemplary Java embedded card with software for encrypting and deciphering the transactions


Memory map sections in a smart card

Memory map sections in another smart card


INTERFACING PROCESSOR, MEMORIES AND I/O DEVICES


REAL WORLD INTERFACING

Interfacing Using System Bus

Interfacing of processor, memory and IO devices using memory system bus


System bus interconnections for a simple bus structure have three sets of signals: the system bus is defined by the address bus, data bus and control bus. A system-bus interfacing design is done according to the timing diagrams of the processor signals, the speed, and the word lengths for instructions and data.

Processor internal bus(es) and external bus(es); their characteristics differ within the system.

Interconnections for a simple bus structure

Address Bus
The processor issues the address of the instruction byte or word to the memory system through the address bus. The processor execution unit, when required, issues the address of the data (byte or word) to be read or written in the memory system through the address bus. A 32-bit address bus fetches the instruction or data from an address specified by a 32-bit number.


EXAMPLE Let a processor, on reset, set the program counter to address 0. The processor then issues address 0 on the address bus, and the instruction at address 0 is fetched from memory on reset. Let a processor instruction need to load register r1 from memory address M. The processor issues address M on the address bus, and the data at address M is fetched.

Data Bus
Instruction fetch: the processor issues the address of the instruction and gets back the instruction through the data bus. Data read: when it issues the address of the data, it loads the data through the data bus. Data write: when it issues the address of the data, it stores the data in memory through the data bus. A 32-bit data bus fetches, loads or stores a 32-bit instruction or data word.

EXAMPLE When the processor issues address m for an instruction, it fetches the instruction through the data bus from address m. [For a 32-bit instruction, the word on the data bus comes from addresses m, m + 1, m + 2 and m + 3.] When an instruction executes to store the bits of register r1 at memory address M, the processor issues address M on the address bus and sends the data for address M through the data bus. [For 32-bit data, the word on the data bus is sent to memory addresses M, M + 1, M + 2 and M + 3.]
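The byte-address arithmetic in this example can be made concrete with a small C sketch: a 32-bit word is assembled from the four consecutive byte addresses m, m + 1, m + 2 and m + 3. The little-endian byte order is an assumption for illustration.

```c
#include <stdint.h>

/* Read one 32-bit word whose four bytes sit at addresses m..m+3.
   Assumes a little-endian layout; a big-endian processor would
   assemble the bytes in the opposite order. */
uint32_t read_word32(const volatile uint8_t *m)
{
    return  (uint32_t)m[0]
          | ((uint32_t)m[1] << 8)
          | ((uint32_t)m[2] << 16)
          | ((uint32_t)m[3] << 24);
}
```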

Control Bus
Issues the signals that control the timing of the various actions during interconnection; these signals synchronize all the subsystems. Examples: address latch enable (ALE) [also called address strobe (AS) or address valid (ADV)]; memory read (RD) or write (WR); IO read (IORD) or write (IOWR); data valid (DAV); and other control signals as per the processor design.

Interrupts and DMA Control Signals


Interrupt acknowledge (INTA) [on a request for drawing the processor's attention to an event]. INT (interrupt) from an external device to the system. Hold acknowledge (HLDA) [on an external hold request, permitting use of the system buses].


HOLD: asserted when an external device sends a hold request for direct memory access (DMA). EXAMPLE When the processor issues the address, it also issues a memory-read control signal and waits for the data or instruction; the memory unit must place the instruction or data during the interval in which the memory-read signal is active (not yet deactivated by the processor). For a write, the processor issues the address on the address bus, then (after allowing sufficient setup time for all the address bits) places the data on the data bus, and then (after allowing sufficient setup time for all the data bits) issues the memory-write control signal to the memory; the memory unit must write (store) the data during the interval in which the memory-write signal is active (not yet deactivated by the processor).

Program-memory and data-memory accesses over multiplexed buses in Harvard architecture
The address and data buses are multiplexed. The control signal PSEN is active when accessing program memory through the address and data buses; the Read or Write control signal is active when accessing data memory through the same buses.

Time division multiplexed (TDM) address and data bits for the memories
In TDM there is a different set (channel) of signals in each time slot: address signals during one time slot and data-bus signals during another. An interfacing circuit demultiplexes the buses using a control signal in such systems: address latch enable (ALE) in the 8051, address strobe (AS) in the 68HC11 and address valid (ADV) in the 80196. ALE, AS or ADV demultiplexes the address and data buses to the devices. The interfacing circuit uses a latch and decoders: ALE latches the address, and PSEN is used for program-memory reads over the address/data buses. Each memory chip or port connected to the processor has a separate chip-select input from a decoder. A decoder is a circuit that takes the appropriate address-bus signals and control-circuit signals at its input and generates the corresponding CS (chip select) control signal for each device (memory and ports).


Interfacing circuit
Consists of latches, decoders and demultiplexers, designed as per the available control signals and the timing diagrams of the bus signals. The circuit connects all the units, processor, memory and IO devices, through the system buses. It is also called a glue circuit, as it joins the devices and memory with the system bus and processor. It can be designed using a GAL (generic array logic) device or an FPGA.
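The decoder inside such a glue circuit is normally described in a hardware description language for the GAL or FPGA, but its logic can be sketched in C: the upper address bits select exactly one chip-select line. The 64 KB address map used here (32 KB ROM, 16 KB RAM, ports in the top 4 KB) is purely an assumed example.

```c
#include <stdint.h>

/* Active-high chip selects for an assumed 64 KB address map:
   0x0000-0x7FFF ROM, 0x8000-0xBFFF RAM, 0xF000-0xFFFF ports. */
typedef struct {
    uint8_t cs_rom;
    uint8_t cs_ram;
    uint8_t cs_ports;
} chip_select_t;

chip_select_t decode(uint16_t addr)
{
    chip_select_t cs = { 0, 0, 0 };

    if (addr < 0x8000u)
        cs.cs_rom = 1;      /* A15 = 0          -> ROM   */
    else if (addr < 0xC000u)
        cs.cs_ram = 1;      /* A15 = 1, A14 = 0 -> RAM   */
    else if (addr >= 0xF000u)
        cs.cs_ports = 1;    /* top 4 KB         -> ports */

    return cs;              /* 0xC000-0xEFFF left unmapped */
}
```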


2. Interfacing Using System and IO Buses


System Bus and IO Bus: the system bus interconnects the processor with the memory systems and subsystems; another set of signals forms the I/O bus. The processor interfaces with the system bus at the first level and with the IO bus at the second level. Popular IO buses and wireless communication: a PCI bus interfaces to devices designed to meet the PCI standard, and USB interfaces to devices designed to meet the USB standard. Memory system bus and I/O bus interconnections in a bus structure.


3. Multilevel Buses

4. Addresses of Ports and Devices in Real World Interfacing


Device Control Register, Status Register, Receive Buffer, Transmit Buffer
Each I/O device is at a distinct address or set of addresses. Each device has three sets of registers: data buffer register(s), control register(s) and status register(s).

Device Addresses
Device control and status addresses and port addresses remain constant and are not re-locatable in a program, because the glue circuit (hardware) that accesses them is fixed during the circuit design.

There can be common addresses for input and output buffers, for example SBUF in 8051

Glue circuit for the processor, memory and devices


The processor, memory and devices are interfaced (glued) together using a programmable circuit such as a GAL or an FPGA. The circuit consists of address decoders, as per the memory and device addresses allocated, and the needed latches and multiplexers/demultiplexers.

Device Addresses
There may be common addresses for control and status bits. There can be a control bit that changes the function of a register at a device address.

Example
The addresses of a serial-line device's registers are fixed by the hardware configuration of the UART port-interface circuit in a system employing an 80x86 processor: 0x2F8 to 0x2FE at COM2 in an IBM PC (COM1 has its own separate set of addresses).

Feature of UART serial line device in PC


Two I/O data buffer registers (one for receiving and one for transmitting) at a common address, 0x2F8. The two bytes of the Divisor Latch are at the distinct addresses 0x2F8 (LSB) and 0x2F9 (MSB). Three control registers of the device are at three distinct addresses: 0x2FA, 0x2FB and 0x2FC. Three status registers of the device are at three distinct addresses: 0x2FA, 0x2FD and 0x2FE.
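A hedged sketch of how these fixed addresses are used in C: the register names below simply label the addresses listed above, and `inb()` stands for the platform's port-read primitive, which is assumed rather than defined here.

```c
#include <stdint.h>

/* Fixed COM2 UART addresses in a PC (from the map above). */
#define COM2_DATA    0x2F8u   /* RX buffer / TX buffer (also Divisor LSB) */
#define COM2_DIV_MSB 0x2F9u   /* Divisor Latch MSB                        */
#define COM2_LSR     0x2FDu   /* Line Status Register                     */

#define LSR_DATA_READY 0x01u  /* bit 0: a received byte is waiting        */

/* Platform port-read primitive: assumed to exist, not defined here. */
extern uint8_t inb(uint16_t port);

/* Busy-wait until the UART reports a received byte, then read it. */
uint8_t com2_getc(void)
{
    while ((inb(COM2_LSR) & LSR_DATA_READY) == 0u)
        ;                       /* poll the status register             */
    return inb(COM2_DATA);      /* reading clears the data-ready state  */
}
```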

Device Addresses
The processor accesses device registers and buffer registers at the addresses allocated to the ports and devices.


5. Memory Mapped IO to ports and Device


Memory mapped IO or device Access
Processor access to the device is as if to a memory address.

Interfacing Processor with Memory Mapped IO


No separate I/O address space exists for the ports and devices. Instructions as well as control signals for operations on bytes at memory, IO port and device addresses are the same; there are no separate input-output and memory load-store instructions. The arithmetic, logical and bit-manipulation instructions that are available for data in memory are also available for IO operations, which enables direct manipulation of data taken from, or stored at, an IO port or device. All of these instructions can be applied using an accumulator, any other register, or any other memory address to which the IO port byte is transferred before, during or after the arithmetic or logical operation.
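A minimal C sketch of this idea, with the port addresses and bit positions chosen purely as assumptions: a memory-mapped port is just a `volatile` memory location, so ordinary load/store and bit-manipulation instructions operate on it directly.

```c
#include <stdint.h>

/* Assumed memory-mapped addresses of an output port and a status port. */
#define PORT_DATA   (*(volatile uint8_t *)0x4000u)
#define PORT_STATUS (*(volatile uint8_t *)0x4001u)

void toggle_led(void)
{
    /* Same instructions as for any memory byte: read, XOR, write back. */
    PORT_DATA ^= 0x01u;            /* flip bit 0 of the output latch */
}

int device_busy(void)
{
    /* A plain bit test on the port byte; no special IN instruction needed. */
    return (PORT_STATUS & 0x80u) != 0u;
}
```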

Memory mapped IOs Example


In 8051-family microcontrollers, the device addresses used for processor accesses are not distinct from memory addresses and are accessed with the same set of instructions and the same control signals, RD and WR.

Processor and memory organization with I/O devices memory assignments in the 68HC11 (having memory mapped IO architecture) Memory and Port addresses in 68HC11


6. IO address Mapped IO port or Device Access


IO mapped IO
Processor access to a device is by distinct instructions and control signals. A memory address is accessed by load and store instructions, whereas an IO device address is accessed by a distinct set of instructions, OUT and IN, and a distinct set of control signals, IOWR and IORD.

IO mapped IOs Example


An 80x86 processor accesses external devices using addresses in a space that is distinct from memory.

Features of IO addresses mapped IOs


A separate I/O address space exists for the ports and devices. Instructions and control signals for operations on bytes at memory addresses and at IO port and device addresses are distinct. Advantage: simplicity.


IO device and port addresses are interfaced independently of memory, without considering the memory addresses assigned to software and data. The processor has separate input-output (read and write) instructions and separate memory load-store (read and write) instructions. The arithmetic, logical and bit-manipulation instructions available for data in memory can be applied by using an accumulator, to which the IO port byte is transferred before the arithmetic or logical operation.
Device addresses in an 80x86-based PC
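On an 80x86 the distinct port-address space is reached only through the IN and OUT instructions. A hedged GCC-style sketch (x86 only, normally privileged code; the port number 0x2F8 is reused from the earlier COM2 example):

```c
#include <stdint.h>

/* Write one byte to a port in the separate x86 I/O address space. */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Read one byte from a port in the I/O address space. */
static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}

void send_byte(uint8_t c)
{
    /* OUT instruction: an IOWR-type bus cycle, not a memory write. */
    outb(0x2F8u, c);
}
```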

7. Interrupts and IOs: Interrupt-Driven IO


Processor access to the device is by executing an ISR on a device interrupt, for example an interrupt on timer overflow, keyboard data ready, or transmit-data buffer empty. Used when the processor needs to perform prolonged data-transfer operations using an I/O device and wants to be able to do other work while waiting for the transfer operations to complete.


An IO device functions slowly compared to the processor.

Interrupt-driven IO can be used in such cases. The interrupt is the mechanism used by most processors to handle asynchronous events. An interrupt allows a device to request that the processor stop what it is currently doing and execute software, called an interrupt service routine, to process the device's request, much like a procedure call; here the call is initiated by an event at the external or internal device rather than by a program instruction running on the processor.

Keyboard Example

It takes about 10 ms to send the code for a key, and at most 10 keys can be pressed in 1 s. When a key-input event occurs is not fixed, and the intervals between two successive key-input events are not fixed. In interrupt-driven mode, when a key is pressed, an interrupt signal RxRDY (receiver data ready) to the processing unit causes the execution of a service routine, and the service-routine program reads the byte for the key code.

Keyboard interrupt
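A sketch of the interrupt-driven keyboard read on an 8051-style controller, written in SDCC syntax (`__interrupt(4)` is the serial-port vector; Keil C51 would write `interrupt 4`). The SFR declarations and the ring-buffer handling are illustrative assumptions.

```c
#include <stdint.h>

/* 8051 special-function registers, declared in SDCC syntax
   (a real project would normally include the compiler's 8051 header). */
__sfr  __at (0x99) SBUF;     /* serial data buffer             */
__sbit __at (0x98) RI;       /* SCON.0: receive-interrupt flag */

#define KEY_BUF_LEN 16
static volatile uint8_t key_buf[KEY_BUF_LEN];
static volatile uint8_t key_head;

/* Serial-port ISR (vector 4): runs when RI/RxRDY signals a received code. */
void keyboard_isr(void) __interrupt(4)
{
    if (RI) {
        key_buf[key_head] = SBUF;                        /* read the key code */
        key_head = (uint8_t)((key_head + 1u) % KEY_BUF_LEN);
        RI = 0;                                          /* clear the flag    */
    }
}
```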

Printer example
A maximum of 300 characters can be printed in 1 s, i.e. about 3.3 ms to print each code sent to the output by a port. When a print operation completes for a character is not fixed.


The intervals between two successive character-print completions are not fixed. In interrupt-driven mode, when a print action completes, an interrupt signal TxDE (transmit data empty) to the printer processing unit (print controller) causes the execution of a service routine, and the service routine sends another byte as output.

Printer interrupt

Bus Arbitration Mechanisms


1. Bus Sharing by Multiple Processors or Controllers: Bus Arbitration Requirement
Several processors and several single-purpose processors may share a bus,* but the bus can be granted to only one processor at an instance. *[A single-purpose processor is also called a controller. A controller can be part of a device or peripheral.]


System buses are shared between an IO processor and multiple controllers that have to access the bus, but only one of them can be granted bus-master status at any one instance.

Bus Arbitration Mechanism


System buses are shared between an IO processor and multiple controllers that have to access the bus, but only one of them can be granted bus-master status at any one instance. The bus master has access to the bus at that instance.

Bus arbitration process


A process by which the current bus master accesses the bus, then relinquishes control of the bus and passes it to another requesting processor unit. There are three methods of bus arbitration: the daisy-chain method, the independent bus request and grant method, and the polling method.


Daisy Chaining for Bus Sharing by Multiple Processors or Controllers: daisy-chain method
A centralized bus-arbitration process. Bus control passes from one bus master to the next one, then to the next, and so on; for example, from controller unit C0 to C1, then to C2, then to C3, and so on.

Sequence of Signals in the arbitration process


The bus-grant signal (BG), which functions like a token, is first sent to C0. If C0 does not need the bus, it passes BG to C1. A controller needing the bus raises a bus-request (BR) signal. A bus-busy (BUSY) signal is generated when that controller becomes the bus master.

Signals in the arbitration process: when the bus master no longer needs the bus, it deactivates BR, and the BUSY signal also deactivates. Another BG is then issued and passed from C0 down the chain of controllers in priority order, one by one [for example, COM2 to COM1 in an IBM PC].

Daisy method advantage


At each instance of bus access, the i-th controller gets higher priority to the bus than the (i + 1)-th. The priorities of the controllers and processors for being granted bus access (bus-master status) are fixed.


Independent request and grant method for Bus Sharing by Multiple Processors or Controllers

Independent bus request method


Each controller has a separate BR signal (BR0, BR1, ..., BRn) and a separate BG signal (BG0, BG1, ..., BGn). The i-th controller sends BRi (the i-th bus-request signal); when it receives BGi (the i-th bus-grant signal), it uses the bus, and the BUSY signal then activates. Any controller that finds BUSY active does not send its BR.

The advantage of the independent bus request method is that the i-th controller can be programmed to get the highest priority to the bus, and the priority of a controller can be changed dynamically by programming.

Polling method for Bus Sharing by Multiple Processors or Controllers


Polling the requesting device method: a poll-count value is sent to the controllers and is incremented. Assume there are 8 controllers; three poll-count signals p2, p1, p0 successively change through 000, 001, ..., 110, 111, 000, and so on. If, on count = i, a BR signal is received, the count increment stops and BG is sent.


BUSY then activates when that controller becomes the bus master. When BR deactivates, BG and BUSY also deactivate, and the count increment restarts.

The advantage of the polling method is that the controller next to the current bus master gets the highest priority to access the bus after the current bus master finishes its operations over the bus.
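The poll-counter logic can be sketched as a C simulation; the request flags and the 3-bit count are modelled as plain variables, which is an assumption, since in hardware they are the BR lines and the signals p2, p1, p0.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_CONTROLLERS 8

/* bus_request[i] is true while controller i asserts its BR line. */
static bool bus_request[NUM_CONTROLLERS];

/* One arbitration step: advance the 3-bit poll count from its current
   value until a requesting controller is found, then return its index
   (the controller that receives BG). Returns -1 if nobody is requesting. */
int poll_arbitrate(uint8_t *poll_count)
{
    for (int step = 0; step < NUM_CONTROLLERS; step++) {
        uint8_t i = (uint8_t)((*poll_count + step) % NUM_CONTROLLERS);
        if (bus_request[i]) {
            /* The count stops here; next time, polling resumes just after
               the new bus master, so its neighbour gets the highest
               priority, as described above. */
            *poll_count = (uint8_t)((i + 1u) % NUM_CONTROLLERS);
            return i;        /* BG sent to controller i, BUSY goes active */
        }
    }
    return -1;
}
```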

Interfacing examples with keyboard, displays, D/A and A/D Conversions

Keyboard
Two signals, KBINT and TxD, come from a keyboard controller. KBINT is the interrupt from the keyboard controller. TxD is the serial UART data output of the controller, connected to RxD at the serial interface (SI) of an 8051, or to a UART such as the Intel 8250 or the 16550, which includes a 16-byte buffer.

Debouncer: bounces are created on pressing a key, and each bounce creates a false pulse; the keyboard controller has a hardware debouncer to take care of the bouncing of a key. Scan clock: the keyboard controller has a counter driven by a scan clock, which continuously increments at a certain rate and scans each key to check whether it is in the pressed or released state.
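A software view of what the scan-clock and debouncer hardware does, given here as an assumed sketch: a key must be seen in the pressed state for several successive scans before the press is accepted. `read_key_raw()` is a hypothetical helper standing in for the row/column scan.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_KEYS       64   /* assumed size of the key matrix     */
#define DEBOUNCE_SCANS 4    /* consecutive equal samples required */

/* Hypothetical raw read of one key: true while the contact reads as
   pressed (bounces included); stands in for the row/column scan. */
extern bool read_key_raw(uint8_t key);

static uint8_t stable_count[NUM_KEYS];
static bool    reported[NUM_KEYS];

/* Called for each key on every scan-clock tick; returns true exactly
   once per debounced press, so the false pulses from bounce are ignored. */
bool key_pressed_debounced(uint8_t key)
{
    if (read_key_raw(key)) {
        if (stable_count[key] < DEBOUNCE_SCANS)
            stable_count[key]++;
        if (stable_count[key] == DEBOUNCE_SCANS && !reported[key]) {
            reported[key] = true;
            return true;
        }
    } else {
        stable_count[key] = 0;    /* released: re-arm for the next press */
        reported[key] = false;
    }
    return false;
}
```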


Keyboard Interface to the Serial Interface at a Microcontroller
Encoder: the keyboard output is used to address a ROM, and the ROM generates the ASCII code output for the pressed key. The code accounts for multiple keys pressed simultaneously; for example, if the Shift key is also pressed, the code for the upper-case character is generated.
TxD: the code bits are serially transferred as the TxD output.

LCD DISPLAY CONTROLLER


LCD Controller Interface: 3 bits for E, RS and R/W, plus 8 data outputs. One 8-bit port is used to output the data for display; another port is used for the 3 control bits.
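A hedged sketch of driving such an interface from two assumed memory-mapped ports (`LCD_DATA` for the 8 data lines, `LCD_CTRL` for E, RS and R/W); the port addresses, the bit positions, the timing helper and the HD44780-style clear command are all assumptions.

```c
#include <stdint.h>

/* Assumed memory-mapped ports wired to the LCD controller. */
#define LCD_DATA (*(volatile uint8_t *)0x8000u)  /* 8-bit data lines */
#define LCD_CTRL (*(volatile uint8_t *)0x8001u)  /* control lines    */

#define LCD_E  0x04u   /* enable strobe         */
#define LCD_RW 0x02u   /* 1 = read, 0 = write   */
#define LCD_RS 0x01u   /* 1 = data, 0 = command */

extern void delay_us(uint16_t us);   /* assumed timing helper */

/* Write one byte to the LCD; rs selects command (0) or character data (1). */
static void lcd_write(uint8_t byte, uint8_t rs)
{
    LCD_DATA = byte;
    LCD_CTRL = rs ? LCD_RS : 0u;          /* R/W = 0 for a write        */
    LCD_CTRL |= LCD_E;                    /* pulse E high ...           */
    delay_us(1);
    LCD_CTRL &= (uint8_t)~LCD_E;          /* ... then low to latch byte */
    delay_us(40);                         /* let the controller finish  */
}

void lcd_putc(char c) { lcd_write((uint8_t)c, 1u); }

/* Clear-display command (0x01); it needs a longer wait in practice. */
void lcd_clear(void)  { lcd_write(0x01u, 0u); }
```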


DAC
DAC using PWM and integrator
A DAC can be built from a PWM circuit and an integrator. The PWM is an internal device in a microcontroller; a pulse-width register (PWR) is programmed according to the required analog output.

PWM Functioning
A counter/timer device generates two internal interrupts: one on timer overflow and another after an interval proportional to the value in the PWR. On the first interrupt the output becomes 1, and on the second interrupt it becomes 0.

Integrator
Generates the analog output according to the ratio of the period for which the output = 1 (the period between the first and second interrupts) to the total period of the output pulses (the period between successive first interrupts).
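In other words, the integrated output approximates Vout ≈ Vref × t_high / T. A sketch of programming the pulse-width register for a wanted average voltage follows; the register address, the 8-bit resolution and the full-scale reference are assumptions, since these differ between microcontrollers.

```c
#include <stdint.h>

#define PWM_PERIOD_COUNTS 256u    /* assumed timer counts per PWM period        */
#define VREF_MILLIVOLTS   5000u   /* assumed full-scale output after integrator */

/* Assumed pulse-width register: the compare value at which output drops to 0. */
#define PWR (*(volatile uint8_t *)0x5000u)

/* Set the averaged (integrated) analog output in millivolts.
   Duty cycle = PWR / PWM_PERIOD_COUNTS, so Vout = Vref * PWR / 256. */
void dac_set_millivolts(uint16_t mv)
{
    uint32_t counts = ((uint32_t)mv * PWM_PERIOD_COUNTS) / VREF_MILLIVOLTS;
    if (counts > PWM_PERIOD_COUNTS - 1u)
        counts = PWM_PERIOD_COUNTS - 1u;   /* clamp to the register range */
    PWR = (uint8_t)counts;
}
```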


DAC using an external DAC chip

ADC: using an external ADC chip


A start-of-conversion pulse-generator circuit; a sample-and-hold amplifier circuit to hold the signal constant for the conversion period, together with a signal conditioner; voltage references (+ and -) to provide the reference for conversion of the analog input; and an n-bit ADC. A four- or eight-channel ADC is built into many microcontrollers, or an external ADC (for example, the ADC0808) is used; its interfacing is similar to that of the ports.
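A sketch of one conversion with an external ADC0808-style converter seen through assumed port addresses; the channel selection, start-of-conversion pulse and end-of-conversion poll follow the sequence described above, but every address and bit position here is an assumption.

```c
#include <stdint.h>

/* Assumed memory-mapped locations wired to the external ADC. */
#define ADC_CONTROL (*(volatile uint8_t *)0x6000u) /* channel select + START/OE */
#define ADC_STATUS  (*(volatile uint8_t *)0x6001u) /* bit 0 = end of conversion */
#define ADC_DATA    (*(volatile uint8_t *)0x6002u) /* 8-bit conversion result   */

#define ADC_START 0x08u
#define ADC_OE    0x10u

/* Convert one of the 8 analog channels and return the 8-bit result. */
uint8_t adc_read(uint8_t channel)
{
    ADC_CONTROL = (uint8_t)(channel & 0x07u);   /* select the channel         */
    ADC_CONTROL |= ADC_START;                   /* start-of-conversion pulse  */
    ADC_CONTROL &= (uint8_t)~ADC_START;

    while ((ADC_STATUS & 0x01u) == 0u)          /* wait for end of conversion */
        ;

    ADC_CONTROL |= ADC_OE;                      /* enable the output latches  */
    return ADC_DATA;                            /* read the converted value   */
}
```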

CASE STUDY

Automatic Washing machine Chocolate vending machine


Digital Camera and Voice recorder

Automatic Chocolate Vending Machine (ACVM)


ACVM
Coin-insertion slot. Keypad on the top of the machine. LCD display unit on the top of the machine: it displays menus, text entered into the ACVM and pictograms; welcome, thank-you and other messages; graphic interactions with the machine; and the time and date. A delivery slot for the chocolate and for the coins, if refunded. An Internet connection port so that the owner can check the status of ACVM sales remotely.

ACVM Hardware units


Microcontroller or ASIP (Application-Specific Instruction-set Processor); RAM for storing temporary variables and the stack; ROM for application codes and RTOS codes for scheduling the tasks; flash memory for storing user preferences, contact data, user address, user date of birth, user identification code and answers to FAQs; timer and interrupt controller; a TCP/IP port (Internet broadband connection) to the ACVM for remote control and for the owner to get ACVM status reports; ACVM-specific hardware; power supply.

ACVM Software components


Keypad input read; display; read coins; deliver chocolate; TCP/IP stack processing; TCP/IP stack communication.

Digital Camera


A typical Camera
4 M-pixel/6 M-pixel still images; clear visual display (ClearVid) CMOS sensor; 7 cm wide LCD photo display screen; enhanced imaging processor; double anti-blur solution and high-speed processing engine; 10X optical and 20X digital zoom. It records high-definition video clips and therefore has a speaker and microphone(s) for high-quality recorded sound, plus an audio/video out port for connecting to a TV/DVD player.

Arrangements
Keys on the camera. Shutter, lens and charge-coupled device (CCD) array sensors. A good-resolution, photo-quality LCD display unit: it displays text such as the image title, shooting date and time, and serial number; it displays messages; and it displays the GUI menu when the user interacts with the camera. A self-timer lamp for the flash.

Internal units
Internal flash memory to store the OS, the embedded software and a limited number of image files; a flash memory stick of 2 GB or more for large storage. Universal Serial Bus (USB), Bluetooth and serial COM ports for connecting it to a computer, mobile phone or printer. The LCD screen displays the frame view; saved images are displayed using the navigation keys. Light from the frame falls on the CCD array, which, through an ADC, transmits the bits for each pixel in each row of the frame, together with the bits of the dark-area pixels in each row used for offset correction of the CCD-signalled light intensities in that row. The CCD bits of each pixel in each row and column are offset-corrected by the CCD signal processor (CCDSP).
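A simplified sketch of the row-wise offset correction performed by the CCDSP: the average of the dark (optically masked) pixels in a row is subtracted from every active pixel in that row. The array sizes and the number of dark pixels are assumptions.

```c
#include <stdint.h>

#define ROW_PIXELS  640   /* assumed active pixels per row */
#define DARK_PIXELS 16    /* assumed masked pixels per row */

/* Offset-correct one CCD row in place: subtract the dark-pixel average. */
void ccd_correct_row(uint16_t row[ROW_PIXELS], const uint16_t dark[DARK_PIXELS])
{
    uint32_t sum = 0;
    for (int i = 0; i < DARK_PIXELS; i++)
        sum += dark[i];
    uint16_t offset = (uint16_t)(sum / DARK_PIXELS);

    for (int i = 0; i < ROW_PIXELS; i++)
        row[i] = (row[i] > offset) ? (uint16_t)(row[i] - offset) : 0;
}
```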

ASIP and Single purpose processors


For signal compression, a JPEG CODEC compresses each frame and saves it as one .jpg file. A DSP performs the compression using discrete cosine transformations (DCTs) and the decompression; DCT with Huffman coding is used for JPEG compression, and decompression by the inverse DCT is done before the DAC sends its input to the display unit through the pixel processor. A pixel processor handles, for example, image contrast, brightness, rotation, translation and colour adjustment.

Digital Camera Hardware units


Microcontroller or ASIP (Application-Specific Instruction-set Processor); multiple processors (CCDSP, DSP, pixel processor and others); RAM for storing temporary variables and the stack; ROM for application codes and RTOS codes for scheduling the tasks; timer; flash memory for storing user preferences, contact data, user address, user date of birth and user identification code; ADC, DAC and interrupt controller. The DAC gets its input from the pixel processor, which gets its inputs from the JPEG files for the saved images or directly from the CCDSP (through the pixel processor) for the frame currently in view. USB controller; direct memory access controller; LCD controller; battery and external charging circuit.

Digital Camera Software components


CCD signal processing for offset correction;


JPEG coding; JPEG decoding; pixel processing before display; memory and file systems; light, flash and display device drivers; LCD, USB and Bluetooth port device drivers for port operations, for display, printer and computer communication control.

Digital camera software components
