Basic Non-Pipelined CPU Architecture

[Figure: CPU architecture types]

[Figure: Categories of registers]

[Figure: Detailed datapath of a typical register-based CPU]


Typical CPU

The CPU is the brain of the computer: it controls everything and performs all computation. It can be broadly divided into two main sections:
- The datapath, which contains the registers and the ALU. This is the part that does the computation.
- The control logic, which is responsible for interpreting instructions. It controls the datapath and configures it to execute a given instruction.

The Datapath

The datapath is a collection of functional units that allow us to execute our instructions. Each functional unit has a specific purpose:
- The registers, used for storing data.
- The program counter (PC), used to bookmark the code.
- The instruction register (IR), used to store the current instruction.
- The ALU, used for executing arithmetic and logic operations.

Control

The function of the control logic is to configure the datapath to do the right thing: it makes sure that data goes to the right functional units and that results are put in the correct place. In particular, it:
- Programs the ALU to perform the right kind of operation on the data. The ALU can do add, sub, and, or, xor, etc., but it needs to be told which one to do.
- In the event of a branch or jump, changes the program counter by the correct amount.

Control is the most complex part of the processor:
- RISC control is reasonably straightforward, since there are not many possibilities.
- CISC control is formidably hard.
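To make "the ALU needs to be told which operation to do" concrete, here is a minimal Python sketch of a 32-bit ALU whose operation is selected by the control logic. The string operation names ("add", "sub", ...) are a hypothetical stand-in for the binary ALU control code a real datapath would use, and the zero output is the flag the ALU can report back (it is used later for branch comparisons).

```python
MASK32 = 0xFFFFFFFF  # keep every value to 32 bits, like a 32-bit datapath


def alu(op, a, b):
    """Perform the operation selected by the control logic on two 32-bit operands.

    Returns (result, zero), where zero is True when the result is 0.
    The op names are hypothetical stand-ins for real ALU control codes.
    """
    a &= MASK32
    b &= MASK32
    if op == "add":
        result = (a + b) & MASK32  # wraps around on overflow
    elif op == "sub":
        result = (a - b) & MASK32  # two's-complement wrap-around
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    else:
        raise ValueError(f"unsupported ALU operation: {op!r}")
    return result, result == 0


print(alu("add", 2, 3))  # (5, False)
print(alu("sub", 7, 7))  # (0, True)
```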

Building a datapath

The first step of the instruction cycle is to fetch the instruction:
- Send the contents of the PC to memory.
To fetch instructions we therefore need a program counter and a code memory that holds the instructions.

Instruction Fetch

To fetch an instruction, the contents of the program counter are sent to memory via the address bus, and the instruction is then passed to the instruction register via the data bus. The next step is to update the program counter:
- Once we have fetched an instruction, we increment the program counter so that it points to the next instruction (jumps and branches are dealt with later).
- MIPS is byte-addressable and each instruction is 32 bits (4 bytes) long, so we need to add 4 to the PC.
- To do this, we introduce an adder component.

[Figure: Incrementing PC]

This circuit fetches an instruction from memory and then increments the PC by 4 so that it points at the next instruction. The scheme is slightly different in CISC machines, which have variable-length instructions, so the PC cannot simply be advanced by a fixed amount.
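The fetch-and-increment circuit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any particular chip: memory is a plain `bytes` object, instruction words are stored big-endian, and instruction addresses must be word-aligned.

```python
def fetch(memory, pc):
    """Fetch the 32-bit instruction at byte address pc and return (instruction, next_pc).

    memory is a bytes-like object; big-endian byte order is an assumption here.
    """
    if pc % 4 != 0:
        raise ValueError(f"PC {pc:#x} is not word-aligned")
    if pc + 4 > len(memory):
        raise IndexError(f"PC {pc:#x} is outside the code memory")
    # The address bus carries pc; the data bus returns the 4 bytes of the instruction word.
    instruction = int.from_bytes(memory[pc:pc + 4], "big")
    # Byte-addressable memory and 4-byte instructions: the adder computes PC + 4.
    return instruction, pc + 4


code = bytes.fromhex("012A4020" "00000000")  # two 32-bit instruction words
instr, pc = fetch(code, 0)
print(hex(instr), pc)  # 0x12a4020 4
```

Returning the new PC instead of mutating a global keeps the "adder" step visible as a separate output of the fetch stage.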

Some more datapath components

There are some generic datapath components that are worth defining now, starting with the registers. The register file:
- Provides mechanisms for writing data into the registers and for reading data from the registers.
- Can read two registers at a time, which is needed for some instructions.
- Addresses the registers using 5-bit addresses encoded in the instructions.
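A register file with these properties can be sketched as a small Python class. A 5-bit address selects one of 2^5 = 32 registers. The `reg_write` flag is a hypothetical name for the control signal that enables writing. MIPS's hard-wired zero register is deliberately not modelled.

```python
class RegisterFile:
    """32 registers addressed by 5-bit numbers, with two read ports and one write port."""

    NUM_REGS = 2 ** 5  # a 5-bit address selects one of 32 registers

    def __init__(self):
        self.regs = [0] * self.NUM_REGS

    def read(self, addr1, addr2):
        """Read two registers at once (the two read ports)."""
        return self.regs[self._check(addr1)], self.regs[self._check(addr2)]

    def write(self, addr, value, reg_write=True):
        """Write value into register addr only when the write-enable signal is asserted."""
        if reg_write:
            self.regs[self._check(addr)] = value & 0xFFFFFFFF

    @staticmethod
    def _check(addr):
        if not 0 <= addr < RegisterFile.NUM_REGS:
            raise ValueError(f"register address {addr} does not fit in 5 bits")
        return addr


rf = RegisterFile()
rf.write(8, 5)
rf.write(9, 7)
print(rf.read(8, 9))  # (5, 7)
```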

ALU Instructions

Using the registers and the ALU, we can create a datapath for our ALU instructions. The 32 bits of an ALU instruction are partitioned into:
- 2 × 5 bits, which address the source registers.
- 1 × 5 bits, which address the destination register.
These fields are used to select which registers to read and write. The remaining 17 bits are used by the control unit to configure the datapath.

The ALU:
- Takes the contents of the source registers.
- Performs the selected operation.
- Puts the result back in the destination register.
The ALU's zero output is not needed yet.

[Figure: Datapath for ALU instructions]

Provided the ALU can do the necessary operations, this is all we need, since ALU instructions only need:
- Registers to get data from.
- Registers to put data into.
- Something to do the computation.
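The ALU-instruction datapath can be sketched in Python as follows: decode the register fields, read two registers, apply the operation chosen by control, and write the destination. The sketch assumes the standard MIPS R-format layout (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6), so opcode, shamt and funct make up the 17 control bits. It wraps on overflow like `addu`, whereas real MIPS `add` traps.

```python
import operator

MASK32 = 0xFFFFFFFF

# funct field values of a few MIPS R-type ALU instructions (their opcode field is 0).
FUNCT_TO_OP = {
    0x20: operator.add,   # add
    0x22: operator.sub,   # sub
    0x24: operator.and_,  # and
    0x25: operator.or_,   # or
    0x26: operator.xor,   # xor
}


def decode_r_type(instr):
    """Split a 32-bit R-format word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # 6 control bits
        "rs": (instr >> 21) & 0x1F,      # 5 bits: first source register
        "rt": (instr >> 16) & 0x1F,      # 5 bits: second source register
        "rd": (instr >> 11) & 0x1F,      # 5 bits: destination register
        "shamt": (instr >> 6) & 0x1F,    # 5 control bits (shift amount)
        "funct": instr & 0x3F,           # 6 control bits: selects the ALU operation
    }


def execute_r_type(instr, regs):
    """Execute one R-type ALU instruction on a list of 32 register values."""
    f = decode_r_type(instr)
    op = FUNCT_TO_OP[f["funct"]]         # control configures the ALU
    a, b = regs[f["rs"]], regs[f["rt"]]  # read two source registers
    regs[f["rd"]] = op(a, b) & MASK32    # write the result (wraps instead of trapping)


regs = [0] * 32
regs[9], regs[10] = 3, 4          # $t1 = 3, $t2 = 4
execute_r_type(0x012A4020, regs)  # add $t0, $t1, $t2
print(regs[8])                    # 7
```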

Load/Store Operations

Load/store operations are a bit more complex than ALU instructions. They also need the registers, and it turns out that they also need the ALU to calculate addresses. Load/store instructions encode the following information:
- The source or destination register, depending on whether it is a store or a load.
- The address of the base register used to compute the memory address.
- A 16-bit offset, which is added to the contents of the base register to give the memory address.

Sign Extension

The 16-bit address offset must be made the same size as the 32-bit base address so that the two can be added together. Sign extension takes the 16-bit offset and extends it to 32 bits by copying its sign bit into the upper 16 bits, so that the sign of the offset is preserved. A negative offset is then simply added, and we don't have to worry about whether to add or subtract the offset.
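Here is a minimal Python sketch of 16-to-32-bit sign extension. Values are treated as raw 32-bit patterns, as the hardware sees them. The usage lines show why no subtraction is needed: adding the extended pattern for -4 and keeping the low 32 bits gives the base minus 4.

```python
def sign_extend16(offset):
    """Sign-extend a 16-bit offset to a 32-bit two's-complement bit pattern."""
    offset &= 0xFFFF
    if offset & 0x8000:             # sign bit (bit 15) is 1: negative offset
        return offset | 0xFFFF0000  # copy the sign bit into bits 16-31
    return offset                   # non-negative: upper 16 bits stay 0


base = 0x1000
print(hex(sign_extend16(0xFFFC)))                        # 0xfffffffc, i.e. -4
print(hex((base + sign_extend16(0xFFFC)) & 0xFFFFFFFF))  # 0xffc = 0x1000 - 4
```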

Building the Load/Store Datapath

To perform a load/store operation we must:
- Select the source register (for a store) or the destination register (for a load), as specified in the instruction.
- Get the base address from a register specified in the instruction.
- Get the offset and add it to the base address (after sign extension).
- Read or write the appropriate memory address.
- In a load, store the data in the destination register.
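The steps above map onto a short Python sketch. The ALU adds the base register and the sign-extended offset, and the result addresses memory. For simplicity, memory is a hypothetical dictionary from address to 32-bit word rather than real byte-addressed storage. Register numbers follow the `lw`/`sw rt, offset(base)` form.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(offset):
    """Sign-extend a 16-bit offset to a 32-bit bit pattern."""
    offset &= 0xFFFF
    return offset | 0xFFFF0000 if offset & 0x8000 else offset


def effective_address(regs, base_reg, offset16):
    """ALU step: base register contents + sign-extended offset."""
    return (regs[base_reg] + sign_extend16(offset16)) & MASK32


def load_word(regs, memory, rt, base_reg, offset16):
    """lw rt, offset(base): read memory into the destination register."""
    regs[rt] = memory.get(effective_address(regs, base_reg, offset16), 0)  # unwritten words read as 0


def store_word(regs, memory, rt, base_reg, offset16):
    """sw rt, offset(base): write the source register into memory."""
    memory[effective_address(regs, base_reg, offset16)] = regs[rt]


regs = [0] * 32
memory = {}                              # hypothetical word-level memory: address -> word
regs[29] = 0x1000                        # base register
regs[8] = 42                             # value to store
store_word(regs, memory, 8, 29, 0xFFFC)  # sw $8, -4($29)
load_word(regs, memory, 9, 29, 0xFFFC)   # lw $9, -4($29)
print(regs[9])                           # 42
```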

Branches and Jumps

Both branches and jumps must achieve essentially the same result:
- They must change the contents of the PC.
- The datapath mechanisms are essentially the same.
- The control is different: branches depend on the result of a comparison, jumps do not.

Datapath for Branches

For a simple branch (we will use beq as our example) we must:
- Compare the contents of two registers specified in the instruction.
- If the contents of the registers are equal, add a fixed offset to the PC.
- Otherwise, set PC = PC + 4.
We can use the ALU to do the comparison, and the registers are obviously needed. We also need to add the branch offset to the PC; to do this we use a basic adder, as we did in the instruction fetch datapath.
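The beq decision can be sketched in Python as a next-PC computation. The ALU subtracts the two register values and checks the zero output, and a separate adder computes the branch target. The source only says "add a fixed offset". Here the MIPS convention is assumed: the 16-bit offset counts instructions, so it is shifted left by 2, and it is relative to PC + 4.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(offset):
    """Interpret a 16-bit field as a signed integer."""
    offset &= 0xFFFF
    return offset - 0x10000 if offset & 0x8000 else offset


def next_pc_beq(pc, rs_value, rt_value, offset16):
    """Return the PC after executing beq at address pc."""
    pc_plus_4 = (pc + 4) & MASK32                     # fetch-stage adder
    equal = ((rs_value - rt_value) & MASK32) == 0     # ALU subtract + zero output
    if equal:
        # Assumed MIPS convention: word offset (<< 2), relative to PC + 4.
        return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32  # branch adder
    return pc_plus_4


print(hex(next_pc_beq(0x100, 5, 5, 3)))  # 0x110 (taken)
print(hex(next_pc_beq(0x100, 5, 6, 3)))  # 0x104 (not taken)
```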

Fetch-Decode-Execute Cycle

An instruction cycle (also called the fetch-and-execute cycle, fetch-decode-execute cycle, or FDX) is the sequence of actions that the central processing unit (CPU) performs to execute each machine code instruction in a program; the term also denotes the time period during which the computer processes one such instruction from its memory. The name fetch-and-execute cycle is commonly used. The instruction must be fetched from main memory and then executed by the CPU. This is fundamentally how a computer operates: its CPU reads and executes a series of instructions written in its machine language, and all the functions of a computer familiar to the user arise from this.

Each computer's CPU can have different cycles based on different instruction sets. A typical cycle consists of the following steps.

1. Fetch the instruction from main memory. The CPU places the value of the program counter (PC) on the address bus. The CPU then fetches the instruction from main memory via the data bus into the memory data register (MDR). The value from the MDR is then placed into the current instruction register (CIR), a circuit that holds the instruction temporarily so that it can be decoded and executed.

2. Decode the instruction. The instruction decoder interprets the instruction. The instruction register (IR) holds the current instruction, while the program counter (PC) holds the address in memory of the next instruction to be executed.
   Fetch data from main memory: if the instruction has an indirect address, read the effective address from main memory; then fetch the data to be processed from main memory and place it into data registers.

3. Execute the instruction. The control unit decodes the instruction held in the instruction register and passes the decoded information as a sequence of control signals to the relevant functional units of the CPU. These perform the actions required by the instruction, such as reading values from registers, passing them to the arithmetic logic unit (ALU) to add them together, and writing the result back to a register. If the ALU is involved, it sends a condition signal back to the control unit.

4. Store results (also called write back). The result generated by the operation is stored in main memory or sent to an output device. Based on the condition feedback from the ALU, the PC is either incremented to address the next instruction or updated to a different address from which the next instruction will be fetched. The cycle is then repeated.

Fetch cycle
Steps 1 and 2 of the instruction cycle are called the fetch cycle. These steps are the same for every instruction. The fetch cycle processes the instruction word, which contains an opcode and an operand.

Execute cycle
Steps 3 and 4 of the instruction cycle are part of the execute cycle. These steps change with each instruction. The first step of the execute cycle is processor-memory (or processor-I/O) transfer: data is transferred between the CPU and memory or an I/O module. Next comes data processing, which applies arithmetic and logical operations to the data. Control alteration is the next step: it changes the sequence of operations, for example with a jump. The last step is a combination of the other steps.
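The cycle described above can be illustrated with a toy simulator. This is a sketch only. The accumulator machine and its instruction set (LOAD, ADD, STORE, JUMPZ, HALT) are hypothetical. Instructions and data share one word-addressed memory, so the PC advances by 1 here rather than by 4.

```python
# Hypothetical toy instruction set, for illustration only:
#   ("LOAD", a)   ACC <- mem[a]
#   ("ADD", a)    ACC <- ACC + mem[a]
#   ("STORE", a)  mem[a] <- ACC
#   ("JUMPZ", a)  if ACC == 0: PC <- a
#   ("HALT", None)


def run(memory):
    """Run the fetch-decode-execute cycle until HALT; return the final memory."""
    pc, acc = 0, 0
    while True:
        # 1. Fetch: PC -> MAR (address bus), memory -> MDR (data bus), MDR -> CIR.
        mar = pc
        mdr = memory[mar]
        cir = mdr
        pc += 1                  # PC now holds the address of the next instruction
        # 2. Decode: split the instruction word into opcode and operand.
        opcode, operand = cir
        # 3. Execute and 4. store results.
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JUMPZ":
            if acc == 0:         # condition feedback decides the next PC
                pc = operand
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


program = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", None), 2, 3, 0]
print(run(program)[6])  # 5
```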

[Figure: Microinstruction sequencing]

[Figure: Branch control logic with two address fields]

Implementation of Control Unit

The control unit is one of the two main components of the central processing unit (the other being the datapath). Its function is to fetch the instructions stored in memory, decode them, and execute them, using the arithmetic logic unit whenever required. The control unit is essential to the functioning of the computer: its outputs direct the activities of the entire device. The control unit can be regarded as a finite state machine (FSM), a model used in both hardware and software design.

Instruction Execution

The CPU executes a sequence of instructions. The execution of an instruction is organized as an instruction cycle: it is performed as a succession of several steps, and each step is executed as a set of several microoperations. The task performed by any microoperation falls into one of the following categories:
- Transfer data from one register to another.
- Transfer data from a register to an external interface (system bus).
- Transfer data from an external interface to a register.
- Perform an arithmetic or logic operation, using registers for input and output.

Microoperations and Control Signals

To allow the execution of a microoperation, one or several control signals have to be issued; they allow the corresponding data transfer and/or computation to be performed. Examples:
a) Transferring the content of register R0 to R1: R0out, R1in
b) Adding the content of Y to that of R0 (result in Z): R0out, Add, Zin
c) Reading a memory location whose address is in R3: R3out, MARin, Read

The CPU executes an instruction as a sequence of control steps, and in each control step one or several microoperations are executed. One clock pulse triggers the activities of one control step: for each clock pulse, the control unit generates the control signals corresponding to the microoperations to be executed in that control step.

Comments:

The first three control steps are identical for every instruction; they perform the instruction fetch and increment the PC. The following steps depend on the actual instruction (stored in the IR). If a control step issues a read, the value will be available in the MBR after one additional step. Several microoperations can be performed in the same control step if they don't conflict (for example, only one of them is allowed to output on the bus).
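The examples a) to c) and the "only one output on the bus" rule can be tried out on a toy single-bus datapath in Python. This is a sketch under simplifying assumptions. The register set is hypothetical and limited to the names used above. Z latches the ALU output (Y + bus) while every other register latches the bus. A read delivers its data to the MBR immediately instead of one step later.

```python
class SingleBusCPU:
    """Toy single-bus datapath; register and signal names follow the examples above."""

    def __init__(self):
        self.reg = {"R0": 0, "R1": 0, "R3": 0, "Y": 0, "Z": 0, "MAR": 0, "MBR": 0}
        self.memory = {}

    def step(self, signals):
        """Execute one control step, given the control signals asserted in it."""
        signals = set(signals)
        drivers = [r for r in self.reg if r + "out" in signals]
        if len(drivers) > 1:  # only one register may output on the bus per step
            raise ValueError(f"bus conflict: {drivers} all drive the bus")
        bus = self.reg[drivers[0]] if drivers else None
        # The ALU adds Y to whatever is on the bus when Add is asserted.
        alu_out = self.reg["Y"] + bus if "Add" in signals and bus is not None else None
        for r in self.reg:
            if r + "in" not in signals:
                continue
            value = alu_out if r == "Z" else bus  # Z latches the ALU output, others the bus
            if value is None:
                raise ValueError(f"{r}in asserted but nothing drives its input")
            self.reg[r] = value
        if "Read" in signals:
            # Simplification: data appears in MBR at once, not one step later.
            self.reg["MBR"] = self.memory.get(self.reg["MAR"], 0)


cpu = SingleBusCPU()
cpu.reg["R0"], cpu.reg["Y"], cpu.reg["R3"] = 5, 2, 100
cpu.memory[100] = 42
cpu.step(["R0out", "R1in"])           # a) R1 <- R0
cpu.step(["R0out", "Add", "Zin"])     # b) Z <- Y + R0
cpu.step(["R3out", "MARin", "Read"])  # c) MAR <- R3, then read memory
print(cpu.reg["R1"], cpu.reg["Z"], cpu.reg["MBR"])  # 5 7 42
```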

Control Unit

The basic task of the control unit:
- For each instruction, the control unit causes the CPU to go through a sequence of control steps.
- In each control step, the control unit issues a set of signals which cause the corresponding microoperations to be executed.

The control unit is driven by the processor clock. The signals to be generated at a certain moment depend on:
- The actual step to be executed.
- The condition and status flags of the processor.
- The actual instruction executed.
- External signals received on the system bus (e.g. interrupt signals).

Techniques for implementation of the control unit:
1. Hardwired control
2. Microprogrammed control

Hardwired Control

In this case, the control unit is a combinational circuit: it gets a set of inputs (from the IR, the flags, the clock, and the system bus) and transforms them into a set of control signals.

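Generation of signal Zin. Zin is needed in:
- the first step of all instructions (instruction fetch);
- step 5 of ADD with register addressing;
- step 5 of BR;
- step 6 of ADD with register-indirect addressing.

$$Z_{in} = T_1 + T_5 \cdot (\mathrm{ADD}_{\mathrm{reg}} + \mathrm{BR}) + T_6 \cdot \mathrm{ADD}_{\mathrm{reg\_ind}} + \dots$$

Generation of signal End. End is needed in:
- step 6 of ADD with register addressing;
- step 7 of ADD with register-indirect addressing;
- step 6 of BR.

$$\mathrm{End} = T_6 \cdot (\mathrm{ADD}_{\mathrm{reg}} + \mathrm{BR}) + T_7 \cdot \mathrm{ADD}_{\mathrm{reg\_ind}} + \dots$$

Here $T_i$ is the timing signal that is active during control step $i$; $\mathrm{ADD}_{\mathrm{reg}}$, $\mathrm{ADD}_{\mathrm{reg\_ind}}$ and $\mathrm{BR}$ are true when the IR holds the corresponding instruction; + denotes logical OR and · logical AND.

Hardwired control provides the highest speed, and RISCs are implemented with hardwired control. If the instruction set becomes very complex (as in CISCs), implementing hardwired control is very difficult; in this case, microprogrammed control units are used. To allow register-to-register operations to execute in a single clock cycle, RISCs (and other modern processors) use three-bus CPU structures (see the following slide).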
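The two equations translate directly into Boolean functions. This Python sketch covers only the terms listed above; the "+ …" terms for other instructions are not modelled. The step number stands in for the one-hot timing signals $T_i$, and the three flags stand in for the decoded instruction.

```python
def z_in(step, add_reg, br, add_reg_ind):
    """Zin = T1 + T5·(ADDreg + BR) + T6·ADDreg_ind (listed terms only)."""
    t1, t5, t6 = step == 1, step == 5, step == 6  # one-hot timing signals
    return t1 or (t5 and (add_reg or br)) or (t6 and add_reg_ind)


def end_signal(step, add_reg, br, add_reg_ind):
    """End = T6·(ADDreg + BR) + T7·ADDreg_ind (listed terms only)."""
    t6, t7 = step == 6, step == 7
    return (t6 and (add_reg or br)) or (t7 and add_reg_ind)


# Control steps in which Zin and End are asserted for ADD with register addressing:
print([s for s in range(1, 8) if z_in(s, True, False, False)])        # [1, 5]
print([s for s in range(1, 8) if end_signal(s, True, False, False)])  # [6]
```

This is the essence of hardwired control: each control signal is a fixed sum-of-products of timing signals and decoded-instruction signals, which is why it is fast but becomes unwieldy for a large instruction set.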

Microprogrammed Control

Control word (CW): a sequence of N_sig bits, where N_sig is the total number of control signals; each bit in a CW corresponds to one control signal. Each control step during the execution of an instruction defines a certain CW; it represents a combination of 1s and 0s corresponding to the active and inactive control signals.
Microroutine: a sequence of CWs corresponding to the control sequence of a machine instruction. An individual CW in a microroutine is called a microinstruction.

Microprogrammed control - basic idea: All microroutines corresponding to the machine instructions are stored in the control store. The control unit generates the sequence of control signals for a certain machine instruction by reading from the control store the CWs of the microroutine corresponding to the respective instruction. The control unit is implemented just like another very simple CPU, inside the CPU, executing microroutines stored in the control store.
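Control words and the control store can be sketched in Python with integers as bit vectors. The signal list is hypothetical (it covers only the signals used in this section, with "Clear Y" written as `ClearY`), so N_sig is 13 here. Bit i of a CW is 1 exactly when `SIGNALS[i]` is active. The fetch microroutine uses the three CWs from the conditional-branch example in this section.

```python
# Hypothetical signal list; N_sig = len(SIGNALS).
SIGNALS = ["PCout", "MARin", "Read", "ClearY", "Carry-in", "Add", "Zin",
           "Zout", "PCin", "MBRout", "IRin", "Yin", "IRout"]


def encode_cw(active):
    """Build a control word: bit i is 1 when SIGNALS[i] is active."""
    cw = 0
    for name in active:
        cw |= 1 << SIGNALS.index(name)
    return cw


def active_signals(cw):
    """Decode a control word back into the names of its active signals."""
    return [name for i, name in enumerate(SIGNALS) if (cw >> i) & 1]


# Control store: microroutines (lists of CWs), here just the shared fetch routine.
CONTROL_STORE = {
    "fetch": [
        encode_cw(["PCout", "MARin", "Read", "ClearY", "Carry-in", "Add", "Zin"]),
        encode_cw(["Zout", "PCin"]),
        encode_cw(["MBRout", "IRin"]),
    ],
}


def control_signals_for(routine):
    """Read a microroutine's CWs from the control store, one per control step."""
    return [active_signals(cw) for cw in CONTROL_STORE[routine]]


print(control_signals_for("fetch")[1])  # ['Zout', 'PCin']
```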

Microroutine Executed for Conditional Branch

Address    Microinstruction
A_fetch    PCout, MARin, Read, Clear Y, Carry-in, Add, Zin
+1         Zout, PCin
+2         MBRout, IRin
+3         end-fetch (this produces the jump to A_CB)
-------------
A_CB       branch to A_CB+2 if N set
+1         end
+2         PCout, Yin
+3         (displacement-field)IRout, Add, Zin
+4         Zout, PCin
+5         end

Besides CWs, the microroutines also contain branches, which have to be interpreted by the microprogrammed controller.

The sequencer controls the execution sequence of the microinstructions; it is, in effect, a small control unit within the control unit.
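A sequencer for the conditional-branch microroutine can be sketched in Python as a loop over a microprogram counter. The tuple encoding of microinstructions ("cw", "end_fetch", "branch_if_n", "end") and the opcode name "CB" are hypothetical. The sketch returns after one machine instruction, whereas a real controller would go back to A_fetch at "end".

```python
A_FETCH = 0
A_CB = 4

# Hypothetical encoding of the microroutine table, one entry per control-store address.
CONTROL_STORE = [
    ("cw", ("PCout", "MARin", "Read", "ClearY", "Carry-in", "Add", "Zin")),  # A_fetch
    ("cw", ("Zout", "PCin")),                                              # +1
    ("cw", ("MBRout", "IRin")),                                            # +2
    ("end_fetch",),                                                        # +3
    ("branch_if_n", A_CB + 2),                                             # A_CB
    ("end",),                                                              # +1
    ("cw", ("PCout", "Yin")),                                              # +2
    ("cw", ("IRout(displacement)", "Add", "Zin")),                         # +3
    ("cw", ("Zout", "PCin")),                                              # +4
    ("end",),                                                              # +5
]

# Start address of each machine instruction's microroutine (only CB here).
ROUTINE_START = {"CB": A_CB}


def run_instruction(opcode, n_flag):
    """Return the control words issued while executing one machine instruction."""
    issued = []
    upc = A_FETCH  # microprogram counter
    while True:
        micro = CONTROL_STORE[upc]
        kind = micro[0]
        if kind == "cw":
            issued.append(micro[1])     # issue the control signals of this step
            upc += 1
        elif kind == "end_fetch":
            upc = ROUTINE_START[opcode]  # jump to the routine for the fetched opcode
        elif kind == "branch_if_n":
            upc = micro[1] if n_flag else upc + 1
        elif kind == "end":
            return issued               # a real controller would restart at A_fetch


print(len(run_instruction("CB", n_flag=False)))  # 3: fetch only, branch not taken
print(len(run_instruction("CB", n_flag=True)))   # 6: fetch plus PC + displacement
```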

Enhancing Performance with Pipelining

One key performance metric of a CPU is its clock frequency. While the frequency by itself does not tell the whole story, it is nonetheless a very important indicator of CPU performance: all else being equal, the same CPU running at 800 MHz can perform up to twice as many tasks as when it runs at 400 MHz. But what does it mean to run at 400 MHz?

When we discussed the execution of an instruction, we broke it down into multiple steps. The amount of time it takes to finish all the steps limits how fast the CPU can run. In the simple non-pipelined design, one instruction finishes per cycle, so the longer it takes for all the steps to finish, the longer the cycle time. A clock speed of 400 MHz simply states that the CPU runs at a rate of 400 million cycles per second, so each cycle lasts 2.5 nanoseconds. A clock speed of 1.0 GHz means the CPU runs at a rate of 1 billion cycles per second, or alternatively, that each cycle lasts only 1 nanosecond. If we look at the earlier breakdown, is it possible to finish all five steps (fetch, decode, issue, execute, and write-back) in 1 nanosecond? For a typical design, the answer is no.

The key to increasing the clock frequency is a technique called pipelining. Pipelining breaks the execution of each instruction into multiple steps and overlaps the steps of successive instructions. As a result, the cycle time is limited not by the total time for all the steps, but by the longest step.

In fact, pipelining is something that comes naturally to people in other parts of life. Take the example of doing your laundry. Laundry can be broken down into 3 steps: washing, drying, and folding. If we assume that each of those 3 tasks takes 1 hour, then it takes somebody 3 hours to do one load of laundry, and another 3 hours to do the second load. The total time it takes to do two loads of laundry is therefore 6 hours.

However, if we overlap the tasks, we can finish both loads much faster. All we have to do is start the washer for the second load (blue) while the first load (yellow) is in the dryer. In fact, each additional load of laundry only takes an additional hour: notice how the pink load finishes 1 hour after the blue load. Our "cycle" time is now 1 hour instead of 3 hours, so once the pipeline is full we finish loads 3 times as fast, even though each individual load still takes 3 hours.
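The laundry arithmetic generalizes to any pipeline with equal-length stages. This small Python sketch compares total completion time with and without overlap. It assumes every stage takes the same time and no task ever waits for another.

```python
def total_time(n_tasks, n_stages, stage_time, pipelined):
    """Time to finish n_tasks when each task passes through n_stages equal stages."""
    if n_tasks == 0:
        return 0
    if pipelined:
        # The first task needs all stages; each later task finishes one stage later.
        return (n_stages + n_tasks - 1) * stage_time
    # Without overlap, each task takes n_stages * stage_time on its own.
    return n_tasks * n_stages * stage_time


for loads in (1, 2, 3, 10):
    print(loads, total_time(loads, 3, 1, False), total_time(loads, 3, 1, True))
```

For two loads this gives 6 hours sequentially versus 4 hours overlapped. For 10 loads it gives 30 versus 12 hours, approaching the 3× throughput gain as the number of loads grows.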

If we look at our CPU, we simply have to break the execution of instructions down into multiple steps, and we can use pipelining. From our earlier discussion, we have already broken the task of a CPU down into multiple steps: fetch, decode, issue, execute, and write-back. Each step is known as a stage, and a CPU broken down into 5 steps results in a 5-stage pipeline. Rather than the total sum of the 5 steps determining the cycle time (and therefore the clock frequency), the cycle time is determined only by the longest of the 5 steps. If breaking CPU execution into 5 steps gives up to 5 times the clock frequency (in the ideal case where all stages take equal time), you may ask whether breaking it into 10 steps gives 10 times the frequency. Theoretically it can, but in practice you quickly run into other issues, such as the fixed overhead of the registers between stages, that limit the maximum speedup you can achieve. Picking and balancing the pipeline length is a key design decision in CPU development.
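This Python sketch shows why stage balance and per-stage overhead matter. It computes the clock period and frequency for a non-pipelined design (sum of all steps) and a pipelined one (slowest stage plus register overhead). The stage delays and the 20 ps overhead are hypothetical numbers chosen for illustration, and the non-pipelined case ignores register overhead for simplicity.

```python
def cycle_time_ps(stage_delays_ps, register_overhead_ps=0, pipelined=True):
    """Clock period: sum of all steps without pipelining, slowest stage (+ overhead) with it."""
    if pipelined:
        return max(stage_delays_ps) + register_overhead_ps
    return sum(stage_delays_ps)


def clock_ghz(cycle_ps):
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000 / cycle_ps


stages = [200, 150, 250, 200, 100]  # hypothetical fetch, decode, issue, execute, write-back delays
for label, t in [("non-pipelined", cycle_time_ps(stages, pipelined=False)),
                 ("5-stage, no overhead", cycle_time_ps(stages)),
                 ("5-stage, 20 ps overhead", cycle_time_ps(stages, 20))]:
    print(f"{label}: {t} ps -> {clock_ghz(t):.2f} GHz")
```

With these numbers the period drops from 900 ps to 250 ps, a 3.6× gain rather than 5×, because the slowest stage (250 ps) is longer than the average (180 ps). Adding 20 ps of register overhead per stage lowers the gain further.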
