
INTRODUCTION

ARM is a RISC processor architecture. It is used in applications that need small size and high performance. Its simple architecture gives low power consumption.

ARM

System - On - Chip Architecture

TIMELINE (1/2)

1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: Acorn and Apple participation leads to the founding of Advanced RISC Machines (A.R.M.).
1991: ARM6, first embeddable RISC microprocessor.
1992-1994: Various companies use ARM (Sharp, Samsung), while in 1993 ARM7, the first multimedia microprocessor, is introduced.

TIMELINE (2/2)

1995: Introduction of Thumb and ARM8.
1996-2000: Alcatel, Hyundai, Philips and Sony use ARM, while in 1999 ARM cooperates with Ericsson for the development of Bluetooth.
2000-2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.


THE ARM ARCHITECTURE

GENERAL INFO (1/2)


AIM: Simple design

Load-store architecture
32-bit data bus
3 addressing modes


GENERAL INFO (2/2)


Simple architecture + Simple instruction set + Code density

Small size

Low power consumption


Registers

32-bit general-purpose registers.
7 modes of operation.
Each mode sees a different set of registers and has a different level of control over the CPSR.


ARM Programming Model


Registers usable in user mode:

r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 (PC), CPSR

Registers usable in system modes only:

fiq mode: r8_fiq r9_fiq r10_fiq r11_fiq r12_fiq r13_fiq r14_fiq, SPSR_fiq
svc mode: r13_svc r14_svc, SPSR_svc
abort mode: r13_abt r14_abt, SPSR_abt
irq mode: r13_irq r14_irq, SPSR_irq
undefined mode: r13_und r14_und, SPSR_und

CPSR
ARM CPSR format
Bits 31-28: N Z C V
Bits 27-8: unused
Bit 7: I
Bit 6: F
Bit 5: T
Bits 4-0: mode

N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions)
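The CPSR layout can be illustrated with a small decoder. This is only a sketch in Python: the function name `decode_cpsr` and the dictionary layout are made up for illustration, and the mode encodings in `MODE_NAMES` are the standard ARM values, which the slide itself does not list.

```python
# Mode field encodings (bits 4-0). These values come from the ARM
# architecture, not from the slide, so treat them as an added assumption.
MODE_NAMES = {
    0b10000: "user",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abort",
    0b11011: "undefined",
    0b11111: "system",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields shown in the format above."""
    return {
        "N": (cpsr >> 31) & 1,  # negative
        "Z": (cpsr >> 30) & 1,  # zero
        "C": (cpsr >> 29) & 1,  # carry
        "V": (cpsr >> 28) & 1,  # overflow
        "I": (cpsr >> 7) & 1,   # bit 7
        "F": (cpsr >> 6) & 1,   # bit 6
        "T": (cpsr >> 5) & 1,   # bit 5 (Thumb state)
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }


fields = decode_cpsr(0x600000D3)
print(fields)
```

For the sample value 0x600000D3 the decoder reports Z and C set, I and F set, T clear and svc mode.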



Memory Organization
Address bus: 32 bits
1 word = 32 bits

[Figure: byte-addressed memory drawn as 32-bit rows (bit 31 on the left, bit 0 on the right), holding word16, half-word14, half-word12, word8, byte6, half-word4, byte3, byte2, byte1 and byte0, each labelled with its byte address.]


Instruction Set

Three instruction types


Data processing
Data transfer
Control flow


Supervisor mode

In user mode, operations outside the user's privileges are handled by the operating system. Using supervisor calls, a user program enters supervisor mode and can perform system functions.


I/O System

ARM handles peripherals as memory mapped devices with interrupt support. Interrupts:

IRQ: normal interrupt
FIQ: fast interrupt


Exceptions

Exceptions:

Interrupts
Supervisor calls
Traps

When an exception takes place:

The value of the PC is copied to r14_exc.
The operating mode changes into the respective exception mode.
The PC takes the exception handler vector address.
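The three steps of exception entry (copy the PC to r14_exc, switch mode, load the PC with the vector address) can be modelled roughly as below. This is a simplified Python sketch: the `state` dictionary and `take_exception` are hypothetical names, the vector addresses and mode mapping are the standard ARM values rather than something stated on the slide, and the exact return-address offset stored in r14 is ignored.

```python
# Standard ARM exception vector addresses (an assumption added here).
VECTORS = {
    "reset": 0x00,
    "undefined": 0x04,
    "swi": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}

# Mode entered for each exception type.
EXCEPTION_MODE = {
    "reset": "svc",
    "undefined": "und",
    "swi": "svc",
    "prefetch_abort": "abt",
    "data_abort": "abt",
    "irq": "irq",
    "fiq": "fiq",
}


def take_exception(state, kind):
    """Return the processor state after taking an exception of the given kind."""
    mode = EXCEPTION_MODE[kind]
    new_state = dict(state)
    new_state["r14_" + mode] = state["pc"]  # 1. PC copied to r14_exc
    new_state["mode"] = mode                # 2. switch to the exception mode
    new_state["pc"] = VECTORS[kind]         # 3. PC takes the vector address
    return new_state


after = take_exception({"pc": 0x8000, "mode": "user"}, "irq")
print(after)
```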



THE ARM INSTRUCTION SET

Data Processing Instructions (1/2)

Arithmetic Operations
ADD r0, r1, r2 ; r0 := r1 + r2 and don't update flags
ADDS r0, r1, r2 ; r0 := r1 + r2 and update flags

Logical Operations
AND r0, r1, r2 ; r0:= r1 AND r2

Register Movement
MOV r0, r2

Comparison
CMP r1, r2

Data Processing Instructions (2/2)

Operands:

Immediate operands
ADD r3, r3, #1

Shifted register operands:


ADD r3, r2, r1, LSL #3

Miscellaneous data processing instructions:

Multiplication:
MUL r4, r3, r2


Data transfer instructions

Load and store instructions:


LDR r0, [r1]
STR r0, [r1]

Offset: LDR r0, [r1, #4]
Post indexed: LDR r0, [r1], #16
Auto indexed: LDR r0, [r1, #16]!

Multiple data transfers:


LDMIA r1, {r0,r2,r5}

Examples

PRE:

r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202

LDR r0, [r1, #4]!

POST:

r0 = 0x02020202
r1 = 0x00009004

Examples

PRE:

r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202

LDR r0, [r1, #4]

POST:

r0 = 0x02020202
r1 = 0x00009000

Examples

PRE:

r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202

LDR r0, [r1], #4

POST:

r0 = 0x01010101
r1 = 0x00009004
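The three LDR examples differ only in when the offset is applied and whether the base register is written back. A minimal Python model, assuming a word-addressed dictionary for memory and a hypothetical helper `ldr`, reproduces the PRE/POST values:

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR with three addressing modes.

    mode "offset": LDR rd, [rn, #offset]   (base unchanged)
    mode "pre":    LDR rd, [rn, #offset]!  (base updated before the access)
    mode "post":   LDR rd, [rn], #offset   (base updated after the access)
    """
    base = regs[rn]
    if mode == "offset":
        address, new_base = base + offset, base
    elif mode == "pre":
        address, new_base = base + offset, base + offset
    elif mode == "post":
        address, new_base = base, base + offset
    else:
        raise ValueError("unknown addressing mode: " + mode)
    result = dict(regs)
    result[rn] = new_base
    result[rd] = mem[address]
    return result


MEM = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
PRE = {"r0": 0x00000000, "r1": 0x00009000}

for mode in ("pre", "offset", "post"):
    post = ldr(PRE, MEM, "r0", "r1", 4, mode)
    print(mode, hex(post["r0"]), hex(post["r1"]))
```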

Examples
PRE:

mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010

LDMIA r0!, {r1-r3}

POST:

r0 = 0x0008001c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003


Examples
PRE:

mem32[0x8001c] = 0x04
mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010

LDMIB r0!, {r1-r3}

POST:

r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
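The difference between LDMIA (increment after) and LDMIB (increment before) can be checked with a short Python model. The helper `ldm` and the use of integer register numbers as dictionary keys are illustrative choices, not part of the slides.

```python
def ldm(regs, mem, rn, reglist, before, writeback=True):
    """Model LDMIA (before=False) and LDMIB (before=True).

    Registers are loaded in ascending register-number order from
    ascending addresses, 4 bytes apart.
    """
    result = dict(regs)
    address = result[rn]
    for reg in sorted(reglist):
        if before:
            address += 4          # increment before the transfer
        result[reg] = mem[address]
        if not before:
            address += 4          # increment after the transfer
    if writeback:                 # the "!" suffix on the base register
        result[rn] = address
    return result


MEM = {0x80010: 0x01, 0x80014: 0x02, 0x80018: 0x03, 0x8001C: 0x04}
PRE = {0: 0x00080010}

ia = ldm(PRE, MEM, 0, [1, 2, 3], before=False)
ib = ldm(PRE, MEM, 0, [1, 2, 3], before=True)
print({r: hex(v) for r, v in ia.items()})
print({r: hex(v) for r, v in ib.items()})
```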


Conditional execution
Instructions can be executed conditionally without branches.

CMP r2, r3 ; subtract and set flags
ADDGE r4, r5, r6 ; if r2 >= r3
SUBLT r4, r5, r6 ; else
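How CMP followed by GE and LT conditions behaves can be sketched in Python. The flag computation below follows the usual ARM definitions (GE is true when N equals V, LT when they differ); the function names are hypothetical and the model covers only this one instruction sequence.

```python
MASK = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by CMP a, b on 32-bit values."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                               # carry set means no borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1      # signed overflow
    return n, z, c, v


def run_sequence(r2, r3, r5, r6):
    """CMP r2, r3 / ADDGE r4, r5, r6 / SUBLT r4, r5, r6; returns r4."""
    n, _z, _c, v = cmp_flags(r2, r3)
    if n == v:                                    # GE: signed r2 >= r3
        return (r5 + r6) & MASK
    return (r5 - r6) & MASK                       # LT: signed r2 < r3


print(run_sequence(5, 3, 10, 4))
print(run_sequence(3, 5, 10, 4))
```

Exactly one of the two conditional instructions takes effect, so r4 is always written once and no branch is needed.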


Conditional execution mnemonics


Control flow instructions


Branch instruction: B label
Conditional branch: BNE label
Branch and Link: BL label

BL Loop ; call: the return address is saved in r14
Loop ; subroutine entry
MOV PC, r14 ; return to the caller



Example 1
        AREA    ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1              ; r0 = r0 + r1
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
        END                             ; Mark end of file


Example 2
        AREA    subrout, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
doadd   ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file

ARM ORGANIZATION AND IMPLEMENTATION

3 stage pipeline (ARM7, 80 MHz)

[Figure: ARM7 3-stage pipeline organization, showing the address register and incrementer, the register bank (with PC), the multiply register, the barrel shifter, the ALU, instruction decode and control, the data in and data out registers, the A, B and ALU buses, and the A[31:0] and D[31:0] ports.]

Pipeline stages: Fetch, Decode, Execute

Throughput: 1 instruction / cycle

5 stage pipeline (1/2)

Program execution time:

$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$

Ways to reduce $T_{prog}$:

Increase $f_{clk}$: logic simplification.
Reduce CPI: reduce the number of multicycle instructions.
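The execution-time relation (instruction count times CPI, divided by clock frequency) can be tried out numerically. In the Python sketch below the instruction count and CPI figures are invented purely for illustration; only the two clock frequencies (80 MHz and 150 MHz) are taken from the pipeline slides.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: T_prog = N_inst * CPI / f_clk."""
    return n_inst * cpi / f_clk_hz


# Hypothetical workload: one million instructions.
N_INST = 1_000_000

# Hypothetical CPI values; real figures depend on the instruction mix.
t_three_stage = program_time(N_INST, cpi=1.9, f_clk_hz=80e6)
t_five_stage = program_time(N_INST, cpi=1.5, f_clk_hz=150e6)

print(f"3-stage pipeline at 80 MHz:  {t_three_stage * 1e3:.2f} ms")
print(f"5-stage pipeline at 150 MHz: {t_five_stage * 1e3:.2f} ms")
```

Raising the clock frequency and lowering CPI both shorten the run time, and the two effects multiply.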


5 stage pipeline (ARM9, 150 MHz) (2/2)

Pipeline stages: Fetch, Decode, Execute, Buffer / Data, Write-back

ARM coprocessor interface

ARM supports up to 16 coprocessors, which can be software emulated.
Each coprocessor has up to 16 general-purpose registers.
ARM is a load and store architecture.
Coprocessors usually handle on chip functions, such as cache and memory management.

ARCHITECTURAL SUPPORT FOR HIGH LEVEL LANGUAGES

Floating-point accelerator (1/2)

For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating-point accelerator. FPA10 includes:

Coprocessor interface
Load / store unit
Register bank (8 registers, 80 bits each)
ALU (adder, multiplier, divider)


Floating-point accelerator (2/2)

[Figure: FPA10 organization, showing the data bus, pipeline control, instruction issuer, load/store unit, coprocessor handshake, coprocessor interface, register bank and the arithmetic unit (add, mult, div).]


APCS (1/2)

APCS (ARM Procedure Call Standard) is a set of rules concerning C procedure input and output.
Specific use of general-purpose registers (r0-r3: arguments, r4-r8: variables, r10: stack limit, etc.).
Procedure I/O:

        BL      Loop            ; call: return address saved in lr
Loop    MOV     pc, lr          ; return to the caller

APCS (2/2)
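C code

void f1(int a) { f2(a); }

[Figure: stack layout with offsets 16, 8, 4 and 0 relative to the stack pointer.]

Assembly code

f1      LDR     r0, [r13]           ; load argument a from the stack
        STR     r14, [r13, #-4]!    ; push the return address (full descending stack assumed)
        STR     r0, [r13, #-4]!     ; push a as the argument for f2
        BL      f2                  ; call f2
        ADD     r13, r13, #4        ; discard the argument
        LDR     r15, [r13], #4      ; pop the return address into the PC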

THUMB PROGRAMMERS MODEL

General information

Thumb objective: code density.
Thumb has a 16-bit instruction set.
A subset of the ARM instruction set is coded into a 16-bit space.
With appropriate use, great benefits can be achieved in terms of:

Power efficiency
Enhanced performance

Going in and out of Thumb mode

Using the BX instruction in ARM state, e.g. BX r0.

Instructions are assembled as 16-bit Thumb instructions with the appropriate directive (e.g. CODE16).
If r0[0] is 1, the T bit in the CPSR becomes 1 and the PC is set to the address obtained from the remaining bits of r0.
Using the BX instruction from Thumb state, we return to ARM state.
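The effect of BX on the T bit and the PC can be sketched in Python. The helper `bx` is a hypothetical name, and the model assumes the T bit sits at bit 5 of the CPSR, as in the CPSR format shown earlier; everything else about the processor state is ignored.

```python
T_BIT = 1 << 5  # Thumb state bit in the CPSR


def bx(cpsr, target):
    """Model BX Rm: bit 0 of the target selects the state, the rest is the address."""
    if target & 1:
        cpsr |= T_BIT       # bit 0 set: enter (or stay in) Thumb state
    else:
        cpsr &= ~T_BIT      # bit 0 clear: enter (or stay in) ARM state
    pc = target & ~1        # bit 0 is not part of the branch address
    return cpsr, pc


# ARM state, user mode (0x10); branch to Thumb code at 0x8008.
cpsr, pc = bx(0x10, 0x8008 + 1)
print(hex(cpsr), hex(pc))

# Back to ARM code at 0x8000.
cpsr, pc = bx(cpsr, 0x8000)
print(hex(cpsr), hex(pc))
```

This is why the ARM header in a mixed program loads the Thumb entry address plus 1 before executing BX.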

The Thumb programmers model

Thumb registers
Lo registers: r0 r1 r2 r3 r4 r5 r6 r7
Hi registers: r8 r9 r10 r11 r12 SP (r13) LR (r14) PC (r15)
CPSR

Shaded registers have restricted access.


ARM vs. Thumb (1/3)

Thumb

Code size about 70% of the equivalent ARM code.
Uses 40% more instructions.
45% faster code with 16-bit memory.
Requires about 30% less external memory.

ARM

40% faster code when coupled with a 32-bit memory.

ARM vs. Thumb (2/3)

If performance is critical:

ARM

If cost and power consumption are critical:

Thumb


ARM and Thumb interaction

A 32-bit ARM system can go into Thumb mode for specific routines, in order to meet power and memory constraints.
A 16-bit system can use an on chip 32-bit memory for ARM state routines, and a 16-bit off chip memory and Thumb code for the rest of the application.


Example 3
        AREA    ThumbSub, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
        CODE32                          ; Subsequent instructions are ARM
header  ADR     r0, start + 1           ; Processor starts in ARM state,
        BX      r0                      ; so small ARM code header used
                                        ; to call Thumb main program
        CODE16                          ; Subsequent instructions are Thumb
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0xAB                    ; Thumb semihosting SWI
doadd   ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file

Example 4
Implement the following pseudocode in ARM and Thumb assembly. Which is more efficient in terms of execution time and which in terms of code size?

If r1 > r2 then
    r3 = r4 + r5
    r6 = r4 - r5
Else
    r3 = r4 - r5
    r6 = r4 + r5


Example 5

Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged. Test it using 0xAD as input data
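Before writing the assembly, the expected result for the test input can be worked out with two bit masks. The Python sketch below is only a check of the arithmetic, not a solution to the exercise; the names `SET_MASK`, `CLEAR_MASK` and `transform` are illustrative. In ARM assembly the two steps would typically map onto an ORR and a BIC instruction.

```python
SET_MASK = 0b00111000    # bits 3 to 5
CLEAR_MASK = 0b00000111  # bits 0 to 2


def transform(value):
    """Set bits 3-5, clear bits 0-2, leave all other bits unchanged."""
    value |= SET_MASK                   # force bits 3-5 to 1
    value &= ~CLEAR_MASK & 0xFFFFFFFF   # force bits 0-2 to 0, keep 32 bits
    return value


result = transform(0xAD)
print(hex(result))  # 0xb8
```

With the input 0xAD (1010 1101) the result is 0xB8 (1011 1000): bits 6 and 7 are untouched, bits 3 to 5 are all ones and bits 0 to 2 are all zeros.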


ARCHITECTURAL SUPPORT FOR SYSTEM DEVELOPMENT

The ARM memory interface


A basic ARM memory system

AMBA (1/4)

Advanced Microcontroller Bus Architecture


Advanced High Performance Bus (AHB)
Advanced System Bus (ASB)
Advanced Peripheral Bus (APB)

AMBA objectives:

Technology independence
To encourage modular system design


AMBA (2/4)

A typical AMBA based system


AMBA (3/4)

AHB bus
Burst transaction
Split transaction
Data bus: 64-128 bit

[Figure: AHB bus structure with masters 1-3, an arbiter, a decoder and slaves 1-3, connected by address, write data and read data paths.]


AMBA (4/4)

AMBA Design Kit (ADK)

An environment that assists designers in developing AMBA-based components and SoC designs.


Signal Processing Support (1/2)

Piccolo DSP coprocessor. Various data memories for maximizing throughput.


Signal Processing Support (2/2)

[Figure: Piccolo organization, showing the Piccolo ALU and multiplier, decode and control, register bank, input buffer and output buffer, alongside the ARM7TDMI, the I cache and the AMBA interfaces to the AMBA bus.]

MEMORY HIERARCHY

Memory hierarchy
Memory type       Size              Speed
Registers         32 bit            A few nsec
On chip cache     8-32 kbytes       10 nsec
Off chip cache    100-200 kbytes    10-30 nsec
RAM               Mbytes            100 nsec

Moving down the table: larger size, lower speed.

On chip memory

Necessary for performance.
Some systems prefer on chip RAM to an on chip cache: it is simpler, cheaper and less power-hungry.


Cache types

Cache types:

Unified cache.
Separate instruction and data caches.

Performance is measured by the hit rate $h$ and the miss rate $(1 - h)$:

$t_{av} = h \, t_{cache} + (1 - h) \, t_{main}$

Compulsory miss: the first time an address is accessed.
Capacity miss: when the cache is full.
Conflict miss: two addresses compete for the same place in the cache.
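The average access time (hit rate times cache access time, plus miss rate times main memory access time) is easy to evaluate. The Python sketch below uses the 10 nsec cache and 100 nsec RAM figures from the memory hierarchy table; the hit rates are hypothetical values chosen for illustration.

```python
def average_access_time(hit_rate, t_cache, t_main):
    """t_av = h * t_cache + (1 - h) * t_main, in the units of the inputs."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return hit_rate * t_cache + (1.0 - hit_rate) * t_main


# Hypothetical hit rates; 10 nsec cache and 100 nsec main memory.
for h in (0.80, 0.90, 0.99):
    t_av = average_access_time(h, t_cache=10.0, t_main=100.0)
    print(f"h = {h:.2f}: t_av = {t_av:.1f} nsec")
```

Even a few percent of misses dominate the average, which is why hit rate is the main figure of merit for a cache.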

Replacement policy:

Least Recently Used (LRU)
Least Frequently Used (LFU)
Data prediction

Implementation:

Fully-associative
Direct-mapped
Set-associative

Direct mapped cache (1/2)

Each line of data is stored together with an address tag.


Direct mapped cache (2/2)

Each memory location has a specific place in the cache.
Tag and data can be accessed at the same time.
The tag RAM is smaller than the data RAM and has a shorter access time, allowing the comparison to complete before the data RAM access does.
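A direct-mapped lookup can be sketched in a few lines of Python. The class below is a toy model with hypothetical parameters (4 lines of 16 bytes); it tracks only tags, not data, and is meant to show how the index selects the single possible line and how two addresses with the same index evict each other.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that records only the tag held in each line."""

    def __init__(self, num_lines=4, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines   # the one place this line can go
        tag = line_number // self.num_lines    # identifies which line is there
        hit = self.tags[index] == tag
        self.tags[index] = tag
        return hit


cache = DirectMappedCache()
for address in (0x000, 0x004, 0x040, 0x000):
    outcome = "hit" if cache.access(address) else "miss"
    print(hex(address), outcome)
```

Addresses 0x000 and 0x040 map to the same index here, so the final access to 0x000 misses again even though the rest of the cache is empty: a conflict miss.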

2 way set associative cache. (1/3)

Set associative cache (2/3)

A set-associative cache holds several lines in each set; a cache with n lines per set is an n-way set-associative cache.
Two addresses that would compete for the same spot in a direct-mapped cache can be stored in different locations and accessed independently.


Set associative (3/3)

Set selection:

Random allocation
Least recently used (LRU)
Round robin (cyclic)


Fully associative (1/2)


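[Figure: fully associative cache organization, with the address driving a tag CAM and a data RAM, and a mux producing the hit signal and the data.]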

Write strategies

Write through
All write operations are passed to main memory

Write through with buffered write


Write operations are passed to main memory through the write buffer

Copy back (write back)


Write operations update only the cache.

Cache feature summary


Organizational feature | Options
Cache-MMU relationship | Physical cache | Virtual cache
Cache contents | Unified instruction and data cache | Separate instruction and data caches
Associativity | Direct-mapped (RAM-RAM) | Set-associative (RAM-RAM) | Fully associative (CAM-RAM)
Replacement strategy | Cyclic | Random | LRU
Write strategy | Write-through | Write-through with write buffer | Copy-back


Perfect cache performance


Cache form | Performance
No cache | 1
Instruction-only cache | 1.95
Instruction and data cache | 2.5
Data-only cache | 1.13


MMU (1/3)

Two memory management approaches:

Segmentation
Paging


MMU (2/3)

Segmented memory management:


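[Figure: segmented memory management. The segment selector of the logical address picks a base and limit from the segment descriptor table; the offset is compared against the limit (>?), giving an access fault if it is exceeded, and is otherwise added to the base to form the physical address.]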

MMU (3/3)

Paging memory management:


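[Figure: paging. The logical address is split into three fields: bits 31-22 index the page directory, bits 21-12 index the page table, and bits 11-0 select the data within the page frame.]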
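The two-level lookup implied by the 31-22 / 21-12 / 11-0 split can be sketched in Python. The nested-dictionary page tables and the function names are hypothetical; a real MMU walks tables held in memory and also checks access permissions, which this sketch leaves out.

```python
def split_logical_address(address):
    """Split a 32-bit logical address into (directory index, table index, offset)."""
    directory_index = (address >> 22) & 0x3FF  # bits 31-22
    table_index = (address >> 12) & 0x3FF      # bits 21-12
    offset = address & 0xFFF                   # bits 11-0
    return directory_index, table_index, offset


def translate(address, page_directory):
    """Walk a two-level table: directory -> page table -> page frame base."""
    directory_index, table_index, offset = split_logical_address(address)
    page_table = page_directory[directory_index]
    frame_base = page_table[table_index]
    return frame_base + offset


# Hypothetical mapping of one 4 Kbyte page.
PAGE_DIRECTORY = {0x048: {0x345: 0xABCDE000}}

print([hex(field) for field in split_logical_address(0x12345678)])
print(hex(translate(0x12345678, PAGE_DIRECTORY)))
```

A 12-bit offset gives 4 Kbyte pages, and each 10-bit index selects one of 1024 entries at its level.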


ARCHITECTURAL SUPPORT FOR OPERATING SYSTEMS


[Figure: example SoC built around an ARM1136JF core with ETM and a trace port analyser. A bus matrix with 8 AHBs (1. ARM Periph AHB, 2. ARM D Write AHB, 3. ARM D Read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB; 64-bit paths) connects the core, the VIC (PL192), the DMAC (PL080) and the CLCD controller (PL110, CLCD display) to the MPMC (PL176, SDRAM and DDR) and the SMC (PL093, static memory). AHB/APB bridges attach the peripherals: watchdog, timers and RTC (PL031), system control, UART (PL011, 2x UARTs), GPIO (PL061, 32 GPIO lines), SSP (PL022) and SCI (PL131, smart card, UICC compliant). External inputs: clock, reset and battery fail, 14 external interrupts, 8 external DMA requests.]

CP15

On chip coprocessor for MMU, cache, protection unit control. Control takes place through registers with instructions executed in supervisor mode.


Protection Unit

Simpler alternative to the MMU. Requires simpler software and hardware. Does not use translation tables, but 8 protection regions instead.


ARM DEVELOPER SUITE

ARMULATOR (1/2)

ARMulator: an emulator of various ARM processors.
It allows project development in C, C++ or assembly.
Together with the debugger, compilers and assembler, the entire set is called the ARM Developer Suite (ADS).

ARMULATOR (2/2)

Possible project options:


ARM and Thumb interworking
Mixing C, C++ and assembly
Code for ROM
Exception handlers

ARMULATOR TUTORIAL

CODEWARRIOR ENVIRONMENT
