
Developing Software for Multicore ARM Platforms

Agenda
What do we mean by multicore?
Benefiting from SMP
  OS support
  Applications
  Existing software
Impact on other ARM technologies

What do we Mean by Multicore?


Heterogeneous multicore systems have existed for a long time:

Figure: Cortex-A8 (User Interface and Application), Mali-400 MP (3D graphics) and Cortex-M3 (Power Manager) connected to Memory through an Interconnect

Coherent Multicore Cluster


Homogeneous multicore cluster, as part of a heterogeneous system:

Figure: two Cortex-A9 CPUs joined by Coherency Logic (User Interface), Mali-400 MP (3D graphics) and Cortex-M3 (Power Manager), all connected through an Interconnect

Cortex family of MPCore processors:
  Cortex-A5 MPCore, Cortex-A9 MPCore and Cortex-A15 MPCore
  Cortex-R5 MPCore, Cortex-R7 MPCore

Coherency
ARM MPCore processors provide:

  Hardware maintained coherency between L1 data caches
  Broadcast of cache and TLB maintenance operations
  Inter-processor interrupt signalling using integrated interrupt controller
  Coherency with external un-cached masters using ACP

Figure: two Cortex-A series CPUs, each with I$ and D$, joined together with the ACP by Coherency Logic, which has an AXI connection to the memory system


SMP OS
A symmetric multi-processing (SMP) OS runs across multiple CPUs
Each CPU sees the same memory system
A task can be scheduled to any CPU
A multi-threaded task may run on several CPUs at once
The OS can hide much of the complexity from applications
Many widely used operating systems already support SMP on ARM

Figure: an Application with two Threads on top of an SMP OS that spans the CPUs of an MPCore Cluster

Multiple OS
It is also possible to run multiple different operating systems

For example, SMP Linux in parallel with a RTOS


Figure: an Application with two Threads on the SMP OS and a Task on the RTOS, each OS running on its own CPUs within one MPCore Cluster

The two operating systems perform different, but related tasks

User interface on SMP Linux, Modem stack on the RTOS

Applications
Applications can be written to take advantage of a multicore environment
  Work is split across multiple independent threads, which the OS can schedule to different CPUs
Functional blocks are serially dependent but temporally independent
  Each block runs as a separate thread, running on different CPUs

Figure: Analogue Video Sampling, Remove Inter-Frame Redundancy, Quantise Samples, Motion Compensation, Run-Length Compress and Buffer Store blocks, distributed across CPU0 to CPU3

(Simplified MPEG encoding functional block diagram)
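
The block-per-thread arrangement can be sketched with POSIX threads. This is a minimal illustration, not the presenter's code: the three stage functions, the single-slot `mailbox_t` type and the arithmetic standing in for real video processing are all hypothetical. Each stage runs as its own thread, so an SMP OS is free to place the stages on different CPUs while frames flow through them in order.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum { FRAMES = 8 };

/* Hypothetical single-slot mailbox passing one value between two stages. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int  value;
    bool full;    /* a value is waiting to be consumed */
    bool closed;  /* the producing stage has finished */
} mailbox_t;

static mailbox_t sampled, quantised;
static int stored[FRAMES];
static int stored_count;

static void mailbox_init(mailbox_t *m) {
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->changed, NULL);
    m->full = false;
    m->closed = false;
}

static void mailbox_put(mailbox_t *m, int v) {
    pthread_mutex_lock(&m->lock);
    while (m->full)                       /* wait for the consumer to drain */
        pthread_cond_wait(&m->changed, &m->lock);
    m->value = v;
    m->full = true;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

static void mailbox_close(mailbox_t *m) {
    pthread_mutex_lock(&m->lock);
    m->closed = true;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

/* Returns false once the mailbox is closed and empty. */
static bool mailbox_get(mailbox_t *m, int *out) {
    pthread_mutex_lock(&m->lock);
    while (!m->full && !m->closed)
        pthread_cond_wait(&m->changed, &m->lock);
    bool ok = m->full;
    if (ok) {
        *out = m->value;
        m->full = false;
        pthread_cond_broadcast(&m->changed);
    }
    pthread_mutex_unlock(&m->lock);
    return ok;
}

static void *sample_stage(void *unused) {        /* stands in for sampling */
    (void)unused;
    for (int frame = 0; frame < FRAMES; frame++)
        mailbox_put(&sampled, frame);
    mailbox_close(&sampled);
    return NULL;
}

static void *quantise_stage(void *unused) {      /* stands in for quantising */
    (void)unused;
    int v;
    while (mailbox_get(&sampled, &v))
        mailbox_put(&quantised, v * 2);
    mailbox_close(&quantised);
    return NULL;
}

static void *store_stage(void *unused) {         /* stands in for the buffer store */
    (void)unused;
    int v;
    stored_count = 0;
    while (stored_count < FRAMES && mailbox_get(&quantised, &v))
        stored[stored_count++] = v;
    return NULL;
}

/* Runs the three stages concurrently; returns frames stored, or -1 on error.
 * A sketch only: a failed pthread_create is not cleaned up. */
int run_pipeline(void) {
    void *(*stages[3])(void *) = { sample_stage, quantise_stage, store_stage };
    pthread_t threads[3];
    mailbox_init(&sampled);
    mailbox_init(&quantised);
    for (int i = 0; i < 3; i++)
        if (pthread_create(&threads[i], NULL, stages[i], NULL) != 0)
            return -1;
    for (int i = 0; i < 3; i++)
        pthread_join(threads[i], NULL);
    return stored_count;
}
```

The stages stay serially dependent (each frame passes through them in order) but different frames can be in different stages at the same time, which is where the extra CPUs help.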


Applications (cont)
Other examples of splitting across threads:
Single frame divided into multiple regions
  Each region handled by a different thread
Spawn a new thread per frame
  For example, rapid shot mode on a camera

Figure: an Image Processing Application with two Threads, and an Application with three Threads
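
The first idea here, dividing a single frame into regions and giving each region to a different thread, might look like the following sketch using POSIX threads. The frame size, the number of regions and the `invert_region` operation are assumptions chosen for illustration; any per-pixel operation whose regions do not overlap would fit the same shape.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

enum { WIDTH = 64, HEIGHT = 48, REGIONS = 4 };   /* illustrative sizes */

/* One horizontal band of the frame: rows [first_row, end_row). */
typedef struct {
    uint8_t *pixels;
    size_t   first_row;
    size_t   end_row;
} region_t;

/* Thread body: invert every pixel in one region. Regions do not overlap,
 * so the threads need no locking between them. */
static void *invert_region(void *arg)
{
    region_t *r = arg;
    for (size_t y = r->first_row; y < r->end_row; y++)
        for (size_t x = 0; x < WIDTH; x++)
            r->pixels[y * WIDTH + x] = (uint8_t)(255 - r->pixels[y * WIDTH + x]);
    return NULL;
}

/* Split the frame into REGIONS bands and process each on its own thread. */
void invert_frame(uint8_t *pixels)
{
    pthread_t threads[REGIONS];
    int       started[REGIONS];
    region_t  regions[REGIONS];
    size_t    rows = HEIGHT / REGIONS;

    for (int i = 0; i < REGIONS; i++) {
        regions[i].pixels    = pixels;
        regions[i].first_row = (size_t)i * rows;
        /* The last band absorbs any rows left over by the division. */
        regions[i].end_row   = (i == REGIONS - 1) ? (size_t)HEIGHT
                                                  : (size_t)(i + 1) * rows;
        started[i] = pthread_create(&threads[i], NULL,
                                    invert_region, &regions[i]) == 0;
        if (!started[i])
            invert_region(&regions[i]);   /* fall back to the calling thread */
    }
    for (int i = 0; i < REGIONS; i++)
        if (started[i])
            pthread_join(threads[i], NULL);
}
```

Where each thread actually runs is left to the OS scheduler; the application only exposes the parallelism.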


Barriers and Synchronization


The tasks being run on different cores will require synchronization

MPCore cluster
  CPU 0: Load application into memory, then signal CPU1 that new application can be run
  CPU 1: Wait for signal from CPU0, then run application

Synchronization code will normally include manual barriers

Data Synchronization Barrier (DSB) or Data Memory Barrier (DMB)
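
In portable C the same load-then-signal handshake can be written with C11 atomics rather than hand-placed barriers. The sketch below is an assumption-laden illustration: `application_image`, `ready` and the two thread functions are hypothetical names, and an integer stands in for the loaded application. On ARMv7 targets, compilers typically implement the release store and acquire load with DMB instructions, so the barrier is still there, just generated for you.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static int        application_image;  /* stands in for the loaded application */
static atomic_int ready;              /* flag: 0 = not loaded, 1 = loaded */

/* "CPU 0": load the application, then signal that it can be run. */
static void *loader_thread(void *unused)
{
    (void)unused;
    application_image = 42;                               /* the "load" */
    /* Release: the write above is visible before the flag reads as 1. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* "CPU 1": wait for the signal, then use what was loaded. */
static void *runner_thread(void *result)
{
    /* Acquire: once the flag reads 1, the loaded data is visible too. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        sched_yield();                                    /* polite spin */
    *(int *)result = application_image;                   /* the "run" */
    return NULL;
}

/* Runs both sides and returns what the runner observed, or -1 on error. */
int run_handshake(void)
{
    pthread_t loader, runner;
    int observed = -1;

    atomic_store(&ready, 0);
    if (pthread_create(&runner, NULL, runner_thread, &observed) != 0)
        return -1;
    if (pthread_create(&loader, NULL, loader_thread, NULL) != 0) {
        atomic_store(&ready, 1);      /* release the runner before bailing out */
        pthread_join(runner, NULL);
        return -1;
    }
    pthread_join(loader, NULL);
    pthread_join(runner, NULL);
    return observed;
}
```

This covers ordinary data only. Publishing newly written instructions also needs cache and branch predictor maintenance, which C11 atomics do not provide.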


Barriers in Action
  STR     r11, [r1]      ; Save instruction to program memory
  DCCMVAU r1             ; clean D-$ so instruction visible to I-$
  DSB                    ; ensure clean completes on all CPUs
  ICIMVAU r1             ; discard stale data from I-$
  BPIMVA  r1             ; and from Branch Predictor
  DSB                    ; ensure I-$/BP invalidates complete for all
  STR     r0, [r2]       ; set flag == 1 to signal completion
  ISB                    ; synchronize context on this processor
  MOV     pc, r1         ; branch to new code

P1-Pn
  WAIT    ([r2] == 1)    ; wait for flag signaling completion
                         ; no barrier required here
  ISB
  MOV     pc, r1         ; execute newly saved instruction


Single Threaded Tasks and SMP


Figure: Video Player, E-mail client and Browser applications, with their Threads placed by the SMP OS Scheduler on CPU 0, CPU 1 ... CPU<n>

Existing software may not be optimized for a multicore environment
  Single threaded applications
The SMP OS can schedule different applications to different CPUs
  Complexity of multicore environment hidden from the application by the OS


Case Study
Single threaded browser saw a 1.54x performance improvement when run on a dual-core system
  No code changes required in the browser
Improvement comes from the OS's ability to schedule the non-browser tasks to the other CPU

Figure: Browser Performance, 1 Core against 2 Cores (1.54x)

Dual core Cortex-A9 MPCore, running Android Froyo, 2.6.32 kernel, BBench2010_server

Figure: Available Compute Profile (labels: 1 Core, 2 Core, Idle, Core 2 Off)


Case Study (cont.)


Figure: Browser Performance with Streaming Web Radio Application
  1 Core (Browser only)              1.0x (baseline)
  1 Core (Browser & Web Radio)       0.78x
  2 Cores (Browser and Web Radio)    1.5x
  2 Cores (Browser only)             1.54x

Running on a single core, the browser performance fell (0.78x) when also listening to streaming audio
But browser performance increased when run on a dual-core system
Could also choose to maintain performance (1.0x), and lower frequency


Introducing TrustZone Technology


Figure: TrustZone software stack

             Normal                                  Secure
User         Application, Vendor Specific Library    Trusted Service(s)
Privileged   Operating System, TrustZone Driver      TEE
             Secure Monitor

What about multicore?



TrustZone in a Multicore System


Architecture allows for a full SMP OS in the Secure world
Design aim for TEE is simplicity, this aids certification

SMP support is normally not needed, and represents unnecessary complexity

TEE executes on one processor only


Figure: four CPUs; in the Normal world an SMP OS runs an Application on each CPU, while in the Secure world the TEE runs on one CPU only

Dedicated CPU
Alternative model is to dedicate one CPU to TrustZone

Only justified if making heavy use of trusted services


Figure: four CPUs; the SMP OS and its Application(s) use the other CPUs, and one CPU is reserved for the TEE


Large Physical Address


The Cortex-A15 processor introduces support for the Large Physical Address Extension (LPAE)
  Extension to ARMv7-A VMSA
  Provides 32-bit virtual address mapped to a greater than 32-bit physical address

Each application is limited to a 4GB virtual address space
  But OS has potentially more than 4GB of memory to work with

Becomes more important as the number of current processes increases

And the amount of memory consumed by those processes increases
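
The arithmetic behind these limits is easy to check. The sketch below assumes a 40-bit physical address, the width LPAE defines for ARMv7-A; the slide itself only says "greater than 32-bit". The helper name `addressable_bytes` is hypothetical.

```c
#include <stdint.h>

/* Bytes addressable with the given number of address bits (bits < 64). */
uint64_t addressable_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* How many full 32-bit virtual address spaces fit in the physical space. */
uint64_t virtual_spaces_in_physical(unsigned physical_bits)
{
    return addressable_bytes(physical_bits) / addressable_bytes(32);
}
```

With 32 bits the result is 4GB per application. Under the 40-bit assumption the OS can address 1TB of physical memory, enough to back 256 completely full 4GB virtual address spaces.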


Multiple Clusters
The Cortex-A15 MPCore processor, together with AMBA4
ACE, supports multiple coherent clusters

Figure: two clusters of two Cortex-A15 CPUs, each cluster with Coherency Logic in its L2 Cache, joined by a Coherent Interconnect

SMP OS can be extended across multiple clusters of CPUs
  Expands the number of CPUs available to scheduler

Any Questions?
