
DSP Lecture 02

Chapter 2
TMS320C6000 Architectural Overview

Learning Objectives

Describe C6000 CPU architecture.


Introduce some basic instructions.
Describe the C6000 memory map.
Provide an overview of the peripherals.
Performance
Software Development

General DSP System Block Diagram


[Figure: block diagram showing the Central Processing Unit, the internal buses, the internal memory, the external memory and the peripherals.]

Implementation of Sum of Products (SOP)


It has been shown in Chapter 1 that the SOP is the key element of most DSP algorithms. So let's write the code for this algorithm and at the same time discover the C6000 architecture.

$$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$

Two basic operations are required for this algorithm:
(1) Multiplication
(2) Addition
Therefore two basic instructions are required.
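As a point of reference before moving to assembly, the sum of products can be sketched in C. This is only an illustrative sketch: the function name `sum_of_products`, the 16-bit operands and the 32-bit accumulator are assumptions, not part of the lecture.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: Y = a[0]*x[0] + ... + a[n-1]*x[n-1].
   16-bit inputs and a 32-bit accumulator are assumptions. */
int32_t sum_of_products(const int16_t *a, const int16_t *x, size_t n)
{
    int32_t y = 0;                      /* accumulator starts at zero */
    for (size_t i = 0; i < n; i++) {
        y += (int32_t)a[i] * x[i];      /* one multiply, then one add */
    }
    return y;
}
```

Each pass of the loop uses exactly the two basic operations named above, which is why the rest of the walk-through only needs a multiply and an add instruction.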

Implementation of Sum of Products (SOP)

So let's implement the SOP algorithm! The implementation in this module will be done in assembly.

Multiply (MPY)
$$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$

The multiplication of a1 by x1 is done in assembly by the following instruction:

    MPY         a1, x1, Y

This instruction is performed by a multiplier unit that is called .M.

Multiply (.M unit)


$$Y = \sum_{n=1}^{40} a_n x_n$$

The .M unit performs multiplications in hardware:

    MPY   .M    a1, x1, Y

Note: a 16-bit by 16-bit multiply provides a 32-bit result.
A 32-bit by 32-bit multiply provides a 64-bit result.
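The result widths in the note can be checked with a short C sketch. The function names `mpy16` and `mpy32` are hypothetical and only model the arithmetic, not the .M unit itself.

```c
#include <stdint.h>

/* 16-bit x 16-bit signed multiply: the product always fits in 32 bits. */
int32_t mpy16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* 32-bit x 32-bit signed multiply: widen first so all 64 result bits are kept. */
int64_t mpy32(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

The largest magnitude 16-bit product, (-32768) * (-32768), is 2^30, which still fits in a signed 32-bit register; the same argument with 32-bit inputs gives 2^62, which needs 64 bits.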

Addition (.?)
$$Y = \sum_{n=1}^{40} a_n x_n$$

    MPY   .M    a1, x1, prod
    ADD   .?    Y, prod, Y

Add (.L unit)


$$Y = \sum_{n=1}^{40} a_n x_n$$

    MPY   .M    a1, x1, prod
    ADD   .L    Y, prod, Y

RISC processors such as the C6000 use registers to hold the operands, so let's change this code.

Register File - A
$$Y = \sum_{n=1}^{40} a_n x_n$$

[Figure: Register File A (A0 to A15, 32 bits wide) holding a1, x1, prod and Y, with the .M and .L units.]

    MPY   .M    a1, x1, prod
    ADD   .L    Y, prod, Y

Let us correct this by replacing a, x, prod and Y by the registers as shown above.

Specifying Register Names


$$Y = \sum_{n=1}^{40} a_n x_n$$

[Figure: Register File A (A0 to A15, 32 bits wide) holding a1, x1, prod and Y, with the .M and .L units.]

    MPY   .M    A0, A1, A3
    ADD   .L    A4, A3, A4

The registers A0, A1, A3 and A4 contain the values to be used by the instructions.

Register File A contains 16 registers (A0 to A15), which are 32 bits wide.

Data loading
Q: How do we load the operands into the registers?

[Figure: Register File A (A0 to A15, 32 bits wide) holding a1, x1, prod and Y, with the .M and .L units.]

Load Unit .D
A: The operands are loaded into the registers by loading them from the memory using the .D unit.

[Figure: Register File A with the .M, .L and .D units, and the data memory.]

It is worth noting at this stage that the only way to access memory is through the .D unit.

Load Instruction
Q: Which instruction(s) can be used for loading operands from the memory to the registers?

Load Instructions (LDB, LDH, LDW, LDDW)

A: The load instructions.

Using the Load Instructions


Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address.

Also, the addresses are 32 bits wide.

[Figure: data memory drawn as 16-bit rows at addresses 00000000, 00000002, 00000004, 00000006, 00000008, ..., FFFFFFFF.]

Using the Load Instructions


The syntax for the load instruction is:

    LD    *Rn, Rm

where Rn is a register that contains the address of the operand to be loaded, and Rm is the destination register.

[Figure: data memory drawn as 16-bit rows at addresses 00000000 to 00000008, ..., FFFFFFFF, holding a1, x1, prod and Y.]

The question now is: how many bytes are going to be loaded into the destination register?

The answer is that it depends on the instruction you choose:

LDB: loads one byte (8-bit)
LDH: loads a half word (16-bit)
LDW: loads a word (32-bit)
LDDW: loads a double word (64-bit)

Note: LD on its own does not exist.

Using the Load Instructions


The syntax for the load instruction is:

    LD    *Rn, Rm

Example:

    address     data (16 bits per row: high byte, low byte)
    00000000    0xA   0xB
    00000002    0xC   0xD
    00000004    0x2   0x1
    00000006    0x4   0x3
    00000008    0x6   0x5
    0000000A    0x8   0x7
    ...
    FFFFFFFF

If we assume that A5 = 0x4 then:

(1) LDB   *A5, A7       ; gives A7 = 0x00000001
(2) LDH   *A5, A7       ; gives A7 = 0x00000201
(3) LDW   *A5, A7       ; gives A7 = 0x04030201
(4) LDDW  *A5, A7:A6    ; gives A7:A6 = 0x0807060504030201
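The four results above follow from reading 1, 2, 4 or 8 consecutive bytes starting at address 0x4, lowest address first. A minimal C sketch of that rule is shown here; the names `example_mem` and `load_little_endian` are hypothetical, little-endian byte order is assumed, and the result is zero-extended (the real LDB and LDH sign-extend, which makes no difference for the positive values in this example).

```c
#include <stdint.h>
#include <stddef.h>

/* Bytes 0x0..0xB of the example memory, lowest address first. */
static const uint8_t example_mem[12] = {
    0x0B, 0x0A, 0x0D, 0x0C,     /* addresses 0x0 .. 0x3 */
    0x01, 0x02, 0x03, 0x04,     /* addresses 0x4 .. 0x7 */
    0x05, 0x06, 0x07, 0x08      /* addresses 0x8 .. 0xB */
};

/* Read nbytes (1, 2, 4 or 8) starting at addr, least significant byte first. */
uint64_t load_little_endian(const uint8_t *mem, size_t addr, size_t nbytes)
{
    uint64_t value = 0;
    for (size_t i = 0; i < nbytes; i++) {
        value |= (uint64_t)mem[addr + i] << (8 * i);
    }
    return value;
}
```

Calling the helper with `addr = 4` and `nbytes` of 1, 2, 4 and 8 reproduces cases (1) to (4).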


Question:
If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

Loading the Pointer Rn


The instruction MVKL will allow a move of a 16-bit constant into a register as shown below:

    MVKL  .?    a, A5       (a is a constant or label)

How many bits represent a full address?
32 bits.

So why does the instruction not allow a 32-bit move?
All instructions are 32 bits wide (see instruction opcode), so a 32-bit constant cannot fit inside the opcode alongside the other instruction fields.

Loading the Pointer Rn


To solve this problem another instruction is available: MVKH, e.g.

    MVKH  .?    a, A5       (a is a constant or label)

MVKH places the upper half of a (ah) in the upper half of A5 and leaves the lower half of A5 unchanged.

Finally, to move the 32-bit address to a register we can use:

    MVKL        a, A5
    MVKH        a, A5

Loading the Pointer Rn


Always use MVKL then MVKH. Look at the following examples:

Example 1 (initially A5 = 0x87654321)

    MVKL        0x1234FABC, A5      ; A5 = 0xFFFFFABC (sign extension)
    MVKH        0x1234FABC, A5      ; A5 = 0x1234FABC ; OK

Example 2

    MVKH        0x1234FABC, A5      ; A5 = 0x12344321
    MVKL        0x1234FABC, A5      ; A5 = 0xFFFFFABC ; Wrong
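The ordering rule can be reproduced with a small C model of the two instructions. This is a sketch under the behaviour shown in the examples (MVKL sign-extends its 16-bit constant over the whole register, MVKH replaces only the upper half); the function names `mvkl` and `mvkh` are hypothetical.

```c
#include <stdint.h>

/* Model of MVKL: write the low 16 bits of k, sign-extended to 32 bits. */
uint32_t mvkl(uint32_t k)
{
    uint32_t low = k & 0xFFFFu;
    return (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
}

/* Model of MVKH: replace the upper 16 bits of reg with those of k. */
uint32_t mvkh(uint32_t k, uint32_t reg)
{
    return (k & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}
```

`mvkh(k, mvkl(k))` always returns `k`, whereas applying `mvkl` last overwrites the upper half that `mvkh` had just set.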

LDH, MVKL and MVKH


[Figure: Register File A (A0 = a, A1 = x, A3 = prod, A4 = Y; registers up to A15, 32 bits wide) with the .M, .L and .D units and the data memory.]

            MVKL        pt1, A5
            MVKH        pt1, A5
            MVKL        pt2, A6
            MVKH        pt2, A6
            LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4

pt1 and pt2 point to some locations in the data memory.

Creating a loop
So far we have implemented the SOP for one tap only, i.e. Y = a1 * x1.
So let's create a loop so that we can implement the SOP for N taps.

With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.

What are the steps for creating a loop?

1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.
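The five steps map one-to-one onto a C loop written with a label and `goto`. This is only an illustrative sketch: the function `run_loop` is hypothetical, the body is reduced to counting iterations, and `count` is assumed to be at least 1 because the test sits at the bottom of the loop.

```c
/* Hypothetical model of the five steps; count must be at least 1. */
unsigned run_loop(unsigned count)        /* step 3: the loop counter */
{
    unsigned iterations = 0;
loop:                                    /* step 1: label to branch to */
    iterations++;                        /* stands in for the loop body */
    count = count - 1;                   /* step 4: decrement the counter */
    if (count != 0) {                    /* step 5: branch only if counter != 0 */
        goto loop;                       /* step 2: the branch itself */
    }
    return iterations;
}
```

With a counter of 40 the body runs 40 times, matching the 40-tap sum of products.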

1. Create a label to branch to

            MVKL        pt1, A5
            MVKH        pt1, A5
            MVKL        pt2, A6
            MVKH        pt2, A6
    loop    LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4

2. Add a branch instruction, B.

            MVKL        pt1, A5
            MVKH        pt1, A5
            MVKL        pt2, A6
            MVKH        pt2, A6
    loop    LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            B     .?    loop

Which unit is used by the B instruction?


[Figure: Register File A (A0 to A15, 32 bits wide), the data memory, and the .M, .L and .D units, now joined by an .S unit.]

The B instruction uses the .S unit, as do MVKL and MVKH:

            MVKL  .S    pt1, A5
            MVKH  .S    pt1, A5
            MVKL  .S    pt2, A6
            MVKH  .S    pt2, A6
    loop    LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            B     .S    loop

3. Create a loop counter.


            MVKL  .S    pt1, A5
            MVKH  .S    pt1, A5
            MVKL  .S    pt2, A6
            MVKH  .S    pt2, A6
            MVKL  .S    count, B0
    loop    LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            B     .S    loop

B registers will be introduced later.

4. Decrement the loop counter


            MVKL  .S    pt1, A5
            MVKH  .S    pt1, A5
            MVKL  .S    pt2, A6
            MVKH  .S    pt2, A6
            MVKL  .S    count, B0
    loop    LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            SUB   .S    B0, 1, B0
            B     .S    loop

5. Make the branch conditional based on the value in the loop counter

What is the syntax for making an instruction conditional?

    [condition]   Instruction   Label

e.g.

    [B1]   B   loop

(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.

5. Make the branch conditional based on the value in the loop counter

The condition can be inverted by adding the exclamation symbol ! as follows:

    [!condition]   Instruction   Label

e.g.

    [!B0]   B   loop    ; branch if B0 = 0
    [B0]    B   loop    ; branch if B0 != 0

5. Make the branch conditional


            MVKL  .S1   pt1, A5
            MVKH  .S1   pt1, A5
            MVKL  .S1   pt2, A6
            MVKH  .S1   pt2, A6
            MVKL  .S2   count, B0
    loop    LDH   .D    *A5, A0
            LDH   .D    *A6, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            SUB   .S    B0, 1, B0
    [B0]    B     .S    loop

More on the Branch Instruction (1)


With this processor all the instructions are encoded in 32 bits.
Therefore the label must have a dynamic range of less than 32 bits, as the instruction B itself has to be coded.

[Figure: 32-bit opcode made of the B code and a 21-bit relative address.]

Case 1:

    B     .S1   label

Relative branch.
Label limited to +/- 2^20 offset.

More on the Branch Instruction (2)


By specifying a register as an operand instead of a label, it is possible to have an absolute branch.
This will allow a dynamic range of 2^32.

[Figure: 32-bit opcode made of the B code and a 5-bit register code.]

Case 2:

    B     .S2   register

Absolute branch.
Operates on .S2 ONLY!
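The +/- 2^20 limit of the relative branch is simply the range of a 21-bit two's complement field. The following C sketch checks whether an offset fits in such a field; the function name `fits_signed_field` is hypothetical, and the unit in which the offset is counted is left out of the model.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if offset fits in a two's complement field of the given width.
   bits is assumed to be between 1 and 63. */
bool fits_signed_field(int64_t offset, unsigned bits)
{
    int64_t limit = (int64_t)1 << (bits - 1);    /* 2^(bits-1) */
    return offset >= -limit && offset <= limit - 1;
}
```

For `bits = 21` the accepted range is -1048576 to +1048575. A register operand holds a full 32-bit value, which is why the absolute branch has no such limit.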

Testing the code


This code performs the following operations:

    a0*x0 + a0*x0 + a0*x0 + ... + a0*x0

However, we would like to perform:

    a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Modifying the pointers

The solution is to modify the pointers A5 and A6.

Indexing Pointers
    Syntax        Description       Pointer modified
    *R            Pointer           No
    *+R[disp]     + Pre-offset      No
    *-R[disp]     - Pre-offset      No
    *++R[disp]    Pre-increment     Yes
    *--R[disp]    Pre-decrement     Yes
    *R++[disp]    Post-increment    Yes
    *R--[disp]    Post-decrement    Yes

*R: the pointer is used but not modified.
Pre-offset: the pointer is modified BEFORE being used and RESTORED to its previous value.
Pre-increment and pre-decrement: the pointer is modified BEFORE being used and NOT RESTORED to its previous value.
Post-increment and post-decrement: the pointer is modified AFTER being used and NOT RESTORED to its previous value.

[disp] specifies the number of elements; the element size is DW (64-bit), W (32-bit), H (16-bit) or B (8-bit).
disp = R or 5-bit constant.
R can be any register.
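The difference between the three families in the table can be sketched in C by treating the pointer register as a plain 32-bit address. The function names are hypothetical, `size` is the element size in bytes (for example 2 for LDH), and only the "+" variants are shown; the "-" variants subtract instead.

```c
#include <stdint.h>

/* *+R[disp]: the address used is R + disp*size; R itself is unchanged. */
uint32_t pre_offset(const uint32_t *r, uint32_t disp, uint32_t size)
{
    return *r + disp * size;
}

/* *++R[disp]: R is updated first, then the new value is used. */
uint32_t pre_increment(uint32_t *r, uint32_t disp, uint32_t size)
{
    *r += disp * size;
    return *r;
}

/* *R++[disp]: the old value is used, then R is updated. */
uint32_t post_increment(uint32_t *r, uint32_t disp, uint32_t size)
{
    uint32_t used = *r;
    *r += disp * size;
    return used;
}
```

Each function returns the address that the load or store would use, so comparing the return value with the register afterwards shows whether the pointer was modified before or after use.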

Modify and testing the code

This code now performs the following operations:

    a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

            MVKL  .S1   pt1, A5
            MVKH  .S1   pt1, A5
            MVKL  .S1   pt2, A6
            MVKH  .S1   pt2, A6
            MVKL  .S2   count, B0
    loop    LDH   .D    *A5++, A0
            LDH   .D    *A6++, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            SUB   .S    B0, 1, B0
    [B0]    B     .S    loop

Store the final result

            MVKL  .S1   pt1, A5
            MVKH  .S1   pt1, A5
            MVKL  .S1   pt2, A6
            MVKH  .S1   pt2, A6
            MVKL  .S2   count, B0
    loop    LDH   .D    *A5++, A0
            LDH   .D    *A6++, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            SUB   .S    B0, 1, B0
    [B0]    B     .S    loop
            STH   .D    A4, *A7

The pointer A7 has not been initialized.

Store the final result


The pointer A7 is now initialized:

            MVKL  .S1   pt1, A5
            MVKH  .S1   pt1, A5
            MVKL  .S1   pt2, A6
            MVKH  .S1   pt2, A6
            MVKL  .S1   pt3, A7
            MVKH  .S1   pt3, A7
            MVKL  .S2   count, B0
    loop    LDH   .D    *A5++, A0
            LDH   .D    *A6++, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            SUB   .S    B0, 1, B0
    [B0]    B     .S    loop
            STH   .D    A4, *A7

What is the initial value of A4?

A4 is used as an accumulator, so it needs to be reset to zero.

            MVKL  .S1   pt1, A5
            MVKH  .S1   pt1, A5
            MVKL  .S1   pt2, A6
            MVKH  .S1   pt2, A6
            MVKL  .S1   pt3, A7
            MVKH  .S1   pt3, A7
            MVKL  .S2   count, B0
            ZERO  .L    A4
    loop    LDH   .D    *A5++, A0
            LDH   .D    *A6++, A1
            MPY   .M    A0, A1, A3
            ADD   .L    A4, A3, A4
            SUB   .S    B0, 1, B0
    [B0]    B     .S    loop
            STH   .D    A4, *A7

Increasing the processing power!


How can we add more processing power to this processor?

[Figure: Register File A (A0 to A15, 32 bits wide) with the .S1, .M1, .L1 and .D1 units and the data memory.]

(1) Increase the clock frequency.
(2) Increase the number of processing units.

To increase the processing power, this processor has two sides (A and B, or 1 and 2).

[Figure: Register File A (A0 to A15) with units .S1, .M1, .L1 and .D1, and Register File B (B0 to B15) with units .S2, .M2, .L2 and .D2. All registers are 32 bits wide; the data memory is shown below both sides.]

Can the two sides exchange operands in order to increase performance?

The answer is YES, but there are limitations.

To exchange operands between the two sides, some cross paths or links are required.

What is a cross path?

A cross path links one side of the CPU to the other.

There are two types of cross paths:
Data cross paths.
Address cross paths.

Data Cross Paths

Data cross paths can also be referred to as register file cross paths.

These cross paths allow operands from one side to be used by the other side.

There are only two cross paths:
one path which conveys data from side B to side A, 1X;
one path which conveys data from side A to side B, 2X.

TMS320C67x Data-Path

Data Cross Paths

Data cross paths only apply to the .L, .S and .M units.

The data cross paths are very useful; however, there are some limitations in their use.

Data Cross Path Limitations

[Figure: the .L1, .M1 and .S1 units, each with a <dst> and two <src> operands, and the 1X and 2X cross paths.]

(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.

Execute packet: a group of instructions that execute simultaneously.

Data Cross Path Limitations

e.g.

            ADD   .L1x  A0,A1,B2
            MPY   .M1x  A0,B6,A9
            SUB   .S1x  A8,B2,A8
    ||      ADD   .L1x  A0,B0,A2

|| means that the SUB and the ADD that follows it belong to the same execute packet, and therefore execute simultaneously.

NOT VALID!
Checked against the two limitations: ADD .L1x A0,A1,B2 writes its result to side B from a side A unit, and the parallel SUB .S1x and ADD .L1x both need the one cross path available to side A in the same execute packet.

Data Cross Paths for both sides

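[Figure: the .L1, .M1 and .S1 units on side A and the .L2, .M2 and .S2 units on side B, each with a <dst> and two <src> operands, linked by the 1X and 2X cross paths.]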

Address cross paths


(1) The pointer must be on the same side as the unit.

            LDW   .D1T1   *A0,A5
            STW   .D1T1   A5,*A0

Load or store to either side

The data path DA1 is written T1 and the data path DA2 is written T2.

            LDW   .D1T1   *A0,A5
            LDW   .D1T2   *A0,B5

Standard Parallel Loads

            LDW   .D1T1   *A0,A5
    ||      LDW   .D2T2   *B0,B5

Parallel Load/Store using address cross paths

            LDW   .D1T2   *A0,B5
    ||      STW   .D2T1   A5,*B0

Fill the blanks ... Does this work?

            LDW   .D1__   *A0,B5
    ||      STW   .D2__   B6,*B0

Not Allowed!
Parallel accesses: both cross or neither cross.

            LDW   .D1T2   *A0,B5
    ||      STW   .D2T2   B6,*B0

Both instructions would need the T2 data path in the same cycle.

Conditions Don't Use Cross Paths

If a conditional register comes from the opposite side, it does NOT use a data or address cross path.

Examples:

    [B2]    ADD   .L1   A2,A0,A4
    [A1]    LDW   .D2   *B0,B5

C62x Data-Path Summary


[Figure: full CPU datapath; see the CPU Reference Guide, page 2-2.]

C67x Data-Path Summary

C67x

Cross Paths - Summary

Data
Destination register on same side as unit.
Source registers: up to one cross path per execute packet per side.
Use "x" to indicate a cross path.

Address
Pointer must be on same side as unit.
Data can be transferred to/from either side.
Parallel accesses: both cross or neither cross.

Conditionals don't use cross paths.
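The two data cross-path rules in this summary can be turned into a small checker. The sketch below is hypothetical throughout (the `Side` and `Instr` types and `packet_is_valid` are illustration only): it reduces each .L, .S or .M instruction to the side of its unit and of its registers, and it deliberately ignores everything else an assembler checks, such as two instructions claiming the same unit.

```c
#include <stdbool.h>

typedef enum { SIDE_A = 0, SIDE_B = 1 } Side;

/* One instruction on an .L, .S or .M unit, reduced to register sides. */
typedef struct {
    Side unit;      /* side of the functional unit (.x1 -> SIDE_A, .x2 -> SIDE_B) */
    Side src1;
    Side src2;
    Side dst;
} Instr;

/* Checks the two data cross-path rules for one execute packet. */
bool packet_is_valid(const Instr *packet, int n)
{
    int cross_uses[2] = {0, 0};          /* cross-path uses per unit side */
    for (int i = 0; i < n; i++) {
        const Instr *in = &packet[i];
        if (in->dst != in->unit) {
            return false;                /* rule 1: destination on the unit's side */
        }
        int crossed = (in->src1 != in->unit) + (in->src2 != in->unit);
        cross_uses[in->unit] += crossed;
        if (cross_uses[in->unit] > 1) {
            return false;                /* rule 2: one cross path per side */
        }
    }
    return true;
}
```

Encoding `MPY .M1x A0,B6,A9` as `{SIDE_A, SIDE_A, SIDE_B, SIDE_A}` passes the check, while an instruction whose destination sits on the other side, or two side-A instructions that each read a B register in the same packet, are rejected.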

Code Review (using side A only)


$$Y = \sum_{n=1}^{40} a_n x_n$$

            MVK   .S1   40, A2          ; A2 = 40, loop count
    loop:   LDH   .D1   *A5++, A0       ; A0 = a(n)
            LDH   .D1   *A6++, A1       ; A1 = x(n)
            MPY   .M1   A0, A1, A3      ; A3 = a(n) * x(n)
            ADD   .L1   A3, A4, A4      ; Y = Y + A3
            SUB   .L1   A2, 1, A2       ; decrement loop count
    [A2]    B     .S1   loop            ; if A2 != 0, branch
            STH   .D1   A4, *A7         ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised.

Dual Resources : Twice as Nice


[Figure: Register File A (A0 to A15, or A0 to A31) with units .S1, .M1, .L1 and .D1, and Register File B (B0 to B15, or B0 to B31) with units .S2, .M2, .L2 and .D2; all registers are 32 bits wide. On side A: A0 = cn, A1 = xn, A2 = cnt, A3 = prd, A4 = sum, A5 = *c, A6 = *x, A7 = *y.]

Our final view of the sum of products example...

Optional - Resource Specific Coding


$$y = \sum_{n=1}^{40} c_n x_n$$

[Figure: Register File A (A0 to A15, or A0 to A31, 32 bits wide) with A0 = cn, A1 = xn, A2 = cnt, A3 = prd, A4 = sum, A5 = *c, A6 = *x, A7 = *y, and the units .S1, .M1, .L1 and .D1.]

            MVK   .S1   40, A2
    loop:   LDH   .D1   *A5++, A0
            LDH   .D1   *A6++, A1
            MPY   .M1   A0, A1, A3
            ADD   .L1   A4, A3, A4
            SUB   .S1   A2, 1, A2
    [A2]    B     .S1   loop
            STW   .D1   A4, *A7

It's easier to use symbols rather than register names, but you can use either method.

TMS320C6000 Instruction Set

'C6000 System Block Diagram

[Figure: 'C6000 system block diagram showing the CPU (Register Set A, Register Set B and the units .D1 .D2, .M1 .M2, .L1 .L2, .S1 .S2), the internal buses, the on-chip memory, the external memory and the peripherals.]

To summarize each unit's instructions ...

C62x RISC-like instruction set


.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO

.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH

.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Used: NOP, IDLE

'C62x RISC-Like Instruction Set (by category)


Arithmetic: ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO

Logical: AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR

Bit Mgmt: CLR, EXT, LMBD, NORM, SET

Data Mgmt: LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W

Program Ctrl: B, IDLE, NOP

Note: Refer to the 'C6000 CPU Reference Guide for more details.

C67x: Superset of Fixed-Point (by units)


.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.S Unit, floating point: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.L Unit, floating point: ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP

.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
.M Unit, C67x additions: MPYSP, MPYDP, MPYI, MPYID

.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Required: NOP, IDLE

'C64x: Superset of C62x Instruction Set


.S Unit
Dual/Quad Arith: SADD2, SADDUS2, SADD4
Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4
Bitwise Logical: ANDN
Shifts & Merge: SHR2, SHRU2, SHLMB, SHRMB
Compares: CMPEQ2, CMPEQ4, CMPGT2, CMPGT4
Branches/PC: BDEC, BPOS, BNOP, ADDKPC

.D Unit
Dual Arithmetic: ADD2, SUB2
Bitwise Logical: AND, ANDN, OR, XOR
Mem Access: LDDW, LDNW, LDNDW, STDW, STNW, STNDW
Load Constant: MVK (5-bit)
Address Calc.: ADDAD

.L Unit
Dual/Quad Arith: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4
Bitwise Logical: ANDN
Shift & Merge: SHLMB, SHRMB
Load Constant: MVK (5-bit)
Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4

.M Unit
Average: AVG2, AVG4
Shifts: ROTL, SSHVL, SSHVR
Multiplies: MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4
Bit Operations: BITC4, BITR, DEAL, SHFL
Move: MVD, XPND2/4

Sample C62x Compiler Benchmarks


    Algorithm                                                  Used in                                            Asm cycles   Asm time (µs)   C cycles (Rel 4.0)   C time (µs)   % efficiency vs hand coded
    Block Mean Square Error (MSE of a 20 column image matrix)  For motion compensation of image data              348          1.16            402                  1.34          87%
    Codebook Search                                            CELP based voice coders                            977          3.26            961                  3.20          100%
    Vector Max (40 element input vector)                       Search algorithms                                  61           0.20            59                   0.20          100%
    All-zero FIR Filter (40 samples, 10 coefficients)          VSELP based voice coders                           238          0.79            280                  0.93          85%
    Minimum Error Search (table size = 2304)                   Search algorithms                                  1185         3.95            1318                 4.39          90%
    IIR Filter (16 coefficients)                               Filter                                             43           0.14            38                   0.13          100%
    IIR cascaded biquads (10 cascaded biquads, Direct Form II) Filter                                             70           0.23            75                   0.25          93%
    MAC (two 40 sample vectors)                                VSELP based voice coders                           61           0.20            58                   0.19          100%
    Vector Sum (two 44 sample vectors)                                                                            51           0.17            47                   0.16          100%
    MSE (MSE between two 256 element vectors)                  Mean Sq. Error computation in vector quantizer     279          0.93            274                  0.91          100%

Completely natural C code (non C6000 specific).
Code available at: http://www.ti.com/sc/c6000compiler
TI C62x Compiler Performance Release 4.0: execution time in µs @ 300 MHz. Versus hand-coded assembly, based on cycle count.
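The time and efficiency columns can be recomputed from the cycle counts. The sketch below uses hypothetical function names; capping the efficiency at 100 % when the compiled C needs fewer cycles than the hand-coded assembly is an assumption inferred from the table, not something the slide states.

```c
/* Execution time in microseconds for a cycle count at a clock in MHz. */
double exec_time_us(unsigned cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}

/* Efficiency of compiled C versus hand-coded assembly, capped at 100 %. */
double efficiency_percent(unsigned asm_cycles, unsigned c_cycles)
{
    double pct = 100.0 * (double)asm_cycles / (double)c_cycles;
    return pct > 100.0 ? 100.0 : pct;
}
```

For the Block Mean Square Error row, 348 cycles at 300 MHz is 1.16 µs, and 348 / 402 is about 86.6 %, which the table rounds to 87 %.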

Sample Imaging & Telecom Benchmarks


    DSP & Image Processing Kernels                                    C62x cycles   C64x cycles   Unit                        Cycle improvement C64:C62   720 MHz C64x vs 300 MHz C62x
    Reed Solomon Decode: Syndrome Accumulation (204,188,8) Packet     1680          470           cycles/packet               3.5x                        8.4x
    Viterbi Decode (GSM) (16 states)                                  38.25         14*           cycles/output               2.7x                        6.5x
    FFT - Radix 4 - Complex (size = N log (N)) (16-bit)               12.7          6.0           cycles/data                 2.1x                        5x
    Polyphase Filter - Image Scaling (8-bit)                          0.77          0.33          cycles/output/filter tap    2.3x                        5.5x
    Correlation - 3x3 (8-bit)                                         4.5           1.28          cycles/pixel                3.5x                        8.4x
    Median Filter - 3x3 (8-bit)                                       9.0           2.1           cycles/pixel                4.3x                        10.3x
    Motion Estimation - 8x8 MAD (8-bit)                               0.953         0.126         cycles/pixel                7.6x                        18.2x

* Includes traceback


TMS320C6000 Memory

Memory size per device


Devices
C6201,
C6204,

C6701
C6205

Internal

EMIFA

EMIFB

P
D

=
=

64 kB
64 kB

C6202

P
D

=
=

256 kB
128 kB

C6203

P
D

=
=

384 kB
512 kB

L1P
L1D
L2

=
=
=

4 kB
4 kB
64 kB

C6713

L1P
L1D
L2

=
=
=

4 kB
4 kB
256 kB

128M Bytes
(32-bits wide)

N/A

C6411
DM642

L1P
L1D
L2

=
=
=

16 kB
16 kB
256 kB

128M Bytes
(32-bits wide)

N/A

C6414
C6415
C6416

L1P
L1D
L2

=
=
=

16 kB
16 kB
1 MB

256M Bytes
(64-bits wide)

C6211
C6711
C6712

52M Bytes
(32-bits wide)

128M Bytes
(32-bits wide)

N/A

N/A

64M Bytes
(16-bits wide)

64M Bytes
(16-bits wide)

Internal Memory Summary


Devices

Internal
(L2)

External

C6211
C6711
C6713

64 kB

512M
(32-bit wide)

C6712

256 kB

512M
(16-bit wide)

Devices

Internal
(L2)

C6414
C6415
C6416

1 MB

DM642

256 kB

C6411

256 kB

LINK: TMS320C6000 DSP Generation

External
A: 1GB
B: 256kB

(64-bit)
(16-bit)

1GB (64-bit)
256MB (32-bit)

Performance
Making use of Parallelism

Given this simple loop

$$y = \sum_{n=1}^{40} c_n x_n$$

    short mac(short *c, short *x, int count) {
        int i; short sum = 0;
        for (i = 0; i < count; i++) { sum += c[i] * x[i]; }
        return sum;
    }

[Figure: side A registers c, x, cnt, prod, y, *cp, *xp and *yp, with the units .S1, .M1, .L1 and .D1.]

            MVK   .S1   40, cnt
    loop:   LDH   .D1   *cp++, c
            LDH   .D1   *xp++, x
            MPY   .M1   c, x, prod
            ADD   .L1   y, prod, y
            SUB   .L1   cnt, 1, cnt
    [cnt]   B     .S1   loop
            STW   .D    y, *yp

How many of these instructions can we get in parallel?

C62x Intense Parallelism


    short mac(short *c, short *x, int count) {
        int i; short sum = 0;
        for (i = 0; i < count; i++) { sum += c[i] * x[i]; }
        return sum;
    }

Given this C code, the C62x compiler can achieve two sum-of-products per cycle.

[Listing: the piped loop prolog (label L2) fills the pipeline with parallel pairs of LDW .D1 *A4++,A3 and LDW .D2 *B6++,B7, then adds [B0] B .S1 L3, and finally MPY .M2 B7,A3,B4 and MPYH .M1 B7,A3,A5.]

    ;** -----------------------*
    L3:     ; PIPED LOOP KERNEL
                 ADD   .L2   B4,B5,B5
    ||           ADD   .L1   A5,A0,A0
    ||           MPY   .M2   B7,A3,B4
    ||           MPYH  .M1   B7,A3,A5
    ||   [B0]    B     .S1   L3
    ||   [B0]    SUB   .S2   B0,1,B0
    ||           LDW   .D1   *A4++,A3
    ||           LDW   .D2   *B6++,B7
    ;** -----------------------*

What about the C67x?

C67x MAC using Natural C


The C67x compiler gets two 32-bit floating-point sum-of-products per iteration.

    float mac(float *c, float *x, int count)
    { int i; float sum = 0;
      for (i = 0; i < count; i++) {
        sum += c[i] * x[i]; }
      return sum; }

[Figure: memory connected to .D1 and .D2; register files A0 to A15 and B0 to B15; units .M1 .M2, .L1 .L2, .S1 .S2; controller/decoder.]

    ;** --------------------------------------------------*
    LOOP:   ; PIPED LOOP KERNEL
                 LDDW   .D1    *A4++,A7:A6
    ||           LDDW   .D2    *B4++,B7:B6
    ||           MPYSP  .M1X   A6,B6,A5
    ||           MPYSP  .M2X   A7,B7,B5
    ||           ADDSP  .L1    A5,A8,A8
    ||           ADDSP  .L2    B5,B8,B8
    ||   [A1]    B      .S2    LOOP
    ||   [A1]    SUB    .S1    A1,1,A1
    ;** --------------------------------------------------*

Can the 'C64x do better?

C64x gets four MACs using DOTP2


    short mac(short *c, short *x, int count)
    { int i; short sum = 0;
      for (i = 0; i < count; i++) {
        sum += c[i] * x[i]; }
      return sum; }

[Figure: DOTP2 takes the two 16-bit halves m1, m0 of A5 and n1, n0 of B5 and writes m1*n1 + m0*n0 to A6; A6 is then added to the running sum in A7.]

    ;** --------------------------------------------------*
    LOOP:   ; PIPED LOOP KERNEL
                 ADD    .L2    B8,B6,B6
    ||           ADD    .L1    A6,A7,A7
    ||           DOTP2  .M2X   B4,A4,B8
    ||           DOTP2  .M1X   B5,A5,A6
    ||   [B0]    B      .S1    LOOP
    ||   [B0]    SUB    .S2    B0,1,B0
    ||           LDDW   .D2T2  *B7++,B5:B4
    ||           LDDW   .D1T1  *A3++,A5:A4
    ;** --------------------------------------------------*

How many multiplies can the C6x perform?
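The arithmetic performed by one DOTP2 can be modelled in C as follows. This is a sketch of the operation described on the slide (two signed 16 x 16 products, summed), not TI's definition; the helper names are hypothetical and the result is returned in 64 bits only so that the model cannot overflow.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v. */
static int64_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int64_t)v - 65536 : (int64_t)v;
}

/* Model of DOTP2: hi(a)*hi(b) + lo(a)*lo(b) on signed 16-bit halves. */
int64_t dotp2_model(uint32_t a, uint32_t b)
{
    return sext16(a >> 16) * sext16(b >> 16) + sext16(a) * sext16(b);
}
```

Two DOTP2 instructions per cycle, one on each .M unit, give the four multiply-accumulates per cycle named in the slide title.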

MMACs

How many 16-bit MMACs (millions of MACs per second) can the 'C6201 perform?

    400 MMACs (two .M units x 200 MHz)

How about 16x16 MMACs on the C64x devices?

      2 .M units
    x 2 16-bit MACs (per .M unit / per cycle)
    x 720 MHz
    ---------------
    2880 MMACs

How many 8-bit MMACs on the C64x?

    5760 MMACs (on 8-bit data)
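The figures above all follow the same product: number of .M units, times MACs per unit per cycle, times clock rate in MHz. A minimal C sketch of that arithmetic follows; peak_mmacs is a hypothetical helper, and the value of four 8-bit MACs per .M unit per cycle is inferred from the 5760 figure rather than stated on the slide.

```c
/* Peak MMACs = multiplier units x MACs per unit per cycle x clock in MHz. */
static unsigned peak_mmacs(unsigned m_units, unsigned macs_per_unit,
                           unsigned clock_mhz)
{
    return m_units * macs_per_unit * clock_mhz;
}
```

For example, peak_mmacs(2, 1, 200) gives the 'C6201 figure, peak_mmacs(2, 2, 720) the 16-bit C64x figure, and peak_mmacs(2, 4, 720) the 8-bit C64x figure.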

How Do We Get Such High Parallelism?


Compiler and Assembly Optimizer use a technique called Software Pipelining
Software pipelining enables high performance
(esp. on DSP-type loops)
Key point: Tools do all the work!

What is software pipelining?


Let's look at a simple example ...

Tools Use Software Pipelining

Here's a simple example to demonstrate ...

        LDH
||      LDH
        MPY
        ADD

How many cycles would it take to perform this loop 5 times?

    5 x 3 = 15 cycles

Our functional units could be used like ...

Without Software Pipelining

Cycle   .D1   .D2   .M1   .M2   .L1   .L2   .S1   .S2
1       ldh   ldh
2                   mpy
3                               add
4       ldh   ldh
5                   mpy
6                               add
7       ldh   ldh

In seven cycles, we're almost half-way done ...

With Software Pipelining

Cycle   .D1   .D2   .M1   .M2   .L1   .L2   .S1   .S2
1       ldh   ldh
2       ldh   ldh   mpy
3       ldh   ldh   mpy         add
4       ldh   ldh   mpy         add
5       ldh   ldh   mpy         add
6                   mpy         add
7                               add

Completes in only 7 cycles

It takes 1/2 the time! How does this translate to code?

S/W Pipelining Translated to Code

        Cycle   .D1   .D2   .M1   .M2   .L1   .L2   .S1   .S2
c1:     1       ldh   ldh
c2:     2       ldh   ldh   mpy
c3:     3       ldh   ldh   mpy         add
        4       ldh   ldh   mpy         add
        5       ldh   ldh   mpy         add
        6                   mpy         add
        7                               add

c1:     LDH
||      LDH

c2:     MPY
||      LDH
||      LDH

c3:     ADD
||      MPY
||      LDH
||      LDH
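The cycle counts in this example can be reproduced with a few lines of C. The sketch below uses the slides' simplified model, in which the loop has three single-cycle stages (load, multiply, add); real C6000 loads and multiplies have delay slots that this model ignores. All function names are hypothetical.

```c
/* Without software pipelining, each iteration runs start to finish
   before the next one begins. */
static int cycles_sequential(int n, int stages)
{
    return n * stages;
}

/* With software pipelining, a new iteration starts every cycle, so the
   last iteration finishes (stages - 1) cycles after it starts. */
static int cycles_pipelined(int n, int stages)
{
    return (n > 0) ? n + stages - 1 : 0;
}

/* Iteration (1-based) occupying stage s (0 = load, 1 = mpy, 2 = add)
   in the given cycle (1-based), or 0 if that stage is idle. */
static int iteration_in_stage(int cycle, int s, int n)
{
    int it = cycle - s;
    return (it >= 1 && it <= n) ? it : 0;
}
```

For 5 iterations and 3 stages, the first function gives 15 cycles and the second gives 7; iteration_in_stage reproduces the pipelined table row by row, with cycles 1 and 2 as the prolog, cycles 3 to 5 as the steady-state kernel, and cycles 6 and 7 as the epilog.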

DSK
Code Composer Studio

C6416 DSK

Diagnostic Utility included with DSK ...

DSK's Diagnostic Utility

Test/Diagnose DSK hardware
Verify USB emulation link
Use Advanced tests to facilitate debugging
Reset DSK hardware

CCS Overview ...

Code Composer Studio

[Diagram: CCS tool flow. Blocks: Edit, Compiler, Asm Opto, Asm, Link, .out, Debug. Inputs: Standard Runtime Libraries, DSP/BIOS Config Tool, DSP/BIOS Libraries. Targets: SIM, DSK, EVM, Third Party, XDS, DSP Board.]

DSK's Code Composer Studio Includes:

Integrated Edit / Debug GUI Simulator
Code Generation Tools
BIOS: Real-time kernel
      Real-time analysis

CCS is Project centric ...

Code Generation

Editor: creates .c / .cpp, .sa and .asm source files
.c / .cpp  ->  Compiler       ->  .asm
.sa        ->  Asm Optimizer  ->  .asm
.asm       ->  Asm            ->  .obj
.obj + Link.cmd  ->  Linker   ->  .out, .map

What is a Project?

A project (.PJT) file contains:

References to files:
    Source
    Libraries
    Linker, etc.

Project settings:
    Compiler Options
    DSP/BIOS
    Linking, etc.

The project menu ...

Project Menu

Access the Project menu via the pull-down menu or by right-clicking the .pjt file in the project explorer window.

Hint: Create and open projects from the Project menu, not the File menu.

Build Options... (next slide)

Build Options
-g -q -fr"c:\modem\Debug" -mv6700

Eight Categories of
Compiler options

The most common Compiler Options are ...

Compiler's Build Options

Nearly one-hundred compiler options available to tune your code's performance, size, etc.
Following table lists the most common options:

Options      Description
-mv6700      Generate C67x code (C62x is default)
-mv6400      Generate 'C64x code
-fr <dir>    Directory for object/output files
-fs <dir>    Directory for assembly files
-q           Quiet mode (display less info while compiling)
-g           Enables src-level symbolic debugging (debug option)
-s           Interlist C statements into assembly listing (debug option)

In Chapter 4 we will examine the options which enable the compiler's optimizer.
And, the Config Tool ...

DSP/BIOS Configuration Tool

Simplifies system design by:

Automatically includes the appropriate runtime support libraries
Automatically handles interrupt vectors and system reset
Handles system memory configuration (builds CMD file)
Generates 5 files when CDB file is saved:
    C file, Asm file, 2 header files and a linker command (.cmd) file
More to be discussed later

C6000 C Data Types

Type                  Size      Representation
char, signed char     8 bits    ASCII
unsigned char         8 bits    ASCII
short                 16 bits   2's complement
unsigned short        16 bits   binary
int, signed int       32 bits   2's complement
unsigned int          32 bits   binary
long, signed long     40 bits   2's complement
unsigned long         40 bits   binary
enum                  32 bits   2's complement
float                 32 bits   IEEE 32-bit
double                64 bits   IEEE 64-bit
long double           64 bits   IEEE 64-bit
pointers              32 bits   binary
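One practical consequence of this table is that long is 40 bits wide on the C6000, which leaves 8 guard bits above a 32-bit product when accumulating. On a host compiler, long is usually 32 or 64 bits, so code that models C6000 accumulation has to wrap explicitly. The helper below is a hypothetical host-side sketch (wrap40 is not a TI function) that sign-extends a value from 40 bits using fixed-width types.

```c
#include <stdint.h>

/* Keep the low 40 bits of v and sign-extend from bit 39, modeling a
   value stored in a 40-bit two's-complement long. */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & 0xFFFFFFFFFFull;   /* low 40 bits */
    if (u & 0x8000000000ull)                      /* bit 39 set: negative */
        return (int64_t)u - (int64_t)0x10000000000ll;
    return (int64_t)u;
}
```

Values that fit in 40 bits pass through unchanged, while a value one past the largest positive 40-bit number wraps to the most negative one.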

GEL Scripting

GEL: General Extension Language

C style syntax
Large number of debugger commands as GEL functions
Write your own functions
Create GEL menu items

CCS Scripting

Debug using VB Script or Perl
Using CCS Scripting, a simple script can:
    Start CCS
    Load a file
    Read/write memory
    Set/clear breakpoints
    Run, and perform other basic testing functions

TCONF Scripting (CDB)

Tconf Script (.tcf): hello_dsk62cfg.tcf

/* load DSK6211 platform into TCOM */
utils.getProgObjs(prog);           /* make all prog objects JavaScript global vars */
LOG_system.bufLen = 128;           /* set buffer length of LOG_system to 128 */
utils.importFile("hello");         /* import portable application script */
prog.gen();                        /* generate cfg files (and CDB file) */

Tconf Include File (.tci): hello.tci

var trace = LOG.create("trace");   /* create a new user log, named trace */
trace.bufLen = 32;                 /* initialize its length to 32 (words) */

Your Application: hello.c

#include <log.h>
extern LOG_Obj trace;              /* created in hello.tci */

int main() {
    LOG_printf(&trace, "Hello World!\n");
    return (0);
}

A textual way to configure CDB files
Runs on both PC and Unix
Create #include type files (.tci)
More flexible than Config Tool

Chapter 2
TMS320C6000 Architectural Overview
- End -
