Technical Training Organization
Notice
Creation of derivative works unless agreed to in writing by the copyright owner is forbidden. No portion of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright holder. Texas Instruments reserves the right to update this Guide to reflect the most current product information for the spectrum of users. If there are any differences between this Guide and a technical reference manual, references should always be made to the most current reference manual. Information contained in this publication is believed to be accurate and reliable. However, responsibility is assumed neither for its use nor any infringement of patents or rights of others that may result from its use. No license is granted by implication or otherwise under any patent or patent right of Texas Instruments or others.
Revision History
October 1999     Version 1.0
November 1999    Version 1.1
November 2000    Version 2.0
March 2001       Version 2.1
August 2001      Version 2.2
June 2003        Version 3.0
August 2003      Version 3.1
Notes: (C6211 DSK) (C6711 DSK) (CCS v2.0) (C6416/C6713 DSK and CCS v2.2)
Copyright 1999-2003 by Texas Instruments, Incorporated. All rights reserved. For more information on TI Semiconductor Products, please call the SC Product Information Center at (972) 644-5580 or email them at support@ti.com.
Along with the Welcome introduction, this course consists of four chapters as outlined below. Each chapter concludes with a lab exercise, giving you the opportunity to observe and practice the topics discussed in class.
Workshop Outline
Welcome
Introduction to C6000 and Code Composer Studio (CCS)
Using C6000 Peripherals
eXpressDSP: TI's System Solution
Optimizing C6000 Code
Chapter Topics
C6416/C6713 DSK One-Day Workshop
  Agenda
  Please Introduce Yourself
  TI DSP and C6x Family Positioning
    Applications / System Needs
    TI DSP Families
  C6000 Roadmap
  For More Information and Support
  Key C6000 Literature
  For Information about Digital Signal Processing
  Textbooks on using the C6000
  and finally, Workshops from TI
Agenda
Today's Agenda
8:30 - 9:00
9:00 - 11:00
11:00 - 1:00
1:00 - 2:15
2:15 - 4:00
Lab 4: Optimize Image Correlation routine Using C Optimizer, DSP Image Library, and On-chip Cache
4:00 - 4:30
As noted above, you will have lunch sometime during Chapter 2. Your facilitator will provide breaks as needed throughout the day.
Introduce Yourself
A Show of Hands...
Do you have experience with:
  TI DSPs (TMS320)
  Another DSP
  Other microprocessors

Will you use C, Assembly, or both?
Who has used an OS or RTOS?
Which C6000 DSP do you plan to use?

The two acronyms from above:
  OS: Operating System
  RTOS: Real-Time Operating System

While most engineers have used an operating system (e.g. Mac OS, Windows, Unix), many embedded system designers have never developed an application that included an operating system. Understanding the group's level of OS knowledge may assist your facilitator during the eXpressDSP chapter and demo.
System Considerations
  Interfacing
  Size
  Performance
  Power
  Ease-of-Use
  Cost
  Integration: Memory, Peripherals
These needs challenge the designer with a series of tradeoffs. For example, while performance is important in a portable MP3 player, efficient power dissipation and small board space matter more. On the other hand, a cellular base station might require higher performance to maximize the number of channels handled by each processor. Wouldn't it be nice if the fastest DSP consumed the lowest amount of power? While TI is working on providing this (and making it software compatible), our goal is to provide you with a broad assortment of DSP families to cover a varying set of system needs. Think of them as different shoes for different chores.
TI DSP Families
TI provides a variety of DSP families to handle the tradeoffs in system requirements.
C6000

C5000 (C54x/55x/OMAP)
  Efficiency: Best MIPS per Watt / Dollar / Size
  Wireless phones
  Internet audio players
  Digital still cameras
  Modems
  Telephony
  VoIP

C2000 (C20x/24x/28x)
  Lowest Cost
  Control Systems: Segway, Motor Control, Storage, Digital Ctrl Systems

Earlier generations shown: C1x, C2x, C5x
The TMS320C2000 (C2000) family of devices is well suited to lower cost, microcontroller-oriented solutions. It fits users who need a bit more performance than today's microcontrollers are able to provide, but who still need the control-oriented peripherals and low cost.

The C5000 family is the model of processor efficiency. While these devices boast incredible performance numbers, they deliver them with equally incredible low power dissipation. No wonder they are the favorites in most wireless phones, internet audio players, and digital cameras (just to name a few).

Rounding out the offerings, the C6000 family provides the absolute maximum performance offered in DSP. Couple this with its phenomenal C compiler and you have one fast, easy-to-program DSP. When performance and/or time-to-market counts, the C6000 is the family to choose. It also happens to be the family this course was designed around; thus, the rest of the workshop will focus on it.
TI DSP Platforms

C2000 DSP: C28x DSP
  In one of 2001's Most Innovative Products (Segway: Human Transporter)
  World's most code-efficient DSP
  Heart of advanced embedded control applications
    Hard Disk Drive Servo Control
    Digital Motor Control in White Goods
    HVAC Motor Control
    Un-interruptible Power Supply PFC
    Optical Lasers
  Leadership integration of analog and high speed Flash memory
  Tens of millions shipped to thousands of customers

C5000 DSP: C55x DSP
  EDN 2000 Finalist DSP Product of the Year 2001 Internet Telephony Best DSP Microprocessor Report
  World's most power-efficient DSP
  World's most popular DSP ISA
  Hundreds of millions shipped to thousands of customers
  Heart of handheld solutions for the Internet era
    Wireless terminals and OMAP
    Digital Still Cameras
    Internet Audio players
    VoIP
  New generation C55x DSP fully code compatible

C6000 DSP: C64x DSP
  2001 Innovation of the Year (EDN Magazine)
  World's highest-performance DSP
  Shipping at 720MHz
  Sampled at 1GHz
  Heart of solutions for new, high-bandwidth communications and video equipment
    Wireless basestations and transcoders
    DSL
    Home theater audio
    IBOC digital radio
    Imaging and video servers & gateways
  Millions shipped to hundreds of customers
C6000 Roadmap
The C6000 family has grown considerably over the past few years. With the addition of the 2nd generation of devices (C64x) a couple of years ago, and with the recent announcement of the upcoming 1GHz performance, the C6000 family dominates the high-end DSP market.
[Figure: C6000 Roadmap. Axis label: Highest Performance. Banner: Object Code Software Compatibility.
  1st Generation: C6201, C6202, C6203, C6211 (C62x: Fixed Point); C6701, C6713 (C67x: Floating Point)
  2nd Generation: C6411, C6412, C6414
  Also shown: Multi-core, Floating Point, C64x DSP 1.1 GHz]
TMS320C6000
Easy to Use
  Best C engine to date
  Efficient C Compiler and Assembly Optimizer
  DSP & Image Libraries include hand-optimized code
  eXpressDSP Toolset eases system design

SuperComputer Performance
  1.38 ns instruction rate: 720x8 MIPS (1GHz sampled)
  2880 16-bit MMACs (5760 8-bit MMACs) at 720 MHz
  Pipelined instruction set (maximizes MIPS)
  Eight Execution Unit RISC Topology
  Highly orthogonal RISC 32-bit instruction set
  Double-precision floating-point math in hardware
Even with its growing family of devices, the C6000 architecture has not abandoned ease of design. Software compatibility is addressed by the architecture rather than by the hard work of the programmer. Because both the C67x and C64x devices can run C62x object code, upgrading DSP designs is much easier.
Number
+32 (0) 27 45 55 32
+33 (0) 1 30 70 11 64
+49 (0) 8161 80 33 11
1800 949 0107 (free phone)
800 79 11 37 (free phone)
+31 (0) 546 87 95 45
+34 902 35 40 28
+46 (0) 8587 555 22
+44 (0) 1604 66 33 99
+358 (0) 9 25 17 39 48

Literature, Sample Requests and Analog EVM Ordering
Information, Technical and Design support for all Catalog TI Semiconductor products/tools
Submit suggestions and errata for tools, silicon and documents
Peripherals Chip Support Lib. Ref.
C67x Two-Level Internal Memory Reference
C64x Two-Level Internal Memory Reference
Cache Memory User's Guide

Software
SPRU198 - Programmer's Guide
SPRU423 - C6000 DSP/BIOS User's Guide
SPRU403 - C6000 DSP/BIOS API Guide
- Assembly Language Tools User's Guide
- Optimizing C Compiler User's Guide
Please check the website for the latest versions of these and for additional manuals and application notes.
C6x-Based Digital Signal Processing by Nasser Kehtarnavaz and Burc Simsek; ISBN 0-13-088310-7
DSP Applications Using C and the TMS320C6x DSK by Rulph Chassaing; ISBN 0471207543
Sign up at:
http://www.ti.com/sc/training
You can find a more complete comparison between the two workshops in the Appendix of this book.
Outline
  C6000 Overview
  C6000 Parallelism
  CCS Overview
  Lab: Build and Graph a Sinewave
  Optional Topics
    CCS Automation
    CPU Architecture Detail
    C6000 Instruction Sets
    Benchmarks
Chapter 1 Topics
Intro to C6000 and CCS
  Connecting to a C6000 Device
  Looking into the C6000 Device
  Looking at the C6000 CPU
    What is Digital Signal Processing (DSP)?
    C6000 Core CPU Architecture
    C62x vs. C67x vs. C64x
  Using the CPU's Parallelism
    C62x Compiled Code
    C67x Compiled Code
    C64x Compiled Code
    How many MMACs is that?
    How can we get such parallelism?
    Software Pipelining
  DSP Tools Overview
    C6000 DSKs
    Code Composer Studio (CCS)
    CCS Projects
    DSP/BIOS Configuration Tool
  Lab Preparation
    C64x or C67x Exercises?
  Prepare Lab Workstation
    Computer Login
    Connecting the C6416 DSK to your PC
    Testing Your Connection
    CCS Setup
    Set CCS Customize Options
  LAB 1: Using Code Composer Studio
    Sine Generation Algorithm
    Take Home Exercises (Optional)
    Lab1a - Customize CCS
    Lab1b - Using GEL Scripts
    Lab1c - Using Printf
    Lab1d - Fixed vs Floating Point
    Lab1e - Explore CCS Scripting
    Lab Debrief
  Optional Topics
    Optional Topic: CCS Automation
    Optional Topic: CPU Architecture Details
    Assorted C6000 Benchmarks
[Figure: C6000 device block diagram. Labels around the C6000 CPU: VCP, TCP, PCI, Host P, EDMA, Boot Loader, EMIF, EMAC, EPROM, SDRAM, Sync SRAM; bus widths marked 32 and 16 or 32.]
Note: Not all C6000 devices have all the various peripherals shown above. Please refer to the C6000 Product Update for a device-by-device listing.
Let's quickly look at each of these connections, beginning with VCP/TCP and working counterclockwise around the diagram.
Timer / Counters
Two (or three) 32-bit timer/counters
Use as a Counter (counting pulses from input pin) or as a Timer (counting internal clock pulses)
Can generate:
  Interrupts to CPU
  Events to DMA/EDMA
  Pulse or toggle-value on output pin

C6000 CPU has 12 configurable interrupts. Some of the properties that can be configured are:
  Interrupt source (for example: Ext Int pin, McBSP receive, HPI, etc.)
  Address of Interrupt Service Routine (i.e. interrupt vector)
  Whether to use the HWI dispatcher
  Interrupt nesting
HPI:
XBUS: Similar to HPI but adds: 32-bit width, master or slave modes, sync modes, and a glueless I/O interface to FIFOs or memory (memory I/O can transfer at up to full processor rates, i.e. a single-cycle transfer rate).
PCI: Standard master/slave 32-bit PCI interface (the latest devices, e.g. DM642, now allow 66MHz PCI communication).
DMA: EDMA:
Boot Loader
After reset but before the CPU begins running code, the Boot Loader can be configured to either:
  Automatically copy code and data into on-chip memory
  Allow a host system (via HPI, XBUS, or PCI) to read/write code and data into the C6000's internal and external memory
  Do nothing and let the CPU immediately begin execution from address zero

Boot mode pins allow configuration. Please refer to the C6000 Peripherals Guide and each device's data sheet for the modes allowed for each specific device.
Ethernet
10/100 Ethernet interface
To conserve cost, size and power, Ethernet pins are muxed with PCI (you can use one or the other)
Optimized TCP/IP stack available from TI (under license)
McASP
All McBSP features plus more
Targeted for multi-channel audio applications such as surround sound systems
Up to 8 stereo lines (16 channels) supported by 16 serial data pins configurable as transmit or receive
Throughput: 192 kHz (all pins carrying stereo data simultaneously)
Multi-pin IIS for audio interface
Multi-pin DIT for digital interfaces
Transmit formats:
Utopia
For connection to ATM (async transfer mode)
Utopia 2 slave interface
50 MHz wide area network connectivity
Byte wide interface
Available on C64x devices
PLL
On-chip PLL provides clock multiplication. The C6000 family can run at one or more times the provided input clock. This reduces cost and electrical interference (EMI). Clock modes are pin configurable. On most devices, along with the Clock Mode (configuration) pins, there are three other clock pins:
  CLKIN: clock input pin
  CLKOUT: clock output from the PLL (multiplied rate)
  CLKOUT2: a reduced-rate clock output, usually 1/2 or less of CLKOUT

Please check the datasheet for the pins, pin names, and CLKOUT2 rates available for your device. Here are the PLL rates for a sample of C6000 device types:

Device                      Clock Mode Pins               PLL Rate
C6201 C6204 C6205 C6701     CLKMODE                       x1, x4
C6202 C6203                 CLKMODE0 CLKMODE1 CLKMODE2    x1, x4, x6, x7, x8, x9, x10, x11
C6211 C6711 C6712           CLKMODE                       x1, x4
C6414 C6415 C6416           CLKMODE0 CLKMODE1             x1, x6, x12
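The clock multiplication described above is simple arithmetic: the CPU clock is the CLKIN frequency times the selected PLL rate. The following is a minimal sketch in C. The function names are hypothetical helpers written for this illustration (they are not part of any TI library), and the 50 MHz input clock in the usage note is an assumed value, not one taken from a datasheet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `mult` appears in a device's list of
   PLL rates, for example {1, 6, 12} for the C6414/C6415/C6416 row. */
int pll_rate_supported(const unsigned *rates, size_t n, unsigned mult)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (rates[i] == mult) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: CPU clock in Hz = CLKIN frequency x PLL rate. */
unsigned long long cpu_clock_hz(unsigned long long clkin_hz, unsigned mult)
{
    assert(mult >= 1);          /* x1 is the smallest rate in the table */
    return clkin_hz * mult;
}
```

For instance, with an assumed 50 MHz CLKIN and the x12 rate, cpu_clock_hz(50000000ULL, 12) gives 600000000 Hz. Checking the requested rate against the device's row first catches settings the clock mode pins cannot select.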
Power Down
While not shown in the previous diagram, the C6000 supports power down modes to significantly reduce overall system power.
For more detailed information on these peripherals, refer to the C6000 Peripherals Guide.
[Figure: C6414 internal block diagram. Labels: 5760 MIPS; bus bandwidths of 11.5 GB/s and 2.9 GB/s; L1D Cache; PLL; Timer 0, Timer 1, Timer 2.]
From this diagram notice two things:
  Dual-level memory (this will be discussed further in Chapter 4):
    L1 (level 1) program and data caches
    L2 (level 2) combined program/data memory
  Buses as large as 64 and 256 bits allow enormous amounts of information to be moved
    Multiple buses allow simultaneous movement of data in a C6000 system
    Both the EDMA and CPU can orchestrate moving information

Note: While we have been looking at the C6414, you can extrapolate these same concepts to other C6000 device types. All device types have multiple, fast, internal buses. Most have a dual-level memory architecture, while a few have a single-level, flat memory.
DSP
DAC
$$ Y = \sum_{i=1}^{\mathrm{count}} \mathrm{coeff}_i \cdot x_i $$
Practically, it's probably all of them. To summarize, you might say DSP is the processing of digital signals numerically, usually with real-time constraints. Looking more carefully at the algorithms used in Digital Signal Processing, we almost always find they take the form shown above. Often this form is called Sum of Products (SOP), or Multiply Accumulate (MAC).

The first Digital Signal Processors (also abbreviated DSP) were derived from a standard 16-bit microprocessor. In order to meet the goals of DSP, though, they took on characteristics of RISC processing that ensured instructions executed in a single cycle (not too common in those days). Also, they included hardware multipliers, which replaced at least 32 microcode instructions with a fast, single-cycle multiply. Further, their addressing capabilities were enhanced to allow quick and easy processing of streaming data (usually collected into data buffers).

In many ways, DSPs today, like the C6000, are not that different from their forefathers. Then again, fast clock frequencies, wider buses, more registers and memory, and an architecture designed to run C code efficiently make them vastly superior.
[Figure: C6000 core CPU. Labels: functional units .L1, .S1, .M1 and .L2, .S2, .M2; registers through A15 / A31 and B15 / B31; Dual MACs; Controller/Decoder.]
Let's point out a few things:
  The C6000 architecture was co-developed with its C compiler. The CPU was designed for the C language from the ground up.
  All eight functional units can receive their own 32-bit instruction on every cycle. Put another way, we can execute eight instructions in parallel.
  The ability to control each execution unit independently is why the C6000 architecture is often likened to VLIW (very long instruction word). Both the C6000 architecture (called VelociTI, as in velocity) and VLIW allow such granularity in how they control the individual functional units.
This is not possible with standard DSP (or GPP) architectures, where one single instruction controls all the functional units at once. For example, on other DSP processors, the only way to get all the functional units working all at the same time is to use a MAC type instruction. In all other instructions, only a subset of functional units will operate. Even if you have other things that could be done simultaneously, there is no way to tell the processor how to make this happen.
Unlike the C6000, though, VLIW architectures require an instruction code for each functional unit on every cycle. This often means that millions of NOP instructions are added to the program code. Due to the efficiency of the C6000 VelociTI architecture, these excess NOPs are not required.
For a more detailed explanation of the CPU building blocks, please refer to the CPU Architecture optional topic. Optionally, you may want to consider taking the 4-day, C6000 Optimization Workshop.
[Figure: C64x CPU block diagram. Labels: Instruction Fetch; Instruction Dispatch; Advanced Instruction Packing; Instruction Decode; Registers (A0 - A15); Registers (A16 - A31); functional units L1, S1, M1, D1.]
This CPU block diagram shows the C64x. It can be converted to the C62x block diagram by removing the elements in light-colored boxes:
  Advanced Instruction Packing
  Advanced Emulation
  Registers A16-A31 and B16-B31
  Enhanced fixed-point instruction set (i.e. C64x is a super-set of the C62x instructions). The light-colored boxes added to each functional unit represent the additional packed-data instructions provided only by the C64x.

The C67x block diagram is the same as the C62x diagram. The primary differences between these two are:
  C67x has 64-bit wide data-load buses
  C67x functional units include 32-bit single-precision (and 64-bit double-precision) floating-point hardware. All other instructions are exactly the same.
Interrupt Control
$$ y = \sum_{n=1}^{40} c_n \cdot x_n $$
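short mac(short *c, short *x, int count)
{
    int i;
    short sum = 0;                  /* running sum of products */

    for (i = 0; i < count; i++) {
        sum += c[i] * x[i];
    }
    return sum;
}

        MVK   .S1   40, cnt         ; loop count
loop:   LDH   .D1   *cp++, c        ; load a coefficient
        LDH   .D1   *xp++, x        ; load a sample
        MPY   .M1   c, x, prod      ; multiply
        ADD   .L1   y, prod, y      ; accumulate
        SUB   .L1   cnt, 1, cnt     ; decrement the loop count
  [cnt] B     .S1   loop            ; branch while cnt is non-zero
        STW   .D    y, *yp          ; store the result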
Given our eight functional units, don't you think we could perform some of these operations in parallel?
Given this C code, the C62x compiler can achieve two Sum-of-Products per cycle.

[Loop setup code: a prolog of parallel LDW (.D1 / .D2) loads and conditional branches ([B0] B .S1 L3) precedes the kernel.]

;** -----------------------*
L3:     ; PIPED LOOP KERNEL
           ADD    .L2   B4,B5,B5
||         ADD    .L1   A5,A0,A0
||         MPY    .M2   B7,A3,B4
||         MPYH   .M1   B7,A3,A5
|| [B0]    B      .S1   L3
|| [B0]    SUB    .S2   B0,1,B0
||         LDW    .D1   *A4++,A3
||         LDW    .D2   *B6++,B7
;** -----------------------*
Notice a few things:
  Parallel bars (||) indicate that an assembly instruction is performed at the same time as the previous instruction. Hence, in our loop kernel, 8 instructions are performed in parallel, using all 8 functional units. The order of the instructions is unimportant, as they all execute at once.
  The rest of the code shown on this slide is used to set up the loop.
  Two multiplies and two adds (i.e. two MACs) are performed each cycle.
  The [B0] indicates the branch (and subtract) are performed conditionally. That is, the branch to label L3 will only occur if B0 is non-zero. All C6000 instructions (except NOP and IDLE) are conditional. This makes for fast code execution on hardware-pipelined processors such as the C6000 devices.

Note: By the way, the C6000 toolset includes an Assembly Optimizer. This tool takes linear assembly code (shown back one figure) and creates highly optimized standard assembly code similar to the compiler's.
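To see where the two MACs per cycle come from, it can help to write the kernel's data flow back out in C. The sketch below is an illustration only, not compiler output: each LDW brings in two adjacent 16-bit values, MPY and MPYH each multiply one half, and the two ADDs keep two separate running sums. The function name is hypothetical, and which array element lands in which half of the register depends on the memory layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the C62x kernel's data flow: two products and
   two accumulations per loop pass, with separate running sums. */
int32_t dot_two_per_pass(const int16_t *c, const int16_t *x, int count)
{
    int32_t sum_a = 0;              /* plays the role of one ADD's sum   */
    int32_t sum_b = 0;              /* plays the role of the other ADD's */
    int i;

    assert(count >= 0 && count % 2 == 0);   /* values are taken in pairs */

    for (i = 0; i < count; i += 2) {
        sum_a += (int32_t)c[i]     * x[i];      /* one half: like MPY    */
        sum_b += (int32_t)c[i + 1] * x[i + 1];  /* other half: like MPYH */
    }
    return sum_a + sum_b;           /* combine the partial sums at the end */
}
```

Keeping two independent sums is what lets both adds proceed in the same cycle; a single running sum would force one add to wait for the other.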
Notice:
  The MPYSP and ADDSP instructions, where SP stands for single-precision, 32-bit floating-point.
  LDDW loads two 32-bit values into two consecutive 32-bit registers. Using two LDDWs means we're getting four SP values per cycle.

Note: To allow the code to fit on a single PowerPoint slide, we had to modify it slightly. The C67x compiler actually creates a four-cycle loop that performs eight MACs. In other words, it's the same rate (2 MACs per cycle) that we claim above, but the loop was just too large to fit on our slide.
short mac(short *c, short *x, int count)
{
    int i;
    short sum = 0;

    for (i = 0; i < count; i++) {
        sum += c[i] * x[i];
    }
    return sum;
}

;** --------------------------------------------------*
;       PIPED LOOP KERNEL
LOOP:
           ADD     .L2     B8,B6,B6
||         ADD     .L1     A6,A7,A7
||         DOTP2   .M2X    B4,A4,B8
||         DOTP2   .M1X    B5,A5,A6
|| [ B0]   B       .S1     LOOP
|| [ B0]   SUB     .S2     B0,1,B0
||         LDDW    .D2T2   *B7++,B5:B4
||         LDDW    .D1T1   *A3++,A5:A4
;** --------------------------------------------------*

DOTP2
  A5 = m1 : m0    (two packed 16-bit values)
  B5 = n1 : n0
  A6 = m1*n1 + m0*n0
  A7 = A7 + A6    (running sum)
Combine this with its ability to run at 720 MHz and the C64x CPU will pump out a whopping 2880 MMACs (16-bit Mega MACs). Heck, if all you need is 8-bit MACs, the C64x can get twice as many: 5760 MMACs. What is an MMAC, and how did we get the C64x doing 2880 of them?
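The DOTP2 data flow (m1*n1 + m0*n0 from two packed registers) can be modeled in portable C. This is a sketch of the arithmetic only, under the assumption that each 32-bit operand holds two signed 16-bit values; the function names are hypothetical and are not TI compiler intrinsics.

```c
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed (two's complement) value. */
static int32_t signed16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v >= 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical helper: pack two 16-bit values into one 32-bit register image. */
uint32_t pack_halves(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Hypothetical C model of DOTP2: upper*upper + lower*lower.
   Note: if all four halves are -32768 the true sum does not fit in
   int32_t; this sketch does not handle that corner case. */
int32_t dotp2_model(uint32_t src1, uint32_t src2)
{
    int32_t m1 = signed16(src1 >> 16), m0 = signed16(src1);
    int32_t n1 = signed16(src2 >> 16), n0 = signed16(src2);
    return m1 * n1 + m0 * n0;
}
```

For example, dotp2_model(pack_halves(3, -2), pack_halves(4, 5)) computes 3*4 + (-2)*5. One call stands in for two 16-bit MACs' worth of multiplies, which is why two .M units issuing DOTP2 every cycle yield four 16-bit multiplies per cycle.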
MMACs
How many 16-bit MMACs (millions of MACs per second) can the 'C6201 perform?
  400 MMACs (two .M units x 200 MHz)

How many 8-bit MMACs on the C64x?
  5760 MMACs (on 8-bit data)
The C64x's ability to perform two 16-bit multiplies (or four 8-bit multiplies) in each .M unit gives it a tremendous performance advantage. While no single benchmark does a good job of comparing different processor architectures, the number of MMACs probably comes the closest. Other benchmarks (MHz, MIPS, MOPS, Dhrystone, Whetstone, FFT, etc.) obfuscate the real picture, and we say this even knowing that many of them make the C6000 look better than its competition.

Note: The only true way to benchmark a processor is to compare your key real-time kernels written for each processor you are evaluating. It is the only way to know the true performance you will achieve.
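The MMAC figures quoted above all follow from one multiplication: the number of .M units, times the multiplies each unit performs per cycle, times the clock rate in MHz. A minimal sketch, using a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper: peak MMACs =
   .M units x multiplies per .M unit per cycle x clock in MHz. */
unsigned peak_mmacs(unsigned m_units, unsigned macs_per_unit, unsigned clock_mhz)
{
    assert(m_units > 0 && macs_per_unit > 0);
    return m_units * macs_per_unit * clock_mhz;
}
```

With the numbers from the text: peak_mmacs(2, 1, 200) gives 400 for the 'C6201, peak_mmacs(2, 2, 720) gives 2880 for 16-bit data on the C64x, and peak_mmacs(2, 4, 720) gives 5760 for 8-bit data. These are peak rates; they assume the loop keeps both .M units busy on every cycle.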
Software pipelining isn't a new technique. In fact, it's similar to the hardware pipelining found in most high-performance processors available today. What sets software pipelining apart is how the instructions can be combined to build very tight loops of code. Why don't most other processors use software pipelining? To use this programming technique, the architecture must be able to dispatch a separate instruction to each functional unit on every cycle. As we mentioned earlier in the chapter, this capability is unique to the C6000 CPU (and VLIW processors). Let's briefly examine the concept of software pipelining.
Software Pipelining
Software pipelining enables high-performance code. Best of all, the tools do all the work. Great, but what is software pipelining? Let's look at a simple example to demonstrate the concept...

How many cycles would it take to perform this loop 5 times? 5 x 3 = 15 cycles

Looking at how these instructions would operate on the C6000's eight functional units:
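[Figure: cycles 1 through 7 of the non-pipelined loop laid out across the functional units; the ldh loads occupy the .D units only in the load cycles, leaving them idle in between.]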
Looking at the non-pipelined code above, you can see the inefficiency. Notice how the .D units are left unused when the first multiply occurs. After seven cycles, we're almost halfway through the expected 15 cycles. If the code was software pipelined, though...
When software pipelining, we take advantage of the unused .D units in cycle 2 and go ahead and perform the next two loads. This allows us to pipeline the instructions, resulting in a seven-cycle loop: less than half the original number of cycles.
Cycle   1     2     3     4     5     6     7
        ldh   ldh   ldh   ldh   ldh
              mpy   mpy   mpy   mpy   mpy
                    add   add   add   add   add

Completes in only 7 cycles
Translating the software pipelining above into code, each cycle gets a set of parallel instructions.
[Figure: the same seven-cycle schedule written as code, with one group of parallel (||) instructions per cycle (labels c2:, c3:, and so on) spread across functional units such as .D2, .S1, and .S2.]
Since most processors only allow one instruction to control all their functional units, they cannot take advantage of software pipelining. The granularity of the C6000 architecture gives it the extra flexibility to take advantage of this optimization strategy.
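The cycle counts in this example can be captured in two small formulas. The sketch below assumes the simplified model used on the slides: every stage (load, multiply, add) takes one cycle, and a software-pipelined loop can start a new iteration on every cycle. Real C6000 instructions have additional latencies that this model ignores, and the function names are hypothetical.

```c
#include <assert.h>

/* Non-pipelined: each iteration finishes all of its stages before the
   next iteration starts. */
unsigned cycles_unpipelined(unsigned iterations, unsigned stages)
{
    return iterations * stages;
}

/* Software pipelined, assuming one new iteration starts per cycle:
   the first iteration needs `stages` cycles, and every later iteration
   completes one cycle after the one before it. */
unsigned cycles_pipelined(unsigned iterations, unsigned stages)
{
    assert(iterations > 0 && stages > 0);
    return stages + iterations - 1;
}

/* In the pipelined schedule, iteration k (counting from 0) executes
   stage s (counting from 0) in cycle k + s + 1. */
unsigned pipelined_cycle_of(unsigned iteration, unsigned stage)
{
    return iteration + stage + 1;
}
```

For the five-iteration, three-stage loop above, cycles_unpipelined(5, 3) is 15 and cycles_pipelined(5, 3) is 7, matching the two schedules. The gap widens with the loop count: the non-pipelined cost grows by three cycles per iteration, the pipelined cost by one.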
C6416 DSK
The C6713 would be almost exactly the same. (We pulled this diagram from the C6416 help file. Look in the C6713 help file <CCS Help menu> to find a similar diagram for that platform.)
[Figure: Code Composer Studio tool flow. Labels: Edit, Asm, Link, Debug; Simulator; DSP/BIOS Config Tool; DSP/BIOS Libraries; Third Party; DSK, EVM, XDS; DSP Board.]

DSK's Code Composer Studio Includes:
  Integrated Edit / Debug GUI
  Code Generation Tools
  BIOS: Real-time kernel, Real-time analysis
When TI developed Code Composer Studio, it added a number of capabilities to the environment. First, the code generation tools (compiler, assembler, linker) were added so that you wouldn't have to purchase them separately. Second, the simulator was included (only in the full version of CCS, though). Third, TI has included DSP/BIOS, a real-time kernel consisting of three main features: a real-time, pre-emptive scheduler; real-time capture and analysis; and real-time I/O.

Finally, CCS has been built around an extensible software architecture which allows third parties to build new functionality via plug-ins. See the TI website for a listing of third parties already developing for CCS. At some point in the future, this capability may be extended to all users. If you have an interest, please voice your opinion by calling the TI SC Product Information Center (you can find their phone number and email address in the last module, "What Next?").
1 - 22
Since its hard to evaluate a tool by looking at a simple screen capture, well provide you with plenty of hands-on-experience throughout the week.
1 - 23
1 - 24
Closer Look at the C6000 Code Generation Tools and File Extensions
Using Code Composer Studio (CCS) you may not need to know all these file extension names, but we included a basic review of them for your reference:
Code Generation
Editor → .c / .cpp → Compiler → .asm
Editor → .sa → Asm Optimizer → .asm
.asm (generated, or written directly in the Editor) → Asm → .obj
.obj + Link.cmd → Linker → .out and .map
C and C++ use the standard .C and .CPP file extensions. Linear Assembly is written in a .SA file. You can either write standard assembly directly, or it can be created by the compiler and Assembly Optimizer. In all cases, standard assembly uses .ASM. Object files (.OBJ), created by the assembler, are linked together to create the DSP's executable output (.OUT) file. The map (.MAP) file is an output report of the linker. The .OUT file can be loaded into your system by the debugger portion of CCS.
If you want to use your own extensions for file names, they can be redefined with code generation tool options. Please refer to the TMS320C6000 Assembly Tools User's Guide for the appropriate options.
CCS Projects
Code Composer works within a project paradigm. If you've done code development with almost any sophisticated IDE (Microsoft, Borland, etc.), you've no doubt run across the concept of projects. Essentially, within CCS you create a project for each executable program you wish to create. Projects store all the information required to build the executable. For example, a project lists things like the source files, the header files, the target system's memory map, and program build options.
What is a Project?
Project (.PJT) file contains:
References to files: Source, Libraries, Linker, etc.
Project settings: Compiler Options, DSP/BIOS, Linking, etc.
The project information is stored in a .PJT file which is created and maintained by CCS. To create a new project, select Project → New. This is different from Microsoft's Designer's Studio, which provides the project new/open commands on the File menu.
Project Menu
Hint: Create and open projects from the Project menu, not the File menu.
Access Build Options... (next slide) via the pull-down menu or by right-clicking the .pjt file in the project explorer window.
Build Options
Project options direct the code generation tools (i.e. compiler, assembler, linker) to create code according to your system's needs. Do you need to logically debug your system, improve performance, or minimize code size? Your results can be dramatically affected by the project options available for the C6000 platform. To make it easier to choose build options, CCS provides a graphical user interface (GUI) for the various compiler options. Shown below is a capture of the Basic Compiler options.
Build Options
-g -q -fr"c:\modem\Debug" -mv6700
There is a one-to-one relationship between the items in the text box and the GUI check and drop-down box selections. Once you have mastered the various options, you'll probably find yourself just typing in the options. By the way, the linker page looks like:
Linker Options
Options          Description
-o<filename>     Output file name
-m<filename>     Map file name
-c               Auto-initialize global/static C variables
-x               Exhaustively read libs (resolve back ref's)

-q -c -m".\Debug\lab1.map" -o".\Debug\lab1.out" -x

By default, linker options include the -o option. We recommend you add the -m option.
".\Debug\" indicates one subfolder level below the project's .pjt folder.
Run-time Autoinit tells compiler to initialize global/static variables before calling main().
Compiler Options
There are probably about 100 options available for the compiler alone, which can be a bit intimidating to wade through. To that end, we've provided a condensed set of options. These few options cover about 80% of most users' needs.
debug options
Generate C67x code (C62x is default)
Generate 'C64x code
Directory for object/output files
Directory for assembly files
Quiet mode (display less info while compiling)
Enables src-level symbolic debugging
Interlist C statements into assembly listing
In Chapter 4 we will examine the options which enable the compiler's optimizer.
We'll add three more important options to this list in Chapter 4, when we discuss optimization.
The GUI (graphical user interface) simplifies system design by:
Automatically including the appropriate runtime support libraries
Automatically handles interrupt vectors and system reset
Handles system memory configuration (builds CMD file)
When a CDB file is saved, the Config Tool generates 5 additional files:
Filename.cdb        Configuration Database
Filenamecfg_c.c     C code created by Config Tool
Filenamecfg.s62     ASM code created by Config Tool
Filenamecfg.cmd     Linker commands
Filenamecfg.h       header file for *cfg_c.c
Filenamecfg.h62     header file for *cfg.s62
When you add a CDB file to your project, CCS automatically adds the C and assembly (S62 or S64) files to the project under the Generated Files folder. (You must manually add the CMD file, yourself.) In the System Tools chapter, we will point out a few more CDB objects. To get all the details on this tool, we recommend you attend the 4-day DSP/BIOS Workshop.
Lab Preparation
Before beginning Lab 1, you need to prepare your lab workstation. This involves:
Hooking up your DSK
Running the DSK Diagnostic Utility to verify the USB connection and DSK are working
Running CCS Setup to select the proper emulation driver (DSK vs. Simulator)
Starting CCS and setting a few environment properties
Computer Login
1. If the computer is not already logged-on, check to see if the log-on information is posted on the workstation. If not, please ask your instructor.
CCS Setup
While Code Composer Studio (CCS) has been installed, you will need to ensure it is set up properly. CCS can be used with various TI processors, such as the C6000 and C5000 families, and each of these has various target boards (simulators, EVMs, DSKs, and XDS emulators). Code Composer Studio must be properly configured using the CCS_Setup application. In this workshop, you should initially configure CCS to use either the C6713 DSK or the C6416 V1.1 DSK. Between you and your lab partner, choose one of the DSKs and the appropriate driver. The learning objectives will be the same whichever target you choose.
8. Start the CCS Setup utility using its desktop icon:
Be aware there are two CCS icons, one for setup, and the other to start the CCS application. You want the Setup CCS C6000 icon.
The setup program <cc_setup.exe> is installed to the hard drive for both the full and DSK versions of CCS, although the desktop icon and Start menu shortcut are only added when installing the full version of CCS. When installing the lab files for this workshop, for your convenience we also place an icon on the desktop. If, for some unexpected reason, this icon has been deleted, you can find and run the program from: c:\ti\cc\bin\cc_setup.exe
(where \ti\ is the directory you installed CCS)
9. When you open CC_Setup you should see a screen similar to this:
Note: If you don't see the Import Configuration dialog box, you should open it from the menu using File → Import. Once the Import Configuration dialog box is open, you can change the CC_Setup default to force this dialog to open every time you start CC_Setup. Just check the box at the bottom of the import dialog.
10. Clear the previous configuration. Before you select a new configuration you should delete the previous configuration. Click the Clear System Configuration button. CC_Setup will ask if you really want to do this, choose Yes to clear the configuration.
11. Select a new configuration from the list and click the Import button. If you are using the C6416 DSK in this workshop, please choose the C6416 V1.1 DSK:
If you are using the C6713 DSK in this workshop, please choose the C6713 DSK:
12. Save and Quit the Import Configuration dialog box.
13. Go ahead and start CCS upon exiting CCS Setup.
Here are a couple of options that can help make debugging easier. Unless you want the Disassembly Window popping up every time you load a program (which annoys many folks), deselect this option. Many find it convenient to choose the Perform Go Main automatically option. Whenever a program is loaded, the debugger will automatically run through the compiler's initialization code to your main() function.
1 - 36
16. Set Program Load Options
On the Program Load Options tab, select the two following options:
Load Program After Build
Clear All Breakpoints When Loading New Programs
By default, these options are not enabled, though a previous user of your computer may have already enabled them.
Conceptually, the CCS Integrated Development Environment (IDE) is made up of two parts:
Edit (and Build) programs: uses editor and code gen tools to create code.
Debug (and Load) programs: communicates with processor/simulator to download and run code.
The Load Program After Build option automatically loads the program (.out file) created when you build a project. If you disabled this automatic feature, you would have to manually load the program via the File → Load Program menu.
Note: You might even think of IDE as standing for Integrated Debugger Editor, since those are the two basic modes of the tool.
1 - 37
17. CCS Title Bar Properties
CCS allows you to choose what information you want displayed on its title bar.
Note: To reach this tab of the Customize dialog box, you may have to scroll to the right using the arrows in the upper right corner of the dialog.
We have chosen the Board Name, Current Project, and Currently loaded program. The first item allows you to quickly confirm the chosen target (simulator, DSK, etc.). The other two let us quickly determine which project is active and what program we have loaded. Notice how these correlate to the two parts of CCS: Edit and Debug. For our convenience we have also enabled the remaining two features on this dialog page.
Now that you're done with the Workstation Setup, please continue with the Lab 1 exercise.
These take-home (optional) exercises are provided, as well, for those of you who finish the lab early. If you do not get the chance to complete them during the assigned lab time, please try them at home.
Lab1a Customize Your Workspace
Lab1b Using GEL
Lab1c Try adding a printf() statement
Lab1d Fixed vs Floating Point
Lab1e Explore CCS Scripting
float y[3] = {0, 0.0654031, 0};
float A = 1.9957178;

short sineGen()
{
    y[0] = y[1] * A - y[2];
    y[2] = y[1];
    y[1] = y[0];
    return((short)(32000*y[0]));
}
There are many ways to create sine values; we have chosen this simple model based upon a marginally stable IIR filter.
block_sine.c
// ======== block_sine.c =================================
// The coefficient A and the three initial values
// generate a 500Hz tone (sine wave) when running
// at a sample rate of 48KHz.
//
// Even though the calculations are done in floating
// point, this function returns a short value since
// this is what's needed by a 16-bit codec (DAC).

// ======== Prototypes ===================================
void blockSine(short *buf, int len);
short sineGen(void);

// ======== Definitions ==================================
// Initial values
#define Y1 0.0654031    // = sin((f_tone/f_samp) * 360)
                        // = sin((500Hz / 48KHz) * 360)
                        // = sin (3.75)
#define AA 1.9957178    // = 2 * cos(3.75)

// ======== Globals =====================================
static float y[3] = {0,Y1,0};
static float A = AA;

// ======== sineGen ======================================
// Generate a single element of sine data
short sineGen(void)
{
    y[0] = y[1] * A - y[2];
    y[2] = y[1];
    y[1] = y[0];

    // To scale full 16-bit range we would multiply y[0]
    // by 32768; using a number slightly less than this
    // (such as 32000) helps to prevent overflow.
    y[0] *= 32000;

    // We recast the result to a short value upon returning it
    // since the D/A converter is programmed to accept 16-bit
    // signed values.
    return((short)y[0]);
}

// ======== blockSine ========
// Generate a block of sine data using sineGen
void blockSine(short *buf, int len)
{
    int i = 0;
    for (i = 0; i < len; i++) {
        buf[i] = sineGen();
    }
}
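The comment in sineGen() explains that scaling by 32000 instead of 32768 leaves headroom so the cast to short does not overflow. A different way to guard the conversion, shown here only as a sketch and not what the lab code does, is to clamp the scaled value before the cast. The helper name scale_to_short is hypothetical.

```c
#include <assert.h>

/* Scale a sample (nominally in the range -1.0 to 1.0) to a 16-bit signed
 * value, clamping anything that would not fit in a short. */
short scale_to_short(float sample, float scale)
{
    float v = sample * scale;
    assert(scale > 0.0f);
    if (v > 32767.0f)  return 32767;
    if (v < -32768.0f) return -32768;
    return (short)v;
}
```

With a scale of 32000 a full-scale sample of 1.0 becomes 32000, comfortably inside the 16-bit range; with a scale of 32768 the same sample would be clamped to 32767 rather than wrapping around.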
Lab1.c
// Include files
#include <c6x.h>        // C6000 compiler definitions
#include "lab1cfg.h"

// Declarations
#define BUFFSIZE 128

// Global Variables
static short gBuffer[BUFFSIZE];

// ======== main ========
// Simple function which calls blockSine
void main()
{
    blockSine(gBuffer, BUFFSIZE);   // Fill buffer with sine data
    return;
}
Lab 1 Procedure
Create the Lab1 project
1. Create a new project.
Create a new project C:\c60001day\labs\lab1\LAB1.PJT by choosing: Project → New
It should look like:
If using the C6713 DSK, the target should read TMS320C67XX.
2. Verify that the new project was created correctly.
Verify the newly created project is open in CCS by clicking on the + sign next to the Projects folder in the Project View window. Click again on the + sign next to lab1.pjt. If you don't see the new project, notify your instructor.
3. Create a new CDB file.
As mentioned during the discussion, configuration database files (*.CDB) control a range of CCS capabilities. In this lab, the CDB file will automatically create the reset vector and specify the memory to the linker. Create a new CDB file (DSP/BIOS Configuration) as shown:
When the dialog box appears, select the dsk6416.cdb (or dsk6713.cdb) template and click OK.
If using the C6713 DSK, choose the dsk6713.cdb file
Hint:
In some TI classrooms you may see two or more tabs of CDB templates; e.g. TMS62xx, TMS54xx, etc. If you experience this, just choose the C6x tab.
4. Save your CDB file.
File → Save As: C:\c60001day\labs\lab1\Lab1.CDB
Then, close the CDB Config Tool.
5. Add files to your project.
You can add files to a project in one of three ways:
Using one of these methods, add the following files from C:\c60001day\labs\lab1 to your project:
LAB1.C LAB1.CDB LAB1cfg.CMD block_sine.c
6. Verify your files were added to the project. The project should look similar to:
Choose one of the above methods and build your program. The Build Output window appears in the lower part of the CCS window. Note the build progress information. If you don't see "0 Errors, 0 Warnings, 0 Remarks", please ask your instructor for help.
9. Verify program is automatically loaded.
Since you enabled the Load Program After Build option (step 16, pg. 1-37), CCS should download the program lab1.out once it builds without errors.
The yellow arrow indicates the position of the program counter. Once the program is loaded, it should be pointing to the beginning of main(). Why? Setting the Perform Go Main Automatically option (step 15, pg. 1-36) causes CCS to run to main after the program is loaded. If we hadn't enabled this option, you could do it manually using the Debug → Go Main menu option.
Hint: While main( ) is the beginning of our code, there are many initialization steps that occur between reset and your main program. These issues are discussed in the various user guides and the 4-day workshops. Sorry, we don't have time for this detail today.
Watch Variables
10. Add gBuffer to the Watch window. Select and highlight the variable gBuffer in the lab1.c window. Right-click on gBuffer and choose Add to Watch Window. Note: The value shown for gBuffer will most likely differ from that shown below.
Adding a variable to the Watch window opens it automatically. Alternatively, you could have opened the Watch window, selected gBuffer, and dragged and dropped it onto the Watch 1 window. Click on the + sign next to gBuffer to see the individual elements of the array.
Note: At some point, if the Watch window shows an "unknown identifier" error for a variable, don't worry; it's probably due to the variable's scope. Local variables do not exist (and thus don't have a value) until their function is called. If requested, Code Composer will add local variables to the Watch window, but will indicate they aren't valid until the appropriate function is reached.
Click OK and resize the window so that you can see your code and the buffer. Because we have just come out of reset and this memory area was not initialized, you should see random values.
12. Record the address of the gBuffer array.
There are many ways to find this address. Two of them are:
The address shown for the +gBuffer value in the Watch Window; or
The address associated with gBuffer in the Memory View window
13. Initialize the gBuffer array to zero.
While not necessarily required since gBuffer will be overwritten by our code, let's go ahead and initialize it anyway. Select: Edit → Memory → Fill and fill in the following:
Address = gBuffer
Length = 64
Fill Pattern = 0
Click OK. The buffer was 128 16-bit values in length (they were defined as shorts in the C file). The fill memory function fills integer, or 32-bit values. Therefore, we only need to fill sixty-four 32-bit locations in order to zero out the 128x16 array.
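The length arithmetic above (128 shorts covered by 64 32-bit words) can be sketched in C. This is a host-side illustration under the assumption that a short is 16 bits, as on the C6000; it uses the fixed-width types from stdint.h to make that explicit, and the helper names are hypothetical, not CCS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit words needed to cover `count` 16-bit elements. */
size_t words_for_shorts(size_t count)
{
    return (count * sizeof(int16_t) + sizeof(uint32_t) - 1) / sizeof(uint32_t);
}

/* Fill a buffer of 16-bit values by writing whole 32-bit words.
 * `count` must be even so the words cover the buffer exactly. */
void fill_words(int16_t *buf, size_t count, uint32_t pattern)
{
    size_t i;
    size_t words = words_for_shorts(count);
    assert(count % 2 == 0);
    for (i = 0; i < words; i++) {
        /* memcpy writes one word over two adjacent shorts */
        memcpy((unsigned char *)buf + i * sizeof pattern, &pattern, sizeof pattern);
    }
}
```

For the 128-element gBuffer, words_for_shorts(128) returns 64, which matches the Length entered in the Fill dialog.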
Single-Stepping Code
14. Click on the Watch Locals tab of the Watch window.
15. Single-step through your code.
Single-step the debugger until you reach the blockSine() function; it contains local variables. Use the toolbar or the Debug menu.
Once you have single-stepped to the for loop, you'll notice that Watch Locals will look similar to:
If you cannot find it, it can be opened from the View menu: View → Debug Toolbars → Multiple Operations
17. Set the Multiple Operations values as shown below and execute.
Source Step Into 8 Execute
Setting Breakpoints
While single-stepping is quite useful, it can take a long time to get to the end of your program. A faster way to accomplish this is to set a breakpoint (a marker which tells the processor to stop) and use the RUN command.
18. Set a breakpoint.
Set a breakpoint on the return; command in main( ). Breakpoints can be set in 3 different ways. Choose the one you like best and set the breakpoint:
Place the cursor on the end brace of main() and click on the breakpoint toolbar icon
Right-click on the line with the end brace and choose Toggle Breakpoint
Double-click in the grey area next to the end brace (as shown below):
Running Code
19. Run your code.
Run the code up to the breakpoint. There are 3 different ways to cause CCS to run your code:
Use the toolbar icon
Select: Debug → Run
Press F5
The processor will halt at the breakpoint that you've set. Notice that the Watch window changes to show the new values of gBuffer[]. You may have to click on the + sign next to gBuffer to see the values.
Hint: Code Composer allows you to collapse and expand aggregate data types (structures, arrays, etc.).
Graphing Data
22. Graph your sine data.
The Watch window is a great way to view data in CCS. But can you tell if this is really a sine wave? Wouldn't it be better to see this data graphed? Well, CCS allows us to do this. Select: View → Graph → Time/Frequency
Modify the following values:
Graph Title                 gBuffer
Start Address               gBuffer
Acquisition Buffer Size     128
Display Data Size           128
DSP Data Type               16-bit signed integer
Sampling Rate               49152
23. Other graphing features
CCS supports many different graphing features: time frequency, FFT magnitude, dual-time, constellation, etc. The sine wave that we generated was a 500Hz wave sampled at 48KHz. Let's use the FFT magnitude plot to see the fundamental frequency of the sine wave. Right-click on the graphical display of gBuffer and select Properties. Change the display type to FFT Magnitude and click OK. You can now see the frequency spectrum of the wave.
24. Save your workspace again.
This will also save your graph window to the workspace.
End of Lab1
We highly recommend trying the first couple optional exercises, if time is still available. Before going on, though, please let your instructor know when you have reached this point.
c:\c60001day\labs
GEL Scripting
GEL: General Extension Language
C style syntax
Large number of debugger commands as GEL functions
Write your own functions
Create GEL menu items
4. Create a new menu item
In the new GEL file, let's create a new menu item (that will appear in the CCS GEL menu) called "My GEL Functions". Type the following into the file:
    menuitem "My GEL Functions";
(Don't forget the semicolon; as with C, it's important!)
You can access all of the pre-defined GEL commands by accessing: Help → Contents. Select the Index tab and type the word GEL.
5. Create a submenu item to clear our arrays
The menuitem command that we used in the previous step will place the title "My GEL Functions" under the GEL menu in CCS. When you select this menu item, we want to be able to select different operations. Submenu items are created with the hotmenu command. Enter the following into your GEL file to create a submenu item to clear the memory array:
    hotmenu ClearArray()
    {
        GEL_MemoryFill(gBuffer, 0, 64, 0x0);
    }
The MemoryFill command requires the following info:
Address
Type of memory (data memory = 0)
Length (# of 32-bit values)
Memory fill pattern
This example will fill our array (gBuffer) with zeros. For more info on GEL and GEL_ commands, please refer to the CCS help file.
6. Add a second menu item to fill the array
In this example, we want to ask the user to enter a value to write to each location in memory. Rather than using the hotmenu command, the dialog command allows us to query the user. Enter the following:
    dialog FillArrays(fillVal "Fill Array with:")
    {
        GEL_MemoryFill(gBuffer, 0, 64, fillVal);
    }
7. Save then Load your new GEL file
To use a GEL file, it must be loaded into CCS. When loaded, it shows up in the CCS Explorer window in the GEL folder.
File → Save
File → Load GEL and select your GEL file
8. Show gBuffer array in Memory window
Without looking at the arrays, it will be hard to see the effect of our scripts. Let's open a Memory window to view gBuffer. View → Memory
Title:      gBuffer
Address:    gBuffer
Q-Value:    0
Format:     16-bit hex TI style
A couple of notes about memory windows:
C Style adds 0x in front of the number; TI Style doesn't.
Select the Format based on the data type you are interested in viewing. This will make it easier to see your data.
9. Now, try the GEL functions.
GEL → My GEL Functions → ClearArray
GEL → My GEL Functions → FillArrays
You can actually use this GEL script throughout the rest of the workshop. It is a very handy tool. Feel free to add or delete commands from your new GEL file as you do the labs.
10. Review loaded GEL files.
Within the CCS Explorer window (on the left), locate and expand the GEL files folder. CCS lists all loaded GEL files here.
Hint: If you modify a loaded GEL file, you must reload it before you can use the modifications. The easiest way to reload a GEL file: (1) Right-click the GEL file in the CCS Project Explorer window (2) Pick Reload from the right-click popup menu
#include <stdio.h>                       // 1.
short func1(short *m, short count);
short a[4] = {40,39,38,37};
int y = 0;
main()
{
    y = function();
    printf("y = %x hex\n", y);           // 2.
}
1. Open lab1.pjt project, if it is not still open.
2. Open the lab1.c file by double-clicking on it in the Project Explorer window.
3. To use printf():
First you must remember to include the header file as in step #1 in the above graphic. Next, you must add the printf() command to your C file. For example, try adding a simple printf() to main.

void main()
{
    blockSine(gBuffer, BUFFSIZE);   // Fill buffer with sine data
    printf("gBuffer (at location 0x%x) was filled with sine values\n", gBuffer);
    return;
}

4. Build and load the .OUT file.
When you build and load this program, the Build/Messages window will add a third tab called Stdout which will contain the output from printf().
5. Verify that it works.
This can be done by viewing the printed statement in the Stdout tab of the Output window.
6. Close the project.
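The lab's printf() passes the array address to a %x conversion, which works on the 32-bit C6000 target. On a host compiler the portable way to print an address is the %p conversion with a void pointer. The sketch below illustrates that variant only; format_location is a hypothetical helper, and the exact text produced by %p is implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Format "<name> (at location <address>)" into out. Returns the number of
 * characters that the full string needs (excluding the terminating null),
 * as snprintf does. */
int format_location(char *out, size_t size, const char *name, const void *addr)
{
    assert(out != NULL && size > 0);
    /* %p expects a void pointer; its exact text is implementation-defined */
    return snprintf(out, size, "%s (at location %p)", name, (void *)addr);
}
```

A caller could then pass the resulting string to printf() or puts() in place of the hand-written format in the lab.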
You will find LAB1d.PJT already built in the LAB1d folder: C:\c60001day\labs\lab1d\ Try running the project and comparing all three results in three different graphs. To simplify setting up the graph windows, try using the provided workspace LAB1d.wks.
Lab Debrief
Lab 1 Debrief
1. What differences are there in Lab1 between the C6713 and C6416 solutions?
2. What do we need CCS Setup for?
3. Why did we return from main?
4. What did you have to add to LAB1.C to get printf to work?
5. Did you find the ClearArray GEL menu command useful?
Optional Topics
Optional Topic: CCS Automation
As evidenced by the optional lab exercise, CCS provides scripting/automation tools. They are mentioned here to make you aware of their presence. To explore them further, please examine the online documentation.
Command Window
For those of you ol' timers who remember the old command line debugging tools, you can use the same commands you've used for years.
GEL Scripting
GEL: General Extension Language
C style syntax
Large number of debugger commands as GEL functions
Write your own functions
Create GEL menu items
Notice the GEL folder in the Project View window. You can load/unload GEL scripts by right-clicking this window. GEL syntax is very C-like. Notice that QuickTest() calls LED_cycle(), defined earlier in the file. (This happens to be a C6711 DSK GEL script.) You can add items to the GEL menu. An example is shown in the above graphic. Finally, a GEL file can be loaded upon starting CCS. The startup GEL script is specified using the CCS Setup application.
CCS Scripting
CCS Scripting is a CCS plug-in. After installing CCS on your PC, you should use the Update Advisor feature (available from the Help menu) to download and add the CCS Scripting plug-in.
Hint: You may find other useful tools, application notes, and plug-ins available via the CCS Update Advisor.
CCS scripting provides a method of controlling the CCS debugger from another scripting language. Any Microsoft COM (i.e. OLE) compliant language should be able to use the CCS Scripting library, but VB Script and Perl are the two languages for which examples are provided. The graphic below is an example of a VB Script using CCS Scripting:
CCS Scripting
Debug using VB Script or Perl
Using CCS Scripting, a simple script can:
Start CCS
Load a file
Read/write memory
Set/clear breakpoints
Run, and perform other basic testing functions
Among other things, CCS Scripting is very useful for testing purposes. For example, if you have a number of test vectors you would like to run against your system, you can use CCS Scripting to automate this process. Your script could then:
Build
Run
Capture data, memory values, benchmarks
And compare the results against what you expect (or hope)
Over and over again
At this time, the CCS Scripting Plug-in (v1.2) only ships with C5000 based examples. For your convenience, we have written and included some C6000 based examples along with the workshop lab files.
utils.loadPlatform("dsk6211");   /* load DSK6211 platform into TCOM */
utils.getProgObjs(prog);         /* make all prog objects JavaScript global vars */
LOG_system.bufLen = 128;         /* set buffer length of LOG_system to 128 */
utils.importFile("hello");       /* import portable application script */
prog.gen();                      /* generate cfg files (and CDB file) */

Tconf Include File (.tci)
hello.tci
/* create a new user log, named trace */
/* initialize its length to 32 (words) */

hello.c
int main()
{
    LOG_printf(&trace, "Hello World!\n");
    return (0);
}

A textual way to configure CDB files
Runs on both PC and Unix
Create #include type files (.tci)
More flexible than Config Tool
Some users find writing code preferable to using the Graphical User Interface (GUI) of the Configuration Tool. This is especially true for users who build their code in the Unix environment, as there is no Unix version of the GUI.
[Figure: block diagram with DSP and DAC]
$Y = \sum_{i=1} \mathrm{coeff}_i \cdot x_i$
1 - 68
Optional Topics
CPU Architecture
What is the core part of DSP algorithms? In layman's terms, you might say it's the Sum of Products (SOP) or Multiply-Accumulate (MAC).
$y = \sum_{n=1}^{40} c_n \cdot x_n$
The C6000: designed to handle DSP's math-intensive calculations, with a multiplier (.M) and an ALU (.L):
    MPY  .M  c, x, prod
    ADD  .L  y, prod, y
The C6000 CPU has a separate Multiply (.M) unit, along with an arithmetic logic unit (.L). The variables operated upon by the CPU are stored in a register file. Register file A holds 16 or 32 registers, depending upon which C6000 CPU you are using.
[Slide: Register File A (16 or 32 registers, 32 bits each) holds c, x, prod, and y; the .M and .L units operate on them]
$y = \sum_{n=1}^{40} c_n \cdot x_n$
    MPY  .M  c, x, prod
    ADD  .L  y, prod, y
The heart of the Sum of Products routine is easily handled by these two units, as shown above with the Multiply (MPY) and Add (ADD) instructions. To make this into a real Sum of Products, though, we need to put them into a loop.
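In C terms, the MPY/ADD pair is one multiply-accumulate step, and the Sum of Products is that step repeated for every coefficient/sample pair. The sketch below is an illustration in plain C with hypothetical function names; the comments show which instruction each statement corresponds to.

```c
#include <assert.h>

/* One multiply-accumulate step: the MPY followed by the ADD. */
int mac(int y, short c, short x)
{
    int prod = c * x;   /* MPY  c, x, prod */
    return y + prod;    /* ADD  y, prod, y */
}

/* Sum of products over n coefficient/sample pairs. */
int sum_of_products(const short *c, const short *x, int n)
{
    int y = 0;
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        y = mac(y, c[i], x[i]);
    return y;
}
```

For example, coefficients {1, 2, 3} and samples {4, 5, 6} give 1*4 + 2*5 + 3*6 = 32.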
Making Loops
1. Program flow: the branch instruction
    B loop
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK  .S  40, cnt
loop:   MPY  .M  c, x, prod
        ADD  .L  y, prod, y
        SUB  .L  cnt, 1, cnt
        B    .S  loop
Register File A: 16 or 32 registers, 32 bits each
If you (or the compiler) were coding for the C64x, you could optimize the code using the Branch with Decrement (BDEC) instruction. When using a standard branch (B), though, how can we tell our loop counter has reached zero and that we can stop branching and move on?
    [condition] B loop
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK  .S  40, cnt
loop:   MPY  .M  c, x, prod
        ADD  .L  y, prod, y
        SUB  .L  cnt, 1, cnt
 [cnt]  B    .S  loop
Register File A: 16 or 32 registers, 32 bits each
A great thing about the C6000 is that all instructions allow for [conditional] execution. While this may not sound that cool at first, it can make a tremendous difference in how efficiently you can code a hardware pipelined processor.
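The conditional branch can be mirrored in C with a down-counter and a goto, so that each statement lines up with one instruction of the loop. This is only a sketch to show the control flow; sop_countdown is a hypothetical name, and ordinary C code would use a for loop instead.

```c
#include <assert.h>

/* Sum of products written so each statement mirrors one loop instruction. */
int sop_countdown(const short *c, const short *x, int count)
{
    int cnt = count;          /* MVK  count, cnt  */
    int y = 0;
    int i = 0;
    assert(count > 0);        /* like the assembly, the body runs at least once */
loop:
    y += c[i] * x[i];         /* MPY then ADD     */
    i++;
    cnt = cnt - 1;            /* SUB  cnt, 1, cnt */
    if (cnt) goto loop;       /* [cnt] B loop: branch only while cnt is non-zero */
    return y;
}
```

When cnt reaches zero the condition is false, the branch is not taken, and execution falls through to the code after the loop.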
Since a register can only hold a single value at a time, we have to load our variable registers each time through the loop. The C6000 has a fourth functional unit to manage data loads and stores (.D). We use the pointer concept in assembly code, just as you might in C code. The pointer indicates where the data array exists in memory; that is, where we load the data from.
$y = \sum_{n=1}^{40} c_n \cdot x_n$
        MVK  .S  40, cnt
loop:   LDH  .D  *cp, c
        LDH  .D  *xp, x
        MPY  .M  c, x, prod
        ADD  .L  y, prod, y
        SUB  .L  cnt, 1, cnt
 [cnt]  B    .S  loop
Note: No restrictions on which regs can be used for address or data!
Loads can be performed in many different widths, depending upon your chosen data type.
Instr.   Description         C Type   Size
LDB      load byte           char     8-bits
LDH      load half-word      short    16-bits
LDW      load word           int      32-bits
LDDW*    load double-word    double   64-bits
* Only available on the C64x and C67x

Data Memory: x(40), a(40), y
Register File A (16 or 32 registers): c, x, cnt, prod, y, *cp, *xp, *yp
        MVK  .S  40, cnt
loop:   LDH  .D  *cp, c
        LDH  .D  *xp, x
        MPY  .M  c, x, prod
        ADD  .L  y, prod, y
        SUB  .L  cnt, 1, cnt
 [cnt]  B    .S  loop
Since we are loading data from arrays in memory, how can we increment the pointers each time through the loop? Again, we use the same increment (++) syntax used by the C language. In this case, the ++ comes after the pointer to indicate we are incrementing the address value contained in the pointer (after using the current value).
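The same post-increment idiom in C looks like the sketch below. The function name is hypothetical; the comments show the assembly operand forms from the slides. The result is written through a third pointer, yp, so the function returns nothing.

```c
#include <assert.h>

/* Sum of products using post-incremented pointers; the result is
 * stored through yp. */
void sop_pointers(const short *cp, const short *xp, int *yp, int count)
{
    int y = 0;
    int cnt;
    assert(count >= 0);
    for (cnt = count; cnt != 0; cnt--) {
        short c = *cp++;   /* LDH *cp++, c : load, then advance the pointer */
        short x = *xp++;   /* LDH *xp++, x */
        y += c * x;        /* MPY and ADD  */
    }
    *yp = y;               /* store the result to memory */
}
```

Each *cp++ reads the value the pointer currently addresses and then moves the pointer to the next array element, which is exactly the "use, then increment" order described above.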
Auto-Increment of Pointers
Register File A (16 or 32 registers): c, x, cnt, prod, y, *cp, *xp, *yp

$y = \sum_{n=1}^{40} c_n x_n$

        MVK   .S   40, cnt
loop:   LDH   .D   *cp++, c
        LDH   .D   *xp++, x
        MPY   .M   c, x, prod
        ADD   .L   y, prod, y
        SUB   .L   cnt, 1, cnt
  [cnt] B     .S   loop
Finally, we use a third pointer to store the final result back into our resultant variable.
$y = \sum_{n=1}^{40} c_n x_n$

        MVK   .S   40, cnt
loop:   LDH   .D   *cp++, c
        LDH   .D   *xp++, x
        MPY   .M   c, x, prod
        ADD   .L   y, prod, y
        SUB   .L   cnt, 1, cnt
  [cnt] B     .S   loop

After the loop, a store on the .D unit writes the result to memory (operands: y, *yp).
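For readers more at home in C, the same sum of products can be sketched as a C function. This is only an illustration of the pointer and auto-increment ideas above, not the workshop's own code: the function name dot_product and its parameter list are hypothetical, and the comments show roughly which assembly instruction each C statement corresponds to. One difference worth noting is that the assembly loop always executes its body at least once before testing cnt, while this C loop tests first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: y = sum of c[n] * x[n] for n = 0 .. cnt-1. */
void dot_product(const short *cp, const short *xp, int *yp, int cnt)
{
    int y = 0;

    assert(cp != NULL && xp != NULL && yp != NULL);

    while (cnt > 0) {            /* [cnt] B loop              */
        short c = *cp++;         /* LDH *cp++, c              */
        short x = *xp++;         /* LDH *xp++, x              */
        int prod = c * x;        /* MPY c, x, prod            */
        y += prod;               /* ADD y, prod, y            */
        cnt -= 1;                /* SUB cnt, 1, cnt           */
    }
    *yp = y;                     /* store y through pointer yp */
}
```

Calling it with two 40-element short arrays and the address of an int result mirrors the slide: the two input pointers walk the arrays, and the third pointer receives the final sum.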
So far, we've only told you a half-truth. In reality, the C6000 has eight functional units, rather than four. Also, there are two register sets of 16 or 32 registers each.
[Slide: Register File A serves .S1, .M1, .L1, .D1; Register File B (B0, B1, ... B15 or B31, each 32 bits wide) serves .S2, .M2, .L2, .D2.]
As you will see later, having both sets of functional units can dramatically improve our processor's performance.
In the assembly coding we've examined thus far, we have used symbols (i.e. labels) to specify registers. This is the preferred method of coding when using Linear Assembly code as described in this module. You also have the option to specify specific registers and/or functional units (such as .S1) if you wish to provide constraints to the Assembly Optimizer.

$y = \sum_{n=1}^{40} c_n x_n$

        MVK   40, A2             ; cnt  = A2
loop:   LDH   *A5++, A0          ; cp   = A5, c = A0
        LDH   *A6++, A1          ; xp   = A6, x = A1
        MPY   A0, A1, A3         ; prod = A3
        ADD   A4, A3, A4         ; y    = A4
        SUB   A2, 1, A2
   [A2] B     loop

The final store takes the operands A4, *A7 (yp = A7).

It's easier to use symbols rather than register names, but you can use either method.
[Slide: C62x instructions by functional unit. External Memory connects over Internal Buses to Register Set A and Register Set B, which feed the .S, .L, .D and .M units.]
.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
No Unit Used: NOP, IDLE
The C67x adds a whole set of floating-point instructions to the C62x capabilities:
[Slide: functional units .S, .L, .D, .M]
.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
No Unit Required: NOP
[Slide: additional instructions by category; units shown: .L, .D, .M]
Dual/Quad Arith: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4
Bitwise Logical: ANDN
Shift & Merge: SHLMB, SHRMB
Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4
Average: AVG2, AVG4
Shifts: ROTL, SSHVL, SSHVR
Multiplies: MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4
Load Constant: MVK (5-bit)
Bit Operations: BITC4, BITR, DEAL, SHFL
Move: MVD
XPND2/4
TI C62x Compiler Performance
Release 4.0: Execution Time in µs @ 300 MHz; efficiency versus hand-coded assembly based on cycle count
Completely natural C code (non C6000 specific)
Code available at: http://www.ti.com/sc/c6000compiler

Algorithm: description (used in)
Block Mean Square Error: MSE of a 20 column image matrix (for motion compensation of image data)
Codebook Search (CELP based voice coders)
Vector Max: 40 element input vector (search algorithms)
All-zero FIR Filter: 40 samples, 10 coefficients (VSELP based voice coders)
Minimum Error Search: Table Size = 2304 (search algorithms)
IIR Filter: 16 coefficients (filter)
IIR cascaded biquads: 10 cascaded biquads, Direct Form II (filter)
MAC: Two 40 sample vectors (VSELP based voice coders)
Vector Sum: Two 44 sample vectors
MSE: MSE between two 256 element vectors (Mean Sq. Error computation in Vector Quantizer); 279 and 274 cycles, 0.93 and 0.91 µs

Efficiency figures as listed on the slide: 87%, 100%, 100%, 85%, 90%, 100%, 93%, 100%, 100%, 100%
The following sample of benchmarks shows the performance for both the C62x and C64x. While the C62x is no slouch in performance, the C64x is just that much better. At 720MHz today, with 1GHz speeds already demonstrated, the C64x is the family to use for extreme performance.
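[Table: C62x versus C64x benchmarks. Columns: Cycle Count (C62x, C64x) and Performance (Cycle Improvement C64:C62; 720MHz C64x vs 300MHz C62x). Row units: cycles/packet, cycles/output, cycles/output/filter tap, cycles/pixel. Values shown: 1680, 38.25, 9.0, 0.953, 0.126. Footnote: Includes traceback.]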
4. What did you have to add to LAB1.C to get printf to work?
   - A reference to the standard C I/O header file <stdio.h>
   - The printf() statement itself
We hope so!
Using Peripherals
Introduction
A big part of any design is getting data in and out of the processor. Configuring and using peripherals has often been one of the most tedious chores. To this end, TI has created a library of functions, data types, and macros called the Chip Support Library (CSL). This library can replace much of what you might have otherwise needed to write on your own.

As the name implies, the Chip Support Library handles the peripheral resources found on-chip. For TI-produced development boards (like the C6416 and C6713 DSKs), a Board Support Library (BSL) is also provided. Similar to the CSL, it provides code for using the peripheral resources contained on the board, but outside of the DSP chip.

In the next chapter, we briefly discuss how the CSL and BSL can be used to build an encapsulated device driver. In this chapter we'll use these software libraries directly, in order to output the sine wave we created in the previous lab exercise.
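[Figure: the McBSP transmit interrupt triggers a CPU HWI that runs sineGen and calls DSK6416_AIC23_write(), which sends the sample through the McBSP to the AIC23 codec.]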
Chapter Outline
Audio Output (McBSP, Codec)
  McBSP is connected to the Codec
  McBSP: a closer look
Using CSL & BSL
Hardware Interrupts (HWI)
Lab 2: Output a Sinewave Tone
Chapter 2 Topics
Using Peripherals                                   2-1
  Audio Output: McBSP and the AIC23 Codec           2-4
    C6416 DSK: McBSP-Codec Interface                2-4
    C6713 DSK: McBSP-Codec Interface                2-5
    McBSP Block Diagram                             2-6
  Programming Peripherals with CSL and BSL          2-8
    What is CSL and BSL?                            2-8
    Generic Procedure for CSL and BSL               2-10
    CSL and BSL Documentation                       2-13
  Hardware Interrupts (HWI)                         2-14
    Enabling Interrupts                             2-15
  Lab 2                                             2-16
    The Paperwork                                   2-17
    Lab2 Procedure                                  2-21
    Lab2a (optional)                                2-27
  Lab 2 Debrief                                     2-28
C6416 DSK: McBSP-Codec Interface
[Figure: the CPU reaches the AIC23 codec through two McBSPs; McBSP1 carries control, McBSP2 carries data.]
The DSK uses two McBSPs to talk with the AIC23 codec: one for control, another for data.
- McBSP1 is connected to program the AIC23's control registers
- McBSP2 is used to transfer data to the A/D and D/A converters
- Programmable frequency: 8K, 16K, 24K, 32K, 44.1K, 48K, 96K
- 24-bit converter; digital transfer widths: 16-bits, 20-bits, 24-bits, 32-bits

C6713 DSK: McBSP-Codec Interface
- McBSP0 is connected to program the AIC23's control registers
- McBSP1 is used to transfer data to the A/D and D/A converters
- Programmable frequency: 8K, 16K, 24K, 32K, 44.1K, 48K, 96K
- 24-bit converter; digital transfer widths: 16-bits, 20-bits, 24-bits, 32-bits

Notice that the two DSKs use different McBSPs to communicate with the codec. Other than this, the two boards' audio output works in exactly the same way.
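[Figure: McBSP block diagram. Receive path: DR pin -> RSR -> RBR -> Expand (optional) -> DRR (32 bits). Transmit path: DXR -> Compress (optional) -> XSR -> DX pin. DRR and DXR are accessed by the CPU or the DMA.]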
Additional Background graphics for McBSP
The first slide shows what hardware event causes the Receive and Transmit interrupts.

McBSP Interrupts
- Receive side (RBR -> DRR): RRDY=1 means "Ready to Read"
- Transmit side (DXR -> XSR): XRDY=1 means "Ready to Write"
- RRDY & XRDY in the McBSP control register display the status of the read and transmit ports: 0 = not ready, 1 = ready to read/write
- To the CPU these appear as RINT and XINT; to the DMA they appear as REVT and XEVT
- In Lab 2: XRDY generates the McBSP transmit interrupt (XINT2) to the CPU when DXR is emptied (and ready for a new value)
- In Lab 3 (IOM Device Driver): XRDY generates a transmit event to the EDMA when the DXR is ready for a new value
The following two slides provide a basic description of the McBSP's synchronous, serial data transfer.

Serial Port registers: SP Ctrl (SPCR), Rcv Ctrl (RCR), Xmt Ctrl (XCR), Rate (SRGR), Pin Ctrl (PCR)

Bit: one data bit per SP clock period
Word (or channel): contains the number of bits specified by WDLEN1 (8, 12, 16, 20, 24, 32)
  RWDLEN1: RCR bits 7-5
  XWDLEN1: XCR bits 7-5
Frame: contains one or multiple words; FRLEN1 specifies the number of words per frame (1-128)
  RFRLEN1: RCR bits 14-8
  XFRLEN1: XCR bits 14-8
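To make the field layout concrete, here is a minimal host-side sketch that packs a frame length and word length into the bit positions listed above (FRLEN1 in bits 14-8, WDLEN1 in bits 7-5). The function names are hypothetical and are not CSL calls. The sketch also assumes the usual McBSP encodings, namely that the word-length code runs 0 to 5 for 8, 12, 16, 20, 24 and 32 bits and that the frame-length field holds the word count minus one; please confirm both against the McBSP reference guide before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a word length in bits to the 3-bit WDLEN code.
   Assumed encoding: 8->0, 12->1, 16->2, 20->3, 24->4, 32->5. Returns -1 otherwise. */
int mcbsp_wdlen_code(int bits)
{
    switch (bits) {
    case 8:  return 0;
    case 12: return 1;
    case 16: return 2;
    case 20: return 3;
    case 24: return 4;
    case 32: return 5;
    default: return -1;
    }
}

/* Hypothetical helper: build the phase-1 fields shared by RCR and XCR.
   FRLEN1 occupies bits 14-8 (assumed to hold words-per-frame minus one),
   WDLEN1 occupies bits 7-5. */
uint32_t mcbsp_phase1_fields(int words_per_frame, int bits_per_word)
{
    int code = mcbsp_wdlen_code(bits_per_word);

    assert(words_per_frame >= 1 && words_per_frame <= 128);
    assert(code >= 0);

    return ((uint32_t)(words_per_frame - 1) << 8) | ((uint32_t)code << 5);
}
```

In real code you would normally let the CSL or BSL build these register values for you; the point here is only to show how the two fields sit side by side in one control register.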
CSL Benefits
Why CSL and BSL? Here are a few reasons:
Increased Portability
Supports all C6000 DSPs. When changing from one device to another, no (or little) re-coding is required for peripherals.
Where possible, TI has used the same APIs for both the C5000 and C6000 DSP families. This makes porting C code between processors much easier. Taking into account the cross-platform support of DSP/BIOS makes TI's software tools quite powerful. The goal is to provide compatibility at the _open(), _close(), _config() level. The initialization data structures may be different, but we have striven to make the functions as compatible as possible.
Easier to use
When TI's DSP 3rd parties and customers use the same CSL/BSL functions, it becomes easier to use and understand code written by others.
Additionally, suggestions and recommendations come from a large base of knowledgeable users.
Timer Example:
2. TIMER_Handle myHandle;
   TIMER_Config myConfig = {control, period, counter};
3. myHandle = TIMER_open(TIMER_DEVANY, ...);
4. TIMER_config(myHandle, &myConfig);
5. TIMER_start(myHandle);

To some this syntax will appear quite familiar. To those of us who spent most of our careers writing assembly language, though, this may be a new method of programming. These libraries provide two levels of support:
- A set of macros, functions, and data structures to ease symbolic programming of the resource (module).
- Basic resource management.

Let's see how the five parts of the timer example shown above correlate to these two ideas.
1. Include the appropriate header files.
As most of you already know, whenever you use a library, there are usually one or more header files you have to include. In the case of CSL, you must first include the general CSL.H file, and then the header file for each module of functions.
2. The first two lines of the example define the required data structures. The data type called TIMER_Handle is defined in CSL. Essentially, it is used to point to one of the timers (as we will see later). Not all CSL modules require the use of a handle (i.e. pointer), only those peripherals where there is more than one resource: for example, timers, DMA, EDMA, McBSP, etc. The handle is used to specify which one of the timers you are working with.
The second line defines a data structure. The variable name is myConfig and its data type is TIMER_Config. Again, this data type is defined in the CSL. This variable represents a C data structure that will be used to define a timer configuration. In other words, all the values you would need to program the timer peripheral are stored in this structure.
Note: The myHandle and myConfig names are arbitrary. We could have called them julie and frank. The choice is yours.
3. The third line of the example contains code that opens the peripheral. In this case, open means two things:
- The CSL code checks to see if the specified resource has already been opened. In other words, is the resource available? In this example, we have requested TIMER_DEVANY, which means we are asking for any available timer.
- If it is available, the timer resource is marked as being used and a pointer to the specific timer is returned as a TIMER_Handle. If the specific resource has already been opened, then the function returns INV for invalid. Your code could check if the INV error code has been returned. (We didn't do that in our example since there wasn't much room on the slide.)

Where does the CSL keep track of opened resources? The CSL maintains a series of data elements (you could think of them as flags) to keep track of this information. Could you have done this? Yes, you probably could; but isn't it nice to have this code already written for you? Even further, if you later decide to use this code on another C6000 processor, you won't have to find and change all the resource management code. You only need to indicate to the CSL that you have switched chips and the rest is done for you automatically.

4. The fourth line of the example configures the peripheral. In this case, the timer specified by myHandle is configured with the myConfig data structure. The actual CSL code copies each of the values in the myConfig data structure to the appropriate memory-mapped peripheral registers. (How many times have I had to write this kind of code in assembly? I'd be poring over the reference guide trying to type in all the bits and memory addresses without making a typo, which, of course, happened too often.)

5. Finally, the last line of code is an example of how to use the peripheral. There are many functions that allow you to easily use the peripheral. In the case of the timer, we can: start, stop, pause, etc. With the McBSP serial port, you could: read, write, reset, etc.

Even if you have never written code along these lines before, you will find it quickly becomes second nature. And if ease-of-use wasn't enough reason to use CSL, the reliability, portability, and optimization features of CSL will make you never want to go back to the old ways.
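The open/config behavior described in steps 3 and 4 can be pictured with a small host-side mock. This is not the CSL source and none of these names exist in CSL; MockTimerHandle, mock_timer_open and the rest are hypothetical stand-ins, and the "registers" are just a C array rather than memory-mapped hardware. The sketch shows the two ideas from the text: flags that record which resource is in use, and a config call that copies a structure into the device registers.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_TIMER_COUNT   2
#define MOCK_TIMER_DEVANY  (-1)

/* Stand-in for one timer's memory-mapped registers (hypothetical layout). */
typedef struct { unsigned ctl, prd, cnt; } MockTimerRegs;

/* Stand-in for a configuration structure: control, period, counter. */
typedef struct { unsigned control, period, counter; } MockTimerConfig;

typedef MockTimerRegs *MockTimerHandle;
#define MOCK_INV ((MockTimerHandle)0)      /* "invalid" handle */

static MockTimerRegs mock_regs[MOCK_TIMER_COUNT];
static int mock_in_use[MOCK_TIMER_COUNT];  /* the resource-management flags */

/* Open a specific timer, or any free one when dev is MOCK_TIMER_DEVANY. */
MockTimerHandle mock_timer_open(int dev)
{
    int i;

    if (dev == MOCK_TIMER_DEVANY) {
        for (i = 0; i < MOCK_TIMER_COUNT; i++) {
            if (!mock_in_use[i]) {
                mock_in_use[i] = 1;
                return &mock_regs[i];
            }
        }
        return MOCK_INV;                   /* every timer is already open */
    }
    if (dev < 0 || dev >= MOCK_TIMER_COUNT || mock_in_use[dev]) {
        return MOCK_INV;
    }
    mock_in_use[dev] = 1;
    return &mock_regs[dev];
}

/* Copy each configuration value into the matching register. */
void mock_timer_config(MockTimerHandle h, const MockTimerConfig *cfg)
{
    assert(h != MOCK_INV && cfg != NULL);
    h->ctl = cfg->control;
    h->prd = cfg->period;
    h->cnt = cfg->counter;
}
```

With two mock timers, a third open request comes back as MOCK_INV, which is the check the slide's example left out for lack of room.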
BSL Documentation
DSK Board Support Library
- C6416DSK.HLP / C6713DSK.HLP: the BSL help file
- Review the header source files (*.h), e.g.
  C:\ti\c6000\dsk6416\include\dsk6416_<mod>.h
  C:\ti\c6000\dsk6416\include\dsk6416_aic23.h
When an interrupt occurs (step 1), the corresponding bit is set in the Interrupt Flag Register (step 2). If the interrupt is enabled as shown in the next figure, the CPU will automatically acknowledge and respond to the interrupt (step 3 above). Finally, the process reaches the Interrupt Service Routine (ISR), which you have to write.
The ISR can be written in assembly or C, though nowadays most programmers choose to write their routines in C. There are a few methods of handling the context save and restore within your ISR; in this workshop we will show you the easiest, most robust method: the DSP/BIOS Hardware Interrupt Dispatcher.
Enabling Interrupts
[Figure: Receiving interrupts. Sources such as ext int pin 4 and McBSP1 xmit set bits in the IFR (Interrupt Flag); each interrupt then passes through its IER bit (Individual Enable) and the GIE bit (Master Enable) before reaching the C6000 CPU.]
- Interrupt Flag Reg (IFR): bit set when an interrupt occurs
- Interrupt Enable Reg (IER): enables individual interrupts
    IRQ_enable(IRQ_EVT_XINT2)
    IRQ_enable(IRQ_EVT_XINT1)
- Global Interrupt Enable (GIE) bit in the Control Status Reg: enables all IER-enabled interrupts
    IRQ_globalEnable()
    IRQ_globalDisable()
The above diagram shows the logic flow an interrupt signal goes through to reach the CPU. As you can see, there are two switches that must be enabled for the CPU to respond to an interrupt:
- Individual enable
- Global enable

The diagram also shows the CSL functions that can be used to enable interrupts. A couple of notes:
- CSL enumerates each interrupt event; that is, we give each one its own name. The example above shows the event names for the McBSP2 transmit interrupt (IRQ_EVT_XINT2) and the McBSP1 transmit interrupt (IRQ_EVT_XINT1).
- While it's handy to have CSL functions for enabling/disabling the global interrupt enable, you may not ever need to call them yourself. First, interrupts are automatically enabled when the DSP/BIOS scheduler is started (which occurs when you return from main). Second, the DSP/BIOS interrupt dispatcher handles all the necessary global interrupt enable/disable required when going into and out of an ISR. (The dispatcher even makes nesting interrupts very easy, even when writing ISRs in C.)
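The two-switch logic can be modeled in a few lines of host-side C. This is purely an illustrative mock: MockCpu and the mock_irq_* functions are hypothetical names, not CSL calls, and a real C6000 keeps these bits in CPU control registers rather than in a struct. The sketch shows only the rule from the diagram: an interrupt reaches the CPU when its flag bit is set, its individual enable bit is set, and the global enable is on.

```c
#include <assert.h>

/* Hypothetical model of the three pieces of state in the diagram. */
typedef struct {
    unsigned ifr;   /* Interrupt Flag Register: one bit per interrupt   */
    unsigned ier;   /* Interrupt Enable Register: one bit per interrupt */
    int      gie;   /* Global Interrupt Enable: 0 or 1                  */
} MockCpu;

/* An interrupt source fires: set its flag bit. */
void mock_irq_raise(MockCpu *cpu, int n)
{
    assert(n >= 0 && n < 16);
    cpu->ifr |= 1u << n;
}

/* Close the "individual enable" switch for interrupt n. */
void mock_irq_enable(MockCpu *cpu, int n)
{
    assert(n >= 0 && n < 16);
    cpu->ier |= 1u << n;
}

/* Returns 1 when interrupt n would be taken by the CPU, 0 otherwise. */
int mock_irq_serviceable(const MockCpu *cpu, int n)
{
    unsigned pending_and_enabled = ((cpu->ifr & cpu->ier) >> n) & 1u;
    return (cpu->gie && pending_and_enabled) ? 1 : 0;
}
```

Raising interrupt 12 with neither switch closed, then closing them one at a time, shows that the CPU responds only once both are on.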
Lab 2
In this lab, we're going to use all of the information that we discussed in this chapter to send sine wave samples out through the McBSP connected to the AIC23 codec. We are going to use a HWI to synchronize the CPU to the codec rate.
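[Figure: the McBSP transmit interrupt triggers a CPU HWI, which runs sineGen and sends the sample out through the McBSP to the AIC23 codec.]
Here are the goals of this lab:
- Use the BSL for the DSK to open the codec
- Use the Configuration Tool to set up a HWI for the McBSP
- Write generated sine wave values to the codec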
The Paperwork
To get started, we are going to take a moment to think about what we need to do in this lab and put it down on paper. The file below is a copy of what you will need to write to finish this lab. Take a moment to figure out the value for each blank line, before moving on to enter the code on the computer. This way, you can think about what you are doing before you actually need to do it.
In order to fill in the blanks, you may need some help with the DSK's BSL. The good news is that excellent documentation for the BSL comes with the DSK and Code Composer Studio. You just need to find it. Follow these steps to find the documentation for the BSL.
1. Open Code Composer Studio. Use the desktop icon to open CCS.
2. Open up the CCS Help File.
   Help → Contents
   You should see something like this:
   Take a look at the codec API summary. This lists most of the information that you will need to complete the lab.
3. Please fill in the blanks in the file on the next page.
lab2.c

#include "______________.h"    // need DSK specific header file
#include "______________.h"    // need AIC23 specific header file

DSK6???_AIC23_Config config = { \
    0x0017,      /* 0 DSK6416_AIC23_LEFTINVOL    Left line input channel volume */ \
    0x0017,      /* 1 DSK6416_AIC23_RIGHTINVOL   Right line input channel volume */ \
    headsetVol,  /* 2 DSK6416_AIC23_LEFTHPVOL    Left channel headphone vol */ \
    headsetVol,  /* 3 DSK6416_AIC23_RIGHTHPVOL   Right channel headphone vol */ \
    0x0011,      /* 4 DSK6416_AIC23_ANAPATH      Analog audio path control */ \
    0x0000,      /* 5 DSK6416_AIC23_DIGPATH      Digital audio path control */ \
    0x0000,      /* 6 DSK6416_AIC23_POWERDOWN    Power down control */ \
    0x0043,      /* 7 DSK6416_AIC23_DIGIF        Digital audio interface format */ \
    0x0081,      /* 8 DSK6416_AIC23_SAMPLERATE   Sample rate control */ \
    0x0001       /* 9 DSK6416_AIC23_DIGACT       Digital interface activation */ \
};

________________________________________

/*
 * main() - Main code routine, initializes BSL and a hardware interrupt
 */
void main()
{
    // Initialize the board support library, this must be called first
    ________________________________________

    // Open the codec
    ________________________________________

    // Enable the McBSP interrupt for IRQ_EVT_XINT2 (for 6416 DSK)
    // or Enable the McBSP interrupt for IRQ_EVT_XINT1 (for 6713 DSK)
    ________________________________________

    // Invoke DSP/BIOS scheduler
    return;
}

/*
 * myHWI() - ISR called when the McBSP wants more data
 */
void myHWI(void)
{
    static short mySample;
    static int leftChan = 1;

    if(leftChan) {
        mySample = sineGen();
        leftChan = 0;
    }
    else {
        leftChan = 1;
    }

    // Send a sample to the McBSP (which then sends it to the AIC23 codec)
    ________________________________________
}
lab2.c (hints)
The C6416 (or C6713) DSK Board Support Library is divided into several modules, each of which has its own include file. First of all, the file dsk6416.h (or dsk6713.h) must be included in every program that uses the BSL. You also need to include the header file for the AIC23 codec since this is the BSL module used in this exercise.

We created a structure called config which has the parameters needed to initialize the AIC23 codec. BSL creates a new datatype for this configuration information. We left part of this blank, mainly because the remaining three characters are specific to the DSK you are using (either C6416 or C6713):
DSK6???_AIC23_Config
By the way, to understand the values we chose for the AIC initialization, please refer to the DSK's help file.

You need to declare a handle for the AIC23 codec. This step is similar to part of Step 2 as described for the generic CSL procedure (the Timer Example, page 2-10). Similar to above, BSL creates a new datatype for the handle of the AIC23 codec.

The BSL library contains a BSL function to initialize itself. It must be called before any other BSL function. Take a look at this function in the DSK's help file.

Next, you need to open the codec. The BSL function that opens the codec returns a handle. This step is similar to part of Step 3 as described for the generic CSL procedure (page 2-10). FYI, the BSL function that opens the codec actually does a number of things:
- Opens both of the McBSPs it requires
- Configures both McBSPs
- Configures the AIC23
- Returns a handle to the AIC23 (which essentially points to the two McBSPs)

Looking at the block diagram for this lab (page 2-16), we can see that we're using the McBSP transmit interrupt to tell the CPU when to create another sine wave value and output it to the codec. To allow this to happen, we must enable the McBSP transmit interrupt as shown under Enabling Interrupts (page 2-15). In this same diagram we listed the CSL function used to enable individual interrupts. Remember, though, that each DSK (C6416 vs C6713) uses a different McBSP to talk to the codec. In other words, you need to choose the correct transmit event name based upon which DSK you are using. (Rather than making you look up the event names for the McBSP transmit interrupts, we have provided the names for you in the code's comments.)
Note: The GIE bit is enabled automatically when you exit main() and return to the DSP/BIOS scheduler.

Once more, look for a BSL function which writes a sample to the codec. You should be able to find this in the DSK help file.
Note: What really happens is that the codec write function writes the value to the McBSP data transmit register (DXR), which then sends it to the AIC23 codec, which then converts it to the analog signal which we hear.
lab2.c (answers)
Note: For the C6713 DSK, just replace 6416 with 6713 in all the answers below.

#include "dsk6416.h"          // need DSK specific header file
#include "dsk6416_aic23.h"    // need AIC23 specific header file

DSK6416_AIC23_Config config = { \
    0x0017,      /* 0 DSK6416_AIC23_LEFTINVOL    Left line input channel volume */ \
    0x0017,      /* 1 DSK6416_AIC23_RIGHTINVOL   Right line input channel volume */ \
    headsetVol,  /* 2 DSK6416_AIC23_LEFTHPVOL    Left channel headphone vol */ \
    headsetVol,  /* 3 DSK6416_AIC23_RIGHTHPVOL   Right channel headphone vol */ \
    0x0011,      /* 4 DSK6416_AIC23_ANAPATH      Analog audio path control */ \
    0x0000,      /* 5 DSK6416_AIC23_DIGPATH      Digital audio path control */ \
    0x0000,      /* 6 DSK6416_AIC23_POWERDOWN    Power down control */ \
    0x0043,      /* 7 DSK6416_AIC23_DIGIF        Digital audio interface format */ \
    0x0081,      /* 8 DSK6416_AIC23_SAMPLERATE   Sample rate control */ \
    0x0001       /* 9 DSK6416_AIC23_DIGACT       Digital interface activation */ \
};

DSK6416_AIC23_CodecHandle hCodec;

/*
 * main() - Main code routine, initializes BSL and a hardware interrupt
 */
void main()
{
    // Initialize the board support library, this must be called first
    DSK6416_init();

    // Open the codec

    // Enable the McBSP transmit interrupt
    IRQ_enable(IRQ_EVT_XINT2);       // C6416 DSK
    // or
    IRQ_enable(IRQ_EVT_XINT1);       // C6713 DSK

    // Invoke DSP/BIOS scheduler
    return;
}

/*
 * myHWI() - ISR called when the McBSP wants more data
 */
void myHWI(void)
{
    static short mySample;
    static int leftChan = 1;

    if(leftChan) {
        mySample = sineGen();
        leftChan = 0;
    }
    else {
        leftChan = 1;
    }

    // Send a sample to the McBSP (which then sends it to the AIC23 codec)
    DSK6416_AIC23_write(hCodec, mySample);
}
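It can help to see what the leftChan toggle in myHWI() actually does to the output stream. The sketch below is a host-side mock, not DSK code: mock_sine_gen() is a hypothetical placeholder that returns 1, 2, 3, ... instead of sine values, and mock_codec_write() simply logs what would have been written to the codec. Only the toggle logic is copied from the ISR above.

```c
#include <assert.h>

#define MOCK_LOG_SIZE 8

static short mock_log[MOCK_LOG_SIZE];   /* samples "sent" to the codec */
static int   mock_log_len;

/* Hypothetical placeholder for sineGen(): returns 1, 2, 3, ... */
static short mock_sine_gen(void)
{
    static short value;
    return ++value;
}

/* Hypothetical placeholder for the codec write: record the sample. */
static void mock_codec_write(short sample)
{
    assert(mock_log_len < MOCK_LOG_SIZE);
    mock_log[mock_log_len++] = sample;
}

/* Same structure as myHWI(): a new sample is generated only on the
   left-channel interrupt, and the same value is reused for the right. */
void mock_hwi(void)
{
    static short mySample;
    static int leftChan = 1;

    if (leftChan) {
        mySample = mock_sine_gen();
        leftChan = 0;
    } else {
        leftChan = 1;
    }
    mock_codec_write(mySample);
}
```

Four calls to mock_hwi() log the sequence 1, 1, 2, 2: each generated value goes out twice, once per stereo channel, so both channels carry the same tone.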
Lab2 Procedure
4. Create a new project called LAB2.PJT in the C:\c60001day\labs\lab2 subdirectory.
   Project → New
   You will encounter the Project Creation dialog. Fill in the Project Name and Location as shown below:
   If you are using the C6713 DSK, the Target field should read: TMS320C67XX
5. You can also use the button to specify the correct path.
7. CCS allows you to select a template configuration file. Since no simulator-specific CDB template is available, we'll choose the dsk6416.cdb or dsk6713.cdb template.
   If you are using the C6713 DSK, please choose its template file: dsk6713.cdb
   Note: In some TI classrooms you may see two or more tabs of CDB templates; e.g. TMS6xxx, TMS54xx, etc. If you experience this, just choose the C6xxx tab and make your selection.
   The CDB templates automate the process of setting up numerous system objects/parameters. Those shown above are shipped with the C6416 DSK. You can create your own CDB templates: just copy a CDB file you have created to the directory where the above files are stored (C:\ti\c6000\bios\include).
8. While there are many objects displayed in the configuration editor, we only need to configure one of them (which we'll do starting in step 15). The other dsk6416 (or dsk6713) defaults will work fine.
Using one of these methods, add the following files from C:\c60001day\labs\lab2 to your project:
LAB2.C LAB2.CDB LAB2cfg.CMD block_sine.c
Edit Files
12. Open lab2.c for editing by double-clicking on it in the Project Explorer pane.
13. Use your answers from the paperwork exercise back on page 2-18 to make the appropriate changes to lab2.c. You should find a place commented in the file to make each of the changes (one change has question marks for you to replace). If you have any questions, feel free to ask your instructor for help.
14. When you're done, save lab2.c.

Configure a HWI
15. Open lab2.cdb.
16. Navigate to the Scheduling folder inside the Configuration Tool.
17. Inside this folder, find the "HWI - Hardware Interrupt Service Routine Manager" and open it by clicking on the little + sign next to it.
18. You can pick any interrupt, from HWI_INT4 to HWI_INT15, that you want to use for the lab. The lab instructions are going to use HWI_INT12 (this was an arbitrary choice).
19. Open the properties of the interrupt that you chose by right-clicking and choosing Properties.
20. Change the interrupt source of the HWI interrupt number that you have selected to:
    C6416: McBSP 2 Transmit Interrupt (MCSP_2_Transmit)
    C6713: McBSP 1 Transmit Interrupt (MCSP_1_Transmit)
21. Change the function property of the interrupt to call the myHWI() function (defined in lab2.c). You will need to add an underscore in front of the function name (_myHWI) since it is a C function. Here's what it should look like:
    As in the previous step, if you are using the C6713 DSK, the interrupt source should be: MCSP_1_Transmit
    Note: The TI C compiler (as with most compilers) differentiates C source labels from assembly source labels by prepending an _ to all C labels as it generates assembly code. In this dialog box, the HWI function property requires an assembly label; hence, we need the underscore.
22. Click on the Dispatcher tab in the interrupt properties. We want to use DSP/BIOS's HWI Dispatcher to take care of everything (i.e. context save/restore) for the ISR. Enable the Dispatcher by clicking on the check box:
23. When you're all done with the changes, click on OK to save the HWI_INT properties. 24. Save the changes that you made to the .cdb file. Go ahead and close the .cdb file.
Lab2a (optional)
Now that you've successfully got the DSK to spit out some sound, wouldn't you like to be able to turn it off? Is there anything on the DSK that we might be able to use as a switch to turn the sine wave on and off? Yeah, the DIP switches could be used to do that. Now, if we only had a function that made it easy to read one of the DIP switches. Do you think the BSL might have something like that? If so, you could simply read the DIP switch then decide whether to send a new sine wave sample or a 0 to the codec, effectively turning the sine wave off.
[Figure: the McBSP transmit interrupt triggers a CPU HWI, which runs sineGen and writes to the codec.]
Here are the basic steps that you'll need to perform in order to do this:
- In the HWI routine, right before you write the new sine wave to the codec, read a DIP switch.
- If the DIP switch is on, then write the sine wave value that you calculated.
- If the DIP switch is off, simply write a 0 to the codec.
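The decision in those steps is small enough to sketch on a host PC before you look up the real BSL call. Everything here is hypothetical: mock_dip_is_on() stands in for whatever DIP switch function you find in the DSK help file (its real name, arguments and on/off polarity may differ), and mock_dip_state is just an array you set by hand.

```c
#include <assert.h>

#define MOCK_DIP_COUNT 4

/* Hypothetical switch states: 1 = on, 0 = off. Set by hand for testing. */
static int mock_dip_state[MOCK_DIP_COUNT];

/* Hypothetical stand-in for the BSL's DIP switch read. */
int mock_dip_is_on(int sw)
{
    assert(sw >= 0 && sw < MOCK_DIP_COUNT);
    return mock_dip_state[sw];
}

/* Pick what the HWI should send: the sample when the switch is on,
   otherwise 0, which silences the tone. */
short select_output(int sw, short sample)
{
    return mock_dip_is_on(sw) ? sample : 0;
}
```

In the lab itself, the value returned by a function like select_output() is what you would hand to the codec write call at the end of myHWI().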
Lab 2 Debrief
Questions:
1. First, let's quickly review the values we filled in.
2. How much differs between the C6713 and C6416 solutions?
3. What would be the benefit if we could eliminate hardware specific references in our code?

Answers:
1. Please refer to the solutions file (Lab2.c) for the results.
2. The differences are:
   - The BSL calls we had you complete in Lab2.c.
   - The CDB template file.
   - The reference to the BSL library.
3. If we eliminate the hardware specific references, we could write and maintain a single piece of code for all C6000 platforms. This highlights two key points:
   - One, the consistency between families of the C6000 architecture makes porting code between them very easy. Even more important, if you learn one family, you've basically learned all of them.
   - The increased modularity and reuse of a single code-base used across multiple families usually enhances the stability and robustness of the code.
   DSP/BIOS device drivers (SIO/PIP/IOM) are discussed briefly in the next chapter. These can allow us to achieve hardware independence in our code.
eXpressDSP Tools
Introduction
TI provides solutions to DSP engineers facing ever-increasing complexity in their systems: efficient, capable code libraries, I/O driver schemes, certified Algorithm Standards, and extraordinarily robust starter applications (reference designs, so to speak). While a single chapter in a one-day workshop cannot begin to describe the many details of these tools and libraries, hopefully we can give you a sense of what they offer and how they might help you finish your designs more quickly.
Outline
Overview of eXpressDSP
DSP/BIOS
  Scheduler
  Real Time Analysis
  Device Drivers (IOM)
Algorithm Standard (XDAIS)
Reference Frameworks (RF)
Chapter Topics
eXpressDSP Tools                                    3-1
  What is eXpressDSP?                               3-3
  DSP/BIOS                                          3-4
    DSP/BIOS Scheduler                              3-4
    Real-Time Analysis                              3-11
    Device Drivers (IOM)                            3-13
  TMS320 DSP Algorithm Standard (XDAIS)             3-19
    Introduction                                    3-19
    XDAIS (background info)                         3-20
    Thousands of XDAIS Compliant Algorithms         3-24
  Reference Frameworks                              3-25
  RF3 Demo                                          3-27
    Inspect the .cdb file                           3-28
    Use Real-time Analysis Tools                    3-35
  Flashing RF3                                      3-39
    Create the Flash Image                          3-39
    Use Flashburn to Burn the Image                 3-41
    Flashing POST                                   3-42
  Lab/Demo Debrief                                  3-44
  eXpressDSP Summary                                3-44
What is eXpressDSP?
A premier, open DSP software strategy for TI's leadership TMS320 DSP family
CCS
DSP/BIOS
XDAIS
Target Content
DSP/BIOS
DSP/BIOS Consists Of:
Real-time analysis tools: allow the application to run uninterrupted while displaying debug data
Real-time scheduler: preemptive thread management kernel
Real-time I/O (drivers): allows two-way communication between threads or between threads and hardware
DSP/BIOS Scheduler
SWI Priority
Software Interrupts
TSK
Tasks
IDL
Background
Algo2
While Loop?
Possible Solution: While Loop
main()
{
    while (1) {
        Algo1();
        Algo2();
    }
}
Put each routine into an endless loop under main.
Potential problems:
Algos run at different rates: Algo1 at 8 kHz, Algo2 at 4 Hz.
What if one algorithm starves the other for recognition or delays its response?
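To put rough numbers on the rate mismatch: at 8 kHz a new sample arrives every 125 microseconds, while a 4 Hz routine is due only every 250 milliseconds. The sketch below counts how many fast-rate samples arrive while the slow routine executes once. The helper names are hypothetical, and any execution time you pass in is an assumed figure, not a measurement from the workshop.

```c
#include <assert.h>

/* Sample period in microseconds for a given rate in Hz (hypothetical helper). */
long period_us(long rate_hz)
{
    assert(rate_hz > 0);
    return 1000000L / rate_hz;
}

/* Number of fast-rate samples that arrive while the slow routine runs once. */
long samples_arriving(long fast_rate_hz, long slow_exec_us)
{
    assert(slow_exec_us >= 0);
    return slow_exec_us / period_us(fast_rate_hz);
}
```

For example, if Algo2 were to take 1000 microseconds (an assumed value), samples_arriving(8000, 1000) reports that eight Algo1 samples arrive in the meantime, which a simple while loop has no way to service.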
idle
Time
Interrupt is missed
Time 0
Use the DSP/BIOS HWI dispatcher for context save/restore, and allow preemption
Reasonable approach if you have a limited number of interrupts/functions
Limitation: the number of HWIs and their priorities are statically determined, and there is only one HWI function for each interrupt
Algo1 Algo2
SWI read serial port ints disabled process data (filter, etc.) rather than all this time
HWI
Fast response to interrupts
Minimal context switching
High priority only
Can post SWI
Could miss an interrupt while executing ISR
SWI
Latency in response time
Context switch performed
Selectable priority levels
Can post another SWI
Execution managed by scheduler
Tasks (TSK)
Another Solution Tasks (TSK)
main { // return to O/S; }
DSP/BIOS tasks (TSK) are similar to SWI, but offer additional flexibility
TSK is more like a traditional O/S task
Tradeoffs:
SWI context switch is faster than TSK
TSK module requires more code space
TSKs have their own stack
User preference and system needs usually dictate the choice. It's easy to use both!
DSP/BIOS
Algo1 Algo2
SWI_post
TSK
SEM_post
start end
end
Similar to hardware interrupt, but triggered by SWI_post()
All SWIs share the system software stack
SEM_post() triggers execution
Each TSK has its own stack, which allows it to pause
post3 rtn SWI_post(&swi2); post2 rtn post1 int2 rtn rtn rtn rtn int1 User sets the priority...BIOS does the scheduling
(lowest)
Drag and Drop SWIs to change priority
Equal priority SWIs run in the order that they are posted
Real-Time Analysis
Execution Graph
Software logic analyzer Debug event timing and priority
Profile routines w/o halting the CPU Capture & analyze data without stopping CPU
Message LOG
Send debug msgs to host
Doesn't halt the DSP
Deterministic, low DSP cycle count
More efficient than traditional printf()
LOG_printf(&logTrace, "addSine ENabled");
PC
Display TI 3rd Party Third Party
Display
TMS320 DSP
JTAG USER CODE RTDX EMU
User
CCS
App
Codec
McBSP
void audioLoopBack()
{
    DSK6416_AIC23_CodecHandle hCodec;
    short buf[64];
    int N;

    DSK6416_init();
    /* Start the codec */
    hCodec = DSK6416_AIC23_openCodec(0, &config);
    while (1) {
        for (N = 0; N < 64; N++) {
            /* Poll until a sample is read, then until it is written back */
            while (!DSK6416_AIC23_read(hCodec, &buf[N]));
            while (!DSK6416_AIC23_write(hCodec, buf[N]));
        }
    }
}
App writes to hardware directly using the specific target's BSL functions
Every application needs to be customized to hardware: You must change each instance of DSK6416_xxx to another function call every time you port the code
Portability suffers
App
SWI or TSK
Device Drivers standardize the interface between the Application and the H/W
Application programmer only has to know PIP or SIO (no matter what H/W is connected)
The H/W can be changed without changing the Application (only need to change the IOM included in the project)
Therefore, Drivers (SIO/PIP with IOM) insulate the Application from the hardware's details
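The insulation idea can be sketched in plain C with a table of function pointers. This is only an illustration of the principle: MiniDriver, codecA, codecB and app_sum_block are hypothetical names, and this is not the real IOM function table or the PIP/SIO API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini-driver interface: NOT the real IOM function table. */
typedef struct {
    const char *name;
    int (*read)(short *dst, size_t count);   /* returns samples read */
} MiniDriver;

/* Two stand-in "devices"; each fills the buffer with a recognizable value. */
static int codecA_read(short *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) dst[i] = 100;
    return (int)count;
}

static int codecB_read(short *dst, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) dst[i] = 200;
    return (int)count;
}

static const MiniDriver codecA = { "codecA", codecA_read };
static const MiniDriver codecB = { "codecB", codecB_read };

/* Application code: sums one block of samples. It sees only the MiniDriver
 * interface, so it is identical for both devices. */
long app_sum_block(const MiniDriver *drv, size_t count)
{
    short buf[64];
    long sum = 0;
    int n, i;

    assert(drv != NULL && count <= 64);
    n = drv->read(buf, count);
    for (i = 0; i < n; i++) sum += buf[i];
    return sum;
}
```

Swapping codecA for codecB changes only the pointer handed to app_sum_block(); the application function itself is untouched, which is the property the driver model is after.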
SIO
PIP
Any mini-driver (IOM) can be used with any DSP/BIOS I/O model
Application Programmer chooses the preferred class driver
Interface is consistent regardless of which device (mini-driver) is connected
Software interface doesn't change, even if you change the IOM device
Bottom Line: Application Code never changes even if you change H/W
SIO (Stream I/O) is another DSP/BIOS device driver methodology
Think of issuing and reclaiming buffers from a stream
Bottom Line: Application Code doesn't change even if hardware does
Some further notes about SIO:
Handles queuing of buffers to/from devices
Issue a buffer to a stream: a full buffer to a transmit stream, or an empty buffer to a receive stream
Reclaim a buffer from a stream: a full buffer from a receive stream, or an empty buffer from a transmit stream
And here's a slightly larger code sample we couldn't fit in the slide.
/* Prime the stream */
buf0 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
buf1 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
buf2 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
buf3 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
/* Issue the two empty buffers to the input stream */
SIO_issue(inStream, buf0, SIO_bufsize(inStream), NULL);
SIO_issue(inStream, buf1, SIO_bufsize(inStream), NULL);
/* Issue the two empty buffers to the output stream */
SIO_issue(outStream, buf2, SIO_bufsize(outStream), NULL);
SIO_issue(outStream, buf3, SIO_bufsize(outStream), NULL);
/* Run forever looping-back buffers */
for (;;) {
    /* Reclaim full buffer from the input stream; nmadus is its size */
    nmadus = SIO_reclaim(inStream, (Ptr *)&inbuf, NULL);
    /* Reclaim empty buffer from the output stream */
    SIO_reclaim(outStream, (Ptr *)&outbuf, NULL);
    /* Issue the full buffer to the output stream */
    SIO_issue(outStream, inbuf, nmadus, NULL);
    /* Issue the empty buffer to the input stream */
    SIO_issue(inStream, outbuf, SIO_bufsize(inStream), NULL);
}
Please refer to the code examples that ship with the DSK for a full, working example using this code.
A closer look at IOM
And here's a closer look at the functions and data structures that make up an IOM driver.
Data Structures:
BIOS Device Table IOM function table Dev params Global Data Pointer (device inst. obj.) Channel Params Channel Instance Obj. IOM_Packet (aka IOP)
Provided royalty free
Requires CCS v2.2 or greater
Search for DDK on the TI website to download
DDK v1.0, DDK v1.1, DDK v1.2 (3Q03)
DSKs BSL
TI CSL
3 - 18
Buying Algorithms
Why is it hard to integrate someone else's algo?
1. Don't know how fast it runs or how much memory it uses.
2. How can I adapt the algorithm to meet my needs?
3. Will the function names conflict with other code in the system?
4. Will it use memory or peripherals needed by other algos?
5. How can I run the same algo on more than one channel at a time? (How can I prevent variables from conflicting?)
6. How many interfaces (APIs) do I have to learn?
Traditional Solution
When I buy an algorithm, I need the source code (and lots of development time) or I can't guarantee it will work. But, purchasing source code costs a lot of money!
Algorithm
Packaging Rules
All algorithms packaged and delivered in a consistent format
Documentation Rules
Algorithms must provide basic memory and performance information to enable apples to apples comparisons and to aid system designers with algorithm integration
Will the function names conflict with other code in the system?
Algorithm must be C callable and re-entrant Strict rules on function naming virtually eliminate conflicts.
fir_company123_min.l64 fir_company123_max.h62
Algorithm Module Name Vendor Name Variant L: library h: header 62: C62x/C67x 64: C64x
Algorithm
Application
(framework)
malloc()
Memory
*ptr *ptr
This type of dynamic instantiation sounds great, but what if I want to allocate my memory statically? No problem, XDAIS algorithms can be designed to work both ways. And there's even a utility that will interrogate an algorithm and create a C file containing all the required memory elements of an algorithm. That is, we'll even help you with your static instantiation.
Dynamic
algNumAlloc algAlloc algInit algActivate Filter algDeactivate algFree
Create
algInit
Filter
Execute Delete
*Note: Static case can also use algActivate if algo uses scratch memory
Here's a more detailed look at the process of creating an instance of an algorithm. (Sorry, we don't have time to go through this example in class.)
Algorithm
Params
2. How many blocks of memory will you need to do this for me?
4. I'll make a place where you can tell me about your memory needs
*memTab = malloc(5*N)
MemTab
Algorithm 6. My needs, given these parameters, are this, for each of the N blocks of memory InstObj
Param1 Param2 Base1 Base2
N algInit()
If I want to run the same algo on more than one channel How can I prevent variables from conflicting with each other?
Each algorithm gets its own storage location called an instance object.
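A minimal sketch of the instance-object idea, assuming a small FIR filter: all per-channel state lives in a structure that the caller passes in, so two channels never share variables. FirInstance, fir_init and fir_step are hypothetical names chosen for illustration; they are not the XDAIS IALG interface.

```c
#include <assert.h>
#include <stddef.h>

#define FIR_TAPS 4

/* Hypothetical instance object: all per-channel state lives here,
 * not in global or static variables. */
typedef struct {
    const short *coeffs;          /* shared, read-only coefficients */
    short history[FIR_TAPS];      /* private delay line for this channel */
} FirInstance;

/* Bind the coefficients and clear the delay line of one instance. */
void fir_init(FirInstance *inst, const short *coeffs)
{
    int i;
    assert(inst != NULL && coeffs != NULL);
    inst->coeffs = coeffs;
    for (i = 0; i < FIR_TAPS; i++) inst->history[i] = 0;
}

/* Process one sample; touches only the state reachable through inst. */
long fir_step(FirInstance *inst, short x)
{
    long acc = 0;
    int i;
    assert(inst != NULL);
    for (i = FIR_TAPS - 1; i > 0; i--) inst->history[i] = inst->history[i - 1];
    inst->history[0] = x;
    for (i = 0; i < FIR_TAPS; i++) acc += (long)inst->coeffs[i] * inst->history[i];
    return acc;
}
```

Because the function keeps nothing outside the instance it is handed, interleaving calls for channel 0 and channel 1 leaves each channel's delay line intact, and the same code is re-entrant.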
*fxns *a *x
And finally, here's a little diagram showing most of the XDAIS algorithm interface.
XDAIS Summary
instance handle Key: User Vendor Module XDAIS
fxns
params
Reference Frameworks
Reference Frameworks
IOM and XDAIS: Common Interfaces
System I O M System Software
Data Init Mem. Mgmt.
(Peripherals)
H/W
X D A I S
Algorithm
With standardized interfaces to Algorithms and H/W, system software (i.e. framework) can also be standardized A standard framework can be used as a starting point for many different Applications
Currently, three generic frameworks are available Also, application specific frameworks available (or coming) for specific applications (audio, video, etc.)
Reference Frameworks
Compact   Flexible   Extensive   Connected
Design Parameter
Static Configuration Dynamic Object Creation Static Memory Management Dynamic Memory Allocation Recommended # of Channels Recommended # of XDAIS Algos Absolute Minimum Footprint Single/Multi Rate Operation Thread Preemption and Blocking Implements Control Functionality Supports Implements DSPLink (DSPGPP) Total Memory Footprint (less algos) Processor Family Supported
RF1
RF3
RF5
RF6
1 to 3 1 to 3 single
HWI
HWI, SWI
3.5KW C5000
Memory
Host (GEL)
FIR
In
IOM
PIP
Split SWI
Out
IOM
SWI Audio 1
IOM Drivers for input/output Two processing threads with generic algorithms Split/Join threads used to simulate stereo codec. (On C6416/C6713 DSKs, how could we save cycles on split/join?)
The Reference Frameworks available today provide a SWI thread which creates the stereo audio used by the audio processing threads. This was necessary for the early DSKs since they only supported mono audio. In the case of mono audio input, the Split thread just duplicated the audio to both channels. Today, with stereo codecs, the Split thread sorts the two incoming channels into two different channels. How could you make the above system more efficient? How about re-writing the IOM driver so that it uses the EDMA to perform the channel sorting? The IOM interface supports multiple channels, thus you should be able to directly connect it to both of the Audio Processing PIPs.
RF3 Demo
Here are the steps the facilitator will go through during the in-class demo. These were included to allow you to go back through the demo at your own pace, and to explore further any additional aspects of RF3 that you find interesting. Note: This demo assumes that your Code Composer Studio installation is set up just like we did it back in Lab 1 and that the files that we provided you are installed on the computer that you are using. If either of these is NOT true, then you may have some difficulty with the following steps.
Hardware Portability
One of the best characteristics of RF3 (and the other reference frameworks for that matter) is that they can easily be ported to a new hardware platform. Getting useful applications up and running on a new hardware design used to be a difficult task. RF3 is built using IOM drivers. All of the hardware specific code is encapsulated in the driver. Let's take a look at how this is done.
4. Inside the .cdb file, navigate to the udevCodec object which is located in the User-Defined Devices folder under Device Drivers. The Device Drivers folder is in the Input/Output folder. It looks something like this:
Most of the hardware specific information is contained in this one object. So, where is the other stuff? Here is a list of the few places that are hardware specific:
The library that actually contains the code that the udevCodec object refers to is referenced in the linker command file: link.cmd.
There is a C file, dsk6416_devParams.c (or dsk6713_devParams.c), that contains the parameters for how to set up the hardware controlled by the driver. One of these parameters is the hardware interrupt that will be used by the driver to synchronize with the CPU.
5. As we just mentioned, most of the hardware specific information needed to talk to the codec for RF3's audio is contained in this object. Open the properties of the object by right-clicking on it and choosing properties to see this information:
Obviously, the C6713 DSK version is similar but uses symbols that begin with:
_DSK6713
Since most of the hardware specific information is contained in this one object, it is the main place that needs to change in order to talk to new hardware. 6. Close the udevCodec Properties box by clicking "Cancel".
I/O Flexibility
The IOM driver interfaces to a DSP/BIOS PIP. A PIP is a simple buffer manager with synchronization capabilities. RF3 uses PIPs to flow data from hardware (an IOM driver) to software processing engines called threads. To see all of the PIPs and how they connect things together, refer back to the diagram at the beginning of the lab.
7. Let's take a look at how a PIP connects a driver to a thread by looking at the properties of the receive PIP. This is the PIP that connects the input device driver (audio source) to the receive/split thread. Navigate to the "PIP Buffered PIP Manager" which is inside the Input/Output folder in the .cdb file.
8. Open the properties of the pipRx PIP by right-clicking on it and choosing properties.
9. Click on the "Notify Functions" tab. You should see something that looks like this:
Notifies the driver when an empty buffer is available. Notifies the thread when a full buffer is available.
This interface makes it easy to change who gets notified when a PIP needs to be written to or needs to be read. The thread structures that RF3 uses also make it easy to change which PIP a thread is talking to. All of these capabilities work together to make the RFs great places to start a design because they are powerful and easy to adapt to different needs. 10. Click "Cancel" to close the properties box. 11. Feel free to look at the other PIPs in the application if you'd like. Make sure to refer back to the block diagram at the beginning of this discussion to see how things fit into the big picture.
Processing Threads
Flexible I/O and drivers are important, but come on, the whole reason for their being is to feed data to functions for processing. RF3 uses DSP/BIOS Software Interrupts (SWIs) to run processing functions. SWIs are very similar to hardware interrupts (HWIs), but they are controlled by software through API calls by the program. For example, the SWI_andnHook() function in the PIP properties that we looked at earlier is a BIOS API call that notifies a SWI that one of the conditions that it needs to run has been met. When all of the conditions have been met, the SWI is readied by the scheduler and it runs when it is the highest priority thread that needs servicing. 12. Let's take a look at the SWIs in RF3. Navigate to the "SWI Software Interrupt Manager" that is located in the Scheduling folder in the .cdb file.
13. You should see the following SWI objects:
Name             Purpose
swiAudioProc0    Runs volume and filter for channel 0
swiAudioProc1    Runs volume and filter for channel 1
swiControl       Runs control thread periodically (more info. later)
swiRxSplit       Splits incoming data flow into two channel flows
swiTxJoin        Combines two channels into outgoing data flow
Don't forget to refer back to the block diagram and see how everything fits together. 14. Let's take a closer look at the swiRxSplit thread to see what function it calls when it runs. Open the properties of swiRxSplit by right-clicking on it and selecting properties. You should see something like this:
Here are some details on the different properties:
Name          Purpose
comment       Allows user to comment the object in the .cdb file
function      Function that is called by the SWI
priority      Priority that the SWI executes at
mailbox       Used with APIs to signal the SWI (more info. later)
arg0, arg1    Arguments to function (i.e. _thrRxSplitRun(arg0,arg1) )
swiRxSplit
mailbox
Each bit in the swiRxSplit mailbox represents a precondition. The SWI should only run when there is a full buffer to split and two empty buffers to fill. Three preconditions at one bit each need three bits, or a mailbox value of 0x7. The SWI_andnHook() function is a BIOS API call that can only be called from within BIOS (that's why we add Hook), and it essentially clears a bit in the mailbox when it runs. When all the bits are zero, the SWI is automatically scheduled to run.
15. When you're done examining the SWI object, click "Cancel" to close the window.
16. Take a moment to find all of the threads that are listed in the .cdb file in the block diagram.
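The mailbox countdown described for swiRxSplit can be modeled on a host PC in a few lines of C. This is a simplified sketch, not the DSP/BIOS implementation: SwiModel, swi_model_init and swi_model_andn are hypothetical names, and "running" the SWI is reduced to incrementing a counter and restoring the mailbox to its initial value.

```c
#include <assert.h>

/* Hypothetical model of a SWI with a mailbox; not the DSP/BIOS implementation. */
typedef struct {
    unsigned initial;   /* mailbox value restored after each run */
    unsigned mailbox;   /* one bit per outstanding precondition */
    int run_count;      /* number of times the SWI function has "run" */
} SwiModel;

void swi_model_init(SwiModel *swi, unsigned initial)
{
    assert(swi != 0);
    swi->initial = initial;
    swi->mailbox = initial;
    swi->run_count = 0;
}

/* Clear the bits in mask; when no bits remain, "run" the SWI and reset. */
void swi_model_andn(SwiModel *swi, unsigned mask)
{
    assert(swi != 0);
    swi->mailbox &= ~mask;
    if (swi->mailbox == 0) {
        swi->run_count++;
        swi->mailbox = swi->initial;
    }
}
```

With an initial value of 0x7, the model stays idle after the first two notifications and runs only once the third precondition bit is cleared, after which it is armed again for the next set of buffers.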
Taking Control
RF3 has a built in control function to change the execution of the processing algorithms at runtime. It uses this thread to modify the volume that each of the channels is played at. It can easily be modified to do just about anything else that you might want to do to control your application. 17. The control function is executed by swiControl. Open and examine the properties of swiControl. 18. When should the application tell swiControl to run? Normally, a control thread would run when a user changed something. For example, turning up the volume on your MP3 player. Well, the DSK only has a few inputs and they're not really tied to the application. So, RF3 simulates user activity by calling the control thread on a periodic basis. RF3 calls swiControl from a Timer HWI routine that BIOS sets up. Navigate to the "CLK Clock Manager" in the Scheduling folder.
19. You should see a clkControl object. Open up its properties to examine them. You should see something like this:
The thrControlIsr() function reads the control values into a control structure then posts swiControl to apply the changes. 20. Click Cancel to close the clkControl Properties window.
Analyzing Priority
Since we've got all of these threads running around processing data and providing control, the question might come up about priority. So, let's take a look at how priorities are assigned in RF3. 21. DSP/BIOS makes it really easy to compare SWI priorities. Click on the SWI Software Interrupt Manager. You should see something like this:
From this picture of the .cdb file we can see that all of the threads are currently set to priority level 1. It turns out that RF3 doesn't really need any of the threads to run at different priorities. However, if your application did need to use priority, it is easy to make the changes here by dragging and dropping the SWI objects to the desired priority level.
Getting Feedback
RF3 uses a variety of DSP/BIOS Real-time Analysis Tools to provide feedback about the application. The DSP/BIOS LOG module is used to send general information about the state of the application (things like trace information and dynamic memory or heap activity) back to the user. The BIOS STS (Statistics) module is used to calculate timing information. All of this is done without ever halting the DSP Target application. 22. You can see the objects for either of these modules in the Configuration Tool under Instrumentation. Take a moment to look at these objects and we'll show you how they're used here in a bit.
Project → Build (or click the toolbar button)
24. If you have CCS configured properly (or at least the way we had you do it back in Lab 1), the application should automatically load and go to main().
25. Make sure your DSK is set up properly for audio. Plug an audio source (CD player, computer sound card, etc.) into the line in on the DSK. You can use either speakers or headphones for the audio output. Plug speakers into the line out. Plug headphones into the headphone out.
26. Make sure there is audio playing at the source.
27. Run the application.
Debug → Run (or press F5, or click the toolbar button)
28. You should hear audio playing. If not, double-check all of your connections.
DSP/BIOS CPU Load Graph or click on You should see something like this:
As you can now see, the RF3 application (including algorithms) is only taking up a very small amount of the CPU's time.
Message Log
The DSP/BIOS Message Log provides printf() like capability at a much lower cost (memory and MIPS) to the target application. RF3 provides a module called UTL that powerfully extends the basic LOG module. 30. To see the output of the BIOS LOG, open a Message Log. DSP/BIOS Message Log or click on
Heap Allocations
Statistics
The DSP/BIOS STS (Statistics) Module provides an easy way to get timing information about the threads in your application. For example, in real-time systems, designers are usually concerned with the maximum execution time of a thread. If a thread's maximum execution time ever exceeds its deadline, then you know you have a problem. 32. Open the DSP/BIOS Statistics View. DSP/BIOS Statistics View or click on
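To make the deadline check concrete, here is a small host-side sketch of a statistics accumulator that tracks the count, total and maximum of a series of execution times. It is only loosely modeled on the idea behind an STS object: StatsModel and its functions are hypothetical names, the values are assumed to be non-negative, and the units are whatever you feed in.

```c
#include <assert.h>

/* Hypothetical statistics accumulator; assumes non-negative values. */
typedef struct {
    long count;   /* number of values recorded */
    long total;   /* sum of all values */
    long max;     /* largest value seen so far */
} StatsModel;

void stats_reset(StatsModel *s)
{
    assert(s != 0);
    s->count = 0;
    s->total = 0;
    s->max = 0;
}

/* Record one measured execution time. */
void stats_add(StatsModel *s, long value)
{
    assert(s != 0 && value >= 0);
    s->count++;
    s->total += value;
    if (value > s->max) s->max = value;
}

/* Returns 1 if the worst case seen so far fits within the deadline. */
int stats_meets_deadline(const StatsModel *s, long deadline)
{
    assert(s != 0);
    return s->max <= deadline;
}
```

The average (total divided by count) says how busy a thread is on a typical run, but it is the maximum that tells you whether a deadline was ever missed.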
This window gives you a lot more detail regarding the execution times of your threads. 33. This window can be modified in several ways. Right-click on the window and choose properties. Inside this window you can enable and disable the statistics for each thread, change the unit that the timing information is displayed in (instructions, microseconds, and milliseconds), etc. Try changing the swiRxSplit thread so that it displays in Microseconds.
36. You should now see a window that looks something like this:
Each column in the graph indicates that an event happened. The blue indicates that a thread is running. The white boxes represent that a thread is waiting for its turn to run. The teal green or dark lines indicate that the graph doesn't know what state a thread is in (running, waiting, or not doing anything). The reason for the green lines relates directly to the vertical red line. The red line indicates that the circular buffer that was being used to accumulate the information on the target wrapped around and overwrote some data. This indicates that there is a discontinuity in the data being displayed. Since the graph is essentially starting over, it doesn't know what state a thread is in until its state changes. The horizontal time line at the bottom has a little tick in it when the timer interrupt fires. This line can be used to relate the event based data to time. The threads are usually listed by priority, but since we only have one priority they are listed in the same order that we found them back in step 21. You could change this order in the Configuration Tool just like you changed priority.
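The wrap-around behind the red line can be illustrated with a tiny circular buffer. This is a simplified host-side model with hypothetical names (LogModel, LOG_SLOTS, and so on), not the DSP/BIOS LOG module: once more events are written than there are slots, the oldest entries are overwritten and the reader is left with a gap in the history.

```c
#include <assert.h>

#define LOG_SLOTS 4

/* Hypothetical circular event log with a fixed number of slots. */
typedef struct {
    int events[LOG_SLOTS];
    unsigned next;      /* total number of events written so far */
} LogModel;

void log_init(LogModel *l)
{
    assert(l != 0);
    l->next = 0;
}

/* Write one event, overwriting the oldest slot once the buffer is full. */
void log_write(LogModel *l, int event)
{
    assert(l != 0);
    l->events[l->next % LOG_SLOTS] = event;
    l->next++;
}

/* Number of events that were overwritten before they could be read. */
unsigned log_lost(const LogModel *l)
{
    assert(l != 0);
    return l->next > LOG_SLOTS ? l->next - LOG_SLOTS : 0;
}

/* Oldest event still held in the buffer (requires at least one write). */
int log_oldest(const LogModel *l)
{
    unsigned first;
    assert(l != 0 && l->next > 0);
    first = l->next > LOG_SLOTS ? l->next - LOG_SLOTS : 0;
    return l->events[first % LOG_SLOTS];
}
```

A non-zero log_lost() corresponds to the discontinuity in the graph: the viewer cannot know what happened during the overwritten events, so thread states stay unknown until the next state change.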
Flashing RF3
Once you have an application up and running, most people want to see it work without using CCS to control it. In order to do this, we need to burn the application to the flash that is located on the DSK. This sounds like a pretty hefty task, but once again we have tools that make the job a lot easier. In order to speed things along, we have provided another project that contains a slightly modified RF3 application to make it easier to flash.
Switching Projects
40. Close the project we have been working with so far. Project Close Note: If you get a message asking you if you want to save the project file, it shouldn't be necessary since we weren't supposed to make any changes to it.
Hint
41. Close any open windows, otherwise you may get an error when Flashburn opens. Window Close All 42. Close the GEL sliders by clicking on the little X in the upper right-hand corner of each one of them. 43. Open the new project app.pjt located in: C:\c60001day\referenceframeworks\apps\rf3\dsk6416_boot C:\c60001day\referenceframeworks\apps\rf3\dsk6713_boot Project Open or
45. Click on the General tab in the window that pops up. You should now see something like this:
C:\ti\c6000\cgtools\bin\hex6x C:\c60001day\referenceframeworks\apps\rf3\dsk
We have added a command to call hex6x using CCS's Final build steps option. This option tells CCS to call our command every time it does a full build. CCS also has Initial build steps option for those commands that need to run before CCS builds a project. 46. When you're through looking everything over, close the box by clicking on Cancel. 47. To have CCS build the project and call hex6x to generate the flash image, we need to do a Rebuild All. Project Rebuild All or click on 48. Wait for CCS to finish building the project and creating the hex image.
Note: Flashburn should automatically connect to the target when you open the .cdd file. If it does not, you need to use CCS to run the CPU. When you do this, Flashburn should connect to the target and you should see this icon in Flashburn:
51. Use Flashburn to erase the flash.
Program → Erase Flash (or click the toolbar button)
52. Wait until the blue progress indicator bar goes away.
53. Now that the flash is erased, we can burn our hex file.
Program → Program Flash (or click the toolbar button)
54. Wait until the blue progress indicator bar goes away.
55. Close Flashburn.
56. Now, let's see if it worked. Since the program is now in flash, we don't need CCS to load it anymore. Close CCS.
57. Disconnect the USB emulation cable from the DSK.
58. Hold your breath and press the white reset button on the DSK. If everything is working properly, you should now have music coming out of the DSK. If not, check to make sure that you have music playing.
59. Congratulations! You just flashed RF3 to the DSK.
Flashing POST
You probably don't want to leave your DSK running RF3. Here are the steps to program the flash with the post routine. 60. Reconnect your USB emulation cable. 61. Open Code Composer Studio. 62. Open Flashburn. Tools Flashburn 63. Use Flashburn to open the post.cdd located at: c:\ti\examples\dsk6416\bsl\post\ or c:\ti\examples\dsk6713\bsl\post\ File Open 64. Make sure that Flashburn is connected. If not, you may need to run the processor. 65. Erase the flash. Program Erase Flash or click on 66. Wait on the blue progress bar to complete and go away.
67. Burn the flash. Program Program Flash or click on 68. Wait on the blue progress bar to complete and go away. 69. Close Flashburn. 70. Close CCS. 71. Push the white reset button on the DSK. The LEDs should flash to indicate the progress of the POST routine as it runs through its tests, then flash and remain on. You should also hear a tone if the speakers/headphones are still connected. 72. Your DSK is now good as new.
Lab/Demo Debrief
eXpressDSP Demo Debrief
1.
2.
eXpressDSP Summary
eXpressDSP Summary
Target Software
Host Tools
C6000 Optimization
Introduction
The importance of the C language has grown significantly over the past few years. TI has responded by creating a compiler that produces extremely efficient processor code, which is so speed efficient you may not need to program in assembly. After getting your C code running, you may want to optimize it to get the best performance possible. In this chapter we discuss three major optimizations you can take, and then point out where you can go to discover more techniques.
Outline
Build Options Use Optimized Libraries Enable Cache Where To Go for More Information
Chapter Topics
C6000 Optimization
  Optimization Build Options
  Use Optimized Libraries
  C6000 Double-Level Cache
    Why Cache?
    Details of C67x & C64x Internal Memory
    Configuring External Memory as Cacheable (MAR)
  Where To Go For More Information
  LAB 4: Using C
    Optimized C
    Using ASM Libraries
    Lab 4 Results
  Lab 4a: Memory and Cache
    Everything Off-chip
    Use Some Cache (L1)
    Use All the Cache
    Cache Re-use
    Lab 4a Results
  Optional Topics
    Cache Data Coherency
    Advanced Optimizations (Brief List)
4-2
debug optimize
Debug and Optimize options conflict with each other; therefore, they should not be used together.
As you probably learned in college programming courses, you should follow a two-step process when creating code:
1. Write your code and debug its logical correctness (without optimization).
2. Optimize your code and verify that it still performs as expected.
As demonstrated above, certain options are ideal for debugging, but others work best to create highly optimized code. When you create a new project, CCS creates two sets of build options called Configurations: one called Debug, the other Release (which you might think of as Optimize). Configurations will be explored next.
Note: Like any compiler or toolset, learning the various options requires a bit of experimentation, but it pays off in the tremendous performance gains that can be achieved by the compiler. To this end, this workshop will explore these options further in an upcoming chapter.
4-3
The main difference is that the Release (optimized) configuration invokes the optimizer with -o3.
*Note: We recommend you add the two options -gp -k to make Release more useful.
We recommend you add the -gp and -k options to the Release configuration, as this makes it easier to evaluate the optimized performance. In the upcoming lab exercise, you will get a chance to do this.
Note: The examples shown here are for a C67x DSP, hence the -mv6700 option.
4-4
To edit a configuration, first make it active (via Project Configurations dialog or toolbar dropdown).
4-5
We are often asked, "Why use -fr?" When changing configurations, using -fr prevents the object (and .out) files from being overwritten. While not required, it allows you to preserve all variations of your project's output.
4-6
DSPLIB
Optimized DSP Function Library for C programmers using C62x/C67x and C64x devices.
These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. By using these routines, you can achieve execution speeds considerably faster than equivalent code written in standard ANSI C language. And these ready-to-use functions can significantly shorten your development time.
The DSP library features:
  C-callable
  Hand-coded assembly-optimized
  Tested against C model and existing run-time-support functions

Adaptive filtering
  DSP_firlms2
Correlation
  DSP_autocor
FFT
  DSP_bitrev_cplx, DSP_radix2, DSP_r4fft, DSP_fft, DSP_fft16x16r, DSP_fft16x16t, DSP_fft16x32, DSP_fft32x32, DSP_fft32x32s, DSP_ifft16x32, DSP_ifft32x32
Filters & convolution
  DSP_fir_cplx, DSP_fir_gen, DSP_fir_r4, DSP_fir_r8, DSP_fir_sym, DSP_iir
Math
  DSP_dotp_sqr, DSP_dotprod, DSP_maxval, DSP_maxidx, DSP_minval, DSP_mul32, DSP_neg32, DSP_recip16, DSP_vecsumsq, DSP_w_vec
Matrix
  DSP_mat_mul, DSP_mat_trans
Miscellaneous
  DSP_bexp, DSP_blk_eswap16, DSP_blk_eswap32, DSP_blk_eswap64, DSP_blk_move, DSP_fltoq15, DSP_minerror, DSP_q15tofl
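To make the benefit of a library routine concrete, here is a minimal plain-C sketch of the kind of loop that a routine such as DSP_dotprod is meant to replace. The name dotprod_ref and its argument list are hypothetical and chosen only for illustration; the real library routine's prototype and any restrictions on its arguments should be taken from the DSPLIB API guide.

```c
#include <stddef.h>

/* Hypothetical plain-C reference for a 16-bit dot product.
 * This is NOT the DSPLIB source; it only shows the arithmetic that a
 * hand-optimized library routine would perform much faster. */
int dotprod_ref(const short *x, const short *y, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* 16-bit x 16-bit multiply, accumulated into a 32-bit sum */
        sum += (int)x[i] * (int)y[i];
    }
    return sum;
}
```

A reference version like this is also handy for checking the results of the optimized routine on a few test vectors before trusting it in an application.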
IMGLIB
Optimized Image Function Library for C programmers using C62x/C67x and C64x devices.
The Image library features:
  C-callable
  C and linear assembly src code
  Tested against C model

Compression / Decompression
  IMG_fdct_8x8, IMG_idct_8x8, IMG_idct_8x8_12q4, IMG_mad_8x8, IMG_mad_16x16, IMG_mpeg2_vld_intra, IMG_mpeg2_vld_inter, IMG_quantize, IMG_sad_8x8, IMG_sad_16x16, IMG_wave_horz, IMG_wave_vert
Picture Filtering / Format Conversions
  IMG_conv_3x3, IMG_corr_3x3, IMG_corr_gen, IMG_errdif_bin, IMG_median_3x3, IMG_pix_expand, IMG_pix_sat, IMG_yc_demux_be16, IMG_yc_demux_le16, IMG_ycbcr422_rgb565
Image Analysis
  IMG_boundary, IMG_dilate_bin, IMG_erode_bin, IMG_histogram, IMG_perimeter, IMG_sobel, IMG_thr_gt2max, IMG_thr_gt2thr, IMG_thr_le2min, IMG_thr_le2thr
4-7
FastRTS (C67x)
Optimized floating-point math function library for C programmers using TMS320C67x devices.
Includes all floating-point math routines currently in existing C6000 runtime-support libraries.
The FastRTS library features:
  C-callable
  Hand-coded assembly-optimized
  Tested against C model and existing run-time-support functions
FastRTS must be installed per directions in its User's Guide (SPRU100a.PDF)

Single Precision    Double Precision
atanf               atan
atan2f              atan2
cosf                cos
expf                exp
exp2f               exp2
exp10f              exp10
logf                log
log2f               log2
log10f              log10
powf                pow
recipf              recip
rsqrtf              rsqrt
sinf                sin
FastRTS (C62x/C64x)
Optimized floating-point math function library for C programmers; enhances floating-point performance on C62x and C64x fixed-point devices.
The FastRTS library features:
  C-callable
  Hand-coded assembly-optimized
  Tested against C model and existing run-time-support functions
FastRTS must be installed per directions in its User's Guide (SPRU653.PDF)

Single Precision    Double Precision    Others
_addf               _addd               _cvtdf
_divf               _divd               _cvtfd
_fixfi              _fixdi
_fixfli             _fixdli
_fixfu              _fixdu
_fixful             _fixdul
_fltif              _fltid
_fltlif             _fltlid
_fltuf              _fltud
_fltulf             _fltuld
_mpyf               _mpyd
recipf              recip
_subf               _subd
4-8
Now that you know about the libraries, here's where to find them and some information on how they are organized. Each library also has documentation that goes along with it.
Location of Libraries
(in CCS v2.2)
DSP and IMG Libraries are provided as a source archive and a Little Endian C6000 object library.
Folder Structure:
  lib - library files (.lib) and source code (.src)
  include - contains the library header files
  support - miscellaneous supporting code
  bin - supporting Windows executables
CCS Docs folder contains:
  SPRU565A.pdf - DSP API User Guide
  SPRU023A.pdf - Imaging API User Guide
  SPRU100A.pdf - FastRTS Math API UG
Application Notes:
  SPRA885.pdf - DSPLIB App note
  SPRA886.pdf - IMGLIB App note
4-9
Parking Dilemma
[Figure: a Concert Hall with Close Parking next to it and a Distant Parking-Ramp]
  Close Parking: 0 minute walk, 10 spaces, $100/space
  Distant Parking-Ramp: 10 minute walk, 1000 spaces, $5/space
Parking Choices:
  0 minute walk @ $100 for close-in parking
  10 minute walk @ $5 for distant parking
  or Valet parking: 0 minute walk @ only $6.00
You do! A valet service gives the same access as the close parking for just a little more cost than the parking garage. So, you arrive on time (and dry) and you still have money left over to buy some goodies.
4 - 10
Cache is the valet service of DSPs. Memory that is close to the processor and fast can only be so big. You can attach plenty of external memory, but it is slower. Cache helps solve this problem by keeping what you need close to the processor. It makes the close parking spaces look like the big parking garage around the corner.
Why Cache?
[Figure: CPU connected to Cache Memory, which is connected to Bulk Memory]
  Cache Memory: Fast, Small; Works like Big, Fast Memory
  Bulk Memory: Slower, Larger, Cheaper
Memory Choices:
  Small, fast memory
  Large, slow memory
  or Use Cache:
    Combines advantages of both
    Like valet, data movement is automatic
One of the often overlooked advantages of cache is that it is automatic. Data that is requested by the CPU is moved automatically from slower memories to faster memories where it can be accessed quickly.
4 - 11
Levels of Memory
[Figure: CPU with L1 Program Cache and L1 Data Cache (Level 1); L2 Internal SRAM (Level 2); external memory on the EMIF (Level 3): SDRAM (16MB), Flash (512KB), and Daughter-Card, in the CE0, CE1, and CE2 spaces]
We often refer to a system's memory in hierarchical levels.
Higher levels (L1) are closer to the CPU.
It's important to understand these hierarchical levels in order to comprehend how requests flow from the CPU to the caches and memories. The daughtercard interface on the DSK allows you to add more memory, as well as other devices, to your board. Check the dspvillage.com web site and search for daughtercard to find out what is available for purchase.
4 - 12
[Figure: CPU, L1 Data (4KB), and L2; label: 8/16/32/64]
Level 2 Memory
  Unified: Prog or Data
  L2 → L1D delivers 32 bytes in 4 cycles
  L2 → L1P delivers 16 instrs in 5 cycles
  Configure L2 as cache or addressable RAM
  (C6711/12: L2 memory is 64K bytes)
Here are some more details about the C6713 internal memories.
[Figure: CPU, L1 Data (4KB), and L2; label: 8/16/32/64]
Level 1 Program
  Always cache
  1-way cache (Direct-Mapped)
  Zero wait-state
  Line size: 512 bits (or 16 instr)
Level 1 Data
  Always cache
  2-way cache
  Zero wait-state
  Line size: 256 bits
Level 2
  Unified (prog or data)
  RAM or cache
  1-4 way cache
  32 data bytes in 4 cycles
  16 instr. in 5 cycles
  Line size: 1024 bits (or 128 bytes)
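The line-size and way figures above determine how many lines and sets a cache has. As a small worked example (a sketch only, with hypothetical function names), the arithmetic can be written as follows. For the 4KB, 2-way L1 Data cache with 256-bit (32-byte) lines, it gives 128 lines arranged as 64 sets.

```c
/* Number of cache lines = capacity / line size (both in bytes). */
unsigned cache_lines(unsigned size_bytes, unsigned line_bytes)
{
    return size_bytes / line_bytes;
}

/* Number of sets = lines / ways. A direct-mapped cache has ways = 1. */
unsigned cache_sets(unsigned size_bytes, unsigned line_bytes, unsigned ways)
{
    return cache_lines(size_bytes, line_bytes) / ways;
}
```

For example, cache_sets(4096, 32, 2) describes the L1 Data cache: two addresses compete for the same set only if they are a multiple of 64 x 32 = 2048 bytes apart.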
4 - 13
C67x L2 Memory
A nice feature of the C6000 L2 memories is that they can be configured as internal SRAM or cache ways. This allows designers to customize the memory architecture to fit their needs.
The Configuration Tool makes setting up the cache the way you want as easy as choosing an option in a drop-down box.
4 - 14
[Figure: CPU, L1 Data (16KB), and L2; label: 8/16/32/64]
L1 Data Cache
  2-Way Cache
  Single cycle access
  Size = 16K Bytes
  Linesize = 64 bytes
Level 2 Memory
  C6414/15/16 = 1M Byte
  C6411/DM642 = 256K Byte
The C64x L2 memory is also configurable. It always has 4 cache ways, but you can change the size of the ways from 0K bytes (all SRAM) to larger sizes, up to 64K bytes in each of the 4 ways (256K cache, 768K SRAM).
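The split between cache and SRAM is simple arithmetic: four ways of the chosen size are taken from the total, and the rest stays addressable. The sketch below uses a hypothetical helper name and works in K bytes; it assumes the 4-way organization described here.

```c
/* K bytes of L2 left as addressable SRAM when each of the 4 cache ways
 * is way_kb K bytes. Assumes 4 * way_kb does not exceed total_kb. */
unsigned l2_sram_kb(unsigned total_kb, unsigned way_kb)
{
    const unsigned ways = 4;   /* C64x L2 cache is always 4-way */
    return total_kb - ways * way_kb;
}
```

With 1M byte of L2 (1024K), 64K-byte ways use 256K for cache and leave 768K as SRAM, while 0K-byte ways leave all 1024K as SRAM.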
C64x L2 Memory
Configuration
  When cache is enabled, it's always 4-Way
  This differs from C671x
Linesize
  Linesize = 128 bytes
  Same linesize as C671x
4 - 15
[Figure: external memory spaces CE0, CE2, and CE3 and the MAR bits that make them cacheable]
The processor reset value for the MAR bits is zero. This means that when the processor wakes up, all of the external memory is uncacheable. What do you think this will do to the performance of your system if you are using cache (and you should)? Let's just say you probably won't be pleased. So, one of the first things that you need to do is turn on the MAR bits for the memory regions that you need to cache.
4 - 16
The Configuration Tool makes this easy. You can use a mask in the Configuration Tool to set up the MAR bits that you want to enable, and it will be done for you.
If your code is running a lot slower than you thought it would, you might want to check the MAR bit settings. These bits are set to zero at reset and the default configuration files usually leave them that way. So, if you want cache enabled in your system, you need to turn these bits on.
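One way to picture a MAR mask is as a row of switches, one per block of external memory. The host-side sketch below models that idea; it does not touch any hardware. The function name, the 16MB block size, and the 0x80000000 base address used in the usage note are assumptions made for illustration, so check the device data sheet and the Cache Memory User's Guide for the actual MAR layout of your part.

```c
#include <stdint.h>

#define REGION_BYTES 0x01000000UL   /* assumed: each mask bit covers 16MB */
#define MASK_BITS    16u            /* a 16-bit mask such as 0xffff */

/* Returns 1 if 'addr' lies in a block whose bit is set in 'mask'.
 * 'base' is the first address covered by bit 0 of the mask. */
int is_cacheable(uint32_t mask, uint32_t base, uint32_t addr)
{
    uint32_t region;

    if (addr < base) {
        return 0;                   /* below the range this mask covers */
    }
    region = (uint32_t)((addr - base) / REGION_BYTES);
    if (region >= MASK_BITS) {
        return 0;                   /* above the range this mask covers */
    }
    return (int)((mask >> region) & 1u);
}
```

With the reset value of 0x0000, is_cacheable(0x0000, 0x80000000, addr) returns 0 for every address, which is why code running from external memory is so slow until the mask is changed.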
4 - 17
Optimizing C Performance
Attend the 4-day C6000 Optimization Workshop
http://www.ti.com/sc/training
Read:
  C6000 Programmer's Guide (SPRU198)
  Cache Memory User's Guide (SPRU656)
  C6000 Optimizing C Compiler User's Guide (SPRU187)
All the options are detailed in the TMS320C6000 Optimizing C Compiler User's Guide. It's highly recommended that you take time to read through the entire manual. OK, we know that reference manuals can be boring (and this one isn't any different), but the information you gain will be worth it.
Also recommended is the TMS320C6000 Programmer's Guide. It contains code optimization details for C, Linear Assembly, and standard assembly programming. The TMS320C6000 Compiler Tutorial is an invaluable reference. You can find this excellent resource at the TI website: http://www.ti.com/sc/c6000compiler, or built into Code Composer Studio tutorials.
The Cache Memory User's Guide is an excellent resource for everyone that needs to know more about cache and how to use it in a DSP system. It includes examples on how to make your code go faster in a cache based architecture. Finally, there are several great application notes out on our web site. These notes go into detail about specific subjects to help solve common problems.
4 - 18
LAB 4: Using C
Lab 4
Build project with Image Correlation function
Compare performance between:
  1. Without Optimization
  2. With Compiler's Optimizer
  3. Using IMGLIB Function
  4. With and Without Cache
This lab has several goals:
  Build a project using C source files.
  Benchmark/profile C code.
  Contrast results for both sets of suggested compiler options:
    Debug build configuration
    Release (Optimized) build configuration
  Call optimized Assembly routine from a library
  Examine the effects of cache on this optimized application
4 - 19
Image Correlation
We're going to use an image correlation algorithm in the lab. Here's some more information on this algorithm for those of us that haven't had much exposure to it.
Search image for pixel location of mask
Step through entire image processing each 3x3 pixel block
Basically, image correlation involves summing the values of a 3x3 matrix multiply between the mask and each 3x3 block in the image. The result of each summation is written to an array. The best match is the largest value in the array.
[Figure: mask stepped across an image of 8-bit pixels (dimension labels 128, 64, 64): No Match]
4 - 20
[Figure sequence: as the mask steps across the image (dimension labels 128 and 64) the fit improves: Partial Correlation, Better Fit, Match!]
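The correlation described above can be sketched in a few lines of plain C. This is an illustrative reference only, not the lab's corr_3x3 source and not the IMGLIB routine: the function name, argument order, and output layout are assumptions made for this example.

```c
#include <stddef.h>

/* Correlate a 3x3 mask with every 3x3 block of a w x h image of
 * 8-bit pixels. out must hold (w - 2) * (h - 2) sums, stored row by
 * row. Returns the index in out of the largest sum (the best match).
 * If the image is smaller than 3x3, nothing is written and 0 is
 * returned. */
size_t corr_3x3_ref(const unsigned char *img, int w, int h,
                    const unsigned char mask[9], int *out)
{
    size_t best = 0;
    int x, y, i, j;

    for (y = 0; y + 3 <= h; y++) {
        for (x = 0; x + 3 <= w; x++) {
            size_t idx = (size_t)y * (size_t)(w - 2) + (size_t)x;
            int sum = 0;

            /* Sum of the element-by-element products of mask and block */
            for (i = 0; i < 3; i++) {
                for (j = 0; j < 3; j++) {
                    sum += img[(y + i) * w + (x + j)] * mask[i * 3 + j];
                }
            }
            out[idx] = sum;
            if (sum > out[best]) {
                best = idx;          /* remember the largest sum so far */
            }
        }
    }
    return best;
}
```

Dividing the returned index by (w - 2) gives the row of the best match, and the remainder gives the column of the block's top-left pixel.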
4 - 21
If using the C6713 DSK, the target should read TMS320C67XX.
You can also use the button to specify the correct path.
2. Verify the newly created project is open in CCS by clicking on the + sign next to the Projects folder in the Project View window.
You may want to expand the lab4.pjt icon to observe the project contents. (Of course, it should be empty at this time.)
4 - 22
6. Save your CDB file as LAB4.CDB in the C:\c60001day\labs\lab4 directory.
File → Save As
7. Add the following files from C:\c60001day\labs\lab4 to your project:
  LAB4.C
  LAB4.CDB
  LAB4cfg.CMD
When these files have been added, the project should look like:
4 - 23
9. Before building, though, we need to add a symbol definition to the Debug configuration.
In this lab we plan to use the hardware timer to benchmark our performance. (This is just one of many ways to do this within CCS.) We will use the Chip Support Library (CSL), discussed back in Chapter 2, to set up and use the timer. CSL requires that the chip type is defined in your project so the proper code can be extracted from the library. Use the following three steps to modify your project configuration:
  Project → Build Options
  Select the Preprocessor category on the left-hand side.
  Add CHIP_6416 to the Define Symbols text box.
It should now look like this:
If using the C6713 DSK, the symbol should be: CHIP_6713
4 - 24
Add -ml3 (lowercase M, lowercase L, 3) to the build options by typing it in the box at the top of the build options. You can always add options by simply typing them into the text box, if you already know them.
11. Click OK to apply the changes that you have made and close the Build Options dialog.
Project → Build
13. Load the program if this did not happen automatically.
4 - 25
17. The output prints to the Stdout window in Code Composer Studio. Here is what the output should look like. Note, though, that the cycle counts shown are for the C6416 DSK. If you are using the C6713 DSK, your cycle numbers will be different.
  Number of data: number of pixels that were calculated.
  Number of cycles: number of CPU cycles it took to do the correlation.
  The next line tells us where the template image is located in the original image.
  The highest correlation found should match the template location if everything worked correctly.
18. Write down the number of cycles needed for this unoptimized C code in the table on page 32.
19. If CCS is still running, please halt it.
20. If you'd like to see the image that you just searched, Code Composer can show it to you with its graphing capability. In order to save you time, we have saved the graph in a workspace. If you'd like to see the image, just load the workspace:
File → Workspace → Load Workspace
Choose the appropriate Workspace in C:\c60001day\labs\lab4. Use the pink data cursor to make sure that the template match is where the correlation algorithm said it was.
Note: You can right-click on the image and choose properties if you'd like to see how this was set up. We simply used View → Graph → Image and filled in the values that you see in the properties box.
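The benchmarking in this part of the lab follows a common pattern: read a counter, run the code under test, read the counter again, and subtract the cost of the measurement itself. The lab does this with the hardware timer through CSL; the sketch below swaps in the standard C clock() function so the same pattern can be tried on any host. The names benchmark_ticks and sum_to_n are hypothetical, and clock() ticks are not CPU cycles.

```c
#include <time.h>

/* Measure the clock() ticks used by one call of fn(arg), minus the
 * ticks used by two back-to-back clock() reads (the overhead). */
long benchmark_ticks(void (*fn)(void *), void *arg)
{
    clock_t t0, t1, t2;
    long overhead, elapsed;

    t0 = clock();
    t1 = clock();
    overhead = (long)(t1 - t0);      /* cost of taking a measurement */

    t1 = clock();                    /* start */
    fn(arg);                         /* code under test */
    t2 = clock();                    /* stop */

    elapsed = (long)(t2 - t1) - overhead;
    return elapsed < 0 ? 0 : elapsed;
}

/* Example workload: replaces *arg (a long n) with 1 + 2 + ... + n. */
void sum_to_n(void *arg)
{
    long *p = (long *)arg;
    long n = *p, i, s = 0;

    for (i = 1; i <= n; i++) {
        s += i;
    }
    *p = s;
}
```

Measuring and subtracting the overhead matters most for short routines, where reading the counter can cost a noticeable fraction of the total.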
4 - 26
Optimized C
Now that we've seen how long it takes to execute the unoptimized C, let's take a look at how fast this code runs when the optimizer is turned on.
22. We need to make a couple of simple changes to the configuration before we can use it. You'll probably recognize these changes since they are the same ones that we made earlier. Open the project build options.
Project → Build Options
4 - 27
23. Add the following two options to the build options for the Release configuration, just like you did earlier. Either use the text box at the top or the GUI at the bottom to add these options:
  -d"CHIP_6416" and -ml3 (for the C6416 DSK)
  -d"CHIP_6713" and -ml3 (for the C6713 DSK)
Your options should now look like this:
Note: The biggest difference between Release and Debug is that Release turns on the Optimizing C Compiler with the -o3 option and turns off symbolic debugging by removing the -g option.
4 - 28
4 - 29
So, all you need to do is find all places where corr_3x3 is used in the code and change it to IMG_corr_3x3. You should only need to do two replacements:
29. Now that we have replaced the C function calls with IMGLIB function calls, let's go ahead and comment out the actual corr_3x3 C function. This way, if we've missed changing any references to it, we should get a build error.
4 - 30
30. The next thing we need to do in order to call the function in the library is to include a header file. Find the comments at the beginning of lab4.c that talk about including the header files for the Image Library and add the following line of code:
#include "img_corr_3x3.h"
31. We also need to let CCS know where it can find the above header file. We can do this by adding the following to our build options under Project → Build Options.
  -i"C:\ti\c6400\imglib\include" (for the C6416 DSK)
  -i"C:\ti\c6200\imglib\include" (for the C6713 DSK)
32. The last step to calling the ASM routine is to actually add the library to your project.
Project → Add Files to Project
33. In the following dialog box, navigate to the correct folder and add the appropriate library file to your project.
  C:\ti\c6400\imglib\lib\img64x.lib (for the C6416 DSK)
  C:\ti\c6200\imglib\lib\img62x.lib (for the C6713 DSK)
Yes, it may appear odd that we are using the C62x library for the C6713, but since C67x devices can run C62x object code, this works out fine.
Hint: You may have to change the "Files of type" drop-down box at the bottom of the Add Files dialog to see the library files.
35. If your program doesn't automatically load, go ahead and load it.
File → Load Program
36. When you're ready, run the code to see how fast it executes (and to make sure that it is accurate).
37. Record your results in the table on page 32.
4 - 31
Lab 4 Results
Here's a summary of the results that we've obtained from lab 4. These are the results with all code and data located in the internal memory of the C6000. In Lab 4a, we'll explore the effects that memory organization and cache can have on this system.
Lab Step          Build Configuration    Cycles
Lab 4 Step 18     Debug                  ________
Lab 4 Step 27     Release                ________
Lab 4 Step 37     IMGLIB                 ________
4 - 32
Everything Off-chip
Let's start off by moving everything (code and data) from on-chip memory where it is now, to off-chip memory. We're also going to leave the cache turned off for now to see what the absolute worst case performance of this code might be. 1. We're going to use the Configuration Tool to change the memory configuration. So, open the lab4.cdb file. 2. The .cdb file is broken down into different categories to make it easy to set up. We need to use the Memory Manager which is located in the System folder.
3. You can see the three different kinds of memory that we have available to us on the DSK by clicking on the plus sign next to the Memory Manager. You should see ISRAM, FLASH, and SDRAM for the 6416 DSK, and CACHE_L2, IRAM, and SDRAM for the 6713 DSK.
4 - 33
4. We need to change the properties of the SDRAM so that it can store code and data. Currently, it is only set up to store data. Open the SDRAM properties by right-clicking on it and choosing properties.
5. You should see the box below. Change the "space" option from data only to code/data. When complete, the dialog should look like one of the following:
6416
only
4 - 34
6. Now we can move everything from ISRAM (or IRAM) to SDRAM. Open the properties of the Memory Manager itself by right-clicking on it and selecting properties.
7. You should now see a window with five tabs. We don't need to do anything with the General tab, so go ahead and select the next tab, BIOS Data. We want to move everything on this tab from ISRAM (or IRAM) to SDRAM by clicking on each one of the drop-down boxes. When you are finished with the BIOS Data tab, the dialog should look like this:
4 - 35
8. Click on the BIOS Code tab and make the same changes. When you are done with this tab, the window should look like this:
9. We also need to make the same change in the Compiler Sections tab. Select this tab and move everything to SDRAM. Here's what you'll be left with:
4 - 36
10. When you create a DSKC6713 CDB file, the cache is enabled automatically (whereas it is disabled by default in the DSKC6416 CDB file). Therefore, to get an accurate no-cache comparison, we need to turn off the cache by making all external memory Non-Cacheable. (C6416 users can ignore these two steps.) The Config Tool makes this easy.
6713 ONLY
Find and open the Global Settings under the System by right-clicking on it and selecting properties. Select the 621x/671x tab. Make the dialog box look like this:
11. Modify the C6713 MAR bit settings to make external memory non-cacheable:
4 - 37
18. Open the Global Settings properties by right-clicking on it and selecting properties.
4 - 38
19. Modify the C6416 MAR bit settings: Select the 641x tab. Check the check box that says "641x Configure L2 Memory Settings".
6416 ONLY
We need to turn on the MAR bits for the EMIFA, CE0 memory space. Do this by changing the appropriate text box from 0x0000 to 0xffff. It should now look like this:
4 - 39
20. Modify the C6713 MAR bit settings: Select the 621x/671x tab. Check the check box that says "621x/671x Configure L2 Memory Settings". Let's turn on all the MAR bits for the EMIF. Do this by changing the appropriate text box from 0x0000 to 0xffff. It should now look like this:
6713 ONLY
21. When you've made the changes, click OK.
23. If your program doesn't automatically load, go ahead and load it.
File → Load Program
24. When you're ready, run the code.
25. Record the results of running everything with L1 cache in the table on page 45.
4 - 40
6416 ONLY
Note: This setting is a little confusing. When most people see "4-way cache", they might think that the cache is on. Well, that is true, but cache ways that are 0K in length don't do much caching!
4 - 41
29. Save the changes that you have made to the .cdb file by making sure that it is selected and choosing File Save. 30. You should see the following box appear:
6713 ONLY
Click OK when you are done. 34. Save the changes that you have made to the .cdb file by making sure that it is selected and choosing File Save.
4 - 42
36. If your program doesn't automatically load, go ahead and load it.
File → Load Program
37. When you're ready, run the code.
38. Record the results of running everything with L1 and L2 cache in the table on page 45.
4 - 43
Cache Re-use
Cache memories perform best when the information that they are caching is used many times. So far, we've only looked at cache in the worst possible scenario, when the data is only used once. If we're going to use the code/data again and again, the cache can really start to help us. To see this, let's call the image correlation twice on the same image and see if the performance improves on the second call.
39. Open lab4.c.
40. Scroll down to the code where we actually time the image correlation algorithm with the timer. The code should look something like this:
41. Copy the line of code that calls IMG_corr_3x3() and paste it above the line that starts with start (isn't that redundant?). This way, the algorithm gets called and everything gets brought into cache, then we call it again to see what the real benefit of cache would be.
43. If your program doesn't automatically load, go ahead and load it.
File → Load Program
44. When you're ready, run the code.
45. Record the results of running everything with L1 and L2 cache in the table on page 45.
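The effect of the second call can be previewed with a toy model. The sketch below simulates a small direct-mapped cache on the host and counts misses for each pass over a buffer. It is a simplification made for illustration: all names are hypothetical, the 4KB size and 32-byte line are assumed values, and the real L1D is 2-way rather than direct-mapped.

```c
#include <string.h>

#define LINE_BYTES 32u     /* assumed line size */
#define NUM_LINES  128u    /* 128 lines x 32 bytes = 4KB, direct-mapped */

typedef struct {
    unsigned long tag[NUM_LINES];    /* which memory line each slot holds */
    unsigned char valid[NUM_LINES];  /* nonzero once a slot has been filled */
} toy_cache;

/* Empty the cache, as after reset. */
void toy_cache_reset(toy_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Touch every byte of [base, base + len) once; return the miss count. */
unsigned toy_cache_pass(toy_cache *c, unsigned long base, unsigned long len)
{
    unsigned misses = 0;
    unsigned long a;

    for (a = base; a < base + len; a++) {
        unsigned long line = a / LINE_BYTES;
        unsigned idx = (unsigned)(line % NUM_LINES);

        if (!c->valid[idx] || c->tag[idx] != line) {
            c->valid[idx] = 1;       /* miss: bring the line in */
            c->tag[idx] = line;      /* the full line number serves as tag */
            misses++;
        }
    }
    return misses;
}
```

A 2KB buffer misses once per line on the first pass and not at all on the second. A buffer larger than the cache keeps evicting its own lines, so the second pass misses just as often as the first.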
4 - 44
Lab 4a Results
Here's a summary of the results that we've obtained from lab 4a.
Lab Step           Memory Configuration           Cycles
Lab 4a Step 15     All Off-chip                   ________
Lab 4a Step 25     L1 Cache On                    ________
Lab 4a Step 38     L1 and L2 Cache On             ________
Lab 4a Step 45     Everything Already in Cache    ________
4 - 45
Optional Topics
Cache Data Coherency
[Figure: EDMA fills RcvBuf in external memory; the CPU reads RcvBuf through the caches]
CPU reads the buffer for processing
  CPU read causes a cache miss in L1D and L2 (assuming L2 cache is on)
RcvBuf is added to both caches
  Space is allocated in each cache
  RcvBuf data is copied to both caches
4 - 46
[Figure: EDMA, RcvBuf, and CPU]
EDMA writes new data to buffer
When the CPU reads RcvBuf again, what will happen?
CPU gets old data!
Solutions
[Figure: RcvBuf shown in L1D, in L2, and in external memory, with the EDMA and CPU]
1. Whenever L1 or L2 are read, the other is checked to make sure there isn't newer data
2. Invalidate (remove) RcvBuf from cache before receiving new data
   CSL provides cache invalidate routines
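The stale-data problem and the invalidate fix can be demonstrated with a toy model that runs on any host. In the sketch below, one array stands for RcvBuf in external memory and another for the copy held in cache. Every name is hypothetical and nothing here calls the real CSL; on the DSP the invalidate step would be done with the CSL cache routines mentioned above.

```c
#include <string.h>

#define BUF_LEN 8

typedef struct {
    int external[BUF_LEN];  /* RcvBuf in external memory (EDMA writes here) */
    int cached[BUF_LEN];    /* copy of RcvBuf held in cache */
    int valid;              /* nonzero while the cached copy is in use */
} toy_rcvbuf;

/* EDMA transfer: updates external memory only; the cache is not told. */
void toy_edma_write(toy_rcvbuf *b, const int *src)
{
    memcpy(b->external, src, sizeof b->external);
}

/* CPU read: a miss copies the buffer into cache, a hit uses the copy. */
int toy_cpu_read(toy_rcvbuf *b, int i)
{
    if (!b->valid) {
        memcpy(b->cached, b->external, sizeof b->cached);
        b->valid = 1;
    }
    return b->cached[i];
}

/* Invalidate: discard the cached copy so that the next read misses. */
void toy_invalidate(toy_rcvbuf *b)
{
    b->valid = 0;
}
```

Without the invalidate call, a read after a second toy_edma_write still returns the first buffer's contents. Calling toy_invalidate before the new data arrives forces the next read to fetch it from external memory.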
4 - 47
Advanced Optimizations
(Other than the techniques discussed here)
Let EDMA move data (or code) on-chip before needed
  Data is on-chip when it's needed
  EDMA gets better transfer performance than CPU due to its ability to perform burst transfers
Minimize back-to-back Reads and Writes to/from off-chip memory
Compiler Intrinsic functions
Program Level Optimization: -pm -op2 -o3
Various Compiler Pragmas:
  #pragma UNROLL(# of times to unroll);
  #pragma MUST_ITERATE(min, max, %);
  #pragma DATA_ALIGN(variable, 2^n alignment);
  #pragma DATA_MEM_BANK(var, 0 or 2 or 4 or 6);
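As a rough illustration of how two of these pragmas are placed in source code, consider the sketch below. The array and function names are hypothetical, and the trip-count promises are assumptions that the caller must honor. The pragmas are wrapped in a check for _TMS320C6X, a macro predefined by the TI C6000 compiler, so that the same file also builds cleanly with other compilers.

```c
/* DATA_ALIGN must appear before the definition of the symbol it names. */
#ifdef _TMS320C6X
#pragma DATA_ALIGN(coeffs, 8)        /* place coeffs on an 8-byte boundary */
#endif
short coeffs[16] = {0};

/* Sum n 16-bit values. The caller promises that n is a multiple of 4,
 * at least 4, and at most 1024. */
int sum16(const short *x, int n)
{
    int i, sum = 0;

    /* MUST_ITERATE goes immediately before the loop it describes. */
#ifdef _TMS320C6X
#pragma MUST_ITERATE(4, 1024, 4)     /* min trips, max trips, multiple */
#endif
    for (i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}
```

The pragmas do not change what the loop computes. They pass along facts the compiler cannot prove on its own, which gives the optimizer more freedom to unroll and pipeline the loop.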
4 - 48
Wrap Up
Introduction
What do you need to put around your DSP? Most microprocessors require some support chips: power management, clock drivers, bus interface, and so on. DSP systems usually contain some additional devices, such as sensors and data acquisition, because they receive, modify, and output real-world signals. Finally, pull out your DSP Selection Guide and C6000 Product Update sheet to follow along with the last part of the workshop, which summarizes the C6000 devices, tools, and support.
Outline
Chapter Outline
What Goes Around a DSP?
Linear Products Logic Products
5-1
5-2
Chapter Topics
Wrap Up .... 5-1
  What goes around a DSP? .... 5-4
    Linear .... 5-4
    Logic .... 5-8
  C6000 Summary .... 5-12
  Hardware Tools .... 5-13
  Software Tools .... 5-17
  What's Next? .... 5-18
    Before Leaving .... 5-22
5-3
DSP
Data Converters
Analog-to-Digital Converters (ADC)
  Analog input to digital output
  Output is typically interfaced directly to DSP
Digital-to-Analog Converters (DAC)
  Digital input to analog output
  Input interfaces directly to DSP
CODEC
  Data converter system
  Combination of ADC and DAC in single package
Power Management
  Power Modules: complete power solutions
  Linear Regulators: regulated power for analog and digital
  DC-DC controllers: efficient power isolation
  Battery Management: for portable applications
  Charge Pumps & Boost Converters: portable applications
  Supervisory Circuits: to monitor processor supply voltages and control reset conditions
  Power Distribution: controlling power to system components for high efficiency
  References: for data converter circuits
5-4
[Figure: signal chain around the digital device (MSP430/DSP/uP/FPGA/ASIC): ADC, DAC, Power, Clock Circuits, Interface Circuits; example digital radio content: Music, Traffic, Weather, Stocks]
OP-AMPs
  Supply Voltage available?
  Bandwidth required? (kHz or MHz)
  What is the input signal?
  What is the output driving?
  # of channels needed?
  Most Important Spec(s)?
POWER Management
  Do you build your own power solutions, use modules, or both?
  What Input Voltage(s) & the source of these voltages (Wall, battery, AC/DC, etc.)
  What Output Voltage(s), and Output Current(s) do you need?
  How would you prioritize size, efficiency, and cost?
  What are the most important parameters in the design? (efficiency, form factor, ripple voltage, tolerance, etc.)
Clocks (Clocking Solution)
  Input frequencies?
  Output frequencies desired & number of copies necessary
  Supply voltages available/required?
  Special needs? (low jitter/jitter cleaner? low part to part skew? etc.)
Data Converter/AIC/Codec
  Resolution? (bits & ask for ENOB!)
  Speed? (KSPS or MSPS for high speed, KHz or MHz for precision ADCs, uS (settling time) for precision DACs)
  # of channels needed?
  What is it interfacing to? (uC/uP/DSP/FPGA/ASIC)
5-5
http://focus.ti.com/docs/tool/toolfolder.jhtml?PartNumber=5-6KINTERFACE
Analog Cards
Single-width Serial-Interface Card
5-6
5-7
Logic
5+ V Logic
Harris now TI
Cypress now TI
3.3 V Logic
AHC AC LV ALVT ALVC ALB LVT
HSTL SSTV
2.5 V Logic
LV AVC LVC
1.8 V Logic
LVC ALVC AVC AUC
ALVC CBTLV
1.5 V Logic
AUC
ALVT
Logic Families
[Figure: chart of logic families at 5 V, 3.3 V, and 2.5 V; axis ticks 5, 8, 10, 12, 15, 20, 24, 50, 64, 100; families plotted include GTLP, ALVT, BCT, 74F, ALB, AC/ACT, ALS, AVC, AUC, AC, AHC/AHCT, HC/HCT, CBT, CBTLV, AHC, LV, CD4K, and TTL]
  ABT      Advanced BiCMOS Technology
  AC/T     Advanced CMOS
  AHC/T    Advanced High Speed CMOS
  ALB      Advanced LV BiCMOS
  ALVC     Advanced Low Voltage CMOS
  ALVT     Adv LV BiCMOS Technology
  AVC      Advanced Very-LV CMOS
  AUC      Advanced Ultra-LV CMOS
  BCT      BiCMOS Technology
  CBT      Cross Bar Technology
  CBTLV    CBT Low Voltage Technology
  74F      74F Bipolar Technology
  FCT      Fast CMOS Technology
  GTLP     Gunning Transceiver Logic Plus
  HC/T     High Speed CMOS
  LV       Low Voltage HCMOS
  LVC      Low Voltage CMOS
  LVT      Low Voltage BiCMOS Technology
  LS
5-8
5V
3.3V 2.5V
LV245 :10 ns LVC4245 :6.3 ns LVCC3245 :6.0 ns LVCC4245 :7.0 ns ALVC164245 :5.8 ns LV245 :15 ns LVC* :4.8 ns LVCC3245 :9.4 ns AVC* :2.5 ns
AUC*
: 5.0 ns
1.8V 0.8V
LVC* :4.8 ns AVC* :4.0 ns * 16245 functions
Little Logic
The Principle Example Single Gate
5 4
SN74AHC1G00DCKR SN74AHCT1G00DBVR
LVC 1G
Dual Gate
00
SN74AHC2G00DCTR SN74AHCT2G00DCUR
YEA
Triple Gate
SN74LVC3G04DCTR SN74LVC3G04DCUR
5-9
AUC
Features
The World's First 1.8V Logic
1.8V optimized performance VCC Specified @ 2.5V, 1.8, 1.5, 1.2 0.8V typical Balanced Drive 3.6V I/O Tolerance Bushold (II(HOLD)) IOFF Spec for Partial Power-down ESD protection Low noise Second Source agreements Little Logic, Widebus, Octal Device SN74AUC1G00 SN74AUC16244 VCC 1.8 V 1.8 V Drive -8/8 mA -8/8 mA
NEW FAMILY
Advanced Packaging NanoStar - YEA SOT 23 - DBV (Microgate) SC-70 - DCK (PicoGate) TSSOP - PW & DGG TVSOP - DGV LFBGA - GKE & GKF VFBGA - GQL
CHOOSING LOGIC
PRIMARY CONCERN SECONDARY CONCERN
5V
ABT, 74F ABT, 74F ABT, AC/ACT ABT, 74F ABT, 74F ABT ABT, AHC ABT, 74F AHC, ABT ABT, AHC ABT AHC, ABT
3V
2.5V
1.8V
AUC AUC AUC AUC AUC AUC AUC AUC AUC AUC AUC AUC
HIGH DRIVE
ALVT, LVT, ALVC AVC, ALVC, ALVT ALVC, LVT, LVC ALVC, LVT, LVC AVC AVC
HIGH SPEED
ALVT, LVT, ALVC AVC, ALVC, ALVT LVT LVT ALVC,LVT,LVC,LV LVT
ALVC,LVT,LVC, LV,AHC
HIGH DRIVE
LOW NOISE
HIGH SPEED
LVT, ALVC
ALVC,ALVT,LVT,LVC ALVC,LVT,LVC.LV
LOW POWER
5 - 10
TI FIFOs
MEMORY
TMS320 DSP
TI FIFO
Host Interface
Host Bus
5 - 11
C6000 Summary
TMS320C6000
Easy to Use
Best C engine to date Efficient C Compiler and Assembly Optimizer DSP & Image Libraries include hand-optimized code eXpressDSP Toolset eases system design
SuperComputer Performance
1.38 ns instruction rate: 720x8 MIPS (1GHz sampled)
2880 16-bit MMACs (5760 8-bit MMACs) at 720 MHz
Pipelined instruction set (maximizes MIPS)
Eight Execution Unit RISC Topology
Highly orthogonal RISC 32-bit instruction set
Double-precision floating-point math in hardware
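The headline figures above follow from simple multiplication. The sketch below is a minimal illustration, assuming 8 instructions issued per cycle (the "720x8" on the slide) and 4 16-bit or 8 8-bit MACs per cycle (inferred by dividing the slide's MMAC figures by 720). The function names are illustrative only and are not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* Peak instruction rate in MIPS: clock (MHz) times instructions per cycle. */
static uint32_t peak_mips(uint32_t clock_mhz, uint32_t instr_per_cycle)
{
    return clock_mhz * instr_per_cycle;
}

/* Peak multiply-accumulate rate in MMACs: clock (MHz) times MACs per cycle.
 * The per-cycle MAC counts used with this helper are an assumption derived
 * from the slide's own numbers. */
static uint32_t peak_mmacs(uint32_t clock_mhz, uint32_t macs_per_cycle)
{
    return clock_mhz * macs_per_cycle;
}

/* Cycle time in picoseconds for a clock given in MHz.
 * Integer division truncates, so 720 MHz gives 1388 ps. */
static uint32_t cycle_time_ps(uint32_t clock_mhz)
{
    assert(clock_mhz != 0u);
    return 1000000u / clock_mhz;
}
```

Under these assumptions, peak_mips(720, 8) gives 5760, peak_mmacs(720, 4) gives 2880, and cycle_time_ps(720) gives 1388 ps, which the slide quotes as 1.38 ns. These are peak rates; sustained throughput depends on how well the code keeps all eight units busy.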
C6000 Roadmap
Object Code Software Compatibility
[Figure: C6000 roadmap, with devices arranged toward "Highest Performance". 1st Generation: C6201, C6202, C6203, C6211. 2nd Generation: C6411, C6412, C6414. Floating-point devices shown: C6701, C6713. Also labeled: Multi-core, Floating Point, C64x DSP 1.1 GHz.]
Hardware Tools
C6416 / C6713 DSK Contents
DSK Board
DSK Code Composer Studio CD-ROM*
* The DSK version of CCS requires the DSK to be connected; otherwise CCS cannot start up
Low-cost video interface demo shows how to connect an inexpensive 'C6000 DSP to a video decoder through a low-cost FPGA.
XDS560
eXtended Development System (XDS)
Industry Standard Connections
PCI plugs into PC
JTAG plugs into DSP target board
Download code up to 500Kbytes/sec
Advanced Event Triggering for simple and complex breakpoints
Real Time Data Exchange (RTDX) can transfer data at 2Mbytes/sec
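To put the two quoted rates in perspective, the sketch below estimates how long a transfer takes at a given rate. It assumes 1 Kbyte = 1000 bytes and treats the quoted figures as sustained rates, which the slide does not state; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated transfer time in milliseconds, rounded up.
 * bytes_per_sec is taken from the slide's peak figures (an assumption:
 * real sustained throughput may be lower). */
static uint32_t transfer_time_ms(uint32_t bytes, uint32_t bytes_per_sec)
{
    assert(bytes_per_sec != 0u);
    /* Widen to 64 bits so bytes * 1000 cannot overflow. */
    return (uint32_t)(((uint64_t)bytes * 1000u + bytes_per_sec - 1u)
                      / bytes_per_sec);
}
```

For example, 1,000,000 bytes takes about 2000 ms at the 500 Kbytes/sec code-download rate and about 500 ms at the 2 Mbytes/sec RTDX rate. Note that code download and RTDX are different operations, so the comparison is only a rough sense of scale.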
Code Composer Studio Automate Code Composer Studio Communicate directly to DSP through RTDX
RTDX
Hyperception's VAB
Easy to use graphical tool
Hierarchical:
Can write code graphically (down to ASM level instr.)
One worksheet can become a block in another worksheet
Block/Component Wizard:
You can create an optimized VAB building block
Create XDAIS algorithms
If desired, wrap PC interface into standalone EXE
Outputs:
Directly to DSP
Burn program to Flash with single-click
Create an .OUT file
Create Relocatable Object file (i.e. library) to use in CCS
Capabilities:
DSP program control, memory access, and real time data transfer with RTDX
MATLAB automates testing and provides advanced analysis
Function call support enables hardware-in-loop simulation and debugging
C28x / C5000 / C6000 support
Supports XDS560 and XDS510
Integrated with MATLAB design environment for a complete design solution
FPGA development system fits standard DSK daughter card sockets
Contains Altera FPGA software including powerful SOPC Builder (shown above)
After designing and burning the FPGA, the DSP can talk to the FPGA via memory-mapped addresses (SOPC creates a C header file)
For more info:
http://www.altera.com/products/devkits/altera/kit-dsp_stratix.html
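Memory-mapped access of the kind described above usually comes down to reading and writing through a volatile pointer. The following is a minimal sketch, not the header SOPC Builder generates: the register names, offsets, and helper functions are hypothetical, and the base address is passed in so the same code can be exercised on a host with an ordinary array.

```c
#include <stdint.h>

/* Hypothetical register offsets, in 32-bit words from the FPGA base.
 * A real design would take these from the generated C header. */
#define FPGA_REG_CTRL   0u
#define FPGA_REG_STATUS 1u
#define FPGA_REG_DATA   2u

/* Write one 32-bit register. 'volatile' stops the compiler from
 * caching or reordering away the access to the device. */
static void fpga_write(volatile uint32_t *base, uint32_t reg, uint32_t value)
{
    base[reg] = value;
}

/* Read one 32-bit register. */
static uint32_t fpga_read(volatile uint32_t *base, uint32_t reg)
{
    return base[reg];
}
```

On a target, base would be a fixed address in the external memory space the daughter card is mapped to (the exact address depends on the board and is not given here). On a host, a plain uint32_t array can stand in for the FPGA to test the surrounding logic.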
For a full list of tools available from TI and its 3rd Parties, please check:
http://dspvillage.ti.com/docs/catalog/devtools/dsptoolslist.jhtml?familyId=132&toolTypeId=6&toolTypeFlagId=2&templateId=5154&path=templatedata/cm/toolswchrt/data/c6000_devbds
Software Tools
eXpress DSP
Target Software
Host Tools
Tools of the Trade
What's Next?
Optimizing C Performance
Attend a four-day workshop (see next slide)
Review the Compiler Tutorial
See tutorials in CCS online help, or
http://www.ti.com/sc/c6000compiler
Read:
C6000 Programmer's Guide (SPRU198)
Cache Memory User's Guide (SPRU656)
C6000 Optimizing C Compiler User's Guide (SPRU187)
Sign up at:
http://www.ti.com/sc/training
dspvillage.ti.com
Getting Started Discussion Groups DSP Knowledge Base Third Party Network eXpressDSP Guided Tour
analog.ti.com
Design Resources Technical Documents Solution/Selection Guides
Applications Solutions
Find complete solutions for your application including: DSP, Analog, Boards Target Software, Development tools, third party support
Install Code Composer Studio Free Evaluation Tools (FET) from the Essential Guide to DSP CD
Check out the DSP Selection Guide; it's your consolidated resource for all pertinent information
Number
+32 (0) 27 45 55 32 +33 (0) 1 30 70 11 64 +49 (0) 8161 80 33 11 1800 949 0107 (free phone) 800 79 11 37 (free phone) +31 (0) 546 87 95 45 +34 902 35 40 28 +46 (0) 8587 555 22 +44 (0) 1604 66 33 99 +358(0) 9 25 17 39 48
Literature, Sample Requests and Analog EVM Ordering
Information, Technical and Design support for all Catalog TI Semiconductor products/tools
Submit suggestions and errata for tools, silicon and documents
C6x-Based Digital Signal Processing by Nasser Kehtarnavaz and Burc Simsek; ISBN 0-13-088310-7
DSP Applications Using C and the TMS320C6x DSK by Rulph Chassaing; ISBN 0471207543
Before Leaving
Let's Go Home
Thanks for your valuable time today
Please fill out an evaluation and let us know how we could improve this class
If you purchased a DSK:
Make sure you pack up (or receive) your DSK before leaving If available, you may keep the earbud headphones and audio patch cable
Workshop lab and solutions files will be available via CDROM or the Internet. Please check with your instructor.
Target Attendee
System Integration (data input/output, peripherals, real-time scheduling, etc.)
Algorithm Development and Optimization
IW6000
OP6000
IW6000 OP6000
C6000 Hardware
CPU CPU Architecture Details CPU Pipeline Details Peripherals C6000 Peripherals Overview Using CSL (Chip Support Library) to program peripherals DMA/EDMA (Direct Memory Access ) Serial Port (McBSP) External Memory Interface (EMIF) Host Port Interface (HPI) XBUS Memory Basic Memory Management Advanced Memory Management Using Overlays Multiple Heaps Via DSP/BIOS C6000 Cache Cache Optimization
+ + + + + + + + + +
Development Tools
Code Composer Studio DSP/BIOS Configuration Tool C6711 DSP Starter Kit (DSK) C6000 Simulator Compiler Options for Optimization Assembly Optimizer Profile Based Compiler (PBC) Absolute Lister Hex6x Utility FlashBurn C6711 Board Support Library (BSL)
IW6000
OP6000
+ + + + + +
IW6000
+ + + + + +
Coding
Building Code Composer Studio Projects Compiler Build Options Running C programs C Coding Efficiency Techniques Writing / Optimizing Assembly Linear Assembly Coding Calling Assembly from C Software Pipelining Techniques Numerical Issues with Fixed Point Processors C Runtime Environment (stack pointer, global pointer, etc.) C Optimization (pragmas and other techniques)
OP6000
+ +
+ + + + + + + + +
System Topics
DSP/BIOS Real-Time Scheduler DSP/BIOS Real-Time Analysis (LOG, STS) Reference Frameworks Double-Buffers For Data Input/Output Creating A Bootable Standalone System (Boot Without Emulator) Programming Flash Memory Interrupt Basics Advanced Interrupt Topics Interruptibility of High-Performance C Code XDAIS ( eXpressDSP Algorithm Standard) Introduction
IW6000
OP6000
+ + + + + + +
The C6000 Integration Workshop (IW6000) is not a prerequisite to this workshop, though if you are looking for a broad introduction to all aspects of building a C6000 based system, the Integration Workshop might be a better choice. On the other hand, if you are evaluating the C6000 CPU architecture or want to learn how to write better C and assembly code for the C6000, this workshop (OP6000) would be the best choice. (Please refer to the C6000 Workshop Comparison for differences between the two workshops.)
Bottom Line:
If your main goal is to understand the C6000 architecture and write optimized software for it, then the C6000 Optimization Workshop (OP6000) is the best one to attend. Peripherals and other system foundation software (DSP/BIOS, XDAIS, CSL) are mentioned only in passing. Many software engineers are tasked with getting their algorithms to run, and to run as fast as possible; this course is designed to address exactly those issues.
On the other hand, if you need to figure out how to get an entire system working, from programming the peripherals to get data in/out all the way to burning the Flash memory with your final program, the C6000 Integration Workshop (IW6000) is the ticket. Along the way you'll be introduced to (and use in lab exercises) many of the TI Software Foundation tools (DSP/BIOS, XDAIS, CSL, BSL, and Reference Frameworks). This is probably the single best course for an engineer or programmer who is new to the C6000 DSP and needs to get a whole system running, as opposed to just optimizing one or two algorithms.
Of course, some engineers will need to handle both of these jobs: getting everything running and optimizing their software algorithms. In that case, you may want to take both workshops.
Product Info / Tech Support / Literature: North America support@ti.com or (972) 644-5580; Europe epic@ti.com
Texas Instruments Website: http://www.ti.com or http://www.dspvillage.com
DSP Knowledge Base: http://www-k.ext.ti.com/sc/technical-support/knowledgebase.htm
C6202 2000/1600 250/200 Prog:256KB (1) Data:128KB 32-bit 52MB (4 CE) 32-bit XBUS Standard (4+1) (3) 3 2 1.8V 3.3V GJL or GLS NOW $110.08 / $94.03
C6202B 2400/2000 300/250 Prog:256KB (1) Data:128KB 32-bit 52MB (4 CE) 32-bit XBUS Standard (4+1) (3) 3 2 1.5V 3.3V GNY or GNZ NOW $67.14 / $55.95
C6203B 2400/2000 300/250 Prog:384KB (1) Data:512KB 32-bit 52MB (4 CE) 32-bit XBUS Standard (4+1) (3) 3 2 1.5V 3.3V GNY or GNZ NOW $71.62 / $60.43
C6204 1600 200 Prog:64KB (1) Data:64KB 32-bit 52MB (4 CE) 32-bit XBUS Standard (4+1) (3) 2 2 1.5V 3.3V GHK or GLW NOW $9.95 / $20.92
C6205 1600 200 Prog:64KB (1) Data:64KB 32-bit 52MB (4 CE) 32-bit PCI Standard (4+1) (3) 2 2 1.5V 3.3V GHK NOW $10.74
C6211B 1336/1200 167/150 L1 Prog:4KB (2) L1 Data:4KB (2) L2 P/D:64KB (2) 32-bit 512MB (4 CE) 16-bit HPI Enhanced (4) (16+1+1) 2 2 1.8V 3.3V GFN NOW $26.93 / $21.54
C6711B 900/600 150/100 L1 Prog: 4KB(2) L1 Data: 4KB(2) L2 P/D: 64KB(2) 32-bit 512MB (4 CE) 16-bit HPI Enhanced(4) (16+1+1) 2 2 1.8V 3.3V GFN NOW $30.77 / $21.54
C6711C 1200 200 L1 Prog: 4KB(2) L1 Data: 4KB(2) L2 P/D: 64KB(2) 32-bit 512MB (4 CE) 16-bit HPI Enhanced(4) (16+1+1) 2 2 1.2V 3.3V GDP NOW (TMX) $21.55
C6712 600 100 L1 Prog: 4KB(2) L1 Data: 4KB(2) L2 P/D: 64KB(2) 16-bit 512MB (4 CE) --- Enhanced(4) (16+1+1) 2 2 1.8V 3.3V GFN NOW $19.87
C6712C 900 150 L1 Prog: 4KB(2) L1 Data: 4KB(2) L2 P/D: 64KB(2) 16-bit 512MB (4 CE) --- Enhanced(4) (16+1+1) 2 2 1.2V 3.3V GDP NOW (TMX) $14.95
VC33 (5) 150 / 120 75 / 60 P: 256B cache P/D: 136KB 32-bit 16M x 32 (4 CE) --- C3x DMA(1) 1 (not McBSP) 2 1.8V 3.3V PGE NOW $13.38 / $11.15
* The C6713 DSP can be configured to have up to three serial ports in various McASP/McBSP combinations by not utilizing the HPI. Other configurable serial options include I2C and additional GPIO. There are 16 GPIO pins.
www.dspvillage.com
Page 1 of 4
16/32-bit HPI
Enhanced (64)
McBSP
2 Standard
2 Standard
3 standard
16/32-bit HPI or 32-bit 66MHz PCI or 16-bit HPI + EMAC Enhanced (64) 3 20-bit Video Ports (VP) or 1 20-bit VP + 2 10bit VP + 2 McBSP + 1 8-bit McASP ---3 16 1.2 (500MHz) 1.4 (600 MHz) 3.3V GDK/GNZ TMX320DM642 NOW/ 4Q03 $63.08 (TMX)
---H/W Accelerators Timer/ Counters GPIO Core Voltage I/O Voltage Package(s) (9) Part Number TMX / TMS TMS 1,000u (1) (2) (3) (4) (5) (6) (7) (8) (9) ---3 16 1.2 (500MHz) 1.4 (600 MHz) 3.3V GDK/GNZ TMX320C6412G DK NOW / 4Q03 $56.07 (TMX) -------
---3 8 1.2 (500MHz) 1.4 (600 MHz) 3.3V GDK/GNZ TMX320DM641 4Q03 / 1Q04 $45.82 (TMX)
---3 8 1.2 (400MHz) 3.3V GDK/GNZ TMX320DM640 4Q03 / 1Q04 $28.00 (TMX)
Notes:
(1) C6201/C6204/C6205/C6701 internal program memory can be configured as cache or addressable RAM. C6202/C6203 allows 512Kb to be programmed as cache or addressable RAM; the balance is always addressable RAM.
(2) L1 data cache and L1 program cache are always configurable as cache memory. L2 is configurable between SRAM and cache memory.
(3) DMA has 4 fully configurable channels, plus one dedicated to host for HPI transfers.
(4) C6211/C6711/C6712 Enhanced DMA (EDMA) has 16 fully configurable channels. Additionally, there is an independent single-channel quick DMA (QDMA) and a channel dedicated to the host for HPI transfers.
(5) VC33 is an upgrade to TI's C3x family. While not a C6000 device, it is part of TI's floating-point family.
(6) Each Chip Enable (CE) allows the user to assign a specific memory space.
(7) A third timer is on-chip but not pinned-out.
(8) Host Port Interface (HPI) is slave-only async host access. Expansion Bus (XBUS) is master/slave async or sync interface; operates in host or FIFO/Memory modes.
(9) These devices are Pin-for-Pin compatible (note: be aware of voltage differences):
(GJC) C6201/C6701
(GJL, GNZ) C6202/C6203
(GLS, GNY, GLW) C6202/C6203/C6204
(GFN) C6211/C6711/C6712
(GLZ) C6411/C6414/C6415/C6416
(GDP) C6713/C6711C/C6712C
(GDK, GNZ) C6412/DM642/DM641/DM640
Packages:
GGP = 35mm x 35mm, 1.27mm ball pitch, 352-pin BGA
GFN = 27mm x 27mm, 1.27mm ball pitch, 256-pin BGA
GLS = 18mm x 18mm, 0.8mm ball pitch, 384-pin BGA
PGE = 20mm x 20mm, 0.5mm pitch, 144-pin TQFP
PYP = 28mm x 28mm, 0.5mm pitch, 208-pin PQFP
GNY = Same as GLS
GDP = 27mm x 27mm, 1.27mm ball pitch, 272-pin BGA
GJC = 35mm x 35mm, 1.27mm ball pitch, 352-pin BGA
GJL = 27mm x 27mm, 1.0mm ball pitch, 352-pin BGA
GHK = 16mm x 16mm, 288-pin Star BGA
GLW = 18mm x 18mm, 340-pin BGA
GLZ = 23mm x 23mm, 0.8mm ball pitch, 532-pin BGA
GNZ = Same as GJL
GDK = 23mm x 23mm, 0.8mm ball pitch, 548-pin BGA
Price $1995 $6495 $395 $395 $995 $4500 $395 $4495 $5995 $3495 $1995 $1500 $1995
PCI-bus JTAG Scan-Based Emulator $3995 C6416 Test Evaluation Board Only * C6416 TEB $1995 Planned EOL for this product replaced by 6416 DSK C6416 Test Evaluation Board bundled with CCS & Spectrum TMDX3260E6416 Digital 510PP+ * Planned EOL for this product replaced by C6416 TEB Bundle $3995 TMDX3260E6416E 6416 DSK
Additional hardware development tools are provided by TI's large assortment of Third Parties. See the 3rd Party resource link below.
* E is European version
CCS only works with the DSK. Does not include simulation and has a 256K word program space memory limitation.
CCS only works with the DSK. Does not include simulation; however, there is no memory limitation.
Full version of CCS.
$599
* Specific upgrades to Code Composer Studio available to users with a current registration for previous versions of
TI TOOL LINKS
GENERAL                            NUMBER     REVISED   LOCATION
TMS320C6000 Technical Brief        SPRU197d   02/1999   http://www-s.ti.com/sc/psheets/spru197d/spru197d.pdf
TMS320C64x Technical Overview      SPRU395b   01/2001   http://www-s.ti.com/sc/psheets/spru395b/spru395b.pdf
TMS320C6711C Migration Document    SPRA837    08/2002   http://www-s.ti.com/sc/psheets/spra837/spra837.pdf
NUMBER     REVISED   LOCATION
SPRU189f   11/2000   http://www-s.ti.com/sc/psheets/spru189f/spru189f.pdf
SPRZ168b   08/2001   http://www-s.ti.com/sc/psheets/sprz168b/sprz168b.pdf
SPRU190d   02/2001   http://www-s.ti.com/sc/psheets/spru190d/spru190d.pdf
SPRU653    02/3002   http://focus.ti.com/lit/ug/spru653/spru653.pdf
SPRU600a   12/2002   http://focus.ti.com/lit/ug/spru600a/spru600a.pdf
SPRU041b   05/2003   http://focus.ti.com/lit/ug/spru041b/spru041b.pdf
SPRU175a   10/2002   http://focus.ti.com/lit/ug/spru175a/spru175a.pdf
SPRU233a   04/2003   http://focus.ti.com/lit/ug/spru233a/spru233a.pdf
NUMBER     REVISED   LOCATION
SPRU198g   08/2002   http://www-s.ti.com/sc/psheets/spru198g/spru198g.pdf
SPRU187i   04/2001   http://www-s.ti.com/sc/psheets/spru187i/spru187i.pdf
SPRU186i   04/2001   http://www-s.ti.com/sc/psheets/spru186i/spru186i.pdf
SPRU328b   02/2000   http://www-s.ti.com/sc/psheets/spru328b/spru328b.pdf
SPRU301c   02/2000   http://www-s.ti.com/sc/psheets/spru301c/spru301c.pdf
SPRU303b   05/2000   http://www-s.ti.com/sc/psheets/spru303b/spru303b.pdf
SPRU403d   12/2001   http://www-s.ti.com/sc/psheets/spru403d/spru403d.pdf
SPRU424b   01/2002   http://www-s.ti.com/sc/psheets/spru424b/spru424b.pdf
SPRU360b   03/2002   http://www-s.ti.com/sc/psheets/spru360b/spru360b.pdf
SPRU224    01/1997   http://www-s.ti.com/sc/psheets/spru224/spru224.pdf
SPRU401d   04/2002   http://www-s.ti.com/sc/psheets/spru401d/spru401d.pdf
DATA SHEET                           NUMBER     REVISED   LOCATION
C6201 Data Sheet                     SPRS051g   11/2000   http://www-s.ti.com/sc/ds/tms320c6201.pdf
C6202 Data Sheet                     SPRS104c   08/2002   http://www-s.ti.com/sc/ds/tms320c6202.pdf
C6203B Data Sheet                    SPRS086g   08/2002   http://www-s.ti.com/sc/ds/tms320c6203b.pdf
C6204 Data Sheet                     SPRS152a   06/2001   http://www-s.ti.com/sc/ds/tms320c6204.pdf
C6205 Data Sheet                     SPRS106c   06/2001   http://www-s.ti.com/sc/ds/tms320c6205.pdf
C6211/C6211B Data Sheet              SPRS073f   09/2001   http://www-s.ti.com/sc/ds/tms320c6211.pdf
C6701 Data Sheet                     SPRS067e   05/2000   http://www-s.ti.com/sc/ds/tms320c6701.pdf
C6711/C6711B/C6711C Data Sheet       SPRS088c   10/2002   http://www-s.ti.com/sc/ds/tms320c6711.pdf
C6712/C6712C Data Sheet              SPRS148a   10/2002   http://www-s.ti.com/sc/ds/tms320c6712.pdf
C6713 Data Sheet                     SPRS186    12/2001   http://www-s.ti.com/sc/ds/tms320c6713.pdf
C6411 Data Sheet                     SPRS196    03/2002   http://www-s.ti.com/sc/ds/tms320c6411.pdf
C6414 Data Sheet                     SPRS134c   09/2001   http://www-s.ti.com/sc/ds/tms320c6414.pdf
C6415 Data Sheet                     SPRS146c   09/2001   http://www-s.ti.com/sc/ds/tms320c6415.pdf
C6416 Data Sheet                     SPRS164c   09/2001   http://www-s.ti.com/sc/ds/tms320c6416.pdf
DM642 Data Sheet                     SPRS200a   04/2003   http://www-s.ti.com/sc/ds/tms320dm642.pdf
VC33 Data Sheet                      SPRS087b   07/2002   http://www-s.ti.com/sc/ds/tms320vc33.pdf
(*) For Military C6000 information and data sheets, please visit: http://www.ti.com/sc/docs/products/military/processr/index.htm
Workshops                                 Length
C6416/C6713 One-Day Workshop              1 day
C6000 Integration Workshop (IW6000)       4 days
C6000 Optimization Workshop (OP6000)      4 days
DSP/BIOS Design Workshop                  4 days
ADDITIONAL ONLINE RESOURCES
TI Monthly DSP Customer Technology Webcasts: http://www.ti.com/sc/webcasts
FTP Site: ftp://ftp.ti.com/mirrors/tms320bbs
TI & ME Online Sample Requests: https://www-a.ti.com/apps/ti_me/ti_me.asp
Tech Online University: http://www.ti.com/sc/docs/training/techonline.htm
Software Upgrades & Registration / Hardware Repair & Upgrades: (972) 293-5050 / (281) 274-2285
C6000 Platform Benchmarks: http://www.ti.com/sc/docs/products/dsp/c6000/benchmarks/index.htm
Network Video Developers Kit (NVDK): Data Converters and Power Solutions http://www.ti.com/sc/docs/msp/dsps.htm http://ti-training.com/courses/coursedescription.asp?iCSID=1250
ONLINE TRAINING