TMS320C6416/C6713 DSK One-Day Workshop

Student Guide
Technical Training Organization
Revision 3.1 August 2003
Notice
Creation of derivative works unless agreed to in writing by the copyright owner is forbidden. No portion of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright holder. Texas Instruments reserves the right to update this Guide to reflect the most current product information for the spectrum of users. If there are any differences between this Guide and a technical reference manual, references should always be made to the most current reference manual. Information contained in this publication is believed to be accurate and reliable. However, responsibility is assumed neither for its use nor any infringement of patents or rights of others that may result from its use. No license is granted by implication or otherwise under any patent or patent right of Texas Instruments or others.
Revision History
October 1999     Version 1.0
November 1999    Version 1.1
November 2000    Version 2.0
March 2001       Version 2.1
August 2001      Version 2.2
June 2003        Version 3.0
August 2003      Version 3.1
(C6211 DSK) (C6711 DSK) (CCS v2.0) (C6416/C6713 DSK and CCS v2.2)
Copyright 1999-2003 by Texas Instruments, Incorporated. All rights reserved. For more information on TI Semiconductor Products, please call the SC Product Information Center at (972) 644-5580 or email them at support@ti.com.
C6416/C6713 DSK One-Day Workshop

Introduction
The C6000 One-Day Workshop introduces you to the C6000 architecture, peripherals, and tools. In one day, we can't make you a C6000 expert, though after this workshop you should be able to:
- Recognize the various peripherals of the C6000 family and determine which peripherals are on a given device.
- Describe the basic capabilities of the EDMA, McBSP, and HPI peripherals.
- Enable the double-level cache.
- Explain the concept of Software Pipelining and its value.
- Create a CCS project; build and run a C program.
- Explain the -mv, -g, -gp, and -o compiler options.
- Choose settings for debug and code optimization.
- Use the TI Software Foundation libraries (CSL, BIOS, DSPLIB, IMGLIB) in a CCS project.
- List the benefits of TI's system software tools and standards (XDAIS, IOM, RF).
- Install the C6416 (or C6713) DSK hardware and software, and run the included diagnostic utility.
Along with the Welcome introduction, this course consists of four chapters as outlined below. Each chapter concludes with a lab exercise, giving you the opportunity to observe and practice the topics discussed in class.
Workshop Outline
- Welcome
- Introduction to C6000 and Code Composer Studio (CCS)
- Using C6000 Peripherals
- eXpressDSP - TI's System Solution
- Optimizing C6000 Code
Chapter Topics
C6416/C6713 DSK One-Day Workshop
  Agenda
  Please Introduce Yourself
  TI DSP and C6x Family Positioning
    Applications / System Needs
    TI DSP Families
  C6000 Roadmap
  For More Information and Support
  Key C6000 Literature
  For Information about Digital Signal Processing
  Textbooks on using the C6000
  and finally, Workshops from TI
Agenda
Today's Agenda
8:30 - 9:00     Welcome
9:00 - 11:00    Intro to C6000 and CCS
                Lab 1: Generate and Graph a Sine Wave Algorithm
11:00 - 1:00    Using C6000 Peripherals
                Lab 2: Output Sine tone via the DSK's Audio Codec (Break for lunch during this lab)
1:00 - 2:15     eXpressDSP - TI's System Solution
                Demo: Examine eXpressDSP using TI Reference Framework 3 (IOM drivers, XDAIS algorithms, and Real-Time Analysis)
2:15 - 4:00     Optimizing C6000 Code
                Lab 4: Optimize Image Correlation routine Using C Optimizer, DSP Image Library, and On-chip Cache
4:00 - 4:30     Wrap Up
As noted above, you will have lunch sometime during Chapter 2. Your facilitator will provide breaks as needed throughout the day.
Please Introduce Yourself

The following questions will help your facilitator better understand the general level of understanding for the class.
Introduce Yourself
A Show of Hands...
Do you have experience with:
- TI DSPs (TMS320)
- Another DSP
- Other microprocessors
Will you use C, Assembly, or both?
Who has used an OS or RTOS?
Which C6000 DSP do you plan to use?
The two acronyms from above:
- OS: Operating System
- RTOS: Real-Time Operating System
While most engineers have used an operating system (e.g. Mac OS, Windows, Unix), many embedded system designers have never developed an application that included an operating system. Understanding the group's level of OS knowledge may assist your facilitator during the eXpressDSP chapter and demo.
TI DSP and C6x Family Positioning

Applications / System Needs
DSP systems today face a host of system needs:
System Considerations
- Interfacing
- Size
- Performance
- Power
- Ease-of-Use: Programming, Interfacing, Debugging
- Integration: Memory, Peripherals
- Cost: Device cost, System cost, Development cost, Time to market
These needs challenge the designer with a series of tradeoffs. For example, while performance is important in a portable MP3 player, efficient power dissipation and board space matter more. On the other hand, a cellular base station might require higher performance to maximize the number of channels handled by each processor. Wouldn't it be nice if the fastest DSP consumed the lowest amount of power? While TI is working on providing this (and making it software compatible), our goal is to provide you with a broad assortment of DSP families to cover a varying set of system needs. Think of them as different shoes for different chores.
TI DSP Families
TI provides a variety of DSP families to handle the tradeoffs in system requirements.
Different Needs? Multiple Families
- C6000 (C62x/64x/67x): Max Performance with Best Ease-of-Use. Multi-channel and multi-function applications: wireless base stations, DSL, imaging and video, home theater, performance audio, multimedia servers, digital radio.
- C5000 (C54x/55x/OMAP): Efficiency, with the best MIPS per Watt / Dollar / Size. Wireless phones, Internet audio players, digital still cameras, modems, telephony, VoIP.
- C2000 (C20x/24x/28x): Lowest Cost. Control systems: Segway, motor control, storage, digital control systems.
- Earlier generations shown: C1x, C2x, C3x, C4x, C5x, C8x.
The TMS320C2000 (C2000) family of devices is well suited to lower cost, microcontroller-oriented solutions. It fits users who need a bit more performance than today's microcontrollers are able to provide, but who still need the control-oriented peripherals and low cost.

The C5000 family is the model of processor efficiency. While these devices boast incredible performance numbers, they deliver them with equally incredible low power dissipation. No wonder they are the favorites in most wireless phones, internet audio, and digital cameras (just to name a few).

Rounding out the offerings, the C6000 family provides the absolute maximum performance offered in DSP. Couple this with its phenomenal C compiler and you have one fast, easy-to-program DSP. When performance and/or time-to-market counts, the C6000 is the family to choose. It also happens to be the family this course was designed around; thus, the rest of the workshop will focus on it.
TI DSP Platforms

C2000 DSP platform: C28x DSP
- In one of 2001's Most Innovative Products (Segway: Human Transporter)
- World's most code-efficient DSP
- Heart of advanced embedded control applications: Hard Disk Drive Servo Control; Digital Motor Control in White Goods; HVAC Motor Control; Un-interruptible Power Supply PFC; Optical Lasers
- Leadership integration of analog and high speed Flash memory
- Tens of millions shipped to thousands of customers
- New generation C28x DSP products fully code compatible
- 47 Products; AUP: $3 - $15

C5000 DSP platform: C55x DSP
- Awards: EDN 2000 Finalist; DSP Product of the Year 2001, Internet Telephony; Best DSP, Microprocessor Report
- World's most power-efficient DSP
- World's most popular DSP ISA
- Hundreds of millions shipped to thousands of customers
- Heart of handheld solutions for the Internet era: Wireless terminals and OMAP; Digital Still Cameras; Internet Audio players; VoIP
- New generation C55x DSP fully code compatible
- 92 Products; AUP: $5 - $120
C6000 DSP platform: C64x DSP
- 2001 Innovation of the Year (EDN Magazine)
- Best DSP of 2001 (InStat/MicroDesign Resources)
- World's highest-performance DSP: shipping at 720 MHz, sampled at 1 GHz
- Heart of solutions for new, high-bandwidth communications and video equipment: Wireless basestations and transcoders; DSL; Home theater audio; IBOC digital radio; Imaging and video servers & gateways
- Millions shipped to hundreds of customers
- New generation C64x DSP products fully code compatible
- 13 Products; AUP: $9.95 - $250
C6000 Roadmap
The C6000 family has grown considerably over the past few years. With the addition of the 2nd generation of devices (C64x) a couple of years ago, and with the recent announcement of the upcoming 1GHz performance, the C6000 family dominates the high-end DSP market.
C6000 Roadmap
[Figure: C6000 roadmap, rising toward Highest Performance, with Object Code Software Compatibility across generations.]
- 1st Generation, C62x (Fixed Point): C6201, C6202, C6203, C6204, C6205, C6211
- 1st Generation, C67x (Floating Point): C6701, C6711, C6712, C6713
- 2nd Generation (C64x): C6411, C6412, C6414, C6415, C6416, DM642
- Future boxes shown: C64x DSP at 1.1 GHz, Multi-core, Floating Point
TMS320C6000

Easy to Use
- Best C engine to date
- Efficient C Compiler and Assembly Optimizer
- DSP & Image Libraries include hand-optimized code
- eXpressDSP Toolset eases system design

SuperComputer Performance
- 1.38 ns instruction rate: 720x8 MIPS (1 GHz sampled)
- 2880 16-bit MMACs (5760 8-bit MMACs) at 720 MHz
- Pipelined instruction set (maximizes MIPS)
- Eight Execution Unit RISC Topology
- Highly orthogonal RISC 32-bit instruction set
- Double-precision floating-point math in hardware

Fix and Float in the Same Family
- C62x: Fixed Point
- C64x: 2nd Generation Fixed Point
- C67x: Floating Point
Even with its growing family of devices, the ease of design with the C6000 architecture has not been abandoned. Software compatibility is addressed by the architecture, rather than by the hard work of the programmer. Because both the C67x and C64x devices can run C62x object code, upgrading DSP designs is much easier.
For More Information and Support

For support, we suggest you try TI's web site first. Then call your local support: either your local TI representative or Authorized Distributor Sales/FAE. Finally, here are a few other places to go:
For More Information . . .

Internet
- Website: http://www.ti.com and http://www.dspvillage.com
- FAQ: http://www-k.ext.ti.com/sc/technical_support/knowledgebase.htm
- Available online: device information, my.ti.com, application notes, news and events, technical documentation, training
- Enroll in Technical Training: http://www.ti.com/sc/training

USA - Product Information Center (PIC)
- Phone: 800-477-8924 or 972-644-5580
- Email: support@ti.com
- Information and support for all TI Semiconductor products/tools
- Submit suggestions and errata for tools, silicon and documents

European Product Information Center (EPIC)
- Web: http://www-k.ext.ti.com/sc/technical_support/pic/euro.htm
- Phone:

  Language                Number
  Belgium (English)       +32 (0) 27 45 55 32
  France                  +33 (0) 1 30 70 11 64
  Germany                 +49 (0) 8161 80 33 11
  Israel (English)        1800 949 0107 (free phone)
  Italy                   800 79 11 37 (free phone)
  Netherlands (English)   +31 (0) 546 87 95 45
  Spain                   +34 902 35 40 28
  Sweden (English)        +46 (0) 8587 555 22
  United Kingdom          +44 (0) 1604 66 33 99
  Finland (English)       +358(0) 9 25 17 39 48

- Fax (all languages): +49 (0) 8161 80 2045
- Email: epic@ti.com
- Literature, Sample Requests and Analog EVM Ordering
- Information, Technical and Design support for all Catalog TI Semiconductor products/tools
- Submit suggestions and errata for tools, silicon and documents
Key C6000 Literature

Here is a brief summary of the C6000 manuals available from TI.
Key C6000 Manuals

Hardware
  SPRU189   CPU and Instruction Set Ref. Guide
  SPRU190   Peripherals Ref. Guide
  SPRZ122   SPRU190 Manual Update Sheet (important!)
  SPRU401   Peripherals Chip Support Lib. Ref.
  SPRU609   C67x Two-Level Internal Memory Reference
  SPRU610   C64x Two-Level Internal Memory Reference
  SPRU656   Cache Memory User's Guide

Software
  SPRU198   Programmer's Guide
  SPRU423   C6000 DSP/BIOS User's Guide
  SPRU403   C6000 DSP/BIOS API Guide

Code Generation Tools
  SPRU186   Assembly Language Tools User's Guide
  SPRU187   Optimizing C Compiler User's Guide
Please check the website for the latest versions of these and for additional manuals and application notes.
For Information about Digital Signal Processing

Looking for Literature on DSP?
- A Simple Approach to Digital Signal Processing, by Craig Marven and Gillian Ewers; ISBN 0-4711-5243-9
- DSP Primer (Primer Series), by C. Britton Rorabaugh; ISBN 0-0705-4004-7
- A DSP Primer: With Applications to Digital Audio and Computer Music, by Ken Steiglitz; ISBN 0-8053-1684-1
- DSP First: A Multimedia Approach, by James H. McClellan, Ronald W. Schafer, Mark A. Yoder; ISBN 0-1324-3171-8
Textbooks on using the C6000

Looking for Literature on C6000 DSP?
- Digital Signal Processing Implementation using the TMS320C6000 DSP Platform, by Naim Dahnoun; ISBN 0201-61916-4
- C6x-Based Digital Signal Processing, by Nasser Kehtarnavaz and Burc Simsek; ISBN 0-13-088310-7
- DSP Applications Using C and the TMS320C6x DSK, by Rulph Chassaing; ISBN 0471207543
and finally, Workshops from TI

DSP Workshops Available from TI
Attend another workshop:
- 4-day C2000 Workshops
- 4-day C5000 Integration Workshops
- 4-day C6000 Integration Workshop
- 4-day C6000 Optimization Workshop
- 4-day DSP/BIOS Workshop
- 4-day OMAP Software Workshop
- 1-day versions of these workshops
- 1-day Reference Frameworks and XDAIS
Sign up at: http://www.ti.com/sc/training
C6000 Workshop Comparison
Workshops compared: IW6000, OP6000

Audience
- Algorithm Coding and Optimization
- System Integration (data I/O, peripherals, real-time scheduling, etc.)

C6000 Hardware
- CPU Architecture & Pipeline Details
- Using Peripherals (EDMA, McBSP, EMIF, HPI, XBUS)

Tools
- Compiler Optimizer, Assembly Optimizer, Profiler, PBC
- CSL, Hex6x, Absolute Lister, Flashburn, BSL

Coding & System Topics
- C Performance Techniques, Adv. C Runtime Environment
- Calling Assembly From C, Programming in Linear Asm
- Software Pipelining Loops
- DSP/BIOS, Real-Time Analysis, Reference Frameworks
- Creating a Standalone System (Boot), Programming DSK Flash
You can find a more complete comparison between the two workshops in the Appendix of this book.
Intro to C6000 and CCS

Introduction
This chapter begins with a detailed look at the C6000 architecture. First, we look at how the C6000 generally fits into a system, then work down to the various functional units that make up the core CPU. Using a technique called software pipelining, we can visualize how all the functional units might be able to work in parallel to achieve massive numerical performance.

The second part of this chapter introduces Code Composer Studio (CCS). After a CCS overview, we quickly examine CCS projects and C build options, and point out the DSP/BIOS Configuration Tool. In Lab 1, you will use CCS to build and graph a sine wave.
Outline
- C6000 Overview
- C6000 Parallelism
- CCS Overview
- Lab: Build and Graph a Sinewave
- Optional Topics: CCS Automation; CPU Architecture Detail; C6000 Instruction Sets; Benchmarks
Chapter 1 Topics
Intro to C6000 and CCS
  Connecting to a C6000 Device
  Looking into the C6000 Device
  Looking at the C6000 CPU
    What is Digital Signal Processing (DSP)?
    C6000 Core CPU Architecture
    C62x vs. C67x vs. C64x
  Using the CPU's Parallelism
    C62x Compiled Code
    C67x Compiled Code
    C64x Compiled Code
    How many MMACs is that?
    How can we get such parallelism?
    Software Pipelining
  DSP Tools Overview
    C6000 DSKs
    Code Composer Studio (CCS)
    CCS Projects
    DSP/BIOS Configuration Tool
  Lab Preparation
    C64x or C67x Exercises?
  Prepare Lab Workstation
    Computer Login
    Connecting the C6416 DSK to your PC
    Testing Your Connection
    CCS Setup
    Set CCS Customize Options
  LAB 1: Using Code Composer Studio
    Sine Generation Algorithm
    Take Home Exercises (Optional)
    Lab1a - Customize CCS
    Lab1b - Using GEL Scripts
    Lab1c - Using Printf
    Lab1d - Fixed vs Floating Point
    Lab1e - Explore CCS Scripting
    Lab Debrief
  Optional Topics
    Optional Topic: CCS Automation
    Optional Topic: CPU Architecture Details
    Assorted C6000 Benchmarks

Connecting to a C6000 Device

C6000 devices contain a variety of peripherals to allow easy communication with off-chip memory, co-processors, and other devices. The diagram below provides a quick overview:
Example C6000 System
[Figure: block diagram of the C6000 CPU and the external connections of each peripheral.]
- VCP, TCP
- Timer / Counters
- GPIO: switches, lamps, latches, FPGA, etc.
- HWI: Reset, NMI, external interrupts
- PCI / HPI: PCI or host processor
- EDMA
- Boot Loader
- EMIF (16, 32, or 64 bits): EPROM, SDRAM, Sync SRAM
- EMAC: Ethernet (TCP/IP stack available)
- McBSP: serial codec
- McASP: audio codec
- Utopia 2: ATM
- PLL: Clockin, Clockout, Clockoutx
Note: Not all C6000 devices have all the various peripherals shown above. Please refer to the C6000 Product Update for a device-by-device listing.
Let's quickly look at each of these connections, beginning with VCP/TCP and working counterclockwise around the diagram.
Viterbi Coprocessor (VCP)
- Used for 3G Wireless applications
- Supports >500 voice channels at 8 kbps
- Programmable decoder parameters include constraint length, code rate, and frame length
- Available on the C6416

Turbo Coprocessor (TCP)
- Used for 3G Wireless applications
- Supports 35 data channels at 384 kbps
- 3GPP / IS2000 Turbo coder
- Programmable parameters include mode, rate and frame length
- Available on the C6416
Timer / Counters
- Two (or three) 32-bit timer/counters
- Use as a Counter (counting pulses from the input pin) or as a Timer (counting internal clock pulses)
- Can generate: interrupts to the CPU; events to the DMA/EDMA; a pulse or toggle-value on the output pin
- Each timer/counter has both an input and an output pin
General Purpose Input/Output (GPIO)

- Observe or control the signal of a single pin
- Dedicated GPIO pins on C6713 and all C64x devices
- All C6000 devices have shared GPIO with unused peripheral pins
Hardware Interrupts (HWI)

- Allows synchronization with the outside world:
  - Four configurable external interrupt pins
  - One Non-Maskable Interrupt (NMI) pin
  - Reset pin
- C6000 CPU has 12 configurable interrupts. Some of the properties that can be configured are:
  - Interrupt source (for example: Ext Int pin, McBSP receive, HPI, etc.)
  - Address of Interrupt Service Routine (i.e. interrupt vector)
  - Whether to use the HWI dispatcher
  - Interrupt nesting
- The DSP/BIOS HWI Dispatcher makes interrupts easy to use
Parallel Peripheral Interface

C6000 provides three different parallel peripheral interfaces; the one you have depends upon which C6000 device you are using (see the C6000 Product Update for which device has which interface).
- HPI: Allows another processor access to the C6000's memory using a dedicated, async 16/32-bit bus, where the C6000 is slave-only to the host.
- XBUS: Similar to HPI but adds: 32-bit width, master or slave modes, sync modes, and a glueless I/O interface to FIFOs or memory (memory I/O can transfer at up to full processor rates, i.e. single-cycle transfer rate).
- PCI: Standard master/slave 32-bit PCI interface (the latest devices, e.g. DM642, now allow 66 MHz PCI communication).
Direct Memory Access (DMA / EDMA)

- EDMA stands for Enhanced DMA (each C6000 has either a DMA or an EDMA)
- Transfers any set of memory locations to any other (internal or external)
- Allows synchronized transfers; that is, they can be triggered by any event (i.e. interrupt)
- Operates independently of the CPU
- 4 / 16 / 64 channels (sets of transfer parameters), varying by C6000 device type
- If you are not using the DMA/EDMA, you're probably not getting the full performance from your C6000 device.

DMA: Offers four fully configurable channels (plus an additional channel for the HPI), event synchronization, split mode for use with McBSP, and address/count reload.
EDMA: Offers 16 fully configurable channels (64 channels on C64x devices), event synchronization, channel linking, and channel autoinitialization.
Boot Loader
After reset but before the CPU begins running code, the Boot Loader can be configured to either:
- Automatically copy code and data into on-chip memory
- Allow a host system (via HPI, XBUS, or PCI) to read/write code and data into the C6000's internal and external memory
- Do nothing and let the CPU immediately begin execution from address zero

Boot mode pins allow configuration. Please refer to the C6000 Peripherals Guide and each device's data sheet for the modes allowed for each specific device.
External Memory Interface (EMIF)

EMIF is the interface between the CPU (or DMA/EDMA) and the external memory, and provides all of the required pins and timing to access various types of memory.
- Glueless access to async or sync memory
- Works with PC100 SDRAM: cheap, fast, and easy! (more recent designs now allow use of PC133 SDRAM)
- Byte-wide data access
- C64x devices have two EMIFs (16-bit and 64-bit width)
- 16, 32, or 64-bit bus widths (please check the specifics for your device)
Ethernet
- 10/100 Ethernet interface
- To conserve cost, size and power, Ethernet pins are muxed with PCI (you can use one or the other)
- Optimized TCP/IP stack available from TI (under license)
Multi-Channel Buffered Serial Port (McBSP)

- Commonly used to connect to serial codecs (codec: combined A/D and D/A devices), but can be used for any type of synchronous serial communication
- Two (or three) synchronous serial ports
- Full Duplex: independent transmit and receive sections (each can be individually sync'd)
- High speed, up to 100 Mb/sec performance
- Supports:
  - SPI mode
  - AC97 codec interface standard
  - Multi-channel operation (T1, E1, MVIP, ...)
  - And many other modes
- Software UART available for most C6000 devices (check the DSP/BIOS Drivers Developer Kit (DDK))
McASP
- All McBSP features plus more
- Targeted for multi-channel audio applications such as surround sound systems
- Up to 8 stereo lines (16 channels) supported by 16 serial data pins configurable as transmit or receive
- Throughput: 192 kHz (all pins carrying stereo data simultaneously)
- Transmit formats: multi-pin IIS for audio interface; multi-pin DIT for digital interfaces
- Receive format: multi-pin IIS for audio interface
- Available on C6713 and DM642 devices
Utopia
- For connection to ATM (async transfer mode)
- Utopia 2 slave interface
- 50 MHz wide area network connectivity
- Byte-wide interface
- Available on C64x devices
PLL
On-chip PLL provides clock multiplication. The C6000 family can run at one or more times the provided input clock. This reduces cost and electrical interference (EMI). Clock modes are pin configurable. On most devices, along with the Clock Mode (configuration) pins, there are three other clock pins:
- CLKIN: clock input pin
- CLKOUT: clock output from the PLL (multiplied rate)
- CLKOUT2: a reduced rate clockout, usually 1/2 or less of CLKOUT

Please check the datasheet for the pins, pin names, and CLKOUT2 rates available for your device. Here are the PLL rates for a sample of C6000 device types:

Device                        Clock Mode Pins                 PLL Rate
C6201, C6204, C6205, C6701    CLKMODE                         x1, x4
C6202, C6203                  CLKMODE0, CLKMODE1, CLKMODE2    x1, x4, x6, x7, x8, x9, x10, x11
C6211, C6711, C6712           CLKMODE                         x1, x4
C6414, C6415, C6416           CLKMODE0, CLKMODE1              x1, x6, x12
Power Down
While not shown in the previous diagram, the C6000 supports power down modes to significantly reduce overall system power.
For more detailed information on these peripherals, refer to the C6000 Peripherals Guide.
Looking into the C6000 Device

Going from the system connection to looking inside the C6000 device, along with each of the peripherals just discussed, we find the CPU and a number of internal busses. As an example, here is an internal view of the C6414 device:
C6415 DSP (720MHz)
[Figure: internal block diagram. The C64x CPU Core (5760 MIPS) connects to the L1P Cache and L1D Cache, which connect to L2 Memory and the Enhanced DMA Controller (64 channels). Internal bus rates shown: 2.9 GB/s, 11.5 GB/s, and 23 GB/s.]
Peripheral rates shown:
- EMIF 64: 1064 MB/s
- EMIF 16: 266 MB/s
- McBSP 0: 12.5 MB/s
- McBSP 1 or Utopia 2: 12.5 MB/s or 100 MB/s
- McBSP 2: 12.5 MB/s
- HPI / PCI: 133 MB/s
Also shown: Timer 0, Timer 1, Timer 2, JTAG RTDX, Power Down Logic, PLL.
From this diagram notice two things:
- Dual-level memory (this will be discussed further in Chapter 4):
  - L1 (level 1) program and data caches
  - L2 (level 2) combined program/data memory
- High-performance, internal buses:
  - Buses as large as 64 and 256 bits allow an enormous amount of information to be moved
  - Multiple buses allow simultaneous movement of data in a C6000 system
  - Both the EDMA and CPU can orchestrate moving information

Note: While we have been looking into the C6414, you can extrapolate these same concepts to other C6000 device types. All device types have multiple, fast, internal buses. Most have a dual-level memory architecture, while a few have a single-level, flat memory.
Looking at the C6000 CPU

Before looking at the make-up of the C6000 CPU, let's quickly review: "What is DSP?" This will allow us to better see how and why the CPU was designed in a specific way.
What is Digital Signal Processing (DSP)?

If asked, engineers come up with many definitions for DSP. Some include:
- Converting a signal from analog to digital, then processing it using software.
- Numerical analysis of signals.
- Performing traditional signal processing algorithms (filters, FFTs, etc.).
- Processing data within a time-limited duration; that is, executing code under real-time constraints.
- And so on.
What Problem Are We Trying To Solve?
[Figure: signal path ADC -> DSP -> DAC, with digital sampling of an analog signal producing x.]

Most DSP algorithms can be expressed with MAC:

$$Y = \sum_{i=1}^{\text{count}} \text{coeff}_i \cdot x_i$$

    for (i = 0; i < count; i++) {
        sum += c[i] * x[i];
    }
Practically, it's probably all of them. To summarize, you might say DSP is the processing of digital signals numerically, usually with real-time constraints.

Looking more carefully at the algorithms used in Digital Signal Processing, we almost always find they take the form shown above. This form is often called Sum of Products (SOP) or Multiply Accumulate (MAC).

The first Digital Signal Processors (also abbreviated DSP) were derived from a standard 16-bit microprocessor. To meet the goals of DSP, though, they took on characteristics of RISC processing that ensured instructions executed in a single cycle (not too common in those days). They also included hardware multipliers, which replaced at least 32 microcode instructions with a fast, single-cycle multiply. Further, their addressing capabilities were enhanced to allow quick and easy processing of streaming data (usually collected into data buffers).

In many ways, today's DSPs, like the C6000, are not that different from their forefathers. Then again, fast clock frequencies, wider buses, more registers and memory, and an architecture designed to run C code efficiently make them vastly superior.
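As a small illustration of the Sum of Products form described above, here is a minimal C sketch. The function name `sum_of_products` and the use of fixed-width types are assumptions made for this example; they are not part of the workshop source files.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products (MAC): y = coeff[0]*x[0] + ... + coeff[count-1]*x[count-1].
 * Each 16-bit x 16-bit product is widened to 32 bits before it is added,
 * so a single product always fits in the accumulator type. */
int32_t sum_of_products(const int16_t *coeff, const int16_t *x, size_t count)
{
    int32_t sum = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        sum += (int32_t)coeff[i] * x[i];   /* one multiply-accumulate */
    }
    return sum;
}
```

Calling it with coefficients {1, 2, 3} and samples {4, 5, 6} accumulates 4 + 10 + 18. A long filter with large inputs could still overflow the 32-bit sum; this sketch does not guard against that.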
C6000 Core CPU Architecture

Here's a quick peek into the C6000 core CPU:
'C6000 CPU Architecture

[Figure: C6000 CPU. Memory connects to two register files (A0 through A15, extended to A31; B0 through B15, extended to B31), each served by four functional units: .D1, .S1, .M1, .L1 on side A and .D2, .S2, .M2, .L2 on side B. The two .M units form the dual MACs. A Controller/Decoder sits below.]

- C6000 Compiler excels at Natural C
- While dual-MAC speeds math-intensive algorithms, the flexibility of 8 independent functional units allows the compiler to quickly perform other types of processing
- All C6000 instructions are conditional, allowing efficient hardware pipelining
- C6000 CPU can dispatch up to eight parallel instructions each cycle
Let's point out a few things:
- The C6000 architecture was co-developed with its C compiler. The CPU was designed for the C language from the ground up.
- All eight functional units can receive their own 32-bit instruction on every cycle. Said another way, we can execute eight instructions in parallel.
- The ability to control each execution unit independently is why the C6000 architecture is often likened to VLIW (very long instruction word). Both the C6000 architecture (called VelociTI) and VLIW allow such granularity in how they control the individual functional units.

This is not possible with standard DSP (or GPP) architectures, where a single instruction controls all the functional units at once. For example, on other DSP processors, the only way to get all the functional units working at the same time is to use a MAC-type instruction. In all other instructions, only a subset of the functional units will operate. Even if you have other things that could be done simultaneously, there is no way to tell the processor to make this happen.

Unlike the C6000, though, VLIW architectures require an instruction code for each functional unit on every cycle. This often means that millions of NOP instructions are added to the program code. Due to the efficiency of the C6000 VelociTI architecture, these excess NOPs are not required.
For a more detailed explanation of the CPU building blocks, please refer to the CPU Architecture optional topic. Optionally, you may want to consider taking the 4-day, C6000 Optimization Workshop.
C62x vs. C67x vs. C64x

Here's a very quick synopsis of the various C6000 families.

[Figure: C64x CPU block diagram. Instruction Fetch, Instruction Dispatch (with Advanced Instruction Packing), and Instruction Decode feed two data paths. Side A has registers A0-A15 and A16-A31 with functional units L1, S1, M1, and D1; side B has registers B0-B15 and B16-B31 with functional units L2, S2, M2, and D2. Supporting blocks: Control Registers, Interrupt Control, Emulation, and Advanced Emulation.]
This CPU block diagram shows the C64x. It can be converted to the C62x block diagram by removing the elements in light-colored boxes:
- Advanced Instruction Packing
- Advanced Emulation
- Registers A16-A31 and B16-B31
- Enhanced fixed-point instruction set (i.e. the C64x is a super-set of the C62x instructions). The light-colored boxes added to each functional unit represent the additional packed-data instructions provided only by the C64x.

The C67x block diagram is the same as the C62x diagram. The primary differences between these two are:
- The C67x has 64-bit wide data-load buses.
- The C67x functional units include 32-bit single-precision (and 64-bit double-precision) floating-point hardware.

All other instructions are exactly the same.
Using the CPU's Parallelism

Here's a quick look at a simple sum-of-products (Multiply-ACcumulate) loop written in linear assembly code.
Given this simple loop:

$$y = \sum_{n=1}^{40} c_n \times x_n$$

C code:

    short mac(short *c, short *x, int count)
    {
        int i;
        short sum = 0;
        for (i = 0; i < count; i++) {
            sum += c[i] * x[i];
        }
        return sum;
    }

Linear assembly:

            MVK   .S1   40, cnt
    loop:   LDH   .D1   *cp++, c
            LDH   .D1   *xp++, x
            MPY   .M1   c, x, prod
            ADD   .L1   y, prod, y
            SUB   .L1   cnt, 1, cnt
     [cnt]  B     .S1   loop
            STW   .D    y, *yp

[Figure: the variables c, x, cnt, prod, and y and the pointers *cp, *xp, and *yp, shown alongside the .S1, .M1, .L1, and .D1 functional units.]
Given our eight functional units, don't you think we could perform some of these operations in parallel?
C62x Compiled Code

Sure. On the C62x the compiler can achieve 2 results per cycle, with 8 instructions executing in parallel.
C62x Intense Parallelism

Given this C code:

    short mac(short *c, short *x, int count) {
        for (i=0; i < count; i++) {
            sum += c[i] * x[i];
        }

The C62x compiler can achieve two Sum-of-Products per cycle:

    L2:     ; PIPED LOOP PROLOG
            ; (execute packets that prime the pipeline, built from
            ;  LDW .D1 *A4++,A3 || LDW .D2 *B6++,B7, [B0] B .S1 L3,
            ;  MPY .M2 B7,A3,B4 and MPYH .M1 B7,A3,A5)
    ;** -----------------------*
    L3:     ; PIPED LOOP KERNEL
                  ADD   .L2   B4,B5,B5
    ||            ADD   .L1   A5,A0,A0
    ||            MPY   .M2   B7,A3,B4
    ||            MPYH  .M1   B7,A3,A5
    ||   [B0]     B     .S1   L3
    ||   [B0]     SUB   .S2   B0,1,B0
    ||            LDW   .D1   *A4++,A3
    ||            LDW   .D2   *B6++,B7
    ;** -----------------------*
Notice a few things:
- Parallel bars (||) indicate that an assembly instruction is performed at the same time as the previous instruction. Hence, in our loop kernel, 8 instructions are performed in parallel, thereby using all 8 functional units. The order of the instructions is unimportant, as they all get executed at once. The rest of the code shown on this slide is used to set up the loop.
- Two multiplies and two adds (i.e. two MACs) are performed each cycle. This gets us our two MACs per cycle.
- The [B0] indicates that the branch (and subtract) are performed conditionally. That is, the branch to label L3 will only occur if B0 is non-zero. All C6000 instructions (except NOP and IDLE) are conditional. This makes for fast code execution on hardware-pipelined processors such as the C6000 devices.

Note: By the way, the C6000 toolset includes an Assembly Optimizer. This tool takes linear assembly code (shown back one figure) and creates highly optimized standard assembly code similar to the compiler's.
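The kernel above gets two MACs per cycle because each LDW fetches a 32-bit word holding two 16-bit values, MPY multiplies the low halves, MPYH multiplies the high halves, and two separate ADDs keep two running sums. The following portable C sketch models that data flow only; the names `pack16` and `mac_packed` are hypothetical, and the final addition of the two partial sums is an assumption about what happens after the loop (the epilog is not shown on the slide).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit values into one 32-bit word,
 * the way a single LDW would fetch them from a short array. */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Hypothetical model of the C62x kernel: two products per iteration,
 * accumulated into two independent running sums. */
int32_t mac_packed(const uint32_t *c, const uint32_t *x, size_t words)
{
    int32_t sum_lo = 0;   /* models the accumulator fed by MPY  */
    int32_t sum_hi = 0;   /* models the accumulator fed by MPYH */
    size_t i;

    for (i = 0; i < words; i++) {
        int16_t c_lo = (int16_t)(c[i] & 0xFFFFu);
        int16_t c_hi = (int16_t)(c[i] >> 16);
        int16_t x_lo = (int16_t)(x[i] & 0xFFFFu);
        int16_t x_hi = (int16_t)(x[i] >> 16);

        sum_lo += (int32_t)c_lo * x_lo;   /* low halves, like MPY   */
        sum_hi += (int32_t)c_hi * x_hi;   /* high halves, like MPYH */
    }
    return sum_lo + sum_hi;   /* combine the partial sums after the loop */
}
```

Keeping two independent sums is what lets both adds run in the same cycle; a single accumulator would force the second add to wait for the first.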
C67x Compiled Code

The C67x can achieve the same rate of performance as the C62x, but does it with two 32-bit floating-point instructions executed in parallel. This is where the C67x's 64-bit internal bus (via the LDDW instruction) really pays off. It's required in order to achieve this kind of throughput on 32-bit wide data.
C67x MAC using Natural C

[Figure: C67x CPU with memory, register files A0-A15 and B0-B15, the .D1/.D2, .M1/.M2, .L1/.L2, and .S1/.S2 functional units, and the Controller/Decoder.]

The C67x compiler gets two 32-bit floating-point Sum-of-Products per iteration:

    float mac(float *c, float *x, int count)
    {
        int i;
        float sum = 0;
        for (i=0; i < count; i++) {
            sum += c[i] * x[i];
        }
        return sum;
    }

    ;** --------------------------------------------------*
    LOOP:   ; PIPED LOOP KERNEL
                  LDDW   .D1    *A4++,A7:A6
    ||            LDDW   .D2    *B4++,B7:B6
    ||            MPYSP  .M1X   A6,B6,A5
    ||            MPYSP  .M2X   A7,B7,B5
    ||            ADDSP  .L1    A5,A8,A8
    ||            ADDSP  .L2    B5,B8,B8
    ||   [A1]     B      .S2    LOOP
    ||   [A1]     SUB    .S1    A1,1,A1
    ;** --------------------------------------------------*
Notice:
- The MPYSP and ADDSP instructions, where SP stands for single-precision, 32-bit floating-point.
- LDDW loads two 32-bit values into two consecutive 32-bit registers. Using two LDDWs means we're getting four SP values per cycle.

Note: To allow the code to fit on a single PowerPoint slide, we had to modify it slightly. The C67x compiler actually creates a four-cycle loop that performs eight MACs. In other words, it's the same rate (2 MACs in one cycle) as we claim above, but the loop was just too large to fit on our slide.
C64x Compiled Code

Even better yet, the C64x can do four MACs per cycle using its DOTP2 instruction:
C64x gets four MACs using DOTP2

[Figure: DOTP2 multiplies the packed 16-bit pairs m1:m0 (A5) and n1:n0 (B5), producing m1*n1 + m0*n0 (A6), which is then added to the running sum (A7).]

    short mac(short *c, short *x, int count)
    {
        int i;
        short sum = 0;
        for (i=0; i < count; i++) {
            sum += c[i] * x[i];
        }
        return sum;
    }

    ;** --------------------------------------------------*
    ; PIPED LOOP KERNEL
    LOOP:
                  ADD    .L2    B8,B6,B6
    ||            ADD    .L1    A6,A7,A7
    ||            DOTP2  .M2X   B4,A4,B8
    ||            DOTP2  .M1X   B5,A5,A6
    ||   [ B0]    B      .S1    LOOP
    ||   [ B0]    SUB    .S2    B0,-1,B0
    ||            LDDW   .D2T2  *B7++,B5:B4
    ||            LDDW   .D1T1  *A3++,A5:A4
    ;** --------------------------------------------------*
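To make the DOTP2 figure concrete, here is a portable C sketch of the arithmetic it describes: two 16-bit multiplies whose products are added together. The names `dotp2_model` and `mac_dotp2` are hypothetical and exist only for this illustration; this is a model of the figure, not TI's implementation, and the one corner case where all four halves equal -32768 (which would overflow 32 bits) is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the figure: m1*n1 + m0*n0, where each 32-bit
 * operand packs two signed 16-bit values (high half and low half). */
int32_t dotp2_model(uint32_t m, uint32_t n)
{
    int16_t m1 = (int16_t)(m >> 16);
    int16_t m0 = (int16_t)(m & 0xFFFFu);
    int16_t n1 = (int16_t)(n >> 16);
    int16_t n0 = (int16_t)(n & 0xFFFFu);

    return (int32_t)m1 * n1 + (int32_t)m0 * n0;
}

/* Hypothetical loop: each call to dotp2_model performs two MACs' worth
 * of multiplies, and the result is added to a running sum. */
int32_t mac_dotp2(const uint32_t *c, const uint32_t *x, size_t words)
{
    int32_t sum = 0;
    size_t i;

    for (i = 0; i < words; i++) {
        sum += dotp2_model(c[i], x[i]);
    }
    return sum;
}
```

In the kernel shown above, two DOTP2 instructions execute in the same cycle (one on each .M unit), each feeding its own running sum, which is where the four MACs per cycle come from.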
Combine this with its ability to run at 720 MHz and the C64x CPU will pump out a whopping 2880 MMACs (16-bit Mega MACs). Heck, if all you need is 8-bit MACs, the C64x can get twice as many: 5760 MMACs. What is an MMAC, and how did we get the C64x doing 2880 of them?
How many MMACs is that?

If we think of a MAC as a single Multiply-ACcumulate (multiply + add), then a million of these would be called a mega-MAC or MMAC. In the following graphic, we see the calculation for the C6201 and C64x:
MMACs

How many 16-bit MMACs (millions of MACs per second) can the 'C6201 perform?
    400 MMACs (two .M units x 200 MHz)

How about 16x16 MMACs on the C64x devices?
      2 .M units
    x 2 16-bit MACs (per .M unit, per cycle)
    x 720 MHz
    ----------------
      2880 MMACs

How many 8-bit MMACs on the C64x?
    5760 MMACs (on 8-bit data)
The C64x's ability to perform two 16-bit multiplies (or four 8-bit multiplies) in each .M unit gives it a tremendous performance advantage. While no single benchmark does a good job of comparing different processor architectures, the number of MMACs probably comes the closest. Other benchmarks (MHz, MIPS, MOPS, Dhrystone, Whetstone, FFT, etc.) obfuscate the real picture, and we say this even knowing that many of them make the C6000 look better than its competition.

Note: The only true way to benchmark a processor is to compare your key real-time kernels written for each processor you are evaluating. It is the only way to see the true performance you will achieve.
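The MMAC arithmetic above is simply a product of three numbers. As a minimal sketch (the function name `peak_mmacs` is an assumption for this example), it can be written as:

```c
/* Peak MMACs = (.M units) x (MACs per .M unit per cycle) x (clock in MHz).
 * This is a peak figure; it assumes every .M unit does useful work on
 * every cycle. */
unsigned long peak_mmacs(unsigned m_units,
                         unsigned macs_per_unit,
                         unsigned clock_mhz)
{
    return (unsigned long)m_units * macs_per_unit * clock_mhz;
}
```

With the figures from the slide, `peak_mmacs(2, 1, 200)` reproduces the 'C6201 number, `peak_mmacs(2, 2, 720)` the C64x 16-bit number, and `peak_mmacs(2, 4, 720)` the C64x 8-bit number.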
How can we get such parallelism?

First, we're aided by the fact that each C6000 has a double set of functional units. Two multipliers, two ALUs, and two data access units mean that right off the bat we should be able to get two MACs per cycle. (And the C64x's DOTP2 instruction takes this up yet another level.) But all this hardware doesn't necessarily get us to our best performance. Figuring out how to use all this hardware is important, too. To this end, the C6000 compiler uses a technique called Software Pipelining to get the high degree of performance previously shown.
How Do We Get Such High Parallelism?

- Compiler and Assembly Optimizer use a technique called Software Pipelining
- Software pipelining enables high performance (esp. on DSP-type loops)
- Key point: Tools do all the work!

Software Pipelining isn't a new technique. In fact, it's similar to the form of hardware pipelining found in most high-performance processors available today. What sets software pipelining apart is how the instructions can be combined to build very tight loops of code.

Why don't most other processors use software pipelining? Your architecture must be able to dispatch a separate instruction to each functional unit on every cycle in order to use this programming technique. As we mentioned earlier in the chapter, this is a capability unique to the C6000 CPU (and VLIW processors). Let's briefly examine the concept of software pipelining.
Software Pipelining
Software pipelining enables high-performance code. Best of all, the tools do all the work. Great, but what is software pipelining? Let's look at a simple example to demonstrate the concept:

           LDH
        || LDH
           MPY
           ADD

How many cycles would it take to perform this loop 5 times?
    5 x 3 = 15 cycles
Looking at how these instructions would operate on the C6000's eight functional units:
Without Software Pipelining

    Cycle  .D1  .D2  .M1  .M2  .L1  .L2  .S1  .S2
    1      ldh  ldh
    2                mpy
    3                          add
    4      ldh  ldh
    5                mpy
    6                          add
    7      ldh  ldh
Looking at the non-pipelined code above, you can see the inefficiency. Notice how the .D units are left unused when the first multiply occurs. After seven cycles, we're almost halfway through the expected 15 cycles. If the code were software pipelined, though...
When software pipelining, we take advantage of the unused .D units in cycle 2 and go ahead and perform the next two loads. This allows us to pipeline the instructions, resulting in a seven-cycle loop: less than half the original number of cycles.
With Software Pipelining

    Cycle  .D1  .D2  .M1  .M2  .L1  .L2  .S1  .S2
    1      ldh  ldh
    2      ldh  ldh  mpy
    3      ldh  ldh  mpy       add
    4      ldh  ldh  mpy       add
    5      ldh  ldh  mpy       add
    6                mpy       add
    7                          add

Completes in only 7 cycles
Translating the software pipelining above into code, each cycle gets a set of parallel instructions.
S/W Pipelining Translated to Code

[Figure: the pipelined schedule for cycles 1-7 (ldh on .D1 and .D2, followed by mpy and add), with each cycle labeled c1, c2, c3, and so on.]

    c1:    LDH
        || LDH
    c2:    MPY
        || LDH
        || LDH
    c3:    ADD
        || MPY
        || LDH
        || LDH
Since most processors only allow one instruction to control all their functional units, they cannot take advantage of software pipelining. The granularity of the C6000 architecture gives it the extra flexibility to take advantage of this optimization strategy.
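The cycle counts in the software pipelining example can be checked with a few lines of C. This is a sketch under the same simplification the example uses, namely that each stage (load, multiply, add) takes exactly one cycle; the function names are hypothetical.

```c
/* Without software pipelining, every iteration runs all of its stages
 * before the next iteration starts. */
unsigned cycles_unpipelined(unsigned iterations, unsigned stages)
{
    return iterations * stages;
}

/* With software pipelining, a new iteration starts every cycle, so the
 * total is the number of iterations plus the cycles needed to drain the
 * remaining stages of the last iteration. */
unsigned cycles_pipelined(unsigned iterations, unsigned stages)
{
    if (iterations == 0) {
        return 0;
    }
    return iterations + stages - 1;
}
```

For the example in the text (5 iterations of a 3-stage loop), the first function gives the 15 cycles of the non-pipelined schedule and the second gives the 7 cycles of the pipelined one. The gap widens as the iteration count grows, since the pipelined cost per extra iteration is a single cycle.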
DSP Tools Overview

C6000 DSKs
C6416 DSK
C6416 / C6713 DSK Features

    TMS320C6416 DSP:              600MHz, fixed-point, 1M Byte internal RAM
    or TMS320C6713 DSP:           225MHz, floating-point, 256K Byte internal RAM
    External SDRAM:               16M Bytes; C6416 64-bit interface, C6713 32-bit interface
    External Flash:               512K Bytes, 8-bit interface
    AIC23 Codec:                  Stereo, 8KHz to 96KHz sample rate, 16 to 24-bit samples; mic, line-in, line-out and speaker jacks
    CPLD:                         Programmable "glue" logic
    4 User LEDs:                  Writable through CPLD
    4 User DIP Switches:          Readable through CPLD
    3 Configuration Switches:     Selects power-on configuration and boot modes
    Daughtercard Expansion I/F:   Allows user to enhance functionality with add-on daughtercards
    HPI Expansion Interface:      Allows high speed communication with another DSP
    Embedded JTAG Emulator:       Provides high speed JTAG debug through widely accepted USB host interface
Here's a block diagram view of the C6416 DSK.
C6416 DSK
The C6713 would be almost exactly the same. (We pulled this diagram from the C6416 help file. Look in the C6713 help file <CCS Help menu> to find a similar diagram for that platform.)
DSK Diagnostic Utility

- Test/Diagnose DSK hardware
- Verify USB emulation link
- Use Advanced tests to facilitate debugging
- Reset DSK hardware
Code Composer Studio (CCS)

The Code Composer Studio (CCS) application provides all the necessary software tools for DSP development. At the heart of CCS you'll find the original Code Composer IDE (integrated development environment). The IDE provides a single application window in which you can perform all your code development: entering and editing your program code, compiling and building an executable file, and finally, debugging your program.
Code Composer Studio

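[Figure: Code Composer Studio tool flow. Edit feeds the Compiler, Asm Opto (Assembly Optimizer), and Asm, which feed Link to produce the .out file. The Standard Runtime Libraries, the DSP/BIOS Config Tool, and the DSP/BIOS Libraries also feed the build. Debug loads the .out file onto a target: SIM (simulator), DSK, EVM, Third Party board, or a DSP board through an XDS emulator.]

Code Composer Studio Includes:
- Integrated Edit / Debug GUI
- Simulator
- Code Generation Tools
- BIOS: Real-time kernel, Real-time analysis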
When TI developed Code Composer Studio, it added a number of capabilities to the environment. First, the code generation tools (compiler, assembler, linker) were added so that you wouldn't have to purchase them separately. Second, the simulator was included (only in the full version of CCS, though). Third, TI has included DSP/BIOS. DSP/BIOS is a real-time kernel consisting of three main features: a real-time, pre-emptive scheduler; real-time capture and analysis; and real-time I/O.

Finally, CCS has been built around an extensible software architecture which allows third parties to build new functionality via plug-ins. See the TI website for a listing of third parties already developing for CCS. At some point in the future, this capability may be extended to all users. If you have an interest, please voice your opinion by calling the TI SC Product Information Center (you can find their phone number and email address in the last module, "What Next?").
Here's a snapshot of the CCS screen:
Since it's hard to evaluate a tool by looking at a simple screen capture, we'll provide you with plenty of hands-on experience throughout the workshop.
Closer Look at the C6000 Code Generation Tools and File Extensions
Using Code Composer Studio (CCS) you may not need to know all these file extension names, but we included a basic review of them for your reference:
Code Generation
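[Figure: code generation flow. The Editor produces .c / .cpp files for the Compiler and .sa files for the Asm Optimizer; both produce .asm files. The Asm (assembler) turns .asm into .obj. The Linker, directed by Link.cmd, combines the .obj files into the .out file and writes a .map report.]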
- C and C++ use the standard .C and .CPP file extensions.
- Linear Assembly is written in a .SA file.
- You can either write standard assembly directly, or it can be created by the compiler and Assembly Optimizer. In all cases, standard assembly uses .ASM.
- Object files (.OBJ), created by the assembler, are linked together to create the DSP's executable output (.OUT) file.
- The map (.MAP) file is an output report of the linker.
- The .OUT file can be loaded into your system by the debugger portion of CCS.

If you want to use your own extensions for file names, they can be redefined with code generation tool options. Please refer to the TMS320C6000 Assembly Tools User's Guide for the appropriate options.
CCS Projects
Code Composer works within a project paradigm. If you've done code development with almost any sophisticated IDE (Microsoft, Borland, etc.), you've no doubt run across the concept of projects. Essentially, within CCS you create a project for each executable program you wish to create. Projects store all the information required to build the executable. For example, a project lists the source files, the header files, the target system's memory map, and the program build options.
What is a Project?
A Project (.PJT) file contains:
- References to files: Source, Libraries, Linker, etc.
- Project settings: Compiler Options, DSP/BIOS, Linking, etc.
The project information is stored in a .PJT file which is created and maintained by CCS. To create a new project, select the Project → New menu. This differs from Microsoft's Developer Studio, which provides the project new/open commands on the File menu.
Project Menu
Hint: Create and open projects from the Project menu, not the File menu.
Access open projects via the pull-down menu or by right-clicking the .pjt file in the project explorer window.
Build Options... (see the next slide)
Build Options
Project options direct the code generation tools (i.e. compiler, assembler, linker) to create code according to your system's needs. Do you need to logically debug your system, improve performance, or minimize code size? Your results can be dramatically affected by the project options available for the C6000 platform. To make it easier to choose build options, CCS provides a graphical user interface (GUI) for the various compiler options. Shown below is a capture of the Basic Compiler options.
Build Options
-g -q -fr"c:\modem\Debug" -mv6700
Eight Categories of Compiler options
There is a one-to-one relationship between the items in the text box and the GUI check and drop-down box selections. Once you have mastered the various options, you'll probably find yourself just typing in the options. By the way, the linker page looks like:
Linker Options

    Options         Description
    -o<filename>    Output file name
    -m<filename>    Map file name
    -c              Auto-initialize global/static C variables
    -x              Exhaustively read libs (resolve back ref's)

    -q -c -m".\Debug\lab1.map" -o".\Debug\lab1.out" -x

- By default, linker options include the -o option.
- We recommend you add the -m option.
- ".\Debug\" indicates one subfolder level below the project's .pjt folder.
- Run-time Autoinit tells the compiler to initialize global/static variables before calling main().
Compiler Options
There are probably about 100 options available for the compiler alone. Wading through them all can be a bit intimidating. To that end, we've provided a condensed set of options. These few options cover about 80% of most users' needs.
Compiler's Build Options

Nearly one hundred compiler options are available to tune your code's performance, size, etc. The following table lists the most common options:

    Options      Description
    -mv6700      Generate C67x code (C62x is default)
    -mv6400      Generate 'C64x code
    -fr <dir>    Directory for object/output files
    -fs <dir>    Directory for assembly files
    -q           Quiet mode (display less info while compiling)
    -g           Enables src-level symbolic debugging (debug option)
    -s           Interlist C statements into assembly listing (debug option)

In Chapter 4 we will examine the options which enable the compiler's optimizer.
We'll add three more important options to this list in Chapter 4, when we discuss optimization.
DSP/BIOS Configuration Tool

The DSP/BIOS Configuration Tool (often called Config Tool, GUI Tool, or GUI) creates and modifies a system file called the Configuration DataBase (.CDB). If we talk about using CDB files, we're also talking about using the Config Tool. The following figure shows a CDB file opened within the configuration tool:
The GUI (graphical user interface) simplifies system design by:
- Automatically including the appropriate runtime support libraries
- Automatically handling interrupt vectors and system reset
- Handling system memory configuration (builds the CMD file)

When a CDB file is saved, the Config Tool generates 5 additional files:

    Filename.cdb        Configuration Database
    Filenamecfg_c.c     C code created by Config Tool
    Filenamecfg.s62     ASM code created by Config Tool
    Filenamecfg.cmd     Linker commands
    Filenamecfg.h       Header file for *cfg_c.c
    Filenamecfg.h62     Header file for *cfg.s62
When you add a CDB file to your project, CCS automatically adds the C and assembly (S62 or S64) files to the project under the Generated Files folder. (You must manually add the CMD file, yourself.) In the System Tools chapter, we will point out a few more CDB objects. To get all the details on this tool, we recommend you attend the 4-day DSP/BIOS Workshop.
Lab Preparation
Before beginning Lab 1, you need to prepare your lab workstation. This involves:
- Hooking up your DSK
- Running the DSK Diagnostic Utility to verify the USB connection and DSK are working
- Running CCS Setup to select the proper emulation driver (DSK vs. Simulator)
- Starting CCS and setting a few environment properties
C64x or C67x Exercises?

We support both processor types in these workshop lab exercises. Please see the specific callouts for each processor as you work. Overall, there are very few differences between the procedures.
Lab Exercises C67x vs. C64x

Which DSK are you using? We provide instructions and solutions for both C67x and C64x. We have tried to call out the few differences in lab steps as explicitly as possible:
Prepare Lab Workstation

The computers used in TI's classrooms and dedicated workshops may be configured for one of ten different courses. The last class taught may have been DSP/BIOS, TMS320 Algorithm Standard, or a C5000 workshop. To provide a consistent starting point for all users, we need you to complete a few steps that reset the CCS environment to a known state.
Computer Login
1. If the computer is not already logged-on, check to see if the log-on information is posted on the workstation. If not, please ask your instructor.
Connecting the C6416 DSK to your PC

The software should already be installed on your lab workstation. All you should have to do is physically connect the DSK.
2. Connect the supplied USB cable to your PC or laptop. If you connect the USB cable to a USB hub, be sure the hub is connected to the PC or laptop and power is applied to the hub.
3. Plug in the appropriate audio connections. Connect your headphone or speaker to the audio output. An audio patch cable is provided to connect your computer's soundcard (or your music source) to the line-in connector on the DSK board.
4. Plug the AC power cord into the power supply and AC source.
Note: The power cable must be plugged into the AC source before plugging the 5 Volt DC output connector into the DSK.
5. Plug the power cable into the board.
6. When power is applied to the board, the Power On Self Test (POST) will run. LEDs 0-3 will flash. When the POST is complete, all LEDs blink on and off, then stay on.
Hint: At this point, if you were installing the DSK for the first time on your own machine you would now finish the USB driver installation. We have already done this for you on our classroom PCs.
Testing Your Connection

7. Test your USB connection to the DSK by launching the DSK Diagnostic Utility from the icon on the PC desktop. From the diagnostic utility, press the start button to run the diagnostics. In approximately 20 seconds all the on-screen test indicators should turn green. Note: If using the C6713 DSK, the title on this icon will differ accordingly.
CCS Setup
While Code Composer Studio (CCS) has been installed, you will need to ensure it is set up properly. CCS can be used with various TI processors, such as the C6000 and C5000 families, and each of these has various target boards (simulators, EVMs, DSKs, and XDS emulators). Code Composer Studio must be properly configured using the CCS_Setup application.

In this workshop, you should initially configure CCS to use either the C6713 DSK or the C6416 V1.1 DSK. Between you and your lab partner, choose one of the DSKs and the appropriate driver. The learning objectives will be the same whichever target you choose.
8. Start the CCS Setup utility using its desktop icon:
Be aware there are two CCS icons, one for setup, and the other to start the CCS application. You want the Setup CCS C6000 icon.
Sidebar: CCS Setup

Installing the version of CCS that ships with the DSK will not place the Setup CCS 2 icon on the desktop, nor will the shortcut appear under the Windows Start menu:
Start → Programs → Texas Instruments → Code Composer Studio 2 (C6000) → Setup Code Composer Studio
The setup program <cc_setup.exe> is installed to the hard drive for both the full and DSK versions of CCS, although the desktop icon and Start menu shortcut are only added when installing the full version of CCS. When installing the lab files for this workshop, we also place an icon on the desktop for your convenience. If, for some unexpected reason, this icon has been deleted, you can find and run the program from: c:\ti\cc\bin\cc_setup.exe
(where \ti\ is the directory in which you installed CCS)
9. When you open CC_Setup you should see a screen similar to this:
Note: If you don't see the Import Configuration dialog box, open it from the menu using File → Import. Once the Import Configuration dialog box is open, you can change the CC_Setup default to force this dialog to open every time you start CC_Setup. Just check the box at the bottom of the import dialog.
10. Clear the previous configuration. Before you select a new configuration you should delete the previous configuration. Click the Clear System Configuration button. CC_Setup will ask if you really want to do this, choose Yes to clear the configuration.
11. Select a new configuration from the list and click the Import button. If you are using the C6416 DSK in this workshop, please choose the C6416 V1.1 DSK:
If you are using the C6713 DSK in this workshop, please choose the C6713 DSK:
12. Save and Quit the Import Configuration dialog box.
13. Go ahead and start CCS upon exiting CCS Setup.
Set CCS Customize Options

There are a few option settings that need to be verified before we begin. Otherwise, the lab procedure may be difficult to follow.
- Disable open Disassembly Window upon load
- Go to main() after build
- Program load after build
- Clear breakpoints when loading a new program
- Set CCS Titlebar information
14. Open the Option → Customize dialog.
15. Set Debug Properties
Here are a couple of options that can help make debugging easier.
- Unless you want the Disassembly Window popping up every time you load a program (which annoys many folks), deselect this option.
- Many find it convenient to choose Perform Go Main automatically. Whenever a program is loaded, the debugger will automatically run through the compiler's initialization code to your main() function.
16. Set Program Load Options
On the Program Load Options tab, select the following two options:
- Load Program After Build
- Clear All Breakpoints When Loading New Programs
By default, these options are not enabled, though a previous user of your computer may have already enabled them.
Conceptually, the CCS Integrated Development Environment (IDE) is made up of two parts:
- Edit (and Build) programs: uses the editor and code gen tools to create code.
- Debug (and Load) programs: communicates with the processor/simulator to download and run code.

The Load Program After Build option automatically loads the program (.out file) created when you build a project. If you disabled this automatic feature, you would have to manually load the program via the File → Load Program menu.
Note: You might even think of IDE as standing for Integrated Debugger Editor, since those are the two basic modes of the tool.
17. CCS Title Bar Properties CCS allows you to choose what information you want displayed on its titlebar. Note: To reach this tab of the Customize dialog box, you may have to scroll to the right using the arrows in the upper right corner of the dialog.
We have chosen the Board Name, Current Project, and Currently loaded program. The first item allows you to quickly confirm the chosen target (simulator, DSK, etc.). The other two let us quickly determine which project is active and what program we have loaded. Notice how these correlate to the two parts of CCS: Edit and Debug. For our convenience we have also enabled the remaining two features on this dialog page.
Choose Text-Based Linker

CCS includes two different linkers. The Visual Linker is now obsolete; therefore, we want to make sure it is not selected.
18. Open the CCS linker selection dialog: Tools → Linker Configuration
19. Select Use the text linker and click OK (as shown below).
Now you're done with the Workstation Setup. Please continue with the Lab 1 exercise.
LAB 1: Using Code Composer Studio

This lab has four goals:
- Build a project using C source files.
- Load a program onto the DSK.
- Run the program and view the results.
- Use the CCS graphing feature to verify the results.
Lab 1 Create & Graph a Sine Wave

CPU
sineGen() buffer
Introduction to Code Composer Studio (CCS)

Hook up DSK hardware Create and build a project Examine variables, memory, code Run, halt, step, multi-step, use breakpoints Graph results in memory (to see the sine wave)
These take-home (optional) exercises are provided as well, for those of you who finish the lab early. If you do not get the chance to complete them during the assigned lab time, please try them at home.
- Lab1a: Customize Your Workspace
- Lab1b: Using GEL
- Lab1c: Try adding a printf() statement
- Lab1d: Fixed vs Floating Point
- Lab1e: Explore CCS Scripting
1 - 40
Lab Source Files

Sine Generation Algorithm
We'll use a block sine-wave generator function to create our data samples, which we can then graph. The block sine-wave generator function is a basic for loop that calls the following routine to generate individual sine values:
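Creating a Sine Wave (Sine_float.c)

[Slide graphic: sine wave plot with axes labeled t and A]

Generates a value for each output sample:

    float y[3] = {0, 0.0654031, 0};
    float A = 1.9957178;

    short sineGen() {
        y[0] = y[1] * A - y[2];         // next sample from the two previous ones
        y[2] = y[1];                    // age the sample history
        y[1] = y[0];
        return((short)(32000*y[0]));    // scale and convert to 16 bits
    }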
There are many ways to create sine values; we have chosen this simple model based upon a marginally stable IIR filter (a filter whose output keeps oscillating once it has been started).
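The recurrence above follows from the identity sin((n+1)w) = 2cos(w)sin(nw) - sin((n-1)w), where w is the phase step per sample. The sketch below restates it with the state passed in explicitly so it can be tried on a PC. The names `SineState`, `sineInit`, `sineNext`, and `sineFill` are illustrative only and are not part of the lab files; the two constants are the ones from the slide.

```c
#include <assert.h>

/* Illustrative type: state of the two-term recursive oscillator. */
typedef struct {
    float a;     /* coefficient A = 2*cos(w)                   */
    float y1;    /* previous sample  y[n-1], initially sin(w)  */
    float y2;    /* sample before it y[n-2], initially 0       */
} SineState;

/* Constants copied from the slide: 500 Hz tone, 48 kHz sample rate. */
#define SINE_A   1.9957178f
#define SINE_Y1  0.0654031f

void sineInit(SineState *s)
{
    s->a  = SINE_A;
    s->y1 = SINE_Y1;
    s->y2 = 0.0f;
}

/* One step of y[n] = A*y[n-1] - y[n-2]; returns a value near [-1, +1]. */
float sineNext(SineState *s)
{
    float y0 = s->a * s->y1 - s->y2;
    s->y2 = s->y1;
    s->y1 = y0;
    return y0;
}

/* Fill out[0..len-1] with consecutive samples. */
void sineFill(SineState *s, float *out, int len)
{
    int i;
    assert(len >= 0);
    for (i = 0; i < len; i++) {
        out[i] = sineNext(s);
    }
}
```

Since 48000 / 500 = 96, the generated samples should repeat (to within rounding error) every 96 entries, which is a quick way to sanity-check the constants.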
1 - 41
block_sine.c
    // ======== block_sine.c =================================
    // The coefficient A and the three initial values
    // generate a 500Hz tone (sine wave) when running
    // at a sample rate of 48KHz.
    //
    // Even though the calculations are done in floating
    // point, this function returns a short value since
    // this is what's needed by a 16-bit codec (DAC).

    // ======== Prototypes ===================================
    void blockSine(short *buf, int len);
    short sineGen(void);

    // ======== Definitions ==================================
    // Initial values
    #define Y1 0.0654031    // = sin((f_tone/f_samp) * 360)
                            // = sin((500Hz / 48KHz) * 360)
                            // = sin (3.75)
    #define AA 1.9957178    // = 2 * cos(3.75)

    // ======== Globals =====================================
    static float y[3] = {0,Y1,0};
    static float A = AA;

    // ======== sineGen ======================================
    // Generate a single element of sine data
    short sineGen(void)
    {
        y[0] = y[1] * A - y[2];
        y[2] = y[1];
        y[1] = y[0];

        // To scale full 16-bit range we would multiply y[0]
        // by 32768; using a number slightly less than this
        // (such as 32000) helps to prevent overflow.
        y[0] *= 32000;

        // We recast the result to a short value upon returning it
        // since the D/A converter is programmed to accept 16-bit
        // signed values.
        return((short)y[0]);
    }

    // ======== blockSine ========
    // Generate a block of sine data using sineGen
    void blockSine(short *buf, int len)
    {
        int i = 0;

        for (i = 0; i < len; i++) {
            buf[i] = sineGen();
        }
    }
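The comment in sineGen() explains that scaling by 32000 rather than 32768 leaves headroom. An alternative, sketched below, is to clamp the scaled value before converting it, since converting an out-of-range floating-point value to short is undefined in C. The helper name `scaleToShort` is hypothetical and is not part of block_sine.c.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: scale a sample in the nominal range [-1.0, +1.0]
 * and clamp it to what a short can hold before converting. */
short scaleToShort(float sample, float scale)
{
    float v = sample * scale;
    assert(scale > 0.0f);
    if (v > (float)SHRT_MAX) return SHRT_MAX;   /* clamp positive overflow */
    if (v < (float)SHRT_MIN) return SHRT_MIN;   /* clamp negative overflow */
    return (short)v;                            /* truncates toward zero   */
}
```

With a scale of 32768 a full-scale sample of +1.0 would land one count above the largest 16-bit value, so the clamp is what keeps it in range; with 32000 the clamp is never reached for inputs within [-1.0, +1.0].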
1 - 42
Lab1.c
    // Include files
    #include <c6x.h>        // C6000 compiler definitions
    #include "lab1cfg.h"

    // Declarations
    #define BUFFSIZE 128

    // Prototype for the routine in block_sine.c
    // (added here so the call below is declared)
    void blockSine(short *buf, int len);

    // Global Variables
    static short gBuffer[BUFFSIZE];

    // ======== main ========
    // Simple function which calls blockSine
    void main()
    {
        blockSine(gBuffer, BUFFSIZE);   // Fill buffer with sine data
        return;
    }
1 - 43
Lab 1 Procedure
Create the Lab1 project
1. Create a new project. Create a new project C:\c60001day\labs\lab1\LAB1.PJT by choosing: Project → New
It should look like:
C6713 note: If using the C6713 DSK, the target should read TMS320C67XX.
2. Verify that the new project was created correctly. Verify the newly created project is open in CCS by clicking on the + sign next to the Projects folder in the Project View window. Click again on the + sign next to lab1.pjt. If you don't see the new project, notify your instructor.
3. Create a new CDB file. As mentioned during the discussion, configuration database files (*.CDB) control a range of CCS capabilities. In this lab, the CDB file will automatically create the reset vector and specify the memory to the linker. Create a new CDB file (DSP/BIOS Configuration) as shown:
1 - 44
When the dialog box appears, select the dsk6416.cdb (or dsk6713.cdb) template and click OK.
C6713 note: If using the C6713 DSK, choose the dsk6713.cdb file.
Hint: In some TI classrooms you may see two or more tabs of CDB templates; e.g. TMS62xx, TMS54xx, etc. If you experience this, just choose the C6x tab.
4. Save your CDB file: File → Save As, using the name C:\c60001day\labs\lab1\Lab1.CDB. Then, close the CDB Config Tool.
5. Add files to your project. You can add files to a project in one of three ways:
- Select Project → Add Files to Project
- Right-click the project icon in the Project Explorer window and select Add files
- Drag and drop files from Windows Explorer onto the project icon
Using one of these methods, add the following files from C:\c60001day\labs\lab1 to your project:
- LAB1.C
- LAB1.CDB
- LAB1cfg.CMD
- block_sine.c
1 - 45
6. Verify your files were added to the project. The project should look similar to:
Examine the C Code

7. Open and inspect lab1.c Open lab1.c and inspect its contents (double-click on the file in the Project window). Notice the buffer (gBuffer) in memory of length 128. This buffer will receive values generated by the block sine wave generator routine. After running the code, you will graph the values in this buffer. Look at the main( ) routine. We simply call the block sine function and then return.
Building the program (.OUT)

Now that all the files have been added to our project, it's time to create the executable output program (.OUT file). By default, CCS names the program after the project name. Therefore, your output program should be named lab1.out.
8. Build the program. There are two ways to build (compile and link) your program:
- Use the REBUILD ALL toolbar icon
- Select Project → Rebuild All
Choose one of the above methods and build your program. The Build Output window appears in the lower part of the CCS window. Note the build progress information. If you don't see 0 Errors, 0 Warnings, 0 Remarks, please ask your instructor for help.
1 - 46
9. Verify the program is automatically loaded. Since you enabled the Load Program After Build option (step 16, pg. 1-37), CCS should download the program lab1.out once it builds without errors.
The yellow arrow indicates the position of the program counter. Once the program is loaded, it should be pointing to the beginning of main(). Why? Setting the Perform Go Main Automatically option (step 15, pg 1-36) causes CCS to run to main after the program is loaded. If we hadn't enabled this option, you could do it manually using the Debug → Go Main menu option.
Hint: While main() is the beginning of our code, there are many initialization steps that occur between reset and your main program. These issues are discussed in the various user guides and the 4-day workshops. Sorry, we don't have time for this detail today.
1 - 47
Watch Variables
10. Add gBuffer to the Watch window. Select and highlight the variable gBuffer in the lab1.c window. Right-click on gBuffer and choose Add to Watch Window. Note: The value shown for gBuffer will most likely differ from that shown below.
Adding a variable to the Watch window opens it automatically. Alternatively, you could have opened the Watch window, selected gBuffer, and dragged and dropped it onto the Watch 1 window. Click on the + sign next to gBuffer to see the individual elements of the array.
Note: At some point, if the Watch window shows an "unknown identifier" error for a variable, don't worry; it's probably due to the variable's scope. Local variables do not exist (and thus don't have a value) until their function is called. If requested, Code Composer will add local variables to the Watch window, but will indicate they aren't valid until the appropriate function is reached.
Viewing and Filling Memory

11. View the memory contents at the address gBuffer. Another way to view values in memory is to use a Memory window. Select View → Memory and type in the following:
    Title    = gBuffer
    Address  = gBuffer
    Q-Value  = 0
    Format   = 16-Bit Hex - TI Style
Click OK and resize the window so that you can see your code and the buffer. Because we have just come out of reset and this memory area was not initialized, you should see random values.
1 - 48
12. Record the address of the gBuffer array. There are many ways to find this address. Two of them are:
- The address shown for the +gBuffer value in the Watch Window
- The address associated with gBuffer in the Memory View window
Address of gBuffer: ________________________________________________________
13. Initialize the gBuffer array to zero. While not necessarily required since gBuffer will be overwritten by our code, let's go ahead and initialize it anyway. Select Edit → Memory → Fill and fill in the following:
    Address      = gBuffer
    Length       = 64
    Fill Pattern = 0
Click OK. The buffer was 128 16-bit values in length (they were defined as shorts in the C file). The fill memory function fills integer, or 32-bit values. Therefore, we only need to fill sixty-four 32-bit locations in order to zero out the 128x16 array.
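The length arithmetic above (128 sixteen-bit values equal 64 thirty-two-bit words) can be checked with a few lines of C. This is only a host-side sketch of a word-wise fill; `wordsToCover` and `fillWords` are hypothetical names and have nothing to do with how CCS implements its fill command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 32-bit words needed to cover `bytes` bytes (rounded up). */
size_t wordsToCover(size_t bytes)
{
    return (bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t);
}

/* Write `words` copies of the 32-bit `pattern` starting at `dst`.
 * memcpy is used so the sketch has no alignment or aliasing concerns. */
void fillWords(void *dst, size_t words, uint32_t pattern)
{
    unsigned char *p = (unsigned char *)dst;
    size_t i;
    assert(dst != NULL);
    for (i = 0; i < words; i++) {
        memcpy(p + i * sizeof pattern, &pattern, sizeof pattern);
    }
}
```

For a 128-element array of 16-bit values, `wordsToCover` returns 64, matching the Length entered in the dialog.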
Single-Stepping Code
14. Click on the Watch Locals tab of the Watch window.
15. Single-step through your code. Single-step the debugger until you reach the blockSine() function; it contains local variables. Use the toolbar or the Debug menu. On the toolbar, use the icon with the yellow arrow:
- Yellow arrow for source stepping
- Green arrow for assembly stepping
Once you have single-stepped to the for loop, you'll notice that Watch Locals will look similar to:
1 - 49
Multiple Operations Toolbar

CCS has a new feature that provides multi-operations from a toolbar. Let's try to single-step our source code another eight times using this feature.
16. Verify the Multiple Operations toolbar is visible. It should look like:
If you cannot find it, it can be opened from the View menu: View → Debug Toolbars → Multiple Operations
17. Set the Multiple Operations values as shown below and execute:
    Source Step Into | 8 | Execute
You should see it executing multiple single-steps, just as in step 15.
Setting Breakpoints
While single-stepping is quite useful, it can take a long time to get to the end of your program. A faster way to accomplish this is to set a breakpoint (a marker which tells the processor to stop) and use the RUN command.
18. Set a breakpoint. Set a breakpoint on the return; command in main(). Breakpoints can be set in 3 different ways. Choose the one you like best and set the breakpoint:
- Place the cursor on the end brace of main() and click the breakpoint toolbar icon
- Right-click on the line with the end brace and choose Toggle Breakpoint
- Double-click in the grey area next to the end brace (as shown below)
1 - 50
Running Code
19. Run your code. Run the code up to the breakpoint. There are 3 different ways to cause CCS to run your code:
- Use the Run toolbar icon
- Select: Debug → Run
- Press F5
The processor will halt at the breakpoint that you've set. Notice that the watch window changes to show the new values of gBuffer[]. You may have to click on the + sign next to gBuffer to see the values. Code Composer allows you to collapse and expand aggregate data types (structures, arrays, etc.).
Hint: Values highlighted in red have changed with the last update.
1 - 51
Windows and Workspaces

20. Rearrange windows. As long as a window is not maximized in CCS, it can be moved around to any location you prefer. Windows can float or be docked. Select the Watch window, right-click on the upper portion, and select Float In Main Window. Then, move it around. Try docking it again.
21. Save your workspace. When you have the windows exactly where you want them, save your workspace by choosing: File → Workspace → Save Workspace As
Pick a filename and save it in any location you prefer (the lab1 directory may be convenient).
Note: The workspace includes the current open project. So, when you retrieve the workspace, it will retrieve the project. If you don't wish to save the project info with the workspace, close the project before saving your workspace.
If you want to retrieve a previously saved workspace, select: File → Load Workspace
Graphing Data
22. Graph your sine data. The watch window is a great way to view data in CCS. But can you tell if this is really a sine wave? Wouldn't it be better to see this data graphed? Well, CCS allows us to do this. Select: View → Graph → Time/Frequency
Modify the following values:
    Graph Title               gBuffer
    Start Address             gBuffer
    Acquisition Buffer Size   128
    Display Data Size         128
    DSP Data Type             16-bit signed integer
    Sampling Rate             49152
Click OK when finished.
1 - 52
Your graph should look something like this:
23. Other graphing features. CCS supports many different graphing features: time/frequency, FFT magnitude, dual-time, constellation, etc. The sine wave that we generated was a 500Hz wave sampled at 48KHz. Let's use the FFT magnitude plot to see the fundamental frequency of the sine wave. Right-click on the graphical display of gBuffer and select Properties. Change the display type to FFT Magnitude and click OK. You can now see the frequency spectrum of the wave.
24. Save your workspace again. This will also save your graph window to the workspace.
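When reading the FFT Magnitude plot from step 23, it helps to know roughly where the peak should sit. The sketch below does that arithmetic for an N-point transform. The function names are illustrative, and the assumption that the plot uses a 128-point transform at 48 kHz is ours; the actual frame size depends on the graph properties you choose.

```c
#include <assert.h>

/* Spacing between adjacent FFT bins, in Hz, for an N-point transform. */
double binSpacingHz(double sampleRateHz, int nPoints)
{
    assert(nPoints > 0);
    return sampleRateHz / (double)nPoints;
}

/* Index of the bin whose centre frequency is closest to toneHz
 * (toneHz is assumed to be non-negative). */
int nearestBin(double toneHz, double sampleRateHz, int nPoints)
{
    double exact = toneHz / binSpacingHz(sampleRateHz, nPoints);
    return (int)(exact + 0.5);   /* round to nearest */
}
```

Under those assumptions the bins are 375 Hz apart, so a 500 Hz tone falls between bins 1 and 2 and its energy spreads over neighbouring bins rather than showing up as a single sharp line.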
End of Lab1
We highly recommend trying the first couple optional exercises, if time is still available. Before going on, though, please let your instructor know when you have reached this point.
1 - 53
Take Home Exercises (Optional)

Optional exercises are additional labs that you can use to extend your knowledge of the C6000 and CCS toolset. If you finish the main exercise early, you are welcome to begin working on these labs.
Lab1a Customize CCS

Add Custom Keyboard Assignment
While most CCS commands are available via hotkeys, you may find yourself wanting to modify CCS to better suit your personal needs. For example, to restart the processor, the default hotkey(s) are: Debug → Restart. CCS lets you remap many of these functions. Let's try remapping Restart.
1. Start CCS if it isn't already open.
2. Open the CCS customization dialog: Option → Customize
3. Choose the Keyboard tab in the Customize dialog box.
4. Scroll down in the Commands list box to find Debug → Restart and select it.
5. Click the Add button.
6. When asked to Press new shortcut key, press F4. We already checked, and this one isn't assigned within CCS by default.
7. Click OK twice to close the dialog boxes.
8. From now on, to Restart and Run the CPU, all you need to do is push F4 then F5.
1 - 54
Customize your Workspace

You may not find the default workspace for CCS as convenient as you'd like. If that's the case, you can modify it as needed.
9. Close CCS if it's open, and then open CCS. This forces CCS back to its default states (i.e. no breakpoints, profiling, etc.).
10. Move the toolbars around as you'd like them to be. For example, you may want to close the BIOS and PBC toolbars and then move the Watch toolbar upwards so that you free up another inch of screen space.
11. If you want the Project and File open dialogs to default to a specific path, you need to open a project or file from that path.
12. Make sure you close any project or file from step 11.
13. Save the current workspace: File → Workspace → Save Workspace As... Save this file to a location you can remember. For example, you might want to save it to: C:\c60001day\labs\
14. Close CCS.
15. Change the properties of the CCS desktop icon. Right-click on the CCS desktop icon and add your workspace to the Target, as shown below. This should be the path and name of your workspace (the figure shows c:\c60001day\labs).
16. Open up CCS and verify it worked.
1 - 55
1 - 56
Lab1b - Using GEL Scripts

GEL stands for General Extension Language, a fancy name for a scripting tool. You can use GEL scripts to automate processes as you see necessary. We'll be using a few of them in the lab in just a few minutes.
GEL Scripting
- GEL: General Extension Language
- C style syntax
- Large number of debugger commands as GEL functions
- Write your own functions
- Create GEL menu items
Using GEL Scripts

When debugging, you often need to fill memory with a known value prior to building and running some new code. Instead of constantly using the menu commands, let's create a GEL (General Extension Language) file that automates the process. GEL files can be used to execute a string of commands that the user specifies. They are quite handy.
1. Start CCS and open your project (lab1.pjt) and load the program (lab1.out), if they're not already open and loaded.
2. Create a GEL file (GEL files are just text files): File → New → Source File
3. Save the GEL file. Save this file in the lab1 folder with File → Save. Pick any name you want that ends in .gel. We chose the name mygel.gel.
1 - 57
4. Create a new menu item. In the new GEL file, let's create a new menu item (that will appear in the CCS GEL menu) called My GEL Functions. Type the following into the file:

    menuitem "My GEL Functions";
You can access all of the pre-defined GEL commands by accessing Help → Contents. Select the Index tab and type the word GEL.
5. Create a submenu item to clear our arrays. The menuitem command that we used in the previous step will place the title My GEL Functions under the GEL menu in CCS. When you select this menu item, we want to be able to select different operations. Submenu items are created with the hotmenu command. Enter the following into your GEL file to create a submenu item to clear the memory array:
(Don't forget the semicolon; as with C, it's important!)

    hotmenu ClearArray()
    {
        GEL_MemoryFill(gBuffer, 0, 64, 0x0);
    }
The MemoryFill command requires the following info:
- Address
- Type of memory (data memory = 0)
- Length (# of 32-bit values)
- Memory fill pattern
This example will fill our array (gBuffer) with zeros. For more info on GEL and GEL_ commands, please refer to the CCS help file.
6. Add a second menu item to fill the array. In this example, we want to ask the user to enter a value to write to each location in memory. Rather than using the hotmenu command, the dialog command allows us to query the user. Enter the following:

    dialog FillArrays(fillVal "Fill Array with:")
    {
        GEL_MemoryFill(gBuffer, 0, 64, fillVal);
    }
7. Save, then load your new GEL file. To use a GEL file, it must be loaded into CCS. When loaded, it shows up in the CCS Explorer window in the GEL folder.
- File → Save
- File → Load GEL, and select your GEL file
1 - 58
8. Show the gBuffer array in a Memory window. Without looking at the arrays, it will be hard to see the effect of our scripts. Let's open a Memory window to view gBuffer: View → Memory
    Title:    gBuffer
    Address:  gBuffer
    Q-Value:  0
    Format:   16-bit hex TI style
A couple of notes about memory windows:
- C Style adds 0x in front of the number; TI Style doesn't.
- Select the Format based on the data type you are interested in viewing. This will make it easier to see your data.
9. Now, try the GEL functions:
- GEL → My GEL Functions → ClearArray
- GEL → My GEL Functions → FillArrays
You can actually use this GEL script throughout the rest of the workshop. It is a very handy tool. Feel free to add or delete commands from your new GEL file as you do the labs.
10. Review loaded GEL files. Within the CCS Explorer window (on the left), locate and expand the GEL files folder. CCS lists all loaded GEL files here.
Hint: If you modify a loaded GEL file, you must reload it before you can use the modifications. The easiest way to reload a GEL file: (1) right-click the GEL file in the CCS Project Explorer window; (2) pick Reload from the right-click popup menu.
1 - 59
Lab1c Using Printf

If you want to use printf() to output the value of y, two steps are required.
Lab 1c: Using Printf

    #include <stdio.h>                      // (1) include the header file

    short func1(short *m, short count);     // defined elsewhere
    short a[4] = {40,39,38,37};
    int y = 0;

    main()
    {
        y = func1(a, 4);
        printf("y = %x hex\n", y);          // (2) add the printf() call
    }

1. Open the lab1.pjt project, if it is not still open.
2. Open the lab1.c file by double-clicking on it in the Project Explorer window.
3. To use printf(): first, you must remember to include the header file, as in item (1) in the above graphic. Next, you must add the printf() call to your C file. For example, try adding a simple printf() to main:
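    #include <stdio.h>      // needed for printf(); place with the other includes

    void main()
    {
        blockSine(gBuffer, BUFFSIZE);   // Fill buffer with sine data
        // %x expects an unsigned int, so cast the (32-bit) address
        printf("gBuffer (at location 0x%x) was filled with sine values\n", (unsigned int)gBuffer);
        return;
    }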
4. Build and load the .OUT file. When you build and load this program, the Build/Messages window will add a third tab called Stdout, which will contain the output from printf().
5. Verify that it works. This can be done by viewing the printed statement in the Stdout tab of the Output window.
6. Close the project.
1 - 60
Lab1d Fixed vs Floating Point

We included a functioning integer sine-wave routine for comparison with the float routine used throughout the workshop. Notice the additional effort required to make integer math routines work correctly. This extra work is required so that the 16-bit integer values do not overflow and cause data corruption. The method used to solve overflow in this application is often called Q-math. Maybe a better name for it is fractional, fixed-point math. The beauty of fractions is that when multiplied together, their value gets smaller. Hence the result is always bounded (i.e. no overflow).
The problem with integer math is not confined to TI DSPs (or DSPs in general); rather, it is a side effect of two facts: integer numbers get bigger when you add or multiply them, and the C language provides no means of handling overflow for signed numbers. In fact, the C language leaves signed math that overflows undefined; every compiler writer can handle it however they want (so much for portability).
The dynamic range of floating-point variables sure makes life easier. It's why many folks choose floating-point to decrease their engineering time (and get to market more quickly). Of course, this is why the C6713 is so popular, as it's designed to do floating-point math in hardware.
We have provided a project for you to compare different versions of sineGen:
- Standard fixed-point math
- Q-math (fractional, fixed-point)
- Floating-point math
You will find LAB1d.PJT already built in the LAB1d folder: C:\c60001day\labs\lab1d\
Try running the project and comparing all three results in three different graphs. To simplify setting up the graph windows, try using the provided workspace LAB1d.wks.
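To make the Q-math idea concrete, here is a minimal sketch of fractional (Q15) arithmetic in portable C. It is not the code from LAB1d; the type name `q15_t` and the function names are our own, and the right shift of a negative value is assumed to be an arithmetic shift, which is what mainstream compilers do.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t;   /* illustrative name: value = raw / 32768 */

/* Multiply two Q15 fractions. The 32-bit product is Q30, so shifting
 * right by 15 returns to Q15. The only overflow case is -1.0 * -1.0. */
q15_t q15Mul(q15_t a, q15_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* Q30, always fits in 32 bits */
    int32_t res  = prod >> 15;                /* back to Q15 (truncating)    */
    if (res > INT16_MAX) res = INT16_MAX;     /* saturate -1.0 * -1.0        */
    return (q15_t)res;
}

/* Saturating Q15 addition: clamp at the limits instead of wrapping. */
q15_t q15AddSat(q15_t a, q15_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (q15_t)sum;
}
```

For example, 0.5 is stored as 16384, and multiplying it by itself yields 8192, which is 0.25: the product of two fractions stays bounded, as described above. Addition is where fractions can still overflow, hence the saturating add.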
1 - 61
1 - 62
Lab1e Explore CCS Scripting

A number of CCS Scripting examples have been included in the Scripting folder. You will find them at: C:\c60001day\labs\scripting\
These scripts are contained in Excel spreadsheets and are written in VBA (Visual Basic for Applications). Scripts can also be written in Perl or any COM-compliant language. Please feel free to explore them.
Lab Debrief
Lab 1 Debrief
1. What differences are there in Lab1 between the C6713 and C6416 solutions?
2. What do we need CCS Setup for?
3. Why did we return from main?
4. What did you have to add to LAB1.C to get printf to work?
5. Did you find the ClearArray GEL menu command useful?
1 - 63
Optional Topics
Optional Topic: CCS Automation
As evidenced by the optional lab exercise, CCS provides scripting/automation tools. They are mentioned here to make you aware of their presence. To explore them further, please examine the online documentation.
Command Line Window

Provides a convenient way to type in CCS commands, rather than using the pull-down menus.
Command Window

Some frequently used commands:

    help
    dlog <filename>,a
    dlog close
    alias ...
    take <filename.txt>
    load <filename.out>
    reload
    reset
    restart
    ba <label>
    wa <label>
    run
    run <cond>
    go <label>
    step <number>
    cstep <number>
    halt
For those of you old-timers who remember the old command line debugging tools, you can use the same commands you've used for years.
1 - 64
GEL Scripting
[Slide: GEL Scripting]
- GEL: General Extension Language
- C style syntax
- Large number of debugger commands as GEL functions
- Write your own functions
- Create GEL menu items
Notice the GEL folder in the Project View window. You can load/unload GEL scripts by right-clicking this window. GEL syntax is very C-like. Notice that QuickTest() calls LED_cycle(), defined earlier in the file. (This happens to be a C6711 DSK GEL script.) You can add items to the GEL menu. An example is shown in the above graphic. Finally, a GEL file can be loaded upon starting CCS. The startup GEL script is specified using the CCS Setup application.
1 - 65
CCS Scripting
CCS Scripting is a CCS plug-in. After installing CCS on your PC, you should use the Update Advisor feature (available from the Help menu) to download and add the CCS Scripting plug-in.
Hint: You may find other useful tools, application notes, and plug-ins available via the CCS Update Advisor.
CCS scripting provides a method of controlling the CCS debugger from another scripting language. Any Microsoft COM (i.e. OLE) compliant language should be able to use the CCS Scripting library, but VB Script and Perl are the two languages for which examples are provided. The graphic below is an example of a VB Script using CCS Scripting:
[Slide: CCS Scripting]
Debug using VB Script or Perl. Using CCS Scripting, a simple script can:
- Start CCS
- Load a file
- Read/write memory
- Set/clear breakpoints
- Run, and perform other basic testing functions
Among other things, CCS Scripting is very useful for testing purposes. For example, if you have a number of test vectors you would like to run against your system, you can use CCS Scripting to automate this process. Your script could then:
- Build
- Run
- Capture data, memory values, benchmarks
- Compare the results against what you expect (or hope)
- Repeat, over and over again
At this time, the CCS Scripting Plug-in (v1.2) only ships with C5000-based examples. For your convenience, we have written and included some C6000-based examples along with the workshop lab files.
1 - 66
TCONF Scripting (Textual Configuration)

CCS now provides a textual scripting method for creating and editing CDB files.
TCONF Scripting (CDB)

Tconf Script (.tcf): hello_dsk62cfg.tcf

    utils.loadPlatform("dsk6211");   /* load DSK6211 platform into TCOM */
    utils.getProgObjs(prog);         /* make all prog objects JavaScript global vars */
    LOG_system.bufLen = 128;         /* set buffer length of LOG_system to 128 */
    utils.importFile("hello");       /* import portable application script */
    prog.gen();                      /* generate cfg files (and CDB file) */

Tconf Include File (.tci): hello.tci

    var trace = LOG.create("trace"); /* create a new user log, named trace */
    trace.bufLen = 32;               /* initialize its length to 32 (words) */

Your Application: hello.c

    #include <log.h>
    extern LOG_Obj trace;            /* created in hello.tci */

    int main()
    {
        LOG_printf(&trace, "Hello World!\n");
        return (0);
    }

- A textual way to configure CDB files
- Runs on both PC and Unix
- Create #include type files (.tci)
- More flexible than Config Tool
Some users find writing code preferable to using the Graphical User Interface (GUI) of the Configuration Tool. This is especially true for users who build their code in the Unix environment, as there is no Unix version of the GUI.
1 - 67
Optional Topic: CPU Architecture Details

Again, before more closely examining the C6000 CPU architecture, it's beneficial to remind ourselves of the purpose of a Digital Signal Processor. In other words, what is digital signal processing?
What is Digital Signal Processing (DSP)?

We already explored this topic back on page 1-9. As a reminder, here is the graphic that summed up the answer to our question.
What Problem Are We Trying To Solve?

[Slide graphic: signal chain x → ADC → DSP → DAC]
[Slide graphic: digital sampling of an analog signal, amplitude axis labeled A]

Most DSP algorithms can be expressed with MAC:

$$Y = \sum_{i=1}^{count} coeff_i \cdot x_i$$

    for (i = 0; i < count; i++){
        sum += c[i] * x[i];
    }
1 - 68
CPU Architecture
What is the core part of DSP algorithms? In layman's terms, you might say it's the Sum of Products (SOP) or Multiply-Accumulate (MAC).
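Written out as a complete C function, the sum of products looks like the sketch below. The function name `sumOfProducts` is ours; the data types (16-bit inputs, a wider accumulator) are an assumption chosen to match the short data used elsewhere in this workshop.

```c
#include <assert.h>

/* Sum of products (MAC loop) over `count` 16-bit samples.
 * The accumulator is wider than the inputs so that individual
 * products do not overflow 16 bits. */
int sumOfProducts(const short *c, const short *x, int count)
{
    int sum = 0;
    int i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        sum += c[i] * x[i];    /* one multiply-accumulate per iteration */
    }
    return sum;
}
```

Each pass through the loop performs exactly one multiply and one add, which is why the following pages focus on how the CPU executes those two operations.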
The Core of DSP: Sum of Products

$$y = \sum_{n=1}^{40} c_n \cdot x_n$$

The C6000: designed to handle DSP's math-intensive calculations.

[Slide graphic: a multiplier (.M unit) and an ALU (.L unit)]

    MPY  .M  c, x, prod
    ADD  .L  y, prod, y

Note: You don't have to specify functional units (.M or .L).
The C6000 CPU has a separate Multiply (.M) unit, along with an arithmetic logic unit (.L). The variables operated upon by the CPU are stored in a register file. Register file A holds 16 or 32 registers, depending upon which C6000 CPU you are using.
Working Variables: The Register File

$$y = \sum_{n=1}^{40} c_n \cdot x_n$$

[Slide graphic: Register File A (16 or 32 registers, each 32 bits wide) holding c, x, prod, and y, feeding the .M and .L units]

    MPY  .M  c, x, prod
    ADD  .L  y, prod, y
The heart of the Sum of Products routine is easily handled by these two units, as shown above with the Multiply (MPY) and Add (ADD) instructions. To make this into a real Sum of Products, though, we need to put them into a loop.
1 - 69
Making Loops
1. Program flow: the branch instruction
B loop
2. Initialization: setting the loop count

MVK 40, cnt
3. Decrement: subtract 1 from the loop counter

SUB cnt, 1, cnt
Adding these instructions to our example:
.S Unit: Branch and Shift Instructions

$$y = \sum_{n=1}^{40} c_n \cdot x_n$$

[Slide graphic: Register File A (16 or 32 registers, each 32 bits wide) holding c, x, cnt, prod, and y; functional units .S, .M, .L]

            MVK  .S  40, cnt
    loop:   MPY  .M  c, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
            B    .S  loop

Note: C64x could use BDEC in place of SUB and Branch
If you (or the compiler) were coding for the C64x, you could optimize the code using the Branch with Decrement (BDEC) instruction. When using a standard branch (B), though, how can we tell our loop counter has reached zero and that we can stop branching and move on?
1 - 70
By putting a conditional [reg] statement before the branch, as shown below.
Conditional Instruction Execution

To minimize branching, all instructions are conditional:

    [condition]  B  loop

Execution is based on the [zero/non-zero] value of the specified variable:

    Code Syntax    Execute if:
    [ cnt ]        cnt ≠ 0
    [ !cnt ]       cnt = 0

Note: If the condition is false, execution is essentially replaced with a nop.
Loop Control via Conditional Branch

$$y = \sum_{n=1}^{40} c_n \cdot x_n$$

[Slide graphic: Register File A (16 or 32 registers, each 32 bits wide) holding c, x, cnt, prod, and y; functional units .S, .M, .L]

            MVK  .S  40, cnt
    loop:   MPY  .M  c, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
      [cnt] B    .S  loop
A great thing about the C6000 is that all instructions allow for [conditional] execution. While this may not sound that cool at first, it can make a tremendous difference in how efficiently you can code a hardware-pipelined processor.
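For readers more at home in C, the loop from the slide can be mirrored almost line for line. In this sketch (the function name `macRepeated` is ours) the statement `if (cnt) goto loop;` plays the role of `[cnt] B loop`: the branch is taken only while cnt is non-zero. It deliberately ignores pipeline details such as branch delay slots.

```c
#include <assert.h>

/* C rendering of the slide's loop; each line is annotated with the
 * assembly instruction it stands for. */
int macRepeated(short c, short x, int count)
{
    int y = 0;
    int prod;
    int cnt = count;          /* MVK  count, cnt    */
    assert(count > 0);        /* the loop body always runs at least once */
loop:
    prod = c * x;             /* MPY  c, x, prod    */
    y = y + prod;             /* ADD  y, prod, y    */
    cnt = cnt - 1;            /* SUB  cnt, 1, cnt   */
    if (cnt) goto loop;       /* [cnt] B  loop      */
    return y;
}
```

Because c and x are still fixed values here, the result is simply count times c*x; loading fresh data each time through the loop is the next step.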
1 - 71
Since a register can only hold a single value at a time, we have to load our variable registers each time through the loop. The C6000 has a fourth functional unit to manage data loads and stores (.D). We use the pointer concept in assembly code, just as you might in C code. The pointer indicates where the data array exists in memory; that is, where we load the data from.
Memory Access via .D Unit

$$y = \sum_{n=1}^{40} c_n \cdot x_n$$

[Slide graphic: Register File A (16 or 32 registers) holding c, x, cnt, prod, y, *cp, *xp, and *yp; functional units .S, .M, .L, .D; Data Memory: x(40), a(40), y]

            MVK  .S  40, cnt
    loop:   LDH  .D  *cp, c
            LDH  .D  *xp, x
            MPY  .M  c, x, prod
            ADD  .L  y, prod, y
            SUB  .L  cnt, 1, cnt
      [cnt] B    .S  loop

Note: No restrictions on which regs can be used for address or data!
Loads can be performed in many different widths, depending upon your chosen data type.
    Instr.   Description        C Type   Size
    LDB      load byte          char     8-bits
    LDH      load half-word     short    16-bits
    LDW      load word          int      32-bits
    LDDW*    load double-word   double   64-bits

    * Only available on the C64x and C67x
1 - 72
Since we are loading data from arrays in memory, how can we increment the pointers each time through the loop? Again, we use the same increment (++) syntax used by the C language. In this case, the ++ comes after the pointer to indicate we are incrementing the address value contained in the pointer (after using the current value).
Auto-Increment of Pointers
$y = \sum_{n=1}^{40} c_n \cdot x_n$

        MVK   .S   40, cnt
loop:   LDH   .D   *cp++, c
        LDH   .D   *xp++, x
        MPY   .M   c, x, prod
        ADD   .L   y, prod, y
        SUB   .L   cnt, 1, cnt
 [cnt]  B     .S   loop
Finally, we use a third pointer to store the final result back into our resultant variable.
Storing Results Back to Memory

$y = \sum_{n=1}^{40} c_n \cdot x_n$

        MVK   .S   40, cnt
loop:   LDH   .D   *cp++, c
        LDH   .D   *xp++, x
        MPY   .M   c, x, prod
        ADD   .L   y, prod, y
        SUB   .L   cnt, 1, cnt
 [cnt]  B     .S   loop
        STW   .D   y, *yp
1 - 73
So far, we've only told you a half-truth. In reality, the C6000 has eight functional units, rather than four. Also, there are two register sets of 16 or 32 registers each.
Dual Resources : Twice as Nice

Register File A (32-bits wide): A0 (cn), A1 (xn), A2 (cnt), A3 (prd), A4 (sum), A5 (*c), A6 (*x), A7 (*y), ..., A15 or A31
Functional units, A side: .S1, .M1, .L1, .D1
Functional units, B side: .S2, .M2, .L2, .D2
Register File B (32-bits wide): B0, B1, B2, B3, B4, B5, B6, B7, ..., B15 or B31
As you will see later, having both sets of functional units can dramatically improve our processor's performance.
1 - 74
In the assembly coding we've examined thus far, we have used symbols (i.e. labels) to specify registers. This is the preferred method of coding when using Linear Assembly code as described in this module. You also have the option to specify specific registers and/or functional units if you wish to provide constraints to the Assembly Optimizer.
Optional - Resource Specific Coding

$y = \sum_{n=1}^{40} c_n \cdot x_n$

Register File A (32-bits wide): A0 (cn), A1 (xn), A2 (cnt), A3 (prd), A4 (sum), A5 (*c), A6 (*x), A7 (*y), ..., A15 or A31
Functional units: .S1, .M1, .L1, .D1

        MVK   .S1   40, A2
loop:   LDH   .D1   *A5++, A0
        LDH   .D1   *A6++, A1
        MPY   .M1   A0, A1, A3
        ADD   .L1   A4, A3, A4
        SUB   .S1   A2, 1, A2
 [A2]   B     .S1   loop
        STW   .D1   A4, *A7
It's easier to use symbols rather than register names, but you can use either method.
1 - 75
C6000 Instruction Set

All C6000 processors have the same basic CPU architecture.
'C6000 System Block Diagram
[Figure: peripherals, on-chip memory, and external memory connect through the internal buses to the CPU. The CPU contains eight functional units (.D1, .D2, .M1, .M2, .L1, .L2, .S1, .S2) and Register Sets A and B.]
One major difference is the instructions each CPU can execute. To summarize each unit's instructions:
C62x RISC-like instruction set

.S Unit:       ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO
.L Unit:       ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO
.M Unit:       MPY MPYH MPYLH MPYHL SMPY SMPYH
.D Unit:       ADD ADDAB (B/H/W) LDB (B/H/W) MV NEG STB (B/H/W) SUB SUBAB (B/H/W) ZERO
No Unit Used:  NOP IDLE
1 - 76
The C67x adds a whole set of floating-point instructions to the C62x capabilities:
C67x: Superset of Fixed-Point

.S Unit:           ADD ADDK ADD2 AND B CLR EXT MV MVC MVK MVKH NEG NOT OR SET SHL SHR SSHL SUB SUB2 XOR ZERO
                   plus ABSSP ABSDP CMPGTSP CMPEQSP CMPLTSP CMPGTDP CMPEQDP CMPLTDP RCPSP RCPDP RSQRSP RSQRDP SPDP
.L Unit:           ABS ADD AND CMPEQ CMPGT CMPLT LMBD MV NEG NORM NOT OR SADD SAT SSUB SUB SUBC XOR ZERO
                   plus ADDSP ADDDP SUBSP SUBDP INTSP INTDP SPINT DPINT SPTRUNC DPTRUNC DPSP
.M Unit:           MPY MPYH MPYLH MPYHL SMPY SMPYH
                   plus MPYSP MPYDP MPYI MPYID
.D Unit:           ADD ADDAB (B/H/W) LDB (B/H/W) MV NEG STB (B/H/W) SUB SUBAB (B/H/W) ZERO
                   plus LDDW
No Unit Required:  NOP IDLE
Similarly, the C64x is a superset of the C62x.
'C64x: Superset of C62x Instruction Set

.S Unit
  Dual/Quad Arith:  SADD2 SADDUS2 SADD4
  Data Pack/Un:     PACK2 PACKH2 PACKLH2 PACKHL2 UNPKHU4 UNPKLU4 SWAP2 SPACK2 SPACKU4
  Bitwise Logical:  ANDN
  Shifts & Merge:   SHR2 SHRU2 SHLMB SHRMB
  Compares:         CMPEQ2 CMPEQ4 CMPGT2 CMPGT4
  Branches/PC:      BDEC BPOS BNOP ADDKPC
.L Unit
  Dual/Quad Arith:  ABS2 ADD2 ADD4 MAX MIN SUB2 SUB4 SUBABS4
  Bitwise Logical:  ANDN
  Shift & Merge:    SHLMB SHRMB
  Load Constant:    MVK (5-bit)
  Data Pack/Un:     PACK2 PACKH2 PACKLH2 PACKHL2 PACKH4 PACKL4 UNPKHU4 UNPKLU4 SWAP2/4
.D Unit
  Dual Arithmetic:  ADD2 SUB2
  Bitwise Logical:  AND ANDN OR XOR
  Mem Access:       LDDW LDNW LDNDW STDW STNW STNDW
  Load Constant:    MVK (5-bit)
  Address Calc.:    ADDAD
.M Unit
  Average:          AVG2 AVG4
  Shifts:           ROTL SSHVL SSHVR
  Multiplies:       MPYHI MPYLI MPYHIR MPYLIR MPY2 SMPY2 DOTP2 DOTPN2 DOTPRSU2 DOTPNRSU2 DOTPU4 DOTPSU4 GMPY4
  Bit Operations:   BITC4 BITR DEAL SHFL XPND2/4
  Move:             MVD
1 - 77
Assorted C6000 Benchmarks

Here are a few assorted benchmarks for the C62x. Actually, these benchmarks evaluate both the device and the C compiler, as they're benchmarks of natural C code:
Sample C62x Compiler Benchmarks

Algorithm                                                      Used In                                           Asm Cycles  Asm Time (µs)  C Cycles (Rel 4.0)  C Time (µs)  % Efficiency vs Hand Coded
Block Mean Square Error (MSE of a 20 column image matrix)      For motion compensation of image data             348         1.16           402                 1.34         87%
Codebook Search                                                CELP based voice coders                           977         3.26           961                 3.20         100%
Vector Max (40 element input vector)                           Search Algorithms                                 61          0.20           59                  0.20         100%
All-zero FIR Filter (40 samples, 10 coefficients)              VSELP based voice coders                          238         0.79           280                 0.93         85%
Minimum Error Search (Table Size = 2304)                       Search Algorithms                                 1185        3.95           1318                4.39         90%
IIR Filter (16 coefficients)                                   Filter                                            43          0.14           38                  0.13         100%
IIR cascaded biquads (10 cascaded biquads, Direct Form II)     Filter                                            70          0.23           75                  0.25         93%
MAC (two 40 sample vectors)                                    VSELP based voice coders                          61          0.20           58                  0.19         100%
Vector Sum (two 44 sample vectors)                             -                                                 51          0.17           47                  0.16         100%
MSE (MSE between two 256 element vectors)                      Mean Sq. Error Computation in Vector Quantizer    279         0.93           274                 0.91         100%

Completely natural C code (non C6000 specific)
TI C62x Compiler Performance Release 4.0: Execution Time in µs @ 300 MHz
Versus hand-coded assembly based on cycle count
Code available at: http://www.ti.com/sc/c6000compiler
1 - 78
The following sample of benchmarks shows the performance for both the C62x and C64x. While the C62x is no slouch in performance, the C64x is just that much better. At 720MHz today, with 1GHz speeds already demonstrated, the C64x is the family to use for extreme performance.
Sample Imaging & Telecom Benchmarks

DSP & Image Processing Kernels                                    Cycle Count                                   Performance
                                                                  C62x    C64x    Units                         Cycle Improvement C64:C62   720MHz C64x vs 300MHz C62x
Reed Solomon Decode: Syndrome Accumulation (204,188,8) Packet     1680    470     cycles/packet                 3.5x                        8.4x
Viterbi Decode (GSM) (16 states)                                  38.25   14*     cycles/output                 2.7x                        6.5x
FFT - Radix 4 - Complex (size = N log (N)) (16-bit)               12.7    6.0     cycles/data                   2.1x                        5x
Polyphase Filter - Image Scaling (8-bit)                          0.77    0.33    cycles/output/filter tap      2.3x                        5.5x
Correlation - 3x3 (8-bit)                                         4.5     1.28    cycles/pixel                  3.5x                        8.4x
Median Filter - 3x3 (8-bit)                                       9.0     2.1     cycles/pixel                  4.3x                        10.3x
Motion Estimation - 8x8 MAD (8-bit)                               0.953   0.126   cycles/pixel                  7.6x                        18.2x
* Includes traceback
You can find more C6000 benchmarks at:

http://dspvillage.ti.com/docs/catalog/generation/details.jhtml?templateId=5154&path=templatedata/cm/dspdetail/data/c6000_benchmarks
1 - 79
Lab Debrief Answers

1. What differences are there in Lab1 between the C6713 and C6416 solutions?
   Each uses a different CDB template file.
   One uses the -mv6700 option, while the other uses -mv6400 (by the way, what is this option for?)
2. What do we need CCS Setup for?
   It allows you to specify which target CCS should talk to. In this workshop, it was either the C6713 or C6416 DSK.
3. Why did we return from main?
   When using a CDB file, returning from main invokes the DSP/BIOS scheduler. This is discussed a bit more in Chapter 3, and in much greater detail in both of the following 4-day workshops:
   C6000 Integration Workshop (IW6000)
   DSP/BIOS Workshop
4. What did you have to add to LAB1.C to get printf to work?
   Reference to the standard C I/O header file <stdio.h>
   The printf() statement itself.
5. Did you find the clearArrays GEL menu command useful?
   We hope so!
1 - 80
Using Peripherals
Introduction
A big part of any design is getting data in and out of the processor. Configuring and using peripherals has often been one of the most tedious chores. To this end, TI has created a library of functions, data types, and macros called the Chip Support Library (CSL). This library can replace much of what you might have otherwise needed to write on your own. As the name implies, the Chip Support Library handles the peripheral resources found on-chip. For TI produced development boards (like the C6416 and C6713 DSKs), a Board Support Library (BSL) is also provided. Similar to the CSL, it provides code for using the peripheral resources contained on the board, but outside of the DSP chip. In the next chapter, we briefly discuss how the CSL and BSL can be used to build an encapsulated device driver. In this chapter we'll use these software libraries directly, in order to output the sine wave we created in the previous lab exercise.
[Figure: sineGen runs on the CPU inside a HWI and calls DSK6416_AIC23_write(), which sends samples through the McBSP to the AIC codec; the McBSP transmit interrupt triggers the HWI.]
Chapter Outline
Audio Output (McBSP, Codec)
  McBSP is connected to the Codec
  McBSP - a closer look
Using CSL & BSL
Hardware Interrupts (HWI)
Lab 2 - Output a Sinewave Tone
C6416/C6713 DSK One-Day Workshop - Using Peripherals
2-1
Audio Output McBSP and the AIC23 Codec

2-2
Chapter 2 Topics
Using Peripherals .... 2-1
  Audio Output McBSP and the AIC23 Codec .... 2-4
    C6416 DSK McBSP Codec Interface .... 2-4
    C6713 DSK McBSP Codec Interface .... 2-5
    McBSP Block Diagram .... 2-6
  Programming Peripherals with CSL and BSL .... 2-8
    What is CSL and BSL? .... 2-8
    Generic Procedure for CSL and BSL .... 2-10
    CSL and BSL Documentation .... 2-13
  Hardware Interrupts (HWI) .... 2-14
    Enabling Interrupts .... 2-15
  Lab 2 .... 2-16
    The Paperwork .... 2-17
    Lab2 Procedure .... 2-21
    Lab2a (optional) .... 2-27
  Lab 2 Debrief .... 2-28
2-3

Audio Connections
[Figure: on the DSK, the DSP's CPU connects through two McBSPs to the AIC23 codec.]
The DSK uses two McBSPs to talk with the AIC23 codec:
  One for control
  Another for data
C6416 DSK McBSP Codec Interface

[Figure: McBSP1 carries control, McBSP2 carries data.]
McBSP1 is connected to program the AIC23's control registers
McBSP2 is used to transfer data to the A/D and D/A converters
Programmable frequency: 8K, 16K, 24K, 32K, 44.1K, 48K, 96K
24-bit converter; digital transfer widths: 16-bits, 20-bits, 24-bits, 32-bits
2-4
C6713 DSK McBSP Codec Interface

[Figure: McBSP0 carries control, McBSP1 carries data.]
McBSP0 is connected to program the AIC23's control registers
McBSP1 is used to transfer data to the A/D and D/A converters
Programmable frequency: 8K, 16K, 24K, 32K, 44.1K, 48K, 96K
24-bit converter; digital transfer widths: 16-bits, 20-bits, 24-bits, 32-bits
Notice that the two DSKs use different McBSPs to communicate with the codec. Other than this, the two boards' audio output works in exactly the same way.
2-5
McBSP Block Diagram

Here's a block diagram of the C6000's Multi-Channel Buffered Serial Port (McBSP). Notice the independent Receive and Transmit pins and data paths. You might also notice that either the CPU or the EDMA can access the memory-mapped Data Receive Register (DRR) or Data Transmit Register (DXR). Companding (µ-Law/A-Law data compression) is optional on receive, transmit, or both.
McBSP Block Diagram

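[Figure: the CPU and DMA reach the McBSP over the internal bus. Receive path: DR -> RSR -> RBR -> Expand (optional) -> DRR. Transmit path: DXR -> Compress (optional) -> XSR -> DX. The diagram labels the data path as 32 bits wide. The McBSP Control Registers sit alongside the CLKR, CLKX, CLKS, FSR, and FSX pins.]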
Additional Background graphics for McBSP
The first slide shows which hardware events cause the Receive and Transmit interrupts.
McBSP Interrupts
Receive:  RBR -> DRR sets RRDY=1 ("Ready to Read"), which raises RINT to the CPU and REVT to the DMA
Transmit: DXR -> XSR sets XRDY=1 ("Ready to Write"), which raises XINT to the CPU and XEVT to the DMA
RRDY & XRDY in the McBSP control register display the status of the read and transmit ports:
  0: not ready
  1: ready to read/write
In Lab 2: XRDY generates the McBSP transmit interrupt (XINT2) to the CPU when DXR is emptied (and ready for a new value)
In Lab 3 (IOM Device Driver): XRDY generates a transmit event to the EDMA when the DXR is ready for a new value
2-6
The following two slides provide a basic description of the McBSP's synchronous, serial data transfer.
Basic Definitions - Bit, Word

[Figure: timing diagram of CLK, FS, and D showing bits b7 through b0 forming one word.]
Bit - one data bit per SP clock period
Word or channel - contains #bits specified by WDLEN1 (8, 12, 16, 20, 24, 32)
Serial Port registers: SP Ctrl (SPCR), Rcv Ctrl (RCR), Xmt Ctrl (XCR), Rate (SRGR), Pin Ctrl (PCR)
RCR: RFRLEN1 in bits 14-8, RWDLEN1 in bits 7-5
XCR: XFRLEN1 in bits 14-8, XWDLEN1 in bits 7-5
Basic Definitions - Frame

[Figure: timing diagram of FS and D showing words w0 through w7 forming one frame.]
Frame - contains one or multiple words
FRLEN1 specifies #words per frame (1-128)
Serial Port registers: SP Ctrl (SPCR), Rcv Ctrl (RCR), Xmt Ctrl (XCR), Rate (SRGR), Pin Ctrl (PCR)
RCR: RFRLEN1 in bits 14-8
XCR: XFRLEN1 in bits 14-8
2-7
Programming Peripherals with CSL and BSL

TI Software Foundation Libraries
Board Support Library (BSL)
  Board-level routines supporting DSK-specific hardware
  Higher level of abstraction than CSL
  BSL functions make use of CSL
  Covers: Codec, LEDs, Switches, Flash
Chip Support Library (CSL)
  Low-level routines supporting on-chip peripherals (TI DSP)
  Covers: Serial Ports, EDMA, EMIF, Cache, Timers, etc.
  CSL helps with:
    1. Configuring peripherals
    2. Managing multiple resources (e.g. McBSP channels)
What is CSL and BSL?

The Chip Support Library (CSL) and Board Support Library (BSL) provide a set of data structures and functions to assist you in building a TMS320 DSP based system. The CSL supports the use of the on-chip peripherals by providing a set of low-level functions and data structures to ease their implementation. CSL has been designed to support multiple invocations, such as you might find when using the multiple serial ports or DMA channels found on TI's DSP devices. Simplified directions for using the CSL are located in the lab exercises and in the appendix of this workshop. For complete details, please refer to the C6000 CSL Reference Guide (SPRU401.pdf). BSL is built upon CSL. Essentially, BSL provides a higher-level system interface than CSL, taking into account the peripherals found on the board (but outside the DSP chip).
2-8
CSL Benefits
Why CSL and BSL? Here are a few reasons:
Increased Portability
Supports all C6000 DSPs. When changing from one device to another, no (or little) re-coding is required for peripherals. Where possible, TI has used the same APIs for both the C5000 and C6000 DSP families. This makes porting C code between processors much easier. Taking into account the cross-platform support of DSP/BIOS makes TI's software tools quite powerful. The goal is to provide compatibility at the _open(), _close(), _config() level. The initialization data structures may be different, but we have striven to make the functions as compatible as possible.
Easier to use
When TI's DSP third parties and customers use the same CSL/BSL functions, it becomes easier to use and understand code written by others.
Increased Reliability / Decreased Maintenance

When libraries such as CSL/BSL are made available to a large population of users, they become more reliable. The large user base quickly finds any coding bugs (if any).
Additionally, suggestions and recommendations come from a large base of knowledgeable users.
2-9
Generic Procedure for CSL and BSL

Here's a rough description for using the CSL (and BSL) libraries.
General Procedure for using CSL & BSL
1. Include Header Files
   Library and individual module header files
2. Declare variables
   Usually handle & configuration
3. Open peripheral
   Reserves resource; returns handle
4. Configure peripheral
   Applies your configuration to peripheral
5. Use peripheral
   Some periphs have use-oriented functions (like read and write)
Timer Example:
1. #include <csl.h>
   #include <csl_timer.h>
2. TIMER_Handle myHandle;
   TIMER_Config myConfig = {control, period, counter};
3. myHandle = TIMER_open(TIMER_DEVANY, ...);
4. TIMER_config(myHandle, &myConfig);
5. TIMER_start(myHandle);
To some this syntax will appear quite familiar. To those of us who spent most of our careers writing assembly language, though, this may be a new method of programming. These libraries provide two levels of support:
  A set of macros, functions, and data structures to ease symbolic programming of the resource (module).
  Basic resource management.
Let's see how the five parts of the timer example shown above correlate to these two ideas.
1. Include the appropriate header files. As most of you already know, whenever you use a library, there are usually one or more header files you have to include. In the case of CSL, you must first include the general CSL.H file, and then the header file for each module of functions.
2. The first two lines of the example define the required data structures. The data type called TIMER_Handle is defined in CSL. Essentially, it is used to point to one of the timers (as we will see later). Not all CSL modules require the use of a handle (i.e. pointer), only those peripherals where there is more than one resource: for example, timers, DMA, EDMA, McBSP, etc. The handle is used to specify which one of the timers you are working with. The second line defines a data structure. The variable name is myConfig and its data type is TIMER_Config. Again, this data type is defined in the CSL. This variable represents a C data structure that will be used to define a timer configuration. In other words, all the values you would need to program the timer peripheral are stored in this structure.
2 - 10
Note: The myHandle and myConfig names are arbitrary. We could have called them julie and frank. The choice is yours.
3. The third line of the example contains code that opens the peripheral. In this case, open means two things:
   The CSL code checks to see if the specified resource has already been opened. In other words, is the resource available? In this example, we have requested TIMER_DEVANY, which means we are asking for any available timer.
   If it is available, the timer resource is marked as being used and a pointer to the specific timer is returned as a TIMER_Handle.
   If the specific resource has already been opened, then the function returns INV for invalid. Your code could check if the INV error code has been returned. (We didn't do that in our example since there wasn't much room on the slide.)
   Where does the CSL keep track of opened resources? The CSL maintains a series of data elements (you could think of them as flags) to keep track of this information. Could you have done this? Yes, you probably could; but isn't it nice to have this code already written for you? Even further, if you later decide to use this code on another C6000 processor, you won't have to find and change all the resource management code. You only need to indicate to the CSL that you have switched CHIPs and the rest is done for you automatically.
4. The fourth line of the example configures the peripheral. In this case, the timer specified by myHandle is configured with the myConfig data structure. The actual CSL code copies each of the values in the myConfig data structure to the appropriate memory-mapped peripheral registers. (How many times have I had to write this kind of code in assembly? I'd be poring over the reference guide trying to type in all the bits and memory addresses without making a typo, which, of course, happened too often.)
5. Finally, the last line of code is an example of how to use the peripheral. There are many functions that allow you to easily use the peripheral. In the case of the timer, we can: start, stop, pause, etc. With the McBSP serial port, you could: read, write, reset, etc.
Even if you have never written code along these lines before, you will find it quickly becomes second nature. And if ease-of-use wasn't enough reason to use CSL, the reliability, portability, and optimization features of CSL will make you never want to go back to the old ways.
2 - 11
2 - 12
CSL and BSL Documentation
C6000 CSL Documentation

CSL Reference Guide: SPRU401.PDF
CCS Help files: CC_C64xW.HLP or CC_C67xW.HLP
Examine source code (header files) located at:
  C:\ti\c6000\bios\include\csl_<MODULE>.h
  e.g. C:\ti\c6000\bios\include\csl_mcbsp.h
BSL Documentation
DSK Board Support Library
  BSL Help file: C6416DSK.HLP/C6713DSK.HLP
  Review the header source files (*.h):
    C:\ti\c6000\dsk6416\include\dsk6416_<mod>.h
    e.g. C:\ti\c6000\dsk6416\include\dsk6416_aic23.h
2 - 13

How do Interrupts Work?
1. An interrupt occurs (DMA, HPI, Timers, Ext pins, etc.)
2. Sets a flag in a register
3. CPU acknowledges interrupt and:
   Stops what it is doing
   Turns off interrupts globally
   Clears flag in register
   Saves return-to location
   Determines which interrupt
   Calls ISR
4. ISR (Interrupt Service Routine):
   Saves context of system*
   Runs your interrupt code (ISR)
   Restores context of system*
   Continues where left off*
* Can be performed by DSP/BIOS dispatcher
When an interrupt occurs (step 1), the corresponding bit is set in the Interrupt Flag Register (step 2). If the interrupt is enabled as shown in the next figure, the CPU will automatically acknowledge and respond to the interrupt (step 3 above). Finally, the process reaches the Interrupt Service Routine (ISR), which you have to write. The ISR can be written in assembly or C, though nowadays most programmers choose to write their routines in C. There are a few methods of handling the context save and restore within your ISR; in this workshop we will show you the easiest, most robust method: the DSP/BIOS Hardware Interrupt Dispatcher.
2 - 14
Enabling Interrupts
Receiving Interrupts
[Figure: interrupt sources such as EXTINT4 (ext int pin 4) and XINT1 (McBSP1 xmit) pass through the Interrupt Flag (IFR), Individual Enable (IER), and Master Enable (GIE) stages to reach the C6000 CPU.]
Interrupt Flag Reg (IFR): bit set when int occurs
Interrupt Enable Reg (IER): enables individual ints
  IRQ_enable(IRQ_EVT_XINT2)
  IRQ_enable(IRQ_EVT_XINT1)
Global Interrupt Enable (GIE) bit in Control Status Reg: enables all IER-enabled interrupts
  IRQ_globalEnable()
  IRQ_globalDisable()
The above diagram shows the logic flow an interrupt signal goes through to reach the CPU. As you can see, there are two switches that must be enabled for the CPU to respond to an interrupt:
  Individual enable
  Global enable
The diagram also shows the CSL functions that can be used to enable interrupts. A couple of notes:
  CSL enumerates each interrupt event; that is, we give each one its own name. The example above demonstrates the event names for the McBSP2 and McBSP1 transmit interrupts.
  While it's handy to have CSL functions for enabling/disabling the global interrupt enable, you may not ever need to call them yourself. First, interrupts are automatically enabled when the DSP/BIOS scheduler is started (which occurs when you return from main). Second, the DSP/BIOS interrupt dispatcher handles all the necessary global interrupt enable/disable required when going into and out of an ISR. (The dispatcher even makes nesting interrupts very easy, even when writing ISRs in C.)
2 - 15
Lab 2
In this lab, we're going to use all of the information that we discussed in this chapter to send sine wave samples out through the McBSP connected to the AIC23 codec. We are going to use a HWI to synchronize the CPU to the codec rate.
Lab 2 - Output a Sinewave

[Figure: on the DSK, sineGen runs on the DSP's CPU inside a HWI and feeds the McBSP, which drives the AIC codec; the McBSP transmit interrupt triggers the HWI.]
Use hardware interrupts to output a sine wave to the codec
Here are the goals of this lab:
  Use the BSL for the DSK to open the codec
  Use the Configuration Tool to set up a HWI for the McBSP
  Write generated sine wave values to the codec
2 - 16
The Paperwork
To get started, we are going to take a moment to think about what we need to do in this lab and put it down on paper. The file below is a copy of what you will need to write to finish this lab. Take a moment to figure out the value for each blank line before moving on to enter the code on the computer. This way, you can think about what you are doing before you actually need to do it. In order to fill in the blanks, you may need some help with the DSK's BSL. The good news is that excellent documentation for the BSL comes with the DSK and Code Composer Studio. You just need to find it. Follow these steps to find the documentation for the BSL.
1. Open Code Composer Studio. Use the desktop icon to open CCS.
2. Open up the CCS Help File.
   Help -> Contents
   You should see something like this:
   Take a look at the codec API summary. This lists most of the information that you will need to complete the lab.
3. Please fill in the blanks in the file on the next page.
2 - 17
lab2.c

#include "__________.h"        // (1) need DSK specific header file
#include "__________.h"        // (1) need AIC23 specific header file

// Codec configuration settings
// which DSK are you using?
DSK6???_AIC23_Config config = { \
    0x0017,      /* 0 DSK6416_AIC23_LEFTINVOL    Left line input channel volume  */ \
    0x0017,      /* 1 DSK6416_AIC23_RIGHTINVOL   Right line input channel volume */ \
    headsetVol,  /* 2 DSK6416_AIC23_LEFTHPVOL    Left channel headphone vol      */ \
    headsetVol,  /* 3 DSK6416_AIC23_RIGHTHPVOL   Right channel headphone vol     */ \
    0x0011,      /* 4 DSK6416_AIC23_ANAPATH      Analog audio path control       */ \
    0x0000,      /* 5 DSK6416_AIC23_DIGPATH      Digital audio path control      */ \
    0x0000,      /* 6 DSK6416_AIC23_POWERDOWN    Power down control              */ \
    0x0043,      /* 7 DSK6416_AIC23_DIGIF        Digital audio interface format  */ \
    0x0081,      /* 8 DSK6416_AIC23_SAMPLERATE   Sample rate control             */ \
    0x0001       /* 9 DSK6416_AIC23_DIGACT       Digital interface activation    */ \
};

// Declare BSL Handle for AIC23 Codec
____________________           // (3)

/*
 * main() - Main code routine, initializes BSL and a hardware interrupt
 */
void main()
{
    // Initialize the board support library, this must be called first
    ____________________       // (4)

    // Open the codec
    ____________________       // (5)

    // Enable the McBSP interrupt for IRQ_EVT_XINT2 (for 6416 DSK)
    // or Enable the McBSP interrupt for IRQ_EVT_XINT1 (for 6713 DSK)
    ____________________       // (6)

    // Invoke DSP/BIOS scheduler
    return;
}

/*
 * myHWI() - ISR called when the McBSP wants more data
 */
void myHWI(void)
{
    static short mySample;
    static int leftChan = 1;

    if(leftChan) {
        mySample = sineGen();
        leftChan = 0;
    }
    else {
        leftChan = 1;
    }

    // Send a sample to the McBSP (which then sends it to the AIC23 codec)
    ____________________       // (7)
}
2 - 18
lab2.c (hints)
The C6416 (or C6713) DSK Board Support Library is divided into several modules, each of which has its own include file. First of all, the file dsk6416.h (or dsk6713.h) must be included in every program that uses the BSL. You also need to include the header file for the AIC23 codec since this is the BSL module used in this exercise.
We created a structure called config which has the parameters needed to initialize the AIC23 codec. BSL creates a new datatype for this configuration information. We left part of this blank because the remaining three characters are specific to the DSK you are using (either C6416 or C6713): DSK6???_AIC23_Config. By the way, to understand the values we chose for the AIC initialization, please refer to the DSK's help file.
You need to declare a handle for the AIC23 codec. This step is similar to part of Step 2 as described for the generic CSL procedure (page 2-10). Similar to above, BSL creates a new datatype for the handle of the AIC23 codec.
The BSL library contains a BSL function to initialize itself. It must be called before any other BSL function. Take a look at this function in the DSK's help file.
Next, you need to open the codec. The BSL function that opens the codec returns a handle. This step is similar to part of Step 3 as described for the generic CSL procedure (page 2-10). FYI, the BSL function that opens the codec actually does a number of things:
  Opens both of the McBSPs it requires
  Configures both McBSPs
  Configures the AIC23
  Returns a handle to the AIC23 (which essentially points to the two McBSPs)
Looking at the block diagram for this lab (page 2-16), we can see that we're using the McBSP transmit interrupt to tell the CPU when to create another sine wave value and output it to the codec. To allow this to happen, we must enable the McBSP transmit interrupt as shown on page 2-15. In this same diagram we listed the CSL function used to enable individual interrupts. Remember, though, that each DSK (C6416 vs C6713) uses a different McBSP to talk to the codec. In other words, you need to choose the correct transmit event name based upon which DSK you are using. (Rather than making you look up the event names for the McBSP transmit interrupts, we have provided the names for you in the code's comments.)
Note: The GIE bit is enabled automatically when you exit main() and return to the DSP/BIOS scheduler.
Once more, look for a BSL function which writes a sample to the codec. You should be able to find this in the DSK help file.
Note: What really happens is that the codec write function writes the value to the McBSP data transmit register (DXR), which then sends it to the AIC23 codec, which then converts it to the analog signal which we hear.
2 - 19
lab2.c (answers)
Note: For the C6713 DSK, just replace 6416 with 6713 in all the answers below.

#include "dsk6416.h"           // need DSK specific header file
#include "dsk6416_aic23.h"     // need AIC23 specific header file

// Codec configuration settings
// which DSK are you using?
DSK6416_AIC23_Config config = { \
    0x0017,      /* 0 DSK6416_AIC23_LEFTINVOL    Left line input channel volume  */ \
    0x0017,      /* 1 DSK6416_AIC23_RIGHTINVOL   Right line input channel volume */ \
    headsetVol,  /* 2 DSK6416_AIC23_LEFTHPVOL    Left channel headphone vol      */ \
    headsetVol,  /* 3 DSK6416_AIC23_RIGHTHPVOL   Right channel headphone vol     */ \
    ...
};

// Declare BSL Handle for AIC23 Codec
DSK6416_AIC23_CodecHandle hCodec;

/*
 * main() - Main code routine, initializes BSL and a hardware interrupt
 */
void main()
{
    // Initialize the board support library, this must be called first
    DSK6416_init();

    // Open the codec
    hCodec = DSK6416_AIC23_openCodec(0, &config);

    // Enable the McBSP interrupt for IRQ_EVT_XINT2 (for 6416 DSK)
    // Enable the McBSP interrupt for IRQ_EVT_XINT1 (for 6713 DSK)
    IRQ_enable(IRQ_EVT_XINT2);     // or IRQ_enable(IRQ_EVT_XINT1);

    // Invoke DSP/BIOS scheduler
    return;
}

/*
 * myHWI() - ISR called when the McBSP wants more data
 */
void myHWI(void)
{
    static short mySample;
    static int leftChan = 1;

    if(leftChan) {
        mySample = sineGen();
        leftChan = 0;
    }
    else {
        leftChan = 1;
    }

    // Send a sample to the McBSP (which then sends it to the AIC23 codec)
    DSK6416_AIC23_write(hCodec, mySample);
}
2 - 20
Lab 2
Lab2 Procedure
4. Create a new project called LAB2.PJT in the C:\c60001day\labs\lab2 subdirectory.
Project -> New
You will encounter the Project Creation dialog. Fill in the Project Name and Location as shown below:
If you are using the C6713 DSK, the Target field should read: TMS320C67XX
5. You can also use the button to specify the correct path.
Create a CDB file

As mentioned during the discussion, configuration database files (*.CDB), created by the Config Tool, control a range of CCS capabilities. In this lab, the CDB file will be used to automatically create the reset vector and perform memory management.
6. Create a new CDB file (DSP/BIOS Configuration) as shown below:
2 - 21
Lab 2
7. CCS allows you to select a template configuration file. Since no simulator-specific CDB template is available, we'll choose the dsk6416.cdb or dsk6713.cdb template.
Select the dsk6416.cdb (or dsk6713.cdb) template
If you are using the C6713 DSK, please choose its template file: dsk6713.cdb
Note:
In some TI classrooms you may see two or more tabs of CDB templates; e.g. TMS6xxx, TMS54xx, etc. If you experience this, just choose the C6xxx tab and make your selection.
The CDB templates automate the process of setting up numerous system objects/parameters. Those shown above are shipped with the C6416 DSK. You can create your own CDB templates: just copy a CDB file you have created to the directory where the above files are stored (C:\ti\c6000\bios\include).
8. While there are many objects displayed in the configuration editor, we only need to configure one of them (which we'll do starting in step 15). The other dsk6416 (or dsk6713) defaults will work fine.
9. Save your CDB file as LAB2.CDB in the C:\c60001day\labs\lab2 directory.
File -> Save As
2 - 22
Lab 2
Adding files to the project

10. You can add files to a project in one of three ways:
Select the menu Project -> Add Files to Project
Right-click the project icon in the Project Explorer window and select Add Files
Drag and drop files from Windows Explorer onto the project icon
Using one of these methods, add the following files from C:\c60001day\labs\lab2 to your project:
LAB2.C LAB2.CDB LAB2cfg.CMD block_sine.c
Add BSL Library to the project

11. We also need to add the DSK BSL library file to our project. Use one of the methods from step 10 to add one of the two files to your project. (Please choose the library appropriate to your DSK). c:\ti\c6000\dsk6416\lib\dsk6416bsl.lib or c:\ti\c6000\dsk6713\lib\dsk6713bsl.lib
Edit Files
12. Open lab2.c for editing by double-clicking on it in the Project Explorer pane.
13. Use your answers from the paperwork exercise back on page 18 to make the appropriate changes to lab2.c. You should find a place commented in the file to make each of the changes (one change has question marks for you to replace). If you have any questions, feel free to ask your instructor for help.
14. When you're done, save lab2.c.
Configure a HWI
15. Open lab2.cdb.
16. Navigate to the Scheduling folder inside the Configuration Tool.
17. Inside this folder, find the "HWI - Hardware Interrupt Service Routine Manager" and open it by clicking on the little + sign next to it.
18. You can pick any interrupt, from HWI_INT4 to HWI_INT15, that you want to use for the lab. The lab instructions are going to use HWI_INT12 (this was an arbitrary choice).
2 - 23
Lab 2
19. Open the properties of the interrupt that you chose by right-clicking and choosing Properties.
20. Change the interrupt source of the HWI interrupt number that you have selected to:
C6416: McBSP 2 Transmit Interrupt (MCSP_2_Transmit)
C6713: McBSP 1 Transmit Interrupt (MCSP_1_Transmit)
2 - 24
Lab 2
21. Change the function property of the interrupt to call the myHWI() function (defined in lab2.c). You will need to add an underscore in front of the function name since it is a C function. Here's what it should look like:
(As in the previous step, if you are using the C6713 DSK, the interrupt source should be: MCSP_1_Transmit)
Note: The TI C compiler (as with most compilers) differentiates C source labels from assembly source labels by prepending an _ to all C labels as it generates assembly code. In this dialog box, the HWI function property requires an assembly label; hence, we need the underscore.
22. Click on the Dispatcher tab in the interrupt properties. We want to use DSP/BIOS's HWI Dispatcher to take care of everything (i.e. context save/restore) for the ISR. Enable the Dispatcher by clicking on the check box:
23. When you're all done with the changes, click on OK to save the HWI_INT properties.
24. Save the changes that you made to the .cdb file. Go ahead and close the .cdb file.
2 - 25
Lab 2
Build and Run the code

25. Build your code.
Project -> Build (or click the Build toolbar button)
26. If your program doesn't load automatically, then load the program.
File -> Load Program
27. Run the program.
Debug -> Run (or click the Run toolbar button)
If everything is working correctly, you should hear a 500 Hz tone coming from the speaker or headphones connected to the DSK. If you don't, make sure everything is connected to the DSK correctly. If you're having trouble, ask your facilitator for help.
2 - 26
Lab2a (optional)
Now that you've successfully got the DSK to spit out some sound, wouldn't you like to be able to turn it off? Is there anything on the DSK that we might be able to use as a switch to turn the sine wave on and off? Yeah, the DIP switches could be used to do that. Now, if we only had a function that made it easy to read one of the DIP switches. Do you think the BSL might have something like that? If so, you could simply read the DIP switch then decide whether to send a new sine wave sample or a 0 to the codec, effectively turning the sine wave off.
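Take Home Lab 2a: Use DIP Switch
[Block diagram: on the DSK, sineGen runs in the CPU HWI, triggered by the McBSP transmit interrupt; samples go from the DSP through the McBSP to the AIC23 codec]
Use a DIP switch on the DSK to turn the sine tone on/off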
Here are the basic steps that you'll need to perform in order to do this:
In the HWI routine, right before you write the new sine wave value to the codec, read a DIP switch.
If the DIP switch is on, then write the sine wave value that you calculated.
If the DIP switch is off, simply write a 0 to the codec.
Don't forget to add any necessary header files to your C file.
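The steps above can be sketched in plain C that runs on a PC. This is only an illustration: read_dip(), codec_write() and the fake sineGen() are hypothetical stand-ins for the BSL DIP-switch and codec functions you will look up in the DSK help file, and the switch polarity (1 = on) is an assumption. The left/right channel toggle from myHWI() is left out to keep the sketch short.

```c
#include <assert.h>

/* Hypothetical stand-ins for the BSL calls; on the DSK you would use the
   DIP-switch read and codec write functions found in the DSK help file. */
static int   dip_state  = 1;   /* 1 = switch on, 0 = switch off (assumed polarity) */
static short last_write = 0;   /* last value "sent" to the codec */

static int read_dip(int switchNum)
{
    assert(switchNum >= 0 && switchNum < 4);
    return dip_state;
}

static void codec_write(short sample) { last_write = sample; }

/* Fake sine generator so the sketch runs on a PC. */
static short sineGen(void) { return 1234; }

/* Same shape as myHWI(): compute a sample, then gate it with the switch. */
void myHWI_gated(void)
{
    short mySample = sineGen();

    if (!read_dip(0)) {
        mySample = 0;          /* switch off: send silence instead */
    }
    codec_write(mySample);
}
```

On the real board, the only changes should be swapping the stand-ins for the BSL calls and including the matching header file.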
2 - 27
Lab 2 Debrief
1. First, let's quickly review the values we filled in.
2. How much differs between the C6713 and C6416 solutions?
3. What would be the benefit if we could eliminate hardware-specific references in our code?

1. Please refer to the solutions file for the results.
2. The differences are:
The BSL calls we had you complete in Lab2.c.
The CDB template file.
The reference to the BSL library.
3. If we eliminate the hardware-specific references, we could write and maintain a single piece of code for all C6000 platforms. This highlights two key points:
One, the consistency between families of the C6000 architecture makes porting code between them very easy. Even more important, if you learn one family, you've basically learned all of them.
Two, the increased modularity and reuse of a single code base across multiple families usually enhances the stability and robustness of the code.
DSP/BIOS device drivers (SIO/PIP/IOM) are discussed briefly in the next chapter. These can allow us to achieve hardware independence in our code.
2 - 28
eXpressDSP Tools
Introduction
TI provides solutions to DSP engineers facing ever-increasing complexity in their systems: efficient, capable code libraries, I/O driver schemes, certified Algorithm Standards, and extraordinarily robust starter applications (reference designs, so to speak). While a single chapter in a one-day workshop cannot begin to describe the many details of these tools and libraries, hopefully we can give you a sense of what they offer and how they might help you finish your designs more quickly.
Outline
Overview of eXpressDSP
  DSP/BIOS Scheduler
  Real Time Analysis
  Device Drivers (IOM)
  Algorithm Standard (XDAIS)
  Reference Frameworks (RF)
eXpressDSP Demo (based on RF3)
C6416/C6713 DSK One-Day Workshop - eXpressDSP Tools
3-1
What is eXpressDSP?
Chapter Topics
eXpressDSP Tools  3-1
  What is eXpressDSP?  3-3
  DSP/BIOS  3-4
    DSP/BIOS Scheduler  3-4
    Real-Time Analysis  3-11
    Device Drivers (IOM)  3-13
  TMS320 DSP Algorithm Standard (XDAIS)  3-19
    Introduction  3-19
    XDAIS (background info)  3-20
    Thousands of XDAIS Compliant Algorithms  3-24
  Reference Frameworks  3-25
  RF3 Demo  3-27
    Inspect the .cdb file  3-28
    Use Real-time Analysis Tools  3-35
  Flashing RF3  3-39
    Create the Flash Image  3-39
    Use Flashburn to Burn the Image  3-41
    Flashing POST  3-42
  Lab/Demo Debrief  3-44
  eXpressDSP Summary  3-44
3-2
What is eXpressDSP?
A premier, open DSP software strategy for TI's leadership TMS320 DSP family
[Diagram: the eXpressDSP components: CCS, DSP/BIOS, XDAIS, Target Content, and the 3rd Party Network]
3-3
DSP/BIOS
DSP/BIOS
DSP/BIOS Consists Of:
Real-time analysis tools: allow the application to run uninterrupted while displaying debug data
Real-time scheduler: preemptive thread management kernel
Real-time I/O (drivers): allows two-way communication between threads, or between threads and hardware
DSP/BIOS Scheduler
DSP/BIOS Thread Types

(Thread types are listed from highest to lowest priority.)
HWI: Hardware Interrupts
  Used to implement the 'urgent' part of a real-time event
  Triggered by hardware interrupt
  HWI priorities set by hardware
SWI: Software Interrupts
  Use SWI to perform HWI 'follow-up' activity
  SWIs are 'posted' by software
  Multiple SWIs at each of 15 priority levels
TSK: Tasks
  Use TSK to run different programs concurrently under separate contexts
  TSKs are usually enabled to run by posting a 'semaphore' (a task signaling mechanism)
IDL: Background
  Multiple IDL functions
  Runs as an infinite loop, like a traditional while loop
3-4
DSP/BIOS
DSP/BIOS Scheduler (background information)

Example Multi-Rate System
Previous Requirement: Let's say we originally had only one algorithm (Algo1) to execute on the TI DSP.
New Requirement: Then we're asked to add a second algorithm (Algo2), independent of Algo1.
Issues:
  Do we have enough bandwidth (MIPS)?
  Will one routine conflict with the other?

Possible Solution: While Loop
main { while(1) { Algo1 Algo2 } }
Put each routine into an endless loop under main
Potential Problems:
  Algos run at different rates: Algo1: 8 kHz, Algo2: 4 Hz
  What if one algorithm starves the other for recognition or delays its response?
3-5
DSP/BIOS

Possible Solution: Use Interrupts (HWI)
main { while(1); }
Timer1_ISR { Algo1 }
Timer2_ISR { Algo2 }
An interrupt-driven system places each function in its own ISR

          Period    Compute   CPU Usage
Algo1:    125 µs    7 µs      6%
Algo2:    250 ms    100 ms    40%
Total:                        46%

[Timeline figure: running vs. idle over time, marking the point where an interrupt is missed]
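The CPU usage figures on this slide are simply compute time divided by period. The short sketch below reproduces that arithmetic and also shows why an interrupt gets missed when Algo2 cannot be preempted. The function names are made up for this illustration.

```c
#include <assert.h>

/* CPU load (percent) of a periodic routine: compute time / period. */
double cpu_load_percent(double compute_us, double period_us)
{
    assert(period_us > 0.0);
    return 100.0 * compute_us / period_us;
}

/* How many periods of a fast routine elapse while a slow routine runs
   to completion without being preempted. */
long periods_blocked(double blocking_us, double fast_period_us)
{
    assert(fast_period_us > 0.0);
    return (long)(blocking_us / fast_period_us);
}
```

With the slide's numbers, Algo1 needs 7/125 = 5.6% (shown rounded to 6%) and Algo2 needs 100/250 = 40%. There is plenty of bandwidth overall, yet while Algo2's ISR runs for 100 ms, 800 of Algo1's 125 µs periods go by unserviced.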
Allow Preemptive Interrupts - HWI
main { while(1); }
Timer1_ISR { Algo1 }
Timer2_ISR { Algo2 }
Nested interrupts allow hardware interrupts to preempt each other
[Timeline figure: running vs. idle over time]
Use the DSP/BIOS HWI dispatcher for context save/restore, and to allow preemption
Reasonable approach if you have a limited number of interrupts/functions
Limitation: the number of HWIs and their priorities are statically determined, and there is only one HWI function for each interrupt
3-6
DSP/BIOS
Software Interrupts (SWI)

Use Software Interrupts - SWI
main { // return to O/S; }
[Diagram: DSP/BIOS schedules Algo1 and Algo2]
Make each algorithm an independent software interrupt
SWI scheduling is handled by DSP/BIOS
  HWI function triggered by hardware
  SWI function triggered by software, e.g. a call to SWI_post()
Why use a SWI?
  No limitation on number of SWIs, and priorities for SWIs are user-defined!
  SWI can be scheduled by hardware or software event(s)
  Defer processing from HWI to SWI

HWIs signaling SWIs
[Diagram: the interrupt triggers the HWI, which runs only the urgent code (read serial port) with interrupts disabled and then calls SWI_post(); the SWI processes the data (filter, etc.), rather than interrupts staying disabled for all this time]

HWI
  Fast response to interrupts
  Minimal context switching
  High priority only
  Can post SWI
  Could miss an interrupt while executing ISR
SWI
  Latency in response time
  Context switch performed
  Selectable priority levels
  Can post another SWI
  Execution managed by scheduler
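To make the "defer processing from HWI to SWI" idea concrete, here is a small PC-runnable model. It is not DSP/BIOS code: hwi_sim(), swi_sim() and scheduler_sim() are hypothetical names, and the swiPosted flag only stands in for what a real SWI_post() call asks the scheduler to do.

```c
#include <assert.h>

#define BUF_LEN 4

static short rxBuf[BUF_LEN];   /* samples captured by the "HWI" */
static int   rxCount   = 0;
static int   swiPosted = 0;    /* stands in for SWI_post(): work is pending */
static long  processed = 0;    /* result of the deferred processing */

/* "HWI": urgent part only. Grab the sample; post follow-up work
   once a full block has been collected. */
void hwi_sim(short sample)
{
    assert(rxCount < BUF_LEN);
    rxBuf[rxCount++] = sample;
    if (rxCount == BUF_LEN) {
        swiPosted = 1;
    }
}

/* "SWI": the longer processing, run later by the scheduler. */
static void swi_sim(void)
{
    int i;
    for (i = 0; i < BUF_LEN; i++) {
        processed += rxBuf[i];
    }
    rxCount = 0;
}

/* Hypothetical scheduler hook: runs posted work; returns 1 if it ran. */
int scheduler_sim(void)
{
    if (!swiPosted) {
        return 0;
    }
    swiPosted = 0;
    swi_sim();
    return 1;
}
```

The point of the split is that hwi_sim() stays short no matter how expensive swi_sim() becomes, so the time spent with interrupts disabled does not grow with the algorithm.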
3-7
DSP/BIOS
Tasks (TSK)
Another Solution: Tasks (TSK)
main { // return to O/S; }
[Diagram: DSP/BIOS schedules Algo1 and Algo2]
DSP/BIOS tasks (TSK) are similar to SWI, but offer additional flexibility
TSK is more like a traditional O/S task
Tradeoffs:
  SWI context switch is faster than TSK
  TSK module requires more code space
  TSKs have their own stack
User preference and system needs usually dictate the choice. It's easy to use both!

SWIs and TSKs
SWI: SWI_post() -> start -> run to completion -> end
  Similar to hardware interrupt, but triggered by SWI_post()
  All SWIs share the system software stack
TSK: start -> SEM_pend() pause (blocked state) -> SEM_post() -> end
  SEM_post() triggers execution
  Each TSK has its own stack, which allows it to pause
3-8
DSP/BIOS
Enabling BIOS Return from main()

main { // return to BIOS }
[Diagram: DSP/BIOS schedules Algo1 and Algo2]
The while() loop is removed
main() returns to BIOS IDLE, allowing BIOS to schedule events, transfer info to host, etc.
A while() loop in main() will not allow BIOS to activate
BIOS Scheduler In Action

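Priority Based Thread Scheduling
[Timeline figure: threads listed from highest to lowest priority: HWI 2, HWI 1, SWI 3, SWI 2, SWI 1, MAIN, IDLE. Interrupts (int1, int2) and posts (post1, post2, post3, e.g. SWI_post(&swi2)) start higher-priority threads; each return (rtn) hands control back down.]
User sets the priority...BIOS does the scheduling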
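The rule behind this picture is "always run the highest-priority thread that is ready". A minimal sketch of that rule follows, using the thread names from the slide. The ready[] array and the function names are assumptions made for this example, not the DSP/BIOS implementation.

```c
#include <assert.h>

#define NUM_LEVELS 7

/* Threads from the slide, lowest to highest priority. */
enum { IDLE, MAIN, SWI1, SWI2, SWI3, HWI1, HWI2 };

static int ready[NUM_LEVELS] = { 1, 0, 0, 0, 0, 0, 0 };  /* IDLE is always ready */

/* An interrupt or a post makes a thread ready to run. */
void make_ready(int level)
{
    assert(level >= 0 && level < NUM_LEVELS);
    ready[level] = 1;
}

/* A thread that returns (rtn) is no longer ready. */
void finish(int level)
{
    assert(level > IDLE && level < NUM_LEVELS);
    ready[level] = 0;
}

/* The scheduling rule: pick the highest-priority ready thread. */
int next_to_run(void)
{
    int level;
    for (level = NUM_LEVELS - 1; level > IDLE; level--) {
        if (ready[level]) {
            return level;
        }
    }
    return IDLE;
}
```

For example, if SWI1 and SWI3 are both posted, SWI3 runs first, and SWI1 only gets the CPU after SWI3 (and any HWI that arrives meanwhile) has returned.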
3-9
DSP/BIOS
SWI Properties & Setting their Priorities

SWI Properties
Managing SWI Priority
Drag and drop SWIs to change priority
Equal-priority SWIs run in the order that they are posted
3 - 10
DSP/BIOS
Real-Time Analysis

Real-time analysis tools Allows application to run uninterrupted while displaying debug data
Real-Time Analysis Tools

Gather data on target (3-10 CPU cycles)
Send data during BIOS IDLE (100s of non-critical cycles)
Format data on host (1000s of host PC cycles)
Data gathering does NOT stop target CPU
Execution Graph
Software logic analyzer Debug event timing and priority
CPU Load Graph

Analyze time NOT spent in IDLE
3 - 11
DSP/BIOS
Built-in Real-Time Analysis Tools

Statistics View
Profile routines w/o halting the CPU Capture & analyze data without stopping CPU
Message LOG
Send debug msgs to host
Doesn't halt the DSP
Deterministic, low DSP cycle count
More efficient than traditional printf()
LOG_printf(&logTrace, "addSine ENabled");
RTDX: Real-Time Data Exchange

RTDX enables non-obtrusive two-way communication between the host PC and the DSP (during IDLE)
Transfer speed depends on JTAG bandwidth, connection type (parallel vs. XDS) and DSP activity level
Transfers are made via RTDX calls in the DSP application code
[Diagram: on the PC, displays (TI, 3rd party, or user) connect to CCS; CCS talks over JTAG to the emulation logic (EMU) of the TMS320 DSP, where the RTDX library is linked with the user code]
3 - 12
DSP/BIOS
Device Drivers (IOM)

Access Hardware Directly or Use a Device Driver?
Directly Accessing Hardware

[Diagram: App calls the DSK6416_AIC23_ functions, which drive the McBSP and the codec]
void audioLoopBack() {
    DSK6416_AIC23_CodecHandle hCodec;
    Uint32 buf[64];
    int N;

    DSK6416_init();
    /* Start the codec */
    hCodec = DSK6416_AIC23_openCodec(0, &config);
    while (1) {
        for (N = 0; N < 64; N++) {
            /* read takes a pointer; retry until a sample is available */
            while (!DSK6416_AIC23_read(hCodec, &buf[N]));
            /* retry until the codec accepts the sample */
            while (!DSK6416_AIC23_write(hCodec, buf[N]));
        }
    }
}
App writes to hardware directly using the specific target's BSL functions
Every application needs to be customized to the hardware: you must change each instance of DSK6416_xxx to another function call every time you port the code
Portability suffers
3 - 13
DSP/BIOS
Abstracting Hardware with Drivers

[Diagram: App (SWI or TSK) -> SIO or PIP -> IOM -> McBSP -> Codec]
Device Drivers standardize the interface between the Application and the H/W
Application programmer only has to know PIP or SIO (no matter what H/W is connected)
The H/W can be changed without changing the Application (only need to change the IOM included in the project)
Therefore, Drivers (SIO/PIP with IOM) insulate the Application from the hardware's details
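The idea of insulating the application from the hardware can be illustrated with a table of function pointers. This is only a sketch of the principle in plain C: the AudioDriver type and both "drivers" are hypothetical, and the real IOM, SIO and PIP interfaces are richer than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver interface: the application only sees these two calls. */
typedef struct {
    int (*read)(short *sample);
    int (*write)(short sample);
} AudioDriver;

/* One hypothetical "board": a loopback device that returns the last value written. */
static short loopReg = 0;
static int loop_read(short *sample)  { *sample = loopReg; return 1; }
static int loop_write(short sample)  { loopReg = sample;  return 1; }
const AudioDriver loopbackDriver = { loop_read, loop_write };

/* Another hypothetical "board": always reads silence and counts writes. */
static int nullWrites = 0;
static int null_read(short *sample)  { *sample = 0; return 1; }
static int null_write(short sample)  { (void)sample; nullWrites++; return 1; }
const AudioDriver nullDriver = { null_read, null_write };

/* Application code: unchanged no matter which driver is passed in. */
int copy_samples(const AudioDriver *drv, int count)
{
    int n, done = 0;
    short s;

    assert(drv != NULL && count >= 0);
    for (n = 0; n < count; n++) {
        if (drv->read(&s) && drv->write(s)) {
            done++;
        }
    }
    return done;
}
```

Porting copy_samples() to new hardware means supplying a different AudioDriver table, which mirrors the slide's point that only the IOM included in the project has to change.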
DSP/BIOS I/O Models (background information)

BIOS I/O Models
                                    Stream model    Pipe model
DSP/BIOS thread types               TSK or SWI      SWI
DSP/BIOS-provided class drivers     SIO             PIP
Mini-driver                         IOM             IOM

Any mini-driver (IOM) can be used with any DSP/BIOS I/O model
Application programmer chooses the preferred class driver
Interface is consistent regardless of which device (mini-driver) is connected
Software interface doesn't change, even if you change the IOM device
3 - 14
DSP/BIOS
Pipe I/O Model (PIP)
Using DSP/BIOS PIP Driver

void audioLoopBack() {
    // get the full buffer from the receive PIP
    PIP_get(&pipRx);
    // get the empty buffer from the transmit PIP
    PIP_alloc(&pipTx);
    // (process: copy the receive buffer into the transmit buffer)
    // Put the transmit buffer; Free the receive buffer
    PIP_put(&pipTx);
    PIP_free(&pipRx);
}
Think of PIP (pipe) as a buffer manager with built-in signaling:
  The reader gets signaled when data is available
  The writer gets signaled when the buffer is emptied
Use it between one SWI and another SWI, or between a SWI and hardware
A common interface for all C5000 & C6000 DSPs
Bottom Line: Application Code never changes even if you change H/W
Stream I/O Model (SIO)
Using SIO Drivers

// Run forever looping-back buffers
for (;;) {
    // Reclaim full buffer from the input stream
    nmadus = SIO_reclaim(inStream, (Ptr *)&inbuf, NULL);
    // Reclaim empty buffer from output stream and reissue
    SIO_reclaim(outStream, (Ptr *)&outbuf, NULL);
    SIO_issue(outStream, inbuf, nmadus, NULL);
    // Issue an empty buffer to the input stream
    SIO_issue(inStream, outbuf, SIO_bufsize(inStream), NULL);
}
SIO (Stream I/O) is another DSP/BIOS device driver methodology
Think of issuing and reclaiming buffers from a stream
Bottom Line: Application Code doesn't change even if hardware does

Some further notes about SIO:
SIO handles queuing of buffers to/from devices
Issue a buffer to a stream:
  Issue a full buffer to a transmit stream
  Or, an empty buffer to a receive stream
Conversely, reclaim a stream buffer:
  Full from a receive stream
  Empty from a transmit stream
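The issue/reclaim pattern is essentially a queue of buffer pointers that the application hands to a stream and later takes back. The following PC-runnable model shows just that bookkeeping. Stream, stream_issue() and stream_reclaim() are hypothetical names for this sketch. Unlike this model, which returns -1 when nothing is queued, a real reclaim call waits for the device to finish with a buffer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFS 4

/* Hypothetical stream: a FIFO of buffer pointers owned by the "driver". */
typedef struct {
    short *queue[MAX_BUFS];
    int head;
    int count;
} Stream;

/* Hand a buffer to the stream. Returns 0 on success, -1 if the queue is full. */
int stream_issue(Stream *s, short *buf)
{
    assert(s != NULL && buf != NULL);
    if (s->count == MAX_BUFS) {
        return -1;
    }
    s->queue[(s->head + s->count) % MAX_BUFS] = buf;
    s->count++;
    return 0;
}

/* Take back the oldest buffer. Returns 0 on success, -1 if nothing is queued. */
int stream_reclaim(Stream *s, short **buf)
{
    assert(s != NULL && buf != NULL);
    if (s->count == 0) {
        return -1;
    }
    *buf = s->queue[s->head];
    s->head = (s->head + 1) % MAX_BUFS;
    s->count--;
    return 0;
}
```

Buffers come back in the order they were issued, which is why priming a stream with two buffers lets the device fill one while the application works on the other.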
3 - 15
DSP/BIOS
And here's a slightly larger code sample that we couldn't fit on the slide.
// Prime the stream
buf0 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
buf1 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
buf2 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
buf3 = (Ptr)MEM_calloc(0, 64, BUFALIGN);
// Issue the two empty buffers to the input stream
SIO_issue(inStream, buf0, SIO_bufsize(inStream), NULL);
SIO_issue(inStream, buf1, SIO_bufsize(inStream), NULL);
// Issue the two empty buffers to the output stream
SIO_issue(outStream, buf2, SIO_bufsize(outStream), NULL);
SIO_issue(outStream, buf3, SIO_bufsize(outStream), NULL);
// Run forever looping-back buffers
for (;;) {
    // Reclaim full buffer from the input stream
    nmadus = SIO_reclaim(inStream, (Ptr *)&inbuf, NULL);
    // Reclaim empty buffer from output stream and reissue
    SIO_reclaim(outStream, (Ptr *)&outbuf, NULL);
    SIO_issue(outStream, inbuf, nmadus, NULL);
    // Issue an empty buffer to the input stream
    SIO_issue(inStream, outbuf, SIO_bufsize(inStream), NULL);
}
Please refer to the code examples that ship with the DSK for a full, working example using this code.
3 - 16
DSP/BIOS
A closer look at IOM
And here's a closer look at the functions and data structures that make up an IOM driver.
Mini-Driver Interface (IOM)

Maximum Reuse and Portability
One I/O mini-driver (IOM) interface to support all TI Class drivers.
IOM Interface Consists Of:

Functions:
  init function
  IOM_mdBindDev
  IOM_mdUnBindDev
  IOM_mdControlChan
  IOM_mdCreateChan
  IOM_mdDeleteChan
  IOM_mdSubmitChan
  interrupt routine (isr)
Data Structures:
  BIOS Device Table
  IOM function table
  Dev params
  Global Data Pointer (device inst. obj.)
  Channel Params
  Channel Instance Obj.
  IOM_Packet (aka IOP)
Driver Development Kit (DDK)
Driver Developer Kit (DDK) Support

Video Platform* Capture / Display 6711 DSK 6713 DSK 6416 VT1420 6416 TEB 6416 DSK DM642 EVM Beta Beta PCM3002 AIC23 Beta AIC23 Beta AIC23 3rd Party Solution External PCI EMAC External McBSP AD535 AIC23 McASP H/W UART (External) S/W UART Utopia
* We have only included C6000 systems in this table
Provided royalty free
Requires CCS v2.2 or greater
Search for DDK on the TI website to download
Table key: DDK v1.0, DDK v1.1, DDK v1.2 (3Q03)
3 - 17
DSP/BIOS
Software Foundation Layers

As you can see, the software libraries and models discussed over the past two chapters build upon each other:
TI Software Foundation Model

DSP/BIOS Communications Layer SIO / PIP IOM
Board Support Layer
DSK's BSL
Chip Support Layer
TI CSL
3 - 18
TMS320 DSP Algorithm Standard (XDAIS)

Introduction
Buying Algorithms
Why is it hard to integrate someone else's algo?
1. Don't know how fast it runs or how much memory it uses.
2. How can I adapt the algorithm to meet my needs?
3. Will the function names conflict with other code in the system?
4. Will it use memory or peripherals needed by other algos?
5. How can I run the same algo on more than one channel at a time? (How can I prevent variables from conflicting?)
6. How many interfaces (APIs) do I have to learn?
Traditional Solution: When I buy an algorithm, I need the source code (and lots of development time) or I can't guarantee it will work. But, purchasing source code costs a lot of money!
TI TMS320 DSP Algorithm Standard

ALGORITHM PRODUCERS TEXAS INSTRUMENTS TMS320 DSP Algorithm Standard Specification (XDAIS) Rules & Guidelines Applied to Algorithm Software Modules Programming Rules Standard Interface Defined by TI Algorithm Packaging Algorithm Performance SYSTEM INTEGRATORS
Algorithm
Application Off-the-shelf DSP content
Write once, deploy widely
Ease of integration Purchase once, use widely
3 - 19
Overview of the XDAIS Rules

General Good Citizen Software Coding Rules
C callable & reentrant
Naming conventions enforced to avoid symbol clashes
No direct peripheral interface or memory allocation
Relocatable data and code in both static and dynamic systems
No thread scheduling nor any awareness of controlling app
Pure data transducer; cannot alter the DSP environment
Standard Algorithm Interface defined by TI

Defines a memory management protocol between application and algorithm for all compliant algorithm modules
Packaging Rules
All algorithms packaged and delivered in a consistent format
Documentation Rules
Algorithms must provide basic memory and performance information to enable apples to apples comparisons and to aid system designers with algorithm integration
XDAIS (background info)

Let's look at the six problems XDAIS solves, step by step.
XDAIS Solution (1)
1. Don't know how fast it runs or how much memory it uses.
Strict rules on vendor-provided documentation (PDF file).
3 - 20
XDAIS Solution (2)

2. How can I adapt the algorithm to meet my needs?
Vendor supplies a params structure to allow the user to describe any user-changeable algorithm parameters. For example, a filter called IFIR might have:
typedef struct IFIR_Params {
    Int size;
    XDAS_Int16 firLen;
    XDAS_Int16 blockSize;
    XDAS_Int16 *coeffPtr;
} IFIR_Params;
XDAIS Solution (3)

3. Will the function names conflict with other code in the system?
Algorithm must be C callable and re-entrant
Strict rules on function naming virtually eliminate conflicts.
fir_company123_min.l64
fir_company123_max.h62
File name fields: algorithm module name (fir), vendor name (company123), variant (min or max), then the extension: l = library, h = header; 62 = C62x/C67x, 64 = C64x
XDAIS Solution (4)

4. Will it use memory or peripherals needed by other algos?
Application controls all peripherals and memory
  Algorithms cannot access peripherals directly
  Algorithms cannot allocate their own memory
Pre-defined XDAIS functions provide a common method for algorithms to request resources:
  During algo startup, the algorithm requests any memory it requires
  The application (framework) obtains the memory, e.g. with malloc(), and grants it to the algorithm via a pointer (*ptr) to its address
3 - 21
This type of dynamic instantiation sounds great, but what if I want to allocate my memory statically? No problem, XDAIS algorithms can be designed to work both ways. And there's even a utility that will interrogate an algorithm and create a C file containing all the required memory elements of an algorithm. That is, we'll even help you with your static instantiation.
Supports Static & Dynamic Instances

Framework (algorithm lifecycle):

Phase      Static       Dynamic
Create     algInit      algNumAlloc, algAlloc, algInit
Execute    Filter       algActivate, Filter, algDeactivate
Delete                  algFree

*Note: Static case can also use algActivate if algo uses scratch memory
Here's a more detailed look at the process of creating an instance of an algorithm. (Sorry, we don't have time to go through this example in class.)
Instance Creation - start

1. (Application framework) Here's the way I want you to perform. It fills in the Params structure:
   *params = malloc(x); params=PARAMS;
2. (Application framework) How many blocks of memory will you need to do this for me? It calls algNumAlloc().
3. (Algorithm) I'll need N blocks of memory to do what you've specified. It returns N.
4. (Application framework) I'll make a place where you can tell me about your memory needs. It creates the MemTab:
   *memTab = malloc(5*N)
3 - 22
Instance Creation - finish

5. (Application framework) Tell me about your memory requirements. It calls algAlloc(), passing Params and MemTab. Each MemTab entry holds: size, alignment, type, space, *base.
6. (Algorithm) My needs, given these parameters, are this, for each of the N blocks of memory.
7. (Application framework) I'll go get the memory you need:
   for(i=0;i<N;i++) base=malloc(size);
8. (Application framework) Prepare an instance to run! It calls algInit().
9. (Algorithm) Copy Params and memory bases into my instance object (InstObj: Param1, Param2, Base1, Base2).
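The nine-step handshake above can be sketched in ordinary C. The types and function names below (MemRec, FirParams, FirObj, fir_numAlloc and so on) are simplified, hypothetical versions invented for this example. They are not the real XDAIS header definitions, which carry more fields (alignment, type, space) and go through a function table.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BLOCKS 2

/* Simplified, hypothetical versions of the XDAIS structures (not the real ialg.h). */
typedef struct { size_t size; void *base; } MemRec;
typedef struct { int firLen; } FirParams;
typedef struct { int firLen; short *coeffs; short *history; } FirObj;

/* Step 3: the algorithm reports how many memory blocks it needs. */
int fir_numAlloc(void) { return MAX_BLOCKS; }

/* Step 6: the algorithm fills in the size of each block for these params. */
int fir_alloc(const FirParams *p, MemRec memTab[])
{
    assert(p != NULL && p->firLen > 0);
    memTab[0].size = (size_t)p->firLen * sizeof(short);   /* coefficients   */
    memTab[1].size = (size_t)p->firLen * sizeof(short);   /* sample history */
    return MAX_BLOCKS;
}

/* Step 9: the algorithm copies params and block addresses into its instance. */
void fir_init(FirObj *obj, const FirParams *p, const MemRec memTab[])
{
    obj->firLen  = p->firLen;
    obj->coeffs  = (short *)memTab[0].base;
    obj->history = (short *)memTab[1].base;
}

/* Application side: steps 2, 5, 7 and 8. Returns 0 on success, -1 on failure. */
int app_create(FirObj *obj, const FirParams *p, MemRec memTab[])
{
    int i, n = fir_numAlloc();

    assert(n <= MAX_BLOCKS);
    fir_alloc(p, memTab);
    for (i = 0; i < n; i++) {                 /* one allocation per block */
        memTab[i].base = malloc(memTab[i].size);
        if (memTab[i].base == NULL) {
            while (i-- > 0) {
                free(memTab[i].base);         /* undo the earlier blocks */
            }
            return -1;
        }
    }
    fir_init(obj, p, memTab);
    return 0;
}
```

Note that the algorithm never calls malloc() itself: it only describes what it needs, and the application decides where that memory comes from. The same description could just as well be satisfied with statically declared arrays.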
XDAIS Solution (5)

5. If I want to run the same algo on more than one channel, how can I prevent variables from conflicting with each other?
Each instance of an algorithm gets its own storage location, called an instance object.
IFIR algorithm, Instance 1 (instObj1):
  *fxns  Pointer to algo functions
  *a     Pointer to coefficients
  *x     Pointer to new data buffer
IFIR algorithm, Instance 2 (instObj2):
  *fxns
  *a
  *x
XDAIS Solution (6)

6. How many interfaces (APIs) do I have to learn?
Only one: XDAIS! And TI provides a tool that essentially writes the XDAIS interface, though you still need to add your magic.
3 - 23
And finally, here's a little diagram showing most of the XDAIS algorithm interface.
XDAIS Summary
instance handle Key: User Vendor Module XDAIS
fxns
IFIR_Fxns alg ... alg ... algInit alg filter
Program Memory ... FIR_TTO_initObj FIR_TTO_filter
memTab ... ...
params
firHandle->fxns = &FIR_TTO_IFIR;
firHandle->fxns->ialg.algInit((IALG_Handle)firHandle, memTab, NULL, (IALG_Params *)&firParams);
firHandle->fxns->filter(firHandle, processSrc, processDst);
Thousands of XDAIS Compliant Algorithms

Tools of the Trade
3rd Party XDAIS Compliant Algos

Make or buy
> 650 companies in 3rd party network > 1000 algorithms from > 100 unique 3rd parties
3 - 24
Reference Frameworks
IOM and XDAIS: Common Interfaces
[Diagram: the system software (data, init, memory management) sits between two standard interfaces: IOM on the side of the H/W (peripherals) and XDAIS on the side of the algorithm]
With standardized interfaces to Algorithms and H/W, system software (i.e. framework) can also be standardized
A standard framework can be used as a starting point for many different Applications
Currently, three generic frameworks are available
Also, application-specific frameworks are available (or coming) for specific applications (audio, video, etc.)
Design Parameter                       RF1 (Compact)   RF3 (Flexible)   RF5 (Extensive)   RF6 (Connected)
Recommended # of Channels              1 to 3          1 to 10+         1 to 100          1 to 100
Recommended # of XDAIS Algos           1 to 3          1 to 10+         1 to 100          1 to 100
Single/Multi Rate Operation            single          multi            multi             multi
Thread Preemption and Blocking         HWI             HWI, SWI         HWI, SWI, TSK     HWI, SWI, TSK
Total Memory Footprint (less algos)    3.5KW           11KW             25KW              tbd
Processor Family Supported             C5000           C5000, C6000     C5000, C6000      None Currently

The remaining rows of the original table (Static Configuration, Dynamic Object Creation, Static Memory Management, Dynamic Memory Allocation, Absolute Minimum Footprint, Implements Control Functionality, Supports/Implements DSPLink (DSP-GPP)) were marked with check marks per framework; those marks are not reproduced here.
RF6: Planned, but not yet available
3 - 25
RF3 Block Diagram (out of the box)

[Block diagram: In (IOM) -> PIP -> Split SWI -> two audio threads (SWI Audio 0 and SWI Audio 1, each running FIR then Vol) -> Join SWI -> PIP -> Out (IOM). A control thread (swiControl), driven by clkControl, exchanges settings through memory with the host (GEL).]
IOM drivers for input/output
Two processing threads with generic algorithms
Split/Join threads used to simulate a stereo codec. (On C6416/C6713 DSKs, how could we save cycles on split/join?)
The Reference Frameworks available today provide a SWI thread which creates the stereo audio used by the audio processing threads. This was necessary for the early DSKs since they only supported mono audio. In the case of mono audio input, the Split thread just duplicated the audio to both channels. Today, with stereo codecs the Split thread sorts the two incoming channels into two different channels. How could you make the above system more efficient? How about re-writing the IOM driver so that it uses the EDMA to perform the channel sorting. The IOM interface supports multichannels, thus you should be able to directly connect it to both of the Audio Processing PIPs.
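As a rough picture of the channel sorting that the Split and Join threads perform for a stereo codec, here is a minimal sketch in C. It assumes the incoming samples are interleaved as left, right, left, right; that ordering and the function names are assumptions for this example and are not taken from the RF3 source.

```c
#include <assert.h>
#include <stddef.h>

/* Sort an interleaved L,R,L,R... buffer into separate left and right buffers. */
void split_stereo(const short *in, short *left, short *right, size_t frames)
{
    size_t i;

    assert(in != NULL && left != NULL && right != NULL);
    for (i = 0; i < frames; i++) {
        left[i]  = in[2 * i];
        right[i] = in[2 * i + 1];
    }
}

/* Inverse operation: merge two channels back into one interleaved buffer. */
void join_stereo(const short *left, const short *right, short *out, size_t frames)
{
    size_t i;

    assert(left != NULL && right != NULL && out != NULL);
    for (i = 0; i < frames; i++) {
        out[2 * i]     = left[i];
        out[2 * i + 1] = right[i];
    }
}
```

Every sample is touched twice by the CPU here, once on the way in and once on the way out. That copying is exactly the work the paragraph above suggests handing to the EDMA inside the IOM driver.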
3 - 26
RF3 Demo
RF3 Demo
Here are the steps the facilitator will go through during the in-class demo. These were included to allow you to go back through the demo at your own pace, and to explore further any additional aspects of RF3 that you find interesting.
Note: This demo assumes that your Code Composer Studio installation is set up just as we did it back in Lab 1, and that the files that we provided are installed on the computer that you are using. If either of these is NOT true, then you may have some difficulty with the following steps.
Opening the Project

1. We have provided the files needed for RF3. We have installed them in the default install directory for the Reference Frameworks to avoid any problems with relative directory names during build. So, you'll need to go to a different location than you did for the other labs in the workshop. Use the following command to open the RF3 project for your DSK, which is located at:
C:\c60001day\referenceframeworks\apps\rf3\dsk6416, or
C:\c60001day\referenceframeworks\apps\rf3\dsk6713
Project → Open
2. Once you have opened the project, the Project View window should look something like this:
Inspect the .cdb file

Most of RF3 is configured via the Configuration Tool. Since this file has a lot of the cool stuff provided by RF3, we're going to take a look at it first.
3. Open the apps.cdb file by double-clicking on it in the Project View.
Hardware Portability
One of the best characteristics of RF3 (and the other reference frameworks, for that matter) is that it can easily be ported to a new hardware platform. Getting useful applications up and running on a new hardware design used to be a difficult task. RF3 is built using IOM drivers, and all of the hardware-specific code is encapsulated in the driver. Let's take a look at how this is done.
4. Inside the .cdb file, navigate to the udevCodec object, which is located in the User-Defined Devices folder under Device Drivers. The Device Drivers folder is in the Input/Output folder. It looks something like this:
Most of the hardware-specific information is contained in this one object. So, where is the other stuff? Here is a list of the few places that are hardware specific:
- The library that actually contains the code that the udevCodec object refers to is referenced in the linker command file, link.cmd.
- There is a C file, dsk6416_devParams.c (or dsk6713_devParams.c), that contains the parameters for how to set up the hardware controlled by the driver. One of these parameters is the hardware interrupt that will be used by the driver to synchronize with the CPU.
5. As we just mentioned, most of the hardware specific information needed to talk to the codec for RF3's audio is contained in this object. Open the properties of the object by right-clicking on it and choosing properties to see this information:
The C6713 DSK version is similar, but uses symbols that begin with _DSK6713.
Since most of the hardware-specific information is contained in this one object, it is the main place that needs to change in order to talk to new hardware.
6. Close the udevCodec Properties box by clicking "Cancel".
I/O Flexibility
The IOM driver interfaces to a DSP/BIOS PIP. A PIP is a simple buffer manager with synchronization capabilities. RF3 uses PIPs to flow data from hardware (an IOM driver) to software processing engines called threads. To see all of the PIPs and how they connect things together, refer back to the diagram at the beginning of the lab.
7. Let's take a look at how a PIP connects a driver to a thread by looking at the properties of the receive PIP. This is the PIP that connects the input device driver (audio source) to the receive/split thread. Navigate to the "PIP - Buffered PIP Manager", which is inside the Input/Output folder in the .cdb file.
8. Open the properties of the pipRx PIP by right-clicking on it and choosing Properties.
9. Click on the "Notify Functions" tab. You should see something that looks like this:
Notifies the driver when an empty buffer is available. Notifies the thread when a full buffer is available.
This interface makes it easy to change who gets notified when a PIP needs to be written to or needs to be read. The thread structures that RF3 uses also make it easy to change which PIP a thread is talking to. All of these capabilities work together to make the RFs great places to start a design, because they are powerful and easy to adapt to different needs.
10. Click "Cancel" to close the properties box.
11. Feel free to look at the other PIPs in the application if you'd like. Make sure to refer back to the block diagram at the beginning of this discussion to see how things fit into the big picture.
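The notify arrangement on the pipRx properties page can be pictured with a small host-side model. This is only a sketch in portable C, not the DSP/BIOS PIP API: PipeModel, pipe_put(), pipe_free(), count_notify() and the two-frame depth are hypothetical names and values chosen for illustration. The point it shows is that the pipe, not the driver or the thread, decides when each side gets told that a frame is ready.

```c
#include <stddef.h>

#define NUM_FRAMES 2   /* hypothetical pipe depth */

typedef void (*NotifyFxn)(void *arg);

typedef struct {
    int       fullFrames;    /* frames written and waiting for the reader */
    NotifyFxn notifyReader;  /* called when a full frame becomes available */
    void     *readerArg;
    NotifyFxn notifyWriter;  /* called when an empty frame becomes available */
    void     *writerArg;
} PipeModel;

/* Writer side: hand one filled frame to the pipe. Returns 0 on success. */
int pipe_put(PipeModel *p)
{
    if (p->fullFrames == NUM_FRAMES) {
        return -1;                       /* no empty frame left to fill */
    }
    p->fullFrames++;
    if (p->notifyReader != NULL) {
        p->notifyReader(p->readerArg);   /* tell the reader: data is ready */
    }
    return 0;
}

/* Reader side: give one consumed frame back to the pipe. Returns 0 on success. */
int pipe_free(PipeModel *p)
{
    if (p->fullFrames == 0) {
        return -1;                       /* nothing was pending */
    }
    p->fullFrames--;
    if (p->notifyWriter != NULL) {
        p->notifyWriter(p->writerArg);   /* tell the writer: room is available */
    }
    return 0;
}

/* Example notify function: counts how many times it has been called. */
void count_notify(void *arg)
{
    (*(int *)arg)++;
}
```

Swapping who gets notified is just a matter of storing a different function and argument in the model, which mirrors how the Notify Functions tab is used in the Configuration Tool.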
Processing Threads
Flexible I/O and drivers are important, but come on, the whole reason for their being is to feed data to functions for processing. RF3 uses DSP/BIOS Software Interrupts (SWIs) to run processing functions. SWIs are very similar to hardware interrupts (HWIs), but they are controlled by software through API calls made by the program. For example, the SWI_andnHook() function in the PIP properties that we looked at earlier is a BIOS API call that notifies a SWI that one of the conditions it needs in order to run has been met. When all of the conditions have been met, the SWI is readied by the scheduler, and it runs when it is the highest priority thread that needs servicing.
12. Let's take a look at the SWIs in RF3. Navigate to the "SWI - Software Interrupt Manager", which is located in the Scheduling folder in the .cdb file.
13. You should see the following SWI objects:

Name            Purpose
swiAudioProc0   Runs volume and filter for channel 0
swiAudioProc1   Runs volume and filter for channel 1
swiControl      Runs control thread periodically (more info. later)
swiRxSplit      Splits incoming data flow into two channel flows
swiTxJoin       Combines two channels into outgoing data flow
Don't forget to refer back to the block diagram and see how everything fits together.
14. Let's take a closer look at the swiRxSplit thread to see what function it calls when it runs. Open the properties of swiRxSplit by right-clicking on it and selecting Properties. You should see something like this:
Here are some details on the different properties:

Name         Purpose
comment      Allows user to comment the object in the .cdb file
function     Function that is called by the SWI
priority     Priority that the SWI executes at
mailbox      Used with APIs to signal the SWI (more info. later)
arg0, arg1   Arguments to function (i.e. _thrRxSplitRun(arg0,arg1) )
Sidebar: SWI Mailboxes

[Figure: three PIP notify functions, each clearing one bit of the swiRxSplit mailbox]
pipRx notifyReader: SWI_andnHook(swiRxSplit, 1)
pipRx0 notifyWriter: SWI_andnHook(swiRxSplit, 2)
pipRx1 notifyWriter: SWI_andnHook(swiRxSplit, 4)
Each bit in the swiRxSplit mailbox represents a precondition. The SWI should only run when there is a full buffer to split and two empty buffers to fill. Three preconditions at one bit each need three bits, so the mailbox starts at 0x7. The SWI_andnHook() function is a BIOS API call that can only be called from within BIOS (that's why we add Hook), and it essentially clears a bit in the mailbox when it runs. When all the bits are zero, the SWI is automatically scheduled to run.
15. When you're done examining the SWI object, click "Cancel" to close the window.
16. Take a moment to find, in the block diagram, all of the threads that are listed in the .cdb file.
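The mailbox behavior described in the sidebar can be tried out with a few lines of portable C. This is a host-side model, not the DSP/BIOS implementation: SwiModel, swi_init() and swi_andn() are hypothetical names, and we assume one mask bit per precondition (0x1, 0x2, 0x4) and that the mailbox returns to its initial value once the SWI is posted.

```c
#include <stdint.h>

typedef struct {
    uint32_t initial;  /* mailbox value restored when the SWI is posted */
    uint32_t mailbox;  /* one bit per outstanding precondition */
    int      posted;   /* number of times the SWI became ready to run */
} SwiModel;

void swi_init(SwiModel *s, uint32_t initial)
{
    s->initial = initial;
    s->mailbox = initial;
    s->posted  = 0;
}

/* Model of an "and-not" update: clear the bits in mask, and post the
 * SWI when no precondition bits remain set. */
void swi_andn(SwiModel *s, uint32_t mask)
{
    s->mailbox &= ~mask;
    if (s->mailbox == 0) {
        s->posted++;                /* all preconditions met */
        s->mailbox = s->initial;    /* re-arm for the next round */
    }
}
```

Starting from 0x7, calling swi_andn() with 0x1 and 0x2 leaves 0x4 in the mailbox and posts nothing; the third call with 0x4 posts the SWI exactly once.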
Taking Control
RF3 has a built-in control function to change the execution of the processing algorithms at runtime. It uses this thread to modify the volume at which each channel is played. It can easily be modified to do just about anything else that you might want to do to control your application.
17. The control function is executed by swiControl. Open and examine the properties of swiControl.
18. When should the application tell swiControl to run? Normally, a control thread would run when a user changes something, for example, turning up the volume on an MP3 player. The DSK only has a few inputs, though, and they're not really tied to the application. So, RF3 simulates user activity by calling the control thread on a periodic basis. RF3 calls swiControl from a Timer HWI routine that BIOS sets up. Navigate to the "CLK - Clock Manager" in the Scheduling folder.
19. You should see a clkControl object. Open up its properties to examine them. You should see something like this:
The thrControlIsr() function reads the control values into a control structure, then posts swiControl to apply the changes.
20. Click Cancel to close the clkControl Properties window.
Analyzing Priority
Since we've got all of these threads running around processing data and providing control, the question of priority might come up. So, let's take a look at how priorities are assigned in RF3.
21. DSP/BIOS makes it really easy to compare SWI priorities. Click on the "SWI - Software Interrupt Manager". You should see something like this:
From this picture of the .cdb file we can see that all of the threads are currently set to priority level 1. It turns out that RF3 doesn't really need any of the threads to run at different priorities. However, if your application did need to use priority, it is easy to make the changes here by dragging and dropping the SWI objects to the desired priority level.
Getting Feedback
RF3 uses a variety of DSP/BIOS Real-time Analysis Tools to provide feedback about the application. The DSP/BIOS LOG module is used to send general information about the state of the application (things like trace information and dynamic memory or heap activity) back to the user. The BIOS STS (Statistics) module is used to calculate timing information. All of this is done without ever halting the DSP target application.
22. You can see the objects for both of these modules in the Configuration Tool under Instrumentation. Take a moment to look at these objects; we'll show you how they're used in a bit.
Build and Run the Application

To reduce their size and speed downloads, the RFs are not shipped built. There are make files provided to build everything if you choose to use them. Since we don't need to build them all, we'll build the RF3 that we've been looking at by itself using Code Composer Studio.
23. Build the RF3 application, app.pjt.
Project → Build (or click the corresponding toolbar button)
24. If you have CCS configured properly (or at least the way we had you do it back in Lab 1), the application should automatically load and go to main().
25. Make sure your DSK is set up properly for audio. Plug an audio source (CD player, computer sound card, etc.) into the line in on the DSK. You can use either speakers or headphones for the audio output. Plug speakers into the line out. Plug headphones into the headphone out.
26. Make sure there is audio playing at the source.
27. Run the application.
Debug → Run, or press F5 (or click the corresponding toolbar button)
28. You should hear audio playing. If not, double-check all of your connections.
Use Real-time Analysis Tools

DSP/BIOS automatically collects a lot of data about your application as it runs. RF3 extends this capability to send a wealth of information about the target application to the host. In this section we'll use special tools inside CCS to view and examine this information.
CPU Load Graph

DSP/BIOS automatically calculates the load your application puts on the processor. This information provides a nice quick snapshot of how your application is running.
29. Open the CPU Load Graph.
DSP/BIOS → CPU Load Graph (or click the corresponding toolbar button)
You should see something like this:
As you can now see, the RF3 application (including algorithms) is only taking up a very small amount of the CPU's time.
Message Log
The DSP/BIOS Message Log provides printf()-like capability at a much lower cost (memory and MIPS) to the target application. RF3 provides a module called UTL that powerfully extends the basic LOG module.
30. To see the output of the BIOS LOG, open a Message Log.
DSP/BIOS → Message Log (or click the corresponding toolbar button)
31. Examine the output of the Message Log:
[Figure: Message Log window, with callouts marking the amount of memory needed by the XDAIS Volume algorithm and the heap allocations]
Statistics
The DSP/BIOS STS (Statistics) Module provides an easy way to get timing information about the threads in your application. For example, in real-time systems, designers are usually concerned with the maximum execution time of a thread. If a thread's maximum execution time ever exceeds its deadline, then you know you have a problem.
32. Open the DSP/BIOS Statistics View.
DSP/BIOS → Statistics View (or click the corresponding toolbar button)
You should see a window that looks like this:
This window gives you a lot more detail regarding the execution times of your threads.
33. This window can be modified in several ways. Right-click on the window and choose Properties. Inside this window you can enable and disable the statistics for each thread, change the unit that the timing information is displayed in (instructions, microseconds, or milliseconds), etc. Try changing the swiRxSplit thread so that it displays in microseconds.
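The kind of bookkeeping behind a statistics view can be sketched in a few lines of C. This is a host-side illustration, not the DSP/BIOS STS API: StatsModel and its functions are hypothetical names, and we assume the accumulator tracks a count, a running total and a maximum, from which the average and a deadline check are derived.

```c
#include <stdint.h>

typedef struct {
    uint32_t num;    /* number of measurements recorded */
    int64_t  total;  /* sum of all measurements */
    int32_t  max;    /* largest measurement seen so far */
} StatsModel;

void stats_reset(StatsModel *s)
{
    s->num   = 0;
    s->total = 0;
    s->max   = INT32_MIN;
}

/* Record one execution-time measurement (any consistent unit). */
void stats_add(StatsModel *s, int32_t value)
{
    s->num++;
    s->total += value;
    if (value > s->max) {
        s->max = value;
    }
}

/* Average of all measurements, or 0 if none were recorded. */
int32_t stats_average(const StatsModel *s)
{
    return (s->num != 0) ? (int32_t)(s->total / (int64_t)s->num) : 0;
}

/* Nonzero if the worst case seen so far exceeded the deadline. */
int stats_missed_deadline(const StatsModel *s, int32_t deadline)
{
    return (s->num != 0) && (s->max > deadline);
}
```

The maximum is the number to watch in a real-time system: an average can look comfortable while a single worst-case run still misses its deadline.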
Viewing the Execution Graph

The Execution Graph is a graphical view of how your threads are executing. It is based on events that happen in the scheduler. Events are things like a SWI being posted, a SWI starting to run, a SWI finishing, etc. This graph is not time based, but the Time scale at the bottom can help associate events with time.
34. Open the Execution Graph.
DSP/BIOS → Execution Graph (or click the corresponding toolbar button)
35. The tray at the bottom where all of these tools are being opened is probably getting a little full. Also, the Execution Graph needs a lot of screen real estate. To make it easier to see and work with, float the Execution Graph in the main window: right-click on the Execution Graph and choose "Float in Main Window".
36. You should now see a window that looks something like this:
Each column in the graph indicates that an event happened. The blue indicates that a thread is running. The white boxes represent that a thread is waiting for its turn to run. The teal green or dark lines indicate that the graph doesn't know what state a thread is in (running, waiting, or not doing anything). The reason for the green lines relates directly to the vertical red line. The red line indicates that the circular buffer that was being used to accumulate the information on the target wrapped around and overwrote some data. This indicates that there is a discontinuity in the data being displayed. Since the graph is essentially starting over, it doesn't know what state a thread is in until its state changes. The horizontal time line at the bottom has a little tick in it when the timer interrupt fires. This line can be used to relate the event based data to time. The threads are usually listed by priority, but since we only have one priority they are listed in the same order that we found them back in step 21. You could change this order in the Configuration Tool just like you changed priority.
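The wrap-around behind the red line can be illustrated with a tiny circular event log in portable C. This is a sketch under assumptions, not the DSP/BIOS logging code: LogModel, log_event(), log_lost(), log_oldest() and the depth of four entries are hypothetical, and the host is assumed not to read the buffer between writes.

```c
#include <stdint.h>

#define LOG_DEPTH 4u   /* hypothetical buffer depth */

typedef struct {
    uint16_t events[LOG_DEPTH];
    uint32_t written;          /* total number of events ever logged */
} LogModel;

/* Store an event, overwriting the oldest entry once the buffer is full. */
void log_event(LogModel *l, uint16_t ev)
{
    l->events[l->written % LOG_DEPTH] = ev;
    l->written++;
}

/* Number of events that were overwritten before anyone could read them. */
uint32_t log_lost(const LogModel *l)
{
    return (l->written > LOG_DEPTH) ? (l->written - LOG_DEPTH) : 0u;
}

/* Oldest event still held in the buffer (call only after logging something). */
uint16_t log_oldest(const LogModel *l)
{
    uint32_t first = (l->written > LOG_DEPTH) ? (l->written - LOG_DEPTH) : 0u;
    return l->events[first % LOG_DEPTH];
}
```

After six events are logged into the four-entry buffer, the first two are gone and the record starts at the third. A viewer reading this buffer has no history before that point, which is why thread states stay unknown until the next state change.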
Playing with GEL

The Control Thread in RF3 reads memory values and uses them to change how the algorithms execute. We can use a CCS GEL file to change these memory locations and control the application.
37. Load the provided app.gel file, which is located in the same directory as your project.
File → Load GEL
38. The code in this file adds three command sliders to CCS. We can access these sliders from the GEL menu. Open each of the sliders using the following commands:
GEL → Application Control → Set_active_channel
GEL → Application Control → Set_channel_0_gain
GEL → Application Control → Set_channel_1_gain
39. Use the sliders to change the gain of the channels and set the active channel. Try turning one of the channels down, then playing with the other one, and vice versa. These sliders represent real volume controls that you might have in your own system.
Flashing RF3
Once you have an application up and running, you will probably want to see it work without using CCS to control it. In order to do this, we need to burn the application to the flash that is located on the DSK. This sounds like a pretty hefty task, but once again we have tools that make the job a lot easier. To speed things along, we have provided another project that contains a slightly modified RF3 application to make it easier to flash.
Switching Projects
40. Close the project we have been working with so far.
Project → Close
Note: If you get a message asking whether you want to save the project file, it shouldn't be necessary, since we weren't supposed to make any changes to it.
41. Hint: Close any open windows; otherwise you may get an error when Flashburn opens.
Window → Close All
42. Close the GEL sliders by clicking on the little X in the upper right-hand corner of each one of them.
43. Open the new project app.pjt located in:
C:\c60001day\referenceframeworks\apps\rf3\dsk6416_boot, or
C:\c60001day\referenceframeworks\apps\rf3\dsk6713_boot
Project → Open
Create the Flash Image

TI provides a separate tool, hex6x.exe, to create the image that can be burned into the flash.
44. You can run hex6x as a post-build step from within CCS. We have already added it for you. To see what we did:
Project → Build Options
45. Click on the General tab in the window that pops up. You should now see something like this:
C:\ti\c6000\cgtools\bin\hex6x C:\c60001day\referenceframeworks\apps\rf3\dsk
We have added a command to call hex6x using CCS's "Final build steps" option. This option tells CCS to call our command every time it does a full build. CCS also has an "Initial build steps" option for commands that need to run before CCS builds a project.
46. When you're through looking everything over, close the box by clicking on Cancel.
47. To have CCS build the project and call hex6x to generate the flash image, we need to do a Rebuild All.
Project → Rebuild All (or click the corresponding toolbar button)
48. Wait for CCS to finish building the project and creating the hex image.
Use Flashburn to Burn the Image

TI simplifies burning the flash on the DSK with a utility called Flashburn.
49. Open Flashburn.
Tools → Flashburn
50. We have already created a configuration file for you that has all of the information that Flashburn needs to do its job. Open this file inside of Flashburn.
File → Open
The file is named rf3_dsk6416.cdd (or rf3_dsk6713.cdd) and it is located at:
C:\c60001day\referenceframeworks\apps\rf3\dsk6416_boot\Debug, or
C:\c60001day\referenceframeworks\apps\rf3\dsk6713_boot\Debug
You should now see a window that looks like this:
Again, the dsk6416 references would be changed to dsk6713 depending upon which C6000 DSK you're using.
Note: Flashburn should automatically connect to the target when you open the .cdd file. If it does not, you need to use CCS to run the CPU. When you do this, Flashburn should connect to the target and you should see this icon in Flashburn:
51. Use Flashburn to erase the flash.
Program → Erase Flash (or click the corresponding toolbar button)
52. Wait until the blue progress indicator bar goes away.
53. Now that the flash is erased, we can burn our hex file.
Program → Program Flash (or click the corresponding toolbar button)
54. Wait until the blue progress indicator bar goes away.
55. Close Flashburn.
56. Now, let's see if it worked. Since the program is now in flash, we don't need CCS to load it anymore. Close CCS.
57. Disconnect the USB emulation cable from the DSK.
58. Hold your breath and press the white reset button on the DSK. If everything is working properly, you should now have music coming out of the DSK. If not, check to make sure that you have music playing.
59. Congratulations! You just flashed RF3 to the DSK.
Flashing POST
You probably don't want to leave your DSK running RF3. Here are the steps to program the flash with the POST routine.
60. Reconnect your USB emulation cable.
61. Open Code Composer Studio.
62. Open Flashburn.
Tools → Flashburn
63. Use Flashburn to open the post.cdd located at:
c:\ti\examples\dsk6416\bsl\post\, or
c:\ti\examples\dsk6713\bsl\post\
File → Open
64. Make sure that Flashburn is connected. If not, you may need to run the processor.
65. Erase the flash.
Program → Erase Flash (or click the corresponding toolbar button)
66. Wait for the blue progress bar to complete and go away.
67. Burn the flash.
Program → Program Flash (or click the corresponding toolbar button)
68. Wait for the blue progress bar to complete and go away.
69. Close Flashburn.
70. Close CCS.
71. Push the white reset button on the DSK. The LEDs should flash to indicate the progress of the POST routine as it runs through its tests, then flash and remain on. You should also hear a tone if the speakers/headphones are still connected.
72. Your DSK is now good as new.
Lab/Demo Debrief
eXpressDSP Demo Debrief
1. Reference Frameworks are a great way to get started with a design
- IOM Drivers make it easy to port to new hardware
- DSP/BIOS tools for scheduling and analysis make them easy to adapt and observe
- XDAIS provides an organized way to change/add algorithms
2. Flashburn gets the application into FLASH
Did you notice during the demo how few places in the RF3 application made direct reference to a specific hardware platform?
eXpressDSP Summary
[Figure: eXpressDSP summary slide with two panels, Target Software and Host Tools]
C6000 Optimization
Introduction
The importance of the C language has grown significantly over the past few years. TI has responded by creating a compiler that produces extremely efficient processor code, fast enough that you may not need to program in assembly at all. After getting your C code running, you may want to optimize it to get the best performance possible. In this chapter we discuss three major optimizations you can make, and then point out where you can go to discover more techniques.
Outline
- Build Options
- Use Optimized Libraries
- Enable Cache
- Where To Go for More Information
Optimization Build Options
Chapter Topics
C6000 Optimization
  Optimization Build Options
  Use Optimized Libraries
  C6000 Double-Level Cache
    Why Cache?
    Details of C67x & C64x Internal Memory
    Configuring External Memory as Cacheable (MAR)
  Where To Go For More Information
  LAB 4: Using C
    Optimized C
    Using ASM Libraries
    Lab 4 Results
  Lab 4a: Memory and Cache
    Everything Off-chip
    Use Some Cache (L1)
    Use All the Cache
    Cache Re-use
    Lab 4a Results
  Optional Topics
    Cache Data Coherency
    Advanced Optimizations (Brief List)

Compiler Build Options
Nearly one hundred compiler options are available to tune your code's performance, size, etc. To our earlier table, we added optimize options:

Option      Description
-mv6700     Generate C67x code (C62x is default)
-mv6400     Generate C64x code
-fr <dir>   Directory for object/output files
-q          Quiet mode (display less info while compiling)
-g          Enables src-level symbolic debugging
-s          Interlist C statements into assembly listing
-o3         Invoke optimizer (-o0, -o1, -o2/-o, -o3)
-gp         Enable function-level profiling
-k          Keep asm files, but don't interlist

The slide groups these options as debug options and optimize options. Debug and optimize options conflict with each other; therefore, they should not be used together.
As you probably learned in college programming courses, it is best to follow a two-step process when creating code:
1. Write your code and debug its logical correctness (without optimization).
2. Next, optimize your code and verify it still performs as expected.
As demonstrated above, certain options are ideal for debugging, but others work best to create highly optimized code. When you create a new project, CCS creates two sets of build options called Configurations: one called Debug, the other Release (which you might think of as Optimize). Configurations will be explored next.
Note: As with any compiler or toolset, learning the various options requires a bit of experimentation, but it pays off in the tremendous performance gains that can be achieved by the compiler. To this end, this workshop will explore these options further in an upcoming chapter.
Build Option Configurations

To help make it easy to use many compiler options, TI provides two recommended sets of options (configurations) in each new project you create:
Debug:   -g -q -fr"c:\modem\Debug" -d"_DEBUG" -mv6700
Release: -q -o3 -fr"c:\modem\Release" -mv6700
The main difference is that the Release (optimized) configuration invokes the optimizer with -o3.
Two Default Configurations

For new projects, CCS automatically creates two build configurations:
Debug (unoptimized): -g -q -fr"c:\modem\Debug" -d"_DEBUG" -mv6700
Release (optimized): -q -o3 -fr"c:\modem\Release" -mv6700
Use the drop-down to quickly select a build configuration.
*Note: We recommend you add the two options -gp and -k to make Release more useful.
We recommend you add the -gp and -k options to the Release configuration, as this makes it easier to evaluate the optimized performance. In the upcoming lab exercise, you will get a chance to do this.
Note: The examples shown here are for a C67x DSP, hence the -mv6700 option.
Use the Project → Configurations menu to create or delete a Build Configuration.
Two Default Configurations

For new projects, CCS automatically creates two build configurations:
Debug (unoptimized): -g -q -fr"c:\modem\Debug" -d"_DEBUG" -mv6700
Release (optimized): -q -o3 -fr"c:\modem\Release" -mv6700
Use the drop-down to quickly select a build configuration.
Add or remove build configurations with the Project Configurations dialog (on the Project menu).
Edit a configuration:
1. Set it active
2. Modify build options (shown next)
3. Save project
To edit a configuration, first make it active (via Project Configurations dialog or toolbar dropdown).
We are often asked, "Why use -fr?" When changing configurations, using -fr prevents the object (and .out) files from being overwritten. While not required, it allows you to preserve all variations of your project's output.
Why use -fr?

- When changing configurations, the -fr option prevents files from being overwritten
- While not required, it allows you to preserve all variations of your project's output files
[Figure: folder tree in which each configuration keeps its own copy of the output files]
c60001day\labs\lab3\Debug: lab3.obj, lab3.out
c60001day\labs\lab3\Release: lab3.obj, lab3.out
Use Optimized Libraries

TI provides several libraries of optimized code to help build a DSP system. The following slides provide some information on some of these libraries and how to use them.
DSPLIB
Optimized DSP Function Library for C programmers using C62x/C67x and C64x devices.
These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical. By using these routines, you can achieve execution speeds considerably faster than equivalent code written in standard ANSI C. These ready-to-use functions can also significantly shorten your development time.
The DSP library features:
- C-callable
- Hand-coded assembly-optimized
- Tested against C model and existing run-time-support functions

Adaptive filtering: DSP_firlms2
Correlation: DSP_autocor
FFT: DSP_bitrev_cplx, DSP_radix2, DSP_r4fft, DSP_fft, DSP_fft16x16r, DSP_fft16x16t, DSP_fft16x32, DSP_fft32x32, DSP_fft32x32s, DSP_ifft16x32, DSP_ifft32x32
Filters & convolution: DSP_fir_cplx, DSP_fir_gen, DSP_fir_r4, DSP_fir_r8, DSP_fir_sym, DSP_iir
Math: DSP_dotp_sqr, DSP_dotprod, DSP_maxval, DSP_maxidx, DSP_minval, DSP_mul32, DSP_neg32, DSP_recip16, DSP_vecsumsq, DSP_w_vec
Matrix: DSP_mat_mul, DSP_mat_trans
Miscellaneous: DSP_bexp, DSP_blk_eswap16, DSP_blk_eswap32, DSP_blk_eswap64, DSP_blk_move, DSP_fltoq15, DSP_minerror, DSP_q15tofl
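Since the library routines are tested against a C model, it can help to keep a plain C version of a routine around to check results when you switch to the optimized one. Here is a minimal sketch for a 16-bit dot product. The name dotprod_ref() is hypothetical, and we assume it takes the same arguments as the library's DSP_dotprod (two arrays of short and an element count, returning int); check the DSPLIB API guide for the exact prototype and any restrictions on the count or alignment.

```c
/* Hypothetical reference model (not DSPLIB code): plain C dot product of
 * two 16-bit vectors, accumulated in a 32-bit int. */
int dotprod_ref(const short *x, const short *y, int nx)
{
    int sum = 0;
    int i;
    for (i = 0; i < nx; i++) {
        sum += x[i] * y[i];   /* 16 x 16 multiply, 32-bit accumulate */
    }
    return sum;
}
```

In a project that links the library, you could call both versions on the same data and compare the results before relying on the faster one.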
IMGLIB
Optimized Image Function Library for C programmers using C62x/C67x and C64x devices.
The Image library features:
- C-callable
- C and linear assembly src code
- Tested against C model

Compression / Decompression: IMG_fdct_8x8, IMG_idct_8x8, IMG_idct_8x8_12q4, IMG_mad_8x8, IMG_mad_16x16, IMG_mpeg2_vld_intra, IMG_mpeg2_vld_inter, IMG_quantize, IMG_sad_8x8, IMG_sad_16x16, IMG_wave_horz, IMG_wave_vert
Picture Filtering / Format Conversions: IMG_conv_3x3, IMG_corr_3x3, IMG_corr_gen, IMG_errdif_bin, IMG_median_3x3, IMG_pix_expand, IMG_pix_sat, IMG_yc_demux_be16, IMG_yc_demux_le16, IMG_ycbcr422_rgb565
Image Analysis: IMG_boundary, IMG_dilate_bin, IMG_erode_bin, IMG_histogram, IMG_perimeter, IMG_sobel, IMG_thr_gt2max, IMG_thr_gt2thr, IMG_thr_le2min, IMG_thr_le2thr
FastRTS (C67x)
Optimized floating-point math function library for C programmers using TMS320C67x devices.
Includes all floating-point math routines currently in existing C6000 runtime-support libraries.
The FastRTS library features:
- C-callable
- Hand-coded assembly-optimized
- Tested against C model and existing run-time-support functions
FastRTS must be installed per directions in its User's Guide (SPRU100a.PDF).

Single Precision   Double Precision
atanf              atan
atan2f             atan2
cosf               cos
expf               exp
exp2f              exp2
exp10f             exp10
logf               log
log2f              log2
log10f             log10
powf               pow
recipf             recip
rsqrtf             rsqrt
sinf               sin
FastRTS (C62x/C64x)
Optimized floating-point math function library for C programmers; enhances floating-point performance on C62x and C64x fixed-point devices.
The FastRTS library features:
- C-callable
- Hand-coded assembly-optimized
- Tested against C model and existing run-time-support functions
FastRTS must be installed per directions in its User's Guide (SPRU653.PDF).

Single Precision   Double Precision   Others
_addf              _addd              _cvtdf
_divf              _divd              _cvtfd
_fixfi             _fixdi
_fixfli            _fixdli
_fixfu             _fixdu
_fixful            _fixdul
_fltif             _fltid
_fltlif            _fltlid
_fltuf             _fltud
_fltulf            _fltuld
_mpyf              _mpyd
recipf             recip
_subf              _subd
Now that you know about the libraries, here's where to find them and some information on how they are organized. Each library also has documentation that goes along with it.
Location of Libraries
(in CCS v2.2)
DSP and IMG Libraries are provided as a source archive and a Little Endian C6000 obj library.
Folder Structure:
lib - library files (.lib) and source code (.src)
include - contains the library header files
support - miscellaneous supporting code
bin - supporting Windows executables
CCS Docs folder contains:
SPRU565A.pdf - DSP API User Guide
SPRU023A.pdf - Imaging API User Guide
SPRU100A.pdf - FastRTS Math API UG
Application Notes:
SPRA885.pdf - DSPLIB App note
SPRA886.pdf - IMGLIB App note
4-9
C6000 Double-Level Cache

Why Cache?
In order to understand why the C6000 family of DSPs uses cache, let's consider a common problem. Take, for example, the last time you went to a crowded event like the symphony, a sporting event, or the ballet, any kind of event where a lot of people want to get to one place at the same time. How do you handle parking? You can only have so many parking spots close to the event. Since there are only so many of them, they demand a high price. They offer close, fast access to the event, but they are expensive and limited. Your other option is the parking garage. It has plenty of spaces and it's not very expensive, but it is a ten minute walk and you are all dressed up and running late. It's probably even raining. Don't you wish you had another choice for parking?
Parking Dilemma
[Figure: a concert hall with close parking (0 minute walk, 10 spaces, $100/space) and a distant parking ramp (10 minute walk, 1000 spaces, $5/space)]
Parking Choices:
  0 minute walk @ $100 for close-in parking
  10 minute walk @ $5 for distant parking
  or Valet parking: 0 minute walk @ only $6.00
You do! A valet service gives the same access as the close parking for just a little more cost than the parking garage. So, you arrive on time (and dry) and you still have money left over to buy some goodies.
Cache is the valet service of DSPs. Memory that is close to the processor and fast can only be so big. You can attach plenty of external memory, but it is slower. Cache helps solve this problem by keeping what you need close to the processor. It makes the close parking spaces look like the big parking garage around the corner.
Why Cache?
[Figure: CPU backed by Cache Memory (fast, small; works like big, fast memory), which is backed by Bulk Memory (slower, larger, cheaper)]
Memory Choices:
  Small, fast memory
  Large, slow memory
  or Use Cache:
    Combines advantages of both
    Like valet, data movement is automatic
One of the often overlooked advantages of cache is that it is automatic. Data that is requested by the CPU is moved automatically from slower memories to faster memories where it can be accessed quickly.
Details of C67x & C64x Internal Memory

Overall Layout and Levels of Memory
The memory architecture of a C6x1x device with its two-level cache is often referred to as a memory hierarchy. The highest level in the hierarchy, level 1 or L1, is the fastest, smallest memory. Requests from the CPU are sent to this level first. The second level is the larger on-chip memory, L2. Requests that can't be satisfied by the first two levels are sent off-chip, which is effectively level 3.
Levels of Memory
[Figure: CPU with L1 Program Cache and L1 Data Cache (Level 1), L2 Internal SRAM (Level 2), and external memory reached through the EMIF (Level 3): SDRAM (16MB), Flash (512KB), and daughter-card room for expansion, spread across CE0 to CE3]
We often refer to a system's memory in hierarchical levels
Higher levels (L1) are closer to the CPU
It's important to understand these hierarchical levels in order to comprehend how requests flow from the CPU to the caches and memories. The daughtercard interface on the DSK allows you to add more memory, as well as other devices, to your board. Check the dspvillage.com web site and search for daughtercard to find out what is available for purchase.
C6713 Internal Memory

Here is some specific information about the C6713 internal memory architecture that is on some of the DSKs you may be using. As you can see, you get information from the L1 memories with no delay. If you have to go to L2 to get the information, there is some delay, but you can get additional information for no extra cost. For example, if you have to go to L2 to get a fetch packet (a group of 8 instructions), it will take 5 cycles but you will also get the next fetch packet.
C6713 Internal Memory
[Figure: CPU (8/16/32/64) with L1 Program (4KB) and L1 Data (4KB), connected to L2 Program & Data (256K Bytes)]
Level 1 Caches
  Single-cycle access
  Always enabled
  L2 accessed on miss
Level 2 Memory
  Unified: Prog or Data
  L2 to L1D delivers 32 bytes in 4 cycles
  L2 to L1P delivers 16 instrs in 5 cycles
  Configure L2 as cache or addressable RAM
(C6711/12: L2 memory is 64K bytes)
Here are some more details about the C6713 internal memories.
C6713 Internal Memory - Details
[Figure: CPU (8/16/32/64) with L1 Program (4KB) and L1 Data (4KB), connected to L2 Unified Program & Data (256KB); bus widths labeled 256, 256 and 128]
Level 1 Program
  Always cache
  1 way cache (Direct-Mapped)
  Zero wait-state
  Line size: 512 bits (or 16 instr)
Level 1 Data
  Always cache
  2 way cache
  Zero wait-state
  Line size: 256 bits
Level 2
  Unified (prog or data)
  RAM or cache
  1-4 way cache
  32 data bytes in 4 cycles
  16 instr. in 5 cycles
  Line Size: 1024 bits (or 128 bytes)
(C6711/12: L2 memory is 64K bytes)
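To make the slide's numbers concrete, here is a minimal C sketch of the arithmetic that relates cache size, number of ways, and line size to the number of sets. The helper names (cache_num_sets, cache_set_index) are hypothetical and are not part of CSL or any TI library; the sizes used in the note below come from the slide (a 512-bit line is 64 bytes, a 256-bit line is 32 bytes).

```c
#include <stdint.h>

/* Hypothetical helper (not a TI API): number of sets in a set-associative
 * cache, given its total size, its number of ways, and its line size. */
uint32_t cache_num_sets(uint32_t size_bytes, uint32_t ways, uint32_t line_bytes)
{
    return size_bytes / (ways * line_bytes);
}

/* Which set an address maps to: drop the byte offset within the line,
 * then wrap around the number of sets. */
uint32_t cache_set_index(uint32_t addr, uint32_t line_bytes, uint32_t num_sets)
{
    return (addr / line_bytes) % num_sets;
}
```

With the C6713 figures, L1P (4KB, 1 way, 64-byte lines) works out to 64 sets, and L1D (4KB, 2 ways, 32-byte lines) also works out to 64 sets. In a direct-mapped cache such as L1P, two addresses that are 4KB apart map to the same set, so they can evict each other.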
C67x L2 Memory
A nice feature of the C6000 L2 memories is that they can be configured as internal SRAM or cache ways. This allows designers to customize the memory architecture to fit their needs.
C67x L2 Memory Configuration
Four 16KB blocks
Configure each as cache or addressable RAM
Each additional cache block provides another cache way
L2 is unified memory: can hold program or data
C6713: Still has 4 configurable 16KB RAM/cache blocks; the remaining 192KB is always RAM

Possible configurations of the four blocks:
  RAM 0   RAM 1   RAM 2   RAM 3     (hardware default)
  RAM 0   RAM 1   RAM 2   Way 1
  RAM 0   RAM 1   Way 2   Way 1
  RAM 0   Way 3   Way 2   Way 1
  Way 4   Way 3   Way 2   Way 1
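The trade-off on this slide is simple arithmetic, sketched below in C under the slide's stated figures (four configurable 16KB blocks). The function names are hypothetical and only illustrate the calculation; they do not configure any hardware.

```c
#include <stdint.h>

#define C67X_L2_BLOCK_BYTES (16u * 1024u)  /* each configurable block is 16KB */
#define C67X_L2_MAX_WAYS    4u             /* at most four blocks become cache ways */

/* Hypothetical helper: bytes of L2 used as cache for a given number of ways. */
uint32_t c67x_l2_cache_bytes(uint32_t ways)
{
    if (ways > C67X_L2_MAX_WAYS)
        ways = C67X_L2_MAX_WAYS;           /* clamp to what the slide allows */
    return ways * C67X_L2_BLOCK_BYTES;
}

/* Hypothetical helper: bytes of L2 left as addressable RAM. */
uint32_t c67x_l2_ram_bytes(uint32_t total_l2_bytes, uint32_t ways)
{
    return total_l2_bytes - c67x_l2_cache_bytes(ways);
}
```

For a C6713 (256KB of L2), choosing all four ways leaves 192KB of addressable RAM; for a C6711 (64KB of L2), the same choice leaves none.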
The Configuration Tool makes setting up the cache the way you want as easy as choosing an option in a drop-down box.
Configuring C6713 L2 Cache
C64x Internal Memory

The C64x internal memory architecture is similar to that of the C6713, but each level is larger.
C64x Internal Memory
[Figure: CPU (8/16/32/64) with L1 Program (16KB) and L1 Data (16KB), connected to L2 Program & Data (1M Bytes)]
L1 Program Cache
  Direct Mapped (1 way)
  Single cycle access
  Size = 16K Bytes
  Linesize = 8 instr.
L1 Data Cache
  2-Way Cache
  Single cycle access
  Size = 16K Bytes
  Linesize = 64 bytes
Level 2 Memory
  C6414/15/16 = 1M Byte
  C6411/DM642 = 256K Byte
The C64x L2 memory is also configurable. When used as cache it always has 4 ways, but you can change the total cache size from 0K bytes (all SRAM) up to 256K bytes (four 64K-byte ways), which on a 1M-byte device leaves 768K bytes of SRAM.
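As a rough sketch of that bookkeeping, the C helpers below compute the per-way size and the SRAM that remains for a chosen L2 cache size. The function names are hypothetical, and the list of accepted sizes (0, 32K, 64K, 128K, 256K) is taken from the workshop's C64x L2 configuration slide.

```c
#include <stdint.h>

/* Hypothetical helpers (not TI APIs). All sizes are in bytes. */

/* Returns 1 if cache_bytes is one of the L2 cache sizes named on the slide. */
int c64x_l2_cache_size_ok(uint32_t cache_bytes)
{
    return cache_bytes == 0u ||
           cache_bytes == 32u * 1024u ||
           cache_bytes == 64u * 1024u ||
           cache_bytes == 128u * 1024u ||
           cache_bytes == 256u * 1024u;
}

/* L2 cache is always 4-way on the C64x, so each way is a quarter of it. */
uint32_t c64x_l2_way_bytes(uint32_t cache_bytes)
{
    return cache_bytes / 4u;
}

/* Whatever is not used as cache stays addressable SRAM. */
uint32_t c64x_l2_sram_bytes(uint32_t total_l2_bytes, uint32_t cache_bytes)
{
    return total_l2_bytes - cache_bytes;
}
```

On a 1M-byte device (0x00100000 bytes), a 256K-byte cache leaves 0x000C0000 bytes of SRAM.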
C64x L2 Memory
Configuration
  When cache is enabled, it's always 4-Way
  This differs from C671x
  L2 Ways are Configurable in Size: 0, 32K, 64K, 128K, 256K
Linesize
  Linesize = 128 bytes
  Same linesize as C671x
Performance
  L2 to L1P: 1-8 Cycles
  L2 to L1D:
    L2 SRAM hit: 6 cycles
    L2 Cache hit: 8 cycles
    Pipelined: 2 cycles
Configuring External Memory as Cacheable (MAR)

What do you do if you have an address in memory that you don't want to be cached? Why would you want to do this? Consider the value stored in a FIFO, a register in an FPGA, or a parallel A/D converter. If you cached it, the device would only be read once and every later read would return the same cached copy, which is probably not what you want. The Memory Attribute Registers allow you to control which addresses in the memory map are cached and which are not. So, if you have a FIFO in your system, you can place it in an address region that uses a separate MAR bit and turn off that MAR bit. Then, you can turn on the MAR bits for the other memories in your system that you do want to cache.
Memory Attribute Regs (MAR)
Use MAR registers to enable/disable caching of external ranges
Useful when external data is modified outside the scope of the CPU
You can specify MAR values in Config Tool
C671x: 16 MARs, 4 per CE space, each handles 16MB
C64x: 256 MARs, 16 per CE space, each handles 16MB (on current C64x, some are rsvd)
MAR bit values: 0 = Not cached, 1 = Cached
[Figure: example with MAR4 = 0, MAR5 = 1, MAR6 = 1, MAR7 = 1, shown against a memory map with CE0, CE2, CE3 and a Reserved region]
The processor reset value for the MAR bits is zero. This means that when the processor wakes up, all of the external memory is uncacheable. What do you think this will do to the performance of your system if you are using cache (and you should)? Let's just say you probably won't be pleased. So, one of the first things that you need to do is turn on the MAR bits for the memory regions that you need to cache.
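To see which MAR governs a given external address, the sketch below works through the 16MB-per-MAR arithmetic in C. It assumes a C64x-style layout in which the 256 MARs tile the address space in order starting at address 0 (256 x 16MB = 4GB); that ordering is an assumption here, and the C671x, with its 16 MARs grouped per CE space, would need a different mapping. The function names are hypothetical.

```c
#include <stdint.h>

#define MAR_REGION_BYTES (16u * 1024u * 1024u)   /* each MAR handles 16MB */

/* Assumption: MARs are numbered in address order, MAR0 covering address 0. */
uint32_t mar_index_for_address(uint32_t addr)
{
    return addr / MAR_REGION_BYTES;
}

/* Number of 16MB MAR regions touched by a buffer of nbytes starting at base. */
uint32_t mar_regions_spanned(uint32_t base, uint32_t nbytes)
{
    uint32_t first, last;

    if (nbytes == 0u)
        return 0u;
    first = base / MAR_REGION_BYTES;
    last = (base + (nbytes - 1u)) / MAR_REGION_BYTES;
    return last - first + 1u;
}
```

Under that assumption, a 16MB SDRAM that starts on a 16MB boundary needs only one MAR bit turned on, while a buffer that straddles a boundary needs two.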
The Configuration Tool makes this easy. You can use a mask in the Configuration Tool to set the MAR bits that you want to enable, and it will be done for you.
Setting MARs in CDB
[Screenshot: Configuration Tool MAR settings; MAR0 = 00000001, MAR1 through MAR15 = 00000000]
MAR bit values: 0 = Not cached, 1 = Cached
If your code is running a lot slower than you thought it would, you might want to check the MAR bit settings. These bits are set to zero at reset and the default configuration files usually leave them that way. So, if you want cache enabled in your system, you need to turn these bits on.
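The mask idea can be sketched in a few lines of C. This is only an illustration: mar_bitmask is a hypothetical helper, and the assumption that bit n of the mask corresponds to the n-th MAR of the group shown in the Configuration Tool is ours, so check the tool's field description before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: build a 16-bit mask with `count` consecutive bits set,
 * starting at bit `first`. Assumption: bit n of the mask stands for the n-th
 * MAR of the 16-MAR group being configured. */
uint16_t mar_bitmask(unsigned first, unsigned count)
{
    uint32_t mask = 0u;
    unsigned bit;

    for (bit = first; bit < first + count && bit < 16u; bit++)
        mask |= (uint32_t)1u << bit;
    return (uint16_t)mask;
}
```

Enabling only the first MAR gives 0x0001, and enabling all sixteen gives 0xffff.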
Where To Go For More Information

We just can't cover everything that you might want to know about these important subjects in a one-day workshop. Here is a good list of other places that you can go for more information.
Optimizing C Performance
Attend the 4-day C6000 Optimization Workshop
Review the Compiler Tutorial
  See tutorials in CCS online help, or
  http://www.ti.com/sc/c6000compiler
Read:
  C6000 Programmer's Guide (SPRU198)
  Cache Memory User's Guide (SPRU656)
  C6000 Optimizing C Compiler User's Guide (SPRU187)
Look through the many application notes at:
  http://www.dspvillage.com
All the options are detailed in the TMS320C6000 Optimizing C Compiler User's Guide. It's highly recommended that you take time to read through the entire manual. OK, we know that reference manuals can be boring (and this one isn't any different), but the information you gain will be worth it.
Also recommended is the TMS320C6000 Programmer's Guide. It contains code optimization details for C, Linear Assembly, and standard assembly programming. The TMS320C6000 Compiler Tutorial is an invaluable reference. You can find this excellent resource at the TI website, http://www.ti.com/sc/c6000compiler, or built into the Code Composer Studio tutorials.
The Cache Memory User's Guide is an excellent resource for everyone who needs to know more about cache and how to use it in a DSP system. It includes examples of how to make your code go faster in a cache-based architecture. Finally, there are several great application notes on our web site. These notes go into detail about specific subjects to help solve common problems.
LAB 4: Using C
Lab 4
Build project with Image Correlation function
Compare performance between:
  1. Without Optimization
  2. With Compiler's Optimizer
  3. Using IMGLIB Function
  4. With and Without Cache
This lab has several goals:
  Build a project using C source files.
  Benchmark/profile C code.
  Contrast results for both sets of suggested compiler options:
    Debug build configuration
    Release (Optimized) build configuration
  Call optimized Assembly routine from a library
  Examine the effects of cache on this optimized application
Image Correlation
We're going to use an image correlation algorithm in the lab. Here's some more information on this algorithm for those of us that haven't had much exposure to it.
Lab 4 - 3x3 Correlation
[Figure: a 3x3 mask of 8-bit pixels (T) being searched for in an image of 8-bit pixels whose axes are labeled 128 and 64]
Search image for pixel location of mask
Step through entire image processing each 3x3 pixel block
Basically, image correlation multiplies the mask, element by element, with each 3x3 block in the image and sums the nine products. The result of each summation is written to an array. The best match is the largest value in the array.
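For readers who prefer code to pictures, here is a plain C sketch of the summation just described. It is not the lab's corr_3x3 source and not the IMGLIB routine; the function names, parameter order, and output layout (one result per 3x3 block, row by row) are assumptions made for illustration.

```c
/* Hypothetical reference routine: correlate a 3x3 mask with every 3x3 block
 * of a width x height image of 8-bit pixels. out must hold
 * (width - 2) * (height - 2) results, stored row by row. */
void corr3x3_ref(const unsigned char *img, int width, int height,
                 const unsigned char mask[3][3], int *out)
{
    int out_w = width - 2;
    int out_h = height - 2;
    int r, c, i, j;

    for (r = 0; r < out_h; r++) {
        for (c = 0; c < out_w; c++) {
            int sum = 0;
            /* Multiply mask and image block element by element, then sum. */
            for (i = 0; i < 3; i++)
                for (j = 0; j < 3; j++)
                    sum += img[(r + i) * width + (c + j)] * mask[i][j];
            out[r * out_w + c] = sum;
        }
    }
}

/* Index of the largest correlation value, i.e. the best match.
 * Returns -1 for an empty array; ties resolve to the first occurrence. */
int best_match_index(const int *out, int n)
{
    int best = (n > 0) ? 0 : -1;
    int k;

    for (k = 1; k < n; k++)
        if (out[k] > out[best])
            best = k;
    return best;
}
```

Dividing the returned index by the output width gives the row of the best match, and the remainder gives its column. Because this plain sum is not normalized, very bright image regions can also score highly; the sketch simply follows the description above.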
Here's an example of how image correlation is used in the lab.
[Figure sequence on the same image (axes labeled 128 and 64): No Match, Partial Correlation, Better Fit, Match!]
Create the Lab4 project

1. Create a new project called LAB4.PJT in the C:\c60001day\labs\lab4 subdirectory.
Project → New
You'll encounter the Project Creation dialog. Fill in the Project Name and Location as shown:
If using the C6713 DSK, the target should read TMS320C67XX. You can also use the button to specify the correct path.
2. Verify the newly created project is open in CCS by clicking on the + sign next to the Projects folder in the Project View window.
You may want to expand the lab4.pjt icon to observe the project contents. (Of course, it should be empty at this time.)
Create a CDB file

3. Create a new CDB file (File → New → DSP/BIOS Configuration).
4. Select the appropriate CDB template for your system.
Select the dsk6416.cdb or dsk6713.cdb template.
5. While there are many objects displayed in the CDB file, we don't need to change any of them at this time. The defaults will work fine.
6. Save your CDB file as LAB4.CDB in the C:\c60001day\labs\lab4 directory.
File → Save As
7. Add the following files from C:\c60001day\labs\lab4 to your project:
  LAB4.C
  LAB4.CDB
  LAB4cfg.CMD
When these files have been added, the project should look like:
Modify the Debug Configuration

8. For easy debugging, use the Debug configuration; this should be the default.
Verify that Debug is in the Project Configurations drop-down box.
9. Before building, though, we need to add a symbol definition to the Debug configuration.
In this lab we plan to use the hardware timer to benchmark our performance. (This is just one of many ways to do this within CCS.) We will use the Chip Support Library (CSL) discussed back in Chapter 2 to set up and use the timer. CSL requires that the chip type is defined in your project so the proper code can be extracted from the library. Use the following three steps to modify your project configuration:
  Project → Build Options
  Select the Preprocessor category on the left-hand side.
  Add CHIP_6416 to the Define Symbols text box. It should now look like this:
If using the C6713 DSK, the symbol should be: CHIP_6713
Basically, the -d option is like adding a project-wide: #define CHIP_6416 1
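As a small illustration of what a project-wide symbol makes possible, the C sketch below selects code at compile time based on which CHIP_ symbol is defined. The function selected_chip is hypothetical and is not part of CSL; it only shows the preprocessor mechanism that the library headers can rely on.

```c
/* Hypothetical example: report which chip symbol the project defined.
 * Building with -d"CHIP_6416" or -d"CHIP_6713" selects the matching branch. */
const char *selected_chip(void)
{
#if defined(CHIP_6416)
    return "C6416";
#elif defined(CHIP_6713)
    return "C6713";
#else
    return "none";   /* no CHIP_ symbol was defined for this build */
#endif
}
```

If neither symbol is defined, the last branch is compiled, which is a convenient way to spot a missing Define Symbols entry.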
10. Turn on the large memory model.
Add -ml3 (lowercase "m" and "l", followed by the digit 3) to the build options by typing it in the text box at the top of the Build Options dialog. You can always add options by simply typing them into the text box, if you already know them.
11. Click OK to apply the changes that you have made and close the Build Options dialog.
Building the program (.OUT)

12. Build the program.
Project → Build
13. Load the program if this did not happen automatically.
Run the Code

Now that the program has been written, compiled, and loaded, it's time to determine its logical correctness.
14. Based on the CCS options we set in Lab 1, after loading the program CCS should have automatically run your code until it reached main.
If this did not happen, go ahead and perform it using Debug → Go Main.
15. Feel free to take a moment to inspect the code. You'll notice that we are using the CSL functions that we discussed earlier in this workshop to calculate how long it takes to run our image correlation example. We are using printf() calls to output these results so that we know how our code is performing.
16. Run the code to see the output of the printf() calls.
Debug → Run, or click the Run toolbar button.
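The benchmark in lab4.c reads the timer through CSL; that code is not reproduced here. The sketch below only shows the arithmetic that typically follows: turning two timer readings into a CPU cycle count. The function name, the overhead parameter, and the idea of passing the CPU-cycles-per-timer-tick ratio as an argument are assumptions for illustration; the ratio depends on the device, so take it from your device's documentation.

```c
#include <stdint.h>

/* Hypothetical helper: convert two timer readings into CPU cycles.
 *   start, stop          timer counts read before and after the code under test
 *   overhead_ticks       ticks measured for an empty start/stop pair
 *   cpu_cycles_per_tick  device-dependent ratio (an assumption; check your
 *                        device's documentation)
 * Unsigned subtraction gives the right answer even if the counter wrapped
 * around once between the two readings. */
uint32_t timer_ticks_to_cycles(uint32_t start, uint32_t stop,
                               uint32_t overhead_ticks,
                               uint32_t cpu_cycles_per_tick)
{
    uint32_t ticks = stop - start;

    if (ticks > overhead_ticks)
        ticks -= overhead_ticks;
    else
        ticks = 0u;
    return ticks * cpu_cycles_per_tick;
}
```

For example, readings of 100 and 1100 with no overhead and a ratio of 8 correspond to 8000 CPU cycles.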
17. The output prints to the Stdout window in Code Composer Studio. Here is what the output should look like. Note, though, that the cycle counts shown are for the C6416 DSK. If you are using the C6713 DSK, your cycle numbers will be different.
Number of data: number of pixels that were calculated.
Number of cycles: number of CPU cycles it took to do the correlation.
The next line tells us where the template image is in the original image. The highest correlation found should match the template location if everything worked correctly.
18. Write down the number of cycles needed for this unoptimized C code in the Lab 4 Results table.
19. If CCS is still running, please halt it.
20. If you'd like to see the image that you just searched, Code Composer can show it to you with its graphing capability. In order to save you time, we have saved the graph in a workspace. If you'd like to see the image, just load the workspace:
File → Workspace → Load Workspace
Choose the appropriate Workspace in C:\c60001day\labs\lab4. Use the pink data cursor to make sure that the template match is where the correlation algorithm said it was.
Note: You can right-click on the image and choose properties if you'd like to see how this was set up. We simply used View → Graph → Image and filled in the values that you see in the properties box.
Optimized C
Now that we've seen how long it takes to execute the unoptimized C, let's take a look at how fast this code runs when the optimizer is turned on.
Change the Release Configuration

Code Composer Studio ships with two default configurations: Debug and Release. We've seen what Debug does, so let's use the Release configuration to turn on the C optimizer and see how fast it can make this piece of code. 21. Change the build configuration to Release by selecting it in the drop-down as shown:
22. We need to make a couple of simple changes to the configuration before we can use it. You'll probably recognize these changes since they are the same ones that we made earlier. Open the project build options.
Project → Build Options
23. Add the following two options to the build options for the Release configuration just like you did earlier. Either use the text box at the top or the GUI at the bottom to add these options:
  -d"CHIP_6416" and -ml3   (for the C6416 DSK)
  -d"CHIP_6713" and -ml3   (for the C6713 DSK)
Your options should now look like this:
Note: The biggest difference between Release and Debug is that Release turns on the Optimizing C Compiler with the -o3 option and turns off symbolic debugging by removing the -g option.
Build and Run the code

24. Build your code.
Project → Build
25. If your program doesn't load automatically, then load the program.
File → Load Program
Note: You will probably not see main() come up like it has in the past. Since the source level debug option (-g) has been turned off, CCS cannot open source files like the one that contains main().
26. Run the program. If everything is working correctly, you should see a printout similar to the one that you saw in the first part of this lab. If you're having trouble, ask your facilitator for help.
27. Record the new results for the optimized C code in the Lab 4 Results table. Were these results better or worse than you expected them to be?
_________________________________________________________________________
Using ASM Libraries

The jump in performance that the correlation routine gets by turning on the C optimizer is pretty impressive and may be all the performance that you need to meet your real-time deadlines. Sometimes all people need to get the necessary performance from the C6000 is to turn on the optimizer, and we've just seen how easy this is to do. If you're happy with the performance that you get from the C compiler, then you can stop there. However, if you have a really time-critical routine that you need to squeeze every cycle out of, you may want to use an assembly language (ASM) routine. To this end, we'll learn how to use one of the many optimized routines that TI provides. The Image Library (or IMGLIB), provided by TI, has a correlation routine optimized in ASM that we'll compare to our C code.
Referencing the library function

28. The first step to calling the code from IMGLIB is to change the function that we are calling. The function that we want to call in the library is called: IMG_corr_3x3. The C function that we're replacing is called: corr_3x3.
So, all you need to do is find all places where corr_3x3 is used in the code and change it to IMG_corr_3x3. You should only need to do two replacements:
29. Now that we have replaced the C function calls with IMGLIB function calls, let's go ahead and comment out the actual corr_3x3 C function. This way, if we've missed changing any references to it, we should get a build error.
30. The next thing we need to do in order to call the function in the library is to include a header file. Find the comments at the beginning of lab4.c that talk about including the header files for the Image Library and add the following line of code:
#include "img_corr_3x3.h"
31. We also need to let CCS know where it can find the above header file. We can do this by adding the following to our build options under Project → Build Options.
  -i"C:\ti\c6400\imglib\include"   (for the C6416 DSK)
  -i"C:\ti\c6200\imglib\include"   (for the C6713 DSK)
32. The last step to calling the ASM routine is to actually add the library to your project.
Project → Add Files to Project
33. In the following dialog box, navigate to the correct folder and add the appropriate library file to your project.
  C:\ti\c6400\imglib\lib\img64x.lib   (for the C6416 DSK)
  C:\ti\c6200\imglib\lib\img62x.lib   (for the C6713 DSK)
Yes, it may appear odd that we are using the C62x library for the C6713, but since C67x devices can run C62x object code, this works out fine.
Hint: You may have to change the "Files of type" drop-down box at the bottom of the Add Files dialog to see the library files.
Rebuild and Run the Code

34. Now that you've made all of the changes necessary to use the library code, go ahead and rebuild everything.
Project → Build, or click the build toolbar button.
35. If your program doesn't automatically load, go ahead and load it.
File → Load Program
36. When you're ready, run the code to see how fast it executes (and to make sure that it is accurate).
37. Record your results in the Lab 4 Results table.
Lab 4 Results
Here's a summary of the results that we've obtained from lab 4. These are the results with all code and data located in the internal memory of the C6000. In Lab 4a, we'll explore the effects that memory organization and cache can have on this system.
Lab Step          Build Configuration    Cycles
Lab 4 Step 18     Debug                  __________
Lab 4 Step 27     Release                __________
Lab 4 Step 37     IMGLIB                 __________
If you have time left, please move on to lab 4a.
Lab 4a: Memory and Cache

In this lab, we'll take up where we left off with the IMGLIB example. We're going to explore how to move stuff around in memory and enable the cache to see how these things affect performance.
Everything Off-chip
Let's start off by moving everything (code and data) from on-chip memory, where it is now, to off-chip memory. We're also going to leave the cache turned off for now to see what the absolute worst case performance of this code might be.
1. We're going to use the Configuration Tool to change the memory configuration. So, open the lab4.cdb file.
2. The .cdb file is broken down into different categories to make it easy to set up. We need to use the Memory Manager, which is located in the System folder.
3. You can see the three different kinds of memory that we have available to us on the DSK by clicking on the plus sign next to the Memory Manager. You should see ISRAM, FLASH, and SDRAM for the 6416 DSK, and CACHE_L2, IRAM, and SDRAM for the 6713 DSK.
4. We need to change the properties of the SDRAM so that it can store code and data. Currently, it is only set up to store data. Open the SDRAM properties by right-clicking on it and choosing properties.
5. You should see the box below. Change the "space" option from data only to code/data. When complete, the dialog should look like one of the following:
When you're done, click OK.
6. Now we can move everything from ISRAM (or IRAM) to SDRAM. Open the properties of the Memory Manager itself by right-clicking on it and selecting properties.
7. You should now see a window with five tabs. We don't need to do anything with the General tab, so go ahead and select the next tab, BIOS Data. We want to move everything on this tab from ISRAM (or IRAM) to SDRAM by clicking on each one of the drop-down boxes. When you get finished with the BIOS Data tab, the dialog should look like this:
8. Click on the BIOS Code tab and make the same changes. When you are done with this tab, the window should look like this:
9. We also need to make the same change in the Compiler Sections tab. Select this tab and move everything to SDRAM. Here's what you'll be left with:
10. (6713 ONLY) When you create a DSKC6713 CDB file, the cache is enabled automatically (whereas it is disabled by default in the DSKC6416 CDB file). Therefore, to get an accurate no-cache comparison, we need to turn off the cache by making all external memory non-cacheable. (C6416 users can ignore steps 10 and 11.) The Config Tool makes this easy.
Find and open the Global Settings under System by right-clicking on it and selecting properties. Select the 621x/671x tab. Make the dialog box look like this:
11. (6713 ONLY) Modify the C6713 MAR bit settings to make external memory non-cacheable:
12. Now that everything has been moved off-chip, we can rebuild the code and run it to see what the effect was. Rebuild the code first.
13. If your program doesn't automatically load, go ahead and load it.
14. When you're ready, run the code.
15. Record the results of running everything from off-chip memory in the Lab 4a Results table.
Use Some Cache (L1)

When we moved everything off-chip, our code slowed down considerably. So, how do we speed it back up? Well, we could just move it back on-chip, but what if we don't have space for everything? That's where cache comes in. It helps keep the stuff that the CPU needs close at hand, just like a valet brings your car close to you.
16. In order to use the L1 caches to help us speed our code up, we need to tell the C6000 that the addresses that our code and data are using are cacheable. We do this by enabling the Memory Attribute Register (MAR) bits. (We discussed the MAR bits earlier, under "Configuring External Memory as Cacheable (MAR)".)
17. Again, the Config Tool makes this easy. Find the Global Settings under System.
18. Open the Global Settings properties by right-clicking on it and selecting properties.
19. (6416 ONLY) Modify the C6416 MAR bit settings:
Select the 641x tab.
Check the check box that says "641x Configure L2 Memory Settings".
We need to turn on the MAR bits for the EMIFA, CE0 memory space. Do this by changing the appropriate text box from 0x0000 to 0xffff. It should now look like this:
20. (6713 ONLY) Modify the C6713 MAR bit settings:
Select the 621x/671x tab.
Check the check box that says "621x/671x Configure L2 Memory Settings".
Let's turn on all the MAR bits for the EMIF. Do this by changing the appropriate text box from 0x0000 to 0xffff. It should now look like this:
21. When you've made the changes, click OK.

22. Now that the L1 caches have been enabled, we can rebuild the code and run it to see what the effect was. Rebuild the code.
Project → Build, or click the build toolbar button.
23. If your program doesn't automatically load, go ahead and load it.
File → Load Program
24. When you're ready, run the code.
25. Record the results of running everything with L1 cache in the Lab 4a Results table.
Use All the Cache

So far, we've only turned on the L1 caches. If we want the full benefit of the cache, we need to enable both L1 and L2.
26. So, how do we turn on the L2 cache? You guessed it, we use the Configuration Tool.
27. Open the properties of the Global Settings just like we did before.
28. (6416 ONLY) Select the 641x tab. Change the "641x L2 Mode CCFG (L2MODE)" drop-down box from "4-way cache (0K)" to "4-way cache (256K)". It should look something like this:
Click OK when you are done.
Note: This setting is a little confusing. When most people see "4-way cache", they assume that the cache is on. Technically it is, but cache ways that are 0K in length don't do much caching!
29. (6416 ONLY) Save the changes that you have made to the .cdb file by making sure that it is selected and choosing File → Save.
30. (6416 ONLY) You should see the following box appear:
The problem is that we took some of the internal memory away when we turned on the L2 cache. The CACHE_L2 memory segment was added to account for this change. We need to remove 256KB from the ISRAM section to allow for the cache. Click OK in the box so we can go fix the problem.
31. (6416 ONLY) Right-click on the ISRAM memory segment and choose properties.
32. (6416 ONLY) Change the "len:" property from 0x00100000 to 0x000c0000. This simply removes the 256KB (0x00040000) that is now being used for L2 cache from the ISRAM segment.
33. (6713 ONLY) Select the 621x/671x tab. Change the L2 Mode CCFG (L2MODE) drop-down box on this tab to its 4-way cache setting. It should look something like this:
Click OK when you are done.
34. Save the changes that you have made to the .cdb file by making sure that it is selected and choosing File → Save.

35. Now that the L1 and L2 caches have been enabled, we can rebuild the code and run it to see what the effect was. Rebuild the code.
Project → Build, or click the build toolbar button.
36. If your program doesn't automatically load, go ahead and load it.
File → Load Program
37. When you're ready, run the code.
38. Record the results of running everything with L1 and L2 cache in the Lab 4a Results table.
Cache Re-use
Cache memories perform best when the information that they are caching is used many times. So far, we've only looked at cache in the worst possible scenario, when the data is only used once. If we're going to use the code/data again and again, then the cache can really start to help us. To see this, let's call the image correlation twice on the same image and see if the performance improves on the second call.
39. Open lab4.c.
40. Scroll down to the code where we actually time the image correlation algorithm with the timer. The code should look something like this:
[Screenshot: timing code in lab4.c, annotated "Copy IMG_corr_3x3 line to here"]
41. Copy the line of code that calls IMG_corr_3x3() and paste it above the line that starts with start (isn't that redundant?). This way, the algorithm gets called and everything gets brought into cache, then we call it again to see what the real benefit of cache would be.
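The effect of that warm-up call can be imitated on any host with a toy model. The C sketch below simulates a small direct-mapped cache and counts misses when the same buffer is read twice. It is a simplified illustration, not a model of the real C6000 caches: the sizes (64 lines of 64 bytes) and all names are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define SIM_LINES      64u   /* number of cache lines (assumed) */
#define SIM_LINE_BYTES 64u   /* bytes per line (assumed) */

/* Toy direct-mapped cache: one remembered line number per cache line. */
typedef struct {
    uint32_t line_no[SIM_LINES];
    int      valid[SIM_LINES];
} SimCache;

void sim_reset(SimCache *c)
{
    memset(c, 0, sizeof *c);
}

/* Returns 1 on a miss (and fills the line), 0 on a hit. */
int sim_access(SimCache *c, uint32_t addr)
{
    uint32_t line = addr / SIM_LINE_BYTES;
    uint32_t idx = line % SIM_LINES;

    if (c->valid[idx] && c->line_no[idx] == line)
        return 0;
    c->valid[idx] = 1;
    c->line_no[idx] = line;
    return 1;
}

/* Read every byte of a buffer once and return the number of misses. */
uint32_t sim_read_buffer(SimCache *c, uint32_t base, uint32_t nbytes)
{
    uint32_t misses = 0u;
    uint32_t a;

    for (a = base; a < base + nbytes; a++)
        misses += (uint32_t)sim_access(c, a);
    return misses;
}
```

In this model, reading a 2KB buffer for the first time costs 32 misses (one per line), and reading it again costs none. A buffer twice the size of the cache misses on every line both times, which is why data layout matters as much as turning the cache on.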

42. Now that the L1 and L2 caches have been enabled and stuff is sitting in cache, we can rebuild the code and run it to see what the effect was. Rebuild the code.
Project → Build, or click the build toolbar button.
43. If your program doesn't automatically load, go ahead and load it.
File → Load Program
44. When you're ready, run the code.
45. Record the results of running with everything already in cache in the Lab 4a Results table.
Lab 4a Results
Here's a summary of the results that we've obtained from lab 4a.
Lab Step           Memory Configuration           Cycles
Lab 4a Step 15     All Off-chip                   __________
Lab 4a Step 25     L1 Cache On                    __________
Lab 4a Step 38     L1 and L2 Cache On             __________
Lab 4a Step 45     Everything Already in Cache    __________
End of Lab4a Exercise

Please inform your facilitator that you have finished. Thank You.
Optional Topics
Cache Data Coherency
CPU Reading External Memory
[Figure: CPU, L1D, L2 and External memory; the EDMA writes RcvBuf in external memory]
Buffer (located in external memory) written to by the EDMA

CPU Reading External Memory
[Figure: copies of RcvBuf now also held in L1D and L2]
CPU reads the buffer for processing
CPU read causes a cache miss in L1D and L2 (assuming L2 cache is on)
RcvBuf is added to both caches
  Space is allocated in each cache
  RcvBuf data is copied to both caches
Read Data Coherency
[Figure: RcvBuf cached in L1D and L2 while the EDMA writes new data to RcvBuf in external memory]
EDMA writes new data to buffer
When the CPU reads RcvBuf again, what will happen?
CPU gets old data!

Read Data Coherency - Solutions
1. Locate buffer in L2
   No coherency issues between L1 and L2
   Whenever L1 or L2 are read, the other is checked to make sure there isn't newer data
   [Figure: EDMA writes RcvBuf in L2; L1D holds a copy; external memory is not involved]
2. Invalidate (remove) RcvBuf from cache before receiving new data
   CSL provides cache invalidate routines
   [Figure: cached copies of RcvBuf in L1D and L2 are removed; EDMA writes RcvBuf in external memory]
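The stale-data problem and the second solution can be acted out with a toy model in C. This sketch does not use the CSL cache routines and does not model real hardware; the ToyWord type and the toy_ functions are hypothetical names that stand in for one word of RcvBuf, its cached copy, the EDMA write, the CPU read, and the invalidate operation.

```c
#include <stdint.h>

/* Toy model of one word of RcvBuf and its cached copy. */
typedef struct {
    uint32_t external;  /* value in external memory (what the EDMA writes) */
    uint32_t cached;    /* copy held in cache */
    int      valid;     /* 1 if the cached copy will be used by the CPU */
} ToyWord;

/* The EDMA writes external memory only; the cache is not told about it. */
void toy_dma_write(ToyWord *w, uint32_t value)
{
    w->external = value;
}

/* A CPU read: on a miss the data is fetched and cached, on a hit the
 * cached copy is returned even if external memory has changed since. */
uint32_t toy_cpu_read(ToyWord *w)
{
    if (!w->valid) {
        w->cached = w->external;
        w->valid = 1;
    }
    return w->cached;
}

/* Invalidate: throw away the cached copy so the next read misses. */
void toy_invalidate(ToyWord *w)
{
    w->valid = 0;
}
```

In this model, a second EDMA write is invisible to the CPU until the cached copy has been invalidated; invalidating before the new data arrives, as the slide suggests, makes the next read fetch the fresh value.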
Advanced Optimizations (Brief List)
Advanced Optimizations
(Other than the techniques discussed here)
Let EDMA move data (or code) on-chip before needed
  Data is on-chip when it's needed
  EDMA gets better transfer performance than CPU due to its ability to perform burst transfers
  Minimize back-to-back reads and writes to/from off-chip memory
Compiler intrinsic functions
Program Level Optimization: -pm -op2 -o3
Various compiler pragmas:
  #pragma UNROLL(# of times to unroll);
  #pragma MUST_ITERATE(min, max, %);
  #pragma DATA_ALIGN(variable, 2^n alignment);
  #pragma DATA_MEM_BANK(var, 0 or 2 or 4 or 6);
Layout system and tune code for best cache usage
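As a rough illustration of how three of the pragmas listed above might appear in source, here is a minimal sketch of a dot-product loop. The function name, the array name, and the chosen trip-count and alignment values are assumptions made for this example. The pragmas are guarded by `_TMS320C6X` (the macro predefined by the TI C6000 compiler) so that the same file also builds with a host compiler, where they are simply skipped.

```c
#include <assert.h>

#define N 64

/* Ask for 8-byte alignment of the coefficient array (the value 8 is illustrative). */
#ifdef _TMS320C6X
#pragma DATA_ALIGN(coeffs, 8)
#endif
short coeffs[N];

/* Dot product of two 16-bit arrays; count must be a nonzero multiple of 8. */
int dot_product(const short *a, const short *b, int count)
{
    int sum = 0;
    int i;

    /* Runtime check of the same promise the pragma makes to the compiler. */
    assert(count >= 8 && count % 8 == 0);

#ifdef _TMS320C6X
#pragma MUST_ITERATE(8, , 8) /* at least 8 trips, no stated maximum, multiple of 8 */
#pragma UNROLL(4)            /* suggest unrolling the loop four times              */
#endif
    for (i = 0; i < count; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The pragmas only pass information to the optimizer; the computed result is the same with or without them, which makes it easy to compare cycle counts before and after.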
4 - 48
Wrap Up
Introduction
What do you need to put around your DSP? Most microprocessors require some support chips: power management, clock drivers, bus interface, and so on. DSP systems usually contain additional devices, such as sensors and data acquisition, because they receive, modify, and output real-world signals. Finally, pull out your DSP Selection Guide and C6000 Product Update sheet to follow along with the last part of the workshop, which summarizes the C6000 devices, tools, and support.
Outline
Chapter Outline
What Goes Around a DSP?
  Linear Products
  Logic Products
C6000 Summary
Hardware Tools
Software Tools
What's Next?
C6416/C6713 DSK One-Day Workshop - Wrap Up
5-1
What goes around a DSP?
5-2
Chapter Topics
Wrap Up ............................ 5-1
  What goes around a DSP? .......... 5-4
    Linear ......................... 5-4
    Logic .......................... 5-8
  C6000 Summary .................... 5-12
  Hardware Tools ................... 5-13
  Software Tools ................... 5-17
  What's Next? ..................... 5-18
    Before Leaving ................. 5-22
5-3

Linear
Surround DSP with TI Products
DSP
Data Converters
  Analog-to-Digital Converters (ADC)
    Analog input to digital output
    Output is typically interfaced directly to DSP
  Digital-to-Analog Converters (DAC)
    Digital input to analog output
    Input interfaces directly to DSP
  CODEC
    Data converter system
    Combination of ADC and DAC in single package
Power Management
  Power Modules: complete power solutions
  Linear Regulators: regulated power for analog and digital
  DC-DC Controllers: efficient power isolation
  Battery Management: for portable applications
  Charge Pumps & Boost Converters: portable applications
  Supervisory Circuits: to monitor processor supply voltages and control reset conditions
  Power Distribution: controlling power to system components for high efficiency
  References: for data converter circuits
5-4
A Real-Time DSP-Based System
Analog Circuits Considerations
[Figure: a digital core (MSP430/DSP/uP/FPGA/ASIC) surrounded by signal-conditioning op-amps, ADC and DAC data conversion, data transmission to another system or subsystem, power, and clocks.]

Data Transmission
  Standards: RS232, RS422, RS485, LVDS, 1394/Firewire, USB, PCI, CAN, SONET, Gigabit Ethernet, GTL, BTL, etc.
  Interface speed? (k or M bits per second)
  Distance?
  Standard?
  SERDES? -or- Topology needed? (point to point, multidrop, multipoint)
Op-Amps (Signal Conditioning)
  Supply voltage available?
  Bandwidth required? (kHz or MHz)
  What is the input signal?
  What is the output driving?
  # of channels needed?
  Most important spec(s)?
Power Management
  Do you build your own power solutions, use modules, or both?
  What input voltage(s), and what is the source of these voltages? (wall, battery, AC/DC, etc.)
  What output voltage(s) and output current(s) do you need?
  How would you prioritize size, efficiency, and cost?
  What are the most important parameters in the design? (efficiency, form factor, ripple voltage, tolerance, etc.)
Clocks (Clocking Solution)
  Input frequencies?
  Output frequencies desired, and number of copies necessary?
  Supply voltages available/required?
  Special needs? (low jitter/jitter cleaner? low part-to-part skew? etc.)
Data Converter/AIC/Codec
  Resolution? (bits & ask for ENOB!)
  Speed? (KSPS or MSPS for high speed, kHz or MHz for precision ADCs, µs (settling time) for precision DACs)
  # of channels needed?
  What is it interfacing to? (uC/uP/DSP/FPGA/ASIC)
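The advice to "ask for ENOB" (effective number of bits) can be made concrete with the standard relation between SINAD and resolution for a full-scale sine-wave input: an ideal N-bit converter has SINAD = 6.02·N + 1.76 dB, so ENOB = (SINAD − 1.76) / 6.02. The sketch below applies that textbook formula; the function names are illustrative and the formula is general converter theory, not something taken from this workshop.

```c
#include <assert.h>

/* Ideal SINAD in dB of an N-bit converter with a full-scale sine-wave input. */
double ideal_sinad_db(int bits)
{
    assert(bits > 0);
    return 6.02 * bits + 1.76;
}

/* Effective number of bits implied by a measured SINAD figure in dB. */
double enob(double sinad_db)
{
    return (sinad_db - 1.76) / 6.02;
}
```

For example, a converter sold as 16-bit but specified at 74 dB SINAD delivers roughly 12 effective bits, which is why the datasheet resolution alone is not enough.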
What is Real-Time Signal Processing?
A Typical Real-Time DSP System

[Figure: block diagram of a digital radio. Signal path blocks: RF Front End, ADC, Real-Time Signal Processing Engine, DAC, Power Amp. Supporting blocks: Power, Clock Circuits, Interface Circuits, Control and User Interface. The digital stream carries compressed audio or digital data (music, traffic, weather, stocks).]
5-5
5-6K Analog Interface DSP Daughter-Card
5-6K Interface Card

Plug in analog modules for:
  Data Converters
  Signal Conditioning
  Power Management
Compatible with current C5000 and C6000 series DSKs: C5416, C5510, C6416, C6711, C6713
Interface card has connectors for flexible demos/prototyping:
  2 Signal Conditioning
  2 Serial
  1 Parallel Site
Allows trial of hardware and debugging of software
GPIO access through test points
Flexible clocking / interrupts
http://focus.ti.com/docs/tool/toolfolder.jhtml?PartNumber=5-6KINTERFACE
Analog Cards
Single-width Serial-Interface Card
Double-wide Serial-Interface Card
5-6
5-7
Logic
Welcome to the World of TI Logic

[Figure: TI logic families arranged by supply voltage (5+ V, 3.3 V, 2.5 V, 1.8 V, 1.5 V, 1.2 V, and 0.8 V logic) plus specialty logic. Harris and Cypress logic lines are now TI.]
Families shown: GTLP, GTL, SSTL, BTL, ETL, TVC, F, AVC, AS, CBT, AC/ACT, HC/HCT, ABT, ALS, LVC, CD4000, TTL, S, AHC, AHCT, BCT, FCT, LS, LV, ALVT, ALVC, ALB, LVT, HSTL, SSTV, AUC, CBTLV
Logic Families
[Figure: logic families plotted by IOL drive (mA) against speed, max tpd (ns), grouped by supply voltage: 5 V, 3.3 V, 2.5 V, 1.8 V, 1.2 V, 0.8 V.]
Legend:
  ABT     Advanced BiCMOS Technology
  AC/T    Advanced CMOS
  AHC/T   Advanced High Speed CMOS
  ALB     Advanced LV BiCMOS
  ALVC    Advanced Low Voltage CMOS
  ALVT    Adv LV BiCMOS Technology
  AVC     Advanced Very-LV CMOS
  AUC     Advanced Ultra-LV CMOS
  BCT     BiCMOS Technology
  CBT     Cross Bar Technology
  CBTLV   CBT Low Voltage Technology
  74F     74F Bipolar Technology
  FCT     Fast CMOS Technology
  GTLP    Gunning Transceiver Logic Plus
  HC/T    High Speed CMOS
  LV      Low Voltage HCMOS
  LVC     Low Voltage CMOS
  LVT     Low Voltage BiCMOS Technology
  LS      TTL
5-8
TI Logic Supports Voltage Migration

Vcc
AC* :7.0 ns AHC* :6.5 ns ABT* :4.0 ns LV245 : 6.5 ns AHC* :10 ns LV245 :10 ns LVT* :3.3 ns LVC* :4.0 ns ALVC* :3.0 ns ALVT* :2.4 ns ALB* :2.0 ns AVC * :2.5 ns LV245 : 15 ns LVC* : 4.5 ns ALVC* : 3.7 ns ALVT* : 3.5 ns AVC* : 2.0 ns AUC* : 2.5ns Additional Interface Capabilities 5V - 2.5V LV,LVC,LVCC3245,ALVT 5V - 1.8V LVC 3.3V - 1.8V LVC,AVC
5V
3.3V 2.5V
LV245 :10 ns LVC4245 :6.3 ns LVCC3245 :6.0 ns LVCC4245 :7.0 ns ALVC164245 :5.8 ns LV245 :15 ns LVC* :4.8 ns LVCC3245 :9.4 ns AVC* :2.5 ns
LVC* : 7.1 ns ALVC245 : 6.0ns ns AVC* : 4.0 ns AUC* : 2.0ns
AUC*
: 5.0 ns
1.8V 0.8V
LVC* :4.8 ns AVC* :4.0 ns * 16245 functions
Little Logic
Easy Naming from TI

The Principle: SN74 LVC 1G 00 YEA R
  SN74   Standard prefix (74 = Commercial)
  LVC    Product family: AHC, AHCT, LVC, CBT, AUC
  1G     1G = Single Gate, 2G = Dual Gate, 3G = Triple Gate
  00     Logic function
  YEA    Package type: YEA = NanoStar, YZA = NanoFree, DCK = SC-70, DBV = SOT-23, DCU = US-8, DCT = SM-8
  R      Tape & Reel

Example:
  Single Gate: SN74AHC1G00DCKR, SN74AHCT1G00DBVR
  Dual Gate:   SN74AHC2G00DCTR, SN74AHCT2G00DCUR
  Triple Gate: SN74LVC3G04DCTR, SN74LVC3G04DCUR

Voltages: AHC = 5V, LVC = 3V, AUC = 1.8V
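The Little Logic naming scheme is regular enough to decode mechanically. The sketch below splits a part number into the fields described above. The `LittleLogicPart` structure and `parse_little_logic` function are hypothetical names for this illustration, and the parser assumes the simple pattern shown on the slide (prefix `SN74`, a family of capital letters, a gate count of 1 to 3 followed by `G`, a numeric function code, a three-letter package code, and an optional trailing `R`); real part numbers may carry variations it does not handle.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct {
    char family[8];     /* e.g. "LVC", "AHCT"                  */
    int  gates;         /* 1, 2, or 3                           */
    char function[8];   /* logic function code, e.g. "00"       */
    char package[4];    /* three-letter package code, e.g. "YEA" */
    int  tape_and_reel; /* 1 if the part number ends in 'R'     */
} LittleLogicPart;

/* Returns 0 on success, -1 if the string does not match the pattern. */
int parse_little_logic(const char *pn, LittleLogicPart *out)
{
    size_t n;

    assert(pn != NULL && out != NULL);
    if (strncmp(pn, "SN74", 4) != 0) return -1;
    pn += 4;

    /* Product family: a run of capital letters. */
    n = 0;
    while (isupper((unsigned char)pn[n])) n++;
    if (n == 0 || n >= sizeof out->family) return -1;
    memcpy(out->family, pn, n);
    out->family[n] = '\0';
    pn += n;

    /* Gate count: one digit (1 to 3) followed by 'G'. */
    if (pn[0] < '1' || pn[0] > '3' || pn[1] != 'G') return -1;
    out->gates = pn[0] - '0';
    pn += 2;

    /* Logic function: a run of digits. */
    n = 0;
    while (isdigit((unsigned char)pn[n])) n++;
    if (n == 0 || n >= sizeof out->function) return -1;
    memcpy(out->function, pn, n);
    out->function[n] = '\0';
    pn += n;

    /* Package type: exactly three capital letters. */
    for (n = 0; n < 3; n++) {
        if (!isupper((unsigned char)pn[n])) return -1;
    }
    memcpy(out->package, pn, 3);
    out->package[3] = '\0';
    pn += 3;

    /* Optional 'R' suffix for tape and reel; nothing may follow it. */
    out->tape_and_reel = (pn[0] == 'R');
    if (out->tape_and_reel) pn++;
    return (pn[0] == '\0') ? 0 : -1;
}
```

Parsing `SN74LVC1G00YEAR` with this sketch yields family `LVC`, one gate, function `00`, package `YEA` (NanoStar), and the tape-and-reel flag set.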
5-9
AUC
Features: The World's First 1.8V Logic (NEW FAMILY)
  1.8V optimized performance
  VCC specified @ 2.5V, 1.8V, 1.5V, 1.2V; 0.8V typical
  Balanced drive
  3.6V I/O tolerance
  Bushold (II(HOLD))
  IOFF spec for partial power-down
  ESD protection
  Low noise
  Second source agreements
  Little Logic, Widebus, Octal

Device          VCC      Drive      TPD(MAX)
SN74AUC1G00     1.8 V    -8/8 mA    2.5 ns
SN74AUC16244    1.8 V    -8/8 mA    2.0 ns

Advanced Packaging
  NanoStar - YEA
  SOT 23 - DBV (Microgate)
  SC-70 - DCK (PicoGate)
  TSSOP - PW & DGG
  TVSOP - DGV
  LFBGA - GKE & GKF
  VFBGA - GQL
CHOOSING LOGIC
PRIMARY CONCERN SECONDARY CONCERN
5V
ABT, 74F ABT, 74F ABT, AC/ACT ABT, 74F ABT, 74F ABT ABT, AHC ABT, 74F AHC, ABT ABT, AHC ABT AHC, ABT
3V
2.5V
1.8V
AUC AUC AUC AUC AUC AUC AUC AUC AUC AUC AUC AUC
HIGH DRIVE
ALVT, LVT, ALVC AVC, ALVC, ALVT ALVC, LVT, LVC ALVC, LVT, LVC AVC AVC
HIGH SPEED
LOW NOISE LOW POWER HIGH SPEED
ALVT, LVT, ALVC AVC, ALVC, ALVT LVT LVT ALVC,LVT,LVC,LV LVT
ALVC,LVT,LVC, LV,AHC
HIGH DRIVE
LOW NOISE LOW POWER HIGH SPEED
AVC AVC AVC AVC AVC AVC AVC

AVC
LOW NOISE
HIGH DRIVE LOW POWER
HIGH SPEED
LVT, ALVC
ALVC,ALVT,LVT,LVC ALVC,LVT,LVC.LV
LOW POWER
HIGH DRIVE LOW NOISE
5 - 10
TI FIFOs
MEMORY
TI FIFO 100100... TI FIFO 011001...
TMS320 DSP
TI FIFO
Host Interface
Host Bus
5 - 11
C6000 Summary
TMS320C6000
Easy to Use
  Best C engine to date
  Efficient C Compiler and Assembly Optimizer
  DSP & Image Libraries include hand-optimized code
  eXpressDSP Toolset eases system design
SuperComputer Performance
  1.38 ns instruction rate: 720x8 MIPS (1GHz sampled)
  2880 16-bit MMACs (5760 8-bit MMACs) at 720 MHz
  Pipelined instruction set (maximizes MIPS)
  Eight Execution Unit RISC Topology
  Highly orthogonal RISC 32-bit instruction set
  Double-precision floating-point math in hardware
Fix and Float in the Same Family
  C62x: Fixed Point
  C64x: 2nd Generation Fixed Point
  C67x: Floating Point
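The peak-performance figures above follow from simple arithmetic on the clock rate. The sketch below assumes eight instructions per cycle (one per execution unit), four 16-bit MACs per cycle, and eight 8-bit MACs per cycle, which is what 720x8 MIPS and 2880/5760 MMACs at 720 MHz imply. The function names are illustrative only, and these are peak numbers, not sustained throughput.

```c
#include <assert.h>

/* Peak MIPS: eight execution units, one instruction each per clock cycle. */
unsigned peak_mips(unsigned clock_mhz)
{
    return clock_mhz * 8u;
}

/* Peak 16-bit MMACs, assuming four 16-bit multiply-accumulates per cycle. */
unsigned peak_mmacs_16bit(unsigned clock_mhz)
{
    return clock_mhz * 4u;
}

/* Peak 8-bit MMACs, assuming eight 8-bit multiply-accumulates per cycle. */
unsigned peak_mmacs_8bit(unsigned clock_mhz)
{
    return clock_mhz * 8u;
}

/* Length of one clock cycle in nanoseconds. */
double cycle_time_ns(unsigned clock_mhz)
{
    assert(clock_mhz > 0);
    return 1000.0 / clock_mhz;
}
```

The same multiplication reproduces the MIPS column of the device tables later in this guide: a 200 MHz C6201 gives 1600 MIPS and a 500 MHz C6416 gives 4000 MIPS.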
C6000 Roadmap
[Figure: C6000 roadmap. Axes: performance (rising to "Highest Performance") over time, with object code software compatibility across the family.]
  1st Generation: C6201, C6202, C6203, C6204, C6205, C6211 (fixed point); C6701, C6711, C6712, C6713 (floating point)
  2nd Generation: C6411, C6412, C6414, C6415, C6416, DM642
  Future: C64x DSP at 1.1 GHz, multi-core, floating point
5 - 12
Hardware Tools
C6416 / C6713 DSK Contents
DSK Board
DSK Code Composer Studio CD-ROM*
DSK Technical Reference Guide
* DSK version of CCS requires the DSK to be connected, or CCS cannot start up
Low-Cost Video I/F Demo Platform

(TI Kit# 6444886)
Low-cost video interface demo shows how to connect an inexpensive 'C6000 DSP to a video decoder through a low-cost FPGA.
5 - 13
Tools of the Trade
XDS560
eXtended Development System (XDS)
Industry Standard Connections
  PCI plugs into PC
  JTAG plugs into DSP target board
Download code at up to 500 Kbytes/sec
Advanced Event Triggering for simple and complex breakpoints
Real Time Data Exchange (RTDX) can transfer data at 2 Mbytes/sec
Tools of the Trade
National Instruments LabVIEW

Integrate wide variety of I/O for DSP testing
Share real time DSP data with RTDX
Automate routine Code Composer Studio functions from LabVIEW
LabVIEW
LabVIEW DSP Test Integration Toolkit
LabVIEW Graphical Development For Debug and Diagnostics of DSP software
Code Composer Studio Automate Code Composer Studio Communicate directly to DSP through RTDX
RTDX
5 - 14
Tools of the Trade
Hyperception's VAB
Easy to use graphical tool
Hierarchical:
  Can write code graphically (down to ASM level instr.)
  One worksheet can become block in another worksheet
Block/Component Wizard:
  You can create an optimized VAB bldg block
  Create XDAIS algorithms
  If desired, wrap PC interface into standalone EXE
Outputs:
  Directly to DSP
  Burn program to Flash with single-click
  Create an .OUT file
  Create Relocatable Object file (i.e. library) to use in CCS
Tools of the Trade
MATLAB CCS Plug-in
Capabilities:
  DSP program control, memory access, and real time data transfer with RTDX
  MATLAB automates testing and provides advanced analysis
  Function call support enables hardware-in-loop simulation and debugging
  C28x / C5000 / C6000 support
  Supports XDS560 and XDS510
  Integrated with MATLAB design environment for a complete design solution
5 - 15
Tools of the Trade
Altera FPGA Daughter Card
FPGA development system fits standard DSK daughter card sockets
Contains Altera FPGA software, including the powerful SOPC Builder (shown above)
After designing and burning the FPGA, the DSP can talk to the FPGA via memory-mapped addresses (SOPC Builder creates a C header file)
For more info:
http://www.altera.com/products/devkits/altera/kit-dsp_stratix.html
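Talking to the FPGA "via memory-mapped addresses" usually comes down to reading and writing through a volatile pointer. The following is a minimal sketch of that pattern. The register names, offsets, and helper functions are hypothetical; in a real design the addresses would come from the C header file that SOPC Builder generates, and the base pointer would refer to the daughter-card address space instead of an ordinary array.

```c
#include <assert.h>

/* Hypothetical register indices; real ones come from the generated header. */
#define FPGA_REG_CONTROL 0u
#define FPGA_REG_STATUS  1u
#define FPGA_REG_DATA    2u

/* Write one 32-bit register. 'volatile' forces the store to actually happen. */
void fpga_write(volatile unsigned int *base, unsigned int reg, unsigned int value)
{
    assert(base != 0);
    base[reg] = value;
}

/* Read one 32-bit register. 'volatile' forces a fresh load on every call. */
unsigned int fpga_read(volatile unsigned int *base, unsigned int reg)
{
    assert(base != 0);
    return base[reg];
}
```

Passing the base address as a parameter keeps the helpers testable on a host PC, where an ordinary array can stand in for the FPGA register block.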
Summary of all Hardware Tools
Hardware Tools
For a full list of tools available from TI and its 3rd Parties, please check:
http://dspvillage.ti.com/docs/catalog/devtools/dsptoolslist.jhtml?familyId=132&toolTypeId=6&toolTypeFlagId=2&templateId=5154&path=templatedata/cm/toolswchrt/data/c6000_devbds
5 - 16
Software Tools
eXpressDSP
Target Software
Host Tools
Tools of the Trade
Largest DSP Third Party Network

Make or buy
> 650 companies in 3rd party network
> 1000 algorithms from > 100 unique 3rd parties
5 - 17
What's Next?
Optimizing C Performance
Attend a four-day workshop (see next slide) Review the Compiler Tutorial
See tutorials in CCS online help, or
http://www.ti.com/sc/c6000compiler
Read:
  C6000 Programmer's Guide (SPRU198)
  Cache Memory User's Guide (SPRU656)
  C6000 Optimizing C Compiler User's Guide (SPRU187)
Look through the many application notes at:

http://www.dspvillage.com
DSP Workshops Available from TI

Attend another workshop:
  4-day C2000 Workshops
  4-day C5000 Integration Workshops
  4-day C6000 Integration Workshop
  4-day C6000 Optimization Workshop
  4-day DSP/BIOS Workshop
  4-day OMAP Software Workshop
  1-day versions of these workshops
  1-day Reference Frameworks and XDAIS
Sign up at:
5 - 18
C6000 Workshop Comparison

Columns: IW6000, OP6000
Audience
  Algorithm Coding and Optimization
  System Integration (data I/O, peripherals, real-time scheduling, etc.)
C6000 Hardware
  CPU Architecture & Pipeline Details
  Using Peripherals (EDMA, McBSP, EMIF, HPI, XBUS)
Tools
  Compiler Optimizer, Assembly Optimizer, Profiler, PBC
  CSL, Hex6x, Absolute Lister, Flashburn, BSL
Coding & System Topics
  C Performance Techniques, Adv. C Runtime Environment
  Calling Assembly From C, Programming in Linear Asm
  Software Pipelining Loops
  DSP/BIOS, Real-Time Analysis, Reference Frameworks
  Creating a Standalone System (Boot), Programming DSK Flash
Getting Started with TI DSP

www.ti.com is your starting point Sign up for Training
1 day or 4 day workshops 1 day DSK workshops C2000, C5000, C6000 DSP/BIOS eXpressDSP
dspvillage.ti.com
Getting Started Discussion Groups DSP Knowledge Base Third Party Network eXpressDSP Guided Tour
analog.ti.com
Design Resources Technical Documents Solution/Selection Guides
Applications Solutions
Find complete solutions for your application including: DSP, Analog, Boards Target Software, Development tools, third party support
Install Code Composer Studio Free Evaluation Tools (FET) from the Essential Guide to DSP CD
Check out the DSP Selection Guide; it's your consolidated resource for all pertinent information
5 - 19
For More Information . . .

Internet
Website: http://www.ti.com, http://www.dspvillage.com
FAQ: http://www-k.ext.ti.com/sc/technical_support/knowledgebase.htm
  Device information
  my.ti.com
  Application notes
  News and events
  Technical documentation
  Training
Enroll in Technical Training: http://www.ti.com/sc/training
USA - Product Information Center ( PIC )

Phone: 800-477-8924 or 972-644-5580
Email: support@ti.com
Information and support for all TI Semiconductor products/tools
Submit suggestions and errata for tools, silicon and documents
European Product Information Center (EPIC)

Web: http://www-k.ext.ti.com/sc/technical_support/pic/euro.htm
Phone:
  Language                 Number
  Belgium (English)        +32 (0) 27 45 55 32
  France                   +33 (0) 1 30 70 11 64
  Germany                  +49 (0) 8161 80 33 11
  Israel (English)         1800 949 0107 (free phone)
  Italy                    800 79 11 37 (free phone)
  Netherlands (English)    +31 (0) 546 87 95 45
  Spain                    +34 902 35 40 28
  Sweden (English)         +46 (0) 8587 555 22
  United Kingdom           +44 (0) 1604 66 33 99
  Finland (English)        +358(0) 9 25 17 39 48
Fax (all languages): +49 (0) 8161 80 2045
Email: epic@ti.com
Literature, Sample Requests and Analog EVM Ordering
Information, Technical and Design support for all Catalog TI Semiconductor products/tools
Submit suggestions and errata for tools, silicon and documents
5 - 20
Looking for Literature on DSP?

"A Simple Approach to Digital Signal Processing" by Craig Marven and Gillian Ewers; ISBN 0-4711-5243-9
"DSP Primer (Primer Series)" by C. Britton Rorabaugh; ISBN 0-0705-4004-7
"A DSP Primer: With Applications to Digital Audio and Computer Music" by Ken Steiglitz; ISBN 0-8053-1684-1
"DSP First: A Multimedia Approach" by James H. McClellan, Ronald W. Schafer, Mark A. Yoder; ISBN 0-1324-3171-8
Looking for Literature on C6000 DSP?

"Digital Signal Processing Implementation using the TMS320C6000 DSP Platform" by Naim Dahnoun; ISBN 0201-61916-4
"C6x-Based Digital Signal Processing" by Nasser Kehtarnavaz and Burc Simsek; ISBN 0-13-088310-7
"DSP Applications Using C and the TMS320C6x DSK" by Rulph Chassaing; ISBN 0471207543
5 - 21
Before Leaving
Let's Go Home
Thanks for your valuable time today.
Please fill out an evaluation and let us know how we could improve this class.
If you purchased a DSK:
  Make sure you pack up (or receive) your DSK before leaving.
  If available, you may keep the earbud headphones and audio patch cable.
Workshop lab and solutions files will be available via CD-ROM or the Internet. Please check with your instructor.
5 - 22
C6000 Workshops Comparison Table

Legend
IW6000 = C6000 Integration Workshop OP6000 = C6000 Optimization Workshop Topic Discussed Topic Only Discussed Briefly Includes A Hands-On Lab Exercise Not Discussed
Target Attendee
System Integration (data input/output, peripherals, real-scheduling, etc.) Algorithm Development and Optimization
IW6000
OP6000

IW6000 OP6000
C6000 Hardware
CPU CPU Architecture Details CPU Pipeline Details Peripherals C6000 Peripherals Overview Using CSL (Chip Support Library) to program peripherals DMA/EDMA (Direct Memory Access ) Serial Port (McBSP) External Memory Interface (EMIF) Host Port Interface (HPI) XBUS Memory Basic Memory Management Advanced Memory Management Using Overlays Multiple Heaps Via DSP/BIOS C6000 Cache Cache Optimization
+ + + + + + + + + +
Development Tools
Code Composer Studio DSP/BIOS Configuration Tool C6711 DSP Starter Kit (DSK) C6000 Simulator Compiler Options for Optimization Assembly Optimizer Profile Based Compiler (PBC) Absolute Lister Hex6x Utility FlashBurn C6711 Board Support Library (BSL)
IW6000
OP6000
+ + + + + +
IW6000
+ + + + + +
Coding
Building Code Composer Studio Projects Compiler Build Options Running C programs C Coding Efficiency Techniques Writing / Optimizing Assembly Linear Assembly Coding Calling Assembly from C Software Pipelining Techniques Numerical Issues with Fixed Point Processors C Runtime Environment (stack pointer, global pointer, etc.) C Optimization (pragmas and other techniques)
OP6000
+ +
+ + + + + + + + +
System Topics
DSP/BIOS Real-Time Scheduler DSP/BIOS Real-Time Analysis (LOG, STS) Reference Frameworks Double-Buffers For Data Input/Output Creating A Bootable Standalone System (Boot Without Emulator) Programming Flash Memory Interrupt Basics Advanced Interrupt Topics Interruptibility of High-Performance C Code XDAIS ( eXpressDSP Algorithm Standard) Introduction
IW6000
OP6000
+ + + + + + +
Who Should Attend

The C6000 Optimization Workshop (OP6000) is primarily for software engineers writing code and algorithms for the C6000 family. It will also be useful for system designers evaluating the C6000's CPU architecture. The C6000 Integration Workshop (IW6000) may better suit your needs if you are tasked with building a system around the C6000. In this case you may need to know about system design, using the C6000 peripherals to move data on/off-chip, scheduling real-time code, and designing your DSP's boot-up procedure.
The C6000 Integration Workshop (IW6000) is not a prerequisite to this workshop, though if you are looking for a broad introduction to all aspects of building a C6000-based system, the Integration Workshop might be a better choice. On the other hand, if you are evaluating the C6000 CPU architecture or want to learn how to write better C and assembly code for the C6000, this workshop (OP6000) would be the best choice. (Please refer to the C6000 Workshop Comparison for differences between the two workshops.)
Bottom Line:
If your main goal is to understand the C6000 architecture and write optimized software for it, then the C6000 Optimization Workshop (OP6000) is the best one to attend. Peripherals and other system foundation software (DSP/BIOS, XDAIS, CSL) are mentioned only in passing. Many software engineers are tasked with getting their algorithms to run, and run as fast as possible. This course is well designed to handle these issues.
On the other hand, if you need to figure out how to get an entire system working, from programming the peripherals to get data in/out all the way to burning the Flash memory with your final program, the C6000 Integration Workshop (IW6000) is the ticket. Along the way you'll be introduced to (and use in lab exercises) many of the TI Software Foundation tools (DSP/BIOS, XDAIS, CSL, BSL, and Reference Frameworks). This is probably the single best course for an engineer/programmer who is new to the C6000 DSP and needs to get a whole system running, as opposed to just optimizing one or two algorithms.
Of course, some engineers will need to handle both of these jobs: get everything running and optimize their software algorithms. In that case, you may want to take both workshops.
TMS320C6000 DSP Platform Update

Revised: July 28, 2003
Product Info / Tech Support / Literature: Texas Instruments Website: DSP Knowledge Base:
North America support@ti.com or (972) 644-5580 Europe epic@ti.com http://www.ti.com or http://www.dspvillage.com http://www-k.ext.ti.com/sc/technical-support/knowledgebase.htm
C6000 SILICON BUDGETARY PRICING, SPECIFICATIONS & AVAILABILITY

Pricing reflects year 2003 SUGGESTED RESALE and is subject to change. Please consult your preferred TI distributor for formal quotation requests. Prototype and production availability dates do not include product lead-times and are subject to change. Standard production lead-times are 10-12 weeks.
TMS320C62x Fixed-Point Digital Signal Processors

Device | MIPS | MHz | Internal Memory | External Memory (EMIF) (6) | Peripheral Port (8) | DMA (Channels) | McBSP | Timer/Counters | Core Voltage | I/O Voltage | Package(s) (9) | TMS - prod | TMS 1,000u
C6201 | 1600 | 200 | Prog: 64KB (1), Data: 64KB | 32-bit 52MB (4 CE) | 16-bit HPI | Standard (3) (4+1) | 2 | 2 | 1.8V | 3.3V | GJC or GJL | NOW | $82.70
C6202 | 2000/1600 | 250/200 | Prog: 256KB (1), Data: 128KB | 32-bit 52MB (4 CE) | 32-bit XBUS | Standard (3) (4+1) | 3 | 2 | 1.8V | 3.3V | GJL or GLS | NOW | $110.08 / $94.03
C6202B | 2400/2000 | 300/250 | Prog: 256KB (1), Data: 128KB | 32-bit 52MB (4 CE) | 32-bit XBUS | Standard (3) (4+1) | 3 | 2 | 1.5V | 3.3V | GNY or GNZ | NOW | $67.14 / $55.95
C6203B | 2400/2000 | 300/250 | Prog: 384KB (1), Data: 512KB | 32-bit 52MB (4 CE) | 32-bit XBUS | Standard (3) (4+1) | 3 | 2 | 1.5V | 3.3V | GNY or GNZ | NOW | $71.62 / $60.43
C6204 | 1600 | 200 | Prog: 64KB (1), Data: 64KB | 32-bit 52MB (4 CE) | 32-bit XBUS | Standard (3) (4+1) | 2 | 2 | 1.5V | 3.3V | GHK or GLW | NOW | $9.95 / $20.92
C6205 | 1600 | 200 | Prog: 64KB (1), Data: 64KB | 32-bit 52MB (4 CE) | 32-bit PCI | Standard (3) (4+1) | 2 | 2 | 1.5V | 3.3V | GHK | NOW | $10.74
C6211B | 1336/1200 | 167/150 | L1 Prog: 4KB (2), L1 Data: 4KB (2), L2 P/D: 64KB (2) | 32-bit 512MB (4 CE) | 16-bit HPI | Enhanced (4) (16+1+1) | 2 | 2 | 1.8V | 3.3V | GFN | NOW | $26.93 / $21.54

TMS320C67x and TMS320VC33 Floating-Point Digital Signal Processors

Device | MFLOPS (MIPS) | MHz | Internal Memory | External Memory (EMIF) (6) | Peripheral Port (8) | DMA (Channels) | McBSP | Timer/Counters | Core Voltage | I/O Voltage | Package(s) (9) | TMS - prod | TMS 1,000u
C6701 | 1000/900 | 167/150 | Prog: 64KB (1), Data: 64KB | 32-bit 52MB (4 CE) | 16-bit HPI | Standard (3) (4+1) | 2 | 2 | 1.9V / 1.8V | 3.3V | GJC | NOW | $113.13/78.57
C6711B | 900/600 | 150/100 | L1 Prog: 4KB (2), L1 Data: 4KB (2), L2 P/D: 64KB (2) | 32-bit 512MB (4 CE) | 16-bit HPI | Enhanced (4) (16+1+1) | 2 | 2 | 1.8V | 3.3V | GFN | NOW | $30.77 / $21.54
C6711C | 1200 | 200 | L1 Prog: 4KB (2), L1 Data: 4KB (2), L2 P/D: 64KB (2) | 32-bit 512MB (4 CE) | 16-bit HPI | Enhanced (4) (16+1+1) | 2 | 2 | 1.2V | 3.3V | GDP | NOW (TMX) | $21.55
C6712 | 600 | 100 | L1 Prog: 4KB (2), L1 Data: 4KB (2), L2 P/D: 64KB (2) | 16-bit 512MB (4 CE) | --- | Enhanced (4) (16+1+1) | 2 | 2 | 1.8V | 3.3V | GFN | NOW | $19.87
C6712C | 900 | 150 | L1 Prog: 4KB (2), L1 Data: 4KB (2), L2 P/D: 64KB (2) | 16-bit 512MB (4 CE) | --- | Enhanced (4) (16+1+1) | 2 | 2 | 1.2V | 3.3V | GDP | NOW (TMX) | $14.95
C6713 | 1350/900 | 225/200 | L1 Prog: 4KB, L1 Data: 4KB, L2 P/D: 256KB | 32-bit 512MB (4 CE) | 16-bit HPI | Enhanced (4) (16+1+1) | 2 (or McASP)* | 2 | 1.2V | 3.3V | GDP/PYP | NOW (TMX) | $28.99/$22.35
VC33 (5) | 150 / 120 | 75 / 60 | P: 256B cache, P/D: 136KB | 32-bit 16M x 32 (4 CE) | --- | C3x DMA (1) | 1 (not McBSP) | 2 | 1.8V | 3.3V | PGE | NOW | $13.38 / $11.15
* The C6713 DSP can be configured to have up to three serial ports in various McASP/McBSP combinations by not utilizing the HPI. Other configurable serial options include I2C and additional GPIO. There are 16 GPIO pins.
www.dspvillage.com
Page 1 of 4
TMS320C64x Fixed-Point Digital Signal Processors

Device MIPS MHz Internal Memory External Memory (EMIF) C6411 2400 300 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 256KB 32-bit, 256MB (4CE) 16/32-bit HPI or 32-bit 66MHz PCI or 16-bit HPI + EMAC Enhanced (64) C6412 4000 / 4800 500 / 600 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 256KB 64-bit, 1024MB C6414 4000 / 4800 / 5760 500 / 600 / 720 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 1MB 64-bit, 1GB (4 CE) and 16-bit, 256MB (4 CE) C6415 4000 / 4800 / 5760 500 / 600 / 720 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 1MB 64-bit, 1GB (4 CE) and 16-bit, 256MB (4 CE) 16/32-bit HPI or 32-bit 33MHz PCI Enhanced (64) 3 standard - or 2 standard + Utopia 2 C6416 4000 / 4800 / 5760 500 / 600 / 720 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 1MB 64-bit, 1GB (4 CE) and 16-bit, 256MB (4 CE) 16/32-bit HPI or 32-bit 33MHz PCI Enhanced (64) 3 standard - or 2 standard + Utopia 2 Viterbi Decoder(VCP) Turbo Decoder (TCP) 3 16 1.2V (500MHz) 1.4V (600MHz) 3.3V GLZ TMS320C6416GL Z NOW $105.89/$145.73 DM642 4000 / 4800 500 / 600 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 256KB 64-bit, 1024MB DM641 4000 / 4800 500 / 600 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 128KB 32-bit, 256MB DM640 3200 400 L1 Prog: 16KB L1 Data: 16KB L2 P/D: 128KB 32-bit, 256MB
Peripheral Port (8) DMA (Channels)
16/32-bit HPI or 32-bit 33MHz PCI Enhanced (64)
16/32-bit HPI
Enhanced (64)
McBSP
2 Standard
2 Standard
3 standard
16/32-bit HPI or 32-bit 66MHz PCI or 16-bit HPI + EMAC Enhanced (64) 3 20-bit Video Ports (VP) or 1 20-bit VP + 2 10bit VP + 2 McBSP + 1 8-bit McASP ---3 16 1.2 (500MHz) 1.4 (600 MHz) 3.3V GDK/GNZ TMX320DM642 NOW/ 4Q03 $63.08 (TMX)
16-bit HPI or 10/100Mbit EMAC
10/100 Mbit EMAC
Enhanced (64) 2 8-bit Video Ports + 2 McBSP + 1 4bit McASP
Enhanced (64) 1 8-bit Video Port + 2 McBSP + 1 4bit McASP
---H/W Accelerators Timer/ Counters GPIO Core Voltage I/O Voltage Package(s) (9) Part Number TMX / TMS TMS 1,000u (1) (2) (3) (4) (5) (6) (7) (8) (9) ---3 16 1.2 (500MHz) 1.4 (600 MHz) 3.3V GDK/GNZ TMX320C6412G DK NOW / 4Q03 $56.07 (TMX) -------
---3 8 1.2 (500MHz) 1.4 (600 MHz) 3.3V GDK/GNZ TMX320DM641 4Q03 / 1Q04 $45.82 (TMX)
---3 8 1.2 (400MHz) 3.3V GDK/GNZ TMX320DM640 4Q03 / 1Q04 $28.00 (TMX)
2(7) 16 1.2V 3.3V GLZ TMS320C6411G LZ NOW $42.21
3 16 1.2V (500MHz) 1.4V (600MHz) 3.3V GLZ TMS320C6414G LZ NOW $87.43/$107.84
3 16 1.2V (500MHz) 1.4V (600MHz) 3.3V GLZ TMS320C6415G LZ NOW $96.18/$131.16
Notes:
(1) C6201/C6204/C6205/C6701 internal program memory can be configured as cache or addressable RAM. C6202/C6203 allows 512Kb to be programmed as cache or addressable RAM; the balance is always addressable RAM.
(2) L1 data cache and L1 program cache are always configurable as cache memory. L2 is configurable between SRAM and cache memory.
(3) DMA has 4 fully configurable channels, plus one dedicated to host for HPI transfers.
(4) C6211/C6711/C6712 Enhanced DMA (EDMA) has 16 fully configurable channels. Additionally, there is an independent single-channel quick DMA (QDMA) and a channel dedicated to the host for HPI transfers.
(5) VC33 is an upgrade to TI's C3x family. While not a C6000 device, it is part of TI's floating-point family.
(6) Each Chip Enable (CE) allows the user to assign a specific memory space.
(7) A third timer is on-chip but not pinned-out.
(8) Host Port Interface (HPI) is slave-only async host access. Expansion Bus (XBUS) is master/slave async or sync interface; operates in host or FIFO/Memory modes.
(9) These devices are pin-for-pin compatible (note: be aware of voltage differences):
    (GJC) C6201/C6701
    (GJL, GNZ) C6202/C6203
    (GLS, GNY, GLW) C6202/C6203/C6204
    (GFN) C6211/C6711/C6712
    (GLZ) C6411/C6414/C6415/C6416
    (GDP) C6713/C6711C/C6712C
    (GDK, GNZ) C6412/DM642/DM641/DM640
Packages:
GGP = 35mm x 35mm, 1.27mm ball pitch, 352-pin BGA
GFN = 27mm x 27mm, 1.27mm ball pitch, 256-pin BGA
GLS = 18mm x 18mm, 0.8mm ball pitch, 384-pin BGA
PGE = 20mm x 20mm, 0.5mm pitch, 144-pin TQFP
PYP = 28mm x 28mm, 0.5mm pitch, 208-pin PQFP
GNY = Same as GLS
GDP = 27mm x 27mm, 1.27mm ball pitch, 272-pin BGA
GJC = 35mm x 35mm, 1.27mm ball pitch, 352-pin BGA
GJL = 27mm x 27mm, 1.0mm ball pitch, 352-pin BGA
GHK = 16mm x 16mm, 288-pin Star BGA
GLW = 18mm x 18mm, 340-pin BGA
GLZ = 23mm x 23mm, 0.8mm ball pitch, 532-pin BGA
GNZ = Same as GJL
GDK = 23mm x 23mm, 0.8mm ball pitch, 548-pin BGA
C6000 DEVELOPMENT TOOLS

Please note that all C6000 tools support all C6000 family members (C62x, C67x, and C64x DSP CPUs) unless otherwise noted.
All tools support Windows 98/2000/NT and Windows XP.
C6000 HARDWARE DEVELOPMENT TOOLS

Development Tool DM642 EVM DM64x DMDK C6713 DSK C6711 DSK C6000 TCP/IP NDK C6711 IDK C6416 DSK C6416 NVDK C6416 NVDK Bundle C6701 EVM Bundle C6701 EVM Board XDS510PP-Plus JTAG Emulator XDS510 USB based emulator for Windows XDS560 JTAG Emulator Part Number TMDXEVM642 TMDXEVM642-OE TMDXDMK642 TMDXDMK642-OE TMDSDSK6713 TMDSDSK6713-0E TMDS320006711 TMDS320006711E TMDX320036711 TMDX320036711E TMDX320026711 TMDX320026711E TMDSDSK6416 TMDSDSK6416-OE TMDX3PNV6416SE TMDX3PNV6416S NVDKCCS NVDKCCSE TMDS3260D6701 TMDS3260C6701 TMDSEMUPP TMDSEMUPP-OE TMDSEMUUSB TMDSEMUUSB-0E TMDXEMU560 TMDX3260C6416 TMDX3260C6416E Includes DM642 EVM Baseboard, CCS v2.20.18 patch (CCS 2.0 required), Quick Start Guide, Technical Reference DM642 EVM Baseboard, CCS v2.20 (for DM64x only), XDS560 PCI, NTSC or PAL Camera C6713 DSK, DSK CCS v2.2 including fast simulators.* C6711 DSK, DSK Code Composer v2.1 * Planned EOL--Being replaced by C6713 DSK C6000 TCP/IP Network Developers Kit (NDK) * Not being updated to ccs v2.2 Planned EOL. C6711 Imaging Developers Kit (IDK) * Not being updated to ccs v2.2 Planned EOL. C6416 DSK, DSK Code Composer Studio v2.2 including fast 1 simulators and trace header* C6416 Network and Video Development Kit Board Only * C6416 Network and Video Development Kit with CCS & Spectrum Digital 510PP+ * C6701 EVM with Code Composer Studio C6701 EVM Board Only Emulator with Parallel Port connection, JTAG cable Emulator with USB connection, JTAG Cable
1
Price $1995 $6495 $395 $395 $995 $4500 $395 $4495 $5995 $3495 $1995 $1500 $1995
PCI-bus JTAG Scan-Based Emulator $3995 C6416 Test Evaluation Board Only * C6416 TEB $1995 Planned EOL for this product replaced by 6416 DSK C6416 Test Evaluation Board bundled with CCS & Spectrum TMDX3260E6416 Digital 510PP+ * Planned EOL for this product replaced by C6416 TEB Bundle $3995 TMDX3260E6416E 6416 DSK rd Additional hardware development tools are provided by TIs large assortment of Third Parties. See the 3 Party resource link below. * E is European version CCS only works with the DSK. Does not include simulation and has 256K word program space memory limitation. CCS only works with the DSK. Does not include simulation however there is no memory limitation. Full version of CCS.
C6000 SOFTWARE DEVELOPMENT TOOLS

Code Composer Studio (CCS) is an integrated development environment (IDE) consisting of the Code Generation Tools (C compiler, assembler and linker), Debugger, Simulator, XDS-510 JTAG emulator drivers, Real-Time Data Exchange (RTDX) extensions, and the DSP/BIOS run-time environment.

Tool Description | Part Number | Components | Price
C6000 Code Composer Studio V2.2 (Windows 98/NT/2000), includes first year of annual subscription | TMDSCCS6000-1 | IDE, Code Gen Tools, Debugger, Simulator, DSP/BIOS and RTDX, DSK, EVM and XDS510 drivers | $3595
C6000 Code Composer Studio Annual S/W Subscription | TMDSSUB6000 | Product Upgrades, Updates and Special Utilities | $599

* Specific upgrades to Code Composer Studio available to users with a current registration for previous versions of TI software development tools

TOOL LINKS
Chip Support Library: TI Home > DSP Village Home > Software > Peripheral Drivers > Chip Support Libraries
DSP Library, Image/Video Processing Library, FastRTS Library: TI Home > DSP Village Home > Software > Signal Processing Libraries > C6000 Libraries
www.dspvillage.com
C6000 TECHNICAL DOCUMENTATION

All released technical documentation & application notes can be found by referencing one of the following web sites: http://www.ti.com/sc/docs/general/dsp/docsrch.htm or http://dspvillage.ti.com
GENERAL | NUMBER | REVISED | LOCATION
TMS320C6000 Technical Brief | SPRU197d | 02/1999 | http://www-s.ti.com/sc/psheets/spru197d/spru197d.pdf
TMS320C64x Technical Overview | SPRU395b | 01/2001 | http://www-s.ti.com/sc/psheets/spru395b/spru395b.pdf
TMS320C6711C Migration Document | SPRA837 | 08/2002 | http://www-s.ti.com/sc/psheets/spra837/spra837.pdf
TMS320C6000 HARDWARE GUIDES | NUMBER | REVISED | LOCATION
C6000 CPU and Instruction Set Reference Guide | SPRU189f | 11/2000 | http://www-s.ti.com/sc/psheets/spru189f/spru189f.pdf
Update for TMS320C6000 CPU Guide (SPRU189F) | SPRZ168b | 08/2001 | http://www-s.ti.com/sc/psheets/sprz168b/sprz168b.pdf
C6000 Peripherals Reference Guide | SPRU190d | 02/2001 | http://www-s.ti.com/sc/psheets/spru190d/spru190d.pdf
C62x/C64x FastRTS Library Programmer's Reference | SPRU653 | 02/2003 | http://focus.ti.com/lit/ug/spru653/spru653.pdf
C6000 Instruction Set Simulator Technical Overview | SPRU600a | 12/2002 | http://focus.ti.com/lit/ug/spru600a/spru600a.pdf
C6000 DSP Multichannel Audio Serial Port (McASP) | SPRU041b | 05/2003 | http://focus.ti.com/lit/ug/spru041b/spru041b.pdf
C6000 DSP I2C Module Reference Guide | SPRU175a | 10/2002 | http://focus.ti.com/lit/ug/spru175a/spru175a.pdf
C6000 Phase-Locked Loop (PLL) Controller | SPRU233a | 04/2003 | http://focus.ti.com/lit/ug/spru233a/spru233a.pdf
TMS320C6000 TOOLS GUIDES | NUMBER | REVISED | LOCATION
C6000 Programmer's Guide | SPRU198g | 08/2002 | http://www-s.ti.com/sc/psheets/spru198g/spru198g.pdf
C6000 Optimizing C Compiler UG | SPRU187i | 04/2001 | http://www-s.ti.com/sc/psheets/spru187i/spru187i.pdf
C6000 Assembly Language Tools UG | SPRU186i | 04/2001 | http://www-s.ti.com/sc/psheets/spru186i/spru186i.pdf
Code Composer Studio User's Guide | SPRU328b | 02/2000 | http://www-s.ti.com/sc/psheets/spru328b/spru328b.pdf
C6000 Code Composer Studio Tutorial | SPRU301c | 02/2000 | http://www-s.ti.com/sc/psheets/spru301c/spru301c.pdf
C6000 DSP/BIOS User's Guide | SPRU303b | 05/2000 | http://www-s.ti.com/sc/psheets/spru303b/spru303b.pdf
TMS320C6000 DSP/BIOS App Programming I/F (API) | SPRU403d | 12/2001 | http://www-s.ti.com/sc/psheets/spru403d/spru403d.pdf
TMS320 DSP Standard Algorithm Developer's Guide | SPRU424b | 01/2002 | http://www-s.ti.com/sc/psheets/spru424b/spru424b.pdf
TMS320 DSP Algorithm Standard API Reference | SPRU360b | 03/2002 | http://www-s.ti.com/sc/psheets/spru360b/spru360b.pdf
C Source Debuggers UG for SPARCstations | SPRU224 | 01/1997 | http://www-s.ti.com/sc/psheets/spru224/spru224.pdf
Chip Support Library API User's Guide | SPRU401d | 04/2002 | http://www-s.ti.com/sc/psheets/spru401d/spru401d.pdf
TMS320C6000 DATA SHEETS (*) | NUMBER | REVISED | LOCATION
C6201 Data Sheet | SPRS051g | 11/2000 | http://www-s.ti.com/sc/ds/tms320c6201.pdf
C6202 Data Sheet | SPRS104c | 08/2002 | http://www-s.ti.com/sc/ds/tms320c6202.pdf
C6203B Data Sheet | SPRS086g | 08/2002 | http://www-s.ti.com/sc/ds/tms320c6203b.pdf
C6204 Data Sheet | SPRS152a | 06/2001 | http://www-s.ti.com/sc/ds/tms320c6204.pdf
C6205 Data Sheet | SPRS106c | 06/2001 | http://www-s.ti.com/sc/ds/tms320c6205.pdf
C6211/C6211B Data Sheet | SPRS073f | 09/2001 | http://www-s.ti.com/sc/ds/tms320c6211.pdf
C6701 Data Sheet | SPRS067e | 05/2000 | http://www-s.ti.com/sc/ds/tms320c6701.pdf
C6711/C6711B/C6711C Data Sheet | SPRS088c | 10/2002 | http://www-s.ti.com/sc/ds/tms320c6711.pdf
C6712/C6712C Data Sheet | SPRS148a | 10/2002 | http://www-s.ti.com/sc/ds/tms320c6712.pdf
C6713 Data Sheet | SPRS186 | 12/2001 | http://www-s.ti.com/sc/ds/tms320c6713.pdf
C6411 Data Sheet | SPRS196 | 03/2002 | http://www-s.ti.com/sc/ds/tms320c6411.pdf
C6414 Data Sheet | SPRS134c | 09/2001 | http://www-s.ti.com/sc/ds/tms320c6414.pdf
C6415 Data Sheet | SPRS146c | 09/2001 | http://www-s.ti.com/sc/ds/tms320c6415.pdf
C6416 Data Sheet | SPRS164c | 09/2001 | http://www-s.ti.com/sc/ds/tms320c6416.pdf
DM642 Data Sheet | SPRS200a | 04/2003 | http://www-s.ti.com/sc/ds/tms320dm642.pdf
VC33 Data Sheet | SPRS087b | 07/2002 | http://www-s.ti.com/sc/ds/tms320vc33.pdf

(*) For Military C6000 information and data sheets, please visit: http://www.ti.com/sc/docs/products/military/processr/index.htm
TI DSP Training Information

Please visit the training webpage for full details and schedules: http://focus.ti.com/docs/training/traininghomepage.jhtml

Workshops | Length
C6416/C6713 One-Day Workshop | 1 day
C6000 Integration Workshop (IW6000) | 4 days
C6000 Optimization Workshop (OP6000) | 4 days
DSP/BIOS Design Workshop | 4 days
ADDITIONAL ONLINE RESOURCES
TI Monthly DSP Customer Technology Webcasts: http://www.ti.com/sc/webcasts
FTP Site: ftp://ftp.ti.com/mirrors/tms320bbs
TI & ME Online Sample Requests: https://www-a.ti.com/apps/ti_me/ti_me.asp
Software Upgrades & Registration / Hardware Repair & Upgrades: (972) 293-5050 / (281) 274-2285
C6000 Platform Benchmarks: http://www.ti.com/sc/docs/products/dsp/c6000/benchmarks/index.htm
Data Converters and Power Solutions: http://www.ti.com/sc/docs/msp/dsps.htm

ONLINE TRAINING
Tech Online University: http://www.ti.com/sc/docs/training/techonline.htm
Network Video Developers Kit (NVDK): http://ti-training.com/courses/coursedescription.asp?iCSID=1250
