Lecture 5 Fixed Point Vs Floating Point Q-Format Number Representation

Lecture 5 Fixed Point vs Floating Point

Objectives:
Understand fixed point representations
Understand scaling, overflow and rounding in fixed point
Understand Q-format
Understand TMS320C67xx floating point representations
Understand relationship between the two in C6x architecture

Reference: "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg, ACM Computing Surveys 23, 5 (March 1991).

Q-Format number representation

N-bit fixed point, 2s complement number is given by:
$x = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \dots + b_1 2^{1} + b_0 2^{0}$
[Figure: N-bit word with sign bit S at bit N-1 and an imaginary binary point]
Difficult to work with due to possible overflow & scaling problems
Often normalise number to some fractional representation (e.g. between ±1):
$x = -b_{N-1} 2^{0} + b_{N-2} 2^{-1} + \dots + b_1 2^{-(N-2)} + b_0 2^{-(N-1)}$
[Figure: N-bit word with sign bit S at bit N-1 and an imaginary binary point]
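The fractional representation above can be evaluated bit by bit. The following is a minimal sketch in C, assuming the raw word is held in the low bits of a 32-bit unsigned integer; the helper name `fixed_to_fraction` is hypothetical and not part of the lecture.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate an N-bit 2s complement word as a fraction in [-1, 1):
 *   x = -b[N-1]*2^0 + b[N-2]*2^-1 + ... + b[0]*2^-(N-1)
 * `word` holds the raw bits in its low n_bits bits (hypothetical helper). */
double fixed_to_fraction(uint32_t word, int n_bits)
{
    assert(n_bits >= 2 && n_bits <= 32);

    double x = 0.0;
    double weight = 1.0;                      /* weight of the sign bit: 2^0 */
    for (int i = n_bits - 1; i >= 0; --i) {
        double term = (double)((word >> i) & 1u) * weight;
        /* Only the sign bit carries a negative weight. */
        x += (i == n_bits - 1) ? -term : term;
        weight /= 2.0;                        /* next bit is worth half as much */
    }
    return x;
}
```

With N = 16, the word 4000h evaluates to 0.5 and 8000h to -1, which is the most negative value the fractional format can hold.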
Q-format notation
Q-format representation:
if N=16, 15 bit fractional representation → Q15 format
Rule:
Qm + Qm → Qm
Qm x Qn → Q(m+n)
[Figure: Q15 x Q15 gives a 32-bit Q30 product, bits 31 to 0]

How to store Q30 number to 16-bit memory?
Storing Q30 number to 16-bit memory requires rounding or truncation:
[Figure: 32-bit Q30 word, bits 31 to 0, reduced to a 16-bit Q15 word; the rounding bit r sits just below bit 15 and is followed by 14 zeros]
Rounding:
if r = 0, round down,
if r = 1, round up
Rounding is done by adding a '1' at bit position r.
Assume 16-bit data format, Q15 x Q15 → Q30 → Q15

    MPY   A3,A4,A6     ; A3 x A4 -> A6
    NOP                ; Delay slot
    ADDK  4000h,A6     ; rounding add
    SHR   A6,15,A6     ; truncate bottom 15 bits
    STH   A6,*A7       ; A6 -> mem[A7]
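The multiply, round, shift and store sequence above can also be written in C. This is a sketch only, assuming a compiler that performs an arithmetic right shift on negative signed values (common, but implementation-defined in C); the function name `q15_mul_round` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 x Q15 -> Q30 -> Q15 with rounding, mirroring the
 * MPY / ADDK 4000h / SHR 15 sequence (hypothetical helper). */
int16_t q15_mul_round(int16_t a, int16_t b)
{
    /* (-1) x (-1) = +1 is not representable in Q15, so exclude it here. */
    assert(!(a == INT16_MIN && b == INT16_MIN));

    int32_t product = (int32_t)a * (int32_t)b;   /* Q30 product in 32 bits */
    product += 0x4000;                           /* rounding add at bit 14 */
    /* Drop the bottom 15 bits; assumes arithmetic shift for negative values. */
    return (int16_t)(product >> 15);
}
```

For example, 4000h x 4000h (0.5 x 0.5) returns 2000h (0.25). The one excluded case, 8000h x 8000h, is where a saturating multiply would be needed instead.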
Avoid overflow with SADD
Safe add routine in C to avoid overflow
SADD - saturation add instruction

Always clips the result to the max (or min) possible value
Sets bit 9 of the CSR register to indicate saturation has occurred
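The safe add routine itself did not survive in the text, so the following is only a sketch of what a saturating 32-bit add might look like in portable C, not the routine from the slide. The name `sat_add32` and the `saturated` flag (standing in for the CSR saturation bit) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating 32-bit add: clip to INT32_MAX / INT32_MIN instead of wrapping.
 * *saturated is set to 1 when clipping occurs and is never cleared here,
 * loosely imitating a sticky status bit (hypothetical helper). */
int32_t sat_add32(int32_t a, int32_t b, int *saturated)
{
    assert(saturated != NULL);

    int64_t sum = (int64_t)a + (int64_t)b;   /* 64-bit sum cannot overflow */
    if (sum > INT32_MAX) {
        *saturated = 1;
        return INT32_MAX;                    /* clip to max */
    }
    if (sum < INT32_MIN) {
        *saturated = 1;
        return INT32_MIN;                    /* clip to min */
    }
    return (int32_t)sum;
}
```

The widening to 64 bits keeps the sketch simple; on the DSP the SADD instruction does the clipping in hardware.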
Single Precision Floating Point number
Easy (and lazy) way of dealing with scaling problem
32-bit single precision floating point:
[Figure: 32-bit word with sign bit s at bit 31, 8-bit exp in bits 30 to 23, 23-bit frac in bits 22 to 0]
$x = (-1)^{s} \times 2^{\mathrm{exp}-127} \times 1.\mathrm{frac}$
$1.175 \times 10^{-38} < |x| < 3.4 \times 10^{38}$
MSB is sign-bit (same as fixed point)
8-bit exponent in bias-127 integer format (i.e., add 127 to it)
23 bits represent only the fractional part of the mantissa. For normalised numbers the MSB of the mantissa is ALWAYS 1, therefore it is not stored

Double Precision Floating Point number
64-bit double precision floating point:
[Figure: register pair; the odd register (e.g. A5) holds sign bit s at bit 31, 11-bit exp in bits 30 to 20 and the upper frac bits in bits 19 to 0; the even register (e.g. A4) holds the lower frac bits in bits 31 to 0; 52-bit frac in total]
$x = (-1)^{s} \times 2^{\mathrm{exp}-1023} \times 1.\mathrm{frac}$
$2.2 \times 10^{-308} < |x| < 1.7 \times 10^{308}$
MSB is sign-bit (same as fixed point)
11-bit exponent in bias-1023 integer format (i.e., add 1023 to it)
52 bits represent only the fractional part of the mantissa. For normalised numbers the MSB of the mantissa is ALWAYS 1, therefore it is not stored
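The sign, exponent and fraction fields described above can be pulled out of a value in C. This is a minimal sketch, assuming the host stores `float` and `double` in IEEE 754 single and double precision format; the type `fp_fields` and the functions `decode_sp` and `decode_dp` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw fields of a floating point number (hypothetical type). */
typedef struct {
    unsigned sign;   /* 1 bit */
    unsigned exp;    /* biased exponent: 8 bits (SP) or 11 bits (DP) */
    uint64_t frac;   /* stored fraction: 23 bits (SP) or 52 bits (DP) */
} fp_fields;

/* Split a single precision value into sign, exp and frac. */
fp_fields decode_sp(float x)
{
    uint32_t bits;
    assert(sizeof x == sizeof bits);      /* assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);       /* copy the bit pattern safely */
    fp_fields f = { (unsigned)(bits >> 31),
                    (unsigned)((bits >> 23) & 0xFFu),
                    (uint64_t)(bits & 0x7FFFFFu) };
    return f;
}

/* Split a double precision value into sign, exp and frac. */
fp_fields decode_dp(double x)
{
    uint64_t bits;
    assert(sizeof x == sizeof bits);      /* assumes a 64-bit double */
    memcpy(&bits, &x, sizeof bits);
    fp_fields f = { (unsigned)(bits >> 63),
                    (unsigned)((bits >> 52) & 0x7FFu),
                    bits & 0xFFFFFFFFFFFFFULL };
    return f;
}
```

Subtracting the bias (127 or 1023) from the `exp` field gives the true power of two, and the hidden leading 1 has to be put back in front of `frac` to recover the mantissa.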
Examples
Convert 5.75 to SP FP
5.75 to binary: +1.01110000... x 2^2
exponent in bias-127 is 127+2 = 129 = 1000 0001b
The fractional part is .01110000... after we drop the hidden 1 bit.
Answer: 0 10000001 0111000 00...00 = 40B80000 (hex)
Convert 0.1 to DP FP
0.1 to binary: 1.10011001(1001 repeats) x 2^-4
exponent in bias-1023 is 1023-4 = 1019 = 011 1111 1011b
The fractional part is .10011001...1010 after we drop the hidden 1 bit and round.
Answer: 0 01111111011 1001100 ...1001 1010 = 3FB9 9999 9999 999A (hex).

Problems of Q-format
Wrong Q-format representation will give totally wrong results
Even correct use of Q-format notation may reduce precision
For this example, Q12 result is totally wrong, and Q8 result is imprecise:

         0111. 1000 0000 1000                      Q12   7.50195
    x    0111. 0100 0000 0000                      Q12   7.25
    0011 0110. 0110 0011 1010 0000 0000 0000       Q24   54.38916
         0110. 0110 0011 1010                      Q12   6.38916
    0011 0110. 0110 0011                           Q8    54.38672
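The Q12 multiplication above can be reproduced in C to see where the integer bits are lost. This is a sketch under stated assumptions: the helpers `qmul_keep16` and `q_to_double` are hypothetical, and converting an out-of-range value to `int16_t` is assumed to wrap, as it does on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 16-bit Qm numbers and keep only 16 bits of the Q(2m)
 * product, re-scaled to out_frac_bits fractional bits. Upper bits that
 * do not fit are silently dropped, as a 16-bit store would do. */
int16_t qmul_keep16(int16_t a, int16_t b, int in_frac_bits, int out_frac_bits)
{
    int shift = 2 * in_frac_bits - out_frac_bits;
    assert(shift >= 0 && shift <= 16);

    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);  /* Q(2m) bits */
    uint16_t low16 = (uint16_t)(product >> shift);           /* truncate */
    return (int16_t)low16;   /* assumes wrap-around conversion */
}

/* Interpret a 16-bit word as a Q(frac_bits) number. */
double q_to_double(int16_t v, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits <= 15);
    return (double)v / (double)(1 << frac_bits);
}
```

Calling `qmul_keep16(0x7808, 0x7400, 12, 12)` keeps only four integer bits, so the 54 in the product cannot be represented, while asking for a Q8 result keeps all eight integer bits at the cost of fractional precision.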
Data types used by C6x DSPs
Special SP numbers
IEEE floating point standard has a set of special numbers:

Special value  Sign (s)  Exponent (e)  Fraction (f)  Hex value    Decimal value
+0             0         0             0             0x0000 0000  0.0
-0             1         0             0             0x8000 0000  -0.0
1              0         127           0             0x3F80 0000  1.0
2              0         128           0             0x4000 0000  2.0
+Inf           0         255           0             0x7F80 0000  +∞
-Inf           1         255           0             0xFF80 0000  -∞
NaN            x         255           Nonzero       0x7FFF FFFF  not a number
LFPN           0         254           All 1s        0x7F7F FFFF  3.40282347 e+38
SFPN           0         1             All 0s        0x0080 0000  1.17549435 e-38

LFPN = largest floating point number, SFPN = smallest (normalised) floating point number.
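The rows of the special SP numbers table follow directly from the exponent and fraction fields, which the sketch below checks in C. It assumes an IEEE 754 single precision `float` on the host; the names `sp_class`, `sp_classify` and `sp_from_bits` are hypothetical, and the denormal case (exponent 0, nonzero fraction) comes from the IEEE standard rather than from the table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classes of single precision bit patterns (hypothetical enum). */
typedef enum { SP_ZERO, SP_DENORMAL, SP_NORMAL, SP_INFINITY, SP_NAN } sp_class;

/* Classify a 32-bit pattern from its exponent and fraction fields. */
sp_class sp_classify(uint32_t bits)
{
    uint32_t exp  = (bits >> 23) & 0xFFu;   /* 8-bit biased exponent */
    uint32_t frac = bits & 0x7FFFFFu;       /* 23-bit fraction */

    if (exp == 255)
        return frac ? SP_NAN : SP_INFINITY; /* all-ones exponent */
    if (exp == 0)
        return frac ? SP_DENORMAL : SP_ZERO;
    return SP_NORMAL;                       /* includes LFPN and SFPN */
}

/* Reinterpret a 32-bit pattern as a float. */
float sp_from_bits(uint32_t bits)
{
    float x;
    assert(sizeof x == sizeof bits);        /* assumes a 32-bit float */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The sign bit plays no part in the classification, which is why +0 and -0, and +Inf and -Inf, fall into the same class.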
Special DP numbers
Double precision floating point special numbers:

Special value  Sign (s)  Exponent (e)  Fraction (f)  Hex value              Decimal value
+0             0         0             0             0x0000 0000 0000 0000  0.0
-0             1         0             0             0x8000 0000 0000 0000  -0.0
1              0         1023          0             0x3FF0 0000 0000 0000  1.0
2              0         1024          0             0x4000 0000 0000 0000  2.0
+Inf           0         2047          0             0x7FF0 0000 0000 0000  +∞
-Inf           1         2047          0             0xFFF0 0000 0000 0000  -∞
NaN            x         2047          Nonzero       0x7FFF FFFF FFFF FFFF  not a number
LFPN           0         2046          All 1s        0x7FEF FFFF FFFF FFFF  1.7976931348623157 e+308
SFPN           0         1             All 0s        0x0010 0000 0000 0000  2.2250738585072014 e-308

TMS320C67x Internal System Architecture
[Figure: CPU containing register files A (A0-A15/31) and B (B0-B15/31) and the functional units .D1/.D2, .M1/.M2, .L1/.L2, .S1/.S2, connected by internal buses to internal memory, external memory and peripherals]
Four functional units for each datapath
[Figure: the four functional units of a datapath: .S, .L, .D, .M]

Mapping of instructions to functional units

.S Unit
ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP

.L Unit
ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP

.M Unit
MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
MPYSP, MPYDP, MPYI, MPYID

.D Unit
ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Used
NOP, IDLE

Detailed internal datapaths
[Figure: detailed internal datapaths, data path A and data path B]
