Professional Documents
Culture Documents
0 Ov er v iew
Jasmin Ajanovic
Sr . Pr in cip al En g in eer
I n t el Cor p .
Agenda
PCI e Architecture Overview
PCI e 3.0 Electrical Optimizations
PCI e 3.0 PHY Encoding and Challenges
New PCI e Protocol Features
Summary & Call to action
I O Tr e n ds
Point-to-point full-duplex
Differential low -voltage signaling
Embedded clocking
Scaleable w idth & frequency
Supports connectors and cables
I n cr e a se in I O Ba n dw idt h
Re du ct ion in La t e n cy
En e r gy Efficie n t Pe r for m a n ce
Protocol
V ir t u a liza t ion
Opt im ize d I n t e r a ct ion be t w e e n
H ost & I O
Advanced Capabilities
Ex a m ple s: Gr a ph ics, M a t h ,
Management
RAS: CRC Data I ntegrity, Hot Plug,
Advanced error logging/ reporting
QoS and I sochronous support
N e w Ge n e r a t ion s of
PCI Ex pr e ss Te ch n ology
60
GB/ Se c
50
I/O Virtualization
Device Sharing
40
PCIe Gen2 @ 5GT/s
30
PCIe Gen1 @ 2.5GT/s
20
PCI / PCI - X
1999
2001
2003
Link BW
BW x16
2.5GT/ s
2Gb/ s
~ 250MB/ s
~ 8GB/ s
PCI e 2.0
5.0GT/ s
4Gb/ s
~ 500MB/ s
~ 16GB/ s
PCI e 3.0
8.0GT/ s
8Gb/ s
~ 1GB/ s
~ 32GB/ s
PCI e 1.x
All da t e s t im e fr a m e s
a n d pr odu ct s a r e
su bj e ct t o ch a n ge
w it h ou t fu r t h e r
n ot ifica t ion
2005
2007
2009
2011
2013
support
Maximum reuse of HVM ingredients
FR4, reference clocks, etc.
Strive for similar channel reach in high-volume topologies
Mobile:
8 , 1 connector
Desktop: 14 , 1 connector
Server:
20 , 2 connectors
0.06
0.04
8GT/ s
-0.02
Pass
10GT/ s
0.02
Fail
-0.04
-0.06
0
-0.02
Pass
0.02
8GT/ s
0.08
10GT/ s
-0.08
-0.04
TX EQ [1]
Rx CTLE
Rx DFE
1st
[1]
1st
[1]
1st
[2 3]
[1]
[1 2]
-0.1
Fail
14 Client Channel
[1]
[1-4 ]
20 Server Channel
-0.12
-0.14
TX EQ
Rx CTLE
Rx DFE
[1]
[1-6 ]
[1]
1st
[1]
1st
[1]
1st
[2 3]
[1]
[1 2]
[1]
[1]
[1-4 ] [1-6 ]
requirements
Power, channel loss and distortion much worse at 10GT/ s
Similar findings by PCI -SI G members corroborated I ntel analysis
PCI -SI G approved 8GT/ s as PCI e 3.0 bit rate
CTLE= Continuous Time Linear Equalizer
DFE= Decision Feedback Equalizer
7
Receiver equalization
1st order LE (linear eq.) is assumed as minimum Rx equalization
Designs may implement more complex Rx equalization to maximize margins
Back channel allowing Rx to select fine resolution Tx equalization settings
Problem Statement
PCI Express* ( PCI e) 3.0 data rate decision: 8 GT/ s
High Volume Manufacturing channel for client/ servers
Same channels and length for backwards compatibility
Low power and ease of design - avoid using complicated receiver
equalization, etc.
Ln 2
Ln 1 Ln 0
LI DL
La ne le ve l ( 1 3 0 bit s)
STP
STP
Scr a m bling w it h t w o le ve ls of
e nca psula
t ion
11
DLLP
Receive
MSB
7
Symbol 0
6 5
MSB
Symbol 1
MSB
LSB
Symbol 15
6 5
6 5
MSB
MSB
6 5
MSB
LSB
6 5
LSB
Symbol 0
Time = 10 UI
Time = 2 UI
1
Sync
2 3
Symbol 0
2 3
6 5
Time = 122 UI
2 3
Symbol 15
Symbol 1
128 bit
Payload
Block
12
0
LSB
X1 Link
Time = 0
LSB
Symbol 1
Symbol 15
LSB
Lane 0
Time = 2 UI
Sync
Time = 0
Lane 1
2 3
Lane 2
Lane 3
Sync
2 3
2 3
2 3
2 3
2 3
Time = 122 UI
2 3
2 3
Symbol 7
13
Time = 122 UI
2 3
Symbol 3
Symbol 62
Symbol 6
Symbol 61
Symbol 5
Time = 10 UI
2 3
Symbol 60
Time = 10 UI
2 3
Time = 122 UI
Symbol 2
Time = 10 UI
Time = 2 UI
Symbol 4
Time = 2 UI
Symbol 1
Sync
Time = 0
Symbol 0
Sync
Time = 0
Time = 2 UI
Time = 122 UI
Time = 10 UI
2 3
Symbol 63
4 3
0 15 14
8 23
STP
Len[3:0] (1111) P Len [10:4]
20 19
16 31
Frame Seq
CRC No
[3:0]
[11:8]
24
39
32
Seq No [7:0]
n-1
n-8
LCRC (4B,
same format
as 2.0)
[Len[10:0]: length of the TLP in DWs, Frame CRC[4:0]: Check Bits covering Length[0:10], P: Frame Parity, No END]
0 15
8 23
SDP
(11110000) (10101100)
16
47
40 55
48 63
LCRC (same
format as 2.0)
(DLLP Layout)
15
56
LANE 1
LANE 2
0
1
0
1
LANE 3
LANE 7
0
1
0
1
TLP (7 DW)
Symbol 2
Data (1 DW)
Symbol 3
SDP
LCRC (1 DW)
(11110000)
DLLP Payload
Symbol 4
LIDL
(00000000)
Symbol 6
LCRC
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
STP + Seq No
0
1
Symbol 0
Symbol 1
LCRC (1 DW)
LIDL
(00000000)
LIDL
(00000000)
DLLP
DLLP Payload
(10101100)
LIDL
(00000000)
Symbol 15
Sync Char
0
1
LANE 6
Symbol 1
Symbol 5
LANE 5
0
1
0
1
0
1
STP + Seq No
Symbol 0
LANE 4
0
1
0
1
0
1
0
1
0
1
0
1
LIDL
(00000000)
LIDL
(00000000)
LIDL
(00000000)
Time
16
L3 L2
L1
L0
d3
d2
d1
h11
d0
h10
h9
h8
h7
h6
h4
h5
h3
h2
h1
Other
Packets
Framing Logic
d0
L0
d1
L1
d2
L2
d3
L3
h8
h9
h10
h11
h4
h5
h6
h7
h0
h1
h2
L[7:10],
S[3:0]
h3
L2
Scrambler
C[3], L[0:6]
Scrambler
Q[11:8],
P, C[0:2]
Lane 1
Q[7:0]
L0
L1
d0
d1
Scrambler
Lane 2
d2
h8
h9
h10
h4
h5
h6
h1
h0
Lane 0
h2
Q[11:8],
P, C[0:2]
C[3],
L[0:6]
L[7:10],
S[3:0]
TLP (6DW)
(Scrambled)
Scrambler
L3
d3
h11
h7
h3
Lane 3
Q[7:0]
Sync = 01b
Time
17
h0
Q[11:0]
Rsvd
18
18
Protocol Extensions
Pe r for m a n ce I m pr ove m e n t s
System
Memory
CPU
Host Memory
Root Complex
PCI Express
Pow e r M a n a ge m e n t
MEM
Local
Memory
Device
19
19
20
LLC/RC
cache
MEM
Root Complex
I O data
Host Memory
Feature:
PCI Express
Accelerator
Local
Memory
Benefits:
New applications
Comm adapters for HPC and DB clusters,
Computational Accelerators,
Ca ch e Siz e
D e vice W r it e s D M A D a t a
W r it e Ba ck ( M e m or y)
Mem ory
Sn oop Syst e m Ca ch e s
D e vice W r it e ( M e m or y)
N ot ify H ost ( Opt ion a l)
Root Complex
Soft w a r e Re a ds D M A D a t a
22
Sn oop Syst e m
Ca ch e s
Root Complex
23
Mem ory
D e vice W r it e s D M A D a t a
( H in t , St e e r in g Ta g)
Soft w a r e Re a ds D M A D a t a
Soft w a r e W r it e s
DMA Data
D e vice Pe r for m s Re a d
Mem ory
Root Complex
Sn oop Syst e m Ca ch e s
W r it e Ba ck t o M e m or y
24
Soft w a r e W r it e s
DMA Data
$
5
$
Mem ory
$
4
Root Complex
25
D e vice Pe r for m s Re a d
( H in t , St e e r in g Ta g)
Sn oop Syst e m
Ca ch e s
Atomic Operations
( AtomicOps)
26
26
Synchronization
Atomic Read-Modify-Write
Atomic transaction support for Host
CPU
At om ic Com plet er
Engine
Mem ory
performance
Root Complex
Atomic RMW
Multiple Producer Multiple Consumer support
Example: Math, Visualization, Content Processing
etc
27
$
At om ic Com plet er
3
At om ic Com plet er
4
Engine
D e vice I ssu e s RM W
Opt ion a l ( H in t , ST)
[ Fe t ch Add, Sw a p or CAS]
Engine
Mem ory
Root Complex
At om ic Com plet er
Engine
$
Re a d I n it ia l Va lu e
Request
Description
FetchAdd
Sw ap
CAS
3
4
PCI Express* Device
28
At om ic Com plet er
Engine
W r it e N e w Va lu e
Re t u r n I n it ia l Va lu e
29
Management
PCI e 2.0 adds mechanisms for dynamic scaling of Link
w idth/ speed
No architected mechanism for dynamic control of device
thermal/ pow er budgets
Problem Statement
Devices are increasingly higher consumers of system pow er &
thermal budget
Emerging 300W Add-I n Cards
30
CPU
Up to 32 substates supported
Substates
Mem ory
Benefits
Platform Cost Reduction
Pwr/ Thermal Management
Platform Optimizations
Battery Life (Mobile)/ Power(Servers)
Root Complex
Transit ions
1.2
1
0.8
0.6
0.4
0.2
0
Total Power
400
0
1
2
300
200
100
0
D0 SubSt at es
Performance
Performance
Total Power
failures
Worst case: PM disabled to allow functionality at cost to
pow er
Even best case not good reluctance to pow er dow n leaves
some PM opportunities on the table
Tough balancing act between performance / functionality and power
Want ed: Mechanism for plat form t o t une PM based on act ual
device service requirem ent s
32
tolerable latency
Capability to report both snooped & non-snooped
LTR
Benefits
Mem ory
Message
Root Complex
values
Terminate at Receiver routing, MFD & Switch
send aggregated message
Dynamic LTR
LTR ( M a x )
Buffer
I dle
Buffer
Act ive
Buffer
LTR ( Act ivit y
Adj u st e d)
to device
enlarged
idle
window
enlarged
idle
window
Want ed: Mechanism for Align Device Act ivit y wit h Plat form
PM event s
34
CPU
Mem ory
Message
Wake#
Root Complex
Opt im a l W indow s
CPU Act ive
Plat form fully
act ive. Opt im al for
bus m ast ering and
int errupt s
OBFF Plat form
m em ory pat h
available for
m em ory read and
writ es
W AKE# W a ve for m s
Tr a n sit ion Eve n t
I dle OBFF
I dle CPU Act ive
OBFF/ CPU Act ive I dle
OBFF CPU Act ive
CPU Act ive OBFF
W AKE#
36
Feature:
New Transaction Attribute bit to
indicate I D-based ordering
relaxation
Permission to reorder transactions
between different I D streams
Benefits:
Host CPU
Memory
Translation Agent
(TA)
Feature:
Root Complex (RC)
Root
Port
(RP)
ATS
Completion
ATS
Request
PCIe
PCIe Endpoint
ATC
Benefits:
CPU
PCIe Switch
P2P
Bridge
MEM
Root Complex
Host Memory
PCI Express
P2P
Bridge
P 2P
Bridge
P2P
Bridge
P 2P
Bridge
Endpoint
Endpoint
Endpoint
Endpoint
Local
Memory
BAR
Device
MEM
Summary
8.0 GT/ s silicon design is challenging but achievable
Double B/ W: Encoding efficiency 1.25 X data rate 1.6 = 2X
Next Generation PCI e Protocol Extensions Deliver
Energy Efficient Performance,
Software Model I mprovements and
Architecture Scalability
Specification Status:
Rev 0.5 spec delivered to PCI SI G in Q109
Rev 0.7 targeting Sept. 09 & Rev 0.9 early Q110
Visit :
w w w .pcisig.com for PCI Express specificat ion
updat es
h t t p:/ / dow n loa d.in t e l.com / t e ch n ology/ pcie x pr e ss/
de vn e t / docs/ PCI e 3 _ Acce le r a t or - Fe a t u r e s_ W P.pdf
41
Back u p
UI/2
EH
EW
43
( a common case)
Pathological match between PRBS and data pattern have very low probability
Retry mechanism changes polynomial starting point to prevent pathological data pattern from
failing repeatedly
44
The first byte of a packet is not one of the allowed sets (e.g., TLP, DLLP, LI DL)
Sync character is not 01 or 10
Same sync character not present in all lanes after deskew
CRC error in the length field of a TLP
Ordered set not one of the allowed encodings or not all lanes sending the same
ordered set after deskew (if applicable)
10 sync header received after 01 sync header without a marker packet in the 01
sync header OR received a marker packet in the 01 sync header and the subsequent
sync header in any lane not 10
Recovery
Stop processing any received TLP/ DLLP after error until we get through Recovery
Block lock acquired with EI EOS
Scrambler reset with each EI EOS
45
46
TPH Mechanism
Mechanism to provide
PH[1:0]
Processing
Hint
Usage Model
00
Bidirectional
data
structure
01
Requestor
D*D*
10
Target
DWHR
HWDR
11
Target with
Priority
DWHR (Prioritized)
HWDR (Prioritized)
47
Memory Read or
AtomicOperation TLPs
48
TPH Summary
Mechanism to make effective use of system fabric and improve
system efficiency
Reduce variability in access to system memory
Reduce memory & system interconnect BW & power consumption
Ecosystem I mpact
Software impact is under investigation - minimally may require software
support to retrieve hints from system hardware
Endpoints take advantage only as needed No cost if not used
Root Complex can make implementation tradeoffs
Minimal impact to Switches
capabilities
RC support for processing hints
Endpoint enabling to issue hints
49
I D-Based Ordering
( I DO)
50
50
Review :
PCI e Ordering Rules
Table is based on
new 2.0 errata!
No ent ries
caused by
Producer/
Consum er
rest rict ions
Yes
ent ries are
required for
deadlock
avoidance
51
51
Motivation
RO w orks w ell for single-stream models w here a data buffer
of w rites:
Each CO Flag
Writ e
serializes &
adds lat ency
t o t raffic from
unrelat ed
st ream s
52
52
I DO permits passing
53
53
TLP Prefix
54
Motivation
+0
+1
+2
+3
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
TLP Prefix Byte 0
Byte 0 >
Header Byte 0
Header
Byte J >
Data Byte 0
Data
{included when applicable}
Data Byte K-1
31
24 23
8 7
information
Example: Multi-Root I OV, Extended TPH
55
Prefix Encoding
+2
+3
+0
+1
76 5 4 3 2 1 0 76 5 4 3 2 1 0 76 5 4 3 2 1 0 76 5 4 3 2 1 0
+0
+1
+2
+3
76 5 4 3 2 1 0 76 5 4 3 2 1 0 76 5 4 3 2 1 0 76 5 4 3 2 1 0
Byte 0
100
Type
Byte 0
100
Type
Link Local Where routing elements may process the TLP for routing or other
purposes.
Only usable when both ends understand and are enabled to handle link local TLP Prefix
ECRC not applicable
End-End TLP Prefix
Requires support between the Requester, Completer and routing elements
End-End TLP Prefix not required to but is permitted to be protected by ECRC
I f underlying Base TLP is protected by ECRC then End-End TLP Prefix is also protected by ECRC
56
STP
LCRC
Sequence #
ECRC
Starts at
TypeL1
End-End Prefix #1
End-End Prefix #2
12
Header
and possibly additional Link Local DWORDs
Payload (optional)
ECRC (optional)
LCRC
END
57
Multicast
58
58
59
59
Multicast Example
Root
PCIe Switch
P2P
Bridge
P2P
Bridge
P2P
Bridge
P2P
Bridge
P2P
Bridge
Endpoint
Endpoint
Endpoint
Endpoint
60
60
2 MC_Index_Position
Multicast Group 1
Memory Space
Multicast Group 2
Memory Space
Multicast Group 3
Memory Space
61
61