
Ichip Packet Flow

Dan Rautio

Copyright 2006 Juniper Networks, Inc.

Proprietary and Confidential

www.juniper.net

Overview
Complete I-chip packet flow
Iwi/Ifi: WAN and fabric input
Ipktwr: packet writer
Im: packet memory and notification memory buffer
Iif: incoming interface (IIF) index lookup
Ir: route lookup
Isr: route lookup and IIF/WO RLDRAM memory access
Ip: host packet handling
Imq: scheduling and queueing
Ipktrd: packet reader
Iwo/Ifo: WAN and fabric output

I-chip packet flow (diagram)


I-chip (diagram)


Iwi/Ifi WAN and fabric input

w_Pif: 8 type-1 PICs, or 2 type-2 PICs (BDIF)
w_Sif: SLIF PICs; remaps SLIF stream numbers to internal stream numbers
w_Inq: per-stream input buffering; asserts flow control (FC) to the PICs and dispatches the packet header to the L2/L3 engines and the payload to the dbuf
w_L23: L2/L3 microcode processing engines that decode the packet header (4 engines, 2 of them double engines); the notification, route key, and IIF key generated by the engines are passed to the segmentor
F_Kext: similar to w_L23, but on the fabric side; extracts the route key
W_dbuf: packet data buffering for the WAN side; absorbs the latency of header processing
W_seg: error checks (checksum, packet length, etc.), L2 header strip, counters, packet cellification; sends cells to Ipktwr

Iwi/Ifi WAN and fabric input

Iwi/Ifi receives packets from both the WAN and the fabric.
WAN packet header processing (at most the first 128 bytes):
- L2/L3 decapsulation and processing
- Route key extraction for the route lookup
- IIF key extraction for the IIF lookup
WAN packet data processing:
- Per-stream data queueing
- Packet-to-cell segmentation (see the sketch below)
- Cell type assignment
- Error counters
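
The packet-to-cell segmentation and cell type assignment can be pictured with a small software model. The sketch below assumes 64-byte cells (the cell unit quoted later for the Im pointers) and an invented single/first/middle/last type encoding; it is an illustration, not the W_seg hardware.

```c
#include <stdio.h>
#include <string.h>

#define CELL_SIZE 64  /* 64-byte cells, matching the cell unit quoted for Im */

/* Hypothetical cell type codes; the real encoding is not shown in this deck. */
enum cell_type { CELL_SINGLE, CELL_FIRST, CELL_MIDDLE, CELL_LAST };

struct cell {
    enum cell_type type;
    unsigned int   len;              /* valid bytes in this cell */
    unsigned char  data[CELL_SIZE];
};

/* Cellify a packet: split it into 64-byte cells and assign a cell type
 * to each one, the way W_seg hands cells to Ipktwr. Returns cell count. */
static int cellify(const unsigned char *pkt, unsigned int plen,
                   struct cell *out, int max_cells)
{
    int ncells = (plen + CELL_SIZE - 1) / CELL_SIZE;
    if (ncells > max_cells)
        return -1;
    for (int i = 0; i < ncells; i++) {
        unsigned int off   = (unsigned int)i * CELL_SIZE;
        unsigned int chunk = plen - off < CELL_SIZE ? plen - off : CELL_SIZE;
        memcpy(out[i].data, pkt + off, chunk);
        out[i].len = chunk;
        if (ncells == 1)          out[i].type = CELL_SINGLE;
        else if (i == 0)          out[i].type = CELL_FIRST;
        else if (i == ncells - 1) out[i].type = CELL_LAST;
        else                      out[i].type = CELL_MIDDLE;
    }
    return ncells;
}

int main(void)
{
    unsigned char pkt[200] = {0};
    struct cell cells[8];
    int n = cellify(pkt, sizeof(pkt), cells, 8);
    printf("200-byte packet -> %d cells (last cell carries %u bytes)\n",
           n, cells[n - 1].len);   /* 4 cells, last cell carries 8 bytes */
    return 0;
}
```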


Iwi/Ifi WAN and fabric input

Fabric packet header processing:
- Extract the 12B fabric notification and write it to the notification buffer
- Route key extraction for the route lookup
Fabric packet data processing:
- A single fabric input (Ifi) interface receives data from 16 sources
- Strip the 12B from the first fabric cell and re-cellify the packet data (see the sketch below)
- Cell type assignment
The results of the packet header/data processing are notifications and cells, which are sent to the packet writer (Ipktwr) block.
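
As a companion sketch for the fabric side, the fragment below strips the 12-byte fabric notification off the first cell and keeps the remainder for re-cellification. The notification layout is not given in this deck, so it is treated as an opaque 12-byte blob.

```c
#include <stdio.h>
#include <string.h>

#define FAB_NTF_LEN 12   /* 12-byte fabric notification in the first cell */
#define CELL_SIZE   64

/* Split the first fabric cell into the 12-byte notification (written to the
 * notification buffer) and the remaining payload bytes, which are then
 * re-cellified with the rest of the packet. Returns the payload byte count. */
static unsigned int split_first_fabric_cell(const unsigned char *cell,
                                            unsigned int cell_len,
                                            unsigned char ntf_out[FAB_NTF_LEN],
                                            unsigned char *payload_out)
{
    if (cell_len < FAB_NTF_LEN)
        return 0;                               /* runt first cell */
    memcpy(ntf_out, cell, FAB_NTF_LEN);         /* opaque 12B notification */
    memcpy(payload_out, cell + FAB_NTF_LEN, cell_len - FAB_NTF_LEN);
    return cell_len - FAB_NTF_LEN;
}

int main(void)
{
    unsigned char first_cell[CELL_SIZE] = {0};
    unsigned char ntf[FAB_NTF_LEN], payload[CELL_SIZE];
    unsigned int left = split_first_fabric_cell(first_cell, CELL_SIZE, ntf, payload);
    printf("payload bytes carried on from the first cell: %u\n", left);  /* 52 */
    return 0;
}
```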


Ipktwr packet writer

When Ipktwr receives notifications and data cells from the Iwi and Ip blocks (the Ip block injects host-originated traffic), it stores them in different buffer locations.
Notification cells are stored in the notification buffer (NTBUF) on a per notification queue (NTQ) basis.
Data cells are stored in the data cell buffer (DCBUF), and a write request is sent to the Spray block. The DCBUF address of the corresponding cell is sent to the memory interface block (MIF).
When the data rate from the host exceeds the permitted rate of 1 Gbps (notifications are not counted), the dma_vld signal sent to the Spray engine is de-asserted so that host traffic is ignored by the Spray engine (a small model is sketched below).
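
The 1 Gbps host gate can be modelled as simple byte-credit accounting on data cells (notifications excluded) driving a dma_vld flag. This is only a sketch under assumed granularity; the actual Spray interface timing is not described here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative host-rate gate: allow up to 1 Gbps of host data cells
 * (notification cells are not counted). dma_vld is de-asserted when the
 * running rate would exceed the budget, so the Spray block ignores host
 * traffic until credit accumulates again. */
#define HOST_RATE_BPS 1000000000ULL    /* 1 Gbps permitted host data rate */

struct host_gate {
    double budget_bytes;   /* accumulated byte credit */
    bool   dma_vld;        /* signal presented to the Spray block */
};

/* Called once per elapsed interval (in seconds) to refill credit. */
static void host_gate_tick(struct host_gate *g, double elapsed_s)
{
    g->budget_bytes += elapsed_s * (HOST_RATE_BPS / 8.0);
    if (g->budget_bytes > 0)
        g->dma_vld = true;                 /* back under the permitted rate */
}

static bool host_gate_send(struct host_gate *g, unsigned int cell_bytes)
{
    if (!g->dma_vld || g->budget_bytes < cell_bytes) {
        g->dma_vld = false;                /* Spray will ignore host traffic */
        return false;
    }
    g->budget_bytes -= cell_bytes;
    return true;
}

int main(void)
{
    struct host_gate g = { .budget_bytes = 0, .dma_vld = true };
    host_gate_tick(&g, 1e-6);                       /* 1 us of credit = 125 bytes */
    printf("64B cell accepted: %d\n", host_gate_send(&g, 64));   /* 1 */
    printf("64B cell accepted: %d\n", host_gate_send(&g, 64));   /* 0: over 1 Gbps */
    return 0;
}
```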


Ipktwr packet writer

A per-bank memory address is assigned by the Spray block for each cell write request, and the request is then pushed into the per-bank data cell write queues (DCWQ). There are 12 banks in total (3 DIMMs) that Ipktwr can use to write data cells.
The memory address offset is calculated and saved into the notification cell or the Icell (for packets of 6 cells or more) as required.
When an Icell is required, a reserved Icell space (ICBUF) holds the Icell before it is written into packet data memory via Im. The Spray block also skips one bank for the data cells and reserves the current bank number for the Icell (see the sketch below).
Once the Icell is constructed, it is enqueued into a separate Icell write queue (ICWQ).
The notification stays in the NTQ until all cells of the corresponding packet have been written into packet memory via Im.
Once that is done, the notification, along with the packet length (Plen) and address handle, is sent to the next block (Iif) for further processing.
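
A minimal model of the spraying behaviour, under the stated assumptions: 12 data banks, packets of 6 or more cells need an Icell, and the bank the rotation would have used next is reserved for that Icell. The real Spray address allocation is certainly more involved.

```c
#include <stdio.h>

#define NUM_DATA_BANKS  12   /* 3 DIMMs x 4 banks used for data cells */
#define ICELL_THRESHOLD 6    /* packets of >= 6 cells carry an indirect cell */

/* Spray the cells of one packet across the data banks. When an Icell is
 * needed, skip one bank in the rotation and reserve that bank number for
 * the Icell. Returns the bank chosen for the Icell, or -1 if none. */
static int spray_packet(int ncells, int *next_bank, int banks_out[])
{
    int icell_bank = -1;

    if (ncells >= ICELL_THRESHOLD) {
        icell_bank = *next_bank;                        /* reserve current bank */
        *next_bank = (*next_bank + 1) % NUM_DATA_BANKS; /* data cells skip it  */
    }
    for (int i = 0; i < ncells; i++) {
        banks_out[i] = *next_bank;
        *next_bank = (*next_bank + 1) % NUM_DATA_BANKS;
    }
    return icell_bank;
}

int main(void)
{
    int next_bank = 0, banks[16];
    int icell_bank = spray_packet(8, &next_bank, banks);
    printf("Icell bank: %d, data cell banks:", icell_bank);
    for (int i = 0; i < 8; i++)
        printf(" %d", banks[i]);
    printf("\n");   /* Icell bank: 0, data cell banks: 1 2 3 4 5 6 7 8 */
    return 0;
}
```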


Im packet memory

There are 4 DIMMs (16 banks) of memory in total, with 256 MB per DIMM. Three of them (12 banks) are used for packet data memory and one (4 banks) is used for notification memory (TNQ-to-DRAMQ / DRAMQ-to-HNQ transfers).
Cells being read are sent to the Iwo (WAN output), Ifo (fabric output), or Ip (host output) block.
Im sends bank request credits to Ipktwr, and Ipktwr asserts a grant signal with a valid cell address (cadr), base address (ba), and cell data to Im when the corresponding bank has positive credit.


Im packet memory

Im sends a bank credit to Ipktrd when Im is ready to accept more read requests. Ipktrd keeps a credit counter for each bank and generates a grant signal to Im when there is a bank read request and the bank credit is positive.
Im sends Imq_tnq a credit with a bank address when there is space available in the write request queue. Imq_tnq generates grant signals to Im when there is a pending write request and the corresponding bank request queue credit is positive.
Im sends Imq_hnq a credit with a bank address when there is space available in the read request queue. Imq_hnq generates grant signals to Im when there is a pending read request and the corresponding bank request queue credit is not zero.
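
All of the credit/grant handshakes above follow the same pattern: the memory side advertises per-bank credits, and the requester asserts a grant only when it has a pending request and the bank's credit counter is positive. A generic software sketch of that pattern (not the actual RTL) is shown below.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_BANKS 16   /* 4 DIMMs x 4 banks */

/* Per-bank credit counters kept by a requester (e.g. Ipktrd or Imq_tnq). */
struct credit_if {
    int credit[NUM_BANKS];
    int pending[NUM_BANKS];   /* outstanding requests per bank */
};

/* Im advertises one more free slot for this bank. */
static void rx_credit(struct credit_if *c, int bank)
{
    c->credit[bank]++;
}

/* Assert a grant toward Im only when a request is pending and the
 * corresponding bank credit is positive; otherwise hold the request. */
static bool try_grant(struct credit_if *c, int bank)
{
    if (c->pending[bank] > 0 && c->credit[bank] > 0) {
        c->pending[bank]--;
        c->credit[bank]--;
        return true;          /* grant asserted, request issued to Im */
    }
    return false;             /* wait for more credit from Im */
}

int main(void)
{
    struct credit_if rd = {0};
    rd.pending[3] = 2;                        /* two read requests for bank 3 */
    printf("grant: %d\n", try_grant(&rd, 3)); /* 0: no credit yet */
    rx_credit(&rd, 3);
    printf("grant: %d\n", try_grant(&rd, 3)); /* 1: credit arrived */
    return 0;
}
```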


Iif incoming interface

This is a new functional block designed for the I-chip only. In the Gimlet and Martini chipsets, a channel lookup method is used to determine the IIF index. The problem with the old channel lookup method is that the maximum lookup depth is 2, which limits flexibility.
The Iif block instead uses a JTREE-like (compressed JTREE) lookup format, so the IIF index lookup can go more than 2 levels deep.
The Iif data structure is stored in an 8 MB RLDRAM partition accessed via the Isr block. The final IIF value comes either from an on-chip per-stream state register or from the off-chip IIF data structure when the lookup succeeds.
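
The benefit over the 2-level channel lookup can be illustrated with a generic multi-level table walk. The node layout below is invented for illustration; the actual compressed-JTREE encoding is not shown in this deck. The fall-back to the on-chip per-stream IIF register mirrors the behaviour described above.

```c
#include <stdint.h>
#include <stdio.h>

/* Invented node format: each level consumes 4 bits of the lookup key and
 * either points at a child table or terminates with an IIF index. */
struct iif_node {
    int      is_leaf;
    uint32_t iif;        /* valid when is_leaf */
    int      child[16];  /* index of next-level node, 0 = no entry */
};

/* Walk an arbitrary number of levels (the old channel lookup stopped at 2). */
static uint32_t iif_lookup(const struct iif_node *tbl, uint32_t key,
                           uint32_t per_stream_iif)
{
    int idx = 0;                                  /* root node */
    for (int level = 0; level < 8; level++) {
        const struct iif_node *n = &tbl[idx];
        if (n->is_leaf)
            return n->iif;                        /* off-chip lookup result */
        int next = n->child[(key >> (28 - 4 * level)) & 0xF];
        if (next == 0)                            /* no entry at this level */
            return per_stream_iif;                /* on-chip per-stream IIF */
        idx = next;
    }
    return per_stream_iif;
}

int main(void)
{
    /* Tiny 3-level example: key nibbles 0x1, 0x2, 0x3 resolve to IIF 77. */
    struct iif_node tbl[4] = {
        { .is_leaf = 0, .child = { [1] = 1 } },   /* root: nibble 1 -> node 1 */
        { .is_leaf = 0, .child = { [2] = 2 } },   /* level 2: nibble 2 -> node 2 */
        { .is_leaf = 0, .child = { [3] = 3 } },   /* level 3: nibble 3 -> node 3 */
        { .is_leaf = 1, .iif = 77 },              /* leaf: IIF index 77 */
    };
    printf("IIF = %u\n", iif_lookup(tbl, 0x12300000u, 5));  /* 77 */
    printf("IIF = %u\n", iif_lookup(tbl, 0x99900000u, 5));  /* 5 (fallback) */
    return 0;
}
```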


Ir route lookup
This is the route lookup block on the I-chip. As on other platforms, the route lookup is based on a Jtree format; however, there are some enhancements to the Jtree structure that make the route lookup considerably more flexible.
RICP: receives notification/key pairs and sends them on to free key engines.
RKP: a pool of 13 key engines that perform the lookups for incoming packets.
RRCP: reorders the results coming back from the key engines; it also handles sampling.
RJTBL: holds the JTable memory and handles transactions from the various key engines as well as translation table transactions.
RMLP: performs multicast list processing.
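
Taken together, RICP/RKP/RRCP behave like a dispatch-to-free-workers pool with reordering at the output. The sketch below models only the reorder side in software; the engine count of 13 comes from the slide, everything else is assumed.

```c
#include <stdio.h>

#define NUM_KEY_ENGINES 13   /* RKP: pool of 13 key engines */
#define MAX_INFLIGHT    64

/* RRCP-style reorder buffer: results may finish out of order, but are
 * released strictly in arrival (sequence) order. */
struct reorder_buf {
    int done[MAX_INFLIGHT];     /* 1 when the lookup result is back */
    int result[MAX_INFLIGHT];
    int next_release;           /* next sequence number to hand downstream */
};

static void rrcp_complete(struct reorder_buf *rb, int seq, int result)
{
    rb->done[seq % MAX_INFLIGHT] = 1;
    rb->result[seq % MAX_INFLIGHT] = result;
}

/* Release as many in-order results as possible. */
static void rrcp_drain(struct reorder_buf *rb)
{
    while (rb->done[rb->next_release % MAX_INFLIGHT]) {
        int slot = rb->next_release % MAX_INFLIGHT;
        printf("release seq %d -> result %d\n", rb->next_release, rb->result[slot]);
        rb->done[slot] = 0;
        rb->next_release++;
    }
}

int main(void)
{
    struct reorder_buf rb = { .next_release = 0 };
    printf("RKP pool size: %d key engines\n", NUM_KEY_ENGINES);
    /* Key engines finish out of order: seq 1 before seq 0. */
    rrcp_complete(&rb, 1, 200); rrcp_drain(&rb);   /* nothing released yet */
    rrcp_complete(&rb, 0, 100); rrcp_drain(&rb);   /* releases seq 0 then 1 */
    return 0;
}
```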


Ir route lookup
Once the Jtree has been constructed, it is stored in RLDRAM. There are two partitions on the RLDRAM parts, each with an effective size of 16 MB.
The route lookup result is the Lout_key combined with a stream ID (SID), which indicates whether the packet should be forwarded to the WAN side or the fabric side.
When per-packet load sharing is enabled, a final next-hop selection process is performed to distribute the load evenly among the equal-cost paths.
When the "m" bit in the lookup result is 1, the result is multicast and the final next hop is interpreted as a pointer to a multicast list. The Rmlp (multicast list processor) block processes the multicast list on the I-chip. Its primary function is to retrieve the multicast final next-hop list at the end of key processing, after the reordering logic in the Rrcp block.
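
As a rough model of the final next-hop handling: if the multicast "m" bit is set, the result is a multicast list pointer handed to Rmlp; otherwise, with per-packet load sharing enabled, one of the equal-cost next hops is chosen so the load spreads evenly. The round-robin selection below is an assumption for illustration, not the documented I-chip algorithm.

```c
#include <stdint.h>
#include <stdio.h>

struct lookup_result {
    int      m_bit;        /* 1 = multicast: next_hop is a multicast list ptr */
    uint32_t next_hop;     /* unicast final next hop, or multicast list pointer */
    int      ecmp_count;   /* >1 when equal-cost paths exist */
    uint32_t ecmp[8];      /* equal-cost next hops */
};

/* Resolve the final next hop for one packet. rr_state is a per-route
 * round-robin counter, assumed here purely for illustration. */
static uint32_t resolve_next_hop(const struct lookup_result *r, uint32_t *rr_state)
{
    if (r->m_bit)
        return r->next_hop;    /* multicast list pointer, expanded by Rmlp */
    if (r->ecmp_count > 1)     /* per-packet load sharing: rotate the paths */
        return r->ecmp[(*rr_state)++ % (uint32_t)r->ecmp_count];
    return r->next_hop;
}

int main(void)
{
    struct lookup_result r = { .m_bit = 0, .ecmp_count = 3,
                               .ecmp = { 10, 11, 12 } };
    uint32_t rr = 0;
    for (int i = 0; i < 4; i++)
        printf("packet %d -> next hop %u\n", i, resolve_next_hop(&r, &rr));
    return 0;   /* prints 10, 11, 12, 10 */
}
```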


Isr RLDRAM access


This block acts as the memory controller for the I-chip RLDRAMs. Four RLDRAM parts are connected to each I2.0 chip, and each is a 288 Mbit RLDRAM part (32M entries x 9 bits/entry, with 8 data bits and 1 parity bit per entry). Hence, each RLDRAM part stores 32 MB of data.
The Irlkp subsystem has two RLDRAM parts dedicated to it. The JTREE data structures are replicated in both parts, because more memory bandwidth is needed rather than more capacity. Hence, the effective size of the RLDRAM memory for the Irlkp subsystem is 32 MB, or 8 Mwords (each word is 32 bits in the I2.0 architecture), even though the physical capacity is 64 MB.
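
The capacity figures follow directly from the part geometry; the short program below just reproduces the arithmetic (32M entries of 8 data bits per part, two replicated parts, 32-bit words).

```c
#include <stdio.h>

int main(void)
{
    /* One 288 Mbit RLDRAM part: 32M entries x 9 bits (8 data + 1 parity). */
    unsigned long long entries        = 32ULL * 1024 * 1024;
    unsigned long long data_bytes     = entries * 8 / 8;        /* 32 MB of data */

    /* Irlkp has two parts, but the JTREE is replicated across them for
     * bandwidth, so the effective capacity is that of a single part. */
    unsigned long long physical_bytes = 2 * data_bytes;         /* 64 MB physical */
    unsigned long long effective_mb   = data_bytes >> 20;       /* 32 MB effective */
    unsigned long long mwords_32bit   = data_bytes / 4 >> 20;   /* 8 Mwords */

    printf("per part: %llu MB, physical: %llu MB, effective: %llu MB = %llu Mwords\n",
           data_bytes >> 20, physical_bytes >> 20, effective_mb, mwords_32bit);
    return 0;
}
```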


Isr RLDRAM access

When a READ access is required (route lookup / firewall instruction lookup), both parts of the RLDRAM can be accessed. However, when a WRITE access is required for accounting / policer counters, only part 1 (R1) is used.
A similar arrangement applies to multicast traffic: multicast lists are kept only in part 2 (R2), so any traffic indicated by the MLP_SRM_RD counter affects only the R2 part.
The RLDRAM accesses from the Irlkp block are serviced in the order the requests are made. The route lookup key engine requests comprise JTREE lookups, firewall filtering read requests, accounting and policing transactions, and multicast list processing read requests.
The RLDRAM accesses from the Iif and Iwo blocks are serviced in round-robin fashion.
The maximum access rate for each RLDRAM part is 200 Mops at 95% efficiency. For example, for route lookups / firewall lookups the maximum is 200 Mops x 95% x 2 (two RLDRAM parts) = 380 Mops. However, for firewall counter writes the maximum access rate is only 200 Mops x 95% = 190 Mops, as these can only be done on partition 1 (R1).
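
The access-rate figures can be checked with the same back-of-the-envelope arithmetic, assuming the quoted 200 Mops per part at 95% efficiency:

```c
#include <stdio.h>

int main(void)
{
    double per_part_mops = 200.0;   /* max access rate per RLDRAM part */
    double efficiency    = 0.95;    /* quoted 95% */

    /* Reads (route lookup / firewall lookup) can use both parts. */
    double read_mops  = per_part_mops * efficiency * 2;   /* 380 Mops */

    /* Counter writes are confined to partition 1 (R1), i.e. one part. */
    double write_mops = per_part_mops * efficiency * 1;   /* 190 Mops */

    printf("read: %.0f Mops, counter write: %.0f Mops\n", read_mops, write_mops);
    return 0;
}
```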


Imq - queueing
Imq is the notification enqueue/dequeue block of the I-chip. It receives notifications from Ir, queues them in internal buffers (TNQ, HNQ) or in external DRAM (DRAMQ) through Im_ntf, and subsequently sends them to Ipktrd.
The memory queue is separated into three parts: TNQ, DRAMQ, and HNQ.
The TNQ receives notifications from the Irlkp block and maps the SID[6:0] and QS[2:0] fields of the incoming notification to the Imq internal queue number.
The DRAMQ is soft-partitioned per queue and configured by start and end pointers. Each pointer is 23 bits in cell (64-byte) units, consisting of 21 bits of bank cell address and 2 bits of bank address. Each cell holds 3 notifications, and the notification cells of each queue are sprayed evenly across the 4 DRAM banks (see the sketch below).
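
A sketch of the two bit-level details above: mapping SID[6:0] and QS[2:0] to a queue number, and packing a 23-bit DRAMQ cell pointer from a 21-bit bank cell address plus a 2-bit bank address. Only the field widths come from the slide; the bit ordering and the mapping function are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed mapping for illustration: queue number = {SID[6:0], QS[2:0]}.
 * The slide only states that SID[6:0] and QS[2:0] are mapped to an internal
 * queue number, not how. */
static unsigned int imq_queue_number(unsigned int sid, unsigned int qs)
{
    return ((sid & 0x7F) << 3) | (qs & 0x7);
}

/* 23-bit DRAMQ pointer in 64-byte-cell units:
 * bits [22:2] = bank cell address (21 bits), bits [1:0] = bank (2 bits).
 * The bit ordering is an assumption; only the widths are from the slide. */
static uint32_t dramq_ptr(uint32_t bank_cell_addr, uint32_t bank)
{
    return ((bank_cell_addr & 0x1FFFFF) << 2) | (bank & 0x3);
}

int main(void)
{
    printf("queue = %u\n", imq_queue_number(5, 2));       /* 42 */
    printf("ptr   = 0x%06x\n", dramq_ptr(0x12345, 3));    /* 0x048d17 */
    return 0;
}
```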


Imq - queueing

The WAN is given enough DRAMQ space for a 100 ms buffer, and the rest is used for the fabric. The fabric memory is further partitioned equally among the 32 queues. This partitioning results in roughly a 100 ms delay-bandwidth buffer even with 256 MB DIMMs, assuming an average of 8 cells per packet (IMIX traffic).
HNQ: a global buffer that is soft-partitioned per queue and configured by a start and end pointer pair. The HNQ is carved statically among the queues: the fabric is allocated 512 notifications and the WAN 1536 notifications. The fabric space is split equally among the 32 queues, resulting in an HNQ size of 16 per queue.
RED is supported on both the WAN and fabric queues of Imq. For each queue, Imq maintains several counts (Mu, Mas, Bu, Buavg, Prv) that are used to determine which queues need to be visited by RED.
Imq supports 8 priority levels. Four priorities apply to queues with positive credit (hi, medium-hi, medium-lo, and low), and four different priorities apply to queues with negative credit (bonus-hi, bonus-medium-hi, bonus-medium-lo, and bonus-low).
Error statistics: aging, SID error, ECC error, Mu overflow.
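
The HNQ carving works out as stated (512 fabric notifications over 32 queues is 16 per queue), and the priority scheme splits into a normal group and a bonus group depending on the sign of a queue's credit. The sketch below reproduces the arithmetic; the numeric priority encoding is an assumption.

```c
#include <stdio.h>

/* HNQ static carving from the slide. */
#define HNQ_FABRIC_NTFS 512
#define HNQ_WAN_NTFS    1536
#define FABRIC_QUEUES   32

/* Assumed numeric encoding of the 8 Imq priority levels:
 * 0..3 for queues with positive credit (hi .. low),
 * 4..7 ("bonus" hi .. low) for queues with negative credit. */
static int imq_priority(int credit, int level /* 0=hi .. 3=low */)
{
    return (credit >= 0) ? level : 4 + level;
}

int main(void)
{
    printf("HNQ per fabric queue: %d notifications\n",
           HNQ_FABRIC_NTFS / FABRIC_QUEUES);                 /* 16 */
    printf("WAN HNQ allocation:   %d notifications\n", HNQ_WAN_NTFS);
    printf("hi priority, positive credit: %d\n", imq_priority(+5, 0));  /* 0 */
    printf("hi priority, negative credit: %d (bonus-hi)\n",
           imq_priority(-3, 0));                                         /* 4 */
    return 0;
}
```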


Ipktrd packet reader

Ipktrd is the packet reader block; it retrieves the data from the Im block using the notifications received from Imq and the host. This is basically the reverse of what the Ipktwr block does.
Ipktrd receives the notification from Imq and the host along with a QID. The QID range is 256-287 for the fabric and 0-255 for the WAN.
- The packet length (2 bytes) and address handle (8 bytes) fields are extracted from the notification, and the notification cell is stored in a free-pool notification buffer (NTBUF).
- The notification is then sent to either the Ifo or the Iwo block, depending on where the data should go. Before a notification is sent toward the fabric, Ipktrd first sends a cell buffer reservation request to Ifo to make sure there is room available in the fabric plane.


Ipktrd packet reader

The packet length and address handle are stored in the Plen_and_Handle buffer (PHBUF), a free-pool buffer, for every stream.
The incoming notifications of each stream form a first-in, first-out per-queue queue called the packet read queue (PRQ). Notifications are processed in order for the same stream. The stream arbitration (STARB) selects the next qualified stream for a context switch.
The arbitration scheme in STARB across all WAN streams is TDM; the time slots are based on the stream bandwidth programmed by software (a small model is sketched below).
For packets of more than 5 cells, the address handle is processed to calculate the first Icell address to read. If the stream speed is equal to or greater than GE, the first Icell of the following packet in the same PRQ is prefetched.
When a stream is selected by the arbitration, either the address handle, the Icell, or the stream state is fetched by the stream processing engine (STPRC).
The Icells are returned to Ipktrd and stored in the Icell buffer (ICBUF) for further processing.
Indirect cell prefetch is required for GE or higher speed streams. The ICBUF space of each stream is separated into two sections: one for regular Icells and one for prefetched Icells.
There is a software-programmed aging window used to check read addresses for aging against the latest write address from Ipktwr.
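
The WAN-side TDM arbitration in STARB can be modelled as a calendar of time slots allocated in proportion to the software-programmed stream bandwidths. The calendar length and the contiguous slot layout below are purely illustrative; a real calendar would interleave the slots.

```c
#include <stdio.h>

#define NUM_STREAMS    4
#define CALENDAR_SLOTS 16   /* assumed calendar length for illustration */

/* Build a TDM calendar where each stream gets slots in proportion to its
 * software-programmed bandwidth; the arbiter then walks it slot by slot. */
static void build_calendar(const int bw[NUM_STREAMS], int calendar[CALENDAR_SLOTS])
{
    int total = 0, slot = 0;
    for (int s = 0; s < NUM_STREAMS; s++)
        total += bw[s];
    for (int s = 0; s < NUM_STREAMS; s++) {
        int slots = bw[s] * CALENDAR_SLOTS / total;   /* proportional share */
        while (slots-- > 0 && slot < CALENDAR_SLOTS)
            calendar[slot++] = s;
    }
    while (slot < CALENDAR_SLOTS)                      /* fill any remainder */
        calendar[slot++] = 0;
}

int main(void)
{
    int bw[NUM_STREAMS] = { 1000, 1000, 2000, 4000 };  /* Mb/s, sw-programmed */
    int calendar[CALENDAR_SLOTS];
    build_calendar(bw, calendar);
    for (int i = 0; i < CALENDAR_SLOTS; i++)
        printf("%d ", calendar[i]);   /* stream chosen in each time slot */
    printf("\n");
    return 0;
}
```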


Iwo WAN output

Iwo receives cells from the Ipktrd block and performs the following actions before sending the packet out to the PIC:
- The L2/L3 micro-code engine performs IPv4 fragmentation and the redirect check
- Byte and packet counts per descriptor/stream
- Builds the L2/L3 header
- Transmits packets to the pif/sif blocks
The Iwo_ip block interfaces with the pktrd and Im controller. It collects notifications and data cells in a data buffer. Once enough data cells are collected for a packet, it sends the first two cells and the notification to the L23 engines to build the L2 and L3 headers. The remaining cells (if the packet has more than two cells) are sent to the iwo_lsif block.
The Iwo_desrd engine fetches the L2/Tag header data from the RLDRAM and the on-chip template. To do this, the Lout_key field of the notification is used for the first lookup.


Iwo WAN output

It generates the L2/L3 header and forwards the packet and L2 encapsulation to the L23 engines.
There are 4 L2/L3 engines in the I-chip to support the OC192-bandwidth data path. These four engines act as a free pool, and special logic has been added to avoid packet reordering within a stream.
There are two main processing units in each engine: the L2+Tag processing unit builds the L2 and Tag bytes, and the L3 processing unit handles the L3 processing. There are 320 entries of L2 instruction memory and 192 entries of L3 instruction memory.
At the end of the build for the first fragment, the engine unload logic checks whether fragmentation is required and then checks the DF bit. If the DF bit is set and the engine indicates that fragmentation is required, the hardware discards the current packet and sends an MTU error message to the host.
The IPv4 header checksum is also calculated by the micro-code engines (a software sketch of the standard algorithm follows below), and CRC verification is performed against the CRC value in the last cell of a packet as a data integrity check.
Finally, once the L2/L3 header and the data cells from the data buffer (Iwo_ip_dbuf) are ready, the packet is reassembled in the wo_spi output buffer and sent to the PIC.
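
The IPv4 header checksum that the micro-code engines compute is the standard one's-complement sum over the header's 16-bit words (RFC 791), with the checksum field treated as zero. A plain software version, not the micro-code, looks like this:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard IPv4 header checksum: one's-complement sum of all 16-bit words
 * in the header, with the checksum field itself taken as zero. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hdr_len; i += 2) {
        if (i == 10)                     /* skip the checksum field (bytes 10-11) */
            continue;
        sum += (uint32_t)(hdr[i] << 8) | hdr[i + 1];
    }
    while (sum >> 16)                    /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

int main(void)
{
    /* Commonly used 20-byte example header; the checksum field (offset 10)
     * is left as zero here and filled in by the sender. */
    uint8_t hdr[20] = {
        0x45, 0x00, 0x00, 0x3c, 0x1c, 0x46, 0x40, 0x00,
        0x40, 0x06, 0x00, 0x00, 0xac, 0x10, 0x0a, 0x63,
        0xac, 0x10, 0x0a, 0x0c
    };
    printf("checksum = 0x%04x\n", ipv4_checksum(hdr, sizeof(hdr)));  /* 0xb1e6 */
    return 0;
}
```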


Questions or Suggestions
If you have any questions regarding I-chip troubleshooting, please contact mx-escalation@juniper.net

If you have any questions about this presentation, contact drautio@juniper.net
