Proteo: A New Approach to Network-on-Chip
AATRAY KUMAR SINGH

ELECTRONICS(VLSI & EMBEDDED SYSTEM)
MIT ACADEMY OF ENGINEERING
PUNE 412105
Email: aatraysingh85@gmail.com
Abstract: The purpose of this paper is to present the basic
ideas behind the development of our Network-on-Chip (NoC)
architecture, called Proteo. System designers are moving to
higher abstraction levels, and the use of reusable Intellectual
Property (IP) blocks is increasing. Communication between the
IP blocks is of increasing importance and must therefore be
designed to be reliable and fast. One proposed solution to this
problem is to use NoC architectures, which are built from
reusable interconnect IP blocks. In this paper an interface
router IP for the Proteo network is introduced and implemented.
The network implements packet switching in a hierarchical
topology. A NoC needs a considerable amount of resources, which
can be shared with other system-level tasks such as power
saving and fault tolerance mechanisms.
I. INTRODUCTION
As feature dimensions scale down into the deep submicron
regime (below 0.25 µm), integration density is no longer
limited by individual feature sizes, e.g. of circuit
metallization layers, but by electrical phenomena such as
capacitive and inductive crosstalk between the interconnect
lines. These effects will have a great impact on maximum
operating frequency and power consumption. In this
environment, communication within logic blocks will still be
synchronous, but between them it will become asynchronous
in order to solve the problem of clock skew and delay. This
is the Globally Asynchronous Locally Synchronous (GALS)
paradigm. In the System-on-Chip (SoC) designs of the
future there will be hundreds of functional IP blocks and a
large amount of embedded Dynamic Random Access
Memory (DRAM) on a single chip. Communication
requirements in such systems are very demanding,
because each of those IPs can communicate in the Gbit/s range.
Due to these increased communication requirements,
traditional bus-based solutions are no longer adequate, and
new kinds of communication architectures must be
developed. One proposal for solving the communication
problem in SoC designs is to use NoCs, which are built
from reusable IP blocks. These networks are scalable
because as many IP blocks as needed can be connected to
the network without dramatic problems in wiring delays,
capacitance, clocking, etc. The global interconnects need to
be treated as IP blocks in the same way as processor cores
or embedded memories. New flexible and configurable
communication channel architectures need to be identified.
These communication channels will not form dedicated
buses as currently implemented on-chip and on PCBs, due
to noise, scalability and speed constraints. Thus, the overall
communication scheme will resemble computer networking
more than traditional bus-based design.
The paper is organized as follows: Section II introduces the
Proteo network. Section III is dedicated to the architecture
of the Proteo NoC. Section IV explains the interface router
in more detail. Section V describes the protocols of the
Proteo NoC, and Section VI discusses the performance of
the router. Finally, in Section VII, conclusions are drawn
and the present limitations of this router and future
research are discussed.
II. PROTEO NETWORK ON CHIP

NoCs are constructed from several basic building blocks,
such as routers or switches, bridges and links. The routers
route packets from one place to another, and the IP blocks
are connected to them through a fixed interface. The bridges
can connect several sub-networks together, and the links
connect all the building blocks together. The characteristics
of the network depend strongly on how these basic blocks
are implemented. This paper presents a flexible layered
interface router IP implementation of the network router for
the Proteo NoC. The Proteo network is an on-chip
packet-switching network, developed at TUT to solve the
communication problems of future SoCs. Proteo can be
seen as a set of interconnection IPs which connect
functional IPs together.
The Proteo network is constructed from two kinds of
blocks: interface router IPs and bridge IPs. The interface
router IPs connect functional IPs to the network, and the
bridge IPs connect several sub-networks together.
An example network is shown in Fig. 1. In this example
there are three sub-networks connected together with two
bridges. Nodes with two input-output link pairs are used in
sub-network 1, whose topology is a bi-directional ring. The
topology of the other sub-networks can be selected freely;
possible topologies include different kinds of trees and
meshes. There can also be further sub-networks behind
these sub-networks. Proteo does not restrict the complexity
of the network topology.
The Proteo NoC supports several different kinds of
communication protocols, network topologies and packet
formats. It is described on two different abstraction levels.
This paper concentrates on the low-level model of the
interface router IP. The low-level models are used for
logical implementation, as well as to estimate physical
properties of the Proteo network such as area, latency and
delays. There are also high-level models of the Proteo
building blocks, used for performance estimation and
simulation of larger networks. A more detailed description
of the Proteo architecture, its protocol, packet formats and
the different kinds of building blocks can be found in the
references.
Fig. 1: Example Network.

III. THE PROTEO ARCHITECTURE
In this section the architecture of our system is discussed at
a more physical level.
A. Overview
The basic hardware elements in our network are hosts, nodes
and links. Every host is connected to the network using a
dedicated node as a wrapper. Our nodes present a
VSIA-compliant interface. This standard specifies three
different downwards-compatible Virtual Component
Interfaces (VCI): peripheral (PVCI), basic (BVCI) and
advanced (AVCI).
Links and nodes are available as part of a library. They
include parameters to customize their number of channels
and dimensions, their interface options, supported data sizes
and protocol features, based on requirements of
functionality, throughput and Quality of Service (QoS).
Our target domain is that of heterogeneous systems, with
many different types of IPs working together on the same
chip. The system is divided into clusters, using a
hierarchical network. This comprises multiple subnets with
different performance, topologies, packet formats, etc. The
subnets are typically point-to-point structures, so each link
can be effectively tuned to its individual traffic requirements.
B. Topologies
Currently, the topology being explored is a hierarchical
network built from a system-wide bidirectional ring and
several subnets with star (or bus) topology (Fig. 2). The use
of regular topologies allows easy routing and direct
replication of blocks throughout the system.
Fig. 2: Example topology.
The connection of a host to the main ring or to a subnet
depends on the information available at the host interface
level: stars are formed with BVCI elements, while blocks
implementing the AVCI interface can be attached directly to
the ring. An interesting property of stars is that they allow
the presence of several BVCI initiators in the same cluster,
which is not supported directly by the standard.
In the star topology we define two types of node: the
satellite, which wraps in packets the information presented
at its interface, and the hub, which keeps track of the
pending transactions and routes the packets from node to
node, while connecting the star to the rest of the network.
The ring nodes are essentially homogeneous, implementing
different features depending only on the needs of the hosts
attached to them.
C. Hardware Elements
The architecture of a typical node is inspired by the SCI
standard, a standard interface developed for multiprocessor
systems. SCI implements a rich set of mechanisms covering
most of the needs of high-performance systems. Our node
architecture extends the basic SCI architecture to allow a
configurable number of dimensions and channels (Figs. 3
and 4).
Fig. 3: Basic node architecture.
Fig. 4: Extended node architecture with three I/O links and
two channels in the first link.
We have chosen a highly modular structure that makes
configuration and tuning easy. Links provide a high-level
interface, so they are effectively treated as modular elements
and independently tuned.
The high-level model of the network has several
requirements. It must be easily modifiable and extensible,
so that we can use it to compare the behavior of different
design choices. It must be relatively lightweight, so that
large networks can be simulated. As we develop
synthesizable versions of the different blocks, we should be
able to back-annotate the information gathered from the
physical implementation into the high-level model. Models
at other levels of abstraction, for example synthesizable
blocks, should be easy to co-simulate with it. Finally, the
model could be used for verification of the final design.
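The hub behaviour described in Section III-B (tracking pending
transactions and routing packets between satellites) can be
illustrated with a small behavioural sketch in Python. This is
not the Proteo implementation: the class name StarHub, its
method names and the dictionary fields are hypothetical, and
the sketch assumes that a target answers requests strictly in
the order in which it received them.

```python
from collections import defaultdict, deque


class StarHub:
    """Toy behavioural model of a star hub (hypothetical names throughout)."""

    def __init__(self, local_nodes):
        self.local_nodes = set(local_nodes)
        # target node -> requesters waiting for a response, oldest first
        self.pending = defaultdict(deque)
        # packets whose target lies outside the star
        self.to_ring = []

    def request(self, requester, target, payload):
        """Route a request; return (target, payload) if delivered locally."""
        if target not in self.local_nodes:
            # Assumption: the hub tags the packet with its source so that
            # the response can find its way back through the ring.
            self.to_ring.append(
                {"src": requester, "dst": target, "payload": payload}
            )
            return None
        self.pending[target].append(requester)  # log start of transaction
        return (target, payload)

    def response(self, target, payload):
        """Deliver a response to the oldest pending requester of `target`."""
        if not self.pending[target]:
            raise RuntimeError("response without a pending request")
        requester = self.pending[target].popleft()
        return (requester, payload)


hub = StarHub({"cpu0", "cpu1", "mem"})
hub.request("cpu0", "mem", "read A")
hub.request("cpu1", "mem", "read B")
first = hub.response("mem", "data A")   # goes back to cpu0
second = hub.response("mem", "data B")  # goes back to cpu1
```

Keeping one queue per target is what lets several initiators
share a target whose interface has no notion of a requester
identifier.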
IV. STRUCTURE OF INTERFACE ROUTER IP
The basic structure of the interface router IP consists of one
incoming link and one outgoing link. This approach allows
only one-directional communication in a simple ring
topology. If the designer wishes to use bi-directional
communication or more complex topologies, the basic
structure is duplicated and the interface router IP is
constructed from several layers. This layered approach
allows us to build networks with different kinds of
topologies. The interface router IP with two layers and two
input-output link pairs is presented in Fig. 5.
Fig. 5: Structure of the router IP.
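The practical difference between a single-layer router (one
direction around the ring) and a two-layer router (both
directions) can be seen by counting hops. The function below
is only an illustration: the name ring_hops is hypothetical,
nodes are assumed to be numbered 0..n-1 around the ring, and a
bi-directional ring is assumed to send each packet in the
shorter direction, which the text above does not specify.

```python
def ring_hops(src, dst, n_nodes, bidirectional):
    """Number of links a packet crosses between two ring nodes."""
    forward = (dst - src) % n_nodes  # hops when travelling one way only
    if not bidirectional:
        return forward
    # Assumption: with two layers the shorter direction is chosen.
    return min(forward, n_nodes - forward)


one_way = ring_hops(0, 7, 8, bidirectional=False)
two_way = ring_hops(0, 7, 8, bidirectional=True)
```

In an eight-node ring, a packet from node 0 to node 7 crosses
seven links in the one-directional case but only one link when
the second layer is present.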
The interface standard used defines two different kinds of
actors in communication, called initiator and target. The
initiators generate requests to the targets, and a target can
only respond to these requests. Because of this definition we
must also design two different kinds of interface router IPs.
The basic structures of both routers are similar to each
other. Because Proteo is a reusable and flexible
communication network, there has to be a well-defined
interface for connecting different kinds of functional IP
blocks to it. The interface used between interconnection IPs
and functional IPs is the Virtual Component Interface (VCI),
which is defined by the VSI Alliance.
The VSI Alliance defines three different versions of the VCI:
the Peripheral Virtual Component Interface (PVCI), the Basic
Virtual Component Interface (BVCI) and the Advanced Virtual
Component Interface (AVCI). Currently the Proteo network
supports the BVCI and AVCI standards. The implementation
in this paper uses BVCI. The interface generates Proteo
packets from VCI standard signals when the functional IP
sends data into the network, and at the other end it extracts
those signals from the packets.
There are several FIFOs in each interface router IP, called
the Output, Input and ByPass FIFOs. All the FIFOs are
generic register banks used to store packets. The Output
FIFO stores packets travelling from the functional IP block
to the network. Packets that have been sent remain stored in
the Output FIFO until the interface deletes them; before
deletion, a packet can be sent again if the interface is
requested to do so. The Input FIFO stores packets travelling
from the input link to the interface, and the ByPass FIFO
stores packets that are bypassing the interface router. The
Input and ByPass FIFOs are simple FIFO buffers without
any re-send capability. The Input FIFO differs slightly from
the ByPass FIFO, because the Overflow Checker can delete
the start of a packet from it if the entire packet does not fit
into it.
The Multiplexer and De-Multiplexer blocks handle traffic
between the interface block and the different layers. These
blocks are left out when there is only one layer in the
interface router IP. The Multiplexer block directs packets
from the Input FIFOs to the Interface. It checks the status
signals from the Input FIFOs, and if there is a packet in
some FIFO it notifies the Interface. After the Interface has
read the entire packet from the FIFO, the Multiplexer starts
checking the FIFOs again. The De-Multiplexer block reads
packets from the Output FIFO, detects their destination
address and, according to a routing table, routes them to the
correct Distributor block.
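The re-send capability of the Output FIFO described above
(sent packets stay in the buffer until the interface deletes
them) can be sketched as a small behavioural model. The class
and method names are hypothetical, and the sketch assumes that
capacity is counted in whole packets and that packets carry an
identifier; neither detail is given in the text.

```python
from collections import deque


class OutputFifo:
    """Toy model of an Output FIFO that retains sent packets."""

    def __init__(self, depth):
        self.depth = depth      # capacity in packets (assumption)
        self.waiting = deque()  # written by the IP, not yet sent
        self.sent = {}          # packet_id -> packet, sent but not deleted

    def is_full(self):
        # Retained packets still occupy storage until they are deleted.
        return len(self.waiting) + len(self.sent) >= self.depth

    def write(self, packet_id, packet):
        """Store a packet from the functional IP; False if no room."""
        if self.is_full():
            return False
        self.waiting.append((packet_id, packet))
        return True

    def send_next(self):
        """Hand the oldest waiting packet to the network, keeping a copy."""
        if not self.waiting:
            return None
        packet_id, packet = self.waiting.popleft()
        self.sent[packet_id] = packet
        return packet

    def resend(self, packet_id):
        """Return a retained packet again, or None if already deleted."""
        return self.sent.get(packet_id)

    def delete(self, packet_id):
        """Release the storage of a packet that no longer needs re-sending."""
        self.sent.pop(packet_id, None)


fifo = OutputFifo(depth=2)
fifo.write(1, b"request-1")
fifo.write(2, b"request-2")
fifo.send_next()             # packet 1 leaves but is retained
again = fifo.resend(1)       # still available for re-transmission
fifo.delete(1)               # now the slot is free again
```

The cost of this scheme is visible in is_full(): a packet that
has been sent but not yet deleted still consumes buffer space.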
The Greeting block receives packets from the previous
node. First it detects the packet's destination address. It
then compares the destination address with the address of
its own IP block; if they are equal, it writes the packet to
the Input FIFO through the Overflow Checker, and if they
are not equal, it writes the packet to the ByPass FIFO. The
Overflow Checker block receives packets from the Greeting
block and checks the contents of each packet. It also checks
that the entire packet fits into the Input FIFO. If the Input
FIFO becomes full in the middle of a packet, the Overflow
Checker controls the Input FIFO so that the start of the
packet is deleted. When the Overflow Checker deletes a
packet from the Input FIFO, it also generates a Re Send
packet to the sender IP block. The Re Send packet takes
care of the re-transmission of the original packet.
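The decisions made by the Greeting block and the Overflow
Checker can be summarised in a short behavioural sketch. It is
an illustration under assumptions, not the VHDL design: the
Packet fields, the "resend" tag and the word-granular Input
FIFO capacity are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    dest: int
    src: int
    words: List[int]
    kind: str = "data"  # "data" or "resend" (hypothetical tag)


@dataclass
class RouterInput:
    address: int
    input_capacity: int  # Input FIFO size in words (assumption)
    input_fifo: List[int] = field(default_factory=list)
    bypass_fifo: List[Packet] = field(default_factory=list)
    resend_packets: List[Packet] = field(default_factory=list)

    def greet(self, packet):
        """Greeting block: keep packets addressed to this node, bypass others."""
        if packet.dest == self.address:
            return self._overflow_check(packet)
        self.bypass_fifo.append(packet)
        return "bypassed"

    def _overflow_check(self, packet):
        """Write word by word; undo the partial packet if the FIFO fills up."""
        start = len(self.input_fifo)  # index where this packet begins
        for word in packet.words:
            if len(self.input_fifo) >= self.input_capacity:
                del self.input_fifo[start:]  # delete the start of the packet
                self.resend_packets.append(
                    Packet(dest=packet.src, src=self.address,
                           words=[], kind="resend")
                )
                return "dropped"
            self.input_fifo.append(word)
        return "accepted"


node = RouterInput(address=3, input_capacity=4)
r1 = node.greet(Packet(dest=5, src=1, words=[1, 2]))       # not for us
r2 = node.greet(Packet(dest=3, src=1, words=[10, 11, 12]))  # fits
r3 = node.greet(Packet(dest=3, src=2, words=[20, 21]))      # overflows
```

Only the partially written packet is removed on overflow;
earlier packets that already fit in the Input FIFO are left
untouched.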
The Distributor block transmits packets from the FIFOs to
the output link. The Distributor waits until there is a packet
in some FIFO and then transmits one packet from that FIFO.
The priority of the FIFOs can be changed very easily, but by
default the highest priority is given to the ByPass FIFO, the
next to the Overflow Checker and the lowest to the Request
FIFO. This default priority ensures that network traffic
passing through the interface router is not delayed. More
complex arbitration schemes can also be implemented.
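The default arbitration of the Distributor amounts to a fixed
priority scan over its packet sources. The sketch below models
that scan in Python; the function name distribute and the
source labels are hypothetical, and each source is modelled as
a simple queue of packets.

```python
from collections import deque

# Default order from the text: ByPass FIFO first, then the Overflow
# Checker (Re Send packets), then the Request FIFO.
DEFAULT_PRIORITY = ("bypass", "overflow_checker", "request")


def distribute(sources, priority=DEFAULT_PRIORITY):
    """Pick one packet from the highest-priority non-empty source.

    Returns (source_name, packet), or None when every source is empty.
    """
    for name in priority:
        queue = sources[name]
        if queue:
            return name, queue.popleft()
    return None


sources = {
    "bypass": deque(["b1"]),
    "overflow_checker": deque(["r1"]),
    "request": deque(["q1", "q2"]),
}
order = [distribute(sources) for _ in range(3)]
```

Passing a different tuple as priority changes the arbitration
without touching the rest of the model, which mirrors the
remark that the priority of the FIFOs can be changed easily.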
V. PROTOCOLS
If the communication needs are characterized correctly, we
can enable or disable protocol features at each node. Only a
basic packet format has to be defined and kept throughout
the network (Fig. 6).
Fig. 6: Generic packet format.
In the stars the transactions can be split or not, depending on
the interfaces involved (PVCI or BVCI). The requester
places its request at the node interface. The node converts it
to packet format. The star hub takes this packet and delivers
it to the target node, while logging the start of the
transaction in an internal table. Given that these basic
interfaces don't support out-of-order responses, the next
packet presented by the target node will be sent to the first
pending requester in the list for that node. In this way we
allow the use of the PVCI and the BVCI in multi-requester
environments. When the target of the transaction is not in
the star, the star hub adds the extra information needed to
the packet and forwards it to the ring.
In the ring, transactions are split and out-of-order responses
are allowed. The AVCI interface presented by the blocks
attached to the ring provides more information about the
transaction, such as node, thread and packet identifiers. The
nodes can be made quite simple and still form complex
networks, because their functionality is restricted to the
lower levels of the protocol stack.
VI. PERFORMANCE
There are several ways to estimate the performance of the
network. Several factors affect the total performance
figures, such as the maximum clock frequency and the
latency through the synchronous parts. The maximum
allowed clock speed is estimated from the synthesis results.
In this technology the maximum achievable clock frequency
is 1 GHz. The latency of the interface router IP can vary
considerably depending on the current status of the network.
The latency through the router can be quite small if there is
no packet in the Output FIFO. On the other hand, if the
Distributor is in the middle of sending a packet from the
Output FIFO, the bypassing packet must wait in the ByPass
FIFO until that packet has been sent.
The minimum latency of a bypassing packet is four clock
cycles, while the maximum latency can be calculated as in
(1).
\[ I_{\max} = 4 + \frac{PL}{WW} \tag{1} \]
In (1), PL is the maximum packet length in bits and WW is
the word width in bits, so PL/WW is the number of clock
cycles needed to send a maximum-length packet one word
per cycle. The latencies from the input link to the functional
IP block and from the IP block to the output link are very
similar to each other. The typical latency in these cases is
similar to the maximum latency of the bypassing packet.
The total performance of the network can also be estimated
with different kinds of test cases, in which the network
handles the traffic generated by a test program. Such test
cases can be found in the references.
VII. CONCLUSIONS AND FUTURE RESEARCH
The future of highly integrated systems is pointing at a
network-on-chip solution to the problems of
interconnection, productivity and heterogeneity. We are
trying to extend our NoC proposal to the fields of testing,
fault tolerance and low-power techniques. A synthesizable
interface router IP block for the Proteo NoC implementation
was presented. The interface router IP is constructed from
several layers and can be used to implement
packet-switching NoC architectures with different kinds of
topologies.
The implementation of the interface router does not restrict
the complexity of the network topology. The interface router
IP uses the VCI interface standard to connect the functional
IP blocks to the Proteo network. The interface router IP was
designed in VHDL and synthesized with Synopsys tools for
a 0.18 µm standard cell technology. The achieved area and
performance figures were also presented. The area figures
show that on-chip communication can be handled by the
Proteo network with a tolerable area penalty. Future plans
include demonstrating the feasibility of complex
hierarchical networks using our approach, obtaining
estimates of network and protocol performance by means of
simulations, and finishing the implementation of the basic
set of building blocks and gathering low-level statistics.
REFERENCES
[1] David Siguenza-Tortosa and Jari Nurmi, "Proteo: A New
Approach to Network-on-Chip," IEEE, Oct. 2002.
[2] Avi Kolodny, "Networks on Chips (NoC): Keeping up
with Rent's Rule and Moore's Law," IEEE, March 2007.
[3] Kangmin Lee, Se-Joong Lee, and Hoi-Jun Yoo,
"Low-Power Network-on-Chip for High-Performance SoC
Design," IEEE, 2006.
[4] L. Benini and G. De Micheli, "Networks on Chip: A New
SoC Paradigm," IEEE Computer 35(1), January 2002,
pp. 70-78.
[5] Robbe Vancayseele, Brahim Al Farisi, Wim Heirman,
Karel Bruneel, and Dirk Stroobandt, "RecoNoC: A
Reconfigurable Network-on-Chip," IEEE, 2010.
[6] P. Guerrier and A. Greiner, "A Generic Architecture for
On-Chip Packet-Switched Interconnections," Proc. Design,
Automation and Test in Europe (DATE), 2000, pp. 250-256.
[7] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Öberg,
M. Millberg, and D. Lindqvist, "Network on a Chip: An
Architecture for Billion Transistor Era," in Proceedings of
the 18th IEEE NorChip Conference, IEEE, November 2000.
[8] Sonics micronetwork: Technical overview.
http://www.sonicsinc.com/Pages/Networks.html.
[9] SystemC website. http://www.systemc.org/.
