
FPGA Implementation of Real-time Ethernet Communication Using RMII Interface

Nima Moghaddami Khalilzad, Farahnaz Yekeh, Lars Asplund

Mostafa Pordel

School of Innovation, Design, and Engineering


Malardalen University
Vasteras, Sweden
nmi09001@student.mdh.se

Department of Computer Science


Umea University
Umea, Sweden
pordel@cs.umu.se

components, each FPGA application is then one hardware block which has no Operating System (OS) and therefore does not inherit software problems such as interrupts and events. Hence, the presented solution can be used in real-time applications or in any other system that requires a reliable communication component.
Some commercial Intellectual Property (IP) cores exist and can be used for Ethernet communication in FPGAs. Open source IP cores are available for different communication speeds. An implementation of a UDP/IP stack with performance measurements and a TCP/IP core are presented in [2] and [7]. Designs and implementations for Gigabit Ethernet are presented in [9] and [11]. However, there are few documents about RMII implementations. As RMII uses less hardware wiring and avoids its cost, it can be used instead of the Media Independent Interface (MII) in the physical layer. Due to complexity, dependability and lack of documentation, making changes to open source IP cores is not easy. Thus, a new implementation that takes advantage of previous work and uses CBSE principles is presented.
In this paper, Section II provides the brief theoretical background that is necessary for the implementation of a 100 Mb Ethernet communication component. Section III describes the communication solution and its implementation. Section IV lists the hardware and software tools used in the implementation and validation of the solution. In Section V, validation of the solution is discussed. Finally, the conclusion is presented in Section VI.

Abstract-FPGA-based solutions have become more common in embedded systems these days. These systems need to communicate with the external world. Considering the high speed and popularity of Ethernet communication, a reliable real-time Ethernet component inside an FPGA is of special value. To that end, this paper presents a new solution for 100 Mb/s FPGA-based Ethernet communication with timing analysis. The solution uses the "Reduced Media-Independent Interface" in its physical layer. UDP is the network protocol, which is implemented from the physical to the transport layer. For use in real-time applications, timing analysis is done on the communication system. Component based software engineering is used in the design and development processes. In order to test the components inside the FPGA, two different approaches are utilized. Signal measurement in combination with the introduced Windows-based application contributes much to the testing and validation phases.

Keywords-FPGA; Ethernet; RMII; 100 Mb/s; Real-Time

I. INTRODUCTION

Field Programmable Gate Array (FPGA) based systems are playing an increasingly important role in embedded systems. As FPGAs have become widely used in embedded systems, communication between the FPGA and other parts of the system has become a necessity. Depending on the amount of data that should be transferred, different types of connections can be used. Ethernet communication provides the required bandwidth for most applications.
In this paper, in order to provide real-time communication, a UDP/IP core on 100 Mb Ethernet has been implemented in an FPGA. The solution uses the advantages of Component-Based Software Engineering (CBSE) in the design and development phases. Using some testing techniques, the validity of the solution is evaluated during and after the implementation phase. A significant feature of this solution is the use of the Reduced Media-Independent Interface (RMII) in the physical layer of the communication protocol. The solution presented in this paper is based on work that has been done for the vision systems of underwater and football player robots at Malardalen University. These robots use an FPGA-based stereo vision system [6] which uses Ethernet to communicate with the subsystems of the robots.
Thanks to the characteristics of FPGA applications, it is easy to calculate the execution time for each component and for the system. As programs in FPGA are implemented as hardware

II. ETHERNET COMMUNICATION

This section gives a brief introduction to the network protocols and the RMII interface.

A. Network Protocols

In order to describe the behavior of a network, the seven-layer Open System Interconnection (OSI) model is used. The implementation follows all network protocol standards of the OSI model. For a detailed description of the implemented layers and protocols, see [10].

B. RMII Characteristics

RMII is one of the standard interfaces between the Media Access Controller (MAC) and the Physical Interface (PHY) that can support 100 Mbit/s Ethernet.
In order to run in RMII mode, the reference clock input pin on the PHY should be connected to a 50 MHz clock. In addition

978-1-61284-486-2/11/$26.00 ©2011 IEEE


there are only two pins for the transmitter and two pins for the receiver instead of four in MII. This means that Transmit Data (TXD) pins number two and three as well as Receive Data (RXD) pins number two and three should be disconnected. Transmit Coding Error (TXER) is not used in RMII mode, nor is the Collision Detection (COL) pin. One of the major differences between MII and RMII is that in RMII the Transmit Clock (TXCLK) and Receive Clock (RXCLK) pins are not used. All transmission and reception operations are synchronized using one 50 MHz reference clock. Moreover, the Transmit Enable (TXEN) pin is used to show when there is valid data on TXD. In addition, the Receive Data Valid (RXDV) pin acts like a Carrier Sense (CRS) in RMII mode, which indicates when there is valid data on the RXD pins [5].
1) RMII Reception and Transmission Timing
In order to receive packets with the RMII interface, some points should be taken into consideration. The protocol uses four signals, as shown in Figure 1. The CRS_DV signal marks valid data. However, the preamble can start an undefined number of clock cycles after CRS_DV is asserted. This can be coded using a state machine that, after the CRS_DV signal, waits for 7 bytes of preamble followed by a one-byte Start Frame Delimiter (SFD). In addition, the rising and falling edges of the signals matter in the implementation. For example, in transmission, the developer should change TXEN on the rising edge and TXD on the falling edge of the clock.
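The reception logic described above can be modelled in software before it is written in VHDL. The following Python sketch is an illustration only, not the paper's implementation: the function names, the state names and the `(crs_dv, rxd)` sample format are assumptions. It relies on the standard Ethernet bit order (least significant bit first), under which each preamble byte 0x55 arrives on the two RXD pins as four dibits of value 01 and the SFD byte 0xD5 ends with the dibit 11.

```python
def bytes_to_dibits(data):
    """Split bytes into 2-bit symbols, least significant dibit first."""
    return [(b >> shift) & 0b11 for b in data for shift in (0, 2, 4, 6)]


def receive_frame(samples):
    """Model of an RMII receiver.

    samples: iterable of (crs_dv, rxd) pairs, one per 50 MHz clock cycle,
    where rxd is the 2-bit value on RXD[1:0] (hypothetical test format).
    Returns the bytes that follow the SFD.
    """
    # 7 preamble bytes (28 dibits) plus the first 3 dibits of the SFD are 01.
    required_01_dibits = 31
    state = "IDLE"
    count = 0
    dibits = []
    for crs_dv, rxd in samples:
        if not crs_dv:
            if state == "DATA":
                break  # end of frame
            state, count = "IDLE", 0
            continue
        if state == "IDLE":
            # RXD stays 00 for an undefined number of cycles before the preamble.
            if rxd == 0b01:
                state, count = "PREAMBLE", 1
        elif state == "PREAMBLE":
            if rxd == 0b01:
                count += 1
            elif rxd == 0b11 and count >= required_01_dibits:
                state = "DATA"  # last dibit of the SFD seen
            else:
                state, count = "IDLE", 0  # malformed preamble
        else:  # DATA
            dibits.append(rxd)
    out = bytearray()
    for i in range(0, len(dibits) - len(dibits) % 4, 4):
        out.append(dibits[i] | dibits[i + 1] << 2
                   | dibits[i + 2] << 4 | dibits[i + 3] << 6)
    return bytes(out)
```

The model ignores details that a real receiver has to handle, such as CRS_DV toggling near the end of a frame and a PHY that delivers a shortened preamble.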

A. Top Layer

In FPGA applications, the top layer is the wrapper of all the components and plays the role of a container and organizer for the other software elements. In the presented design, the top layer is responsible for controlling all other components. This layer provides all necessary inputs to the internal components, handles their output signals, and connects them to the other subsystems. Figure 2 shows all internal components in the top layer. The figure indicates that identical components are used in different software elements (Reusability). The ClockMaker component is a prime example of this characteristic: it is used in four different components. ClockMaker and the other components are explained in detail separately.

Figure 2. Top component and hierarchy of other internal components

B. PHY Manager

Initiating the Ethernet PHY is the first step in establishing communication. Based on the desired communication type, the inner PHY registers should be manipulated. In the presented solution, the PHYManager component is responsible for initiating the Ethernet PHY. It sends Inter-Integrated Circuit (I2C) commands to the PHY and prepares it for a specific speed and duplex mode. MII or RMII is also chosen using these I2C commands. I2C uses two lines (data and clock) for manipulating the registers that are inside the PHY.

Figure 1. RMII Reception Timing for packets with no errors [3]

III. STRUCTURE OF CORE

The objective of this section is to provide detailed information about the source code of the presented 100 Mb Ethernet communication. The CBSE approach is used in this project because of the variety of advantages associated with it. A component should be usable in different projects and different solutions (Reusability). When parts are independent from each other, it is easy to speed up development by assigning components to different developers [4]. For example, in the presented architecture for UDP communication there are two different components, the UDP sender and the UDP receiver. Therefore, in a project where only the sending operation is needed, it is possible to use the UDP sender component on its own.
The communication system is implemented mostly in the VHDL programming language. Except for the Cyclic Redundancy Check (CRC) generator component, which is written in Verilog, all parts of the system are coded in VHDL. The communication component is a subsystem of the robotics system, with a framework that supports FPGA-based vision systems [12].

Figure 3. PHYManager component consists of I2C and ClockMaker

Figure 3 shows the interior components of PHYManager. ClockMaker produces the proper clock for the management process. The I2C component needs an input command, a register address, isRead, clock and load. This component writes the input command and the input register address on the Management Data Input-Output (MDIO) line, synchronized with the clock. When the isRead input is high, the command is a "read command" and, after the turnaround, I2C should release the line and let

high to show other components that sender is not ready to


receive new order for sending packet.

the PHY write the response; I2C then reads the response. When the input values are ready, the load input should be high for at least one clock cycle and then go low to start building the command and writing it on the MDIO line. The output Management Data Clock (MDC) and MDIO pins of the I2C component are directly connected to the top layer. MDIO acts as an input when the command is a read, so it should be declared as an inout signal.
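As an illustration of what the management component places on the MDIO line, the sketch below builds one management frame in Python. It assumes the frame layout of IEEE 802.3 Clause 22 (32 preamble ones, start, opcode, 5-bit PHY address, 5-bit register address, turnaround, 16 data bits), which the paper does not spell out. The function name, the `None` marker for released bit slots, and the example PHY address and register value are hypothetical.

```python
def mdio_frame(phy_addr, reg_addr, is_read, data=0):
    """Return the 64 bit slots of one management frame, first bit first.

    1 or 0 means the FPGA drives the MDIO line in that slot; None means
    the FPGA releases the line so that the PHY can drive it.
    """
    if not (0 <= phy_addr < 32 and 0 <= reg_addr < 32):
        raise ValueError("PHY and register addresses are 5 bits wide")
    if not 0 <= data <= 0xFFFF:
        raise ValueError("register data is 16 bits wide")

    def bits(value, width):
        # Most significant bit first.
        return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

    frame = [1] * 32                          # preamble
    frame += [0, 1]                           # start of frame
    frame += [1, 0] if is_read else [0, 1]    # opcode: read or write
    frame += bits(phy_addr, 5)
    frame += bits(reg_addr, 5)
    if is_read:
        # Turnaround (2 slots) plus 16 data slots are driven by the PHY.
        frame += [None] * 18
    else:
        frame += [1, 0]                       # turnaround for a write
        frame += bits(data, 16)
    return frame
```

For example, `mdio_frame(1, 0, False, 0x2100)` would write the value 0x2100 to register 0 of a PHY at address 1; in the standard basic control register this value selects 100 Mb/s and full duplex, but the address and value are illustrative rather than taken from the paper. In a read frame the last 18 slots are `None`, which corresponds to the inout behaviour described above.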
C. Communication

The Communication component takes care of all actions related to Ethernet communication. The top layer can ask this component to send UDP packets, which vary in size. In order to keep the design as simple as possible, packets do not have a dynamic size and come in only two types: small and large packets. The top layer provides the communication component with the data and asks it to send them. Loading the entire data into the communication component before the send process is not necessary. The top component can supply a series of data while the communication component is sending them. In return, the top layer has to supply data no slower than the speed of the UDPSender component. Moreover, this component delivers received data to the top layer. The output of the component could be connected to a memory or to other components. For handling Ethernet communication, the communication component uses the UDPSender and UDPReceiver components, which are shown in Figure 4. The TXD and TXEN outputs of UDPSender are directly connected to the output of the communication layer, and the RXD and CRS_DV inputs are directly connected to the UDPReceiver component.

Figure 5. Preamble state in UDPSender state machine

F. Timing Analysis

For real-time applications to be reliable, timing analysis should be done carefully. In this section, the execution time for sending a UDP packet with the presented solution is calculated.
The packet sending operation starts in the first state of the state machine and ends in its last state. Therefore, if the number of clock cycles between the start and the end of the state machine is computed, the execution time of the sending operation can be calculated using the following simple equation:

Figure 4. UDPSender and UDPReceiver components inside communication

D. UDPReceiver

With the help of two processes, the component is able to receive UDP packets. The first process watches the CRS_DV input to catch packets, and the other one is a state machine which receives all fields of the different layers. Whenever CRS_DV is high, the component starts monitoring the input data and, if it is in the correct format (all header fields such as preamble, SFD, MAC header and UDP header are correct), it receives the packet and sends the data field of the packet out to the communication layer. Independent of its size, the data field is sent to the upper layer byte by byte. The communication layer takes care of this data and can redirect it to the other subsystems or start processing and analyzing it. While the different fields of the packet are being received in the state machine, the CRC is being generated. Therefore, when the CRC field is received, it can be compared with the calculated CRC, so packets are validated with no delay. Finally, when the packet has been received, a bit is sent to the upper layer which indicates whether the CRC field is correct or not.
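The idea of generating the CRC while the fields arrive, so that the check is ready as soon as the CRC field has been received, can be sketched in Python as follows. This is a software model only: the class and method names are hypothetical, and the sketch assumes the standard Ethernet frame check sequence, which uses the same CRC-32 as `zlib.crc32`, covers the frame from the destination MAC address to the end of the data field, and is transmitted least significant byte first.

```python
import struct
import zlib


class FcsChecker:
    """Running CRC-32 over the bytes of a frame as they are received."""

    def __init__(self):
        self.crc = 0

    def feed(self, byte):
        # Update the running CRC with one received byte.
        self.crc = zlib.crc32(bytes([byte]), self.crc)

    def matches(self, fcs_bytes):
        # Compare the 4 received CRC bytes with the calculated value.
        return struct.pack("<I", self.crc) == bytes(fcs_bytes)
```

A receiver model would call `feed` once per byte from the MAC header through the data field and then call `matches` with the four bytes of the CRC field; the result plays the role of the bit that is sent to the upper layer.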

E. UDPSender

This component uses three synchronized processes to build a packet, which is sent out over two data lines (to the RMII interface). One process is responsible for the TXEN output, another sends the bits out, and the last one is a state machine. The different layers of the network protocols are built using this state machine. Figure 5 shows the preamble state inside the state machine, which generates the appropriate MAC preamble. As the figure shows, after the preamble has been sent, the state changes to the MAC destination address.
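The sequence of fields that such a sender emits (preamble and SFD, MAC header, IP header, UDP header, data, CRC) can be modelled in Python. The sketch below is an assumption-laden illustration rather than the paper's state machine: the function names are hypothetical, the header field values (TTL 64, identification 0, UDP checksum left at 0, which IPv4 permits) are arbitrary choices, and the padding to the 60-byte minimum before the CRC follows standard Ethernet rules that the paper does not discuss.

```python
import struct
import zlib


def ipv4_checksum(header):
    """One's complement sum of the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_frame(dst_mac, src_mac, src_ip, dst_ip,
                    src_port, dst_port, payload):
    """Return preamble + SFD + Ethernet frame + CRC as bytes.

    MAC addresses are 6-byte and IP addresses 4-byte bytes objects.
    """
    udp_len = 8 + len(payload)
    # UDP header: ports, length, checksum 0 (not computed).
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    # IPv4 header without options: version/IHL 0x45, protocol 17 (UDP).
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len,
                     0, 0, 64, 17, 0, src_ip, dst_ip)
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    mac = dst_mac + src_mac + struct.pack("!H", 0x0800)  # EtherType IPv4
    body = mac + ip + udp + payload
    body += bytes(max(0, 60 - len(body)))  # pad to the minimum frame size
    crc = struct.pack("<I", zlib.crc32(body))  # CRC excludes the preamble
    return bytes([0x55] * 7 + [0xD5]) + body + crc
```

Each group of bytes in the returned value corresponds to one or more states of a sender state machine; the mapping onto the 15 states of the presented design is not given in the paper and is not reproduced here.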
This component starts working after a pulse on the load input. It sends all header data, such as the MAC header, IP header and UDP header, and when the data field has to be sent it issues a request, and the communication layer provides one byte of data for each request. In the meantime, UDPSender computes the CRC and attaches the four CRC bytes to the end of the packet. While UDPSender sends the packet, the output "IsActive" is

$$C = \sum_{i=1}^{15} C_i \qquad (1)$$

where

$$C_i = K_i \times \frac{1}{12.5 \times 10^{6}} \qquad (2)$$

In (1), C represents the total execution time; C_i in (1) and K_i in (2) are the execution time and the number of clock cycles for state i, respectively. Since there are 15 states in the state machine, the summation in (1) runs from 1 to 15. The execution time for each state is the number of clock cycles multiplied by the clock period of the state machine process. The clock frequency of the state machine process is 12.5 MHz, which gives a period of 80 ns; consequently, one byte can be sent in each clock cycle of the state machine.


The execution time differs with the length of the data field. Since the data field sending state is state number 13 (i = 13), and taking n as the number of bytes in the data field:
K_13 = n
and C_13 = n * 80 * 10^-9 s
The other states need a constant number of clock cycles for their execution. Therefore they have a constant execution time. For instance, for the preamble state shown in Figure 5, i = 3 and K_3 = 8. Hence the execution time is:
C_3 = 8 * 80 * 10^-9 s
C_3 = 640 ns
Since the presented design does not impose any extra delay on packet transmission, the packet sending time is equal to the minimum time needed in the physical layer. The only exception is a one-cycle delay between the send request from the Communication component and the start of sending in the UDPSender component, which is unavoidable.
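The calculation in (1) and (2) is easy to reproduce numerically. The Python sketch below works in whole nanoseconds to avoid rounding. The per-state helper follows (2) directly; the total for a whole packet rests on an assumption that is not a figure from the paper, namely that every byte of the preamble and SFD (8), MAC header (14), IPv4 header (20), UDP header (8) and CRC (4) costs exactly one clock cycle, as the data bytes do. All names are hypothetical.

```python
CLOCK_HZ = 12_500_000                     # state machine clock, 12.5 MHz
CYCLE_NS = 1_000_000_000 // CLOCK_HZ      # 80 ns per clock cycle

# Assumed fixed byte counts: preamble + SFD, MAC, IPv4, UDP, CRC.
FIXED_CYCLES = 8 + 14 + 20 + 8 + 4


def state_time_ns(cycles):
    """Equation (2): execution time of one state in nanoseconds."""
    return cycles * CYCLE_NS


def total_time_ns(cycles_per_state):
    """Equation (1): sum of the execution times of all states."""
    return sum(state_time_ns(k) for k in cycles_per_state)


def udp_send_time_ns(n):
    """Estimated time to send a packet with n data bytes (assumption:
    one cycle per byte for every field, no padding)."""
    return total_time_ns([FIXED_CYCLES, n])
```

With these definitions, `state_time_ns(8)` reproduces the 640 ns of the preamble state, and a data field of 100 bytes would take 8000 ns in state 13.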

Figure 6. MDIO (data line of I2C command) signal captured by Picoscope

Programs that simulate successfully might not work on the device, because the synthesizer tries to optimize the design and can delete some connections that affect the result. Hence, a good knowledge of the synthesizer in use is important in FPGA program development.
B. Windows Application

Monitoring the output signals of the FPGA is a time-consuming method for debugging the program. A more convenient way of testing the solution is to connect the FPGA to a PC and communicate with the FPGA using a Windows-based application. In order to be able to send and receive packets with any MAC address, the settings of the Windows network adapter have to be changed. In addition to changing the Windows network adapter settings, a C#.Net application was developed which can send and receive custom raw Ethernet packets. Using a graphical user interface, it is possible to assign a value to all fields of the different network layers. The basic idea of this application comes from the work presented in [8]. Figure 7 shows the sending operation tab and indicates how packet fields can be set and sent. Moreover, this application is able to receive a damaged packet (an error in the layers above MAC), which makes it easy to find problems on the sender side (FPGA). In the later phases of the project, this application was extended and used in the framework that supports FPGA-based vision systems [12].

IV. EXPERIMENT CONSTRAINTS

The solution is synthesized and transferred into an FPGA by Xilinx ISE, a tool from Xilinx. A custom board is used for testing purposes. The specifications of the board components are described below.
The FPGA used in this project is the "Xilinx SPARTAN 3A DSP" (Digital Signal Processing). In addition to the features of the SPARTAN 3A family, this FPGA uses 90 nm process technology, which contributes to more bandwidth. Moreover, in comparison to the 3A family, this family of Xilinx FPGAs uses an additional block RAM. This RAM has some additional output registers which help it to perform faster [1].
The PHY used on the board is the DM9161 from Davicom. This PHY makes 100BASE-TX and 100BASE-FX communication possible. The device can connect to the MAC layer using MII. In order to reduce the number of pins, it is also possible to use RMII [5].
VI. SUMMARY AND CONCLUSIONS

In this work, a new solution for Ethernet communication in FPGAs is introduced. The solution uses the RMII interface in the physical layer. Although the RMII interface uses fewer pins for communicating with the Ethernet PHY, it has no other advantages, and when there is no hardware limitation it may not be worth developing a new component for interfacing RMII inside an FPGA, because ready-made IP cores for MII already exist.
For real-time applications, an FPGA is a reliable option since timing analysis can be done easily. The program is converted to hardware blocks which do not require an operating system and do not accept interrupts. In effect, an FPGA is as flexible as software and as reliable as hardware.
Some problems, such as lack of knowledge about the synthesizer and hardware connection problems, make it difficult to develop components in an FPGA. Picoscope helps very much in debugging. Using this device, signals can be stored in the PC memory, analyzed by developers, and compared with simulation results. This method comes in handy when a program works in simulation but not on the real device.
In safety-critical real-time applications, using the Real-time Transport Protocol (RTP) for communication is an excellent
V. TEST AND DEBUG

In order to test the presented solution, two steps have been taken. First, the output signals of the FPGA are measured using Picoscope. Second, communication between the FPGA and a computer is tested using a Windows application.
A. Picoscope

In the program development process for FPGAs, two different methods are often used for debugging. First, the program is simulated using a simulation tool and its output signals are inspected. This step was done using the simulation tool which is embedded in the Xilinx ISE tool. The second step is to produce exactly the same signals inside the FPGA. In order to make sure that the program executes as intended, the output signals were measured using Picoscope. Signals can be observed on a monitor because of the connection between Picoscope, which is an oscilloscope, and the computer. Figure 6 shows a captured output signal of the FPGA, which is one line of the I2C command for initiating the Ethernet PHY. These signal measurements were done for every component individually. After integrating the components, the output signals were again observed and verified.


option. Since RTP is used for transferring video and images, it could be useful in vision systems which are developed in an FPGA. The presented solution provides developers with a foundation for developing RTP or other higher-level protocols.

REFERENCES
[1] Xilinx, "Spartan-3A DSP FPGA Family data sheet," March 2009.
[2] A. Lofgren, L. Lodesten, S. Sjoholm, H. Hansson, "An analysis of FPGA-based UDP/IP stack parallelism for embedded Ethernet connectivity," NORCHIP Conference, November 2005.
[3] RMII Consortium, "RMII Specification," page 1, March 1998.
[4] Ivica Crnkovic, "Component-based Software Engineering - New Challenges in Software Development," Proceedings of the 25th International Conference on Information Technology Interfaces, June 2003.
[5] Davicom, "10/100 Mbps Fast Ethernet Physical Layer TX/FX Single Chip Transceiver data sheet," pp 1-41, September 2008.
[6] J. Lidholm, F. Ekstrand, L. Asplund, "Two camera system for robot applications; navigation," 13th IEEE International Conference on Emerging Technologies and Factory Automation, March 2008.
[7] C. Kachris, "Design and Implementation of a TCP/IP core for reconfigurable logic," Technical University of Crete Electronic and Computer Engineering Department, July 2001.
[8] Miahrugger, "Raw ethernet packet sending," The Code Project, October 2003.
[9] N. Alachiotis, S.A. Berger and A. Stamatakis, "Efficient PC-FPGA Communication over Gigabit Ethernet," 10th IEEE International Conference on Computer and Information Technology, June 2010.
[10] Cisco Systems Inc, "Internetworking Technology Handbook," Cisco Press, 2010.
[11] T. Uchida, "Hardware-Based TCP Processor for Gigabit Ethernet," Nuclear Science, IEEE Transactions on, June 2008.
[12] M. Pordel, N. M. Khalilzad, F. Yekeh and L. Asplund, "A Component Based Architecture to Improve Testability, Targeted FPGA-Based Vision Systems," 2011 International Conference on System Modeling and Optimization (ICSMO), January 2011.

Figure 7. Windows application screen shot
