
SystemVerilog: From Device Modeling to Hardware Acceleration

Author
Yogesh Mittal yogesh.mittal@transwitch.com

ABSTRACT

Verification of an advanced multimillion-gate networking chip is a complex activity.
Additionally, the product development cycle makes it difficult to meet time-to-market
constraints. The customer also needs time to integrate the ASIC with the on-board
software and resolve any issues that arise. To accelerate the entire process and to
reduce this uncertainty, it is beneficial to iron out issues up front by modeling a working device.
Modeling the entire device, however, is an extensive effort in itself. To overcome this
hurdle, this paper presents a time-feasible and simple solution: modeling the complex,
performance-critical and algorithm-critical portions of the design in SystemC/C/C++ and
creating a reusable testbench that can be used across different levels of abstraction.
One of the challenges in verifying such complex networking scenarios is bridging the
language barrier when integrating the high-level models into the rest of the environment. The paper
describes in detail the setup of a simple virtual platform and how DPI (Direct
Programming Interface) is leveraged to integrate these C/C++ based models with VCS-proven
SystemVerilog VIPs. The same model can then be delivered to the customer with functional APIs
to accelerate software integration on the board before the actual device arrives. To cut down
the long simulation times required to verify the chip in the actual system scenario, the paper
describes techniques that take an existing advanced VCS-based verification environment and
port the design, along with some testbench components, to a prototype board in order to
speed up system-level verification as well as hardware/software co-verification.
Table of Contents
1 Introduction
2 Design for Verification: Carrier Ethernet SOC
3 SOC Validation Challenges
3.1 Device Feature Proofing
3.2 Equivalence Check between Model and RTL
3.2.1 Single Testbench requirements
3.3 Inadequate Simulation Performance
3.4 Testbench Reuse
3.5 HW-SW Coverification
4 Recommended Verification Flow
4.1 Virtual Platform: Early TB Development and Reuse
4.2 Device Modeling
4.2.1 An Integrated Simulation Environment
4.3 Transactions Based Simulation Acceleration
4.3.1 Maximizing Overall Performance
4.3.1.1 TBA Framework
4.3.2 HW-SW Co Simulation
4.4 System/IP Emulation
5 Results
6 Conclusions and Recommendations
7 Acknowledgements
8 References

SNUG India 2008 2 SystemVerilog : From Modeling to Hardware Acceleration


Table of Figures

Figure 1 : Equivalence check between Model and DUV
Figure 2 : STB/HAL Based Platform
Figure 3 : Transactor based Platform
Figure 4 : Overall Verification Methodology
Figure 5 : VMM based layered Testbench
Figure 6 : Early Testbench Development with DUV as TLM
Figure 7 : DUV Model as Checker
Figure 8 : Model Integration with SV based transactor
Figure 9 : Simulation Acceleration Timing Profile
Figure 10 : VCS-HW Platform integration through 'Client' & 'Server'
Figure 11 : VMM Testbench Integration with the ProtoType Board
Figure 12 : Clock Generation mechanism
Figure 13 : Platform for HW-SW Coverification
Figure 14 : Emulation setup with synthesizable TB mapped to HAPS FPGA board



1 Introduction
This paper is based on our recent experience with the verification of a Carrier Ethernet switch.
We describe the verification challenges encountered during its development and the
heuristic approach we took, in the form of a proven verification methodology. Complex
system design requires modeling, verification, debug and analysis at different levels of
abstraction with varying levels of precision. The paper presents techniques for reuse
across these levels of abstraction, achieved through a transaction-based framework. It
elaborates on the setup for creating a hardware-software co-verification environment as well as an
SV-VMM based testbench infrastructure that enables enhanced modeling capabilities through the
use of SystemVerilog DPI. Finally, it describes the various means employed to shorten the
verification cycle through simulation acceleration.

2 Design for Verification: Carrier Ethernet SOC


The carrier-grade switch is an enabling component for profitable metro Ethernet services and
high-density, service-aware Ethernet aggregation over IP/MPLS based networks. It supports 802.1ad,
802.1ah, PBT, MPLS, OAM, Link Aggregation, split horizon and advanced traffic management
such as HQOS.

3 SOC Validation Challenges


3.1 Device Feature Proofing
In a new or emerging market, it is very difficult to predict which particular features
will be critical for a product's success. In many cases feedback arrives only when the product
actually reaches the market. The oversight of leaving out a feature or critical functionality may
sometimes be fixed in software or by adding glue logic or a gasket to the system. In other
cases, it requires a complete design respin, which can delay the product launch and result
in missing the critical market window. It is therefore imperative to elicit feature feedback
from potential customers by providing an early demonstrable model of the device.

3.2 Equivalence Check between Model and RTL


During device modeling, it is necessary to ensure that the RTL and the architecture model stay in
sync and are functionally equivalent. The tools currently available for equivalence
checking between high-level models and RTL may not help as the level of complexity
increases. Such tools are also limited in capacity. Functional
equivalence can instead be established and maintained by creating a set of common tests that
drive both the functional model and the RTL model. This leads to the
concept of a single testbench.



[Figure 1 block diagram: Stimulus Generator, Device Model, Device RTL, Response Checker]

Figure 1 : Equivalence check between Model and DUV

3.2.1 Single Testbench requirements

A testbench needs additional flexibility if it has to drive high-level functional models of
the SoC design. The architecture of the testbench should support working with transactions at the
same levels of abstraction as are used to model the SoC design. With such a testbench, a golden
test suite can then be defined to ensure the equivalence of the various models at different levels
of abstraction during the SoC development cycle.

Several challenges arise in getting the testbench to work with high-level scenarios at different
levels of abstraction. The scenarios, which take the form of cycle-based
transactions when driving the architecture model, have to be mapped onto functional signals to
connect to the RTL world for RTL verification.
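
The single-testbench idea can be sketched in C as follows. This is only an illustration under stated assumptions, not the environment used in the project: the names `duv_fn`, `model_checksum`, `rtl_like_checksum` and `run_golden_test` are hypothetical, and a simple byte sum stands in for real device behavior. The point is that one golden test drives any implementation through the same transaction-level interface and compares the responses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transaction-level view of a device under verification (hypothetical):
 * one packet in, one 16-bit response out. */
typedef uint16_t (*duv_fn)(const uint8_t *pkt, size_t len);

/* Stand-in for the untimed reference model: plain byte sum. */
static uint16_t model_checksum(const uint8_t *pkt, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + pkt[i]);
    return sum;
}

/* Stand-in for a more detailed implementation: same function, but
 * consuming two bytes per step as a 16-bit datapath might. */
static uint16_t rtl_like_checksum(const uint8_t *pkt, size_t len)
{
    uint16_t sum = 0;
    size_t i = 0;
    for (; i + 1 < len; i += 2)
        sum = (uint16_t)(sum + pkt[i] + pkt[i + 1]);
    if (i < len)                         /* odd trailing byte */
        sum = (uint16_t)(sum + pkt[i]);
    return sum;
}

/* Golden test: apply the same stimulus to both implementations and
 * return the number of mismatching responses. */
static int run_golden_test(duv_fn ref, duv_fn duv)
{
    uint8_t pkt[64];
    int mismatches = 0;
    assert(ref != NULL && duv != NULL);
    for (size_t len = 0; len <= sizeof pkt; len++) {
        for (size_t i = 0; i < len; i++)
            pkt[i] = (uint8_t)(i * 7u + len);   /* deterministic pattern */
        if (ref(pkt, len) != duv(pkt, len))
            mismatches++;
    }
    return mismatches;
}
```

Calling `run_golden_test(model_checksum, rtl_like_checksum)` returns the mismatch count; swapping in a different `duv_fn` reuses the same test unchanged.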

3.3 Inadequate Simulation Performance


Simulator performance today is typically inadequate for large system-level verification.
Reducing the run time of long simulations for complex networking chips has
become imperative. There are various approaches to accelerating simulation.
– In-circuit emulation: This requires a target board onto which the entire design is mapped.
However, there are structural differences between the actual netlist and the mapped netlist,
and these may mask some critical design issues. The board may also not be available early
enough in the design cycle.

– Synthesizable Testbench (STB) approach: Designing such a testbench can be time
consuming. It may need additional resources, and in some cases these testbenches are
hard to debug. In addition, if the complete testbench is made synthesizable, it cannot take
advantage of advanced testbench techniques such as constrained randomization.
It also creates a lot of duplicated work between the teams. Moreover,
emulation equipment is very expensive.



– Hardware Abstraction Layer (HAL) approach: Only the synthesizable BFMs are ported to the
hardware platform, while the CRV layer stays outside the hardware, so that the testbench
keeps its constrained-random capabilities and still obtains simulation speedup. We are still
experimenting with this approach, and the work is in its initial stages. The overall framework
is shown in the following figure.

Figure 2 : STB/HAL Based Platform

– Transaction Based Testbench: This allows advanced verification techniques to be applied while
reusing most key simulation components. The transaction-based acceleration (TBA) approach
provides a significant performance improvement but requires careful planning and compliance
with the TBA methodology. Once the methodology is established, the upfront development time is
not very significant. The basic concept of TBA is illustrated in the following figure.

Figure 3 : Transactor based Platform



3.4 Testbench Reuse
The testbench should be architected such that it can be reused from block to cluster to system
level, from simulation to hardware-based acceleration, and from project to project. A
well-thought-out reuse methodology helps in rapidly establishing the various verification
platforms. It requires a hardware verification language that is easily configurable (or
"extended"), such as SystemVerilog. However, the challenges are not restricted to
extensibility. They also involve meeting the wide-ranging requirements from modeling to
emulation, which requires applying the same scenarios at different abstraction levels.

3.5 HW-SW Coverification


The biggest challenge in bringing an embedded system solution to market is delivering the
solution on time and with the complete required functionality. The key obstacles to delivering on
time are misinterpretation of hardware functionality specifications and the difficulty of
efficiently integrating the software with the hardware. In the conventional flow, the platform
for software validation is not available until the device is available. Hardware-software
co-verification usually involves only low-level software, which leads to testbenches whose
behavior does not closely match the way the actual software will run. Both software and hardware
testbenches have limited ability to control the whole system. In today's devices a lot of
interaction happens between hardware and software, and any holes in real-time interaction may
leave corner cases between the two uncovered. Another issue is that there is very little reuse of
testbench infrastructure across the teams.



4 Recommended Verification Flow
Based on the above requirements, we arrived at the following approach to building
our verification environment. Through this approach, we address the verification
challenges described earlier.

[Figure 4 block diagram. Labels recoverable from the diagram: Verification Sign Off;
Architecture Review, Design/Code Review, Functional Coverage, Assertions, Code Coverage,
Testplan Reviews; Block Envs, SubEnvs, Device Toplevel Env; Device Modeling, Device APIs,
Emulation/TBA; Virtual Platform, RTL Platform, Emulation/TBA Platform]

Figure 4 : Overall Verification Methodology

4.1 Virtual Platform: Early TB Development and Reuse


A Virtual Platform is a fully functional software model of the device, implemented in
C/C++/SystemVerilog and integrated with the VMM testbench. The basic objective is to provide an
early test bed with high system visibility and control before RTL is actually available.
Developing such a verification environment requires significant
effort, especially when reuse considerations are taken into account. We developed our verification
testbench based on the following guidelines:
• The 'Transactor' should be implemented in the form of layers so that it can work directly
with acceleration.
• The BFMs and monitors encapsulate only the interface-protocol-specific knowledge and
thus can be reused from project to project.



VMM based Layered Testbench
In a structured VMM-based testbench architecture, the 'scenario' layer uses constrained-random
stimulus generators to produce the data streams that represent all possible usage scenarios. The
'functional' layer takes those data streams, turns them into streams of individual transactions
(for example, read/write transactions for a specific bus) and performs detailed protocol checking.
Finally, the command layer takes the individual transactions and drives the inputs of the RTL
model. The overall model is shown in Figure 5.

Throughout this process, functional coverage is monitored at the different abstraction layers. This
methodology supports a top-down approach to building a verification environment. The approach
allows a team to build a complete verification environment early in the development process,
even before any RTL code has been developed. The environment then becomes the "golden
reference" for verifying additional verification components and the RTL design. If the
architecture model is cycle-accurate, it is easy to add in the RTL when it is ready.

Figure 5 : VMM based layered Testbench

Powerful features such as constrained-random stimulus generation, coverage metrics and
assertions, which are already required for RTL verification, combined
with an object-oriented programming style, provide a highly effective environment for implementing
the higher layers of the testbench as well. Users can also verify the architecture models as well
as the virtual platform models, ensuring that all models stay mutually consistent during the SoC
development process. The following figure shows how the command layer and the signal layer
in the above layout can be replaced by a transaction-level model.



Figure 6 : Early Testbench Development with DUV as TLM

The transaction level model [TLM] shown in Figure 6 can be replaced by the RTL at the
appropriate time. The next sections describe the essential components of
such a system.

4.2 Device Modeling

We used extensive modeling for three basic reasons:

– Device Feature/Algorithmic validation

o The requirement was to create an untimed 'C' model of the packet processor quickly
enough to validate some critical algorithms as well as to provide feedback to the
detailed RTL design later.
o It also helped us analyze device behavior under complex network scenarios.
– Device critical parameter characterization
o To validate performance-critical features such as latencies, bandwidth and buffer sizes,
a detailed architecture model is created as a first step. A cycle-accurate timed model is
then created from it. This model is used to ensure that the right
architectural trade-offs are made, such as decisions on the bus infrastructure, buffer
sizing and so forth, before committing to the RTL implementation phase.
– Model as RTL Checker
o The characterized device model also served as a checker later in the RTL verification cycle,
as shown in the following figure.



[Figure 7 block diagram. Labels recoverable from the diagram: Testbench (Testcase, Driver,
Monitor, Scoreboard, Coverage), DPI, DUV [Model], DUV [RTL]]

Figure 7 : DUV Model as Checker

// Device model code in C

TXC_ErrorType_t main_parser(TXC_Context_ts *in);

// Entry point called from SystemVerilog through DPI
int main_SV(unsigned char *data_0[], unsigned char *port, int byteBufferLen)
{
    TXC_U8BIT byteBuffer[1522];
    TXC_Context_ts *in;
    TXC_U8BIT Port;

    if (data_0 == NULL) {
        printf("data not received from SystemVerilog\n");
    }
    else {
        printf("\nPacket of Length %d Received from SystemVerilog ", byteBufferLen);
        ....
        .....
        // Parser task spawning the different device model tasks
        main_parser(in);
    }
}

// Runs the device model stages in order on one packet context
TXC_ErrorType_t main_parser(TXC_Context_ts *in)
{
    int status = 0;
    Txc_PacketParsingTask(in);
    Txc_LP_ClassificationTask(in);
    Txc_Ingress_PktEditTask(in);
    Txc_L2_lookupTask(in);
    Txc_Egress_PktEditTask(in);
    return status;
}



4.2.1 An Integrated Simulation Environment
The heart of such integration is the ability to call a C method directly from a SystemVerilog task
and, vice versa, to call a SystemVerilog task directly from within a C method. Clearly this
requires synchronization between the concepts of time in C and SystemVerilog. Figure 8 shows
an application where a packet processor model [implemented in C] is integrated with
SystemVerilog.
The core of C/C++/SystemC and SystemVerilog integration thus supports a mixed-level
modeling paradigm, with the ability to create simulation models that are partly at the transaction
level and partly at the detailed hardware level. Hence, the integration enables SystemVerilog and
SystemC to communicate at different levels of abstraction.
We leverage SystemVerilog's DPI feature to create a comprehensive hardware/software
co-verification environment. This methodology can smoothly integrate software with the simulation
environment and can offer full hardware debugging capabilities. As an added benefit, a
hardware emulator or other tools are not required. This infrastructure makes co-verification
possible months, or even more than a year, before a hardware prototype becomes available. The
same sets of test code can be used in both pre- and post-silicon environments without
modification.

[Figure 8 block diagram. Labels recoverable from the diagram: Device Features (802.1D, Q in Q,
PBB/PBT), Scenario Gen, Transactor, SV Interface, DPI, SW Interface, Device Model]

Figure 8 : Model Integration with SV based transactor

4.2.1.1 Using SystemVerilog DPI Methodology


The DPI is an interface between SystemVerilog and the C/C++ language. Using DPI, engineers
can directly call C/C++ functions from SystemVerilog and export SystemVerilog functions to be
called from C/C++.



We used the DPI to import the software code, giving SystemVerilog tasks access to
the C/C++ code. The software code can therefore run directly in the system simulation
environment, and we run simulations using the complete package.
The following example shows an application that uses export functions.

STEPS:
======
1. Declare the SystemVerilog task or function identifier using the 'export "DPI"' syntax.
2. The actual function implementation in SystemVerilog.
3. Declare the equivalent C/C++ function identifier using the 'extern' syntax.
4. Call the exported function from C/C++ code.

//SystemVerilog Code

//Global variables to take value from C
bit[7:0] Pdu_from_c[];
int fsize_from_c;

//Export the return function from C -> SV
export "DPI-C" function sv_return;

function void sv_return(bit[31:0] a[140/4], int fsize);
    int count;
    bit[31:0] loc_var;
    Pdu_from_c = new[fsize];
    fsize_from_c = fsize;
    for (int i = 0, count = 0; i < fsize/4; i++) begin
        loc_var = a[i];
        Pdu_from_c[count++] = loc_var[7:0];
        Pdu_from_c[count++] = loc_var[15:8];
        Pdu_from_c[count++] = loc_var[23:16];
        Pdu_from_c[count++] = loc_var[31:24];
    end
endfunction

//C Code Packetprocess.c file

#include <svdpi.h>
// Prototype of the function exported from SystemVerilog
extern void sv_return(unsigned char *, int size);

Txc_Egress_PktEditTask(in) {
    ..........
    //Call to the exported function
    sv_return(pkt->u.base.pduPtr, pkt->u.base.fragSize+12);
}
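
The byte order used by `sv_return` (bits [7:0] first, then [15:8], [23:16], [31:24]) can be mirrored on the C side when checking what the SystemVerilog side will see. The following is a minimal sketch under stated assumptions: `unpack_words_lsb_first` is a hypothetical helper, not part of the device model, and the words are taken to be plain 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack 32-bit words into bytes, least-significant byte first,
 * matching the order used by sv_return. Like the SystemVerilog loop,
 * only whole words are unpacked, so fsize/4 words yield 4*(fsize/4)
 * bytes. Returns the number of bytes written. */
static size_t unpack_words_lsb_first(const uint32_t *words, size_t fsize,
                                     uint8_t *out, size_t out_cap)
{
    size_t count = 0;
    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < fsize / 4; i++) {
        if (count + 4 > out_cap)
            break;                       /* never overrun the caller's buffer */
        uint32_t w = words[i];
        out[count++] = (uint8_t)(w & 0xFFu);
        out[count++] = (uint8_t)((w >> 8) & 0xFFu);
        out[count++] = (uint8_t)((w >> 16) & 0xFFu);
        out[count++] = (uint8_t)((w >> 24) & 0xFFu);
    }
    return count;
}
```

Note that a size that is not a multiple of four loses its trailing bytes in this scheme, which is also true of the SystemVerilog loop above.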



STEPS:
1. A declaration: import "DPI" context task <function name>
2. Calling the imported function from your SystemVerilog code.
3. The actual function implementation in C/C++

//C Function imported in Driver.sv
enet_packet txc_pkt;
enet_packet_scenario_gen txc_pack_gen = new("txc_scenario Gen", 1, txc_chan);

import "DPI" function int main_SV (bit[31:0] cc[100], bit[31:0] port, int len);

task driver(bit [31:0] port);
    while (1) begin
        bit [31:0] bb[100];
        int p_size;
        //Take packet from channel fill pdu_bytes
        txc_chan.peek(txc_pkt);
        //pack bytes into pdu_bytes
        txc_pkt.byte_pack(pdu_bytes, 0, -1);
        for (int j = 0; j < pdu_bytes.size(); j++) begin
            bb[j][7:0] = pdu_bytes[j];
            ....................
        end
        //Call the C task; the int return value is not used here
        void'(main_SV(bb, port, p_size));
    end
endtask

//PacketModel.C
int main_SV(unsigned char *data_0[], unsigned char *port, int byteBufferLen)
{
    ............
}
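
On the C side, the packet bytes then have to be recovered from the word array. The sketch below assumes, as the driver loop suggests, that word j carries packet byte j in its low 8 bits and that the array arrives as plain 32-bit words; `words_to_bytes_low8` is a hypothetical helper. It also clamps the copy to the destination capacity, since the driver's array holds 100 entries while the model's byte buffer is sized for 1522 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect one packet byte from the low 8 bits of each 32-bit word.
 * Assumption: word j carries byte j in bits [7:0]; upper bits are
 * ignored. Copies at most cap bytes and returns the number copied. */
static size_t words_to_bytes_low8(const uint32_t *words, size_t n_words,
                                  uint8_t *bytes, size_t cap)
{
    size_t n = (n_words < cap) ? n_words : cap;
    assert(words != NULL && bytes != NULL);
    for (size_t j = 0; j < n; j++)
        bytes[j] = (uint8_t)(words[j] & 0xFFu);
    return n;
}
```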



4.3 Transactions Based Simulation Acceleration
To speed up our simulations without compromising on debug features, we capitalize on
the benefits of a transaction-based acceleration (TBA) regression environment. To use it
effectively, the following two basic rules need to be followed:
• Maximize overall performance, focusing on the hardware accelerator.

• Maximize simulation testbench reuse [covered in section 4.1].

4.3.1 Maximizing Overall Performance


As the simulator is slower than the hardware platform, better
performance depends on the ability to execute the design mapped on the hardware with
little or no intervention from the simulated portion of the testbench. Therefore, messages
between the simulated testbench and the synthesized testbench should be at the highest
applicable level of abstraction. Also, as the testbench is the bottleneck, minimizing the
time spent in the testbench (running on the host computer) offers the best results.

[Figure 9 timing diagram. Three timelines (RTL Simulation; TB in SW, DUT in HW; TBA Method),
each showing TB Execution Time, DUT Execution Time and Sync along the time axis]

Figure 9 : Simulation Acceleration Timing Profile



The architecture should incorporate the following principles:
• The most active part of the testbench (BFMs/monitors) should run in the hardware at
speed.
• The testbench residing on the SW side is abstracted to higher-level data items or a user
transaction-level API, and thus runs significantly faster with the BFM and the monitor
relegated to the HW side.
• The BFMs and monitors are the only testbench components requiring clocks. When
running on the accelerator, all clocks can be generated inside the HW side partition,
avoiding synchronization with the SW side on every clock edge.
• BFMs and monitors can provide or gather "transaction data" over multiple clock cycles.
During these periods the HW side can run without interruption.
• Interaction between the HW side and SW side is fully asynchronous. It happens only
when the HW side requests a new transaction or produces a new transaction.
• Transactions are stored in a buffer on the HW side, and new transactions are fetched only
when the buffer level falls below a threshold.
• Significant performance can be gained if the stimulus is pre-generated and the DUT
response is captured for post-processing checks.
• Solver profiling and analysis need to be done, because in scenarios like this the role of the
TB is primarily constrained-random generation, and the constraint solver is therefore the key
factor that determines performance.
The following table shows an example in which a substantial gain in simulation speed is achieved
when payload byte randomization is removed from the transaction and moved to the hardware portion.

Random constraints for               Time taken to generate               Achievable simulation speed
Ethernet packet                      10000 packets                        (no. of bits/time)
-----------------------------------  -----------------------------------  ---------------------------
DA, SA, Type, Length, Payload Bytes  61.8 s, total bytes = 30090008 B     0.5 Mbps
DA, SA, Type, Length                 3.11 s, total bytes = 7746012 B      19 Mbps
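
One way to move payload randomization out of the software transaction is to send only a seed and a length, and let the hardware side expand them into payload bytes. The sketch below is a C model of that idea, not the scheme used in the project: the LFSR choice (a standard 16-bit Galois LFSR with feedback mask 0xB400), the eight steps per byte, and the names `lfsr16_step` and `fill_payload` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a 16-bit Galois LFSR (taps 16, 14, 13, 11; mask 0xB400).
 * This polynomial is maximal length, so every nonzero state is visited. */
static uint16_t lfsr16_step(uint16_t s)
{
    uint16_t lsb = (uint16_t)(s & 1u);
    s = (uint16_t)(s >> 1);
    if (lsb)
        s = (uint16_t)(s ^ 0xB400u);
    return s;
}

/* Expand (seed, len) into payload bytes. The LFSR is stepped eight
 * times per byte and the low 8 bits of the state are taken. The same
 * seed always reproduces the same payload, so a checker can
 * regenerate the expected data without receiving it. */
static void fill_payload(uint16_t seed, uint8_t *payload, size_t len)
{
    uint16_t s = seed ? seed : 0xACE1u;  /* an all-zero state would lock up */
    assert(payload != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        for (int k = 0; k < 8; k++)
            s = lfsr16_step(s);
        payload[i] = (uint8_t)(s & 0xFFu);
    }
}
```

With this split, the software transaction carries only the header fields plus the seed and length, which keeps the constraint solver out of the per-byte path.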

4.3.1.1 TBA Framework


We used a 'Client' and 'Server' based methodology to connect VCS, running as software on the
host machine, to the DUT mapped onto the hardware. A socket is opened to create a bridge
between the two, with the DUT as 'Server' and the testbench as 'Client'. A socket is one end of an
interprocess communication channel, through which the client and server communicate with
each other. The client and the server each establish their own socket. The socket interface
eliminates the need for any third-party tools and software.
The following snippets of code, developed in C, show the initialization of the socket connection,
first for the client and then for the server.



Steps:
=====
1. On launching, the server (i.e. the DUT) opens a TCP/IP socket, stays in listen mode and
waits for the client to connect.
2. The client (i.e. VCS), when invoked, establishes a connection with the server.
3. VCS generates stimulus, which is gathered by the client function and sent to the server.
4. The server then pushes the stimuli to rx_vector_fifo, reads it in the background and
applies it to the DUT.
5. At the same time the DUT outputs are sampled and pushed into tx_vector_fifo.
6. The server reads out tx_vector_fifo and forwards the response to the client (i.e. VCS).

void client_tx(char* serv_ip_address, int* txbuf)
{
    // Open the socket and connect only on the first call
    if (create_socket == 0) {
        printf("\x1B[2J");
        if ((create_socket = socket(AF_INET, SOCK_STREAM, 0)) > 0)
            printf("The Socket was created\n");
        address.sin_family = AF_INET;
        address.sin_port = htons(15000);
        inet_pton(AF_INET, serv_ip_address, &address.sin_addr);
        if (connect(create_socket, (struct sockaddr *)&address, sizeof(address)) == 0)
            printf("The connection was accepted with the server %s...\n",
                   inet_ntoa(address.sin_addr));
    }
    //
}

void client_rx(int* rxbuf)
{
    recv(create_socket, rxbuf, size, 0);
}

void server_rcv(int* rcvbuf)
{
    // Create, bind and accept only on the first call
    if (new_socket == 0) {
        buffer = malloc(bufsize);
        printf("\x1B[2J");
        if ((create_socket = socket(AF_INET, SOCK_STREAM, 0)) > 0)
            printf("The socket was created\n");
        ....
        address.sin_port = htons(15000);
        if (bind(create_socket, (struct sockaddr *)&address, sizeof(address)) == 0)
            printf("Binding Socket\n");
        listen(create_socket, 3);
        addrlen = sizeof(struct sockaddr_in);
        new_socket = accept(create_socket, (struct sockaddr *)&address, &addrlen);
        if (new_socket > 0) {
            printf("The Client %s is connected...\n", inet_ntoa(address.sin_addr));
        }
    }
    recv(new_socket, (char*)rcvbuf, bufsize, 0);
}

void server_send(int* txbuf)
{
    send(new_socket, (char*)txbuf, bufsize, 0);
    // printf("%x:%x:%x:%x\n", txbuf[0], txbuf[1], txbuf[2], txbuf[3]);
}
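
On a stream socket, a single `send()` or `recv()` call may transfer fewer bytes than requested, so message boundaries are not preserved automatically. The following sketch shows one common way to carry fixed messages reliably: a 4-byte length prefix and loops that continue until the whole message has been transferred. It is an illustration under stated assumptions, not the code used in the project; `send_all`, `recv_all`, `send_msg` and `recv_msg` are hypothetical names, and retry on interrupted calls is omitted for brevity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send exactly len bytes; send() may accept fewer than requested. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes; recv() may return a partial read. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;                   /* error or peer closed */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame layout: 4-byte big-endian payload length, then the payload. */
static int send_msg(int fd, const void *payload, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (send_all(fd, &hdr, sizeof hdr) != 0)
        return -1;
    return send_all(fd, payload, len);
}

/* Returns the payload length, or -1 on error or if it exceeds cap. */
static long recv_msg(int fd, void *payload, uint32_t cap)
{
    uint32_t hdr;
    uint32_t len;
    if (recv_all(fd, &hdr, sizeof hdr) != 0)
        return -1;
    len = ntohl(hdr);
    if (len > cap)
        return -1;
    if (recv_all(fd, payload, len) != 0)
        return -1;
    return (long)len;
}
```

The helpers work on any connected stream socket, so they can be exercised locally with a `socketpair()` before being tried against the client and server above.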



The software and hardware communicate through a 'communication channel' set up over USB.
The USB channel carries messages: a message is a bit vector of a given length, which can be sent
from the software to the hardware and vice versa. The hardware part of the transactor is in charge
of processing messages and generating the appropriate signals to stimulate the DUT.

[Figure 10 block diagram. Labels recoverable from the diagram: Client Host (VCS, Client Socket
Intf, Multi Client Handler), Server Host (Multi Client Handler, USB), Communication Channel,
Prototype Board (USB, HW)]

Figure 10 : VCS-HW Platform integration through 'Client' & 'Server'

A test case developed in SystemVerilog includes only the configuration of the device VIP, the
configuration of the simulation environment and the initialization of the sockets. The test case
is run in the native RTL simulation environment, where simulation control resides with the VCS
simulator. After the server component has been initialized, the testbench waits for
socket connections to be established by the client residing in the "Server Host". The "Server
Host" can be the same host machine on which the simulation is running, or it can be a different
machine to balance the load. Once the connection is established, the 'server host' can initiate
transactions towards the device through the physical channel (USB in our case).
The overall HW-SW partitioning is done to maximize performance and reuse. The HW side
partition is driven by clocks, while the SW side is transaction-based and less clock dependent.
The SW side can still make sparing use of timed constructs, such as waiting on time, to allow
the SW side and HW side to synchronize and exchange transactions.
The packet generated by the 'scenario gen' is encapsulated in 'messages' that travel over the
physical medium. These 'messages' are decoded by the hardware transactor residing on the
prototype board and converted into the relevant protocol by the BFM. The message encoding
and decoding is outside the scope of this document.
The following figure presents a simplified view of the SW/HW testbench partitioning.
[Figure: Software and Hardware partitions; labels: VMM Testbench, Clock Control, Clock Gen,
Scoreboard, Monitor, DUV, Stimulus Generator, USB SW Transactor, USB HW Transactor, Rx FIFO,
Tx FIFO]

Figure 11 : VMM Testbench Integration with the Prototype Board

Buffering Mechanism:
To minimize cycle-by-cycle HW-SW interaction, FIFOs are provisioned to store the stimulus
before it is read out by the DUT clock(s). With a reactive transactor, communication between HW
and SW needs to be established before any further stimulus is applied, which can considerably
slow down the simulation. The HW-SW interaction required in this case can be decoupled
through these FIFOs. A programmable threshold is maintained in the FIFO, and further
transactions are fetched from the SW as soon as the FIFO depth falls below the configured
threshold. Similarly, transactions are sent to the SW side when the FIFO depth rises above the
configured threshold.
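
The threshold behaviour described above can be illustrated with a small C sketch. It is not the
code used in our environment: the type stim_fifo_t and all function names are hypothetical, and
only the fill level of the FIFO is modelled, not the stored transactions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one transactor FIFO (fill level only). */
typedef struct {
    size_t depth;      /* entries currently stored */
    size_t capacity;   /* maximum number of entries */
    size_t threshold;  /* programmable threshold */
} stim_fifo_t;

void stim_fifo_init(stim_fifo_t *f, size_t capacity, size_t threshold)
{
    assert(threshold <= capacity);
    f->depth = 0;
    f->capacity = capacity;
    f->threshold = threshold;
}

/* Stimulus direction: fetch more transactions from SW when this is true. */
bool stim_fifo_needs_refill(const stim_fifo_t *f)
{
    return f->depth < f->threshold;
}

/* Response direction: send stored transactions to SW when this is true. */
bool stim_fifo_above_threshold(const stim_fifo_t *f)
{
    return f->depth > f->threshold;
}

/* Store a burst of n transactions; returns how many were accepted. */
size_t stim_fifo_push(stim_fifo_t *f, size_t n)
{
    size_t room = f->capacity - f->depth;
    size_t accepted = (n < room) ? n : room;
    f->depth += accepted;
    return accepted;
}

/* Read out one entry; returns false if the FIFO is empty. */
bool stim_fifo_pop(stim_fifo_t *f)
{
    if (f->depth == 0)
        return false;
    f->depth--;
    return true;
}
```

With this bookkeeping the SW side is contacted once per burst rather than once per clock cycle,
which is the decoupling the FIFOs are meant to provide.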

Clock control Mechanism:


Since the transactor runs slower than the DUV ported to the hardware, the transactor must take
into account the fact that the DUT clock is not always enabled. A speed bridge is required to
maintain coherency between the VCS simulator and the actual DUV in the hardware. A simple
scheme decides clock generation based on the number of vectors in the 'message' required to
maintain a particular threshold of the transaction buffer.
It is possible to split the hardware part of the transactor into several transactors, each one
interfacing with different ports of the DUT. Every transactor emits signals to control the
generation of positive and negative edges of the DUT clock. Clock edges are generated on the
DUV only when all transactors are ready. Consequently, the DUV runs on a slower clock than the
transactors, which must be carefully handled in transactor designs.
In the case of multiple clocks, the transactor which runs out of data first halts the other
transactors by passing the 'stop' message. Figure 12 shows the clock freeze and unfreeze during
interaction between the HW-SW transactors.

[Figure: timing diagram showing the Uncontrolled Clock and the Controlled DUV Clock, with a
Transaction Fetch/Sent Time Window marked by FIFO Depth < Threshold and FIFO Depth > Threshold]

Figure 12 : Clock Generation mechanism
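
The rule that a DUV clock edge is generated only when all transactors are ready can be sketched
in C as follows. This is an illustrative software model under our own assumptions, not the
hardware clock controller itself: clock_ctrl_t, MAX_TRANSACTORS and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRANSACTORS 8  /* arbitrary limit chosen for this sketch */

/* Hypothetical model of the controlled DUV clock. */
typedef struct {
    bool ready[MAX_TRANSACTORS];  /* one ready flag per HW transactor */
    size_t count;                 /* number of transactors in use */
    unsigned long duv_edges;      /* DUV clock edges generated so far */
} clock_ctrl_t;

void clock_ctrl_init(clock_ctrl_t *c, size_t count)
{
    assert(count > 0 && count <= MAX_TRANSACTORS);
    c->count = count;
    c->duv_edges = 0;
    for (size_t i = 0; i < count; i++)
        c->ready[i] = false;
}

/* A transactor clears its flag when it runs out of data ('stop'). */
void clock_ctrl_set_ready(clock_ctrl_t *c, size_t id, bool ready)
{
    assert(id < c->count);
    c->ready[id] = ready;
}

/* Called once per uncontrolled clock edge.
 * Returns true if a DUV clock edge is generated, false if it is frozen. */
bool clock_ctrl_tick(clock_ctrl_t *c)
{
    for (size_t i = 0; i < c->count; i++) {
        if (!c->ready[i])
            return false;  /* any transactor not ready freezes the DUV clock */
    }
    c->duv_edges++;
    return true;
}
```

Because a single transactor that is not ready suppresses the edge, duv_edges never exceeds the
number of uncontrolled clock edges, which matches the observation that the DUV runs on a slower
clock than the transactors.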

4.3.2 HW-SW Co Simulation


The HW-SW co-simulation can be done through two methods in our environment. We leverage
the automated constrained random verification flow to verify the complete package in order to
find corner cases when both hardware and software coexist.
1. The SW running on a machine is made another 'client', which can then access the 'server
host'. The server host can encapsulate the transactions to the hardware.
2. The VCS simulator already running on a client machine can interface with the SW through
DPI. The rest of the mechanism is the same as defined above.
When the SW wants to access the HW, it generates a 'request' to the 'server host'. The server
acknowledges the request, and the Read/Write transaction is converted into a message and passed
to the hardware transactor. The HW transactor passes it to the host BFM residing on the
prototype board, which finally converts it into a microprocessor Read/Write. The scheme is
shown in Figure 13.
For applications where the HW needs an immediate SW service routine, the following protocol is
followed:
– The DUV generates the interrupt towards the HW transactor
– The DUV clock of the design may be frozen so that it appears to the design as if the
response happens in one clock cycle
– The interrupt message is sent to the SW transactor
– The SW transactor calls the SW subroutine
– The SW response is sent back to the HW transactor
– The design clock is unfrozen
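
The ordering of the interrupt protocol above can be summarized in a short C sketch. It is a
simplified software model under our own assumptions: cosim_t, cosim_handle_interrupt and
example_isr are hypothetical names, the SW subroutine is represented by a function pointer, and
the message transport between the HW and SW transactors is reduced to a function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature of the SW service routine. */
typedef uint32_t (*isr_fn)(uint32_t irq_id);

/* Hypothetical co-simulation state seen by the HW transactor. */
typedef struct {
    bool duv_clock_frozen;  /* true while the DUV clock is held */
    unsigned freeze_count;  /* number of freezes so far */
    isr_fn isr;             /* SW subroutine called by the SW transactor */
} cosim_t;

void cosim_init(cosim_t *c, isr_fn isr)
{
    assert(isr != NULL);
    c->duv_clock_frozen = false;
    c->freeze_count = 0;
    c->isr = isr;
}

/* Called when the DUV raises an interrupt towards the HW transactor. */
uint32_t cosim_handle_interrupt(cosim_t *c, uint32_t irq_id)
{
    /* Freeze the DUV clock so the response appears to take one cycle. */
    c->duv_clock_frozen = true;
    c->freeze_count++;

    /* Interrupt message reaches the SW transactor, which calls the SW
     * subroutine; the return value stands for the SW response message. */
    uint32_t response = c->isr(irq_id);

    /* Response is back at the HW transactor: unfreeze the design clock. */
    c->duv_clock_frozen = false;
    return response;
}

/* Example SW subroutine for illustration: acknowledges by setting bit 31. */
uint32_t example_isr(uint32_t irq_id)
{
    return irq_id | 0x80000000u;
}
```

The DUV clock stays frozen for the whole duration of the SW call, so the time spent in software
is invisible to the design.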

[Figure: block diagram with the labels SW Client Host, Interrupt Service, Interrupt, DPI, VCS
Client Host, Server Host, SW TX, HW TX, Host BFM and DUV]

Figure 13 : Platform for HW-SW Coverification

4.4 System/IP Emulation


Depending on the time and resources available, full chip emulation can also be targeted to
complement RTL simulation. In our case the emulation setup is done on the HAPS board, which
consists of 4 Virtex-II Pro FPGAs. The basic emulation infrastructure consists of a synthesizable
testbench mapped to one of the on-board FPGAs. The DUV is partitioned across two FPGAs. The
USB interface is used for the configuration of the device and the test components. By adding
certain synthesizable test components we were also able to interface with standard test
equipment like IXIA for Ethernet related testing.
In case of complex and non-standard traffic stimulus requirements, the on-board PowerPC is
used as a traffic generator, which can be interfaced with a BFM to generate the required protocol,
such as SPI-3. The PowerPC is controlled through C/C++ software programs and the Embedded
Development Kit (EDK) from Xilinx.
The overall setup for this environment is shown in the following figure.

[Figure: block diagram with the labels SW, Host Machine, USB, STB, DUV, ETH, IXIA and Power PC]

Figure 14: Emulation setup with synthesizable TB mapped to HAPS FPGA board

5 Results
Based on our experiments with the transaction-based hardware-accelerated testbench, we
observed speed improvements in the range of 10X, but we took a simplified approach in the first
phase to establish the proof of concept. We had some limitations in our USB transactor, which
can be enhanced to provide further speed gain.

Typical Simulation Speed    Simulation Speed after    Simulation Speed based
for SOC                     Device in HW              on TBA testbench
X                           3X                        10X-100X

In the next phase we are planning to use HAL (Hardware Abstraction Layer) APIs for better
control over the HW-SW partition, along with writing a synthesizable transactor. We also need to
assess the multi-clock handling capability of the platform.

6 Conclusions and Recommendations
The layer-based approach in VMM helped in building a well organized testbench structure, which
can be reused from modeling to emulation and from block level to system level.
The SystemVerilog DPI feature enables us to build a comprehensive hardware/software co-
verification environment for inexpensive and efficient verification of SoCs. We saved money on
expensive co-verification tools, reduced wasted time and removed the need for separate software
and hardware testbenches. This methodology allows us to test and integrate code earlier in the
design cycle, and to more easily reproduce in simulation the problems found in later FPGA
prototypes.
Testbenches built this way provide a more realistic view of the complete chip to be built, and
allow the software team to add resources to the bug finding effort through their normal work.
This methodology contributes significantly to producing a more reliable SoC that is successfully
taped out on time, and to delivering high quality firmware in concert with the SoC prototype to
the customer.

7 Acknowledgements
I would like to thank Dinesh for his immense contribution to the development of this project.
Special thanks to Amit Sharma from Synopsys and Parag Goel from Transwitch for helping me
in writing this paper. Last but not least, thanks to our management, especially Santanu, for
the encouragement and practical support.

