
Switch Abstraction Interface

Joint work from


Microsoft, Dell, Facebook,
Broadcom, Intel, Mellanox,

Lihua Yuan, Guohan Lu, Dave Maltz,

Kamala Subramaniam, Darren Loher


for: Microsoft Azure Network Team

Agenda
Go through the SAI proposal.
Discussions!
Sorted list of feedback/questions from the mailing list.
Please add/prioritize the items of interest to you.

Collecting feedback and requirements.
And the names of those who'd like to contribute.

Initial Driving Scenario -- Data Center ToR


Key use cases

ToR (this is NOT the only scenario).
L3 forwarding, L2 forwarding, ECMP, port-channel, monitoring, overlay

Multiple switching chips with different capabilities

Goal of Switch Abstraction Interface (SAI)


Define a minimum required set of APIs to control the switch chips.
Not necessarily enough to build the entire switch,
but a significant portion of it.

Quick consensus is the goal, so we can focus on implementation and operational experience.

Platform Abstraction Layer (LEDs, fans, power, etc.) is a separate discussion.

Additional features exposed via SAI-extensions.


Unique hardware features/capabilities.
Extra software features.
Graduate towards SAI over time.

Approach of SAI
Focus on exposing hardware capabilities and supporting management capabilities.

Unified verbs: CRUD

A switch ASIC has many types of objects in its pipeline; each performs a unique operation.
The SAI API is all about creating these objects, assigning values to them, and associating them with other objects.

Extensible data -- Entity/Attribute/Value data model

New capability -> new type of entity: VXLAN, NVGRE
New capability -> new attribute: new hash type attribute
New connection -> new attribute: ACL-based ERSPAN
An open schema allows low-cost extensions.
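A minimal sketch of the entity/attribute/value idea, to make the "new capability -> new attribute" point concrete. All names here (`eav_attr_id_t`, `eav_value_t`, `eav_find_attr`) are hypothetical and are not taken from the SAI headers. The point it illustrates: a capability added later (the hash-type id) is one more enum value in the attribute list, and the function signature does not change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids for an illustrative "port" entity. */
typedef enum {
    EAV_PORT_ATTR_SPEED,      /* original attribute */
    EAV_PORT_ATTR_MTU,        /* original attribute */
    EAV_PORT_ATTR_HASH_TYPE   /* added later: a new id, not a new function */
} eav_attr_id_t;

/* Value is a union of basic types, in the spirit of sai_attribute_value_t. */
typedef union {
    uint8_t  u8;
    uint32_t u32;
    uint64_t u64;
} eav_value_t;

/* Look up an attribute id in parallel id/value arrays.
 * Returns a pointer to the matching value, or NULL if the id is absent. */
const eav_value_t *eav_find_attr(eav_attr_id_t id, int attr_count,
                                 const eav_attr_id_t *attrs,
                                 const eav_value_t *values)
{
    assert(attrs != NULL && values != NULL);
    for (int i = 0; i < attr_count; i++) {
        if (attrs[i] == id) {
            return &values[i];
        }
    }
    return NULL;
}

/* Example: build a three-attribute list and read back the MTU (0 if missing). */
uint32_t eav_example_mtu(void)
{
    eav_attr_id_t attrs[3] = { EAV_PORT_ATTR_SPEED, EAV_PORT_ATTR_MTU,
                               EAV_PORT_ATTR_HASH_TYPE };
    eav_value_t values[3];
    values[0].u32 = 40000;
    values[1].u32 = 9100;
    values[2].u8 = 1;
    const eav_value_t *v = eav_find_attr(EAV_PORT_ATTR_MTU, 3, attrs, values);
    return v ? v->u32 : 0;
}
```

Code written before `EAV_PORT_ATTR_HASH_TYPE` existed keeps compiling and simply never passes that id.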

SAI in Azure Cloud Switch

[Figure: Azure Cloud Switch software stack. User space: applications (provision/deployment/automation, BGP, other SDN apps, SNMP, LLDP, ACL), the Switch State Service, Sync Agents 1-3, the Switch Abstraction Layer (SAI), and the switch ASIC SDK. Operating system: Ethernet interfaces and the switch ASIC driver. Hardware: hardware load balancer and switch ASIC.]

SAI Development
v0.9.0 available on GitHub:
https://github.com/opencomputeproject/OCP-Networking-ProjectCommunity-Contributions
Huge thanks to OCP!

Discussion: opencompute-networking@lists.opencompute.org
Patches are accepted via this mailing list or via GitHub pull requests.
Open question: can this mailing list also serve as the dev list?

Timeline
v0.9.1: end of the year
Fixes for current v0.9.0

v0.9.2: first quarter of next year
LAG, STP, Packet tx/rx,

A release every 3-6 months

Until it becomes official (v1.x.x), the API is subject to change without backward compatibility.

Objects CRUD API based on attributes


Create object: create an object with all necessary attributes. The object id returned is used to reference the object.
Certain objects, such as FDB entries, L3 routes and VLANs, do not use object ids; they have their own keys.

typedef sai_status (*sai_create_object_name_fn) (
    _Out_ sai_object_name_id_t *object_name_id,
    _In_ int attr_count,
    _In_ sai_object_name_attr_t *attrs,
    _In_ sai_attribute_value_t *values);

Delete object: delete an object with the object id

typedef sai_status (*sai_delete_object_name_fn) (
    _In_ sai_object_name_id_t object_name_id);

Set attributes: set a set of attributes for an object

typedef sai_status (*sai_set_object_name_attribute_fn) (
    _In_ sai_object_name_id_t object_name_id,
    _In_ int attr_count,
    _In_ sai_object_name_attr_t *attrs,
    _In_ sai_attribute_value_t *values);

Get attributes: get a set of attributes for an object

typedef sai_status (*sai_get_object_name_attribute_fn) (
    _In_ sai_object_name_id_t object_name_id,
    _In_ int attr_count,
    _In_ sai_object_name_attr_t *attrs,
    _Out_ sai_attribute_value_t *values);
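To show how the four verbs fit together, here is a minimal in-memory sketch that follows the same shape as the typedefs above (id out on create, parallel attribute/value arrays on set and get). It is not the SAI implementation: every `crud_*` name, the fixed-size table, and the status codes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRUD_MAX_OBJECTS 8
#define CRUD_MAX_ATTRS   4   /* valid attribute ids are 0..3 (hypothetical) */

typedef enum { CRUD_OK = 0, CRUD_NO_SPACE, CRUD_NOT_FOUND, CRUD_BAD_ATTR } crud_status_t;
typedef int crud_object_id_t;
typedef int crud_attr_t;
typedef union { uint32_t u32; uint64_t u64; } crud_value_t;

typedef struct {
    int          in_use;
    int          is_set[CRUD_MAX_ATTRS];
    crud_value_t value[CRUD_MAX_ATTRS];
} crud_object_t;

static crud_object_t crud_table[CRUD_MAX_OBJECTS];

static int crud_valid_id(crud_object_id_t id)
{
    return id >= 0 && id < CRUD_MAX_OBJECTS && crud_table[id].in_use;
}

/* Set: store each (attribute id, value) pair on an existing object. */
crud_status_t crud_set_attribute(crud_object_id_t id, int attr_count,
                                 const crud_attr_t *attrs, const crud_value_t *values)
{
    if (!crud_valid_id(id)) return CRUD_NOT_FOUND;
    for (int i = 0; i < attr_count; i++)   /* validate before changing anything */
        if (attrs[i] < 0 || attrs[i] >= CRUD_MAX_ATTRS) return CRUD_BAD_ATTR;
    for (int i = 0; i < attr_count; i++) {
        crud_table[id].is_set[attrs[i]] = 1;
        crud_table[id].value[attrs[i]] = values[i];
    }
    return CRUD_OK;
}

/* Create: claim a free slot, apply the initial attributes, return the id. */
crud_status_t crud_create_object(crud_object_id_t *id, int attr_count,
                                 const crud_attr_t *attrs, const crud_value_t *values)
{
    assert(id != NULL);
    for (int i = 0; i < CRUD_MAX_OBJECTS; i++) {
        if (crud_table[i].in_use) continue;
        memset(&crud_table[i], 0, sizeof crud_table[i]);
        crud_table[i].in_use = 1;
        crud_status_t st = crud_set_attribute(i, attr_count, attrs, values);
        if (st != CRUD_OK) { crud_table[i].in_use = 0; return st; }
        *id = i;
        return CRUD_OK;
    }
    return CRUD_NO_SPACE;
}

/* Get: copy the requested attributes into the caller's values array. */
crud_status_t crud_get_attribute(crud_object_id_t id, int attr_count,
                                 const crud_attr_t *attrs, crud_value_t *values)
{
    if (!crud_valid_id(id)) return CRUD_NOT_FOUND;
    for (int i = 0; i < attr_count; i++) {
        if (attrs[i] < 0 || attrs[i] >= CRUD_MAX_ATTRS || !crud_table[id].is_set[attrs[i]])
            return CRUD_BAD_ATTR;
        values[i] = crud_table[id].value[attrs[i]];
    }
    return CRUD_OK;
}

/* Delete: release the slot so the id no longer resolves. */
crud_status_t crud_delete_object(crud_object_id_t id)
{
    if (!crud_valid_id(id)) return CRUD_NOT_FOUND;
    crud_table[id].in_use = 0;
    return CRUD_OK;
}
```

A caller would create an object with its initial attributes, keep the returned id, and pass that id to every later set, get, and delete.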

Attribute values and additional API


sai_attribute_value_t is defined as a union of all basic SAI types.
Current issue: additional APIs are also provided when other verbs are needed, such as add/remove members.

typedef union {
sai_uint8_t u8;
sai_uint16_t u16;
sai_uint32_t u32;
sai_uint64_t u64;
sai_switch_id_t switch_id;
sai_port_id_t port_id;
sai_port_phy_id_t port_phy_id;
sai_vlan_id_t vlan_id;
sai_vr_id_t vr_id;
sai_router_interface_id_t rif_id;
sai_cos_t cos;
sai_mac_t mac;
sai_ip_t ip;
sai_ipv6_t ip6;
sai_mpls_label_t mplslabel;
} sai_attribute_value_t;

SAI objects
SAI provides the interface to manage different types of objects in the
ASIC forwarding pipeline.
switch, port, lag
vlan, fdb, stp
virtual router, router interface, neighbor table, next hop, next hop groups, L3 routes
ACL
QoS, buffer
Port mirroring, ERSPAN, sflow

Switch/port initialization
sai_api_initialize(0, NULL);
sai_api_query(SAI_API_SWITCH, &sai_switch_api);
sai_switch_api->initialize_switch(0, NULL, NULL,
&switch_notifications);
sai_api_query(SAI_API_PORT, &sai_port_api);
int port_count = 128;
sai_port_id_t ports[128];
sai_port_api->get_ports(&port_count, ports);

VLAN
sai_api_query(SAI_API_VLAN, &sai_vlan_api);
sai_vlan_id_t vlan_id = 100;
sai_vlan_api->create_vlan(vlan_id);
/* Add port 0 as a tagged member and port 1 as an untagged member. */
int port_count = 2;
sai_vlan_port_t vlan_ports[2] = {
    { .port_id = 0, .tagging_mode = SAI_VLAN_PORT_TAGGED },
    { .port_id = 1, .tagging_mode = SAI_VLAN_PORT_UNTAGGED }
};
sai_vlan_api->add_ports_to_vlan(vlan_id, port_count, vlan_ports);

L3 router interface
sai_api_query(SAI_API_ROUTER_INTERFACE, &sai_rif_api);
sai_router_interface_id_t rif_id;
sai_router_interface_attr_t attrs[4];
sai_attribute_value_t values[4];
attrs[0] = SAI_ROUTER_INTERFACE_ATTR_VR_ID;
attrs[1] = SAI_ROUTER_INTERFACE_ATTR_TYPE;
attrs[2] = SAI_ROUTER_INTERFACE_ATTR_VLAN_ID;
attrs[3] = SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS;
values[0].vr_id = vr_id;
values[1].u32 = SAI_ROUTER_INTERFACE_TYPE_VLAN;
values[2].vlan_id = vlan_id;
values[3].mac = src_mac;
sai_rif_api->create_router_interface(&rif_id, 4, attrs, values);

L3 neighbor
sai_api_query(SAI_API_NEIGHBOR, &sai_nb_api);
sai_ip_t nb_ip;
sai_neighbor_attr_t attrs[3];
sai_attribute_value_t values[3];
attrs[0] = SAI_NEIGHBOR_ATTR_VR_ID;
attrs[1] = SAI_NEIGHBOR_ATTR_ROUTER_INTERFACE_ID;
attrs[2] = SAI_NEIGHBOR_ATTR_DST_MAC_ADDRESS;
values[0].vr_id = vr_id;
values[1].rif_id = rif_id;
values[2].mac = dst_mac;
/* Neighbors are keyed by IP address, so no object id is returned. */
sai_nb_api->create_neighbor(nb_ip, 3, attrs, values);

L3 next hop
sai_api_query(SAI_API_NEXT_HOP, &sai_nh_api);
sai_next_hop_id_t nh_id;
sai_next_hop_attr_t attrs[3];
sai_attribute_value_t values[3];
attrs[0] = SAI_NEXT_HOP_ATTR_TYPE;
attrs[1] = SAI_NEXT_HOP_ATTR_DST_IP;
attrs[2] = SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID;
values[0].u32 = SAI_NEXT_HOP_IP;
values[1].ip = dst_ip;
values[2].rif_id = rif_id;
sai_nh_api->create_next_hop(&nh_id, 3, attrs, values);

L3 next hop group


sai_api_query(SAI_API_NEXT_HOP_GROUP, &sai_nhg_api);
sai_next_hop_group_id_t nhg_id;
sai_next_hop_group_attr_t attrs[1];
sai_attribute_value_t values[1];
attrs[0] = SAI_NEXT_HOP_GROUP_ATTR_TYPE;
values[0].u32 = SAI_NEXT_HOP_GROUP_ECMP;
sai_nhg_api->create_next_hop_group(&nhg_id, 1, attrs, values);
int nexthop_count = 2;
sai_next_hop_id_t nexthops[2];
/* The slide reuses nh_id for both members; a real ECMP group would list distinct next hops. */
nexthops[0] = nh_id;
nexthops[1] = nh_id;
sai_nhg_api->add_next_hop_to_group(nhg_id, nexthop_count, nexthops);

L3 route
sai_api_query(SAI_API_ROUTE, &sai_route_api);
sai_route_attr_t attrs[2];
sai_attribute_value_t values[2];
attrs[0] = SAI_ROUTE_ATTR_NEXT_HOP_TYPE;
attrs[1] = SAI_ROUTE_ATTR_NEXT_HOP_GROUP_ID;
values[0].u32 = SAI_ROUTE_NEXT_HOP_GROUP;
values[1].nhg_id = nhg_id;
/* Routes are keyed by IP prefix, so no object id is returned. */
sai_ip_prefix_t ip_prefix;
sai_route_api->create_route(ip_prefix, 2, attrs, values);

ACL
sai_api_query(SAI_API_ACL, &sai_acl_api);
/* Create an ingress ACL table that matches on source and destination IP. */
sai_acl_table_id_t acl_table_id;
sai_acl_table_attr_t attrs[1];
sai_attribute_value_t values[1];
sai_acl_field_t fields[2] = { SAI_ACL_FIELD_SRC_IP, SAI_ACL_FIELD_DST_IP };
attrs[0] = SAI_ACL_ATTR_STAGE;
values[0].u32 = SAI_ACL_STAGE_INGRESS;
sai_acl_api->create_acl_table(&acl_table_id, 1, attrs, values, 2, 2,
    fields);
/* Create an ACL entry in that table with priority 100. */
sai_acl_id_t acl_id;
sai_acl_attr_t acl_attrs[2] = { SAI_ACL_ATTR_TABLE_ID, SAI_ACL_ATTR_PRIORITY };
sai_attribute_value_t acl_values[2] = { { .u32 = acl_table_id }, { .u32 = 100 } };
/* filters and actions are assumed to be populated elsewhere. */
sai_acl_api->create_acl(&acl_id, 2, acl_attrs, 2, filters, 1, actions);

Discussions

Issue types
API design
BULK API, Async API support

API extensibility
SAI API to support
Packet tx/rx, fault management, trunk, OAM,

Multi-ASIC support
L2/L3
Port functionality
Misc

API Design
Bulk APIs (Cisco, Kamala)
Example: program 100K routes in a second
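One design question a bulk call raises is how to report errors when a single call covers many entries. The sketch below shows one possible shape, with a per-entry status array. It is purely hypothetical: `bulk_create_routes`, `bulk_route_t`, and the status codes are invented for illustration and are not part of the SAI proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BULK_ROUTE_TABLE_SIZE 4   /* tiny on purpose, to show a partial failure */

typedef enum { BULK_OK = 0, BULK_TABLE_FULL } bulk_status_t;

typedef struct {
    uint32_t prefix;       /* IPv4 prefix, host byte order */
    uint8_t  prefix_len;
    uint32_t next_hop_id;
} bulk_route_t;

static bulk_route_t bulk_routes[BULK_ROUTE_TABLE_SIZE];
static size_t bulk_route_count;

/* Hypothetical bulk create: one call installs many routes.
 * statuses[i] reports the outcome for routes[i];
 * the return value is the number of routes actually installed. */
size_t bulk_create_routes(size_t count, const bulk_route_t *routes,
                          bulk_status_t *statuses)
{
    assert(routes != NULL && statuses != NULL);
    size_t installed = 0;
    for (size_t i = 0; i < count; i++) {
        if (bulk_route_count < BULK_ROUTE_TABLE_SIZE) {
            bulk_routes[bulk_route_count++] = routes[i];
            statuses[i] = BULK_OK;
            installed++;
        } else {
            statuses[i] = BULK_TABLE_FULL;
        }
    }
    return installed;
}
```

With per-entry statuses the caller can retry or log only the failed routes. An all-or-nothing bulk call is the other obvious option and would need rollback instead.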

Synchronous vs. asynchronous calls (Cisco, Kamala)
Example: would a notification for a queue exceeding its watermark be sync or async?

SAI API support


Packet Receive/Send APIs (Tina, Bruce)
Request to define:
How the control plane stack (CPS) receives/transmits data plane packets
Packet types and formats, in a TLV layout
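As an illustration of what a TLV layout for packet metadata could look like, here is a small encode/lookup sketch. The wire format (1-byte type, 2-byte big-endian length, then the value) and the type codes are assumptions chosen for the example; the request above does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 1-byte type, 2-byte big-endian length, then the value. */
#define TLV_HEADER_LEN 3u

/* Hypothetical type codes for packet metadata. */
#define TLV_TYPE_INGRESS_PORT 1
#define TLV_TYPE_PAYLOAD      2

/* Append one TLV at offset off in buf (capacity cap).
 * Returns the new offset, or 0 if the TLV does not fit. */
size_t tlv_put(uint8_t *buf, size_t cap, size_t off, uint8_t type,
               const void *value, uint16_t len)
{
    assert(buf != NULL);
    if (off > cap || cap - off < TLV_HEADER_LEN + len) return 0;
    buf[off] = type;
    buf[off + 1] = (uint8_t)(len >> 8);
    buf[off + 2] = (uint8_t)(len & 0xFF);
    if (len > 0) memcpy(buf + off + TLV_HEADER_LEN, value, len);
    return off + TLV_HEADER_LEN + len;
}

/* Find the first TLV of the given type in buf[0..used).
 * Returns a pointer to its value and stores the length in *len,
 * or returns NULL if the type is absent or the buffer is truncated. */
const uint8_t *tlv_find(const uint8_t *buf, size_t used, uint8_t type,
                        uint16_t *len)
{
    size_t off = 0;
    assert(buf != NULL && len != NULL);
    while (used - off >= TLV_HEADER_LEN) {
        uint16_t l = (uint16_t)((buf[off + 1] << 8) | buf[off + 2]);
        if (used - off - TLV_HEADER_LEN < l) return NULL;   /* truncated */
        if (buf[off] == type) {
            *len = l;
            return buf + off + TLV_HEADER_LEN;
        }
        off += TLV_HEADER_LEN + l;   /* skip types we do not know */
    }
    return NULL;
}
```

Because a reader skips any type it does not recognize, new metadata fields can be added later without breaking older receivers.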

Fault Management APIs (Tina)

Interrupts and event notifications to the CPS


Define failure level (serious, general, reminder)
Define corresponding process actions (Alarm, reset chip, reset device, etc)

Trunk API (Tina, Furst)


Link bonding

OAM API (Tina)

BFD, Eth-OAM, and Y.1731 have become basic switch functions

Support L2 tunneling (L2 over NVGRE/VXLAN) (Tao Gu, Phanidhar)

Add provisions for L2 tunneling into fdb entry, i.e. to add next_hop_group_id to sai_fdb_entry_t

SPAN/RSPAN and policer support are crucial and would be good to have in the first cut.

Host interface tx/rx


Still need to support a standard tx/rx API.
However, most user applications use sockets to send/receive packets.

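[Figure: host interfaces. The TCP/IP stack sees Vlan 2, Vlan 4, LAG 1, LAG 2, and port interfaces (p1, p2, p6, p7, p8) through the SAI driver; these correspond to the router's VLAN interfaces (Vlan 2, Vlan 4) and the switch's ports p1-p8 with LAG 1 and LAG 2.]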

API Extensibility
Versioning APIs: TLV-based structures for backward compatibility/extensibility (Kamala)

APIs: create/delete/set/get; input: attribute list
New requirements? New APIs!
New functionality? New attribute id!
The adapter implementation decides which versions (or all) to support.

Lacking statements around the extensibility framework, specifically vendor extensions
Lacking details on how to create custom attributes
Vendor extensions are a must-have

Multi-chip support
Different ASICs in the same box (Furst)
How will SAI handle two ASICs in the same box?
What happens if they are the same version of silicon? How does SAI distinguish between them?

Multiple chips with the same role, and with different roles, will also need to be accommodated. This can probably be a future item. (Phanidhar)

L2/L3
Routing protocol support mandatory (Furst)
Define mandatory in this context
What happens if a host adapter/underlying ASIC does not support routing?

SAI should be agnostic to the spanning tree protocol mode (Tao)
The STP modes differ only in the control plane protocol.

Merge the neighbor and next hop tables in the Adapter (Tao)
The current split is more oriented towards software routing; ASICs have a consolidated table. Merge them in the Adapter and have the Adapter Host maintain separate tables and their mapping.

The router function manages virtual routers, which is more of a higher-level task (Tao)
Move such high-level tasks from SAI (Adapter) into the Adapter Host

L2/L3
Complete L2 Multicast Switch functionality (Tina)
MAC limit, multicast action combinations (drop, forward, send to CPU)
SAI only supports L2 unicast.
An L2 multicast miss is handled in SAI via sai_switch_fdb_miss_action_t, which defines the behavior: drop, forward, trap (send to CPU), or log.

The fdb_entry assumes the mac_address is the destination MAC (Tao)
Some use cases need an action on the source MAC (such as a drop action).
Add flags to fdb_entry in order to cover these use cases.
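A sketch of what such flags might look like, so the proposal is easier to discuss. The struct and flag names below are hypothetical and do not reflect the actual sai_fdb_entry_t layout; the only idea carried over from the feedback is a flags field that says whether the entry's MAC is matched against the packet's destination, source, or both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits: which MAC of the packet the entry is matched against. */
#define FDB_FLAG_MATCH_DST 0x1u
#define FDB_FLAG_MATCH_SRC 0x2u

typedef enum { FDB_ACTION_FORWARD, FDB_ACTION_DROP } fdb_action_t;

/* Hypothetical entry: MAC + VLAN key, plus the proposed flags field. */
typedef struct {
    uint8_t      mac[6];
    uint16_t     vlan_id;
    uint32_t     flags;
    fdb_action_t action;
} fdb_entry_t;

/* Returns 1 if the entry applies to a packet with the given VLAN and MACs,
 * 0 otherwise. */
int fdb_entry_matches(const fdb_entry_t *e, uint16_t vlan_id,
                      const uint8_t src[6], const uint8_t dst[6])
{
    assert(e != NULL && src != NULL && dst != NULL);
    if (e->vlan_id != vlan_id) return 0;
    if ((e->flags & FDB_FLAG_MATCH_DST) && memcmp(e->mac, dst, 6) == 0) return 1;
    if ((e->flags & FDB_FLAG_MATCH_SRC) && memcmp(e->mac, src, 6) == 0) return 1;
    return 0;
}
```

An entry with only `FDB_FLAG_MATCH_SRC` set and a drop action would then express "drop traffic from this MAC", which a destination-only key cannot.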

Need SAI APIs for enabling/disabling L2/L3 control protocols on a per-port basis.
Not all chips capture control protocol PDUs using generic ACL TCAMs (Phanidhar)

Port functionality
Request: an interface to query SFP information. Some vendors would like to control the ports based on SFP data. (Ramachandran)
Enhancement for additional port parameters is being worked on.

Do we need SAI APIs for handling pluggable port modules, meaning SAI APIs to dynamically add ports? (Phanidhar)

Miscellaneous
State the intended audience and add page numbers (Ramachandran)
Add figures for the interaction between adapter module and adapter host (Ramachandran)
Warm Restart (Phanidhar)
Warm restart not well defined at this point
Maybe best to remove from v1.0 scope until better defined

Ordering (Furst)
During adapter start/restart, what is the guaranteed order of object creation?
Currently SAI states VLANs first; the whole ordering should be enumerated, with dependencies outlined.
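One way to enumerate the ordering is to write the dependencies down as a table and derive the creation order from it. The sketch below does that with a simple repeated-pass topological ordering. The dependency table is an assumption inferred from the usage examples earlier in this deck (a router interface references a VLAN and a virtual router, a next hop references a router interface, and so on); it is not an ordering stated by SAI.

```c
#include <assert.h>
#include <stddef.h>

/* Object types, deliberately listed in a non-creation order. */
enum { DEP_ROUTE, DEP_NEXT_HOP_GROUP, DEP_NEXT_HOP, DEP_NEIGHBOR,
       DEP_ROUTER_INTERFACE, DEP_VLAN, DEP_VIRTUAL_ROUTER, DEP_TYPE_COUNT };

/* dep_needs[a][b] != 0 means type a must be created after type b.
 * Assumed from the examples in this deck, not from the SAI text. */
static const int dep_needs[DEP_TYPE_COUNT][DEP_TYPE_COUNT] = {
    [DEP_ROUTER_INTERFACE] = { [DEP_VLAN] = 1, [DEP_VIRTUAL_ROUTER] = 1 },
    [DEP_NEIGHBOR]         = { [DEP_ROUTER_INTERFACE] = 1 },
    [DEP_NEXT_HOP]         = { [DEP_ROUTER_INTERFACE] = 1 },
    [DEP_NEXT_HOP_GROUP]   = { [DEP_NEXT_HOP] = 1 },
    [DEP_ROUTE]            = { [DEP_NEXT_HOP_GROUP] = 1 },
};

/* Fill order[] with a creation order that respects dep_needs.
 * Returns the number of types placed; a result smaller than
 * DEP_TYPE_COUNT means the table contains a dependency cycle. */
int dep_creation_order(int order[DEP_TYPE_COUNT])
{
    int placed[DEP_TYPE_COUNT] = { 0 };
    int n = 0;
    assert(order != NULL);
    for (int pass = 0; pass < DEP_TYPE_COUNT; pass++) {
        for (int t = 0; t < DEP_TYPE_COUNT; t++) {
            if (placed[t]) continue;
            int ready = 1;
            for (int d = 0; d < DEP_TYPE_COUNT; d++) {
                if (dep_needs[t][d] && !placed[d]) ready = 0;
            }
            if (ready) {
                placed[t] = 1;
                order[n++] = t;
            }
        }
    }
    return n;
}
```

With this table, VLANs and virtual routers come out first and routes last; adding a new object type means adding one row of dependencies.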

Process for other vendors to be added as a reference platform (Omar Sultan)


SAI/HAL should not need to reference the hardware; it should instead behave consistently and predictably regardless of the hardware underneath.
