Data structures:
rte_mbuf - the DPDK structure that encapsulates a packet (message buffer).
rte_ring - the DPDK lock-free ring buffer for high-performance producer/consumer
scenarios, such as passing packets between the virtio front end and back end.
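To make the producer/consumer idea behind rte_ring concrete, here is a minimal sketch of a single-producer, single-consumer ring with a power-of-two size. It is not the rte_ring implementation: all toy_ring names are hypothetical, and the sketch omits the atomics and memory barriers a real lock-free ring needs, so it only illustrates the head/tail indexing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* assumption of this sketch: a power of two */

/* Toy single-producer/single-consumer ring; NOT the rte_ring code. */
struct toy_ring {
    uint32_t head;              /* next slot the producer writes */
    uint32_t tail;              /* next slot the consumer reads */
    void *slots[TOY_RING_SIZE];
};

static void toy_ring_init(struct toy_ring *r)
{
    assert(r != NULL);
    r->head = 0;
    r->tail = 0;
}

/* Entries currently stored; unsigned wraparound keeps this correct. */
static uint32_t toy_ring_count(const struct toy_ring *r)
{
    return r->head - r->tail;
}

/* Returns 0 on success, -1 if the ring is full. */
static int toy_ring_enqueue(struct toy_ring *r, void *obj)
{
    if (toy_ring_count(r) == TOY_RING_SIZE)
        return -1;
    r->slots[r->head & (TOY_RING_SIZE - 1)] = obj;
    r->head++;
    return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
static int toy_ring_dequeue(struct toy_ring *r, void **obj)
{
    if (toy_ring_count(r) == 0)
        return -1;
    *obj = r->slots[r->tail & (TOY_RING_SIZE - 1)];
    r->tail++;
    return 0;
}
```

Because the size is a power of two, the slot index is a cheap bit mask of a free-running counter, and objects come out in the order they went in.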
Commonly used functions:
rte_eal_init - the DPDK initialization function. It reads the command-line
parameters, parses the configuration, initializes hugepages and their management
structures, and creates the DPDK threads. These threads only run the functions we
write once another function launches them.
rte_memcpy - the DPDK copy function, which uses wide (vector) load and store
instructions so that each instruction moves as many bytes as possible.
rte_eal_remote_launch - launches the function that a DPDK thread actually executes
on a given lcore.
rte_eth_rx_burst - receives a burst of packets from a physical port.
rte_eth_tx_burst - transmits a burst of packets on a physical port.
The process of sending and receiving packets can be roughly divided into two parts:
1. Configuration and initialization of the transmit and receive paths, mainly setting
up the send and receive queues.
2. Receiving and sending packets, mainly taking packets from a queue or putting
packets into a queue.
Main function
/*
 * The main function, which does initialization and calls the per-lcore
 * functions.
 */
int main(int argc, char *argv[])
{
    struct rte_mempool *mbuf_pool; /* pointer to the memory pool structure */
    unsigned nb_ports;             /* number of ports */
    uint8_t portid;                /* port number, a temporary loop variable */

    /* Initialize the Environment Abstraction Layer (EAL). */
    int ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");

    argc -= ret;
    argv += ret;

    /* Check that there is an even number of ports to send/receive on. */
    nb_ports = rte_eth_dev_count(); /* number of currently available ports */
    /* Fail if there are fewer than 2 ports or an odd number of ports. */
    if (nb_ports < 2 || (nb_ports & 1))
        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");

    /*
     * Creates a new mempool in memory to hold the mbufs.
     * "MBUF_POOL" is the pool name and NUM_MBUFS * nb_ports is the number
     * of mbufs. This function is a wrapper around rte_mempool_create().
     */
    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
        MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (mbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");

    /* Initialize all ports. */
    for (portid = 0; portid < nb_ports; portid++)
        /* Initializing a port needs the port number and the memory pool. */
        if (port_init(portid, mbuf_pool) != 0)
            rte_exit(EXIT_FAILURE, "Cannot init port %" PRIu8 "\n", portid);

    /*
     * If more than one lcore is enabled, print a warning: this program
     * cannot use more than one. Lcores are selected by passing the
     * -c <coremask> argument.
     */
    if (rte_lcore_count() > 1)
        printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");

    /* Call lcore_main on the master core only. */
    lcore_main();

    return 0;
}
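The port-count check in main() can be isolated as a small predicate. The sketch below also shows one plausible reason for requiring an even count: ports can then be paired for forwarding. The pairing rule (port ^ 1) and the toy_ names are assumptions of this sketch, since lcore_main() is not shown here.

```c
#include <assert.h>

/* Same condition as in main(): at least 2 ports and an even count. */
static int toy_port_count_ok(unsigned int nb_ports)
{
    return !(nb_ports < 2 || (nb_ports & 1));
}

/* Assumed pairing rule: 0 <-> 1, 2 <-> 3, and so on. */
static unsigned int toy_peer_port(unsigned int port, unsigned int nb_ports)
{
    assert(toy_port_count_ok(nb_ports) && port < nb_ports);
    return port ^ 1u;
}
```

With an odd port count the last port would have no peer under this rule, which is why the check rejects it.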
<2> Allocate the space for the descriptor queue, sized for the maximum number of
descriptors.
rz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_idx, RX_RING_SZ,
    IXGBE_ALIGN, socket_id);

/*
 * Next, get the addresses of the head and tail registers of the descriptor
 * queue. Software has to update these registers after receiving or sending
 * packets.
 */
rxq->rdt_reg_addr = IXGBE_PCI_REG_ADDR(hw, IXGBE_RDT(rxq->reg_idx));
rxq->rdh_reg_addr = IXGBE_PCI_REG_ADDR(hw, IXGBE_RDH(rxq->reg_idx));

/* Set the physical and virtual addresses of the queue's receive descriptor ring. */
rxq->rx_ring_phys_addr = rte_mem_phy2mch(rz->memseg_id, rz->phys_addr);
rxq->rx_ring = (union ixgbe_adv_rx_desc *)rz->addr;
<3> Allocate sw_ring. The objects stored in this ring are struct ixgbe_rx_entry, each
of which is in fact just a pointer to a packet mbuf.
rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
    sizeof(struct ixgbe_rx_entry) * len,
    RTE_CACHE_LINE_SIZE, socket_id);
After the above three steps are completed, the important part of the newly allocated
queue structure has been filled, and the other members need to be reset.
ixgbe_reset_rx_queue():
/*
 * First clear the allocated descriptor queue. It was in fact already zeroed
 * when it was allocated, so doing it again is unnecessary.
 */
for (i = 0; i < len; i++) {
    rxq->rx_ring[i] = zeroed_desc;
}

/* Then initialize the other members of the queue. */
rxq->rx_nb_avail = 0;
rxq->rx_next_avail = 0;
rxq->rx_free_trigger = (uint16_t)(rxq->rx_free_thresh - 1);
rxq->rx_tail = 0;
rxq->nb_rx_hold = 0;
rxq->pkt_first_seg = NULL;
rxq->pkt_last_seg = NULL;
In this way, the receive queue is initialized. The send queue is initialized in the
same way, including the checks that come first; only the setup section differs
slightly. After this queue initialization, the descriptor ring and sw_ring of each
queue are allocated, but DMA still does not know where to copy packets. We said that
DPDK is zero-copy, so how do the mempool objects we allocated relate to the queues
and the driver? The next step is the most interesting one: establishing the
relationship between the mempool, the queue, DMA, and the ring.
The device starts from rte_eth_dev_start():
diag = (*dev->dev_ops->dev_start)(dev);
This in turn calls the real start function for the device, ixgbe_dev_start(). It
first checks the device's link settings; half-duplex and fixed-rate modes are not
supported for now, so apparently only auto-negotiation is available. It then disables
interrupts and stops the adapter with ixgbe_stop_adapter(hw), which calls
ixgbe_stop_adapter_generic(). Its main job is to stop the transmit and receive units,
which is done by writing directly to registers. Next, the hardware is reset through
ixgbe_pf_reset_hw() -> ixgbe_reset_hw() -> ixgbe_reset_hw_82599(), which ends up
setting registers and is not covered in detail here. After that, the hardware is
started.
Then the receive unit is initialized by ixgbe_dev_rx_init(). This function mainly
sets various registers, for example to configure CRC handling. If jumbo frames are
supported, it configures the corresponding register, and if loopback mode is
configured, it sets those registers as well.
The next and most important step is to program the DMA registers of each queue,
which identify the base address, length, head, and tail of the queue's descriptor ring.
bus_addr = rxq->rx_ring_phys_addr;
IXGBE_WRITE_REG(hw, IXGBE_RDBAL(rxq->reg_idx),
    (uint32_t)(bus_addr & 0x00000000ffffffffULL));
IXGBE_WRITE_REG(hw, IXGBE_RDBAH(rxq->reg_idx),
    (uint32_t)(bus_addr >> 32));
IXGBE_WRITE_REG(hw, IXGBE_RDLEN(rxq->reg_idx),
    rxq->nb_rx_desc * sizeof(union ixgbe_adv_rx_desc));
IXGBE_WRITE_REG(hw, IXGBE_RDH(rxq->reg_idx), 0);
IXGBE_WRITE_REG(hw, IXGBE_RDT(rxq->reg_idx), 0);
Here you can see that the physical address of the descriptor ring is written to the
base address registers, and the length of the descriptor ring is written as well. The
length of the packet data buffer is then calculated and written to a register. The
multi-queue settings of the NIC are also configured. This completes the
initialization of the receive unit.
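The register writes for the descriptor ring boil down to splitting a 64-bit bus address into two 32-bit halves and computing the ring length in bytes. The following sketch models that arithmetic without touching hardware. The toy_ names are hypothetical, and the 16-byte descriptor is a stand-in for union ixgbe_adv_rx_desc assumed for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for union ixgbe_adv_rx_desc; 16 bytes is an assumption here. */
struct toy_rx_desc {
    uint64_t pkt_addr;
    uint64_t hdr_addr;
};

/* Plain struct modelling the per-queue ring registers. */
struct toy_ring_regs {
    uint32_t base_lo; /* models RDBAL: low 32 bits of the ring address */
    uint32_t base_hi; /* models RDBAH: high 32 bits of the ring address */
    uint32_t len;     /* models RDLEN: ring length in bytes */
    uint32_t head;    /* models RDH */
    uint32_t tail;    /* models RDT */
};

static struct toy_ring_regs toy_program_ring(uint64_t bus_addr, uint16_t nb_desc)
{
    struct toy_ring_regs regs;

    assert(nb_desc > 0);
    regs.base_lo = (uint32_t)(bus_addr & 0x00000000ffffffffULL);
    regs.base_hi = (uint32_t)(bus_addr >> 32);
    regs.len = (uint32_t)nb_desc * (uint32_t)sizeof(struct toy_rx_desc);
    regs.head = 0;
    regs.tail = 0;
    return regs;
}

/* Reassemble the 64-bit base address the way the device would see it. */
static uint64_t toy_ring_base(const struct toy_ring_regs *regs)
{
    return ((uint64_t)regs->base_hi << 32) | regs->base_lo;
}
```

For example, a ring of 128 descriptors at bus address 0x123456780 yields a low half of 0x23456780, a high half of 0x1, and a length of 2048 bytes.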
Next, the send unit is initialized by ixgbe_dev_tx_init(). This works the same way as
the initialization of the receive unit: it mostly fills in register values. The key
point is to set the base address and length of the descriptor queue.
bus_addr = txq->tx_ring_phys_addr;
IXGBE_WRITE_REG(hw, IXGBE_TDBAL(txq->reg_idx),
    (uint32_t)(bus_addr & 0x00000000ffffffffULL));
IXGBE_WRITE_REG(hw, IXGBE_TDBAH(txq->reg_idx),
    (uint32_t)(bus_addr >> 32));
IXGBE_WRITE_REG(hw, IXGBE_TDLEN(txq->reg_idx),
    txq->nb_tx_desc * sizeof(union ixgbe_adv_tx_desc));
/* Setup the HW Tx Head and TX Tail descriptor pointers */
IXGBE_WRITE_REG(hw, IXGBE_TDH(txq->reg_idx), 0);
IXGBE_WRITE_REG(hw, IXGBE_TDT(txq->reg_idx), 0);
After the transmit and receive units are initialized, they can be started with
ixgbe_dev_rxtx_start().
This function first sets the threshold registers of each send queue, which hold the
threshold parameters used when sending; they are described in the sending part. It
then starts each receive queue in turn with ixgbe_dev_rx_queue_start():
/*
 * Check first: if the queue to be started is valid, allocate the actual
 * space that holds the mbufs for this receive queue.
 */
if (ixgbe_alloc_rx_queue_mbufs(rxq) != 0) {
    PMD_INIT_LOG(ERR, "Could not alloc mbuf for queue:%d", rx_queue_id);
    return -1;
}
Here you will find the final answer: how the mempool, its ring, the queue's
descriptor ring, and the queue's sw_ring relate to each other.
static int __attribute__((cold))
ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
{
    struct ixgbe_rx_entry *rxe = rxq->sw_ring;
    uint64_t dma_addr;
    unsigned int i;

    /* Initialize software ring entries */
    for (i = 0; i < rxq->nb_rx_desc; i++) {
        volatile union ixgbe_adv_rx_desc *rxd;
        struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mb_pool);

        if (mbuf == NULL) {
            PMD_INIT_LOG(ERR, "RX mbuf alloc failed queue_id=%u",
                (unsigned) rxq->queue_id);
            return -ENOMEM;
        }

        rte_mbuf_refcnt_set(mbuf, 1);
        mbuf->next = NULL;
        mbuf->data_off = RTE_PKTMBUF_HEADROOM;
        mbuf->nb_segs = 1;
        mbuf->port = rxq->port_id;

        dma_addr = rte_cpu_to_le_64(rte_mbuf_data_dma_addr_default(mbuf));
        rxd = &rxq->rx_ring[i];
        rxd->read.hdr_addr = 0;
        rxd->read.pkt_addr = dma_addr;
        rxe[i].mbuf = mbuf;
    }

    return 0;
}
We see that nb_rx_desc mbuf pointers are taken, one per loop iteration, from the ring
of the memory pool to which the queue belongs, in order to fill rxq->sw_ring. Each
pointer points to a packet buffer in the memory pool.
Each newly allocated mbuf is populated first, and the most important step is
computing dma_addr. Then the descriptor rxd in the queue ring is initialized, which
tells the NIC to place the packet at dma_addr. The last statement stores the
allocated mbuf in the queue's sw_ring, so that a packet received into that buffer can
be reached directly through sw_ring.
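The relationship described above can be modelled in a few lines: each slot i has a descriptor that holds a buffer address for the hardware, and a software entry that holds the pointer to the same buffer's mbuf. This is a simplified sketch, not driver code. All toy_ names and sizes are hypothetical, the pool is a plain array instead of a mempool, and a virtual address stands in for the DMA address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NB_DESC  4
#define TOY_HEADROOM 128  /* stands in for RTE_PKTMBUF_HEADROOM */
#define TOY_BUF_SIZE 2048

struct toy_mbuf {
    uint16_t data_off;
    uint8_t buf[TOY_HEADROOM + TOY_BUF_SIZE];
};

struct toy_desc { uint64_t pkt_addr; uint64_t hdr_addr; }; /* read by hardware */
struct toy_entry { struct toy_mbuf *mbuf; };               /* kept by software */

struct toy_pool {
    struct toy_mbuf objs[TOY_NB_DESC];
    unsigned int next_free;
};

struct toy_rxq {
    struct toy_desc rx_ring[TOY_NB_DESC];
    struct toy_entry sw_ring[TOY_NB_DESC];
};

/* Hand out the next unused mbuf, or NULL when the pool is exhausted. */
static struct toy_mbuf *toy_pool_get(struct toy_pool *p)
{
    if (p->next_free == TOY_NB_DESC)
        return NULL;
    return &p->objs[p->next_free++];
}

/* Fill both rings from the pool. Returns 0 on success, -1 if the pool runs out. */
static int toy_fill_rx_queue(struct toy_rxq *q, struct toy_pool *p)
{
    assert(q != NULL && p != NULL);
    for (unsigned int i = 0; i < TOY_NB_DESC; i++) {
        struct toy_mbuf *m = toy_pool_get(p);

        if (m == NULL)
            return -1;
        m->data_off = TOY_HEADROOM;
        /* Assumption: the virtual address stands in for the DMA address. */
        q->rx_ring[i].pkt_addr = (uint64_t)(uintptr_t)&m->buf[m->data_off];
        q->rx_ring[i].hdr_addr = 0;
        q->sw_ring[i].mbuf = m;
    }
    return 0;
}
```

After the fill, descriptor i and software entry i refer to the same buffer, which is what lets software find the mbuf for a packet the hardware wrote without copying it.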
With the most important work above completed, the DMA engine can be enabled so that
the device is ready to receive packets:
hw->mac.ops.enable_rx_dma(hw, rxctrl);
Then the head and tail registers of the queue ring are set, which is also very
important. The head is set to 0 and the tail is set to the number of descriptors
minus 1, which hands practically the entire ring of descriptors to the hardware.
IXGBE_WRITE_REG(hw, IXGBE_RDH(rxq->reg_idx), 0);
IXGBE_WRITE_REG(hw, IXGBE_RDT(rxq->reg_idx), rxq->nb_rx_desc - 1);
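The effect of these head and tail values can be checked with a little modular arithmetic. The sketch below assumes the common convention that the hardware may use descriptors from head up to, but not including, tail, so that head equal to tail means it has none. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptors available to hardware: from head up to, but not including, tail. */
static uint16_t toy_hw_owned(uint16_t head, uint16_t tail, uint16_t nb_desc)
{
    assert(nb_desc > 0 && head < nb_desc && tail < nb_desc);
    return (uint16_t)((tail + nb_desc - head) % nb_desc);
}
```

Under that assumption, head = 0 and tail = 0 give the hardware no descriptors, while head = 0 and tail = nb_rx_desc - 1 give it all but one of them.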
Once this step is done, nothing important remains in starting the receive queue.
Starting a send queue is simpler than starting a receive queue: the TXDCTL register
is configured to enable the queue, the code waits with a delay for TX to be enabled,
and finally the head and tail of the queue are both set to 0.
txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(txq->reg_idx));
txdctl |= IXGBE_TXDCTL_ENABLE;
IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(txq->reg_idx), txdctl);
IXGBE_WRITE_REG(hw, IXGBE_TDH(txq->reg_idx), 0);
IXGBE_WRITE_REG(hw, IXGBE_TDT(txq->reg_idx), 0);
Note:
Receiving a packet means that the driver places the packet in memory and the
upper-layer application takes it from the queue. Sending means that the packet to be
sent is put into the send queue, ready for the actual transmission.
The application layer obtains packets starting from rte_eth_rx_burst():
static inline uint16_t
rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
                 struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
    struct rte_eth_dev *dev;

    dev = &rte_eth_devices[port_id];
    return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
                                rx_pkts, nb_pkts);
}
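rte_eth_rx_burst() is a thin wrapper that dispatches through a function pointer installed by the driver. The sketch below models that pattern with hypothetical toy_ types: the generic entry point knows nothing about the queue layout and simply forwards the opaque queue pointer to the driver's burst function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pkt { int id; };

typedef uint16_t (*toy_rx_burst_t)(void *rxq, struct toy_pkt **pkts,
                                   uint16_t nb_pkts);

struct toy_dev {
    toy_rx_burst_t rx_pkt_burst; /* installed by the driver */
    void *rx_queues[1];          /* opaque per-queue driver state */
};

/* Hypothetical driver queue: a fixed array of packets ready for pickup. */
struct toy_queue {
    struct toy_pkt *ready[4];
    uint16_t nb_ready;
    uint16_t next;
};

/* Driver-specific burst function: hand out up to nb_pkts ready packets. */
static uint16_t toy_driver_rx(void *rxq, struct toy_pkt **pkts, uint16_t nb_pkts)
{
    struct toy_queue *q = rxq;
    uint16_t n = 0;

    while (n < nb_pkts && q->next < q->nb_ready)
        pkts[n++] = q->ready[q->next++];
    return n;
}

/* Generic entry point: dispatch to whatever the driver installed. */
static uint16_t toy_rx_burst(struct toy_dev *dev, uint16_t queue_id,
                             struct toy_pkt **pkts, uint16_t nb_pkts)
{
    assert(dev != NULL && dev->rx_pkt_burst != NULL);
    return dev->rx_pkt_burst(dev->rx_queues[queue_id], pkts, nb_pkts);
}
```

The return value is the number of packets actually delivered, which may be smaller than the requested burst size, including 0 when the queue is empty.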