Commit 85e46c5

zhoumin2 authored and qzhan16 committed
net/ixgbe: add proper memory barriers in Rx
A segmentation fault has been observed while running the ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor, which has 64 cores and 4 NUMA nodes.

From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the segmentation fault will definitely happen, even on other platforms. For example, if we forced the first packet with the EOP bit set to have a zero length, the segmentation fault would also happen on x86. This is because when the first packet is processed, first_seg->next is NULL; if at the same time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the following loop is executed:

    for (lp = first_seg; lp->next != rxm; lp = lp->next)
        ;

Since first_seg->next is NULL under this condition, the expression lp->next->next causes the segmentation fault.

Normally, the length of the first packet with the EOP bit set is greater than rxq->crc_len. However, out-of-order execution by the CPU may break the read ordering of the status field and the rest of the descriptor fields in this function. The related code is as follows:

        rxdp = &rx_ring[rx_id];
 #1     staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);

        if (!(staterr & IXGBE_RXDADV_STAT_DD))
            break;

 #2     rxd = *rxdp;

Statement #2 may be executed before statement #1, which can make the "ready" packet appear to have zero length. If that packet is the first packet and has the EOP bit set, the segmentation fault described above happens.

So, we should add a proper memory barrier to ensure the read ordering is correct. We also do the same in the ixgbe_recv_pkts() function to make the rxd data valid, even though we did not observe a segmentation fault in that function.

Fixes: 8eecb32 ("ixgbe: add LRO support")
Cc: [email protected]

Signed-off-by: Min Zhou <[email protected]>
Reviewed-by: Ruifeng Wang <[email protected]>
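To make the hazard and the fix concrete, here is a minimal, self-contained C11 sketch of the consumer-side pattern. The names struct rx_desc, STAT_DD, and read_descriptor are simplified stand-ins invented for this illustration (the real ixgbe descriptor layout differs), and atomic_thread_fence(memory_order_acquire) models what DPDK's rte_atomic_thread_fence(__ATOMIC_ACQUIRE) does in the patch below:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Simplified stand-in for an Rx descriptor: the producer (the NIC,
     * via DMA write-back) fills the payload words first and sets the
     * DD bit in status_error last. */
    struct rx_desc {
        uint16_t length;
        uint32_t status_error;      /* DD bit written last by producer */
    };

    #define STAT_DD 0x01u

    /* Poll the DD bit, then read the rest of the descriptor. Without
     * the acquire fence, a weakly ordered CPU may issue the load of
     * rxdp->length before the load of status_error and observe a stale
     * (e.g. zero) length for a descriptor whose DD bit reads as set,
     * which is exactly the failure described above. */
    static int read_descriptor(volatile struct rx_desc *rxdp,
                               struct rx_desc *out)
    {
        uint32_t staterr = rxdp->status_error;

        if (!(staterr & STAT_DD))
            return 0;               /* descriptor not ready yet */

        /* Order the DD-bit load before the loads of the remaining
         * descriptor words. */
        atomic_thread_fence(memory_order_acquire);

        out->length = rxdp->length; /* fresh after the fence */
        out->status_error = staterr;
        return 1;
    }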
1 parent c261e7d commit 85e46c5

File tree

1 file changed (+21, -26)

drivers/net/ixgbe/ixgbe_rxtx.c

@@ -1817,11 +1817,22 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
          * of accesses cannot be reordered by the compiler. If they were
          * not volatile, they could be reordered which could lead to
          * using invalid descriptor fields when read from rxd.
+         *
+         * Meanwhile, to prevent the CPU from executing out of order, we
+         * need to use a proper memory barrier to ensure the memory
+         * ordering below.
          */
         rxdp = &rx_ring[rx_id];
         staterr = rxdp->wb.upper.status_error;
         if (!(staterr & rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD)))
                 break;
+
+        /*
+         * Use acquire fence to ensure that status_error which includes
+         * DD bit is loaded before loading of other descriptor words.
+         */
+        rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
         rxd = *rxdp;
 
         /*
@@ -2088,39 +2099,23 @@ ixgbe_recv_pkts_lro(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts,
 
 next_desc:
         /*
-         * The code in this whole file uses the volatile pointer to
-         * ensure the read ordering of the status and the rest of the
-         * descriptor fields (on the compiler level only!!!). This is so
-         * UGLY - why not to just use the compiler barrier instead? DPDK
-         * even has the rte_compiler_barrier() for that.
-         *
-         * But most importantly this is just wrong because this doesn't
-         * ensure memory ordering in a general case at all. For
-         * instance, DPDK is supposed to work on Power CPUs where
-         * compiler barrier may just not be enough!
-         *
-         * I tried to write only this function properly to have a
-         * starting point (as a part of an LRO/RSC series) but the
-         * compiler cursed at me when I tried to cast away the
-         * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
-         * keeping it the way it is for now.
-         *
-         * The code in this file is broken in so many other places and
-         * will just not work on a big endian CPU anyway therefore the
-         * lines below will have to be revisited together with the rest
-         * of the ixgbe PMD.
-         *
-         * TODO:
-         * - Get rid of "volatile" and let the compiler do its job.
-         * - Use the proper memory barrier (rte_rmb()) to ensure the
-         *   memory ordering below.
+         * "Volatile" only prevents caching of the variable marked
+         * volatile. Most important, "volatile" cannot prevent the CPU
+         * from executing out of order. So, it is necessary to use a
+         * proper memory barrier to ensure the memory ordering below.
          */
         rxdp = &rx_ring[rx_id];
         staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error);
 
         if (!(staterr & IXGBE_RXDADV_STAT_DD))
                 break;
 
+        /*
+         * Use acquire fence to ensure that status_error which includes
+         * DD bit is loaded before loading of other descriptor words.
+         */
+        rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
         rxd = *rxdp;
 
         PMD_RX_LOG(DEBUG, "port_id=%u queue_id=%u rx_id=%u "
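A note on the design choice: an acquire fence on the consumer side is sufficient here because the descriptor write-back behaves like a release publication, with the payload words made visible before the DD bit. In real hardware the producer is the NIC writing the descriptor back via DMA; the sketch below is only a software model of that ordering, reusing the invented names from the earlier sketch, with publish_descriptor and model_desc likewise invented for illustration:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Model-only producer: publish the payload words, then "release"
     * the DD bit so a consumer whose DD-bit load is followed by an
     * acquire fence also sees the payload. */
    struct model_desc {
        uint16_t length;
        _Atomic uint32_t status_error;
    };

    #define STAT_DD 0x01u

    static void publish_descriptor(struct model_desc *d, uint16_t len)
    {
        d->length = len;    /* payload words first */
        /* Release store: all writes above become visible no later
         * than the DD bit. */
        atomic_store_explicit(&d->status_error, STAT_DD,
                              memory_order_release);
    }

This pairing is also consistent with the reproduction pattern in the commit message: on strongly ordered x86, loads are not reordered with other loads, so an acquire fence typically reduces to a compiler barrier and the fault only appears when a zero-length first packet is forced, whereas on the weakly ordered Loongson 3C5000 the load reordering can occur naturally.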
