Fix code scanning alert - Likely overrunning write #1

eroullit · 2022-06-15T08:49:02Z

Tracking issue for:

https://github.com/eroullit/dpdk/security/code-scanning/69

If DPDK is built with thread sanitizer it reports a race in setting of multiprocess file descriptor. The fix is to use atomic operations when updating mp_fd. Build: $ meson -Db_sanitize=address build $ ninja -C build Simple example: $ .build/app/dpdk-testpmd -l 1-3 --no-huge EAL: Detected CPU lcores: 16 EAL: Detected NUMA nodes: 1 EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem EAL: Detected static linkage of DPDK EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' testpmd: No probed ethernet devices testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc EAL: Error - exiting with code: 1 Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory ================== WARNING: ThreadSanitizer: data race (pid=87245) Write of size 4 at 0x558e04d8ff70 by main thread: #0 rte_mp_channel_cleanup <null> (dpdk-testpmd+0x1e7d30c) #1 rte_eal_cleanup <null> (dpdk-testpmd+0x1e85929) #2 rte_exit <null> (dpdk-testpmd+0x1e5bc0a) #3 mbuf_pool_create.cold <null> (dpdk-testpmd+0x274011) #4 main <null> (dpdk-testpmd+0x5cc15d) Previous read of size 4 at 0x558e04d8ff70 by thread T2: #0 mp_handle <null> (dpdk-testpmd+0x1e7c439) #1 ctrl_thread_init <null> (dpdk-testpmd+0x1e6ee1e) As if synchronized via sleep: #0 nanosleep libsanitizer/tsan/tsan_interceptors_posix.cpp:366 #1 get_tsc_freq <null> (dpdk-testpmd+0x1e92ff9) #2 set_tsc_freq <null> (dpdk-testpmd+0x1e6f2fc) #3 rte_eal_timer_init <null> (dpdk-testpmd+0x1e931a4) #4 rte_eal_init.cold <null> (dpdk-testpmd+0x29e578) DPDK#5 main <null> (dpdk-testpmd+0x5cbc45) Location is global 'mp_fd' of size 4 at 0x558e04d8ff70 (dpdk-testpmd+0x000003122f70) Thread T2 'rte_mp_handle' (tid=87248, running) created by main thread at: #0 pthread_create libsanitizer/tsan/tsan_interceptors_posix.cpp:969 #1 rte_ctrl_thread_create <null> (dpdk-testpmd+0x1e6efd0) #2 rte_mp_channel_init.cold <null> (dpdk-testpmd+0x29cb7c) #3 rte_eal_init <null> (dpdk-testpmd+0x1e8662e) #4 main <null> (dpdk-testpmd+0x5cbc45) SUMMARY: ThreadSanitizer: data race (app/dpdk-testpmd+0x1e7d30c) in rte_mp_channel_cleanup ================== ThreadSanitizer: reported 1 warnings Fixes: bacaa27 ("eal: add channel for multi-process communication") Cc: [email protected] Signed-off-by: Stephen Hemminger <[email protected]> Acked-by: Anatoly Burakov <[email protected]> Reviewed-by: Chengwen Feng <[email protected]>

The net/vhost pmd currently provides a -1 vid when disabling interrupt after a virtio port got disconnected. This can be caught when running with ASan. First, start dpdk-l3fwd-power in interrupt mode with a net/vhost port. $ ./build-clang/examples/dpdk-l3fwd-power -l0,1 --in-memory \ -a 0000:00:00.0 \ --vdev net_vhost0,iface=plop.sock,client=1\ -- \ -p 0x1 \ --interrupt-only \ --config '(0,0,1)' \ --parse-ptype 0 Then start testpmd with virtio-user. $ ./build-clang/app/dpdk-testpmd -l0,2 --single-file-segment --in-memory \ -a 0000:00:00.0 \ --vdev net_virtio_user0,path=plop.sock,server=1 \ -- \ -i Finally stop testpmd. ASan then splats in dpdk-l3fwd-power: ================================================================= ==3641005==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000005ed0778 at pc 0x000001270f81 bp 0x7fddbd2eee20 sp 0x7fddbd2eee18 READ of size 8 at 0x000005ed0778 thread T2 #0 0x1270f80 in get_device .../lib/vhost/vhost.h:801:27 #1 0x1270f80 in rte_vhost_get_vhost_vring .../lib/vhost/vhost.c:951:8 #2 0x3ac95cb in eth_rxq_intr_disable .../drivers/net/vhost/rte_eth_vhost.c:647:8 #3 0x170e0bf in rte_eth_dev_rx_intr_disable .../lib/ethdev/rte_ethdev.c:5443:25 #4 0xf72ba7 in turn_on_off_intr .../examples/l3fwd-power/main.c:881:4 DPDK#5 0xf71045 in main_intr_loop .../examples/l3fwd-power/main.c:1061:6 DPDK#6 0x17f9292 in eal_thread_loop .../lib/eal/common/eal_common_thread.c:210:9 DPDK#7 0x18373f5 in eal_worker_thread_loop .../lib/eal/linux/eal.c:915:2 DPDK#8 0x7fddc16ae12c in start_thread (/lib64/libc.so.6+0x8b12c) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) DPDK#9 0x7fddc172fbbf in __GI___clone3 (/lib64/libc.so.6+0x10cbbf) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) 0x000005ed0778 is located 8 bytes to the left of global variable 'vhost_devices' defined in '.../lib/vhost/vhost.c:24' (0x5ed0780) of size 8192 0x000005ed0778 is located 20 bytes to the right of global variable 'vhost_config_log_level' defined in '.../lib/vhost/vhost.c:2174' (0x5ed0760) of size 4 SUMMARY: AddressSanitizer: global-buffer-overflow .../lib/vhost/vhost.h:801:27 in get_device Shadow bytes around the buggy address: 0x000080bd2090: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x000080bd20a0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x000080bd20b0: f9 f9 f9 f9 00 f9 f9 f9 00 f9 f9 f9 00 f9 f9 f9 0x000080bd20c0: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 04 f9 f9 f9 0x000080bd20d0: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 =>0x000080bd20e0: 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 04 f9 f9[f9] 0x000080bd20f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Thread T2 created by T0 here: #0 0xe98996 in __interceptor_pthread_create (.examples/dpdk-l3fwd-power+0xe98996) (BuildId: d0b984a3b0287b9e0f301b73426fa921aeecca3a) #1 0x1836767 in eal_worker_thread_create .../lib/eal/linux/eal.c:952:6 #2 0x1834b83 in rte_eal_init .../lib/eal/linux/eal.c:1257:9 #3 0xf68902 in main .../examples/l3fwd-power/main.c:2496:8 #4 0x7fddc164a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) ==3641005==ABORTING More generally, any application passing an incorrect vid would trigger such an OOB access. Fixes: 4796ad6 ("examples/vhost: import userspace vhost application") Cc: [email protected] Signed-off-by: David Marchand <[email protected]> Reviewed-by: Maxime Coquelin <[email protected]>

Segmentation fault has been observed while running the ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor which has 64 cores and 4 NUMA nodes. From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the EOP bit set, and the length of this packet is less than or equal to rxq->crc_len, the segmentation fault will definitely happen even though on the other platforms. For example, if we made the first packet which had the EOP bit set had a zero length by force, the segmentation fault would happen on X86. Because when processd the first packet the first_seg->next will be NULL, if at the same time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the following loop will be executed: for (lp = first_seg; lp->next != rxm; lp = lp->next) ; We know that the first_seg->next will be NULL under this condition. So the expression of lp->next->next will cause the segmentation fault. Normally, the length of the first packet with EOP bit set will be greater than rxq->crc_len. However, the out-of-order execution of CPU may make the read ordering of the status and the rest of the descriptor fields in this function not be correct. The related codes are as following: rxdp = &rx_ring[rx_id]; #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; #2 rxd = *rxdp; The sentence #2 may be executed before sentence #1. This action is likely to make the ready packet zero length. If the packet is the first packet and has the EOP bit set, the above segmentation fault will happen. So, we should add a proper memory barrier to ensure the read ordering be correct. We also did the same thing in the ixgbe_recv_pkts() function to make the rxd data be valid even though we did not find segmentation fault in this function. Fixes: 8eecb32 ("ixgbe: add LRO support") Cc: [email protected] Signed-off-by: Min Zhou <[email protected]> Reviewed-by: Ruifeng Wang <[email protected]>

getline() may allocate a buffer even though it returns -1: """ If *lineptr is set to NULL before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed. """ This leak has been observed on a RHEL8 system with two CX5 PF devices (no VFs). ASan reports: ==8899==ERROR: LeakSanitizer: detected memory leaks Direct leak of 120 byte(s) in 1 object(s) allocated from: #0 0x7fe58576aba8 in __interceptor_malloc (/lib64/libasan.so.5+0xefba8) #1 0x7fe583e866b2 in __getdelim (/lib64/libc.so.6+0x886b2) #2 0x327bd23 in mlx5_sysfs_switch_info ../drivers/net/mlx5/linux/mlx5_ethdev_os.c:1084 #3 0x3271f86 in mlx5_os_pci_probe_pf ../drivers/net/mlx5/linux/mlx5_os.c:2282 #4 0x3273c83 in mlx5_os_pci_probe ../drivers/net/mlx5/linux/mlx5_os.c:2497 DPDK#5 0x327475f in mlx5_os_net_probe ../drivers/net/mlx5/linux/mlx5_os.c:2578 DPDK#6 0xc6eac7 in drivers_probe ../drivers/common/mlx5/mlx5_common.c:937 DPDK#7 0xc6f150 in mlx5_common_dev_probe ../drivers/common/mlx5/mlx5_common.c:1027 DPDK#8 0xc8ef80 in mlx5_common_pci_probe ../drivers/common/mlx5/mlx5_common_pci.c:168 DPDK#9 0xc21b67 in rte_pci_probe_one_driver ../drivers/bus/pci/pci_common.c:312 DPDK#10 0xc2224c in pci_probe_all_drivers ../drivers/bus/pci/pci_common.c:396 DPDK#11 0xc222f4 in pci_probe ../drivers/bus/pci/pci_common.c:423 DPDK#12 0xb71fff in rte_bus_probe ../lib/eal/common/eal_common_bus.c:78 DPDK#13 0xbe6888 in rte_eal_init ../lib/eal/linux/eal.c:1300 DPDK#14 0x5ec717 in main ../app/test-pmd/testpmd.c:4515 DPDK#15 0x7fe583e38d84 in __libc_start_main (/lib64/libc.so.6+0x3ad84) As far as why getline() errors, strace gives a hint: 8516 openat(AT_FDCWD, "/sys/class/net/enp130s0f0/phys_port_name", O_RDONLY) = 34 8516 fstat(34, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 8516 read(34, 0x621000098900, 4096) = -1 EOPNOTSUPP (Operation not supported) Fixes: f8a226e ("net/mlx5: fix sysfs port name translation") Cc: [email protected] Signed-off-by: David Marchand <[email protected]> Acked-by: Viacheslav Ovsiienko <[email protected]>

ASAN report: ERROR: AddressSanitizer: unknown-crash on address 0x7ffffef92e32 at pc 0x00000053d1e9 bp 0x7ffffef92c00 sp 0x7ffffef92bf8 READ of size 16 at 0x7ffffef92e32 thread T0 #0 0x53d1e8 in _mm_loadu_si128 /usr/lib64/gcc/x86_64-suse-linux/11/include/emmintrin.h:703 #1 0x53d1e8 in send_packets_multi ../examples/l3fwd/l3fwd_sse.h:125 #2 0x53d1e8 in acl_send_packets ../examples/l3fwd/l3fwd_acl.c:1048 #3 0x53ec18 in acl_main_loop ../examples/l3fwd/l3fwd_acl.c:1127 #4 0x12151eb in rte_eal_mp_remote_launch ../lib/eal/common/eal_common_launch.c:83 DPDK#5 0x5bf2df in main ../examples/l3fwd/main.c:1647 DPDK#6 0x7f6d42a0d2bc in __libc_start_main (/lib64/libc.so.6+0x352bc) DPDK#7 0x527499 in _start (/home/kananyev/dpdk-l3fwd-acl/x86_64-native-linuxapp-gcc-dbg-b1/examples/dpdk-l3fwd+0x527499) Reason for that is that send_packets_multi() uses 16B loads to access input dst_port[]and might read beyond array boundaries. Right now, it doesn't cause any real issue - junk values are ignored, also inside l3fwd we always allocate dst_port[] array on the stack, so memory beyond it is always available. Anyway, it probably need to be fixed. The patch below simply allocates extra space for dst_port[], so send_packets_multi() will never read beyond its boundaries. Probably a better fix would be to change send_packets_multi() itself to avoid access beyond 'nb_rx' entries. Bugzilla ID: 1502 Fixes: 94c54b4 ("examples/l3fwd: rework exact-match") Cc: [email protected] Signed-off-by: Konstantin Ananyev <[email protected]> Acked-by: Stephen Hemminger <[email protected]>

eroullit linked a pull request Jun 15, 2022 that will close this issue

Take in accont NUL terminator #2

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix code scanning alert - Likely overrunning write #1

Fix code scanning alert - Likely overrunning write #1

eroullit commented Jun 15, 2022

Fix code scanning alert - Likely overrunning write #1

Fix code scanning alert - Likely overrunning write #1

Comments

eroullit commented Jun 15, 2022