1.2dev kni_mod 20.11: when gatekeeper is not shut down gracefully in the i40e driver scenario, a kernel BUG appears: kernel NULL pointer dereference, address: 0000000000000010 #685

Closed
ShawnLeung87 opened this issue Apr 13, 2024 · 7 comments

@ShawnLeung87

ShawnLeung87 commented Apr 13, 2024

When this exception occurs, the server must be rebooted before it can recover.
Exception log:

Apr 13 16:31:39 78-GK-3 systemd[1]: systemd-timedated.service: Succeeded.
Apr 13 16:31:57 78-GK-3 systemd-networkd[1949]: kni_back: Link DOWN
Apr 13 16:31:57 78-GK-3 systemd-networkd[1949]: kni_back: Lost carrier
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048101] BUG: kernel NULL pointer dereference, address: 0000000000000010
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048152] #PF: supervisor read access in kernel mode
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048177] #PF: error_code(0x0000) - not-present page
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048201] PGD 0 P4D 0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048219] Oops: 0000 [#1] SMP NOPTI
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048243] CPU: 37 PID: 2496 Comm: rte_mp_handle Tainted: G OE 5.15.0-97-generic #107~20.04.1-Ubuntu
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048291] Hardware name: Dell Inc. PowerEdge R740xd/06WXJT, BIOS 2.11.2 004/21/2021
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048323] RIP: 0010:vmacache_find+0x24/0xf0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048355] Code: 5d c3 cc cc cc cc 0f 1f 44 00 00 65 48 8b 04 25 c0 fb 01 00 48 3b b8 08 09 00 00 74 07 31 c0 c3 cc cc cc cc f6 40 2e 20 75 f3 <48> 8b 57 10 48 3b 90 18 09 00 00 75 69 55 48 89 e5 41 57 49 89 f7
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048428] RSP: 0018:ffffa5b6a1e979c8 EFLAGS: 00010246
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048455] RAX: ffff9077dbc98000 RBX: 0000000000000000 RCX: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048486] RDX: 0000000000000001 RSI: 0000002d42377000 RDI: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048517] RBP: ffffa5b6a1e979e8 R08: ffffa5b6a1e97b80 R09: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048547] R10: 0000000000000001 R11: ffff906ac32b80c0 R12: 0000002d42377000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048578] R13: 0000002d42377000 R14: 0000000000000000 R15: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048607] FS: 0000000000000000(0000) GS:ffff90b6bf680000(0000) knlGS:0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048659] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048685] CR2: 0000000000000010 CR3: 000000524d810004 CR4: 00000000007706e0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048716] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048745] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048775] PKRU: 55555554
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048790] Call Trace:
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048805]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048820] ? show_regs.cold+0x1a/0x1f
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048848] ? __die_body+0x20/0x70
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048877] ? __die+0x2b/0x37
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048895] ? page_fault_oops+0x136/0x2c0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048918] ? update_load_avg+0x7c/0x650
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048942] ? newidle_balance+0x39d/0x470
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048967] ? do_user_addr_fault+0x303/0x660
Apr 13 16:31:57 78-GK-3 kernel: [ 369.048990] ? __update_idle_core+0xe5/0x120
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049015] ? exc_page_fault+0x77/0x170
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049042] ? asm_exc_page_fault+0x27/0x30
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049067] ? vmacache_find+0x24/0xf0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049091] ? find_vma+0x1b/0x80
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049114] find_extend_vma+0x1e/0x90
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049140] __get_user_pages+0xa0/0x6b0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049167] __get_user_pages_remote+0xdc/0x320
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049191] ? kfree+0x3bd/0x420
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049214] get_user_pages_remote+0x21/0x50
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049239] kni_fifo_trans_pa2va+0x1fa/0x310 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049271] ? kobject_release+0x5f/0x150
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049299] kni_net_release_fifo_phy+0x36/0x40 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.049329] kni_dev_remove+0x33/0x50 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.050138] kni_release+0xb0/0x180 [rte_kni]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.050908] __fput+0x9c/0x280
Apr 13 16:31:57 78-GK-3 kernel: [ 369.051672] ____fput+0xe/0x20
Apr 13 16:31:57 78-GK-3 kernel: [ 369.052412] task_work_run+0x6d/0xb0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.053133] do_exit+0x363/0xad0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.053850] ? __mod_memcg_lruvec_state+0x63/0xe0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.054572] do_group_exit+0x43/0xb0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.055289] get_signal+0x157/0x900
Apr 13 16:31:57 78-GK-3 kernel: [ 369.055991] ? lru_cache_add_inactive_or_unevictable+0x29/0xe0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.056686] arch_do_signal_or_restart+0xf7/0x290
Apr 13 16:31:57 78-GK-3 kernel: [ 369.057369] ? fput+0x13/0x20
Apr 13 16:31:57 78-GK-3 kernel: [ 369.058039] ? __sys_recvmsg+0x98/0xb0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.058667] exit_to_user_mode_prepare+0x130/0x1c0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.059282] syscall_exit_to_user_mode+0x27/0x50
Apr 13 16:31:57 78-GK-3 kernel: [ 369.059893] ? __x64_sys_recvmsg+0x1f/0x30
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060200] do_syscall_64+0x69/0xc0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060463] ? exit_to_user_mode_prepare+0x92/0x1c0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060726] ? do_user_addr_fault+0x1e0/0x660
Apr 13 16:31:57 78-GK-3 kernel: [ 369.060987] ? irqentry_exit_to_user_mode+0x17/0x20
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061247] ? irqentry_exit+0x1d/0x30
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061498] ? exc_page_fault+0x89/0x170
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061747] entry_SYSCALL_64_after_hwframe+0x62/0xcc
Apr 13 16:31:57 78-GK-3 kernel: [ 369.061996] RIP: 0033:0x7f4b7f1dd0ed
Apr 13 16:31:57 78-GK-3 kernel: [ 369.062244] Code: Unable to access opcode bytes at RIP 0x7f4b7f1dd0c3.
Apr 13 16:31:57 78-GK-3 kernel: [ 369.062494] RSP: 002b:00007f4b7dec9fb0 EFLAGS: 00000293 ORIG_RAX: 000000000000002f
Apr 13 16:31:57 78-GK-3 kernel: [ 369.062753] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f4b7f1dd0ed
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063017] RDX: 0000000000000000 RSI: 00007f4b7deca030 RDI: 0000000000000009
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063285] RBP: 00007f4b7deca254 R08: 0000000000000000 R09: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063554] R10: 0000000000004022 R11: 0000000000000293 R12: 00007f4b7deca038
Apr 13 16:31:57 78-GK-3 kernel: [ 369.063826] R13: 00007f4b7deca072 R14: 00007f4b7deca250 R15: 00007f4b7deca030
Apr 13 16:31:57 78-GK-3 kernel: [ 369.064101]
Apr 13 16:31:57 78-GK-3 kernel: [ 369.064374] Modules linked in: rte_kni(OE) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif binfmt_misc input_leds joydev kvm dell_smbios ice dcdbas rapl wmi_bmof dell_wmi_descriptor ib_uverbs ib_core intel_cstate mei_me ioatdma mei intel_pch_thermal dca acpi_ipmi ipmi_si ipmi_devintf ip6t_REJECT nf_reject_ipv6 ipmi_msghandler xt_hl ip6t_rt acpi_power_meter mac_hid ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_addrtype sch_fq_codel xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 iavf nf_defrag_ipv4 uio_pci_generic ip6table_filter uio ip6_tables iptable_filter bpfilter msr ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear dm_mirror
Apr 13 16:31:57 78-GK-3 kernel: [ 369.064448] dm_region_hash dm_log hid_generic usbhid hid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crct10dif_pclmul rc_core crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd i40e(OE) cryptd i2c_i801 ahci drm megaraid_sas tg3 xhci_pci lpc_ich i2c_smbus libahci xhci_pci_renesas wmi
Apr 13 16:31:57 78-GK-3 kernel: [ 369.068303] CR2: 0000000000000010
Apr 13 16:31:57 78-GK-3 kernel: [ 369.068702] ---[ end trace 03c229dce7104af2 ]---
Apr 13 16:31:57 78-GK-3 kernel: [ 369.114550] RIP: 0010:vmacache_find+0x24/0xf0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.114852] Code: 5d c3 cc cc cc cc 0f 1f 44 00 00 65 48 8b 04 25 c0 fb 01 00 48 3b b8 08 09 00 00 74 07 31 c0 c3 cc cc cc cc f6 40 2e 20 75 f3 <48> 8b 57 10 48 3b 90 18 09 00 00 75 69 55 48 89 e5 41 57 49 89 f7
Apr 13 16:31:57 78-GK-3 kernel: [ 369.115462] RSP: 0018:ffffa5b6a1e979c8 EFLAGS: 00010246
Apr 13 16:31:57 78-GK-3 kernel: [ 369.115772] RAX: ffff9077dbc98000 RBX: 0000000000000000 RCX: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116042] RDX: 0000000000000001 RSI: 0000002d42377000 RDI: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116308] RBP: ffffa5b6a1e979e8 R08: ffffa5b6a1e97b80 R09: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116575] R10: 0000000000000001 R11: ffff906ac32b80c0 R12: 0000002d42377000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.116845] R13: 0000002d42377000 R14: 0000000000000000 R15: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117114] FS: 0000000000000000(0000) GS:ffff90b6bf680000(0000) knlGS:0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117647] CR2: 0000000000000010 CR3: 000000524d810004 CR4: 00000000007706e0
Apr 13 16:31:57 78-GK-3 kernel: [ 369.117910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 13 16:31:57 78-GK-3 kernel: [ 369.118172] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 13 16:31:57 78-GK-3 kernel: [ 369.118433] PKRU: 55555554
Apr 13 16:31:57 78-GK-3 kernel: [ 369.118693] Fixing recursive fault but reboot is needed!

@ShawnLeung87
Author

The same code works normally in the ixgb 10G scenario, where gatekeeper can be killed with kill -9 without this problem.

@AltraMayor
Owner

The root cause of this problem is the KNI kernel module, and the latest v1.2.0-dev solves this problem by replacing the KNI kernel module with the Virtio kernel module.

@ShawnLeung87
Author

XL710: after switching to virtio-user, PCTYPES are not supported in RSS.

@AltraMayor
Owner

I can only look at that problem once I address a problem with bonded interfaces I'm working on; this should take me a couple more weeks. In the meantime, you should collect as much information about your problem as possible since I don't have access to an XL710 to test.

@ShawnLeung87
Author

I'm also trying to set breakpoints and use gdb to collect specific information. If you need to test the XL710, we can provide a test environment.

@AltraMayor
Owner

Could you give me access to a machine with an XL710 ready for testing with branch v1.2.0-dev?

@AltraMayor
Owner

Pull request #691 addressed this issue.
