Proxmox: installing the NVIDIA driver, passing the GPU through to LXC, nested Docker, and assorted troubleshooting #72

luckyyyyy opened this issue May 26, 2023 · 0 comments
Basic steps

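# Blacklist nouveau: add the line "blacklist nouveau" to this file
vim /etc/modprobe.d/blacklist.conf
# Rebuild the initramfs so the change takes effect
update-initramfs -u
# Reboot
reboot
# Download the matching driver from NVIDIA's website and make it executable
chmod +x NVIDIA-Linux-x86_64-525.116.04.run
# Install
./NVIDIA-Linux-x86_64-525.116.04.run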
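
The manual vim edit above can also be scripted. The following is a minimal sketch of an idempotent variant. `add_line_once` is a hypothetical helper name (not part of any tool), and the target path is the one used above.

```shell
#!/bin/sh
# add_line_once FILE LINE
# Hypothetical helper: append LINE to FILE unless an identical line already exists.
add_line_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (run as root on the Proxmox host):
#   add_line_once /etc/modprobe.d/blacklist.conf "blacklist nouveau"
#   update-initramfs -u && reboot
# After the reboot, `lsmod | grep nouveau` should print nothing.
```

Running the helper twice leaves a single "blacklist nouveau" line, so it is safe to re-run.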

Errors

  1. Error: the distribution-provided pre-install script failed.
  2. Error: Unable to find the development tool 'cc' in your path.
  3. Error: Unable to find the development tool 'make' in your path.
  4. Error: The kernel module failed to load. Secure boot is enabled on this system.
  5. The signed kernel module failed to load.
  6. Error: Unable to load the kernel module 'nvidia.ko'
  7. Error: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.
  8. Error: An NVIDIA kernel module 'nvidia-modeset' appears to already be loaded in your kernel.
  9. WARNING: Unable to find a suitable destination to install 32-bit compatibility libraries.
  10. WARNING: Unable to determine the path to install the libglvnd EGL vendor library config files.

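Fixes

  • Error 1: just continue the installation. This message only asks you to confirm that you want to install this driver.
  • Errors 2 and 3: gcc and make are not installed.
apt-get install gcc
apt-get install make
  • Errors 4 and 5: Secure Boot has not been disabled in the BIOS.
  • Error 6: the preparation steps above were not completed properly.
  • Errors 7 and 8: first make sure Secure Boot is disabled, then remove the driver that is already installed:
apt-get purge 'nvidia*'
apt-get autoremove
reboot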
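
Most of the errors above can be caught before starting the installer. The following is a rough pre-flight sketch. `missing_tools` is a hypothetical helper name, and the `mokutil` line assumes that package is installed on the host.

```shell
#!/bin/sh
# missing_tools TOOL...
# Print the name of every TOOL that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example checks before running the installer (as root on the host):
#   missing_tools cc make                  # prints nothing when both are installed
#   mokutil --sb-state                     # reports whether Secure Boot is enabled
#   lsmod | grep -E '^(nvidia|nouveau)'    # modules that are still loaded
```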

After installation

Add these two lines to /etc/modules-load.d/nvidia.conf:

nvidia
nvidia-uvm
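
To confirm after the next boot that both modules were loaded, something like the following could be used. This is a sketch: `modules_not_loaded` is a hypothetical helper that reads a file in the format of /proc/modules.

```shell
#!/bin/sh
# modules_not_loaded LIST_FILE NAME...
# Print each NAME that is absent from LIST_FILE (same format as /proc/modules).
modules_not_loaded() {
    list=$1
    shift
    for name in "$@"; do
        # The kernel reports hyphens as underscores, e.g. nvidia-uvm -> nvidia_uvm
        kname=$(printf '%s' "$name" | sed 's/-/_/g')
        grep -q "^${kname} " "$list" || printf '%s\n' "$name"
    done
}

# Example (on the host, after a reboot):
#   modules_not_loaded /proc/modules nvidia nvidia-uvm
# No output means both modules are loaded.
```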

Add udev rules

Create /etc/udev/rules.d/70-nvidia.rules with the following content:

# /etc/udev/rules.d/70-nvidia.rules
# Create /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl when nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
# Create the CUDA node when nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

Reboot.
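
After the reboot, the device nodes should exist with mode 666 if the udev rules ran. The following is a small sketch for checking this. `check_mode` is a hypothetical helper and relies on GNU `stat`.

```shell
#!/bin/sh
# check_mode EXPECTED PATH...
# Print every PATH whose permission bits differ from EXPECTED (octal, e.g. 666).
check_mode() {
    expected=$1
    shift
    for path in "$@"; do
        # %a is the access mode in octal (GNU stat); skip paths that do not exist
        mode=$(stat -c '%a' "$path") || continue
        [ "$mode" = "$expected" ] || printf '%s has mode %s\n' "$path" "$mode"
    done
}

# Example (on the host):
#   check_mode 666 /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools
```

A missing node produces an error from `stat` on stderr, which usually means the corresponding module was not loaded.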

LXC configuration

Use the following as a reference. The required devices are allowed through cgroup2:

lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop: 
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 507:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
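
The major numbers in the allow lines above (195, 226, 507) come from the author's host. The one for nvidia-uvm in particular is assigned dynamically and may differ on another machine. The following sketch derives the line from a device node. `allow_line` is a hypothetical helper and relies on GNU `stat`.

```shell
#!/bin/sh
# allow_line DEVICE
# Print an lxc.cgroup2.devices.allow line for the major number of a character device.
allow_line() {
    # GNU stat prints the major number in hexadecimal with %t
    major_hex=$(stat -c '%t' "$1") || return 1
    # Convert hex to decimal with shell arithmetic
    printf 'lxc.cgroup2.devices.allow: c %d:* rwm\n' "$((0x${major_hex}))"
}

# Example (on the host):
#   allow_line /dev/nvidia0
#   allow_line /dev/nvidia-uvm
#   allow_line /dev/dri/card0
```

Compare the printed lines with the ones in the container config and adjust any major number that differs.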

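LXC

The container also needs the NVIDIA driver, but only its userspace part. Install it with ./NVIDIA-Linux-x86_64-535.104.05.run --no-kernel-module so that the container uses the kernel module already loaded on the host. Use the installer version that matches the driver loaded on the host.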
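
To pick the matching installer, the host's driver version can be read from /proc/driver/nvidia/version. The following is a minimal sketch. `driver_version` is a hypothetical helper that simply takes the first dotted version number in the file.

```shell
#!/bin/sh
# driver_version FILE
# Print the first dotted version number found in FILE
# (FILE is expected to look like /proc/driver/nvidia/version).
driver_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' "$1" | head -n 1
}

# Example: read the version on the host, then install the same version in the container.
#   ver=$(driver_version /proc/driver/nvidia/version)
#   ./NVIDIA-Linux-x86_64-"$ver".run --no-kernel-module
# Afterwards, running nvidia-smi inside the container should list the GPU.
```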


LXC Docker

Follow NVIDIA's official installation guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
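
Once the toolkit is installed, a quick smoke test can confirm that the GPU is reachable from a nested Docker container. The sketch below assumes the `nvidia-ctk` and `docker` CLIs from that guide. The `no-cgroups` tweak is not from the original post: it is a commonly reported workaround for cgroup-related errors when Docker runs inside LXC, and `set_no_cgroups` is a hypothetical helper name.

```shell
#!/bin/sh
# set_no_cgroups CONFIG
# Assumption: CONFIG is the toolkit's config.toml and contains a (possibly
# commented-out) "no-cgroups = ..." line; rewrite that line to "no-cgroups = true".
set_no_cgroups() {
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*no-cgroups[[:space:]]*=.*/no-cgroups = true/' "$1"
}

# Manual steps inside the LXC container (as root):
#   nvidia-ctk runtime configure --runtime=docker
#   systemctl restart docker
#   docker run --rm --gpus all ubuntu nvidia-smi
# Only if the last command fails with a cgroup-related error:
#   set_no_cgroups /etc/nvidia-container-runtime/config.toml
```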
