Running an AI model locally on a Proxmox Container + GPU passthrough

Going further down the Proxmox / Local AI rabbit hole


One of the main reasons to run a local AI model in an LXC container rather than in a VM (a Debian VM, for example) is efficiency: the container uses fewer of the host's resources.

Remember that if we manage the Ollama instance via the CLI inside a container, rather than through a fully loaded OS and GUI, we reduce CPU, RAM, GPU, and disk space usage and keep the load on the underlying host to a minimum.

From https://www.atlassian.com/microservices/cloud-computing/containers-vs-vms

Here's how we can do that (for an Nvidia GPU).


NVIDIA Driver Installation & LXC Passthrough on Proxmox

1. Prerequisites & Host Preparation

Before installing drivers, the host must be configured to prevent driver conflicts and to be able to build kernel modules. First, we need to disable Secure Boot. Why? Because Secure Boot prevents unsigned third-party kernel modules (like NVIDIA's) from loading. To do this, enter the BIOS (F2/Del at boot) and set Secure Boot to disabled.
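
Before rebooting into the firmware menu, it can be worth checking whether Secure Boot is on at all. This is a minimal sketch, assuming the mokutil tool is installed on the node (it may not be by default). The helper name sb_state_is_enabled is made up for this example:

```shell
#!/usr/bin/env bash
# Succeeds if the text on stdin reports Secure Boot as enabled.
# Hypothetical helper: it only parses `mokutil --sb-state` style output.
sb_state_is_enabled() {
    grep -qi '^SecureBoot enabled'
}

# Only query the firmware state if mokutil is available on this machine.
if command -v mokutil >/dev/null 2>&1; then
    if mokutil --sb-state 2>/dev/null | sb_state_is_enabled; then
        echo "Secure Boot is enabled: disable it in the BIOS before installing the driver"
    else
        echo "Secure Boot is not reported as enabled"
    fi
else
    echo "mokutil not found: check the Secure Boot setting in the BIOS instead"
fi
```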

Install Build Dependencies

Reboot the node, then install the Proxmox kernel headers and build tools required to compile the driver against your specific kernel version.


apt update

apt install -y pve-headers-$(uname -r) build-essential dkms pkg-config xorg-dev libglvnd-dev
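
A quick way to confirm the headers landed for the kernel that is actually running is to look for its build tree. This is a rough sketch, assuming the headers package provides the usual /lib/modules/<release>/build entry that DKMS compiles against. The function name headers_present is only illustrative:

```shell
#!/usr/bin/env bash
# Succeeds if a kernel build tree exists for the given kernel release.
# Defaults to the running kernel. The modules root is a parameter only so
# the check can be pointed at another directory.
headers_present() {
    local release="${1:-$(uname -r)}"
    local root="${2:-/lib/modules}"
    [ -d "${root}/${release}/build" ]
}

if headers_present; then
    echo "Headers found for $(uname -r)"
else
    echo "No headers for $(uname -r): re-check the pve-headers package"
fi
```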

Blacklist Nouveau Driver

The open-source Nouveau driver needs to be disabled so the proprietary driver can take control of the hardware. If this is not done, the kernel may load Nouveau first and claim the GPU before the proprietary NVIDIA driver can.


cat <<EOF > /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF


update-initramfs -u -k all

reboot
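
Once the node is back up, the blacklist can be confirmed by checking that nouveau no longer shows up in the loaded module list. This is a small sketch that reads /proc/modules. The function name nouveau_loaded is an assumption made for the example:

```shell
#!/usr/bin/env bash
# Succeeds if the nouveau module appears in a /proc/modules style listing.
# The file path is a parameter and defaults to the live module list.
nouveau_loaded() {
    local modules_file="${1:-/proc/modules}"
    grep -q '^nouveau ' "$modules_file" 2>/dev/null
}

if nouveau_loaded; then
    echo "nouveau is still loaded: re-check the blacklist file and the initramfs"
else
    echo "nouveau is not loaded"
fi
```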

2. Host Driver Installation

The 595.45.04 driver branch (or the latest available) is recommended for mobile Ampere (30-series) GPUs on modern kernels. Download the matching .run installer from NVIDIA to the node before continuing.
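
For fetching the installer, something like the sketch below may help. Note that the URL layout is an assumption based on how NVIDIA's public download server is commonly organised, and the helper nvidia_installer_url is made up for this example, so confirm the real link on NVIDIA's driver download page first:

```shell
#!/usr/bin/env bash
# Builds the installer URL for a driver version.
# ASSUMPTION: this URL layout mirrors NVIDIA's public download server.
# Verify it on NVIDIA's driver download page before relying on it.
nvidia_installer_url() {
    local version="$1"
    printf 'https://download.nvidia.com/XFree86/Linux-x86_64/%s/NVIDIA-Linux-x86_64-%s.run\n' \
        "$version" "$version"
}

DRIVER_VERSION="595.45.04"   # the version used in this post
echo "Download with: wget $(nvidia_installer_url "$DRIVER_VERSION")"
```

The script only prints the wget command rather than running it, so the link can be checked before anything is downloaded.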

Installation Command

Run the installer with the following flags for a clean, non-interactive setup:

chmod +x NVIDIA-Linux-x86_64-595.45.04.run

./NVIDIA-Linux-x86_64-595.45.04.run --dkms --silent --accept-license --no-questions --ui=none

Verification

Check that the driver is active, then enable persistence mode to stop the GPU from entering deep sleep states (common on laptops), which can break the container's access to the device nodes.


# Verify GPU visibility

nvidia-smi

# Enable Persistence Mode

nvidia-persistenced

nvidia-smi -pm 1

# Manually trigger device node creation if /dev/nvidia* is missing

nvidia-modprobe -u -c0

3. LXC Container Configuration

LXC containers share the host kernel. You will need to map the hardware device nodes from the host into the container.

Edit Container Config

  1. Open the config file for your container (e.g., ID 100):

nano /etc/pve/lxc/100.conf

  2. Add the following lines to the bottom:

# Passthrough NVIDIA Device Nodes

dev0: /dev/nvidia0

dev1: /dev/nvidiactl

dev2: /dev/nvidia-uvm

dev3: /dev/nvidia-uvm-tools

dev4: /dev/nvidia-modeset

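Note: The devN entries reference the device nodes by path, so check with ls -l /dev/nvidia* that each one exists on the host. The major number shown there (usually 195 for nvidia0 and nvidiactl) only needs to be matched by hand if you use the older lxc.cgroup2.devices.allow style of config instead.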
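
To catch a typo or a missing node before starting the container, the devN paths in the config can be checked against the host. This is a rough sketch, assuming entries of the form "devN: /path" or "devN: path=/path" with optional comma-separated options. The function name missing_dev_entries is hypothetical:

```shell
#!/usr/bin/env bash
# Prints every devN path from a container config that is missing on the host.
# Prints nothing if all of them exist.
missing_dev_entries() {
    local conf="$1" path
    # Match lines like "dev0: /dev/nvidia0" (optionally "path=/dev/...")
    # and keep only the path, dropping any ",option=value" suffix.
    sed -n 's/^dev[0-9][0-9]*:[[:space:]]*\(path=\)\{0,1\}\([^,[:space:]]*\).*/\2/p' "$conf" |
    while read -r path; do
        [ -e "$path" ] || echo "$path"
    done
}

CONF="/etc/pve/lxc/100.conf"   # container ID 100, as in the example above
if [ -r "$CONF" ]; then
    missing_dev_entries "$CONF"
fi
```

If a path is printed, running nvidia-modprobe -u -c0 on the node (as in the verification step) is the first thing to try.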

Install the Nvidia Libraries in LXC

The container needs the NVIDIA libraries but not the kernel module. First, push the .run file to the container using the pct command from the node.

On the Proxmox Node:

pct push 100 NVIDIA-Linux-x86_64-595.45.04.run /root/nvidia.run

On the LXC container you want to use the GPU:

chmod +x /root/nvidia.run

/root/nvidia.run --no-kernel-module --no-x-check --no-nouveau-check --ui=none
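
The libraries inside the container have to be the same version as the kernel module loaded on the host, otherwise nvidia-smi reports a driver/library mismatch. This is a minimal sketch to run on the Proxmox node that reads the loaded module's version from /proc/driver/nvidia/version and compares it with the installer version. The function name kernel_module_version and the INSTALLER_VERSION variable are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Extracts the driver version from /proc/driver/nvidia/version style text.
# Prints nothing if the file is missing or has no NVRM version line.
kernel_module_version() {
    local version_file="${1:-/proc/driver/nvidia/version}"
    grep -m1 '^NVRM version' "$version_file" 2>/dev/null |
        grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

INSTALLER_VERSION="595.45.04"   # version of the .run file pushed above
loaded="$(kernel_module_version)"
if [ -z "$loaded" ]; then
    echo "No NVIDIA kernel module version found"
elif [ "$loaded" = "$INSTALLER_VERSION" ]; then
    echo "Kernel module and installer match: $loaded"
else
    echo "Mismatch: module $loaded, installer $INSTALLER_VERSION"
fi
```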

Troubleshooting

Issue: Module nvidia not found
Solution: Re-run the installer with --dkms and make sure the pve-headers package matches your current kernel (uname -r).

Issue: ls: cannot access '/dev/nvidia*'
Solution: Run nvidia-modprobe -u -c0 and then inspect dmesg for any RMInit-related errors.

Here we can start asking the AI models extremely important questions... And we can see on the Proxmox node that the GPU is being used.

Asking the model a random question with some swearing in the top half of the image and nvtop at the bottom.

Now that the LXC container is successfully using the GPU, verified with nvtop, we can begin to build all sorts of workflows around this local AI model. As you've probably heard before, this is not going to be anywhere near as powerful as the big three AI models (Claude Opus, ChatGPT and Gemini). However, it gives me a central AI tool that can be used across my network instead of separate Ollama instances on each of my devices. Rather than stopping at the AI chatbot level, the next step is to start building workflows and integrating the local model, instead of accidentally bankrupting myself by forgetting to set a limit on API usage.

I'm open to suggestions if anyone has some AI workflows they have found useful.

- The slow SOC Engineer at w8security