Running an AI model locally on a Proxmox Container + GPU passthrough
Going further down the Proxmox / Local AI rabbit hole
One of the main reasons to run a local AI model in an LXC container, rather than in a VM running Debian for example, is efficiency: the container puts far less load on the underlying host.
If we manage the Ollama instance from the CLI instead of through a fully loaded OS with a GUI, we cut CPU, RAM, GPU, and disk usage and keep the host's resource usage to a minimum.

Here's how we can do that (for an Nvidia GPU).
NVIDIA Driver Installation & LXC Passthrough on Proxmox
1. Prerequisites & Host Preparation
Before installing drivers, the host must be configured to prevent driver conflicts and to be able to build kernel modules. First, disable Secure Boot, because Secure Boot prevents unsigned third-party kernel modules (like NVIDIA's) from loading. To do this, enter the BIOS (F2/Del) and set Secure Boot to disabled.
Install Build Dependencies
Reboot the node, then install the Proxmox kernel headers and build tools required to compile the driver against your specific kernel version.
apt update
apt install -y pve-headers-$(uname -r) build-essential dkms pkg-config xorg-dev libglvnd-dev
Blacklist Nouveau Driver
The open-source Nouveau driver needs to be disabled so the proprietary driver can take control of the hardware. If this is not done, the kernel will try to load Nouveau before the proprietary NVIDIA driver.
cat <<EOF > /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
update-initramfs -u -k all
reboot
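After the reboot it is worth confirming that the blacklist took effect. The following is a minimal sketch, not part of the original steps: `nouveau_status` is a hypothetical helper name, and it simply looks for a `nouveau` entry in `/proc/modules`. The file path is a parameter only so the check can be pointed at a test file.

```shell
#!/bin/sh
# Sketch: report whether the nouveau kernel module is currently loaded.
# nouveau_status is a hypothetical helper; modules_file defaults to /proc/modules.
nouveau_status() {
    modules_file="${1:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    if grep -q '^nouveau ' "$modules_file" 2>/dev/null; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
}

nouveau_status
```

If it reports that nouveau is still loaded, re-check the blacklist file and rebuild the initramfs before installing the NVIDIA driver.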
2. Host Driver Installation
Using the 595.45.04 (or latest) driver branch is recommended for mobile Ampere (30-series) GPUs on modern kernels.
Installation Command
Run the installer with the following flags for a clean, non-interactive setup:
chmod +x NVIDIA-Linux-x86_64-595.45.04.run
./NVIDIA-Linux-x86_64-595.45.04.run --dkms --silent --accept-license --no-questions --ui=none
Verification
Ensure the driver is active and prevent the GPU from entering deep sleep states (common on laptops), which can break container links.
# Verify GPU visibility
nvidia-smi
# Enable Persistence Mode
nvidia-persistenced
nvidia-smi -pm 1
# Manually trigger device node creation if /dev/nvidia* is missing
nvidia-modprobe -u -c0
3. LXC Container Configuration
LXC containers share the host kernel. You will need to map the hardware device nodes from the host into the container.
Edit Container Config
- Open the config file for your container (e.g., ID 100):
nano /etc/pve/lxc/100.conf
- Add the following lines to the bottom:
# Passthrough NVIDIA Device Nodes
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
dev4: /dev/nvidia-modeset
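Not every host exposes all five nodes (for example, `nvidia-uvm-tools` may be absent until the UVM module is loaded). As a rough sketch, the helper below prints one `devN:` line for each node that actually exists, so the output can be compared against the lines above. `nvidia_dev_lines` is a hypothetical name, and the directory argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: print a "devN: <path>" config line for each NVIDIA node that exists.
# nvidia_dev_lines is a hypothetical helper; dev_dir defaults to /dev.
nvidia_dev_lines() {
    dev_dir="${1:-/dev}"
    n=0
    for name in nvidia0 nvidiactl nvidia-uvm nvidia-uvm-tools nvidia-modeset; do
        if [ -e "$dev_dir/$name" ]; then
            echo "dev$n: $dev_dir/$name"
            n=$((n + 1))
        fi
    done
}

nvidia_dev_lines
```

Run on the Proxmox node, it prints nothing if no NVIDIA device nodes have been created yet.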
Note: Ensure the major number (usually 195) matches your host by checking ls -l /dev/nvidia*.
Install the NVIDIA Libraries in LXC
The container needs the NVIDIA libraries but not the kernel module. First, push the .run file to the container using the pct command from the node.
On the Proxmox Node:
pct push 100 NVIDIA-Linux-x86_64-595.45.04.run /root/nvidia.run
On the LXC container you want to use the GPU:
chmod +x /root/nvidia.run
/root/nvidia.run --no-kernel-module --no-x-check --no-nouveau-check --ui=none
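Once the libraries are installed, a quick way to confirm the passthrough is to run `nvidia-smi -L` inside the container from the Proxmox node. The sketch below assumes container ID 100, as used above. `check_container_gpu` and the `PCT_BIN` override are hypothetical names added for illustration; `PCT_BIN` only exists so the function can be exercised without a real `pct`.

```shell
#!/bin/sh
# Sketch: from the Proxmox node, check that a container can list the GPU.
# check_container_gpu and PCT_BIN are hypothetical; PCT_BIN defaults to pct.
check_container_gpu() {
    ctid="$1"
    pct_bin="${PCT_BIN:-pct}"
    if ! command -v "$pct_bin" >/dev/null 2>&1; then
        echo "pct not found; run this on the Proxmox node"
        return 1
    fi
    # nvidia-smi -L lists the GPUs visible to the driver libraries.
    if "$pct_bin" exec "$ctid" -- nvidia-smi -L; then
        echo "container $ctid can see the GPU"
    else
        echo "container $ctid cannot see the GPU"
        return 1
    fi
}

check_container_gpu 100 || true
```

If the container cannot see the GPU, the usual suspects are missing device nodes on the host or a library version inside the container that does not match the host driver.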
Troubleshooting
Here we can start asking the AI models extremely important questions... And we can see on the Proxmox node that the GPU is being used.

nvtop at the bottom.
Now that the LXC container is successfully using the GPU, as verified with nvtop, we can begin to build all sorts of workflows around this local AI model. As you've probably heard before, this is not going to be anywhere near as powerful as the big 3 AI models (Claude Opus, ChatGPT and Gemini). However, it gives me a central AI tool that can be used across my network instead of separate Ollama instances on each of my devices. Rather than stopping at the AI chatbot level, the next step is to build workflows that integrate the local AI model, without the risk of accidentally bankrupting myself by forgetting to set a limit on API usage.
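As a starting point for those workflows, other devices on the network can talk to the container through Ollama's HTTP API. This is only a sketch under a few assumptions: the address `192.168.1.50` and the model name `llama3.2` are placeholders, Ollama is listening on its default port 11434, and it has been configured to accept connections from the network (by default it binds to localhost only; the `OLLAMA_HOST` environment variable changes that). `ollama_payload` and `ask_ollama` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: send a single prompt to a central Ollama instance over the network.
# ollama_payload and ask_ollama are hypothetical helpers.

# Build the JSON body for POST /api/generate.
# Assumes model and prompt contain no double quotes or backslashes.
ollama_payload() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Usage: ask_ollama <host> <model> <prompt>
ask_ollama() {
    curl -s "http://$1:11434/api/generate" -d "$(ollama_payload "$2" "$3")"
}

# Example with placeholder address and model name:
# ask_ollama 192.168.1.50 llama3.2 "Summarise this alert in one sentence."
```

Setting `stream` to false returns one JSON object instead of a stream of partial responses, which is easier to handle in simple scripts.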
I'm open to suggestions if anyone has some AI workflows they have found useful.
- The slow SOC Engineer at w8security