In this post I present some of the challenges you might face with IOMMU and provide tools to identify and perhaps solve the issues. Your best friend is the pciutils package and the lspci command (see here for examples).
IOMMU – or input–output memory management unit – is a memory management unit (MMU) that connects a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. The IOMMU maps a device-visible virtual address (I/O virtual address, or IOVA) to a physical memory address. In other words, the device works with IOVAs and the IOMMU translates each one into a real physical address.
In an ideal world, every device has its own IOVA address space and no two devices share the same IOVA. In practice this is often not the case. Moreover, the PCI-Express (PCIe) specifications allow PCIe devices to communicate with each other directly in so-called peer-to-peer transactions, which bypass the IOMMU.
That is where PCI Access Control Services (ACS) are called to the rescue. ACS is able to tell whether or not these peer-to-peer transactions are possible between any two or more devices, and can disable them. ACS features are implemented within the CPU and the chipset.
Unfortunately the implementation of ACS varies greatly between CPU and chipset models. Some CPUs have good ACS support; in others it is outright unusable (see for example Xeon E3-1200, page 61). Note that Intel Xeon processors are usually an excellent choice for PCI/VGA passthrough, except perhaps this specific model.
Spaceinvader One has produced a comprehensive, easy-to-follow video on IOMMU, showing configuration examples using unRAID (a commercial solution).
If you already own a PC that you want to use for VGA passthrough, and IOMMU is supported and enabled (see my tutorial), you can check the ACS capabilities as follows:
sudo lspci -vv > lspci-vv.txt
Then open the file and search for “Access Control Services”:
gksudo xed lspci-vv.txt
or, for Linux Mint 19 / Ubuntu 18.04 and above:
xed admin://lspci-vv.txt
Here is an example from my system:
00:02.0 PCI bridge: Intel Corporation Xeon E5/Core i7 IIO PCI Express Root Port 2a (rev 07) (prog-if 00 [Normal decode])
    .
    .
    Capabilities: [110 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
        ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCap specifies the ACS capabilities: every option ending with a + (such as TransBlk+) is supported, while a trailing - (for example EgressCtrl-) indicates that the capability is not supported.
ACSCtl shows the ACS capabilities that are enabled. We can manually enable a capability, but that shouldn’t be necessary.
The ACS capabilities listed above are for one device only, in this case the PCI bridge at 00:02.0. You will hopefully find multiple “Access Control Services” entries in your lspci-vv.txt file. Mine has 6 entries, including:
3 PCI Express bridge root ports from the Intel 3930K CPU (see above);
1 PCI Express virtual root port from the X79 chipset;
2 PCI bridge ports from a PLX Technology, Inc. 8603 chip that resides on a SATA controller/USB3 board I added.
My X79 board and the Intel i7 3930K CPU provide good ACS capabilities.
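The manual search through lspci-vv.txt can also be scripted. The following is only a minimal sketch: the function name acs_summary is made up for illustration, and it assumes the usual lspci -vv layout in which each device header starts in the first column and the capability lines below it are indented.

```shell
# Print each PCI device that advertises ACS, followed by its ACSCap/ACSCtl lines.
# Reads "lspci -vv" output on standard input.
acs_summary() {
    awk '
        /^[0-9a-f]/               { dev = $0 }      # device header: starts in column 1
        /Access Control Services/ { print dev }     # this device has an ACS capability
        /ACSCap:|ACSCtl:/         { sub(/^[ \t]+/, ""); print "  " $0 }
    '
}
```

It can be fed either from the saved file (acs_summary < lspci-vv.txt) or directly from sudo lspci -vv through a pipe. Devices without an ACS capability simply do not appear in the output.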
The ultimate test, however, is the PCI device separation into independent IOMMU groups. To get a sorted list of IOMMU groups and their devices, enter:
for a in /sys/kernel/iommu_groups/*; do find "$a" -type l; done | sort --version-sort
You can get more information on the devices inside the IOMMU groups using this command line script:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done
Here is an extract of what I get:
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E5/Core i7 DMI2 [8086:3c00] (rev 07)
IOMMU Group 10 00:1c.0 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 1 [8086:1d10] (rev b5)
IOMMU Group 11 00:1c.1 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 2 [8086:1d12] (rev b5)
IOMMU Group 12 00:1c.2 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 3 [8086:1d14] (rev b5)
IOMMU Group 13 00:1c.3 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 4 [8086:1d16] (rev b5)
IOMMU Group 14 00:1c.4 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 5 [8086:1d18] (rev b5)
IOMMU Group 15 00:1c.7 PCI bridge [0604]: Intel Corporation C600/X79 series chipset PCI Express Root Port 8 [8086:1d1e] (rev b5)
IOMMU Group 16 00:1d.0 USB controller [0c03]: Intel Corporation C600/X79 series chipset USB2 Enhanced Host Controller #1 [8086:1d26] (rev 05)
IOMMU Group 17 00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a5)
For a tree view of the PCIe bus and devices, use: lspci -t
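For a quick overview of how well the devices are separated, it can help to count the devices in each group. This is a rough sketch rather than a finished tool: group_sizes is a made-up name, and the optional directory argument exists only so the function can be pointed at a copy of the tree. An empty groups directory is treated as "IOMMU not active", on the assumption that the kernel creates groups only when the IOMMU is enabled.

```shell
# List every IOMMU group with the number of devices it contains.
# $1: groups directory (defaults to the sysfs path used in this post)
group_sizes() {
    root=${1:-/sys/kernel/iommu_groups}
    # No entries at all: the kernel created no groups (IOMMU disabled or unsupported).
    if [ -z "$(ls -A "$root" 2>/dev/null)" ]; then
        echo "no IOMMU groups found under $root" >&2
        return 1
    fi
    for g in "$root"/*; do
        n=$(( $(ls "$g/devices" | wc -l) ))     # arithmetic strips any padding from wc
        printf '%s %s\n' "${g##*/}" "$n"
    done | sort -n | awk '{ printf "IOMMU group %s: %s device(s)\n", $1, $2 }'
}
```

Calling group_sizes without arguments reads the live system. Groups that report more than one device are the ones worth a closer look.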
Passing through PCI or VGA devices requires you to pass through all devices within an IOMMU group. The exception to this rule is PCI root devices that reside in the same IOMMU group as the device(s) we want to pass through. These root devices cannot be passed through, as they often perform important tasks for the host. A number of (Intel) CPUs, usually consumer-grade CPUs with integrated graphics (IGD), share a root device in the same IOMMU group as the first PCIe 16x slot.
Let’s have a look at some of the IOMMU groups from the list above:
/sys/kernel/iommu_groups/19/devices/0000:01:00.0 – this is the Nvidia Quadro 2000 GPU in the first PCIe 16x port
/sys/kernel/iommu_groups/19/devices/0000:01:00.1 – this is the audio part of the Nvidia Quadro 2000 GPU
/sys/kernel/iommu_groups/20/devices/0000:02:00.0 – this is the Nvidia GTX 970 GPU in the second PCIe 16x port
/sys/kernel/iommu_groups/20/devices/0000:02:00.1 – this is the audio part of the Nvidia GTX 970 GPU
I’m passing through the GTX 970. Since this card and its audio function are the only devices in the IOMMU group, passthrough is a piece of cake.
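To check which devices would have to travel together with a given card, one can look up everything that shares its group. The sketch below is illustrative only: group_mates is a hypothetical helper name, and it expects the full PCI address including the domain prefix (0000:), as it appears in the sysfs paths above.

```shell
# Print every device that shares an IOMMU group with the given PCI address.
# $1: full PCI address including domain, e.g. 0000:02:00.0
# $2: groups directory (defaults to the sysfs path used in this post)
group_mates() {
    bdf=$1
    root=${2:-/sys/kernel/iommu_groups}
    for d in "$root"/*/devices/"$bdf"; do
        [ -e "$d" ] || continue      # glob did not match: device is in no group
        ls "${d%/*}"                 # list all entries of that group's devices directory
    done
}
```

On my system, group_mates 0000:02:00.0 would list only the GTX 970 and its audio function, which is exactly the situation you want.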
What if there are other devices in my IOMMU group?
If you want to pass through a graphics card or PCIe device and there are one or more other devices in that same IOMMU group, passthrough can become challenging. Below I’m referring to the graphics device, but the same goes for any PCI device.
1. The graphics card and one or more PCI root ports share the same IOMMU group. Pass through the graphics card and the audio part and leave the root port to the host; that should work. Here is an example of root ports on an Intel Skylake system:
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/7/devices/0000:04:00.0
/sys/kernel/iommu_groups/7/devices/0000:04:00.1
To see what kind of device is associated with PCI slot 00:1c.0, use the following command:
lspci -s 00:1c.0
Both 00:1c.0 and 00:1c.4 are root ports and cannot and need not be passed to the guest! You can retrieve more information using the lspci -nnk command:
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #3 [8086:a112] (rev f1)
    Kernel driver in use: pcieport
    Kernel modules: shpchp
00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
    Kernel driver in use: pcieport
    Kernel modules: shpchp
Besides these two root ports, 04:00.0 and 04:00.1 designate a graphics card residing in PCIe x16 slot 1 on the board. Passing through this graphics card should not pose a problem. But what if it does, or what if those other devices aren’t root ports? See 2 below.
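Telling bridges apart from ordinary devices inside one group can be sketched with the class file that sysfs exposes for each PCI device. The [0604] shown by lspci -nn above is the PCI bridge class, and sysfs reports it as 0x0604 followed by two more hex digits. The function name classify_group is made up for this example, and the sketch does not distinguish root ports from other kinds of PCI bridges.

```shell
# Label each device in one IOMMU group as a PCI bridge or an endpoint.
# $1: group number
# $2: groups directory (defaults to the sysfs path used in this post)
classify_group() {
    grp=$1
    root=${2:-/sys/kernel/iommu_groups}
    for d in "$root/$grp"/devices/*; do
        [ -r "$d/class" ] || continue
        read -r class < "$d/class"          # e.g. 0x060400
        case $class in
            0x0604*) kind=bridge ;;         # PCI-to-PCI bridge, includes root ports
            *)       kind=endpoint ;;       # GPU, audio function, NIC, ...
        esac
        printf '%s %s %s\n' "${d##*/}" "$class" "$kind"
    done
}
```

For the Skylake example, classify_group 7 should mark 00:1c.0 and 00:1c.4 as bridges and the two functions of the graphics card as endpoints.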
Some vendors had bugs in their motherboard BIOS that have been fixed in newer releases, for example the 3.30 to 4.40 BIOS update on Ryzen/AMD X390 boards. Conversely, some vendors have introduced bugs in newer BIOS releases that may completely break IOMMU support. Search the Internet for your specific motherboard, BIOS version, and IOMMU support; other users may have posted their successes or failures. If a new BIOS release promises a fix for the IOMMU issue, get that BIOS for your motherboard and update. Be careful, though: some updates are irreversible (it has happened with Asus), and a failed update can brick your motherboard.
2. You have other devices in your IOMMU group and cannot pass through the graphics card. Upgrade your kernel. As of this writing, Linux Mint 19 (Ubuntu 18.04) ships with default kernel 4.15, but you can easily install a newer kernel. Newer kernels bring support for more chipsets, which matters especially with the latest hardware. Some chipsets don’t offer ACS but still honor device separation, which allows kernel developers to add quirks that provide the equivalent of ACS.
Important: The kernel version mentioned above quickly becomes outdated, and I can’t and won’t keep updating this tutorial with every incremental kernel release. Use the Update Manager to see if there is a newer kernel available. Check your kernel version with:
uname -r
4.15.0-121-generic
If you find you are using an older kernel, here is the way to upgrade in Linux Mint:
Open the Update Manager and select View -> Linux kernels.
[Screenshot: View kernel options in Update Manager]
Select the most recent kernel and install.
[Screenshot: Install latest kernel via Update Manager]
Then list the IOMMU groups and see if it made a difference. If yes, try again to pass through the GPU.
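To see whether a kernel upgrade changed anything, it helps to save the grouping before and after and compare the two files. A minimal sketch, with snapshot_groups as a hypothetical helper name and the file names below chosen only for illustration:

```shell
# Print one "group device" line per device, sorted, so two snapshots can be diffed.
# $1: groups directory (defaults to the sysfs path used in this post)
snapshot_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for d in "$root"/*/devices/*; do
        [ -e "$d" ] || continue
        g=${d%/devices/*}                       # strip "/devices/<address>"
        printf '%s %s\n' "${g##*/}" "${d##*/}"
    done | LC_ALL=C sort -k1,1n -k2,2
}

# Example workflow (run once under each kernel, then compare):
#   snapshot_groups > "groups-$(uname -r).txt"
#   diff groups-<old-kernel>.txt groups-<new-kernel>.txt
```

If diff prints nothing, the new kernel groups the devices exactly as before and the upgrade alone did not help.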
3. Even with the latest kernel, there are still PCIe devices besides your graphics card in the group. Move the graphics card to a different PCIe (16x) slot: Turn off your PC, unplug the power cable, and open the case. See if you can move the GPU to a different slot. Most modern motherboards have at least 2, if not 3 PCIe 16x and/or 8x long slots for graphics cards. Here is the explanation: Each PCIe slot on your motherboard corresponds to a different PCI address (BDF = Bus:Device.Function notation). In the Skylake example above, the GPU is located at 04:00.0 and the sound part at 04:00.1. By moving your graphics card to a different slot it should get a different PCI address, and may thus show up in a different IOMMU group. The same is true for every PCIe card. Sometimes the easiest and best solution is to move the cards around on the motherboard. Once you have “reshuffled” the cards, close the case and reconnect the power. Boot and list the PCI devices and IOMMU groups. Did it work?
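The Bus:Device.Function notation can be taken apart with plain shell parameter expansion, which is handy when comparing addresses before and after moving a card. This is just an illustrative sketch: parse_bdf is a made-up name, and the default domain of 0000 is an assumption for addresses printed without a domain prefix.

```shell
# Split a PCI address into its parts. Accepts "04:00.1" or "0000:04:00.1".
parse_bdf() {
    addr=$1
    func=${addr##*.}          # text after the last dot
    rest=${addr%.*}           # drop ".function"
    dev=${rest##*:}           # text after the last colon
    rest=${rest%:*}           # drop ":device"
    bus=${rest##*:}           # bus is the last remaining field
    case $rest in
        *:*) domain=${rest%%:*} ;;   # a domain prefix was given
        *)   domain=0000 ;;          # assumed when no domain prefix is present
    esac
    printf 'domain=%s bus=%s device=%s function=%s\n' "$domain" "$bus" "$dev" "$func"
}
```

For the sound part of the Skylake example, parse_bdf 04:00.1 reports bus 04, device 00, function 1.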
4. No matter what you tried, your system doesn’t have good device isolation. As a last resort, apply the ACS override patch (see instructions here), or, much more conveniently, use the latest kernel builds with the ACS override patch provided by Max Ehrlich. The builds are supplied as .deb files based on Ubuntu and can be installed via the package manager. Arch Linux users (incl. Manjaro users) can install the linux-vfio AUR package. After you install the patched kernel, you must activate the ACS override by inserting pcie_acs_override=downstream after the …iommu=on option in /etc/default/grub, then run update-grub.
A word of caution: The ACS override patch makes the kernel treat devices as isolated even where the hardware does not guarantee it, so peer-to-peer DMA between those devices may bypass the IOMMU. This is a security hole that could be exploited. Also be aware of the security risks involved in using software sources outside the official repositories, such as the kernel builds mentioned above.
Before patching the kernel, or installing the patched kernel via .deb file, do the following:
I. Make a complete backup, including operating system and user data.
II. Make sure you have at least 1 working kernel image other than the kernel you are patching. The simplest way to install another kernel is via Update Manager -> View -> Linux kernels and install a recent kernel (see screen shots under 2. above).
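After rebooting, it is worth confirming that the option actually reached the kernel. The sketch below only inspects a kernel command line; the function name acs_override_setting is hypothetical, and the optional file argument exists so that a saved command line can be checked as well.

```shell
# Print the value of pcie_acs_override from a kernel command line, if present.
# $1: file holding the command line (defaults to the running kernel's)
acs_override_setting() {
    file=${1:-/proc/cmdline}
    # Put each option on its own line, then keep only the value of our option.
    val=$(tr ' ' '\n' < "$file" | sed -n 's/^pcie_acs_override=//p')
    [ -n "$val" ] && printf '%s\n' "$val"
}
```

Note that finding the option on the command line does not prove that the kernel is patched, since an unpatched kernel does not act on it. A patched kernel announces the override in its boot log with the warning "PCIe ACS overrides enabled", which can be searched for with dmesg | grep -i acs.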
Buying computer hardware
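If you’re buying a new computer or planning to build one, take a careful look at the specifications. Here is a non-exhaustive checklist:
IOMMU support in the CPU: Intel VT-d or AMD-Vi (AMD IOMMU). Note that AMD SVM, like Intel VT-x, is CPU virtualization and does not by itself provide an IOMMU.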
IOMMU support in the motherboard / BIOS: check the specifications and manual on how to enable IOMMU! See also https://passthroughpo.st/vfio-increments/ for a hardware parts list used with VGA passthrough.
Discrete graphics card with UEFI support.
CPU ACS support – how well is it implemented? See link above and search the Internet.
Once you have made a hardware shortlist, check the Internet / forums for success stories.
5 thoughts on “IOMMU Groups – What You Need to Consider”
Hello!
First of all, this post is one of the best I’ve read about this IOMMU, ACS topic, very clear, understandable and useful.
But I still have a problem with splitting my IOMMU groups, so let’s start from the beginning. I have an Asrock J3455-ITX motherboard with the latest Proxmox version. On the motherboard there are an onboard NIC, an external NIC (plugged into the PCIe slot) and an external Wifi card (plugged into the M.2 slot). On the Proxmox host I have pfSense running in a VM and I would like to pass through my external NIC and the Wifi card, but they are in the same IOMMU group as the onboard NIC.
See my dmesg output:
dmesg |grep -i iommu
[ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.15.18-10-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.000000] Kernel command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.15.18-10-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream
[ 0.000000] DMAR: IOMMU enabled
[ 0.004000] DMAR-IR: IOAPIC id 1 under DRHD base 0xfed65000 IOMMU 1
[ 1.340272] iommu: Adding device 0000:00:00.0 to group 0
[ 1.340287] iommu: Adding device 0000:00:02.0 to group 1
[ 1.340306] iommu: Adding device 0000:00:0e.0 to group 2
[ 1.340325] iommu: Adding device 0000:00:0f.0 to group 3
[ 1.340339] iommu: Adding device 0000:00:12.0 to group 4
[ 1.340384] iommu: Adding device 0000:00:13.0 to group 5
[ 1.340408] iommu: Adding device 0000:00:13.1 to group 5
[ 1.340427] iommu: Adding device 0000:00:13.2 to group 5
[ 1.340447] iommu: Adding device 0000:00:13.3 to group 5
[ 1.340465] iommu: Adding device 0000:00:15.0 to group 6
[ 1.340487] iommu: Adding device 0000:00:1f.0 to group 7
[ 1.340500] iommu: Adding device 0000:00:1f.1 to group 7
[ 1.340515] iommu: Adding device 0000:01:00.0 to group 5
[ 1.340527] iommu: Adding device 0000:02:00.0 to group 5
[ 1.340538] iommu: Adding device 0000:03:00.0 to group 5
[ 1.340549] iommu: Adding device 0000:04:00.0 to group 5
What I already did and tried:
– Updated the BIOS and turned on intel virtualization (VT-x, VT-d)
– updated the grub with intel_iommu=on iommu=pt pcie_acs_override=downstream (tried downstream,multifunction as well)
– loaded vfio, vfio_iommu_type1, vfio_pci, vfio_virqfd kernel modules at boot time (/etc/modules)
– write: options vfio_iommu_type1 allow_unsafe_interrupts=1 to /etc/modprobe.d/iommu_unsafe_interrupts.conf
– write: options vfio-pci ids=[8086:5ad9],[8086:5ada] to /etc/modprobe.d/vfio.conf
Do you have any idea what could be the problem?
Thank you
The ACS patch should only be used if absolutely necessary. Since you have all your network cards in the same IOMMU group, this might be an option.
The “pcie_acs_override=downstream” option in your grub file only works if you have applied the ACS patch to your kernel.
Instead of the ACS patch, or in addition to it, you should try to plug the external NIC into another PCIe slot. This might put it into another IOMMU group.
If nothing helps, it may well be your motherboard and/or CPU that imposes these limits. As a last try, post your problem to the following Reddit group: https://www.reddit.com/r/VFIO/
Some of the best brains on kvm are in that group, so perhaps they can provide an answer.
I am very grateful.
This is perhaps one of the clearest blogs I have read on systems :). I have a similar problem: I am trying to get the Xilinx OpenNIC design, which has 2 NICs on the same FPGA board, to work with a different driver for each NIC. One should use vfio (for DPDK) and the other the vendor’s Linux driver to support normal kernel network I/O. However, whichever driver I load last always gives me an error.
The first step is to check if the NICs on your Xilinx OpenNIC fall into two different IOMMU groups. If that isn’t the case, there is no point in trying to pass through one NIC. If each NIC is in its own IOMMU group, you should be able to pass through one of them. The passthrough method depends on what you prefer or what works in your system. I wrote a separate blog on that. It’s probably best to use a method that binds one NIC to the vfio-pci driver right at system boot time, before the Linux driver takes over. After reboot, check with lspci -v to see which driver is used. You can use lspci -n to get the vendor id:device id pairs for your NICs. If they are the same for both NICs, use the driver override feature with the PCI bus address (find it via lspci -v), which should be different for each NIC. Hope it helps. If not, the VFIO Discord group is probably the best forum to seek help.
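For reference, the driver override mentioned in this reply works through a few sysfs files. The following is a rough runtime sketch, not the boot-time setup recommended above: bind_to_vfio is a made-up name, the address 0000:04:00.0 in the example call is hypothetical, and it assumes root privileges and an already loaded vfio-pci module. The optional second argument only allows pointing the function at a different sysfs root.

```shell
# Bind a single PCI device, selected by address, to the vfio-pci driver.
# $1: full PCI address including domain, e.g. 0000:04:00.0
# $2: sysfs mount point (defaults to /sys)
bind_to_vfio() {
    bdf=$1
    sys=${2:-/sys}
    dev=$sys/bus/pci/devices/$bdf
    [ -d "$dev" ] || { echo "no such device: $bdf" >&2; return 1; }
    # Tell the kernel that only vfio-pci may claim this particular device.
    echo vfio-pci > "$dev/driver_override"
    # Detach whatever driver currently owns the device, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # Ask the kernel to look for a driver again; the override now selects vfio-pci.
    echo "$bdf" > "$sys/bus/pci/drivers_probe"
}
```

A call such as bind_to_vfio 0000:04:00.0 affects only that one address, so a second NIC with the same vendor id:device id pair keeps its regular driver. Afterwards lspci -v should show which driver is in use.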
Hi !
Thank you for your article !
I have a question regarding IOMMU with E3-1245 v2.
On a proxmox installation, VT-x and VT-d enabled in BIOS (can’t confirm myself as it is a dedicated server, but support told me they are both enabled).
I have nothing at all in /sys/kernel/iommu_groups/
No matter what I tried… Using legacy BIOS with grub… followed the PCI passthrough tutorial from Proxmox, but can’t make it work.
Whereas my personal server at home works perfectly fine with passthrough…
# dmesg | egrep -i "dmar|iommu|remapping"
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.15.104-1-pve root=UUID=8a1c83b5-ab35-41d3-85af-aefe9389383f ro quiet intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream,multifunction initcall_blacklist=sysfb_init
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.067198] Kernel command line: BOOT_IMAGE=/vmlinuz-5.15.104-1-pve root=UUID=8a1c83b5-ab35-41d3-85af-aefe9389383f ro quiet intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 pcie_acs_override=downstream,multifunction initcall_blacklist=sysfb_init
[ 0.067262] DMAR: IOMMU enabled
[ 0.181205] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.265862] iommu: Default domain type: Passthrough (set via kernel command line)
# ls -l /sys/kernel/iommu_groups/
total 0