hosted services
Virtual Box, originally from Sun, was wonderful: it worked well and it was free. A long time ago VMware was the usual choice for virtualising desktops, and when Virtual Box was introduced it became a sensible alternative. Virtual Box (vbox) requires a kernel module to be rebuilt each time the host kernel is upgraded. Running the latest kernel can therefore be a problem if the vbox module cannot be built against it.
QEMU uses a stock Linux kernel module, kvm, which should mean there will be fewer incompatibilities.
There are some nice features of vbox, such as simple network isolation, USB pass-through and a basic GUI. It isn't without issues: since all the tooling is proprietary, the command line suffers.
getting started
First, create a disk. Let's make a Debian system and name it debian:
qemu-img create -f qcow2 debian.qcow2 20G
We'll need some way to start the VM, so let's make a script that includes all the options. Starting the VM needs elevated privileges; I'm using please:
please qemu-system-x86_64 \
-smp 4 `# we're going to give it four cores` \
-m 4g `# allocating 4G/RAM to the VM` \
-machine q35,vmport=on,dump-guest-core=off `# q35 is the PCIe-capable Q35/ICH9 machine type` \
-enable-kvm `# use hardware virtualisation` \
-cpu host,migratable=on `# guest CPU model will match the host CPU` \
-drive file=debian.qcow2 `# use the newly-allocated file as disk` \
-vga virtio `# virtio has good performance graphics` \
-display gtk,grab-on-hover=on `# grab-on-hover captures key and mouse events` \
-netdev user,id=net0,hostfwd=tcp::5555-:22 `# forward host tcp/5555 to guest tcp/22 on net0` \
-device e1000,netdev=net0 `# attach an emulated e1000 NIC to net0` \
-cdrom debian-live-11.5.0-amd64-xfce.iso `# use a debian live ISO to boot from`
The above should get a VM running. What I find really handy is that in this form things are very easy to change and can even be kept in a version control system like git. The comments are in backtick form, a trick picked up from Stack Overflow.
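The backtick trick works because each `# ...` is a command substitution whose body is nothing but a comment: it runs nothing and expands to nothing, so the shell drops it before the command sees its arguments. A small sketch to convince yourself, using made-up count_args and demo functions (neither has anything to do with QEMU):

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# demo passes four real words; the backtick comments expand to nothing
# and disappear during word splitting.
demo() {
    count_args \
        -smp 4 `# four cores` \
        -m 4g `# 4G of RAM`
}

demo
```

This prints 4, which is the number of real words on the command line.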
You'll need to download the ISO from somewhere like debian cdimages.
Moving from a GUI to a script that starts a VM might seem an odd thing to do, but there are some major benefits if doing this on a regular basis.
According to a Red Hat presentation, adding aio=io_uring to a -drive file=... option should improve performance. I've not found it to negatively impact VMs so far.
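As a rough sketch, the option is simply appended to the existing -drive value. This assumes a QEMU build with io_uring support and a host kernel that provides it, neither of which I've checked for here; drive_arg is a hypothetical helper name, not a QEMU tool.

```shell
#!/bin/sh
# drive_arg (hypothetical helper) builds the value for -drive.
# $1 = image file, $2 = optional aio engine (e.g. io_uring); left out when empty.
drive_arg() {
    if [ -n "${2:-}" ]; then
        printf 'file=%s,aio=%s\n' "$1" "$2"
    else
        printf 'file=%s\n' "$1"
    fi
}

drive_arg debian.qcow2 io_uring   # prints file=debian.qcow2,aio=io_uring
```

In the start script this would be used as -drive "$(drive_arg debian.qcow2 io_uring)", which makes it easy to drop the engine again if a host turns out not to support it.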
usb
QEMU permits passing USB devices to a VM by vendor/product ID. The simplest way to find the IDs is to run lsusb. Assuming we want to pass a Logitech mouse:
Bus 007 Device 003: ID 046d:c542 Logitech, Inc. Wireless Receiver
We'd add the following to our QEMU script:
-device qemu-xhci,id=xhci
-device usb-host,bus=xhci.0,vendorid=0x046d,productid=0xc542
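Turning an lsusb line into the matching option is mechanical, so it can be scripted. A minimal sketch, assuming the usual "Bus NNN Device NNN: ID vvvv:pppp ..." lsusb format and the xhci bus name used above; usb_host_arg is a made-up helper name:

```shell
#!/bin/sh
# usb_host_arg (hypothetical helper) reads lsusb-style lines on stdin and prints
# a usb-host device option for each "ID vvvv:pppp" it finds.
usb_host_arg() {
    sed -n 's/^Bus [0-9]* Device [0-9]*: ID \([0-9a-fA-F]\{4\}\):\([0-9a-fA-F]\{4\}\).*/-device usb-host,bus=xhci.0,vendorid=0x\1,productid=0x\2/p'
}

# Example with the receiver from above; in practice: lsusb | grep -i logitech | usb_host_arg
echo 'Bus 007 Device 003: ID 046d:c542 Logitech, Inc. Wireless Receiver' | usb_host_arg
```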
On the surface of it, it should be that simple. Under the surface, my understanding is that QEMU has to emulate the controller and relay the data. Virtual Box probably performs some magic to circumvent this, as passing a USB device to a guest works very well there.
If you need to pass something that needs realtime behaviour, it may be a struggle. The solution that I had to go with required getting a PCI USB controller and passing the whole card to the guest. I've not had to add hardware to solve a software problem for a very long time.
This motherboard has two PCIe slots, one wide and one narrow. The narrow slot is intended for things just like USB cards; however, the IOMMU group it is assigned to wasn't free, so I had to move the card to the wide slot.
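To see which devices share an IOMMU group before moving cards around, sysfs can be walked directly. A minimal sketch, assuming the usual /sys/kernel/iommu_groups/GROUP/devices/ADDRESS layout and that the IOMMU is enabled at all (otherwise nothing is printed); list_iommu_groups is a made-up name:

```shell
#!/bin/sh
# list_iommu_groups (hypothetical helper): print "group N: ADDRESS" for every
# device found under the given root (default: /sys/kernel/iommu_groups).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded pattern when there are no groups at all.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"
        printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
    done
}

list_iommu_groups
```

A card can only be handed to a guest cleanly if everything else in its group is either passed along with it or not in use by the host, which is why a crowded group forces a change of slot.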
Once the slot is sorted out, you'll need to take some notes from lspci -vnn:
...
10:00.0 USB controller [0c03]: USB 3.0 Host Controller [1b73:1100] (rev 10) (prog-if 30 [XHCI])
...
The device at that address (10:00.0) needs to be unbound from its kernel driver (xhci_hcd) by writing the address to the driver's unbind file:
echo 0000:10:00.0 | please tee /sys/bus/pci/drivers/xhci_hcd/unbind
Then load vfio-pci:
please modprobe vfio-pci
Tell it to bind the PCIe card:
echo 1b73 1100 | please tee /sys/bus/pci/drivers/vfio-pci/new_id >/dev/null
The three statements above (modified for your needs) should go at the top of the start script.
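The two values those statements need (the full address for unbind and the vendor/device pair for new_id) can be pulled out of the lspci line rather than copied by hand. A rough sketch; pci_addr and pci_ids are hypothetical helpers, and the 0000: prefix assumes PCI domain 0, which lspci leaves out unless run with -D:

```shell
#!/bin/sh
# pci_ids (hypothetical helper): print "vendor device" from an lspci -nn device
# line, in the space-separated form that new_id expects.
pci_ids() {
    sed -n 's/^[0-9a-f]\{2\}:.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/\1 \2/p'
}

# pci_addr (hypothetical helper): print the full address for the unbind step.
# Assumption: PCI domain 0000.
pci_addr() {
    sed -n 's/^\([0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\) .*/0000:\1/p'
}

line='10:00.0 USB controller [0c03]: USB 3.0 Host Controller [1b73:1100] (rev 10) (prog-if 30 [XHCI])'
echo "$line" | pci_ids    # prints 1b73 1100
echo "$line" | pci_addr   # prints 0000:10:00.0
```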
Finally, we can pass this device into the guest by adding this device line:
-device vfio-pci,host=10:00.0
From the guest point of view, there's now a whole four-port USB controller attached.
network
The final thing I bumped into in this migration was finding an analogue of Virtual Box's internal networking. To get something similar I needed to create a bridge interface and tap off that. My layout was a "rogue" VM and a squid proxy that it could communicate through.
Both needed to be attached to the bridge via a tap.
My squid VM script has some extra setup at the start to create the bridge network, which I don't want to filter with iptables.
Any VM that needs to attach to this network uses a tap network device that connects it to the bridge, where traffic is conditionally routed.
My squid VM script has the following to check for br0 and, if it isn't present, create it:
ip a s br0 >/dev/null || please ip link add br0 type bridge
please ip link set br0 up promisc on
I don't want to filter traffic on the bridge interface itself, so I have the following to wait for the bridge to appear in proc and then disable filtering:
until test -e "/proc/sys/net/bridge/bridge-nf-call-iptables"; do
    sleep 0.5
done
please sysctl net.bridge.bridge-nf-call-iptables=0
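The loop above waits forever if the file never shows up; as far as I can tell that is what happens when the br_netfilter module isn't loaded. A bounded variant is sketched here, with wait_for_path as a made-up helper name and fractional sleep assumed to be available (it is in GNU coreutils and busybox):

```shell
#!/bin/sh
# wait_for_path (hypothetical helper): poll until $1 exists, giving up after
# $2 attempts (default 20) spaced 0.5s apart. Returns 0 if found, 1 on timeout.
wait_for_path() {
    tries="${2:-20}"
    while [ "$tries" -gt 0 ]; do
        [ -e "$1" ] && return 0
        tries=$((tries - 1))
        sleep 0.5
    done
    return 1
}

# Intended use in the start script (commented out here: it needs the bridge and root):
# wait_for_path /proc/sys/net/bridge/bridge-nf-call-iptables \
#     && please sysctl net.bridge.bridge-nf-call-iptables=0
```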
Here's the qemu script to start the squid VM, note the tap interface:
please qemu-system-x86_64 \
-m 256 \
-machine pc-q35-5.0,vmport=on,dump-guest-core=off \
-cdrom debian-11.5.0-amd64-netinst.iso \
-enable-kvm \
-cpu host,migratable=on \
-drive file=squid.qcow2,if=virtio \
-vnc :3 \
-netdev user,id=net0,hostfwd=tcp::3128-:3128,hostfwd=tcp::3142-:3142,hostfwd=tcp::3122-:22,hostfwd=tcp::3180-:80,hostfwd=tcp::3143-:443 \
-device e1000,netdev=net0 \
-device e1000,netdev=net1,mac=DE:AD:BE:EF:CA:F0 \
-netdev tap,id=net1,script=qemu-ifup
Ports 3128 and 3142 are forwarded for squid and apt-cacher. net0 attaches through the host's interface using user-mode networking, whilst net1 sits on br0.
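Long hostfwd lists are easy to mistype, so one option is to build the -netdev value from host:guest port pairs. A minimal sketch; user_netdev is a hypothetical helper and only handles TCP forwards on all host addresses, like the ones above:

```shell
#!/bin/sh
# user_netdev (hypothetical helper): build a -netdev user value.
# $1 = netdev id, remaining args = host:guest TCP port pairs.
user_netdev() {
    out="user,id=$1"
    shift
    for pair in "$@"; do
        out="$out,hostfwd=tcp::${pair%%:*}-:${pair##*:}"
    done
    printf '%s\n' "$out"
}

user_netdev net0 3128:3128 3142:3142 3122:22
# prints user,id=net0,hostfwd=tcp::3128-:3128,hostfwd=tcp::3142-:3142,hostfwd=tcp::3122-:22
```

In the start script this would replace the literal value: -netdev "$(user_netdev net0 3128:3128 3142:3142 3122:22 3180:80 3143:443)".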
Contents of qemu-ifup:
#!/bin/sh
set -x
switch=br0
if [ -n "$1" ]; then
    # tunctl -u `whoami` -t $1 (use ip tuntap instead!)
    ip tuntap add "$1" mode tap user "$(whoami)"
    ip link set "$1" up
    sleep 0.5s
    # brctl addif $switch $1 (use ip link instead!)
    ip link set "$1" master "$switch"
    exit 0
else
    echo "Error: no interface specified" >&2
    exit 1
fi
The VMs that need to be isolated to br0 use qemu-ifup in the same way, with a -netdev tap/-device line pair like the one above:
please qemu-system-x86_64 \
-m 2G \
-machine pc,vmport=on,dump-guest-core=off \
-enable-kvm \
-cpu host,migratable=on \
-drive file=win.qcow2,if=virtio \
-vga virtio \
-display gtk,grab-on-hover=on \
-device e1000,netdev=net0,mac=DE:AD:BE:EF:CA:F8 \
-netdev tap,id=net0,script=qemu-ifup \
-device qemu-xhci,id=xhci \
-device usb-host,bus=xhci.0,vendorid=0x0da4,productid=0x0008 \
-device usb-host,bus=xhci.0,vendorid=0x0da4,productid=0x0009
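Each VM on br0 needs its own MAC address, which the scripts above hard-code. If the number of VMs grows, one option is to derive the address from the VM name. A sketch under stated assumptions: vm_mac is a made-up helper, md5sum from coreutils is available, and the 52:54:00 prefix is borrowed from the addresses QEMU hands out by default.

```shell
#!/bin/sh
# vm_mac (hypothetical helper): derive a stable MAC address from a VM name.
# The last three octets are the first six hex digits of the name's md5.
vm_mac() {
    h=$(printf '%s' "$1" | md5sum | cut -c1-6)
    printf '52:54:00:%s:%s:%s\n' \
        "$(printf '%s' "$h" | cut -c1-2)" \
        "$(printf '%s' "$h" | cut -c3-4)" \
        "$(printf '%s' "$h" | cut -c5-6)"
}

vm_mac squid
vm_mac win
```

The same name always gives the same address, so a guest keeps its identity across restarts without a MAC being written into every script.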
suspend
Sometimes after suspend there are oddities with the VM. Sometimes USB doesn't wake up correctly, so it needs to be reset.
This can be done by rebinding the controllers within the guest; however, I'm not sure how to trigger this automatically within the guest. Perhaps comparing the monotonic clock against the wall clock would do it.
#!/bin/sh
# Rebind every device on the USB host controller drivers (xhci_hcd, uhci_hcd, ...).
if test "$(id -u)" -ne 0; then
    echo "This must be run as root!" >&2
    exit 1
fi
for XHCI in /sys/bus/pci/drivers/*hci_hcd; do
    if ! cd "${XHCI}"; then
        echo "Could not cd to ${XHCI}" >&2
        continue
    fi
    echo "Rebinding ${XHCI}..."
    for i in *:*:*.*; do
        # Skip the literal pattern when no device is bound to this driver.
        test -e "$i" || continue
        printf '%s' "$i" > unbind
        printf '%s' "$i" > bind
    done
done
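On the clock idea: a crude approximation that needs nothing beyond the shell is to sleep for a fixed interval and check how much wall-clock time really went by. This is an untested sketch; whether the guest's wall clock actually jumps across a host suspend depends on the clock source, and slept_too_long and rebind-usb.sh (standing in for the script above) are made-up names.

```shell
#!/bin/sh
# slept_too_long (hypothetical helper): $1 and $2 are wall-clock seconds taken
# before and after a sleep of $3 seconds. Succeeds if more than $3 + $4 seconds
# went by, which suggests the machine was suspended in between.
slept_too_long() {
    [ $(( $2 - $1 )) -gt $(( $3 + $4 )) ]
}

# Sketch of a watcher loop, commented out because it never returns:
# while :; do
#     before=$(date +%s)
#     sleep 10
#     after=$(date +%s)
#     slept_too_long "$before" "$after" 10 5 && ./rebind-usb.sh
# done
```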
Another issue I've found is peculiar network behaviour after suspend. I suspect this is related to the physical network dropping whilst the virtual networks (br0 and tap0) remain up. Here's a down/up script named 99bounce_interfaces.sh that I keep in /usr/lib/pm-utils/sleep.d:
#!/bin/sh
MAIN_IF=`ip route get 1.1.1.1 | grep dev | sed -e 's/.* dev \([^ ]\+\) .*/\1/g'`
for IF in `ip a | grep -E '^[0-9]' | grep -v "${MAIN_IF}" | sed -e 's/^[0-9]\+: \([^ @]\+\)\(@.*\+\)\?: .*/\1/g'`; do
echo "$IF"
ip link set dev "${IF}" down
done
sleep 1
for IF in `ip a | grep -E '^[0-9]' | grep -v "${MAIN_IF}" | sed -e 's/^[0-9]\+: \([^ @]\+\)\(@.*\+\)\?: .*/\1/g'`; do
echo "$IF"
ip link set dev "${IF}" up
done
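The interface list in that script is filtered with grep -v, which is a substring match on the whole line, so an interface whose line merely mentions the main one would be skipped too. An alternative sketch using the one-line-per-interface output of ip -o link show with an exact name comparison; other_ifaces is a hypothetical helper:

```shell
#!/bin/sh
# other_ifaces (hypothetical helper): read `ip -o link show` output on stdin and
# print every interface name except $1, stripping any @parent suffix.
other_ifaces() {
    awk -F': ' -v main="$1" '{ split($2, name, "@"); if (name[1] != main) print name[1] }'
}

# Example with canned input; in the hook: ip -o link show | other_ifaces "$MAIN_IF"
other_ifaces eth0 <<'EOF'
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
3: br0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500
4: veth1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
EOF
```

With the canned input this prints lo, br0 and veth1, one per line.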
connmand may also need to be configured with the following in /etc/connman/main.conf. It should prevent the tap device from setting itself as the default route when the device comes up.
NetworkInterfaceBlacklist=tap