
VirtualBox, originally from Sun, was wonderful: capable and free. A long time ago VMware was the choice for virtualising desktops, so when VirtualBox appeared it was a sensible alternative. VirtualBox (vbox) requires a kernel module to be rebuilt each time the host kernel is upgraded, and running the latest kernel can be a problem if the vbox module won't build against it.

QEMU uses a stock Linux kernel module, KVM, which should mean fewer incompatibilities.
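
It's worth checking that KVM is actually available on the host before going further; if the module is loaded and /dev/kvm exists, QEMU can use hardware virtualisation:

lsmod | grep kvm    # expect kvm plus kvm_intel or kvm_amd
ls -l /dev/kvm      # the device node QEMU opens for acceleration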

VirtualBox has some nice features, such as simple network isolation, USB pass-through and a basic GUI. It isn't without issues though: since the tooling is proprietary, the command line suffers.

getting started

First, create a disk. Let's make a Debian system and name the image debian:

qemu-img create -f qcow2 debian.qcow2 20G
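
qemu-img can confirm what was created; qcow2 images are thin, so the file only grows as the guest writes to it:

qemu-img info debian.qcow2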

We'll need some way to start the VM, so let's make a script that holds all the options. Starting it needs elevation; I'm using please:

please qemu-system-x86_64 \
    -smp 4                                       `# we're going to give it four cores` \
    -m 4g                                        `# allocating 4G/RAM to the VM` \
    -machine q35,vmport=on,dump-guest-core=off   `# q35 resembles a modern x86-64 machine` \
    -enable-kvm                                  `# use hardware virtualisation` \
    -cpu host,migratable=on                      `# the guest CPU matches the host CPU` \
    -drive file=debian.qcow2                     `# use the newly-allocated file as disk` \
    -vga virtio                                  `# virtio gives good graphics performance` \
    -display gtk,grab-on-hover=on                `# grab-on-hover captures key and mouse events` \
    -netdev user,id=net0,hostfwd=tcp::5555-:22   `# forward host tcp/5555 to guest tcp/22 on net0` \
    -device e1000,netdev=net0                    `# create net0 as e1000` \
    -cdrom debian-live-11.5.0-amd64-xfce.iso     `# use a debian live ISO to boot from`

The above should get a VM running. What I find really handy is that in this form everything is easy to change and can be kept in a version control system like git. The comments are in backtick form, a trick I picked up from Stack Overflow.

You'll need to download the ISO from somewhere like the Debian cdimage site.
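
Something along these lines works, although the exact directory and filename on cdimage.debian.org will have moved on since this release:

wget https://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-11.5.0-amd64-xfce.iso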

Moving from a GUI to a script that starts a VM might seem an odd thing to do, but there are some major benefits if doing this on a regular basis.

According to a Red Hat presentation, adding aio=io_uring to a -drive file=... option should improve performance. I've not found it to negatively impact VMs so far.
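
Applied to the start script above, the drive line would then look something like this:

-drive file=debian.qcow2,aio=io_uring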

usb

QEMU permits passing USB devices to a VM by vendor/product ID. The simplest way to find these is to run lsusb; say we want to pass a Logitech mouse:

Bus 007 Device 003: ID 046d:c542 Logitech, Inc. Wireless Receiver

We'd add the following to our QEMU script:

-device qemu-xhci,id=xhci
-device usb-host,bus=xhci.0,vendorid=0x046d,productid=0xc542

On the surface of it, it should be that simple. Under the surface, my understanding is that QEMU has to emulate the controller and relay the data, whereas VirtualBox performs some magic that presumably avoids this, since passing a USB device to a guest works very well there.

If you need to pass something that needs realtime behaviour, it may be a struggle. The solution I ended up with was to buy a PCIe USB controller and pass the whole card to the guest. I've not had to add hardware to solve a software problem for a very long time.

This motherboard has two PCIe slots, one wide and one narrow. The narrow slot is intended for exactly this kind of USB card; however, the IOMMU group it sits in wasn't free, so I had to relocate the card to the wide slot.
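
The groupings can be inspected from sysfs before deciding which slot to use; each symlink maps a group number to a PCI address, and ideally the USB controller ends up in a group of its own:

find /sys/kernel/iommu_groups/ -type l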

Once the slot is sorted out, you'll need to take some notes from lspci -vnn:

...
10:00.0 USB controller [0c03]: USB 3.0 Host Controller [1b73:1100] (rev 10) (prog-if 30 [XHCI])
...

The device in that slot (10:00.0) needs to be unbound from its kernel driver (xhci_hcd) by writing its address to the driver's unbind file:

echo 0000:10:00.0 | please tee /sys/bus/pci/drivers/xhci_hcd/unbind

Then load vfio-pci:

please modprobe vfio-pci

Tell it to bind the PCIe card:

echo 1b73 1100 | please tee /sys/bus/pci/drivers/vfio-pci/new_id >/dev/null

The three statements above (modified for your needs) should go at the top of the start script.
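
Before starting the VM it's worth confirming the card really did move over to vfio-pci; lspci reports the driver in use:

lspci -nnk -s 10:00.0   # expect "Kernel driver in use: vfio-pci"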

Finally, we can pass this device into the guest by adding this device line:

-device vfio-pci,host=10:00.0

From the guest's point of view, there's now a whole four-port USB controller attached.

network

The final thing I bumped into in this migration was finding an analogue of VirtualBox's internal networking. To get something similar I needed to create a bridge interface and attach tap devices to it. My layout was a "rogue" VM and a squid proxy that it could communicate through.

Both needed to be attached to the bridge via a tap.

My squid VM has some extra setup at the start to create a bridge network, which I don't want to filter with iptables.

Any VM that needs to attach to this network uses a tap network device (shown in the scripts below) that connects it to the bridge, and the bridge conditionally routes traffic.

My squid VM script has the following to check for br0, creating it if it isn't present:

ip a s br0 >/dev/null || please ip link add br0 type bridge
please ip link set br0 up promisc on

I don't want to filter traffic on the bridge interface itself, so I have the following to wait for the bridge to appear in proc and then disable filtering:

while /bin/true; do
  if test ! -e "/proc/sys/net/bridge/bridge-nf-call-iptables";
  then
    sleep 0.5
    continue
  fi
  break
done
please sysctl net.bridge.bridge-nf-call-iptables=0

Here's the QEMU script to start the squid VM; note the tap interface:

please qemu-system-x86_64 \
  -m 256 \
  -machine pc-q35-5.0,vmport=on,dump-guest-core=off \
  -cdrom debian-11.5.0-amd64-netinst.iso \
  -enable-kvm \
  -cpu host,migratable=on \
  -drive file=squid.qcow2,if=virtio \
  -vnc :3 \
  -netdev user,id=net0,hostfwd=tcp::3128-:3128,hostfwd=tcp::3142-:3142,hostfwd=tcp::3122-:22,hostfwd=tcp::3180-:80,hostfwd=tcp::3143-:443 \
  -device e1000,netdev=net0 \
  -device e1000,netdev=net1,mac=DE:AD:BE:EF:CA:F0 \
  -netdev tap,id=net1,script=qemu-ifup

Redirection of ports 3128 and 3142 is for squid and apt-cacher. net0 attaches to the host's interface whilst net1 sits on br0.
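
Assuming squid's ACLs allow requests arriving over the user-mode network, the forward can be sanity-checked from the host by pointing a client at the redirected port:

curl -I -x http://127.0.0.1:3128 http://deb.debian.org/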

Contents of the qemu-ifup:

#!/bin/sh
set -x
switch=br0
if [ -n "$1" ];then
  # tunctl -u `whoami` -t $1 (use ip tuntap instead!)
  ip tuntap add $1 mode tap user `whoami`
  ip link set $1 up
  sleep 0.5s
  # brctl addif $switch $1 (use ip link instead!)
  ip link set $1 master $switch
  exit 0
else
  echo "Error: no interface specified"
  exit 1
fi

The VMs that need to be isolated to br0 use qemu-ifup through the same -netdev tap/-device pair as above. For example:

please qemu-system-x86_64 \
  -m 2G \
  -machine pc,vmport=on,dump-guest-core=off \
  -enable-kvm \
  -cpu host,migratable=on \
  -drive file=win.qcow2,if=virtio \
  -vga virtio \
  -display gtk,grab-on-hover=on \
  -device e1000,netdev=net0,mac=DE:AD:BE:EF:CA:F8 \
  -netdev tap,id=net0,script=qemu-ifup \
  -device qemu-xhci,id=xhci \
  -device usb-host,bus=xhci.0,vendorid=0x0da4,productid=0x0008 \
  -device usb-host,bus=xhci.0,vendorid=0x0da4,productid=0x0009

suspend

Sometimes after suspend there are oddities with the VM. In particular, USB sometimes doesn't wake up correctly, so it needs to be reset.

This can be done by rebinding the controller within the guest using the script below. I'm not sure how to trigger it automatically inside the guest, though; perhaps comparing the monotonic clock against the wall clock would detect that a suspend happened.

#!/bin/sh

id | grep 'uid=0(root)' >/dev/null
if test $? -ne 0; then
    echo "This must be run as root!"
    exit 1
fi

for XHCI in /sys/bus/pci/drivers/*hci_hcd ; do
    cd "${XHCI}"
    if test $? -ne 0; then
        echo "Could not cd to ${XHCI}"
        continue
    fi

    echo "Rebinding ${XHCI}..."

    for i in *:*:*.*; do
        echo -n "$i" > unbind
        echo -n "$i" > bind
    done
done

Another issue I've found is peculiarities with the network after suspend. I suspect this is somehow related to the physical network dropping whilst the virtual interfaces (br0 and tap0) remain up. Here's a down/up script named 99bounce_interfaces.sh that I keep in /usr/lib/pm-utils/sleep.d:

#!/bin/sh

MAIN_IF=`ip route get 1.1.1.1 | grep dev | sed -e 's/.* dev \([^ ]\+\) .*/\1/g'`

for IF in `ip a | grep -E '^[0-9]' | grep -v "${MAIN_IF}" | sed -e 's/^[0-9]\+: \([^ @]\+\)\(@.*\+\)\?: .*/\1/g'`; do
    echo "$IF"
    ip link set dev "${IF}" down
done

sleep 1

for IF in `ip a | grep -E '^[0-9]' | grep -v "${MAIN_IF}" | sed -e 's/^[0-9]\+: \([^ @]\+\)\(@.*\+\)\?: .*/\1/g'`; do
    echo "$IF"
    ip link set dev "${IF}" up
done

connmand may also need to be configured with the following in /etc/connman/main.conf; it prevents the tap device from installing itself as the default route when the interface comes up.

NetworkInterfaceBlacklist=tap