Krunvm is extremely easy to use and is packed with some interesting ideas.
One of the biggest advantages of this VMM is that programs inside the VM get network access without the admin having to set up virtual bridges and the like on the host in advance, or resort to something like slirp. This is accomplished via TSI (Transparent Socket Impersonation).
Basically, sockets in the guest are bridged to AF_VSOCK by a patched Linux kernel (the patches are applied when you build libkrunfw.so) whenever they communicate outside the VM. See https://www.youtube.com/watch?v=EGV03THGrrw for more info on TSI.
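Conceptually, the impersonation amounts to rewriting the destination address family while keeping the port. This is only a hypothetical sketch of the idea, not libkrun's actual code; `inet_to_vsock` is an invented name:

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical sketch (not libkrun's actual implementation): TSI
 * intercepts an AF_INET connect() in the guest and re-targets it at an
 * AF_VSOCK address on the host side, keeping the port number. */
static struct sockaddr_vm inet_to_vsock(const struct sockaddr_in *in)
{
    struct sockaddr_vm vm = {0};
    vm.svm_family = AF_VSOCK;
    vm.svm_cid = VMADDR_CID_HOST;       /* CID 2 = the host */
    vm.svm_port = ntohs(in->sin_port);  /* vsock ports use host byte order */
    return vm;
}
```

A process on the host then proxies between that vsock connection and the real TCP/UDP socket, which is why no bridge or TAP device is needed in the guest.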
My only concern is that TSI is currently not a feature available in mainline Linux. When do the authors plan to upstream it into Linux proper? My understanding is that this was planned in 2021, but it is now 2022...
goombacloud 591 days ago [-]
Maybe, instead of patching the kernel, the "init" process of the VM could set up a seccomp-notify sandbox to handle the socket syscalls in user space, backing the TCP/UDP sockets with a vsock (I think read/write and sendmsg/recvmsg would work without userspace handling because they already operate on the vsock fd).
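A minimal sketch of what such a filter could look like, assuming Linux >= 5.0 for SECCOMP_RET_USER_NOTIF (`notify_connect` is an invented name; this traps connect(2) and allows everything else):

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* Hypothetical sketch: a classic-BPF filter that forwards connect(2) to a
 * userspace supervisor via SECCOMP_RET_USER_NOTIF and lets every other
 * syscall through. The supervisor would receive trapped calls on the fd
 * returned by seccomp(SECCOMP_SET_MODE_FILTER,
 * SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog) and back them with a vsock. */
static struct sock_filter notify_connect[] = {
    /* load the syscall number from seccomp_data */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* if nr == __NR_connect, fall through to USER_NOTIF; else skip to ALLOW */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_connect, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};
```

In a real sandbox you would also need to trap socket(2), bind(2), etc., and handle SECCOMP_IOCTL_NOTIF_ADDFD to inject the replacement vsock fd into the guest process.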
As far as I understand, any protocol that runs over IP (TCP, UDP, ICMP, ...) should be supported by slirp (i.e. slirp4netns), since it sets up a TAP device.
sidkshatriya 590 days ago [-]
P.S. Actually, slirp4netns also needs to implement a user-space network stack for each protocol, so it also depends on which protocols it understands.
I wonder if slirp4netns understands anything other than TCP/UDP/ICMP.
OTOH I don't think ping (ICMP) is possible with TSI (but maybe I needed to do some other config to make it work).
gabereiser 591 days ago [-]
Mmmmmm, Rust and Golang, working together. Shout out to the buildah team and the krunvm folks. This is very cool. The icing on the cake was the aarch64 support for Apple. Kudos to you all. I can now run experimental Linux images as VMs and see if they'll blow up.
dvnguyen 591 days ago [-]
Ask HN: what workload is not suitable for microVMs? Can I use them like a regular VM?
0xbadcafebee 591 days ago [-]
Since it's a VM, it's ideal for workloads with a set amount of resource use that need strong isolation guarantees. Regular containers are better for sharing a pool of resources whose usage varies widely, when you don't need strong isolation. Depending on how I/O is handled, container I/O can be very slow, whereas a dedicated disk snapshot without CoW/overlays is much faster. Since this uses TSI for networking, you need a patched Linux kernel to get any networking in the guest, and raw sockets don't work at all.
staticassertion 591 days ago [-]
> Depending on how I/O is handled, container I/O can be very slow, whereas a dedicated disk snapshot without CoW/overlays would be much faster.
Do you mean VM I/O can be very slow? I don't think containers should have any overhead; please correct me if I'm wrong, though.
0xbadcafebee 591 days ago [-]
Container file I/O is very slow. The runtime unpacks the OCI image layers onto the regular host filesystem, then adds overlay filesystems, does copy-on-write, and references files between each layer. For example, doing 10 containerized nodejs app builds simultaneously will swamp the host with iowait. A common hack is to put the OCI file tree / overlays on a dedicated disk with much higher IOPS than the boot disk.
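As a sketch of what that per-container overlay setup amounts to (a hypothetical helper, not any particular runtime's code; `build_overlay_opts` and the paths are invented):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the overlayfs mount an OCI runtime sets up per
 * container: read-only image layers as lowerdir, a per-container writable
 * upperdir, and a workdir used for copy-up. The first write to any file
 * from a lower layer copies the whole file up, which is where a lot of
 * the iowait comes from. */
static int build_overlay_opts(char *buf, size_t len,
                              const char *lower, const char *upper,
                              const char *work)
{
    return snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s",
                    lower, upper, work);
}

/* The runtime would then do, roughly (requires privileges):
 *   mount("overlay", merged_dir, "overlay", 0, buf);            */
```

Putting upperdir/workdir on a faster disk changes only the paths in the option string, which is why that hack is cheap to apply.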
rascul 590 days ago [-]
> It unpacks the OCI image layers onto the regular host filesystem, then adds overlay filesystems, does copy-on-write, and references files between each layer.
That's just Docker though, right? Does LXC or systemd-nspawn do that?
staticassertion 591 days ago [-]
Thank you, I'll have to look into this. I was thinking from a file namespacing perspective there shouldn't be overhead, but it makes sense that adding the overlay filesystems and mounts would impact performance.
# Goals
- Be compatible with a reasonable amount of workloads.
# Non-goals
- Be compatible with all kinds of workloads.
staticassertion 591 days ago [-]
That depends on the microVM. Firecracker has essentially no device support (no GPUs, for example), which is also what makes it suitable for multitenant workloads. Something like QEMU has far more device support but is also significantly easier to escape from.
vbitz 591 days ago [-]
Very cool looking project. I love their approach to networking which patches the Linux kernel to intercept operations on sockets and defers that to the host.
I’ve been working in a similar area recently and networking is an unfortunate stumbling block.
But I'm no expert, just my armchair take on things.
debarshri 591 days ago [-]
If it is OCI compatible, that technically means you could use Kubernetes or another container orchestrator to orchestrate these microVMs. I wonder if krunvm already works with Kubernetes.
mcronce 591 days ago [-]
If I'm understanding correctly, it's OCI compatible in the other direction - it consumes OCI compatible images, but it doesn't expose an OCI compatible layer on top for orchestration.
KubeVirt[1] is a thing, though, that provides k8s orchestration for VMs. I don't see why you couldn't use krunvm microVMs with that.
Does slirp support protos other than UDP and TCP? Apparently, TSI doesn't: https://github.com/containers/libkrun/blob/1af2e7236d1/READM...
[1] https://github.com/kubevirt/kubevirt