[HN Gopher] My VM is lighter (and safer) than your container (2017)
___________________________________________________________________
 
My VM is lighter (and safer) than your container (2017)
 
Author : gaocegege
Score  : 234 points
Date   : 2022-09-08 12:26 UTC (10 hours ago)
 
web link (dl.acm.org)
w3m dump (dl.acm.org)
 
| lamontcg wrote:
| Containers should really be viewed as an extension of packages
| (like RPM) with a bit of extra sauce with the layered filesystem,
| a chroot/jail and cgroups for some isolation between different
| software running on the same server.
| 
| Back in 2003 or so we tried doing this with microservices that
| didn't need an entire server with multiple different software
| teams running apps on the same physical image to try to avoid
| giving entire servers to teams that would be only using a few
| percent of the metal. This failed pretty quickly as software bugs
| would blow up the whole image and different software teams got
| really grouchy at each other. With containerization the chroot
| means that the software carries along all its own deps and the
| underlying server/metal image can be managed separately, and the
| cgroups means that software groups are less likely to stomp on
| each other due to bugs.
| 
| This isn't a cloud model of course, it was all on-prem. I don't
| know how kubernetes works in the cloud where you can conceivably
| be running containers on metal sharing with other customers. I
| would tend to assume that under the covers those cloud vendors
| are using Containers on VMs on Metal to provide better security
| guarantees than just containers can offer.
| 
| Containers really shouldn't be viewed as competing with VMs in a
| strict XOR sense.
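| 
| As a concrete (and purely illustrative) sketch of that "extra
| sauce": on a cgroup v2 host, capping a service at 64 MiB is just a
| couple of writes under /sys/fs/cgroup -- the same mechanism the
| container runtimes drive for you. The path and limit here are made
| up; assumes root and the memory controller enabled.
| 
|     #include <errno.h>
|     #include <fcntl.h>
|     #include <stdio.h>
|     #include <string.h>
|     #include <sys/stat.h>
|     #include <unistd.h>
| 
|     static void write_file(const char *path, const char *value) {
|         int fd = open(path, O_WRONLY);
|         if (fd < 0) { perror(path); return; }
|         if (write(fd, value, strlen(value)) < 0) perror(path);
|         close(fd);
|     }
| 
|     int main(void) {
|         char pid[32];
|         /* Create a cgroup and give it a hard memory cap of 64 MiB. */
|         if (mkdir("/sys/fs/cgroup/demo", 0755) && errno != EEXIST)
|             perror("mkdir");
|         write_file("/sys/fs/cgroup/demo/memory.max", "67108864");
|         /* Move ourselves in; anything we exec inherits the limit. */
|         snprintf(pid, sizeof(pid), "%d", getpid());
|         write_file("/sys/fs/cgroup/demo/cgroup.procs", pid);
|         execlp("/bin/sh", "sh", NULL);
|         perror("execlp");
|         return 1;
|     }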
 
  | nikokrock wrote:
  | I don't remember where I read it, but as far as I know, when
  | using Fargate to run containers (with k8s or ECS) AWS will just
  | allocate an EC2 instance for you. Your container will never run
  | on the same VM as another customer's. I think this explains the
  | latency you can see when starting a container. To improve that
  | you need to manage your own EC2 cluster with an autoscaling
  | group
 
| djhaskin987 wrote:
| Not surprising that VMs running unikernels are as nimble as
| containers, but not quite useful either, at least in general.
| Much easier to just use a stock docker image.
 
| ricardobeat wrote:
| How does LightVM compare to Firecracker VMs? Could it be used for
| on-demand cloud VMs?
 
| [deleted]
 
| r3mc0 wrote:
| Containers and VMs are totally not the same thing. They serve a
| completely different purpose: multiple containers can be combined
| to create an application/service, while VMs always use a complete
| OS, etc. Anyway, the internet is full of the true purpose of
| containers; they were never meant to be used as a "VM". And about
| security... meh, everything is insecure until proven otherwise
 
  | wongarsu wrote:
  | VMs can have private networks between each other just as
  | containers do. That's pretty much what EC2 is about.
 
  | nijave wrote:
  | VMs don't need a full OS. You can run a single process directly
  | from the kernel with no init system or other userland
 
| fnord123 wrote:
| Title is kinda clickbaity (wha-? how can a VM be lighter than a
| container). It's about unikernels.
 
  | JeanSebTr wrote:
  | Exactly, unikernels are great for performance and isolation,
  | but that can't be compared to a full application stack running
  | in a container or VM.
 
  | throwaway894345 wrote:
  | > how can a VM be lighter than a container
  | 
  | It's still clickbaity, but the title implies a comparison
  | between a very lightweight VM and a heavy-weight container
  | (presumably a container based on a full Linux distro). You
  | could imagine an analogous article about a tiny house titled
  | "my house is smaller than your apartment".
 
    | marcosdumay wrote:
    | It is still lighter in memory only. CPU is also a relevant
    | dimension on which to compare them.
 
    | turkishmonky wrote:
    | Not to mention, in the paper, the lightvm only had an
    | advantage on boot times. Memory usage was marginally worse
    | than docker, even with the unikernel, and debian on lightvm
    | was drastically worse for cpu usage than docker (the
    | unikernel cpu usage was neck and neck with the debian docker
    | container).
    | 
    | I could see it being an improvement over other VM control
    | planes, but docker still wins in performance for any
    | equivalent comparison.
 
  | nailer wrote:
  | Firecracker VMs are considered lighter than a container and are
  | pretty old at this point.
 
    | sidkshatriya wrote:
    | I would say that firecracker VMs are _not_ more lightweight
    | than Linux containers.
    | 
    | Linux containers are essentially the separation of Linux
    | processes via various namespaces, e.g. mount, cgroup, process,
    | network, etc. Because this separation is done internally by
    | Linux, there is not much overhead.
    | 
    | VMs provide a different kind of separation, one that is
    | arguably more secure because it is backed by hardware -- each
    | VM thinks it has the whole machine to itself. When you
    | switch between the VM and the host there is quite a
    | heavyweight context switch (VMEXIT/VMENTER in Intel
    | parlance). It can take a long time compared to the usual
    | context switch from one Linux container (process) to a host
    | process or another Linux container (process).
    | 
    | But coming back to your point: no, firecracker VMs are not
    | lighter/more lightweight than a Linux container. They are quite
    | heavyweight actually. But the firecracker VMM is probably the
    | most nimble of all VMMs.
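    | 
    | To make the first point concrete, here is a hypothetical
    | minimal sketch (not from the paper) of what a "container"
    | boils down to: a normal process started in fresh namespaces
    | via clone(). Assumes Linux and root/CAP_SYS_ADMIN; the
    | hostname and shell are placeholders.
    | 
    |     #define _GNU_SOURCE
    |     #include <sched.h>
    |     #include <signal.h>
    |     #include <stdio.h>
    |     #include <stdlib.h>
    |     #include <string.h>
    |     #include <sys/wait.h>
    |     #include <unistd.h>
    | 
    |     #define STACK_SIZE (1024 * 1024)
    | 
    |     static int child(void *arg) {
    |         (void)arg;
    |         /* Only visible inside our new UTS namespace. */
    |         sethostname("minicontainer", strlen("minicontainer"));
    |         /* This shell becomes PID 1 inside the new PID namespace. */
    |         execlp("/bin/sh", "sh", NULL);
    |         perror("execlp");
    |         return 1;
    |     }
    | 
    |     int main(void) {
    |         char *stack = malloc(STACK_SIZE);
    |         if (!stack) { perror("malloc"); return 1; }
    |         /* New UTS, PID and mount namespaces -- the same kernel
    |            primitives Docker/LXC build on; no VMEXIT involved. */
    |         int flags = CLONE_NEWUTS | CLONE_NEWPID |
    |                     CLONE_NEWNS | SIGCHLD;
    |         pid_t pid = clone(child, stack + STACK_SIZE, flags, NULL);
    |         if (pid == -1) { perror("clone"); return 1; }
    |         waitpid(pid, NULL, 0);
    |         free(stack);
    |         return 0;
    |     }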
 
  | [deleted]
 
| kasperni wrote:
| [2017]
 
| GekkePrutser wrote:
| Sometimes the less strict separation is a feature, not a bug.
| 
| Without folder sharing in Docker, for example, it would be
| pretty useless.
 
  | 1MachineElf wrote:
  | While a flawed comparison, WSL does use a VM in conjunction
  | with the 9p protocol to achieve folder sharing.
 
    | liftm wrote:
    | 9p-based folder sharing is (used to be?) possible with qemu,
    | too.
 
      | ksbrooksjr wrote:
      | It looks like it still is supported [1]. I noticed while
      | reading the Lima documentation that they're planning on
      | switching from SSHFS to 9P [2].
      | 
      | [1] https://wiki.qemu.org/Documentation/9psetup
      | 
      | [2] https://github.com/lima-
      | vm/lima/blob/3401b97e602083cfc55b34e...
 
| gavinray wrote:
| The issue with unikernels and things like Firecracker is that
| you can't run them on already-virtualized platforms
| 
| I researched Firecracker when I was looking for an alternative to
| Docker for deploying FaaS functions on an OpenFaaS-like clone I
| was building
| 
| It would have worked great if the target deployment was bare
| metal, but if you're asking a user to deploy on e.g. an EC2 or
| Fargate or whatnot, you can't use these things, so all points are
| moot
| 
| This is relevant if you're self-hosting or you ARE a service
| provider I guess.
| 
| (Yes, I know about Firecracker-in-Docker, but I mean real
| production use)
 
  | eyberg wrote:
  | This is a very common misunderstanding in how these actually
  | get deployed in real life.
  | 
  | Disclosure: I work with the OPS/Nanos toolchain so work with
  | people that deploy unikernels in production.
  | 
  | When we deploy them to AWS/GCP/Azure/etc. we are _not_ managing
  | the networking/storage/etc. like a k8s would do - we push all
  | that responsibility back onto the cloud layer itself. So when
  | you spin up a Nanos instance it spins up as its own EC2
  | instance with only your application - no linux, no k8s,
  | nothing. The networking used is the networking provided by the
  | vpc. You can configure it all you want but you aren't managing
  | it. Now if you have your own infrastructure - knock yourselves
  | out but for those already in the public clouds this is the
  | preferred route. We essentially treat the vm as the application
  | and the cloud as the operating system.
  | 
  | This allows you to have a lot better performance/security and
  | it removes a ton of devops/sysadmin work.
 
  | gamegoblin wrote:
  | This is a limitation of whatever virtualized instance you're
  | running on, not Firecracker itself. Firecracker depends on KVM,
  | and AWS EC2 virtualized instances don't enable KVM. But not all
  | virtualized instance services disable KVM.
  | 
  | Obviously, Firecracker being developed by AWS and AWS disabling
  | KVM is not ideal :)
  | 
  | Google Cloud, for instance, allows nested virtualization, IIRC.
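  | 
  | A quick way to see the dependency: any KVM-based VMM just needs
  | /dev/kvm to exist and answer the API-version ioctl. A
  | hypothetical minimal check (not Firecracker's code); on a stock
  | EC2 virtualized instance the open() will simply fail.
  | 
  |     #include <fcntl.h>
  |     #include <linux/kvm.h>
  |     #include <stdio.h>
  |     #include <sys/ioctl.h>
  |     #include <unistd.h>
  | 
  |     int main(void) {
  |         int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
  |         if (kvm < 0) {
  |             /* Typical result on a cloud VM without nested virt. */
  |             perror("open /dev/kvm");
  |             return 1;
  |         }
  |         int v = ioctl(kvm, KVM_GET_API_VERSION, 0);
  |         printf("KVM API version: %d\n", v);  /* 12 on modern kernels */
  |         close(kvm);
  |         return 0;
  |     }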
 
    | verdverm wrote:
    | I've used GCP nested virtualization. You pay for that overhead
    | in performance so I wouldn't recommend it without more
    | investigation. We were trying to simulate using LUKS and
    | physical key insert / removal. Would have used it more if
    | we could get GPU passthrough working
 
    | shepherdjerred wrote:
    | Azure and Digital Ocean allow nested virt as well!
 
    | gavinray wrote:
    | Yeah but imagine trying to convince people to use an OSS tool
    | where the catch is that you have to deploy it on special
    | instances, only on providers that support nested
    | virtualization
    | 
    | Not a great DX, haha. I wound up using GraalVM's "Polyglot"
    | abilities alongside its WASM stuff
 
| Sohcahtoa82 wrote:
| > We achieve lightweight VMs by using unikernels
| 
| When I attended Infiltrate a few years ago, there was a talk
| about unikernels. The speaker showed off how incredibly insecure
| many of them were, not even offering support for basic modern
| security features like DEP and ASLR.
| 
| Have they changed? Or did the speaker likely just cherry-pick
| some especially bad ones?
 
  | eyberg wrote:
  | You are probably talking about this:
  | https://research.nccgroup.com/wp-content/uploads/2020/07/ncc...
  | 
  | In short - not a fundamental limitation - just that kernels
  | (even if they are small) have a ton of work that goes into
  | them. Nanos for instance has page protections, ASLR, virtio-rng
  | (if on GCP), etc.
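  | 
  | For what it's worth, a crude way to eyeball ASLR from inside any
  | guest (unikernel with a libc, container, or full VM) is to print
  | a code and a stack address and see whether they move between
  | runs -- just an illustrative sketch, not a real audit:
  | 
  |     #include <stdio.h>
  | 
  |     int main(void) {
  |         int local = 0;
  |         /* Addresses should differ across runs if the loader and
  |            kernel randomize the layout (PIE + ASLR). */
  |         printf("code=%p stack=%p\n", (void *)main, (void *)&local);
  |         return 0;
  |     }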
 
  | sieabah wrote:
  | The headline reads like a reddit post so I'm going to assume
  | the same still holds true.
 
| wyager wrote:
| It's not clear to me that VMs actually do offer better isolation
| than well-designed containers (i.e. not docker).
| 
| It's basically a question of: do you trust the safety of kernel-
| mode drivers (for e.g. PV network devices or emulated hardware)
| for VMs, or do you trust the safety of userland APIs + the
| limited set of kernel APIs available to containers.
| 
| On my FreeBSD server, I kind of trust jails with strict device
| rules (i.e. there are only like 5 things in /dev/) over a VM with
| virtualized graphics, networking, etc.
 
  | nijave wrote:
  | I think it gets even more complicated with something like
  | firecracker where they recommend you run firecracker in a jail
  | (and provide a utility to set that up)
 
| [deleted]
 
| dirkg wrote:
| Why is a 5yr old article being posted now? If this were to catch
| on, it would've. I just don't see it being used anywhere.
| 
| Having a full Linux kernel available is a major benefit that you
| lose, right?
 
| faeriechangling wrote:
| What I see happening now on the cloud is containers from
| different companies and different security domains running on the
| same VM. I have to think this is fundamentally insecure and that
| VMs are underrated.
| 
| I hear people advocate QubesOS, which is based on Xen, for
| security when it comes to running my client machine. They say my
| banking should be done in a different VM than my email, for
| instance. Well if that's the case, why do we run many containers
| doing different security-sensitive functions on the same VM when
| containers are not really considered a very good security
| boundary?
| 
| From a security design perspective I imagine hardware being
| exclusive to a person/organization, VMs being exclusive to some
| security function, and containers existing on top of that. That
| makes more sense from a security standpoint, but we seem to be
| playing things more loosely on the server side.
 
  | bgm1975 wrote:
  | Doesn't AWS use firecracker with its Fargate container service
  | (and lambda too)?
 
| jupp0r wrote:
| (2017)
 
| jjtheblunt wrote:
| "orders of magnitude" :
| 
| Why does anyone ever write "two orders of magnitude" when 100x is
| shorter?
| 
| Of course, this presumes 10 as the magnitude and the N orders to
| be the exponent, but I don't think I've ever, since the 90s, seen
| that stilted phrasing used for a base other than 10.
 
  | IshKebab wrote:
  | Because two orders of magnitude does not mean 100x. It means on
  | the same order as 100x.
 
    | jjtheblunt wrote:
    | Do you mean folks using the phrase know big-O, big-omega,
    | big-theta, and are thinking along those lines?
 
      | IshKebab wrote:
      | It's nothing to do with big-O; it's about logarithms. But
      | really I think most people using it just think of it like:
      | "which of these is it closest to? 10x, 100x or 1000x?"
 
| xahrepap wrote:
| This reminds me: in 2015 I went to Dockercon and one booth that
| was fun was VMWare's. Basically they had implemented the Docker
| APIs on top of VMWare so that they could build and deploy VMs
| using Dockerfiles, etc.
| 
| I've casually searched for it in the past and it seems to not
| exist anymore. For me, one of the best parts of Docker is
| building a docker-image (and sharing how it was done via git). It
| would be cool to be able to take the same Dockerfiles and pivot
| them to VMs easily.
 
  | All4All wrote:
  | Isn't that essentially what Vagrant and Vagrantfiles do?
 
    | hinkley wrote:
    | What is your theory for why Docker won and Vagrant didn't?
    | 
    | Mine is that all of the previous options were too Turing
    | Complete, while the Dockerfile format more closely follows
    | the Principle of Least Power.
    | 
    | Power users always complain about how their awesome tool gets
    | ignored while 'lesser' tools become popular. And then they
    | put so much energy into apologizing for problems with the
    | tool or deflecting by denigrating the people who complain.
    | Maybe the problem isn't with 'everyone'. Maybe Power Users
    | have control issues, and pandering to them is not a
    | successful strategy.
 
      | duskwuff wrote:
      | What turned me off from Vagrant was that Vagrant machines
      | were never fully reproducible.
      | 
      | Docker took the approach of specifying images in terms of
      | how to create them from scratch. Vagrant, on the other
      | hand, took the approach of specifying certain details about
      | a machine, then trying to apply changes to an existing
      | machine to get it into the desired state. Since the
      | Vagrantfile didn't (and couldn't) specify everything about
      | that state, you'd inevitably end up with some drift as you
      | applied changes to a machine over time -- a development
      | team using Vagrant could often end up in situations where
      | code behaved differently on two developers' machines
      | because their respective Vagrant machines had gotten into
      | different states.
      | 
      | It helped that Docker images can be used in production.
      | Vagrant was only ever pitched as a solution for
      | development; you'd be crazy to try to use it in production.
 
        | mmcnl wrote:
        | Docker is not fully reproducible either. Try building a
        | Docker image from two different machines and then pushing
        | it to a registry. It will always overwrite.
 
    | xahrepap wrote:
    | Yes, which is what I'm using now. But it doesn't use the
    | Docker APIs to allow you to (mostly) reuse a dockerfile to
    | build a VM or a container.
    | 
    | not sure if it would be better than Vagrant. But it was still
    | very interesting.
 
  | verdverm wrote:
  | They might have built it into Google Anthos as part of their
  | partnership. I recall seeing a demo where you could deploy &
  | run any* VMWare image on Kubernetes without any changes
 
  | mmcnl wrote:
  | You are talking about declarative configuration of VMs. Vagrant
  | offers that, right?
 
    | P5fRxh5kUvp2th wrote:
    | eeeeeh.......
    | 
    | yes, but then again ... no.
    | 
    | I mean ... yes Vagrant does offer that, but no would I ever
    | consider Vagrant configuration anything approaching a
    | replacement for docker configuration.
 
| JStanton617 wrote:
| This paper consistently mischaracterizes AWS Lambda as
| a "Container as a Service" technology, when in fact it is exactly
| the sort of lightweight VM that they are describing -
| https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...
 
  | [deleted]
 
  | Jtsummers wrote:
  | In fairness to this paper, it was written and published before
  | that Firecracker article (2017 vs 2018). From another paper on
  | Firecracker providing a bit of history:
  | 
  | > When we first built AWS Lambda, we chose to use Linux
  | containers to isolate functions, and virtualization to isolate
  | between customer accounts. In other words, multiple functions
  | for the same customer would run inside a single VM, but
  | workloads for different customers always run in different VMs.
  | We were unsatisfied with this approach for several reasons,
  | including the necessity of trading off between security and
  | compatibility that containers represent, and the difficulties
  | of efficiently packing workloads onto fixed-size VMs.
  | 
  | And a bit about the timeline:
  | 
  | > Firecracker has been used in production in Lambda since 2018,
  | where it powers millions of workloads and trillions of requests
  | per month.
  | 
  | https://www.usenix.org/system/files/nsdi20-paper-agache.pdf
 
    | runnerup wrote:
    | Thank you for this detail!
 
  | xani_ wrote:
  | AWS "just" runs linux but this is using unikernels tho ?
 
    | Jtsummers wrote:
    | No, it's using a modified version of the Xen hypervisor and
    | the numbers they show are boot times and memory usage for
    | both unikernels and pared down Linux systems (via tinyx).
    | It's described in the abstract:
    | 
    | > We achieve lightweight VMs by using unikernels for
    | specialized applications and with Tinyx, a tool that enables
    | creating tailor-made, trimmed-down Linux virtual machines.
 
  | wodenokoto wrote:
  | For what it's worth, Google's cloud functions are a container
  | service. You can even download the final docker container.
 
    | raggi wrote:
    | KVM gVisor is a hybrid model in this context. It shares
    | properties with both containers and lightweight VMs.
 
| oxfordmale wrote:
| Kubernetes says no...
| 
| The article is light on detail. Containers and VMs have different
| use cases. If you self-host, lightweight VMs are likely the
| better path; however, once you're in the cloud, most managed
| services only provide support for containers.
 
  | nailer wrote:
  | > in the cloud most managed services only provide support for
  | containers.
  | 
  | Respectfully, comments like these are the reason for Kubernetes
  | becoming a meme.
 
    | oxfordmale wrote:
    | There is a huge difference between running on VMs that you
    | have zero access to and actually owning your own VM
    | infrastructure. Yes, AWS Lambda runs on Firecracker; however,
    | it could just as well be running on a FireCheese VM platform
    | and you would be none the wiser, unless AWS publishes this
    | somewhere.
    | 
    | I am also not running on Kubernetes, because Kubernetes. AWS
    | ECS and AWS Batch also only handle containerised
    | applications. Even when deploying on EC2 I tend to use
    | containers, as it ensures they keep working consistently if
    | you apply patches to your EC2 environment.
 
  | lrvick wrote:
  | You can also use a firecracker runner in k8s to wrap each
  | container in a VM for high isolation and security.
 
| bongobingo1 wrote:
| I'm quite interested in seeing where slim VMs go. Personally I
| don't use Kubernetes, it just doesn't fit my client work which is
| nearly all single-server and it makes more sense to just run
| podman systemd units or docker-compose setups.
| 
| So from that perspective, when I've peeked at firecracker, kata
| containers, etc, the "small dev dx" isn't quite there yet, or
| maybe never will get there since the players target other spaces
| (aws, fly.io, etc). Stuff like a way to share volumes isn't
| supported, etc. Personally I find Docker's architecture a bit
| distasteful and Podman's tooling isn't _quite_ there yet (but very
| close).
| 
| Honestly I don't really care about containers vs VMs, except that
| the VM allegedly offers better security, which is nice, and I
| guess I like poking at things, but they were a little too rough
| for weekend poking.
| 
| Is anyone doing "small scale" lightweight vm deployments - maybe
| just in your homelab or toy projects? Have you found the
| experience better than containers?
 
  | NorwegianDude wrote:
  | I've been using containers since 2007 for isolating workloads.
  | I don't really like Docker for production either because of the
  | network overhead with the "docker-way" of doing things.
  | 
  | LXD is definitely my favorite container tool.
 
    | pojzon wrote:
    | How differently does LXD manage isolation in comparison to
    | docker?
    | 
    | I suppose both create netns, bridges, ifs?
 
      | lstodd wrote:
      | It's the same stuff - namespaces, etc. But it doesn't shove
      | greasy fingers into network config like docker. More a
      | tooling question/approach than tech.
 
      | antod wrote:
      | LXC/LXD use the same kernel isolation/security features
      | Docker does - namespaces, cgroups, capabilities etc.
      | 
      | After all, it is the kernel functionality that lets you run
      | something as a container. Docker and LXC/LXD are different
      | management / FS packaging layers on top of that.
 
        | staticassertion wrote:
        | I assume it's not using seccomp, which Docker uses,
        | although seccomp is not Docker specific and you can go
        | grab their policy.
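        | 
        | For reference, the kind of thing seccomp adds on top of
        | namespaces is a syscall filter. A hypothetical bare-bones
        | sketch (strict mode rather than Docker's real BPF profile):
        | 
        |     #include <linux/seccomp.h>
        |     #include <stdio.h>
        |     #include <sys/prctl.h>
        |     #include <unistd.h>
        | 
        |     int main(void) {
        |         if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
        |             perror("prctl");
        |             return 1;
        |         }
        |         /* From here on only read, write, _exit and sigreturn
        |            are allowed; any other syscall kills the process. */
        |         write(STDOUT_FILENO, "still alive\n", 12);
        |         _exit(0);
        |     }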
 
  | xani_ wrote:
  | They went to trash because containers are more convenient to
  | use, and saving a few MBs of disk/memory is not something most
  | users care about.
  | 
  | The whole idea was pretty much either use a custom kernel (for
  | which there is inevitably way less info on how to debug anything
  | in it), and re-do all of the network and storage plumbing
  | containers already do via the OS they are running on,
  | 
  | OR use a very slim Linux one which at least people know how to
  | use but which is STILL more complexity than "just a blob with
  | some namespaces in it" and STILL requires a bunch of config and
  | data juggling between hypervisor and VM just to share some host
  | files with the guest.
  | 
  | Either way, to get to the level of "just a slim layer of code
  | between hypervisor and your code" you need to do quite a lot of
  | deep plumbing, and when anything goes wrong debugging is
  | harder. All to get some perceived security and no better
  | performance than just... running the binary in a container.
  | 
  | It did percolate into the "slim containers" idea where the
  | container is just a statically compiled binary + a few configs,
  | and while it does have the same problems with debuggability, you
  | _can_ just attach a sidecar to it
  | 
  | I guess next big hype will be "VM bUt YoU RuN WebAsSeMbLy In
  | CuStOm KeRnEl"
 
    | evol262 wrote:
    | Virtualization is not just "perceived" security over
    | containerization. From CPU rings on down, it offers
    | dramatically more isolation for security than
    | containerization does.
    | 
    | This isn't about 'what most users care' about either. Most
    | users don't really care about 99% of what container
    | orchestration platforms offer. The providers do absolutely
    | care that malicious users cannot punch out to get a shell on
    | an Azure AKS controller or go digging around inside /proc to
    | figure out what other tenants are doing unless the provider
    | is on top of their configuration and regularly updates to
    | match CVEs.
    | 
    | "most users" will end up using one of the frameworks written
    | by a "big boy" for their stuff, and they'll end up using
    | what's convenient for cloud providers.
    | 
    | The goal of microvms is ultimately to remove everything
    | you're talking about from the equation. Kata and other
    | microvm frameworks aim to be basically just another CRI which
    | removes the "deep plumbing" you're talking about. The onus is
    | on them to make this work, but there's an enormous financial
    | payoff, and you'll end up with this whether you think it's
    | worthwhile or not.
 
      | convolvatron wrote:
      | in a related vein, most of the distinctions that are being
      | brought up around containers vs vms (pricing, debugability,
      | tooling, overhead) are nothing fundamental at all. they are
      | both executable formats that cut at different layers, and
      | there is really no reason why features of one can't be
      | easily brought to the other.
      | 
      | operating above these abstractions can save us time, but
      | please stop confusing the artifacts of implementation with
      | some kind of fundamental truth. it's really hindering our
      | progress.
 
        | evol262 wrote:
        | Bringing the features of one to the other is exactly what
        | microvms means.
 
      | pojzon wrote:
      | With eBPF there is really not much to argue about in
      | security space.
      | 
      | You can do everything.
      | 
      | New toolset for containers covers pretty much every
      | possible use-case you could even imagine.
      | 
      | The trend will continue in favor of containers and k8s.
 
        | tptacek wrote:
        | It is pretty obviously not the case that eBPF means
        | shared-kernel containers are comparably as secure as VMs;
        | there have been recent Linux kernel LPEs that no system
        | call scrubbing BPF code would have caught, without
        | specifically knowing about the bug first.
 
        | evol262 wrote:
        | Let me know when eBPF can probe into ring-1 hypercalls
        | into a different kernel other than generically watching
        | timing from vm_enter and vm_exit.
        | 
        | Yes, there is a difference between "eBPF can probe what
        | is happening in L0 of the host kernel" and "you can probe
        | what is happening in other kernels in privileged ring-1
        | calls".
        | 
        | No, this is not what you think it is.
 
        | staticassertion wrote:
        | I'm not sure what you mean with regards to eBPF but the
        | difference between a container and a VM is massive with
        | regards to security. Incidentally, my company just
        | published a writeup about Firecracker:
        | https://news.ycombinator.com/item?id=32767784
 
  | depingus wrote:
  | > So from that perspective, when I've peeked at firecracker,
  | kata containers, etc, the "small dev dx" isn't quite there yet,
  | or maybe never will get there since the players target other
  | spaces (aws, fly.io, etc). Stuff like a way to share volumes
  | isn't supported, etc. Personally I find Dockers architecture a
  | bit distasteful and Podmans tooling isn't quite there yet (but
  | very close).
  | 
  | This is pretty much me and my homelab. I haven't visited it in
  | a while, but Weave Ignite might be of interest here.
  | https://github.com/weaveworks/ignite
 
| opentokix wrote:
| Tell me you don't understand containers without telling me you
| don't understand containers.
 
  | anthk wrote:
  | You don't understand VMs either. Ever used virtual network
  | interfaces?
 
| devmor wrote:
| Yes, when you custom engineer a specific, complex solution for a
| specific use case, it is generally more performant than a simple,
| general-use solution.
 
___________________________________________________________________
(page generated 2022-09-08 23:00 UTC)