|
| [deleted]
| riobard wrote:
| A few years ago I invested in a small startup called `hyper.sh`.
| It open sourced a container runtime called `runV` which provided
| exactly this: security of virtual machines plus convenience of
| containers.
|
| The project later merged with Intel Clear Container to become
| what's now called Kata Containers (https://katacontainers.io/)
| and is now widely used by several Internet giants like Alibaba
| and Baidu.
|
| The startup was acquired by Ant Finance a couple of years ago.
|
|   (I recorded a podcast with one of the hyper.sh engineers, if you
|   can listen to Mandarin: https://pan.icu/25)
| [deleted]
| polskibus wrote:
| How does it differ from Firecracker?
| riobard wrote:
|       I'm not familiar with later developments, but AFAIK
|       Firecracker came much later, and now you can actually use
|       Firecracker as Kata Containers' hypervisor in addition to
|       QEMU.
| temp_praneshp wrote:
| Probably off topic: Back in 2014-15 at my first job, when I was
| working on openstack, they used to show up at the summits. They
| were super smart and very generous with their time when I had
| questions. I wondered sometime in 2020 what happened to them,
| I'm happy they had a decent exit.
| lifty wrote:
| I worked with their tech, testing it, and I loved the product.
| It was definitely ahead of its time. Similar in some ways to
| what Fly is doing these days, without the edge.
| cptnapalm wrote:
| I was looking at Kata containers a few days ago. I'm pretty new
| to trying to use VMs/containers for services; purely hobby
|     level. Couldn't figure out how to use them, but that's not
|     necessarily a knock on them, as I can't get OpenBSD
|     WireGuard to work either.
| forty wrote:
| Isn't firecracker an AWS tech?
| cpach wrote:
| That's correct.
|
| https://github.com/firecracker-microvm/firecracker
| encryptluks2 wrote:
| Why not run containers in VMs in containers in VMs? :)
|
| Seriously, VMs are hardly as secure as many people want to
| believe unless you're utilizing enclaves and even that has
| vulnerabilities. I think a better approach is Seccomp and
| whatever other filtering makes sense.
| dboreham wrote:
| Machine Turducken.
| handrous wrote:
| A while back I did some looking at FreeBSD jails to try to
| figure out why they don't have more mindshare (especially when
| paired with the nigh-superpower-granting ZFS).
|
| I came away baffled that they weren't more widely-promoted,
| compared with Docker and friends. After thinking about it for a
| while, all I can figure is they're so straightforward to use
| and well-documented that there's no room to make one's name, or
| to make a buck, re-packaging them or wrapping them in complex
| tools, so there's little money or glory (= personal marketing
| via open-source project leadership/contributions) in promoting
| them.
|
| [EDIT] that is: what would be a blog post in LXC/Docker land...
| doesn't exist, because it's covered perfectly well in the docs.
| What would be a simple open-source tool... becomes a blog post,
| because it's short, simple, and clear enough not to merit
| special software, but just a quick guide to existing tools.
| What would be a business, becomes a simple open-source tool
| without enough of a difficulty/convenience "moat" to support a
| business.
| nicolaslem wrote:
| TrueNAS exposed me to FreeBSD jails but what put me off is
| that there does not seem to be an equivalent of "docker
| build".
|
| Jails seem to be treated like OpenVZ containers in the Linux
| world: a lighter alternative to virtual machines, not a way
| to build and distribute applications like Docker.
|
| This is just my take after playing a few hours with jails, I
| would happily be proven wrong.
| tyingq wrote:
| If technically best in the container space mattered, Illumos
| would be everywhere...
| tptacek wrote:
| People say this a lot too, but Illumos also uses shared-
| kernel isolation. Linux + gVisor is probably
| (significantly) superior to it as far as security goes.
| cestith wrote:
| Or z/OS
| tptacek wrote:
| Jails are still shared-kernel isolation. Docker's reputation
| is mired in its earlier implementations, when it wasn't
| really even intended for multitenant isolation. Modern
| Docker, running with unprivileged containers (which is the
| norm), is substantially hardened. The real win over Docker is
| losing the shared kernel, which is what lots of people are
| doing, so the win to Jails is marginal.
| boardwaalk wrote:
| I suspect the answer includes it not being Linux, even with
| the compatibility layer available.
| handrous wrote:
| I'm sure that's some of it, but the trend seems to be
| moving away from leveraging OS-level tools _anyway_. As
| long as your containers (or jails) and the single important
| binary in each one start up OK and your network tuning on
|         the parent OS isn't completely screwed up, the rest barely
| matters anymore.
| coder543 wrote:
| It seems like you're missing a lot of things.
|
| As a developer, how do I run FreeBSD Jails on my MacBook
| during development? With Docker for Mac, it is trivial
| for me to do everything on my Mac, and the fact that
| there is a virtual machine is completely invisible to me.
| Everything "Just Works". With FreeBSD Jails, I would have
| to actually interact with a VM constantly, including the
| pain of shipping files back and forth.
|
| As a developer, are popular databases and applications
| pre-packaged as FreeBSD Jails so that I can spin one up
| on my laptop with a single command? Where is the Docker
| Hub equivalent?
|
| As a developer, how do I orchestrate a collection of
| FreeBSD Jails for each project? With Docker, I define a
| single `docker-compose.yml` file for each project. With a
| single `docker-compose up`, the entire project is running
| _including_ dependencies such as databases and other
| related projects in a completely reproducible fashion.
| This makes it trivial for coworkers to spin up a project
| on their machine and immediately be productive without
| spending an hour trying to get all the right versions of
| everything installed and up and running.
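
To make the "including dependencies" point concrete, here is a minimal sketch of the ordering problem that a `docker-compose up` has to solve: start each service only after the services it depends on. This is not how Compose is implemented. The service names and the dependency mapping are hypothetical, loosely modelled on the `depends_on` key, and the sketch only computes a start order using `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical project: each service lists the services it depends on,
# loosely modelled on the `depends_on` key of a docker-compose.yml file.
SERVICES = {
    "web": ["db", "cache"],
    "worker": ["db"],
    "db": [],
    "cache": [],
}


def start_order(services):
    """Return service names ordered so that dependencies come first."""
    return list(TopologicalSorter(services).static_order())


if __name__ == "__main__":
    for name in start_order(SERVICES):
        print("starting", name)
```

The exact order among independent services is not specified; the only guarantee is that `db` and `cache` come before `web`, and `db` before `worker`.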
|
| As someone responsible for deploying an application to
| production, what is the story around FreeBSD Jails for
| deploying across a cluster? Is there a Kubernetes-
| equivalent that can manage the allocation of resources,
| blue-green deployments, and manage the lifecycle of my
| FreeBSD Jails?
|
| As someone responsible for deploying an application to
| production, do any of the major clouds support FreeBSD
| Jails? With Docker images, I can deploy those straight to
| ECS Fargate, Google Cloud Run, and half a dozen other
| services. Then I don't even have to think about my own
| infrastructure unless I need some really specialized
| hardware for a specific application.
|
| > the rest barely matters anymore.
|
| _Everything else_ matters so much.
|
| As to your earlier point about ZFS, most Linux distros
| these days seem to trivially support ZFS. Even TrueNAS is
| working on switching to Linux with their TrueNAS Scale
| offering.
|
| It's not that I'm opposed to FreeBSD... FreeBSD is just a
| hard sell. It's hard to pin down exactly what you're
| gaining by throwing out all the collective Linux
| knowledge of an organization and switching to FreeBSD.
| FreeBSD is an N-th tier platform for pretty much every
| programming language except C, so good luck when you run
| into random subtle problems. Also, good luck doing
| hardware accelerated machine learning inference or
| training on FreeBSD... it's _probably_ possible?
|
| > the single important binary
|
| This is also such a weird thing to throw out there. I
| like a good Go program myself, but _most_ companies are
| not only deploying single-binary statically linked
| applications. Most companies are also deploying some kind
| of Ruby, Python, or Java application... none of which are
| likely to be a single file in practice. Most of them will
|           have a variety of shared libraries, and I don't know if
| I've ever seen a Ruby application shipped in a `FROM
| scratch` container before. Technically possible, but
| that's just not common reality as far as I've seen. It
| sounds like you're proposing that everyone is already
| running in `FROM scratch` containers, so a FreeBSD Jail
| is just a drop-in replacement.
|
| Linux containers are far from perfect, but as a
| developer... I _have_ played with FreeBSD Jails before,
| and come away frustrated by all the work you have to do
| yourself.
| handrous wrote:
| > > the single important binary
|
| > This is also such a weird thing to throw out there. I
| like a good Go program myself, but most companies are not
| only deploying single-binary statically linked
| applications. Most companies are also deploying some kind
| of Ruby, Python, or Java application... none of which are
| likely to be a single file in practice.
|
| Sure, but usual practice with containers is to put each
| thing in its own, unless they are _very_ tightly coupled.
| Web-app with a SQL database and a memory cache? Three
|             containers. You _can_ do otherwise, but that's typical.
| Usually each container ends up with one main, important
| running process, and not much else.
|
| [EDIT]
|
| > As someone responsible for deploying an application to
| production, what is the story around FreeBSD Jails for
| deploying across a cluster? Is there a Kubernetes-
| equivalent that can manage the allocation of resources,
| blue-green deployments, and manage the lifecycle of my
| FreeBSD Jails?
|
| > As someone responsible for deploying an application to
| production, do any of the major clouds support FreeBSD
| Jails? With Docker images, I can deploy those straight to
| ECS Fargate, Google Cloud Run, and half a dozen other
| services. Then I don't even have to think about my own
| infrastructure unless I need some really specialized
| hardware for a specific application.
|
| These are exactly the kinds of things I was thinking of
| when I noted that the OS itself has been seriously
| diminished in importance, for modern workflows. I agree
| that most commercial or high-profile open-source "cloud"
| tools and platforms are built around LXC/Docker.
| coder543 wrote:
| > Sure, but usual practice with containers is to put each
| thing in its own, unless they are very tightly coupled.
| Web-app with a SQL database and a memory cache? Three
| containers. You can do otherwise, but that's typical.
| Usually each container ends up with one main, important
| running process, and not much else.
|
| I agree, but... getting all the application dependencies
| in there is more than just getting a single binary in
| there. If it's just a single-binary Go program, then a
| Jail works just fine, but it's not that simple for a Ruby
| application. I'm definitely not talking about databases
| running in the same container as the application. That's
| where Kubernetes and docker-compose come in for multi-
| container orchestration, which are things that FreeBSD
| Jails don't have as far as I know.
|
| > These are exactly the kinds of things I was thinking of
| when I noted that the OS itself has been seriously
| diminished in importance
|
| Yes, but... these are all the things that FreeBSD doesn't
| offer. These are the real reasons that people don't talk
| about FreeBSD Jails in the same breath as Docker. The
| Docker container itself (or the FreeBSD Jail) as a unit
| of isolation is the least interesting part of the
| ecosystem. All of the developer tools, orchestration
| tools, and prebuilt images are what make the Docker
| universe so interesting, and make FreeBSD Jails... less
| interesting.
|
| You said you were confused why Jails don't have more
| mindshare. It has absolutely nothing to do with people
| being able to invent useless tools and write blog posts
| about them, and it has absolutely nothing to do with
| FreeBSD Jails being _too well documented_. You kind of
| implied those were the best explanations you could come
|               up with. Those are not the problems _at all_, and it
|               seems disingenuous to me to say you think those are the
|               problems unless you _really_ didn't know the things I
|               mentioned in my first reply.
| oarsinsync wrote:
| FreeBSD introduced Jails in 1999.
|
| I used my first Jail in 2001.
|
| Docker was started over a decade later in 2013.
|
|             It's reasonable to be confused why Jails lack the
| mindshare. "Because it lacks all these other over-the-top
| features that we need" might be reasonable in response,
| except that Docker didn't have any of these things on day
| 0 either.
|
|             Jails had a 14-year head start; Docker reinvented the
|             wheel, and not particularly well at first. Why did it
| succeed more than Jails did? It wasn't because of the
| piss-poor native Mac support.
| tptacek wrote:
| It seems pretty obvious that the big thing here is that
| most people ship apps on Linux, not on FreeBSD.
| handrous wrote:
| My personal favorite thing about Docker, and the part I'd
| most miss if I switched to Jails (which I'm fairly
| confident could meet my needs with some fairly simple
| scripts and aliases that wouldn't take me long to arrive
| at, which is why I think there's so much less of an
| "ecosystem" there, even a nascent and under-developed
| one) is the way it forces projects to un-fuck their
| configuration.
|
| 500-line config, much of which few people ever care
| about, with all kinds of ill-conceived nesting? Better
| put the ~20 options that 99% of users ever touch in
| environment variables, and document them. Weird state
| garbage that's not captured in your config-on-disk?
| Better figure it out and get it into env vars, and have
| your startup script use those to transparently manage
| whatever bad decisions you made re: state in the past.
| Shit files all over the system? Better get that sorted
| out so people can handle persistence with at the _very_
| most three total mounts--and oh, gee, look, now your
| simple example docker-compose also serves to document
| where exactly you store files. And so on.
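
The "put the handful of options people actually touch in environment variables" pattern described above might look roughly like the sketch below. It is only an illustration: the `APP_*` variable names, the defaults, and the `Settings` fields are all made up, not taken from any particular project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    listen_port: int
    data_dir: str
    log_level: str


def load_settings(env=None):
    """Build Settings from environment variables, falling back to defaults.

    The APP_* names are hypothetical; a real project would document its own.
    """
    env = os.environ if env is None else env
    return Settings(
        listen_port=int(env.get("APP_LISTEN_PORT", "8080")),
        data_dir=env.get("APP_DATA_DIR", "/data"),
        log_level=env.get("APP_LOG_LEVEL", "info").lower(),
    )


if __name__ == "__main__":
    print(load_settings())
```

Keeping all persistent state under one directory such as the assumed `APP_DATA_DIR` is what lets a single volume mount cover persistence.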
|
| (my second-favorite thing is that it's a de-facto cross-
| distro package manager with very up-to-date packages that
| are trivial to completely and cleanly uninstall)
| vermaden wrote:
| > As a developer, are popular databases and applications
| pre-packaged as FreeBSD Jails so that I can spin one up
| on my laptop with a single command?
|
| The closest you can get is BastilleBSD (framework for
| FreeBSD Jails) and their templates - available here:
|
| https://github.com/BastilleBSD/templates
| https://bastillebsd.org/templates/
| tptacek wrote:
| I don't know what people generally believe.
|
| But the attack surface of a Linux kernel is very large, is
| pretty unpredictable, and can't be coherently masked out with
|     rules (my favorite example is Jann Horn's VM reference count bug,
| which was a simple concurrency flaw in the core virtual memory
| system). By comparison, a Linux KVM hypervisor is not just a
| subset of the kernel by definition, but also a much smaller
| codebase, a tiny fraction of the whole kernel.
|
| Replacing shared-kernel isolation like seccomp-filtered
| containers with VMs is, architecturally, simply the replacement
| of a large trusted computing base with a smaller one. If the
| overhead is acceptable, it's hard to argue with from a security
| perspective.
| riobard wrote:
| That's the approach taken by Google's gVisor (at the cost of
| I/O and network performance).
| fsociety wrote:
| gVisor, for better or for worse, does a whole lot of other
| things than just seccomp filtering, and it shows in
| performance tests.
| encryptluks2 wrote:
| gVisor does more than filtering, they basically reimplemented
| the syscalls in an application kernel. At least with seccomp
| the performance overhead is minimal.
| tptacek wrote:
| No, that's really not at all what gVisor is. gVisor is best
| thought of as user-mode Linux --- a complete reimplementation
| of most of the OS kernel. It's not a system call filter; it's
| something much closer to a VM than to seccomp.
|
| gVisor is a very cool codebase. As an illustration of the
| approach: it includes its own TCP/IP stack; we use it in our
| command-line dev tool to allow people to SSH to their VMs
| over WireGuard without having to install WireGuard or obtain
| privileges to manage WireGuard.
| gorkish wrote:
| OK; https://github.com/harvester/harvester
|
| Security and performance aren't the only driving forces; there
| are a lot of technical and operational benefits to the
| abstraction and standard interfaces that you get when running
| stacks that might otherwise look like someone took an Xzibit
| meme too far.
|
| Also remember on a modern system, there are often at least 2
| additional layers at work abstracting interfaces to the "bare
| metal" OS already.
| encryptluks2 wrote:
| I'm not disagreeing that abstraction can be useful, but the
| overhead of a VM is unnecessary if utilizing the full
|       potential of containers. After all, the Linux kernel is acting
| as the hypervisor already, so might as well trust it enough
| to properly sandbox containers too and use the right
| functionality to do so. I also think that running a
| virtualization layer adds quite a bit of complexity, so while
| it is cool that projects and companies have made it work and
| integrated it with a container solution, eliminating the VM
| layer altogether seems more ideal IMO.
| ashishbijlani wrote:
| > Can we somehow combine the advantages of the docker ecosystem
| with VMs?
|
| Shameless plug: this is exactly what our goal is with
| https://kwarantine.xyz We are creating a new hypervisor (from
| scratch) that can run strongly isolated Docker/LXC containers.
| mikepurvis wrote:
| Is this what gvisor is? https://github.com/google/gvisor
| ashishbijlani wrote:
| No, gVisor is from Google. They emulate system calls in user-
| space and use VMs, which increases runtime performance
| overhead. We use hardware virtualization to directly run
| containers -- no I/O emulation, no expensive VM exits, scale
| as needed. Initial comparison with FC/GVisor/Xen here:
| https://github.com/ashishbijlani/kwarantine
| monocasa wrote:
| I'm not sure gvisor requires vm exits. Their first backend
| used ptrace very similarly to how user mode Linux worked.
|
|         Minor quibble though, since ptrace might even be slower
|         than VM exits; your core point stands.
| rkeene2 wrote:
| User Mode Linux is still around and works well. I use it
| when I need a "fakeroot" without any special privileges
| on the host.
|
| https://rkeene.org/viewer/tmp/fakeroot.sh.htm
| tptacek wrote:
| It sounds like you just said "yes, but what we're building
| is faster". The userland Linux emulation is a security
| benefit, not a liability.
| amscanne wrote:
| The "fork" sounds like you blue pill the OS for each container?
| I'm assuming the concept is like Cappsule [1] or Bromium [2]?
|
|     [1] https://cappsule.github.io/
|     [2] https://en.wikipedia.org/wiki/Bromium#/media/File:Bromium-en...
| ashishbijlani wrote:
| fork here is COW on the host kernel (i.e., copying EPT
| entries). We will post detailed technical documentation soon.
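
For readers unfamiliar with copy-on-write, the general idea of "fork by copying table entries" can be modelled with a toy in plain Python. This is a generic illustration, not the project's implementation and not real EPT handling: `Frame` and `AddressSpace` are hypothetical names, and a dictionary stands in for the page table.

```python
class Frame:
    """A page of memory, shared between address spaces until written to."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    def __init__(self):
        self.table = {}  # page number -> Frame

    def map_page(self, page, data):
        self.table[page] = Frame(data)

    def fork(self):
        """Copy only the table entries; the frames themselves stay shared."""
        child = AddressSpace()
        for page, frame in self.table.items():
            frame.refs += 1
            child.table[page] = frame
        return child

    def read(self, page):
        return bytes(self.table[page].data)

    def write(self, page, data):
        frame = self.table[page]
        if frame.refs > 1:
            # Shared frame: take a private copy before modifying it.
            frame.refs -= 1
            frame = Frame(frame.data)
            self.table[page] = frame
        frame.data[:] = data
```

Forking costs one table entry per page regardless of how much data the pages hold; the data is copied only for pages that are later written.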
| eatonphil wrote:
| There are a few existing projects out there like this (running
| Docker images as virtual machines, specifically) if folks are
| interested. Slim [0] is the one I can remember off the top of my
| head. I think there are a couple more.
|
| Still, neat to have the walkthrough here in this post.
|
|   [0] https://github.com/ottomatica/slim
| thekevjames wrote:
| I had fun exploring Docker->VM conversion a while back [1],
| though the larger goal in my case was to be able to make the
| build path to custom GCP VM Images a bit simpler. Exciting to see
| other cases where folks are finding this sort of flow useful!
|
|   1: https://thekev.in/blog/2019-08-05-dockerfile-bootable-vm/ind...
| rwmj wrote:
| https://katacontainers.io/ ?
| bonzini wrote:
| Yes, indeed. However it's nice to see directly the mechanisms
| that let Kata do its magic.
| gravypod wrote:
| Something I'd be very interested in: building a PXE image from
| something declarative like Dockerfiles.
| justincormack wrote:
| Try LinuxKit https://github.com/linuxkit/linuxkit
| laurencerowe wrote:
| Google Container Optimized OS is basically this I think. It's
| what's used when you start a GCE instance with a docker image.
|
| https://cloud.google.com/container-optimized-os/
| OldGoodNewBad wrote:
| I think a lot of folks are going out of their way to
| misunderstand what happened. Yes there are other similar projects
| and containers. No, none come from a long established _COMMUNITY
| RUN PROJECT_. This is something akin to the difference between
|   VirtualBox and OpenBSD's vmd. One's a product with a "free" tier,
| the other is a community project.
| tptacek wrote:
| As I understand the landscape here, the big enabling win of
| microvms is faster boot time; there's a cool qemu-lite slide deck
| that goes into detail about how they cut down boot time:
|
| https://www.linux-kvm.org/images/d/d2/03x05B-Chao_Peng-Light...
|
| The big win was slashing away the BIOS stuff.
|
|   We use AWS's Firecracker to turn our customers' Docker containers
| into Firecracker microvms (Firecracker is Amazon's Rust VMM, the
| engine for Fargate and Lambda). Anecdotally: in my dev
| environment, the difference between Firecracker boot times and
| native Docker container startup is imperceptible; the logging we
| do swamps the VM boot stuff. It's _very_ fast.
___________________________________________________________________
(page generated 2021-06-16 23:00 UTC) |