[HN Gopher] Writing a Kubernetes Operator
___________________________________________________________________

Writing a Kubernetes Operator
Author : todsacerdoti
Score  : 141 points
Date   : 2023-03-09 13:41 UTC (9 hours ago)
	web link (metalbear.co)
sigwinch28 wrote:
I find myself conflicted between two approaches at work:

1. Write a provider/extension/whatever for a tool like Terraform or Pulumi. I live in a world where the infrastructure doesn't move underneath my feet. I am the source of truth. I feel like I only need to reconcile changes when _I_ make changes to my IaC repositories.

2. Write something that exists in a control plane, like Kubernetes operators or Crossplane. I live in a world where I look at the world, find the delta between current state and desired state, then try to reconcile. This is an endless loop.

I feel like these are different approaches with the same goal. Why should I decide either way beyond tossing a coin?

Some use cases:

- an internal enterprise DNS system which is not standards-compliant with the world at large
- an internal certificate authority and certificate issuing system

debarshri wrote:
A better way to write an operator these days is to use kubebuilder [1].

My complaint is that I have seen orgs write operators for random stuff, often reinventing the wheel. A lot of operators in orgs are the result of resume-driven development. Having said that, they often come in handy for complex orchestration.

[1] https://github.com/kubernetes-sigs/kubebuilder

casperc wrote:
What would be a good example where an operator would make sense?

EdwardDiego wrote:
I worked on an operator that manages Kafka in K8s. If you want to upgrade the brokers in a Kafka cluster, you generally do a rolling upgrade to ensure availability.

The operator will do this for you: you just update the version of the broker in the CR spec, it notices, and then it applies the change.
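The rolling upgrade EdwardDiego describes is a level-triggered reconcile: compare the version requested in the CR spec with what is actually running, and act on the difference one step at a time. A minimal sketch, assuming a simplified in-memory model; the `Broker` class, the `reconcile` function, the action strings and the version numbers are hypothetical, and a real operator would read and restart pods through the Kubernetes API:

```python
from dataclasses import dataclass


@dataclass
class Broker:
    """Hypothetical, simplified view of one Kafka broker pod."""
    name: str
    version: str
    ready: bool = True


def reconcile(desired_version: str, brokers: list) -> list:
    """One level-triggered reconcile pass.

    Compares the version requested in the custom resource spec with the
    observed broker versions and acts on the difference. Returns the actions
    taken in this pass (illustrative strings, not a real API).
    """
    # Never touch a second broker while one is still restarting: a rolling
    # upgrade keeps the rest of the cluster serving traffic.
    if not all(b.ready for b in brokers):
        return ["wait: a broker is not ready yet"]
    for b in brokers:
        if b.version != desired_version:
            # Upgrade exactly one broker per pass. The next pass (triggered by
            # the next watch event or a periodic resync) handles the next one.
            b.version = desired_version
            b.ready = False  # stands in for "pod restarted, not yet ready"
            return [f"upgrade {b.name} to {desired_version}"]
    return []  # observed state already matches the spec
```

Calling `reconcile` repeatedly, marking each restarted broker ready in between, upgrades the brokers one at a time and then returns an empty list. Because each pass looks only at the current state, it does not matter whether it was triggered by a spec change, a pod becoming ready, or a timer.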

Likewise, some configuration options can be applied at runtime, and some need the broker to be restarted to be applied. The operator knows which are which, and will again manage the process of a rolling restart if needed to apply the change.

You can also define topics and users as custom resources, so you have a nice GitOps approach to declaring resources.

debarshri wrote:
There is a whole list of public operators that you can find in OperatorHub [1].

[1] https://operatorhub.io/

spenczar5 wrote:
Operators make sense when you need to automatically modify resources in response to changes in the cluster's state.

An example that has come up for me is an operator for a Kafka Schema Registry. This is a service that needs some credentials in a somewhat obscure format so it can communicate very directly with a Kafka broker. If the broker's certificates (or CA) are modified, then the Schema Registry needs to have new credentials generated, and needs to be restarted. But the registry (obviously) shouldn't have direct access to the broker's certificates. Instead, there's a more-privileged subsystem which orchestrates that dance; that's the operator.

sleepybrett wrote:
Kubernetes itself is a collection of controllers/operators. It takes manifests like pods and uses that information to create the workload in your container runtime on a node with the resources it needs.

debarshri wrote:
A good example from my perspective is when you are delivering an application as a third-party vendor and you wish to automate a lot of operational stuff like backups, scaling based on events, and automating stuff based on cluster events. It starts becoming very valuable. I am sure there are many more use cases.

jrockway wrote:
I would not write an operator to do any of these things. To me an "operator" strongly implies the existence of a CRD and the need to manage it.
So for autoscaling, HPA/VPA are built into k8s. Backups should be an application-level feature; when the "take a backup" RPC or time arrives, take a backup and dump it in configured object storage. Automating stuff based on cluster events also doesn't require an operator; call client.V1().Whatever().Watch and do what you need to do.

The only moderately justifiable operator I've ever seen is cert-manager. Even then, one wonders what it would be like if it just updated a Secret every 3 months based on a hard-coded config passed to it, and skipped the CRDs.

jhoelzel wrote:
- creating databases for your app on the fly
- scaling applications up and down based on time instead of demand, or based on non-metric-based actions
- extending Kubernetes to understand your workload
- automating configuration and management of complex applications
- managing legacy applications that cannot be easily containerized or migrated to the cloud

If you love k8s, you'll love operators.

The list is endless!

dilyevsky wrote:
With respect, being "in love" with a technology is not a good way to go about it; it leads to tunnel vision.

remram wrote:
An operator operates something, i.e. it actively makes changes. If you want to deploy an application, a Helm chart is the correct way. It will allow you to have a deterministic deployment that you can duplicate multiple times in your cluster, and you can dry-run it and see the generated manifests.

An operator is needed when you can't just deploy and forget about it. An example is the Prometheus operator, which will track annotations created by users to configure the scraping configuration of your Prometheus instances.
Another example is cert-manager, which gets certificates into secrets based on Certificate and Ingress objects, renews them automatically before expiry, and does that by creating ingresses picked up by your ingress controller.

The advantage of an operator is that it will react to stuff happening in the cluster. The drawback is that it reacts to stuff happening, potentially doing unexpected things, because changes happen at any time and you can't dry-run them. Another drawback is that operators are usually global, so you can't run multiple versions at the same time for different namespaces (mainly because custom resource definitions are global).

Unfortunately many people think packaging an application = creating an operator, and that operator does nothing a chart couldn't do.

stasmo wrote:
The CockroachDB example in the article is a perfect example of an unnecessary CRD. Acquiring certificates within a Kubernetes cluster is a common requirement for lots of applications and there are lots of solutions out there. Is it really necessary to spend time writing your own operator? Now you have a second Helm chart and an operator to maintain. Now you have to explain to people which chart to use. You could get rid of the non-operator chart, but now I have operators within the cluster acquiring certificates in 5 or 6 different ways. Do I have to configure the credentials for 6 operators so they can make Route53 DNS challenge records?

Edit: maybe we could shift left and ask the app developers to add certificate acquisition directly into the app source.

outworlder wrote:
> Do I have to configure the credentials for 6 operators so they can make Route53 DNS challenge records?

A certificate for service-to-service communication does not have to correspond to a public endpoint.

mdaniel wrote:
> that operator does nothing a chart couldn't do.

Or it can be _actively harmful_ when it doesn't do any error checking whatsoever, making it less accurate than `helm template` would be. Related, it's also one more thing to monitor, because it can decide to start vomiting errors for whatever random reason.

dpkirchner wrote:
Neither of those cases really needs an operator: Prometheus and cert-manager both have code that watches for changes on ingresses/services/custom resources and reacts to changes (using permissions granted via RBAC). I've used both without an operator and still use Prometheus without one.

cacois wrote:
I've found operator-sdk [1] (which uses kubebuilder under the hood) to be a better starting point for operator development.

[1] https://github.com/operator-framework/operator-sdk

MuffinFlavored wrote:
Can you give me an example use case you've run into where you needed to write a custom k8s operator/API?

[deleted]

darren0 wrote:
I'm not sure why this is a top post. The definitions of controller and operator are completely wrong. The example code is for creating a custom API server, which is only done in the most advanced of advanced use cases. The implementation of the API server is too naive to demonstrate any understanding of the complexity that supporting watch will cause.

mfer wrote:
The article gets the description of what an operator is wrong. The definition of an operator originally was...

> An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts but includes domain or application-specific knowledge to automate common tasks.

This is the original definition of an operator [1]. People now use them for stateless things, and domain-specific work has taken off.
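darren0's remark about watch and dpkirchner's point that Prometheus and cert-manager simply watch for changes rest on the same mechanism: a controller keeps a local view of objects in sync from a stream of watch events. A minimal sketch of that client-side bookkeeping, assuming events shaped like the Kubernetes watch API's JSON (a `type` of ADDED, MODIFIED, DELETED, BOOKMARK or ERROR plus an `object`); `WatchCache` and its methods are hypothetical, and in Go this job is normally done by client-go informers rather than by hand:

```python
from dataclasses import dataclass, field


@dataclass
class WatchCache:
    """Hypothetical local cache kept in sync from Kubernetes-style watch events."""
    objects: dict = field(default_factory=dict)
    resource_version: str = ""
    needs_relist: bool = False

    @staticmethod
    def _key(meta: dict) -> tuple:
        # Objects are identified by namespace/name (cluster-scoped: no namespace).
        return (meta.get("namespace", ""), meta["name"])

    def handle(self, event: dict) -> None:
        etype, obj = event["type"], event["object"]
        if etype == "ERROR":
            # Typically a Status with code 410 (Gone): our resourceVersion is too
            # old, so the client must list everything again and restart the watch.
            self.needs_relist = True
            return
        meta = obj["metadata"]
        if etype in ("ADDED", "MODIFIED"):
            self.objects[self._key(meta)] = obj
        elif etype == "DELETED":
            self.objects.pop(self._key(meta), None)
        # BOOKMARK events only advance the resourceVersion; nothing to store.
        # Remembering it lets a reconnecting watch resume without missing events.
        self.resource_version = meta["resourceVersion"]

    def relist(self, items: list, resource_version: str) -> None:
        """Replace the cache with the result of a fresh LIST call."""
        self.objects = {self._key(o["metadata"]): o for o in items}
        self.resource_version = resource_version
        self.needs_relist = False
```

A custom API server has to produce exactly this kind of ordered, resumable event stream for its clients, which is part of the complexity darren0 alludes to.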

You can look at the Kubernetes docs [2] to see refinements on it...

> Kubernetes' operator pattern concept lets you extend the cluster's behaviour without modifying the code of Kubernetes itself by linking controllers to one or more custom resources. Operators are clients of the Kubernetes API that act as controllers for a Custom Resource.

[1] https://web.archive.org/web/20190113035722/https://coreos.co...

[2] https://kubernetes.io/docs/concepts/extend-kubernetes/operat...

richardwhiuk wrote:
You don't need to implement a custom API server to implement an operator; you can just watch a CR.

jhoelzel wrote:
For an operator you do; what you mean is a controller =)

[deleted]

timelapse wrote:
> The definitions of controller and operator are completely wrong.

Mind clarifying?

devkulkarni wrote:
We have an FAQ about Operators here: https://github.com/cloud-ark/kubeplus/blob/master/Operator-F...

It should be helpful if you are new to the Operator concept.

Operators are generally useful for handling domain-specific actions, for example performing database backups, installing plugins on Moodle/WordPress, etc. If you are looking for application deployment, then a Helm chart should be sufficient.

kimbernator wrote:
I didn't really enjoy my experience with the few operators I've worked with, mainly because they require the maintainer to build in some sort of access to basic Kubernetes functionality. I see the benefit of operators, but I hated that in order to do something as simple as defining memory/CPU limits for certain containers, I would need to open a PR to the repo and wait weeks, sometimes months, for a new release.

It's frustrating to be a Kubernetes admin but not have access to basic configuration options because the maintainers of even some very high-profile operators (looking at you, AWX) neglected to build in access to basic functionality.

evancordell wrote:
This is a common frustration of mine as well!

In the latest release of the spicedb-operator [0], I added a feature that allows users to specify arbitrary patches over operator-managed resources directly in the API (examples in the link).

There are some other projects like Kyverno and Gatekeeper that try to do this generically with mutating webhooks, but embedding a `patches` API into the operator itself gives the operator a chance to ensure the changes are within some reasonable guardrails.

[0]: https://github.com/authzed/spicedb-operator/releases/tag/v1....

remram wrote:
The SpiceDB operator looks like a prime example of something that should have been a Helm chart. Migrations can be run in the containers.

Operators are just the non-containerized daemons of the Kubernetes OS. We did all this work to run everything in neatly encapsulated containers, and then everyone wants to run stuff globally on the whole cluster. What's the point? Do we just containerize clusters and start over?

xyzzy_plugh wrote:
I'm not sure what you're on about. Operators don't need to run in-cluster at all. And even then, they can absolutely run as containers. And as far as permissions go, that's up to you. They're just regular service accounts.

evancordell wrote:
I get the sentiment. We held off on building an operator until we felt there was actually value in doing so (for the most part, Deployments cover the operational needs pretty well).
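kimbernator's complaint and evancordell's `patches` field meet at one idea: let users overlay changes on the manifests an operator generates, while the operator keeps the ability to veto dangerous ones. A minimal sketch, assuming JSON Merge Patch (RFC 7386) semantics; the spicedb-operator's actual patch format may differ, and `merge_patch`, `apply_user_patch` and the selector guardrail are illustrative choices, not its code:

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) to plain dict/list/scalar data."""
    if not isinstance(patch, dict):
        # Non-object patches (including lists) replace the target wholesale.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this field"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def apply_user_patch(generated: dict, patch: dict) -> dict:
    """Hypothetical operator hook: overlay a user patch on a generated Deployment."""
    # Example guardrail: spec.selector of an apps/v1 Deployment is immutable,
    # so reject patches that touch it instead of failing later at the API server.
    spec_patch = patch.get("spec", {})
    if not isinstance(spec_patch, dict) or "selector" in spec_patch:
        raise ValueError("patch may not replace spec or modify spec.selector")
    return merge_patch(generated, patch)
```

Merge patch replaces lists wholesale, so a user adding `resources` to one container has to repeat the whole `containers` entry. Kubernetes' strategic merge patch avoids this by merging known lists by key (containers by `name`), at the cost of needing schema knowledge.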

Migrations can be run in containers (and they are, even with the operator), but it's actually a lot of work to run them at the right time, only once, with the right flags, in the right order, waiting for SpiceDB to reach a specific spot in phased migrations, etc.

Moving from v1.13.0 to v1.14.0 of SpiceDB requires a multi-phase migration to avoid downtime [0], as any phased migration for a stateful workload might. The operator will walk you through the phases correctly, without intervention. Users who aren't running on Kubernetes or aren't using the operator often have problems running these steps correctly.

The value is in this automation, but also in the API interface itself. RDS is just some automation and an API on top of EC2, and I think RDS has value over running Postgres on EC2 myself directly.

As for Helm charts, this is just my opinion, but I don't think they're a good way to distribute software to end users. The interface for a Helm chart becomes polluted over time in the same way that most operator APIs become polluted over time, as more and more configuration is pulled up to the top. I think Helm is better suited to managing configuration you write yourself to deploy on your own clusters (I realize I'm in the minority here).

[0]: https://github.com/authzed/spicedb/releases/tag/v1.14.0

ojhughes wrote:
Adding the patch API is neat! I've solved this in the past by embedding the entire PodSpec etc. into the CRD.

remram wrote:
Did you call your CRD "Deployment"?

sklarsa wrote:
I might have to borrow that! Very clever.

hintymad wrote:
> I would need to open a PR to the repo and wait weeks, sometimes months, for a new release.

Just curious, is this a limitation of the Operator framework, or of your system's implementation? My knee-jerk reaction is that any implementation should absolutely not require opening a ticket.
After all, Amazon's API mandate happened 20 years ago, and Netflix followed suit to achieve phenomenal productivity for their engineers. I have a hard time imagining why any engineer would think that gatekeeping configuration with a PR is a good idea (a UI with proper automation and an approval process that hides a generated PR for specific use cases is a different matter).

IceWreck wrote:
Not a Kubernetes expert, but my understanding is that operators are regular programs that run in a Kubernetes container and interact with the Kubernetes API to launch/manage other containers and custom Kubernetes resources.

An operator (or its custom resource) can be configured via Kubernetes YAML/API, and it's up to the creator of the operator to specify the kind of configuration. If the operator creator did not provide options to set CPU/memory limits on the pods managed by the operator, then you can't do anything. You have to add that feature to the operator, make a pull request, and wait for it to be upstreamed.

Or fork it instead. Same thing for Helm charts (except forking and patching them is easier than forking an operator).

fedreg wrote:
Here's another example of a custom Rust operator: https://github.com/mach-kernel/databricks-kube-operator

Written by a co-worker to help manage our Databricks projects across clusters. Works wonderfully!!

alexott wrote:
But why such complexity? Is it easier to maintain than Terraform code?

EdwardDiego wrote:
Yes. Terraform doesn't actively manage resources; operators do.

jhoelzel wrote:
Oh, I love operators; they usually tie the entire cluster together and lead to amazing things! Think of Kubernetes as an advanced API server that can be extended endlessly, and operators are the way to do it.

There really is no magic, it's all there, and with Go the images are usually, what, about 10 MB?
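The choice sigwinch28 describes, and EdwardDiego's one-line answer to alexott, come down to when the diff runs rather than how it is computed. A minimal sketch under simplifying assumptions (flat dicts stand in for resources; `diff`, `reconcile` and `controller_loop` are hypothetical names, not any tool's API):

```python
def diff(desired: dict, live: dict) -> dict:
    """Fields whose live value differs from the desired one.

    Simplification: only fields named in `desired` are enforced; extra
    live fields are left alone.
    """
    return {k: v for k, v in desired.items() if live.get(k) != v}


def reconcile(desired: dict, live: dict) -> dict:
    """Compute the delta and apply it; live.update() stands in for API calls."""
    delta = diff(desired, live)
    live.update(delta)
    return delta


def controller_loop(desired: dict, live: dict, events) -> list:
    """Operator style: reconcile after every event (watch event or resync tick)."""
    applied = []
    for event in events:
        event(live)  # something changes the world: a user, another controller, a crash
        applied.append(reconcile(desired, live))
    return applied
```

The IaC style is a single `reconcile` call made when you run the tool (Terraform does surface drift on the next `terraform plan`, but only when someone runs it); the operator style is `controller_loop`, where a manual `kubectl scale` is undone on the next pass. The diff logic is identical; what differs is who triggers it and how often.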

It's essential to have a solid understanding of Kubernetes architecture, concepts such as custom resources and controllers, and the tools and APIs available for working with Operators.

Don't use Rust though; use an SDK like the Operator SDK or kubebuilder. It's native to k8s and you will have a much easier time too.

Thaxll wrote:
Using Rust for that is a bad idea; just use the official and native SDKs (in Go). Rust does not have any equivalent to https://sdk.operatorframework.io/

jzelinskie wrote:
Since Go got generics, working with the Kubernetes API could become far more ergonomic. It's been like pulling teeth until now. I'm eager to see how the upstream APIs change over time.

In the meantime, one of the creators of the Operator Framework [0] built a bunch of useful patterns using generics, called controller-idioms [2], that we used to build the SpiceDB Operator [1].

Does anyone know of other efforts to improve the status quo?

[0]: https://operatorframework.io

[1]: https://github.com/authzed/spicedb-operator

[2]: https://github.com/authzed/controller-idioms

crabbone wrote:
I've written (well, participated in the development of) two Kubernetes operators, and support about a dozen of them (in our own deployment of Kubernetes): Jupyter, PostgreSQL, a bunch of Prometheus operators and a handful of proprietary ones.

In my years of working with Kubernetes I cannot shake the feeling that it's basically an MLM. It carefully obscures its functionality by hiding behind opaque definitions. It doesn't really work when push comes to shove. And, most importantly, it survives in a parasitic kind of way: by piggybacking on those who develop all kinds of extensions, be it operators, custom networking or storage plugins, authentication and so on.

My problem is I cannot find who stands at the top of the pyramid.
There's the Cloud Native Computing Foundation, but all it does is sell certifications nobody really needs... so that cannot possibly be it. No big name really benefits from this in an obvious way...

So... anyway, when I hear people argue about how to implement this or that extension of Kubernetes, it rings the same as when people argue about styles of agile, or code readability, etc.: nonsense. There isn't a good way. There are no acceptance criteria. The whole system is flawed to no end.

_muff1nman_ wrote:
This article is mistaken from the get-go, as an operator is not the same as an APIService. Rather, an operator is a wider term for something that includes a controller. See https://kubernetes.io/docs/concepts/extend-kubernetes/operat...

Also, it's important for people reading this article to know that an APIService (which this article talks about) is very rarely something that should be done. An operator is more appropriate for nearly all cases, except when you truly need your state stored outside of the internal Kubernetes etcd datastore.

reedjosh wrote:
Custom Resource + Controller = Operator. Good call!

> Operators are clients of the Kubernetes API that act as controllers for a Custom Resource.

jhoelzel wrote:
Exactly! Controlling refers to directing or regulating the behavior of something, while operating refers to the actual execution or manipulation.

tenac23 wrote:
After reading the comments, we updated the article.

rdtsc wrote:
You have a problem: orchestrating something in kube, so you write some custom operator logic running alongside your main product; but now you have two problems to worry about.

I've seen just as many issues, if not more, with debugging the operator logic itself as with the main pods/deployments it was trying to manage.

So just from a practical point of view, I think it should be a last resort after everything else fails (Helm charts, etc.).

___________________________________________________________________
(page generated 2023-03-09 23:01 UTC)