How do you observe and manage geographically distributed observability pipelines built from thousands of active components that deliver observability data? Can you do that in hybrid environments, for enterprises using both cloud-native and legacy on-premises technologies in a unified way? Everybody knows that Kubernetes has excellent tools and concepts, but it is a container orchestration platform. Or is it really just that? What’s left if we remove the pods, the services and the nodes from Kubernetes? Can we even do that?
The Kubernetes Resource Model
Let’s look at the Kubernetes Resource Model (KRM) first to understand what is behind container orchestration. I’ll try to summarize: Kubernetes is API-centric (REST-like) but not API-driven. What does that mean? It means that the business logic lives in controllers running outside the API server. Why is this a good thing? Because the API is extensible, and we can write custom controllers for the custom resources we define. We do that a lot, and we call it the operator pattern.
The Kubernetes API dictates the following patterns:
- Metadata, Spec and Status for a declarative data model – controllers act on the Spec, trying to make the observed state match the desired state, and record the outcome in the Status
- Optimistic locking – multiple actors can update the same resource, typically:
  - Clients set the desired state through the Spec
  - Controllers report the observed state through the Status
- Support for efficiently “watching” resources (getting updates instantly as they happen)
- Built-in authentication and authorization model based on industry standards
How does the ecosystem help the API clients?
- Client libraries with code generation (client-go)
- Kubectl as an extensible command line API client
- Client-side and server-side apply (in the latest Kubernetes versions)
This model enables very powerful patterns for controllers acting on these resources.
The Kubernetes Resource Model is level-triggered instead of edge-triggered, which means it doesn’t really care about what happened; it only cares about the desired state and its ability to converge towards it. It also provides autonomy by distributing data and decentralizing decision making:
“It’s fine to centralize data, but avoid centralized decision making whenever possible.
Instead, distribute information about the desired state and let each node determine how best to get to that state.”
gengnosis: Level-triggered and edge-triggered
Why is this so cool? Because it’s fault-tolerant and extremely robust. The API server and controllers can come and go, and when resumed will pick up the tasks where they left off. And it’s also great for interoperability:
“Any API using the same mechanisms and patterns will automatically work with any libraries and tools (e.g., CLIs, UIs, configuration, deployment, workflow) that have already integrated support for the model…”
How does it work in practice?
For example, client-go builds a sophisticated in-memory representation of the API server’s state – at least of the view it is interested in. It uses edge-based triggers by watching certain resources, but instead of receiving diffs between state changes, it always receives the full object, and builds an in-memory database and index from it locally. Controller-runtime takes this even further: the client code is not even aware that it is talking to a local cache instead of the API server.
Controllers can watch multiple resources, but they are primarily responsible for the instances of a single resource type. Most of the time it is a hierarchy of resources and controllers working together. The best examples of this are the workload types: Deployments, DaemonSets and StatefulSets. For example,
- based on Deployments, the Deployment controller creates ReplicaSets,
- the ReplicaSet controller creates Pod resources, and
- finally Kubelet (also a controller) launches containers on the hosts.
But all of this is also true without the notion of pods, nodes, services, and the whole army of tools to support it: workload controllers, kube-proxy and kubelet. Imagine that you can use the same tools and libraries to manage your own domain objects! You don’t have to:
- roll your own CRUD management,
- bother with OpenAPI schemas,
- generate client libraries, and
- write your own CLI tool.
You can have all this for free! (Well, almost…)
I think this already gives you an idea of why this is useful, but let’s look at a more concrete example – the Axoflow use case.
The Axoflow domain
At Axoflow, one of our priorities is to efficiently manage log collection and forwarding in traditional and cloud native infrastructures. By traditional I mean machines that typically run a single type of workload. In that case we have the notion of Hosts and Logging services running on them. This is, in some way, analogous to Nodes and Containers in Kubernetes. We also have a controller running on the host, analogous to the Kubelet, but instead of managing containers, it deals with the Logging services.
How would you start writing a central orchestration service to control possibly thousands of agents in a robust and fault tolerant way? How would that scale with the complexity you are going to add through the product life cycle? Isn’t this something the Kubernetes ecosystem solves ingeniously under the hood? Hosts and Logging services are low-level resources, but we also have higher level abstractions. Each abstraction level has a specific set of user archetypes that typically works with it. This is a hierarchy of resources and controllers, very similar to how it works in Kubernetes in the aforementioned example.
- The lowest level is for the controller running on the host to discover and manage the logging agents’ lifecycle.
- The next level is for the controller that generates configuration for the logging agents based on the direct paths between hosts and the protocols they support.
- On an even higher level, an administrator can define the possible routes between groups of hosts and external log destinations.
- And the end users can define the log flows they would like the system to implement.
You don’t need to fully understand what these resources are good for. I’m showing them to demonstrate a real-world example that maps to a hierarchy of abstractions and requires different management components acting at each level of that hierarchy. Of course, we started by implementing our own REST API and workers for the job, but we soon realized that we would need something very close to what the Kubernetes API could give us.
I found kcp.io when I was looking for existing alternatives to our hand-crafted system. At that time it mostly looked like a complex multi-cluster solution that helps with managing lots of clusters from a single control plane, using a new abstraction called workspaces. It turned out that this abstraction (the workspace), although higher level than the “cluster” itself, gets rid of the workload APIs and their controllers completely. It exposes pure Kubernetes APIs, which means you can simply start kcp and talk to it like you would with a Kubernetes API server, using kubectl. It doesn’t have nodes, pods or services, but it has all the primitives you need to write controllers that work with Custom Resources: namespaces, all the RBAC resources, secrets, configmaps, and leases to implement leader election. And just like we can use kubectl as our CLI tool, we can use controller-runtime to write controllers.
Unlike vcluster – which in theory could also solve our problem – kcp doesn’t depend on an actual Kubernetes cluster to run, it is completely standalone. Start kcp and it will immediately provide you with workspaces as cheap as a namespace in a regular Kubernetes cluster.
In kcp, multi-tenancy is implemented through workspaces. A workspace is a Kubernetes-cluster-like HTTPS endpoint: one that the usual Kubernetes client tooling (client-go, controller-runtime and others) and user interfaces (kubectl, helm, a web console, …) can talk to as if it were a Kubernetes cluster. A workspace is just a reference, it doesn’t have to run anything, and although it looks like a fully separate cluster, it doesn’t require one by default. You can attach a physical cluster to a workspace, but that’s completely optional.
How does all this add up?
Back to our example at Axoflow, let’s look at a specific task we implemented using kcp. Although we don’t manage hosts ourselves, we have an agent called axolet that, when deployed to a host, is capable of discovering and managing its logging services. It runs on a customer’s host in an on-prem network and talks to a Kubernetes API endpoint – a kcp workspace running inside the Axoflow hosted service. The agent is actually a Kubernetes controller implemented using controller-runtime: it watches Host resources and looks at the Spec to understand what it has to do – for example, install syslog-ng with a specific config. It then writes the most important information back to the Status (for example, the control socket of syslog-ng, where the configuration is stored, and so on).
Provisioning workload certificates
Another useful feature on top of this is the ability to provision certificates, so that a service running on a host can connect and send logs to an aggregator component secured with a mutual TLS connection. In order to do this, the first thing that comes to a Kubernetes developer’s mind is, naturally, to use cert-manager! But how can we do that if kcp itself cannot run actual container workloads? The answer is that we can actually run cert-manager anywhere we want (for practical reasons, we run it in Kubernetes as well) as long as it can talk to the kcp endpoint directly.
I’ve created a project to demonstrate how this actually works for those who would like to get hands-on with the tool and a simplified version of the mentioned use case: pepov/kcp-playground.
Although kcp is capable of much more, we use it as a bare-bones Kubernetes API server without container workload management, so that we can rapidly implement something entirely unrelated to Kubernetes while still leveraging the army of awesome patterns and tools provided by the ecosystem. Not every problem can be solved this way, but in our case it was an obvious choice to stand on the shoulders of a giant and focus on building our core business.