December 6, 2023


Multi-tenancy using Logging operator

The Logging operator was envisioned with multi-tenancy in mind from the beginning. It was designed to leverage Kubernetes namespaces to isolate logging flows (we will see what we mean by that in a minute) so that developer teams can define their log forwarding rules themselves, while not having to worry about how the log infrastructure is set up.

This has been a very useful concept, but the implementation turned out to be limited in certain situations. We – as the core maintainers of the Logging operator – are committed to pushing the project over these limitations to be the go-to solution when it comes to secure and efficient multi-tenant logging in Kubernetes.

Log isolation in multi-tenant environments

Why do we need log isolation at all? Kubernetes has an API endpoint to query or tail logs which is also protected by standard RBAC to control permissions. Unfortunately, by design it is not appropriate for automated log shipping, but rather for relatively infrequent on-demand log queries.
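To illustrate the RBAC protection mentioned above, here is a minimal sketch of a Role that allows reading pod logs through the Kubernetes API in a single namespace. The role name and the namespace are hypothetical and only serve as an example.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-log-reader          # hypothetical name
  namespace: team-a             # hypothetical tenant namespace
rules:
  - apiGroups: [""]             # core API group
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]      # enough to find pods and read or tail their logs
```

Bound to a team with a RoleBinding, this limits on-demand log queries to that team's namespace, which is exactly the boundary that is lost once logs are read from the node directly.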

For a comprehensive solution, you have to go to the source of the logs, which is the container runtime on the nodes. The typical solution is to run a log agent as a DaemonSet and mount the single host folder that contains all the logs. Mounting a host path – even in read-only mode – crosses a security boundary, and once that boundary is crossed there are no further permission checks to limit which application logs the agent can process. This can be a real problem in multi-tenant environments, where multiple teams or customers are sharing the same cluster.
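As a rough sketch of that pattern, a generic node agent could be deployed like this. The names, the namespace, and the image are placeholders, and the host path assumes the common layout where container logs live under /var/log.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent               # placeholder name
  namespace: logging            # placeholder namespace
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: example.com/log-agent:latest   # placeholder image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true    # read-only, but still every tenant's logs
      volumes:
        - name: varlog
          hostPath:
            path: /var/log      # assumed log location on the node
```

Nothing in this manifest can express "only the logs of namespace X": the agent sees whatever is in the mounted directory.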

Soft and hard multi-tenancy

There are several ways to think about isolation in Kubernetes, especially when it comes to logs. Soft multi-tenancy means that tenants trust each other to a certain degree, while hard multi-tenancy converges toward sharing as few resources as possible.

In the soft form tenants typically share as many resources as possible to gain cost effectiveness and keep operations complexity low at a price of being less secure. Running the workloads of multiple teams or customers in the same Kubernetes cluster is already a type of soft multi-tenancy when we look at it from the cluster perspective. The goal in our case is to support the spectrum of log isolation levels within the bounds of a single cluster.

Let’s see how the Logging operator helps with restoring the lost permission boundary of logs after collection.

Restoring the lost isolation

Traditionally container logs are handled by Fluent Bit as the node agent of choice because of its exceptional performance and low resource footprint. The implementation has always been very simple to avoid complexity at this point in the system since the routing capabilities in Fluent Bit are rather limited. The tag and match concept is useful, but for anything but trivial use cases, it can get very complex and fragile. There is no easy way, for example, to match different Kubernetes labels (keys and values) at once, which was the second most important requirement we wanted to fulfill (besides supporting Kubernetes namespace-based routing).

Fluent Bit is lightweight enough, so it was a deliberate choice to keep it as the collector running on all nodes, but because of the reasons above we decided to implement all the flexibility in the aggregation layer. Fluentd and syslog-ng are both capable of providing the flexibility we need here. Fluentd has plugins for the vast majority of possible output destinations, while syslog-ng (which was added later) has exceptional performance characteristics while still having support for the most popular outputs.

We can see now how the aggregator becomes the place where all the logs are accumulated and where the permission boundaries need to be reimplemented. But no worries; there is nothing new to learn regarding permissions since the implementation is primarily based on the Kubernetes Namespace model.

The Flow model

A Flow consists of selectors to be able to define a set of logs we want to filter or transform and forward to one or more Outputs in the end. Initially, selectors were simple Kubernetes labels, but later host and container names were added as options as well (for syslog-ng, any log metadata can be used as a selector). The Source router on the diagram represents the logic that understands the Flow selectors (dotted lines) and routes logs in a way that every Flow works on its own copy so they cannot affect each other’s data.

Here is an example Flow that is going to forward logs of every pod that has the label app.kubernetes.io/name: log-producing-app to the output named http.

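apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: example
  namespace: default
spec:
  match:
    # select logs of pods carrying this label
    - select:
        labels:
          app.kubernetes.io/name: log-producing-app
  # Outputs in the same namespace that receive the selected logs
  localOutputRefs:
    - http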
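
For completeness, the Output named http referenced by this Flow could look roughly like the following sketch. The endpoint URL is hypothetical, and the fields under http are an assumption based on the Fluentd HTTP output, so check them against the output reference of your operator version.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: http                    # matches the name in localOutputRefs
  namespace: default            # must be in the same namespace as the Flow
spec:
  http:
    # hypothetical receiver; replace with a real endpoint
    endpoint: http://log-receiver.example.svc:8080/
```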

Flow and Output are namespace-scoped Kubernetes Custom Resources, and the aggregator configuration is generated based on these higher-level constructs automatically. The implementations in Fluentd and syslog-ng are different, but the point is that the user can only refer to logs that originated in the same namespace where the Flow resource resides.

ClusterFlow is different, as there is no namespace filtering applied to it by default. It is designed for administrators so that they can define log flows over multiple – or even all – namespaces. ClusterOutputs are the outputs to be used from ClusterFlows but also from namespaced Flows to provide a shared output reusable from multiple namespaces. Again, all of these components are implemented in the aggregator.

There is another important resource, which is the central part of the system: the Logging resource. It is responsible for defining the operational attributes of the collector and the aggregator, and for setting the boundaries of the entire logging domain, as we will see in a moment.

Take a look at the following diagram, which summarizes the capabilities of the above resources and provides us with an example of multi-tenancy for developer teams:

Developers of one team (team A) can handle and forward their logs on their own while another team (team B) can leverage the ClusterOutput provided for them as a shared resource. Operations can use a ClusterFlow for defining global rules to fulfill requirements like archiving all logs for security reasons.
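
The team B and operations parts of this scenario could be sketched as follows. All names and namespaces are illustrative, the referenced ClusterOutputs (shared-output and archive) are assumed to exist already in the control namespace, and logging is only an assumed name for that control namespace.

```yaml
# Team B: a namespaced Flow that forwards to a shared ClusterOutput
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: team-b-apps             # illustrative name
  namespace: team-b             # illustrative namespace
spec:
  match:
    - select: {}                # every log that originates in this namespace
  globalOutputRefs:
    - shared-output             # ClusterOutput provided by operations (assumed)
---
# Operations: a ClusterFlow that archives logs from all namespaces
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: archive-all             # illustrative name
  namespace: logging            # assumed control namespace
spec:
  match:
    - select: {}                # no namespace filtering by default
  globalOutputRefs:
    - archive                   # ClusterOutput for the archive (assumed)
```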

Follow along to learn more about what the logging domain is and what other types of isolation the Logging operator provides.

> Note: the terms Flow, ClusterFlow, Output, and ClusterOutput are used here for both the Fluentd and the syslog-ng variants of flows and outputs, even though these are separate Kubernetes resources.

Isolation levels

Namespaced resources for soft multi-tenancy

As you can see, the primary level of isolation is the namespace, enforced in the aggregator. Operations can grant permissions for a group of users to create and manage Flow and Output resources in their own namespaces. There is no resource isolation on the collector, or on the aggregator level either. This might be good for resource sharing between multiple teams who should not have access to each other's logs but need to share resources to keep costs at a minimum.
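
Granting those permissions is plain Kubernetes RBAC. A minimal sketch, assuming a hypothetical team-a namespace and a hypothetical group name coming from your identity provider:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: logging-flow-editor     # hypothetical name
  namespace: team-a             # hypothetical tenant namespace
rules:
  - apiGroups: ["logging.banzaicloud.io"]
    resources: ["flows", "outputs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: logging-flow-editor
  namespace: team-a
subjects:
  - kind: Group
    name: team-a-developers     # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: logging-flow-editor
  apiGroup: rbac.authorization.k8s.io
```

Because the Role only covers the namespaced resources, members of the group cannot create ClusterFlows or touch the Logging resource itself.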

The logging domain for hard multi-tenancy

The logging domain is the set of Kubernetes resources that maps into a specific runtime configuration applied by the operator. It has its own collector and aggregator, so we can think about it as a hard tenant, where no compute resources are shared (at least not on the aggregator level).

Logging reference

The loggingRef field is available in most resource types to form an isolated logging domain where all resources belong to a specific Logging resource identified by the loggingRef. Since the collector does not have any routing or filtering capabilities by default, the use cases in which a logging domain can be used as a tenant on its own are limited.

However, one great example use case for this is node isolation based multi-tenancy, where the workloads of a tenant are restricted on specific nodes. In that case, every tenant has its own Logging resource. Logs are isolated since every collector can only collect and forward the logs originated on the tenant’s nodes to an aggregator residing in the same tenant. Configuration is also isolated because Flows and Outputs strictly belong to a single Logging domain dictated by the loggingRef field on all resources.

[Figure: Logging operator multi-tenancy, logging domain with loggingRef]

Note: in this case, the loggingRef field must be protected using some kind of policy framework, otherwise users might bind their resources to a different logging domain.
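
A node isolation setup for one tenant could be sketched like this. The names, the control namespace, and the node label are hypothetical, and the nodeSelector on the collector is an assumption about how the tenant's nodes are identified in your cluster.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: tenant-a
spec:
  loggingRef: tenant-a          # identifies this logging domain
  controlNamespace: tenant-a-logging   # hypothetical namespace
  fluentd: {}                   # the tenant's own aggregator
  fluentbit:
    nodeSelector:
      tenant: a                 # hypothetical label on the tenant's nodes
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: app-logs                # hypothetical name
  namespace: tenant-a-apps      # hypothetical namespace
spec:
  loggingRef: tenant-a          # binds the Flow to the domain above
  match:
    - select: {}
  localOutputRefs:
    - http                      # an Output with the same loggingRef (assumed)
```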

Namespace restrictions to limit the logging domain

Resources with an empty loggingRef are processed by all Logging resources by default, except if the Logging defines a watchNamespaces or watchNamespaceSelector field (the former is a static list, while the latter is a label selector, as its name suggests). In that case, Flow and Output resources are only pulled into the logging domain if they reside in one of the watched namespaces.

ClusterFlow and ClusterOutput resources must reside in a specific control namespace to be processed, typically owned by the operations team.

The watchNamespace* parameters configured in the Logging resource also help to ensure that users cannot bind their Flow and Output resources to other logging domains, even if they try to set a different loggingRef.
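
As a sketch of the label selector variant, the following Logging resource would only pull in Flows and Outputs from namespaces labeled for the tenant. The resource name, control namespace, and label are hypothetical.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: tenant-b
spec:
  controlNamespace: tenant-b-logging   # hypothetical namespace
  fluentd: {}
  # only namespaces carrying this label belong to the logging domain
  watchNamespaceSelector:
    matchLabels:
      tenant: b                 # hypothetical namespace label
```

With a selector, a new namespace joins the domain as soon as it gets the label, so the label itself should only be settable by operations.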

Traditional limitations

Previously the Logging resource could only have one collector defined in the spec. That is a serious limitation in case you have multiple different types of nodes. EKS for example switched from the Docker runtime to containerd, which required a rolling upgrade of nodes. It is now possible to support multiple FluentbitAgent resources using the same configuration scheme as before, but in a separate resource type to allow diverging configs.
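
A sketch of two collectors for two node types might look like the following. The agent names and node labels are hypothetical, and we assume here that each FluentbitAgent is tied to its Logging resource through loggingRef and accepts a nodeSelector just like the embedded collector spec did, so verify both against the CRD reference of your version.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: docker-nodes            # hypothetical name
spec:
  loggingRef: example           # assumed: the Logging this agent belongs to
  nodeSelector:
    runtime: docker             # hypothetical node label
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: FluentbitAgent
metadata:
  name: containerd-nodes        # hypothetical name
spec:
  loggingRef: example
  nodeSelector:
    runtime: containerd         # hypothetical node label
```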

Another painful limitation was that the collector wasn’t able to filter and route logs to be sent to the aggregator. The problem with this is if someone wanted to create multiple logging domains on the same cluster, each domain’s aggregator would receive the same set of logs.

In this example, we have two nodes and two tenants owning two separate logging domains, but sharing the same nodes. Unfortunately, all the logs from both nodes are sent to both aggregators. This means that we use double the bandwidth, but also that tenants receive each other’s logs as well, so they are back to soft multi-tenancy. Let’s see how that has finally changed in the latest release!

Hard multi-tenancy on shared nodes: the LoggingRoute

Logging operator release 4.4 introduced a new resource called LoggingRoute. It adds namespace filtering and cross-domain routing capabilities to the collector.

It instructs the collector of a logging domain to send logs to one or more aggregators in the same or different domains. Based on the watch namespaces configured in the target Logging resources, it also applies namespace filters, so that every aggregator receives logs from their tenant’s namespaces only.

In the above example, we have two user tenants (A and B) with their own Logging domains, which are essentially two Logging resources. These Logging resources define an aggregator each, but no collectors. There is a third logging tenant (Ops) that is ultimately the collector in the system. It may run its own aggregator as well and can receive logs from all namespaces, to implement cluster-wide logging flows.

This is what the LoggingRoute resource looks like in the above scenario:

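apiVersion: logging.banzaicloud.io/v1beta1
kind: LoggingRoute
metadata:
  name: ops-to-tenants
spec:
  # the logging domain whose collector forwards the logs
  source: ops
  # label selector over Logging resources; matchExpressions is a list
  targets:
    matchExpressions:
      - key: tenant
        operator: Exists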

Given that, the only thing we have to make sure of is to add the proper tenant label to the appropriate logging resources. We use a label selector here because we want to avoid adding each and every target individually by name. Remember, the controller filters logs sent to a target tenant based on the namespaces defined in the target Logging’s watch namespaces.

Here is what Logging A would look like:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: tenant-a
  labels:
    tenant: a                      # LoggingRoute matches on this label
spec:
  syslogNG: {}                     # the aggregator
  watchNamespaces: ["a1", "a2"]    # LoggingRoute configures the forwarded namespaces based on this field
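
The remaining two Logging resources of the scenario could be sketched along the same lines. The namespaces b1 and b2 and the control namespace are hypothetical, and defining the Ops collector inline with fluentbit: {} is an assumption made for brevity.

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: tenant-b
  labels:
    tenant: b                      # picked up by the LoggingRoute selector
spec:
  syslogNG: {}                     # aggregator only, no collector
  watchNamespaces: ["b1", "b2"]    # hypothetical tenant namespaces
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: ops                        # referenced as the LoggingRoute source
spec:
  controlNamespace: ops            # hypothetical control namespace
  fluentbit: {}                    # the only collector in the cluster (assumed inline)
  syslogNG: {}                     # optional aggregator for cluster-wide flows
```

Note that the ops resource carries no tenant label, so it is not a target of the ops-to-tenants route.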

Non-functional considerations

In this specific scenario, there is a caveat: one collector can handle and route the logs of all the workloads in all tenants, but that very same collector has now become the single point of failure. Up until now, it had to handle a single output and forward all the logs to it, but now it routes logs to all the tenants as individual outputs. If not configured correctly, one misbehaving output might affect the others in availability, performance, or even disk space (in case we use disk buffering).

In theory, we can configure a LoggingRoute for each logging domain, but it would require a running collector agent daemonset for each tenant, which would be a waste of resources if there are lots of them. Should there be too many tenants to handle by one collector process, there would still be an option to shard tenants into isolated layers with a single administrative logging domain each to make sure failures are not cascading through the whole system.

Conclusions and what’s next

We have seen how the Logging operator has evolved and continues to improve its support of multi-tenancy requirements. The collector, which processes all the logs on the nodes, now has the ability to filter and route – but only based on a very limited set of attributes (logging domains and their namespaces) so that the complexity is limited at the level where tenants’ logs are processed together. This is important to minimize the possibility of introducing bugs and to ensure the code is maintainable.

On the other hand, we can configure an aggregator that resides in one of the tenant’s namespaces, which is great because tenants can configure logging flows without affecting each other. These flows will also have full visibility into the runtime state to get direct feedback about performance issues or runtime errors. One thing is still missing though: the runtime configuration of the aggregator resides in the Logging resource, which should not be made available for tenant owners to edit, because it contains fields they shouldn’t be allowed to modify (the control namespace, for example). To solve this, there is ongoing work to allow configuring the aggregator through a separate CRD that tenant owners will be able to fully control – so stay tuned!