Introduction to Logging Operator
The Logging Operator (a CNCF project) is commonly used to handle log collection scenarios on Kubernetes. Its goal is to provide an end-to-end solution for collecting logs on Kubernetes in different scenarios. However, as time passed, new challenges emerged in this field:
- New technologies like OpenTelemetry
- New use-cases like multi-tenant Kubernetes
- Lessons learned from years of developing and operating the Logging Operator, for example, backpressure, noisy neighbors, and configuration conflicts.
Solving all these problems would require drastic architecture changes in the Logging Operator, so we decided to take a different path. The scope of the Logging Operator was already too big, so we split its responsibilities between the Logging Operator and a new project, the Telemetry Controller:
- The Telemetry Controller collects and labels data on the edges, enriches it, and forwards it to the next layer of your logging infrastructure. It doesn't do any complex processing or error handling; its task is to provide a consistent, well-formed data flow. For more details, see our introductory blog post about the Telemetry Controller.
- The Logging Operator provides the aggregation and buffering layer and performs complex routing, processing, and tenant-specific operations.
Why do we need the Telemetry Controller?
One of the key features of the Logging Operator is self-service. However, merging configurations for different use cases is not an easy task, and is difficult to troubleshoot because the abstraction layer hides information about the underlying problems. For example, a syntactically correct configuration isn't necessarily a working one: if you misconfigure a destination or use resource-intensive transformations, you can easily block the whole flow.
Multi-tenant use-case
The Logging Operator was designed to leverage Kubernetes namespaces to isolate logging flows, so that tenants can define their log forwarding rules themselves without having to worry about how the logging infrastructure is set up. This becomes complicated when you consider different scenarios like soft and hard multi-tenancy.
Soft multi-tenancy means that tenants trust each other to a certain degree, while hard multi-tenancy converges toward sharing as little resources as possible. The goal in our case is to support the spectrum of log isolation levels within the bounds of a single cluster.
The problems with this approach become increasingly obvious when you operate a real multi-tenant Kubernetes cluster, for example, when:
- You are a platform provider for internal/external teams.
- You are sharing infrastructure with your customers, who can configure their own logging to some extent.
- You are running multiple applications on the same cluster, but want to prevent them from affecting each other.
Difference between Logging Operator and Telemetry Controller
The Telemetry Controller is a log collector that implements filtering and routing right at the edge, rather than at the aggregation level. Basically, it extends the flexibility of the Logging Operator to the edge. If there is no need for aggregation, the Telemetry Controller can send all or a filtered portion of your data directly to a remote destination. In such scenarios, it completely replaces other agents like Fluent Bit, Vector, or Promtail. The Telemetry Controller also provides flexible routing capabilities out of the box with minimal complexity.
The Telemetry Controller provides a convenient and robust multi-tenant API on top of OpenTelemetry, so you can just describe what telemetry data you need, and where it should be forwarded. The Telemetry Controller provides isolation and access control for telemetry data, similar to what Kubernetes provides for pods, secrets, and other resources. It introduces new resources that give granular control over the shared data:
- Administrators can define tenants to provide isolation and access control for telemetry data.
- Users can create subscriptions to select telemetry data streams accessible by their tenant only.
- Users can create or refer to available outputs for subscriptions to route and transport data.
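To give you an idea of how these resources fit together, here is a minimal illustrative sketch. The exact field names (such as logSourceNamespaceSelectors or ottl) are assumptions based on early Telemetry Controller releases and may differ in your version; see the Telemetry Controller documentation for the authoritative schema.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Tenant
metadata:
  name: example-tenant
spec:
  # Namespaces whose logs this tenant is allowed to access (assumed field name)
  logSourceNamespaceSelectors:
    - matchLabels:
        tenant: example
  # Namespaces where this tenant's users may create subscriptions (assumed field name)
  subscriptionNamespaceSelectors:
    - matchLabels:
        tenant: example
---
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Subscription
metadata:
  name: example-subscription
  namespace: example-team
spec:
  # Condition selecting the tenant's data streams; exact field name and syntax are version-dependent
  ottl: 'route()'
  outputs:
    - name: example-output
      namespace: example-team
---
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: OtelOutput
metadata:
  name: example-output
  namespace: example-team
spec:
  otlp:
    endpoint: my-backend.example-team.svc.cluster.local:4317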
For more details, see the Telemetry Controller introduction blog post.
Benefits of using Logging Operator and Telemetry Controller
The Telemetry Controller provides a robust, high-performance collector agent for your edge nodes that you can configure declaratively, and allows your tenants to manage their logging configuration the same way. It also provides filtering, message enrichment, and per-tenant routing capabilities, allowing you and your tenants to selectively send data to your own aggregators (like the Logging Operator), or directly to specific destinations.
Sending data to the Logging Operator gives you a flexible, high-performance log aggregator that supports more than 50 destinations and provides powerful routing and log transformation capabilities.
Used together, the Telemetry Controller and the Logging Operator are ideal for multi-tenant log collection in Kubernetes. Also, since both provide detailed metrics about the processed data, together they give you unparalleled monitoring capabilities into the status of your logging and telemetry pipeline.
Hands-on demo
This demo shows you how to:
- Install the Telemetry Controller
- Install the Logging Operator
- Install the log-generator that provides sample log messages for the Telemetry Controller to collect
- Configure the Telemetry Controller to send the collected data to the Logging Operator instance
Prerequisites
Create a KinD cluster.
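If you don't have a test cluster at hand, a single-node KinD cluster is enough for this demo (the cluster name below is arbitrary):
kind create cluster --name telemetry-demo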
Deploy the Telemetry Controller
helm upgrade --install --wait --create-namespace --namespace telemetry-controller-system telemetry-controller oci://ghcr.io/kube-logging/helm-charts/telemetry-controller
Deploy the Logging Operator
helm upgrade --install logging-operator oci://ghcr.io/kube-logging/helm-charts/logging-operator --version=4.6.0 -n logging-operator --create-namespace
Deploy the log-generator
helm upgrade --install --wait log-generator oci://ghcr.io/kube-logging/helm-charts/log-generator -n log-generator --create-namespace
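Before continuing, you can verify that all three components are up and running in their respective namespaces:
kubectl get pods -n telemetry-controller-system
kubectl get pods -n logging-operator
kubectl get pods -n log-generator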
Configure the Logging Operator
Apply the following sample configuration to the Logging Operator. Since in this demo we only want to demonstrate that the Telemetry Controller can send data to the Logging Operator, we don't configure a real output; the incoming data is simply dropped into /dev/null.
kubectl apply -f https://raw.githubusercontent.com/kube-logging/telemetry-controller/main/docs/examples/fluent-forward/logging-operator.yaml
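The linked file is the authoritative configuration; the following sketch only illustrates the idea. The resource names used here (for example, all-to-file) are assumptions inferred from the pod name that appears later in the demo.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: all-to-file
spec:
  controlNamespace: logging-operator
  fluentd: {}
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: dev-null
  namespace: logging-operator
spec:
  # Write everything to /dev/null, effectively dropping the data
  file:
    path: /dev/null
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-logs
  namespace: logging-operator
spec:
  # No match rules: route all incoming logs to the null output
  globalOutputRefs:
    - dev-null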
Configure the Telemetry Controller
Apply the following sample configuration to the Telemetry Controller. This creates a tenant (called kubernetes) that has access to the log-generator namespace, an OtelOutput called fluent that sends the data to the address of the Logging Operator service, and a subscription that routes the collected logs of the tenant to this output.
kubectl apply -f https://raw.githubusercontent.com/kube-logging/telemetry-controller/main/docs/examples/fluent-forward/telemetry-controller.yaml
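To verify that the custom resources were created, you can list them (assuming the default plural resource names registered by the Telemetry Controller CRDs):
kubectl get tenants,subscriptions,oteloutputs -A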
Check the collected logs
After you’ve applied the above configurations, the Telemetry Controller starts sending the collected logs to the Logging Operator. To check that it works, run the following command to find the name of the Logging Operator pod:
kubectl get pods
The pod you need is called something like: all-to-file-fluentd-0
Check the logs of this pod:
kubectl logs all-to-file-fluentd-0
You should see the NGINX access logs that the log-generator has generated, like this:
2024-05-21 15:20:47.635797464 +0000 otelcol: {"severity":"","message":"144.100.113.30 - - [21/May/2024:15:20:47 +0000] \"GET /blog HTTP/1.1\" 503 18529 \"-\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/28.0.1469.0 Safari/537.36\" \"-\"\n","time":"2024-05-21T15:20:47.422395672Z","exporter":"fluentforwardexporter/default_fluent","kubernetes":{"namespace_name":"log-generator","container_name":"log-generator","pod_name":"log-generator-74f5577887-rbk42","host":"minikube","labels":{"pod-template-hash":"74f5577887","app.kubernetes.io/instance":"log-generator","app.kubernetes.io/name":"log-generator"}}}
You can recognize these logs by the log-generator labels at their end: "app.kubernetes.io/instance":"log-generator","app.kubernetes.io/name":"log-generator"}}
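If the pod also receives other logs, you can filter for the messages coming from the log-generator, for example:
kubectl logs all-to-file-fluentd-0 | grep log-generator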
Axoflow integration
The Axoflow Platform, our commercial offering, is an end-to-end observability pipeline solution that simplifies the control of your telemetry infrastructure with a vendor-agnostic approach. It integrates with several different log collector and aggregator solutions, including the Telemetry Controller and the Logging Operator. Based on the metrics collected from the collectors and aggregators, Axoflow visualizes the complete edge-to-destination telemetry data flow. It gives you a complete picture of the log sources, their destinations, and their relative contribution to the entire logging pipeline.
Metrics are an effective means of finding the root cause of pipeline incidents, and are therefore important tools for reducing MTTR. The Axoflow Management Console can visualize and alert on the metrics collected about your telemetry pipeline, in both on-premises and Kubernetes environments.
The UI fully supports multi-tenant scenarios, and provides RBAC-based views for the individual tenants.
Long term vision
In the long run, we see the Telemetry Controller as a fundamental part of our offering, as the recommended collector agent of our telemetry pipeline. The Telemetry Controller will support standard protocols to work flawlessly with a number of aggregators, for example, the OpenTelemetry Collector, the Logging Operator, or AxoRouter (our commercial log and telemetry data aggregator).
Summary
The Telemetry Controller is a new collector agent for edge nodes, designed for multi-tenant use cases. It provides declarative configuration, filtering, message enrichment, and per-tenant routing capabilities, allowing you and your tenants to selectively send data to your own aggregators (like the Logging Operator, as shown in the demo), or directly to specific destinations. Give it a try, and tell us what you think!
Follow Our Progress!
We are excited to be realizing our vision above with a full Axoflow product suite.