
Introduction to Logging Operator

The Logging Operator (a CNCF project) is commonly used to handle log collection scenarios on Kubernetes. The goal of the project is to provide an end-to-end solution for collecting logs on Kubernetes in a wide range of scenarios. However, as time passed, new challenges emerged in this field:

  • New technologies like OpenTelemetry
  • New use-cases like multi-tenant Kubernetes
  • Lessons learned from years of developing and operating the Logging Operator, for example backpressure, noisy neighbors, and configuration conflicts.

Solving all these problems would require drastic architecture changes in the Logging Operator, so we’ve decided to choose a different path. We felt that the scope of the Logging Operator was already too big, so we split the responsibilities between the Logging Operator and a new project, the Telemetry Controller:

  • The Telemetry Controller collects, labels, and enriches data at the edge, and forwards it to the next layer of your logging infrastructure. It doesn’t do any complex processing or error handling; its task is to provide a consistent, well-formed data flow. For more details, see our introductory blog post about the Telemetry Controller.
  • The Logging Operator provides the aggregation and buffering layer and performs complex routing, processing, and tenant-specific operations.

Why do we need the Telemetry Controller?

One of the key features of the Logging Operator is self-service. However, merging configurations for different use cases is not an easy task, and it is difficult to troubleshoot, because the abstraction layer hides information about the original problems. For example, a syntactically correct configuration is not necessarily a good configuration: if you misconfigure a destination or use resource-intensive transformations, you can easily block the whole flow.
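For illustration, a tenant’s namespaced Flow and Output might look something like the following sketch (the resource names, labels, and endpoint are hypothetical, not taken from a real setup). The configuration is syntactically valid, but if the endpoint is unreachable or slow, the output’s buffers fill up and the flow backs up:

# Hypothetical self-service example; names, labels, and the endpoint are illustrative only.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: team-a-flow
  namespace: team-a
spec:
  match:
    - select:
        labels:
          app: team-a-app
  localOutputRefs:
    - team-a-output
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: team-a-output
  namespace: team-a
spec:
  # Syntactically correct, but if this endpoint is unreachable or slow,
  # buffered data piles up and can stall the shared aggregator.
  http:
    endpoint: https://logs.example.com/ingest
    buffer:
      flush_mode: interval
      flush_interval: 10s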

Multi-tenant use-case

The Logging Operator was designed to leverage Kubernetes namespaces to isolate logging flows, so that tenants can define their own log forwarding rules without having to worry about how the logging infrastructure is set up. This becomes complicated when you consider different scenarios, like soft and hard multi-tenancy.

Soft multi-tenancy means that tenants trust each other to a certain degree, while hard multi-tenancy converges toward sharing as few resources as possible. The goal in our case is to support the spectrum of log isolation levels within the bounds of a single cluster.

Telemetry Controller resources flow
Traditionally, the Logging Operator uses Fluent Bit as its log collector agent. Fluent Bit has exceptional performance and a low resource footprint, but its routing capabilities are limited. Consequently, the aggregator becomes the place where all the logs are accumulated and where the permission boundaries need to be reimplemented.

Problems with this approach become increasingly obvious when you operate a real multi-tenant Kubernetes cluster, for example, when:

  • You are a platform provider for internal/external teams.
  • You are sharing infrastructure with your customers, who can configure their own logging to some extent.
  • You are running multiple applications on the same cluster, but want to prevent them from affecting each other.

Difference between Logging Operator and Telemetry Controller

Telemetry Controller is a log collector that implements filtering and routing right at the edge, rather than at the aggregation level. Basically, it extends the flexibility of the Logging Operator to the edge. If there is no need for aggregation, the Telemetry Controller can send all or a filtered portion of your data directly to a remote destination. In such scenarios, it completely replaces other agents like Fluent Bit, Vector, or Promtail. The Telemetry Controller also provides flexible routing capabilities out of the box, with minimal complexity.

The Telemetry Controller provides a convenient and robust multi-tenant API on top of OpenTelemetry, so you can just describe what telemetry data you need, and where it should be forwarded. The Telemetry Controller provides isolation and access control for telemetry data, similar to what Kubernetes provides for pods, secrets, and other resources. It introduces new resources that give granular control over the shared data:

  • Administrators can define tenants to provide isolation and access control for telemetry data.
  • Users can create subscriptions to select telemetry data streams accessible by their tenant only.
  • Users can create or refer to available outputs for subscriptions to route and transport data.

For more details, see the Telemetry Controller introduction blog post.
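To make this more concrete, a minimal sketch of how these resources could fit together is shown below. All names are hypothetical and the field names are indicative only; check the Telemetry Controller CRD reference for the exact schema:

# Hypothetical sketch; names and field names are indicative only,
# see the Telemetry Controller CRD reference for the exact schema.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Tenant
metadata:
  name: team-a
spec:
  # Namespaces this tenant is allowed to collect logs from
  logSourceNamespaceSelectors:
    - matchLabels:
        tenant: team-a
  # Namespaces where users of this tenant can create subscriptions
  subscriptionNamespaceSelectors:
    - matchLabels:
        tenant: team-a
---
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Subscription
metadata:
  name: team-a-logs
  namespace: team-a
spec:
  # Route this tenant's log stream to the referenced output
  outputs:
    - name: team-a-otlp
      namespace: team-a
---
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: OtelOutput
metadata:
  name: team-a-otlp
  namespace: team-a
spec:
  otlp:
    endpoint: otel-aggregator.observability.svc.cluster.local:4317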

Benefits of using Logging Operator and Telemetry Controller

The Telemetry Controller provides a robust and high-performance collector agent for your edge nodes that you can configure declaratively, and it allows your tenants to manage their own logging configuration in the same way. It also provides filtering, message enrichment, and per-tenant routing capabilities, allowing you and your tenants to selectively send data to your own aggregators (like the Logging Operator), or directly to specific destinations.

Sending data to the Logging Operator gives you a flexible, high-performance log aggregator that supports over 50 destinations, with powerful routing and log transformation capabilities.

Used together, the Telemetry Controller and the Logging Operator are ideal for multi-tenant log collection in Kubernetes. Also, since both provide detailed metrics about the processed data, together they give you deep visibility into the status of your logging and telemetry pipeline.

    Hands-on demo

    This demo shows you how to:

    • Install the Telemetry Controller
    • Install the Logging Operator
    • Install the log-generator that provides sample log messages for the Telemetry Controller to collect
    • Configure the Telemetry Controller to send the collected data to the Logging Operator instance

    Prerequisites

    Create a KinD cluster.
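
    If you don’t already have a test cluster, you can create one with a command like the following (the cluster name is arbitrary):

    kind create cluster --name telemetry-controller-demo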

    Deploy the Telemetry Controller

    helm upgrade --install --wait --create-namespace --namespace telemetry-controller-system telemetry-controller oci://ghcr.io/kube-logging/helm-charts/telemetry-controller
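
    To verify that the operator started successfully, check the pods in its namespace:

    kubectl get pods -n telemetry-controller-system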

    Deploy the Logging Operator

    helm upgrade --install logging-operator oci://ghcr.io/kube-logging/helm-charts/logging-operator --version=4.6.0 -n logging-operator --create-namespace

    Deploy the log-generator

    helm upgrade --install --wait log-generator oci://ghcr.io/kube-logging/helm-charts/log-generator -n log-generator --create-namespace
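
    Before continuing, you can check that the Logging Operator and the log-generator pods are running:

    kubectl get pods -n logging-operator
    kubectl get pods -n log-generator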

    Configure the Logging Operator

    Apply the following sample configuration to the Logging Operator. Since in this demo we just want to demonstrate that the Telemetry Controller can send data to the Logging Operator, we don’t configure any specific output; we just drop the incoming data into /dev/null.

    kubectl apply -f https://raw.githubusercontent.com/kube-logging/telemetry-controller/main/docs/examples/fluent-forward/logging-operator.yaml
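
    Conceptually, such a setup needs a Logging resource (named all-to-file in this demo, which deploys the Fluentd aggregator), a flow that matches all logs, and an output that discards them. The following is only an illustrative sketch of that idea, not the contents of the linked manifest:

    # Illustrative sketch only; the actual resources are in the linked manifest.
    apiVersion: logging.banzaicloud.io/v1beta1
    kind: Logging
    metadata:
      name: all-to-file
    spec:
      controlNamespace: default
      fluentd: {}
    ---
    apiVersion: logging.banzaicloud.io/v1beta1
    kind: ClusterOutput
    metadata:
      name: null-output
      namespace: default
    spec:
      file:
        path: /dev/null
    ---
    apiVersion: logging.banzaicloud.io/v1beta1
    kind: ClusterFlow
    metadata:
      name: all-logs
      namespace: default
    spec:
      filters:
        # Print records to the Fluentd pod's stdout so we can inspect them later
        - stdout: {}
      match:
        - select: {}
      globalOutputRefs:
        - null-output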

    Configure the Telemetry Controller

    Apply the following sample configuration to the Telemetry Controller. This creates a tenant (called kubernetes) that has access to the log-generator namespace, an OtelOutput called fluent that sends the data to the address of the Logging Operator service, and a subscription that routes the collected logs of the tenant to this output.

    kubectl apply -f https://raw.githubusercontent.com/kube-logging/telemetry-controller/main/docs/examples/fluent-forward/telemetry-controller.yaml

    Check the collected logs

    After you’ve applied the above configurations, the Telemetry Controller starts sending the collected logs to the Logging Operator. To check that it works, run the following command to find the name of the Fluentd aggregator pod that the Logging Operator created:

    kubectl get pods

    The pod you need is called something like: all-to-file-fluentd-0

    Check the logs of this pod:

    kubectl logs all-to-file-fluentd-0

    You should see the NGINX access logs that the log-generator has generated, like this:

    2024-05-21 15:20:47.635797464 +0000 otelcol: {"severity":"","message":"144.100.113.30 - - [21/May/2024:15:20:47 +0000] \"GET /blog HTTP/1.1\" 503 18529 \"-\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/28.0.1469.0 Safari/537.36\" \"-\"\n","time":"2024-05-21T15:20:47.422395672Z","exporter":"fluentforwardexporter/default_fluent","kubernetes":{"namespace_name":"log-generator","container_name":"log-generator","pod_name":"log-generator-74f5577887-rbk42","host":"minikube","labels":{"pod-template-hash":"74f5577887","app.kubernetes.io/instance":"log-generator","app.kubernetes.io/name":"log-generator"}}}

    You can recognize these logs by the log-generator labels at their end: "app.kubernetes.io/instance":"log-generator","app.kubernetes.io/name":"log-generator"}}
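
    To show only these messages, you can filter the pod logs, for example:

    kubectl logs all-to-file-fluentd-0 | grep log-generator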

    Axoflow integration

    The Axoflow Platform, our commercial offering, is an end-to-end observability pipeline solution that simplifies the control of your telemetry infrastructure with a vendor-agnostic approach. It integrates with several different log collector and aggregator solutions, including the Telemetry Controller and the Logging Operator. Based on the metrics collected from the collectors and aggregators, Axoflow visualizes the complete edge-to-destination telemetry data flow. It gives you a complete picture of your log sources, their destinations, and their relative contribution to the entire logging pipeline.

    Metrics are an effective means of finding the root cause of pipeline incidents, and are therefore important tools for reducing MTTR. The Axoflow Management Console can visualize and alert on the metrics collected about your telemetry pipeline, in both on-premises and Kubernetes environments.

    The UI fully supports multi-tenant scenarios, and provides RBAC-based views for the individual tenants.

    Kubernetes tenant log flow in Axoflow Console
    Logs collected with Telemetry Controller

    Long-term vision

    In the long run, we see the Telemetry Controller as a fundamental part of our offering, as the recommended collector agent of our telemetry pipeline. The Telemetry Controller will support standard protocols to work flawlessly with a number of aggregators, for example, the OpenTelemetry Collector, the Logging Operator, or AxoRouter (our commercial log and telemetry data aggregator).

    Summary

    The Telemetry Controller is a new collector agent for edge nodes, designed for multi-tenant use cases. It provides declarative configuration, filtering, message enrichment, and per-tenant routing capabilities, allowing you and your tenants to selectively send data to your own aggregators (like the Logging Operator, as we’ve shown in the demo), or directly to specific destinations. Give it a try, and tell us what you think!

