April 10, 2024

telemetry pipeline

splunk

syslog-ng

metrics

Metrics for telemetry pipelines based on SC4S and Splunk

Splunk Connect for Syslog (SC4S) is a containerized syslog-ng distribution designed to simplify getting syslog data into Splunk Enterprise and Splunk Cloud. Apart from various cloud sources, organizations send data - mostly log data - to Splunk Cloud from on-prem sources. Typically, data is sent using a log collector or forwarder agent, including:

Splunk Universal Forwarder
Splunk Heavy Forwarder (which is often a relay for the Universal Forwarder)
Splunk Connect for Syslog (SC4S) instances
Various other sources via syslog or HTTP (for example, syslog-ng and AxoSyslog)

Splunk is not just a log data collector: it can also show metrics (typically time-series data) and create visualizations and dashboards. However, these agents often do not collect enough relevant data about themselves and the telemetry pipeline for Splunk to display. This blog post shows how you can improve this situation and observe your telemetry pipeline in real time.

Metrics and the telemetry pipeline

Metrics have long been used to provide observability into application environments to reduce downtime. Observability solutions are able to aggregate and show metrics and health status for complex application environments. However, in most traditional log management settings (and many new cloud-based ones), these solutions fail to provide visibility and observability for the data pipeline itself, so you don’t have much insight into the operation of the data delivery mechanism. This means that your security and infrastructure teams cannot easily answer questions like these:

Are all devices sending logs?
Does the logging configuration of devices match the policy in effect?
Is all data encrypted in transit?
Am I losing messages somewhere?
Are there bottlenecks that cause excessive delays in log collection?
Are there any network issues that affect the log collection and delivery?
How much data is collected from devices and device classes?
How much data is sent to my SIEM by source/geography/team or BU?

These and similar questions are especially important if you are operating in a regulated environment and have to meet compliance requirements related to logging, such as SOC2, PCI DSS (section 10), HIPAA (NIST 800-66r2, 164.312(b)), or OMB M-21-31.

Syslog-ng, the application powering SC4S, has been the foundation for the logging infrastructure of many large enterprises during the last 20 years, because of its reliability, high performance, and flexibility to handle complex use cases.

We show you how you can add fleet management and observability to your existing Splunk Connect for Syslog (SC4S) infrastructure and upgrade it to a future-proof telemetry pipeline. The outlined solution supports Splunk Connect for Syslog (SC4S), syslog-ng Open Source Edition, and the commercial syslog-ng Premium Edition.

We introduce a novel method of managing telemetry pipelines that:

Reduces the costs and resource requirements of log collection and processing
Reduces infrastructure costs by replacing other, less effective agents and relays
Decreases the mean time to resolution (MTTR) for issues involving your telemetry pipeline by identifying problematic cases and issuing alerts
Increases the reliability and robustness of your telemetry pipeline

Though in this blog we explore the use of metrics in traditional enterprise logging environments, Axoflow supports full cloud-native environments as well, making it possible to combine your on-premises and cloud-native logging solutions (like OpenTelemetry) into a single telemetry pipeline.

What does the Universal Forwarder do?

The Splunk Universal Forwarder is a log collector agent. It is installed on the endpoints (for example, Windows servers), where it reads the log files and other log sources and forwards the messages to Splunk. Based on some heuristics like the path and filename, the forwarder identifies the application that the log file belongs to, and:

adds vendor/product labels,
sets the sourcetype (which describes the data structure of the event in the log message),
sets the host name, then
sends the data to Splunk.

For file sources, it also adds the original filename to the log message.
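The enrichment steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the forwarder's actual logic: the `RULES` table, the `enrich` function, and the path patterns are hypothetical, and vendor/product labels are left out for brevity.

```python
import fnmatch

# Hypothetical path-to-sourcetype rules; a real forwarder uses its own,
# far more extensive configuration.
RULES = [
    ("/var/log/secure*", "linux_secure"),
    ("/var/log/messages*", "syslog"),
    ("*/access.log*", "access_combined"),
]


def enrich(path, line, host):
    """Attach host, source (original filename), and sourcetype to a log line."""
    sourcetype = "unknown"
    for pattern, candidate in RULES:
        # fnmatchcase compares case-sensitively on every platform
        if fnmatch.fnmatchcase(path, pattern):
            sourcetype = candidate
            break
    return {"host": host, "source": path, "sourcetype": sourcetype, "event": line}


event = enrich("/var/log/nginx/access.log", "GET /index.html 200", "web01")
print(event["sourcetype"], event["source"])
```

The first matching rule wins, so more specific patterns would need to come before more general ones.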

What does SC4S do?

Splunk Connect for Syslog (SC4S) is a solution to feed syslog data into Splunk. Many networking devices and other appliances do not allow you to install custom log collector agents (like the Universal Forwarder), and can only send out logs via the syslog protocol. Typically, you configure these devices to send their logs to a syslog relay (like SC4S) that forwards the incoming logs to Splunk. Its main advantages over other similar relays are:

It has a database that it uses to recognize the device that is sending the messages, and sets the sourcetype and other metadata.
Doesn’t require extensive configuration.
Sends data to the Splunk HEC endpoint via HTTP.

These steps improve data ingestion and search performance on the Splunk side. Also, using HTTP improves the reliability of the message transport (compared to using the syslog protocol), and allows for load balancing, which is especially useful in large Splunk deployments.
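To make the HEC transport more concrete, here is a minimal Python sketch of what a single event with its metadata looks like on the wire. It is not how SC4S is implemented (SC4S does this inside syslog-ng); the URL, token, host, and sourcetype values are placeholders, and `build_hec_request` is a hypothetical helper. The request is only built, not sent.

```python
import json
import urllib.request


def build_hec_request(base_url, token, message, host, sourcetype, timestamp):
    """Build (but do not send) a POST request for the Splunk HEC event endpoint."""
    payload = {
        "time": timestamp,         # event time as a Unix epoch
        "host": host,              # device that produced the message
        "sourcetype": sourcetype,  # tells Splunk how to interpret the event
        "event": message,          # the log message itself
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    "<134>Apr 10 12:00:00 fw01 traffic allowed",
    host="fw01",
    sourcetype="syslog",
    timestamp=1712750400,
)
print(req.get_method(), req.full_url)
```

Because each event is an ordinary HTTP request, a standard load balancer can sit in front of several HEC endpoints, and the sender gets an explicit response for every batch.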

SC4S and Axoflow

The Axoflow Platform is an end-to-end observability pipeline solution that simplifies the operation of your telemetry infrastructure.

Once you onboard an existing syslog-ng deployment to Axoflow, it starts periodically sending metrics back to the Axoflow Management Plane for visualization and alerting. Axoflow collects detailed, real-time metrics about the data flows, giving you observability over the health of the security data pipeline and its components. Your security data remains in your self-managed cloud or in your on-prem instance where your sources and destinations are running: only metrics are forwarded to the Axoflow Console.

Axoflow host topology diagram with Splunk and SC4S

Using Axoflow with SC4S provides a number of benefits that we discuss in the next sections.

Metrics and topology visualization

Quick, metrics-based Sankey and sunburst diagrams allow you to drill down into what is sending large volumes of data (and route it to cheaper storage if not needed).

Metrics-based Sankey diagrams in Axoflow

Helps debug SC4S

One of the great advantages of SC4S is its simplicity: you install it and it “just works”. SC4S usually works out of the box, but can be difficult to troubleshoot, for example, if you are using an unsupported data source, or one which changed its logging format. Axoflow can expose the internal messages of the underlying syslog-ng to help you solve the problems.

Tap into the transferred logs

Log tapping shows you sample log messages from the live stream in near real time to help with troubleshooting and optimization. You can also access the raw (unprocessed) log messages.

Tracking log types and their volumes based on metrics and metadata gives your platform engineers a holistic view of what is happening in your entire pipeline, including the syslog layer. This helps you spot problems like large amounts of debug logs, sources sending messages with formatting errors, and so on. That way you can not only resolve problems in the pipeline faster, but also improve the quality of your telemetry data. For details, see the Troubleshooting syslog errors with log tapping blog post.

Host metrics and contextual data

Sometimes, a telemetry-related bottleneck is caused by the limited resources of the hosts running the infrastructure. To remediate these issues, information about the host is crucial. For this reason, Axoflow collects host metrics to get up-to-date status and health information, like CPU and memory usage. Relevant data that’s available only locally at the endpoint running SC4S (for example, the original source IP address) is also collected.

In addition, you also get information specific to syslog-ng (the syslog collector in SC4S), including:

UDP message drops (see our video on detecting data loss for details!)
Disk-buffer metrics
Message delay on the node (the processing time of the messages)
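For a rough idea of what such syslog-ng counters look like, the following Python sketch parses semicolon-separated statistics of the kind printed by `syslog-ng-ctl stats`. The sample rows are invented for illustration, and the exact counter names and instances depend on your syslog-ng version and configuration. This is not how Axoflow collects its metrics.

```python
import csv
import io

# Illustrative sample only; real output differs by version and configuration.
SAMPLE_STATS = """SourceName;SourceId;SourceInstance;State;Type;Number
src.udp;s_net#0;0.0.0.0:514;a;processed;120000
dst.http;d_hec#0;https://splunk.example.com:8088;a;processed;119500
dst.http;d_hec#0;https://splunk.example.com:8088;a;dropped;500
dst.http;d_hec#0;https://splunk.example.com:8088;a;queued;42
"""


def parse_stats(text):
    """Parse the semicolon-separated statistics into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [dict(row, Number=int(row["Number"])) for row in reader]


def counters_of_type(rows, counter_type):
    """Sum one counter type (for example "dropped") per SourceId."""
    totals = {}
    for row in rows:
        if row["Type"] == counter_type:
            totals[row["SourceId"]] = totals.get(row["SourceId"], 0) + row["Number"]
    return totals


rows = parse_stats(SAMPLE_STATS)
dropped = counters_of_type(rows, "dropped")
queued = counters_of_type(rows, "queued")
print(dropped, queued)
```

A non-zero `dropped` counter on a destination is the kind of signal worth alerting on, because it means messages were lost inside the pipeline.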

Axoflow Management Plane and a syslog-ng deployment

Alerting and Reporting

Metrics and visualizations are great tools for diagnosing an incident once you are already aware that it is happening. Alerts tell you about it in the first place: based on the collected metrics, you can create alerts for system health, data volume, data dropouts, data bursts, and critically, transport costs.
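As a simple illustration of a volume-based alert, the sketch below compares the message count of the current interval with the average of recent intervals. The function name and the thresholds (`dropout_ratio`, `burst_ratio`) are assumptions chosen for the example, not Axoflow's alerting logic.

```python
from statistics import mean


def classify_volume(history, current, dropout_ratio=0.2, burst_ratio=3.0):
    """Classify the current per-interval message count against recent history."""
    if not history:
        return "unknown"  # no baseline to compare against yet
    baseline = mean(history)
    if baseline == 0:
        return "burst" if current > 0 else "ok"
    ratio = current / baseline
    if ratio <= dropout_ratio:
        return "dropout"  # the source went (almost) silent
    if ratio >= burst_ratio:
        return "burst"    # the source is sending far more than usual
    return "ok"


history = [1000, 1100, 950, 1050]  # messages per interval from one source
for current in (0, 980, 5000):
    print(current, classify_volume(history, current))
```

In practice you would tune the thresholds per source or device class, since a firewall and a low-traffic appliance have very different normal volumes.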

For a detailed comparison, see Axoflow vs SC4S.

The Axoflow Management Plane

Axoflow’s ability to pull metrics from various collection agents means that you have a unified view of all your telemetry data pipelines. That way you know system health, destinations, and sources at a glance, and can rely on the Axoflow Management Plane to alert you to problems that otherwise would go unnoticed. When problems arise, the Management Plane provides full visibility, alerting, and reporting on the relationships between the data flows, the network, and the servers, meaning that you can quickly isolate issues and take action to correct the problems.

This all helps site reliability, but the benefits of a well-run log collection layer are actually felt beyond that, as the entire firm's operational and security posture is improved. Firms can optimize their collection layer by identifying redundant processes, reducing the amount of data loss, and optimizing the data flows for the intended destinations. The entire attack surface can be reduced with this kind of insight, and the entire organization will benefit.

Using the Axoflow Management Plane with your syslog-ng based logging infrastructure:

Reduces the costs and resource requirements of log collection and processing
Reduces infrastructure costs by replacing other, less effective agents and relays
Decreases the mean time to resolution (MTTR) for issues involving your telemetry pipeline
Increases the reliability and robustness of your telemetry pipeline
Increases the effectiveness of SIEM and log management operations, including far more efficient and effective detection engineering

Why Axoflow?

Our founders include the original creators of syslog-ng and the Logging Operator for Kubernetes, and other main contributors to these projects, with vast knowledge and hands-on experience in observability, log management, and how to apply these technologies in the enterprise security context.

Axoflow is the biggest contributor to both syslog-ng Open Source Edition and the Logging Operator (now a CNCF sandbox project). We also maintain an up-to-date version of the syslog-ng documentation.

Where are we going?

Axoflow’s flexible architecture future-proofs the telemetry pipeline and keeps it vendor-agnostic. This means that the Management Plane will continue to work for you as new destinations and technologies are built. Automation of pipelines and the use of AI to automatically classify logs and send them to the proper destinations are coming. The Management Plane will enable teams to validate service uptimes, loss, and other variables related to the data feeding critical backend systems like the SIEM. Teams will spend less time building difficult log collection environments and more time on business-related tasks.

For more details on how we can help your logging and security teams, read the AxoRouter, the security data curation pipeline engine blog post.