Moving telemetry data across data centers or ingesting it into a cloud service often means crossing less reliable or bandwidth-limited networks. In this blog we will dive deep into the challenges of distributing and managing log data across expansive networks, and show how the OpenTelemetry protocol and AxoSyslog help you ensure seamless, structured data transport throughout your network infrastructure.

Large networks often span multiple geographic locations, making it challenging to monitor and analyze log data in real time. Transporting the logs from distributed sources to centralized locations makes it easier for organizations to:

  • monitor network performance,
  • detect security incidents,
  • troubleshoot issues, and
  • analyze trends and patterns. 

However, the centralized collection of log and telemetry data requires scalable, high-performance log transport solutions that can handle large volumes of data efficiently: it is common for organizations to generate 10-50 TB of log data per day.

Log processing in large distributed systems includes enriching, parsing, and formatting the data. These operations should happen as early in the pipeline as possible: the closer to the source you perform the transformation, the more precise the result, because contextual information is often not available at later stages.

Moreover, the whole telemetry pipeline can benefit from the additional information: for example, downstream components can more accurately label and route the data.
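
To make this concrete, here is a minimal Python sketch of edge-side enrichment and label-based routing. The field names, labels, and backend names are made up for illustration; the point is simply that context is attached where it is still available, and later stages can route on it.

```python
import json
import socket

# Hypothetical illustration: enrich a parsed log record with context that is
# only known at the source, then route on that context downstream.

def enrich(record: dict) -> dict:
    # The edge collector knows its own host and environment; an aggregator
    # several hops away would no longer have this information.
    record["host"] = socket.gethostname()
    record["environment"] = "production"   # assumed static label
    return record

def route(record: dict) -> str:
    # Later pipeline stages can route purely on the labels added at the edge.
    if record.get("facility") == "auth":
        return "siem-backend"
    return "data-lake"

raw = '{"facility": "auth", "message": "Failed password for root"}'
record = enrich(json.loads(raw))
print(route(record), json.dumps(record))
```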

AxoSyslog is the cloud-native distribution of syslog-ng, which has a long history of being a trusted tool that provides unprecedented performance, stability, and flexibility when deployed in a collector, processor, or aggregator role.

Let’s quickly review the common ways used to transport log data!

The traditional way

“It’s an old code sir, but it checks out.”

Syslog is an old and admittedly problematic protocol, but it's still the most widely used way to transport log data, especially in on-premises environments. So let's first review the pros and cons of transporting syslog-formatted events.


The Good

Syslog offers several advantages that make it a widely adopted choice in many network environments. One significant benefit is its simplicity. Syslog provides a straightforward and easy-to-implement protocol for sending data from various devices and applications to a server. This simplicity makes it accessible and manageable for administrators and developers across different platforms and systems.

The syslog format is standardized (more on this later) and is widely supported, from carrier-grade equipment to IoT devices, making it a ubiquitous solution for log transport in diverse network architectures. Its widespread adoption ensures compatibility and interoperability across different logging systems and applications.

Transporting logs in standardized syslog format imposes minimal boilerplate and overhead, allowing for efficient transmission of log data without unnecessary computational or network resource consumption. This approach helps to optimize performance and minimize latency, crucial factors in real-time log monitoring and analysis.
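
As a quick illustration of that simplicity, the following Python sketch uses nothing but the standard library to emit a syslog message over UDP. The collector address is an assumption; in practice you would point it at your own syslog server.

```python
import logging
import logging.handlers

# A minimal sketch of how little is needed to emit syslog: the standard
# library handler sends "<PRI>tag: message" style datagrams to UDP port 514.
# The address below is an assumption; point it at your own collector.
handler = logging.handlers.SysLogHandler(address=("127.0.0.1", 514))
logger = logging.getLogger("demo-app")
logger.addHandler(handler)
logger.warning("disk usage above 90 percent on /var")
```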

The Bad

Causes of syslog packet loss
The syslog protocol was designed for local networks where the client and server are ideally only a few hops – optimally one hop – away from each other. The main problem is that it does not support application-layer acknowledgement, even when TCP transport is used. This means that while the sender might believe the message was successfully transported – because it received a TCP [ACK] – the log could still be sitting in the receiver's kernel buffer, or held in memory only, vulnerable to loss if the application shuts down or crashes unexpectedly (read more about detecting packet loss). The absence of confirmation at the application layer introduces potential failures and hidden data loss in the logging process. It isn't enough to know that the server received the log; the sender also needs assurance that the data was forwarded or persisted successfully. The chance of a failure increases with the number of Layer 4 devices in the path.
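
The following toy Python sketch illustrates the gap: the receiver accepts the TCP connection but never reads the message (simulating a crash or restart), yet the sender's send() call succeeds because the kernel has already acknowledged the bytes. The port and message content are arbitrary.

```python
import socket
import threading
import time

# A toy illustration of the gap between a TCP [ACK] and an application-level
# acknowledgement: the receiver accepts the connection but never calls recv(),
# yet the sender's send() succeeds because the kernel has buffered the bytes.

def receiver():
    srv = socket.create_server(("127.0.0.1", 5514))
    conn, _ = srv.accept()
    time.sleep(2)          # simulate a crash before the log is ever read
    conn.close()           # the buffered message is silently discarded

threading.Thread(target=receiver, daemon=True).start()
time.sleep(0.2)

client = socket.create_connection(("127.0.0.1", 5514))
sent = client.send(b"<13>Oct 11 22:14:15 host app: important event\n")
print(f"send() reported {sent} bytes written; the receiver never processed them")
```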

As syslog relies directly on the underlying TCP or UDP protocol for transport, load balancing presents additional challenges. Because messages are streamed within a single connection, a load balancer cannot split the TCP traffic without interpreting the payload: the information available at the TCP layer is not enough to distribute one ingress connection across multiple backend streams. This becomes a problem when not all connections carry the same amount of data. High-volume connections can cause an imbalance between backend nodes: chatty clients can overwhelm a single node, while nodes handling quiet connections sit idle.

The Ugly

Logs often contain structured data, or evolve from simple text into structured formats through processing, parsing, or data enrichment. While the RFC3164 (BSD) syslog protocol defines severity, host, and timestamp metadata fields, it lacks a standardized structured format for log messages. The RFC5424 syslog protocol introduces the STRUCTURED-DATA section, but it sees limited adoption: application developers typically overlook or disregard it, favoring the simplicity and flexibility of creating structured logs in JSON, XML, or some other structured format instead. This structured data ends up in the message part of the log, resulting in a wide variety of complex, fragmented, differently structured, schematized, or unstructured data in the log processing pipeline, which increases the complexity of processing these logs. And since there is no universally accepted format for the structured data, parsed fields are often lost between hops.
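
To show what this looks like on the wire, here are two hand-written variants of the same event: one using the RFC5424 STRUCTURED-DATA section, the other with JSON pushed into the free-form message part, which is what most applications actually emit. The field values are invented for illustration.

```python
# Two ways the same event travels over syslog in practice (values are made up).

# RFC 5424 with the rarely used STRUCTURED-DATA section:
rfc5424 = (
    '<165>1 2024-05-02T10:15:00Z web01 nginx 4711 ID47 '
    '[req@32473 method="GET" status="502" path="/api/v1/items"] '
    'upstream timed out'
)

# What most applications actually emit: JSON in the free-form MSG part:
json_in_msg = (
    '<165>1 2024-05-02T10:15:00Z web01 nginx 4711 - - '
    '{"method": "GET", "status": 502, "path": "/api/v1/items", '
    '"msg": "upstream timed out"}'
)

print(rfc5424)
print(json_in_msg)
```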

To complicate the problem further, the standards mentioned above are often implemented incorrectly in edge applications, which can result in malformed messages, or even in message loss.
AxoRouter implements complex heuristics to process poorly formatted data and fix the messages so they satisfy the protocol requirements.

The modern way

Fear not, OpenTelemetry comes to the rescue!

OpenTelemetry is an open-source observability framework and protocol designed to standardize and simplify the transport of telemetry data such as metrics, traces, and logs. It was created to address the complexity and fragmentation inherent in the observability space, where different tools and frameworks often use proprietary or incompatible instrumentation methods.

OTLP Protocol and Transport

OpenTelemetry addresses the fragmentation of the logging pipeline by offering comprehensive solutions that benefit the entire telemetry ecosystem. First, it provides a structured schema and protocol for logs, enabling uniformity and consistency across diverse logging systems. By defining common standards, OpenTelemetry ensures that log data remains interpretable and interoperable throughout the telemetry pipeline.

The OpenTelemetry data model treats metadata as a first-class citizen. Why is this important? First of all, it elevates metadata to the protocol level and makes it usable without parsing the payload itself: for example, you don't have to scan through the data to identify the source of your log. The hierarchical structure also allows logs that share identical metadata to be batched together, saving bandwidth on the wire. In a Kubernetes scenario, where you often have almost as much metadata as data, this saves a lot of bandwidth.
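
The snippet below sketches a trimmed-down OTLP/JSON logs payload to show this hierarchy. The field names follow the OTLP specification; the attribute values are made up. Note that the Kubernetes resource metadata appears once per batch and is shared by every log record in it.

```python
import json

# Trimmed-down OTLP/JSON logs payload. Resource-level metadata (the Kubernetes
# namespace, pod, and node) is stated once and shared by every record in the
# batch, instead of being repeated on each log line.
payload = {
    "resourceLogs": [{
        "resource": {
            "attributes": [
                {"key": "k8s.namespace.name", "value": {"stringValue": "payments"}},
                {"key": "k8s.pod.name", "value": {"stringValue": "checkout-7d4b9"}},
                {"key": "host.name", "value": {"stringValue": "node-3"}},
            ]
        },
        "scopeLogs": [{
            "logRecords": [
                {"timeUnixNano": "1714644900000000000",
                 "severityText": "ERROR",
                 "body": {"stringValue": "payment gateway timeout"}},
                {"timeUnixNano": "1714644901000000000",
                 "severityText": "INFO",
                 "body": {"stringValue": "retrying request"}},
            ]
        }]
    }]
}
print(json.dumps(payload, indent=2))
```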

The OpenTelemetry Protocol (OTLP) also solves the issue of application-layer acknowledgment by transferring data over gRPC or HTTP and defining proper acknowledgment mechanisms. And don't forget that HTTP and gRPC provide compression, load balancing, and other useful features out of the box, solving several problems of traditional syslog transport. You can read more about OTLP here.
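
Here is a minimal sketch of what an OTLP/HTTP export with an application-level acknowledgement can look like in Python, using only the standard library. The endpoint uses the conventional OTLP/HTTP port and path; adjust it to your own collector, and treat the error handling as deliberately simplified.

```python
import json
import urllib.request

# Assumed local collector; 4318 and /v1/logs are the conventional OTLP/HTTP defaults.
OTLP_ENDPOINT = "http://localhost:4318/v1/logs"

def export(payload: dict) -> None:
    req = urllib.request.Request(
        OTLP_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Unlike a bare TCP ACK, this response comes from the receiving
        # application: a 2xx status means the backend took ownership of the
        # batch, and the body may report partially rejected records.
        body = json.loads(resp.read() or b"{}")
        rejected = body.get("partialSuccess", {}).get("rejectedLogRecords", 0)
        print(resp.status, "rejected:", rejected)

# export(payload)  # e.g. with the batch structure shown in the previous example
```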

Libraries

OpenTelemetry empowers developers at the edges of the telemetry pipeline with client libraries, simplifying the creation and forwarding of logs from applications. These libraries streamline the instrumentation process, allowing developers to seamlessly integrate logging capabilities into their applications without vendor lock-in or compatibility issues.

Collector

OpenTelemetry enhances the processing components of the telemetry pipeline through the OpenTelemetry Collector, which – similarly to AxoSyslog – can receive, process, and forward logs in a large variety of ways (more on this later). It has mature support for tracing and metrics. On the logging side there are still some caveats and workarounds, but it is a promising tool that serves as a reference implementation for working with the OpenTelemetry format.

The Axoflow way

Bringing it all together.

Traditional syslog is widespread but lacks some important features, and not every implementation follows the standard. OTLP excels where syslog falls short. However, translating from one protocol to the other is trickier than you might think: parsing, mapping, and migrating schemas can require a lot of groundwork, as the sketch below illustrates.
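
To give a feel for that groundwork, here is a rough Python sketch that parses an RFC5424 header and maps it onto OTLP log record fields. The regular expression, the severity table, and the attribute names are simplifications chosen for illustration, not the normative mapping, and real pipelines have to cope with far messier input.

```python
import re
import time

# Simplified RFC 5424 header: <PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG
RFC5424 = re.compile(
    r"<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) \S+ \S+ \S+ (?P<msg>.*)"
)

# One possible syslog-severity -> OTLP severityNumber mapping (assumed for
# illustration, not copied from the specification).
SEVERITY = {7: 5, 6: 9, 5: 10, 4: 13, 3: 17, 2: 18, 1: 19, 0: 21}

def syslog_to_otlp(line: str) -> dict:
    m = RFC5424.match(line)
    if m is None:
        raise ValueError("not RFC 5424; real pipelines need heuristics here")
    severity = int(m["pri"]) % 8
    return {
        "timeUnixNano": str(time.time_ns()),   # real code would parse m["ts"]
        "severityNumber": SEVERITY[severity],
        "body": {"stringValue": m["msg"]},
        "attributes": [
            {"key": "host.name", "value": {"stringValue": m["host"]}},
            {"key": "service.name", "value": {"stringValue": m["app"]}},
        ],
    }

print(syslog_to_otlp(
    '<165>1 2024-05-02T10:15:00Z web01 nginx 4711 - - upstream timed out'))
```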

AxoSyslog is the bridge between the two worlds:

  • It handles both traditional syslog and OTLP as first-class citizens.
  • It embeds fixups to understand and process traditional syslog messages.
  • It has been battle-tested during its 25+ years of existence.
  • Its configuration language is capable of creating complex data flows.
  • It has a wide toolset for log processing.
  • It’s optimized for performance and data resiliency.

We chose to use OTLP throughout our infrastructure as it ticks all the boxes:

  • It is an open standard and we believe in open source.
  • It provides the necessary structured log format.
  • It provides application layer acknowledgement.
  • It can easily be load-balanced.
  • It is widely accepted and is becoming the new de facto standard in the logging/telemetry space.

The Axoflow Platform stands out as the premier solution for managing telemetry pipelines, offering comprehensive support for on-premises and cloud-native systems, SIEMs (Security Information and Event Management), and data lakes. It seamlessly handles the complexities of diverse and expansive networks, including large-volume log transport over WAN, providing organizations with a centralized platform for effective telemetry management. Axoflow also excels at processing and classifying telemetry data with high efficiency. None of this is possible without a proper, unified data format between its processing elements.

Conclusion

A large portion of log data is still transported using the traditional syslog protocol, which performs well but has reliability issues, especially across geo-distributed networks. OpenTelemetry is a modern, emerging technology that solves many of these problems and is quickly becoming a standard in the observability space. Axoflow upgrades your existing traditional logging infrastructure into a telemetry pipeline, and seamlessly integrates with modern observability technologies. Read our follow-up blog post on how OTLP performs in different environments!

Follow Our Progress!

We are excited to be realizing our vision above with a full Axoflow product suite.
