Syslog-ng disk buffering for a resilient syslog architecture

In syslog-ng, you can temporarily store log messages in a disk-buffer while a destination is unavailable. That way incoming messages are neither lost nor rejected, but put into the disk-buffer, and when the destination comes back up again, syslog-ng resumes sending them to the destination. Also, it can protect you from losing messages if syslog-ng or the host it is running on crashes. Read on to learn about the benefits of syslog-ng disk buffering, and how to properly configure it into your resilient syslog architecture – preventing data loss from many local and upstream failure modes.

When collecting observability or any other data in near real time, several issues can disrupt data collection, including:

- network outages
- network bottlenecks, when the network cannot handle the traffic (for example, during sudden peaks; syslog traffic can often exceed 1 Gb/s)
- destination overload, when the data processor (for example, your SIEM) cannot handle the incoming data fast enough
- destination downtime, when the data processor is not processing data at all
- crashes of syslog-ng or of the host it is running on

The ability to handle such outages on the data collector side is important, and has been one of the main reliability features of syslog-ng for years. However, as there are several related configuration options that affect both reliability and performance, it’s worth understanding how the syslog-ng disk-buffer works. There are also some differences and considerations to keep in mind if you are running syslog-ng in a container or in Kubernetes (for example, using Axoflow’s AxoSyslog images).

Note that disk-buffer is useful for transient peaks and overloads – it doesn’t solve the case when the incoming message rate is continuously higher than what the destination can ingest.
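To make the difference between a transient and a continuous overload concrete, the following Python sketch estimates how long a buffer lasts before it fills up. The function name and the traffic rates are illustrative assumptions, not values measured with syslog-ng, and the calculation ignores per-message metadata overhead.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def seconds_until_full(capacity_bytes, incoming_bytes_per_s, outgoing_bytes_per_s):
    """Return how many seconds a buffer lasts, or None if it never fills up."""
    backlog_growth = incoming_bytes_per_s - outgoing_bytes_per_s
    if backlog_growth <= 0:
        # The destination keeps up, so the buffer drains or stays empty.
        return None
    return capacity_bytes / backlog_growth

# Hypothetical full outage: 5 MiB/s arrives, nothing leaves.
outage = seconds_until_full(1 * GIB, 5 * MIB, 0)
# Hypothetical sustained overload: 5 MiB/s arrives, the destination ingests 4 MiB/s.
overload = seconds_until_full(1 * GIB, 5 * MIB, 4 * MIB)
print(f"outage: {outage:.1f} s, overload: {overload:.1f} s")
```

In both hypothetical cases the buffer only buys time: if the incoming rate never drops below what the destination can ingest, any finite buffer eventually fills.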

How syslog-ng disk-buffer works

By default, the disk-buffer is disabled, and syslog-ng uses an output queue for outgoing messages. This queue is stored in memory, so its content can be lost if syslog-ng or the host it is running on crashes. (During graceful shutdown, syslog-ng tries to send the messages from the output queue to the destination before shutting down.)

The disk-buffer comes in two flavors: reliable and non-reliable.

- The name of the non-reliable version is somewhat misleading, because it is actually quite reliable: it creates a disk-buffer before the output queue, and uses an in-memory cache to improve the performance of the disk-buffer. Under normal operating conditions syslog-ng uses the cache, and stores the messages on disk only if the destination cannot receive or process them. The output queue is empty, unless both the cache and the disk-buffer are full. Your logs are safe as long as syslog-ng shuts down gracefully, because syslog-ng saves the messages from memory into the disk-buffer during shutdown. You can lose the messages that are in the cache and the output queue only if syslog-ng or the host terminates abnormally.
- The reliable version works like the non-reliable version, but it also stores the cache and the output queue on disk. This means syslog-ng writes all outgoing messages to disk, so losing messages becomes unlikely even if syslog-ng or its host terminates abnormally. The trade-off is performance: the heavy disk use reduces the overall throughput of syslog-ng.

Here is how syslog-ng determines where to put the log messages:

- When syslog-ng finishes filtering, routing, and transforming a message and is ready to send it to its destination, it places the message into the output queue of the destination (if disk-buffer is disabled), or into the cache of the disk-buffer.
- You can set the size of the output queue using the log-fifo-size() option, and the size of the disk-buffer cache with the front-cache-size() option, which defaults to 1000 on newer syslog-ng versions (like the ones used in AxoSyslog).
- If disk-buffering is enabled and the cache is full, syslog-ng puts the outgoing messages into the disk-buffer of the destination. You can set the size of the disk-buffer using the capacity-bytes() option (the minimum value is 1MiB).
- Syslog-ng automatically reserves a part of the disk-buffer as a special area, a flow-control window. If the cache is full and the disk-buffer starts using the reserved area, syslog-ng starts applying flow-control on the sources that send messages to this destination and attempts to slow down incoming traffic. (We’ll cover how flow-control works in a separate blog post.)
  - When the disk-buffer is set to non-reliable, you can set the size of the window using the flow-control-window-size() option, which takes the number of messages as its argument. Also, for non-reliable disk-buffers, syslog-ng stores the messages of the flow-control window in memory.
  - When using a reliable disk-buffer, you can set the size of the window using the flow-control-window-bytes() option (which takes the number of bytes as its argument, because you can’t allocate disk space based on the number of messages).
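The placement steps above can be sketched as a toy Python model of a reliable disk-buffer. This is only an illustration of the description in this post: the class and its return values are hypothetical, draining is ignored, and it does not reflect how syslog-ng implements its queues internally.

```python
from dataclasses import dataclass

MIB = 1024 ** 2

@dataclass
class ReliableDiskBufferModel:
    front_cache_size: int            # in messages, models front-cache-size()
    capacity_bytes: int              # models capacity-bytes()
    flow_control_window_bytes: int   # models flow-control-window-bytes()
    cached_messages: int = 0
    disk_used_bytes: int = 0

    def enqueue(self, message_bytes: int) -> str:
        """Return where the message lands: cache, disk, flow-control, or full."""
        if self.cached_messages < self.front_cache_size:
            self.cached_messages += 1
            return "cache"
        if self.disk_used_bytes + message_bytes > self.capacity_bytes:
            return "full"  # no room left in this toy model
        self.disk_used_bytes += message_bytes
        unreserved = self.capacity_bytes - self.flow_control_window_bytes
        if self.disk_used_bytes > unreserved:
            # The reserved window is in use: sources would be slowed down.
            return "flow-control"
        return "disk"

# Tiny, unrealistic sizes so that every stage shows up after a few messages.
model = ReliableDiskBufferModel(front_cache_size=2,
                                capacity_bytes=10 * MIB,
                                flow_control_window_bytes=2 * MIB)
placements = [model.enqueue(3 * MIB) for _ in range(6)]
print(placements)
```

With these made-up sizes, the first two messages fit into the cache, the next two go to the unreserved part of the disk-buffer, the fifth reaches the reserved window, and the sixth no longer fits.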

Disk-buffer considerations in syslog-ng

How to configure the different buffers depends on your risk tolerance and performance requirements, and usually involves trade-offs: increasing performance means storing more messages in memory, while decreasing risk (losing messages in case of an outage or a crash) means storing more data on disk, which decreases performance. Carefully consider how likely a destination failure is, and how long it can last. This informs the setting of many parameters, not the least of which is the size of the buffer itself.

Note that syslog-ng uses the disk-buffer file as a circular buffer, not as a continuous file: it reads from the beginning and writes to the end of the circle. This means that even though there are no actively stored messages in the buffer, the file size might be at its maximum.
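The following Python sketch illustrates why an empty circular buffer can still occupy its maximum size on disk. It is a toy model that tracks byte offsets only, wraps writes at byte granularity, and does not model truncation; the class and attribute names are hypothetical and unrelated to the syslog-ng source code.

```python
class CircularFileModel:
    """Toy circular buffer: tracks offsets only, performs no real file I/O."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0       # next offset to read from
        self.tail = 0       # next offset to write to
        self.queued = 0     # bytes written but not yet read
        self.file_size = 0  # highest offset ever written (size on disk)

    def write(self, nbytes):
        if self.queued + nbytes > self.capacity:
            raise BufferError("buffer full")
        end = self.tail + nbytes
        # The file only grows; wrapping around reuses space at the start.
        self.file_size = max(self.file_size, min(end, self.capacity))
        self.tail = end % self.capacity
        self.queued += nbytes

    def read(self, nbytes):
        nbytes = min(nbytes, self.queued)
        self.head = (self.head + nbytes) % self.capacity
        self.queued -= nbytes
        return nbytes

buf = CircularFileModel(capacity=100)
buf.write(60)
buf.read(60)
buf.write(60)   # wraps around the end of the file
buf.read(60)
print(buf.queued, buf.file_size)
```

After the second write wraps around, every message has been read, yet the modeled file still spans the full capacity.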

Preallocating disk-buffer files

By default, syslog-ng doesn’t reserve the disk space for the disk-buffer file, since in a properly configured and sized environment the disk-buffer is practically empty, so a large preallocated disk-buffer file is just a waste of disk space. However, without preallocation, other data can consume the space intended for the buffer, and messages can be lost when the buffer is actually needed. A preallocated file keeps that space reserved (and the OS warns you earlier if disk space is running low). When using syslog-ng 4.0 or later, you can preallocate the space for your disk-buffer files by setting prealloc(yes).

In addition to making sure that the required disk space is available when needed, preallocated disk-buffer files provide radically better (3-4x) performance as well: during an outage the number of messages stored in the disk-buffer grows continuously, and writing into a large contiguous file is faster than constantly waiting for a file to change its size.

If you are running syslog-ng on a dedicated host (always recommended for any high-volume settings), use prealloc(yes).
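Before enabling preallocation, it can help to check that the filesystem actually has room for all the disk-buffer files you plan to reserve. The sketch below uses Python's standard shutil.disk_usage() call; the function name, the buffer sizes, and the use of the current directory are assumptions for illustration only.

```python
import shutil

GIB = 1024 ** 3

def can_preallocate(directory, buffer_sizes_bytes, safety_margin_bytes=0):
    """Return True if the filesystem of directory has room for all buffers."""
    required = sum(buffer_sizes_bytes) + safety_margin_bytes
    free = shutil.disk_usage(directory).free
    return free >= required

# Hypothetical plan: two destinations with a 1 GiB disk-buffer each.
# "." keeps the sketch runnable anywhere; point it at your disk-buffer directory.
print(can_preallocate(".", [1 * GIB, 1 * GIB], safety_margin_bytes=1 * GIB))
```

The safety margin is a design choice: it leaves headroom for the persist file and anything else stored on the same filesystem.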

Truncating disk-buffer files

If you are not preallocating your disk-buffer files, then syslog-ng dynamically increases the size of the disk-buffer file when needed. However, when the messages are successfully sent from the disk-buffer, the size of the file does not decrease immediately.

By default, syslog-ng frees the disk-space only when it can free up at least 10% of the disk-buffer file. You can adjust this behavior using the truncate-size-ratio() option:

- smaller ratios free disk space quicker, while
- larger ratios result in better performance.

If you want to avoid performance fluctuations:

- use truncate-size-ratio(1) (never truncate), or
- use prealloc(yes) to reserve the entire size of the disk-buffer on disk.
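The truncation rule described in this section can be modeled in a few lines of Python. This is a simplified reading of the text (truncate only when at least the given ratio of the file can be freed, and never when the ratio is 1), not the actual syslog-ng logic; the function name and the sample sizes are hypothetical.

```python
def would_truncate(file_size_bytes, freeable_bytes, truncate_size_ratio=0.1):
    """Model the rule: truncate only if enough of the file can be freed."""
    if truncate_size_ratio >= 1:
        # Per the text above, a ratio of 1 means "never truncate".
        return False
    return freeable_bytes >= truncate_size_ratio * file_size_bytes

MIB = 1024 ** 2
print(would_truncate(1000 * MIB, 50 * MIB))    # 5% freeable, default ratio
print(would_truncate(1000 * MIB, 150 * MIB))   # 15% freeable, default ratio
print(would_truncate(1000 * MIB, 1000 * MIB, truncate_size_ratio=1))
```

A smaller ratio makes the first call more likely to return True, which matches the trade-off above: disk space is returned sooner, at the cost of more frequent file size changes.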

Configuration examples

The following examples show what the syslog-ng configuration looks like when configuring the different buffers. Note that these are minimal configurations that demonstrate the options discussed in this blog post, not a full-fledged production setup.

The names of the options related to disk-buffer configuration apply to syslog-ng release 4.3 and newer. For the option names used in older releases, see the end of this post.

Disk-buffer is disabled, the output queue (log-fifo-size()) is set to 10000 messages:

@version: 4.3

options { stats-level(3); time-reopen(3); };

log {
    source { tcp(port(2000)); };
    destination {
        tcp("127.0.0.1" port(2001)
            # in-memory output queue, size in number of messages
            log-fifo-size(10000)
        );
    };
    flags(flow-control);
};

Using 1GiB reliable disk-buffer (out of which 200MiB is reserved for flow-controlled messages), and a cache of 1000 messages:

@version: 4.3

options { stats-level(3); time-reopen(3); };

log {
    source { tcp(port(2000)); };
    destination {
        tcp("127.0.0.1" port(2001)
            disk-buffer(
                reliable(yes)
                # total size of the disk-buffer
                capacity-bytes(1GiB)
                # part of the disk-buffer reserved as the flow-control window
                flow-control-window-bytes(200MiB)
                # cache size in number of messages
                front-cache-size(1000)
            )
        );
    };
    flags(flow-control);
};

Using 1GiB non-reliable disk-buffer with an in-memory flow-control window of 10000 messages, and a cache of 1000 messages:

@version: 4.3

options { stats-level(3); time-reopen(3); };

log {
    source { tcp(port(2000)); };
    destination {
        tcp("127.0.0.1" port(2001)
            disk-buffer(
                reliable(no)
                # total size of the disk-buffer
                capacity-bytes(1GiB)
                # in-memory flow-control window, in number of messages
                flow-control-window-size(10000)
                # cache size in number of messages
                front-cache-size(1000)
            )
        );
    };
    flags(flow-control);
};

Disk-buffer metrics

With the metrics-related improvements in syslog-ng 4.2, you can monitor the size of the queues related to the destinations using the metrics provided by syslog-ng. This works for both disk-buffers and memory-based queues. You can display the metrics of your running syslog-ng 4.2 (or newer) instance by running:

syslog-ng-ctl stats prometheus

- The corresponding driver is identified with the “id” and “driver_instance” labels.
- The “memory_usage_bytes” and “events” counters show the amount of memory used by the queue, and the number of events in the queue.
- Disk-buffer metrics have the syslogng_disk_queue_ prefix.
- Disk-buffer metrics have an additional “path” label, pointing to the location of the disk-buffer file, and a “reliable” label, which is “true” or “false”.
- Threaded destinations (like http or python) have an additional “worker” label.

For example:

syslogng_disk_queue_events{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00000.rqf",reliable="true",worker="0"} 80
syslogng_disk_queue_events{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00001.rqf",reliable="true",worker="1"} 7
syslogng_disk_queue_events{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00002.rqf",reliable="true",worker="2"} 7
syslogng_disk_queue_events{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00003.rqf",reliable="true",worker="3"} 7
syslogng_disk_queue_events{driver_instance="tcp,localhost:1235",id="d_network_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00000.qf",reliable="false"} 101
syslogng_disk_queue_memory_usage_bytes{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00000.rqf",reliable="true",worker="0"} 3136
syslogng_disk_queue_memory_usage_bytes{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00001.rqf",reliable="true",worker="1"} 2776
syslogng_disk_queue_memory_usage_bytes{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00002.rqf",reliable="true",worker="2"} 2760
syslogng_disk_queue_memory_usage_bytes{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00003.rqf",reliable="true",worker="3"} 2776
syslogng_disk_queue_memory_usage_bytes{driver_instance="tcp,localhost:1235",id="d_network_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00000.qf",reliable="false"} 39888
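As a sketch of how such output could be processed, the following Python snippet parses labeled metric lines and sums the queued events per destination. The regular expressions are a simplified assumption that fits the sample lines above rather than a complete Prometheus exposition-format parser, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

METRIC_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

def parse_metrics(text):
    """Yield (name, labels, value) for every labeled metric line in text."""
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            labels = dict(LABEL_RE.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])

def queued_events_per_destination(text):
    """Sum syslogng_disk_queue_events over all workers of each destination."""
    totals = defaultdict(float)
    for name, labels, value in parse_metrics(text):
        if name == "syslogng_disk_queue_events":
            totals[labels.get("id", "unknown")] += value
    return dict(totals)

# Three of the sample lines shown above.
sample = '''
syslogng_disk_queue_events{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00000.rqf",reliable="true",worker="0"} 80
syslogng_disk_queue_events{driver_instance="http,http://localhost:1239",id="d_http_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00001.rqf",reliable="true",worker="1"} 7
syslogng_disk_queue_events{driver_instance="tcp,localhost:1235",id="d_network_disk_buffer#0",path="/var/syslog-ng/syslog-ng-00000.qf",reliable="false"} 101
'''
print(queued_events_per_destination(sample))
```

In practice, the text would come from the output of the syslog-ng-ctl stats prometheus command instead of a hard-coded sample.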

The following metrics are related to the size and usage of the disk-buffers:

- “capacity_bytes”: The theoretical maximal useful size of the disk-buffer. This is always smaller than capacity-bytes(), as some space is reserved for metadata. The actual full disk-buffer file can be larger than this, as syslog-ng can write over this limit once, at the end of the file.
- “disk_allocated_bytes”: The current size of the disk-buffer file on the disk. As we discussed earlier, the disk-buffer file size doesn’t strictly correlate with the number of messages, as it is a circular buffer implementation, and also syslog-ng optimizes the truncation of the file for performance reasons.
- “disk_usage_bytes”: The serialized size of the queued messages in the disk-buffer file. This counter is useful for calculating the disk usage percentage (disk_usage_bytes / capacity_bytes) or the remaining available space (capacity_bytes - disk_usage_bytes).
- “dir_available_space_bytes”: The space available in the directory where the disk-buffer files are stored. Note that since this metric doesn’t directly depend on a specific disk-buffer file or destination, it doesn’t have the syslogng_disk_queue_ prefix.

For example:

syslogng_disk_queue_capacity_bytes{driver_id="d_network#0",driver_instance="tcp,localhost:1235",path="/var/syslog-ng-00000.rqf",reliable="true"} 104853504
syslogng_disk_queue_disk_allocated_bytes{driver_id="d_network#0",driver_instance="tcp,localhost:1235",path="/var/syslog-ng-00000.rqf",reliable="true"} 17284
syslogng_disk_queue_disk_usage_bytes{driver_id="d_network#0",driver_instance="tcp,localhost:1235",path="/var/syslog-ng-00000.rqf",reliable="true"} 13188
syslogng_disk_buffer_dir_available_space_bytes{dir="/var/syslog-ng"} 870109413376
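The usage percentage and remaining space mentioned above can be computed directly from these two counters. The Python sketch below uses the values from the sample output; the function name is hypothetical.

```python
def disk_buffer_usage(capacity_bytes, disk_usage_bytes):
    """Return (used percentage, remaining bytes) for one disk-buffer file."""
    used_percent = 100.0 * disk_usage_bytes / capacity_bytes
    remaining_bytes = capacity_bytes - disk_usage_bytes
    return used_percent, remaining_bytes

# Values of capacity_bytes and disk_usage_bytes from the sample output above.
percent, remaining = disk_buffer_usage(104853504, 13188)
print(f"{percent:.4f}% used, {remaining} bytes remaining")
```

A monitoring rule could, for example, alert when the used percentage stays above a threshold you choose for your environment.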

If you see the abandoned="true" label in the metric of a disk-buffer, it means that this disk-buffer file doesn’t have an active destination, and syslog-ng won’t send the remaining log messages from this file. This happens if you remove or change a destination while there are messages in the disk-buffer. How to deal with abandoned disk-buffer files will be the topic of another blog post.

How to use disk-buffers in containers and Kubernetes

When you are running syslog-ng in a container, or in Kubernetes, and you want to use disk-buffers, there are some additional things to configure.

- Make sure to mount the disk-buffer files and the persist file (by default, both are stored in /var/lib/syslog-ng) in a way that they are not lost when the pod or container is restarted.
  - In Kubernetes, add a persistent volume to your pod and store the disk-buffer files (/var/lib/syslog-ng) there.
  - In a container, mount the disk-buffer directory from the host.
- Use a reliable disk-buffer only if your storage is fast enough. For example, a low-speed persistent volume in Kubernetes can cause a significant performance degradation for syslog-ng.
- Use the latest available version of syslog-ng, as many disk-buffer related features and performance improvements (for example, disk-buffer related metrics) are only available in recent versions.

If you are using syslog-ng without disk-buffering, syslog-ng stores everything in memory, which results in great performance. If you enable disk-buffering, the performance will decrease. Make sure to size your observability pipeline appropriately.

If you want to try running a syslog-ng container, or use syslog-ng as a log collector in Kubernetes, try our AxoSyslog container image, or the AxoSyslog Helm chart! AxoSyslog is a cloud-native syslog-ng distribution, created by Axoflow.

Option names in older syslog-ng releases

We have changed the names of some disk-buffer related options in syslog-ng 4.3 to make them more understandable. The following table shows the old names of these options, so you can map them if you are using an older syslog-ng release.

syslog-ng 4.3 and newer        syslog-ng 4.2 and older
capacity-bytes()               disk-buf-size()
flow-control-window-bytes()    mem-buf-size()
flow-control-window-size()     mem-buf-length()
front-cache-size()             qout-size()
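If you have many configuration files written for older releases, the mapping in the table can be applied mechanically. The following Python sketch is a hypothetical helper, not an official migration tool: it only matches the hyphenated spelling of the old names when they are followed by an opening parenthesis, and it does not distinguish comments or strings from real options, so review the result before using it.

```python
import re

# Old option name -> new option name, taken from the table above.
OLD_TO_NEW = {
    "disk-buf-size": "capacity-bytes",
    "mem-buf-size": "flow-control-window-bytes",
    "mem-buf-length": "flow-control-window-size",
    "qout-size": "front-cache-size",
}

_OPTION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, OLD_TO_NEW)) + r")(?=\s*\()"
)

def rename_options(config_text):
    """Replace old disk-buffer option names with their 4.3+ equivalents."""
    return _OPTION_RE.sub(lambda m: OLD_TO_NEW[m.group(1)], config_text)

old = "disk-buffer(reliable(yes) disk-buf-size(1GiB) mem-buf-size(200MiB) qout-size(1000))"
print(rename_options(old))
```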

Summary

As you have learned from this post, the disk-buffer of syslog-ng can greatly improve the reliability of your log collection infrastructure and help you protect your data during transient peaks, overloads, and crashes. You have also learned how to configure and monitor the disk-buffer, and how to start using it in containerized environments.

We plan to continue this series, and cover topics like performance and sizing considerations in later blog posts. If you are interested, sign up to the Axoflow newsletter, or follow us on LinkedIn!

For an overview on how our platform enhances syslog-ng based log collection with metrics, including disk-buffer related metrics, see the Metrics for syslog-ng based log management infrastructures blog post.
