1 - Collect metrics with Prometheus
Export AxoSyslog and syslog-ng metrics to Prometheus using the axosyslog-metrics-exporter and scrape them with Prometheus.
Prerequisites
- A running AxoSyslog instance
- stats(level(2)) or higher set in your configuration file
- File-level access to the AxoSyslog control socket
Note
You must set stats(level(2)) to expose host-level metrics. Without it, many metrics (including per-host counters) aren’t available. For details, see the stats(level()) global option.
Deploy the metrics exporter
The axosyslog-metrics-exporter is a Go-based tool that exposes Prometheus-style metrics by connecting to the AxoSyslog control socket. It works with syslog-ng, syslog-ng Premium Edition, and all versions of AxoSyslog (syslog-ng™ is the trademark of One Identity LLC).
Run the exporter as a container:
sudo podman run -d -p 9577:9577 -v $(echo /var/*/syslog-ng/syslog-ng.ctl):/syslog-ng.ctl \
ghcr.io/axoflow/axosyslog-metrics-exporter:latest --socket.path=/syslog-ng.ctl
Once started, the metrics endpoint is available at http://127.0.0.1:9577/metrics.
Note
The control socket is typically located at /var/lib/syslog-ng/syslog-ng.ctl or /var/run/syslog-ng/syslog-ng.ctl. In containerized environments, share the Unix domain socket with the exporter container using a volume mount, as shown in the preceding command.
Create a prometheus.yml file with a scrape job pointing to the metrics exporter:
scrape_configs:
  - job_name: axosyslog
    static_configs:
      - targets:
          - <prometheus-host-ip>:9577
        labels:
          app: axosyslog
Then run Prometheus:
sudo podman run \
-p 9090:9090 \
-v ./prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
To verify that Prometheus is scraping correctly, open the following pages in your browser:
http://127.0.0.1:9090/config: shows the active configuration
http://127.0.0.1:9090/targets: shows whether the AxoSyslog scrape target is up
Key metrics to monitor
For a detailed reference, see Metrics reference. The main metrics that you should monitor are the following.
Critical metrics
These metrics indicate problems that require immediate attention:
- output_unreachable: destination is unavailable
- socket_receive_dropped_packets_total: messages dropped on the source side
- output_events_total{result="dropped"}: messages dropped at the output without flow control
- socket_rejected_connections_total: number of rejected incoming connections
Core pipeline metrics
These metrics give you a basic understanding of pipeline throughput:
- input_events_total: total messages received by all sources
- output_events_total: total messages sent by all destinations
- filtered_events_total: total messages processed by filters
- parsed_events_total: total messages processed by parsers
- memory_queue_events and disk_queue_events: current buffer usage
- io_worker_latency_seconds: I/O worker latency, a sign of potential overload
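As a rough illustration of how the critical metrics above could be checked outside Prometheus, the following Python sketch parses text in the Prometheus exposition format (for example, the body served at the exporter's /metrics endpoint) and reports critical samples with a non-zero value. It assumes that every metric carries the syslogng_ prefix seen in the sample output of the Metrics reference. The sample ids, label values, and numbers are invented for illustration, and the parser only handles simple `name{labels} value` lines.

```python
import re

# Critical metric names from the list above, with the syslogng_ prefix
# (assumed to apply to all exported metrics).
CRITICAL = {
    "syslogng_output_unreachable",
    "syslogng_socket_receive_dropped_packets_total",
    "syslogng_socket_rejected_connections_total",
}

LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def find_critical(text):
    """Return (name, labels, value) for every critical sample with a non-zero value."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue
        name = match.group("name")
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        value = float(match.group("value"))
        # output_events_total is only critical for result="dropped".
        dropped_output = (
            name == "syslogng_output_events_total"
            and labels.get("result") == "dropped"
        )
        if (name in CRITICAL or dropped_output) and value > 0:
            hits.append((name, labels, value))
    return hits


# Hypothetical sample; ids, label values, and numbers are made up.
SAMPLE = '''
syslogng_output_unreachable{id="d_net#0"} 1
syslogng_output_events_total{id="d_net#0",result="delivered"} 120
syslogng_output_events_total{id="d_net#0",result="dropped"} 7
syslogng_socket_rejected_connections_total{id="s_tcp#0"} 0
'''

for name, labels, value in find_critical(SAMPLE):
    print(name, labels, value)
```

In a real deployment, Prometheus alerting rules would normally do this job; the sketch is only meant to make the selection logic explicit.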
2 - Metrics reference
The following list shows the metrics available in AxoSyslog.
AxoSyslog provides detailed metrics about its performance and status for observability and monitoring. We recommend using Prometheus to scrape these metrics; for details, see Collect metrics with Prometheus. To display the current metrics locally in Prometheus-compatible format, run:
syslog-ng-ctl stats prometheus
Which metrics are shown depends on the current value of the stats(level()) global option. To list the available metrics, run syslog-ng --metrics-registry. For details on what each metric means, see the descriptions below.
Note
- Metrics that have the _total suffix reset to zero when AxoSyslog is restarted. Reloading AxoSyslog doesn’t cause a reset.
- Different metrics are available at different stats(level()) settings.
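Because counters with the _total suffix start again from zero when AxoSyslog restarts, anything that computes increases from raw samples has to treat a decreasing value as a reset. The following Python sketch shows one common way to handle this. The function name and the sample readings are illustrative assumptions, not part of AxoSyslog.

```python
def counter_increase(samples):
    """Total increase over a series of counter readings, tolerating resets.

    A reading lower than the previous one is treated as a restart:
    the counter is assumed to have started again from zero.
    """
    total = 0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                total += value - previous
            else:
                total += value  # counted from zero after the reset
        previous = value
    return total


# Hypothetical readings of a *_total metric; a restart happens after 150.
print(counter_increase([100, 150, 20, 60]))  # 50 + 20 + 40 = 110
```

Prometheus functions such as rate() and increase() apply the same kind of reset handling, so this is only needed when you process raw samples yourself.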
classified_events_total
Description: Default metric of the metrics-probe() parser.
disk_queue_capacity_bytes
Description: Maximal size of the disk queue (in bytes), as set in the capacity-bytes() disk-buffer option.
disk_queue_capacity
Description: The size of the overflow queue of the destination, as set in the flow-control-window-size() disk-buffer option.
disk_queue_dir_available_bytes
Description: The size of the space available in the directories where disk-buffer files are stored (including directories storing abandoned disk-buffers), in bytes.
disk_queue_disk_allocated_bytes
Description: The actual size of the disk-buffer files, in bytes.
disk_queue_disk_usage_bytes
Description: Total size of data waiting in each disk-buffer, in bytes.
disk_queue_events
Description: Number of messages waiting in each disk-buffer by destination.
disk_queue_memory_usage_bytes
Description: Amount of memory used for caching disk-buffers, in bytes.
disk_queue_processed_events_total
Description: The number of events processed since startup by each disk-buffer.
event_processing_latency_seconds
Description: Histogram of the latency (time from receiving the message to fully processing it), from the source or destination perspective.
events_allocated_bytes
Description: The total amount of memory used by log messages in AxoSyslog.
filtered_events_total
Description: The total number of messages that matched and didn’t match a filter, for each filter in the configuration file.
input_event_bytes_total
Description: Incoming log messages processed by each source, measured in bytes.
input_events_total
Description: Number of incoming log messages processed by each source.
input_transport_errors_total
Description: Number of various transport errors that prevent AxoSyslog from ingesting messages, for example, TLS handshake errors or syslog framing errors. Labels include the source id, peer_address, and the reason for the error.
syslogng_input_transport_errors_total{address="127.0.0.1:5513",driver="syslog",peer_address="127.0.0.1",reason="invalid-frame-header",transport="tcp"} 1
syslogng_input_transport_errors_total{address="127.0.0.1:5515",driver="syslog",id="s_tls_req#0",peer_address="127.0.0.1",reason="tls-handshake",tls_error="0A0000C7",tls_error_string="SSL routines::peer did not return a certificate",transport="tls"} 1
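To show how the labels on these samples can be used, here is a small Python sketch that extracts the label set from lines like the two above and sums the errors by reason. It is a simplified parser for illustration only: it assumes one sample per line, with the value after the last space.

```python
import re
from collections import Counter

# Matches key="value" pairs; the value may contain spaces and colons.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def errors_by_reason(lines):
    """Sum sample values per `reason` label."""
    totals = Counter()
    for line in lines:
        head, _, value = line.rpartition(" ")
        labels = dict(LABEL_RE.findall(head))
        totals[labels.get("reason", "unknown")] += int(float(value))
    return totals


samples = [
    'syslogng_input_transport_errors_total{address="127.0.0.1:5513",driver="syslog",peer_address="127.0.0.1",reason="invalid-frame-header",transport="tcp"} 1',
    'syslogng_input_transport_errors_total{address="127.0.0.1:5515",driver="syslog",id="s_tls_req#0",peer_address="127.0.0.1",reason="tls-handshake",tls_error="0A0000C7",tls_error_string="SSL routines::peer did not return a certificate",transport="tls"} 1',
]
print(dict(errors_by_reason(samples)))
```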
input_window_available
Description: Available on stats(level(3)). Shows the current size of the flow-control window (how much is still free from log-iw-size()).
input_window_capacity
Description: Available on stats(level(3)). Shows the value of log-iw-size() (the size of the flow-control window).
input_window_full_total
Description: The total number of input window full events, for the entire configuration. These events cause AxoSyslog to throttle the source. Available on stats(level(1)).
internal_events_queue_capacity
Description: The internal queue size of the internal() source.
internal_events_total
Description: The number of messages the internal() source has queued, processed, or dropped.
io_worker_latency_seconds
Description: Shows how overloaded the IO workers of AxoSyslog are.
last_config_file_modification_timestamp_seconds
Description: The date when the configuration file was last modified.
last_config_reload_timestamp_seconds
Description: The date when the AxoSyslog configuration was last reloaded. If it differs from last_successful_config_reload_timestamp_seconds, reloading the configuration has failed.
last_successful_config_reload_timestamp_seconds
Description: The date when the AxoSyslog configuration was last reloaded successfully.
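The three configuration timestamps can be combined into a simple health check. The following Python sketch is one possible interpretation: the "reload-failed" rule comes directly from the description above, while the "changes-not-loaded" rule (file modified after the last successful reload) is an added assumption. The function name and sample values are hypothetical.

```python
def config_status(modified_ts, reload_ts, successful_reload_ts):
    """Classify the configuration state from the three timestamp metrics.

    modified_ts:          last_config_file_modification_timestamp_seconds
    reload_ts:            last_config_reload_timestamp_seconds
    successful_reload_ts: last_successful_config_reload_timestamp_seconds
    """
    if reload_ts != successful_reload_ts:
        # The most recent reload attempt was not a successful one.
        return "reload-failed"
    if modified_ts > successful_reload_ts:
        # Assumption: the file changed on disk but was not reloaded yet.
        return "changes-not-loaded"
    return "ok"


# Hypothetical timestamps (seconds since the epoch).
print(config_status(1700000000, 1700000600, 1700000100))  # reload-failed
print(config_status(1700000900, 1700000600, 1700000600))  # changes-not-loaded
print(config_status(1700000000, 1700000600, 1700000600))  # ok
```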
mainloop_io_worker_roundtrip_latency_seconds
Description: Shows how overloaded the main AxoSyslog loop is (how much time it takes to start a new worker). Values close to 0 are good, high values indicate high load or processing bottleneck.
memory_queue_capacity
Description: Shows the capacity (maximum possible size) of each queue. Note that this metric publishes log-fifo-size(), which only limits non-flow-controlled messages. Messages coming from flow-controlled paths aren’t limited by log-fifo-size(), but by the log-iw-size() of their corresponding source. For metrics on log-iw-size(), see input_window_available and input_window_capacity.
memory_queue_events
Description: Number of messages waiting in each memory queue by destination.
memory_queue_memory_usage_bytes
Description: Total bytes of data waiting in each memory queue.
memory_queue_processed_events_total
Description: The number of events processed since startup by each queue.
output_active_worker_partitions
Description: The number of active partitions when worker-partition-autoscaling() is set to yes.
output_batch_size_bytes
Description: Histogram-style metrics for the destination.
output_batch_size_events
Description: Histogram-style metrics for the destination.
output_batch_timedout_total
Description: For destinations that use batching, it shows the number of batches that were sent because of timeout (either batch-timeout() or batch-idle-timeout() expired).
output_event_bytes_total
Description: Log messages sent to each destination, measured in bytes.
output_event_latency_seconds
Description: Histogram of the latency: time from receiving the message to delivering it to the destination.
output_event_retries_total
Description: Shows the number of events when AxoSyslog retried sending a message.
output_event_size_bytes
Description: Histogram-style metrics for the destination.
output_events_total
Description: Number of log messages sent to each destination, showing sent and dropped messages.
output_grpc_requests_total
Description: The total number of gRPC requests.
output_http_requests_total
Description: Available on stats(level(1)). The total number of HTTP requests.
output_request_latency_seconds
Description: Histogram-style metrics for the destination.
output_unreachable
Description: A bool-like metric, which shows whether a destination is reachable or not.
output_workers
Description: The number of workers configured for each destination.
parallelize_failed_events_total
Description: The number of events that parallelize() couldn’t process in parallel. Such messages were sent without parallelization. A high number of such events can signal a configuration issue or a bottleneck.
parallelized_assigned_events_total
Description: The number of events each worker has received when using parallelize(). Can show if the workers receive the load unevenly.
parallelized_processed_events_total
Description: The number of events processed using parallelize().
parsed_events_total
Description: Shows the number of messages processed by each parser.
route_egress_total
Description: The number of messages delivered by each named log path.
route_ingress_total
Description: The number of messages entering each named log path.
scratch_buffers_bytes
Description: The number of bytes allocated to internal string buffers.
scratch_buffers_count
Description: The number of allocated internal string buffers.
socket_connections
Description: Number of active connections for the sources.
socket_max_connections
Description: Maximum permitted number of connections for the sources.
socket_receive_buffer_max_bytes
Description: The maximum size of the socket receive buffer in bytes, as configured in the so-rcvbuf() option of the destination.
socket_receive_buffer_used_bytes
Description: The number of bytes used from the socket receive buffer.
socket_receive_dropped_packets_total
Description: Number of UDP packets dropped by the OS before processing.
socket_rejected_connections_total
Description: The number of connections rejected because the max-connections() limit of the source was reached, for each source.
stats_level
Description: Shows the current verbosity level() of statistics and metrics.
tagged_events_total
Description: The number of messages marked with a tag, for each tag. (Every message automatically has the tag of its source in .source.<id_of_the_source_statement> format.)
3 - Statistics reference
Statistics are a legacy way to access the status of AxoSyslog. Metrics are newer and in active development. Many metrics aren’t available as legacy statistics.
You can list all active statistics on your AxoSyslog host using the following command (this lists the statistics, without their current values):
syslog-ng-ctl query list "*"
To list the statistics and their values, use the following command:
syslog-ng-ctl query get "*"
Example output:
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.dropped=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.processed=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.queued=0
destination.d_elastic.stats.processed=0
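Output in this name=value form is easy to post-process. The following Python sketch turns the example output above into a dictionary. It is a minimal parser written for illustration; it assumes one statistic per line and an integer value after the last equals sign.

```python
def parse_query_output(text):
    """Map each statistic name to its integer value."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last "=": names may contain parentheses and commas.
        name, _, value = line.rpartition("=")
        stats[name] = int(value)
    return stats


OUTPUT = """\
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.dropped=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.processed=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.queued=0
destination.d_elastic.stats.processed=0
"""

stats = parse_query_output(OUTPUT)
dropped = [name for name in stats if name.endswith(".stats.dropped")]
print(len(stats), len(dropped))
```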
The displayed statistics have the following structure.
- The type of the object (for example, dst.file, tag, src.facility)
- The ID of the object used in the syslog-ng.conf configuration file, for example, d_internal or source.src_tcp. The #0 part means that this is the first destination in the destination group.
- The instance ID (destination) of the object, for example, the filename of a file destination, or the name of the application for a program source or destination.
- The status of the object. One of the following:
  - a: active. At the time of querying the statistics, the source or the destination was still alive (it continuously received statistical data).
  - d: dynamic. Such objects may not be continuously available, for example, statistics based on the sender’s hostname. These counters only appear above a certain value of the stats(level()) global option:
    - host: source host, from stats(level(2))
    - program: program, from stats(level(3))
    - sender: sender host, from stats(level(3))
The following example contains 6 different dynamic values: a sender, a host, and four different programs.
src.sender;;localhost;d;processed;4
src.sender;;localhost;d;stamp;1509121934
src.program;;P-18069;d;processed;1
src.program;;P-18069;d;stamp;1509121933
src.program;;P-21491;d;processed;1
src.program;;P-21491;d;stamp;1509121934
src.program;;P-9774;d;processed;1
src.program;;P-9774;d;stamp;1509121919
src.program;;P-14737;d;processed;1
src.program;;P-14737;d;stamp;1509121931
src.host;;localhost;d;processed;4
src.host;;localhost;d;stamp;1509121934
To avoid performance issues or even overloading AxoSyslog, you might want to limit the number of registered dynamic counters in the message statistics. To do this, configure the stats(max-dynamics()) global option.
  - o: orphaned. This object was once active, but stopped receiving messages. (For example, a dynamic object may disappear and become an orphan.)
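To make the structure of these semicolon-separated lines concrete, the following Python sketch parses the dynamic-counter example above into records and counts the distinct dynamic objects. The record field names are chosen here for illustration, and the parser assumes exactly six fields per line with an integer in the last one.

```python
from collections import namedtuple

# Field names are illustrative labels for the six semicolon-separated fields.
Stat = namedtuple("Stat", "type id instance state counter value")


def parse_stats(text):
    """Parse semicolon-separated statistics lines into Stat records."""
    records = []
    for line in text.splitlines():
        fields = line.strip().split(";")
        if len(fields) != 6:
            continue  # skip headers or malformed lines
        records.append(Stat(*fields[:5], int(fields[5])))
    return records


EXAMPLE = """\
src.sender;;localhost;d;processed;4
src.sender;;localhost;d;stamp;1509121934
src.program;;P-18069;d;processed;1
src.program;;P-18069;d;stamp;1509121933
src.program;;P-21491;d;processed;1
src.program;;P-21491;d;stamp;1509121934
src.program;;P-9774;d;processed;1
src.program;;P-9774;d;stamp;1509121919
src.program;;P-14737;d;processed;1
src.program;;P-14737;d;stamp;1509121931
src.host;;localhost;d;processed;4
src.host;;localhost;d;stamp;1509121934
"""

records = parse_stats(EXAMPLE)
# Each dynamic object appears once per counter, so deduplicate on type and instance.
dynamic = {(r.type, r.instance) for r in records if r.state == "d"}
print(len(dynamic))  # 6: one sender, one host, four programs
```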
Note
The AxoSyslog application stores the statistics of the objects when AxoSyslog is reloaded. However, if the configuration of AxoSyslog changed since the last reload, the statistics of orphaned objects are deleted.
The connections statistics counter displays the number of connections tracked by AxoSyslog for the selected source driver.
Example configuration and statistics output
The following configuration produces the syslog-ng-ctl statistics output shown after it:
Configuration:
source s_network {
    tcp(
        port(8001)
    );
};
Statistics output:
src.tcp;s_network#0;tcp,127.0.0.5;a;processed;1
src.tcp;s_network#0;tcp,127.0.0.1;a;processed;3
src.tcp;s_network;afsocket_sd.(stream,AF_INET(0.0.0.0:8001));a;connections;2
Statistics reference
The type of the statistics:
- batch_size_avg: When batching is enabled, this shows the current average batch size of the given source or destination.
- batch_size_max: When batching is enabled, this shows the current largest batch size of the given source or destination.
- discarded: The number of messages discarded by the given parser. These are messages that the parser could not parse, and are therefore not processed. For example:
parser;demo_parser;;a;discarded;20
- dropped: The number of dropped messages. AxoSyslog could not send these messages to the destination and the output buffer got full, so messages were dropped by the destination driver, or AxoSyslog dropped the message for some other reason (for example, a parsing error).
- eps_last_1h: The EPS value of the past 1 hour.
- eps_last_24h: The EPS value of the past 24 hours.
- eps_since_start: The EPS value since the current AxoSyslog start.
Note
When using the eps_last_1h, the eps_last_24h, and the eps_since_start statistics, consider the following:
- EPS stands for “events per second”, and in our case, a message received or sent counts as a single event.
- The eps_last_1h, the eps_last_24h, and the eps_since_start values are only approximate values.
- The eps_last_1h, the eps_last_24h, and the eps_since_start values are automatically updated every 60 seconds.
- matched: The number of messages that are accepted by a given filter. Available for filters and similar objects (for example, a conditional rewrite rule). For example, if a filter matches a specific hostname, then the matched counter contains the number of messages that reached the filter from this host.
filter;demo_filter;;a;matched;28
- memory_usage: The memory used by the messages in the different queue types (in bytes). This includes every queue used by the object, including memory buffers (log-fifo) and disk-based buffers (both reliable and non-reliable). For example:
dst.network;d_net#0;tcp,127.0.0.1:9999;a;memory_usage;0
Note
The memory usage (size) of queues is not equal to the memory usage (size) of the log messages in AxoSyslog. A log message can be in multiple queues, thus its size is added to multiple queue sizes. To check the size of all log messages, use the global.msg_allocated_bytes.value metric.
- msg_size_max: The current largest message size of the given source or destination.
- msg_size_avg: The current average message size of the given source or destination.
Note
When using the msg_size_avg and msg_size_max statistics, consider that message sizes are calculated as follows:
- on the source side: the length of the incoming raw message
- on the destination side: the length of the outgoing formatted message
- not_matched: The number of messages that are filtered out by a given filter. Available for filters and similar objects (for example, a conditional rewrite rule). For example, if a filter matches a specific hostname, then the not_matched counter contains the number of messages that reached the filter from other hosts, and so the filter discarded them.
Note
Since the not_matched metric applies to filters, and filters are expected to discard messages that do not match the filter condition, not_matched messages are not included in the dropped metric of other objects.
filter;demo_filter;;a;not_matched;0
- processed: The number of messages that successfully reached their destination driver.
Note
Note that a message successfully reaching its destination driver does not necessarily mean that the driver also delivered it successfully. For example, a message can be written to disk or sent to a remote server after reaching the destination driver.
- queued: The number of messages passed to the message queue of the destination driver, waiting to be sent to the destination.
- stamp: The UNIX timestamp of the last message sent to the destination.
- suppressed: The number of suppressed messages (if the suppress() feature is enabled).
- written: The number of messages successfully delivered to the destination. This value is calculated from other counters: written = processed - queued - dropped. That is, the number of messages AxoSyslog passed to the destination driver (processed) minus the number of messages that are still in the output queue of the destination driver (queued) and the number of messages dropped because of an error (dropped, for example, because AxoSyslog could not deliver the message to the destination and exceeded the number of retries). Because this value is derived, you cannot reset it directly: to reset it, reset the counters it is calculated from.
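As a worked example of the written = processed - queued - dropped formula, the following Python sketch reads the three counters of one destination from semicolon-separated statistics lines and computes the derived value. The lines reuse the dst.tcp sample figures that appear in the internal() source section of this document; the helper name is an illustrative choice.

```python
def written(processed, queued, dropped):
    """Messages delivered: passed to the driver, minus still queued, minus dropped."""
    return processed - queued - dropped


LINES = """\
dst.tcp;d_network#0;10.50.0.111:514;a;dropped;5080
dst.tcp;d_network#0;10.50.0.111:514;a;processed;7128
dst.tcp;d_network#0;10.50.0.111:514;a;queued;2048
"""

counters = {}
for line in LINES.splitlines():
    # The last two fields are the counter name and its value.
    *_, name, value = line.split(";")
    counters[name] = int(value)

result = written(counters["processed"], counters["queued"], counters["dropped"])
print(result)  # 7128 - 2048 - 5080 = 0
```

In this sample, every message passed to the driver is either still queued or was dropped, so nothing has been delivered yet.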
Note
Note that in AxoSyslog version 3.36, the following statistics counters are supported only for certain drivers: either only for the http() destination, or for the http() destination together with all network() and file() sources and destinations. For the per-driver breakdown, see Aggregated statistics.
- msg_size_max
- msg_size_avg
- batch_size_max
- batch_size_avg
- eps_last_1h
- eps_last_24h
- eps_since_start
Availability of statistics
Certain statistics are available only if the stats(level()) global option is set to a higher value.
- Level 0 collects only statistics about the sources and destinations.
- Level 1 contains details about the different connections and log files, but has a slight memory overhead.
- Level 2 contains detailed statistics based on the hostname.
- Level 3 contains detailed statistics based on various message parameters like facility, severity, or tags.
When receiving messages with non-standard facility values (that is, higher than 23), these messages will be listed as other facility instead of their facility number.
Aggregated statistics
Aggregated statistics are available for different sources and destinations from different levels and upwards:
| Source or destination | msg_size_max | msg_size_avg | batch_size_max | batch_size_avg | eps_last_1h | eps_last_24h | eps_since_start |
|---|---|---|---|---|---|---|---|
| network() source and destination | from level 1 | from level 1 | counter N/A | counter N/A | from level 1 | from level 1 | from level 1 |
| file() source and destination | from level 1 | from level 1 | counter N/A | counter N/A | from level 1 | from level 1 | from level 1 |
| http() destination | from level 0 | from level 0 | from level 0 | from level 0 | from level 0 | from level 0 | from level 0 |
4 - Log statistics from the internal() source
Note
Instead of using the statistics messages of the internal() source, we recommend monitoring AxoSyslog using metrics, or if it’s not possible in your environment, by querying statistics.
If the stats(freq()) global option is higher than 0, AxoSyslog periodically sends a log statistics message. This message contains statistics about the received messages, and about any lost messages since the last such message. It includes:
- a processed entry for every source and destination, listing the number of messages received or sent, and
- a dropped entry including the IP address of the server for every destination where AxoSyslog has lost messages.
- The center(received) entry shows the total number of messages received from every configured source.
The following is a sample log statistics message for a configuration that has a single source (s_local) and a network and a local file destination (d_tcp and d_local, respectively). All incoming messages are sent to both destinations.
Log statistics;
dropped='tcp(AF_INET(192.168.10.1:514))=6439',
processed='center(received)=234413',
processed='destination(d_tcp)=234413',
processed='destination(d_local)=234413',
processed='source(s_local)=234413'
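If you still need to process these messages, the entries can be extracted with a regular expression. The following Python sketch parses the sample message above into a dictionary. It assumes that every entry has the form kind='object=count', as in the sample; the function name is an illustrative choice.

```python
import re

# kind='object=count'; the object part may contain parentheses and colons.
ENTRY_RE = re.compile(r"(\w+)='([^']*)=(\d+)'")


def parse_log_statistics(message):
    """Return {(kind, object): count} from a 'Log statistics' message."""
    return {
        (kind, obj): int(count)
        for kind, obj, count in ENTRY_RE.findall(message)
    }


MESSAGE = (
    "Log statistics; "
    "dropped='tcp(AF_INET(192.168.10.1:514))=6439', "
    "processed='center(received)=234413', "
    "processed='destination(d_tcp)=234413', "
    "processed='destination(d_local)=234413', "
    "processed='source(s_local)=234413'"
)

stats = parse_log_statistics(MESSAGE)
for (kind, obj), count in stats.items():
    print(kind, obj, count)
```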
The statistics include a list of source groups and destinations, as well as the number of processed messages for each. You can control the verbosity of the statistics using the stats(level()) global option. The following is an example output.
src.internal;s_all#0;;a;processed;6445
src.internal;s_all#0;;a;stamp;1268989330
destination;df_auth;;a;processed;404
destination;df_news_dot_notice;;a;processed;0
destination;df_news_dot_err;;a;processed;0
destination;d_ssb;;a;processed;7128
destination;df_uucp;;a;processed;0
source;s_all;;a;processed;7128
destination;df_mail;;a;processed;0
destination;df_user;;a;processed;1
destination;df_daemon;;a;processed;1
destination;df_debug;;a;processed;15
destination;df_messages;;a;processed;54
destination;dp_xconsole;;a;processed;671
dst.tcp;d_network#0;10.50.0.111:514;a;dropped;5080
dst.tcp;d_network#0;10.50.0.111:514;a;processed;7128
dst.tcp;d_network#0;10.50.0.111:514;a;queued;2048
destination;df_syslog;;a;processed;6724
destination;df_facility_dot_warn;;a;processed;0
destination;df_news_dot_crit;;a;processed;0
destination;df_lpr;;a;processed;0
destination;du_all;;a;processed;0
destination;df_facility_dot_info;;a;processed;0
center;;received;a;processed;0
destination;df_kern;;a;processed;70
center;;queued;a;processed;0
destination;df_facility_dot_err;;a;processed;0
The statistics are semicolon-separated: every line contains statistics for a particular object (like source, destination, tag).
To reset the statistics to zero, use the following command:
syslog-ng-ctl stats --reset