Statistics and metrics of AxoSyslog

1: Collect metrics with Prometheus

2: Metrics reference

3: Statistics reference

4: Log statistics from the internal() source

The AxoSyslog application collects various statistics and metrics about its performance and status for observability and monitoring. Which metrics and statistics are collected depends on the configuration of AxoSyslog and the value of the stats(level()) global option.

Metrics and statistics

AxoSyslog provides detailed metrics about its performance and status for observability and monitoring. We recommend using Prometheus to scrape these metrics. For details, see Collect metrics with Prometheus. To display the current metrics locally in Prometheus-compatible format, run:
```
syslog-ng-ctl stats prometheus
```
Which metrics are shown depends on the current value of the stats(level()) global option. To list the available metrics, run syslog-ng --metrics-registry. For details on what the metrics mean, see Metrics reference.

Statistics are a legacy way to access the status of AxoSyslog. Metrics are newer and in active development. Many metrics aren’t available as legacy statistics.

You can access legacy statistics using the following methods.
- The syslog-ng-ctl query command gives structured access to the selected legacy statistics.
- The syslog-ng-ctl stats command lists all the available legacy statistics in bulk.
- Using the internal() source. We recommend using either of the previous two methods instead.
For details about the available counters and the output format, see Statistics reference.

1 - Collect metrics with Prometheus

Expose AxoSyslog and syslog-ng metrics with the axosyslog-metrics-exporter, then scrape them with Prometheus.

Prerequisites

- A running AxoSyslog instance
- stats(level(2)) or higher set in your configuration file
- File-level access to the AxoSyslog control socket

Note You must set stats(level(2)) to expose host-level metrics. Without it, many metrics (including per-host counters) aren’t available. For details, see the stats(level()) global option.

Deploy the metrics exporter

The axosyslog-metrics-exporter is a Go-based tool that exposes Prometheus-style metrics by connecting to the AxoSyslog control socket. It works with syslog-ng, syslog-ng Premium Edition, and all versions of AxoSyslog (syslog-ng™ is the trademark of One Identity LLC).

Run the exporter as a container:

```
sudo podman run -d -p 9577:9577 -v $(echo /var/*/syslog-ng/syslog-ng.ctl):/syslog-ng.ctl \
  ghcr.io/axoflow/axosyslog-metrics-exporter:latest --socket.path=/syslog-ng.ctl
```

Once started, the metrics endpoint is available at http://127.0.0.1:9577/metrics.
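
To check from a script that the exporter responds, you can fetch the endpoint and count the sample lines it returns. The following is a minimal Python sketch using only the standard library. The function names are illustrative, and the sketch assumes that the endpoint returns the Prometheus text exposition format, where lines starting with # are comments.

```python
from urllib.request import urlopen


def count_samples(exposition_text):
    """Count sample lines, skipping blank lines and '#' comment lines."""
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def fetch_metrics(url="http://127.0.0.1:9577/metrics", timeout=5):
    """Fetch the exporter endpoint and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

On a host where the exporter is running, count_samples(fetch_metrics()) should return a positive number. A connection error usually means that the container isn't running or the port isn't published.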

Note The control socket is typically located at /var/lib/syslog-ng/syslog-ng.ctl or /var/run/syslog-ng/syslog-ng.ctl. In containerized environments, share the Unix domain socket with the exporter container using a volume mount, as shown in the preceding command.

Configure Prometheus

Create a prometheus.yml file with a scrape job pointing to the metrics exporter:

```
scrape_configs:
  - job_name: axosyslog
    static_configs:
      - targets:
          - <prometheus-host-ip>:9577
        labels:
          app: axosyslog
```

Then run Prometheus:

```
sudo podman run \
    -p 9090:9090 \
    -v ./prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus
```

To verify that Prometheus is scraping correctly, open the following pages in your browser:

http://127.0.0.1:9090/config: shows the active configuration
http://127.0.0.1:9090/targets: shows whether the AxoSyslog scrape target is up

Key metrics to monitor

For a detailed reference, see Metrics reference. The main metrics that you should monitor are the following.

Critical metrics

These metrics indicate problems that require immediate attention:

output_unreachable: destination is unavailable
socket_receive_dropped_packets_total: messages dropped on the source side
output_events_total{result="dropped"}: messages dropped at the output without flow control
socket_rejected_connections_total: number of rejected incoming connections
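
The checks above can be sketched as a small Python function that scans already-parsed samples. This is an illustration, not a recommended alerting setup. It assumes the syslogng_ name prefix that appears in the sample outputs of the Metrics reference, it assumes that output_unreachable is non-zero when a destination is unreachable, and it flags any non-zero counter, whereas a real alert would more likely look at how fast the counter grows.

```python
def find_critical(samples):
    """Return human-readable alerts for the critical metrics.

    `samples` is an iterable of (metric_name, labels_dict, value) tuples,
    for example the result of parsing the Prometheus-format output.
    """
    alerts = []
    for name, labels, value in samples:
        if value <= 0:
            continue  # nothing to report for this sample
        if name == "syslogng_output_unreachable":
            alerts.append(f"destination unreachable: {labels}")
        elif name in (
            "syslogng_socket_receive_dropped_packets_total",
            "syslogng_socket_rejected_connections_total",
        ):
            alerts.append(f"{name} = {value}: {labels}")
        elif (
            name == "syslogng_output_events_total"
            and labels.get("result") == "dropped"
        ):
            alerts.append(f"dropped at output: {value} ({labels})")
    return alerts
```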

Core pipeline metrics

These metrics give you a basic understanding of pipeline throughput:

input_events_total: total messages received by all sources
output_events_total: total messages sent by all destinations
filtered_events_total: total messages processed by filters
parsed_events_total: total messages processed by parsers
memory_queue_events and disk_queue_events: current buffer usage
io_worker_latency_seconds: I/O worker latency, a sign of potential overload

2 - Metrics reference

The following list shows the metrics available in AxoSyslog.

Note

Metrics that have the _total suffix reset to zero when AxoSyslog is restarted. Reloading AxoSyslog doesn’t cause a reset.
Different metrics are available on different stats(level()).
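
Because _total counters restart from zero, a script that compares two scrapes has to handle a value that goes down. The following Python sketch shows one common way to do that. The function name is illustrative, and the reset handling is an assumption about how you want to treat restarts. Prometheus functions such as rate() and increase() handle counter resets on their own.

```python
def counter_increase(previous, current):
    """Increase of a *_total counter between two scrapes.

    If the current value is lower than the previous one, assume the
    counter was reset (for example, by a restart) and count from zero.
    """
    if current < previous:
        return current
    return current - previous
```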

classified_events_total

Description: Default metric of the metrics-probe() parser.

disk_queue_capacity_bytes

Description: Maximal size of the disk queue (in bytes), as set in the capacity-bytes() disk-buffer option.

disk_queue_capacity

Description: The size of the overflow queue of the destination, as set in the flow-control-window-size() disk-buffer option.

disk_queue_dir_available_bytes

Description: The size of the space available in the directories where disk-buffer files are stored (including directories storing abandoned disk-buffers), in bytes.

disk_queue_disk_allocated_bytes

Description: The actual size of the disk-buffer files, in bytes.

disk_queue_disk_usage_bytes

Description: Total size of data waiting in each disk-buffer, in bytes.

disk_queue_events

Description: Number of messages waiting in each disk-buffer by destination.

disk_queue_memory_usage_bytes

Description: Amount of memory used for caching disk-buffers, in bytes.

disk_queue_processed_events_total

Description: The number of events processed since startup by each disk-buffer.

event_processing_latency_seconds

Description: Histogram of the latency (time from receiving the message to fully processing it), from the source or destination perspective.

events_allocated_bytes

Description: The total amount of memory used by log messages in AxoSyslog.

filtered_events_total

Description: The total number of messages that matched and didn’t match a filter, for each filter in the configuration file.

input_event_bytes_total

Description: Incoming log messages processed by each source, measured in bytes.

input_events_total

Description: Number of incoming log messages processed by each source.

input_transport_errors_total

Description: Number of various transport errors that prevent AxoSyslog from ingesting messages, for example, TLS handshake errors or syslog framing errors. Labels include the source id, peer_address, and the reason for the error.

syslogng_input_transport_errors_total{address="127.0.0.1:5513",driver="syslog",peer_address="127.0.0.1",reason="invalid-frame-header",transport="tcp"} 1
syslogng_input_transport_errors_total{address="127.0.0.1:5515",driver="syslog",id="s_tls_req#0",peer_address="127.0.0.1",reason="tls-handshake",tls_error="0A0000C7",tls_error_string="SSL routines::peer did not return a certificate",transport="tls"} 1
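
To see which kinds of transport errors occur, you can group these samples by their reason label. The following Python sketch parses lines in the format shown above. The regular expressions and the function name are illustrative and only cover simple name{labels} value lines, not the full Prometheus text format.

```python
import re
from collections import Counter

# Matches: name{key="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Matches a single key="value" pair, allowing escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def errors_by_reason(text):
    """Sum input_transport_errors_total samples per 'reason' label."""
    counts = Counter()
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue
        if match.group("name") != "syslogng_input_transport_errors_total":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        counts[labels.get("reason", "unknown")] += float(match.group("value"))
    return counts
```

Applied to the two sample lines above, the result contains one error for invalid-frame-header and one for tls-handshake.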

input_window_available

Description: Available on stats(level(3)). Shows the current size of the flow-control window (how much is still free from log-iw-size()).

input_window_capacity

Description: Available on stats(level(3)). Shows the value of log-iw-size() (the size of the flow-control window).

input_window_full_total

Description: The total number of input window full events, for the entire configuration. These events cause AxoSyslog to throttle the source. Available on stats(level(1)).

internal_events_queue_capacity

Description: The internal queue size of the internal() source.

internal_events_total

Description: The number of messages the internal() source has queued, processed, or dropped.

io_worker_latency_seconds

Description: Shows how overloaded the IO workers of AxoSyslog are.

last_config_file_modification_timestamp_seconds

Description: The date when the configuration file was last modified.

last_config_reload_timestamp_seconds

Description: The date when the AxoSyslog configuration was last reloaded. If it differs from last_successful_config_reload_timestamp_seconds, reloading the configuration has failed.

last_successful_config_reload_timestamp_seconds

Description: The date when the AxoSyslog configuration was last reloaded successfully.
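
The comparison of the two reload timestamps can be sketched in Python as follows. The function name is illustrative, and the dictionary keys assume the syslogng_ name prefix that appears in the sample outputs on this page.

```python
def last_reload_failed(metrics):
    """Return True if the latest reload attempt was not successful.

    `metrics` maps metric names to their current values.
    """
    attempted = metrics["syslogng_last_config_reload_timestamp_seconds"]
    successful = metrics["syslogng_last_successful_config_reload_timestamp_seconds"]
    return attempted != successful
```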

mainloop_io_worker_roundtrip_latency_seconds

Description: Shows how overloaded the main AxoSyslog loop is (how much time it takes to start a new worker). Values close to 0 are good; high values indicate high load or a processing bottleneck.

memory_queue_capacity

Description: Shows the capacity (maximum possible size) of each queue. Note that this metric publishes log-fifo-size(), which only limits non-flow-controlled messages. Messages coming from flow-controlled paths aren’t limited by log-fifo-size(), but by the log-iw-size() of their corresponding source. For metrics on log-iw-size(), see input_window_available and input_window_capacity.

memory_queue_events

Description: Number of messages waiting in each memory queue by destination.

memory_queue_memory_usage_bytes

Description: Total bytes of data waiting in each memory queue.

memory_queue_processed_events_total

Description: The number of events processed since startup by each queue.

output_active_worker_partitions

Description: The number of active partitions when worker-partition-autoscaling() is set to yes.

output_batch_size_bytes

Description: Histogram-style metrics for the destination.

output_batch_size_events

Description: Histogram-style metrics for the destination.

output_batch_timedout_total

Description: For destinations that use batching, it shows the number of batches that were sent because of timeout (either batch-timeout() or batch-idle-timeout() expired).

output_event_bytes_total

Description: Log messages sent to each destination, measured in bytes.

output_event_latency_seconds

Description: Histogram of the latency: time from receiving the message to delivering it to the destination.

output_event_retries_total

Description: The number of times AxoSyslog retried sending a message.

output_event_size_bytes

Description: Histogram-style metrics for the destination.

output_events_total

Description: Number of log messages sent to each destination, showing sent and dropped messages.

output_grpc_requests_total

Description: The total number of gRPC requests.

output_http_requests_total

Description: Available on stats(level(1)). The total number of HTTP requests.

output_request_latency_seconds

Description: Histogram-style metrics for the destination.

output_unreachable

Description: A bool-like metric, which shows whether a destination is reachable or not.

output_workers

Description: The number of workers configured for each destination.

parallelize_failed_events_total

Description: The number of events that parallelize() couldn’t process in parallel. Such messages were sent without parallelization. A high number of such events can signal a configuration issue or a bottleneck.

parallelized_assigned_events_total

Description: The number of events each worker has received when using parallelize(). Can show if the workers receive the load unevenly.

parallelized_batch_size

Available in AxoSyslog 4.26 and later at stats-level(4)

Description: Prometheus-style histograms that show how logscheduler batches messages. Useful only for low-level debugging and troubleshooting.

syslogng_parallelized_batch_size_sum{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5"} 1000000
syslogng_parallelized_batch_size_count{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5"} 13962
syslogng_parallelized_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="1"} 4686
syslogng_parallelized_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="2"} 2857
syslogng_parallelized_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="4"} 2757
syslogng_parallelized_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="8"} 1401
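
One figure you can derive from such a histogram is the mean observed value, which is the _sum sample divided by the _count sample. The following Python sketch uses the values from the sample output above. The function name is illustrative, and reading the result as the average number of messages per batch is an assumption about what the histogram observes.

```python
def histogram_mean(sum_value, count_value):
    """Mean observed value of a Prometheus-style histogram (_sum / _count)."""
    if count_value == 0:
        return 0.0
    return sum_value / count_value


# Values taken from the _sum and _count lines of the sample output.
mean_batch_size = histogram_mean(1000000, 13962)
print(f"mean batch size: {mean_batch_size:.1f}")
```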

parallelized_input_batch_size

Available in AxoSyslog 4.26 and later at stats-level(4)

Description: Prometheus-style histograms that show how logscheduler batches messages. Useful only for low-level debugging and troubleshooting.

Example:

syslogng_parallelized_input_batch_size_sum{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5"} 1000000
syslogng_parallelized_input_batch_size_count{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5"} 11979
syslogng_parallelized_input_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="1"} 4545
syslogng_parallelized_input_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="2"} 2784
syslogng_parallelized_input_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="4"} 2729
syslogng_parallelized_input_batch_size_bucket{parallelize="/install/etc/callgrind-syslog-ng.conf:30:5",le="8"} 1429

parallelized_processed_events_total

Description: The number of events processed using parallelize().

parsed_events_total

Description: Shows the number of messages processed by each parser.

route_egress_total

Description: The number of messages delivered by each named log path.

route_ingress_total

Description: The number of messages entering each named log path.

scratch_buffers_bytes

Description: The number of bytes allocated to internal string buffers.

scratch_buffers_count

Description: The number of allocated internal string buffers.

socket_connections

Description: Number of active connections for the sources.

socket_max_connections

Description: Maximum permitted number of connections for the sources.

socket_receive_buffer_max_bytes

Description: The maximum size of the socket receive buffer in bytes, as configured in the so-rcvbuf() option of the destination.

socket_receive_buffer_used_bytes

Description: The number of bytes used from the socket receive buffer.

socket_receive_dropped_packets_total

Description: Number of UDP packets dropped by the OS before processing.

socket_rejected_connections_total

Description: The number of connections rejected because the max-connections() limit of the source was reached, for each source.

stats_level

Description: Shows the current verbosity level() of statistics and metrics.

tagged_events_total

Description: The number of messages marked with a tag, for each tag. (Every message automatically has the tag of its source in .source.<id_of_the_source_statement> format.)

3 - Statistics reference

Statistics are a legacy way to access the status of AxoSyslog. Metrics are newer and in active development. Many metrics aren’t available as legacy statistics.

You can list all active statistics on your AxoSyslog host using the following command (this lists the statistics, without their current values): syslog-ng-ctl query list "*"

Format of statistics

To list the statistics and their values, use the following command: syslog-ng-ctl query get "*"

Example output:

destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.dropped=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.processed=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.queued=0
destination.d_elastic.stats.processed=0
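
If you want to process this output in a script, each line can be split into a counter name and a value. The following Python sketch is one way to do that. The function name is illustrative, and the sketch assumes that every value is a non-negative integer, as in the example output.

```python
def parse_query_output(text):
    """Map each counter name to its integer value.

    Splits on the last '=' because the name itself can contain
    parentheses, commas, and other punctuation.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition("=")
        if sep and value.isdigit():
            stats[name] = int(value)
    return stats
```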

The displayed statistics have the following structure.

The type of the object (for example, dst.file, tag, src.facility)
The ID of the object used in the syslog-ng.conf configuration file, for example, d_internal or source.src_tcp. The #0 part means that this is the first destination in the destination group.
The instance ID (destination) of the object, for example, the filename of a file destination, or the name of the application for a program source or destination.
The status of the object. One of the following:
- a: active. At the time of querying the statistics, the source or the destination was still alive (it continuously received statistical data).
- d: dynamic. Such objects may not be continuously available, for example, statistics based on the sender’s hostname. These counters only appear above a certain value of the stats(level()) global option:
  - host: source host, from stats(level(2))
  - program: program, from stats(level(3))
  - sender: sender host, from stats(level(3))
  The following example contains 6 different dynamic values: a sender, a host, and four different programs.
```
src.sender;;localhost;d;processed;4
src.sender;;localhost;d;stamp;1509121934
src.program;;P-18069;d;processed;1
src.program;;P-18069;d;stamp;1509121933
src.program;;P-21491;d;processed;1
src.program;;P-21491;d;stamp;1509121934
src.program;;P-9774;d;processed;1
src.program;;P-9774;d;stamp;1509121919
src.program;;P-14737;d;processed;1
src.program;;P-14737;d;stamp;1509121931
src.host;;localhost;d;processed;4
src.host;;localhost;d;stamp;1509121934
```
  To avoid performance issues or even overloading AxoSyslog, you might want to limit the number of registered dynamic counters in the message statistics. To do this, configure the stats(max-dynamics()) global option.
- o: orphaned. This object was once active, but stopped receiving messages. (For example, a dynamic object may disappear and become orphaned.)
Note The AxoSyslog application stores the statistics of the objects when AxoSyslog is reloaded. However, if the configuration of AxoSyslog changed since the last reload, the statistics of orphaned objects are deleted.

The connections statistics counter displays the number of connections tracked by AxoSyslog for the selected source driver.

Example configuration and statistics output

The following example shows a configuration and the syslog-ng-ctl statistics output that it produces.

Configuration:

```
source s_network {
  tcp(
    port(8001)
  );
};
```

Statistics output:

src.tcp;s_network#0;tcp,127.0.0.5;a;processed;1
src.tcp;s_network#0;tcp,127.0.0.1;a;processed;3
src.tcp;s_network;afsocket_sd.(stream,AF_INET(0.0.0.0:8001));a;connections;2
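
The semicolon-separated lines can be split into the fields described in the structure above, followed by the statistic type and its value. The following Python sketch does this. The class, field, and function names are illustrative, and the sketch assumes that no field contains a semicolon.

```python
from typing import NamedTuple


class StatLine(NamedTuple):
    object_type: str  # type of the object, for example src.tcp
    object_id: str    # ID from the configuration, for example s_network#0
    instance: str     # instance ID, for example tcp,127.0.0.1
    state: str        # a (active), d (dynamic), or o (orphaned)
    counter: str      # type of the statistic, for example processed
    value: int


def parse_stat_line(line):
    """Split one statistics line into its six fields."""
    fields = line.strip().split(";")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    *head, value = fields
    return StatLine(*head, int(value))
```

For the last line of the output above, parse_stat_line returns a record whose counter is connections and whose value is 2.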

Statistics reference

The type of the statistics:

batch_size_avg: When batching is enabled, then this shows the current average batch size of the given source or destination.
batch_size_max: When batching is enabled, the value of batch_size_max shows the current largest batch size of the given source or destination.
discarded: The number of messages discarded by the given parser. These are messages that the parser could not parse, and are therefore not processed. For example:
```
parser;demo_parser;;a;discarded;20
```
dropped: The number of dropped messages. AxoSyslog could not send these messages to the destination and the output buffer got full, so messages were dropped by the destination driver, or AxoSyslog dropped the message for some other reason (for example, a parsing error).
eps_last_1h: The EPS value of the past 1 hour.
eps_last_24h: The EPS value of the past 24 hours.
eps_since_start: The EPS value since the current AxoSyslog start.
Note
When using the eps_last_1h, the eps_last_24h, and the eps_since_start statistics, consider the following:
- EPS stands for “events per second”, and in our case, a message received or sent counts as a single event.
- The eps_last_1h, the eps_last_24h, and the eps_since_start values are only approximate values.
- The eps_last_1h, the eps_last_24h, and the eps_since_start values are automatically updated every 60 seconds.
matched: The number of messages that are accepted by a given filter. Available for filters and similar objects (for example, a conditional rewrite rule). For example, if a filter matches a specific hostname, then the matched counter contains the number of messages that reached the filter from this host.
```
filter;demo_filter;;a;matched;28
```
memory_usage: The memory used by the messages in the different queue types (in bytes). This includes every queue used by the object, including memory buffers (log-fifo) and disk-based buffers (both reliable and non-reliable). For example:
```
dst.network;d_net#0;tcp,127.0.0.1:9999;a;memory_usage;0
```
Note The memory usage (size) of queues is not equal to the memory usage (size) of the log messages in AxoSyslog. A log message can be in multiple queues, thus its size is added to multiple queue sizes. To check the size of all log messages, use the global.msg_allocated_bytes.value metric.
msg_size_max: The current largest message size of the given source or destination.
msg_size_avg: The current average message size of the given source or destination.
Note
When using the msg_size_avg and msg_size_max statistics, consider that message sizes are calculated as follows:
- on the source side: the length of the incoming raw message
- on the destination side: the length of the outgoing formatted message
not_matched: The number of messages that are filtered out by a given filter. Available for filters and similar objects (for example, a conditional rewrite rule). For example, if a filter matches a specific hostname, then the not_matched counter contains the number of messages that reached the filter from other hosts, and so the filter discarded them.
Note
Since the not_matched metric applies to filters, and filters are expected to discard messages that do not match the filter condition, not_matched messages are not included in the dropped metric of other objects.
```
filter;demo_filter;;a;not_matched;0
```
processed: The number of messages that successfully reached their destination driver.

Note A message that has reached its destination driver has not necessarily been delivered. For example, the destination driver might still need to write the message to disk or send it to a remote server.
queued: The number of messages passed to the message queue of the destination driver, waiting to be sent to the destination.
stamp: The UNIX timestamp of the last message sent to the destination.
suppressed: The number of suppressed messages (if the suppress() feature is enabled).
written: The number of messages successfully delivered to the destination. This value is calculated from other counters: written = processed - queued - dropped. That is, the number of messages AxoSyslog passed to the destination driver (processed) minus the number of messages that are still in the output queue of the destination driver (queued) and the number of messages dropped because of an error (dropped, for example, because AxoSyslog could not deliver the message to the destination and exceeded the number of retries).

This metric is calculated from other metrics. You cannot reset this metric directly: to reset it, you have to reset the metrics it is calculated from.
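
The formula for written can be expressed as a one-line Python function. The function name is illustrative. With the d_network#0 counters from the sample output in the section about the internal() source (processed 7128, queued 2048, dropped 5080), the result is 0, meaning that none of the processed messages count as delivered.

```python
def written(processed, queued, dropped):
    """Messages delivered to the destination: processed - queued - dropped."""
    return processed - queued - dropped


# Counters of dst.tcp;d_network#0 from the sample statistics output.
delivered = written(processed=7128, queued=2048, dropped=5080)
```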

Note

Consider that for AxoSyslog version 3.36, the following statistics counters are only supported for the http() destination, or the http() destination and all network() sources and destinations, and all file() sources and destinations, respectively:

msg_size_max
msg_size_avg
batch_size_max
batch_size_avg
eps_last_1h
eps_last_24h
eps_since_start

Availability of statistics

Certain statistics are available only if the stats(level()) global option is set to a higher value.

Level 0 collects only statistics about the sources and destinations.
Level 1 contains details about the different connections and log files, but has a slight memory overhead.
Level 2 contains detailed statistics based on the hostname.
Level 3 contains detailed statistics based on various message parameters like facility, severity, or tags.

When receiving messages with non-standard facility values (that is, higher than 23), these messages will be listed as other facility instead of their facility number.

Aggregated statistics

Aggregated statistics are available for different sources and destinations from different levels and upwards:

	`msg_size_avg`	`msg_size_max`	`batch_size_avg`	`batch_size_max`	`eps_last_1h`	`eps_last_24h`	`eps_since_start`
`network()` source and destination	from level 1	from level 1	counter N/A	counter N/A	from level 1	from level 1	from level 1
`file()` source and destination	from level 1	from level 1	counter N/A	counter N/A	from level 1	from level 1	from level 1
`http()` destination	from level 0	from level 0	from level 0	from level 0	from level 0	from level 0	from level 0

4 - Log statistics from the internal() source

Note Instead of using the statistics messages of the internal() source, we recommend monitoring AxoSyslog using metrics, or if it’s not possible in your environment, by querying statistics.

If the stats(freq()) global option is higher than 0, AxoSyslog periodically sends a log statistics message. This message contains statistics about the received messages, and about any lost messages since the last such message. It includes:

a processed entry for every source and destination, listing the number of messages received or sent, and
a dropped entry including the IP address of the server for every destination where AxoSyslog has lost messages.
The center(received) entry shows the total number of messages received from every configured source.

The following is a sample log statistics message for a configuration that has a single source (s_local) and a network and a local file destination (d_network and d_local, respectively). All incoming messages are sent to both destinations.

Log statistics;
dropped='tcp(AF_INET(192.168.10.1:514))=6439',
processed='center(received)=234413',
processed='destination(d_tcp)=234413',
processed='destination(d_local)=234413',
processed='source(s_local)=234413'
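
If you collect these messages and want to extract the counters, each entry has the form counter='object=value'. The following Python sketch parses a message in the format of the sample above. The regular expression and the function name are illustrative and assume that object names don't contain single quotes.

```python
import re

# Matches entries such as: processed='center(received)=234413'
ENTRY_RE = re.compile(r"(\w+)='([^']+)=(\d+)'")


def parse_log_statistics(message):
    """Return a list of (counter, object, value) tuples from the message."""
    return [
        (counter, obj, int(value))
        for counter, obj, value in ENTRY_RE.findall(message)
    ]
```

For the sample message, the first tuple is the dropped entry of the tcp(AF_INET(192.168.10.1:514)) destination with the value 6439, followed by four processed entries.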

The statistics include a list of source groups and destinations, as well as the number of processed messages for each. You can control the verbosity of the statistics using the stats(level()) global option. The following is an example output.

src.internal;s_all#0;;a;processed;6445
src.internal;s_all#0;;a;stamp;1268989330
destination;df_auth;;a;processed;404
destination;df_news_dot_notice;;a;processed;0
destination;df_news_dot_err;;a;processed;0
destination;d_ssb;;a;processed;7128
destination;df_uucp;;a;processed;0
source;s_all;;a;processed;7128
destination;df_mail;;a;processed;0
destination;df_user;;a;processed;1
destination;df_daemon;;a;processed;1
destination;df_debug;;a;processed;15
destination;df_messages;;a;processed;54
destination;dp_xconsole;;a;processed;671
dst.tcp;d_network#0;10.50.0.111:514;a;dropped;5080
dst.tcp;d_network#0;10.50.0.111:514;a;processed;7128
dst.tcp;d_network#0;10.50.0.111:514;a;queued;2048
destination;df_syslog;;a;processed;6724
destination;df_facility_dot_warn;;a;processed;0
destination;df_news_dot_crit;;a;processed;0
destination;df_lpr;;a;processed;0
destination;du_all;;a;processed;0
destination;df_facility_dot_info;;a;processed;0
center;;received;a;processed;0
destination;df_kern;;a;processed;70
center;;queued;a;processed;0
destination;df_facility_dot_err;;a;processed;0

The statistics are semicolon-separated: every line contains statistics for a particular object (like source, destination, tag).

To reset the statistics to zero, use the following command: syslog-ng-ctl stats --reset