Statistics and metrics of AxoSyslog

The AxoSyslog application collects various statistics and metrics about its performance and status for observability and monitoring. Which metrics and statistics are collected depends on the configuration of AxoSyslog and the value of the stats(level()) global option.

Metrics and statistics

  • AxoSyslog provides detailed metrics about its performance and status for observability and monitoring. We recommend using Prometheus to scrape these metrics. For details, see Collect metrics with Prometheus. To display the current metrics locally in Prometheus-compatible format, run:

    syslog-ng-ctl stats prometheus
    

    The metrics shown depend on the current value of the stats(level()) global option. To list the available metrics, run syslog-ng --metrics-registry. For details on what the metrics mean, see Metrics reference.

  • Statistics are a legacy way to access the status of AxoSyslog. Metrics are newer and in active development. Many metrics aren’t available as legacy statistics.

    You can access legacy statistics using the syslog-ng-ctl command-line tool, or from the log statistics messages of the internal() source.

    For details about the available counters and the output format, see Statistics reference.

1 - Collect metrics with Prometheus

Export AxoSyslog and syslog-ng metrics to Prometheus using the axosyslog-metrics-exporter and scrape them with Prometheus.

Prerequisites

  • A running AxoSyslog instance
  • stats(level(2)) or higher set in your configuration file
  • File-level access to the AxoSyslog control socket

Deploy the metrics exporter

The axosyslog-metrics-exporter is a Go-based tool that exposes Prometheus-style metrics by connecting to the AxoSyslog control socket. It works with syslog-ng, syslog-ng Premium Edition, and all versions of AxoSyslog (syslog-ng™ is the trademark of One Identity LLC).

Run the exporter as a container:

sudo podman run -d -p 9577:9577 -v $(echo /var/*/syslog-ng/syslog-ng.ctl):/syslog-ng.ctl \
  ghcr.io/axoflow/axosyslog-metrics-exporter:latest --socket.path=/syslog-ng.ctl

Once started, the metrics endpoint is available at http://127.0.0.1:9577/metrics.

Configure Prometheus

Create a prometheus.yml file with a scrape job pointing to the metrics exporter:

scrape_configs:
  - job_name: axosyslog
    static_configs:
      - targets:
          - <prometheus-host-ip>:9577
        labels:
          app: axosyslog

Then run Prometheus:

sudo podman run \
    -p 9090:9090 \
    -v ./prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

To verify that Prometheus is scraping correctly, open the following pages in your browser:

  • http://127.0.0.1:9090/config: shows the active configuration
  • http://127.0.0.1:9090/targets: shows whether the AxoSyslog scrape target is up

Key metrics to monitor

For a detailed reference, see Metrics reference. The main metrics that you should monitor are the following.

Critical metrics

These metrics indicate problems that require immediate attention:

  • output_unreachable: destination is unavailable
  • socket_receive_dropped_packets_total: messages dropped on the source side
  • output_events_total{result="dropped"}: messages dropped at the output without flow control
  • socket_rejected_connections_total: number of rejected incoming connections
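
As a rough illustration of how the critical metrics above could be checked from a script, the following Python sketch flags any of them that has a nonzero value. It is not part of AxoSyslog: the function name find_critical, the sample values, the id labels, and the "delivered" label value are made up for illustration. The syslogng_ prefix follows the sample lines shown in the Metrics reference, and treating a nonzero output_unreachable as "unreachable" is an assumption based on the metric's name.

```python
# Hypothetical sketch: flag critical AxoSyslog metrics that are nonzero.
# Assumes the samples were already parsed into (name, labels, value) tuples.

CRITICAL = {
    "syslogng_output_unreachable",
    "syslogng_socket_receive_dropped_packets_total",
    "syslogng_socket_rejected_connections_total",
}


def find_critical(samples):
    """Return the samples that indicate a problem needing attention."""
    alerts = []
    for name, labels, value in samples:
        dropped_output = (
            name == "syslogng_output_events_total"
            and labels.get("result") == "dropped"
        )
        if (name in CRITICAL or dropped_output) and value > 0:
            alerts.append((name, labels, value))
    return alerts


# Made-up sample values for illustration only.
samples = [
    ("syslogng_output_unreachable", {"id": "d_net#0"}, 1.0),
    ("syslogng_output_events_total", {"id": "d_net#0", "result": "delivered"}, 120.0),
    ("syslogng_output_events_total", {"id": "d_net#0", "result": "dropped"}, 3.0),
    ("syslogng_socket_rejected_connections_total", {"id": "s_net#0"}, 0.0),
]

for name, labels, value in find_critical(samples):
    print(name, labels, value)
```

Because the _total metrics are cumulative counters, a production alert would more likely look at how fast they grow than at their absolute value.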

Core pipeline metrics

These metrics give you a basic understanding of pipeline throughput:

  • input_events_total: total messages received by all sources
  • output_events_total: total messages sent by all destinations
  • filtered_events_total: total messages processed by filters
  • parsed_events_total: total messages processed by parsers
  • memory_queue_events and disk_queue_events: current buffer usage
  • io_worker_latency_seconds: I/O worker latency, a sign of potential overload
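
Since counters such as input_events_total and output_events_total only ever grow, throughput is usually derived from the difference between two scrapes. The following Python sketch shows that arithmetic under simple assumptions: the function name events_per_second and the sample numbers are hypothetical, and a counter that goes backwards is assumed to have been reset (for example, by a restart).

```python
def events_per_second(previous, current, interval_seconds):
    """Average rate of a cumulative counter between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # A counter that went backwards is assumed to have been reset,
    # so count only what accumulated since the reset.
    delta = current - previous if current >= previous else current
    return delta / interval_seconds


# Made-up values: two scrapes of input_events_total taken 15 seconds apart.
print(events_per_second(previous=12000, current=13500, interval_seconds=15))
```

Comparing the rate computed for input_events_total with the rate for output_events_total over the same interval can give a first hint of whether messages are piling up in the queues.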

2 - Metrics reference

The following list shows the metrics available in AxoSyslog.

AxoSyslog provides detailed metrics about its performance and status for observability and monitoring. We recommend using Prometheus to scrape these metrics. For details, see Collect metrics with Prometheus. To display the current metrics locally in Prometheus-compatible format, run:

syslog-ng-ctl stats prometheus

The metrics shown depend on the current value of the stats(level()) global option. To list the available metrics, run syslog-ng --metrics-registry.

classified_events_total

Description: Default metric of the metrics-probe() parser.

disk_queue_capacity_bytes

Description: Maximal size of the disk queue (in bytes), as set in the capacity-bytes() disk-buffer option.

disk_queue_capacity

Description: The size of the overflow queue of the destination, as set in the flow-control-window-size() disk-buffer option.

disk_queue_dir_available_bytes

Description: The size of the space available in the directories where disk-buffer files are stored (including directories storing abandoned disk-buffers), in bytes.

disk_queue_disk_allocated_bytes

Description: The actual size of the disk-buffer files, in bytes.

disk_queue_disk_usage_bytes

Description: Total size of data waiting in each disk-buffer, in bytes.

disk_queue_events

Description: Number of messages waiting in each disk-buffer by destination.

disk_queue_memory_usage_bytes

Description: Amount of memory used for caching disk-buffers, in bytes.

disk_queue_processed_events_total

Description: The number of events processed since startup by each disk-buffer.

event_processing_latency_seconds

Description: Histogram of the latency (time from receiving the message to fully processing it), from the source or destination perspective.

events_allocated_bytes

Description: The total amount of memory used by log messages in AxoSyslog.

filtered_events_total

Description: The total number of messages that matched and didn’t match a filter, for each filter in the configuration file.

input_event_bytes_total

Description: Incoming log messages processed by each source, measured in bytes.

input_events_total

Description: Number of incoming log messages processed by each source.

input_transport_errors_total

Description: Number of various transport errors that prevent AxoSyslog from ingesting messages, for example, TLS handshake errors or syslog framing errors. Labels include the source id, peer_address, and the reason for the error.

syslogng_input_transport_errors_total{address="127.0.0.1:5513",driver="syslog",peer_address="127.0.0.1",reason="invalid-frame-header",transport="tcp"} 1
syslogng_input_transport_errors_total{address="127.0.0.1:5515",driver="syslog",id="s_tls_req#0",peer_address="127.0.0.1",reason="tls-handshake",tls_error="0A0000C7",tls_error_string="SSL routines::peer did not return a certificate",transport="tls"} 1
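
Sample lines like the ones above can be split into a metric name, a set of labels, and a value. The following Python sketch shows one way to do that. It is a minimal illustration, not a full parser for the Prometheus text format: the function name parse_sample is hypothetical, comment lines and optional timestamps are not handled, and escape sequences in label values are left as-is.

```python
import re

# Matches one sample line: metric name, optional {labels}, and a value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# Matches one label pair; escaped characters inside the value are kept as-is.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a Prometheus-format sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


# The first sample line from the example above.
line = (
    'syslogng_input_transport_errors_total{address="127.0.0.1:5513",'
    'driver="syslog",peer_address="127.0.0.1",reason="invalid-frame-header",'
    'transport="tcp"} 1'
)
name, labels, value = parse_sample(line)
print(name, labels["reason"], value)
```

For anything beyond quick inspection, a maintained Prometheus client library is likely a better choice than a hand-written parser.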

input_window_available

Description: Available on stats(level(3)). Shows the current size of the flow-control window (how much is still free from log-iw-size()).

input_window_capacity

Description: Available on stats(level(3)). Shows the value of log-iw-size() (the size of the flow-control window).
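
Taken together, input_window_capacity and input_window_available show how much of the flow-control window is in use. The following Python sketch illustrates the calculation. The function name input_window_usage and the sample numbers are hypothetical.

```python
def input_window_usage(capacity, available):
    """Return the fraction of the flow-control window currently in use."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (capacity - available) / capacity


# Made-up values: a log-iw-size() of 1000 with 250 still free.
usage = input_window_usage(capacity=1000, available=250)
print(f"input window usage: {usage:.0%}")
```

A usage that stays close to 100% suggests that the source is being throttled by flow control.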

input_window_full_total

Description: The total number of input window full events, for the entire configuration. These events cause AxoSyslog to throttle the source. Available on stats(level(1)).

internal_events_queue_capacity

Description: The internal queue size of the internal() source.

internal_events_total

Description: The number of messages the internal() source has queued, processed, or dropped.

io_worker_latency_seconds

Description: Shows how overloaded the IO workers of AxoSyslog are.

last_config_file_modification_timestamp_seconds

Description: The date when the configuration file was last modified.

last_config_reload_timestamp_seconds

Description: The date when the AxoSyslog configuration was last reloaded. If it differs from last_successful_config_reload_timestamp_seconds, reloading the configuration has failed.

last_successful_config_reload_timestamp_seconds

Description: The date when the AxoSyslog configuration was last reloaded successfully.
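
The comparison described for last_config_reload_timestamp_seconds can be sketched in Python as follows. The function name reload_failed and the sample values are hypothetical, and the sketch assumes that both metrics are reported as seconds since the UNIX epoch, as their names suggest.

```python
from datetime import datetime, timezone


def reload_failed(last_reload, last_successful_reload):
    """True if the most recent reload attempt was not a successful one."""
    return last_reload != last_successful_reload


# Made-up timestamp values, assumed to be seconds since the UNIX epoch.
last_reload = 1700000600
last_successful_reload = 1700000000

if reload_failed(last_reload, last_successful_reload):
    when = datetime.fromtimestamp(last_reload, tz=timezone.utc)
    print("last reload attempt failed at", when.isoformat())
```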

mainloop_io_worker_roundtrip_latency_seconds

Description: Shows how overloaded the main AxoSyslog loop is (how much time it takes to start a new worker). Values close to 0 are good, high values indicate high load or processing bottleneck.

memory_queue_capacity

Description: Shows the capacity (maximum possible size) of each queue. Note that this metric publishes log-fifo-size(), which only limits non-flow-controlled messages. Messages coming from flow-controlled paths aren’t limited by log-fifo-size(), but by the log-iw-size() of their corresponding source. For metrics on log-iw-size(), see input_window_available and input_window_capacity.

memory_queue_events

Description: Number of messages waiting in each memory queue by destination.

memory_queue_memory_usage_bytes

Description: Total bytes of data waiting in each memory queue.

memory_queue_processed_events_total

Description: The number of events processed since startup by each queue.

output_active_worker_partitions

Description: The number of active partitions when worker-partition-autoscaling() is set to yes.

output_batch_size_bytes

Description: Histogram of the size of the batches sent by each destination, in bytes.

output_batch_size_events

Description: Histogram of the number of events in the batches sent by each destination.

output_batch_timedout_total

Description: For destinations that use batching, it shows the number of batches that were sent because of timeout (either batch-timeout() or batch-idle-timeout() expired).

output_event_bytes_total

Description: Log messages sent to each destination, measured in bytes.

output_event_latency_seconds

Description: Histogram of the latency: time from receiving the message to delivering it to the destination.

output_event_retries_total

Description: Shows the number of events when AxoSyslog retried sending a message.

output_event_size_bytes

Description: Histogram of the size of the events sent by each destination, in bytes.

output_events_total

Description: Number of log messages sent to each destination, showing sent and dropped messages.

output_grpc_requests_total

Description: The total number of gRPC requests.

output_http_requests_total

Description: Available on stats(level(1)). The total number of HTTP requests.

output_request_latency_seconds

Description: Histogram of the latency of the requests sent by each destination, in seconds.

output_unreachable

Description: A bool-like metric, which shows whether a destination is reachable or not.

output_workers

Description: The number of workers configured for each destination.

parallelize_failed_events_total

Description: The number of events that parallelize() couldn’t process in parallel. Such messages were sent without parallelization. A high number of such events can signal a configuration issue or a bottleneck.

parallelized_assigned_events_total

Description: The number of events each worker has received when using parallelize(). Can show if the workers receive the load unevenly.

parallelized_processed_events_total

Description: The number of events processed using parallelize().

parsed_events_total

Description: Shows the number of messages processed by each parser.

route_egress_total

Description: The number of messages delivered by each named log path.

route_ingress_total

Description: The number of messages entering each named log path.

scratch_buffers_bytes

Description: The number of bytes allocated to internal string buffers.

scratch_buffers_count

Description: The number of allocated internal string buffers.

socket_connections

Description: Number of active connections for the sources.

socket_max_connections

Description: Maximum permitted number of connections for the sources.

socket_receive_buffer_max_bytes

Description: The maximum size of the socket receive buffer in bytes, as configured in the so-rcvbuf() option of the destination.

socket_receive_buffer_used_bytes

Description: The number of bytes used from the socket receive buffer.

socket_receive_dropped_packets_total

Description: Number of UDP packets dropped by the OS before processing.

socket_rejected_connections_total

Description: The number of connections rejected because the max-connections() limit of the source was reached, for each source.

stats_level

Description: Shows the current verbosity level() of statistics and metrics.

tagged_events_total

Description: The number of messages marked with a tag, for each tag. (Every message automatically has the tag of its source in .source.<id_of_the_source_statement> format.)

3 - Statistics reference

Statistics are a legacy way to access the status of AxoSyslog. Metrics are newer and in active development. Many metrics aren’t available as legacy statistics.

You can list all active statistics on your AxoSyslog host using the following command (this lists the statistics, without their current values): syslog-ng-ctl query list "*"

Format of statistics

To list the statistics and their values, use the following command: syslog-ng-ctl query get "*"

Example output:

destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.dropped=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.processed=0
destination.java.d_elastic#0.java_dst(ElasticSearch,elasticsearch-syslog-ng-test,t7cde889529c034aea9ec_micek).stats.queued=0
destination.d_elastic.stats.processed=0

The displayed statistics have the following structure.

  • The type of the object (for example, dst.file, tag, src.facility)

  • The ID of the object used in the syslog-ng.conf configuration file, for example, d_internal or source.src_tcp. The #0 part means that this is the first destination in the destination group.

  • The instance ID (destination) of the object, for example, the filename of a file destination, or the name of the application for a program source or destination.

  • The status of the object. One of the following:

    • a: active. At the time of querying the statistics, the source or the destination was still alive (it continuously received statistical data).

    • d: dynamic. Such objects may not be continuously available, for example, statistics based on the sender’s hostname. These counters only appear above a certain value of the stats(level()) global option:

      • host: source host, from stats(level(2))
      • program: program, from stats(level(3))
      • sender: sender host, from stats(level(3))

      The following example contains 6 different dynamic values: a sender, a host, and four different programs.

      src.sender;;localhost;d;processed;4
      src.sender;;localhost;d;stamp;1509121934
      src.program;;P-18069;d;processed;1
      src.program;;P-18069;d;stamp;1509121933
      src.program;;P-21491;d;processed;1
      src.program;;P-21491;d;stamp;1509121934
      src.program;;P-9774;d;processed;1
      src.program;;P-9774;d;stamp;1509121919
      src.program;;P-14737;d;processed;1
      src.program;;P-14737;d;stamp;1509121931
      src.host;;localhost;d;processed;4
      src.host;;localhost;d;stamp;1509121934
      

      To avoid performance issues or even overloading AxoSyslog, you might want to limit the number of registered dynamic counters in the message statistics. To do this, configure the stats(max-dynamics()) global option.

    • o: orphan. This object was once active, but stopped receiving messages. (For example, a dynamic object may disappear and become an orphan.)

The connections statistics counter displays the number of connections tracked by AxoSyslog for the selected source driver.

Example configuration and statistics output

The following example shows a configuration and the syslog-ng-ctl statistics output it produces:

Configuration:

source s_network { 
  tcp( 
    port(8001)  
  ); 
};

Statistics output:

src.tcp;s_network#0;tcp,127.0.0.5;a;processed;1
src.tcp;s_network#0;tcp,127.0.0.1;a;processed;3
src.tcp;s_network;afsocket_sd.(stream,AF_INET(0.0.0.0:8001));a;connections;2
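
Because each line of this output has the same six semicolon-separated fields, it is straightforward to process in a script. The following Python sketch parses the lines above and sums the processed counters. The StatLine type, its field names, and the function name parse_stat_line are hypothetical. The sketch assumes that every value is an integer, so any header line in the real output would have to be skipped first.

```python
from typing import NamedTuple


class StatLine(NamedTuple):
    object_type: str   # for example, src.tcp
    object_id: str     # ID from the configuration, for example, s_network#0
    instance: str      # instance ID, may be empty
    state: str         # a (active), d (dynamic), or o (orphan)
    counter: str       # for example, processed or connections
    value: int


def parse_stat_line(line):
    """Parse one semicolon-separated statistics line into a StatLine."""
    fields = line.strip().split(";")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return StatLine(*fields[:5], int(fields[5]))


output = """\
src.tcp;s_network#0;tcp,127.0.0.5;a;processed;1
src.tcp;s_network#0;tcp,127.0.0.1;a;processed;3
src.tcp;s_network;afsocket_sd.(stream,AF_INET(0.0.0.0:8001));a;connections;2
"""

stats = [parse_stat_line(line) for line in output.splitlines()]
total_processed = sum(s.value for s in stats if s.counter == "processed")
print(total_processed)
```

Splitting on the semicolon alone is enough here because the instance field contains commas and parentheses, but no semicolons.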

Statistics reference

The type of the statistics:

  • batch_size_avg: When batching is enabled, shows the current average batch size of the given source or destination.

  • batch_size_max: When batching is enabled, shows the current largest batch size of the given source or destination.

  • discarded: The number of messages discarded by the given parser. These are messages that the parser could not parse, and are therefore not processed. For example:

    parser;demo_parser;;a;discarded;20
    
  • dropped: The number of dropped messages. AxoSyslog could not send these messages to the destination and the output buffer got full, so messages were dropped by the destination driver, or AxoSyslog dropped the message for some other reason (for example, a parsing error).

  • eps_last_1h: The EPS value of the past 1 hour.

  • eps_last_24h: The EPS value of the past 24 hours.

  • eps_since_start: The EPS value since the current AxoSyslog start.

  • matched: The number of messages that are accepted by a given filter. Available for filters and similar objects (for example, a conditional rewrite rule). For example, if a filter matches a specific hostname, then the matched counter contains the number of messages that reached the filter from this host.

    filter;demo_filter;;a;matched;28
    
  • memory_usage: The memory used by the messages in the different queue types (in bytes). This includes every queue used by the object, including memory buffers (log-fifo) and disk-based buffers (both reliable and non-reliable). For example:

    dst.network;d_net#0;tcp,127.0.0.1:9999;a;memory_usage;0
    
  • msg_size_max: The current largest message size of the given source or destination.

  • msg_size_avg: The current average message size of the given source or destination.

  • not_matched: The number of messages that are filtered out by a given filter. Available for filters and similar objects (for example, a conditional rewrite rule). For example, if a filter matches a specific hostname, then the not_matched counter contains the number of messages that reached the filter from other hosts, and so the filter discarded them.

  • processed: The number of messages that successfully reached their destination driver.

  • queued: The number of messages passed to the message queue of the destination driver, waiting to be sent to the destination.

  • stamp: The UNIX timestamp of the last message sent to the destination.

  • suppressed: The number of suppressed messages (if the suppress() feature is enabled).

  • written: The number of messages successfully delivered to the destination. This value is calculated from other counters: written = processed - queued - dropped. That is, the number of messages AxoSyslog passed to the destination driver (processed) minus the number of messages that are still in the output queue of the destination driver (queued) and the number of messages dropped because of an error (dropped, for example, because AxoSyslog could not deliver the message to the destination and exceeded the number of retries).

This metric is calculated from other metrics. You cannot reset this metric directly: to reset it, you have to reset the metrics it is calculated from.
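
The formula for the written counter can be checked with a short Python sketch. The function name written is only illustrative. The input values are the processed, queued, and dropped counters of dst.tcp;d_network#0 from the sample statistics output in the internal() source section of this document.

```python
def written(processed, queued, dropped):
    """Messages delivered: written = processed - queued - dropped."""
    return processed - queued - dropped


# Counters of dst.tcp;d_network#0 from the sample output in this document.
print(written(processed=7128, queued=2048, dropped=5080))
```

With these sample values the result is 0: every message passed to the destination driver is either still queued or was dropped.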

Availability of statistics

Certain statistics are available only if the stats(level()) global option is set to a higher value.

  • Level 0 collects only statistics about the sources and destinations.
  • Level 1 contains details about the different connections and log files, but has a slight memory overhead.
  • Level 2 contains detailed statistics based on the hostname.
  • Level 3 contains detailed statistics based on various message parameters like facility, severity, or tags.

When receiving messages with non-standard facility values (that is, higher than 23), these messages will be listed as other facility instead of their facility number.

Aggregated statistics

Aggregated statistics are available for different sources and destinations from different levels and upwards:

                                  msg_size_avg  msg_size_max  batch_size_avg  batch_size_max  eps_last_1h   eps_last_24h  eps_since_start
network() source and destination  from level 1  from level 1  counter N/A     counter N/A     from level 1  from level 1  from level 1
file() source and destination     from level 1  from level 1  counter N/A     counter N/A     from level 1  from level 1  from level 1
http() destination                from level 0  from level 0  from level 0    from level 0    from level 0  from level 0  from level 0

4 - Log statistics from the internal() source

If the stats(freq()) global option is higher than 0, AxoSyslog periodically sends a log statistics message. This message contains statistics about the received messages, and about any lost messages since the last such message. It includes:

  • a processed entry for every source and destination, listing the number of messages received or sent,
  • a dropped entry, including the IP address of the server, for every destination where AxoSyslog has lost messages, and
  • a center(received) entry showing the total number of messages received from all configured sources.

The following is a sample log statistics message for a configuration that has a single source (s_local) and a network and a local file destination (d_tcp and d_local, respectively). All incoming messages are sent to both destinations.

Log statistics;
dropped='tcp(AF_INET(192.168.10.1:514))=6439',
processed='center(received)=234413',
processed='destination(d_tcp)=234413',
processed='destination(d_local)=234413',
processed='source(s_local)=234413'
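
To work with such a message in a script, each entry can be split into the counter name, the object, and the value. The following Python sketch does this with a regular expression. The function name parse_log_statistics is hypothetical, and the sketch assumes that the entries arrive as one message in the counter='object=value' form shown above.

```python
import re

# One entry looks like: processed='source(s_local)=234413'
ENTRY_RE = re.compile(r"(\w+)='([^']*)=(\d+)'")


def parse_log_statistics(message):
    """Return a list of (counter, object, value) tuples from the message."""
    return [
        (counter, obj, int(value))
        for counter, obj, value in ENTRY_RE.findall(message)
    ]


# The sample message from above, joined into a single string.
message = (
    "Log statistics; "
    "dropped='tcp(AF_INET(192.168.10.1:514))=6439', "
    "processed='center(received)=234413', "
    "processed='destination(d_tcp)=234413', "
    "processed='destination(d_local)=234413', "
    "processed='source(s_local)=234413'"
)

for counter, obj, value in parse_log_statistics(message):
    print(counter, obj, value)
```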

The statistics include a list of source groups and destinations, as well as the number of processed messages for each. You can control the verbosity of the statistics using the stats(level()) global option. The following is an example output.

src.internal;s_all#0;;a;processed;6445
src.internal;s_all#0;;a;stamp;1268989330
destination;df_auth;;a;processed;404
destination;df_news_dot_notice;;a;processed;0
destination;df_news_dot_err;;a;processed;0
destination;d_ssb;;a;processed;7128
destination;df_uucp;;a;processed;0
source;s_all;;a;processed;7128
destination;df_mail;;a;processed;0
destination;df_user;;a;processed;1
destination;df_daemon;;a;processed;1
destination;df_debug;;a;processed;15
destination;df_messages;;a;processed;54
destination;dp_xconsole;;a;processed;671
dst.tcp;d_network#0;10.50.0.111:514;a;dropped;5080
dst.tcp;d_network#0;10.50.0.111:514;a;processed;7128
dst.tcp;d_network#0;10.50.0.111:514;a;queued;2048
destination;df_syslog;;a;processed;6724
destination;df_facility_dot_warn;;a;processed;0
destination;df_news_dot_crit;;a;processed;0
destination;df_lpr;;a;processed;0
destination;du_all;;a;processed;0
destination;df_facility_dot_info;;a;processed;0
center;;received;a;processed;0
destination;df_kern;;a;processed;70
center;;queued;a;processed;0
destination;df_facility_dot_err;;a;processed;0

The statistics are semicolon-separated: every line contains statistics for a particular object (like a source, destination, or tag).

To reset the statistics to zero, use the following command: syslog-ng-ctl stats --reset