Alerts reference

Host down (warning)

host-type host with name 'hostname' isn't reporting data, indicating potential downtime.

An edge or router host has stopped reporting data to the AxoConsole. Because the system tolerates short reporting gaps, the alert takes around seven minutes to fire after the host actually goes silent. The usual causes are loss of network connectivity from the host, Axolet being stopped on the host, or a hardware or operating-system failure on the host itself.

Related metrics: cpu, memory

Cert Expires (info)

host-type host with name 'hostname' uses a certificate that expires in less than 45 days. It should have been renewed automatically.

The client certificate that an edge or router host uses to authenticate to the AxoConsole will expire in less than 45 days. Certificates should be renewed automatically; if this alert appears, automatic renewal has failed. Restarting the axolet service on the affected host often triggers another renewal attempt. If the alert doesn't clear within five minutes, contact Axoflow support.

Stats level (info)

'hostname' is configured with a stats level of 0.

An axorouter-syslog (or another AxoSyslog-based) service on a host is configured with a stats level of 0, so most metrics shown on the AxoConsole will be incomplete or missing for it. The recommended stats level is 2. Add options { stats(level(2)); }; to the top of the main configuration file and reload the service. See the AxoSyslog stats option reference.

High CPU (warning)

CPU usage at hostname has been above 90% for the last 15 minutes.

The average CPU usage across all cores on a host has been above 90% for at least 15 minutes. Sustained high CPU usage can prevent the host from keeping up with its workload, and may lead to message drops, packet drops, or processing delays.

Related metrics: cpu, droppedPacketsTotal, logInputEvents

Message drop (warning)

Messages have been dropped on hostname in the last 10 minutes.

An axorouter-syslog service on a host has dropped messages in the last 10 minutes. This happens when a destination can't accept messages fast enough and flow control isn't configured to back-pressure the source. Once the destination's output buffer overflows, further incoming messages are discarded and can't be recovered.

Related metrics: droppedPacketsTotal, logInputEvents, logOutputEvents

UDP packet drop (warning)

UDP packets have been dropped on hostname in the last 10 minutes.

A host has dropped UDP packets in the last 10 minutes. The most common cause is that the kernel's UDP receive buffer is full: packets that arrive while the buffer is saturated are silently discarded by the kernel before the service can read them. An undersized socket buffer on the receiving connector is a frequent contributor (see also the AxorouterLowRmemMax alert).

Related metrics: droppedPacketsTotal, logInputEvents, logOutputEvents

Abandoned disk queue files (info)

There are abandoned disk queues on hostname. See the Axoflow documentation for instructions on how to remove or back up these files.

A disk queue file on a host is no longer attached to any running axorouter-syslog instance. Such files occupy disk space and, if they contain messages, those messages remain stranded until they're manually recovered or wiped. Use dqtool cat <filename> to inspect a file before removing it; see the axorouter-ctl wipe-disk-buffer documentation for the supported procedure.

Related metrics: logDiskQueue, logDiskQueueBytes

Abandoned messages (warning)

There are messages abandoned in disk queues. See the Axoflow documentation for instructions on how to remove or back up these files.

A disk queue file on a host is no longer attached to any running axorouter-syslog instance, and that file still contains messages. Those messages remain stranded until they're manually recovered or the file is wiped. Use dqtool cat <filename> to inspect a file before removing it; see the axorouter-ctl wipe-disk-buffer documentation for the supported procedure.

Related metrics: logDiskQueue, logDiskQueueBytes

Destination unreachable (warning)

Unreachable driver destinations. Check network connectivity and agent logs.

The axorouter-syslog service on a host has been unable to reach a configured destination for at least five minutes. The alert fires per destination driver and address, so a single host with multiple broken destinations produces multiple concurrent alerts. The service buffers messages for the unreachable destination in memory or on disk, and AxoRouter drops further messages once those buffers fill up. Check the destination's network reachability and the service's logs on the host.

Related metrics: droppedPacketsTotal, logMemoryQueue, logDiskQueue, logMemoryQueueBytes, logDiskQueueBytes, eventDelaySeconds

HTTP 5xx (warning)

High rate of HTTP response-code on hostname to url

HTTP requests sent to a destination URL have been returning 5xx responses in the last minute, and no 2xx responses have been observed for the same URL in that window. The alert fires once per destination URL and response code, so a destination returning several distinct 5xx codes can produce several alerts. A 5xx response typically points to a server-side problem (at the destination, or in a proxy in front of it) rather than to a sender-side configuration mistake.

Related metrics: droppedPacketsTotal, logOutputEvents, logDiskQueueBytes

HTTP 4xx (warning)

Increased rate of HTTP response-code on hostname to url

HTTP requests sent to a destination URL have been returning 4xx responses in the last minute, and no 2xx responses have been observed for the same URL in that window. The alert fires once per destination URL and response code, so a destination returning several distinct 4xx codes can produce several alerts. A 4xx response usually indicates a problem at the sender side rather than a fault at the destination: invalid credentials, missing permissions, a target resource (bucket, stream, topic) that doesn't exist at the destination, or a malformed request.
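The firing condition shared by the HTTP 5xx and HTTP 4xx alerts (error responses for a URL, no 2xx for the same URL in the window, one alert per URL and response code) can be sketched as follows. This is only an illustration of the logic as described here, not the actual alert rule; the function name and the shape of the input are assumptions.

```python
from collections import defaultdict


def failing_pairs(observations, status_class):
    """Return the (url, code) pairs that match the described condition.

    observations: iterable of (url, status_code) seen within one window
                  (hypothetical input format).
    status_class: 4 for the 4xx alert, 5 for the 5xx alert.
    """
    codes_by_url = defaultdict(set)
    for url, code in observations:
        codes_by_url[url].add(code)

    pairs = set()
    for url, codes in codes_by_url.items():
        # Any 2xx response for this URL in the window suppresses the alert.
        if any(200 <= code <= 299 for code in codes):
            continue
        for code in codes:
            if code // 100 == status_class:
                pairs.add((url, code))  # one alert per URL and response code
    return pairs
```

For example, a URL that returned both 502 and 503 with no 2xx in the window yields two pairs, while a URL that returned 500 and 200 yields none.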

Related metrics: droppedPacketsTotal, logOutputEvents, logDiskQueueBytes

gRPC error (warning)

Increased rate of gRPC response-code responses on hostname to driver

gRPC requests sent to a destination have been returning non-OK status codes in the last minute. The alert fires once per destination URL, driver, topic, and response code. Common causes include misconfigured credentials, missing permissions, or a target resource (bucket, stream, topic) that doesn't exist at the destination.

Related metrics: droppedPacketsTotal, logOutputEvents, logDiskQueueBytes

Buffer filling up (warning)

Disk buffer path is predicted to fill up within 10 minutes on hostname

The remaining capacity of a disk buffer on a host is trending down fast enough that, at the current rate, it's predicted to fill up within the next 10 minutes. The alert fires per disk buffer path, so a host with multiple buffers can produce multiple alerts. Once a disk buffer fills up entirely, the service stops being able to enqueue further messages for the affected destination and may start dropping them.
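To get a feel for what "predicted to fill up within 10 minutes" means, here is a minimal sketch that assumes a simple linear extrapolation between the oldest and newest sample. The prediction method the alert actually uses isn't documented here, so treat both the method and the function names as assumptions.

```python
def seconds_until_full(samples):
    """Estimate the seconds until free space reaches zero.

    samples: list of (timestamp_seconds, free_bytes), oldest first.
    Returns None if there is too little data or free space isn't shrinking.
    """
    if len(samples) < 2:
        return None
    (t0, free0), (t1, free1) = samples[0], samples[-1]
    elapsed = t1 - t0
    if elapsed <= 0:
        return None
    drain_rate = (free0 - free1) / elapsed  # bytes consumed per second
    if drain_rate <= 0:
        return None
    return free1 / drain_rate


def fills_within(samples, horizon_seconds=600):
    """True if the extrapolated fill time is within the horizon (10 minutes by default)."""
    eta = seconds_until_full(samples)
    return eta is not None and eta <= horizon_seconds
```

With a different horizon (for example `6 * 3600`), the same extrapolation illustrates the six-hour prediction described for the "Store's disk is filling up" alert.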

Related metrics: logDiskQueueBytes, networkInputBytes

Buffer is full (critical)

Disk buffer path is filled up on hostname

A disk buffer on a host has filled up: the service can't enqueue further messages for the affected destination. Messages may be dropped until the buffer drains enough to accept new writes, which requires either the destination to start accepting traffic again or the buffer to be enlarged.

Related metrics: logDiskQueueBytes, networkInputBytes

Config Sync Error (warning)

An error occurred on host hostname during configuration synchronization for service service-id. Check the axolet logs for details.

Axolet on a host has been unable to write a new configuration to disk for at least five minutes. The previously synchronized configuration remains in place, so the managed services on the host keep processing traffic, but changes made in the AxoConsole won't take effect on this host until synchronization succeeds again. Common causes include a filesystem permissions issue or a missing target directory on the host.

Config Load Error (warning)

Agent configuration load failed on hostname for service service-id. Check the axolet logs for details.

An axorouter-syslog service on a host failed to load a new configuration. The previously loaded configuration remains active, so traffic processing continues, but changes made in the AxoConsole won't take effect on this host until the load succeeds. Common causes include a syntactically invalid generated configuration, or a missing file on the host (see also the ServiceMissingFile alert).

WEC Config Generation Error (warning)

WEC configuration generation failed on 'hostname' for service 'service-id'. Contact Axoflow support.

Configuration generation for the Windows Event Collector service on a host has been failing for at least five minutes. The host continues to use the last successfully generated WEC configuration, but changes made in the AxoConsole don't reach this host until generation succeeds again. This typically indicates an internal issue and should be reported to Axoflow support.

AxoStore Config Generation Error (warning)

AxoStore configuration generation failed on 'hostname' for service 'service-id'. Contact Axoflow support.

Configuration generation for the AxoStore service on a host has been failing for at least five minutes. The host continues to use the last successfully generated AxoStore configuration, but changes made in the AxoConsole don't reach this host until generation succeeds again. This typically indicates an internal issue and should be reported to Axoflow support.

Flow stuck in error state (warning)

Unable to provision paths for Flow flow due to error state. Check AxoConsole for details.

A Flow is in an error state, so the system can't provision its data paths. The Flow's status on the AxoConsole contains the underlying reason. Until the error is resolved, traffic that would have flowed through this Flow isn't processed.

Flow has no traffic (warning)

Flow flow has no traffic. Check AxoConsole for details.

A Flow used to have inbound or outbound traffic but currently has none. The alert fires per Flow and traffic direction. A Flow that has never received traffic doesn't trigger this alert; it only fires when previously flowing traffic has stopped. Common causes are an upstream source becoming silent, a configuration change that disabled or rerouted the source, or a Flow selector that no longer matches the intended sources.

Related metrics: logDiskQueueBytes, networkInputBytes

AxoRouter service version mismatch (warning)

AxoRouter hostname has a service service-id (version service-version) that doesn't match its axolet's version version. It may be due to a misconfigured deployment.

An AxoRouter host has a managed service whose binary version doesn't match the host's Axolet version. Service and Axolet versions normally move together; a mismatch usually points to a misconfigured deployment or to an in-progress upgrade that didn't complete. Contact Axoflow support if the mismatch doesn't clear on its own.

Suboptimal worker count (info)

Destination id on hostname is using more active partitions than workers. Consider increasing the number of workers.

An output destination on a host has more active worker partitions than configured workers, so it could process data with more parallelism than its current worker count allows. The alert fires per destination. Increasing the worker count on the destination so it matches or exceeds the number of active partitions usually improves throughput.

Related metrics: logOutputEvents, eventDelaySeconds

Update available (info)

Axoflow software running on hostname has an update available, as it's running Axolet version while version is available.

A host is running a different Axolet version than the one the AxoConsole offers. An update is available and can be applied through the standard Axolet upgrade procedure. The AxoRouter service version normally moves together with the Axolet version, so this alert usually also implies that a new AxoRouter version is available (see also the AxoRouterServiceVersionMismatch alert).

Host service has a missing file (warning)

Host hostname has a missing file which is required to write/reload config.

A managed service on a host can't have its configuration written or reloaded because a file it depends on is missing on the host. The alert fires per missing file, so a host with several missing files produces several alerts. The specific file name is available in the alert details. Place it in the expected location to resolve the issue.

Traffic without flows (warning)

There are no flows that have a router selector matching hostname, but it's receiving traffic. AxoRouter is dropping these events. Create a flow that matches it.

An AxoRouter is receiving input events, but no enabled Flow has a Router selector that matches it, so all received events are dropped. To resolve this, enable or create a Flow whose selector matches this AxoRouter, adjust the AxoRouter's labels so an existing Flow matches it, or create a Flow that explicitly drops these events if the traffic is unwanted.

Related metrics: droppedPacketsTotal, logInputBytes, logInputEvents, logOutputBytes, logOutputEvents, logMemoryQueue, logDiskQueue, logMemoryQueueBytes, logDiskQueueBytes, networkInputBytes, networkInputPackets, networkOutputBytes, networkOutputPackets

No axorouter-syslog service running (warning)

AxoRouter 'hostname' has no axorouter-syslog running

An AxoRouter host has no running axorouter-syslog service registered with the AxoConsole. The AxoConsole can only configure an AxoRouter that has at least one such managed service present and running. This usually happens when the service fails to start on the host or when its registration is incomplete. It can also indicate that automatic service registration has been turned off.

AxoRouter Config Generation Error (warning)

AxoRouter configuration generation failed on 'hostname' for service 'service-id'. Check the axorouter_confgen controller logs in controller-manager for details.

Configuration generation for the AxoRouter service on a host has been failing for at least five minutes. The host continues to use the last successfully generated configuration, but changes made in the AxoConsole don't reach this host until generation succeeds again. This typically indicates an internal issue. The AxoConsole logs contain the underlying error; if they don't make the cause clear, contact Axoflow support.

AxoStore Config Load Error (warning)

AxoStore configuration load failed on 'hostname' for service 'service-id'. Check the AxoStore logs for details.

The AxoStore service on a host failed to load its configuration. The previously loaded configuration remains in effect, so storage continues to operate, but configuration changes won't take effect on this host until the load succeeds. The AxoStore logs on the host contain the underlying reason.

WEC Unreachable Clients (info)

AxoRouter 'hostname' has value unreachable clients on WEC subscription 'subscription'. Reach out to Axoflow support on how to list unreachable clients.

An AxoRouter has one or more Windows Event Collector clients that have stopped sending heartbeats and are now considered unreachable. The alert fires per WEC subscription. Common causes are the WEC client service being stopped on the endpoint, a network problem between the endpoint and the AxoRouter, or a credential or policy change that broke authentication. Contact Axoflow support to obtain the list of unreachable clients.

Axolet frequently restarts (warning)

Axolet 'hostname' has restarted >3 times in the last 15 minutes. Check the startup logs.

The Axolet process on a host has restarted more than three times in the last 15 minutes, which suggests a crash loop or repeated process termination. The host's startup logs typically contain the underlying reason; contact Axoflow support if the cause isn't obvious.
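The "more than three restarts in the last 15 minutes" condition, which the host service restart alert also uses, amounts to counting events in a sliding window. The sketch below assumes restart times are available as Unix timestamps; the function name and input format are illustrative, not part of the product.

```python
def restarts_too_often(restart_times, now, window_seconds=900, threshold=3):
    """True if more than `threshold` restarts fall within the trailing window.

    restart_times: iterable of Unix timestamps (seconds) of observed restarts.
    now: current Unix timestamp (seconds).
    """
    window_start = now - window_seconds
    recent = [t for t in restart_times if window_start < t <= now]
    return len(recent) > threshold
```

Note that exactly three restarts in the window don't satisfy the condition; the fourth one does.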

Host service frequently restarts (warning)

'service-id' on 'hostname' has restarted >3 times in the last 15 minutes. Check the startup logs.

A managed service on a host has restarted more than three times in the last 15 minutes, which suggests a crash loop or repeated process termination. The alert fires per service. The host's startup logs and the service's own logs typically contain the underlying reason; contact Axoflow support if the cause isn't obvious.

Rejected syslog connections (warning)

AxoRouter 'hostname' has rejected connections in the last 5 minutes. Check the maximum connections in the connector rule settings for 'id'.

An AxoRouter has rejected incoming syslog source connections in the last five minutes because the configured maximum number of concurrent connections has been reached. The alert fires per syslog connector. If the increased connection count is legitimate, raise the Maximum connections setting on the syslog connector in the AxoConsole.

Related metrics: connections, logInputEvents

Small UDP socket buffer (warning)

'hostname' has a UDP connector with a low socket buffer size configured.

An AxoRouter has at least one UDP connector with a socket receive buffer smaller than 8 MiB, and is actually receiving traffic on it. An undersized receive buffer is a common cause of UDP packet loss under load. Raise the net.core.rmem_max sysctl on the host operating system or increase the socket buffer size in the connector settings; the recommended size is 32 MiB.
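As a quick way to compare a buffer size against the two figures above (the 8 MiB alert threshold and the 32 MiB recommendation), consider the following sketch. The helper names are hypothetical; `/proc/sys/net/core/rmem_max` is the standard Linux location of the `net.core.rmem_max` value, and reading it only tells you the kernel-side ceiling, not what a given connector is configured with.

```python
MIB = 1024 * 1024
ALERT_THRESHOLD_BYTES = 8 * MIB   # below this, the alert condition applies
RECOMMENDED_BYTES = 32 * MIB      # recommended size from the text above


def classify_receive_buffer(size_bytes):
    """Classify a UDP receive buffer size against the documented figures."""
    if size_bytes < ALERT_THRESHOLD_BYTES:
        return "too small"
    if size_bytes < RECOMMENDED_BYTES:
        return "below recommended"
    return "ok"


def read_rmem_max(path="/proc/sys/net/core/rmem_max"):
    """Read the kernel's maximum socket receive buffer size (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

On a Linux host, `classify_receive_buffer(read_rmem_max())` shows whether the kernel limit alone would already keep a connector under the threshold.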

Related metrics: droppedPacketsTotal, logInputEvents

Conflicting flow control parameters on a host (info)

Conflicting flow control parameters on hostname. The combination causes unintended backpressure and reduces throughput. This alert triggers when a router forwards traffic between a source and a destination with Batch Timeout > 0, disk buffering turned off, and Log window size / Maximum connections less than Batch Lines * Number of Workers. See the alert description for remediation steps.

An AxoRouter forwards traffic between a source and a destination whose flow-control parameters conflict: the destination has Batch Timeout > 0, disk buffering is disabled, and Log window size / Maximum connections is less than Batch Lines × Number of Workers. This combination can cause unintended backpressure and reduce throughput. To resolve, do one of the following:

  • increase Log window size on the source, or
  • reduce Maximum connections on the source, or
  • reduce Batch Lines on the destination, or
  • reduce Number of Workers on the destination.

Conflicting flow control parameters on a destination (info)

Conflicting flow control parameters on a destination: destination-name. The combination causes unintended backpressure and reduces throughput. This alert triggers when a router forwards traffic between a source and a destination with Batch Timeout > 0, disk buffering turned off, and Log window size / Maximum connections less than Batch Lines * Number of Workers. Batch Lines or Number of Workers is too high to be continuously fed by messages from source connectors.

A destination receives traffic from a source whose flow-control parameters conflict with the destination's: the destination has Batch Timeout > 0, disk buffering is disabled, and Log window size / Maximum connections is less than Batch Lines × Number of Workers. Batch Lines or Number of Workers is too high to be continuously fed from the source connectors, which can cause unintended backpressure and reduce throughput. To resolve, do one of the following:

  • increase Log window size on the source, or
  • reduce Maximum connections on the source, or
  • reduce Batch Lines on the destination, or
  • reduce Number of Workers on the destination.
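Both flow-control alerts above rest on the same condition, which can be written down directly. The sketch below only restates the documented inequality; the parameter names are chosen for readability and aren't the actual configuration keys.

```python
def has_flow_control_conflict(log_window_size, max_connections,
                              batch_lines, workers,
                              batch_timeout, disk_buffer_enabled):
    """True if the documented conflict condition holds.

    Conflict: Batch Timeout > 0, disk buffering off, and
    Log window size / Maximum connections < Batch Lines * Number of Workers.
    """
    if batch_timeout <= 0 or disk_buffer_enabled:
        return False
    per_connection_window = log_window_size / max_connections
    return per_connection_window < batch_lines * workers
```

For instance, a log window of 1000 shared by 100 connections gives 10 messages per connection, which is less than 50 batch lines × 4 workers = 200, so the condition holds. Each remediation in the list moves one side of the inequality: a larger window or fewer connections raises the left side, while fewer batch lines or workers lowers the right side.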

Low disk free space (warning)

There is low disk free space on hostname.

A disk used by the AxoStore on a host has less than 10% free space. The alert fires per disk, so a host with multiple low disks produces multiple alerts. If a disk fills up entirely, the AxoStore on this host will be unable to persist new messages.

Store's disk is filling up (warning)

Store's disk is filling up on hostname

A disk used by the AxoStore on a host is filling up fast enough that, at the current rate, it's predicted to be full within the next six hours. The alert fires per disk. Once a disk fills up entirely, the AxoStore on this host will be unable to persist new messages.

Store's error rate increased (warning)

Store's error rate increased on hostname

The rate at which the AxoStore on a host emits errors of a given type has risen above the expected baseline. The alert fires per error type, so a single underlying problem may surface as several concurrent alerts. A persistently high error rate often degrades storage reliability and should be investigated promptly.

AxoStore doesn't persist messages received from AxoRouter (critical)

AxoStore doesn't persist messages from axorouter-syslog.

An AxoRouter is receiving traffic that's configured to be persisted in the AxoStore, but the AxoStore is unable to persist it. The unpersisted messages aren't available in the AxoConsole for tapping or search. Check the AxoConsole for the underlying reason and remediation steps.