Troubleshooting

1: Possible causes of losing log messages

2: Creating core files

3: Collecting debugging information with strace, truss, or tusc

4: Running a failure script

5: Stopping the syslog-ng process

6: Reporting bugs and finding help

7: Recover data from orphaned diskbuffer files

8: Error messages

9: SELinux prevents using the execmem access on a process

This chapter provides tips and guidelines about troubleshooting problems related to syslog-ng.

As a general rule, first try to log the messages to a local file. Once this is working, you know that AxoSyslog is running correctly and receiving messages, and you can proceed to forwarding the messages to the server.
Always check the configuration files for any syntax errors on both the client and the server using the syslog-ng --syntax-only command.
If the AxoSyslog server does not receive the messages, verify that the IP addresses and ports are correct in your sources and destinations. Also, check that the client and the server use the same protocol (a common error is to send logs over UDP, but configure the server to receive logs over TCP).

If the problem persists, use tcpdump or a similar packet sniffer tool on the client to verify that the messages are sent correctly, and on the server to verify that it receives the messages.
To find message-routing problems, run AxoSyslog with the following command: syslog-ng -Fevd. That way AxoSyslog runs in the foreground and displays debug messages about the messages that are processed.
If AxoSyslog is closing the connections for no apparent reason, be sure to check the log messages of syslog-ng. You may also want to run syslog-ng with the --verbose or --debug command-line options for more detailed log messages. You can enable these messages without restarting syslog-ng using the syslog-ng-ctl verbose --set=on command. For details, see The syslog-ng.conf manual page.
Build up encrypted connections step-by-step. First create a working, unencrypted (for example, TCP) connection, then add TLS encryption, and finally, client authentication if needed.
If you use the same driver and options in the destination of your AxoSyslog client and the source of your AxoSyslog server, everything should work as expected. Unfortunately, there are some other combinations that may seem to work, but result in losing parts of the messages. For details on the working combinations, see Things to consider when forwarding messages between AxoSyslog hosts.
If you’re using FilterX, see Troubleshooting FilterX for specific tips.
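
The configuration check from the tips above can be wrapped in a small shell function. The following is a minimal sketch, not an official tool: `check_syslog_ng_config` is a hypothetical helper name, and the default configuration path is taken from the examples in this chapter and may differ on your system. It only wraps the syslog-ng --syntax-only command.

```shell
# Hypothetical helper: validate a syslog-ng configuration file.
# Returns 0 if the file parses, non-zero otherwise (127 if syslog-ng is missing).
check_syslog_ng_config() {
    conf="${1:-/opt/syslog-ng/etc/syslog-ng.conf}"   # assumed default path
    if ! command -v syslog-ng >/dev/null 2>&1; then
        echo "syslog-ng not found in PATH" >&2
        return 127
    fi
    syslog-ng --syntax-only -f "$conf"
}
```

Run it on both the client and the server, for example `check_syslog_ng_config /etc/syslog-ng/syslog-ng.conf && echo "syntax OK"`, before starting syslog-ng in the foreground for debugging.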

Support

In case you need help with any of the AxoSyslog projects, or directly with syslog-ng, you have several ways to contact us:

Open a GitHub issue in the AxoSyslog repository.
Chat with us in the syslog-ng or axosyslog channels of the Axoflow Discord server.
Fill out the Axoflow contact form.

We also provide consulting and professional services for logging and observability related projects. Contact us if you need our help!

1 - Possible causes of losing log messages

During the course of a message from the sending application to the final destination of the message, there are a number of locations where a message may be lost, even though AxoSyslog does its best to avoid message loss. Usually losing messages can be avoided with careful planning and proper configuration of AxoSyslog and the hosts running syslog-ng. The following list shows the possible locations where messages may be lost, and provides methods to minimize the risk of losing messages:

Between the application and the AxoSyslog client: Make sure to use an appropriate source to receive the logs from the application (for example, from /dev/log). For example, use unix-stream instead of unix-dgram whenever possible.
When AxoSyslog is sending messages: If AxoSyslog cannot send messages to the destination and the output buffer gets full, AxoSyslog will drop messages.

Use flags (flow-control) to avoid this (for details, see Configuring flow-control). For more information about the error caused by the missing flow-control, see Destination queue full in Error messages.

The number of dropped messages is displayed per destination in the log message statistics of AxoSyslog (for details, see Statistics of AxoSyslog).
On the network: When transferring messages using the UDP protocol, messages may be lost without any notice or feedback — such is the nature of the UDP protocol. Always use the TCP protocol to transfer messages over the network whenever possible.

For details on minimizing message loss when using UDP, see the tutorial.
In the socket receive buffer: When transferring messages using the UDP protocol, the UDP datagram (that is, the message) that reaches the receiving host is placed in a memory area called the socket receive buffer. If the host receives more messages than it can process, this area overflows, and the kernel drops messages without letting AxoSyslog know about it. Using TCP instead of UDP prevents this issue. If you must use the UDP protocol, increase the size of the receive buffer using the so-rcvbuf() option.
When AxoSyslog is receiving messages:
- The receiving AxoSyslog (for example, the AxoSyslog server or relay) may drop messages if the fifo of the destination file gets full. The number of dropped messages is displayed per destination in the log message statistics of AxoSyslog (for details, see Statistics of AxoSyslog).
When the destination cannot handle a large load: When AxoSyslog is sending messages at a high rate into an SQL database, a file, or another destination, it is possible that the destination cannot handle the load, and processes the messages slowly. As a result, the buffers of AxoSyslog fill up, AxoSyslog cannot process the incoming messages, and starts to lose messages. For details, see the previous entry. Use the throttle parameter to avoid this problem.
As a result of an unclean shutdown of the AxoSyslog server: If the host running the AxoSyslog server experiences an unclean shutdown, it takes time until the clients realize that the connection to the AxoSyslog server is down. Messages that are put into the output TCP buffer of the clients during this period are not sent to the server.
When AxoSyslog is writing messages into files: If AxoSyslog receives a signal (SIG) while writing log messages to file, the log message that is processed by the write call can be lost if the flush_lines parameter is higher than 1.
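
To see which destinations have actually dropped messages, the statistics can be filtered for non-zero dropped counters. The following is a minimal sketch, assuming the semicolon-separated output of the syslog-ng-ctl stats command (SourceName;SourceId;SourceInstance;State;Type;Number). `nonzero_dropped` is a hypothetical helper name, and other statistics formats may need a different filter.

```shell
# Hypothetical helper: print only the "dropped" counters with a non-zero value.
# Reads semicolon-separated statistics lines from standard input
# (assumed columns: SourceName;SourceId;SourceInstance;State;Type;Number).
nonzero_dropped() {
    awk -F';' '$5 == "dropped" && $6 + 0 > 0 {
        print $1 ";" $2 ";" $3 " dropped=" $6
    }'
}
```

A typical invocation would be `syslog-ng-ctl stats | nonzero_dropped`. Empty output means that no destination reports dropped messages.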

2 - Creating core files

Purpose:

When syslog-ng crashes for some reason, it can create a core file that contains important troubleshooting information. To enable core files, complete the following procedure:

Steps:

Core files are produced only if the maximum core file size ulimit is set to a high value in the init script of syslog-ng. Add the following line to the init script of syslog-ng:
```
    ulimit -c unlimited
```
Verify that syslog-ng has permissions to write the directory it is started from, for example, /opt/syslog-ng/sbin/.
If syslog-ng crashes, it will create a core file in the directory syslog-ng was started from.
To test that syslog-ng can create a core file, you can create a crash manually. For this, determine the PID of syslog-ng (for example, using the ps -All|grep syslog-ng command), then issue the following command: kill -ABRT <syslog-ng pid>

This should create a core file in the current working directory.
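
Before triggering a test crash from a shell, it can help to confirm that the core file size limit of that shell is not zero, because processes started from it inherit the limit. The following is a minimal sketch: `core_limit_status` is a hypothetical helper name, and it reports only the limit of the current shell, not that of a syslog-ng process already started by an init script.

```shell
# Hypothetical helper: report whether the current shell allows core files.
# Prints "disabled" when the soft limit is 0, otherwise "enabled (<limit>)".
core_limit_status() {
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "disabled"
    else
        echo "enabled ($limit)"
    fi
}
```

On Linux, the kernel setting in /proc/sys/kernel/core_pattern can also redirect core files away from the working directory, so check it if no core file appears.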

3 - Collecting debugging information with strace, truss, or tusc

To properly troubleshoot certain situations, it can be useful to trace which system calls AxoSyslog performs. How this is performed depends on the platform running AxoSyslog. In general, note the following points:

When AxoSyslog is started, a supervisor process might stay in the foreground, while the actual syslog-ng daemon goes to the background. Always trace the background process.
Apart from the system calls, the time between two system calls can be important as well. Make sure that your tracing tool records the time information as well. For details on how to do that, refer to the manual page of your specific tool (for example, strace on Linux, or truss on Solaris and BSD).
Run your tracing tool in verbose mode, and if possible, set it to print long output strings, so the messages are not truncated.
When using strace, also record the output of lsof to see which files are accessed.

The following are examples for tracing system calls of syslog-ng on some platforms. The output is saved into the /tmp/syslog-ng-trace.txt file, suffixed with the PID of the related syslog-ng process. The path of the syslog-ng binary may be different for your installation, as distribution-specific packages may use different paths.

Linux: strace -o /tmp/syslog-ng-trace.txt -s256 -ff -ttT /opt/syslog-ng/sbin/syslog-ng -f /opt/syslog-ng/etc/syslog-ng.conf -Fdv
HP-UX: tusc -f -o /tmp/syslog-ng-trace.txt -T /opt/syslog-ng/sbin/syslog-ng
IBM AIX and Solaris: truss -f -o /tmp/syslog-ng-trace.txt -r all -w all -u libc:: /opt/syslog-ng/sbin/syslog-ng -d -d -d

Note: To execute these commands on an already running AxoSyslog process, use the -p <pid_of_syslog-ng> parameter.

4 - Running a failure script

Purpose:

You can create a failure script that is executed when AxoSyslog terminates abnormally, that is, when it exits with a non-zero exit code. For example, you can use this script to send an automatic email notification.

Prerequisites:

The failure script must be the following file: /opt/syslog-ng/sbin/syslog-ng-failure, and must be executable.

To create a sample failure script, complete the following steps.

Steps:

Create a file named /opt/syslog-ng/sbin/syslog-ng-failure with the following content:

    #!/usr/bin/env bash
    cat >>/tmp/test.txt <<EOF
    $(date)
    Name............$1
    Chroot dir......$2
    Pid file dir....$3
    Pid file........$4
    Cwd.............$5
    Caps............$6
    Reason..........$7
    Argbuf..........$8
    Restarting......$9

    EOF

Make the file executable: chmod +x /opt/syslog-ng/sbin/syslog-ng-failure
Run the following command in the /opt/syslog-ng/sbin directory: ./syslog-ng --process-mode=safe-background; sleep 0.5; ps aux | grep './syslog-ng' | grep -v grep | awk '{print $2}' | xargs kill -KILL; sleep 0.5; cat /tmp/test.txt

The command starts AxoSyslog in safe-background mode (which is needed to use the failure script) and then kills it. You should see that the relevant information is written into the /tmp/test.txt file, for example:
```
    Thu May 18 12:08:58 UTC 2017
    Name............syslog-ng
    Chroot dir......NULL
    Pid file dir....NULL
    Pid file........NULL
    Cwd.............NULL
    Caps............NULL
    Reason..........signalled
    Argbuf..........9
    Restarting......not-restarting
```

You should also see messages similar to the following in system syslog. The exact message depends on the signal (or the reason why AxoSyslog stopped):

    May 18 13:56:09 myhost supervise/syslog-ng[10820]: Daemon exited gracefully, not restarting; exitcode='0'
    May 18 13:57:01 myhost supervise/syslog-ng[10996]: Daemon exited due to a deadlock/signal/failure, restarting; exitcode='131'
    May 18 13:57:37 myhost supervise/syslog-ng[11480]: Daemon was killed, not restarting; exitcode='9'

The failure script should run on every non-zero exit event.

5 - Stopping the syslog-ng process

To avoid problems, always use the init scripts to stop syslog-ng (/etc/init.d/syslog-ng stop), instead of using the kill command. This is especially true on Solaris and HP-UX systems, where you should use /etc/init.d/syslog stop.

6 - Reporting bugs and finding help

If you need help, want to open a support ticket, or report a bug, we recommend using the syslog-ng-debun tool to collect information about your environment and AxoSyslog version. For details, see The syslog-ng-debun manual page. For support contacts, see Getting support.

7 - Recover data from orphaned diskbuffer files

When you change the configuration of an AxoSyslog host that uses disk-based buffering (also called disk queue), AxoSyslog may start new disk buffer files for the destinations that you have changed. In this case, AxoSyslog abandons the old disk queue files. If there were unsent log messages in the disk queue files, these messages remain in the disk queue files, and will not be sent to the destinations.

8 - Error messages

This section describes the most common error messages.

Destination queue full

Error message: Destination queue full, dropping messages; queue_len='10000', log_fifo_size='10000', count='4', persist_name='afsocket_dd_qfile(stream,serverdown:514)'

Description:

This message indicates message loss.

Flow-control must be enabled in the log path. When flow-control is enabled, syslog-ng will stop reading messages from the sources of the log statement if the destinations are not able to process the messages at the required speed.

If flow-control is enabled, syslog-ng will only drop messages if the destination queues/window sizes are improperly sized.

Solution:

Enable flow-control in the log path.

If flow-control is disabled, syslog-ng will drop messages if the destination queues are full. Note that syslog-ng will drop messages even if the server is alive. If the remote server accepts logs at a slower rate than the sender syslog-ng receives them, the sender syslog-ng will fill up the destination queue, then drop the newer messages. Sometimes this error occurs only at a specific time interval, for example, only between 7:00 AM and 8:00 AM or between 4:00 PM and 5:00 PM, when your users log in or log off and generate a lot of messages within a short interval.

For more information, see Managing incoming and outgoing messages with flow-control.
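
As an illustration, a log path with flow-control enabled might look like the following sketch. The source and destination names, the address 192.0.2.10, and the ports are placeholders, not values from this document.

```
source s_net {
    network(transport("tcp") port(601));
};

destination d_remote {
    network("192.0.2.10" port(514) transport("tcp"));
};

log {
    source(s_net);
    destination(d_remote);
    # Stop reading from the sources when the destination cannot keep up.
    flags(flow-control);
};
```

The flag applies only to the log path where it is set, so each log statement that forwards messages to a slow destination needs its own flags(flow-control).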

Alert unknown CA

Error message: `SSL error while writing stream; tls_error='SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca'`
Description: This message indicates that the other (remote) side could not verify the certificate sent by `syslog-ng`.
Solution: Check the logs on the remote side and identify why the receiving `syslog-ng` could not find the CA certificate that signed this certificate.

PEM routines:PEM_read_bio:no start line

Error message:

    testuser@thor-x1:~/cert_no_start_line/certs$ openssl x509 -in cert.pem -text
    unable to load certificate
    140178126276248:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:701:Expecting: TRUSTED CERTIFICATE

Description:

The error message is displayed when using Transport Layer Security (TLS). The syslog-ng application uses OpenSSL for TLS and this message indicates that the certificate contains characters that OpenSSL cannot process.

The error occurs when the certificate comes from Windows and you want to use it on a Linux-based computer. On Windows, the end of line (EOL) character is different (\r\n) compared to Linux (\n).

To verify this, open the certificate in a text editor, for example, MCEdit, and look for ^M characters at the end of the lines.

Solution:

On Windows, save the certificate using UTF-8, for example, using Notepad++.

Windows Notepad is not able to save the file in normal UTF-8, even if you select it.
1. In Notepad++, from the menu, select Encoding.
2. Change the value from UTF-8-BOM to UTF-8.
3. Save.
On Linux, run dos2unix cert.pem. This will convert the file to a Linux-compatible style.

Alternatively, replace the EOL characters in the file manually.
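
If dos2unix is not available, the carriage returns can be detected and removed with standard tools. The following is a minimal sketch: `has_cr` and `strip_cr` are hypothetical helper names, and the sketch handles only the line endings, not a UTF-8 byte order mark.

```shell
# Hypothetical helper: return 0 if the file contains a carriage return (\r).
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: write a copy of file $1 without carriage returns to $2.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, `has_cr cert.pem && strip_cr cert.pem cert-unix.pem` creates a converted copy only when the original contains Windows line endings, and leaves the original file untouched.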

9 - SELinux prevents using the execmem access on a process

If you are using a recent enough PCRE library, AxoSyslog automatically uses the JIT compiler of the regexp engine, which can result in an error similar to the following:

    setroubleshoot [21631 ] : SELinux is preventing <syslog-ng path> from using the execmem access on a process. (...)

    python [21631 ] : SELinux is preventing <syslog-ng path> from using the execmem access on a process.

To resolve this issue, switch off PCRE JIT compilation by adding the disable-jit flag to the flags() option of the affected filter or rewrite rule in your configuration.
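
For example, a filter and a rewrite rule with the JIT switched off might look like the following sketch. The names f_error and r_ip and the patterns are placeholders for illustration only.

```
filter f_error {
    message("error" flags("disable-jit"));
};

rewrite r_ip {
    subst("IP", "IP-Address", value("MESSAGE"), flags("disable-jit"));
};
```

The flag has to be set on every regular expression that triggers the SELinux denial, because it applies only to the expression where it appears.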