Parsing messages with comma-separated and similar values

The AxoSyslog application can separate parts of log messages (that is, the contents of the ${MESSAGE} macro) at delimiter characters or strings to named fields (columns) using the csv (comma-separated-values) parser The parsed fields act as user-defined macros that can be referenced in message templates, file- and tablenames, and so on.

Parsers are similar to filters: they must be defined in the AxoSyslog configuration file and used in the log statement. You can also define the parser inline in the log path.

To create a csv-parser(), you have to define the columns of the message, the separator characters or strings (also called delimiters, for example, semicolon or tabulator), and optionally the characters that are used to escape the delimiter characters (quote-pairs()).

Declaration:

   parser <parser_name> {
        csv-parser(
            columns(column1, column2, ...)
            delimiters(chars("<delimiter_characters>"), strings("<delimiter_strings>"))
        );
    };

Column names work like macros.

Names starting with a dot (for example, .example) are reserved for use by AxoSyslog. If you use such a macro name as the name of a parsed value, it will attempt to replace the original value of the macro (note that only soft macros can be overwritten, see Hard versus soft macros for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.)

Starting with AxoSyslog version 4.5, you can omit the columns() option, and extract the values into matches ($1, $2, $3, and so on), which are available as the anonymous list $*. For example:

@version: current

log {
    source { tcp(port(2000) flags(no-parse)); };

    parser { csv-parser(delimiters(',') dialect(escape-backslash)); };
    destination { stdout(template("$ISODATE $*\n")); };
};

Example: Segmenting hostnames separated with a dash

The following example separates hostnames like example-1 and example-2 into two parts.

   parser p_hostname_segmentation {
        csv-parser(columns("HOSTNAME.NAME", "HOSTNAME.ID")
        delimiters("-")
        flags(escape-none)
        template("${HOST}"));
    };
    destination d_file {
        file("/var/log/messages-${HOSTNAME.NAME:-examplehost}");
    };
    log {
        source(s_local);
        parser(p_hostname_segmentation);
        destination(d_file);
    };

Example: Parsing Apache log files

The following parser processes the log of Apache web servers and separates them into different fields. Apache log messages can be formatted like:

   "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"

Here is a sample message:

   192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i4 86-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.mycompany

To parse such logs, the delimiter character is set to a single whitespace (delimiters(" ")). Whitespaces between quotes and brackets are ignored (quote-pairs('""[]')).

   parser p_apache {
        csv-parser(
            columns("APACHE.CLIENT_IP", "APACHE.IDENT_NAME", "APACHE.USER_NAME",
            "APACHE.TIMESTAMP", "APACHE.REQUEST_URL", "APACHE.REQUEST_STATUS",
            "APACHE.CONTENT_LENGTH", "APACHE.REFERER", "APACHE.USER_AGENT",
            "APACHE.PROCESS_TIME", "APACHE.SERVER_NAME")
            flags(escape-double-char,strip-whitespace)
            delimiters(" ")
            quote-pairs('""[]')
        );
    };

The results can be used for example, to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the nouser name is assigned.

   log {
        source(s_local);
        parser(p_apache);
        destination(d_file);
    };
    destination d_file {
        file("/var/log/messages-${APACHE.USER_NAME:-nouser}");
    };

Example: Segmenting a part of a message

Multiple parsers can be used to split a part of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields.

   parser p_apache_timestamp {
        csv-parser(
            columns("APACHE.TIMESTAMP.DAY", "APACHE.TIMESTAMP.MONTH", "APACHE.TIMESTAMP.YEAR", "APACHE.TIMESTAMP.HOUR", "APACHE.TIMESTAMP.MIN", "APACHE.TIMESTAMP.SEC", "APACHE.TIMESTAMP.ZONE")
            delimiters("/: ")
            flags(escape-none)
            template("${APACHE.TIMESTAMP}")
        );
    };
    log {
        source(s_local);
        parser(p_apache);
        parser(p_apache_timestamp);
        destination(d_file);
    };

Further examples: