Comma-separated values


The parse_csv FilterX function uses a CSV (comma-separated values) parser to split a string, for example, the text of a log message (the contents of the ${MESSAGE} macro), at delimiter characters or strings. It returns the parts as a list, or as key-value pairs in a dictionary.

Usage: parse_csv(<input-string>, columns=json_array, delimiter=string, string_delimiters=json_array, dialect=string, strip_whitespace=boolean, greedy=boolean)

Only the input parameter is mandatory.

If the columns option is set, parse_csv returns a dictionary with the column names (as keys) and the parsed values. If the columns option isn’t set, parse_csv returns a list.
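
A minimal sketch of the two return types follows. The block name p_csv_return_types, the name-value pairs ${FIRST_PART} and ${FIELD_B}, and the input string are hypothetical. The sketch assumes that list elements can be read with zero-based subscripts, and dictionary keys with the dot syntax used in the examples on this page; the delimiter is the default comma.

block filterx p_csv_return_types() {
    # No columns option: the result is a list, ["alpha", "beta", "gamma"]
    parts = parse_csv("alpha,beta,gamma");
    ${FIRST_PART} = parts[0];       # "alpha"

    # With the columns option: the result is a dictionary keyed by the column names
    fields = parse_csv("alpha,beta,gamma", columns=["A", "B", "C"]);
    ${FIELD_B} = fields.B;          # "beta"
};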

The following example separates hostnames like example-1 and example-2 into two parts.

block filterx p_hostname_segmentation() {
    cols = json_array(["NAME","ID"]);
    HOSTNAME = parse_csv(${HOST}, delimiter="-", columns=cols);
    # HOSTNAME is a json object containing parts of the hostname
    # For example, for example-1 it contains:
    # {"NAME":"example","ID":"1"}

    # Set the important elements as name-value pairs so they can be referenced in the destination template
    ${HOSTNAME_NAME} = HOSTNAME.NAME;
    ${HOSTNAME_ID} = HOSTNAME.ID;
};
destination d_file {
    file("/var/log/${HOSTNAME_NAME:-examplehost}/${HOSTNAME_ID}/messages.log");
};
log {
    source(s_local);
    filterx(p_hostname_segmentation());
    destination(d_file);
};

Parse Apache log files

The following parser processes the access log of Apache web servers and separates the messages into different fields. Apache log messages can be formatted like:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"

Here is a sample message:

192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i486-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.mycompany

To parse such logs, the delimiter is set to a single space (delimiter=" "), and strip_whitespace=true removes excess leading and trailing whitespace from the column values.

block filterx p_apache() {
    ${APACHE} = json();
    cols = [
    "CLIENT_IP", "IDENT_NAME", "USER_NAME",
    "TIMESTAMP", "REQUEST_URL", "REQUEST_STATUS",
    "CONTENT_LENGTH", "REFERER", "USER_AGENT",
    "PROCESS_TIME", "SERVER_NAME"
    ];
    ${APACHE} = parse_csv(${MESSAGE}, columns=cols, delimiter=" ", strip_whitespace=true, dialect="escape-double-char");

    # Set the important elements as name-value pairs so they can be referenced in the destination template
    ${APACHE_USER_NAME} = ${APACHE}.USER_NAME;
};

The results can be used, for example, to separate log messages into different files based on the USER_NAME field. If the field is empty, the nouser string is used as the default.

log {
    source(s_local);
    filterx(p_apache());
    destination(d_file);
};
destination d_file {
    file("/var/log/messages-${APACHE_USER_NAME:-nouser}");
};

Segment a part of a message

You can use multiple parsers in a layered manner to split parts of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields. Note that the scoping of FilterX variables is important:

If you add the new parser to the FilterX block used in the previous example, every variable is available.
If you use a separate FilterX block, only global variables and name-value pairs (variables with names starting with the $ character) are accessible from the block.

block filterx p_apache_timestamp() {
    cols = ["DAY", "MONTH", "YEAR", "HOUR", "MIN", "SEC", "ZONE"];
    ${APACHE}.TIMESTAMP = parse_csv(${APACHE}.TIMESTAMP, columns=cols, delimiter="/: ", dialect="escape-none");

    # Set the important elements as name-value pairs so they can be referenced in the destination template
    ${APACHE_TIMESTAMP_DAY} = ${APACHE}.TIMESTAMP.DAY;
};
destination d_file {
    file("/var/log/messages-${APACHE_USER_NAME:-nouser}/${APACHE_TIMESTAMP_DAY}");
};
log {
    source(s_local);
    filterx(p_apache());
    filterx(p_apache_timestamp());
    destination(d_file);
};
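
The scoping rules described for this example can be illustrated with two hypothetical blocks, p_first and p_second, that use the same block invocation syntax as the examples on this page. This is a sketch rather than a tested configuration: it assumes zero-based list subscripts, and the names parts, ${FIRST_COLUMN}, and ${FIRST_COLUMN_COPY} are made up for illustration.

block filterx p_first() {
    # parts is a local FilterX variable: it exists only inside this block
    parts = parse_csv(${MESSAGE});
    # ${FIRST_COLUMN} is a name-value pair: later blocks and templates can use it
    ${FIRST_COLUMN} = parts[0];
};
block filterx p_second() {
    # parts isn't accessible here, but the name-value pair set in p_first is
    ${FIRST_COLUMN_COPY} = ${FIRST_COLUMN};
};
log {
    source(s_local);
    filterx(p_first());
    filterx(p_second());
    destination(d_file);
};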

1 - Options of CSV parsers

The parse_csv FilterX function has the following options.

columns


Synopsis:	`columns=["1st","2nd","3rd"]`
Default value:	N/A

Description: Specifies the names of the columns, which become the keys of the resulting dictionary.

If the columns option is set, parse_csv returns a dictionary with the column names (as keys) and the parsed values.
If the columns option isn’t set, parse_csv returns a list.

delimiter


Synopsis:	`delimiter="<string-with-delimiter-characters>"`
Default value:	`,`

Description: The delimiter parameter contains the characters that separate the columns in the input string. If you specify multiple characters, each character is treated as a delimiter. The delimiters aren’t included in the column values. For example:

To separate the text at every hyphen (-) and colon (:) character, use delimiter="-:".
To separate the columns at the tab character, specify delimiter="\t".
To use strings instead of characters as delimiters, see string_delimiters.
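
For example, the following hypothetical block splits a host identifier at both hyphens and colons, and then splits tab-separated input. It is a sketch that assumes FilterX string literals interpret \t as a tab character and that list elements can be read with zero-based subscripts; the block, variable, and name-value pair names are illustrative only.

block filterx p_delimiter_examples() {
    # Each character of "-:" is a separate delimiter,
    # so "web-01:eth0" is split into ["web", "01", "eth0"]
    host_parts = parse_csv("web-01:eth0", delimiter="-:");
    ${HOST_ROLE} = host_parts[0];

    # Split tab-separated input
    tsv_parts = parse_csv(${MESSAGE}, delimiter="\t");
    ${TSV_FIRST} = tsv_parts[0];
};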

Multiple delimiters

If you use more than one delimiter, note the following points:

AxoSyslog will split the message at the nearest possible delimiter. The order of the delimiters in the configuration file does not matter.
You can use both string delimiters and character delimiters in a parser.
The string delimiters may include characters that are also used as character delimiters.
If a string delimiter and a character delimiter both match at the same position of the input, AxoSyslog uses the string delimiter.
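
The following sketch shows how these rules interact when a string delimiter contains a character delimiter. The input string and the block name p_mixed_delimiters are hypothetical, and the expected result in the comment is derived from the rules above rather than from a test run.

block filterx p_mixed_delimiters() {
    # Both "=" (character) and "==" (string) match at the "==" in "a==b=c".
    # The string delimiter takes precedence there, so the expected result is
    # ["a", "b", "c"]; with the character delimiter alone it would be ["a", "", "b", "c"]
    parts = parse_csv("a==b=c", delimiter="=", string_delimiters=["=="]);
};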

dialect


Synopsis:	`dialect="<dialect-name>"`
Default value:	`escape-none`

Description: Specifies how to handle escaping in the input strings.

The following values are available.

escape-backslash: The parsed message uses the backslash (\) character to escape quote characters.
escape-backslash-with-sequences: The parsed message uses the backslash (\) as an escape character but also supports C-style escape sequences, like \n or \r. Available in AxoSyslog version 4.0 and later.
escape-double-char: The parsed message repeats the quote character when the quote character is used literally. For example, to include a literal double quote (") in a quoted value, the message contains two double quotes ("").
escape-none: The parsed message does not use any escaping for using the quote character literally.
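
As an illustration of the escape-double-char dialect, the following sketch parses a hypothetical message that contains a quoted comma and doubled quote characters. It assumes that parse_csv treats the double quote as the quote character and removes the quotes that enclose a field; the block name, the column names, and ${PERSON} are illustrative.

block filterx p_quoted_fields() {
    # Hypothetical input in ${MESSAGE}:  "Doe, John","say ""hi""",42
    ${PERSON} = parse_csv(${MESSAGE}, columns=["NAME", "QUOTE", "ID"], dialect="escape-double-char");
    # Expected: NAME is Doe, John (the comma inside the quotes isn't a delimiter),
    # QUOTE is say "hi" (each doubled quote becomes one literal quote), and ID is 42
};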

greedy


Synopsis:	`greedy=true`
Default value:	`false`

Description: If the greedy option is enabled, AxoSyslog adds the remaining part of the message to the last column, ignoring any delimiters that may appear in this part of the message. You can use this option to process messages where the number of columns varies from message to message.

For example, suppose you receive the comma-separated message example1,example2,example3 and segment it with the following parser:

my_parsed_values = parse_csv(${MESSAGE}, columns=["COLUMN1", "COLUMN2", "COLUMN3"], delimiter=",");

The COLUMN1, COLUMN2, and COLUMN3 keys of my_parsed_values contain the strings example1, example2, and example3, respectively. If the message looks like example1,example2,example3,some more information, then any text appearing after the third comma (that is, some more information) is not parsed, and is possibly lost if you use only the parsed columns to reconstruct the message (for example, if you send the columns to different columns of a database table).

Setting greedy=true assigns the remainder of the message to the last column: COLUMN1 and COLUMN2 contain example1 and example2, and COLUMN3 contains example3,some more information.

my_parsed_values = parse_csv(${MESSAGE}, columns=["COLUMN1", "COLUMN2", "COLUMN3"], delimiter=",", greedy=true);

strip_whitespace


Synopsis:	`strip_whitespace=true`
Default value:	`false`

Description: Removes leading and trailing whitespace from all columns.
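
A short sketch of the difference, using a hypothetical input string and block name. The expected values in the comments follow from the default comma delimiter and the description above.

block filterx p_strip_whitespace_example() {
    # Without strip_whitespace the values keep their padding: ["a ", " b ", " c"]
    padded = parse_csv("a , b , c");
    # With strip_whitespace=true the padding is removed: ["a", "b", "c"]
    trimmed = parse_csv("a , b , c", strip_whitespace=true);
};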

string_delimiters


Synopsis:	`string_delimiters=json_array(["first-string","2nd-string"])`

Description: In case you have to use a string as a delimiter, list your string delimiters as a JSON array in the string_delimiters=["<delimiter_string1>", "<delimiter_string2>", ...] option.

By default, the parse_csv FilterX function uses the comma as a delimiter. If you want to use only strings as delimiters, you have to disable the default comma delimiter, for example: delimiter="", string_delimiters=["<delimiter_string>"]

Otherwise, AxoSyslog uses the string delimiters in addition to the default character delimiter. For example, string_delimiters=["=="] is equivalent to delimiter=",", string_delimiters=["=="], and not to delimiter="", string_delimiters=["=="].
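
For example, the following sketch uses only the string :: as a delimiter, so the comma inside the middle field is kept. The input string and the block name are hypothetical; the expected result follows from the description above.

block filterx p_string_delimiters_only() {
    # delimiter="" disables the default comma, so only "::" splits the input:
    # the expected result is ["alpha", "x,y", "omega"]
    # (with the default delimiter, "x,y" would also be split at the comma)
    parts = parse_csv("alpha::x,y::omega", delimiter="", string_delimiters=["::"]);
};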
