Parsing data in FilterX
- 1: CEF
- 2: Comma-separated values
- 3: key=value pairs
- 4: LEEF
- 5: Windows Event Log
- 6: XML
1 - CEF
FilterX is an experimental feature currently under development. Feedback is most welcome on Discord and GitHub.
FilterX is available in AxoSyslog 4.8.1 and later. The parse_cef function is available in AxoSyslog 4.9 and later.
The parse_cef
FilterX function parses messages formatted in the Common Event Format (CEF).
Declaration
Usage: parse_cef(<input-string>, value_separator="=", pair_separator=" ")
The first argument is the input message. Optionally, you can set the pair_separator
and value_separator
arguments to override their default values.
The value_separator
must be a single-character string. The pair_separator
can be a regular string.
Example
The following is a CEF-formatted message including mandatory and custom (extension) fields:
CEF:0|KasperskyLab|SecurityCenter|13.2.0.1511|KLPRCI_TaskState|Completed successfully|1|foo=foo bar=bar baz=test
The following FilterX expression parses it and converts it into JSON format:
filterx {
${PARSED_MESSAGE} = json(parse_cef(${MESSAGE}));
};
The content of the JSON object for this message will be:
{
"version":"0",
"device_vendor":"KasperskyLab",
"device_product":"SecurityCenter",
"device_version":"13.2.0.1511",
"device_event_class_id":"KLPRCI_TaskState",
"name":"Completed successfully",
"agent_severity":"1",
"extensions": {
"foo":"foo",
"bar":"bar",
"baz":"test"
}
}
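Once parsed, the individual fields can be read from the resulting dictionary. The following is a minimal sketch, assuming that nested keys can be reached with the same dot notation that the CSV examples in this document use. The cef variable and the ${CEF_VENDOR} and ${CEF_FOO} name-value pairs are hypothetical names chosen for illustration.

```
filterx {
    # Parse the CEF message into a local FilterX variable (hypothetical name)
    cef = parse_cef(${MESSAGE});
    # Header fields are top-level keys of the parsed dictionary
    ${CEF_VENDOR} = cef.device_vendor;
    # Extension fields are nested under the "extensions" key
    ${CEF_FOO} = cef.extensions.foo;
};
```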
1.1 - Options of CEF parsers
The parse_cef
FilterX function has the following options.
pair_separator
Specifies the character or string that separates the key-value pairs in the extensions. Default value:
(space).
value_separator
Specifies the character that separates the keys from the values in the extensions. Default value: =
.
2 - Comma-separated values
Available in AxoSyslog 4.8.1 and later.
The parse_csv
FilterX function can separate parts of log messages (that is, the contents of the ${MESSAGE}
macro) along delimiter characters or strings into lists, or key-value pairs within dictionaries, using the csv (comma-separated-values) parser.
Usage: parse_csv(<input-string>, columns=json_array, delimiter=string, string_delimiters=json_array, dialect=string, strip_whitespace=boolean, greedy=boolean)
Only the input parameter is mandatory.
If the columns
option is set, parse_csv
returns a dictionary with the column names (as keys) and the parsed values. If the columns
option isn’t set, parse_csv
returns a list.
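When columns isn't set, the elements of the returned list have no names. The following is a minimal sketch, assuming the list can be indexed with square brackets starting at 0. The parts variable and the ${FIRST_FIELD} name-value pair are hypothetical.

```
filterx {
    # No columns option: parse_csv returns a list
    parts = parse_csv(${MESSAGE});
    # Read the first element by position (assumed zero-based indexing)
    ${FIRST_FIELD} = parts[0];
};
```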
The following example separates hostnames like example-1
and example-2
into two parts.
block filterx p_hostname_segmentation() {
cols = json_array(["NAME","ID"]);
HOSTNAME = parse_csv(${HOST}, delimiter="-", columns=cols);
# HOSTNAME is a json object containing parts of the hostname
# For example, for example-1 it contains:
# {"NAME":"example","ID":"1"}
# Set the important elements as name-value pairs so they can be referenced in the destination template
${HOSTNAME_NAME} = HOSTNAME.NAME;
${HOSTNAME_ID} = HOSTNAME.ID;
};
destination d_file {
file("/var/log/${HOSTNAME_NAME:-examplehost}/${HOSTNAME_ID}/messages.log");
};
log {
source(s_local);
filterx(p_hostname_segmentation());
destination(d_file);
};
Parse Apache log files
The following parser processes the log of Apache web servers and separates them into different fields. Apache log messages can be formatted like:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"
Here is a sample message:
192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i4 86-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.mycompany
To parse such logs, the delimiter character is set to a single whitespace (delimiter=" "
). Excess leading and trailing whitespace characters are stripped.
block filterx p_apache() {
${APACHE} = json();
cols = [
"CLIENT_IP", "IDENT_NAME", "USER_NAME",
"TIMESTAMP", "REQUEST_URL", "REQUEST_STATUS",
"CONTENT_LENGTH", "REFERER", "USER_AGENT",
"PROCESS_TIME", "SERVER_NAME"
];
${APACHE} = parse_csv(${MESSAGE}, columns=cols, delimiter=" ", strip_whitespace=true, dialect="escape-double-char");
# Set the important elements as name-value pairs so they can be referenced in the destination template
${APACHE_USER_NAME} = ${APACHE}.USER_NAME;
};
The results can be used, for example, to separate log messages into different files based on the APACHE.USER_NAME field. If the field is empty, the nouser string is assigned as default.
log {
source(s_local);
filterx(p_apache());
destination(d_file);
};
destination d_file {
file("/var/log/messages-${APACHE_USER_NAME:-nouser}");
};
Segment a part of a message
You can use multiple parsers in a layered manner to split parts of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields. Note that the scoping of FilterX variables is important:
- If you add the new parser to the FilterX block used in the previous example, every variable is available.
- If you use a separate FilterX block, only global variables and name-value pairs (variables with names starting with the $ character) are accessible from the block.
block filterx p_apache_timestamp() {
cols = ["TIMESTAMP.DAY", "TIMESTAMP.MONTH", "TIMESTAMP.YEAR", "TIMESTAMP.HOUR", "TIMESTAMP.MIN", "TIMESTAMP.SEC", "TIMESTAMP.ZONE"];
${APACHE}.TIMESTAMP = parse_csv(${APACHE}.TIMESTAMP, columns=cols, delimiter="/: ", dialect="escape-none");
# Set the important elements as name-value pairs so they can be referenced in the destination template
${APACHE_TIMESTAMP_DAY} = ${APACHE}.TIMESTAMP["TIMESTAMP.DAY"];
};
destination d_file {
file("/var/log/messages-${APACHE_USER_NAME:-nouser}/${APACHE_TIMESTAMP_DAY}");
};
log {
source(s_local);
filterx(p_apache());
filterx(p_apache_timestamp());
destination(d_file);
};
2.1 - Options of CSV parsers
The parse_csv
FilterX function has the following options.
columns
Synopsis: | columns=["1st","2nd","3rd"] |
Default value: | N/A |
Description: Specifies the names of the columns, and correspondingly the keys in the resulting dictionary.
- If the columns option is set, parse_csv returns a dictionary with the column names (as keys) and the parsed values.
- If the columns option isn’t set, parse_csv returns a list.
delimiter
Synopsis: | delimiter="<string-with-delimiter-characters>" |
Default value: | , |
Description: The delimiter parameter contains the characters that separate the columns in the input string. If you specify multiple characters, every character will be treated as a delimiter. Note that the delimiters aren’t included in the column values. For example:
- To separate the text at every hyphen (-) and colon (:) character, use delimiter="-:".
- To separate the columns along the tabulator (tab character), specify delimiter="\\t".
- To use strings instead of characters as delimiters, see string_delimiters.
Multiple delimiters
If you use more than one delimiter, note the following points:
- AxoSyslog will split the message at the nearest possible delimiter. The order of the delimiters in the configuration file does not matter.
- You can use both string delimiters and character delimiters in a parser.
- The string delimiters may include characters that are also used as character delimiters.
- If a string delimiter and a character delimiter both match at the same position of the input, AxoSyslog uses the string delimiter.
dialect
Synopsis: | dialect="<dialect-name>" |
Default value: | escape-none |
Description: Specifies how to handle escaping in the input strings.
The following values are available.
- escape-backslash: The parsed message uses the backslash (\) character to escape quote characters.
- escape-backslash-with-sequences: The parsed message uses the backslash (\) as an escape character but also supports C-style escape sequences, like \n or \r. Available in AxoSyslog version 4.0 and later.
- escape-double-char: The parsed message repeats the quote character when the quote character is used literally. For example, to escape a comma (,), the message contains two commas (,,).
- escape-none: The parsed message does not use any escaping for using the quote character literally.
greedy
Synopsis: | greedy=true |
Default value: | false |
If the greedy
option is enabled, AxoSyslog adds the remaining part of the message to the last column, ignoring any delimiters that may appear in this part of the message. You can use this option to process messages where the number of columns varies from message to message.
For example, you receive the following comma-separated message: example1, example2, example3, and you segment it with the following parser:
my_parsed_values = parse_csv(${MESSAGE}, columns=["COLUMN1", "COLUMN2", "COLUMN3"], delimiter=",");
The COLUMN1, COLUMN2, and COLUMN3 keys will contain the strings example1, example2, and example3, respectively. If the message looks like example1, example2, example3, some more information, then any text appearing after the third comma (that is, some more information) is not parsed, and thus possibly lost if you use only the parsed columns to reconstruct the message (for example, if you send the columns to different columns of a database table).
Using the greedy=true flag will assign the remainder of the message to the last column, so that the COLUMN1, COLUMN2, and COLUMN3 keys will contain the strings example1, example2, and example3, some more information.
my_parsed_values = parse_csv(${MESSAGE}, columns=["COLUMN1", "COLUMN2", "COLUMN3"], delimiter=",", greedy=true);
strip_whitespace
Synopsis: | strip_whitespace=true |
Default value: | false |
Description: Removes leading and trailing whitespace from all columns.
string_delimiters
Synopsis: | string_delimiters=json_array(["first-string","2nd-string"]) |
Description: In case you have to use a string as a delimiter, list your string delimiters as a JSON array in the string_delimiters=["<delimiter_string1>", "<delimiter_string2>", ...]
option.
By default, the parse_csv
FilterX function uses the comma as a delimiter. If you want to use only strings as delimiters, you have to disable the default comma delimiter, for example: delimiter="", string_delimiters=["<delimiter_string>"]
Otherwise, AxoSyslog will use the string delimiters in addition to the default character delimiter, so for example, string_delimiters=["=="] is actually equivalent to delimiter=",", string_delimiters=["=="], and not delimiter="", string_delimiters=["=="].
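The difference can be sketched as follows, assuming a hypothetical input message such as alpha==beta,gamma. The both and strings_only variable names are illustrative only.

```
filterx {
    # Default comma delimiter stays active: the input is split at "==" and at ","
    both = parse_csv(${MESSAGE}, string_delimiters=["=="]);
    # Comma delimiter disabled: the input is split only at "=="
    strings_only = parse_csv(${MESSAGE}, delimiter="", string_delimiters=["=="]);
};
```

Following the description above, both should hold three elements for this input, while strings_only should hold two, with the comma kept inside the second element.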
3 - key=value pairs
Available in AxoSyslog 4.8.1 and later.
The parse_kv
FilterX function can split a string consisting of whitespace or comma-separated key=value
pairs (for example, Postfix log messages). You can also specify other value separator characters instead of the equal sign, for example, colon (:
) to parse MySQL log messages. The AxoSyslog application automatically trims any leading or trailing whitespace characters from the keys and values, and also parses values that contain unquoted whitespace.
If a log message contains the same key multiple times (for example, key1=value1, key2=value2, key1=value3, key3=value4, key1=value5), then AxoSyslog only stores the last (rightmost) value for the key. Using the previous example, AxoSyslog will store the following pairs: key1=value5, key2=value2, key3=value4.
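A minimal sketch of this behavior, using the sample string from the paragraph above. The kv variable and the ${LAST_KEY1} name-value pair are hypothetical names.

```
filterx {
    # key1 appears three times in the input
    kv = parse_kv("key1=value1, key2=value2, key1=value3, key3=value4, key1=value5");
    # Per the rule above, this should be the rightmost value (value5)
    ${LAST_KEY1} = kv.key1;
};
```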
By default, the parser discards sections of the input string that are not key=value
pairs, even if they appear between key=value
pairs that can be parsed. To store such sections, see stray_words_key.
The names of the keys can contain only the following characters: numbers (0-9), letters (a-z,A-Z), underscore (_), dot (.), hyphen (-). Other special characters are not permitted.
Declaration
Usage: parse_kv(<input-string>, value_separator="=", pair_separator=",", stray_words_key="stray_words")
The value_separator
must be a single-character string. The pair_separator
can be a regular string.
Example
In the following example, the source is a Postfix log message consisting of comma-separated key=value
pairs:
Jun 20 12:05:12 mail.example.com <info> postfix/qmgr[35789]: EC2AC1947DA: from=<[email protected]>, size=807, nrcpt=1 (queue active)
filterx {
${PARSED_MESSAGE} = parse_kv(${MESSAGE});
};
You can set the value separator character (the character between the key and the value) to parse, for example, key:value pairs, like MySQL logs:
Mar 7 12:39:25 myhost MysqlClient[20824]: SYSTEM_USER:'oscar', MYSQL_USER:'my_oscar', CONNECTION_ID:23, DB_SERVER:'127.0.0.1', DB:'--', QUERY:'USE test;'
filterx {
${PARSED_MESSAGE} = parse_kv(${MESSAGE}, value_separator=":", pair_separator=",");
};
3.1 - Options of key=value parsers
The parse_kv
FilterX function has the following options.
pair_separator
Specifies the character or string that separates the key-value pairs from each other. Default value: ,
.
For example, to parse key1=value1;key2=value2
pairs, use:
${MESSAGE} = parse_kv("key1=value1;key2=value2", pair_separator=";");
stray_words_key
Specifies the key where AxoSyslog stores any stray words that appear before or between the parsed key-value pairs. If multiple stray words appear in a message, then AxoSyslog stores them as a comma-separated list. Default value: N/A
For example, consider the following message:
VSYS=public; Slot=5/1; protocol=17; source-ip=10.116.214.221; source-port=50989; destination-ip=172.16.236.16; destination-port=162;time=2016/02/18 16:00:07; interzone-emtn_s1_vpn-enodeb_om; inbound; policy=370;
This is a list of key-value pairs, where the value separator is =
and the pair separator is ;
. However, before the last key-value pair (policy=370
), there are two stray words: interzone-emtn_s1_vpn-enodeb_om;
and inbound;
. If you want to store or process these, specify a key to store them, for example:
${MESSAGE} = "VSYS=public; Slot=5/1; protocol=17; source-ip=10.116.214.221; source-port=50989; destination-ip=172.16.236.16; destination-port=162;time=2016/02/18 16:00:07; interzone-emtn_s1_vpn-enodeb_om; inbound; policy=370;";
${PARSED_MESSAGE} = parse_kv(${MESSAGE}, pair_separator=";", stray_words_key="stray_words");
The value of ${PARSED_MESSAGE}.stray_words
for this message will be: ["interzone-emtn_s1_vpn-enodeb_om", "inbound"]
value_separator
Specifies the character that separates the keys from the values. Default value: =
.
For example, to parse key:value
pairs, use:
${MESSAGE} = parse_kv("key1:value1,key2:value2", value_separator=":");
4 - LEEF
Available in AxoSyslog 4.9 and later.
The parse_leef
FilterX function parses messages formatted in the Log Event Extended Format (LEEF).
Both LEEF versions (1.0 and 2.0) are supported.
Declaration
Usage: parse_leef(<input-string>, value_separator="=", pair_separator="\t")
The first argument is the input message. Optionally, you can set the pair_separator
and value_separator
arguments to override their default values.
The value_separator
must be a single-character string. The pair_separator
can be a regular string.
Example
The following is a LEEF-formatted message including mandatory and custom (extension) fields:
LEEF:1.0|Microsoft|MSExchange|4.0 SP1|15345|src=192.0.2.0 dst=172.50.123.1 sev=5cat=anomaly srcPort=81 dstPort=21 usrName=john.smith
The following FilterX expression parses it and converts it into JSON format:
filterx {
${PARSED_MESSAGE} = json(parse_leef(${MESSAGE}));
};
The content of the JSON object for this message will be:
{
"version":"1.0",
"vendor":"Microsoft",
"product_name":"MSExchange",
"product_version":"4.0 SP1",
"event_id":"15345",
"extensions": {
"src":"192.0.2.0",
"dst":"172.50.123.1",
"sev":"5cat=anomaly",
"srcPort":"81",
"dstPort":"21",
"usrName":"john.smith"
}
}
4.1 - Options of LEEF parsers
The parse_leef
FilterX function has the following options.
pair_separator
Specifies the character or string that separates the key-value pairs in the extensions. Default value: \t
(tab).
LEEF v2 can specify the separator per message. If you omit this option, the parser uses the separator provided in the LEEF v2 message. If you set it, your value overrides the message-provided separator during parsing.
value_separator
Specifies the character that separates the keys from the values in the extensions. Default value: =
.
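For illustration, the following minimal sketch forces a specific pair separator regardless of what a LEEF v2 message declares. The caret (^) separator and the ${LEEF_SRC} name-value pair are assumptions chosen for the example, and the dot notation for nested keys is assumed to work as in the CSV examples of this document.

```
filterx {
    # Override the pair separator for every message
    # (assumption: the extension pairs in the input are separated by "^")
    leef = parse_leef(${MESSAGE}, pair_separator="^");
    # Extension fields are nested under the "extensions" key
    ${LEEF_SRC} = leef.extensions.src;
};
```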
5 - Windows Event Log
Available in AxoSyslog 4.9 and later.
The parse_windows_eventlog_xml() FilterX function parses Windows Event Log XMLs. It’s a specialized version of the parse_xml() parser.
The parser returns false in the following cases:
- The input isn’t valid XML.
- The root element doesn’t reference the Windows Event Log schema (<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>). Note that the parser doesn’t validate the input data against the schema.
For example, the following converts the input XML into a JSON object:
filterx {
xml = "<xml-input/>";
$MSG = json(parse_windows_eventlog_xml(xml));
};
Given the following input:
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
<System>
<Provider Name='EventCreate'/>
<EventID Qualifiers='0'>999</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>0</Task>
<Opcode>0</Opcode>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime='2024-01-12T09:30:12.1566754Z'/>
<EventRecordID>934</EventRecordID>
<Correlation/>
<Execution ProcessID='0' ThreadID='0'/>
<Channel>Application</Channel>
<Computer>DESKTOP-2MBFIV7</Computer>
<Security UserID='S-1-5-21-3714454296-2738353472-899133108-1001'/>
</System>
<RenderingInfo Culture='en-US'>
<Message>foobar</Message>
<Level>Error</Level>
<Task></Task>
<Opcode>Info</Opcode>
<Channel></Channel>
<Provider></Provider>
<Keywords>
<Keyword>Classic</Keyword>
</Keywords>
</RenderingInfo>
<EventData>
<Data Name='param1'>foo</Data>
<Data Name='param2'>bar</Data>
</EventData>
</Event>
The parser creates the following JSON object:
{
"Event": {
"@xmlns": "http://schemas.microsoft.com/win/2004/08/events/event",
"System": {
"Provider": {"@Name": "EventCreate"},
"EventID": {"@Qualifiers": "0", "#text": "999"},
"Version": "0",
"Level": "2",
"Task": "0",
"Opcode": "0",
"Keywords": "0x80000000000000",
"TimeCreated": {"@SystemTime": "2024-01-12T09:30:12.1566754Z"},
"EventRecordID": "934",
"Correlation": "",
"Execution": {"@ProcessID": "0", "@ThreadID": "0"},
"Channel": "Application",
"Computer": "DESKTOP-2MBFIV7",
"Security": {"@UserID": "S-1-5-21-3714454296-2738353472-899133108-1001"}
},
"RenderingInfo": {
"@Culture": "en-US",
"Message": "foobar",
"Level": "Error",
"Task": "",
"Opcode": "Info",
"Channel": "",
"Provider": "",
"Keywords": {"Keyword": "Classic"}
},
"EventData": {
"Data": {
"param1": "foo",
"param2": "bar"
}
}
}
}
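Individual values can then be picked out of the parsed structure. The following is a minimal sketch, assuming that keys containing special characters (such as #text) can be read with square-bracket subscripts. The event variable and the name-value pair names are hypothetical.

```
filterx {
    event = parse_windows_eventlog_xml(${MESSAGE});
    # EventID has an attribute, so its text value is stored under "#text"
    ${WIN_EVENT_ID} = event.Event.System.EventID["#text"];
    # EventData parameters are keyed by their Name attribute
    ${WIN_PARAM1} = event.Event.EventData.Data.param1;
};
```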
6 - XML
Available in AxoSyslog 4.9 and later.
The parse_xml()
FilterX function parses raw XMLs into dictionaries. This is a new implementation, so the limitations and options of the legacy xml-parser()
do not apply.
There is no standardized way of converting XML into a dict. AxoSyslog creates the most compact dict possible. This means certain nodes will have different types and structures depending on the input XML element. Note the following points:
- Empty XML elements become empty strings.
  XML: <foo></foo>
  JSON: {"foo": ""}
- Attributes are stored in @attr key-value pairs, similarly to other converters (like Python xmltodict).
  XML: <foo bar="123" baz="bad"/>
  JSON: {"foo": {"@bar": "123", "@baz": "bad"}}
- If an XML element has both attributes and a value, we need to store them in a dict, and the value needs a key. We store the text value under the #text key.
  XML: <foo bar="123">baz</foo>
  JSON: {"foo": {"@bar": "123", "#text": "baz"}}
- An XML element can have both a value and inner elements. We use the #text key here, too.
  XML: <foo>bar<baz>123</baz></foo>
  JSON: {"foo": {"#text": "bar", "baz": "123"}}
- An XML element can have multiple values separated by inner elements. In that case we concatenate the values.
  XML: <foo>bar<a></a>baz</foo>
  JSON: {"foo": {"#text": "barbaz", "a": ""}}
Usage
my_structured_data = parse_xml(raw_xml);
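A slightly fuller sketch of the same call inside a filterx block, reading back an attribute and a text value according to the conversion rules above. It assumes square-bracket subscripts are available for keys such as @bar and #text. The raw_xml and parsed variables and the name-value pair names are hypothetical.

```
filterx {
    raw_xml = "<foo bar='123'>baz</foo>";
    parsed = parse_xml(raw_xml);
    # Attributes are stored with an "@" prefix
    ${FOO_BAR} = parsed.foo["@bar"];
    # The text value of an element that also has attributes is stored under "#text"
    ${FOO_TEXT} = parsed.foo["#text"];
};
```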