FilterX function reference

FilterX is an experimental feature currently under development. Feedback is most welcome on Discord and GitHub.

Available in AxoSyslog 4.8.1 and later.

This page describes the functions you can use in FilterX blocks.

Functions have arguments that can be either mandatory or optional.

  • Mandatory arguments are always positional, so you need to pass them in the correct order. You cannot set them in the arg=value format.
  • Optional arguments are always named, like arg=value. You can pass optional arguments in any order.
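
As a minimal sketch of the two argument styles, the following block calls the format_kv function described later on this page. The input dictionary is the positional argument and must come first, while the two named arguments can appear in either order. The kvs variable and its contents are placeholder values chosen for illustration.

filterx {
  # Hypothetical input dictionary, for illustration only
  kvs = json({"user": "alice", "action": "login"});
  # Positional argument first, then the named arguments in any order
  ${MESSAGE} = format_kv(kvs, pair_separator=";", value_separator=":");
};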

cache_json_file

Load the contents of an external JSON file in an efficient manner. You can use this to look up contextual information. (Basically, this is a FilterX-specific implementation of the add-contextual-data() functionality.)

Usage: cache_json_file("/path/to/file.json")

For example, if your context-info-db.json file contains the following:

{
  "nginx": "web",
  "httpd": "web",
  "apache": "web",
  "mysql": "db",
  "postgresql": "db"
}

Then the following FilterX expression selects only “web” traffic:

filterx {
  declare known_apps = cache_json_file("/context-info-db.json");
  ${app} = known_apps[${PROGRAM}] ?? "unknown";
  ${app} == "web";  # drop everything that's not a web server log
};

datetime

Cast a value into a datetime variable.

Usage: datetime(<string or expression to cast as datetime>)

For example:

date = datetime("1701350398.123000+01:00");

Usually, you use the strptime FilterX function to create datetime values. Alternatively, you can cast an integer, double, string, or isodate variable into datetime with the datetime() FilterX function. Note that:

  • When casting from an integer, the integer is the number of microseconds elapsed since the UNIX epoch (January 1, 1970 12:00:00 AM).
  • When casting from a double, the double is the number of seconds elapsed since the UNIX epoch (January 1, 1970 12:00:00 AM). (The part before the decimal point is the seconds, the part after the decimal point is the microseconds.)
  • When casting from a string, the string (for example, 1701350398.123000+01:00) is interpreted as: <the number of seconds elapsed since the UNIX epoch>.<microseconds>+<timezone relative to UTC (GMT +00:00)>
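
The three casts above can be sketched as follows. The literal values are assumptions chosen so that, by the rules above, each line should describe the same point in time. The variable names are placeholders.

# Integer: microseconds since the UNIX epoch
from_int = datetime(1701350398123000);
# Double: seconds since the UNIX epoch, the fraction carries the microseconds
from_double = datetime(1701350398.123);
# String: <seconds>.<microseconds>+<timezone>
from_string = datetime("1701350398.123000+01:00");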

flatten

Flattens the nested elements of an object using the specified separator, similarly to the format-flat-json() template function. For example, you can use it to flatten nested JSON objects in the output if the receiving application cannot handle nested JSON objects.

Usage: flatten(dict, separator=".")

You can use multi-character separators, for example, =>. If you omit the separator, the default dot (.) separator is used.

sample-dict = json({"a": {"b": {"c": "1"}}});
${MESSAGE} = flatten(sample-dict);

The value of ${MESSAGE} will be: {"a.b.c": "1"}

format_csv

Formats a dictionary or a list into a comma-separated string.

Usage: format_csv(<input-list-or-dict>, columns=<json-list>, delimiter=<delimiter-character>, default_value=<string>)

Only the input is mandatory, other arguments are optional. Note that the delimiter must be a single character.

By default, the delimiter is the comma (delimiter=","), the columns and default_value are empty.

If the columns option is set, AxoSyslog checks that the number of entries in the input data matches the number of columns. If there are fewer entries, it fills the missing entries with the default_value.
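
A minimal sketch of how these options could fit together, assuming a dictionary input whose keys are selected by the columns list. The record variable, its keys, and the "-" placeholder are made up for illustration. Based on the description above, the missing shell entry should be filled with the default_value.

filterx {
  # Hypothetical input: there is no "shell" key in this dictionary
  record = json({"user": "alice", "uid": "1000"});
  # Semicolon as delimiter, "-" for any column that has no value
  ${MESSAGE} = format_csv(record, columns=json_array(["user", "uid", "shell"]), delimiter=";", default_value="-");
};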

format_kv

Formats a dictionary into a string containing key=value pairs.

Usage: format_kv(kvs_dict, value_separator="<separator-character>", pair_separator="<separator-string>")

By default, format_kv uses = to separate keys from values, and ", " (comma and space) to separate the pairs:

filterx {
    ${MESSAGE} = format_kv(<input-dictionary>);
};

The value_separator option must be a single character, the pair_separator can be a string. For example, to use the colon (:) as the value separator and the semicolon (;) as the pair separator, use:

format_kv(<input-dictionary>, value_separator=":", pair_separator=";")

format_json

Formats any value into a raw JSON string.

Usage: format_json($data)
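
For example, the following sketch serializes a JSON object into the message body. The js variable and its content are sample values reused from the json() example on this page.

filterx {
  js = json({"key": "value"});
  # Store the raw JSON string in the message body
  ${MESSAGE} = format_json(js);
};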

isodate

Parses a string as a date in ISODATE format: %Y-%m-%dT%H:%M:%S%z
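
This page does not list a Usage line for isodate, so the following sketch assumes that it takes the date string as its only argument. The timestamp is a sample value that follows the format above.

# Assumed call form: isodate(<string>)
date = isodate("2024-04-10T08:09:10+0000");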

isset

Returns true if the argument exists and its value is not empty or null.

Usage: isset(<name of a variable, macro, or name-value pair>)
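
A small sketch of using isset() as a guard before accessing a value. It assumes, as in the cache_json_file example above, that a statement evaluating to false causes the FilterX block to reject the message. The ${app} name-value pair is only an illustrative target.

filterx {
  # Continue only if the message has a non-empty PROGRAM value
  isset(${PROGRAM});
  ${app} = lower(${PROGRAM});
};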

istype

Returns true if the object (first argument) has the specified type (second argument). The type must be a quoted string. (See List of type names.)

Usage: istype(object, "type_str")

For example:

istype({"key": "value"}, "json_object"); # True
istype(${PID}, "string");
istype(my-local-json-object.mylist, "json_array");

If the object doesn’t exist, istype() returns with an error, causing the FilterX statement to become false, and logs an error message to the internal() source of AxoSyslog.

json, json_object

Cast a value into a JSON object. json_object() is an alias for json().

Usage: json(<string or expression to cast as json>)

For example:

js = json({"key": "value"});

json_array

Cast a value into a JSON array.

Usage: json_array(<string or expression to cast as json array>)

For example:

list = json_array(["first_element", "second_element", "third_element"]);

len

Returns the number of items in an object as an integer: the length (number of characters) of a string, the number of elements in a list, or the number of keys in an object.

Usage: len(object)
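
For example, the following sketch counts the elements of the list from the json_array example. The ${count} name-value pair is a hypothetical name.

list = json_array(["first_element", "second_element", "third_element"]);
# The list has three elements, so len() should return 3
${count} = len(list);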

lower

Converts a string into lowercase characters.

Usage: lower(string)

otel_array

Creates a list represented as an OpenTelemetry array.

otel_kvlist

Creates a dictionary represented as an OpenTelemetry key-value list.

otel_logrecord

Creates an OpenTelemetry log record object.

otel_resource

Creates an OpenTelemetry resource object.

otel_scope

Creates an OpenTelemetry scope object.

parse_csv

Separate a comma-separated or similar string.

Usage: parse_csv(msg_str [, columns=json_array, delimiter=string, string_delimiters=json_array, dialect=string, strip_whitespace=boolean, greedy=boolean])

For details, see Comma-separated values.

parse_kv

Separate a string consisting of whitespace-separated or comma-separated key=value pairs (for example, WELF-formatted messages).

Usage: parse_kv(msg, value_separator="=", pair_separator=", ", stray_words_key="stray_words")

The value_separator must be a single-character string. The pair_separator must be a string.

For details, see key=value pairs.
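
As a brief sketch, the following block parses a message that uses non-default separators. The sample input in the comment and the ${parsed} name-value pair are assumptions made for illustration.

filterx {
  # Hypothetical message body: user:alice;action:login
  ${parsed} = parse_kv(${MESSAGE}, value_separator=":", pair_separator=";");
};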

regexp_search

Searches a string and returns the matches of a regular expression as a list or a dictionary. If there are no matches, the list or dictionary is empty.

Usage: regexp_search("<string-to-search>", <regular-expression>)

For example:

# ${MESSAGE} = "ERROR: Sample error message string"
my-variable = regexp_search(${MESSAGE}, "ERROR");

You can also use unnamed match groups (()) and named match groups ((?<first>ERROR)(?<second>message)).

Note the following points:

  • Regular expressions are case sensitive by default. For case insensitive matches, add (?i) to the beginning of your pattern.
  • You can use regexp constants (slash-enclosed regexps) within FilterX blocks to simplify escaping special characters, for example, /^beginning and end$/.
  • FilterX regular expressions are interpreted in “leave the backslash alone mode”, meaning that a backslash in a string before something that doesn’t need to be escaped is interpreted as a literal backslash character. For example, string\more-string is equivalent to string\\more-string.

Unnamed match groups

${MY-LIST} = json(); # Creates an empty JSON object
${MY-LIST}.unnamed = regexp_search("first-word second-part third", /(first-word) (second-part) (third)/);

${MY-LIST}.unnamed is a list containing: ["first-word second-part third", "first-word", "second-part", "third"].

Named match groups

${MY-LIST} = json(); # Creates an empty JSON object
${MY-LIST}.named = regexp_search("first-word second-part third", /(?<one>first-word) (?<two>second-part) (?<three>third)/);

${MY-LIST}.named is a dictionary with the names of the match groups as keys, and the corresponding matches as values: {"0": "first-word second-part third", "one": "first-word", "two": "second-part", "three": "third"}.

Mixed match groups

If you use mixed (some named, some unnamed) groups, the output is a dictionary, where AxoSyslog automatically assigns a key to the unnamed groups. For example:

${MY-LIST} = json(); # Creates an empty JSON object
${MY-LIST}.mixed = regexp_search("first-word second-part third", /(?<one>first-word) (second-part) (?<three>third)/);

${MY-LIST}.mixed is: {"0": "first-word second-part third", "one": "first-word", "2": "second-part", "three": "third"}

regexp_subst

Rewrites a string using regular expressions. This function implements the subst rewrite rule functionality.

Usage: regexp_subst(<input-string>, <pattern-to-find>, <replacement>, flags)

The following example replaces the first IP in the text of the message with the IP-Address string.

regexp_subst(${MESSAGE}, "IP", "IP-Address");

To replace every occurrence, use the global=true flag:

regexp_subst(${MESSAGE}, "IP", "IP-Address", global=true);

Note the following points:

  • Regular expressions are case sensitive by default. For case insensitive matches, add (?i) to the beginning of your pattern.
  • You can use regexp constants (slash-enclosed regexps) within FilterX blocks to simplify escaping special characters, for example, /^beginning and end$/.
  • FilterX regular expressions are interpreted in “leave the backslash alone mode”, meaning that a backslash in a string before something that doesn’t need to be escaped is interpreted as a literal backslash character. For example, string\more-string is equivalent to string\\more-string.

For a case insensitive search, use the ignorecase=true option.

Options

You can use the following flags with the regexp_subst function:

  • global=true:

    Replace every occurrence of the search string.

  • ignorecase=true:

    Do case insensitive search.

  • jit=true:

    Enable just-in-time compilation for PCRE regular expressions.

  • newline=true:

    When configured, it changes the newline definition used in PCRE regular expressions to accept any of the following:

    • a single carriage-return
    • linefeed
    • the sequence carriage-return and linefeed (\r, \n and \r\n, respectively)

    This newline definition is used when the circumflex and dollar patterns (^ and $) are matched against an input. By default, PCRE interprets the linefeed character as indicating the end of a line. It does not affect the \r, \n or \R characters used in patterns.

  • utf8=true:

    Use Unicode support for UTF-8 matches: UTF-8 character sequences are handled as single characters.

string

Cast a value into a string. Note that currently AxoSyslog evaluates strings and executes template functions and template expressions. In the future, template evaluation will be moved to a separate FilterX function.

Usage: string(<string or expression to cast>)

For example:

myvariable = string(${LEVEL_NUM});

Sometimes you have to explicitly cast values to strings, for example, when you want to concatenate them into a message using the + operator.
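
For example, the following sketch casts the numeric severity to a string before concatenating it. The "Severity: " prefix is an arbitrary sample text.

# Cast the number to a string so that it can be joined with the + operator
${MESSAGE} = "Severity: " + string(${LEVEL_NUM});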

strptime

Creates a datetime object from a string, similarly to the date-parser(). The first argument is the string containing the date. The second argument is a format string that specifies how to parse the date string. Optionally, you can specify additional format strings that are applied in order if the previous one doesn’t match the date string.

Usage: strptime(time_str, format_str_1, ..., format_str_N)

For example:

${MESSAGE} = strptime("2024-04-10T08:09:10Z", "%Y-%m-%dT%H:%M:%S%z");

You can use the following elements in the format string:

%%      PERCENT
%a      day of the week, abbreviated
%A      day of the week
%b      month abbr
%B      month
%c      MM/DD/YY HH:MM:SS
%C      ctime format: Sat Nov 19 21:05:57 1994
%d      numeric day of the month, with leading zeros (eg 01..31)
%e      like %d, but a leading zero is replaced by a space (eg  1..31)
%f      microseconds, leading 0's, extra digits are silently discarded
%D      MM/DD/YY
%G      GPS week number (weeks since January 6, 1980)
%h      month, abbreviated
%H      hour, 24 hour clock, leading 0's
%I      hour, 12 hour clock, leading 0's
%j      day of the year
%k      hour
%l      hour, 12 hour clock
%L      month number, starting with 1
%m      month number, starting with 01
%M      minute, leading 0's
%n      NEWLINE
%o      ornate day of month -- "1st", "2nd", "25th", etc.
%p      AM or PM
%P      am or pm (Yes %p and %P are backwards :)
%q      Quarter number, starting with 1
%r      time format: 09:05:57 PM
%R      time format: 21:05
%s      seconds since the Epoch, UTC
%S      seconds, leading 0's
%t      TAB
%T      time format: 21:05:57
%U      week number, Sunday as first day of week
%w      day of the week, numerically, Sunday == 0
%W      week number, Monday as first day of week
%x      date format: 11/19/94
%X      time format: 21:05:57
%y      year (2 digits)
%Y      year (4 digits)
%Z      timezone in ascii format (for example, PST), or in format -/+0000
%z      timezone in ascii format (for example, PST), or in format -/+0000  (Required element)

For example, for the date 01/Jan/2016:13:05:05 PST use the following format string: "%d/%b/%Y:%H:%M:%S %Z"
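
The fallback behavior described above can be sketched as follows: the second format string is tried only if the first one does not match. The ${raw_date} and ${parsed_date} name-value pairs are hypothetical names, and the two format strings are the ones used in the examples of this section.

filterx {
  # Try the ISO-like format first, then the 01/Jan/2016:13:05:05 PST style
  ${parsed_date} = strptime(${raw_date}, "%Y-%m-%dT%H:%M:%S%z", "%d/%b/%Y:%H:%M:%S %Z");
};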

The isodate FilterX function is a specialized version of strptime that accepts only a fixed format.

unset

Deletes a variable, a name-value pair, or a key in a complex object (like JSON), for example: unset(${<name-value-pair-to-unset>});

You can also list multiple values to delete: unset(${<first-name-value-pair-to-unset>}, ${<second-name-value-pair-to-unset>});

See also Delete values.

unset_empties

Deletes (unsets) the empty fields of an object, for example, a JSON object or list. Use the recursive=true parameter to also delete the empty values in nested dictionaries and lists.

Usage: unset_empties(object, recursive=true)
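
A minimal sketch of cleaning up an object before sending it on, assuming that unset_empties() modifies the object in place, as the description suggests. The js variable and its content are sample values: the empty shell string and the empty comment inside the nested details dictionary are the candidates for removal.

filterx {
  # Hypothetical input with an empty top-level field and an empty nested field
  js = json({"user": "alice", "shell": "", "details": {"comment": ""}});
  unset_empties(js, recursive=true);
  ${MESSAGE} = format_json(js);
};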

upper

Converts a string into uppercase characters.

Usage: upper(string)

vars

Returns the variables (including pipeline variables and name-value pairs) defined in the FilterX block as a JSON object.

For example:

filterx {
  ${logmsg_variable} = "foo";
  local_variable = "bar";
  declare pipeline_level_variable = "baz";
  ${MESSAGE} = vars();
};

The value of ${MESSAGE} will be: {"logmsg_variable":"foo","pipeline_level_variable":"baz"}

Last modified September 26, 2024: Rename filterx to FilterX unless it's code (2ace915)