This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Regular expressions

1: Options of regular expressions

1.1: The type() options of regular expressions

1.2: The flags() options of regular expressions

1.2.1: Perl Compatible Regular Expressions (PCRE)

1.2.2: Literal string searches

1.2.3: Glob patterns without regular expression support

2: Optimizing regular expressions

Filters and substitution rewrite rules can use regular expressions. In regular expressions, the characters ()[].\*?+^$|\\ are used as special symbols. Depending on how you want to use these characters and which quotation mark you use, these characters must be used differently, as summarized below.

Strings between single quotes ('string') are treated literally and are not interpreted at all, you do not have to escape special characters. For example, the output of '\\x41' is \\x41 (characters as follows: backslash, x(letter), 4(number), 1(number)). This makes writing and reading regular expressions much more simple: it is recommended to use single quotes when writing regular expressions.
When enclosing strings between double-quotes ("string"), the string is interpreted and you have to escape special characters, that is, to precede them with a backslash (\\) character if they are meant literally. For example, the output of the "\\x41" is simply the letter a. Therefore special characters like \\(backslash) or "(quotation mark) must be escaped (\\\\ and \\"). The following expressions are interpreted: \\a, \\n, \\r, \\t, \\v. For example, the \\$40 expression matches the $40 string. Backslashes have to be escaped as well if they are meant literally, for example, the \\\\d expression matches the \\d string.

Note
If you use single quotes, you do not need to escape the backslash, for example, match("\\\\.") is equivalent to match('\\.').
Enclosing alphanumeric strings between double-quotes ("string") is not necessary, you can just omit the double-quotes. for example, when writing filters, match("sometext") and match(sometext) will both match for the sometext string.

Note
Only strings containing alphanumerical characters can be used without quotes or double quotes. If the string contains whitespace or any special characters (()[].\*?+^$|\\ or ;:#), you must use quotes or double quotes.
When using the ;:# characters, you must use quotes or double quotes, but escaping them is not required.

By default, all regular expressions are case sensitive. To disable the case sensitivity of the expression, add the flags(ignore-case) option to the regular expression.

   filter demo_regexp_insensitive {
        host("system" flags(ignore-case));
    };

Note

Adding the flags(ignore-case) option to glob patterns does not disable case sensitivity.

The regular expressions can use up to 255 regexp matches (${1} ... ${255}), but only from the last filter and only if the flags("store-matches") flag was set for the filter. For case-insensitive searches, use the flags("ignore-case") option.

1 - Options of regular expressions

This chapter lists regular expressions supported by AxoSyslog and their available supported type() and flags() options.

By default, AxoSyslog uses PCRE-style regular expressions. To use other expression types, add the type() option after the regular expression.

The AxoSyslog application supports the following regular expression type() options:

1.1 - The type() options of regular expressions

By default, AxoSyslog uses PCRE-style regular expressions, which are supported on every platform starting with AxoSyslog version 3.1. To use other expression types, add the type() option after the regular expression.

The AxoSyslog application supports the following type() options:

Perl Compatible Regular Expressions (pcre)

Description: Uses Perl Compatible Regular Expressions (PCRE). If the type() parameter is not specified, AxoSyslog uses PCRE regular expressions by default.

For more information about the flags() options of PCRE regular expressions, see The flags() options of regular expressions.

Literal string searches (string)

Description: Matches the strings literally, without regular expression support. By default, only identical strings are matched. For partial matches, use the flags("prefix") or the flags("substring") flags.

For more information about the flags() options of literal string searches, see The flags() options of regular expressions.

Glob patterns without regular expression support (glob)

Description: Matches the strings against a pattern containing \* and ? wildcards, without regular expression and character range support. The advantage of glob patterns to regular expressions is that globs can be processed much faster.

\*: matches an arbitrary string, including an empty string
?: matches an arbitrary character
The wildcards can match the / character.
You cannot use the \* and ? literally in the pattern.

1.2 - The flags() options of regular expressions

Similarly to the type() options, the flags() options are also optional within regular expressions.

The following list describes each type() option’s flags() options.

1.2.1 - Perl Compatible Regular Expressions (PCRE)

Starting with AxoSyslog version 3.1, PCRE expressions are supported on every platform. If the type() parameter is not specified, AxoSyslog uses PCRE regular expressions by default.

The following example shows the structure of PCRE-style regular expressions in use.

Example: Using PCRE regular expressions

   rewrite r_rewrite_subst {
        subst("a*", "?", value("MESSAGE") flags("utf8" "global"));  
    };

PCRE-style regular expressions have the following flags() options:

disable-jit

Switches off the just-in-time compilation function for PCRE regular expressions.

dupnames

Allows using duplicate names for named subpatterns.

Configuration example:

   filter { match("(?<DN>foo)|(?<DN>bar)" value(MSG) flags(store-matches, dupnames)); };
    ...
    destination { file(/dev/stdout template("$DN\n")); };

global

Usable only in rewrite rules, flags("global") matches for every occurrence of the expression, not only the first one.

ignore-case

Disables case-sensitivity.

newline

When configured, it changes the newline definition used in PCRE regular expressions to accept either of the following:

a single carriage-return
linefeed
the sequence carriage-return and linefeed (\\r, \\n and \\r\\n, respectively)

This newline definition is used when the circumflex and dollar patterns (^ and $) are matched against an input. By default, PCRE interprets the linefeed character as indicating the end of a line. It does not affect the \\r, \\n or \\R characters used in patterns.

store-matches

Stores the matches of the regular expression into the $0, ... $255 variables. The $0 stores the entire match, $1 is the first group of the match (parentheses), and so on. Named matches (also called named subpatterns), for example, (?<name>...), are stored as well. Matches from the last filter expression can be referenced in regular expressions.

Note

To convert match variables into a AxoSyslog list, use the $\* macro, which can be further manipulated using List manipulation, or turned into a list in type-aware destinations.

unicode

Uses Unicode support for UTF-8 matches: UTF-8 character sequences are handled as single characters.

utf8

An alias for the unicode flag.

1.2.2 - Literal string searches

Literal string searches have the following flags() options:

global

Usable only in rewrite rules, flags("global") matches for every occurrence of the expression, not only the first one.

ignore-case

Disables case-sensitivity.

prefix

During the matching process, patterns (also called search expressions) are matched against the input string starting from the beginning of the input string, and the input string is matched only for the maximum character length of the pattern. The initial characters of the pattern and the input string must be identical in the exact same order, and the pattern’s length is definitive for the matching process (that is, if the pattern is longer than the input string, the match will fail).

Example: matching / non-matching patterns for the input string ’exam'

For the input string 'exam',

the following patterns will match:
- 'ex' (the pattern contains the initial characters of the input string in the exact same order)
- 'exam' (the pattern is an exact match for the input string)
the following patterns will not match:
- 'example' (the pattern is longer than the input string)
- 'hexameter' (the pattern’s initial characters do not match the input string’s characters in the exact same order, and the pattern is longer than the input string)

store-matches

Note

To convert match variables into a AxoSyslog list, use the $\* macro, which can be further manipulated using List manipulation, or turned into a list in type-aware destinations.

substring

The given literal string will match when the pattern is found within the input. Unlike flags("prefix"), the pattern does not have to be identical with the given literal string.

1.2.3 - Glob patterns without regular expression support

There are no supported flags() options for glob patterns without regular expression support.

2 - Optimizing regular expressions

The host(), match(), and program() filter functions and some other objects accept regular expressions as parameters. But evaluating general regular expressions puts a high load on the CPU, which can cause problems when the message traffic is very high. Often the regular expression can be replaced with simple filter functions and logical operators. Using simple filters and logical operators, the same effect can be achieved at a much lower CPU load.

Example: Optimizing regular expressions in filters

Suppose you need a filter that matches the following error message logged by the xntpd NTP daemon:

   xntpd[1567]: time error -1159.777379 is too large (set clock manually);

The following filter uses regular expressions and matches every instance and variant of this message.

   filter f_demo_regexp {
        program("demo_program") and
        match("time error .* is too large .* set clock manually");
    };

Segmenting the match() part of this filter into separate match() functions greatly improves the performance of the filter.

   filter f_demo_optimized_regexp {
        program("demo_program") and
        match("time error") and
        match("is too large") and
        match("set clock manually");
    };