Concepts

This section describes the main concepts of Axoflow.

1 - Automatic data processing

When forwarding data to your SIEM, poor-quality data and malformed logs that lack critical fields like timestamps or hostnames need to be fixed. The usual solutions fix the problem in the SIEM and involve complex regular expressions, which are difficult to create and maintain. SIEM users often rely on vendors to manage these rules, but support is limited, especially for less popular devices. Axoflow offers a unique solution: it automatically processes, curates, and classifies the data before it’s sent to the SIEM, ensuring accurate, structured, and optimized data that reduces ingestion costs and improves SIEM performance.

The problem

The main issue related to classification is that many devices send malformed messages: missing timestamp, missing hostname, invalid message format, and so on. Such errors can cause different kinds of problems:

  • Log messages are often routed to different destinations based on the sender hostname. Missing or invalid hostnames mean that the message is not attributed to the right host, and often doesn’t arrive at its intended destination.
  • Incorrect timestamp or timezone hampers investigations during an incident, resulting in potentially critical data failing to show up (or extraneous data appearing) in queries for a particular period.
  • Invalid data can lead to memory leaks or resource overload in the processing software (and to unusable monitoring dashboards) when a sequence number or other rapidly varying field is mistakenly parsed as the hostname, program name, or other low cardinality field.

Overall, these errors decrease the quality of the security data you’re sending to your SIEM tools, which increases false positives, requires secondary data processing to clean the data, and increases query time – all of which ends up costing organizations significantly more.

For instance, this is a log message from a SonicWall firewall appliance:

<133> id=firewall sn=C0EFE33057B0 time="2024-10-07 14:56:47 UTC" fw=172.18.88.37 pri=6 c=1024 m=537 msg="Connection Closed" f=2 n=316039228 src=192.0.0.159:61254:X1: dst=10.0.0.7:53:X3:SIMILDC01 proto=udp/dns sent=59 rcvd=134 vpnpolicy="ELG Main"

A well-formed syslog message should look like this:

<priority>timestamp hostname application: message body

As you can see, the SonicWall format is completely invalid after the initial <priority> field. Instead of the timestamp, hostname, and application name, the free-form part of the message (in this case, a whitespace-separated key=value list) follows immediately. Unless you extract the hostname and timestamp from the content of this malformed message, you won’t be able to reconstruct the course of events during a security incident.
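To make this concrete, here is a minimal sketch (in Python, not Axoflow’s actual implementation) of the kind of device-specific fix this requires: parse the key=value content and promote the embedded fields to proper syslog fields. The field names (time, fw, sn) come from the sample above; the parse_sonicwall helper is hypothetical and purely illustrative.

import re
from datetime import datetime

# Matches key=value pairs where the value may be a quoted string.
SONICWALL_KV = re.compile(r'(\w+)=("[^"]*"|\S+)')

def parse_sonicwall(payload: str) -> dict:
    """Parse the whitespace-separated key=value list into a dict."""
    fields = {k: v.strip('"') for k, v in SONICWALL_KV.findall(payload)}
    return {
        # Promote embedded fields to proper syslog fields.
        "timestamp": datetime.strptime(fields["time"], "%Y-%m-%d %H:%M:%S %Z"),
        "hostname": fields["fw"],        # the appliance's own address
        "device_id": fields.get("sn"),   # unique serial number, used later
        "fields": fields,                # the full structured payload
    }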

Axoflow provides data processing, curation, and classification intelligence that’s built into the data pipeline, so it processes and fixes the data before it’s sent to the SIEM.

Our solution

Our data engine and database solution automatically processes the incoming data: AxoRouter recognizes and classifies it, applies device-specific fixes for the errors, then enriches the data and optimizes its formatting for the specific destination (SIEM). This approach has several benefits:

  • Cost reduction: All data is processed before it’s sent to the SIEM. That way, we can automatically reduce the amount of data sent to the SIEM (for example, by removing empty and redundant fields), cutting your data ingestion costs.
  • Structured data: Axoflow recognizes the format of the incoming data payload (for example, JSON, CSV, LEEF, free text), and automatically parses the payload into a structured map (see the sketch after this list). This allows us to have detailed, content-based metrics and alerts, and also makes it easy for you to add custom transformations if needed.
  • SIEM-independent: The structured data representation allows us to support multiple SIEMs (and other destinations) and optimize the data for every destination.
  • Performance: Compared to the commonly used regular expressions, this approach is more robust and faster, allowing you to process more data with fewer resources.
  • Maintained by Axoflow: We maintain the database; you don’t have to work with it. This includes updates for new product versions and adding new devices. We proactively monitor the new releases of major security devices for logging-related changes and update our database. (If something’s not working as expected, you can easily submit log samples and we’ll fix it ASAP.) Currently, we have over 80 application adapters in our database.
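How AxoRouter actually detects the payload format is internal to the product; the following sketch only illustrates the idea, with hypothetical names and deliberately naive heuristics.

import json
import re

KV_PAIR = re.compile(r'\b\w+=("[^"]*"|\S+)')

def classify_payload(payload: str) -> str:
    """Guess the structure of a log payload (illustrative heuristics only)."""
    try:
        json.loads(payload)
        return "json"
    except ValueError:
        pass
    if len(KV_PAIR.findall(payload)) >= 3:
        return "kv"          # e.g., the SonicWall sample above
    if payload.count(",") >= 3:
        return "csv"         # naive: several comma-separated values
    return "free-text"

Once the format is known, the payload can be parsed into a structured map, like the fields dict in the earlier SonicWall example.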

The automatic classification and curation also add labels and metadata that can be used to make decisions and route your data. Messages with errors are also tagged with error-specific tags. For example, you can easily route all firewall and security logs to your Splunk deployment, and exclude logs (like debug logs) that have no security relevance and shouldn’t be sent to the SIEM.

Axoflow Console allows you to quickly drill down to find log flows with issues, and to tap into the log flow and see samples of the specific messages that are processed, along with the related parsing information, like the following tags that describe the errors of invalid messages:

  • message.utf8_sanitized: The message is not valid UTF-8.
  • syslog.missing_timestamp: The message has no timestamp.
  • syslog.invalid_hostname: The hostname field doesn’t seem to be valid, for example, it contains invalid characters.
  • syslog.missing_pri: The priority (PRI) field is missing from the message.
  • syslog.unexpected_framing: An octet count was found in front of the message, suggesting invalid framing.
  • syslog.rfc3164_missing_header: The date and the host are missing from the message – practically that’s the entire header of RFC3164-formatted messages.
  • syslog.rfc5424_unquoted_sdata_value: The message contains an incorrectly quoted RFC5424 SDATA field.
  • message.parse_error: Some other parsing error occurred.
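For example, a tapped message that arrived without a timestamp and with an unparseable hostname might carry a record like this (an illustrative structure, not the exact Axoflow schema):

record = {
    "message": "id=firewall sn=C0EFE33057B0 ...",
    "tags": ["syslog.missing_timestamp", "syslog.invalid_hostname"],
    "labels": {"vendor": "SonicWall", "product": "firewall"},
}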

Details of a log message

2 - Host attribution and inventory

Axoflow’s built-in inventory solution enriches your security data with critical metadata (like the origin host) so you can pinpoint the exact source of every data entry, enabling precise, label-based routing and more informed security decisions.

Enterprises and organizations collect security data (like syslog and other event data) from various data sources, including network devices, security devices, servers, and so on. When looking at a particular log entry during a security incident, it’s not always trivial to determine what generated it. Was it an appliance or an application? Which team owns the application or device? If it was a network device like a switch or a Wi-Fi access point (of which even medium-sized organizations have dozens), can you tell which one it was, and where it is located?

Cloud-native environments like Kubernetes have addressed this issue using resource metadata. You can attach labels to every element of the infrastructure (containers, pods, nodes, and so on) to record the region, role, owner, or other custom metadata. When collecting log data from the applications running in containers, the log collector agent can retrieve these labels and attach them to the log data as metadata. This helps immensely in associating routing decisions and security conclusions with the source systems.

In non-cloud environments, like traditionally operated physical or virtual machine clusters, the logs of applications, appliances, and other data sources lack such labels. To identify the log source, you’re stuck with the sender’s IP address, the sender’s hostname, and in some cases the path and filename of the log file, plus whatever information is available in the log message.

Why isn’t the IP address or hostname enough?

  • Source IP addresses are poor identifiers, as they aren’t strongly tied to the host, especially when they’re dynamically allocated using DHCP or when a group of senders are behind a NAT and have the same address.
  • Some organizations encode metadata into the hostname or DNS record. However, DNS resolution is slow compared to the volume of log messages. Also, the DNS record is available at the source, but might not be available at the log aggregation device or the log server. As a result, this data is often lost.
  • Many data sources and devices omit important information from their messages. For example, the log messages of Cisco routers by default omit the hostname. (They can be configured to send their hostname, but unfortunately, often they aren’t.) You can resolve their IP address to obtain the hostname of the device, but that leads back to the problem in the previous point.
  • In addition, all the above issues are rooted in human configuration practices, which tend to be the main cause of anomalies in the system.

Correlating data to data source

Some SIEMs (like IBM QRadar) rely on the sender IP address to identify the data source. Others, like Splunk, delegate the task of providing metadata to the collector agent, but log sources often don’t supply metadata, and even if they do, the information is based on IP address or DNS.

Axoflow host attribution

Certain devices, like SonicWall firewalls, include a unique device ID in the content of their log messages (the sn field in the earlier example). Having access to this ID makes attribution straightforward, but logging solutions rarely extract such information. AxoRouter does.

Axoflow builds an inventory to match the messages in your data flow to the available data sources, based on data including:

  • IP address, hostname, and DNS records of the source (if available),

  • serial number, device ID, and other unique identifiers extracted from the messages, and

  • metadata based on automatic classification of the log messages, like product and vendor name.

    Axoflow (or more precisely, AxoRouter) classifies the processed log data to identify common vendors and products, and automatically applies labels to attach this information.
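Conceptually, the matching proceeds from the strongest identifier to the weakest. The sketch below illustrates this cascade with hypothetical field and function names; it is not Axoflow’s actual matching logic.

def attribute_host(record: dict, inventory: list[dict]) -> dict | None:
    """Match a record to an inventory entry, strongest identifier first."""
    # 1. Unique identifiers extracted from the message body (strongest).
    for host in inventory:
        if record.get("device_id") and record["device_id"] == host.get("serial"):
            return host
    # 2. The sender hostname, if the message carried a valid one.
    for host in inventory:
        if record.get("hostname") and record["hostname"] == host.get("hostname"):
            return host
    # 3. The source IP address (weakest: DHCP and NAT make it ambiguous).
    for host in inventory:
        if record.get("source_ip") and record["source_ip"] == host.get("ip"):
            return host
    return None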

You can also integrate the telemetry pipeline with external asset management systems to further enrich your security data with custom labels and additional contextual information about your data sources.

3 - Policy based routing

Axoflow’s host attribution gives you unprecedented possibilities to route your data: you can specify exactly what kind of data you want to send where, not only based on technical parameters like sender IP or application name, but using a much more fine-grained inventory that includes information about the devices, their roles, and their location. For example:

  • The vendor and the product (for example, a SonicWall firewall)
  • The physical location of the device (geographical location, rack number)
  • The owner of the device (organization/department/team)

Sample labels

Axoflow automatically adds metadata labels based on information extracted from the data payload, but you can also add static custom labels to hosts manually, or dynamically using flows.

Paired with the information extracted from the content of the messages, all this metadata allows you to formulate high-level, declarative policies and alerts using the labels, for example:

  • Route the logs of devices to the owner’s Splunk index.
  • Route every firewall and security log to a specific index.
  • Let the different teams manage and observe their own devices.
  • Granular access control based on relevant log content (for example, identifying a proxy log stream based on the upstream address), instead of just device access.
  • Ensure that specific logs won’t cross geographic borders and violate GDPR or other compliance requirements.
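In Axoflow you express such policies declaratively in the Console. Purely to illustrate the underlying idea, a label-based routing rule could be modeled like the following sketch; the label names, destinations, and the route function are hypothetical.

def route(record: dict) -> list[str]:
    """Pick destinations for a record based on its metadata labels."""
    labels = record.get("labels", {})
    # Debug logs have no security relevance: keep them out of the SIEM.
    if record.get("severity") == "debug":
        return ["local-archive"]
    # Keep region-bound data inside its region for compliance.
    if labels.get("region") == "eu":
        return ["splunk-eu/index=" + labels.get("owner", "security")]
    # Route firewall and security logs to the owning team's Splunk index.
    if labels.get("product") in ("firewall", "ids", "proxy"):
        return ["splunk/index=" + labels.get("owner", "security")]
    return ["default"]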

Policy based routing

4 - Reliable transport

Between its components, Axoflow transports data using the reliable OpenTelemetry protocol (OTLP) to achieve high performance and to avoid losing messages. Under the hood, OTLP uses gRPC: it has superior performance compared to other log transport protocols, consumes less bandwidth (because it uses protocol buffers), and also provides features like:

  • on-the-wire compression
  • authentication
  • application layer acknowledgement
  • batched data sending
  • multi-worker scalability on the client and the server
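For illustration, this is how a client could ship logs over OTLP/gRPC with batching and compression enabled, using the opentelemetry-python SDK. The module paths below follow that SDK at the time of writing and may change between versions; the endpoint is a placeholder, not an Axoflow component.

import logging

from grpc import Compression
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor

provider = LoggerProvider()
set_logger_provider(provider)

exporter = OTLPLogExporter(
    endpoint="collector.example.com:4317",  # placeholder endpoint
    compression=Compression.Gzip,           # on-the-wire compression
)
# The batch processor accumulates records and sends them in batches.
provider.add_log_record_processor(BatchLogRecordProcessor(exporter))

# Bridge the standard logging module to the OTLP pipeline.
logging.getLogger().addHandler(LoggingHandler(logger_provider=provider))
logging.getLogger(__name__).warning("hello over OTLP")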