Our automated data engine solves syslog issues: it fixes, optimizes, and structures security logs before they reach your SIEM, improving performance and accuracy.

Fix the Syslog Mess: keep invalid syslog data from wrecking your SIEM

The security data you rely on is often incomplete or poorly formatted, and your SIEM struggles to make sense of it. This post explains why that is a problem and how Axoflow’s data processing engine offers a more reliable, scalable solution.

Collecting security data is essential for detecting anomalies, investigating incidents, and ensuring compliance with security regulations. Based on this data, Security Information and Event Management (SIEM) systems provide analysis, correlation, and alerts to help security teams quickly identify, respond to, and mitigate potential threats before they escalate.

Even though OpenTelemetry is quickly gaining popularity for transporting security data, most security devices (for example, firewalls, access gateways, proxies) use syslog to forward their data. As a result, up to 50% of the data sent to SIEMs is still syslog. Why is this a problem? The syslog protocol has several reliability problems (see these blog posts), but you should also be concerned with the quality of the data you’re sending to your SIEM.

Syslog data quality

The main data quality issue is that many devices send malformed messages: a missing timestamp, a missing hostname, an invalid message format, and so on. For instance, this is a log message from a SonicWall firewall appliance:

<133> id=firewall sn=C0EFE33057B0 time="2024-10-07 14:56:47 UTC" fw=172.18.88.37 pri=6 c=1024 m=537 msg="Connection Closed" f=2 n=316039228 src=192.0.0.159:61254:X1: dst=10.0.0.7:53:X3:SIMILDC01 proto=udp/dns sent=59 rcvd=134 vpnpolicy="ELG Main"

A well-formed syslog message should look like this:

<priority>timestamp hostname application: message body

As you can see, the SonicWall format is completely invalid after the initial <priority> field. Where the timestamp, hostname, and application name should be, the free-form part of the message starts immediately (in this case, a whitespace-separated key=value list). Unless you extract the hostname and timestamp from the content of this malformed message, you won’t be able to reconstruct the course of events during a security incident.
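
To make the extraction step concrete, here is a minimal Python sketch that recovers the priority, timestamp, and sending device from the sample message above. It is an illustration only, not how Axoflow or any SIEM does it. The function name `parse_sonicwall` is hypothetical, and treating the `fw` field as the host identity is an assumption made for this example.

```python
import shlex
from datetime import datetime, timezone

SAMPLE = (
    '<133> id=firewall sn=C0EFE33057B0 time="2024-10-07 14:56:47 UTC" '
    'fw=172.18.88.37 pri=6 c=1024 m=537 msg="Connection Closed" f=2 '
    'n=316039228 src=192.0.0.159:61254:X1: dst=10.0.0.7:53:X3:SIMILDC01 '
    'proto=udp/dns sent=59 rcvd=134 vpnpolicy="ELG Main"'
)


def parse_sonicwall(raw: str) -> dict:
    """Hypothetical helper: recover header fields from the key=value body."""
    if not raw.startswith("<") or ">" not in raw:
        raise ValueError("missing <priority> header")
    end = raw.index(">")
    priority = int(raw[1:end])

    # shlex.split keeps quoted values such as msg="Connection Closed" together.
    # It raises ValueError on unbalanced quotes, so real traffic needs more care.
    fields = {}
    for token in shlex.split(raw[end + 1:]):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value

    timestamp = datetime.strptime(
        fields["time"], "%Y-%m-%d %H:%M:%S UTC"
    ).replace(tzinfo=timezone.utc)

    return {
        "facility": priority // 8,   # syslog priority = facility * 8 + severity
        "severity": priority % 8,
        "timestamp": timestamp,
        "host": fields["fw"],        # assumption: fw identifies the sender
        "fields": fields,
    }


event = parse_sonicwall(SAMPLE)
print(event["host"], event["timestamp"].isoformat())
```

Even this small sketch depends on device-specific knowledge (which key holds the time, what its format is, which key names the device), which is exactly why every source needs its own handling.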

Fixing syslog errors

Because of such issues, SIEMs need to process and fix the incoming data in a way that’s specific to the source device. For most SIEMs, processing the data means creating several, usually complicated, regular expressions. These rulesets are created and maintained by the SIEM vendor or the device vendor. If neither vendor does this work, you have to solve the problem in-house. Both vendor and in-house solutions usually rely on regular expressions, which are difficult to write and maintain, fragile, and slow.

For Splunk, such rulesets are called technology add-ons (TAs), and you can download them from Splunkbase. Some of these TAs are supported by Splunk, others are community or vendor supported, and they vary greatly in quality and maintenance.

For you (the SIEM user), the best option is if the SIEM vendor does this work. However, they usually do this only for the most popular devices – if you’re using something else, you’re out of luck. Also, as the saying goes, if this approach worked, it would’ve worked by now.

Creating a ruleset for your devices once is not enough, either. Rulesets require maintenance, because data formats often change between device updates or releases. For example, the log format of Palo Alto firewalls gained many new fields in recent releases. Such changes can break your SIEM dashboards, which can then return incorrect data, or no data at all. These problems can be difficult to detect, though meaningful metrics that monitor the processed data flow help a lot.
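
As one example of such a metric, the following Python sketch counts fields that appear unexpectedly or go missing compared to a known baseline for a source. The class name `SchemaDriftMonitor` and the field names are hypothetical, chosen only for illustration.

```python
from collections import Counter


class SchemaDriftMonitor:
    """Hypothetical metric: count deviations from a baseline field set."""

    def __init__(self, expected_fields):
        self.expected = frozenset(expected_fields)
        self.total = 0
        self.unexpected = Counter()  # fields seen that are not in the baseline
        self.missing = Counter()     # baseline fields absent from an event

    def observe(self, event: dict) -> None:
        self.total += 1
        keys = set(event)
        self.unexpected.update(keys - self.expected)
        self.missing.update(self.expected - keys)

    def report(self) -> dict:
        return {
            "total": self.total,
            "unexpected": dict(self.unexpected),
            "missing": dict(self.missing),
        }


monitor = SchemaDriftMonitor({"time", "src", "dst"})
monitor.observe({"time": "t1", "src": "a", "dst": "b"})
monitor.observe({"time": "t2", "src": "a", "dst": "b", "session_id": "42"})
monitor.observe({"src": "a", "dst": "b"})
print(monitor.report())
```

A sudden rise in either counter after a firmware update is a hint that the device’s log format has changed and the ruleset needs attention.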

Data source vendors don’t bother with creating and maintaining rulesets for SIEMs because:

  • They don’t have the know-how about logging. If they did, they wouldn’t send malformed log messages in the first place.
  • They don’t have the know-how about the SIEMs. Also, instead of creating and maintaining complex rulesets for different SIEMs, they could make their life easier by fixing the format of their logs – which hasn’t happened yet.

The Axoflow solution

Axoflow provides data processing, curation, and classification intelligence that’s built into the data pipeline, so the data is processed and fixed before it’s sent to the SIEM.

Our data engine and database solution has several benefits:

  • Automatic processing: Axoflow automatically recognizes and classifies the incoming data, applies device-specific fixes and enrichment, and optimizes the formatting for the specific destination (SIEM).
  • Performance: Compared to the commonly used regular expressions, it’s more robust and has better performance.
  • Maintained by Axoflow: We maintain the database; you don’t have to work with it. This includes updates for new product versions and adding new devices. We have both the know-how and the incentive to do it: providing the best database we can is a cornerstone of our product and one of its main benefits to our customers. Therefore, we proactively monitor new releases of major security devices for logging-related changes and update our database accordingly. (If something’s not working as expected, you can easily submit log samples and we’ll fix it ASAP.)
    Currently, we have over 60 application adapters in our database.
  • Structured data: Syslog is inherently an unstructured format. Axoflow recognizes the format of the incoming data payload (for example, JSON, CSV, LEEF, free text), and automatically parses the payload into a structured map. This allows us to have detailed, content-based metrics and alerts, and also makes it easy for you to add custom transformations if needed.
  • SIEM-independent: The structured data representation allows us to support multiple SIEMs (and other destinations) and optimize the data for every destination.
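
To illustrate what parsing a payload into a structured map can look like, here is a simplified Python sketch. It is not Axoflow’s implementation. The function name `parse_payload` is hypothetical, only JSON and LEEF 1.0 (with its default tab delimiter) are handled, and CSV is left out because column names are device-specific.

```python
import json


def parse_payload(payload: str) -> dict:
    """Hypothetical helper: detect the payload format and return a map."""
    text = payload.strip()

    # JSON object payloads
    if text.startswith("{"):
        try:
            data = json.loads(text)
            if isinstance(data, dict):
                return {"format": "json", "fields": data}
        except json.JSONDecodeError:
            pass  # looked like JSON but is not; fall through

    # LEEF 1.0: pipe-separated header, then tab-separated key=value attributes
    if text.startswith("LEEF:1.0|"):
        parts = text.split("|", 5)
        if len(parts) == 6:
            _, vendor, product, version, event_id, attrs = parts
            attributes = {}
            for pair in attrs.split("\t"):
                key, sep, value = pair.partition("=")
                if sep:
                    attributes[key] = value
            return {
                "format": "leef",
                "fields": {
                    "vendor": vendor,
                    "product": product,
                    "version": version,
                    "event_id": event_id,
                    "attributes": attributes,
                },
            }

    # Anything else is kept as free text
    return {"format": "text", "fields": {"message": text}}


print(parse_payload('{"user": "alice", "action": "login"}')["format"])
print(parse_payload("LEEF:1.0|Acme|Gateway|1.0|LOGIN|src=10.0.0.1\tusr=alice")["format"])
print(parse_payload("Connection closed by peer")["format"])
```

Once every payload is a map with a known format, metrics, alerts, and custom transformations can work on named fields instead of raw strings.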

The automatic classification and curation also add labels and metadata that you can use for decision-making and routing. For example, you can easily route all firewall and security logs to your Splunk deployment, and exclude logs (like debug logs) that have no security relevance and shouldn’t be sent to the SIEM.
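
A label-based routing decision like this can be sketched in a few lines of Python. The label names (`class`, `severity`), the destination names, and the `archive` fallback are all assumptions made for this example and do not reflect Axoflow’s actual configuration.

```python
from typing import Optional

SECURITY_CLASSES = {"firewall", "security"}  # hypothetical label values
DEBUG_SEVERITY = 7                           # syslog severity 7 = debug


def route(event: dict) -> Optional[str]:
    """Hypothetical router: return a destination name, or None to drop."""
    if event.get("severity") == DEBUG_SEVERITY:
        return None                # debug logs are not forwarded anywhere
    if event.get("class") in SECURITY_CLASSES:
        return "splunk"            # security-relevant data goes to the SIEM
    return "archive"               # assumed low-cost destination for the rest


print(route({"class": "firewall", "severity": 5}))
print(route({"class": "firewall", "severity": 7}))
print(route({"class": "webserver", "severity": 6}))
```

The point of the sketch is that routing becomes a simple lookup on labels once classification has already happened upstream.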

Another important aspect is that all data is processed before it’s sent to the SIEM. That way, we can automatically reduce the amount of data sent to the SIEM (for example, by removing empty and redundant fields), cutting your data ingestion costs.
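
As a rough illustration of this kind of reduction, the Python sketch below drops empty fields and fields that duplicate information already carried elsewhere in the event. The function name `reduce_event`, the placeholder values treated as empty, and the sample field names are assumptions for the example.

```python
import json

EMPTY_VALUES = ("", "-", None)  # assumption: placeholders that carry no data


def reduce_event(fields: dict, redundant=()) -> dict:
    """Hypothetical helper: drop empty fields and caller-named redundant ones."""
    return {
        key: value
        for key, value in fields.items()
        if value not in EMPTY_VALUES and key not in redundant
    }


original = {
    "time": "2024-10-07 14:56:47 UTC",  # redundant once the timestamp is extracted
    "msg": "Connection Closed",
    "user": "",
    "note": "-",
    "sent": 59,
    "rcvd": 0,                          # zero is real data and is kept
}
reduced = reduce_event(original, redundant={"time"})
print(len(json.dumps(original)), "->", len(json.dumps(reduced)))
```

Which fields count as redundant depends on the source and the destination, so the list is passed in by the caller rather than hard-coded.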

Summary

When you forward syslog data to your SIEM system, poor-quality data and malformed logs that lack critical fields like timestamps or hostnames need to be fixed. The usual solutions involve complex regular expressions, which are difficult to create and maintain. SIEM users often rely on vendors to manage these rules, but support is limited, especially for less popular devices. Axoflow offers a unique solution by automatically processing, curating, and classifying data before it’s sent to the SIEM, ensuring accurate, structured, and optimized data, reducing ingestion costs, and improving SIEM performance. To learn more about this topic, sign up for our webinar about Why parsing sucks.

 
