Classify and reduce security data
Axoflow provides a robust classification system that actually verifies the data it receives, instead of relying on dedicated ports. This approach results in automatic data labeling, high-quality data, and volume reduction: out of the box, automated, and without coding.
Classify the incoming data
Verifying which device or service a certain message belongs to is difficult: even a single data source (like an appliance) can send different kinds of messages, and you have to be able to recognize each one uniquely. This requires:
- deep, device- and vendor-specific understanding of the data, and also
- understanding of the syslog data formats and protocols, because oftentimes the data sources send invalid messages that you have to recognize and fix as part of the classification process.
Classification also needs to be both reliable and performant. A naive implementation using regexps is neither; nevertheless, that’s the solution you find at the core of today’s ingestion pipelines. We at Axoflow understand that creating and maintaining such a classification database is difficult. That is why we made classification a core functionality of the Axoflow Platform, so you never need to write another parsing regexp. At the moment, Axoflow supports over 90 data sources from well-known vendors.
Classification and the ability to process your security data in the pipeline also allow you to:
- Fix the incoming data (for example, malformed firewall messages) by adding missing information, like the hostname or timestamp,
- Identify the source host,
- Parse the log to access the information contained within,
- Redact sensitive information, like PII, before it gets sent to a SIEM or storage,
- Reduce the data volume and as a result, storage and SIEM costs,
- Enrich the data with contextual information, like adding labels based on the source or content of the data,
- Use all the above to route the data to the appropriate destinations, and finally
- Transform the data into an optimized format that the destination can reliably and effortlessly consume. This includes mapping your data to multiple different schemas if you use multiple analytic tools.
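The steps in the list above can be pictured as small functions chained together. The following is only an illustrative sketch, not the Axoflow implementation: every function name, field name (`host`, `user`, `product`, `label`), and destination name here is hypothetical.

```python
from datetime import datetime, timezone


def fix(event, peer_address, received_at):
    """Fill in a missing hostname or timestamp from connection metadata."""
    fixed = dict(event)
    fixed.setdefault("host", peer_address)
    fixed.setdefault("timestamp", received_at.isoformat())
    return fixed


def redact(event, sensitive_fields=("user",)):
    """Mask the values of fields that may hold personal data."""
    redacted = dict(event)
    for name in sensitive_fields:
        if name in redacted:
            redacted[name] = "<redacted>"
    return redacted


def enrich(event):
    """Add a label based on the (already classified) source product."""
    labeled = dict(event)
    labeled["label"] = "firewall" if event.get("product") == "pan-os" else "unclassified"
    return labeled


def route(event):
    """Pick a destination name based on the label."""
    return "siem" if event["label"] == "firewall" else "archive"


# Usage: an event that arrived without a hostname or timestamp.
raw = {"product": "pan-os", "user": "jdoe", "action": "allow"}
now = datetime(2025, 3, 26, 18, 41, 6, tzinfo=timezone.utc)
event = enrich(redact(fix(raw, "192.168.41.30", now)))
destination = route(event)
```

Each step returns a new dictionary instead of modifying its input, which keeps the steps independent and easy to reorder.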
Reduce data volume
Classifying and parsing the incoming data allows you to remove the parts that aren’t needed, for example, to:
- drop entire messages if they are redundant or not relevant from a security perspective, or
- remove parts of individual messages, like fields that are non-empty but convey no information (for example, fields that contain values such as “N/A” or “0”).
As this data reduction happens in the pipeline, before the data arrives in the SIEM or storage, it can save you significant costs, and also improves the quality of the data your detection engineers get to work with.
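As a rough illustration of removing fields that convey no information, the sketch below prunes placeholder values from an already parsed event. It is an assumption-laden example rather than how Axoflow does it: the `PLACEHOLDERS` set, the `zero_is_empty` parameter, and the field names are all hypothetical.

```python
# Hypothetical placeholder values that carry no information in any field.
PLACEHOLDERS = {"", "N/A", "n/a", "-"}


def prune_event(event, zero_is_empty=()):
    """Return a copy of the event without fields that hold only placeholders.

    `zero_is_empty` names the fields where "0" means "not set". In all other
    fields "0" is kept, because it can be a real value (for example, a byte
    count of zero).
    """
    pruned = {}
    for key, value in event.items():
        if value in PLACEHOLDERS:
            continue
        if value == "0" and key in zero_is_empty:
            continue
        pruned[key] = value
    return pruned


# Usage with a hypothetical parsed event.
event = {
    "src": "192.168.41.30",
    "app": "netbios-ns",
    "user": "N/A",
    "bytes_received": "0",
    "category_id": "0",
}
pruned = prune_event(event, zero_is_empty={"category_id"})
```

Treating "0" as empty only for explicitly listed fields is a deliberate choice in this sketch: whether a zero is noise or signal depends on the field, which is why parsing and classification have to come first.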
Palo Alto log reduction example
Let’s see an example of how data reduction works in Axoflow. Here is a log message from a Palo Alto firewall:
<165>Mar 26 18:41:06 us-east-1-dc1-b-edge-fw 1,2025/03/26 18:41:06,007200001056,TRAFFIC,end,1,2025/03/26 18:41:06,192.168.41.30,192.168.41.255,10.193.16.193,192.168.41.255,allow-all,,,netbios-ns,vsys1,Trust,Untrust,ethernet1/1,ethernet1/2,To-Panorama,2025/03/26 18:41:06,8720,1,137,137,11637,137,0x400000,udp,allow,276,276,0,3,2025/03/26 18:41:06,2,any,0,2345136,0x0,192.168.0.0-192.168.255.255,192.168.0.0-192.168.255.255,0,3,0
Here’s what you can drop from this particular message:
- Redundant timestamps: Palo Alto log messages contain up to five, practically identical timestamps (see the Receive time, Generated time, and High resolution timestamp fields in the Traffic Log Fields documentation), including:
  - the syslog timestamp in the header (Mar 26 18:41:06),
  - the time Panorama (the management plane of Palo Alto firewalls) collected the message (2025/03/26 18:41:06), and
  - the time when the event was generated (2025/03/26 18:41:06).
  The sample log message has five timestamps. Leaving only one timestamp can reduce the message size by up to 15%.
- The priority field (<165>) is identical in most messages and has no information value. While it takes up only about 1% of the size of the message, on high-traffic firewalls even this small change adds up to significant data savings.
- Several fields contain default or empty values that provide no information, for example, default internal IP ranges like 192.168.0.0-192.168.255.255. Removing such fields yields over 10% size reduction.
Note that when removing fields, we can delete only the value of the field, because the message format (CSV) relies on having a fixed order of columns for each message type. This also means that we have to individually check what can be removed from each of the 17 Palo Alto log types.
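To make the "delete the value, keep the column" idea concrete, here is a minimal sketch that blanks duplicate timestamps and known default values in the CSV payload of the sample message while leaving the number of columns unchanged. It is not the Axoflow implementation: `reduce_payload`, `DEFAULT_VALUES`, and the value-based detection of timestamps are assumptions made for this example, and a real reduction would work from a per-log-type field map.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%Y/%m/%d %H:%M:%S"
# Assumption for this sketch: values treated as carrying no information.
DEFAULT_VALUES = {"192.168.0.0-192.168.255.255"}

# CSV payload of the sample message (everything after the syslog header).
SAMPLE_PAYLOAD = (
    "1,2025/03/26 18:41:06,007200001056,TRAFFIC,end,1,2025/03/26 18:41:06,"
    "192.168.41.30,192.168.41.255,10.193.16.193,192.168.41.255,allow-all,,,"
    "netbios-ns,vsys1,Trust,Untrust,ethernet1/1,ethernet1/2,To-Panorama,"
    "2025/03/26 18:41:06,8720,1,137,137,11637,137,0x400000,udp,allow,276,276,"
    "0,3,2025/03/26 18:41:06,2,any,0,2345136,0x0,192.168.0.0-192.168.255.255,"
    "192.168.0.0-192.168.255.255,0,3,0"
)


def is_timestamp(value):
    """Return True if the value parses with TIMESTAMP_FORMAT."""
    try:
        datetime.strptime(value, TIMESTAMP_FORMAT)
        return True
    except ValueError:
        return False


def reduce_payload(payload):
    """Blank redundant values in a CSV payload without removing columns."""
    fields = next(csv.reader([payload]))
    seen_timestamps = set()
    reduced = []
    for value in fields:
        if value in DEFAULT_VALUES:
            value = ""  # default value: keep the column, drop the content
        elif is_timestamp(value):
            if value in seen_timestamps:
                value = ""  # duplicate of a timestamp that was already kept
            else:
                seen_timestamps.add(value)
        reduced.append(value)
    out = io.StringIO()
    csv.writer(out, lineterminator="").writerow(reduced)
    return out.getvalue()


reduced = reduce_payload(SAMPLE_PAYLOAD)
saved = 1 - len(reduced) / len(SAMPLE_PAYLOAD)
print(f"{len(SAMPLE_PAYLOAD)} -> {len(reduced)} characters ({saved:.0%} smaller)")
```

Because the emptied fields stay in place as consecutive commas, a downstream parser that relies on column positions still reads every remaining value from the expected column.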
Palo Alto firewalls send this specific message when a connection is closed. They also send a message when a new connection is started, but that doesn’t contain any information that’s not available in the ending message, so it’s completely redundant and can be dropped. As every connection has a beginning and an end, this alone almost halves the size of the data stored per connection. For example:
Connection start message:
<113>Apr 11 10:58:18 us-east-1-dc1-b-edge-fw 1,10:58:18.421048,007200001056,TRAFFIC,end,1210,10:58:18.421048,192.168.41.30,192.168.41.255,10.193.16.193,192.168.41.255,allow-all,,,ssl,vsys1,trust-users,untrust,ethernet1/2.30,ethernet1/1,To-Panorama,2020/10/09 17:43:54,36459,1,39681,443,32326,443,0x400053,tcp,allow,43135,24629,18506,189,2020/10/09 16:53:27,3012,laptops,0,1353226782,0x8000000000000000,10.0.0.0-10.255.255.255,United States,0,90,99,tcp-fin,16,0,0,0,,testhost,from-policy,,,0,,0,,N/A,0,0,0,0,ace432fe-a9f2-5a1e-327a-91fdce0077da,0
Connection end message:
<113>Apr 11 10:58:18 us-east-1-dc1-b-edge-fw 1,10:58:18.421048,007200001056,TRAFFIC,end,1210,10:58:18.421048,192.168.41.30,192.168.41.255,10.193.16.193,192.168.41.255,allow-all,,,ssl,vsys1,trust-users,untrust,ethernet1/2.30,ethernet1/1,To-Panorama,2020/10/09 17:43:54,36459,1,39681,443,32326,443,0x400053,tcp,allow,43135,24629,18506,189,2020/10/09 16:53:27,3012,laptops,0,1353226782,0x8000000000000000,10.0.0.0-10.255.255.255,United States,0,90,99,tcp-fin,16,0,0,0,,testhost,from-policy,,,0,,0,,N/A,0,0,0,0,ace432fe-a9f2-5a1e-327a-91fdce0077da,0
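Dropping the redundant connection start messages could be sketched as a simple filter on the CSV payload. This example assumes that the log type is the fourth CSV field and the subtype is the fifth (as in the samples above), and that session start logs carry the subtype `start`. The function names and the shortened payloads are hypothetical and only serve the illustration.

```python
import csv

TYPE_INDEX = 3     # zero-based position of the log type (for example, TRAFFIC)
SUBTYPE_INDEX = 4  # zero-based position of the subtype (for example, start or end)


def is_redundant_start(payload):
    """Return True for TRAFFIC messages that only announce a new session."""
    fields = next(csv.reader([payload]))
    if len(fields) <= SUBTYPE_INDEX:
        return False  # too short to classify, so keep it
    return fields[TYPE_INDEX] == "TRAFFIC" and fields[SUBTYPE_INDEX] == "start"


def drop_start_messages(payloads):
    """Keep every payload except the redundant session start messages."""
    return [p for p in payloads if not is_redundant_start(p)]


# Usage with shortened, hypothetical payloads.
payloads = [
    "1,2025/03/26 18:41:05,007200001056,TRAFFIC,start,1",
    "1,2025/03/26 18:41:06,007200001056,TRAFFIC,end,1",
    "1,2025/03/26 18:41:07,007200001056,THREAT,url,1",
]
kept = drop_start_messages(payloads)
```

Messages that are too short to classify are kept rather than dropped, so a malformed message is never silently lost by this filter.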
You can enable data reduction in your data flows using the Reduce processing step, and see the amount of data received and transferred in the flow on the Metrics page of the flow.