Top 4 tricks to reduce SIEM data volume

Security teams are drowning in data. With a staggering 28% YoY increase in security logs and SIEM pricing tied to data ingestion, your budget likely feels the strain. But here’s the kicker: more data doesn’t equal better visibility—it often adds noise, complicates detection, and bloats costs.

The solution? Less is more. By collecting better data and reducing volume strategically, you can empower your security teams and trim expenses. In this post, we’ll share four proven tricks to help you slash SIEM data volume without sacrificing quality. Ready to optimize your pipeline and see real results? Let’s dive in.

Data volume vs data quality

The data problem you’re facing is two-fold: volume and quality. The commonly employed “send everything to the SIEM” tactic leads to:

  • high data volume,
  • high data volume growth rate,
  • high noise (and low signal) level, because a large share of the data is badly formatted, redundant, or not even security-relevant.

All in all, this adds up to a serious data quality problem, significantly hampering the work of your security teams, increasing costs, and weakening your security posture.

The way out of this downward spiral is to collect less but better data. In this post, we’ll show 4 tricks to get you started with reducing data volume. (We also have a webinar about feeding your SIEM with reduced and actionable security data, feel free to attend if you have any questions.)

What you need to reduce data volume

Data reduction tricks range from simple to complex. Which ones and how many of them you need to apply depends on your environment and your data reduction goals. But all of them have one thing in common: you have to do them in the pipeline. Reducing and fixing data in the SIEM doesn’t really help, because you’ve already paid to ingest that data, and it also leads to discrepancies in multi-SIEM/multi-destination scenarios.

So, as a minimum, you’ll need:

  • Tools: A data collector agent or an aggregator that allows you to filter, parse, and manipulate data, and to configure which parts of the message are sent to the SIEM.
    Note that typical data pipelines (like the agents of many SIEM vendors) don’t help to understand and classify data, and often don’t support multiple destinations: they are simple forwarder agents for their SIEM.
  • Knowledge: You must know your data well enough to determine what you can throw away, and what you need to keep. This can apply to devices/applications and also to parts of specific log messages (do you need all three identical timestamps from your firewall logs?). Also, understanding how your SIEM handles data most effectively is a big help.
  • Feedback: A way to monitor the data you send to the SIEM (preferably something other than the monthly bill), so you know if you’re really reducing the data volume (and not just breaking the configuration of your entire data pipeline).
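For the feedback part, even something as simple as comparing byte counts before and after a reduction step already tells you whether a change works. Here’s a minimal, product-agnostic Python sketch (the transform function and the sample message are made up for illustration):

    # Minimal sketch: measure how much volume a reduction step saves.
    # Product-agnostic illustration, not tied to any specific pipeline tool.
    class VolumeMeter:
        def __init__(self):
            self.bytes_in = 0
            self.bytes_out = 0

        def process(self, message, transform):
            """Apply a reduction step and record input/output sizes."""
            self.bytes_in += len(message.encode("utf-8"))
            reduced = transform(message)
            if reduced is not None:  # None means the message was dropped
                self.bytes_out += len(reduced.encode("utf-8"))
            return reduced

        def savings(self):
            if self.bytes_in == 0:
                return 0.0
            return 100.0 * (1 - self.bytes_out / self.bytes_in)

    meter = VolumeMeter()
    meter.process("<134>Oct 11 22:14:15 fw01 action=allow", lambda m: m.split(" ", 4)[-1])
    print(f"Reduction: {meter.savings():.1f}%")

In practice you’d rely on your pipeline’s own metrics for this, but the principle is the same: measure what goes in and what goes out, not just the monthly bill.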

4 tricks to start reducing data volume

Let’s see some real-life examples of how to reduce data volume. Some of these tricks are simple, some are more difficult, some are generic, and some are device-specific. (See our case studies for details on the effects of volume reduction.)

Send only what your SIEM needs

Removing parts of the data that your SIEM doesn’t use is a quick way to reduce data volume. For example, when sending syslog data (which still makes up about 50% of security data), most SIEMs – including Splunk – don’t need the syslog header in the message body, because they prefer to receive the timestamp and hostname of a message as metadata.

In the case of short but high-volume messages, like the logs of some firewalls and networking devices, the syslog header can take up about 10% of the message. Implementing this trick doesn’t even require deep knowledge of your data: you only need to adjust the data format (template) in your pipeline before sending the data to your SIEM.
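For illustration, here’s a minimal Python sketch of the idea, assuming classic RFC 3164-style messages. The sample message is made up, and in a real deployment you’d implement this via the template/output settings of your pipeline rather than in custom code:

    import re

    # Sketch: strip the RFC 3164 syslog header and hand over the timestamp and
    # hostname as metadata, so only the message body is ingested as the event.
    HEADER = re.compile(
        r"^<(?P<pri>\d{1,3})>"                              # priority
        r"(?P<timestamp>\w{3} [ \d]\d \d{2}:\d{2}:\d{2}) "  # e.g. "Oct  5 22:14:15"
        r"(?P<host>\S+) "                                   # hostname
        r"(?P<body>.*)$"
    )

    def split_header(raw):
        m = HEADER.match(raw)
        if not m:
            return {"body": raw}                 # leave unparsable messages intact
        return {
            "timestamp": m.group("timestamp"),   # send as metadata
            "host": m.group("host"),             # send as metadata
            "body": m.group("body"),             # only this goes into the event body
        }

    print(split_header("<134>Oct  5 22:14:15 fw01 TRAFFIC,allow,src=10.1.2.3"))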

Redundancy in firewall logs

Firewall logs often contain redundant data. For example, the logs of Palo Alto Networks firewalls:

  1. Contain multiple timestamps: the syslog timestamp, the time when Panorama (the management plane of Palo Alto firewalls) collected the message, and a third one recording when the event was generated. (See the Receive time, Generated time, and High resolution timestamp fields in the Traffic Log Fields documentation.)
  2. Contain fields that are non-empty even when they convey no information, holding only values such as “N/A”, “0”, or default source names such as “10.0.0.0-10.255.255.255”.

Removing these redundancies yields a surprising 20-25% volume reduction. The key difficulty in removing such redundancies is that you have to:

  • recognize which part of your data flows are actually from Palo Alto firewalls (in other words, classify the log messages of your data flow),
  • be able to manipulate the log messages in real time (high-traffic firewalls can produce lots of logs),
  • understand the particular log messages to know what you must keep and what’s redundant, and also
  • regularly check that everything still works and adjust if needed – the fields of Palo Alto firewall logs have changed three times in the last two years.
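To make the reduction step itself concrete, here’s a rough Python sketch. It assumes the message has already been classified as a Palo Alto traffic log and parsed into name-value pairs; the field names and placeholder values are illustrative assumptions, so verify them against the current Traffic Log Fields documentation before dropping anything:

    # Sketch: reduce an already-parsed Palo Alto traffic log.
    # Field names and placeholder values are assumptions for illustration only.
    REDUNDANT_TIMESTAMPS = {"receive_time", "high_resolution_timestamp"}  # keep generated_time
    PLACEHOLDER_VALUES = {"N/A", "10.0.0.0-10.255.255.255", ""}

    def reduce_pan_traffic_log(fields):
        reduced = {}
        for name, value in fields.items():
            if name in REDUNDANT_TIMESTAMPS:
                continue                         # one timestamp is enough
            if value in PLACEHOLDER_VALUES:
                continue                         # placeholders carry no information
            reduced[name] = value
        return reduced

    sample = {
        "receive_time": "2024/06/01 10:00:02",
        "generated_time": "2024/06/01 10:00:01",
        "high_resolution_timestamp": "2024-06-01T10:00:01.123+02:00",
        "src": "10.1.2.3",
        "dst": "192.0.2.10",
        "src_location": "10.0.0.0-10.255.255.255",
        "action": "allow",
    }
    print(reduce_pan_traffic_log(sample))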

Axoflow can automatically classify and reduce the logs of over a hundred off-the-shelf commercial devices and applications, including Palo Alto firewalls.

Unneeded DNS logs

DNS logs are important: they show which servers your users and applications are trying to access from your networks, and can help detect malware that uses DNS to reach its command and control servers. However, you’ll never need the vast majority of DNS resolution request logs, because most of them record legitimate traffic to the world’s most visited websites. It’s not really surprising, nor especially security-relevant, that users on your corporate office network visit sites like Google, YouTube, or Reddit, but such queries can easily make up ~90% of your DNS query logs.

So you can drastically reduce the amount of DNS logs by parsing the logs of your DNS servers and filtering out queries for the top twenty (or fifty) sites. For starters, you can use Semrush’s Most visited websites list for your region.

Similarly to the firewall logs example, to implement this trick you need to:

  • recognize the DNS resolution request logs in your data flow (classification),
  • parse the log to extract the domain name, 
  • compare the extracted domain name to your exclude list, and
  • filter (drop) the messages of the matching domains to reduce noise.
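As a rough illustration, here’s a minimal Python sketch of such a filter. The BIND-style query log pattern and the three-domain exclude list are assumptions for the example; adjust them to your DNS server’s log format and your own top-sites list:

    import re

    # Sketch: drop DNS query logs for well-known, high-traffic domains.
    EXCLUDE_DOMAINS = {"google.com", "youtube.com", "reddit.com"}  # extend from a top-sites list

    # Example pattern for a BIND-style query log line; adjust to your format.
    QUERY = re.compile(r"query: (?P<qname>\S+) IN")

    def keep_dns_log(line):
        m = QUERY.search(line)
        if not m:
            return True                          # not a query log, keep it
        qname = m.group("qname").rstrip(".").lower()
        # Match subdomains too, so www.google.com is excluded as well.
        return not any(qname == d or qname.endswith("." + d) for d in EXCLUDE_DOMAINS)

    print(keep_dns_log("client 10.1.2.3#53124: query: www.google.com IN A + (10.0.0.1)"))    # False: drop
    print(keep_dns_log("client 10.1.2.3#53124: query: evil-c2.example IN TXT + (10.0.0.1)")) # True: keep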

This is a case where Axoflow shines, as it implements automatic classification and parsing, so you only have to add the list of domains to exclude.

Bonus: Windows event logs

Windows environments offer many ways to significantly reduce your log data volume.

  • First, the XML format used by Windows event logs is extremely verbose (because, you know, XML). So just converting it into another structured format (like JSON) before sending it to your SIEM is a big win.
  • The second part of verbosity comes from the fact that practically the entire message is duplicated in the RenderedText field of the XML format. Removing this field significantly reduces the size of the messages.
  • If you can parse the XML fields (which you kinda need for the two previous steps), you can also check the Event ID of the message, and collect only the security-relevant messages, discarding the ones that you don’t need.
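Here’s a rough Python sketch of all three steps, assuming the events arrive in the standard Event XML format; the Event ID allow-list is an illustrative assumption, not a recommendation:

    import json
    import xml.etree.ElementTree as ET

    # Sketch: convert a Windows event from XML to JSON, drop the rendered text,
    # and keep only events with security-relevant Event IDs.
    RELEVANT_EVENT_IDS = {4624, 4625, 4688, 4720}   # example: logons, process and user creation
    NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

    def reduce_windows_event(xml_text):
        root = ET.fromstring(xml_text)
        event_id = int(root.findtext("e:System/e:EventID", namespaces=NS))
        if event_id not in RELEVANT_EVENT_IDS:
            return None                              # drop events we don't need
        event = {
            "EventID": event_id,
            "TimeCreated": root.find("e:System/e:TimeCreated", NS).get("SystemTime"),
            "Computer": root.findtext("e:System/e:Computer", namespaces=NS),
            # Keep the structured EventData fields, skip the verbose rendered text.
            "EventData": {
                d.get("Name"): d.text
                for d in root.findall("e:EventData/e:Data", NS)
            },
        }
        return json.dumps(event)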

Axoflow can fetch Windows event logs, automatically convert them into JSON format, remove unnecessary fields, and filter the messages based on Event IDs and other parameters.

Hey, I can do this!

Are such tricks difficult to implement? Technically, implementing one or more of these tricks is not necessarily difficult (of course, it depends on the particular trick). As always, the devil is in the details, and in the maintenance.

For example, data collection agents that support filtering often do so by permitting you to use regular expressions. Regular expressions are really versatile and often can get the job done, but:

  • they can be difficult to write and maintain,
  • inefficient expressions can cause performance problems, and
  • they cannot really handle structured data (like JSON-formatted logs).
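To make the last point concrete, here’s a small illustrative comparison of a regex-based filter and a structure-aware filter on a made-up JSON log line:

    import json
    import re

    line = '{"action": "allow", "comment": "user asked why action=deny happened"}'

    # Regex approach: matches "action=deny" inside a free-text comment and
    # misclassifies the event.
    print(bool(re.search(r"action=deny", line)))   # True (wrong)

    # Structured approach: parse first, then look at the actual field.
    event = json.loads(line)
    print(event.get("action") == "deny")           # False (correct)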

Also keep in mind that as you add more tricks, managing and maintaining your pipeline configurations can become increasingly difficult.

Conclusion

For security data, less is more and better. Blindly sending all available data into your SIEM is counterproductive: it makes the life of your SOC teams a pain, weakens your security posture, and immensely increases your SIEM costs.

To improve the life of your teams, you have to increase the quality of your data and get its volume under control. You can achieve both by processing data in the pipeline, before ingesting it into your SIEM.

With the tricks (or similar ones) we’ve discussed, you can reduce your log data volume by ~50%. Naturally, which tricks you can use depends on your environment and the applications and devices you use: optimizing Windows logs won’t help you in a Linux-only environment, just like a way to reduce Cisco firewall logs by 90% is useless if you only have Sonicwall devices.

But finding, understanding, implementing, maintaining, and distributing such data processing fixes manually is a big undertaking that requires a lot of effort and know-how. The Axoflow Platform automates these tasks and gives you access to over a hundred device- and application-specific data reduction and quality improvement solutions that are continually improved and updated.

Request a demo and see how you can help your teams, improve data quality, and reduce data volume (and costs) within days. Or if you have questions about data reduction techniques, sign up to our Feed your SIEM with reduced and actionable security data webinar.
