
Parsing firewall logs with FilterX
Your SIEM is only as good as the data you feed it. But when firewall logs from major vendors like FortiGate, Palo Alto, and SonicWall arrive incomplete, inconsistent, or just plain broken, most syslog pipelines choke. FilterX, AxoSyslog’s open source parsing engine, solves this at the root—offering a flexible, scalable way to normalize and enrich logs before they hit your SIEM. Whether you're maintaining your own pipeline or scaling with Axoflow, FilterX ensures you ship clean, classified, context-rich security data—every time.
Traditional logging infrastructures all face some difficult design choices:
- If the endpoints send all syslog data to port 514, as is common, you have to reliably sort the incoming messages based on the sender. This should be trivial to do, but unfortunately is far from that, for two reasons:
  - Endpoints often send their data via a relay, which must be configured properly, otherwise it can mask the original sender. Also, all hops in the pipeline must be able to handle the relayed data and attribute it to the original sender host.
  - A huge percentage of common endpoints (including a staggering number of commercial firewall and other appliance vendors) send invalid syslog messages that are incorrectly formatted and/or are missing critical information (like the hostname of the sender, or the timestamp of the event). Such problems greatly increase the chance of a relay node accidentally hiding the original sender, and also require you to find a way to parse the messages to classify them.
- The other way is to configure similar endpoints (like appliances of the same type) to send their data to a designated port (see the sketch after this list). This method implicitly gives you automatic classification, but:
  - You have to properly configure all endpoints to send data to the appropriate port, and maintain the port mapping. All devices of the same type need to be configured identically, and you must also enable all these ports on every firewall between the data sources and the syslog server.
  - Even though you don't have to parse every message type, you still have to parse the messages to fix endpoint-specific problems for proper host attribution.
  - Most such solutions don't actually verify that the received messages belong to the sources intended to send data to the specific port. So if a device is misconfigured and accidentally sends messages to an incorrect port, its messages are likely to get dropped without warning.
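To make the second approach concrete, here's a minimal sketch of what a per-port setup looks like in a traditional syslog-ng configuration (the port numbers and source names are illustrative assumptions):

source s_fortigate { network(port(5140) transport("tcp")); };
source s_paloalto { network(port(5141) transport("tcp")); };
# Every new device type needs another dedicated port, configured identically
# on all devices of that type and opened on every firewall along the path.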
FilterX, the solution in AxoSyslog
FilterX was designed to help you with the classification and parsing problems discussed in the previous section. It's a replacement for the traditional syslog-ng filter statements, parsers, and rewrite rules that allows you to filter, parse, manipulate, and rewrite variables and complex data structures. FilterX:
- includes high-performance parsers to process common log formats, and
- is tightly integrated with data routing, allowing you to make routing decisions based on the message content and classification results.
Parsing and classifying large numbers of different messages inevitably requires you to organize message processing into a decision tree. Since it was specifically designed for this use case, FilterX makes it easy to backtrack in the decision tree if a message (or part of a message) doesn't match a parser, and also allows you to handle exceptions, without a significant performance penalty.
To make the processing of modern log messages possible, FilterX is especially good at handling deeply nested structured data, like JSON objects and OpenTelemetry logs. The legacy solutions in syslog-ng don't handle such data well.
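For example, addressing a field deep inside a JSON-formatted message body takes a single statement. A minimal sketch (the json() cast and the nested field names are used here only for illustration):

filterx {
    # Parse the JSON-formatted message body into an object
    declare data = json($MSG);
    # Address nested fields with simple dot notation
    $HOST = data.kubernetes.node.name;
};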
If you're a heavy syslog-ng user, maybe you remember the patterndb classification engine, which organized the parsers in its database into a Radix tree. FilterX offers similar performance, but with way more flexibility and a hugely extended syntax. And honestly, who wants to write rules in XML anyway?
Examples
Let's see some FilterX parsing and classification examples. If you don't know FilterX yet, I recommend quickly checking our FilterX introduction blog or the FilterX documentation.
First, we'll classify and parse messages for the three commercial firewalls that we've discussed in an earlier blog: FortiGate, Palo Alto, and SonicWall. Here's a sample message for each, along with some message characteristics specific to that firewall that we can use to classify the message.
NOTE: A well-formed syslog message consists of a header and the message body, like this:
<priority>timestamp hostname application: message body with info
<-----------------header----------------><----message body----->
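For example, this well-formed message from RFC 3164 has the priority (<34>), the timestamp, the hostname (mymachine), and the application (su) in its header:
<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8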
None of the following messages are well-formed syslog messages.
FortiGate parser
<165> us-east-1-dc1-a-dmz-fw date=2025-03-26 time=18:41:07Z devname=us-east-1-dc1-a-dmz-fw devid=FGT60D4614044725 logid=0100040704 type=event subtype=system level=notice vd=root logdesc="System performance statistics" action="perf-stats" cpu=2 mem=35 totalsession=61 disk=2 bandwidth=158/138 setuprate=2 disklograte=0 fazlograte=0 msg="Performance statistics: average CPU: 2, memory: 35, concurrent sessions: 61, setup-rate: 2"
This message begins with an incomplete syslog header (it has only <priority> hostname), followed by space-separated key=value pairs, which repeat the hostname in the devname field and include a devid field followed by a logid field. Let's see what a FilterX block for this looks like:
block log parse_fortigate() {
filterx {
# Check that the message contains the devid= and logid= strings
includes($MSG, "devid=") and includes($MSG, "logid=");
# Parse the $MSG part of the message as key-value pairs into the key_values object
declare key_values = parse_kv($MSG);
# Verify that the devid field was present
key_values.devid;
# Set the hostname syslog field to the value of the devname field
$HOST = key_values.devname;
# Successfully classified a fortigate message
$VENDOR = "fortinet";
$PRODUCT = "fortigate";
# Set variables for Splunk sourcetype and index based on the value of the type field
switch (key_values.type) {
case "event":
$SPLUNK_SOURCETYPE = "fortigate_event";
$SPLUNK_INDEX = "netops";
break;
case "traffic":
$SPLUNK_SOURCETYPE = "fortigate_traffic";
$SPLUNK_INDEX = "netfw";
break;
case "utm":
$SPLUNK_SOURCETYPE = "fortigate_utm";
$SPLUNK_INDEX = "netfw";
break;
case "anomaly":
$SPLUNK_SOURCETYPE = "fortigate_anomaly";
$SPLUNK_INDEX = "netfw";
break;
default:
$SPLUNK_SOURCETYPE = "fortigate_event";
$SPLUNK_INDEX = "netops";
break;
};
};
};
Note: The example above sets the Splunk sourcetype and index based on the content of the message.
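For the sample message above (which has type=event), this block sets $HOST to us-east-1-dc1-a-dmz-fw, $SPLUNK_SOURCETYPE to fortigate_event, and $SPLUNK_INDEX to netops.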
Palo Alto firewall parser
Here's a sample TRAFFIC log message from a Palo Alto firewall.
<165>Mar 26 18:41:06 us-east-1-dc1-b-edge-fw 1,2025/03/26 18:41:06,007200001056,TRAFFIC,end,1,2025/03/26 18:41:06,192.168.41.30,192.168.41.255,10.193.16.193,192.168.41.255,allow-all,,,netbios-ns,vsys1,Trust,Untrust,ethernet1/1,ethernet1/2,To-Panorama,2025/03/26 18:41:06,8720,1,137,137,11637,137,0x400000,udp,allow,276,276,0,3,2025/03/26 18:41:06,2,any,0,2345136,0x0,192.168.0.0-192.168.255.255,192.168.0.0-192.168.255.255,0,3,0
Palo Alto messages get most of the header right (<priority>timestamp hostname, in this case <165>Mar 26 18:41:06 us-east-1-dc1-b-edge-fw), but omit the name of the application, then put a long list of comma-separated values into the message body. The body begins with a version number (1), followed by a timestamp and a serial number.
block log parse_palo_alto() {
filterx {
# Check that the message includes the "1," and ",TRAFFIC," strings
includes($MSG, "1,") and includes($MSG, ",TRAFFIC,");
# Names of the columns in TRAFFIC logs
declare palo_alto_traffic_columns = ["future_use1", "received_time", "serial_number", "type", "log_subtype", "version", "generated_time", "src_ip", "dest_ip", "src_translated_ip", "dest_translated_ip", "rule", "src_user", "dest_user", "app", "vsys", "src_zone", "dest_zone", "src_interface", "dest_interface", "log_forwarding_profile", "future_use3", "session_id", "repeat_count", "src_port", "dest_port", "src_translated_port", "dest_translated_port", "session_flags", "protocol", "action", "bytes", "bytes_sent", "bytes_received", "packets", "start_time", "elapsed_time", "http_category", "future_use4", "sequence_number", "action_flags", "src_location", "dest_location", "future_use5", "packets_sent", "packets_received", "session_end_reason", "devicegroup_level1", "devicegroup_level2", "devicegroup_level3", "devicegroup_level4", "vsys_name", "dvc_name", "action_source", "src_uuid", "dst_uuid", "tunnelid_imsi", "monitortag_imei", "parent_session_id", "parent_start_time", "tunnel", "assoc_id", "chunks", "chunks_sent", "chunks_received", "rule_uuid", "http2_connection", "link_change_count", "policy_id", "link_switches", "sdwan_cluster", "sdwan_device_type", "sdwan_cluster_type", "sdwan_site", "dynusergroup_name", "xff_ip", "src_category", "src_profile", "src_model", "src_vendor", "src_osfamily", "src_osversion", "src_host", "src_mac", "dst_category", "dst_profile", "dst_model", "dst_vendor", "dst_osfamily", "dst_osversion", "dst_host", "dst_mac", "container_id", "pod_namespace", "pod_name", "src_edl", "dst_edl", "hostid", "client_serialnumber", "src_dag", "dst_dag", "session_owner", "high_res_timestamp", "nssai_sst", "nssai_sd", "subcategory_of_app", "category_of_app", "technology_of_app", "risk_of_app", "characteristic_of_app", "container_of_app", "tunneled_app", "is_saas_of_app", "sanctioned_state_of_app", "offloaded", "flow_type", "cluster_name"];
# parse the entire line, columns are type specific
key_values = parse_csv($RAWMSG, columns=palo_alto_traffic_columns);
# Verify that the TYPE field contains TRAFFIC
key_values.type == "TRAFFIC";
# Successfully classified a palo alto message
$VENDOR = "paloalto";
$PRODUCT = "firewall";
# Set variables for Splunk sourcetype and index based on the value of the type field
switch (key_values.type) {
case "TRAFFIC":
$SPLUNK_SOURCETYPE = "pan:traffic";
$SPLUNK_INDEX = "netfw";
break;
# Add other sourcetypes/index for other message types
};
};
};
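For the sample TRAFFIC message above, this block sets $SPLUNK_SOURCETYPE to pan:traffic and $SPLUNK_INDEX to netfw.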
Note that Palo Alto firewalls have several different types of log messages in addition to TRAFFIC logs (15+ altogether), and each has its own unique list of columns that are included in the message. Also, the list of columns often changes between upgrades, so you need to check and update your parsers as needed.
SonicWall parser
<165> id=us-west-1-dc1-a-dmz-fw sn=C0EFE3336C80 time="2025-03-26 18:41:01" fw=192.168.1.239 pri=6 c=1024 gcat=6 m=537 msg="Connection Closed" srcMac=00:50:56:f5:50:27 src=10.237.228.74:54406:X20 srcZone=Trusted natSrc=192.168.1.239:38377 dstMac=00:1a:f0:8b:e0:18 dst=44.190.129.212:123:X2 dstZone=Untrusted natDst=44.190.129.212:123 proto=udp/ntp sent=152 rcvd=152 spkt=2 rpkt=2 cdur=30250 rule="22 (LAN->WAN)" n=490872197 fw_action="NA" dpi=0
The message begins with a <priority> field, followed by one or two spaces, and a long list of space-separated key=value pairs. The first such field is the id field, which contains the hostname, and is followed by a serial number field (sn). Let's see a FilterX block for this:
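Following the same pattern as the FortiGate parser, a minimal sketch of the parse_sonicwall() block referenced below could look like this (the Splunk sourcetype and index values are illustrative assumptions):

block log parse_sonicwall() {
    filterx {
        # Check that the message contains the id= and sn= strings
        includes($MSG, "id=") and includes($MSG, "sn=");
        # Parse the $MSG part of the message as key-value pairs
        declare key_values = parse_kv($MSG);
        # Verify that the sn field was present
        key_values.sn;
        # The id field contains the hostname of the sender
        $HOST = key_values.id;
        # Successfully classified a SonicWall message
        $VENDOR = "sonicwall";
        $PRODUCT = "firewall";
        # Illustrative Splunk sourcetype and index values
        $SPLUNK_SOURCETYPE = "sonicwall";
        $SPLUNK_INDEX = "netfw";
    };
};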
@include "scl.conf"
block log parse() {
# Conditionals to go through the custom parsers
if { parse_fortigate(); }
elif { parse_palo_alto(); }
elif { parse_sonicwall(); };
};
source s_network {
default-network-drivers(
flags(store-raw-message) # Needed to parse messages that are very much non-syslog compliant, like the Sonicwall messages
);
};
destination d_splunk_hec_event {
splunk-hec-event(
url("https://localhost:8088")
token("70b6ae71-76b3-4c38-9597-0c5b37ad9630")
sourcetype($SPLUNK_SOURCETYPE)
index($SPLUNK_INDEX)
default-index("netops")
);
};
log {
source(s_network);
parse();
destination(d_splunk_hec_event);
};
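Note that the if-elif chain in the parse() block has no else branch: a message that matches none of the parsers isn't dropped, but continues down the log path unclassified, and the default-index() option of the Splunk destination delivers it to the netops index.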
Are we done yet?
I think the above parsing examples highlight a few important things:
- With the right tool (for example, AxoSyslog), message parsing and classification can be done effectively.
- However, even the best tool requires significant know-how about:
  - the tool itself,
  - the specific log messages you need to process, and
  - the parsers you've already created: you need a good overview of them to avoid overlaps and misclassification of similar messages, and to be able to maintain them (because the message formats might change after a product upgrade).
- Creating and maintaining the parsers for many applications or devices is a complex and time-consuming effort, especially for high-volume message sources that need performance optimizations (like firewalls).
If you don’t have the resources to tackle the problem, an alternative is to outsource these efforts to a product dedicated to managing your security data pipeline, like the Axoflow Platform, which includes optimized message parsing for over a hundred ubiquitous products. That way you solve your data parsing issues, and can also:
- Deliver high-quality, optimized data to your SIEM
- Gain insight into the status of your data pipeline to avoid losing data
- Remove noise and redundant data to cut your SIEM and storage costs
Summary
Parsing log messages and other security data remains an ongoing challenge for organizations. The open source FilterX data processing engine of AxoSyslog provides an effective way to parse and classify your data, and is a great choice if you want to build a classification database. However, creating and maintaining such a database requires significant time, effort, and know-how. If your organization prefers, you can outsource this to Axoflow Platform, our security data curation pipeline.