Google BigQuery is a data warehouse that scales to the petabyte range and provides built-in ML/AI and BI features. The log and telemetry data that organizations have to collect is estimated to grow by about 25% every year, so BigQuery can be a good place to store all of it. In addition, its analytics features can give you insights that make the data you collect actually useful. This post shows you how to send log and telemetry data to Google BigQuery with syslog-ng.

The new bigquery() destination of syslog-ng sends your data directly to BigQuery via the Storage Write gRPC API. You don't need to send your logs to Google Pub/Sub first and then stream them to BigQuery: you can do it all in one step.

Demo: Sending Kubernetes logs to Google BigQuery

We will use AxoSyslog, Axoflow's custom cloud-ready image of syslog-ng, and its Helm chart, the AxoSyslog Collector chart, to collect Kubernetes logs and send them to BigQuery.

The bigquery() destination is available in syslog-ng version 4.6.0 and newer.

Prerequisites

Before you begin, you will need:

  • a Google BigQuery environment (for example, the BigQuery Sandbox),
  • a Kubernetes cluster (Minikube, kind, or K3s can help you get started easily),
  • Helm, and
  • a recent kubectl version.

Create a demo BigQuery table

You might already have a BigQuery table, but for this demo we will create a very basic one for our Kubernetes logs. You can follow this guide to create a table with the following schema:

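Demo BigQuery schema for sending logs with syslog-ng (column names and types as used in the schema() option later in this post):

  • timestamp: TIMESTAMP
  • namespace: STRING
  • pod: STRING
  • labels: STRING
  • log: STRING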
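If you would rather enter the schema as text than add the columns one by one, it can be written in BigQuery's JSON schema format. The Python sketch below prints such a schema. It assumes that the columns match the schema() option of the bigquery() destination used later in this post, and that every column is NULLABLE. The mode is our assumption, not something the demo prescribes.

```python
import json

# Column names and types taken from the schema() option of the bigquery()
# destination used in this demo.
DEMO_COLUMNS = [
    ("timestamp", "TIMESTAMP"),
    ("namespace", "STRING"),
    ("pod", "STRING"),
    ("labels", "STRING"),
    ("log", "STRING"),
]


def bigquery_json_schema(columns, mode="NULLABLE"):
    """Return the columns as a list of BigQuery JSON schema fields.

    The NULLABLE default is an assumption made for this sketch.
    """
    return [
        {"name": name, "type": column_type, "mode": mode}
        for name, column_type in columns
    ]


if __name__ == "__main__":
    # Print the schema so it can be pasted or saved to a file.
    print(json.dumps(bigquery_json_schema(DEMO_COLUMNS), indent=2))
```

Adjust DEMO_COLUMNS if your table needs other columns, and keep it in sync with the schema() option of the destination.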

Set up Google authentication

The bigquery() destination supports Application Default Credentials (ADC), which itself provides multiple authentication methods. In this demo we will use the Service Account Key method. For local testing, you can store your personal key in a Kubernetes Secret. In a production environment, use a service account and workload identity.

Generate and download the JSON key, then make a Kubernetes Secret with its content, for example:

kubectl create secret generic syslog-ng-biguery-demo-secret --from-file=application_default_credentials.json=/path/to/your/service-key.json

To check whether you successfully added the secret, run this command, which should print the content of the JSON file:

kubectl get secret syslog-ng-biguery-demo-secret --template='{{ index .data "application_default_credentials.json" }}' | base64 -d

We will mount this secret to /root/.config/gcloud/ so the automatic ADC authentication can find and use it.
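Kubernetes stores Secret values base64-encoded, which is why the check above pipes the value through base64 -d. If you want to go one step further and confirm that the decoded content looks like a service account key, a minimal Python sketch could look like the following. The list of expected fields is our assumption about what a service account key file normally contains, and the key in the example is a made-up placeholder. A personal (user) credential file has different fields and would be reported as incomplete by this check.

```python
import base64
import json

# Fields that a Google service account key file is assumed to contain.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_encoded_key(encoded):
    """Decode a base64 Secret value and return the expected fields it lacks."""
    key = json.loads(base64.b64decode(encoded))
    return [field for field in EXPECTED_FIELDS if field not in key]


if __name__ == "__main__":
    # Hypothetical placeholder key, used only to make the sketch runnable.
    fake_key = {
        "type": "service_account",
        "project_id": "syslog-ng-bigquery-demo-project",
        "private_key": "REDACTED",
        "client_email": "demo@example.invalid",
    }
    encoded = base64.b64encode(json.dumps(fake_key).encode()).decode()
    # An empty list means that none of the expected fields is missing.
    print(check_encoded_key(encoded))
```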

Generating logs

In your Kubernetes environment you probably already have a lot of logs from pods. If not, you can install kube-logging/log-generator to generate some logs for you.

helm install log-generator --wait oci://ghcr.io/kube-logging/helm-charts/log-generator

Check that it’s running:

kubectl get pods

The output should look like:

NAME                            READY   STATUS    RESTARTS   AGE
log-generator-97c6b5b48-j285z   1/1     Running   0          8s

Install AxoSyslog Collector

First, let's create our configuration for the AxoSyslog Collector Helm chart. Create a file called bigquery.yaml with the following content:

image:
 tag: "nightly"

daemonset:
 secretMounts:
   - name: syslog-ng-biguery-demo-secret
     secretName: syslog-ng-biguery-demo-secret
     path: /root/.config/gcloud/
 hostNetworking: true

config:
 raw: |
   @version: current
   @include "scl.conf"

   source s_kubernetes {
     kubernetes(key-delimiter("#"));
   };

   destination d_bigquery {
     bigquery(
       project("syslog-ng-bigquery-demo-project")
       dataset("syslog_ng_bigquery_demo_dataset")
       table("syslog-ng-bigquery-demo-table")
       schema(
         "timestamp" TIMESTAMP => "${R_UNIXTIME}${R_USEC}"
         "namespace" STRING => "${.k8s.namespace_name}"
         "pod" STRING => "${.k8s.pod_name}"
         "labels" STRING => "$(format-json --key-delimiter '#' --subkeys '.k8s.labels#')"
         "log" STRING => "${MESSAGE}"
       )
     );
   };

   log {
     source(s_kubernetes);
     destination(d_bigquery);
   };

This configuration uses the nightly AxoSyslog Collector container image, mounts the previously created secret, enables host networking for the DaemonSet (hostNetworking: true) so that it can use the node's internet connection, and configures syslog-ng with the given raw config. The raw configuration works with any syslog-ng instance of version 4.6.0 or newer, not just with AxoSyslog Collector.

In the raw syslog-ng config we set up a kubernetes() source and a bigquery() destination. 
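The "#" key delimiter in the config is worth a note. Kubernetes label keys often contain dots (for example, app.kubernetes.io/name), so a delimiter other than the dot keeps such keys intact, and the labels column selects everything under the '.k8s.labels#' prefix and renders it as JSON. The Python sketch below mimics that effect on a hypothetical set of name-value pairs. It only illustrates the idea and is not how syslog-ng implements format-json, so the exact output formatting may differ.

```python
import json

# Prefix used by the --subkeys option of format-json in the demo config.
LABEL_PREFIX = ".k8s.labels#"


def labels_as_json(name_value_pairs, prefix=LABEL_PREFIX):
    """Select the pairs under the prefix, drop the prefix, and render JSON."""
    labels = {
        name[len(prefix):]: value
        for name, value in name_value_pairs.items()
        if name.startswith(prefix)
    }
    return json.dumps(labels, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical name-value pairs of a single log message.
    message = {
        ".k8s.namespace_name": "default",
        ".k8s.pod_name": "log-generator-97c6b5b48-j285z",
        ".k8s.labels#app.kubernetes.io/name": "log-generator",
        ".k8s.labels#pod-template-hash": "97c6b5b48",
    }
    print(labels_as_json(message))
```

Because the label key keeps its dots, the resulting JSON has a single flat "app.kubernetes.io/name" key instead of nested objects.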

You will need to:

  • Change the project(), dataset(), and table() options of the bigquery() destination to match your environment.
  • If your BigQuery table schema differs from the one shown in this demo, update the schema() option, too. On the left side of the arrow, set the name of the column and its type. On the right side, set any syslog-ng template or macro; it is evaluated for each log message that is routed to the bigquery() destination. The available column types are: STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP, DATE, TIME, DATETIME, JSON, NUMERIC, BIGNUMERIC, GEOGRAPHY, RECORD, INTERVAL.
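As an example of how a template maps to a column, the timestamp column in the demo config concatenates ${R_UNIXTIME} (seconds since the epoch) and ${R_USEC} (the microsecond part). The Python sketch below mimics that concatenation under two assumptions of ours: that the microsecond part is zero-padded to six digits, and that the TIMESTAMP column is filled with integer microseconds since the epoch. The function names are hypothetical and exist only for this illustration.

```python
import calendar
from datetime import datetime, timezone


def timestamp_template(received):
    """Mimic "${R_UNIXTIME}${R_USEC}" for a timezone-aware datetime."""
    seconds = calendar.timegm(received.utctimetuple())
    # Zero-pad the microseconds so the concatenation stays a valid
    # microsecond count (assumption: ${R_USEC} is padded the same way).
    return f"{seconds}{received.microsecond:06d}"


def micros_to_datetime(micros):
    """Convert microseconds since the epoch back to a UTC datetime."""
    seconds, microsecond = divmod(int(micros), 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=microsecond
    )


if __name__ == "__main__":
    received = datetime(2024, 1, 2, 3, 4, 5, 42, tzinfo=timezone.utc)
    value = timestamp_template(received)
    print(value)
    # The round trip should give back the original receive time.
    print(micros_to_datetime(value) == received)
```

Without the zero-padding, a receive time with 42 microseconds would produce a much smaller number and therefore a wrong timestamp, which is why the padding assumption matters.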

We can now add the helm repo and install the collector with our configuration:

helm repo add axosyslog https://axoflow.github.io/axosyslog-charts
helm repo update
helm install axosyslog-bigquery axosyslog/axosyslog-collector -f bigquery.yaml

We can validate that our AxoSyslog Collector is up and running:

kubectl get pods

Expected output:

NAME                                           READY   STATUS    RESTARTS   AGE
axosyslog-bigquery-axosyslog-collector-49b84   1/1     Running   0          4m
log-generator-97c6b5b48-8rnpz                  1/1     Running   0          3m
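
If you want to script this check, for example in a smoke test, the output can be parsed. The following Python sketch assumes the default column layout of kubectl get pods shown above (NAME, READY, STATUS, ...) and returns the pods that are not Running with all of their containers ready. The function name is hypothetical.

```python
def pods_not_ready(kubectl_output):
    """Return the names of pods that are not Running with all containers ready."""
    problems = []
    # Skip the header row, then read the first three columns of each line.
    for line in kubectl_output.strip().splitlines()[1:]:
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


if __name__ == "__main__":
    sample = """\
NAME                                           READY   STATUS    RESTARTS   AGE
axosyslog-bigquery-axosyslog-collector-49b84   1/1     Running   0          4m
log-generator-97c6b5b48-8rnpz                  1/1     Running   0          3m
"""
    # An empty list means every listed pod is up and ready.
    print(pods_not_ready(sample))
```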

Check the logs in BigQuery

After navigating to BigQuery in the Google Cloud Console, we can run a query to see our ingested logs, for example:

SELECT * FROM `syslog-ng-bigquery-demo-project.syslog_ng_bigquery_demo_dataset.syslog-ng-bigquery-demo-table` LIMIT 1000

[Figure: Sample logs sent with syslog-ng to BigQuery]

Summary

Google BigQuery can be a viable way to store and analyze large-scale log and telemetry data. With its high performance and wide range of supported data sources, syslog-ng allows you to send all kinds of data, from legacy sources (like syslog) to modern ones (like OpenTelemetry), directly to BigQuery. This helps you simplify your logging and telemetry infrastructure by reducing the number of components you need to get your data from its original source to your BigQuery data warehouse.
