# HDFS destination options

The `hdfs` destination stores the log messages in files on the Hadoop Distributed File System (HDFS). The `hdfs` destination has the following options.

The following options are required: `hdfs-file()` and `hdfs-uri()`. To use the `hdfs` destination, you must add the following line to the beginning of your AxoSyslog configuration:
```
@include "scl.conf"
```
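
As a minimal sketch of how the required options fit together, the following configuration sends local logs to HDFS. The `s_local` and `d_hdfs` names, the NameNode address, the HDFS file path, and the `/opt/hadoop/libs/` library directory are placeholder assumptions; adjust them to your environment.

```
@include "scl.conf"

# Hypothetical source: local system logs and AxoSyslog's internal messages
source s_local {
    system();
    internal();
};

destination d_hdfs {
    hdfs(
        # Module directory plus the directory of the Hadoop client libraries (assumed path)
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        # Required: URI of the HDFS NameNode (placeholder address)
        hdfs-uri("hdfs://10.140.32.80:8020")
        # Required: target file on HDFS (placeholder path)
        hdfs-file("/user/log/logfile.txt")
    );
};

log {
    source(s_local);
    destination(d_hdfs);
};
```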

## client-lib-dir()

Type: | string  
---|---  
Default: | The AxoSyslog module directory: `/opt/syslog-ng/lib/syslog-ng/java-modules/`  
  
_Description:_ The list of the paths where the required Java classes are located. For example, `client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/my-java-libraries/libs/")`. If you set this option multiple times in your AxoSyslog configuration (for example, because you have multiple Java-based destinations), AxoSyslog merges all the paths into a single list.

For the `hdfs` destination, include the path to the directory where you copied the required libraries (see [Prerequisites](../../../docs/axosyslog-core/chapter-destinations/configuring-destinations-hdfs/destination-hdfs-prerequisites/index.md)), for example, `client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")`.

## disk-buffer()

_Description:_ This option enables putting outgoing messages into the disk buffer of the destination to avoid message loss in case of a system failure on the destination side. It has the following options:

### capacity-bytes()

Type: | number (bytes)  
---|---  
Default: | 1MiB  
  
_Description:_ This is a required option. The maximum size of the disk-buffer in bytes. The minimum value is `1048576` bytes. If you set a smaller value, the minimum value will be used automatically. It replaces the old `log-disk-fifo-size()` option.

In AxoSyslog version 4.2 and earlier, this option was called `disk-buf-size()`.

### compaction()

Type: | yes/no  
---|---  
Default: | no  
  
_Description:_ If set to `yes`, AxoSyslog prunes the unused space in the LogMessage representation, making the disk queue size smaller at the cost of some CPU time. Setting the `compaction()` argument to `yes` is recommended when numerous name-value pairs are unset during processing, or when the same names are set multiple times.

Note Simply unsetting these name-value pairs by using the `unset()` rewrite operation is not enough: for performance reasons that help when AxoSyslog is CPU-bound, the internal representation of a `LogMessage` does not release the memory associated with these name-value pairs. In some cases, however, the size of this overhead becomes significant (the raw message size can grow up to four times its original size), which unnecessarily increases the disk queue file size. For these cases, compaction drops `unset` values, making the `LogMessage` representation smaller at the cost of some CPU time required to perform compaction. 

### dir()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ Defines the folder where the disk-buffer files are stored.

Warning

When creating a new `dir()` option for a disk buffer, or modifying an existing one, make sure you delete the persist file.

AxoSyslog creates disk-buffer files based on the path recorded in the persist file. Therefore, if the persist file is not deleted after modifying the `dir()` option, then following a restart, AxoSyslog will look for or create disk-buffer files in their old location. To ensure that AxoSyslog uses the new `dir()` setting, the persist file must not contain any information about the destinations which the disk-buffer file in question belongs to.

Note If the `dir()` path provided by the user does not exist, AxoSyslog creates the path with the same permissions as the running instance. 

### flow-control-window-bytes()

Type: | number (bytes)  
---|---  
Default: | 163840000  
  
_Description:_ Use this option if the `reliable()` option is set to `yes`. This option sets the size (in bytes) of the messages held in the memory part of the disk buffer. It replaces the old `log-fifo-size()` option. It does not inherit the value of the global `log-fifo-size()` option, even if it is provided. This option is ignored if `reliable()` is set to `no`.

In AxoSyslog version 4.2 and earlier, this option was called `mem-buf-size()`.

### flow-control-window-size()

Type: | number (messages)  
---|---  
Default: | 10000  
  
_Description:_ Use this option if the `reliable()` option is set to `no`. This option sets the number of messages stored in the overflow queue. It replaces the old `log-fifo-size()` option. It inherits the value of the global `log-fifo-size()` option if provided. If it is not provided, the default value is `10000` messages. This option is ignored if `reliable()` is set to `yes`.

In AxoSyslog version 4.2 and earlier, this option was called `mem-buf-length()`.

### front-cache-size()

Type: | number (messages)  
---|---  
Default: | 1000  
  
_Description:_ The number of messages stored in the output buffer of the destination. Note that if you change the value of this option and the disk-buffer already exists, the change will take effect when the disk-buffer becomes empty.

The `reliable()` and `capacity-bytes()` options are required.

In AxoSyslog version 4.2 and earlier, this option was called `qout-size()`.

### prealloc()

Type: | yes/no  
---|---  
Default: | no  
  
_Description:_

By default, AxoSyslog doesn’t reserve disk space for the disk-buffer file, because in a properly configured and sized environment the disk-buffer is practically empty, so a large preallocated disk-buffer file is a waste of disk space. However, a preallocated buffer prevents other data from using the intended buffer space (and elicits a warning from the OS if disk space is low), so messages are not lost when the buffer is actually needed. To get this protection when using AxoSyslog 4.0 or later, preallocate the space for your disk-buffer files by setting `prealloc(yes)`.

In addition to making sure that the required disk space is available when needed, preallocated disk-buffer files provide radically better (3-4x) performance: during an outage, the number of messages stored in the disk-buffer grows continuously, and writing to a large, preallocated file is faster than constantly waiting for a file to change its size.

If you are running AxoSyslog on a dedicated host (always recommended for any high-volume settings), use `prealloc(yes)`.

Available in AxoSyslog 4.0 and later.
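
For example, a sketch of a reliable, preallocated disk buffer on an `hdfs()` destination might look like the following. The NameNode address, the paths, and the 1 GiB capacity are illustrative assumptions, not recommendations.

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        disk-buffer(
            reliable(yes)
            # 1 GiB (1073741824 bytes), reserved on disk up front because of prealloc(yes)
            capacity-bytes(1073741824)
            prealloc(yes)
            # Hypothetical directory for the disk-buffer files
            dir("/var/lib/syslog-ng/disk-buffer")
        )
    );
};
```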

### reliable()

Type: | yes/no  
---|---  
Default: | no  
  
_Description:_ If set to `yes`, AxoSyslog cannot lose logs in case of reload/restart, unreachable destination or AxoSyslog crash. This solution provides a slower, but reliable disk-buffer option. It is created and initialized at startup and gradually grows as new messages arrive. If set to `no`, the normal disk-buffer will be used. This provides a faster, but less reliable disk-buffer option.

Warning Hazard of data loss! If you change the value of `reliable()` option when there are messages in the disk-buffer, the messages stored in the disk-buffer will be lost. 

### truncate-size-ratio()

Type: | number (between 0 and 1)  
---|---  
Default: | 1 (do not truncate)  
  
_Description:_ Limits the truncation of the disk-buffer file. Truncating the disk-buffer file can slow down disk I/O operations, but it saves disk space. AxoSyslog version 4.0 and later doesn’t truncate disk-buffer files by default (`truncate-size-ratio(1)`). Earlier versions freed the disk space when at least 10% of the disk-buffer file could be freed (`truncate-size-ratio(0.1)`).

AxoSyslog only truncates the file if the possible disk gain is more than `truncate-size-ratio()` times `capacity-bytes()`.

  * Smaller values free disk space quicker.
  * Larger ratios result in better performance.
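
As a worked example of this rule (the values are chosen only for illustration), with `capacity-bytes(104857600)` (100 MiB) and `truncate-size-ratio(0.1)`, AxoSyslog truncates the disk-buffer file only when more than 10 MiB could be freed:

$$
\text{freeable space} > 0.1 \times 104\,857\,600\ \text{bytes} = 10\,485\,760\ \text{bytes} = 10\ \text{MiB}
$$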



If you want to avoid performance fluctuations:

  * use `truncate-size-ratio(1)` (never truncate), or
  * use `prealloc(yes)` to reserve the entire size of the disk-buffer on disk.



Warning Axoflow recommends that you do not change `truncate-size-ratio()`. Only change its value if you understand the performance implications of doing so. 

### Example: Using disk-buffer()

The following example uses a reliable disk buffer.
```
destination d_demo {
    network(
        "127.0.0.1"
        port(3333)
        disk-buffer(
            flow-control-window-bytes(10000)
            capacity-bytes(2000000)
            reliable(yes)
            dir("/tmp/disk-buffer")
        )
    );
};
```

The following example uses a normal (non-reliable) disk buffer.
```
destination d_demo {
    network(
        "127.0.0.1"
        port(3333)
        disk-buffer(
            flow-control-window-size(10000)
            capacity-bytes(2000000)
            reliable(no)
            dir("/tmp/disk-buffer")
        )
    );
};
```

## frac-digits()

Type: | number  
---|---  
Default: | 0  
  
_Description:_ The AxoSyslog application can store fractions of a second in the timestamps according to the ISO8601 format. The `frac-digits()` parameter specifies the number of digits stored. The digits storing the fractions are padded with zeros if the original timestamp of the message specifies only seconds. Fractions can always be stored for the time the message was received.

Note The AxoSyslog application can add the fractions to non-ISO8601 timestamps as well. 

Note As AxoSyslog is precise up to the microsecond, when the `frac-digits()` option is set to a value higher than 6, AxoSyslog will truncate the fraction seconds in the timestamps after 6 digits. 
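
A minimal sketch, assuming the destination uses its default template so that the timestamp format follows `ts-format()`: the following keeps millisecond precision in ISO 8601 timestamps. The NameNode address and paths are placeholders.

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        # ISO 8601 timestamps with three fractional digits (milliseconds)
        ts-format(iso)
        frac-digits(3)
    );
};
```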

## hdfs-append-enabled()

Type: | `true` or `false`  
---|---  
Default: | `false`  
  
_Description:_ When `hdfs-append-enabled` is set to `true`, AxoSyslog will append new data to the end of an already existing HDFS file. Note that in this case, archiving is automatically disabled, and AxoSyslog will ignore the `hdfs-archive-dir` option.

When `hdfs-append-enabled` is set to `false`, the AxoSyslog application always creates a new file if the previous one has been closed. In that case, appending data to existing files is not supported.

When you choose to write data into an existing file, AxoSyslog does not extend the filename with a UUID suffix because there is no need to open a new file (a new unique ID would mean opening a new file and writing data into that).

Warning Before enabling the `hdfs-append-enabled` option, ensure that your HDFS server supports the `append` operation and that it is enabled. Otherwise AxoSyslog will not be able to append data into an existing file, resulting in an error log. 
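
A minimal sketch of appending to an existing file, assuming that the `append` operation is enabled on your HDFS server (the NameNode address and paths are placeholders):

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        # New data is appended to this file; no UUID suffix is added
        hdfs-file("/user/log/logfile.txt")
        # Archiving is disabled in this mode, so hdfs-archive-dir() would be ignored
        hdfs-append-enabled(true)
    );
};
```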

## hdfs-archive-dir()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ The path where AxoSyslog will move the closed log files. If AxoSyslog cannot move the file for some reason (for example, AxoSyslog cannot connect to the HDFS NameNode), the file remains at its original location. For example, `hdfs-archive-dir("/usr/hdfs/archive/")`.

Note When `hdfs-append-enabled` is set to `true`, archiving is automatically disabled, and AxoSyslog will ignore the `hdfs-archive-dir` option. 

## hdfs-file()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ The path and name of the log file. For example, `hdfs-file("/usr/hdfs/mylogfile.txt")`. AxoSyslog checks if the path to the log file exists. If a directory does not exist, AxoSyslog automatically creates it.

`hdfs-file()` supports the usage of macros. This means that AxoSyslog can create files on HDFS dynamically, using macros in the file (or directory) name.

Note When a filename resolved from the macros contains a character that HDFS does not support, AxoSyslog will not be able to create the file. Make sure that you use macros that do not contain unsupported characters. 

### Example: Using macros in filenames

In the following example, a `/var/testdb_working_dir/$DAY-$HOUR.txt` file will be created (with a UUID suffix):
```
destination d_hdfs_9bf3ff45341643c69bf46bfff940372a {
    hdfs(
        client-lib-dir("/hdfs-libs")
        hdfs-uri("hdfs://hdp2.syslog-ng.example:8020")
        hdfs-file("/var/testdb_working_dir/$DAY-$HOUR.txt")
    );
};
```

As an example, if it is the 31st day of the month and it is 12 o’clock, then the name of the file will be `31-12.txt`.

## hdfs-max-filename-length()

Type: | number  
---|---  
Default: | 255  
  
_Description:_ The maximum length of the filename. This filename (including the UUID that AxoSyslog appends to it) cannot be longer than what the file system permits. If the filename is longer than the value of `hdfs-max-filename-length`, AxoSyslog will automatically truncate the filename. For example, `hdfs-max-filename-length("255")`.

## hdfs-resources()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ The list of Hadoop resources to load, separated by semicolons. For example, `hdfs-resources("/home/user/hadoop/core-site.xml;/home/user/hadoop/hdfs-site.xml")`.

## hdfs-uri()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ The URI of the HDFS NameNode, in `hdfs://IPaddress:port` or `hdfs://hostname:port` format. For example, `hdfs-uri("hdfs://10.140.32.80:8020")`. When using MapR-FS, the URI of the MapR-FS NameNode is in `maprfs://IPaddress` or `maprfs://hostname` format, for example: `maprfs://10.140.32.80`. The IP address of the node can be IPv4 or IPv6. IPv6 addresses must be enclosed in square brackets (`[]`) as specified by RFC 2732, for example, `hdfs-uri("hdfs://[FEDC:BA98:7654:3210:FEDC:BA98:7654:3210]:8020")`.

## hook-commands()

_Description:_ This option makes it possible to execute external programs when the relevant driver is initialized or torn down. You can use `hook-commands()` with all source and destination drivers except the `usertty()` and `internal()` drivers.

Note The AxoSyslog application must be able to start and restart the external program, and have the necessary permissions to do so. For example, if your host is running AppArmor or SELinux, you might have to modify your AppArmor or SELinux configuration to enable AxoSyslog to execute external applications. 

### Using `hook-commands()` when AxoSyslog starts or stops

To execute an external program when AxoSyslog starts or stops, use the following options:

#### `startup()`

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ Defines the external program that is executed as AxoSyslog starts.

#### `shutdown()`

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ Defines the external program that is executed as AxoSyslog stops.

### Using `hook-commands()` when AxoSyslog reloads

To execute an external program when the AxoSyslog configuration is initiated or torn down, for example, on startup/shutdown or during an AxoSyslog reload, use the following options:

#### `setup()`

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ Defines an external program that is executed when the AxoSyslog configuration is initiated, for example, on startup or during an AxoSyslog reload.

#### `teardown()`

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ Defines an external program that is executed when the AxoSyslog configuration is stopped or torn down, for example, on shutdown or during an AxoSyslog reload.

### Example: Using hook-commands() with a network source

In the following example, `hook-commands()` is used with the `network()` driver to automatically open an [iptables](<https://en.wikipedia.org/wiki/Iptables> "https://en.wikipedia.org/wiki/Iptables") port when AxoSyslog starts, and to close it when AxoSyslog stops.

The example assumes that the `LOGCHAIN` chain is part of a larger ruleset that routes traffic to it. While the AxoSyslog-created rule exists, packets can flow; otherwise, the port is closed.
```
source s_network {
    network(
        transport(udp)
        hook-commands(
            startup("iptables -I LOGCHAIN 1 -p udp --dport 514 -j ACCEPT")
            shutdown("iptables -D LOGCHAIN 1")
        )
    );
};
```
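
The example above uses `startup()` and `shutdown()`. As a sketch of the reload-related hooks, the following destination runs `setup()` and `teardown()` commands, assuming the `hdfs()` destination accepts `hook-commands()` like the other driver options listed on this page. Using `logger` to record the events is an illustrative choice; any program that AxoSyslog has permission to run works the same way. The NameNode address and paths are placeholders.

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        hook-commands(
            # Runs when the configuration is initialized (startup or reload)
            setup("logger -t axosyslog-hook 'hdfs destination initialized'")
            # Runs when the configuration is torn down (shutdown or reload)
            teardown("logger -t axosyslog-hook 'hdfs destination torn down'")
        )
    );
};
```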

## jvm-options()

Type: | list  
---|---  
Default: | N/A  
  
_Description:_ Specify the Java Virtual Machine (JVM) settings of your Java destination from the AxoSyslog configuration file.

For example:
```
options {
    jvm-options("-Xss1M -XX:+TraceClassLoading");
};
```

You can set this option only as a [global option](../../../docs/axosyslog-core/chapter-global-options/index.md), by adding it to the `options` statement of the `syslog-ng.conf` configuration file.

## kerberos-keytab-file()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ The path to the Kerberos keytab file that you received from your Kerberos administrator. For example, `kerberos-keytab-file("/opt/syslog-ng/etc/hdfs.headless.keytab")`. This option is needed only if you want to authenticate using Kerberos in Hadoop. You also have to set the [`kerberos-principal()`](../../../docs/axosyslog-core/chapter-destinations/configuring-destinations-hdfs/reference-destination-hdfs/index.md) option. For details on using Kerberos authentication with the `hdfs()` destination, see [Kerberos authentication with the hdfs() destination](../../../docs/axosyslog-core/chapter-destinations/configuring-destinations-hdfs/destination-hdfs-kerberos-authentication/index.md).
```
destination d_hdfs {
    hdfs(
        client-lib-dir("/hdfs-libs/lib")
        hdfs-uri("hdfs://hdp-kerberos.syslog-ng.example:8020")
        kerberos-keytab-file("/opt/syslog-ng/etc/hdfs.headless.keytab")
        kerberos-principal("hdfs-hdpkerberos@MYREALM")
        hdfs-file("/var/hdfs/test.log")
    );
};
```

Available in AxoSyslog version 3.10 and later.

## kerberos-principal()

Type: | string  
---|---  
Default: | N/A  
  
_Description:_ The Kerberos principal you want to authenticate with. For example, `kerberos-principal("hdfs-user@MYREALM")`. This option is needed only if you want to authenticate using Kerberos in Hadoop. You also have to set the [`kerberos-keytab-file()`](../../../docs/axosyslog-core/chapter-destinations/configuring-destinations-hdfs/reference-destination-hdfs/index.md) option. For details on using Kerberos authentication with the `hdfs()` destination, see [Kerberos authentication with the hdfs() destination](../../../docs/axosyslog-core/chapter-destinations/configuring-destinations-hdfs/destination-hdfs-kerberos-authentication/index.md).
```
destination d_hdfs {
    hdfs(
        client-lib-dir("/hdfs-libs/lib")
        hdfs-uri("hdfs://hdp-kerberos.syslog-ng.example:8020")
        kerberos-keytab-file("/opt/syslog-ng/etc/hdfs.headless.keytab")
        kerberos-principal("hdfs-hdpkerberos@MYREALM")
        hdfs-file("/var/hdfs/test.log")
    );
};
```

Available in AxoSyslog version 3.10 and later.

## log-fifo-size()

Type: | number  
---|---  
Default: | Use global setting.  
  
_Description:_ The number of messages that the output queue can store.

## on-error()

Type: | One of: `drop-message`, `drop-property`, `fallback-to-string`, `silently-drop-message`, `silently-drop-property`, `silently-fallback-to-string`  
---|---  
Default: | Use the global setting (which defaults to `drop-message`)  
  
_Description:_ Controls what happens when type-casting fails and AxoSyslog cannot convert some data to the specified type. By default, AxoSyslog drops the entire message and logs the error. Currently the `value-pairs()` option uses the settings of `on-error()`.

  * `drop-message`: Drop the entire message and log an error message to the `internal()` source. This is the default behavior of AxoSyslog.
  * `drop-property`: Omit the affected property (macro, template, or message-field) from the log message and log an error message to the `internal()` source.
  * `fallback-to-string`: Convert the property to string and log an error message to the `internal()` source.
  * `silently-drop-message`: Drop the entire message silently, without logging the error.
  * `silently-drop-property`: Omit the affected property (macro, template, or message-field) silently, without logging the error.
  * `silently-fallback-to-string`: Convert the property to string silently, without logging the error.
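
Because the default comes from the global setting, one way to change the behavior for the whole configuration is the `options` statement. A minimal sketch; `fallback-to-string` is only an illustrative choice:

```
options {
    # Keep messages whose values cannot be type-cast, converting those values to strings
    on-error("fallback-to-string");
};
```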



## retries()

Type: | number (of attempts)  
---|---  
Default: | 3  
  
_Description:_ If AxoSyslog cannot send a message, it tries again until the number of attempts reaches `retries()`.

If the number of attempts reaches `retries()`, AxoSyslog waits for the time set in `time-reopen()`, then tries sending the message again.
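
For example, the following sketch attempts to send each message up to five times, then waits 30 seconds before trying again. Setting `time-reopen()` in the global `options` statement, the values, the NameNode address, and the paths are illustrative assumptions.

```
options {
    # Seconds to wait after the retries are exhausted
    time-reopen(30);
};

destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        # Attempt to send a message up to 5 times before waiting for time-reopen()
        retries(5)
    );
};
```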

## template()

Type: | string  
---|---  
Default: | A format conforming to the default logfile format.  
  
_Description:_ Specifies a template defining the log format to be used in the destination. Macros are described in [Macros of AxoSyslog](../../../docs/axosyslog-core/chapter-manipulating-messages/customizing-message-format/reference-macros/index.md). Note that for network destinations it might not be appropriate to change the template, as it changes the on-wire format of the syslog protocol, which might not be tolerated by stock syslog receivers (like `syslogd` or `syslog-ng` itself). For network destinations, make sure the receiver can cope with the custom format defined.
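
For example, a sketch of an `hdfs()` destination that writes one line per message with an ISO 8601 timestamp, the host name, and the message. The template string is an illustrative choice (not necessarily the default format), and the NameNode address and paths are placeholders.

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        # Timestamp, host, message header and text; \n terminates each line
        template("$ISODATE $HOST $MSGHDR$MSG\n")
    );
};
```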

## throttle()

Type: | number  
---|---  
Default: | 0  
  
_Description:_ Sets the maximum number of messages sent to the destination per second. Use this output-rate-limiting functionality only when using disk-buffer as well to avoid the risk of losing messages. Specifying `0` or a lower value sets the output limit to unlimited.
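
Following the recommendation to combine rate limiting with a disk buffer, a sketch might look like this. The rate of 1000 messages per second, the buffer size, the NameNode address, and the paths are illustrative assumptions.

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        # Send at most 1000 messages per second
        throttle(1000)
        # Hold messages that are waiting to be sent, so they are not lost
        disk-buffer(
            reliable(yes)
            capacity-bytes(1073741824)
            dir("/var/lib/syslog-ng/disk-buffer")
        )
    );
};
```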

## time-reap()

Type: | number (seconds)  
---|---  
Default: | 0 (disabled)  
  
_Description:_ The time to wait in seconds before an idle destination file is closed. Note that if the `hdfs-archive-dir()` option is set and `time-reap()` expires, archiving is triggered for the affected file.
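
For example, the following sketch writes a separate file for each sending host, closes a file after 10 minutes without new messages, and then moves it to an archive directory. The paths, the NameNode address, and the 600-second value are illustrative assumptions.

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        # One file per host, created from the $HOST macro
        hdfs-file("/user/log/$HOST/logfile.txt")
        # Close files that receive no messages for 600 seconds...
        time-reap(600)
        # ...which triggers moving the closed file to this directory
        hdfs-archive-dir("/user/log/archive/")
    );
};
```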

## time-zone()

Type: | name of the timezone, or the timezone offset  
---|---  
Default: | unspecified  
  
_Description:_ Convert timestamps to the timezone specified by this option. If this option is not set, then the original timezone information in the message is used. Converting the timezone changes the values of all date-related macros derived from the timestamp, for example, `HOUR`. For the complete list of such macros, see [Date-related macros](../../../docs/axosyslog-core/chapter-manipulating-messages/customizing-message-format/date-macros/index.md).

The timezone can be specified by using the name, for example, `time-zone("Europe/Budapest")`, or as the timezone offset in +/-HH:MM format, for example, `+01:00`. On Linux and UNIX platforms, the valid timezone names are listed under the `/usr/share/zoneinfo` directory.
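
A minimal sketch, assuming you want the timestamps written to HDFS in UTC regardless of the sender's timezone (the NameNode address and paths are placeholders):

```
destination d_hdfs {
    hdfs(
        client-lib-dir("/opt/syslog-ng/lib/syslog-ng/java-modules/:/opt/hadoop/libs/")
        hdfs-uri("hdfs://10.140.32.80:8020")
        hdfs-file("/user/log/logfile.txt")
        # Convert timestamps (and date-related macros such as HOUR) to UTC
        time-zone("UTC")
    );
};
```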

## ts-format()

Type: | `rfc3164`, `bsd`, `rfc3339`, `iso`  
---|---  
Default: | `rfc3164`  
  
_Description:_ Override the global timestamp format (set in the global `ts-format()` parameter) for the specific destination. For details, see [ts-format()](../../../docs/axosyslog-core/chapter-global-options/reference-options/index.md).

Note This option applies only to file and file-like destinations. Destinations that use specific protocols (for example, `network()`, or `syslog()`) ignore this option. For protocol-like destinations, use a template locally in the destination, or use the [proto-template](../../../docs/axosyslog-core/chapter-global-options/reference-options/index.md) option. 

Last modified July 2, 2023: [Change highlight mode of code examples (2f8a9593)](<https://github.com/axoflow/axosyslog-core-docs/commit/2f8a95937c6498193e7168ce8b0dc831e9f0f8ad>)