# File Collector

Collect logs from a local file that’s available on the edge host.

## Prerequisites

This collector can be deployed to edge hosts running [Axoflow agent for Linux](../../../docs/axoflow/provisioning/linux-agent/index.md) and [Axoflow agent for Windows](../../../docs/axoflow/provisioning/windows-opentelemetry/index.md).

## Add new File Collector

To create a new [Collection Rule](../../../docs/axoflow/data-sources/connector-rules/index.md) that collects data from files on edge hosts, complete the following steps:

  1. Select **Sources > Collection Rules > Add Rule**. (Alternatively, you can select **Add Collector > Create a collection rule** on the **Collectors** page of an edge host.)

![Collection rules list](/docs/axoflow/img/data-management/collection-rules-list.png)

  2. Select **File Collector**.

  3. Configure the collection rule.

     1. Enter a name for the collection rule into the **Rule Name** field.

![Generic collection rule parameters](/docs/axoflow/img/collection-rule-generic.png)

     2. (Optional) Add labels to the collection rule.

You can use these labels as:

        * **Filter labels** on the [Analytics page](../../../docs/axoflow/metrics/analytics/index.md)
        * in the **Filter By Label** field during [log tapping](../../../docs/axoflow/onboard-hosts/log-tapping/index.md)
        * in [Flow Processing steps](../../../docs/axoflow/data-management/processing/index.md), for example, in the **Query** field of **Select Messages** steps.

For edge-related metrics, see the metrics beginning with [`edge_connector`](../../../docs/axoflow/reference/message-schema/reference/index.md#meta.edge.connector.labels).

     3. Set the **Edge Selector** for the collection rule. The selector determines which edge hosts will have a collector based on this collection rule.

![Edge selectors](/docs/axoflow/img/collection-rule-edge-selector.png)

        * Only edge hosts will match the rule.
        * If you leave the **Edge Selector** field empty, the rule will match every edge host.
        * To select a specific host, set the `name` field of the selector to the name of the host.
        * If you set multiple fields in the selector, the collection rule applies only to edge hosts that match all elements of the selector. (There is an AND relationship between the fields.) For example, `label.location = us-east-1 AND label.product = windows`
     4. (Optional) Enter a **Suffix** for the collection rule. This suffix will be used in the name of the collector instances created on the edge hosts. For example, if the name of a matching edge host is “my-edge”, and the suffix of the rule is “otel-file-collector”, the collector created for the edge will be named “my-edge-otel-file-collector”.

If the **Suffix** field is empty, the name of the collection rule is used instead.

     5. (Optional) Enter a description for the rule.

  4. In the **File pattern** field, enter the path of the log file, or a pattern that matches multiple files, for example: `C:\Windows\System32\DNS\dns.log` or `/path/to/**/*.log`

![OpenTelemetry file collector settings](/docs/axoflow/data-sources/collection-rules/file-collector/connector.png)

**CAUTION:**

On Linux hosts, the collector runs as the `axoflow-otel-collector` user, which is a member of the `adm` and `systemd-journal` groups. Make sure that the `axoflow-otel-collector` user has read access to the file you want to collect logs from. Usually, the `adm` group can read logs from the `/var/log/` directory on Debian-based systems, but not on RHEL-based systems. 

You can use the following special characters in the **File pattern** field:

     * `*`: Matches zero or more characters that aren’t path separators.

     * `/**/`: Matches zero or more directories.

     * `?`: Matches a single non-path-separator character.

     * `[class]`: Matches any single non-path-separator character from the specified class. The following classes are available:

       * `[abc123]`: Matches any single character of the specified characters.
       * `[a-z0-9]`: Matches any single alphanumeric character in the range of a-z or 0-9.
       * `[^class]` or `[!class]`: Negates the class, so it matches any single character which does not match the class.
  5. (Optional) Set [advanced options](../../../docs/axoflow/data-sources/collection-rules/file-collector/index.md#advanced-options) under **More options**.

  6. To apply a specific parser to the messages of the log file, select it in the **Log format** field. Currently, Windows DNS and DHCP log files are supported.

  7. Select **Add**. Based on the collection rule, Axoflow automatically creates collectors on the edge hosts that match the **Edge Selector**.
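The matching and naming rules of the **Edge Selector** and **Suffix** fields can be modeled in a few lines. The following Go sketch is a simplified, hypothetical model, not Axoflow's implementation: the `Edge` type and the `matchesSelector` and `collectorName` functions are illustrative names. It assumes that selector fields are combined with AND, that an empty selector matches every edge host, and that the collector name is the edge host name followed by the suffix, or by the rule name if the suffix is empty.

```go
package main

import (
	"fmt"
	"strings"
)

// Edge is a hypothetical, simplified model of an edge host.
type Edge struct {
	Name   string
	Labels map[string]string
}

// matchesSelector reports whether edge satisfies every field of selector.
// Fields are combined with AND, so an empty selector matches every edge.
// Keys are "name" or "label.<key>", mirroring the examples in the text;
// any other key never matches in this simplified model.
func matchesSelector(selector map[string]string, edge Edge) bool {
	for key, want := range selector {
		var got string
		var ok bool
		switch {
		case key == "name":
			got, ok = edge.Name, true
		case strings.HasPrefix(key, "label."):
			got, ok = edge.Labels[strings.TrimPrefix(key, "label.")]
		}
		if !ok || got != want {
			return false
		}
	}
	return true
}

// collectorName joins the edge host name and the rule's suffix with a
// hyphen, falling back to the rule name when the suffix is empty.
func collectorName(edgeName, ruleName, suffix string) string {
	if suffix == "" {
		suffix = ruleName
	}
	return edgeName + "-" + suffix
}

func main() {
	selector := map[string]string{
		"label.location": "us-east-1",
		"label.product":  "windows",
	}
	edges := []Edge{
		{Name: "my-edge", Labels: map[string]string{"location": "us-east-1", "product": "windows"}},
		{Name: "other-edge", Labels: map[string]string{"location": "us-east-1", "product": "linux"}},
	}
	for _, e := range edges {
		if matchesSelector(selector, e) {
			fmt.Println(collectorName(e.Name, "file-rule", "otel-file-collector"))
		}
	}
}
```

Running the sketch prints only `my-edge-otel-file-collector`: the second host matches the `location` label but not the `product` label, so under the AND relationship no collector is created for it.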

**CAUTION:**

Make sure to configure [Data Forwarding Rules](../../../docs/axoflow/data-sources/data-forwarding/index.md) for your edge hosts to transfer the collected data to the OpenTelemetry connector of an AxoRouter. 
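The special characters of the **File pattern** field can be illustrated with a short Go sketch. It's a simplified model, not Axoflow's matcher, and works under these assumptions: paths use `/` as the separator (so Windows paths such as `C:\Windows\System32\DNS\dns.log` aren't handled), a `**` path segment matches zero or more directories, and every other segment is delegated to Go's standard `path.Match`. Because `path.Match` supports `*`, `?`, and `[class]` with `^` negation but not `!`, the sketch rewrites `[!` to `[^`. The `matchFilePattern` and `matchSegments` names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSegments matches pattern segments against path segments.
// A "**" segment matches zero or more whole directories; every other
// segment is matched with path.Match, so "*", "?", and "[class]" never
// cross a "/" separator.
func matchSegments(pat, name []string) bool {
	if len(pat) == 0 {
		return len(name) == 0
	}
	if pat[0] == "**" {
		// Try letting "**" consume 0, 1, 2, ... directories.
		for i := 0; i <= len(name); i++ {
			if matchSegments(pat[1:], name[i:]) {
				return true
			}
		}
		return false
	}
	if len(name) == 0 {
		return false
	}
	// path.Match only understands "[^class]", so rewrite "[!class]" to it.
	seg := strings.ReplaceAll(pat[0], "[!", "[^")
	ok, err := path.Match(seg, name[0])
	if err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], name[1:])
}

// matchFilePattern reports whether a slash-separated file path matches pattern.
func matchFilePattern(pattern, filePath string) bool {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(filePath, "/"))
}

func main() {
	pattern := "/path/to/**/*.log"
	for _, p := range []string{
		"/path/to/app.log",
		"/path/to/a/b/app.log",
		"/path/to/app.txt",
	} {
		fmt.Println(p, matchFilePattern(pattern, p))
	}
}
```

For the pattern `/path/to/**/*.log`, the first two paths match (`**` consumes zero and two directories, respectively), while `/path/to/app.txt` doesn't match `*.log`.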




## Related metrics

You can use these metric labels as:

  * **Filter labels** on the [Analytics page](../../../docs/axoflow/metrics/analytics/index.md)
  * in the **Filter By Label** field during [log tapping](../../../docs/axoflow/onboard-hosts/log-tapping/index.md)
  * in [Flow Processing steps](../../../docs/axoflow/data-management/processing/index.md), for example, in the **Query** field of **Select Messages** steps.

label | value  
---|---  
edge_connector_name | The name of the collector that collected the message.  
edge_connector_type | `otelFile`  
edge_connector_label_ | Labels set by the collector. By default: `vendor:opentelemety`, `product:otel-file`  
edge_connector_rule_id | The ID of the Collection Rule resource in Axoflow that created the collector.  
edge_flow_name | The name of the data forwarding rule that sent the message.  
  
## Advanced options

  * **Exclude file pattern** : Files that match this pattern are excluded, even if they match the **File pattern**. You can use the same special characters as in the **File pattern** field.

  * **Exclude older than** : Exclude files whose modification time is older than the specified value, for example: `1h`, `24h`, `7d`.

  * **Multi-line start pattern** : [Regex pattern](<https://github.com/google/re2/wiki/Syntax>) to identify the start of a multi-line log entry. Mutually exclusive with **Multi-line end pattern**.

  * **Multi-line end pattern** : [Regex pattern](<https://github.com/google/re2/wiki/Syntax>) to identify the end of a multi-line log entry. Mutually exclusive with **Multi-line start pattern**.

  * **Multi-line omit pattern** : If enabled, the text matching the multi-line start or end pattern is omitted from the entry.

  * **Force flush period** : Flush the current batch if no new data has been read from the file within the specified period. Example values: `1s`, `5m`, `1h`. Default value: `500ms`

  * **Encoding** : Specifies the encoding of the file being read. Default value: `utf-8`. The following values are supported:

    * `nop`: No encoding validation. Treats the file as a stream of raw bytes
    * `utf-8`: UTF-8 encoding
    * `utf-8-raw`: UTF-8 encoding without replacing invalid UTF-8 bytes
    * `utf-16le`: UTF-16 encoding with little-endian byte order
    * `utf-16be`: UTF-16 encoding with big-endian byte order
    * `ascii`: ASCII encoding
    * `big5`: The Big5 Chinese character encoding
  * **Poll interval** : The duration between filesystem polls, for example: `1s`, `5m`, `1h`. Default value: `200ms`

  * **Retry on failure max elapsed time** : Maximum time (including retries) to send a log batch to a downstream consumer before discarding it, for example: `1s`, `5m`, `1h`. Retrying never stops if set to `0`. Default value: `0`

  * **Initial buffer size** : The initial size (in KiB) of the buffer to read file headers and logs. The buffer will grow as needed; larger values may cause unnecessary memory allocation, while smaller values may require multiple copies during growth. Default value: `16KiB`

  * **Max log size** : Maximum size of a log entry (in MiB). Larger log entries are truncated. Default value: `1MiB`

  * **Max concurrent files** : Maximum number of files to read from in parallel.

  * **Max batches** : Maximum number of batches to keep in memory; applicable only when more than `1024` files match the **File pattern**.

  * **Compression** : Specifies the compression format of the files being read. Possible values are the empty string (no compression), `gzip`, and `auto`. Use `auto` when your **File pattern** matches a mix of compressed and uncompressed files.

  * **Start at** : Specifies where to start reading logs on startup: `beginning` or `end` of the file. Default value: `beginning`