- Metrics: Metrics are numerical values that measure the performance or behavior of a system over time. They provide a high-level overview of the health and performance of your system, and help your DevOps teams identify the trends and patterns that guide troubleshooting and optimization. Examples of metrics include CPU usage, memory usage, network traffic, and response times. Metrics are a primary indicator for operations teams: they can help you spot potential issues before they become critical and make informed decisions about capacity planning and scaling.
- Traces: Traces provide a detailed view of a specific request or transaction as it flows through a system. They help you identify performance bottlenecks and problems in specific components of a system. Traces are used mainly by operations, and enable your DevOps teams to understand how requests are processed, where errors occur, and how long each step of the process takes. For example, you can trace a user’s request from the frontend to the backend and pinpoint the exact location where the request failed or was delayed.
- Logs: Logs provide a detailed record of the events that occur within a system, typically including user actions, system events, and errors. They can be used to identify the root cause of issues and provide context for debugging. Besides helping your DevOps teams troubleshoot your systems, logs are used for auditing and compliance purposes, and are fed to your SIEM to detect and prevent security incidents.
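To make the tracing example more concrete, here is a minimal Python sketch of how the spans of a single trace can be analyzed to find the step that failed and the step that caused the delay. It assumes a simplified, hypothetical span model (`Span` with `span_id`, `parent_id`, `service`, start and end times in milliseconds, and an `error` flag) rather than the data model of any particular tracing tool, and it assumes that the child spans of a parent do not overlap in time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    """One step of a traced request (hypothetical, simplified span model)."""
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: float
    end_ms: float
    error: bool = False

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes the children of a span run sequentially (no overlap).
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def analyze_trace(spans: List[Span]) -> Tuple[List[str], str]:
    """Return the services whose spans failed, and the service with the largest self time."""
    failed = [s.service for s in spans if s.error]
    slowest = max(spans, key=lambda s: self_time_ms(s, spans))
    return failed, slowest.service


# Example trace: frontend -> api-gateway -> (auth, orders-db), plus a failing call.
EXAMPLE_TRACE = [
    Span("a", None, "frontend", 0, 250),
    Span("b", "a", "api-gateway", 10, 240),
    Span("c", "b", "auth", 15, 35),
    Span("d", "b", "orders-db", 40, 225),
    Span("e", "a", "recommendations", 242, 248, error=True),
]

if __name__ == "__main__":
    failed, slowest = analyze_trace(EXAMPLE_TRACE)
    print("failed:", failed)
    print("bottleneck:", slowest)
```

Running it reports `recommendations` as the failed step and `orders-db` as the bottleneck, since the database span accounts for 185 ms of the 250 ms request. The sketch ranks spans by self time rather than total duration so that a parent span (here, `frontend`) is not blamed for time spent in its children.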
Overall, metrics, traces, and logs are all critical components of observability. Together, they provide a comprehensive view of the health and performance of a system and enable you to quickly identify and resolve issues when they occur. But unlike metrics and traces, logs are often retained for much longer, up to several years in some industries. Consequently, most organizations have a log management solution in place. Unfortunately, only a few of them have a solution that not only ticks the required boxes on the audit sheets, but also provides reliable and useful insight into what is happening in the organization in real time. The most important reasons for this include:
- Data quality problems: Although the syslog protocol and message format were standardized decades ago, many applications and appliances still don’t comply with the standards and send their logs without a hostname or a proper timestamp.
- No visibility into the data flow: Most solutions focus only on delivering the data that a collector has received to the consumer (which in many cases is a SIEM). However, they fail to notice problems in the data pipeline itself, like sudden drops in data throughput, or bottlenecks and outages in the pipeline.
- Knowledge gap: Properly designing, implementing, and maintaining an infrastructure to collect log and observability data is difficult. It requires in-depth knowledge of the available tools (like logging agents and relays), and also knowledge specific to your organization: the tools and applications your organization uses, how you can best collect their data, and how and where your different departments will use this data (do IT, Ops, and Security use the same consumers, or does each have a different solution that best fits its requirements?). As for maintenance, the elements of the pipeline (like a log collector agent) are often configured once and never adjusted later, sometimes because the engineers who did the original configuration have left the organization, and the knowledge about the hows and whys of the implementation was lost with them.
- Limitations of the agents: Commercial observability solutions often require you to deploy their own data collector agent on your endpoints. Regrettably, such agents are often purpose-built to collect data from fixed sources on the endpoint and send it directly to the solution's central consumer. Such agents usually offer only limited configuration and customization options, and do not integrate well with other consumers that your departments might need.
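The data quality problem described above can be checked for at the collection point. The following minimal Python sketch flags syslog lines that arrive without a timestamp or hostname. It recognizes the RFC 5424 header, where a `-` (the NILVALUE) means the sender did not provide a field, and the older BSD format described in RFC 3164. The regular expressions are simplified assumptions rather than a complete RFC parser, `check_syslog` is a hypothetical helper name, and detecting a missing hostname in BSD-style messages is only a heuristic, because that format has no explicit field markers.

```python
import re
from typing import List

# RFC 5424 header: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID ...
RFC5424_HEADER = re.compile(r"^<(\d{1,3})>(\d{1,3}) (\S+) (\S+) (\S+) (\S+) (\S+)")
# BSD (RFC 3164) header: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG...
RFC3164_HEADER = re.compile(r"^<(\d{1,3})>([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (\S+)")
PRI_ONLY = re.compile(r"^<(\d{1,3})>")


def check_syslog(line: str) -> List[str]:
    """Return the data-quality problems found in a single syslog line (hypothetical helper)."""
    m = RFC5424_HEADER.match(line)
    if m:
        problems = []
        # "-" is the RFC 5424 NILVALUE: the sender left the field empty.
        if m.group(3) == "-":
            problems.append("missing timestamp")
        if m.group(4) == "-":
            problems.append("missing hostname")
        return problems
    m = RFC3164_HEADER.match(line)
    if m:
        token = m.group(3)
        # Heuristic: a token such as "su:" or "sshd[42]:" looks like a program tag,
        # which suggests the sender skipped the hostname.
        if token.endswith(":") or "[" in token:
            return ["missing hostname"]
        return []
    if PRI_ONLY.match(line):
        # A priority is present, but no recognizable timestamp follows it.
        return ["missing or unparseable timestamp"]
    return ["missing PRI header"]


if __name__ == "__main__":
    samples = [
        "<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 - msg",
        "<165>1 - - evntslog - ID47 - msg",
        "<34>Oct 11 22:14:15 su: 'su root' failed for lonvick on /dev/pts/8",
        "<13>sshd[42]: Accepted publickey for admin",
    ]
    for sample in samples:
        print(check_syslog(sample) or "ok", "|", sample)
```

With these samples, the first line passes, the second is reported for both a missing timestamp and a missing hostname, the third for a missing hostname, and the last for a missing timestamp.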
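Sudden drops in data throughput, one of the pipeline problems mentioned above, can be detected by comparing each interval's message count with a rolling baseline. The sketch below is a hypothetical Python helper (`ThroughputMonitor`), not a feature of any particular product; the five-interval window and the 50% drop ratio are illustrative assumptions that would need tuning for real traffic.

```python
from collections import deque


class ThroughputMonitor:
    """Flags sudden drops in messages per interval against a rolling average.

    Hypothetical helper: window and drop_ratio are illustrative defaults.
    """

    def __init__(self, window: int = 5, drop_ratio: float = 0.5):
        self.history = deque(maxlen=window)  # most recent interval counts
        self.drop_ratio = drop_ratio

    def observe(self, count: int) -> bool:
        """Record one interval's message count; return True if it is a sudden drop."""
        is_drop = False
        # Only judge once a full window of history is available (warm-up period).
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            is_drop = count < baseline * self.drop_ratio
        self.history.append(count)
        return is_drop


if __name__ == "__main__":
    monitor = ThroughputMonitor()
    for minute, count in enumerate([1000, 980, 1020, 1010, 990, 1005, 300]):
        if monitor.observe(count):
            print(f"minute {minute}: throughput dropped to {count} messages")
```

For example, feeding it the per-minute counts 1000, 980, 1020, 1010, 990, 1005, and 300 flags only the last interval, because 300 is below half of the preceding five-minute average. A rolling average adapts to gradual changes in volume, so mainly abrupt drops are reported.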
As a result of these problems:
- Most organizations do not have insight into what is happening in their data pipeline.
- They also lack the domain knowledge that would allow them to optimize their data pipeline, collect and store the important data in an accessible format, or reduce their storage costs by storing only useful data.
Our goal at AxoFlow is to help you solve these problems. We are not another SIEM vendor, nor another logging agent vendor. Instead, we provide know-how and expertise in collecting logs and observability data, along with a way to effectively manage and configure the widely used, battle-tested tools that build the data pipelines of the world, some of which are probably already in use at your organization.