Learn how to create usable log messages from our blog post series covering topics like log levels, using timestamps, and formatting log messages.
Logging is a critical aspect of application development, providing invaluable insights into an application’s behavior. Beyond aiding in debugging and troubleshooting, logs play a pivotal role in bolstering an application’s security.
Why log levels are important
One typical attribute of a log message is its log level, or severity. Your operations and security teams will thank you for implementing a good set of conventions for log levels. Should a service go down, your DevOps/SRE team will have an easier time navigating the logs and understanding root causes, thereby decreasing Mean Time To Repair (MTTR).
Log levels also allow for cost savings, as they enable the selective transport and storage of logs. Just like OS and infrastructure logs, application logs are often aggregated and stored for a long period, in some industries and under some regulations for up to 5 years. If you set log levels correctly, some of the log messages need not be stored for as long as the rest.
Being able to separate production and/or security logs from debug output helps a great deal, and the usual solution is to use log levels in a coherent manner.
Log levels vs amounts expected
Logging frameworks will give you a select number of log levels. They usually range from trace and/or debug levels and go up to fatal or emergency levels that are reserved for the most critical of log messages.
It is reasonable to expect that trace and debug messages make up the majority of the log volume (if those messages are enabled), whereas fatal errors should be few and far between.
What are the common log levels
The framework you use probably has a selection of log levels. Python, for example, uses numeric log levels, ranging from 10 to 50:
- DEBUG (10): Detailed information, typically useful for diagnosing problems.
- INFO (20): General information about the application’s operations, confirming things are working as expected.
- WARNING (30): Indicates unexpected events or issues that might need attention but don’t disrupt the application.
- ERROR (40): Indicates more serious problems that may affect the application’s functionality.
- CRITICAL (50): Indicates critical errors that may lead to application failure.
When Python logging writes these messages to syslog, it needs to map these log levels to standard syslog severities, which is done by the mapPriority method of SysLogHandler.
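To make that mapping concrete, here is a minimal sketch that reads the lookup tables SysLogHandler exposes as class attributes (priority_map and priority_names), so no syslog connection is needed. The helper name syslog_severity is ours, not part of the standard library, and the "warning" fallback mirrors what CPython's mapPriority does for unknown level names.

```python
import logging
from logging.handlers import SysLogHandler


def syslog_severity(levelno: int) -> tuple[str, int]:
    """Return the syslog priority name and number for a Python log level."""
    level_name = logging.getLevelName(levelno)
    # priority_map is the table mapPriority consults; unknown names
    # fall back to "warning" here, as they do in CPython's implementation.
    syslog_name = SysLogHandler.priority_map.get(level_name, "warning")
    return syslog_name, SysLogHandler.priority_names[syslog_name]


mapping = {
    logging.getLevelName(level): syslog_severity(level)
    for level in (
        logging.DEBUG,
        logging.INFO,
        logging.WARNING,
        logging.ERROR,
        logging.CRITICAL,
    )
}

for name, (syslog_name, number) in mapping.items():
    print(f"{name:<8} -> {syslog_name} ({number})")
```

Note that the numbers run in opposite directions: Python levels grow with severity, while syslog severities shrink.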
Other languages may have other definitions, but the basic pattern is the same.
Log levels vs application verbosity controls
It is common practice for applications to have a knob called the “log level”, “verbosity level” or “debug level”. This option controls whether the logs contain only the information needed in production or also the detail needed when debugging.
Sometimes this knob is simply a threshold on log levels. For example, in the Python logging framework, the level parameter of the basicConfig function acts as a simple filter: anything less severe than the specified level is ignored.
import logging
# Set up logging: only WARNING and more severe messages pass the threshold
logging.basicConfig(level=logging.WARNING)
# Emit one message at each level
logging.debug("This is a DEBUG message")
logging.info("This is an INFO message")
logging.warning("This is a WARNING message")
logging.error("This is an ERROR message")
logging.critical("This is a CRITICAL message")
Running this prints:
WARNING:root:This is a WARNING message
ERROR:root:This is an ERROR message
CRITICAL:root:This is a CRITICAL message
As you can see, the messages with DEBUG and INFO levels were filtered out from the output.
The log output of an application might be an important part of its functionality. This warrants a more sophisticated log level schema and/or a more sophisticated set of knobs to control logging. For example, the log functionality of a security product can be considered a core feature. In such a product, in addition to operational issues, logs are also used to generate dashboards or alerts.
Another reason for more sophisticated logging controls is that enabling the “debug” log level can generate an excessive amount of logs, which can easily prevent normal operation. If debug messages are often needed to troubleshoot production systems, then providing more granular control over them makes sense.
Here are a couple of ideas for these more sophisticated controls. The operator may want to:
- Enable logs from specific functionalities while keeping others silent: for example, enable debug level logging from authentication to debug an access problem.
- Enable logs from specific application modules: for example, enable logging from one specific implementation module, e.g. write-ahead logging, to debug an I/O-related issue.
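With Python's hierarchical loggers, this kind of selective control could look like the sketch below. The logger names ("myapp", "myapp.auth", "myapp.storage.wal") and the in-memory ListHandler are hypothetical, chosen only to make the effect visible; a real application would use its own module names and handlers.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(
            f"{record.levelname}:{record.name}:{record.getMessage()}"
        )


# Application-wide default: only WARNING and above.
app_log = logging.getLogger("myapp")
app_log.setLevel(logging.WARNING)
app_log.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
app_log.addHandler(handler)

# Turn on debug output for the authentication code only.
logging.getLogger("myapp.auth").setLevel(logging.DEBUG)

logging.getLogger("myapp.auth").debug("token validated for user %s", "alice")
logging.getLogger("myapp.storage.wal").debug("flushed 12 pages")
logging.getLogger("myapp.storage.wal").error("fsync failed")

print(handler.messages)
```

Only the authentication debug message and the write-ahead-log error reach the handler; the write-ahead-log debug message is dropped because that logger inherits the WARNING threshold from "myapp".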
Whenever working out your own mechanisms, imagine the target persona/user who would use the log output:
- Is it a DevOps/SRE person trying to reproduce/triage an issue?
- Is it a security analyst trying to find a security incident?
- Is it just the basis of some form of operational report/dashboard?
With the proper message categories in place, you can easily provide control knobs so that each of these personas can fine-tune the number of messages they receive.
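One possible shape for such a knob, sketched with Python's logging.Filter: each message carries a category passed via the extra argument, and each consumer's handler only accepts the categories it cares about. The attribute name category, the category values, and the rule that WARNING and above always pass are assumptions made for this illustration.

```python
import logging


class CategoryFilter(logging.Filter):
    """Pass records whose 'category' attribute is in the enabled set.

    Records at WARNING or above always pass, whatever their category.
    """

    def __init__(self, enabled):
        super().__init__()
        self.enabled = set(enabled)

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True
        return getattr(record, "category", "general") in self.enabled


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# A view for the security analyst: security events plus anything severe.
security_view = ListHandler()
security_view.addFilter(CategoryFilter({"security"}))

log = logging.getLogger("catdemo")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(security_view)

log.info("login failed for user bob", extra={"category": "security"})
log.debug("cache warmed in 40 ms", extra={"category": "performance"})
log.error("database unreachable", extra={"category": "operations"})

print(security_view.messages)
```

A second handler with a different CategoryFilter could feed the SRE view or the dashboard without touching the code that emits the messages.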
Mapping messages to syslog and OpenTelemetry log levels
Production logs from applications are almost always collected and aggregated. Maybe your application is part of a larger stack with multiple services interoperating and you want the logs in the same place so that you can diagnose complex interactions between services. Maybe you are combing through logs to look for potential security incidents.
When doing log aggregation, the transport and storage mechanisms used for aggregation have their own schema for log levels, which may or may not perfectly fit the conventions you use in your application.
OpenTelemetry and syslog are examples of such aggregation mechanisms.
Syslog log levels
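Syslog defines 8 severity levels (RFC 5424):
|Value|Severity|Description|
|---|---|---|
|0|Emergency|system is unusable|
|1|Alert|action must be taken immediately|
|2|Critical|critical conditions|
|3|Error|error conditions|
|4|Warning|warning conditions|
|5|Notice|normal but significant condition|
|6|Informational|informational messages|
|7|Debug|debug-level messages|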
OpenTelemetry log levels
OpenTelemetry has a numeric log level value (SeverityNumber) that is similar to, but not identical to, Python’s schema:
|SeverityNumber range|Range name|Meaning|
|---|---|---|
|1-4|TRACE|A fine-grained debugging event. Typically disabled in default configurations.|
|5-8|DEBUG|A debugging event.|
|9-12|INFO|An informational event. Indicates that an event happened.|
|13-16|WARN|A warning event. Not an error but is likely more important than an informational event.|
|17-20|ERROR|An error event. Something went wrong.|
|21-24|FATAL|A fatal error such as application or system crash.|
As you can see, the mapping between Python log levels and those provided by syslog and OpenTelemetry is relatively straightforward.
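As an illustration of the OpenTelemetry side, the following sketch maps a Python numeric level to the first value of the matching OpenTelemetry severity range. The function to_otel_severity is hypothetical and written for this post; an OpenTelemetry SDK may choose different values within each range, so treat it as one reasonable convention rather than the official mapping.

```python
import logging


def to_otel_severity(levelno: int) -> tuple[int, str]:
    """Map a Python log level to (SeverityNumber, range name).

    Each Python level is mapped to the lowest number of the
    corresponding OpenTelemetry range; levels below DEBUG map to TRACE.
    """
    if levelno >= logging.CRITICAL:
        return 21, "FATAL"
    if levelno >= logging.ERROR:
        return 17, "ERROR"
    if levelno >= logging.WARNING:
        return 13, "WARN"
    if levelno >= logging.INFO:
        return 9, "INFO"
    if levelno >= logging.DEBUG:
        return 5, "DEBUG"
    return 1, "TRACE"


for level in (5, logging.DEBUG, logging.INFO, logging.WARNING,
              logging.ERROR, logging.CRITICAL):
    number, name = to_otel_severity(level)
    print(f"Python {level:>2} -> OpenTelemetry {number:>2} ({name})")
```

Using thresholds rather than exact matches means custom levels (for example a level of 25 between INFO and WARNING) still land in a sensible range.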
Log levels and filtering in the backend
The original idea for the various log levels has been to focus attention on the operationally significant events: to keep more important messages while deleting the less important ones.
However, it can be argued that a single log level across all applications of an organization is too simple to drive filtering and storage policies. Some applications log critical information at more verbose levels (e.g. debug), whereas others generate too much information at a less verbose level (e.g. info).
If you already have good application-specific controls in place, the same mechanism should be available after collection as well. If you already have a category/label system in place inside the application, just add the same information to the output message as well. This way, filtering/storage decisions can be made later on, in the pipeline or in the backend.
This metadata can be added to the OpenTelemetry attributes map or to syslog structured data (RFC 5424), but simply adding it to the message body as a string is useful too.
With this metadata in place, the decision whether a message is to be kept long term or can be disposed of sooner can be made even after collection.
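As a rough sketch of carrying such labels along with the message, the formatter below renders them as an RFC 5424-style structured-data element in front of the message text. The SD-ID meta@32473 and the field names category and component are hypothetical (32473 is the enterprise number RFC 5424 itself uses in its examples); a real deployment would use its own identifiers.

```python
import logging


def sd_escape(value: str) -> str:
    """Escape the characters RFC 5424 requires escaping in a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


class StructuredDataFormatter(logging.Formatter):
    """Prefix the message with an RFC 5424-style structured-data element."""

    SD_ID = "meta@32473"                # hypothetical SD-ID
    FIELDS = ("category", "component")  # hypothetical label names

    def format(self, record):
        params = " ".join(
            f'{key}="{sd_escape(str(getattr(record, key)))}"'
            for key in self.FIELDS
            if hasattr(record, key)
        )
        # "-" is the RFC 5424 placeholder for "no structured data".
        sd = f"[{self.SD_ID} {params}]" if params else "-"
        return f"{record.levelname} {sd} {record.getMessage()}"


record = logging.makeLogRecord({
    "levelname": "INFO",
    "levelno": logging.INFO,
    "msg": "login failed for user %s",
    "args": ("bob",),
    "category": "security",
    "component": "auth",
})
line = StructuredDataFormatter().format(record)
print(line)
```

In normal use the labels would be supplied through the extra argument of the logging calls, and the formatter would be attached to whichever handler ships the logs to the collector.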
Logging is an aspect of application development much like the choice of language, application framework, and operating environment: it is usually not part of formal product management practices, and specifications and user stories do not contain this level of detail.
We can argue though that logging is important for the sustainability and operations of a product, and with that in mind, it makes sense to have a strategy in place. Make sure that your strategy includes good conventions and samples for setting the log levels right. I’d also add the review of logging as an explicit part of any developer peer review process.
In the next part of this series, we’ll cover timestamps.