Overview

Since February, when I started qWatch, a project that collects logs for error analysis, I have come to realize how important it is to have logs in production. This may seem obvious to some of you, but to others, logs may just be ordinary files stored somewhere. In this article, I will share what I know about logs through the following sections:

  • What can be found inside a log event?
  • Why are log files not enough for the production environment?
  • Where do logs come from?
  • Log Analytics

Logging Event

Traditionally, a logging event is very simple. Taking the Java logging framework Log4J as an example, the conversion pattern in a properties file might look like the following expression. It contains the timestamp of the event (%d), the thread name (%t), the priority (%p, left-aligned and padded to 5 characters by -5), the category name (%c, shortened to its last component by {1}), the message (%m), and a line separator (%n) at the end.

%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5p %c{1} - %m%n
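To make the pattern concrete, here is a minimal Python sketch (not part of Log4J) that splits a line written with this layout back into its fields. The sample line, the regular expression, and the name parse_line are assumptions made for illustration only.

```python
import re
from datetime import datetime

# Regular expression mirroring the layout
# "%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5p %c{1} - %m%n".
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<priority>[A-Z]+)\s+"  # %-5p pads short levels such as INFO with spaces
    r"(?P<category>\S+) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    event = match.groupdict()
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%d %H:%M:%S.%f"
    )
    return event


# Hypothetical line, as the layout above might produce it.
sample = "2019-05-01 10:15:30.123 [main] INFO  App - Server started\n"
event = parse_line(sample)
```

The resulting dict only holds what the pattern wrote: a timestamp, a thread, a priority, a category, and a message. Nothing in it says which host, service, or environment the event came from.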

But what if we could enrich the logging event with other information to provide a more complete context? For example, the source code, the web access, the cloud provider, and so on. The logging event would become much more interesting. For example, in Datadog, you are able to see the following information:

Category     Field              Description
Core         Source             The source of the logs, e.g. Tomcat Server.
Core         Host               The host machine.
Core         Service            The service name.
Core         Status             The log level: error, warn, info, …
Source Code  Logger Name        The name of the logger.
Source Code  Exception Class    The class from which the exception was thrown.
Source Code  Thread Name        The name of the thread.
Source Code  Stacktrace         The stack trace of the exception.
Source Code  Exception Message  The message of the exception.
Source Code  Class              The class of the exception.
Customer     Log Type           The type of log, e.g. SSO, Tomcat.
Customer     Environment        Dev, pre-prod, prod, …
Customer     Project            The project name.
Web Access   Client IP          The IP address of the client.
Web Access   OS                 The operating system of the client.
Web Access   Browser            The browser of the client.
Web Access   Referer            The referer of the request.
Web Access   Response Time Sec  The response time in seconds.
Web Access   URL Path           The path of the URL.
Web Access   User Agent         The user agent of the client.
Web Access   Device             The device of the client.
Web Access   Status Code        The HTTP status code of the response.
Web Access   Method             The HTTP method of the request.
AWS          ELB Name           The Elastic Load Balancing (ELB) name.
AWS          S3 Bucket          The S3 bucket.

I find this enrichment (I don’t know if this is the right word) very important. When something goes wrong in production, a simple log message is not enough. As a developer, I need more detail: about the user, about the cluster, about the source code, and so on.
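As a rough illustration of enrichment, the Python sketch below uses the standard logging module to attach a few context fields to every event and emit it as JSON. It is not how Datadog or any other platform does it. The class names ContextFilter and JsonFormatter, the field names, and the values web-01, checkout, and prod are all hypothetical.

```python
import io
import json
import logging


class ContextFilter(logging.Filter):
    """Attach static context fields to every record that passes through."""

    def __init__(self, **context):
        super().__init__()
        self.context = context

    def filter(self, record):
        for key, value in self.context.items():
            setattr(record, key, value)
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Render a record, including the extra context fields, as one JSON object."""

    def __init__(self, context_fields):
        super().__init__()
        self.context_fields = context_fields

    def format(self, record):
        payload = {
            "status": record.levelname.lower(),
            "logger_name": record.name,
            "thread_name": record.threadName,
            "message": record.getMessage(),
        }
        for field in self.context_fields:
            payload[field] = getattr(record, field, None)
        return json.dumps(payload)


# Write to an in-memory stream so the example stays self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter(["host", "service", "environment"]))
handler.addFilter(
    ContextFilter(host="web-01", service="checkout", environment="prod")
)

logger = logging.getLogger("com.example.PaymentService")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.error("Payment failed for order %s", 42)
enriched_event = json.loads(stream.getvalue())
```

The application code still writes a plain message. The host, service, and environment are added on the way out, so every event carries the same context without each call site repeating it.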

Log File vs Log Platform

What is the difference between log files and a log platform? Why are log files not enough for the production environment?

In my opinion, log files are suitable for small projects. You can SSH into your environment and watch the logs using tail -f or less +F. However, in a big project where the number of machines keeps growing, watching log files becomes harder and harder. Besides, sometimes you are not even allowed to access the machines. A log platform, on the other hand, lets you aggregate, enrich, search, analyze, and monitor log events. These capabilities are essential for handling critical events in production.

Feature      Log File       Log Platform
Tail         Yes            Yes
Aggregation  No             Yes
Analysis     Manual         Graphical
Enrichment   No             Yes
Search       Single Source  Multi Source
Monitoring   No             Yes
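To illustrate the "Aggregation" and "Search" rows, here is a small Python sketch that merges events from two machines into one timeline and then searches across both. It is a toy model of what a log platform does at scale. The hosts web-01 and web-02, the sample events, and the functions aggregate and search are made up for the example.

```python
import heapq

# Hypothetical events per host: (timestamp, host, level, message),
# each list already sorted by timestamp, as a log file would be.
web_01 = [
    ("2019-05-01 10:15:30.123", "web-01", "INFO", "Server started"),
    ("2019-05-01 10:15:32.500", "web-01", "ERROR", "Connection refused"),
]
web_02 = [
    ("2019-05-01 10:15:31.000", "web-02", "WARN", "Slow response"),
    ("2019-05-01 10:15:33.250", "web-02", "ERROR", "Connection refused"),
]


def aggregate(*sources):
    """Merge already-sorted per-host event lists into a single timeline."""
    # Timestamps in this fixed-width format sort correctly as plain strings.
    return list(heapq.merge(*sources, key=lambda event: event[0]))


def search(events, text):
    """Return the events whose message contains the given text."""
    return [event for event in events if text in event[3]]


timeline = aggregate(web_01, web_02)
errors = search(timeline, "Connection refused")
```

With log files alone, the same question ("where did Connection refused happen?") means opening a session on every machine and searching each file separately.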

Source of Logs

Logs can come from many sources. Here are some that I have seen, grouped by server, container, cloud, and other.

Server: Apache, Cassandra, Consul, Elasticsearch, HAProxy, Nginx, MongoDB, Java, Journald, Apache Tomcat, Go, Microsoft .NET, Ruby, Node.js, PostgreSQL, Varnish Cache, Python, MySQL, Redis, Microsoft IIS, Apache Kafka, Apache ZooKeeper, RabbitMQ, PHP, Windows, Custom files

Container: Docker, Kubernetes, Amazon ECS, Amazon EKS, Mesos, CoreOS, Red Hat OpenShift, AWS Fargate, Istio

Cloud: AWS, Fastly, Azure, Cloud Foundry, Heroku, Google Cloud Platform

Other: Rsyslog, Fluentd, Logstash, Syslog-ng, NXLog

Log Analytics

With log analytics, it is possible to perform time-series analysis based on the aggregation of log events. This allows you to understand the volume of events over a past period. Combined with visualization techniques, it also makes important information (anomalies, trends, …) easy to spot. Two concrete solutions I have in mind are Datadog - Log Explorer and Elastic - Logging.
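The idea behind such a time series can be sketched in a few lines of Python: put every event into a one-minute bucket and count the events per bucket. The sample events and the function count_per_minute are assumptions for illustration. A real platform does this over far larger volumes and draws the result as a chart.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log events with a timestamp and a status.
events = [
    {"timestamp": "2019-05-01 10:15:05", "status": "info"},
    {"timestamp": "2019-05-01 10:15:40", "status": "error"},
    {"timestamp": "2019-05-01 10:16:10", "status": "error"},
    {"timestamp": "2019-05-01 10:16:20", "status": "error"},
    {"timestamp": "2019-05-01 10:16:55", "status": "warn"},
    {"timestamp": "2019-05-01 10:18:30", "status": "info"},
]


def count_per_minute(events):
    """Count events per one-minute bucket, ordered by time."""
    counts = Counter()
    for event in events:
        moment = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S")
        bucket = moment.replace(second=0)  # truncate to the start of the minute
        counts[bucket] += 1
    return dict(sorted(counts.items()))


series = count_per_minute(events)
for bucket, count in series.items():
    print(bucket.strftime("%H:%M"), "#" * count)
```

Minutes without any event (10:17 here) simply do not appear in the result. A charting tool would usually fill those gaps with zeros.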

Obviously, analysis goes far beyond time series. There are also pie charts, maps, and more. The key here is aggregation and filtering. Thanks to a log platform, you are able to group events by different fields and filter down to what you need.
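Grouping and filtering can be sketched the same way. The Python snippet below keeps only the events that match some field values and then counts them per value of another field, which is roughly the data behind a pie chart. The events, the field names, and the function group_count are hypothetical.

```python
from collections import Counter

# Hypothetical enriched events.
events = [
    {"service": "checkout", "status": "error", "environment": "prod"},
    {"service": "checkout", "status": "error", "environment": "prod"},
    {"service": "checkout", "status": "info", "environment": "prod"},
    {"service": "search", "status": "error", "environment": "prod"},
    {"service": "search", "status": "error", "environment": "dev"},
]


def group_count(events, field, **filters):
    """Count events per value of `field`, keeping only events matching `filters`."""
    selected = [
        event
        for event in events
        if all(event.get(key) == value for key, value in filters.items())
    ]
    return Counter(event.get(field) for event in selected)


# Production errors, grouped by service.
errors_by_service = group_count(
    events, "service", status="error", environment="prod"
)
```

This only works because the events were enriched first: without fields like service and environment, there would be nothing to group or filter on.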

Conclusion

In this article, I shared what I know about logs: the information inside a log event, the advantages of having a log platform (aggregate, enrich, search, analyze, monitor), the sources of logs, and log analytics. I hope you enjoyed this article. See you next time!

References