What I Know About Logs - Mincong Huang

Overview

Since starting my project qWatch in February, which collects logs for error analysis, I have realized how important it is to have logs in production. This may seem obvious to some of you, but for others, logs might just be ordinary files stored somewhere. In this article, I will share what I know about logs by going through the following sections:

What can be found inside a log event?
Why are log files not enough for the production environment?
Where are logs coming from?
Log Analytics

Logging Event

Traditionally, a logging event is very simple. Taking the Java logging framework Log4J as an example, the conversion pattern in a properties file might look like the following expression. It contains the timestamp of the event (%d), the thread name (%t), the priority (%p), the category name (%c), the message (%m), and a line separator (%n) at the end.

%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5p %c{1} - %m%n
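To make the pattern more concrete, here is a minimal Python sketch that parses one line written in this layout back into its fields. The sample line, the field names, and the `parse_line` helper are my own assumptions for illustration. The sketch also assumes that the thread name contains no `]` and that the message fits on a single line, so multi-line stack traces are not handled.

```python
import re
from datetime import datetime

# Regex mirroring "%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %-5p %c{1} - %m%n".
# %-5p pads the priority to 5 characters, so one or more spaces may follow it.
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<priority>[A-Z]+) +"
    r"(?P<category>\S+) - "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    event = match.groupdict()
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%d %H:%M:%S.%f"
    )
    return event


# Hypothetical line, as the pattern above might produce it.
sample = "2019-05-12 14:03:27.512 [main] INFO  UserService - User logged in\n"
event = parse_line(sample)
print(event)
```

Even this small example shows the limit of a plain-text line: everything beyond these five fields has to be squeezed into the message.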

But what if we could enrich the logging event with other information to provide a more complete context, such as the source code, the web access, or the cloud provider? The logging event becomes much more interesting. For example, in Datadog, you can see the following information.

Category	Field	Description
Core	Source	The source of logs, e.g. Tomcat Server.
Core	Host	The host machine.
Core	Service	The service name.
Core	Status	The log level: error, warn, info, …
Source Code	Logger Name	The name of the logger.
Source Code	Exception Class	From which class the exception was thrown.
Source Code	Thread Name	The name of the thread.
Source Code	Stacktrace	The stacktrace of exception.
Source Code	Exception Message	The message of exception.
Source Code	Class	The class of exception.
Customer	Log Type	The type of log, e.g. SSO, Tomcat
Customer	Environment	Dev, pre-prod, prod, …
Customer	Project	The project name.
Web Access	Client IP	The IP address of the client.
Web Access	OS	The operating system of the client.
Web Access	Browser	The browser of the client.
Web Access	Referer	The referer of the client.
Web Access	Response Time Sec	The response time in seconds.
Web Access	URL Path	The path of the URL.
Web Access	User Agent	The user agent of the client.
Web Access	Device	The device of the client.
Web Access	Status Code	The HTTP status code of the response.
Web Access	Method	The HTTP method of the request.
AWS	ELB Name	The name of the Elastic Load Balancing (ELB) load balancer.
AWS	S3 Bucket	The name of the S3 bucket.

I find this enrichment (I don’t know if this is the right word) very important. When something goes wrong in production, a simple log message is not enough. As a developer, I need more detail: about the user, about the cluster, about the source code, …
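Here is a minimal sketch of what such an enrichment step could look like, written in Python. The field names, the values, and the `enrich` helper are hypothetical. A real platform such as Datadog does this in its own agents and pipelines, so this only illustrates the idea.

```python
import json


def enrich(event, context):
    """Return a copy of the event with context fields added.

    Fields already present on the event take precedence over the context.
    """
    enriched = dict(context)
    enriched.update(event)
    return enriched


# Hypothetical context known to the machine that ships the logs.
context = {"host": "web-01", "service": "checkout", "environment": "prod"}

# Hypothetical event as emitted by the application.
event = {
    "status": "error",
    "logger_name": "com.example.PaymentController",
    "message": "Payment failed",
}

enriched = enrich(event, context)
print(json.dumps(enriched, sort_keys=True, indent=2))
```

The application only has to produce the message. The host, the service, and the environment are attached afterwards, which is what makes the event searchable by those fields later.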

Log File vs Log Platform

What is the difference between log files and a log platform? Why are log files not enough for the production environment?

In my opinion, log files are suitable for small projects. You can SSH into your environment and watch the logs using tail -f or less +F. However, in a big project where the number of machines keeps growing, watching log files becomes harder and harder. Sometimes you are not even allowed to access the machines. A log platform, on the other hand, lets you aggregate, enrich, search, analyze, and monitor log events. These capabilities are essential for handling critical events in production.

Feature	Log File	Log Platform
Tail	Yes	Yes
Aggregation	No	Yes
Analysis	Manual	Graphical
Enrichment	No	Yes
Search	Single Source	Multi Source
Monitoring	No	Yes
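The aggregation and multi-source search rows can be sketched in a few lines of Python. The two event lists below stand for the log files of two hypothetical machines, and `aggregate` and `search` are illustrative helpers of mine, not the API of any log platform. The sketch assumes that each source is already sorted by timestamp.

```python
import heapq

# Hypothetical events from two machines: (timestamp, host, message).
# Each list is sorted by timestamp, like a log file written in order.
web_01 = [
    ("2019-05-12 14:03:27.512", "web-01", "Request started"),
    ("2019-05-12 14:03:29.120", "web-01", "Request failed: timeout"),
]
web_02 = [
    ("2019-05-12 14:03:28.004", "web-02", "Request started"),
    ("2019-05-12 14:03:28.950", "web-02", "Request completed"),
]


def aggregate(*sources):
    """Merge several time-sorted sources into one time-sorted list."""
    return list(heapq.merge(*sources, key=lambda event: event[0]))


def search(events, keyword):
    """Return the events whose message contains the keyword, ignoring case."""
    return [e for e in events if keyword.lower() in e[2].lower()]


merged = aggregate(web_01, web_02)
for timestamp, host, message in search(merged, "failed"):
    print(timestamp, host, message)
```

With plain log files, you would have to run the search once per machine and interleave the results yourself.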

Source of Logs

Logs can come from many sources. Here are some that I have seen, grouped by server, container, cloud, and other.

Server: Apache, Cassandra, Consul, Elasticsearch, HAProxy, Nginx, MongoDB, Java, Journald, Apache Tomcat, Go, Microsoft .NET, Ruby, Node.js, PostgreSQL, Varnish Cache, Python, MySQL, Redis, Microsoft IIS, Apache Kafka, Apache ZooKeeper, RabbitMQ, PHP, Windows, Custom files

Container: Docker, Kubernetes, Amazon ECS, Amazon EKS, Mesos, CoreOS, Red Hat OpenShift, AWS Fargate, Istio

Cloud: AWS, Fastly, Azure, Cloud Foundry, Heroku, Google Cloud Platform

Other: Rsyslog, Fluentd, Logstash, Syslog-ng, NXLog

Log Analytics

With log analytics, you can perform time-series analysis based on aggregated log events. This helps you understand how the volume of events evolved in the past. Combined with visualization techniques, it also makes important information (anomalies, trends, …) easy to spot. Two concrete solutions I have in mind are Datadog - Log Explorer and Elastic - Logging.
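As a rough illustration of time-series aggregation, the Python sketch below counts events per minute and prints a tiny text histogram. The events and the `count_per_minute` helper are made up for the example. Real platforms compute these buckets on indexed data rather than in application code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log events; only the timestamp matters for the volume.
events = [
    {"timestamp": "2019-05-12 14:03:27.512", "status": "info"},
    {"timestamp": "2019-05-12 14:03:41.007", "status": "error"},
    {"timestamp": "2019-05-12 14:04:02.330", "status": "error"},
    {"timestamp": "2019-05-12 14:06:15.900", "status": "warn"},
]


def count_per_minute(events):
    """Count events in one-minute buckets, ordered by time."""
    counts = Counter()
    for event in events:
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
        # Truncate to the minute to get the bucket key.
        counts[ts.replace(second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


series = count_per_minute(events)
for minute, count in series.items():
    # One "#" per event gives a crude bar chart.
    print(minute.strftime("%H:%M"), "#" * count)
```

Minutes without events do not appear in the result, so a real chart would also need to fill the gaps with zeros.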

Of course, analysis goes far beyond time series. There are also pie charts, maps, … The key here is aggregation and filtering. Thanks to the log platform, you can group events by different fields and filter down to what you need.
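Grouping and filtering can be sketched the same way. In this Python example, the events, the field names, and the `group_count` helper are assumptions of mine. The result is the kind of data that would feed a pie chart of production errors per service.

```python
from collections import Counter

# Hypothetical enriched events.
events = [
    {"status": "error", "service": "checkout", "environment": "prod"},
    {"status": "error", "service": "checkout", "environment": "prod"},
    {"status": "error", "service": "search", "environment": "prod"},
    {"status": "info", "service": "search", "environment": "prod"},
    {"status": "error", "service": "search", "environment": "dev"},
]


def group_count(events, field, **filters):
    """Count events per value of `field`, keeping only those matching all filters."""
    selected = (
        e for e in events
        if all(e.get(key) == value for key, value in filters.items())
    )
    return Counter(e.get(field) for e in selected)


errors_by_service = group_count(
    events, "service", status="error", environment="prod"
)
for service, count in errors_by_service.most_common():
    print(service, count)
```

This only works because the events carry fields such as service and environment, which is another reason why enrichment matters.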

Conclusion

In this article, I shared what I know about logs: the information inside a log event, the advantages of having a log platform (aggregating, enriching, searching, analyzing, and monitoring), the sources of logs, and log analytics. I hope you enjoyed this article. See you next time!

References

“Log4J - TTCC”, Wikipedia, 2019. https://en.wikipedia.org/wiki/Log4j#TTCC
