Project qWatch - Mincong Huang

Overview

Today, I would like to share my side project with you: qWatch. qWatch stands for “Quality Watch”. It is a data aggregator for code quality, based on different metrics. For now, the only implemented metric is log statistics. In this article, I will explain why I chose logs as the first metric, how the tool works, and how it is implemented. At the end, I will also share some thoughts about side projects.

Functionality

First of all, let me explain two basic commands of qWatch:

qwatch collect
qwatch stats [topN]

The qwatch collect command collects logs from a target directory, where the files must follow a specific format. For now, I export the log history from Datadog as CSV files, and then run qwatch collect to load them into my database. The command compares new log events with existing ones and merges them without creating duplicates.
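The merge step can be sketched as follows. This is a minimal illustration and not the actual qWatch code: the LogEvent class, its id field, and the LogMerger helper are hypothetical names, and the sketch assumes that each log event carries a unique identifier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical log event; the real qWatch model may differ. */
final class LogEvent {
  final String id; // assumed to be unique per event
  final String message;

  LogEvent(String id, String message) {
    this.id = id;
    this.message = message;
  }
}

final class LogMerger {
  /** Merges incoming events into existing ones, keeping the first event seen for each id. */
  static List<LogEvent> merge(List<LogEvent> existing, List<LogEvent> incoming) {
    // LinkedHashMap keeps insertion order, so existing events stay first.
    Map<String, LogEvent> byId = new LinkedHashMap<>();
    for (LogEvent e : existing) {
      byId.putIfAbsent(e.id, e);
    }
    for (LogEvent e : incoming) {
      byId.putIfAbsent(e.id, e);
    }
    return new ArrayList<>(byId.values());
  }

  public static void main(String[] args) {
    var existing = List.of(new LogEvent("a", "Something goes wrong."));
    var incoming = List.of(
        new LogEvent("a", "Something goes wrong."),
        new LogEvent("b", "Incorrect parameter"));
    System.out.println(merge(existing, incoming).size()); // prints 2
  }
}
```

Keying the map by identifier means that importing the same CSV export twice leaves the stored events unchanged.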

The qwatch stats command aggregates existing log events from the database using log patterns, and then displays the top N results in the console. It helps to understand which log events are the most frequent. The following example shows several pieces of information: the number of entries extracted, the date range covered by the log entries, and the top N errors, each with its number of occurrences, its pattern ID if one exists, and the error summary.

$ qwatch stats 10
1,999 entries extracted (2019-02-13 to 2019-02-26).
Top 10 errors:
- 1,234 [P1] Something goes wrong.
-   234 [P2] Something is incorrect.
-   123 [  ] Incorrect parameter
-    20 [P5] Project ${id} not found
-    ...
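The aggregation behind this output can be sketched as follows. This is an assumption-laden illustration rather than the real implementation: the TopErrors class, the pattern table, and the regular expressions are all hypothetical. Each message is mapped to the ID of the first pattern that matches it, or to the raw message when no pattern matches, and the keys are then counted and sorted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class TopErrors {
  /** Hypothetical pattern table: pattern ID -> regex matching the whole log message. */
  static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();

  static {
    PATTERNS.put("P1", Pattern.compile("Something goes wrong\\."));
    PATTERNS.put("P5", Pattern.compile("Project \\S+ not found"));
  }

  /** Returns the ID of the first matching pattern, or the raw message if none matches. */
  static String keyOf(String message) {
    for (var entry : PATTERNS.entrySet()) {
      if (entry.getValue().matcher(message).matches()) {
        return entry.getKey();
      }
    }
    return message;
  }

  /** Counts messages per key and returns the n most frequent keys, highest count first. */
  static List<Map.Entry<String, Long>> topN(List<String> messages, int n) {
    Map<String, Long> counts = new LinkedHashMap<>();
    for (String m : messages) {
      counts.merge(keyOf(m), 1L, Long::sum);
    }
    List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
    sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
    return sorted.subList(0, Math.min(n, sorted.size()));
  }

  public static void main(String[] args) {
    var messages = List.of(
        "Project 1 not found", "Something goes wrong.",
        "Project 2 not found", "Incorrect parameter");
    for (var e : topN(messages, 2)) {
      System.out.printf("- %d [%s]%n", e.getValue(), e.getKey());
    }
  }
}
```

Without the P5 pattern, "Project 1 not found" and "Project 2 not found" would be counted as two different errors, which is why custom patterns make the aggregation more precise.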

Why Did I Develop This Tool?

I developed this tool because there are many errors in our production environment. With the tool, I can measure the number of errors in a given time window. Having my own patterns makes the aggregation more precise. Log aggregation already exists in Datadog, but it does not support custom log patterns. Another reason is that I wanted to improve my Java skills by writing more code. Most importantly, I believe in data-driven decision making (DDDM): measuring errors helps us fix the most valuable bugs with the time and resources we have.

Architecture

The architecture of the tool is very simple. It is a command-based tool: when the main function is called, it dispatches the arguments to the matching command:

// The first argument selects the command to run.
var command = args[0];
if (CollectCommand.NAME.equals(command)) {
  // qwatch collect
  CollectCommand.newBuilder() //
      .logDir(Paths.get("/Users/mincong/Downloads"))
      .build()
      .execute();
} else if (StatsCommand.NAME.equals(command)) {
  // qwatch stats [topN]: topN defaults to 200 when omitted.
  int n = args.length > 1 ? Integer.parseInt(args[1]) : 200;
  StatsCommand.newBuilder() //
      .logDir(Paths.get("/Users/mincong/datadog"))
      .topN(n)
      .days(14)
      .build()
      .execute();
} else {
  logger.warn("Unknown command: {}", command);
}

Each command has its own logic and does not depend on any other command. When performing I/O, a command delegates to the classes dedicated to import and export. They are defined in the package qwatch.logs.io. For now, there are three classes:

CSV importer: CsvImporter
JSON importer: JsonImporter
JSON exporter: JsonExporter

Some of these classes can run concurrently: they are executed using a thread pool.
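The thread pool usage can be sketched with the standard java.util.concurrent API. This is only an illustration under assumptions: the ConcurrentImport class and the runAll helper are hypothetical, and each task stands in for the import of one file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentImport {
  /** Runs every task on a fixed-size pool and collects the results in submission order. */
  static <T> List<T> runAll(List<Callable<T>> tasks, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<T> results = new ArrayList<>();
      // invokeAll blocks until all tasks have completed.
      for (Future<T> future : pool.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Each task stands in for importing one file; here it only measures the file name.
    List<Callable<Integer>> tasks = List.of(
        () -> "a.csv".length(),
        () -> "bb.csv".length());
    System.out.println(runAll(tasks, 2)); // prints [5, 6]
  }
}
```

Results come back in submission order, so the caller does not need to care which thread handled which file.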

Best Practices

In the following section, I would like to share the best practices I learnt and applied to this small tool.

Immutability. Value classes and data structures used in this project are immutable. For value classes, I used Google’s AutoValue to generate the immutable implementation. It lets me write value classes in a declarative way, while keeping hashCode(), equals(), and toString() correctly implemented. For data structures, I don’t use Java’s built-in collections, but those from Vavr. This keeps the syntax concise and ensures that the structures are immutable.
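To illustrate what an immutable value class involves, here is a hand-written sketch in plain Java, without AutoValue. The LogSummary class and its fields are hypothetical and do not come from qWatch. This is the kind of boilerplate that a generator such as AutoValue saves you from writing and maintaining by hand.

```java
import java.util.Objects;

/** Hypothetical value class, written by hand for illustration. */
final class LogSummary {
  private final String patternId;
  private final long count;

  LogSummary(String patternId, long count) {
    this.patternId = Objects.requireNonNull(patternId);
    this.count = count;
  }

  String patternId() {
    return patternId;
  }

  long count() {
    return count;
  }

  /** Returns a modified copy instead of mutating this instance. */
  LogSummary withCount(long newCount) {
    return new LogSummary(patternId, newCount);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof LogSummary)) {
      return false;
    }
    LogSummary that = (LogSummary) o;
    return count == that.count && patternId.equals(that.patternId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(patternId, count);
  }

  @Override
  public String toString() {
    return "LogSummary{patternId=" + patternId + ", count=" + count + "}";
  }

  public static void main(String[] args) {
    var original = new LogSummary("P1", 3);
    var updated = original.withCount(4);
    System.out.println(original); // prints LogSummary{patternId=P1, count=3}
    System.out.println(updated); // prints LogSummary{patternId=P1, count=4}
  }
}
```

All fields are final and there are no setters, so an instance can be shared between threads or used as a map key without defensive copies.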

Testing. I tried to do TDD as much as I could. The test coverage of this tool is 76%. The testing effort is mostly focused on the I/O part and the log patterns. Keeping testing in mind helps me structure the source code better, with loosely coupled code. Also, I described some behavior in tests first, so that the goal was defined before the source code was implemented.

Java 11. I took the chance to upgrade to Java 11 for this project. It allows me to use var, the local-variable type inference introduced by JEP 286 in Java 10. I haven’t used the Java Platform Module System (JPMS) yet, but I will if there is an opportunity in the future.
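As a small illustration of local-variable type inference, the sketch below uses var for every local variable. The VarDemo class is hypothetical and exists only for this example.

```java
import java.util.List;

final class VarDemo {
  /** Sums the lengths of all messages, using var for every local variable. */
  static int totalLength(List<String> messages) {
    var total = 0; // inferred as int
    for (var message : messages) { // inferred as String
      total += message.length();
    }
    return total;
  }

  public static void main(String[] args) {
    var messages = List.of("ab", "cde"); // inferred as List<String>
    System.out.println(totalLength(messages)); // prints 5
  }
}
```

The types are still checked at compile time. Only the declaration becomes shorter.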

Functional Programming. The functional library Vavr allows me to manipulate data objects easily in Java. It makes sorting, deduplicating, mapping, and other transformations easy.

Build. This project is built using Maven. A default Maven configuration is defined under the .mvn folder, where multi-threaded builds are enabled by default with -T 1C (one thread per CPU core). This speeds up the build. In CircleCI, the build takes about 13 seconds.

What’s Next?

I don’t have a concrete plan right now. Since I only worked on this project during lunch breaks on weekdays and while commuting, it is hard to go much further. I am thinking about several possibilities:

Avoid manual download from Datadog Log Explorer
Connect to AWS S3 to download logs
Automate execution
Measure the Jenkins build and extract warnings and errors
Generate log patterns from source code directly using a Maven plugin
Visualize errors using Jupyter Notebook
Identify bug tickets and measure the bug-fix delay (time between the feature creation date and the bug fix date)

Conclusion

In this article, we saw the tool qWatch and how it collects and aggregates logs. I also explained the architecture of the tool, the best practices I applied, and the possible next steps for the project. I hope you enjoyed this one. See you next time. The source code is available on GitHub: https://github.com/mincong-h/quality-watch.

References

Kevin Bourrillion, Éamonn McManus, “Google / AutoValue”, github.com, 2019. [Online]. Available: https://github.com/google/auto/tree/master/value
Daniel Dietrich, “Vavr”, github.com, 2019. [Online]. Available: https://github.com/vavr-io/vavr/
