Lately we’ve seen a variety of new tools that help you make sense of your logs. Open-source projects such as Scribe and LogStash, on-premise tools like Splunk, and hosted services such as SumoLogic and PaperTrail. These all help you reduce mass amounts of log data into something more meaningful.

duke_log

But there’s one thing none of them can help you with. They’re all dependent on the data you actually put into the log. The task of getting that quality and quantity of the data right falls on to you. So when the chips are down and you need to debug your code with partial or missing log data, you may want to cancel that dinner reservation.

To reduce the chances of that happening I wanted to share 5 things you should be constantly mindful of when logging in production:

1. Hello my (Thread) name is

Just like Ringo, the Thread name property is one of Java’s most undervalued methods. The reason is that it’s mainly descriptive. So what? so are our names, and we put some meaning into them.

The place where thread names play the biggest part is in multi-threaded logging. Most logging frameworks will log the current thread name from which a call was made. Sadly enough, this would mostly look something like “http-nio-8080-exec-3″, a name assigned to it by the thread-pool or container.

For some reason, there’s a misconception I’ve heard more than once that thread names are immutable. They’re not. The thread name is prime real-estate in your log, and you want to make sure you use it right. This would most likely mean assigning it with detailed context such as the name of the Servlet or task its fulfilling right now, and some dynamic context such as a user or message ID.

As such, an entry point into your code should look something like this:

Thread.currentThread().setName(ProcessTask.class.getName() + “: “+ message.getID);

A more advanced version would be to load a thread local variable into the current thread, and configure a log appender that will automatically add it to every log entry.

This is extremely useful when multiple threads are writing to the server log and you need to focus on a single thread. If you’re running in a distributed / SOA environment, this has an added advantaged which we’ll see in a moment.

2. Distributed identifiers

In an SOA or message driven architecture, the execution of a specific task will most likely span across multiple machines. When processing of one such fails, connecting the dots between the machines involved and their state will be key to understanding what happened. Most log analyzers will group those log messages for you, assuming you provided them with a unique ID by which they can do that as part of the actual log message.

From a design perspective this means that every inbound operation going into your system should have a unique ID which will travel with it until it’s completed. Note that a persistent identifier such as user ID may not be a good vessel for this, as one user may have multiple operations happening for him in the course of a log file, which will make isolating a specific flow much harder. UUIDs can be a good choice here. The value can be loaded into the actual thread name or as a TLS – thread local storage.

3. Dont text + drive. Don’t log + loop

Often times you’ll see a piece of code that’s running in a tight loop and performs a log operation. The underlying assumption is that the code will run a limited number of times.

That may very well be the case when things are working well. But when the code gets unexpected input the loop may not break. In that case you’re not just dealing with an infinite loop (which is bad enough), you’re dealing with code that’s writing infinite amounts of data to disk or network.

Left to its own devices this can take a server down, or in a distributed scenario, an entire cluster. So when possible, do not log within tight loops. This is especially true when catching errors.

Let’s look at an example, where we log an exception inside a while loop:

void read() {
    while (hasNext()) {
        try {
            readData();
        } catch {
            Exception e) {
            // this isn’t recommend
            logger.error(“error reading data“, e);
        }
    }
}

If readData throws, and hasNext returns true, we end up with infinite logging. One way to solve this is to make sure we won’t log everything:

void read() {
        int exceptionsThrown = 0;
        while (hasNext()) {
            try {
                readData();
            } catch {
                Exception e) {
                if (exceptionsThrown & lt; THRESHOLD) {
                    logger.error(“error reading data ", e);
                        exceptionsThrown++;
                    }
                    else {
                        // Now the error won’t choke the system.
                    }
                }
            }
        }

Another approach is to remove the logging from the loop altogether and save the first/last exception object to log it elsewhere.

4. Uncaught handlers

Westeros has the Wall as its last line of defense. You have Thread.uncaughtExceptionHandler. So make sure you use them. If you don’t install one of these handlers, you run the risk of throwing exceptions into the wild with very little context, and very little control of if and where they end up being logged.

Finding out that your code has been encountering massive errors without them being logged, or with them being logged with little to no state data is an epic fail.

Notice that even within an uncaught exception handler, which on the surface of things has no access to any variable within that thread (which has already terminated) you still get a reference to the actual Thread object. If you adhered to step #1, you’ll still get a meaningful thread.getName() value which you can log.

5. Catch external calls

Whenever you make an external API call which leaves the JVM, chances of an exception increase dramatically. This includes Web service, HTTP, DB, File system, OS and any other JNI calls. Treat each call as it will explode (it most likely will at some point).

For the most part, the reasons for external API failures have to do with unexpected input provided to them. Having those inputs readily available for you in the log is a key part in fixing the code.

You may choose not to log an error at this point, but have the exception thrown, which is fine. In that case, simply collect as many of the relevant arguments passed to that call and parse them into the exception error message.

Just make sure that exception is caught and logged at a higher level within the call stack.

try {
    return s3client.generatePresignedUrl(request);
} catch (Exception e) {
    String err = String.format(“Error generating request: % s bucket: % s key: % s.method: % s ", request, bucket, path, method);
        log.error(err, e); //you can also throw a nested exception here with err instead.
    }

Have any other logging best practices you’ve learned the hard way, or completely disagree with the above? Hit me in the comments section.

lambda

Compiling Lambda Expressions: Scala vs. Java 8 – read more

As a co-founder and CTO, Tal is responsible for overseeing OverOps' product and engineering strategy. Previously, Tal was co-founder and CEO at VisualTao, acquired by Autodesk Inc. (ADSK). Following that, Tal was the Director for the AutoCAD global Cloud and Mobile product line. Plays Jazz drums and Skypes, sometimes simultaneously.