In this post we’ll look at five ways in which we can use efficient coding to help our garbage collector spend less CPU time allocating and freeing memory, and reduce GC overhead. Long GCs can often lead to our code being stopped while memory is reclaimed (AKA “stop the world”).

reduce GC overhead

Some background

The GC is built to handle large amounts of allocations of short lived objects (think of something like rendering a web page, where most of the objects allocated become obsolete once the page is served).

The GC does this using what’s called a “young generation” – a heap segment where new objects are allocated. Each object has an “age” (placed in the object’s header bits) which defines how many collections it has “survived” without being reclaimed. Once a certain age is reached, the object is copied into another section in the heap called a “survivor” or “old” generation.

The process, while efficient, still comes at a cost. Being able to reduce the number of temporary allocations can really help us increase throughput, especially in high-scale environments, or Android apps where resources are more limited.

Below are five ways we can write everyday code that’s more memory efficient, without having to spend a lot of time on it, or reducing code readability.

1. Avoid implicit Strings

Strings are an integral part of almost every data structure we manage. Being much heavier than other primitive values, they have a much stronger impact on memory usage.

One of the most important things to note is that Strings are immutable. They cannot be modified after allocation. Operators such as “+” for concatenation actually allocate a new String containing the contents of the strings being joined. What’s worse, is there’s an implicit StringBuilder object that’s allocated to actually do the work of combining them.

For example –

a = a + b; // a and b are Strings

The compiler generates comparable code behind the scenes:

StringBuilder temp = new StringBuilder(a).
a = temp.toString(); // a new String is allocated here.
// The previous “a” is now garbage.

But it gets worse. 

Let’s look at this example –

String result = foo() + arg;
result += boo();
System.out.println(“result = “ + result);

In this example we have 3 StringBuilders allocated in the background – one for each plus operation, and two additional Strings – one to hold the result of the second assignment and another to hold the string passed into the print method. That’s 5 additional objects in what would otherwise appear to be a pretty trivial statement.

Think about what happens in real-world code scenarios such as generating a web page, working with XML or reading text from a file. Nested within loop structures, you could be looking at hundreds or thousands of objects that are implicitly allocated. While the VM has mechanisms to deal with this, it comes at a cost – one paid by your users.

The solution: One way of reducing this is being proactive with StringBuilder allocations. The example below achieves the same result as the code above while allocating only one StringBuilder and one String to hold the final result, instead of the original five objects.

StringBuilder value = new StringBuilder(“result = “);

By being mindful of the way Strings and StringBuilders are implicitly allocated you can materially reduce the amount of short-term allocations in high-scale code locations.

2. Plan List capacities

Dynamic collections such as ArrayLists are among the most basic structures to hold dynamic length data. ArrayLists and other collections such as HashMaps and TreeMaps are implemented using underlying Object[] arrays. Like Strings (themselves wrappers over char[] arrays), array size is also immutable. The obvious question then becomes – how can we add/put items in collections if their underlying array’s size is immutable? The answer is obvious as well – by allocating more arrays.

Let’s look at this example –

List items = new ArrayList();

for (int i = 0; i < len; i++)
Item item = readNextItem();

The value of len determines the ultimate length of items once the loop finishes. This value, however, is unknown to the constructor of the ArrayList which allocates a new Object array with a default size. Whenever the capacity of the internal array is exceeded, it’s replaced with a new array of sufficient length, making the previous array garbage.

If you’re executing the loop thousands of times you may be forcing a new array to be allocated and a previous one to be collected multiple times. For code running in a high-scale environment, these allocations and deallocations are all deducted from your machine’s CPU cycles.

The solution: Whenever possible, allocate lists and maps with an initial capacity, like so:

List<MyObject> items = new ArrayList<MyObject>(len);

This ensures that no unnecessary allocations and deallocations of internal arrays occur at runtime as the list now has sufficient capacity to begin with. If you don’t know the exact size, it’s better to go with an estimate (e.g. 1024, 4096) of what an average size would be, and add some buffer to prevent accidental overflows.

3. Use efficient primitive collections

Current versions of the Java compiler support arrays or maps with a primitive key or value type through the use of “boxing” – wrapping the primitive value in a standard object which can be allocated and recycled by the GC.

This can have some negative implications. Java implements most collections using internal arrays. For each key/value entry added to a HashMap an internal objectis allocated to hold both values. This is a necessary evil when dealing with maps, which means an extra allocation and possible deallocation made every time you put an item into a map. There’s also the possible penalty of outgrowing capacity and having to reallocate a new internal array. When dealing with large maps containing thousands or more entries, these internal allocations can have increasing costs for your GC.

A very common case is to hold a map between a primitive value (such as an Id) and an object. Since Java’s HashMap is built to hold object types (vs. primitives), this means that every insertion into the map can potentially allocate yet another object to hold the primitive value (“boxing” it).

The standard Integer.valueOf method caches the values between -128 and 127, but for each number outside that range, a new object will be allocated in addition to the internal key / value entry object. This can potentially more than triple GC overhead for the map. For those coming from a C++ background this can really be troubling news, where STL templates solve this problem very efficiently.

Luckily, this problem is being worked on for next versions of Java. Until then, it’s been dealt with quite efficiently by some great libraries which provide primitive trees, maps and lists for each of Java’s primitive types. I strongly recommend Trove, which I’ve worked with for quite a while and found that can really reduce GC overhead in high-scale code.

4. Use Streams instead of in-memory buffers

Most of the data we manipulate in server applications comes to us in the form of files or data streamed over the network from another web service or a DB. In most cases, the incoming data is in serialized form, and needs to be deserialized into Java objects before we can begin operating on it. This stage is very prone to large implicit allocations.

The easiest thing to do usually is read the data into memory using a  ByteArrayInputStream, ByteBuffer and then pass that on to the deserialization code.

This can be a bad move, as you’d need to allocate and later deallocate room for that data in its entirety while constructing new objects out of it . And since the size of the data can be of unknown size, you guessed it – you’ll have to allocate and deallocate internal byte[] arrays to hold the data as it grows beyond the initial buffer’s capacity.

The solution is pretty straightforward. Most persistence libraries such as Java’s native serialization, Google’s Protocol Buffers, etc. are built to deserialize data directly from the incoming file or network stream, without ever having to keep it in memory, and without having to allocate new internal byte arrays to hold the data as it grows. If available, go for that approach vs. loading the data into memory. Your GC will thank you.

5. Aggregate Lists

Immutability is a beautiful thing, but in some high-scale situations it can have some serious drawbacks. One scenario is when passing List objects between methods.

When returning a collection from a function, it’s usually advisable to create the collection object (e.g. ArrayList) within the method, fill it and return it in the form of an immutable Collection interface.

There are some cases where this doesn’t work well. The most noticeable one is when collections are aggregated from multiple method calls into a final collection. While immutability provides more clarity, in high-scale situations it can also mean massive allocation of interim collections.

The solution in this case would be not to return new collections, but instead aggregate values into a single collection that’s passed into those methods as a parameter.

Example 1 (inefficient) –

List < Item > items = new ArrayList < Item > ();

for (FileData fileData: fileDatas) {
// Each invocation creates a new interim list with possible
// internal interim arrays

Example 2 –

List < Item > items =
new ArrayList < Item > (fileDatas.size() * avgFileDataSize * 1.5);

for (FileData fileData: fileDatas) {
readFileItem(fileData, items); // fill items inside

Example 2, while disobeying the rules of immutability (which should normally be adhered to) can save N list allocations (along with any interim array allocations). In high-scale situations this can be a boon to your GC.

Additional reading

String interning –

Efficient wrappers –

Using Trove –

Tal is the CTO of OverOps. Tal has been designing scalable, real-time Java and C++ applications for the past 15 years. He still enjoys analyzing a good bug though, and instrumenting code. In his free time Tal plays Jazz drums.
  • joaonunesk

    In high scalable applications and weak hardware systems, coding for GC optimization is a key to success. Thank you for the very good article with practices that should be a must for every developer.

    • Tal Weiss

      Thanks 🙂

  • Mikhail Vorontsov

    One minor correction. Integer.valueOf, as well as Byte, Short and Long.valueOf cache values between -128 and 127 (Java byte range). Character.valueOf caches 0-127 – intersection of char and byte ranges.

    You can take a look at Integer.IntegerCache class soruces for proof of my words. I have written on this topic in more details here:

    • Tal Weiss

      You’re 100% right. Will update the post to reflect. Thanks!

  • Michael Romose

    Some nice tips here, thanks. I am a little concerned that a less experienced coder might read too much into your first paragraphs. As you probably know, historically, coders have cached objects excessively, when in fact shorter-lived objects are handled more efficiently by the garbage collector, even in the process you describe. When garbage collecting the young generation, the GC walks the heap to find all object that are still in scope, and it is the longer lived objects that must be copied to the survivor generation – it is this that is the heaviest part of GC. The objects that are unreachable and therefore garbage collected have a very low cost at GC-time. Of course, if you are creating enormous amounts of objects, you will have more frequent GC, and that could become a problem, but it is object allocation that has the highest cost in this case (and that is not so high, nowadays), not GC.

    So it might be worth mentioning that you shouldn’t keep or cache objects unnecessarily – you only want to avoid completely unnecessary object allocation.

    Although tips 2,3,4 are almost always valid, I wouldn’t implement 1 unless I knew I had this particular performance problem, as the code will become significantly harder to read, and writing more efficient code at a higher level becomes more difficult. (for example, perhaps I didn’t need to call this whole method at all in most cases – the simpler code is easier to read and I can more quickly determine what it does).

    So, to sum up: think performance when you have a performance problem. Improve performance only through testing. Even great programmers make mistakes when reasoning about performance.

    • Tal Weiss

      Michael – great feedback.

      I agree that caching is a design pattern that should be applied selectively as part of a specific architecture (and why I haven’t put it in the post, probably deserves a bunch of posts on its own).

      Re perf optimization – in high scale situations, there’s usually not just one thing that you can do to optimize your code (unless you’re dealing with a big bottleneck), it’s usually a continuous process of small improvements, which is what these habits are usually good for. Same goes for Android, where these should probably be applied a bit more on a continuous basis.

      Re Strings, I fully agree that you shouldn’t get crazy every time you concat a string. It is effective where you have code running at scale doing a lot of String processing (xml, html, lgging, etc..) where being proactive about how you parse / concat things can yield better throughput, which is why I thought having this tool in your arsenal is a good thing.

  • Ramkumar Manavalan

    In your 1st point, you may have to mention that StringBuilder is not thread-safe and developers should use StringBuffer instead.

    • Tal Weiss

      Since string concatenation usually occurs at the method level (with the StringBuilder allocated as a non-escaped local var) it’s thread safe in that context, without requiring an synchronization / CASing overhead.

      I agree that in slightly more rare cases where the StringBuilder is exposed or escaped to other threads, than it should replaced by a StringBuffer, or the dev should use wrapper methods to synchronize access to the StringBuffer using RWlocks, monitors, etc..

  • atc

    Just a nitpick but still relevant: TreeMap doesn’t use an Object[] but a linked list for storing objects. HashMap uses an Entry[] that wraps linked list too.

    Otherwise a fantastic article; thanks.

    • Tal Weiss

      Hi – not a nitpic at all, and thanks for the nice feedback.

      You’re right that in a lot of the more complex data structures there’s a combination of both linked lists and arrays, especially if you look at stuff (e.g Guava collections).

      The principle of replaceable interim arrays is true in most random or semi random access collections. TreeMap is a good example of one where it’s not.

  • Q Q

    Your statement about enforcing Stringbuilders is incorrect. If you have statement like
    this :

    String foo = val1 + val2 + “this is a string”;

    The compiler automatically uses a StringBuilder and converts the above expression to

    String foo = new StringBuilder(val1).append(val2).append(“this is a string”).toString();

    Unless you are doing such additions in a loop, it only makes the code harder to read.

    Guessing the size of the collection may not be possible, but setting a value higher than the default size of 10 is definitely better. The article seems to suggest that copying arrays over and over again may turn out to be a bottlenecks in case of appends, but this might not be true since internally ArrayList uses System.arrayCopy which is a native call. It directly translates to memmove in C. There is no type checking performed.

    The rest of the article is really nice.

    • Vipin Sharma

      What you are saying is correct but blog also says same thing , read it one more time.
      Problem is toString() method of StringBuilder which returns new String().

  • szahn

    Great tips! Thanks for the reminders.

  • Lukas Eder

    #2 is rather dangerous, if it is applied as in #5. Excess capacity can create major memory consumption issues. I’ve always found the default ArrayList capacity increase mechanism pretty good.