16 August 2012

Inside the JVM: Garbage Collection – Living the Java High-life

To most Java developers, the Java Virtual Machine doesn’t really register on their radar; it’s a tool they use to run their applications and nothing more. Its internals, features and workings are a complete mystery, and many developers are quite happy for it to remain that way. After all, what impact does the JVM have on delivering functionality to customers? It’s not uncommon for developers to put more thought and effort into the operating system their application runs on than into the JVM.

In some ways, all this is a Good Thing™: it means that the JVM is doing its job right, and in 80% of cases developers don’t even need to recognise that it exists. But that doesn’t mean it isn’t worth understanding a bit about how it works. Understanding the internals of the JVM can help you write better applications, and find and fix faults faster. This blog post is the first part of a series covering key aspects of the Java Virtual Machine.
 

Thanks for all the memories

Most developers are familiar with the concept of “garbage collection”: they know that they don’t need to worry about allocating and de-allocating blocks of memory, although those lucky enough not to have been brought up on C/C++ might not really know what that means. When you need to store something in a program, you need to allocate a segment of memory for it. The size of that segment depends on how big the thing you want to store is, and if you don’t know how big it is, you have to take a guess and add a bit more for safety. Because the amount of memory in a computer is finite, if you never free memory up, you will eventually run out. Programming languages like C and C++ provide developers with explicit calls to allocate and de-allocate segments of memory (malloc() and free() in C, new and delete in C++). Developers are responsible for keeping track of when they no longer need a segment of memory and freeing it, so that the space can be re-used for something else. One of the most common causes of application crashes was memory leaks, usually caused by a developer forgetting to put a call to free() somewhere it was needed.
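
To make that failure mode concrete, here is a toy model of manual memory management. It is written in Java only because that is the language of this series; `ManualHeapDemo`, its `malloc()`/`free()` methods and `requestsHandled()` are hypothetical names for illustration, and real C code would call the C library's own `malloc()` and `free()`. The "heap" is just a fixed number of slots, so a loop that forgets to free its buffer runs out after a handful of iterations.

```java
import java.util.BitSet;

// A toy model of C-style manual memory management, written in Java purely to illustrate
// the idea. The "heap" is a fixed number of slots; malloc() hands out a free slot and
// free() gives it back. Nothing is ever reclaimed unless free() is called explicitly.
class ManualHeapDemo {
    private final int capacity;
    private final BitSet used;

    ManualHeapDemo(int capacity) {
        this.capacity = capacity;
        this.used = new BitSet(capacity);
    }

    // Returns a slot index, or -1 when the heap is exhausted (like malloc() returning NULL).
    int malloc() {
        int slot = used.nextClearBit(0);
        if (slot >= capacity) {
            return -1;
        }
        used.set(slot);
        return slot;
    }

    // Marks a slot as free so that a later malloc() can reuse it.
    void free(int slot) {
        used.clear(slot);
    }

    // Simulates a server that allocates a temporary buffer for every request and
    // returns how many requests it handled before running out of memory.
    static int requestsHandled(int heapSlots, int requests, boolean remembersToFree) {
        ManualHeapDemo heap = new ManualHeapDemo(heapSlots);
        for (int i = 0; i < requests; i++) {
            int buffer = heap.malloc();
            if (buffer < 0) {
                return i; // out of memory: the leak has caught up with us
            }
            // ... work with the buffer here ...
            if (remembersToFree) {
                heap.free(buffer);
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        System.out.println("with free():    " + requestsHandled(10, 1000, true));  // 1000
        System.out.println("without free(): " + requestsHandled(10, 1000, false)); // 10
    }
}
```

With ten slots, the careful version handles all 1000 requests while the leaky one stops after ten, even though each request only ever needs one slot at a time.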

This approach works, but it places a heavy burden on developers, forcing them to focus on minutiae that are not directly related to solving business problems. Like many problems, tracking used memory and freeing it when it is no longer needed can be solved by the computer itself: we can get a computer to clean up its own memory. This process is called Automatic Garbage Collection, and it is what the Garbage Collector in the JVM does for you.
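
How does a collector know that memory is "no longer needed"? The JVM's collectors are tracing collectors: an object is considered live if it can be reached by following references from a set of roots (such as local variables on thread stacks and static fields), and everything else is garbage. The sketch below is a toy version of that marking step over a hand-built object graph; `ReachabilityDemo`, `Node`, `mark()` and `garbage()` are hypothetical names, not JVM APIs. Note that the unreachable cycle `c <-> d` is still identified as garbage, something a naive reference-counting scheme would miss.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A toy object graph used to illustrate reachability-based garbage detection.
class ReachabilityDemo {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Mark phase: visit everything reachable from the roots using a simple worklist.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (marked.add(n)) {          // add() returns false if already visited
                worklist.addAll(n.refs);
            }
        }
        return marked;
    }

    // Anything on the heap that the mark phase did not reach is garbage.
    static List<String> garbage(List<Node> heap, List<Node> roots) {
        Set<Node> live = mark(roots);
        List<String> dead = new ArrayList<>();
        for (Node n : heap) {
            if (!live.contains(n)) {
                dead.add(n.name);
            }
        }
        return dead;
    }

    static List<String> demo() {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b);                  // a -> b: b stays live as long as a does
        c.refs.add(d);                  // c -> d -> c: a cycle that nothing else points to
        d.refs.add(c);
        List<Node> heap = List.of(a, b, c, d);
        List<Node> roots = List.of(a);  // e.g. a local variable that is still in scope
        return garbage(heap, roots);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [c, d]
    }
}
```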

Talking Trash

It’s not quite as simple as it sounds, though. There are a number of ways to track and clear up memory that is no longer needed and, as with many algorithms, the obvious ones turn out not necessarily to be the most efficient, due to a number of subtle factors. Several garbage collection algorithms are available to the JVM, and within those algorithms there are a great many tuneable settings. Recent Java Virtual Machines do a very good job of working out which algorithms and settings will be best for your application (a JVM feature called GC Ergonomics does this), but they don’t always make the best choice, so it’s worth understanding some of the principles behind modern garbage collectors.
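
If you want to see which collectors ergonomics (or your command-line flags) actually selected, the standard `java.lang.management` API can tell you from inside the running application. This is a minimal sketch; the class name `ShowCollectors` is made up, and the collector names you see will depend on your JVM vendor, version and flags (HotSpot reports names such as "PS Scavenge" and "PS MarkSweep" for the parallel collector), so no expected output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports the garbage collectors the running JVM is using, with basic statistics.
class ShowCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) since JVM start; -1 means "not available".
            System.out.printf("%-25s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

On HotSpot, starting the JVM with `-XX:+PrintCommandLineFlags` is another quick way to see the flags ergonomics chose.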

You can think of garbage collection as being like having a live-in maid to clean your house. She is there all the time, but she doesn’t follow you around picking up after you, because that is not the most effective way to clean a house. Instead, whenever the house gets above a certain threshold of ‘dirtiness’, she blitzes the whole house, top to bottom, before going back to reading her copy of Hello magazine. Java garbage collection takes a similar approach: rather than running constantly and cleaning up behind you, it is triggered by certain events, such as a memory area filling up. When that happens, it pauses the execution of application code (to ensure consistency) in what is called a ‘stop the world’ event, and clears up as much memory as it can.
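
The "maid" strategy can be sketched as a toy allocator that never cleans up on individual allocations and only runs a whole-heap clean-up once occupancy reaches a threshold. This is a simplified model, not how the JVM is implemented: `ThresholdGcDemo`, `allocate()` and `collectionsFor()` are hypothetical, "heap size" is counted in objects rather than bytes, and "still in use" is a flag instead of real reachability.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of threshold-triggered collection: allocations never clean anything up
// themselves; only when the heap reaches the threshold is a whole-heap collection run.
class ThresholdGcDemo {
    private final int threshold;                             // "dirtiness" level that triggers a clean-up
    private final List<Object> heap = new ArrayList<>();
    private final Set<Object> stillInUse = new HashSet<>(); // stands in for reachable objects
    private int collections = 0;

    ThresholdGcDemo(int threshold) {
        this.threshold = threshold;
    }

    void allocate(boolean stillNeeded) {
        if (heap.size() >= threshold) {
            collect();
        }
        Object obj = new Object();
        heap.add(obj);
        if (stillNeeded) {
            stillInUse.add(obj);
        }
    }

    // The whole clean-up runs in one go. In this single-threaded toy the caller simply
    // cannot continue until collect() returns, which plays the role of a stop-the-world
    // pause. (If everything were still in use, nothing would be freed; a real JVM would
    // eventually throw OutOfMemoryError in that situation.)
    private void collect() {
        collections++;
        heap.removeIf(obj -> !stillInUse.contains(obj));
    }

    // Allocates `longLived` objects that stay in use, then short-lived temporaries,
    // and reports how many collections were triggered.
    static int collectionsFor(int threshold, int allocations, int longLived) {
        ThresholdGcDemo gc = new ThresholdGcDemo(threshold);
        for (int i = 0; i < allocations; i++) {
            gc.allocate(i < longLived);
        }
        return gc.collections;
    }

    public static void main(String[] args) {
        // 1000 allocations, but only 9 clean-ups rather than one per allocation.
        System.out.println(collectionsFor(100, 1000, 0)); // 9
        // A single object that stays in use leaves less room, so clean-ups come sooner.
        System.out.println(collectionsFor(100, 1000, 1)); // 10
    }
}
```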

Talking about My Generation

The single most important advance in the development of garbage collectors came very early on, with the realisation that memory allocation in Java follows something called “the theory of infant mortality”: most objects become garbage very soon after they are allocated, with only a few objects hanging around for long periods of time. This happens because many objects are temporary, for example those created inside a loop, and once the loop exits they go out of scope and can be collected. To take advantage of this, modern JVM garbage collectors use “generational garbage collection”: they divide the memory space into several different areas, and run the garbage collector more frequently over the areas in which new objects are created. These areas are referred to as the new (or young) generation, made up of an Eden space and two survivor spaces, and the old (or tenured) generation. Objects are created in Eden; those that survive a collection are moved to the survivor spaces, and those that survive long enough are eventually moved on to the old generation.

The young generation contains a lot of garbage but can be kept quite small, so it can be collected using a fast but space-inefficient algorithm. These young generation collections are frequent but take just a few milliseconds to complete, so they are hardly noticeable. The old generation is normally much larger, and using this memory efficiently is more of a priority; however, garbage collection can occur much less frequently here, so we can use a slower but space-efficient algorithm. Garbage collections that collect the old generation as well as the young generation are called full garbage collections, and can take from hundreds of milliseconds to several seconds to run, which is far more noticeable to users. There is also a space called the permanent generation, which stores data that it was once thought would never become garbage (such as class data).
Recent JVMs can also garbage collect the permanent generation, though, and Oracle’s JRockit JVM does away with it altogether.
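
The flow from Eden through the survivor spaces to the old generation can be sketched as a small simulation. This is a deliberately simplified model, not the JVM's implementation: `GenerationalDemo` and its fields are hypothetical, objects are either short-lived (dead by the next collection) or long-lived (never die), survivor spaces are unbounded, and the tenuring threshold of 2 is an arbitrary choice (HotSpot exposes a similar setting as `-XX:MaxTenuringThreshold`). The point to notice is that each minor collection only copies the few live objects; the large amount of garbage is simply left behind.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a generational heap: Eden, two survivor spaces and an old generation.
class GenerationalDemo {
    static final class Obj {
        final boolean longLived;
        int age; // number of minor collections survived

        Obj(boolean longLived) {
            this.longLived = longLived;
        }
    }

    final int edenCapacity;
    final int tenuringThreshold;
    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();
    List<Obj> toSurvivor = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    int minorGcs = 0;
    int objectsCopied = 0;

    GenerationalDemo(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    void allocate(boolean longLived) {
        if (eden.size() >= edenCapacity) {
            minorGc();
        }
        eden.add(new Obj(longLived)); // new objects are always created in Eden
    }

    // Copy live young objects into the empty survivor space (or promote them), then
    // treat Eden and the old survivor space as empty and swap the survivor roles.
    private void minorGc() {
        minorGcs++;
        evacuate(eden);
        evacuate(fromSurvivor);
        eden.clear();
        fromSurvivor.clear();
        List<Obj> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    // A real copying collector only visits live objects by tracing from roots; this toy
    // loops over everything because it has no object graph, but it only copies live ones.
    private void evacuate(List<Obj> space) {
        for (Obj o : space) {
            if (!o.longLived) {
                continue; // garbage: left behind, never copied
            }
            o.age++;
            objectsCopied++;
            if (o.age >= tenuringThreshold) {
                old.add(o);        // promoted (tenured) to the old generation
            } else {
                toSurvivor.add(o);
            }
        }
    }

    static GenerationalDemo run(int allocations) {
        GenerationalDemo heap = new GenerationalDemo(100, 2);
        for (int i = 0; i < allocations; i++) {
            heap.allocate(i % 10 == 0); // 1 in 10 objects outlives the rest
        }
        return heap;
    }

    public static void main(String[] args) {
        GenerationalDemo heap = run(1000);
        // prints "minor GCs=9, copied=170, survivors=10, old=80"
        System.out.printf("minor GCs=%d, copied=%d, survivors=%d, old=%d%n",
                heap.minorGcs, heap.objectsCopied, heap.fromSurvivor.size(), heap.old.size());
    }
}
```

Across nine minor collections, 810 dead objects are discarded without ever being copied, while only 170 copies are made of the long-lived ones before they settle in the old generation.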

Garbage collection tuning is therefore about picking the right algorithms for the various memory spaces based on the rate of object creation and the number of retained objects (together with the specifications of the machine), sizing the memory spaces correctly, and aiming to keep the number of full garbage collections low.
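
A useful back-of-envelope check when sizing the memory spaces: in a steady state, Eden fills at the allocation rate, and the free part of the old generation fills at the rate objects are promoted into it. The helpers below are a rough sketch under that steady-state assumption; `GcBudget` and its methods are hypothetical, and the numbers in `main` are made up for illustration rather than measured from any real application.

```java
// Back-of-envelope GC frequency estimates, assuming steady allocation and promotion rates.
class GcBudget {
    // Seconds between young collections: Eden fills up at the allocation rate.
    static double secondsBetweenYoungGcs(double edenMb, double allocationMbPerSec) {
        return edenMb / allocationMbPerSec;
    }

    // Seconds between full collections: the free part of the old generation
    // fills up at the promotion rate.
    static double secondsBetweenFullGcs(double oldGenMb, double liveOldMb, double promotionMbPerSec) {
        return (oldGenMb - liveOldMb) / promotionMbPerSec;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 256 MB Eden, 512 MB/s allocation,
        // 2048 MB old generation holding 1024 MB of live data, 2 MB/s promotion.
        System.out.println(secondsBetweenYoungGcs(256, 512));     // 0.5
        System.out.println(secondsBetweenFullGcs(2048, 1024, 2)); // 512.0
    }
}
```

The second formula shows why keeping full collections rare is mostly about the promotion rate: halving it (for example by giving medium-lived objects enough young generation space to die there) doubles the interval between full collections.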

Down the rabbit hole

Hopefully this has given you some interesting background on the kind of complexity and features that make up a modern Java Virtual Machine, and inspired you to learn more. If you are interested, we are running a webinar on this subject on Friday 24th August 2012, which you can register for here: https://www4.gotomeeting.com/register/528726463

Matt Brasier
@mbrasier
