In any business, it is wise not only to plan for outages in production but to expect them. When problems do occur, particularly in production systems, time to resolution is always key, whether that means the time taken to restore the affected system to a stable, consistent and usable state, or the time taken for a root cause analysis to determine the causes and the preventative measures to put in place.
When opening a support ticket, then, it is very important to reduce the amount of time wasted by waiting for support to tell you what data they need to diagnose the problem.
This post provides a basic guide to some useful first steps following a Java OutOfMemoryError, which should help get things moving quickly.
What data should I collect?
In short: as much as possible! In an ideal world, all possible logs would be available for such problems but, since these errors occur mainly in production systems, running your server with maximum tracing on is not an option! Here is a list of things which are very useful to provide if you can:
Heapdumps and javacores
- The JVM should be set to generate heapdumps and javacores following an OOM event.
Verbose GC output
- You will find verbose GC output in the native_stderr.log file, but it can be configured to log to a different file without much trouble.
Any other server logs
- If any recurring errors contributed to the OOM, it is likely they will have been logged in some way. The data contained in these logs will very likely be minimal, but it’s worth providing them to support anyway.
How to prepare for a reoccurrence
Make sure the JVM is set to provide a heapdump on OutOfMemory errors.
- This is not a default setting on Sun’s JVM! It involves setting the -XX:+HeapDumpOnOutOfMemoryError command line flag. More information can be found on the wiki page for Memory Analyzer, a tool we will look at later in this article.
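If you are unsure whether a running JVM was started with this flag, one way to check from inside the process is sketched below. This is a minimal sketch, assuming a HotSpot-based JVM on Java 7 or later, where the com.sun.management.HotSpotDiagnosticMXBean is available; the class name HeapDumpFlagCheck is hypothetical and chosen only for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; the name is illustrative only.
class HeapDumpFlagCheck {

    // Returns the current value of the HeapDumpOnOutOfMemoryError VM option
    // as a string ("true" or "false"). Assumes a HotSpot-based JVM.
    static String heapDumpOnOomSetting() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + heapDumpOnOomSetting());
    }
}
```

Other JVMs (IBM’s, for example) expose their dump settings differently, so treat this as a HotSpot-only check.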
Turn on VerboseGC if it’s not on already
- Producing verbose GC output is very lightweight and makes a negligible difference to performance in almost all scenarios. Most of the work needed to produce the output is done by the garbage collector anyway, so setting -verbose:gc just writes this data to a log file.
- JRockit supplies extra flags for more fine-grained control over what is logged. Oracle has more information in their documentation.
- The JDK provides a useful tool called VisualVM which allows live monitoring of the heap. See below for more details.
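For a quick look at the kind of figures verbose GC reports, the standard java.lang.management API can be queried from inside the application. The following is only a rough sketch and not a substitute for the verbose GC log, since it shows cumulative totals rather than one entry per collection; the class name GcSnapshot is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class; the name is illustrative only.
class GcSnapshot {

    // Builds a short text summary of current heap usage and of the
    // cumulative collection count and time for each garbage collector.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append(String.format("heap used=%d bytes, committed=%d bytes, max=%d bytes%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax()));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are -1 when the collector does not report them.
            sb.append(String.format("collector=%s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Logging such a snapshot at intervals can show whether heap usage keeps climbing after collections, which is the pattern the GC tools described later make visible graphically.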
Supply as much information about how the affected system is set up as possible
Have you deployed any new applications to the server?
- The problem could lie in the application code, or the new application may have uncovered an underlying issue on the server. If it’s possible to roll back these deployments, doing so may, in some cases, restore some stability.
Has there been any increased load on the system?
- This could be due to a marketing campaign, or seasonal changes to your business rather than a technical change.
Any updates to the system? What is the current software level? (Including minor version?)
- Are there any fixes or patches related to memory or performance that you are missing?
What network topology is implemented?
- This is, perhaps, of less importance but it is still useful to know what, if any, load balancing is available to the problem server.
Does the system handle large message sizes or large volumes of messages/transactions?
- This sort of information is helpful in the diagnosis of heap exhaustion.
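Some of the JVM-level details above (software level, heap limit, start-up flags) can be gathered programmatically and attached to the support ticket. The sketch below uses only the standard RuntimeMXBean and system properties; the class name JvmEnvironmentReport and the key names in the map are hypothetical, and application-level details such as deployments, load and topology still have to be supplied by hand.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name and the map keys are illustrative only.
class JvmEnvironmentReport {

    // Collects basic facts about the running JVM in insertion order.
    static Map<String, String> collect() {
        RuntimeMXBean rt = ManagementFactory.getRuntimeMXBean();
        Map<String, String> info = new LinkedHashMap<>();
        info.put("java.version", System.getProperty("java.version"));
        info.put("java.vendor", System.getProperty("java.vendor"));
        info.put("vm.name", rt.getVmName());
        info.put("vm.version", rt.getVmVersion());
        info.put("os", System.getProperty("os.name") + " " + System.getProperty("os.version"));
        // Upper bound on the heap the JVM will attempt to use.
        info.put("max.heap.bytes", String.valueOf(Runtime.getRuntime().maxMemory()));
        // Flags passed to the JVM at start-up, such as -Xmx or -verbose:gc.
        info.put("jvm.args", String.join(" ", rt.getInputArguments()));
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```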
Use freely available tools to perform a basic analysis of the problem while waiting
If you find yourself at a loose end after support has been engaged, there are a lot of tools available for you to investigate the problem yourself. You may even spot the answer before they do! Going into much detail about these is outside the scope of this article, but links are provided to more comprehensive articles on those you find interesting.
- Memory Analyzer can give reports and summary tables of heapdumps (HPROF or IBM formatted). It can be used as an Eclipse plugin or through the IBM Support Assistant (ISA).
- IBM Garbage Collection and Memory Visualizer (GCMV) is a very useful dynamic GC visualiser. It provides graphs showing heap usage over time and can quickly show usage patterns. It will also generate a report giving recommendations for best practices based on your JVM’s behaviour.
- IBM Pattern Modelling and Analysis Tool (PMAT) for the garbage collector is a more static tool with an emphasis squarely on improving performance, rather than fixing a problem.
- IBM Thread and Monitor Dump Analyzer (TMDA) for javacores is a very useful tool which can work together with a tool like Memory Analyzer to tie hung threads to objects taking up a large percentage of the heap. Deadlocks can often be uncovered like this, though they are never easy to spot!
- Heap Analyzer is a very good stand-alone alternative to Memory Analyzer.
- VisualVM is a part of the JDK from 1.6 update 7 onwards. More information on how to use it on local and remote VMs is available in the comprehensive documentation.
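If you would like a heapdump to explore in one of these tools without waiting for the next OutOfMemoryError, a HotSpot-based JVM can be asked to write one on demand. This is a minimal sketch, assuming a HotSpot JVM on Java 7 or later; the class name HeapDumpOnDemand and the file naming scheme are hypothetical. Taking a dump pauses the application while the heap is written, so be cautious about doing this on a busy production server.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the name is illustrative only.
class HeapDumpOnDemand {

    // Writes an HPROF heap dump into the given directory and returns its path.
    // The target file must not already exist, hence the timestamp in the name.
    static Path dump(Path directory) throws IOException {
        Path target = directory.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // true = dump only live (reachable) objects.
        bean.dumpHeap(target.toString(), true);
        return target;
    }

    // Convenience variant that dumps into a fresh temporary directory.
    static Path dumpToTempDir() {
        try {
            return dump(Files.createTempDirectory("heapdump"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap dump written to " + dumpToTempDir());
    }
}
```

The resulting .hprof file is in the HPROF format mentioned above and can be opened in Memory Analyzer.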
This is such a huge topic that one blog post could never cover everything in as much depth as it warrants. Hopefully though, despite that, you will feel more prepared the next time you get an OutOfMemory error!