8 November 2016

What is the future of Java EE?

Following Oracle's announcements on the future of Java EE at JavaOne in September - and the subsequent community feedback that was surveyed through to mid-October - c2b2 Head of Profession Matt Brasier looks at the key players in the Java EE debate and offers his opinion on what the future might hold.




Some of you may have seen the significant coverage that Oracle's announcements around Java EE 8 at JavaOne have received in the last few weeks, and wondered what all the fuss was about.

Java EE standards have never been the fastest things out of the starting blocks, but their progress has always marched in the right direction - so why was this year any different? Well, there have been a number of things going on in the background that made these particular announcements from Oracle quite key to the future of Java EE.


First, some background: Java EE (and Java SE and ME) standards are driven by a thing called the Java Community Process (JCP). This process, which is run by Oracle, puts together panels of interested experts for the various areas where it is felt that a new standard is needed, and they work on a Java Specification Request (JSR) which eventually becomes part of the Java SE, Java EE or Java ME specification. Many of these working groups are led by Oracle employees, but contain members from a number of organisations, including the big players in the Java EE market, such as IBM and Red Hat.

For the last twelve months or more, members of many of these working groups had been telling the same story - that Oracle seemed to be unresponsive and that the Oracle JSR leads weren't helping move the standards forwards. The Java EE 8 standard in particular seemed to have come to a complete halt, with rumours abounding that Oracle had decided to let it die - and that they were turning their back on standards and community processes in favour of proprietary and vendor-specific technologies.

In response to this, individuals and organisations committed to the future of Java EE (with or without Oracle) formed groups such as the Java EE Guardians (javaee-guardians.io/) and MicroProfile (http://microprofile.io/). Their purpose was to drive forward advocacy of the Java EE platform, working on standards to keep Java EE moving forwards and relevant to modern application development. These groups were backed by a number of well-known names from the Java EE community - both individuals and organisations - and generated significant interest.

This movement and advocacy started to have an effect during the summer, with Oracle putting out statements to the effect that it continued to be committed to developing Java EE standards, and the Java EE 8 specification in particular. 

This apparent U-turn and re-engagement was reaffirmed at JavaOne this year, with a proposed timetable and more information about how development would happen. Of particular interest was the fact that Oracle put out a community questionnaire to look at which areas people felt were important to include in the Java EE 8 standard, and which areas - while important to development - don't need industry standardisation.

The impact of these events is far greater than it first appears, however, and it has been something of a wake-up call to the Java EE community. The key concerns raised are that we may not always be able to rely on Oracle to steer Java EE in the direction that the community would like to see it go, and that the stranglehold Oracle has over the JCP is unhealthy.

In particular, organisations such as IBM, Red Hat and Apache - who have significant investments in the Java EE ecosystem - are starting to question whether letting Oracle (whom they consider one of their key competitors in many respects) have that much power over the standards process is a good idea.



I think this has been instructive for the Java EE community in a number of respects: firstly, in showing the amount of power that Oracle holds over the current ways of doing things, and secondly, in showing that it is possible to make progress without relying on Oracle. It is this second aspect that I think will play the biggest role moving forward. Organisations such as MicroProfile have been seen to function well and fairly in producing a standard without needing oversight from Oracle, and whilst that standard may now go before the JCP, doing so would be a rubber-stamp exercise rather than a requirement.

If Oracle refuses to certify, or wishes to change, what the MicroProfile organisation agrees on, the group will continue without Oracle's support. Neither MicroProfile nor the Java EE Guardians show any signs of stepping back now that Oracle has stepped forward, and they have a key role to play in keeping Oracle focused on its stewardship of Java EE.

I hope that, moving forwards, we see more genuine community involvement in the development of standards, via groups like MicroProfile, to the point where the JCP either becomes a genuinely open standards body or is replaced by community-led organisations such as MicroProfile.

While the events of the last 12-18 months have been concerning at times, I think the outcome has strengthened Java EE and its place with developers - and whilst there are complaints about how all of this has caused delays of up to a year in getting Java EE 8 finalised, I think overall the benefits (in terms of a stronger community that is willing to challenge Oracle to make a better future for Java EE) outweigh any delays.




4 November 2016

JBUG - Java Microservice Development with Apache Camel

In the first of our November JBUGs, we welcomed Red Hat Senior Software Engineer James Netherton, who works in the Fuse engineering team, and who demonstrated how to develop, build and deploy microservices using Apache Camel, Spring Boot and WildFly Swarm for OpenShift / Kubernetes.

In this post, JBUG host, JBoss specialist and Senior Consultant, Brian Randell looks back at another hugely enjoyable evening of JBoss.




Back again at Skills Matter's CodeNode venue in London for another JBoss User Group - London. It was a damp, chilly autumnal evening - so where better to spend it than listening to and talking tech with like-minded professionals, followed by a bit of networking over a drink and a slice of pizza.

This JBUG was dedicated to microservices, and our guest speaker for the evening was James Netherton. James is a Red Hat Senior Software Engineer working in the Fuse engineering team and has done a lot of work with microservices using Apache Camel, Spring Boot and WildFly Swarm with Kubernetes and OpenShift - so it promised to be an informative night.

James set the scene by talking about Apache Camel - a framework that allows you to integrate disparate systems. Although now 10 years old, Camel is very actively developed and has a good community around it. It supports over 250 components, allowing it to easily implement Enterprise Integration Patterns (EIPs). It's also easy to understand, as its Domain-Specific Language (DSL) is simple to follow.
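
To give a flavour of that DSL, here's a minimal sketch of a Camel route of my own (not from the talk) implementing the content-based router EIP - the file paths and class name are purely illustrative:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

public class OrderRouteExample
{
    public static void main(String[] args) throws Exception
    {
        Main main = new Main();
        main.addRouteBuilder(new RouteBuilder()
        {
            @Override
            public void configure()
            {
                // Content-based router EIP: pick up incoming order files and
                // route them to different folders depending on the payload.
                from("file:orders/in")
                    .choice()
                        .when(body().contains("priority"))
                            .to("file:orders/priority")
                        .otherwise()
                            .to("file:orders/standard");
            }
        });
        main.run();
    }
}

Even if you've never seen Camel before, you can read straight off what the route does - which is rather the point of the DSL.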

One of the Camel components relevant to the later demo was Zipkin. Zipkin enables you to gather timing data across your microservices so you can identify latency issues. Well worth a look.

The two Java microservices frameworks that James talked about were Spring (using Spring Boot) and WildFly Swarm. These both have bootstrap tools (Spring Initializr and the WildFly Swarm generator) that enable you to quickly create a project that utilises Camel, and hence in which you can define Camel routes.

Apache Camel has a lot of microservice support across components, data formats and EIPs. This can be configured in various ways - for example through configuration files, environment variables or a Kubernetes ConfigMap.


James then took us through a typical workflow for creating and deploying a Camel microservice. This is made easy by using fabric8 and the options it provides for Kubernetes.

Using the fabric8 Maven plugin you can easily build your project and generate Docker images and Kubernetes resources. The workflow then is to generate your resources (using fabric8:resource), build your images (using fabric8:build) and deploy your changes (using fabric8:deploy). Simple - and it gets you up and running as quickly as possible.
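
On the command line, that workflow boils down to something like the following (a sketch - your plugin version and project setup will vary):

mvn fabric8:resource    # generate the Kubernetes/OpenShift resource descriptors
mvn fabric8:build       # build the Docker image for the service
mvn fabric8:deploy      # push the resources and image to the cluster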

Camel has a lot of support for testing, and again this is made easier using Fabric8 Arquillian, with which you can deploy to OpenShift and Kubernetes and assert on what has been deployed.

In terms of setting up local clusters, whilst you can use Vagrant, Docker and the Red Hat CDK, James also suggested a couple of other possibilities - Minikube and Minishift - which will run a single-node cluster in a lightweight VM.

After a short break it was demo time.

The demo consisted of a name generator application, where a client generated a random name and POSTed it to a service to add a title. This in turn POSTed to another service, which added a surname and returned the result to the client.

Zipkin was used to retrieve the latency statistics, and Hystrix (with its dashboard viewed through Turbine) was used to provide resilience and fault tolerance to the components.

James showed us the building, starting and deploying of the components, with the services running on pods in Minishift. We could see in the client's logs that it received the result after the request had passed through the services. Through the Zipkin dashboard we could see the latency of the requests as their frequency was increased and decreased, simply by changing a variable in a properties file.

One of the services was then shut down, and we could see that thanks to Hystrix the environment didn't completely fail, while Zipkin showed errors accumulating as requests failed to reach the unavailable service. The service was then restarted, everything started working again, and Zipkin once more showed successful requests and latency information.

After all that it was certainly time for a drink and a bit of pizza.

Again a big thank you to James for presenting. Check out the presentation and try the demo from the links below.

There were a lot of libraries, components and applications in use, and it certainly opened the mind to lots of possibilities. If you're interested, see the Further Reading links below.


Until next time!

Further Reading


Apache Camel : http://camel.apache.org/
Kubernetes : http://kubernetes.io/
Arquillian : http://arquillian.org/


2 November 2016

Understanding Java Garbage Collection - Part Two

In this, the second of two articles on Java Garbage Collection, c2b2 middleware support specialist Andy Overton follows up his introduction to Garbage Collection with a look at the changes in memory management in Java 8, with the advent of Metaspace as a replacement for PermGen.





In my last post, I covered the basics of what Garbage Collection is - how it works, how to collect GC logs, and how to understand the data they contain. In this post, I'll look at the changes in memory management in Java 8 with the advent of the new Metaspace as a replacement for PermGen, explain what the changes mean, and then examine what happens when there are issues with both PermGen and with Metaspace.



What are PermGen and Metaspace?

Let's look first at what we're dealing with. Prior to Java 8, memory in Java was split into two main areas:


  • heap space
  • non-heap space
The heap is split further into generations - but since I covered that in the previous post, let's not go over the same ground here.

In Java you have Objects and Classes, and Objects are instantiations of Classes. In terms of memory allocation, object data is stored on the heap, and class data is stored in non-heap space. This non-heap space is also known as the permanent generation (or PermGen for short).

The following data is stored regarding classes:
  • The name of the class
  • The class methods
  • The class fields
  • Object arrays and type arrays associated with a class



In addition to this, other information is stored in PermGen regarding internal objects created by the JVM and compiler information.

Prior to Java 7, interned strings were also held in PermGen (from Java 7 onwards they live on the heap). As of Java 8, PermGen has been replaced by Metaspace. Whilst the same class data is held there, the way it works is different: whereas PermGen was held contiguous with the heap data, Metaspace data is held in native memory.

So what does this mean for your applications?

The issue we need to investigate is what happens when you create more classes than you have room for. Remember: prior to Java 8 this data is stored in PermGen, whereas in Java 8 it is stored in Metaspace - which is held in native memory.

First of all, let's see what happens when there is an issue with PermGen filling up; then we'll see what happens when we run the same code on Java 8 and the new Metaspace.
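
As an aside - not something used in the tests below, but handy - the jstat tool that ships with the JDK lets you watch the relevant pool from the command line; the main visible difference between the two JDKs is the column name:

jstat -gcutil <pid> 1000    # JDK 7: the P column shows PermGen utilisation (%)
                            # JDK 8: the M column shows Metaspace utilisation (%)

The 1000 is a sampling interval in milliseconds.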

I'll dip into Stack Overflow for a suitable piece of code:



This basically contains an infinite loop that continually creates new class loaders and loads classes with them, in order to fill the PermGen with dummy data.

Note: to run these tests you will need both JDK 1.7 and JDK 1.8.



package permgenfiller;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class PermgenFiller
{
    private static final int BUFFER = 1024;
    private List<Class<?>> classList = new ArrayList<Class<?>>();

    public static void main(String[] args)
    {
        PermgenFiller permgenFiller = new PermgenFiller();
        
        // Load just some class with class loaders until perm gen space fills.
        while (true)
        {
            permgenFiller.classLoader();
        }
    }

    private void classLoader()
    {
        ClassLoader classLoader = new ClassLoader()
        {
            public Class<?> loadClass(String classNameWithPackage) throws ClassNotFoundException
            {
                if (!classNameWithPackage.contains("DummyClass"))
                {
                    return super.loadClass(classNameWithPackage);
                }
                
                String className = classNameWithPackage.replace('.', '/') + ".class";
                byte[] classData = null;
                InputStream inputStream = null;
                
                try
                {
                    inputStream = getResourceAsStream(className);
                    byte[] buffer = new byte[BUFFER];
                    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                    int bytesRead = -1;
                    
                    while ((bytesRead = inputStream.read(buffer, 0, BUFFER)) != -1)
                    {
                        outputStream.write(buffer, 0, bytesRead);
                    }
                    
                    classData = outputStream.toByteArray();
                }
                catch (IOException e)
                {
                    e.printStackTrace();
                }
                finally
                {
                    if (inputStream != null)
                    {
                        try
                        {
                            inputStream.close();
                        }
                        catch (Exception e)
                        {
                            e.printStackTrace();
                        }
                    }
                }
                
                Class<?> c = defineClass(classNameWithPackage, classData, 0, classData.length);
                resolveClass(c);
                classList.add(c);
                
                return c;
            }
        };

        try
        {
            Class.forName("permgenfiller.DummyClass", true, classLoader);
        }
        catch (ClassNotFoundException e)
        {
            e.printStackTrace();
        }
    }
}

We will also need a dummy class just for testing purposes. 

package permgenfiller;

public class DummyClass
{
    public DummyClass()
    {
        
    }
    
    public void someMethod()
    {
        int a;
        
        a = anotherMethod();
        
        a++;
    }
    
    private int anotherMethod()
    {
        int x = 5;
        
        return x;
    }
}



Run JVisualVM (with the Visual GC plugin) before running this code, in order to visualise what is happening with the memory. See the first post for details.

You may find that when running this code you get a 'GC overhead limit exceeded' error. This indicates that the garbage collector is running almost constantly and the Java program is making very slow progress - which, here, is understandable. To prevent it crashing, you can add the -XX:-UseGCOverheadLimit parameter.


So, run with the following command line (altering the JDK location and GC log location to your machine):



./jdk1.7.0_25/bin/java -verbose:gc -Xloggc:/home/andy/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:-UseGCOverheadLimit -XX:MaxPermSize=500M -jar PermgenFiller.jar 



What you should see is that the number of classes loaded rises rapidly, and in parallel the amount of PermGen used does the same (as we'd expect). With MaxPermSize set to 500M it will pretty quickly run out of space; towards the end it may load a few further classes, but it will eventually die with a 'java.lang.OutOfMemoryError: PermGen space' error. If you run JVisualVM you should see results similar to the following:





Obviously, here we are deliberately creating more classes than we can deal with, but the same issue can occur in the real world - it just tends to happen more slowly!


If you check the GC logs, things start off OK, but pretty rapidly we see a large number of full GCs occurring, with the PermGen being collected but little or no memory being reclaimed, before eventually we get the Out Of Memory error:



2016-04-12T11:14:57.541+0100: 104.571: [GC [PSYoungGen: 1713K->96K(143296K)] 286910K->285293K(817792K), 0.0735010 secs] [Times: user=0.28 sys=0.00, real=0.08 secs]

2016-04-12T11:14:57.615+0100: 104.645: [Full GC [PSYoungGen: 96K->0K(143296K)] [ParOldGen: 285197K->285190K(674496K)] 285293K->285190K(817792K) [PSPermGen: 511999K->511999K(512000K)], 1.5195020 secs] [Times: user=4.44 sys=0.02, real=1.52 secs]

2016-04-12T11:14:59.139+0100: 106.169: [GC [PSYoungGen: 1713K->32K(162944K)] 286903K->285222K(837440K), 0.0626180 secs] [Times: user=0.24 sys=0.00, real=0.06 secs]

2016-04-12T11:14:59.202+0100: 106.232: [Full GC [PSYoungGen: 32K->0K(162944K)] [ParOldGen: 285190K->285165K(674496K)] 285222K->285165K(837440K) [PSPermGen: 511999K->511999K(512000K)], 1.5558600 secs] [Times: user=4.64 sys=0.02, real=1.56 secs]

2016-04-12T11:15:00.758+0100: 107.788: [GC [PSYoungGen: 0K->0K(144576K)] 285165K->285165K(819072K), 0.0633300 secs] [Times: user=0.25 sys=0.00, real=0.06 secs]

2016-04-12T11:15:00.821+0100: 107.851: [Full GC [PSYoungGen: 0K->0K(144576K)] [ParOldGen: 285165K->285164K(674496K)] 285165K->285164K(819072K) [PSPermGen: 511999K->511999K(512000K)], 1.7865650 secs] [Times: user=5.53 sys=0.03, real=1.79 secs]

2016-04-12T11:15:02.625+0100: 109.655: [GC [PSYoungGen: 5603K->128K(161280K)] 290768K->285292K(835776K), 0.0681750 secs] [Times: user=0.25 sys=0.01, real=0.06 secs]



2016-04-12T11:15:02.693+0100: 109.723: [Full GC [PSYoungGen: 128K->0K(161280K)] [ParOldGen: 285164K->7121K(479744K)] 285292K->7121K(641024K) [PSPermGen: 511999K->7469




Java 8
Here you will need to run the JDK 1.8 version of JVisualVM, again with the Visual GC plugin.

WARNING! Running this code as-is may well crash your machine or make it highly unstable, so it's highly recommended that you only run it in a virtual machine.

I tested this on a VM with 4GB of RAM, and in order to see what is happening I'd recommend that, as well as running JVisualVM, you also run the Linux top command to watch what is happening to native memory.


Running the following:



./jdk1.8.0_77/bin/java -verbose:gc -Xloggc:/home/andy/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:-UseGCOverheadLimit -jar PermgenFiller.jar 




Things soon start to go wrong!

Within about 3 minutes the VM became very unresponsive, with even the simplest tasks - like opening a new terminal - taking an age. JVisualVM initially captured what was happening, but soon ground to a halt too. Opening any other program was almost impossible.

After 10 minutes the program crashed with the following error:



java.lang.OutOfMemoryError: Compressed class space 



You may find you get a different error, or that it doesn't crash at all. The first time I ran this I got this error; the second time I got the same, but with a full thread dump.


So, what happened? Well, if you watch the output of top you will see the machine's free memory rapidly declining and the memory used by the Java program rapidly growing. After a while you will see a process called kswapd start to appear. The purpose of kswapd is to keep the memory management system operating efficiently, so what we see here is Linux trying its best to stay on top of things, freeing up native memory and using swap space instead. However, we then just keep on filling up native memory.
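
If you want harder numbers than top can give you, recent JDK 8 builds can also report the JVM's own native memory use via Native Memory Tracking - my addition here, not something used in the original test:

./jdk1.8.0_77/bin/java -XX:NativeMemoryTracking=summary -jar PermgenFiller.jar
jcmd <pid> VM.native_memory summary    # per-area breakdown; the Class area covers Metaspace

Note that the tracking itself adds a small overhead.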



On top of this, if we look at the GC logs we see very long pause times, and that the final last-ditch collection (a last-ditch collection is triggered if an allocation fails and the memory pool cannot be expanded) took over 6 minutes (397.3261472 secs)!



2016-04-11T17:05:14.042+0100: 201.739: [GC (Allocation Failure) [PSYoungGen: 178560K->43584K(229376K)] 573262K->450830K(735232K), 1.2642069 secs] [Times: user=0.25 sys=0.15, real=1.27 secs]

2016-04-11T17:05:27.618+0100: 215.314: [GC (Allocation Failure) [PSYoungGen: 181312K->31680K(237056K)] 588558K->465926K(742912K), 2.1491851 secs] [Times: user=0.38 sys=0.04, real=2.15 secs]

2016-04-11T17:05:40.364+0100: 228.061: [GC (Metadata GC Threshold) [PSYoungGen: 153289K->14496K(240640K)] 587536K->478878K(746496K), 0.4647120 secs] [Times: user=0.66 sys=0.85, real=0.46 secs]

2016-04-11T17:05:40.829+0100: 228.526: [Full GC (Metadata GC Threshold) [PSYoungGen: 14496K->0K(240640K)] [ParOldGen: 464382K->476262K(674816K)] 478878K->476262K(915456K), [Metaspace: 802870K->802870K(3151872K)], 330.5048203 secs] [Times: user=15.28 sys=37.65, real=330.60 secs]

2016-04-11T17:11:11.457+0100: 559.153: [GC (Last ditch collection) [PSYoungGen: 0K->0K(243200K)] 476262K->476262K(918016K), 9.3730600 secs] [Times: user=22.10 sys=0.94, real=9.37 secs]



2016-04-11T17:11:20.830+0100: 568.527: [Full GC (Last ditch collection) [PSYoungGen: 0K->0K(243200K)] [ParOldGen: 476262K->475790K(674816K)] 476262K->475790K(918016K), [Metaspace: 802870K->802714K(3151872K)], 397.3261472 secs] [Times: user=272.60 sys=63.82, real=397.43 secs]


In order to prevent the Compressed class space error I tried adding the flag:



-XX:-UseCompressedClassPointers 


I ran the code again and left it running overnight. Now, although the code kept running, when I checked the GC logs I noted some rather large pause times… I'm not sure there are many users (or bosses) who would be happy with 97-minute pauses!


2016-04-12T06:10:42.097+0100: 43932.613: [Full GC (Ergonomics) [PSYoungGen: 137728K->130135K(235008K)] [ParOldGen: 674330K->674330K(674816K)] 812058K->804466K(909824K), [Metaspace: 1206329K->1206283K(3254272K)], 5873.9990681 secs] [Times: user=540.71 sys=464.05, real=5874.05 secs]



2016-04-12T07:53:02.076+0100: 50072.576: [Full GC (Ergonomics) [PSYoungGen: 137728K->130340K(235008K)] [ParOldGen: 674330K->674330K(674816K)] 812058K->804670K(909824K), [Metaspace: 1206616K->1206616K(3254272K)], 2014.0357872 secs] [Times: user=3278.56 sys=216.07, real=2014.10 secs]


So, what's the issue?

Metaspace uses native memory, and by default its use is unbounded - so it will use up as much native memory as it can. The OS in turn does its best to keep running, freeing what native memory it can and falling back on swap space. The garbage collector tries its best to free up memory, but takes longer and longer as the number of classes grows - and it can't collect them, because our class loaders keep every class reachable.

To prevent this, you need to set a bound on the maximum size that the Metaspace can grow to. To do this, simply add the following flag:



-XX:MaxMetaspaceSize=<size>


Run with this set to 1GB:



./jdk1.8.0_77/bin/java -verbose:gc -Xloggc:/home/andy/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:-UseGCOverheadLimit -XX:MaxMetaspaceSize=1G -jar PermgenFiller.jar 



After around 30 seconds the program crashes with the following error:


Exception in thread "main" java.lang.OutOfMemoryError: Metaspace

Whilst this is not ideal, it's much better than continuing to run on a system that has ground to a virtual halt.

Conclusion

In essence: if you're using Java 8, don't use the default settings - set a bound on the size that Metaspace can grow to. Memory leaks are bad enough when they cause your application to crash, but once they start to use up native memory things can become much worse, and OS behaviour can become very erratic.


As with all things, make sure you test, test and test again so that your applications run as expected. Also, ensure you have a good monitoring solution in place, and keep track of what your application is doing and what resources it's using.




25 October 2016

Understanding Java Garbage Collection - Part One

In this post, c2b2 middleware support specialist Andy Overton takes an introductory look at Java Garbage Collection. Intended to offer a basic understanding of how Garbage Collection works, the post focuses on how to collect GC logs and on understanding the data they contain.



Before we go any further, let me first say that this post is intended as an introduction to the subject of Garbage Collection - so I won't be going into details about log analysis or the different types of Garbage Collection. Also, note that things have changed somewhat in Java 8 - and I'll look into that more deeply in a future post.


What is Garbage Collection - and how does it work?


In languages like C and C++, memory management is left to the developer (this is known as manual memory management). Languages such as Java and C# use Garbage Collection (GC) to perform automatic memory management - and by Garbage Collection I mean the process of reclaiming memory occupied by objects that are no longer in use by your code.

Unsurprisingly, because we provide middleware support services to customers with Java EE infrastructure, I'll be concentrating on GC in Java environments.

In Java, memory is separated into three areas:

  • Heap Memory - This is used to store the Java objects your program uses
  • Non-Heap Memory (also known as the Permanent Generation) - This is used to store loaded classes and other meta-data
  • Other - This is used to store the JVM code itself, JVM internal structures etc.


The heap space is further split into the following areas:

Young Generation:

  • Eden Space - The pool from which memory is allocated for most new objects
  • Survivor Space - The pool containing objects that have survived garbage collection of the Eden space.


Old/Tenured Generation:

  • The pool containing objects that have existed for some time in the survivor space


The lifetime of an object

So, how does an object move through the spaces?

The following diagram illustrates this:



A new object, ObjectX, is created; it will initially be allocated in the Eden Space.

When the Eden Space becomes full (that is, the JVM is unable to allocate space for a new object), a Minor GC is triggered.

If ObjectX is still referenced within your code it will be moved to one of the Survivor spaces. Other referenced objects will also be moved to the survivor space.

Once the survivor space is full, all objects that cannot be collected (including ObjectX) will be moved to the other survivor space, leaving the first one empty.

This cycle will continue with objects that survive Minor GC being swapped from one survivor space to the other.

Each object has an age, which is incremented each time it survives a Minor GC.

Once its age reaches a certain number (known as the tenuring threshold), it will be moved to the Tenured Generation (by default the tenuring threshold is 15).

When the Tenured generation becomes full a Major GC is triggered. At this point if ObjectX is still referenced it will remain in the Tenured generation. If not, it will be garbage collected.
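
If you want to experiment with how quickly objects get promoted, the threshold can be tuned - a simple illustration below, with MyApp.jar standing in for whatever you're running (and note the JVM may use a lower effective threshold than the maximum you set):

java -XX:MaxTenuringThreshold=5 -jar MyApp.jar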

Why should I care?


I hear you ask, "Surely if the JVM is doing all of this automatically, I don't need to worry about it?"

Well, most of the time, no. However, when GC isn't being done efficiently it can cause some sizeable problems.

The key issue here is that GC is a stop-the-world event - meaning that when GC is taking place, your code stops running. A common misconception is that Minor GCs do not trigger stop-the-world pauses; this is untrue (although Minor GCs are optimised and, in general, the length of the pauses is negligible). Most of the time spent doing Minor GC goes not on the cleaning process but on copying objects between the different spaces. Therefore, if you have a large number of objects that survive Minor GC, these pauses can become much longer.

Often, people don't care about GC until it becomes an issue. When pauses are small and infrequent they're not noticed, and all is well with the world. However, when users start complaining that your application is running really slowly or your production server falls over with an Out Of Memory error you'll soon learn the importance of understanding GC and how to ensure that it is working in the way you require.

To see what happens when it goes wrong, compile the following code as a jar file:


import java.util.HashMap;
import java.util.Map;

/**
 *
 * @author Andy Overton
 */
public class MemoryFiller
{
    private Map<String, String> everGrowingMap;
    
    private String randomData = "Some random data that will fill up the memory pretty quickly! " +
            "Some random data that will fill up the memory pretty quickly!" +
            "Some random data that will fill up the memory pretty quickly!" +
            "Some random data that will fill up the memory pretty quickly!" +
            "Some random data that will fill up the memory pretty quickly!" +
            "Some random data that will fill up the memory pretty quickly!" +
            "Some random data that will fill up the memory pretty quickly!" +
            "Some random data that will fill up the memory pretty quickly!";
    
    public MemoryFiller()
    {
        everGrowingMap = new HashMap<>();
        
        try
        {
            for (long i = 0; i < 1000000000; i++)
            {
                everGrowingMap.put("String-" + i, randomData);
                
                Thread.sleep(1);
            }
        }
        catch (Throwable t)
        {
            if (t instanceof java.lang.OutOfMemoryError) 
            {
                System.out.println("OUT OF MEMORY ERROR! : " + t);
            }
            else
            {
                System.out.println("Exception : " + t);
            }
        }
    }
    
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {
        new MemoryFiller();
    }
}

To view what's happening when you run this, you can use a tool called JVisualVM. This useful tool comes with the JDK and can be found in the bin directory - but note that you'll need to add the Visual GC plugin, which is easily done via Tools > Plugins.

Once you have JVisualVM running, run the code with the following parameter in order to set the maximum heap size to 10MB:

java -Xmx10M -jar MemoryFiller.jar

Once the code is running you should see it appear under applications in JVisualVM. Click on it and select the Visual GC tab.

You should then see the Eden Space fill up pretty rapidly until GC occurs, with objects then being moved to one of the survivor spaces. This will happen over and over, with objects being promoted into the Old Gen, until eventually the Old Gen fills up completely and the program crashes with the following:

java.lang.OutOfMemoryError: Java heap space

In JVisualVM you should see something similar to this:



Now, obviously that code is designed specifically to fill up the memory, but I've seen a number of real-world examples of similar things that have caused production servers to crash, managers to question what happened, and support people to have no idea!

Gathering logs


OK, so you've (hopefully) got a basic understanding of how GC works and can see what happens when things go wrong - but how do we find out what the JVM is doing with regards to GC?

Well, the JVM already has all of the information you need - you simply need to ask for it. To turn on GC logging, we generally recommend the following JVM parameters:


-verbose:gc
-Xloggc:path_to_log/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps


In older versions of Java the GC log files weren't rotated, but built-in support for rotation has since been added to the HotSpot JVM. It is available from:

  • Java 6 Update 34
  • Java 7 Update 2


There are three new JVM flags that can be used to enable and configure rotation (a combined example follows the list):

  • -XX:+UseGCLogFileRotation - must be used with -Xloggc:<filename>
  • -XX:NumberOfGCLogFiles=<number of files> - must be >=1; the default is 1
  • -XX:GCLogFileSize=<number>M (or K) - the default is 512K
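
Putting those together, a run that keeps five 2MB log files might look like this (a sketch - adjust the path, file count and size to suit):

java -verbose:gc -Xloggc:/home/andy/gc.log -XX:+PrintGCDetails -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M -jar MemoryFiller.jar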


One of the questions we're often asked is 'Does turning GC logging on have an effect on performance?' The impact is negligible. Basically, if the impact is an issue, then the issue is with the GC itself rather than with logging what is happening.


Run the code again, adding in the GC logging parameters. You will need to change the path of the log file to suit your machine:


java -verbose:gc -Xloggc:/home/andy/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xmx10M -jar MemoryFiller.jar 


Once the program crashes you will be left with a log file that can be used to see what happened.

Reading the logs


Open up the log and you will find lines similar to the following:

2016-04-08T10:13:19.572+0100: 11.627: [GC [PSYoungGen: 2624K->368K(3008K)] 2624K->1000K(9856K), 0.0023300 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]

The logs contain information about each time GC takes place in the following format:

{System Time}: {JVM Time}: [{GC Type} [{Collector}: {Starting Size}->{Final Size}({Total Size})] {Heap Starting Size}->{Heap Final Size}({Heap Total Size}), {Time Taken} secs] [Times: user={user} sys={sys}, real={real} secs]

  • System Time - The time the GC event occurred
  • JVM Time - Time in seconds since the JVM started
  • GC Type - Either GC (minor) or Full GC (full)
  • Starting Size - Size of the space before GC
  • Final Size - Size of the space after GC
  • Total size - Total space size
  • Heap Starting Size - Size of the heap before GC
  • Heap Final Size - Size of the heap after GC
  • Heap Total size - Total heap size
  • Time Taken - How long the GC took in seconds
  • Times - Duration of GC in different categories (user, sys, real)


So, in the line above we get the following information:

  • System Time - 2016-04-08T10:13:19.572+0100
  • JVM Time - 11.627
  • GC Type - GC
  • Starting Size - 2624K
  • Final Size - 368K
  • Total size - 3008K
  • Heap Starting Size - 2624K
  • Heap Final Size - 1000K
  • Heap Total size - 9856K
  • Time Taken - 0.0023300 secs
  • Times - 0.01, 0.00, 0.00



As mentioned earlier, I'm not going to go into detail about analysing the logs, but it's useful to understand what they contain before you can begin to know what to look for.
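
To make that concrete, here's a quick-and-dirty sketch of my own (not a c2b2 tool) that pulls the headline numbers out of a minor GC line in the format above - it assumes the PSYoungGen collector and the exact flag set we used:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLineParser
{
    // Matches minor GC lines of the form shown above (-XX:+PrintGCDetails with date stamps).
    private static final Pattern MINOR_GC = Pattern.compile(
        "(\\S+): ([\\d.]+): \\[GC \\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] "
      + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    public static void main(String[] args)
    {
        String line = "2016-04-08T10:13:19.572+0100: 11.627: "
            + "[GC [PSYoungGen: 2624K->368K(3008K)] 2624K->1000K(9856K), 0.0023300 secs] "
            + "[Times: user=0.01 sys=0.00, real=0.00 secs]";

        Matcher m = MINOR_GC.matcher(line);
        if (m.find())
        {
            System.out.println("Occurred at : " + m.group(1));
            System.out.println("JVM uptime  : " + m.group(2) + "s");
            System.out.println("Young gen   : " + m.group(3) + "K -> " + m.group(4) + "K of " + m.group(5) + "K");
            System.out.println("Whole heap  : " + m.group(6) + "K -> " + m.group(7) + "K of " + m.group(8) + "K");
            System.out.println("Pause       : " + m.group(9) + "s");
        }
    }
}

Run against the sample line above, it prints each field on its own line - handy for eyeballing a log before reaching for a proper tool.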

There are a number of good tools out there that you can use for visualising the logs. One of the quickest and easiest is gceasy - which is available online at http://gceasy.io/

This allows you to upload your log file and get a quick analysis report.

Summary

Hopefully, you now have a basic understanding of what Garbage Collection is and how it works - and you can now collect GC logs, understand what data they contain, and know what it means.

In my next post I'll be taking a look at JDK 1.8 and the changes to memory management, with the replacement of PermGen by Metaspace.







12 October 2016

How to Cluster with ColdFusion


ColdFusion isn't one of the most commonly used application servers, but it's one that c2b2 Head of Support Claudio Salinitro stumbled upon during a troubleshooting engagement for one of our middleware support customers. With a remit to embrace new Java technologies, Claudio spent some time investigating ColdFusion, and here describes how to set up a simple cluster.




ColdFusion is definitely not one of the most popular enterprise application servers on the market, but despite a few weaknesses and a lack of good documentation, in the right scenario its small footprint and fast development time can make it a very good choice as part of a Java middleware infrastructure.

In this article, I'm going to cluster two instances of ColdFusion Server 2016 behind Apache (2.2.31) configured with mod_jk (1.2.41), according to the following logical architecture.





To accomplish this, I'm going to need...

  • 1x Load Balancer
    Whilst I'm using Apache with mod_jk, any other similar solution would do just fine.
  • 2x ColdFusion server instances
    Depending on your resources and needs, these could be on two different machines, or the same.


ColdFusion Configuration


ColdFusion doesn’t have a central admin server like other Java application servers; instead it offers a “cfusion” instance that is used as a repository for the default configuration, and to create instances and clusters.

So, from the “cfusion” admin web interface (CFIDE) of node2, create a ColdFusion instance named "instance2".

From the “cfusion” admin web interface (CFIDE) of node1, create a ColdFusion instance named "instance1" and register "instance2" of node2 as a remote instance.

Note - take care to identify the correct HTTP and AJP ports, and the JVM route (double-check the server.xml on node2).

From the “cfusion” admin web interface (CFIDE) of node1, create a cluster and add both "instance1" and "instance2" to it.

Remember to tick the “sticky sessions” and “session replication” options - and to take note of the multicast port...




Since version 10, ColdFusion no longer uses JRun but runs on top of Apache Tomcat - and for this reason the cluster configuration follows basically the same process as for a clean Tomcat installation.

In the [coldfusion_installation_dir]/instance2/runtime/conf/server.xml file add the following configuration between “</Host>” and “</Engine>”:



<Cluster channelSendOptions="8" className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <Manager className="org.apache.catalina.ha.session.DeltaManager" expireSessionsOnShutdown="false" notifyListenersOnReplication="true" />
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership address="228.0.0.4" port="45564" className="org.apache.catalina.tribes.membership.McastService" dropTime="3000" frequency="500"/>
    <Receiver selectorTimeout="5000" address="node2" autoBind="100" port="4001" className="org.apache.catalina.tribes.transport.nio.NioReceiver" maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>
  <Valve filter="" className="org.apache.catalina.ha.tcp.ReplicationValve"/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>



Check that the following have been set up correctly:
  • The membership address and port must be the same between all members of the cluster - so double check the server.xml of "instance1" to make sure they are the same.

  • The Receiver element's address must be the IP address or hostname of the node itself, and it must be reachable by the other members of the cluster (so no 127.0.0.1 or localhost) - see the example below.
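
For reference, the corresponding Receiver element in instance1's server.xml would differ only in its address - a sketch, assuming "node1" resolves to node1's own IP:

<Receiver selectorTimeout="5000" address="node1" autoBind="100" port="4001" className="org.apache.catalina.tribes.transport.nio.NioReceiver" maxThreads="6"/>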

Then edit the [coldfusion_installation_dir]/instance2/runtime/conf/context.xml and comment out the <Manager></Manager> section. In my case this was:


<!-- <Manager pathname="" /> -->


Do the same on node1.


Load Balancer Configuration


At the moment, the preferred way to configure Apache to work with ColdFusion is to use mod_jk. Adobe can automatically configure your Apache installation via the wsconfig tool - either during installation or afterwards - but this only works if wsconfig has access to the Apache configuration files; otherwise we can proceed manually, as follows.

The mod_jk binary is shipped in the ColdFusion installation directory. Look inside [coldfusion_installation_dir]/cfusion/runtime/lib/wsconfig.jar to find the binary for your Apache version and operating system.

Extract the binary into [apache_installation_dir]/modules.

Then edit the Apache configuration, adding the following:


LoadModule    jk_module  "[apache_installation_dir]/modules/mod_jk.so"

JkWorkerProperty worker.list=lb
JkWorkerProperty worker.instance1.type=ajp13
JkWorkerProperty worker.instance1.host=192.168.204.131
JkWorkerProperty worker.instance1.port=8012
JkWorkerProperty worker.instance1.connection_pool_timeout=60
JkWorkerProperty worker.instance2.type=ajp13
JkWorkerProperty worker.instance2.host=192.168.204.129
JkWorkerProperty worker.instance2.port=8012
JkWorkerProperty worker.instance2.connection_pool_timeout=60
JkWorkerProperty worker.lb.type=lb
JkWorkerProperty worker.lb.balance_workers=instance1,instance2
JkWorkerProperty worker.lb.sticky_session=false

JkMount /session_test/* lb


This is the standard configuration for mod_jk: we define a worker of type “lb” to balance the requests between the other two workers - one for node1 and one for node2.

Finally, we map the context /session_test to the load balancer worker, so that any request starting with /session_test will be balanced across the two ColdFusion nodes.


Configuration Test


A note on the “worker.lb.sticky_session” setting:

The default value is usually “true”, because it's preferable to keep the requests of the same client on the same server - and, depending on your applications, it may even be a requirement. In our case we need to test session replication and load balancing between the nodes, so setting it to “false” makes everything easier.

I’m going to use the session_test web application provided by Steven Erat in his No Frills Guide to CFMX61 Clustering post, but since this was created for ColdFusion 6, I had to update the code to run on Tomcat.

On both nodes, create the folder [coldfusion_installation_dir]/[node]/wwwroot/session_test, and inside it create the following five files:

Application.cfm:


<cfapplication name="j2ee_session_replication_test" sessionmanagement="Yes" clientmanagement="No">
<cfscript>
        System = createObject("java","java.lang.System");
</cfscript>


index.cfm:

<cfscript>
    session.currentTimestamp = timeformat(now(),"HH:mm:ss");
    message = "[#session.currentTimestamp#] [#createobject("component","CFIDE.adminapi.runtime").getinstancename()#] [New Session: #session.isnew()#] [id: #session.sessionid#]";
    System.out.println(message);
    WriteOutput(message);
</cfscript>
<br><br>System.out has written the above data to the console of the active jrun server
<br><br><a href="index.cfm">Refresh</a> | <a href="cgi.cfm">CGI</a> | <a href="sessionData.cfm"><cfif not isdefined("session.myData")>Create<cfelse>View</cfif> nested structure</a>
<br><br>
<cfif not isDefined("session.session_lock_init_time")>
    <cflock scope="SESSION" type="EXCLUSIVE" timeout="30">
        <cfset session.session_lock_init_time = timeformat(now(),"HH:mm:ss")>
        <cfset session.session_lock_init_servername = #createobject("component","CFIDE.adminapi.runtime").getinstancename()#>
    </cflock>
</cfif>
<cfif session.session_lock_init_servername neq #createobject("component","CFIDE.adminapi.runtime").getinstancename()#>
    <CFSET session.session_failedOver_to = #createobject("component","CFIDE.adminapi.runtime").getinstancename()#>
    <br><br>
    <strong><font color="red">
    Session has failed over
    <BR>from <cfoutput>#session.session_lock_init_servername#
    <BR>to #createobject("component","CFIDE.adminapi.runtime").getinstancename()#</cfoutput>
    </font></strong><br><br>
<cfelseif isDefined("session.session_failedOver_to")>
    <br><br><strong><font color="green">
    Session has been recovered to original server
    after a failover to <cfoutput>#session.session_failedOver_to#</cfoutput>
    </font></strong><br><br>
</cfif>
<cfdump var="#session#" label="CONTENTS OF SESSION STRUCTURE">


cgi.cfm:


<a href="index.cfm">Back</a><br><br>
<cfdump var="#cgi#" label="current file path: #getDirectoryFromPath(expandPath("*.*"))#">


session.cfm:


<a href="index.cfm">Index</a><br><br>
<cfdump var="#session#" label="session scope">


sessionData.cfm:


<a href="index.cfm">Back</a><br><br>

<cfscript>
    if(not isdefined("session.myData")){
        writeOutput('<font size="4" color="red">Creating nested session data...</font><br><br>');
        // create deep structure for replication
        a.time1 = now();
        a.time2 = now();
        b.time1 = now();
        b.time2 = now();
        session.myData["ab"]["a"] = a;
        session.myData["ab"]["b"] = b;
        session.myData["a"] = a;
        session.myData["b"] = b;
        session.myData["mydata_session_init_time"] = timeformat(now(),"HH:mm:ss");
        session.myData["mydata_session_init_servername"] = #createobject("component","CFIDE.adminapi.runtime").getinstancename()#;
    }
</cfscript>

<br><br><cfdump var="#session.myData#" label="CONTENTS OF SESSION.MYDATA">
<cfoutput>
<br><br>Current Time: #timeformat(now(),"HH:mm:ss")#
<br><br>Current Server: #createobject("component","CFIDE.adminapi.runtime").getinstancename()#
</cfoutput>


Then, make multiple attempts to access the following URL:

http://[load_balancer_host]/session_test/ 

and test that:

  1. The requests are balanced alternately between "instance1" and "instance2"
  2. The session created on the first request is replicated between the two nodes (the session id doesn't change)
  3. In case of failure of one of the two nodes, requests are sent only to the active node
  4. When the failed node becomes active again, the load balancer once again balances requests across both nodes.
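
A quick way to fire off those repeated requests is a small shell loop - purely illustrative; substitute your load balancer's host, and note that the cookie jar keeps the same session id across requests so you can verify replication:

for i in $(seq 1 10); do
  curl -s -b cookies.txt -c cookies.txt http://[load_balancer_host]/session_test/index.cfm
done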

With this setup you will have a simple but solid ColdFusion infrastructure, with high availability thanks to the session sharing. And in case of increased load you can easily scale horizontally, adding further nodes to the cluster and the load balancer.
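
On the load balancer side, adding a third node is just a matter of defining another worker and appending it to the balance list - a sketch, with a made-up IP for the new host:

JkWorkerProperty worker.instance3.type=ajp13
JkWorkerProperty worker.instance3.host=192.168.204.135
JkWorkerProperty worker.instance3.port=8012
JkWorkerProperty worker.instance3.connection_pool_timeout=60
JkWorkerProperty worker.lb.balance_workers=instance1,instance2,instance3

(The new instance would of course also need the Cluster configuration in its server.xml, with its own Receiver address.)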

And on a final note: to separate the clients' network traffic from the cluster's session replication traffic, the best way forward is to have two dedicated network interfaces.