So the first session of the day was:
Apache TomEE: a Java EE 6 Web Profile on Tomcat, by David Blevins
David is the founder of the TomEE project. He started the session with a poll on who uses Tomcat and who uses other JEE servers; not surprisingly, most people use both. About half the hall were using Spring and half JEE 5 or 6.
In case you didn't know, TomEE is a Web Profile certified Tomcat. It adds OpenJPA, OpenWebBeans, OpenEJB, MyFaces and CXF, plus ActiveMQ if need be. The aim is to be small, certified and still Tomcat (in that your Tomcat knowledge remains valid).
There are a number of flavours:
- TomEE (Web Profile certified)
- TomEE JAX-RS (Web Profile certified)
- TomEE Plus: adds JAX-WS, Connector and JMS (NOT certified)
TomEE is built by taking a Tomcat zip, unzipping it, and adding and modifying a few files: a couple of conf files, some bin files and a bunch of jars, including patching the endorsed directory to update JAXB and the annotations API.
The only changes to Tomcat are:
- Adding a listener into Tomcat which does all the JEE annotation scanning.
- Adding a Java agent to JAVA_OPTS to do the JPA byte code enhancement.
- Adding the endorsed directory to the command line (only needed on JDK 6).
TomEE is a small 27MB download, runs in 64MB of RAM, and works with all the usual Tomcat tools. TomEE boots in around 900ms.
David then demoed creating a servlet, adding @Singleton to a bean to get EJB support, and then adding JAX-WS and JAX-RS annotations to expose the bean as a SOAP web service and as a REST service in seconds, showing the power of JEE 6 and TomEE over plain Tomcat. As David said in the talk, it is difficult to justify building all this onto Tomcat yourself by pulling those capabilities into a war: TomEE does it all, scans the jars only once, and passes the TCK.
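The demo boiled down to something like this (a minimal sketch of my own; the class name and greeting are invented, but the annotations are the standard JEE 6 ones the talk showed). Deployed in a TomEE war, the one bean becomes an EJB, a SOAP endpoint and a REST resource:

```java
import javax.ejb.Singleton;
import javax.jws.WebService;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

// One class, three views: @Singleton makes it an EJB,
// @WebService exposes it via JAX-WS, @Path/@GET via JAX-RS.
@Singleton
@WebService
@Path("/greeting")
public class GreetingService {

    @GET
    public String greet() {
        return "Hello from TomEE";
    }
}
```

No web.xml or extra jars are needed in the war; TomEE's annotation scanning picks all of this up at deploy time.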
I wonder if you can create a TomEE template for a vFabric tcServer.
Advanced JVM Tuning
This was a packed session by some of the JVM engineers. Their proposed tuning methodology during development is:
- Monitor the OS
- Monitor the JVM
- Profile Application
- Modify Code
-XX:+TieredCompilation is not on by default; it uses Interpreter -> Client Compiler -> Server Compiler. The large majority of applications will see a performance benefit, but large applications (hundreds of thousands of classes) put pressure on the reserved code cache, so you may need to increase that.
-XX:+AlwaysPreTouch touches the memory of large heaps up front, so you won't take a hit faulting pages in later for heap that has been reserved but never used. Use it if you see performance hits on large heaps.
-XX:StringTableSize=n has a default size of 1009 and is used for interned strings, so if you intern many strings you can increase the size. -XX:+PrintStringTableStatistics will give you diagnostics.
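As a quick illustration of what lives in that table: String.intern() returns the canonical copy held in the string table, so two equal strings built at runtime intern to the same object (a minimal sketch, run with -ea to enable assertions):

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two runtime-built strings: equal contents, distinct objects.
        String a = new StringBuilder("java").append("one").toString();
        String b = new StringBuilder("java").append("one").toString();
        System.out.println(a == b);                   // false: different heap objects
        System.out.println(a.intern() == b.intern()); // true: same string table entry
    }
}
```

Every distinct interned string occupies a bucket in that fixed-size table, which is why heavy interning benefits from a larger -XX:StringTableSize.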
-XX:ReservedCodeCacheSize=n defaults to 64MB, or 96MB when running tiered compilation. If you run out of code cache, compilation stops; the output from -XX:+PrintCompilation will show that you've hit the limit.
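Pulled together, the compilation-related flags above might be applied like this (illustrative only; myapp.jar and the sizes are placeholders, not from the talk):

```shell
# Tiered compilation with a larger code cache, a bigger string
# table, and compiler diagnostics switched on.
java -XX:+TieredCompilation \
     -XX:ReservedCodeCacheSize=256m \
     -XX:StringTableSize=60013 \
     -XX:+PrintCompilation \
     -jar myapp.jar
```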
For compressed oops, heaps between 26GB and 32GB need extra calculations to work out addresses; to keep compressed oops on larger heaps (up to 64GB) you need to add -XX:ObjectAlignmentInBytes=16.
Some OS tuning that may help is tuning the Linux scheduler, for example using cgroups with CFS.
If there are multiple JVMs on one OS, try to reduce ParallelGCThreads so that the total number of GC threads across all JVMs is <= the number of hardware threads on the host.
Set java.io.tmpdir to the operating system temp directory, especially if you do a lot of bytecode manipulation.
The talk then switched to a section on GC tuning.
The number one goal is to collect as many objects as possible in the young generation, so size your young gen to make sure you collect young objects while they are still young. Then size the old gen to hold your long-term working set. If you have medium-lived objects, size the survivor spaces, but make sure they're not too big, otherwise you are wasting heap.
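In flag terms that sizing advice maps onto something like this (the numbers are invented for illustration; the right values come from observing your own tenuring distribution):

```shell
# 4GB heap with a 1GB young gen. SurvivorRatio=6 splits the young
# gen into 8 parts: 6 for eden, 1 for each survivor space, giving
# medium-lived objects room to die before promotion.
java -Xms4g -Xmx4g \
     -Xmn1g \
     -XX:SurvivorRatio=6 \
     -XX:+PrintTenuringDistribution \
     -jar myapp.jar
```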
If latency is important, there is a tradeoff between incremental and non-incremental collectors: incremental collectors have more uniform pauses, but throughput suffers.
Tuning for throughput
First, do nothing and let the JVM's ergonomics do their stuff. Monica suggested setting -Xmx a little higher than -Xms if you are hitting OOME with the two set the same: this gives you the full normal working heap plus a little headroom to prevent failures, which could be useful in critical systems. -XX:+PrintAdaptiveSizePolicy and -XX:+PrintTenuringDistribution will both help you view and tune the JVM ergonomics, along with -XX:+PrintGCDetails.
G1 for throughput: the defaults for min and max nursery are 20% and 80%, the max pause time target defaults to 200ms, and GCTimeRatio defaults to 9, so the GC can take 10% of the CPU time. Don't set aggressive pause time targets, as this increases the GC overhead: having asked for low pause times, concurrent collection will use more of your CPU.
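A throughput-oriented G1 command line following those defaults might look like this (a sketch; myapp.jar is a placeholder):

```shell
# G1 with the default, relaxed pause goal kept explicit. Lowering
# MaxGCPauseMillis aggressively would trade throughput for latency.
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:GCTimeRatio=9 \
     -XX:+PrintAdaptiveSizePolicy \
     -XX:+PrintGCDetails \
     -jar myapp.jar
```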
Tuning for Low Latency
You need to know your pauses by monitoring GC time. The young GC in CMS is a parallel collector, so if pauses are too long, reduce your young gen size; if they come too frequently, increase it. Promotion of objects in CMS is expensive: the more objects promoted and subsequently collected, the more fragmentation occurs, and fragmentation should be avoided. The occupancy flags need tuning; the JVM tries to optimise these, but you need to make sure the CMS collector starts in sufficient time not to run out of heap. When tuning G1 for low latency, set -XX:MaxGCPauseMillis and G1 will adjust young generation and heap sizes to meet the pause time target. Monitor mixed GCs to make sure they aren't killing your pause time goals; you can tune them using the heap occupancy flags.
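For the CMS case, making the collector start early enough looks roughly like this (the 75% threshold is an illustrative value, not one from the talk):

```shell
# CMS with an explicit initiating occupancy so the concurrent cycle
# starts before the old gen fills, avoiding a stop-the-world fallback.
java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -XX:+PrintGCDetails \
     -jar myapp.jar
```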
Tuning for Footprint
You need to understand your allocation rate, promotion rate and live data set size. These can be tracked with -XX:+PrintGCDetails.
The axes of GC tuning are throughput, footprint and low latency; you typically have to sacrifice one for the others.
GlassFish: From Clustering to the Cloud
This was a talk about clustering and high availability in GlassFish 4, plus PaaS-related features that may or may not be coming in version 4.
A useful feature in GlassFish 3 is active redeployment, which enables redeployment of an application while retaining session state.
GlassFish supports versioned deployments:
asadmin deploy --name=foo:BETA-1.0 foo.war
asadmin enable foo:BETA-1.1
asadmin undeploy foo:BETA-*
Sessions can be retained if the session is compatible. Full side-by-side deployment may come in GlassFish 4, with old users staying on the old version and new users directed onto the new version.
Clusters are configured from the DAS, which is the GlassFish admin server; other GlassFish instances are "nodes". Load balancing is via mod_jk and Apache. New instances can be added and the cluster changes dynamically. Session replication requires the usual <distributable/> tag.
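For reference, standing up such a cluster from the DAS looks roughly like this (the cluster, node and instance names are mine; check the asadmin docs for your version):

```shell
# Create a cluster with two instances on a registered node,
# start it, and deploy with session availability enabled.
asadmin create-cluster mycluster
asadmin create-instance --cluster mycluster --node localhost-node inst1
asadmin create-instance --cluster mycluster --node localhost-node inst2
asadmin start-cluster mycluster
asadmin deploy --target mycluster --availabilityenabled=true foo.war
```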
The talk covered some cloud features which may come to GlassFish in the JEE 8 timeframe, which is 2015! These were around deploying a cluster configuration with a single click when deploying a web application: for example, creating a load balancer and a number of GlassFish nodes in a cluster while deploying the application. Not sure what would happen if you want to deploy multiple applications to the same cluster!
The features seen were in an old promoted build of GlassFish 4 but are no longer present. There may be capabilities in asadmin to spin up extra virtualised guest machines hosting GlassFish. I'm not sure that GlassFish is the correct architectural component to spin up additional virtualised guests; in my view it is better to ensure middleware can be elastic, so that clusters can grow and shrink when additional resources are added via IaaS consoles.
Monitoring and Managing a Complex Data Cloud
Boris Livshutz, AppDynamics
A talk by AppDynamics looking at monitoring NoSQL stores in operations. He defined a data cloud as a lot of nodes storing a single data set, which can include NoSQL and sharded RDBMSs. An interesting but strange JavaOne talk, as it was about MySQL sharding with no Java content.
Dealing with JVM limitations in Apache Cassandra
Jonathan Ellis, CTO, DataStax
This talk was about Cassandra's use of the JVM and the problems encountered. The main pain points experienced were:
- Garbage Collection
- Platform specific code
One tip: -XX:+UseCondCardMark helps on multicore systems by reducing card table contention, and Cassandra now achieves around 30,000 writes per second.