28 July 2014

Getting the most out of WLDF Part 4: The Monitoring Dashboard

Read Part 1: "What is the WLDF?" here
Read Part 2: "Watches" here
Read Part 3: "Notifications" here

This is going to be a fairly short post, because there isn’t a huge amount to go into that we haven’t already covered!

The WLDF monitoring dashboard gives a visual representation of available metrics from WebLogic MBeans. If you know how to drag-and-drop, then you have all the technical ability you need.


In this blog post, I will refer to an annotated image with colour coded headings so you can see which part I’m talking about.

25 July 2014

MIDDLEWARE INSIGHT - C2B2 Newsletter Issue 18



Featured News
What's New in Oracle SOA Suite 12c? - read more 
What's Happening with Java EE? - read more 


JAVA EE / OPEN SOURCE
What's Happening with Java EE? Short interview with David Delabassee, see here 
Java Magazine: The Java Virtual Machine, see more on the Oracle Blog 
It's time to begin JMS 2.1! Read more here 
Java EE Concurrency API Tutorial, read the article by Francesco Marchioni 
HornetQ and ActiveMQ: Messaging - the next generation, find out more on Jaxenter.com 
Spring Boot 1.1.4 supports the first stable Tomcat 8 release, read more on Jaxenter.com 
RxJava + Java8 + Java EE 7 + Arquillian = Bliss, read the article by Alex Soto
The 5 Best New Features of the Raspberry Pi Model B+, read more on the Lifehacker website
Spotlight on GlassFish 4.0.1: #2 Simplifying GF distributions, read more on the Aquarium blog
Jersey SSE Capability in GlassFish 4.0.1, read the article by Abhishek Gupta

ORACLE
SOA Suite 12c is available for download, find out more on the SOA Community Blog
‘What's New in Oracle SOA Suite 12c?’ read the blog post by Andrew Pielage here  
‘What's New in Oracle SOA Suite 12c?’ Register for the C2B2 launch event in London on the 12th of September 
Oracle urges users to apply 113 patches pronto, read more on Jaxenter.com
Docker, Java EE 7, and Maven with WebLogic 12.1.3, read the article by Bruno Borges
'Testing Java EE Applications on WebLogic Using Arquillian' with Reza Rahman, join the Oracle Webcast on the 29th of July

JBOSS & RED HAT
Red Hat JBoss Data Grid 6.3 is now available! Read more on Arun Gupta’s Blog
JBoss-Docker shipping continues with launch of microsite, read more on Jaxenter.com 
Your tests assume that JBoss is up and running, read the article by Antonio Goncalves  
Rule the World - Practical Rules & BPM Development, join the London JBUG event on the 7th of August, find out more and register here
Red Hat JBoss BRMS & JBoss BPM Suite 6.0.2.GA released into the wild, read more on Eric Schabell’s blog  
Hibernate Hidden Gem: The Pooled-Lo Optimizer, read the article by Vlad Mihalcea 
Camel on JBoss EAP with Custom Modules, read the article by Christian Posta 
Red Hat JBoss Fuse - Getting Started, Home Loan Demo Part, read the article by Christina Lin

BIG DATA & DATA GRIDS
Processing on the Grid, read the article by Steve Millidge
James Governor In-Memory Data Grid: Less Disruptive NoSQL, see more on the Hazelcast Blog 
Designing a Data Architecture to Support both Fast and Big Data, read more on DZone
Scaling Big Data fabrics, read the article by Mike Bushong 
Industry Analyst Insight on How Big Data is Bigger Than Data, read more on the Pivotal blog 

18 July 2014

Processing on the Grid

If you ever have the luxury of designing a brand new Java application, there are many new, exciting and unfamiliar technologies to choose from: all the flavours of NoSQL stores, Data Grids, PaaS and IaaS, Java EE 7, REST and WebSockets. This alphabet soup of opportunity, combined with the many programming frameworks on both the server side and the client side, adds up to a tyranny of choice.

However, if, like me, you have to architect large-scale, server-side Java applications that support many thousands of users, then a number of requirements remain constant: the application you design must be high-performance, highly available, scalable and reliable.

It doesn’t matter how fancy your lovingly crafted JavaScript Web 2.0 user interface is: if it is slow or simply not available, nobody is going to use it. In this article I will try to demystify one of your choices, the Java Data Grid, and show how this technology can meet those constant non-functional requirements while taking advantage of the latest trends in hardware.

Latency: The performance killer


When building large-scale Java applications, the most likely cause of performance problems is latency. Latency is defined as the time delay between requesting an operation, like retrieving some data to process, and the operation occurring. Typical causes of latency in a distributed Java application are:

• IO latency pulling data from disk
• IO latency pulling data across the network
• Resource contention, for example on a distributed lock
• Garbage Collection pauses



For example, typical ping times across a network range from 57 μs on a local machine, through 300 μs on a local LAN segment, up to 100 ms from London to New York. When these ping times are combined with typical network data transfer rates (25–30 MB/s for 1 Gb Ethernet; 250–350 MB/s for 10 Gb Ethernet), a careful trade-off between operation frequency and data granularity must be made to achieve acceptable performance. If, say, you have 100 MB of data to process, the choice between making 100 calls across the network, each retrieving 1 MB, or a single call retrieving the full 100 MB depends on the network topology. Network latency is normally the cause of the developer cry, “It was fast on my machine!” Latency due to disk IO is also a problem: a typical SSD on a SATA 3.0 interface can only deliver a sustained data rate of 500–600 MB/s, so if you have gigabytes of data to process, disk latency will impact your application performance.
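
To make that trade-off concrete, here is a rough back-of-the-envelope sketch. It is an illustration only: it uses the approximate ping and bandwidth figures above, assumes the same 30 MB/s of usable bandwidth on both links, and ignores protocol overheads and parallel connections.

// Rough model: each call pays the round-trip latency once; the 100 MB payload
// costs the same in transfer time however it is split up.
public class LatencyTradeOff {

    static double seconds(int calls, double totalMb, double ping, double mbPerSec) {
        return calls * ping + totalMb / mbPerSec;
    }

    public static void main(String[] args) {
        double lanPing = 300e-6;   // 300 microseconds, local LAN segment
        double wanPing = 100e-3;   // 100 ms, London to New York
        double bandwidth = 30.0;   // MB/s, the 1 Gb Ethernet figure above

        System.out.printf("LAN: 100 x 1 MB = %.2fs, 1 x 100 MB = %.2fs%n",
                seconds(100, 100, lanPing, bandwidth), seconds(1, 100, lanPing, bandwidth));
        System.out.printf("WAN: 100 x 1 MB = %.2fs, 1 x 100 MB = %.2fs%n",
                seconds(100, 100, wanPing, bandwidth), seconds(1, 100, wanPing, bandwidth));
    }
}

On the LAN the two strategies come out within a few percent of each other (about 3.36 s versus 3.33 s), but over the high-latency link the hundred small calls pay the 100 ms round trip a hundred times, roughly 13.3 s against 3.4 s, which is why chatty, fine-grained remote calls hurt so much.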

The hardware component with the lowest latency is memory. Typical main memory bandwidth, ignoring cache hits, is around 3–5 GB/s per processor and scales with the number of CPUs: with two processors you might see around 10 GB/s, with four around 20 GB/s, and so on. John McCalpin maintains a memory benchmark called STREAM, hosted at the University of Virginia (http://www.cs.virginia.edu/stream/), which measures the memory throughput of many computers, with some systems with very large numbers of CPUs achieving TB/s. In conclusion:

Memory is FAST: And therefore, for high performance, you should process data in memory.
Network is SLOW: Therefore for high performance minimise network data transfer. 

The question then becomes: is it feasible to process many gigabytes of data in memory? With the cost of memory dropping it is now possible to buy a single server with 1 TB of memory for around £30K–£40K, and the latest SPARC servers ship with support for 32 TB of RAM, so Big Memory is here. The other fundamental shift in hardware is that the processing power of a single hardware thread is starting to reach a plateau, with manufacturers instead providing CPUs with many cores and many hardware threads. This trend forces us to design our Java applications in a fashion that can utilise the large number of hardware threads appearing in modern chips.
Parallel is the Future: For maximum performance and scalability you must support many hardware threads.

Data Grids


You may wonder what all this has to do with Java Data Grids. Well, Java Data Grids are designed to take advantage of these facts of modern computing: they enable you to store many hundreds of gigabytes of Java objects in memory and to process that data in parallel for high performance.

A Java Data Grid is essentially a distributed key-value store where the key space is split across a cluster of JVMs: each Java object stored in the grid has a primary copy on one JVM and a secondary copy on a different JVM. These duplicates ensure high availability; if a single JVM in the grid fails, no Java objects are lost.

The key benefits of the partitioned key space in a Data Grid, compared to a fully replicated clustered cache, are that the more JVMs you add the more data you can store, and that access times for individual keys are independent of the number of JVMs in the grid.

For example, if we have 20 JVM nodes in our Grid, each with 4 GB of free heap available for the storage of objects, then, taking the duplicates into account, we can store 40 GB of Java objects. If we add a further 20 JVM nodes we can store 80 GB. Access times to read and write objects stay constant because the grid goes directly to the JVM which owns the primary key space for the object we require.
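
How does the grid know which JVM owns a given key? The hypothetical sketch below shows the principle: the owner is computed from the key itself, so no central lookup is needed and the cost of a single access does not grow with the size of the grid. Real products use consistent hashing and rebalanced partition tables rather than this naive modulo scheme, and KeyRouter, primaryOwner and backupOwner are made-up names used purely for illustration.

import java.util.List;

// Hypothetical sketch only: the owner of a key is a pure function of the key
// and the current membership, so clients can route requests directly.
public class KeyRouter {

    private final List<String> members; // e.g. ["jvm-1", "jvm-2", ..., "jvm-20"]

    public KeyRouter(List<String> members) {
        this.members = members;
    }

    // A read or write of this key goes straight to its primary owner,
    // however many JVMs are in the grid.
    public String primaryOwner(Object key) {
        return members.get(Math.floorMod(key.hashCode(), members.size()));
    }

    // The backup copy is kept on a different JVM, so losing one JVM loses no data.
    public String backupOwner(Object key) {
        int partition = Math.floorMod(key.hashCode(), members.size());
        return members.get((partition + 1) % members.size());
    }
}

When membership changes, real grids move only the affected partitions between JVMs; the principle of deriving the owner from the key stays the same.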
JSR 107 defines a standards-based API to data grids which is very similar to the java.util.Map API, as shown in Listing 1. Many Data Grids also make use of Java NIO to store Java objects “off heap” in NIO buffers. This has the advantage that we can increase the memory available for storage without increasing the latency from garbage collection pause times.
 
Listing 1
// Uses the standard javax.cache (JSR 107) API: Caching, CacheManager,
// MutableConfiguration and Cache
public static void main(String[] args) {
    CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();
    MutableConfiguration<String, String> config = new MutableConfiguration<>();
    Cache<String, String> cache = cacheManager.createCache("C2B2", config);
    cache.put("Key", "Value");
    System.out.println(cache.get("Key"));
}
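
Returning to the “off heap” storage mentioned just before Listing 1, the hypothetical sketch below shows the underlying idea (OffHeapStore is a made-up class, not any product’s API): serialized values are copied into a direct ByteBuffer, whose memory lives outside the Java heap and is therefore never traversed by the garbage collector, while only a small on-heap index of offsets remains.

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: values are serialized elsewhere and handed in as bytes.
// The bytes are copied into a direct buffer outside the Java heap, so the
// garbage collector never scans them; only the small index map stays on heap.
public class OffHeapStore {

    private final ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}

    public void put(String key, byte[] serializedValue) {
        int offset = buffer.position();
        buffer.put(serializedValue);
        index.put(key, new int[] { offset, serializedValue.length });
    }

    public byte[] get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] copy = new byte[location[1]];
        ByteBuffer view = buffer.duplicate(); // independent position, same memory
        view.position(location[0]);
        view.get(copy);
        return copy;
    }
}

A real grid adds eviction, defragmentation and reuse of freed space, none of which this sketch attempts.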


Parallel processing on the Grid

The problem arises when we store many tens of GB of Java objects across the Grid in many JVMs and then want to run some processing across the whole data set. For example, we may store objects representing hotels and their availability on particular dates. What happens when we want to run a query like "find all the hotels in Paris with availability on Valentine's Day 2015"? If we follow the simple Map API approach we would need to run code like that shown in Listing 2.

However, the problem with this approach when accessing a Data Grid is that the objects are distributed according to their keys across a large number of JVMs, and every “get” call must serialize the object over the network to the requesting JVM. The code in Listing 2 could therefore pull tens of GB of data over the network which, as we saw earlier, is slow.

Thankfully, most Java Data Grid products allow you to turn the processing on its head: instead of pulling the data over to the code, they send the code to each of the Grid JVMs hosting the data and execute it in parallel in the local JVMs. As the code is typically very small, only a few KB of data needs to be sent across the network.

Processing runs in parallel across all the JVMs, making use of all the CPU cores at once. Example code for Oracle Coherence, a popular Data Grid product, which runs the Paris query across the Grid, is shown in Listings 3 and 4.

Listing 3 shows the code for a Coherence EntryProcessor which is the code that will be serialized across all the nodes in the data grid.

This EntryProcessor will check each hotel, as before, to see if there is availability for Valentine's Day but, unlike Listing 2, it does so in each JVM against local in-memory data. JSR 107 also has the concept of an EntryProcessor, so the approach is common to all Data Grid products.
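
For completeness, here is a minimal sketch of the same check written against the standard JSR 107 EntryProcessor API (HotelAvailabilityProcessor is a made-up name and Hotel is the hypothetical domain class used throughout the listings). One difference from Coherence is that the standard Cache.invokeAll() takes an explicit set of keys rather than a filter, so a grid-wide "search everything" query still tends to need vendor extensions:

import java.io.Serializable;
import java.util.Date;
import javax.cache.processor.EntryProcessor;
import javax.cache.processor.MutableEntry;

public class HotelAvailabilityProcessor
        implements EntryProcessor<String, Hotel, Hotel>, Serializable {

    private final Date availability;

    public HotelAvailabilityProcessor(Date availability) {
        this.availability = availability;
    }

    @Override
    public Hotel process(MutableEntry<String, Hotel> entry, Object... arguments) {
        Hotel hotel = entry.getValue();
        // Runs on the JVM that owns the entry, against local in-memory data
        return hotel != null && hotel.isAvailable(availability) ? hotel : null;
    }
}

It would be invoked with something like hotelCache.invokeAll(hotelKeys, new HotelAvailabilityProcessor(valentinesDay)), which returns a Map of EntryProcessorResult values, each produced on the JVM that owns the corresponding key.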

Listing 4 shows the Oracle Coherence code needed to send this processor across the Data Grid and execute it in parallel in all the grid JVMs. Processing data using EntryProcessors as shown in Listings 3 and 4 will result in much greater performance than access via the simple Cache API, as only a small amount of data is sent across the network and all CPU cores across all the JVMs are used to process the search.




Fast Data: Parallel processing on the Grid

As we’ve seen, using a Data Grid in your next application will enable you to store large volumes of Java objects in memory for high-performance access in a highly available fashion. It also gives you large-scale parallel processing capabilities that utilise all the CPU cores in the Grid to crunch through Java objects in parallel. Take a look at Data Grids next time you have a latency problem, or the next time you have the luxury of designing a brand new Java application.

Listing 2
public static void main(String[] args) {
    CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();
    MutableConfiguration<String, Hotel> config = new MutableConfiguration<>();
    Cache<String, Hotel> hotelCache = cacheManager.createCache("ParisHotels", config);
    Date valentinesDay = new Date(115, 1, 14); // 14 Feb 2015 (yes, this constructor is deprecated)
    // hotelNames is assumed to be loaded from elsewhere; every get() below
    // serializes a Hotel object across the network to this JVM
    for (String hotelName : hotelNames) {
        Hotel hotel = hotelCache.get(hotelName);
        if (hotel.isAvailable(valentinesDay)) {
            System.out.println("Hotel is available: " + hotel);
        }
    }
}
Listing 3
// Assumes Coherence imports: InvocableMap.EntryProcessor, InvocableMap.Entry and
// ListMap (com.tangosol.util.*), plus java.util Date, Map and Set, and the Hotel class
public class HotelSearch implements EntryProcessor {
    private final Date availability;

    HotelSearch(Date availability) {
        this.availability = availability;
    }

    // Runs against a single entry in the JVM that owns it
    public Object process(Entry entry) {
        Hotel hotel = (Hotel) entry.getValue();
        return hotel.isAvailable(availability) ? hotel : null;
    }

    // Runs against all entries owned by the local JVM, entirely on local
    // in-memory data, so no hotel objects cross the network
    public Map processAll(Set hotels) {
        Map mapResults = new ListMap();
        for (Object o : hotels) {
            Entry entry = (Entry) o;
            Hotel hotel = (Hotel) entry.getValue();
            if (hotel.isAvailable(availability)) {
                mapResults.put(entry.getKey(), hotel);
            }
        }
        return mapResults;
    }
}

Listing 4
// Assumes Coherence imports: CacheFactory, NamedCache and filter.AlwaysFilter
public static void main(String[] args) {
    // NamedCache is the Coherence handle on the distributed "ParisHotels" cache
    NamedCache hotelCache = CacheFactory.getCache("ParisHotels");
    Date valentinesDay = new Date(115, 1, 14); // 14 Feb 2015 (yes, this constructor is deprecated)
    // invokeAll() serializes HotelSearch to every grid JVM, runs it there in
    // parallel and merges the small per-JVM result maps back to the caller
    Map results = hotelCache.invokeAll(new AlwaysFilter(), new HotelSearch(valentinesDay));
}


This article was originally published in JAX Magazine #35, January 2014