3 April 2014

Getting the most out of WLDF Part 2: Watches

Read Part 1: What is the WLDF?

In this post, I'll be looking at using watches in WLDF.

What is a watch?
A watch, at its most basic, is simply a way to monitor one of three things:
  • MBeans
  • A server log
  • Instrumentation (event) data
To configure an instrumentation watch, you first need to know what instrumentation is, and how to instrument applications or servers, so we’ll put that to one side for now.

A server log watch is exactly that – a watch to monitor the server log for anything you want! For example, you could watch for all Critical severity log entries, entries which mention a particular server, or particular log message IDs.

An MBean watch relies on the Harvester to collect server runtime MBean metrics. The Harvester does not need to be configured separately for your watch to work, but do bear in mind that the data gathered will not be archived unless you configure the Harvester to collect it explicitly:

Note:
If you define a watch rule to monitor an MBean (or MBean attributes) that the Harvester is not configured to harvest, the watch will work. The Harvester will "implicitly" harvest values to satisfy the requirements set in the defined watch rules. However, data harvested in this way (that is, implicitly for a watch) will not be archived. See Chapter 7, "Configuring the Harvester for Metric Collection," for more information about the Harvester.

How do I make a watch?
I’ve already mentioned that Instrumentation watches require a little understanding of instrumentation first, so I won’t cover them here. If you’re already familiar with instrumentation, then configuring watches for your instrumented applications isn’t too tricky.


Step 1: Create a Diagnostic Module
The first step in creating watches is always the same. In the Domain Structure pane, select “Diagnostic Modules” under the “Diagnostics” entry. 

Select a diagnostic module if you’ve created one, or create a new one if not. Since creating a new module only requires you to name it (and provide an optional description), you’ll need to configure it once you’ve created it. The most important thing to do is to target it to the server you want to monitor.



Step 2: Create a Watch
Once the module is targeted, click the “Configuration” tab, then “Watches and Notifications”. In the lower pane, click the “New” button to create a new watch. Choose whether it should be a Server Log or Collected Metric watch, make sure that the “Enable Watch” checkbox is checked, then click “Next”.


Step 3: Define the Rule Expressions
You should now be presented with the following screen to create your watch rule expressions:


There are two ways to build rule expressions. Highlighted in red is a large text box which you will find is not editable. Clicking the “Edit” button will take you to a page where you can directly edit the rule as text. If you’re not familiar with WLDF query language rules, that might not seem like the most helpful feature but when you consider that it allows you to create a rule expression and copy the text for future reuse, the value becomes clear.

Highlighted in blue is the expression builder. Clicking “Add Expressions” will take you to a page where you can construct individual expressions. The most useful part of this feature is that it gives dropdown lists for the available attributes and operators:


The expression builder can also be used to arrange these expressions in a complex, but helpfully visual way, as shown below in an example server log watch:

Step 4: Configure an alarm
An alarm can be manually or automatically reset. A manually reset alarm will fire once and then stay disabled until someone intervenes to reset it. An automatically resetting alarm will reset after a period of time (specified in seconds). This period sets the minimum interval between triggers, and so the maximum frequency of the alarm. For example, if an event happens regularly every 10 seconds and an alarm is configured to reset every 11 seconds, then we will get this scenario:
  • The alarm is active and the event occurs, triggering and disabling the alarm.
  • 10s later, the event happens again, but the alarm is still disabled.
  • 1s later, the alarm is reset.
  • 9s later, the event happens again and this time the alarm is not disabled, so it triggers again.
This scenario is a little contrived, but it shows that setting the reset period to 11 seconds does not mean that the alarm will fire every 11 seconds. In this case, it fired with a 20 second gap.
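
The scenario above can be reproduced with a few lines of Python. This is only a rough simulation to show the timing, not how WLDF is implemented: the function name is made up for this post, and it assumes the alarm becomes active again exactly one reset period after it fires.

```python
def alarm_fire_times(event_times, reset_period):
    """Return the times (in seconds) at which an auto-reset alarm fires.

    Assumption: after firing at time t, the alarm ignores events until
    t + reset_period, then becomes active again.
    """
    fired = []
    active_from = None  # None means the alarm has not fired yet
    for t in sorted(event_times):
        if active_from is None or t >= active_from:
            fired.append(t)
            active_from = t + reset_period
    return fired


# An event every 10 seconds, with the alarm resetting after 11 seconds.
events = range(0, 50, 10)
print(alarm_fire_times(events, 11))  # [0, 20, 40]
```

Every second event is missed, so the alarm fires every 20 seconds rather than every 11.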


Step 5: Configure watch notifications
If you have already configured a notification, you can add it here. If not, just click save. We won’t cover notifications in this post, but they can always be added retrospectively to any watch.


Using the WLDF Query Language
We’ve actually already touched on the WLDF query language when we covered rule expressions. The example above shows how you can add expressions very easily to build complex rules for log watches so I won’t go over that again, other than to point out the WLDF Query Language reference page which contains a table showing all possible variables for log messages: http://docs.oracle.com/cd/E17904_01/web.1111/e13714/appendix_query.htm#g1062247
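
Because the “Edit” page accepts the rule as plain text, log watch rules can also be assembled outside the console and pasted in. Below is a small Python sketch of that idea. The helper names are hypothetical, the variable names (SEVERITY, MSGID) are the ones that appear in the log watch data later in this post, and the resulting strings should be checked against the query language reference before use.

```python
def condition(variable, operator, value):
    """Render one log condition, e.g. (SEVERITY = 'Error')."""
    if "'" in value:
        # Escaping embedded quotes is deliberately left out of this sketch.
        raise ValueError("values containing single quotes are not handled")
    return "(%s %s '%s')" % (variable, operator, value)


def combine(joiner, *parts):
    """Join conditions with AND or OR, wrapping the group in parentheses."""
    if joiner not in ("AND", "OR"):
        raise ValueError("joiner must be 'AND' or 'OR'")
    if not parts:
        raise ValueError("at least one condition is required")
    if len(parts) == 1:
        return parts[0]
    return "(" + (" %s " % joiner).join(parts) + ")"


critical = condition("SEVERITY", "=", "Critical")
socket_error = condition("MSGID", "=", "BEA-000403")
print(combine("OR", critical, socket_error))
# ((SEVERITY = 'Critical') OR (MSGID = 'BEA-000403'))
```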

MBean watches are a little more complex, although they can still be constructed with a step-by-step interface in the admin console or written as text. Either way, there are a huge number of possible MBeans to monitor, each with its own list of attributes which need to be specified in expressions. The full MBean reference, including attributes, is documented here: http://docs.oracle.com/cd/E12839_01/apirefs.1111/e13951/core/index.html

Browsing the “Runtime MBeans” topic in the list shows a number of available MBeans, one of which is the ServerRuntimeMBean, which has an attribute called OpenSocketsCurrentCount. I’ll show how to create an MBean watch expression which uses this attribute using the graphical interface.


Step 1
As in the log example, the first thing to do is to create a diagnostic module, if one does not already exist, and to create a new watch, this time choosing a Collected Metric watch. Once the watch is created, configure it as before and click “Add Expressions” on the Rule Expressions tab:


As you can see, I have already configured one expression to watch the number of currently open sessions. There are a few different parts to this rule, which apply to any MBean watch rule. The first three parts (red, blue and green) are enclosed in a dollar sign and braces ( ${…} ) because they contain special characters. The red part is the name of the server which holds the instance of the MBean to be queried. On my development server, I only have an AdminServer instance. Next, in blue, is the “type” which refers to the MBean to look up on the server. The green part, separated by a double forward slash (//), is the attribute name of the MBean. Finally, in orange, is the rule itself to apply to that MBean attribute.

Some of you reading this blog post might have already guessed exactly what the open sockets rule is going to look like: (${com.bea:Name=AdminServer,Type=ServerRuntime//OpenSocketsCurrentCount} >= 1). I’ll still show the graphical steps to get to that point, since they demonstrate how the GUI can be used effectively.
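
To make the parts of the rule concrete, here is a short Python sketch which splits a rule of this shape back into the server name, MBean type, attribute, operator and value. It is purely illustrative: the function and the regular expression are my own, and they only cover the simple single-comparison form shown in this post.

```python
import re

# ${<instance name>//<attribute>} <operator> <value>
RULE = re.compile(
    r"\$\{(?P<instance>.+?)//(?P<attribute>[^}]+)\}\s*"
    r"(?P<operator>>=|<=|!=|=|>|<)\s*(?P<value>[^\s)]+)"
)


def parse_metric_rule(rule):
    """Split a simple MBean watch rule into its component parts."""
    match = RULE.search(rule)
    if match is None:
        raise ValueError("not a simple ${instance//attribute} rule")
    # The instance name looks like com.bea:Name=AdminServer,Type=ServerRuntime
    _domain, _, properties = match.group("instance").partition(":")
    keys = {}
    for pair in properties.split(","):
        key, _, value = pair.partition("=")
        keys[key.strip()] = value.strip()
    return {
        "name": keys.get("Name"),
        "type": keys.get("Type"),
        "attribute": match.group("attribute").strip(),
        "operator": match.group("operator"),
        "value": match.group("value"),
    }


rule = "(${com.bea:Name=AdminServer,Type=ServerRuntime//OpenSocketsCurrentCount} >= 1)"
print(parse_metric_rule(rule))
```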


Step 2
After clicking “Add Expression”, you’ll need to choose whether you want to query the Domain Runtime, or the Server Runtime. We want to look at a value which is specific to a server instance, so choose Server Runtime and click “Next”. You will be presented with a dropdown box of available MBeans. The WebLogic MBean reference I linked to earlier shows all weblogic.management.runtime.* MBeans, so choose the ServerRuntimeMBean as shown:



Step 3
Clicking “Next” will allow you to choose the MBean instance on the correct server:



Step 4
Finally, we can select the MBean attribute and choose the operator and value to evaluate by:



Clicking Finish will show our completed WLDF expression:




Going Further
On my test server, I created two watches: one Server Log watch and one Collected Metric watch. Both are monitoring sockets: the first monitors the logs for any socket errors, and the second monitors the OpenSocketsCurrentCount attribute of the ServerRuntimeMBean and alerts when at least one socket is open.

Below is the output from the watches as I have configured them:

 ####<26-Mar-2014 11:17:30 o'clock GMT> <Notice> <Diagnostics> <Mike-PC> <AdminServer> <[ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1395832650884> <BEA-320068> <Watch 'SocketsOpen' with severity 'Notice' on server 'AdminServer' has triggered at 26-Mar-2014 11:17:30 o'clock GMT. Notification details:   
 WatchRuleType: Harvester   
 WatchRule: ${com.bea:Name=AdminServer,Type=ServerRuntime//OpenSocketsCurrentCount} >=1   
 WatchData: com.bea:Name=AdminServer,Type=ServerRuntime//OpenSocketsCurrentCount = 2   
 WatchAlarmType: AutomaticReset   
 WatchAlarmResetPeriod: 10000   
 >   
 ####<26-Mar-2014 11:17:39 o'clock GMT> <Error> <Socket> <Mike-PC> <AdminServer> <[ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1395832659211> <BEA-000403> <IOException occurred on socket: Socket[addr=/127.0.0.1,port=58139,localport=7001]  
  java.net.SocketException: recv failed: Descriptor not a socket.  
 java.net.SocketException: recv failed: Descriptor not a socket  
      at jrockit.net.SocketNativeIO.readBytesPinned(Native Method)  
      at jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:32)  
      at java.net.SocketInputStream.socketRead0(SocketInputStream.java)  
      at java.net.SocketInputStream.read(SocketInputStream.java:129)  
      at weblogic.socket.SocketMuxer.readFromSocket(SocketMuxer.java:980)  
      at weblogic.socket.SocketMuxer.readReadySocketOnce(SocketMuxer.java:922)  
      at weblogic.socket.SocketMuxer.readReadySocket(SocketMuxer.java:888)  
      at weblogic.socket.JavaSocketMuxer.processSockets(JavaSocketMuxer.java:339)  
      at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)  
      at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)  
      at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)  
 >   
 ####<26-Mar-2014 11:17:39 o'clock GMT> <Notice> <Diagnostics> <Mike-PC> <AdminServer> <[ACTIVE] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1395832659211> <BEA-320068> <Watch 'LogWatch' with severity 'Notice' on server 'AdminServer' has triggered at 26-Mar-2014 11:17:39 o'clock GMT. Notification details:   
 WatchRuleType: Log   
 WatchRule: (SEVERITY = 'Error')   
 WatchData: DATE = 26-Mar-2014 11:17:39 o'clock GMT SERVER = AdminServer MESSAGE = IOException occurred on socket: Socket[addr=/127.0.0.1,port=58139,localport=7001]  
  java.net.SocketException: recv failed: Descriptor not a socket.  
 java.net.SocketException: recv failed: Descriptor not a socket  
      at jrockit.net.SocketNativeIO.readBytesPinned(Native Method)  
      at jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:32)  
      at java.net.SocketInputStream.socketRead0(SocketInputStream.java)  
      at java.net.SocketInputStream.read(SocketInputStream.java:129)  
      at weblogic.socket.SocketMuxer.readFromSocket(SocketMuxer.java:980)  
      at weblogic.socket.SocketMuxer.readReadySocketOnce(SocketMuxer.java:922)  
      at weblogic.socket.SocketMuxer.readReadySocket(SocketMuxer.java:888)  
      at weblogic.socket.JavaSocketMuxer.processSockets(JavaSocketMuxer.java:339)  
      at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)  
      at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)  
      at weblogic.work.ExecuteThread.run(ExecuteThread.java:178)  
  SUBSYSTEM = Socket USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)' MSGID = BEA-000403 MACHINE = Mike-PC TXID = CONTEXTID = TIMESTAMP = 1395832659211   
 WatchAlarmType: AutomaticReset   
 WatchAlarmResetPeriod: 5000   
 >  
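
To see why the LogWatch rule matched the socket error, it helps to pull the first line of that log record apart. The Python sketch below does this by splitting on the "> <" separators. The field order is inferred from the output above and the names are borrowed from the WatchData section, so treat both as assumptions rather than a documented format; the function name is also my own.

```python
FIELDS = ["DATE", "SEVERITY", "SUBSYSTEM", "MACHINE", "SERVER", "THREAD",
          "USERID", "TXID", "CONTEXTID", "TIMESTAMP", "MSGID", "MESSAGE"]


def parse_log_header(line):
    """Split the first line of a server log record into named fields."""
    line = line.strip()
    if not line.startswith("####<"):
        raise ValueError("not the first line of a server log record")
    parts = line[len("####<"):].split("> <", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected number of fields")
    return dict(zip(FIELDS, parts))


# First line of the socket error record shown above.
sample = ("####<26-Mar-2014 11:17:39 o'clock GMT> <Error> <Socket> <Mike-PC> "
          "<AdminServer> <[ACTIVE] ExecuteThread: '3' for queue: "
          "'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> "
          "<1395832659211> <BEA-000403> <IOException occurred on socket: "
          "Socket[addr=/127.0.0.1,port=58139,localport=7001]")

record = parse_log_header(sample)
# The LogWatch rule was (SEVERITY = 'Error'), so this record matches.
print(record["SEVERITY"] == "Error", record["MSGID"])  # True BEA-000403
```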


As they are, these two watches are not too useful. They have alarms configured, but both just write to the server log! Since one of them is a watch on the server log anyway, why wouldn’t I just look at the server log to see when there were socket errors?

This is where notifications come in! I’ll cover notifications in a separate blog post.



