16 November 2011

Extending Hyperic with Simple Agent Plugins

Intro
Hyperic is a powerful monitoring and management platform with a bunch of functionality available out-of-the-box. Hyperic Agents report metrics to a Hyperic Server, which then exposes a web portal to the Hyperic view of your environment.


Hyperic Agents monitor resources based on plugins that reside locally within the Agent. By default many plugins are available for monitoring things like MySQL, Apache Tomcat and OS information (even more are in the vFabric enterprise edition). 


Hyperic really adds value with its ability to set proactive alerts on the data being collected, and to turn these into actions that can be invoked across your infrastructure. Think of being able to redirect a load balancer, or spawn new server instances to cope with demand, and you're on the right lines.


This blog post will focus on providing a getting started tutorial to begin writing your own Agent plugins. We'll cover everything, presuming that you have a working Hyperic Agent and Server installed and ready to go! In particular I'm concerned with monitoring a single running executable process, one which Hyperic by default would not pick up. We'll then look at putting some actions in place to react to the absence of this resource. This is a real-world scenario that reflects the sort of day to day operations where automation has tangible benefits.


Getting Started
We'll begin the tutorial by reviewing what we intend to achieve and seeing what documentation is already available to help us.


Docs
The Hyperic docs are pretty good. This page shows that the basic concepts to understand are: 
  • Auto Discovery - Agent plugins have a mechanism of automatically finding the resource to monitor
  • Monitoring - Agent plugins are configured with the information to collect and send back to the Hyperic Server, and how to collect it
  • Control - optionally, plugins can define operations to perform on a resource
We can see that plugins are grouped into typed categories depending on how they work. Base support classes are available in Hyperic to help bootstrap plugin developers; these classes allow us to simply define XML descriptors and wire functionality together.
The support types available are:
    - Scripting - for interacting with your own scripts
    - SNMP - for dealing with SNMP messages to external devices
    - JMX - for interfacing with Java processes via available JMX instrumentation
    - JDBC - to call directly into a database
    - Win-Perf - to gather information from Windows Performance Counters
    - SIGAR - to interact with an OS in an agnostic fashion (i.e. we don't care if you're a Linux, OSX or Win platform - tell me what processes are running!)
    - Net - HTTP, FTP, etc. operations
    - Vendor - for platform specific things, like talking to ESX for instance



Requirements
I want to monitor a process; for now my Dropbox process will do. If I cannot easily share my resources across development environments then I am going to suffer! In reality, something like Apache HTTPD Web Server or another business critical process would be a good choice.


Here's a snap of what I'm after.
For now, we'll settle with detecting the process and collecting some information about it.


In order to see something meaningful in Hyperic we're going to leverage its mechanism for describing environments - its Inventory model.


Hyperic describes things in terms of three base types: Services, Servers and Platforms. Platforms are the root element - everything starts with a Platform, be it a GemFire data grid, an Operating System or a Router. On Platforms we can define Services or Servers to describe the resources on the Platform. Servers can contain other Servers or Services. A good analogy: an OS is a Platform with its CPU usage available as a Service, Tomcat would be a Server, and the Tomcat HTTP Connector is a Service within the Tomcat Server.
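
As a rough illustration of how that nesting can look in a plugin descriptor, here is a minimal sketch. The plugin, server and service names are made up for this example, and the fragment only shows the shape of the hierarchy: it has no metrics, so it is not something to deploy as-is.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustration only: all names below are hypothetical -->
<plugin name="example-inventory-sketch">

  <!-- A service declared directly under <plugin> hangs off the Platform -->
  <service name="Example Platform Service"/>

  <!-- A Server lives on the Platform and can contain its own Services -->
  <server name="Example Server">
    <service name="Example Connector"/>
  </server>

</plugin>
```

The plugin we build in this post only needs the first form: a single Service that sits directly on the Platform.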


Pre-Reqs
As mentioned in the intro, I'm presuming that you already have a working Hyperic Agent and Server available. 


Here's a screenshot of my Hyperic Server showing a platform (server1) to which I have an Agent deployed.


And here are the default resources I have already imported. We have green availability for these servers across the board; my plugins are all operating without error and we are good to go.
As you can see, I'm only collecting details on the Hyperic components themselves, plus the MySQL instance acting as a datastore for Hyperic. I have a tcServer instance running but you can ignore that for now.


To Your Marks...


How do we start?


Well, all plugins need a base structure. Here's one to get you started.

<?xml version="1.0" encoding="UTF-8"?>
<plugin name="c2b2-dropbox-process-detection">
  <metric name="Availability"
    indicator="true"
    template=""/>
  <plugin type="measurement" class="org.hyperic.hq.product.MeasurementPlugin"/>
</plugin>



Ok, what have we got here? We've built a plugin named 'c2b2-dropbox-process-detection' and created a metric we're interested in. So far we've set only one metric, called 'Availability' - this is what will be assessed to determine if our process is available. We haven't put anything in the template attribute yet. We've also stated that the plugin is of type measurement, and that it leverages a Hyperic base class to sort out the complex processes under the hood.


This plugin is still pretty useless for now. Before we can continue we need to find a way of telling the plugin how to find our Dropbox.exe process.


Sigar base type
In the Getting Started section, we briefly referenced the SIGAR base type. In short, SIGAR is a Java wrapper around Operating System specific commands and information. We can invoke SIGAR commands and it will translate them into native operations for us, then give us standardised responses.


This is where Hyperic's development cycle really gives you a boost. As Sigar is a Java wrapper, we can leverage it standalone and interrogate our platform as if we were the Agent. Let's do that now, to find out how our Agent can gain visibility of our Dropbox process.


In a command prompt navigate to the directory ${Agent_Install_Home}/bundles/agent/pdk/lib
As you can see for me in the image below, ${Agent_Install_Home} is J:\java\hyperic\agent-4.6-EE.
Then execute the command java -jar sigar-[your sigar.jar version number].jar. Again, below you can see that for me this was sigar-1.6.4.jar.


You'll need Java installed for this to work. When successful you'll be greeted with the sigar command prompt.


You can type help, or reference the Sigar docs for a full listing of the sort of thing you can do with this, but for now we'll kick on.


Typing the 'ps' command shows you running processes (Ah-ha!). But if you invoke this you'll see many pages of processes running like below.




(If you don't see many processes, then make sure you're running as an account with sufficient privileges. On my machine Dropbox runs as a Windows service, so for the sake of simplicity during development I need to run as an administrator to be able to list it.)


Hmm we need to find our Dropbox process, and only our Dropbox process...


Thankfully this is a piece of cake ;)


The PTQL (Process Table Query Language) documentation for Hyperic shows how we can construct a ps query to only return Dropbox. We can match on name, executable arguments, binary location, a bunch of things.


I know that I only have one Dropbox process to find, so I can simply create an expression to match this single instance. For other scenarios you'll have to look at more advanced PTQL expressions. If I place Exe.Name.ct=Dropbox as an argument to the Sigar ps command then I can achieve a return process listing that only shows my Dropbox running.
So, now let's stitch that into our plugin. We do that by adding our command to the template attribute in our plugin definition. Templates are defined in detail here, but in short we'll be leveraging the Sigar base template available to the Agent and passing in some parameters.
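sigar:Type=ProcState,Arg=State.Name.eq=Dropbox:State
Templates are strings that describe how to collect data. The first part 'sigar:' is our template domain, Type=ProcState asks Sigar for the process state information (the same data 'ps' lists), Arg is the process query to pass in, and State is the attribute we want back, because we care about knowing the state of the process (available or not). Note that the query used here matches on the process name (State.Name.eq) rather than on the executable path; any query works as long as it returns only the Dropbox process.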


Putting that into our plugin we get:


<?xml version="1.0" encoding="UTF-8"?>
<plugin name="c2b2-dropbox-process-detection">


   <metric name="Availability"
           indicator="true" template="sigar:Type=ProcState,Arg=State.Name.eq=Dropbox:State"/>


  <plugin type="measurement" class="org.hyperic.hq.product.MeasurementPlugin"/>

</plugin>


Not quite there
But we're not quite ready yet. We need to massage this Dropbox process, which we know we can see, into the Hyperic Inventory model. So we'll make one more change: we'll put the metric within a Service element like below:


<plugin name="c2b2-dropbox-process-detection"> 
  <service name="Dropbox"> 
    <metric name="Availability"
            indicator="true"
            template="sigar:Type=ProcState,Arg=State.Name.eq=Dropbox:State"/>
    <plugin type="measurement" class="org.hyperic.hq.product.MeasurementPlugin"/>
  </service> 
</plugin>


Get Set...


Putting them together
How do we test this now within the Agent runtime?


Again, Hyperic makes this easy. First save your plugin definition somewhere. I chose j:/java/hyperic/testing-plugins/c2b2-dropbox-plugin.xml (I've purposely made this file name different to the plugin name in the XML file).


Navigate to the directory ${Agent_Install_Home}/bundles/agent in a command prompt and run the following command:
java -jar pdk\lib\hq-pdk-4.6.jar -Dlog=info -Dplugin.dir=J:\java\hyperic\testing-plugins -Dplugins.include=c2b2-dropbox -p c2b2-dropbox-process-detection -m metric -t Dropbox
A few things to note:
  • Observe your paths correctly; the above would not work in the Cygwin window I've been using in the screenshots up till this point!
  • Your pdk.jar version number may be different
  • The plugins.include value comes from the filename (here c2b2-dropbox, which is c2b2-dropbox-plugin.xml without its -plugin.xml ending). This file must be in plugin.dir
  • The -p operand is the plugin name (which is different to the filename!) as defined in the <plugin name="..."> attribute
  • The -t operand is the type of resource we're testing, here our Dropbox service

If you've got everything right up to now then you should see something like the following:


We can now see our process, and Availability is retrievable. 


You can define more metric elements, and fill in the template attributes with Sigar commands to retrieve the metric data you want!
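
As a sketch of what an extra metric could look like, the fragment below adds a memory reading and a CPU reading next to Availability. It assumes the Sigar types ProcMem and ProcCpu with the attributes Resident and Percent, which is how I understand Hyperic's own process metrics to be defined; the metric names and the units and category values are my own choices, so adjust them to taste.

```xml
<service name="Dropbox">
  <metric name="Availability"
          indicator="true"
          template="sigar:Type=ProcState,Arg=State.Name.eq=Dropbox:State"/>

  <!-- Assumed example: resident memory of the matched process, in bytes -->
  <metric name="Resident Memory"
          category="UTILIZATION"
          units="B"
          template="sigar:Type=ProcMem,Arg=State.Name.eq=Dropbox:Resident"/>

  <!-- Assumed example: CPU usage of the matched process -->
  <metric name="Cpu Usage"
          category="UTILIZATION"
          units="percentage"
          template="sigar:Type=ProcCpu,Arg=State.Name.eq=Dropbox:Percent"/>

  <plugin type="measurement" class="org.hyperic.hq.product.MeasurementPlugin"/>
</service>
```

Each metric follows the same domain, type, query and attribute pattern as Availability; only the Type and the trailing attribute change.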


I'm going to add two more things to leverage some in-built Hyperic process knowledge: an import for process metrics, and a symbol to bring them into this service. In order to utilize these I need to define a parameter called "process.query", which I'll do in a <config> element. These changes are reflected in the final plugin below:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plugin [
<!ENTITY process-metrics SYSTEM "/pdk/plugins/process-metrics.xml">
]>
<plugin name="c2b2-dropbox-process-detection">

  <service name="Dropbox">
    <config>
      <option name="process.query" default="State.Name.eq=Dropbox"
              description="Process query to match">
      </option>
    </config>
    <metric name="Availability"
            indicator="true"
            template="sigar:Type=ProcState,Arg=State.Name.eq=Dropbox:State"/>
    &process-metrics; 
    <plugin type="measurement" class="org.hyperic.hq.product.MeasurementPlugin"/>
  </service>

</plugin>
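
One possible refinement, sketched below: the Availability template above still hard-codes the query, so editing process.query in the UI would not change what Availability looks for. Assuming the %option% placeholder substitution that Hyperic's bundled process metrics rely on, the template could read the query from the config option instead. Treat this as a sketch to try in the test harness rather than a verified drop-in.

```xml
<service name="Dropbox">
  <config>
    <option name="process.query" default="State.Name.eq=Dropbox"
            description="Process query to match"/>
  </config>
  <!-- %process.query% is expected to be replaced with the configured option value -->
  <metric name="Availability"
          indicator="true"
          template="sigar:Type=ProcState,Arg=%process.query%:State"/>
  <plugin type="measurement" class="org.hyperic.hq.product.MeasurementPlugin"/>
</service>
```

With this form the query lives in one place, so a change made on the service configuration screen applies to Availability as well as to the imported process metrics.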


Notes
  • When you test this, you may see a warning about not being able to enumerate child processes. Don't worry about this - you're seeing it because we're leveraging the process-metrics template, which defines this metric, but our process has no children. As we leave this metric disabled in Hyperic you'll see no repercussions.

Go!


Let's use our new plugin!


To deploy my plugin I simply placed it in the folder ${Hyperic_Install_Root}/hq-plugins. This folder was created as part of my install process; for me this was J:\java\hyperic\hq-plugins.


A little recap on how we're going to do this: as we've made a Hyperic Service in our plugin, we need to tell our Platform in Hyperic Server that there's a Service to request metrics for on our Agent.


In the Hyperic Server Resources tab, click on the Platform running Dropbox. Within your platform, click the Tools Menu and select 'New Platform Service' as below 


In the next screen enter a name of Dropbox; in the Service Type list the option Dropbox should now be available. Note that this option depends on what you put in the <service name=...> attribute. If you do not see yours then check that the plugin deployed to the server correctly.


Nearly there!


After the service was created I was prompted to check the configuration, which jumped me to the service configuration tab. Here you can edit the Sigar query to run; I hit OK without modifying anything.


My platform now has a new Service available!


If I click into the service, and under Monitoring select 'Metric Data' I can see the process metrics made available to me by Hyperic.


I've selected a few more to be collected at an interval of 5 minutes.


And here are the metrics now being collected!


To follow on from this we could now set Alerts based on outages, and invoke control actions that interact with other systems using Hyperic's built-in tooling. We are now monitoring a native process with deep insight into its runtime behavior!

Further Things We Could Investigate
Want to get your hands dirty with writing plugins from the ground up with actual Java Classes in your IDE? 

Let us know!

Nick