Service Coroner Diagnostics Tool

Overview

ServiceCoroner is an experimental diagnostics tool for the OSGi Service Platform. It detects and analyzes stale references, a known problem in OSGi. The tool combines Aspect-Oriented Programming (AOP) and Java weak references to diagnose stale references in OSGi implementations.
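The sketch below roughly illustrates how weak references can expose stale references. It tracks service objects through `java.lang.ref.WeakReference`. Once the owning bundle stops, any tracked object that survives a garbage collection must still be strongly reachable from somewhere, so it becomes a suspect. This is a minimal sketch under stated assumptions, not ServiceCoroner's actual implementation. The class `StaleReferenceMonitor`, its methods, and the use of plain bundle-name strings are hypothetical. In the real tool, the tracking calls would come from woven aspects rather than being invoked by hand.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not ServiceCoroner's API): remembers service objects
 * through weak references and, after their owning bundle stops, reports the
 * ones that are still strongly reachable as stale-reference suspects.
 */
public class StaleReferenceMonitor {

    /** One tracked object, its owning bundle, and whether that bundle has stopped. */
    private static final class Tracked {
        final String ownerBundle;
        final WeakReference<Object> ref; // does not keep the object alive by itself
        boolean ownerStopped;

        Tracked(String ownerBundle, Object target) {
            this.ownerBundle = ownerBundle;
            this.ref = new WeakReference<>(target);
        }
    }

    private final List<Tracked> tracked = new ArrayList<>();

    /** Records a service object; in the real tool woven advice would make this call. */
    public synchronized void track(String ownerBundle, Object serviceObject) {
        tracked.add(new Tracked(ownerBundle, serviceObject));
    }

    /** Marks every object owned by the bundle as one that should now become unreachable. */
    public synchronized void bundleStopped(String ownerBundle) {
        for (Tracked t : tracked) {
            if (t.ownerBundle.equals(ownerBundle)) {
                t.ownerStopped = true;
            }
        }
    }

    /**
     * Requests a GC, drops entries whose objects were collected, and returns
     * "bundle: class" descriptions of objects from stopped bundles that survived.
     * System.gc() is only a hint, so a suspect is evidence rather than proof.
     */
    public synchronized List<String> findSuspects() {
        System.gc();
        tracked.removeIf(t -> t.ref.get() == null);
        List<String> suspects = new ArrayList<>();
        for (Tracked t : tracked) {
            Object target = t.ref.get(); // may still be cleared concurrently
            if (t.ownerStopped && target != null) {
                suspects.add(t.ownerBundle + ": " + target.getClass().getName());
            }
        }
        return suspects;
    }
}
```

The design choice here is that a weak reference never prevents collection by itself. If the object is still present after the bundle stopped, some other code holds a strong reference to it. That is the "victim" side of the diagnosis. Finding who holds the reference still requires inspecting a heap dump.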

Context

The OSGi™ Service Platform allows bundles and their classes to be loaded and unloaded dynamically while the JVM is running. However, developers must take special care to handle the departure of services and bundles. Since OSGi™ bundles are not isolated from each other in separate object spaces, there is no guarantee that a stopped bundle is safely removed from the runtime, and mishandling these events can easily lead to inconsistencies. The platform cannot ensure that objects from a stopped bundle will no longer be referenced by other bundles. The OSGi™ specification (Core R4, section 5.4) calls this problem stale references, defined as:

"a reference to a Java object that belongs to the class loader of a bundle that is stopped or is associated with a service object that is unregistered"

Stale references are an invisible problem that compromises application integrity. They cause memory leaks and prevent a bundle's classes (as well as its class loader) from being unloaded from memory. In addition, calls to an unregistered service that returns stale data (e.g., old cached data) can silently propagate errors throughout the system.

It is difficult to tell whether OSGi™ applications and components are ready to cope with OSGi™ dynamics, since there are no dedicated mechanisms to measure or evaluate this. Using a component model does not necessarily prevent stale references. We have developed a tool called ServiceCoroner, which detects different patterns of stale references and provides information about the objects involved. The implementation relies on Aspect-Oriented Programming to weave into the OSGi framework the code that detects such problems at runtime.
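ServiceCoroner performs this weaving with AOP. The sketch below uses JDK dynamic proxies (`java.lang.reflect.Proxy`) as a stand-in that needs no aspect compiler. It shows the same idea of advice running around each service call: if the service has been unregistered, the call is recorded as a stale call before it is forwarded. This is an illustrative assumption, not the tool's mechanism. The class `StaleCallDetector`, the `registered` flag, and the `staleCalls` log are hypothetical. A real OSGi deployment would derive registration state from the framework (for example, from service events) rather than from a flag set by hand.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: proxy-based "advice" that flags calls to unregistered services. */
public final class StaleCallDetector {

    /** Calls made after unregistration, as "Interface.method" strings (illustrative log). */
    public static final List<String> staleCalls =
            Collections.synchronizedList(new ArrayList<>());

    private StaleCallDetector() {}

    /** Wraps a service so every invocation is checked against its registration state. */
    @SuppressWarnings("unchecked")
    public static <T> T watch(Class<T> iface, T service, AtomicBoolean registered) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (!registered.get()) {
                // "Before" advice: record the stale usage, then let the call proceed.
                staleCalls.add(iface.getSimpleName() + "." + method.getName());
            }
            try {
                return method.invoke(service, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception unchanged
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}
```

Note that `equals`, `hashCode`, and `toString` on the proxy also pass through the handler. Once the service is unregistered, those calls are logged as well.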

ServiceCoroner automatically identifies patterns of stale services. In its current state, the tool finds the "victims" automatically. Finding the "guilty ones" still requires manually examining memory dumps, guided by the evidence ServiceCoroner collects. Identification of stale threads is also implemented, but a precise diagnosis still requires some manual intervention.

We have validated this diagnostic tool through a runtime analysis of four open source applications built on top of OSGi™: OW2 JOnAS 5.0.1, SIP Communicator Alpha 3, Newton 1.2.3 and Apache Sling. All of these applications are of significant size, especially JOnAS, whose core is about 400 000 lines of code and exceeds 1 500 000 when the other components are taken into account. Some of them are partially developed with component models for the OSGi™ Platform: Service Binder, R4 Declarative Services and iPOJO. The experiment shows that even applications using such mechanisms still contain stale references and are not completely ready to handle the dynamic update of components. After simulating some life cycle events (update, start, stop) on a limited range of bundles in each application, we found a number of stale references.

Demo

A Flash demo of the tool is available.

ServiceCoroner Slides

Downloads

Weaved Felix 1.0.1.

Weaved Knopflerfish 2.0.1.

Unfortunately, due to time constraints there is no manual or README. Replace your "clean" OSGi implementation with one of the above and run your application. The Flash demo may be helpful as a reference for using the tool.

There are still some issues to fix, such as the GUI and the mechanism for discovering a bundle's class loader, which currently fails in some cases. This limits the current approach: without knowing a "suspect's" bundle class loader, it is sometimes difficult to tell whether it is actually a stale reference.

Other downloads/versions may be available in the second semester of 2008.

Screenshots


GUI Embedded into weaved OSGi platforms

Remote diagnosis with JMX

Related publications and presentations
