Much of my time at my current job has been spent developing and maintaining Java Swing applications, among them two chemistry tools: one for planning chemical reaction libraries (which I will refer to as rxn plr) and one for storing experiments (which I will refer to as eln). These tools are usually (though not necessarily) used together, in a workflow where a library from rxn plr is converted to XML and then imported into an eln experiment. The apps support some of the same operations, such as enumerating the products produced by a reaction. Given these requirements, it is clear that the codebases of rxn plr and eln must both include classes to model chemicals and reactions. Unfortunately, each app implements these classes itself; they do not share data model code. In some cases an identical class exists in both codebases; in others, the classes have evolved away from each other. The problems created by this practice are well understood and well documented, so I am going to assume we can agree that it is a bad thing. Having to make the same change in multiple places is annoying, but bigger problems arise when what should be an identical operation on identical data yields different results. And this is what I was tasked with fixing.
Another thing to keep in mind: no unit tests exist for either of these apps.
Anyways....
My first thought was: why do we even have two separate apps for modeling the same type of data and performing the same operations? Yes, the apps have significantly different UIs and workflows, but should they? As it stands, users have to learn two separate UIs. Shouldn't we be developing an innovative UI that accomplishes both workflows, rather than hacking these divergent, untestable apps into a tenuous synchronization? I suggested building a library planning feature into eln, using and extending the eln codebase. My idea was met with a mixed reception. Any change to something people have become familiar with is going to have detractors. The more legitimate criticisms came from other developers who were more familiar with the history of these apps. The reason, I learned, for the existence of rxn plr is that some chemists use it to enumerate extraordinarily large reaction libraries, on the order of thousands of reactions. These reactions are then filtered down to a reasonable size and imported into eln. Eln, they explained, would never be able to handle that many reactions. You see, as I mentioned before, the data models of these apps are the same at their core, but over time many extra fields have been added to the eln versions of chemicals and reactions. The data model classes of rxn plr can be thought of as light versions of those in eln. This, they told me, is why so many more reactions can be loaded into rxn plr. With each reaction leaving a much larger memory footprint in eln, loading libraries of this size would cause eln to run out of heap space and become non-performant, or even crash. And indeed, when I loaded libraries on this scale into eln, it crawled and eventually became unresponsive. Their explanation made sense. After all, these eln objects have members for analytical data, file attachments, chemical properties, DAOs, parent and child references, and a host of other things that rxn plr is not concerned with.
At the scale we set out to deal with, it made sense that this would have significant memory implications.
After round-tabling the issue, we had a few potential paths forward, none of them trivial. One idea was to extract interfaces from the most heavyweight eln classes and create light implementations for the enumeration and filtering part of the workflow. After all, those other heavyweight fields wouldn't be needed until later. This didn't sit well with me. My motivation for merging these apps, in addition to solving the compatibility problem, was to simplify the overall code that we were managing. This sounded like adding more complexity. Additionally, I wasn't convinced that it would work. If these fields are empty, how much could they be affecting the retained size of the objects?
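To make the proposal concrete, here is a minimal sketch of what it would have looked like. All names here are my own illustration, not the actual eln classes:

```java
import java.util.List;

// Sketch of the "extract an interface" proposal (illustrative names only):
// pull an interface out of the heavyweight eln reaction class, and back the
// enumeration/filtering steps with a light implementation.
interface ReactionView {
    String getName();
    List<String> getProducts();
}

// Carries only what enumeration and filtering need; no analytical data,
// attachments, DAOs, or parent/child references.
class LightReaction implements ReactionView {
    private final String name;
    private final List<String> products;

    LightReaction(String name, List<String> products) {
        this.name = name;
        this.products = products;
    }

    @Override public String getName() { return name; }
    @Override public List<String> getProducts() { return products; }
}
```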
The idea with the most support was to store the list of reactions in a client-side key/value database. This way, we could hold a reasonably sized cache of reactions in memory, and implement a policy for loading from the database and evicting from the cache. Berkeley DB was the obvious choice, since the team had been using it for years in other apps. However, this started another debate. In those other apps, the values we were storing in Berkeley DB were very small objects or primitive types. There was uncertainty about whether serializing and deserializing reactions with all of their members would be performant.
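Whatever the backing store, the in-memory half of that design is just a bounded cache with an eviction policy. A minimal sketch using `LinkedHashMap`'s built-in LRU support (the class name and capacity here are my own illustration, not part of the actual design):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded LRU cache: once capacity is exceeded, the least-recently-accessed
// entry is evicted (in the proposed design, evicted reactions would already
// be persisted in the key/value database and reloadable on demand).
class ReactionCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    ReactionCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once we exceed capacity
    }
}
```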
This conversation, and ones about other possible solutions, went on for days. Eventually I became uneasy with the amount of time we were spending discussing hypotheticals. Moving forward with any of these ideas would be nontrivial, and we would have to implement one entirely before we were able to test it and know whether we were even on the right track (recall the lack of unit tests). Furthermore, was our premise even correct? Our analysis so far was based on simply loading a large library and observing how the client behaved from a usage standpoint. What was really happening under the hood, though? I decided that I needed to dig deeper.
We had a license for YourKit, a profiler for the JVM, but so far I had only scratched the surface. Soon, I would view it as a tool as crucial to my job as my debugger or version control. The first thing I discovered was that eln was replete with memory leaks. What's that, you say? Memory leaks in a Java application?! Impossible! That's what garbage collection is for! Well, it turns out that poor coding practices leave plenty of opportunity for the garbage collector to leave dead objects hanging around. Case in point: I have mentioned that in the data model at hand, a library experiment has many reactions. The class representing a reaction is ReactionSection. The reactions of an experiment are held in a (generically typed) member of Experiment called 'content', which is accessed through the following methods:
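(The original listing was an image that has not survived; below is a simplified sketch of the pattern it showed. In the real class the loader is a DAO call and the element type is ReactionSection; here a `Supplier` and `String` stand in so the sketch is self-contained.)

```java
import java.util.List;
import java.util.function.Supplier;

// Sketch of eln's lazy-loading accessor pattern (simplified; names beyond
// Experiment/content are stand-ins).
class Experiment {
    private final Supplier<List<String>> loader; // stand-in for the DAO
    private List<String> content;                // null until first access

    Experiment(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    public List<String> getContent() {
        if (content == null) {
            content = loader.get(); // hit the database only on first access
        }
        return content;
    }

    public void setContent(List<String> content) {
        this.content = content;
    }
}
```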
As you can see, content is lazy-loaded. This is so that all of a user's experiments can be displayed in a tree without having to load any reactions (which are not needed simply to display the name of an experiment). Then, when an experiment is opened, the content (containing the reactions) is loaded from the database and displayed in another view.
To test our assumptions regarding memory consumption, I open an experiment and then take a memory snapshot in YourKit (the following could also be seen in the live views). I select the 'memory' tab and the 'class list' option, and search for 'ReactionSection':
*Figure 2*
In the search results, the record we are interested in is the first one (selected). The others are relevant, but they can be ignored for now. We can see that at the moment this snapshot was taken, 45 instances of this class existed on the heap. That corresponds to the number of reactions I see in the experiment I have opened. So far so good. Now, I close the experiment. The code that is executed when I do this is seen below:
*Figure 3*
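(The close-handler listing was also an image; in outline, and with all the surrounding details elided and assumed, it does something like this:)

```java
import java.util.List;

// Sketch of the close handler: setting content to null is what forces a
// fresh database load the next time the experiment is opened. The name
// ExperimentCloser and everything around the null-out are assumptions.
class ExperimentCloser {
    static class Experiment {
        List<String> content; // stand-in for the reaction sections
        void setContent(List<String> c) { content = c; }
    }

    static void close(Experiment experiment) {
        experiment.setContent(null); // discard unsaved changes
        // ...the real handler also tears down views, updates the tree, etc.
    }
}
```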
Notice that content is set to null. The next time the content for this experiment is accessed, a fresh copy will be loaded from the database (recall the lazy loading in getContent, figure 1). This is so that unsaved changes are discarded. I take another snapshot, and this is where things get weird. My profiler tells me that the same number of ReactionSections exist in heap space. Hmmm... I know, the garbage collector has not engaged because I still have plenty of heap space available (after all, this is only 45 reactions). Luckily, YourKit provides a means for forcing garbage collection at any time.
*Figure 4*
But even after running GC, I am still seeing the same number of ReactionSections. Now I reopen the same experiment and take another snapshot:
*Figure 5*
*Figure 6*
Clearly something is preventing GC from happening when I would like it to. Luckily, YourKit provides tools for situations like this. In the instances view (figure 6), I right-click an instance of ReactionSection (which I will from here on refer to as 'rs') and select 'Paths from GC Roots'.
*Figure 7*
And I get this view:
*Figure 8*
What this shows is every non-garbage-collectable object holding a reference to rs. The reason these objects are not eligible for garbage collection is that for each of them there exists a chain of object references (which I can see in an expanded tree by using the + button) leading to it from a GC root. Because these objects each hold a reference to rs, rs is also part of such a reference chain! Therefore, rs cannot be garbage collected. After digging into the code, I found the cause of one such hanging reference. Have a look at the expanded chain in figure 8. Notice the object named 'changeSupport', three references up from the GC root at the bottom. The description tells us that changeSupport is a member of Experiment. Further up the chain, we see an object of type PropertyChangeListener holding a reference to rs, which it sees as a member of LibrarySectionToolbar. Hmmm... now I start to understand what is happening here. I find the code where LibrarySectionToolbar is created:
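(That listing is another lost image. In outline, the toolbar registered itself as a listener on the experiment's changeSupport; the constructor shape below is my reconstruction of the pattern, not the actual code.)

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Outline of the leak: the toolbar adds a PropertyChangeListener to the
// experiment's changeSupport, and nothing ever removes it. The listener
// holds the enclosing toolbar, which holds rs -- the chain in figure 8 --
// so rs stays reachable even after content is set to null.
class LibrarySectionToolbar {
    private final Object reactionSection; // stand-in for the captured 'rs'

    LibrarySectionToolbar(PropertyChangeSupport changeSupport, Object rs) {
        this.reactionSection = rs;
        changeSupport.addPropertyChangeListener(new PropertyChangeListener() {
            @Override
            public void propertyChange(PropertyChangeEvent evt) {
                // refresh the toolbar using reactionSection...
            }
        });
        // note: no corresponding removePropertyChangeListener anywhere
    }
}
```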
It's beginning to make sense now. Recall figure 3. When the experiment is closed and Experiment.content is set to null, do you remember any PropertyChangeListeners (which hold a reference to rs) being removed? I sure don't. This violation of the observer design pattern is to blame for this particular hanging reference. After correcting this and tracking down the other paths from GC roots, I had resolved the memory leak. Repeating my test from earlier, when I close the experiment and then force garbage collection, I see all instances of ReactionSection disappear.
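The shape of the fix, hedged since the actual eln code differs in its details: keep a handle to the listener and deregister it on the experiment-close path, alongside the setContent(null).

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Sketch of the fix: hold on to the listener so it can be removed on close.
// Class and method names here are illustrative.
class LeakFreeToolbar {
    private final PropertyChangeSupport changeSupport;
    private final PropertyChangeListener listener;

    LeakFreeToolbar(PropertyChangeSupport changeSupport) {
        this.changeSupport = changeSupport;
        this.listener = evt -> { /* refresh the toolbar... */ };
        changeSupport.addPropertyChangeListener(listener);
    }

    // called when the experiment is closed, so the listener (and everything
    // it references) becomes eligible for garbage collection
    void dispose() {
        changeSupport.removePropertyChangeListener(listener);
    }
}
```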
This has been a lengthy but, I think, valuable detour from the problem I originally set out to solve, which was to verify that the reaction libraries produced by rxn plr really do use too much memory for eln to handle. After all, the problem I just discovered and fixed affected experiments of any size, and only became an issue after closing and reopening an experiment. The problem I set out to investigate manifests upon the first opening. More on that in my next post. The point I wanted to make here is the value of having a profiler as part of my development workflow. If observing the UI of an application is like looking at a car from five feet away, then seeing the source code is like being able to open the doors and get inside, and using a debugger is like opening up the hood and watching the parts move. But using a profiler is like having X-ray vision, and a time machine!









