Well, after two weeks of slogging through myriad combination of GC settings, GC logs, application logs, multiple JConsole windows, and cursing the various spreadsheet programs' pathetic ability to deal with data set that is of any significant size, I finally hit a jackpot of sort. Looking back though I practically would have to slap myself for not seeing this from the very beginning, it was right there, so apparent, so obvious.
Long story short. We have a service application that is used by various groups within the company. Lately we had been struggling a little bit to maintain our SLAs under much high load then we used to have. So we started digging into logs, and pretty soon started to focus on GC tuning. This application runs on Java 5, so we started with replacing the default serial collector with concurrent mark sweep collectors. There were some improvements, but not enough. The application still have way too many full GC events, and way too many full GCs that were caused by concurrent mode failures. However, there were not many regular CMS collection at all.
Looking through JConsole, full GCs started pretty much right after application start up and just keep going and going, and would later on be joined by concurrent mode failures when load starts to go up. So we increased total heap, and propotionally added more space to both young generation and old generation, hoping this would give objects in the young generation more time to die, and give old generation enough space to avoid concurrent mode failure. Well, that did not quit work out as we had hoped. YoungGen collections still happen at about the same frequency, and average YoungGen collection time really took a dive, using 5x more time. Worse, the full GCs and concurrent mode failures are still there, only this time there were less concurrent mode failures. Presumably the increased OldGen size helped out a little in terms of avoiding concurrent mode failures. The worst part of all this though is the large heap actually increased GC pause times and caused even more outliers that fall out of our SLA.
Now comes the jackpot part. I was staring at the GC logs which I imported into Excel, and at JConsole windows, watching the count of full CMS gcs going up steadily. Then it hit me, the pattern that is so obvious now: the full GCs were coming in once a minute, almost like clock work, right from application start up, when there were barely anything in OldGen. There is no way these were triggered by OldGen being used up, something must be calling System.gc() somewhere. A quick googling indicate this is probably the case. And the fix is straight forward, disable explicit GC calls by using the flag -XX:+DisableExplicitGC. Presto, no more this one full GC per minute nonsense.
An added bonus, with explicit GC disabled, now we see the expected regular CMS collections. And more importantly, we found out our PermGen was getting full. Our load test still led to two concurrent mode failures. From what we can see with JConsole, these two failures happens right after PermGen got full, the full GC cleaned up about half of the PermGen space. So we will up our PermGen as well. Our application instantiate a whole bunch of objects every few minutes and caches them, so we know we are thrashing PermGen space. The once per minute full GCs were masking the fact that our PermGen was too small.
One day I might spend some time to find out which library we use is calling System.gc(). I am pretty sure I will go anywhere near it, not after these last couple weeks in GC tuning paradise :-)
Friday, August 7, 2009
Thursday, August 6, 2009
Graphing GC Events
Having been knee deep in tuning GC, trying to reduce GC pause time. Unfortunately Microsoft Excel, Apple Numbers, and OpenOffice Spreadsheet are all quite weak when dealing with the amount of GC and non-GC events we are trying to analyze, especially when it comes time trying to graph these events. Open Office on Ubuntu actually performed better, was able to deal with number of events that chokes both Excel and Numbers. However working with OpenOffice Spreadsheet is not the smoothed experience, as there are quite some rough edges here and there.
We've been using JConsole to watch live GC events, which is nice. If you can, always use JDK 6 JConsole instead of that from JDK 5. The former keeps history when JVM terminates, which is very nice.
Next we probably will investigate VisualGC (http://java.sun.com/performance/jvmstat/visualgc.html), VisualVM (https://visualvm.dev.java.net/), gchisto (https://gchisto.dev.java.net/). Hopefully one of them will turn out to be a better tool then pure spreadsheets.
Or, there is always JFreeChart or GnuPlot :-) One of our guys already used JFreeChat to plot some data, which actually looked quite nice.
Update on 08/07/2009:
Checked out VisualVM today along with its VisualGC plugin. The application itself looked pretty nice. However I was getting some kind of NullPointerException with the monitoring view, so there is no graph of heap usage etc. I am using Java 6 on Ubutnu, and I was having trouble get the VisualGC plugin to work with both of my remote JVMs. Of the the JVM is Java 5 32bit on Ubuntu, the other is Java 6 64 bit on Ubuntu. In both cases VisualGC was complaining it cannot work with the JVM version. Anyhow, from what I saw, I don't think VisualVM/VisualGC will give me the kind of graph that I was trying to do with spreadsheets, as I was plotting very detailed GC times for CMS initial mark, remark time, full GC, full GC from concurrent mode failure, and ParNew collections. Given that, I am still quite impressed with what I saw from VisualVM, and I am surely will visit it again the next time I need to dig into some other tough performance tuning problems.
We've been using JConsole to watch live GC events, which is nice. If you can, always use JDK 6 JConsole instead of that from JDK 5. The former keeps history when JVM terminates, which is very nice.
Next we probably will investigate VisualGC (http://java.sun.com/performance/jvmstat/visualgc.html), VisualVM (https://visualvm.dev.java.net/), gchisto (https://gchisto.dev.java.net/). Hopefully one of them will turn out to be a better tool then pure spreadsheets.
Or, there is always JFreeChart or GnuPlot :-) One of our guys already used JFreeChat to plot some data, which actually looked quite nice.
Update on 08/07/2009:
Checked out VisualVM today along with its VisualGC plugin. The application itself looked pretty nice. However I was getting some kind of NullPointerException with the monitoring view, so there is no graph of heap usage etc. I am using Java 6 on Ubutnu, and I was having trouble get the VisualGC plugin to work with both of my remote JVMs. Of the the JVM is Java 5 32bit on Ubuntu, the other is Java 6 64 bit on Ubuntu. In both cases VisualGC was complaining it cannot work with the JVM version. Anyhow, from what I saw, I don't think VisualVM/VisualGC will give me the kind of graph that I was trying to do with spreadsheets, as I was plotting very detailed GC times for CMS initial mark, remark time, full GC, full GC from concurrent mode failure, and ParNew collections. Given that, I am still quite impressed with what I saw from VisualVM, and I am surely will visit it again the next time I need to dig into some other tough performance tuning problems.
Subscribe to:
Posts (Atom)