Issue Details (XML | Word | Printable)

Key: HHH-2481
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Critical Critical
Assignee: Scott Marlow
Reporter: Paul Malolepsy
Votes: 3
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Hibernate Core

Big memory leak in the use of CGLIB

Created: 09/Mar/07 11:19 AM   Updated: 22/Sep/07 09:37 AM
Component/s: core
Affects Version/s: 3.2.2
Fix Version/s: 3.2.3

Time Tracking:
Original Estimate: 5 minutes
Original Estimate - 5 minutes
Remaining Estimate: 5 minutes
Remaining Estimate - 5 minutes
Time Spent: Not Specified
Remaining Estimate - 5 minutes

File Attachments: 1. File fix.diff (7 kB)

Environment:
Hibernate 3.2.2
MySQL 5.1
Issue Links:
Duplicate
 
Relates
 


 Description  « Hide
The way CGLIBLazyInitializer is creating proxies is resulting in a potentially massive memory leak.

In CGLIBLazyInitializer.getProxy() just before the proxy is instantiated, a call is made to Enhancer.registerCallbacks() passing in the instance of CGLIBLazyInitializer that will manage the proxy. This variable is stored in a static ThreadLocal on the CGLIB created persistentClass so that any subsequent objects instantiated will get this callback class.

The problem is that once this ThreadLocal is set, it is never cleared, so it will stay around (together with the object it's managing, and whatever object graph it may be connected to) until the next time a proxy is created for that type on that thread.

For our application we have about 150 different proxy types, and our app can have over 100 threads. This results in potentially 15,000 proxy objects and their graphs stuck in memory.

The fix for this is simple. Just make another call the Enhancer.registerCallbacks() with a null callback arg, right after the proxy class is instantiated:

Enhancer.registerCallbacks(factory, null);


 All   Comments   Work Log   Change History   FishEye      Sort Order: Ascending order - Click to sort in descending order
Max Rydahl Andersen added a comment - 09/Mar/07 12:08 PM
testcase + patch ?

Paul Malolepsy added a comment - 12/Mar/07 02:33 AM
Here is a diff of the fix for this issue as well as a test case.

Just to give an idea of how potentially bad this leak was, making this change had a dramatic impact on the the memory use and performance of our servlet based web application. Before, we had to max out the Xmx of our servlets at 2gb, and even still we'd get OOM errors after about 6 hours under heavy stress testing. Now they run fine at 500mb. This has also translated to some astonishing performance gains in the overall performance of our app. I'd be very interested to see if other people have a similar experience.

Paul Malolepsy added a comment - 12/Mar/07 12:41 PM
For those who want to fix this issue without waiting for this fix, or making a custom build of hibernate, here is a workaround you can implement in your own code. This code should go after you create a Configuration object and before you create the SessionFactory:

{code:title=Bar.java|borderStyle=solid}

        //assumes an instantiated Configuration variable called cfg
        cfg.setListeners("load", new LoadEventListener [] {new DefaultLoadEventListener(), new LoadEventListener() {
            public void onLoad(LoadEvent event, LoadType loadType) throws HibernateException {
                Object obj = event.getResult();
                if (obj instanceof HibernateProxy) {
                    Enhancer.registerCallbacks(obj.getClass(),null);
                }
            }
        }});


{code}

Max Rydahl Andersen added a comment - 13/Mar/07 02:50 AM
thanks for the patch and related testcase; but a more interesting testcase would be a test that actually showed the OOME. Are you saying that simply creating and closing a SessionFactory in multiple threads will do it ?

And i'm a bit troubled on how setting the callback will actually affect future instantiation.

Max Rydahl Andersen added a comment - 13/Mar/07 02:51 AM
Scott, you have had fun with this before - could you verify/fix it ?

Paul Malolepsy added a comment - 13/Mar/07 01:16 PM
The test case demonstraits that during the course of normal operation, a fully hydrated object managed by hibernate is stuck in static memory. Creating any proxy object on any given thread will cause this behavior. How much memory this leaks, and weather or not you get an OOM is dependent on:

1) How many proxy types your application has.
2) How many threads you create proxies on.
3) Most importantly, how large the object graph is that is attached to these proxy objects.

To create a test case that shows the OOM is very easy, but coming up with a test case that actually asserts if there is an imminent OOM is next to impossible in Java. But the OOM is just a symptom of the real issue which is hydrated objects stuck in memory. This alone should be a giant red flag.


As far as future instantiations, there should be no worry here. Any time a proxy is ever instantiated, it is always preceded by the Enhancer.registerCallbacks() method. In fact, the callback object is directly tied to the instance of the proxy being created so if this callback object was ever used by future instantiations, that would be a very bad thing.

Scott Marlow added a comment - 14/Mar/07 01:53 PM
Paul,

Thanks for submitting the problem, fix and test.

Scott

Scott Marlow added a comment - 14/Mar/07 09:47 PM
My next step is to evaluate whether we could fix the root issue differently (or work around it), that being HHH-1293. In the original code, we passed the instance in directly which didn't always work on some platforms (especially Linux). At the very least, we should clear the ThreadLocalStorage reference before returning as your patch does.

In your example above, is the potential of the leak really that high? I wouldn't expect 15,000 objects leaked. Correct me if I'm wrong, but I would expect one TLS slot per thread to hold a reference. If you have 100 threads, that is a max of 100 objects leaked. Unless a collection is stored in each TLS slot and then the potential is higher. Either way, we should fix the leak.

http://forum.hibernate.org/viewtopic.php?t=947902 seems to discuss this issue as well.

Paul Malolepsy added a comment - 15/Mar/07 11:12 AM
Yep, that thread discusses this issue exactly.

I'm afraid the leak is as large as I stated. The static thread local is created on *each* proxy type. So if my application has a Customer class and a Order class, each type has their own static thread local, and the last proxy object created of each type on each thread will be leaked.

RE: fixing the issue differently. That occurred to me too. I though you might just be able to call one of the Enhancer.create() methods which let you pass in the callback directly, but I figured there must have been a reason why that wasn't chosen (sounds like I was right) so I decided to respect the original design. But one little extra call to clear the thread local should be a pretty minimal impact.

Scott Marlow added a comment - 17/Mar/07 10:33 PM
The fix looks good (except we use hard tabs instead of spaces but no big deal).

We used to pass in the callback directly until we hit HHH-1293. Registering the callback, works around HHH-1293, but not clearing the TLS caused this Jira.

I'll check the fix into trunk and the 3_2 branch soon.

Scott Marlow added a comment - 17/Mar/07 11:15 PM
Fix checked into 3_2 branch and trunk.