|
|
|
Afraid not, I'm not working for that customer at present, so I don't have the code. However, only a single line of code needs to be changed and the description contains both the old and the new version of that line.
Attached is a patch against 3.2.4sp1 which makes the change that Erik suggested.
Note that this isn't QUITE a 100% replacement; this will introduce 2 behavioural changes which will need to be checked; I don't see either as being a problem from my code inspection but I'm hardly an expert. 1) Previously, the code was using a SequencedHashMap; this guarantees insertion order when navigating the entries in guessEntityMode(). This means that if the same Tuplizer is added twice for two different EntityModes, then the behaviour of guessEntityMode changes from 'first-one-added wins' to 'who knows'. If the same Tuplizer cannot be added twice (it seems so, from a code inspection of ComponentEntityModeToTuplizerMapping and EntityEntityModeToTuplizerMapping, but I'm not 100% sure) then this is not an issue. 2) ConcurrentReaderHashMap, like HashTable, does NOT allow null to be used as a key or value. This is, again, a behaviour change, but again, from inspection of the above two classes it's pretty clear that no nulls are ever passed in (in fact, they're wrapped in an explicit check). Otherwise, it's fairly straightforward. We have the same issue running under Java 1.5. Our solution uses the new java.util.concurrent.ConcurrentHashMap with one concurrent updater thread. We replaced the same line with:
private final Map<EntityMode, Tuplizer> tuplizers = new ConcurrentHashMap<EntityMode, Tuplizer>(64, 0.75f, 1); Again it has a different behavior then the original but we did see any problem with our code. This is a very severe bottleneck and I think it should be tagged as major bug. Just to remember, we can't use java 5+ features, since we must sill compatible with java 1.4(at least)
Just to clarify, the patch I attached doesn't use any Java 5 features, it's just using Doug Lea's original concurrent library which is already used elsewhere in the current codebase.
We do not use Doug Lea's concurrency package. It is included in the download, but that is simply because it is a required dependency for JBossCache. If you do not use JBossCache, right now you do not need that concurrency package either. So tis would constitute a need for a new dependency...
Fair enough, Steve, I didn't realise that. I'm assuming from this that adding this dependency is considered inappropriate. As this is a such a big performance hit for us (And obviously others) what about the possibility of just including Doug Lea's ConcurrentReaderHashMap in the code tree?
It's a single, standalone class (no dependencies on anything else in his package) and as far as I can tell there are no license issues: "All classes are released to the public domain and may be used for any purpose whatsoever without permission or acknowledgment." (from http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html) Let me know if this is likely to be a suitable solution and I can prepare a new patch. I am not saying that I would not consider using the oswego package (via dep or copying). I am just pointing out that there would have to be a clear cut need to include the new package.
As of this moment, I have not seen any hard facts showing that oswego's ConcurrentReaderHashMap is any faster than say Hibernate's FastHashMap (originally copied from commons-collection) or even the current code for that matter. What I would ideally like to see is a series of tests showing the discrepancy. I tried testing this in a non-concurrent env, but of course got no differences. I tried using junitperf for the concurrent load testing, but that library is new to me and it was not working out so well. The problem is not the speed of the hash map class but the usage of Collections.synchronizedMap() wraper. To recreate the problem you need on object with multiple collections and (in my case) code that forces session flush. Also you will have to run with more than 20 threads running long enough to execute more then 1000 transactions.
1) I don't see where anyone is saying that using HashMap is the problem. In fact the current Hibernate code is not even using HashMap, if you actually take a look. In fact most comments are explicitly pointing to the use of synchronization as the cause. And that probably is the case; call me pragmatic, I just prefer hard numbers.
2) This should be a *unit test* of the specific map usage pattern. We already have enough integration tests that dont cover enough low-level details like this. We need to start scaling out the test suite to contain both more unit tests and more load/perf tests. 3) Where did these numbers come from? I assume this is because you are discussing integration tests and it took that long for you to get proper concurrent load on the Map structure. This is another reason to have unit tests for this as well. Attached is a very rough-and-ready, cobbled-together test which very approximately demonstrates the problem and is fairly adjustable to show different alternatives for the different 'real-world' scenarios people might be encountering. The patch should apply cleanly to the 3.2.4sp1 codebase, which is what we're running; not sure how cleanly it will apply elsewhere but it's extremely simple. It consists of 2 parts:
1) It changes EntityModeToTuplizerMapping to (optionally) take a Map implementation in the constructor, defaulting to the current implementation (synchronized map wrapped around a SequencedHashMap). 2) A simple test (not JUnit, just a main() method) which performs a very simple timing test. It does 2 runs, one of which is thrown away and is just used to try and get the VM up to speed with JITting etc. so as to produce a (slightly) more accurate result. The main emphasis of the test is to create a number of threads, each of which will perform a particular number of iterations. In each iteration, a certain number of guessEntityMode()s and a certain number of getTuplizerOrNull()s will be performed against a very simple dummy subclass of EntityModeToTuplizerMapping which in turn uses a simple dummied-out Tuplizer implementation. The test is performed once using the current synchronised version, once with FastHashMap, once with Doug Lea's ConcurrentHashMap, and once with his ConcurrentReaderHashMap (obviously this requires the concurrent library to be on the classpath). It cuts a lot of corners, but it at least gives us something to work from. The behaviour can be adjusted by modifying the values of the 6 constants at the top of the file. Running 20 threads, with a 20:1 getTuplizerOrNull():guessEntityMode() ratio (which is a very rough attempt to recreate what we see in our system, in that we flush extremely frequently and I gather this triggers the getTuplizerOrNull(); as far as I can tell guessEntityMode() is most likely to be used in finding-by-example, which is rare in our application) on the Server VM I get the following results: Map class: class java.util.Collections$SynchronizedMap: 1510ms Map class: class org.hibernate.util.FastHashMap: 292ms (x0.19337748344370861) Map class: class EDU.oswego.cs.dl.util.concurrent.ConcurrentHashMap: 787ms (x0.5211920529801325) Map class: class EDU.oswego.cs.dl.util.concurrent.ConcurrentReaderHashMap: 532ms (x0.352317880794702) which shows a ~3x speedup with ConcurrentReaderHashMap and a ~5x speedup with FastHashMap. If you play with the ratios, you'll see that the higher the ratio of tuplizer gets:entity mode guesses, the bigger the problem is and the bigger improvements can be gained. If you eliminate the guessEntityMode() calls completely (this is actually probably closer to the real-world figure we get, though I don't have exact counts to hand) the speedup is ~ 17x with FastHashMap. From this, I would draw a few conclusions: * Given that update frequency is (as far as I can observe) close to zero -- I think these mappings are largely set up once and forgotten? I may be wrong though -- FastHashMap is probably more than sufficient, particularly given it's in the codebase already * It may be worth considering changing FastHashMap into a more generic FastMap, which takes in some sort of MapFactory. The default implementation could use HashMap, but the TuplizerMapping could use SequencedHashMap to preserve existing behaviour (if this is important). I'd encourage people to calculate what loading makes this test (very roughly) reflect any results they see in their application to check that our figures are not 'outliers' and we're operating off accurate data. Thanks for that Paul. I'll start working with that.
The structure of this map was chosen largely because of future considerations. Mainly, I want eventually to allow registration of custom entity-modes which would require "later writes" into this map by the users. Additionally, I chose an ordered map because I would want the Hibernate mappings considered first. But as it is currently, neither of those is necessary. In fact, something I mentioned to Gail that we should also take a look at was in fact not using a map at all here. It is essentially a series of 3 options always right now... I took this maptest.patch stuff, modified it slighty and added timings for jdk5 ConcurrentHashMap also (just to see). I get slightly different results:
baseline : 9051 ms FastHashMap : 8499 ms (ratio=0.9390123) ConcurrentHashMap (jdk) : 509 ms (ratio=0.05623688) ConcurrentHashMap (dl) : 6380 ms (ratio=0.7048945) ConcurrentReaderHashMap (dl) : 3544 ms (ratio=0.39155895) To be completely happy, I'd still want to average a couple of 'liveRun' executions together rather than relying on one. I already added "forceful GC" code to attempt to isolate the tests from GC from other tests. That would be all I could think of to get the best results... From my numbers, the JDK ConcurrentHashMap is clearly the best option, but unfortunately not available as a solution until 4.0 when we move to jdk5. Other than that, the Doug Lea / Oswego ConcurrentReaderHashMap seemed to significantly beat the others for this access pattern. Steve, are you able to post your modified test? It'd be useful for me to be able to run it, that's a fairly hugely substantial different between the figures so I'm curious as to what the difference is and if your changes give me substantially different results also.
Here you go.
I really liked the use of oswego to control the test threads, so i was looking at doing something junitperf-like using that concept (curious where you found that). I'll look at moving to use that today. The biggest changes were: 1) the gc calls 2) doing the map filling one time rather than (included in timings) for each test the rest were 'cosmetic' Thanks Steve -- those results ARE extremely confusing, as I can reproduce your results on my machine, but running MY version produces basically the same results as before. This, to put not too fine a point on it, confused the @#$! out of me. After spending WAY too long incrementally refactoring my code towards yours, though, continually running mine until I got results like yours, I worked it out. There IS one subtle difference between our codebases.
You have: Map fastMap = new org.apache.commons.collections.FastHashMap(); where I had: liveRun(new FastHashMap(), base); with (drumroll): import org.hibernate.util.FastHashMap; org.hibernate.util.FastHashMap != org.apache.commons.collections.FastHashMap. D'oh! Can't believe it took me that long to work it out. :) If I amend your code to add: Map fastMapHibernate = new org.hibernate.util.FastHashMap(); fastMapHibernate.put( EntityMode.DOM4J, tuplizer1 ); fastMapHibernate.put( EntityMode.POJO, tuplizer2 ); and the according dry + live runs, I get: baseline : 2815 ms FastHashMap (Apache) : 2576 ms (ratio=0.9150977) FastHashMap (Hibernate) : 380 ms (ratio=0.13499112) ConcurrentHashMap (jdk) : 473 ms (ratio=0.16802841) ConcurrentHashMap (dl) : 1740 ms (ratio=0.6181172) ConcurrentReaderHashMap (dl) : 1060 ms (ratio=0.37655416) which neatly encapsulates the difference we got (again, this is with the server VM, the proportions differ somewhat with the client VM but they're in the ballpark). Why is this the case? Well, change the code to call .setFast(true) on the Apache map and suddenly the figure drops to ~ the org.hibernate version. If the Apache one isn't in fast mode, it's effectively fully synchronised (which explains why the performance is close to the Collections.synchronizedMap wrapper). So that explains that. Phew! Hopefully that gets us all on the same page. 2 other things: 1) Minor thing but, if you're thinking of landing this test in the codebase, check the classname: 'Acess' -> 'Access'. 2) I don't know where the 'open the floodgates' latch-based concurrent test idiom came from, to be honest. Been using it for a while; either stumbled upon it myself one day or absorbed it by osmosis from a colleague, not sure. It's damn handy though! LOL, sorry, I actually intended that... not the confusion part ;) the "test c-collections FastHashMap" part. But, yes I forgot about the 'fast switch' on that impl
So then it looks like the Hibernate FastHashMap is actually the best option until we move to JDK5, at which point I would drop it in favor of java.util.ConcurrentHashMap since (a) the diff is statistically minor and (b) less for us to maintain. so, I think we have a winner Any chance of this making it into 3.2.6? I've attached a patch which I think captures the final decision on this bug.
Fix committed to trunk / 3.2.
I've added a constructor to EntityModeToTuplizerMapping with no arguments that creates a org.hibernate.util.FastHashMap for the tuplizers. I've also added a constructor that takes a Map argument to facilitate performance testing with various map implementation (like the EntityModeToTuplizerMapping implementation in Paul Cowen's test). All EntityModeToTuplizerMapping subclasses use the constructor with no arguments. This issue appears to be fixed in core/tags/v326, but doesn't show in the 3.2.6 release notes (I assume because this issue doesn't have a 'fix version'). Might want to amend that?
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I'm going to see if we can try the custom build using a java 5 concurrent reader hash map, it's a good idea.