|
|
|
[
Permlink
| « Hide
]
Giampaolo Tomassoni - 22/May/06 04:47 AM
For another example of the very same problem, see: http://forum.hibernate.org/viewtopic.php?t=959692
Just ran into this this other day. Seems odd since this is in the examples in Chapter 11 of the reference guide. Does it only affect PosgreSQL?
Ran into this today with Microsoft SQL Server.
Can anyone tell me if this can be circumvented some way? I'm migrating from a MySQL to PostgreSQL database soon and can't afford my system going down.
And if someone knows a fix, please tell me. Until this expansion is implemented, use:
select count(foo.bar), foo from Foo foo group by foo.id, foo.all, foo.other, foo.properties, foo.of, foo.entity Changed from Bug to Improvement and Priority changed from Major to Minor?
Something that's used in the docs as one of the things hibernate does that doesn't work at all except on one database is "minor and an improvement"? For reference, third example in section 14.11 . This has nothing to do with the database dialect. Hibernate simply doesn't expand entity properties in the GROUP BY clause. The examples and tests work because the entity there has no expandable properties, only an identifier. Would you like to submit a patch that implements this feature?
And I've now changed this _SINGLE_ wrong example in the documentation, the other dozen examples of GROUP BY are correct.
Thank you for updating the documention, as the example outlined by :
select cat from Cat cat join cat.kittens kitten group by cat having avg(kitten.weight) > 100 order by count(kitten) asc, sum(kitten.weight) desc is very misleading. If this wasn't an intended feature, than sure, this issue is minor and an improvment as you could work around it. And _ANY_ wrong example in the documentation will create a bug report like this. Doesn't matter that there was just one wrong one. A wrong example in any manual will have users trying to do something that can't be done. If I can find some time in the next little bit, I'd be happy to find a way to add this and submit a patch. I'm glad to see this is marked 'critical', but I agree with Patrick Burleson's comment that this really is a bug and not an improvement.
Well, since it has 27 votes and 18 watchers, I think that someone from the community could provide a patch :-)
Well, it turns out I can't use the workaround recommended above (because I don't know which entity I'm querying when I generate my query) nor any other workaround I can think of, so I started looking into fixing Hibernate yesterday. Hope to have a patch ready by tomorrow.
Patch for converting entity refs in group by clauses in HQL to all of the entity fields in SQL.
If issue #2436 is a bug and is the exact same issue, why is this one is marked as "Improvement"? Also, are there any changes for this bug to be fixed in the next release? Thanks!
This is, in my opinion, a huge bug. It also breaks compliance with jpql:
"groupby_clause ::= GROUP BY groupby_item {, groupby_item}* groupby_item ::= single_valued_path_expression | identification_variable ... The requirements for the SELECT clause when GROUP BY is used follow those of SQL: namely, any item that appears in the SELECT clause (other than as an argument to an aggregate function) must also appear in the GROUP BY clause. In forming the groups, null values are treated as the same for grouping purposes. Grouping by an entity is permitted. In this case, the entity must contain no serialized state fields or lob-valued state fields..." Agreed. I couldn't live with this when I ran into it, so I tweaked Hibernate and attached a patch to this bug. If you need this functionality, try out my patch.
That said, doing this can make for really slow query execution in MySQL, and in the end I stopped doing queries like the examples above in my app and instead do id queries followed by object queries. select c from Cat c, ... where ... group by c having ... order by ... => select c.id from Cat c, ... where ... group by c.id having ... order by ... select c from Cat c where c.id in (id1, id2, ...) This is more work in the application: 1. If you care about order, you need to keep track of the id query result order so that you can rejigger the results from the object query. 2. There's a limit on how many ids you can supply in the object query predicate. So you may need to do multiple object queries and re-order results as you go. But this is generally quite fast because the queries should be using the primary key index....well, it's not that simple either. Fields in the order by clause must be in the group by as well (not sure if this is true for MySQL, but it's true for some db's that my app has to work with). In any case, fewer fields in group by clause means you need smaller indices to support the queries, so overall performance is much better. I have the impression your example is a bit misleading. Grouped queries in which you need to get a whole entity are generally ones in which some ORM is involveld, otherwise the "group by" construct is redundant.
In example, the HQL query: SELECT c, COUNT(*) FROM Company AS c INNER JOIN c.products AS p WHERE p.price > 150.00 GROUP BY c HAVING COUNT(*) > 1 ORDER BY c.income ASC could be mapped by a patched Hibernate to something like, say, the following SQL select: SELECT c.id, c.name, c.income, COUNT(*) FROM companies AS c RIGHT OUTER JOIN products AS p ON c.id=p.idcompany WHERE p.price > 150.00 GROUP BY c.id, c.name, c.income HAVING COUNT(*) > 1 ORDER BY c.income ASC I think that many SQL optimizers should discover the speed-up pattern available on this kind of redundanly-grouped queries: c.id is a primary key and the value of c.name and c.income depend on it. Pheraps MySQL misses a true query optimizer? Anyway, in order to improve performances when the optimizer isn't smart enough, the trick in SQL is field post-fetching. In example, a patched Hibernate could instead map the above HQL query in the following SQL one: SELECT c1.id, c1.name, c1.income, c2.cnt FROM companies AS c1 RIGHT OUTER JOIN ( SELECT c.id, COUNT(*) AS cnt FROM companies AS c INNER JOIN products AS p ON c.id=p.idcompany WHERE p.price > 150.00 GROUP BY c.id HAVING cnt > 1 ) AS c2 ON c1.id=c.id ORDER BY c1.income ASC See? The HQL query is basically mimed by the inner select, which is then wrapped by one meant mostly to dereference entity fields. This needs a grouping index having the same width of the company table's primary key (1 assumed in this example), and results in better performances with respect to fetching all the grouped data in the application and then dereferencing the IDs from there. It needs subqueries and joins, besides. I don't know if MySQL does support subqueries, but most of the other SQL engines handled by Hibernate do. So, a general solution to the problem outlined in this issue would be to adopt the subquery+joins mapping when the target SQL engine supports them, falling back to the multi-field grouping when it doesn't. Please note I didn't run the above HQL and SQL queries, thereby it may be there is something wrong in them. At least Oracle 10g does not seem to have an optimizer smart enogh to discover the speed-up pattern you proposed. A query very similar to yours turned out to be quite more costly if you added just one more column to the select and the group by clause. The difference seems to be, that when only the ID is selected, no sort is performed.
Do you experience the same slowdown with both SQL queries or the second (the one with a subselect) is faster?
I only examined the first query, thinking the second query should be faster anyway. However, as it turns out the second query is even slower than the first in my example. The fastest way in this simple example seems to be to perform a subselect in the select clause instead.
You mean, to fetch each single field with a subselect? Timings are also strongly tied to the number of fields to fetch, then. Right?
I don't have an Oracle installation handy. I can only produce timings with PostgreSQL 8.2. Nor I have a suitable companies/products database handy. But I have an Italian provinces/localities database relating 15125 localities (towns) many-to-one with the corresponding province from a set of 104. Timings are this way: Case 1: explode attributes. SELECT p.idlocality, p.code, p.name, COUNT(*) FROM italy.provinces AS p INNER JOIN italy.cities AS c ON c.idprovince=p.idlocality WHERE p.code NOT LIKE '_A' GROUP BY p.idlocality, p.code, p.name HAVING COUNT(*) > 100 ORDER BY p.name ASC Output row count: 69 Total runtime: 36.062 ms Case 2: wrapping dereferencing query SELECT p1.idlocality, p1.code, p1.name, p2.cnt FROM italy.provinces AS p1 RIGHT OUTER JOIN ( SELECT p.idlocality, COUNT(*) AS cnt FROM italy.provinces AS p INNER JOIN italy.cities AS c ON c.idprovince=p.idlocality WHERE p.code NOT LIKE '_A' GROUP BY p.idlocality HAVING COUNT(*) > 100 ) AS p2 ON p1.idlocality=p2.idlocality ORDER BY p1.name ASC Output row count: 69 (of course, the same as case#1) Total runtime: 29.829 ms Now, I believe that in this examples timings differences mostly depend on attribute dereferencing, not on the "core" query. They depend on the count of output rows, then. Hoever, it seems to me that case#2 performs better and that this improvement in performances would be even more evident as the number of output rows increases. Finally, albeit all this performance tests aresomehow related to this issue, I think it would be better to open a thread in some forum. In this way we could even exange test datasets, don't you agree? Here is a (naive) implementation of support for subclasses (i.e. the discriminator column) for single-table polymorphism.
It's based 3.2.6 and requires Sandeep's patch be applied first. Did you try this ? select new list(cat, count(*)) from Cat cat group by cat
Something strange with the generated SQL of this solution: select p.productType, count(*) from Product p group by p.productType -> SELECT P.PRODUCT_TYPE_ID, COUNT(*) FROM PRODUCT p INNER JOIN PRODUCT_TYPE T ON (T.ID = P.PRODUCT_TYPE_ID) GROUP BY P.TYPE_ID The inner join is not useful. Why Hibernate do this ? |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||