Issue Details (XML | Word | Printable)

Key: HHH-1615
Type: Improvement Improvement
Status: Open Open
Priority: Critical Critical
Assignee: Anthony Patricio
Reporter: Xavier Bugaud
Votes: 59
Watchers: 45
Operations

If you were logged in you would be able to see more operations.
Hibernate Core

GROUP BY entity does not work

Created: 28/Mar/06 07:06 AM   Updated: 06/Jan/10 10:07 AM
Component/s: query-hql
Affects Version/s: 3.2.5
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments: 1. Text File group-by-discriminator.patch (2 kB)
2. File hibernate-group-by.diff (6 kB)

Environment: PostgreSQL 8.1.3, Hibernate 3.1.3
Issue Links:
Duplicate
 
Prerequisite
 

Participants: Anders Wallgren, Anthony Patricio, Benny Goemans, Bob Tiernay, Christian Bauer, Diego Plentz, Eitan Suez, Gavin Clarke, Giampaolo Tomassoni, Mathieu Avoine, Nick Palombo, Patrick Burleson, Rafael Santini, Sandeep Tamhankar, Stephane Epardaud, Steve Van Reeth, Tim Lüecke and Xavier Bugaud


 Description  « Hide

The query : "select cat, count from Cat cat group by cat" does not work.

ERROR: column "xxxx" must appear in the GROUP BY clause or be used in an aggregate function

See :



Giampaolo Tomassoni added a comment - 22/May/06 04:47 AM

For another example of the very same problem, see: http://forum.hibernate.org/viewtopic.php?t=959692


Patrick Burleson added a comment - 10/Jun/06 04:29 PM

Just ran into this this other day. Seems odd since this is in the examples in Chapter 11 of the reference guide. Does it only affect PosgreSQL?


Nick Palombo added a comment - 26/Jul/06 02:56 PM

Ran into this today with Microsoft SQL Server.


Gavin Clarke added a comment - 25/Aug/06 11:06 AM

Also seen with HSQLDB


Benny Goemans added a comment - 17/Sep/06 01:14 PM

Can anyone tell me if this can be circumvented some way? I'm migrating from a MySQL to PostgreSQL database soon and can't afford my system going down.
And if someone knows a fix, please tell me.


Christian Bauer added a comment - 17/Sep/06 01:27 PM

Until this expansion is implemented, use:

select count(foo.bar), foo from Foo foo group by foo.id, foo.all, foo.other, foo.properties, foo.of, foo.entity


Patrick Burleson added a comment - 17/Sep/06 03:29 PM

Changed from Bug to Improvement and Priority changed from Major to Minor?

Something that's used in the docs as one of the things hibernate does that doesn't work at all except on one database is "minor and an improvement"?

For reference, third example in section 14.11 .


Christian Bauer added a comment - 17/Sep/06 03:58 PM

This has nothing to do with the database dialect. Hibernate simply doesn't expand entity properties in the GROUP BY clause. The examples and tests work because the entity there has no expandable properties, only an identifier. Would you like to submit a patch that implements this feature?


Christian Bauer added a comment - 17/Sep/06 04:05 PM

And I've now changed this SINGLE wrong example in the documentation, the other dozen examples of GROUP BY are correct.


Patrick Burleson added a comment - 17/Sep/06 04:14 PM

Thank you for updating the documention, as the example outlined by :

select cat
from Cat cat
join cat.kittens kitten
group by cat
having avg(kitten.weight) > 100
order by count(kitten) asc, sum(kitten.weight) desc

is very misleading. If this wasn't an intended feature, than sure, this issue is minor and an improvment as you could work around it.

And ANY wrong example in the documentation will create a bug report like this. Doesn't matter that there was just one wrong one. A wrong example in any manual will have users trying to do something that can't be done.

If I can find some time in the next little bit, I'd be happy to find a way to add this and submit a patch.


Sandeep Tamhankar added a comment - 07/Nov/07 02:12 PM

I'm glad to see this is marked 'critical', but I agree with Patrick Burleson's comment that this really is a bug and not an improvement.


Diego Plentz added a comment - 07/Nov/07 05:12 PM

Well, since it has 27 votes and 18 watchers, I think that someone from the community could provide a patch


Sandeep Tamhankar added a comment - 08/Nov/07 11:51 AM

Well, it turns out I can't use the workaround recommended above (because I don't know which entity I'm querying when I generate my query) nor any other workaround I can think of, so I started looking into fixing Hibernate yesterday. Hope to have a patch ready by tomorrow.


Sandeep Tamhankar added a comment - 09/Nov/07 06:48 PM

Patch for converting entity refs in group by clauses in HQL to all of the entity fields in SQL.


Mathieu Avoine added a comment - 06/Dec/07 08:13 AM

If issue #2436 is a bug and is the exact same issue, why is this one is marked as "Improvement"? Also, are there any changes for this bug to be fixed in the next release? Thanks!


Bob Tiernay added a comment - 06/Feb/08 03:07 PM

This is, in my opinion, a huge bug. It also breaks compliance with jpql:

"groupby_clause ::= GROUP BY groupby_item {, groupby_item}*

groupby_item ::= single_valued_path_expression | identification_variable

... The requirements for the SELECT clause when GROUP BY is used follow those of SQL: namely, any item that appears in the SELECT clause (other than as an argument to an aggregate function) must also appear in the GROUP BY clause. In forming the groups, null values are treated as the same for grouping purposes. Grouping by an entity is permitted. In this case, the entity must contain no serialized state fields or lob-valued state fields..."


Sandeep Tamhankar added a comment - 06/Feb/08 03:50 PM

Agreed. I couldn't live with this when I ran into it, so I tweaked Hibernate and attached a patch to this bug. If you need this functionality, try out my patch.

That said, doing this can make for really slow query execution in MySQL, and in the end I stopped doing queries like the examples above in my app and instead do id queries followed by object queries.

select c
from Cat c, ...
where ...
group by c
having ...
order by ...

=>

select c.id
from Cat c, ...
where ...
group by c.id
having ...
order by ...

select c
from Cat c
where c.id in (id1, id2, ...)

This is more work in the application:
1. If you care about order, you need to keep track of the id query result order so that you can rejigger the results from the object query.
2. There's a limit on how many ids you can supply in the object query predicate. So you may need to do multiple object queries and re-order results as you go.

But this is generally quite fast because the queries should be using the primary key index....well, it's not that simple either. Fields in the order by clause must be in the group by as well (not sure if this is true for MySQL, but it's true for some db's that my app has to work with). In any case, fewer fields in group by clause means you need smaller indices to support the queries, so overall performance is much better.


Giampaolo Tomassoni added a comment - 06/Feb/08 05:08 PM

I have the impression your example is a bit misleading. Grouped queries in which you need to get a whole entity are generally ones in which some ORM is involveld, otherwise the "group by" construct is redundant.

In example, the HQL query:

SELECT c, COUNT
FROM Company AS c INNER JOIN c.products AS p
WHERE p.price > 150.00
GROUP BY c
HAVING COUNT > 1
ORDER BY c.income ASC

could be mapped by a patched Hibernate to something like, say, the following SQL select:

SELECT c.id, c.name, c.income, COUNT
FROM companies AS c RIGHT OUTER JOIN products AS p ON c.id=p.idcompany
WHERE p.price > 150.00
GROUP BY c.id, c.name, c.income
HAVING COUNT > 1
ORDER BY c.income ASC

I think that many SQL optimizers should discover the speed-up pattern available on this kind of redundanly-grouped queries: c.id is a primary key and the value of c.name and c.income depend on it. Pheraps MySQL misses a true query optimizer?

Anyway, in order to improve performances when the optimizer isn't smart enough, the trick in SQL is field post-fetching. In example, a patched Hibernate could instead map the above HQL query in the following SQL one:

SELECT c1.id, c1.name, c1.income, c2.cnt
FROM companies AS c1 RIGHT OUTER JOIN (
SELECT c.id, COUNT AS cnt
FROM companies AS c INNER JOIN products AS p ON c.id=p.idcompany
WHERE p.price > 150.00
GROUP BY c.id
HAVING cnt > 1
) AS c2 ON c1.id=c.id
ORDER BY c1.income ASC

See? The HQL query is basically mimed by the inner select, which is then wrapped by one meant mostly to dereference entity fields. This needs a grouping index having the same width of the company table's primary key (1 assumed in this example), and results in better performances with respect to fetching all the grouped data in the application and then dereferencing the IDs from there.

It needs subqueries and joins, besides. I don't know if MySQL does support subqueries, but most of the other SQL engines handled by Hibernate do.

So, a general solution to the problem outlined in this issue would be to adopt the subquery+joins mapping when the target SQL engine supports them, falling back to the multi-field grouping when it doesn't.

Please note I didn't run the above HQL and SQL queries, thereby it may be there is something wrong in them.


Tim Lüecke added a comment - 07/Feb/08 02:54 AM

At least Oracle 10g does not seem to have an optimizer smart enogh to discover the speed-up pattern you proposed. A query very similar to yours turned out to be quite more costly if you added just one more column to the select and the group by clause. The difference seems to be, that when only the ID is selected, no sort is performed.


Giampaolo Tomassoni added a comment - 07/Feb/08 11:27 AM

Do you experience the same slowdown with both SQL queries or the second (the one with a subselect) is faster?


Tim Lüecke added a comment - 08/Feb/08 02:13 AM

I only examined the first query, thinking the second query should be faster anyway. However, as it turns out the second query is even slower than the first in my example. The fastest way in this simple example seems to be to perform a subselect in the select clause instead.


Giampaolo Tomassoni added a comment - 08/Feb/08 07:47 AM

You mean, to fetch each single field with a subselect? Timings are also strongly tied to the number of fields to fetch, then. Right?

I don't have an Oracle installation handy. I can only produce timings with PostgreSQL 8.2.

Nor I have a suitable companies/products database handy. But I have an Italian provinces/localities database relating 15125 localities (towns) many-to-one with the corresponding province from a set of 104.

Timings are this way:

Case 1: explode attributes.

SELECT p.idlocality, p.code, p.name, COUNT
FROM italy.provinces AS p INNER JOIN italy.cities AS c ON c.idprovince=p.idlocality
WHERE p.code NOT LIKE '_A'
GROUP BY p.idlocality, p.code, p.name
HAVING COUNT > 100
ORDER BY p.name ASC

Output row count: 69
Total runtime: 36.062 ms

Case 2: wrapping dereferencing query

SELECT p1.idlocality, p1.code, p1.name, p2.cnt
FROM italy.provinces AS p1 RIGHT OUTER JOIN (
SELECT p.idlocality, COUNT AS cnt
FROM italy.provinces AS p INNER JOIN italy.cities AS c ON c.idprovince=p.idlocality
WHERE p.code NOT LIKE '_A'
GROUP BY p.idlocality
HAVING COUNT > 100
) AS p2 ON p1.idlocality=p2.idlocality
ORDER BY p1.name ASC

Output row count: 69 (of course, the same as case#1)
Total runtime: 29.829 ms

Now, I believe that in this examples timings differences mostly depend on attribute dereferencing, not on the "core" query. They depend on the count of output rows, then.

Hoever, it seems to me that case#2 performs better and that this improvement in performances would be even more evident as the number of output rows increases.

Finally, albeit all this performance tests aresomehow related to this issue, I think it would be better to open a thread in some forum. In this way we could even exange test datasets, don't you agree?


Anders Wallgren added a comment - 21/Jul/08 05:15 PM

Here is a (naive) implementation of support for subclasses (i.e. the discriminator column) for single-table polymorphism.

It's based 3.2.6 and requires Sandeep's patch be applied first.


Steve Van Reeth added a comment - 21/Nov/08 12:51 PM

Did you try this ? select new list(cat, count) from Cat cat group by cat

Something strange with the generated SQL of this solution:

select p.productType, count from Product p group by p.productType
->
SELECT P.PRODUCT_TYPE_ID, COUNT
FROM PRODUCT p INNER JOIN PRODUCT_TYPE T ON (T.ID = P.PRODUCT_TYPE_ID)
GROUP BY P.TYPE_ID

The inner join is not useful. Why Hibernate do this ?


Rafael Santini added a comment - 01/Jul/09 09:38 AM

I also ran into this problem with SQL Server. With MySQL the same query works.

HQL:

select count(emprestimo.exemplar.obra), emprestimo.exemplar.obra from Emprestimo emprestimo where emprestimo.data between ? and ? group by emprestimo.exemplar.obra order by count(emprestimo.exemplar.obra) desc

SQL generated:

select count(exemplar1_.id_obra) as col_0_0_, exemplar1_.id_obra as col_1_0_, obra2_.id as id157_, obra2_.mfn as mfn157_, obra2_.titulo as titulo157_, obra2_.localizacao as localiza4_157_, obra2_.id_idioma as id5_157_, obra2_.id_tipo_obra as id6_157_, obra2_.id_categoria_obra as id7_157_, obra2_.id_area_obra as id8_157_, obra2_.id_base as id9_157_ from bs_emp_mov_emprestimo emprestimo0_, bs_emp_ace_exemplar exemplar1_, bs_emp_ace_obra obra2_ where emprestimo0_.id_exemplar=exemplar1_.id and exemplar1_.id_obra=obra2_.id and (emprestimo0_.data between ? and ?) group by exemplar1_.id_obra order by count(exemplar1_.id_obra) desc

Error:

A coluna 'bs_emp_ace_obra.ID' é inválida na lista de seleção porque não está contida em uma função agregada nem na cláusula GROUP BY.


Stephane Epardaud added a comment - 18/Sep/09 08:00 AM

Same issue here, I had to transform my nice short HQL that didn't work into a super long SQL that did work.


Eitan Suez added a comment - 06/Jan/10 10:07 AM

does anyone have a version of the patch against a more recent version of hibernate core? (v3.3).
thanks, / eitan