Funky GROUP BY

Ivan Smahin ivan_smahin at valentina-db.com
Thu Oct 12 12:38:28 CDT 2006


Hello John,

Thursday, October 12, 2006, 1:41:31 AM, you wrote:

> Hi Ruslan, enclosed is the smaller DB, it contains 1 'candidate' record and
> only the linked race and results records that pertain to that candidate...
> very small # of records.

Ok, let's see.

The initial query is:

SELECT
        Candidate.CandidateID AS 'CCID', 
        Results.CandidateID, 
        Race.RaceCountyID as 'RRID', 
        Results.RaceCountyID, 
        Race.Meta_State, 
        Sum(Results.VoteCount) as 'VoteTally'
FROM    
        Results, Candidate, Race 
WHERE 
        Results.CandidateID = Candidate.CandidateID 
        and Race.RaceCountyID = Results.RaceCountyID
Group BY 
        Candidate.CandidateID, 
        Results.CandidateID, 
        Race.RaceCountyID, 
        Results.RaceCountyID, 
        Race.Meta_State


BTW,  take  a  look  on  the style - all clauses started from the same
position on separate row, tab-delimited elements and so on...
Many peoples believe it's most clear form.

We get 20 rows in the result and you think it's wrong - right?

20  rows means - we get 20 different data groups due to the "GROUP BY"
clause.  Indeed  we  have  to  see  20 groups, because RaceCountyID is
different for each row! So we get what we ask.


Another query:

SELECT
        Candidate.CandidateID AS 'CCID', 
        Results.CandidateID, 
        Race.Meta_State, 
        Sum(Results.VoteCount) as 'VoteTally'
FROM    
        Results, Candidate, Race 
WHERE 
        Results.CandidateID = Candidate.CandidateID 
        and Race.RaceCountyID = Results.RaceCountyID
Group BY 
        Candidate.CandidateID, 
        Results.CandidateID, 
        Race.Meta_State


Any  RaceCountyID  was  thrown away from select list - so we skip it in
the "group by" and we get single row as result.


I   assume   you  want to show RaceCountyID in the result but does not
want to use it in grouping? But how it could be done? You should think
about  "GROUP  BY"  as  operation which prepare NEW resulting table by
grouping  data  according to fields mentioned there! So you need apply
some  aggregation  to RaceCountyID as you want to get the single value
there.  Without aggregation you have to mention this field in group by
clause and get 20 different groups.

So it could be something like this:

SELECT
        Candidate.CandidateID AS 'CCID', 
        Results.CandidateID, 
        Race.Meta_State, 
        Sum(Results.VoteCount) as 'VoteTally',
        max(Race.RaceCountyID )
FROM    
        Results, Candidate, Race 
WHERE 
        Results.CandidateID = Candidate.CandidateID 
        and Race.RaceCountyID = Results.RaceCountyID
Group BY 
        Candidate.CandidateID, 
        Results.CandidateID, 
        Race.Meta_State


max(Race.RaceCountyID  )  will  make  the  single  value  as any other
aggregation  function  -  so  it will be the value calculated for each
group.

In plain English the last query sounds like:
"Show me summarized VoteCount and max. RaceCountyID  for each CandidateID
for each state"

The  tables contain "linked" data for single state and single candidate - so we
get a single row as result.


BTW,   what   the   reason   to   show    Candidate.CandidateID   and
Results.CandidateID as different columns? It is just the same values!

SELECT
        Candidate.CandidateID AS 'CCID', 
        Race.Meta_State,
        Sum(Results.VoteCount) as 'VoteTally',
        max(Race.RaceCountyID )
FROM    
        Results, Candidate, Race 
WHERE 
        Results.CandidateID = Candidate.CandidateID 
        and Race.RaceCountyID = Results.RaceCountyID
Group BY 
        Candidate.CandidateID, 
        Race.Meta_State


Personally,  I  think max(RaceCountyID) is not meaningful data, but it
depends on.

Something like average patients' temperature in the hospital. :)


I hope I was clear enough. :)
Forgot  to  say  -  everything  mentioned above is according to SQL-92
standard. :)

> Though strangely everything worked prior.) The GROUP BY
> statement is intended to return 1 row with the SUM of the votes for that
> candidate. When I used the old SQL with the old SQL db, I only provided 1
> GROUP BY column, which was used to aggregate the results. I think the
> 'problem' is now that I have to provide all the columns in GROUP BY, and the
> SUM function does not know which to aggregate by.

So you want to be allowed something like this?

SELECT
        Candidate.CandidateID AS 'CCID', 
        Race.Meta_State,
        Sum(Results.VoteCount) as 'VoteTally',
        Race.RaceCountyID
FROM    
        Results, Candidate, Race 
WHERE 
        Results.CandidateID = Candidate.CandidateID 
        and Race.RaceCountyID = Results.RaceCountyID
Group BY 
        Candidate.CandidateID, 
        Race.Meta_State


Some  time  ago some dbms vendors allow queries like this, but
tell me - what you expect to see in the RaceCountyID field?
In other words you want to show say 3 values as single one?
Who will take decision which value have to be used?



-- 
Best regards,
 Ivan                            mailto:ivan_smahin ÎÁ valentina-db.com




More information about the Valentina mailing list