Funky GROUP BY
Ivan Smahin
ivan_smahin at valentina-db.com
Thu Oct 12 12:38:28 CDT 2006
Hello John,
Thursday, October 12, 2006, 1:41:31 AM, you wrote:
> Hi Ruslan, enclosed is the smaller DB, it contains 1 'candidate' record and
> only the linked race and results records that pertain to that candidate...
> very small # of records.
Ok, let's see.
The initial query is:
SELECT
Candidate.CandidateID AS 'CCID',
Results.CandidateID,
Race.RaceCountyID as 'RRID',
Results.RaceCountyID,
Race.Meta_State,
Sum(Results.VoteCount) as 'VoteTally'
FROM
Results, Candidate, Race
WHERE
Results.CandidateID = Candidate.CandidateID
and Race.RaceCountyID = Results.RaceCountyID
Group BY
Candidate.CandidateID,
Results.CandidateID,
Race.RaceCountyID,
Results.RaceCountyID,
Race.Meta_State
BTW, take a look on the style - all clauses started from the same
position on separate row, tab-delimited elements and so on...
Many peoples believe it's most clear form.
We get 20 rows in the result and you think it's wrong - right?
20 rows means - we get 20 different data groups due to the "GROUP BY"
clause. Indeed we have to see 20 groups, because RaceCountyID is
different for each row! So we get what we ask.
Another query:
SELECT
Candidate.CandidateID AS 'CCID',
Results.CandidateID,
Race.Meta_State,
Sum(Results.VoteCount) as 'VoteTally'
FROM
Results, Candidate, Race
WHERE
Results.CandidateID = Candidate.CandidateID
and Race.RaceCountyID = Results.RaceCountyID
Group BY
Candidate.CandidateID,
Results.CandidateID,
Race.Meta_State
Any RaceCountyID was thrown away from select list - so we skip it in
the "group by" and we get single row as result.
I assume you want to show RaceCountyID in the result but does not
want to use it in grouping? But how it could be done? You should think
about "GROUP BY" as operation which prepare NEW resulting table by
grouping data according to fields mentioned there! So you need apply
some aggregation to RaceCountyID as you want to get the single value
there. Without aggregation you have to mention this field in group by
clause and get 20 different groups.
So it could be something like this:
SELECT
Candidate.CandidateID AS 'CCID',
Results.CandidateID,
Race.Meta_State,
Sum(Results.VoteCount) as 'VoteTally',
max(Race.RaceCountyID )
FROM
Results, Candidate, Race
WHERE
Results.CandidateID = Candidate.CandidateID
and Race.RaceCountyID = Results.RaceCountyID
Group BY
Candidate.CandidateID,
Results.CandidateID,
Race.Meta_State
max(Race.RaceCountyID ) will make the single value as any other
aggregation function - so it will be the value calculated for each
group.
In plain English the last query sounds like:
"Show me summarized VoteCount and max. RaceCountyID for each CandidateID
for each state"
The tables contain "linked" data for single state and single candidate - so we
get a single row as result.
BTW, what the reason to show Candidate.CandidateID and
Results.CandidateID as different columns? It is just the same values!
SELECT
Candidate.CandidateID AS 'CCID',
Race.Meta_State,
Sum(Results.VoteCount) as 'VoteTally',
max(Race.RaceCountyID )
FROM
Results, Candidate, Race
WHERE
Results.CandidateID = Candidate.CandidateID
and Race.RaceCountyID = Results.RaceCountyID
Group BY
Candidate.CandidateID,
Race.Meta_State
Personally, I think max(RaceCountyID) is not meaningful data, but it
depends on.
Something like average patients' temperature in the hospital. :)
I hope I was clear enough. :)
Forgot to say - everything mentioned above is according to SQL-92
standard. :)
> Though strangely everything worked prior.) The GROUP BY
> statement is intended to return 1 row with the SUM of the votes for that
> candidate. When I used the old SQL with the old SQL db, I only provided 1
> GROUP BY column, which was used to aggregate the results. I think the
> 'problem' is now that I have to provide all the columns in GROUP BY, and the
> SUM function does not know which to aggregate by.
So you want to be allowed something like this?
SELECT
Candidate.CandidateID AS 'CCID',
Race.Meta_State,
Sum(Results.VoteCount) as 'VoteTally',
Race.RaceCountyID
FROM
Results, Candidate, Race
WHERE
Results.CandidateID = Candidate.CandidateID
and Race.RaceCountyID = Results.RaceCountyID
Group BY
Candidate.CandidateID,
Race.Meta_State
Some time ago some dbms vendors allow queries like this, but
tell me - what you expect to see in the RaceCountyID field?
In other words you want to show say 3 values as single one?
Who will take decision which value have to be used?
--
Best regards,
Ivan mailto:ivan_smahin ÎÁ valentina-db.com
More information about the Valentina
mailing list