Before digging into GROUP BY, it's important that we understand aggregate functions first.
Aggregate functions explained
Aggregate functions are a very simple concept to understand, so this'll be a piece of cake.
They're used to conduct calculations on a list of values.
Let's say that we've got the following list of numbers.
We can use the following aggregate functions on these numbers and get the results shown in the examples.
|AVG||Average of values||AVG(numbers) = (1 + 2 + 3) / 3 = 2|
|MIN||Minimum of values||MIN(numbers) = 1, 2, 3 = 1|
|MAX||Maximum of values||MAX(numbers) = 1, 2, 3 = 3|
|SUM||Sum of the values||SUM(numbers) = 1 + 2 + 3 = 6|
|COUNT||How many values||COUNT(numbers) = 3|
A few things to note before moving on.
The difference between SUM and COUNT
It's incredibly easy to mix up SUM and COUNT, so here's the difference:
SUM takes a list of values and totals them.
COUNT takes a list and determines how many values we have.
The reason why SUM(numbers) equals 6 is because 1 + 2 + 3 = 6.
The reason why COUNT(numbers) equals 3 is because we have 3 values.
The difference between COUNT(numbers) and COUNT(*)
For the sake of this concept, let's throw in a null value at the end of numbers.
In this case, COUNT(numbers) will still return a value of 3. Why? Because…
When you specify the field in your COUNT function, it will not include null values in the count.
However, COUNT(*) will return a value of 4. Why? Because…
COUNT(*) essentially tells SQL to return a count of the rows present. It doesn't pay attention to any null values in those rows.
So, since the null value still counts as a row, it's included in COUNT(*).
Alright, now that we've got aggregate functions under our belt, let's move onto GROUP BY!
GROUP BY explained
Let's say I give you the following list of ages for boys and girls…
Boys: 9, 10, 11
Girls: 10, 11, 12
…and asked you what the average age is for boys versus girls.
Easy money. Boys' average age is 10. Girls' average age is 11.
So, how'd you do this mathematically? You grouped each age value by gender and then calculated the average.
First, you took all the boys' ages and grouped them together. Then you summed them and divided by 3, resulting in 10, since (9 + 10 + 11) / 3 = 10.
Then, you took the girls' ages and did the same thing, resulting in 11, since (10 + 11 + 12) / 3 = 11.
In this situation, you accomplished what a GROUP BY statement does. You grouped the values by a field, gender, in order to calculate the average age for each value of that gender field.
GROUP BY statements place records in groups by field values. They then run aggregate functions on each group.
How to code a GROUP BY
Here's the template for GROUP BY statements: