GROUP BY | BUS 104 – SQL Exam

Still feeling nervous for your exam? Check out the SQL Cram Kit for practice questions that emulate the difficulty that you'll face on your exam. Each question includes a thorough step-by-step explanation where I walk you through the entire problem solving process!

SQL Cram Kit logo

Aggregate functions explained

Aggregate functions are a very simple concept to understand, so this'll be a piece of cake.

They're used to conduct calculations on a list of values.

Let's say that we've got the following list of numbers.

numbers
1
2
3

We can use the following aggregate functions on these numbers and get the results shown in the examples.

Function Description Example w/ numbers
AVG Average of values AVG(numbers) = (1 + 2 + 3) / 3 = 2
MIN Minimum of values MIN(numbers) = 1, 2, 3 = 1
MAX Maximum of values MAX(numbers) = 1, 2, 3 = 3
SUM Sum of the values SUM(numbers) = 1 + 2 + 3 = 6
COUNT How many values COUNT(numbers) = 3

A few things to note before moving on.

The difference between SUM and COUNT

It's incredibly easy to mix up SUM and COUNT, so here's the difference:

SUM takes a list of values and totals them.

COUNT takes a list and determines how many values we have.

The reason why SUM(numbers) equals 6 is because 1 + 2 + 3 = 6.

The reason why COUNT(numbers) equals 3 is because we have 3 values.

The difference between COUNT(numbers) and COUNT(*)

For the sake of this concept, let's throw in a null value at the end of numbers.

numbers
1
2
3
(null)

In this case, COUNT(numbers) will still return a value of 3. Why? Because…

When you specify the field in your COUNT function, it will not include null values in the count.

However, COUNT(*) will return a value of 4. Why? Because…

COUNT(*) essentially tells SQL to return a count of the rows present. It doesn't pay attention to any null values in those rows.

So, since the null value still counts as a row, it's included in COUNT(*).

GROUP BY explained

Let's say I give you the following list of ages for boys and girls…

Boys: 9, 10, 11
Girls: 10, 11, 12

…and asked you what the average age is for boys versus girls.

Easy money. Boys' average age is 10. Girls' average age is 11.

So, how'd you do this mathematically? You grouped each age value by gender and then calculated the average.

First, you took all the boys' ages and grouped them together. Then you summed them and divided by 3, resulting in 10, since (9 + 10 + 11) / 3 = 10.

Then, you took the girls' ages and did the same thing, resulting in 11, since (10 + 11 + 12) / 3 = 11.

In this situation, you accomplished what a GROUP BY statement does. You grouped the values by a field, gender, in order to calculate the average age for each value of that gender field.

GROUP BY statements place records in groups by field values. They then run aggregate functions on each group.

How to code a GROUP BY

Here's the template for GROUP BY statements:

GROUP BY field(s);

This will make no sense to you now, so let's learn the concept with an example.

Let's take the above situation of boys versus girls ages, expect attack it in SQL terms.

Here's our table that we'll be working with, children, which contains the ages of the boys and girls above.

children
genderage
M9
M10
M11
F10
F11
F12

In this situation, it helps to first identify the end result we're going for and then work backwards.

So, what do we want to see?

Results
genderAverage age
M10
F11

This matches exactly what we stated above:

"Easy money. Boys' (M) average age is 10. Girls' (F) average age is 11."

So, how should we go about this displaying these results in SQL?

I'd recommend we first find the average age of all children, then group those ages by gender.

So, to find the average age of all children, all we need to do is place the AVG aggregate function in our SELECT statement…

SELECT AVG(___)
FROM database_name.table_name;

…with the field that we're taking the average of, ages, as its argument.

SELECT AVG(ages)
FROM database_name.table_name;

Next, let's nail down the database_name and table_name. We'll call the database people and the table children. (this really doesn't matter. Don't focus too much on it, we've just gotta put something in our FROM statement.)

SELECT AVG(ages)
FROM people.children;

When we execute this query, we get the following results:

Results
AVG(ages)
10.5

This makes sense, because the average of all the ages equals 10.5.

(9 + 10 + 11 + 10 + 11 + 12) / 6 = 10.5

Now, we want to place the gender values to the left of the average age, like we have in our goal results.

Results
genderAVG(ages)
M10
F11

So, all we need to do is place gender in our SELECT statement, right?

SELECT gender, AVG(ages)
FROM people.children;

NO. No, no, no. This is not all we need to do. We forgot something: our GROUP BY statement.

Whenever you have field(s) and aggregate function(s) in your SELECT statement, you must include all field(s) in your GROUP BY statement.

So, do we have field(s) in our SELECT statement? Yes, we have gender.

SELECT gender, AVG(ages)
FROM people.children;

Do we have aggregate function(s) in our SELECT statement? Yes, we have "AVG(ages)".

SELECT gender, AVG(ages)
FROM people.children;

Therefore, we must place all field(s), gender, in a GROUP BY statement.

SELECT gender, AVG(ages)
FROM people.children
GROUP BY gender;

Now, when we execute this query, we get our desired results.

Results
genderAVG(ages)
M10
F11

You might be wondering, why is it so important to include this GROUP BY statement?

Revisiting the purpose of GROUP BY

Let's rewind back to when I first gave you the list of ages for boys and girls.

Boys: 9, 10, 11
Girls: 10, 11, 12

It was pretty easy to calculate the average ages for boys versus girls in this situation, right? That's because I pre-grouped the ages based on gender for you.

Imagine if I had just given you this...

9, 10, 11, 10, 11, 12

...and told you to calculate the average ages for boys versus girls.

It'd be impossible! What ages are for boys and what ages are for girls?

From the perspective of SQL, not utilizing a GROUP BY statement would be exactly like this.

Here in lies the value of GROUP BY statements. They allow SQL to identify the groups of records to run the aggregate functions on.

If there is one thing I want you to take away from this article, or the entire "SQL in 30 Minutes" series for that matter, it's this:

Whenever you have field(s) and aggregate function(s) in your SELECT statement, you must include all field(s) in your GROUP BY statement.

That, right there, is your key to mastering GROUP BY statements in the shortest amount of time.

SQL Follow-along Guide

It's no secret you retain info better when you write it down. That's why I created the SQL Follow-along Guide for you!

As you come upon key concepts highlighted in yellow, like this sentence here, you can fill-in-the-blanks on your Follow-along Guide so that you remember all the important stuff for later.

You can obtain the SQL Follow-along Guide for FREE by entering your email below!

Now that we know how to run calculations on records in SQL, let's move onto learning how to filter those mathematical results with a HAVING statement!

Concepts

Each exam concept broken down with relatable situations to your life!

ToolsSQL Follow-along Guide (FREE)
LessonWhat is SQL?
LessonSELECT & FROM
LessonJOIN
LessonWHERE
LessonGROUP BY
LessonHAVING
LessonORDER BY
LessonTroubleshooting errors

Practice Problems

Include step-by-step explanations through each part of the problem!

SQL Cram Kit Logo

"The SQL Cram Kit was the best thing I could’ve done before my exam. It takes practice and more practice to be good at coding and that is what I needed."

Andrew Green
October 29, 2020

Rating: 5 out of 5.

Leave a Comment