Before digging into GROUP BY, it's important that we understand aggregate functions first.

## Aggregate functions explained

**Aggregate functions** are a very simple concept to understand, so this'll be a piece of cake.

They're used to conduct calculations on a list of values.

Let's say that we've got the following list of numbers.

numbers |
---|

1 |

2 |

3 |

We can use the following aggregate functions on these numbers and get the results shown in the examples.

Function | Description | Example |
---|---|---|

AVG | Average of values | AVG(numbers) = (1 + 2 + 3) / 3 = 2 |

MIN | Minimum of values | MIN(numbers) = 1, 2, 3 = 1 |

MAX | Maximum of values | MAX(numbers) = 1, 2, 3 = 3 |

SUM | Sum of the values | SUM(numbers) = 1 + 2 + 3 = 6 |

COUNT | How many values | COUNT(numbers) = 3 |

A few things to note before moving on.

### The difference between SUM and COUNT

It's incredibly easy to mix up SUM and COUNT, so here's the difference:

SUM takes a list of values and **totals** them.

COUNT takes a list and determines **how many** values we have.

The reason why SUM(numbers) equals 6 is because 1 + 2 + 3 = 6.

The reason why COUNT(numbers) equals 3 is because we have 3 values.

### The difference between COUNT(numbers) and COUNT(*)

For the sake of this concept, let's throw in a null value at the end of numbers.

numbers |
---|

1 |

2 |

3 |

(null) |

In this case, COUNT(numbers) will *still* return a value of 3. Why? Because…

When you specify the field in your COUNT function, it will not include **null** **values** in the count.

However, COUNT(*) will return a value of **4**. Why? Because…

**COUNT(*)** essentially tells SQL to return a count of the **rows** present. It doesn't pay attention to any **null** **values** in those rows.

So, since the null value still counts as a row, it's included in COUNT(*).

Alright, now that we've got aggregate functions under our belt, let's move onto GROUP BY!

## GROUP BY explained

Let's say I give you the following list of ages for boys and girls…

Boys: 9, 10, 11

Girls: 10, 11, 12

…and asked you what the average age is for boys versus girls.

Easy money. Boys' average age is **10**. Girls' average age is **11**.

So, how'd you do this mathematically? You *grouped* each age value by gender and then calculated the average.

First, you took all the boys' ages and *grouped* them together. Then you summed them and divided by 3, resulting in 10, since (9 + 10 + 11) / 3 = 10.

Then, you took the girls' ages and did the same thing, resulting in 11, since (10 + 11 + 12) / 3 = 11.

In this situation, you accomplished what a GROUP BY statement does. You grouped the values by a field, gender, in order to calculate the average age for each value of that gender field.

GROUP BY statements place records in **groups** by **field** values. They then run aggregate functions on each group.

## How to code a GROUP BY

Here's the template for GROUP BY statements:

I’m a Miami University (OH) 2021 alumni who majored in Information Systems. At Miami, I tutored students in Python, SQL, JavaScript, and HTML for 2+ years. I’m a huge fantasy football fan, Marvel nerd, and love hanging out with my friends here in Chicago where I currently reside.