In situations where we do not know the true population mean, we can use confidence intervals to find a "best guess" range of values where we think the true population mean may lie.

## Confidence intervals with sample means explained

Going off the example in What is a t-score?, imagine that you've just enrolled at Crammer Nation University and finished your first day of classes. You just called it quits on your high school relationship, and are looking to hop right into the college action.

It's fraternity rush season, and you hear that the brothers of Sigma Apple Pi absolutely pull. The average Tinder matches per day that brothers of Sigma Apple Pi get is unknown, but you've heard it's high.

You decide to test things out by taking a sample of Sigma Apple Pi brothers to establish a range of values in which you can be confident the true population mean for brothers' Tinder matches per day lies.

Herein lies the purpose of a confidence interval...

A confidence interval enables one to obtain a **range of values** in which the **true population** parameter lies, with a **defined confidence level**.

Without digging into the math yet, let's say you surveyed 35 random brothers and found that the sample had a mean of 23.2 Tinder matches per day with a standard deviation of 3.2 matches. Based on these numbers, we can compute the following 95% confidence interval (CI): 22.1 to 24.3 matches per day.

This means we're 95% confident that the true mean Tinder matches per day for the entire population of Sigma Apple Pi brothers lies within this interval. **We are not** saying that there's a 95% *probability*... rather that we are 95% *confident*.

### The confidence interval represented visually

When we took the sample of 35 random Sigma Apple Pi brothers, it resulted in a sampling t-distribution that looks like so:

Why a t-distribution? Because we were not given the population standard deviation (**σ**) of the Sigma Apple Pi brothers' Tinder matches per day. We were only given the sample standard deviation (**s**), and based on this graphic from What is a t-score?...

...that means we have to use a t-score (which means we'll be using a t-distribution).

If you want an example using z-score (**Z***) instead of t-score (**t* _{n-1}**), skip ahead to What is a confidence interval, in relation to a sample proportion?

Our 95% confidence interval means that we're looking for the t-score values (which relate to daily Tinder match values) that account for the middle 95% area under the sampling distribution curve.

## Pause... let's hit rewind for a second.

Before moving into the calculations needed to find the above confidence interval, there are some crucial concepts we must establish.

### The crucial difference between "probability" and "confidence"

Above, we said:

"**We are not** saying that there's a 95% *probability*... rather that we are 95% *confident*."

What's the difference between "probability" and "confidence" here?

It is summarized well in this Math StackExchange article when BruceET states the following:

Either μ lies in the interval or it doesn't. There is no "probability" about it. The process by which the interval is derived leads to coverage in 95% of cases over the long run.

In other words, the true population mean already exists. There's not a "probability" of it being a certain value.

However, we often can't know for certain what that value is. It's often impossible to gather all the data from a given population to get that true population mean. That's why we must gauge our *confidence* that it lies between a range of values!

The population mean already exists. There's no **probability** of it being a certain value. That's why we gauge our **confidence** of it lying between a range of values, since we often can't know for certain where the actual value lies.

### How can we be that confident?

In the situation above, let's assume that the chapter of Sigma Apple Pi has 500+ brothers. Therefore, there is an enormous number of different combinations of 35 brothers that we could've sampled (far more than millions). Due to limitations on time and resources, we were limited to only taking one sample.

Imagine, however, we took 20 different samples (each of 35 Sigma Apple Pi brothers) instead of 1. Each of those samples would produce different sample means and different confidence intervals, because each sample would be composed of different Tinder match data.

And... a certain *percentage* of them would contain the true population mean!

But, what percentage would that be?

In the long run, it'd be about 95%, since that was our confidence level. In this illustration, check out how 95% (so 19) of the 20 intervals contain the true population mean...

...and 5% (so 1) doesn't contain it.

In summary...

Your confidence interval percentage / confidence level determines what **percentage** of random samples **of the same size** would produce confidence intervals that contain the true population parameter.
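To see that long-run behavior in action, here's a minimal simulation sketch. It assumes a made-up population (normally distributed, true mean of 23.0, standard deviation of 3.2), since in real life we'd never know the true mean. The names `TRUE_MEAN`, `TRUE_SD`, and `interval_covers_truth` are hypothetical, and the 2.032 is the t-table value for 34 degrees of freedom used later on.

```python
import math
import random
import statistics

# Hypothetical population values, chosen only for illustration.
TRUE_MEAN = 23.0   # assumed true mean daily matches (unknown in real life)
TRUE_SD = 3.2      # assumed population spread
N = 35             # sample size from the example
T_STAR = 2.032     # t-table value for 95% confidence, df = 34

def interval_covers_truth(rng):
    """Draw one sample of N brothers; report whether its 95% CI contains TRUE_MEAN."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation
    margin = T_STAR * s / math.sqrt(N)
    return x_bar - margin <= TRUE_MEAN <= x_bar + margin

rng = random.Random(0)                    # fixed seed so the run is repeatable
trials = 5000
hits = sum(interval_covers_truth(rng) for _ in range(trials))
coverage = hits / trials
print(f"{coverage:.1%} of {trials} intervals contained the true mean")
```

With thousands of samples, the share of intervals that capture the true mean should settle close to 95%. With only 20 samples, you might see 18, 19, or all 20 succeed.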

If you'd like deeper reading on this topic of understanding the confidence level, check out this article from Statistics by Jim!

### What impacts the range that our confidence interval covers?

There are two factors that impact the range of values that our confidence interval covers: sample size and confidence level.

#### Sample size

Simply put, the more data points you have, the more precisely your sample pins down the true population mean!

This goes back to the concepts of standard error and the Law of Large Numbers that we discussed in What is a z-score, in relation to sample means? The larger your sample size, the closer your sample means will hug the true population mean.

In summary...

The **smaller / bigger** your **sample size** (while holding confidence level constant), the **wider / narrower** your confidence interval will be.
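As a rough illustration, the sketch below holds the sample standard deviation at 3.2 and computes the standard error (s divided by the square root of n) for a few sample sizes. The sizes other than 35 are made up for comparison.

```python
import math

s = 3.2  # sample standard deviation from the example

# Standard error s / sqrt(n) for a few hypothetical sample sizes
standard_errors = {n: s / math.sqrt(n) for n in (10, 35, 100, 500)}
for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.3f}")
```

The standard error shrinks as n grows, and the t-score shrinks slightly too (more degrees of freedom), so both pieces of the margin of error push the interval narrower.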

#### Confidence level

Returning to the image of the sampling distribution...

...if we wanted to be even more confident, say 99% confident, that the true population mean of Sigma Apple Pi Tinder matches lies in a range of values, then we'd need to expand the bounds of our confidence interval to include the middle 99% area under the t-distribution curve.

Notice how now, our confidence interval has widened to account for more of the area under the sampling distribution curve?

In summary...

The **higher** / **lower** your **confidence level** (while holding sample size constant), the **wider** / **narrower** your confidence interval will be.
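Here's a small sketch of that widening, holding our sample (mean 23.2, standard deviation 3.2, n = 35) constant. The t-scores for 90% and 99% are assumed values read from a standard t-table at 34 degrees of freedom, so double-check them against your own table.

```python
import math

x_bar, s, n = 23.2, 3.2, 35

# Two-sided t* values for df = 34, as read from a standard t-table (assumed here)
t_star_by_level = {"90%": 1.691, "95%": 2.032, "99%": 2.728}

widths = {}
for level, t_star in t_star_by_level.items():
    margin = t_star * s / math.sqrt(n)
    widths[level] = 2 * margin
    print(f"{level}: ({x_bar - margin:.2f}, {x_bar + margin:.2f}), width {2 * margin:.2f}")
```

Same data, same sample size... the only thing that changed is how much area under the curve we asked the interval to cover.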

## How to calculate your confidence interval

Let's frame the scenario above into a formalized question and solve!

You're going through fraternity rush at Crammer Nation University and hear that Sigma Apple Pi brothers get a lot of Tinder matches. You take a random sample of 35 brothers and find a sample mean of 23.2 daily Tinder matches with a standard deviation of 3.2 matches. Based on this, find a 95% confidence interval for the true mean daily Tinder matches of Sigma Apple Pi brothers.

Based on this graphic...

...since we don't have the population standard deviation, we'll be using a t-score (represented by **t* _{n-1}**) to solve for our confidence interval with the following formula:
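Written out with the symbols this walkthrough plugs in (**x-bar** for the sample mean, **t* _{n-1}** for the t-score, **s** for the sample standard deviation, **n** for the sample size), the standard t-interval for a mean is:

$$
\bar{x} \pm t^{*}_{n-1} \cdot \frac{s}{\sqrt{n}}
$$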

If we *had* been given the population standard deviation (**σ**), then we'd plug it in instead of the sample standard deviation (**s**)...

...as well as a z-score (represented by **Z***) instead of a t-score (**t* _{n-1}**), as long as our sample size is above 30 (due to the Central Limit Theorem).

To find our Z* value, we'd just utilize the below table!

| Confidence Level | Z* value |
| --- | --- |
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
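If you're curious where those Z* values come from, the sketch below reproduces them with Python's built-in `statistics.NormalDist`. The helper name `z_star` is made up for this example.

```python
from statistics import NormalDist

def z_star(confidence_level):
    """Two-sided critical z value: leaves (1 - confidence_level) / 2 in each tail."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z* = {z_star(level):.3f}")
```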

We'll touch on this a little more in What is a confidence interval, in relation to sample proportions? if you'd like to skip ahead!

Before we start breaking down each of these variables, it's important we understand where our margin of error is occurring.

### Recognizing margin of error

The whole purpose of a confidence interval is to establish a *range* of values in which we are confident our population mean lies. That range of values centers around our *sample mean*.

The margin of error essentially creates a buffer above & below our sample mean to establish the range of values in which we're confident the true population mean lies.

The **margin of error** acts as a **buffer** from the sample mean to account for the random sampling error that can occur when taking a **sample** from a population. It determines the **distance** between the sample statistic and the **upper** & **lower** bounds of the confidence interval.

This margin of error can be seen mathematically in our formula for confidence interval here:

By adding and subtracting (±) the margin of error from the sample mean...

...we get the upper (+) and lower (-) bounds of our confidence interval!
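In symbols, writing the margin of error as ME (a shorthand introduced just for this note), the two bounds look like so:

$$
\text{ME} = t^{*}_{n-1} \cdot \frac{s}{\sqrt{n}}, \qquad \text{lower bound} = \bar{x} - \text{ME}, \qquad \text{upper bound} = \bar{x} + \text{ME}
$$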

### Plugging in sample mean

As stated in the prompt, our sample mean is 23.2 daily Tinder matches...

You're going through fraternity rush at Crammer Nation University and hear that Sigma Apple Pi brothers get a lot of Tinder matches. You take a random sample of 35 brothers and find a sample mean of 23.2 daily Tinder matches with a standard deviation of 3.2 matches. Based on this, find a 95% confidence interval for the true mean daily Tinder matches of Sigma Apple Pi brothers.

...therefore, we'll plug in 23.2 for **x-bar**!

### Finding t*_{n-1}

Remember, **t* _{n-1}** represents a t-score value that we must find in the t-table.

To determine where our **t* _{n-1}** value is within this table, we need to identify our (1) alpha level and (2) degrees of freedom.

#### Recognizing our alpha level

We're working with a 95% confidence interval in this problem. So... what p-value does that mean we should be looking for in the t-table?

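We should look for the p-value that corresponds to our alpha level: the area left in *one* tail of the distribution once the middle 95% is accounted for. (Many textbooks call the total leftover area, 1 - Confidence Level, "α" and this one-tail area "α/2"; here we follow the t-table's labeling.) It can be solved mathematically in the equation below: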

**alpha** = (1 - **Confidence Level**) / 2

In our case, our **Confidence Level** is 95% (a.k.a. 0.95)...

**alpha** = (1 - 0.95) / 2

...so our **alpha** value is 0.025.

**alpha** = (1 - 0.95) / 2 = 0.05 / 2 = 0.025

This alpha level essentially means that on our sampling distribution, we're looking for the t-score that enables us to have an area of 0.025 in the right tail of our t-distribution.

A t-distribution curve is symmetrical. Therefore, the t-score that we find for the right-side will be the same as the left-side, just negative.

This is accounted for by the fact that our confidence interval equation has the plus/minus (±) sign!

You'll notice this corresponds to the graphic at the top of the t-table, with the area to the right of the "t_{α}" (the t-score) shaded, resulting in a p-value of "α"!

Therefore, on the t-table, by looking for p-values corresponding to our alpha level of 0.025, we'll be able to find the t-score value (**t* _{n-1}**) that we need to compose our confidence interval!

#### Recognizing our degrees of freedom

If you remember in What is a t-score?, we used the following formula for degrees of freedom:

**df** = **n** - 1

In our case, we've got a sample size of 35...

**df** = 35 - 1

...therefore our degrees of freedom are 34.

**df** = 34
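The two lookup values can be sketched in a couple of lines of Python. The variable names here (`tail_area`, `df`) are just for illustration.

```python
confidence_level = 0.95
n = 35

tail_area = (1 - confidence_level) / 2   # the "alpha" column to look up in the t-table
df = n - 1                               # degrees of freedom: the row to look up

print(f"tail area = {tail_area:.3f}, df = {df}")
```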

#### Locating our t*_{n-1} value

Now that we've got our alpha level and our degrees of freedom, let's go to the row in the t-table corresponding to 34 degrees of freedom...

...and in the column corresponding to a p-value of 0.025 (which is our alpha level!)...

...find our t-score of 2.032!

Let's go ahead and plug in that t-score of 2.032 for **t* _{n-1}**!
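If you don't have a t-table handy, the same value can be approximated numerically. The sketch below is one possible approach using only Python's standard library, not how the table itself was built. It integrates the t-distribution's density with Simpson's rule and then searches for the t-score that leaves 0.025 in the right tail. The function names are made up for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_area(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] plus symmetry."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Find the t-score whose right-tail area equals tail_area, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if right_tail_area(mid, df) > tail_area:
            lo = mid   # tail is still too big, so the t-score must be larger
        else:
            hi = mid
    return (lo + hi) / 2

t_star = t_critical(0.025, 34)
print(f"t* for df = 34 and tail area 0.025: {t_star:.3f}")
```

The result should agree with the table's 2.032 to about three decimal places.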

### Plugging in sample standard deviation

Based on the prompt, our sample standard deviation is 3.2 matches...

You're going through fraternity rush at Crammer Nation University and hear that Sigma Apple Pi brothers get a lot of Tinder matches. You take a random sample of 35 brothers and find a sample mean of 23.2 daily Tinder matches with a standard deviation of 3.2 matches. Based on this, find a 95% confidence interval for the true mean daily Tinder matches of Sigma Apple Pi brothers.

...therefore, let's plug it in for **s**!

### Plugging in sample size

Our sample size is 35 based on the prompt...

...therefore, let's plug in 35 for **n**!

### Solve for the confidence interval

When we solve this out, we get the following:

When we separate the "±" to create our lower bound (-) and upper bound (+)...

...it results in the following confidence interval!
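For reference, here's the arithmetic written out with the values from the prompt (intermediate numbers rounded):

$$
23.2 \pm 2.032 \cdot \frac{3.2}{\sqrt{35}} \approx 23.2 \pm 2.032 \cdot 0.541 \approx 23.2 \pm 1.1
$$

$$
\text{lower bound} \approx 23.2 - 1.1 = 22.1, \qquad \text{upper bound} \approx 23.2 + 1.1 = 24.3
$$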

## How to interpret the confidence interval

Before interpreting a confidence interval, I always recommend that you re-read the question to remind yourself what exactly you're interpreting. It's easy to get caught up in the math and forget.

You're going through fraternity rush at Crammer Nation University and hear a rumor that Sigma Apple Pi brothers get an average of 25 daily Tinder matches. You take a random sample of 35 brothers and find a sample mean of 23.2 daily Tinder matches with a standard deviation of 3.2 matches. Based on this, find a 95% confidence interval for the true mean daily Tinder matches of Sigma Apple Pi brothers.

Now, to create our interpretation, I'm first going to give you a template that you can apply to every confidence interval problem you face, then we'll break down what it actually means.

We are **[confidence level]**% confident that the true population **[mean vs. proportion]** **[description of what the problem is finding]** is between **[lower CI bound]** and **[upper CI bound]**.

In our case, our answer would look like so:

We are 95% confident that the true population mean daily Tinder matches for Sigma Apple Pi brothers is between 22.1 and 24.3.
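If you'd like to double-check the numbers yourself, here's a minimal sketch that recomputes the interval from the prompt's values and drops the bounds into the template. The 2.032 is the t-table value found earlier, and the variable names are just for illustration.

```python
import math

x_bar, s, n = 23.2, 3.2, 35
t_star = 2.032   # t-table value for df = 34, tail area 0.025

margin = t_star * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

sentence = (
    "We are 95% confident that the true population mean daily Tinder matches "
    f"for Sigma Apple Pi brothers is between {lower:.1f} and {upper:.1f}."
)
print(sentence)
```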

### Don't get tricked by this question!

Sometimes, you'll be given a follow-up question to this interpretation like so:

**Follow-up** **Question**: Can you be sure that the true mean daily Tinder matches for Sigma Apple Pi brothers is not 25?

This is in reference to the first part of the prompt...

You're going through fraternity rush at Crammer Nation University and hear a rumor that Sigma Apple Pi brothers get an average of 25 daily Tinder matches. You take a random sample of 35 brothers and find a sample mean of 23.2 daily Tinder matches with a standard deviation of 3.2 matches. Based on this, find a 95% confidence interval for the true mean daily Tinder matches of Sigma Apple Pi brothers.

...and essentially is a trick question. How so?

Considering that 25 is outside our confidence interval, our data suggest that the true population mean is not 25... however, it's *not guaranteed*. Remember, we're just 95% confident it's in the range of 22.1 to 24.3... not 100% confident!!

For this reason, we can *not* be sure that the true mean daily Tinder matches for Sigma Apple Pi brothers is not 25.

Just because a **value** falls outside the **confidence** **interval** doesn't mean the true population parameter is *definitely* not that value!

Don't get tricked by that on your exam!