In situations where we do not know the true population proportion, we can use confidence intervals to find a "best guess" range of values where we think the true population proportion may lie.

## Confidence intervals with sample proportions explained

Going off our example in What is a z-score, in relation to a sample proportion?, imagine that you're sitting in class at Crammer Nation University and notice that a large number of your fellow freshmen are wearing their new Greek Life chapter merch.

You've heard a rumor that 30% of freshmen at Crammer Nation University joined Greek Life this semester, and you want to test that claim out. You take a sample of freshmen students to establish a range of values within which you can be confident the true population proportion of freshmen Greek Life involvement lies.

As discussed in What is a confidence interval, in relation to a sample mean?, herein lies the purpose of a confidence interval!

A confidence interval enables one to obtain a **range of values** in which the **true population** parameter lies, with a **defined confidence level**.

Without digging into the math yet, let's say you surveyed 50 random freshmen students and found that 18 of them joined a Greek Life chapter. Based on this, we can compute the following 95% confidence interval (CI):

$$(0.227,\ 0.493)$$

This means we're 95% confident that the true proportion of Crammer Nation University freshmen Greek Life involvement lies within this interval for the entire population of freshmen students. **We are not** saying that there's a 95% *probability*... rather that we are 95% *confident*.

### The confidence interval represented visually

This works pretty much exactly the same as in What is a confidence interval, in relation to a sample mean?, except this time, our sample of 50 freshmen students with a sample proportion of 0.36 (since 18 / 50 = 0.36) is plotted on a z-distribution rather than a t-distribution.

Why? Because proportions only use z-scores (which utilize the z-distribution)!

Whenever you're given a confidence interval problem that involves sample proportions, you don't even have to worry about finding a t-score! Just use a z-score (which utilizes the z-distribution)!

Our 95% confidence interval means that we're looking for the Greek Life involvement proportion values that account for the middle 95% area under the sampling distribution curve.
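To make the "middle 95% area" idea concrete, here is a minimal Python sketch that checks how much area sits between z = -1.96 and z = +1.96 on the z-distribution. It relies on `NormalDist` from Python's standard `statistics` module; the variable names are just illustrative choices.

```python
from statistics import NormalDist

# Standard normal (z) distribution: mean 0, standard deviation 1.
z_dist = NormalDist(mu=0, sigma=1)

# Area under the curve between z = -1.96 and z = +1.96.
middle_area = z_dist.cdf(1.96) - z_dist.cdf(-1.96)

print(round(middle_area, 3))  # 0.95
```

The remaining 5% is split evenly between the two tails, 2.5% on each side.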

## Pause... let's hit rewind for a second.

Confidence intervals for sample proportions work pretty much the exact same as confidence intervals for sample means.

***IMPORTANT*** If you haven't already, I encourage you to read through the "Pause... let's hit rewind for a second." section of What is a confidence interval, in relation to sample means? It addresses the critical points about confidence intervals, which work the same for sample means and sample proportions!

## How to calculate your confidence interval

Let's frame the scenario above into a formalized question and solve!

You are curious how many freshmen students from Crammer Nation University joined Greek Life this year. You take a random sample of 50 freshmen and find that 18 of them joined a Greek Life chapter. What is the 95% confidence interval?

We will use the following confidence interval formula to solve:

$$\hat{p} \pm Z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Before we start breaking down each of these variables, it's important we understand where our margin of error is occurring.

### Recognizing margin of error

Very similar to What is a confidence interval, in relation to sample means?, the whole purpose of a confidence interval is to establish a *range* of values within which we are confident our population proportion lies. That range of values centers around our *sample proportion*.

As stated in What is a confidence interval, in relation to a sample mean? (just swap "sample mean" for "sample proportion")...

The **margin of error** acts as a **buffer** from the sample mean to account for the random sampling error that can occur when taking a **sample** from a population. It determines the **distance** between the sample mean and the **upper** & **lower** bounds of the confidence interval.

This margin of error can be seen mathematically in our confidence interval formula here:

$$\hat{p} \pm \underbrace{Z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}}_{\text{margin of error}}$$

By adding and subtracting (±) the margin of error from the sample proportion...

...we get the upper and lower bounds of our confidence interval!

### Plugging in sample proportion

The prompt states that 18 out of 50 freshmen students joined Greek Life...

You are curious how many freshmen students from Crammer Nation University joined Greek Life this year. You take a random sample of 50 freshmen and find that 18 of them joined a Greek Life chapter. What is the 95% confidence interval?

...therefore, since 18 / 50 = 0.36, we'll plug in 0.36 for **p-hat**!

### Finding Z*

Remember, **Z*** represents a z-score value. In What is a confidence interval, in relation to sample means? we utilized the t-table to find our **t*** value (with n - 1 degrees of freedom). Lucky for us, we don't even have to touch the z-table when finding **Z***...

...all we need to do is reference the below table!

| Confidence Level | Z* value |
| --- | --- |
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |

Unlike t-distributions (which change depending on the degrees of freedom), there is only one z-distribution.

And in the case of confidence intervals, we will often only be dealing with 90%, 95%, and 99% confidence levels.

Therefore, instead of going through the effort of finding our **Z*** value in the z-table each time, it's easier to just memorize the values in the table!

In our case, our confidence level is 95%...

You are curious how many freshmen students from Crammer Nation University joined Greek Life this year. You take a random sample of 50 freshmen and find that 18 of them joined a Greek Life chapter. What is the 95% confidence interval?

...therefore, we'll plug in 1.960 for **Z***!
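If you're curious where the memorized **Z*** values come from, here is a minimal Python sketch that recovers them from the z-distribution. The helper name `z_star` is a made-up name for illustration; `NormalDist.inv_cdf` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist


def z_star(confidence_level: float) -> float:
    """Z* that leaves (1 - confidence_level) / 2 in each tail of the z-distribution."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_star(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```

For a 95% confidence level, 2.5% sits in each tail, so **Z*** is the z-score with 97.5% of the area to its left.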

### Plugging in sample proportion of failure

If you remember from What is a z-score, in relation to a sample proportion?, **q** represents the proportion of failure. In other words, it's the complement of **p**: whatever share of the whole isn't a success is a failure.

**q** and **p** represent the *population* proportions of failure and success (respectively). **q-hat** and **p-hat** represent the *sample* proportions of failure and success (respectively).

Therefore, we can solve for the sample proportion of failure with the following equation:

**q-hat** = 1 - **p-hat**

Above, we found that **p-hat** equalled 0.36...

**q-hat** = 1 - 0.36

...therefore **q-hat** equals 0.64!

**q-hat** = 1 - 0.36 = 0.64

Let's go ahead and plug that in for **q-hat**!
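As a quick check, the two sample proportions can be computed straight from the counts in the prompt. This is just a small Python sketch; the variable names are illustrative.

```python
successes = 18  # freshmen in the sample who joined Greek Life
n = 50          # sample size

p_hat = successes / n  # sample proportion of success
q_hat = 1 - p_hat      # sample proportion of failure

print(round(p_hat, 2), round(q_hat, 2))  # 0.36 0.64
```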

### Plugging in sample size

The prompt states that the sample size was 50 freshmen students...

...therefore we'll plug in 50 for **n**!

### Solve for the confidence interval

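When we solve this out, we get the following:

$$0.36 \pm 1.960 \sqrt{\frac{(0.36)(0.64)}{50}} \approx 0.36 \pm 0.133$$

When we separate the "±" to create our lower bound (-) and upper bound (+)...

$$0.36 - 0.133 = 0.227 \qquad 0.36 + 0.133 = 0.493$$

...it results in the following confidence interval!

$$(0.227,\ 0.493)$$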
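The same arithmetic can be sketched in Python if you want to double-check your hand calculation. The function name `proportion_confidence_interval` is a hypothetical helper written for this example, not something from a statistics library.

```python
from math import sqrt


def proportion_confidence_interval(successes, n, z_star):
    """Return (lower, upper) using p-hat ± Z* * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin_of_error = z_star * sqrt(p_hat * q_hat / n)
    return p_hat - margin_of_error, p_hat + margin_of_error


lower, upper = proportion_confidence_interval(successes=18, n=50, z_star=1.960)
print(f"({lower:.3f}, {upper:.3f})")  # (0.227, 0.493)
```

Swapping in 1.645 or 2.576 for `z_star` gives the 90% or 99% interval for the same sample.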

## How to interpret the confidence interval

Before interpreting a confidence interval, I always recommend that you re-read the question to remind yourself what exactly you're interpreting. It's easy to get caught up in the math and forget.

Now, to create our interpretation, I'm first going to give you a template that you can apply to every confidence interval problem you face, then we'll break down what it actually means.

We are **confidence level**% confident that the true population **mean vs. proportion** **description of what problem is finding** is between **lower CI bound** and **upper CI bound**.

In our case, our answer would look like so:

We are 95% confident that the true population proportion of Crammer Nation University freshmen who joined Greek Life this year is between 0.227 and 0.493.

### Don't get tricked by this question!

Sometimes, you'll be given a follow-up question to this interpretation like so:

**Follow-up Question**: Can you be sure that the true proportion of Crammer Nation University freshmen students who joined Greek Life is between 0.227 and 0.493?

This is a trick question! But... how so?

Since our 95% confidence interval resulted in (0.227, 0.493), it's *possible* that the true population proportion lies between those values... however, it's not *guaranteed*. Remember, we're just 95% confident it's in that range... not 100% confident!

For this reason, we can *not* be sure that the true proportion of Crammer Nation University freshmen students who joined Greek Life is between 0.227 and 0.493.
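To see why the interval is never a guarantee, here is a rough Python simulation sketch. It makes a loud assumption that isn't in the problem: it pretends the rumored 30% really is the true proportion, then repeatedly "surveys" 50 freshmen and counts how often the resulting 95% interval captures that value. The seed and number of trials are arbitrary choices.

```python
import random
from math import sqrt

TRUE_P = 0.30    # ASSUMPTION: pretend the rumored 30% is the true proportion
N = 50           # sample size, as in the example
Z_STAR = 1.960   # Z* for a 95% confidence level
TRIALS = 2000    # number of simulated samples (arbitrary choice)

rng = random.Random(42)  # fixed seed so the run is repeatable
captured = 0

for _ in range(TRIALS):
    # Simulate surveying N freshmen, each joining with probability TRUE_P.
    successes = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    margin_of_error = Z_STAR * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - margin_of_error <= TRUE_P <= p_hat + margin_of_error:
        captured += 1

print(f"{captured} of {TRIALS} intervals captured the true proportion")
```

Most of the simulated intervals capture the true proportion, but some miss it, and when you only have one sample you can't tell which kind of interval you're holding.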

Just because **values** fall inside the **confidence interval** doesn't mean the true population parameter is *definitely* one of those values!

Don't get tricked by that on your exam!

I’m a Miami University (OH) 2021 alumnus who majored in Information Systems. At Miami, I tutored students in Python, SQL, JavaScript, and HTML for 2+ years. I’m a huge fantasy football fan, Marvel nerd, and love hanging out with my friends here in Chicago where I currently reside.