Oftentimes in the world, claims will be made about a given population proportion. Through hypothesis testing, we can assess the validity of such claims.

## Hypothesis testing with sample proportions explained

Going off our example in What is a z-score, in relation to a sample proportion? and What is a confidence interval, in relation to a sample proportion?, imagine that you're sitting in class at Crammer Nation University and notice that a large number of your fellow freshmen are wearing their new Greek Life chapter merch.

The university released a statement saying that the proportion of freshmen students who joined Greek Life this year was 30%, but you want to test that claim out. You take a sample of 50 random freshmen students to establish a range of values within which you can be confident the true population proportion of freshmen Greek Life involvement lies.

As discussed in What is a confidence interval, in relation to a sample mean?, herein lies the purpose of a confidence interval!

Hypothesis testing is a way to **test a claim** about a given population. It enables you to determine whether or not an outcome of a given sample was due to **random chance** or was **statistically significant**.

What’s the population in this situation? The freshmen student body at Crammer Nation University.

What’s the sample? The 50 freshmen that you sampled.

What’s the claim that you’re testing? That the proportion of freshmen students who joined Greek Life is actually 0.30.

### Potential outcomes of this hypothesis test

As stated in What is a hypothesis test, in relation to a sample mean?, there are two potential conclusions our hypothesis test will come to based on our sample.

- **We do have enough evidence to reject** Crammer Nation University's claim that 30% of their freshmen students joined Greek Life.
- **We don't have enough evidence to reject (a.k.a. "fail to reject")** Crammer Nation University's claim that 30% of their freshmen students joined Greek Life.

To arrive at either of the two conclusions above, there are 4 crucial steps that we'll take:

- State the **hypotheses**
- Calculate the **test statistic**
- Find the **p-value**
- Make your **concluding statement**

Let's dig into the key elements of each step, in relation to a sample proportion problem!

## How to conduct a hypothesis test

Before beginning, let's formalize the above prompt:

Crammer Nation University claims that 30% of their freshmen students joined a Greek Life chapter this year. You are curious if that's a truthful proportion, or if a different proportion of students joined a chapter. You collect a random sample of 50 freshmen students and find that 22 of them joined a chapter this year. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

Now, let's go ahead and start with "Step 1 - State the hypotheses"!

### Step 1 - State the hypotheses

As stated in What is a hypothesis test, in relation to a sample mean?, there will be two hypotheses that you need to state in any hypothesis test problem: (1) the null hypothesis and (2) the alternative hypothesis.

Each of these hypotheses will be making a claim about the *population* parameter (in this case, **p**). We will not make claims about the *sample* statistic (in this case, **p-hat**), because the whole point of us taking the sample is to figure out if we have enough evidence to support a claim made about the *population*!

As stated in What is a hypothesis test, in relation to a sample mean?...

Your **hypotheses** will involve the **population** parameter (mean "µ" or proportion "p"), not the sample statistic!

#### Understanding the null hypothesis

Your null hypothesis essentially restates the claim about the given population.

Your **null** hypothesis (H_{0}) embodies the claim made about the **population** parameter.

In the case of the situation above, it's that Crammer Nation University freshmen students joined Greek Life this year at a proportion of 0.30.

We'd write that null hypothesis (H_{0}) like so:

H_{0}: p = 0.30

Essentially, what we're saying here is that the true population proportion for Greek Life involvement among Crammer Nation University freshmen is equal to 0.30.

Something else important to note...

The **null** hypothesis will always have an **equal** sign.

Why? Because there will always be a claim made that the population parameter *equals* something.

#### Understanding the alternative hypothesis

Your alternative hypothesis, in essence, makes a claim that the population parameter is different from what the null hypothesis says.

Your **alternative** hypothesis (H_{a}) makes a claim that the population parameter **differs** from what the H_{0} says.

In the case of the situation above, we're claiming that Crammer Nation University freshmen students joined Greek Life this year at a proportion *not equal to* 0.30.

We'd write that alternative hypothesis (H_{a}) like so:

H_{0}: p = 0.30

H_{a}: p ≠ 0.30

Essentially, what we're saying here is that the true population proportion for Greek Life involvement among Crammer Nation University freshmen is actually *not equal to* 0.30.

A helpful tip when writing alternative hypotheses:

The **alternative** hypothesis can have the **greater than** (**>**), **less than** (**<**), or **not equal to** (**≠**) sign.

In this situation, we're testing the claim that Crammer Nation University freshmen students joined Greek Life at a proportion *not equal to* 0.30 (≠). However... we could've tested that the proportion is *greater than* 0.30 (>) or *less than* 0.30 (<).

### Step 2 - Calculating your test statistic

Before even beginning to calculate our test statistic, we have to check our assumptions!

#### Check your assumptions!

We're working with a sample proportion here, so in accordance with Assumptions for sampling distributions, that means we must check the following assumptions:

1. Sample is randomly selected from the population

2. The sample size (n) is less than or equal to 10% of the population size (N)

3. There are at least 10 successes and at least 10 failures in the sample, OR np ≥ 10 and nq ≥ 10

For the sake of zeroing in on the hypothesis test, we're going to assume that the assumptions are met and move on.
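If you'd like to see the third assumption in action, here's a minimal Python sketch of the success/failure check using the numbers from our prompt. The variable names are just for illustration. The first two assumptions can't be checked this way, since the prompt doesn't tell us how the sample was drawn beyond "random" or what the population size (N) is.

```python
# Numbers taken from the prompt; variable names are illustrative.
n = 50                    # sample size
successes = 22            # sampled freshmen who joined a chapter
failures = n - successes  # sampled freshmen who did not
p0 = 0.30                 # claimed population proportion
q0 = 1 - p0               # claimed proportion of "failures"

# Option A: at least 10 successes and at least 10 failures in the sample
sample_counts_ok = successes >= 10 and failures >= 10

# Option B: n * p0 and n * q0 are both at least 10
expected_counts_ok = n * p0 >= 10 and n * q0 >= 10

print(sample_counts_ok, expected_counts_ok)
```

Here 50 x 0.30 = 15 and 50 x 0.70 = 35, and the sample itself has 22 successes and 28 failures, so the check passes either way.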

#### Recognizing we'll use z-score

Remember, you'll only use t-score if you're dealing with sample means! Therefore, we know we will be using z-score here, which will be computed with the following formula:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$

You'll notice this formula is very similar to the formula for z-scores in What is a z-score, in relation to a sample proportion?...

...but now it's using **p_{0}** and **q_{0}** instead of **p** and **q**.

Long story short, that's because with hypothesis tests, we use the **"_{0}"** (that little "0" is often called "naught") to signify that we don't know for sure that it's the true population proportion for **p** and **q**. It's the *claimed* population proportion that's being tested within our hypothesis test!

Before plugging in variables into our z-score formula, it's often helpful to understand what's going on visually. Let's dig into that below!

#### Visualizing our z-score

Our z-distribution is a bell curve centered on the claimed population proportion (**p_{0}**).

Considering that our alternative hypothesis is making a claim that the true proportion of Crammer Nation University freshmen who joined Greek Life is *not equal to* 0.30...

H_{0}: p = 0.30

H_{a}: p ≠ 0.30

...we are going to be assessing if the combined p-values on the left and right tails of our z-distribution fit within the alpha level (⍺) of 0.05.

In simple terms, it's because our alternative hypothesis is working with ≠.

That means we're not assessing if the true population proportion is *only* *greater than* 0.30 (which would be a right-tail test)...

...or is *only* *less than* 0.30 (which would be a left-tail test)...

...we're assessing if it *does not equal* 0.30.

In other words, do we have evidence that the true population proportion is something greater than *or* less than the claimed one?

This means that we need to assess *both tails* of our sampling distribution, and therefore split our alpha level in half to account for both tails.

Since we're splitting our alpha level among the two tails, that means we'll also be reflecting our z-score onto both the right and left tails of our z-distribution. Keep reading to see this in action.

When your alternative hypothesis deals with **≠**, that means you're working with a **two-tail** hypothesis test. Therefore, split your alpha level in **half** on both the right and left tails of the sampling distribution! Don't forget to **reflect** your z-scores among the two tails!

For example, imagine our **p-hat** value landed far out in the right tail of the distribution.

Through calculating z-score, we'd find a corresponding p-value of 0.01 on the right-tail of the sampling distribution.

This **p-hat** value is a certain distance from the claimed population proportion (**p_{0}**) at the center of this sampling distribution.

Since we're doing a two-tail hypothesis test, we need to additionally test for the **p-hat** value the same distance away on the opposite side of the claimed population proportion. Therefore, we'll reflect our **p-hat** value onto the left-tail, which gives a matching p-value of 0.01 there.

Since our p-values on both tails (0.01 each) are within the range of our split alpha level (0.025 each), this indicates that, if the claim were true, the probability of another sample of the same size having a sample proportion (**p-hat**) at least this far from the claimed population proportion (**p_{0}**) is so slim that our result is statistically significant and could be due to something other than random chance.

On the flip side, imagine our **p-hat** value landed on the right side again, but closer to the center of the distribution.

Through calculating z-score, we'd find a p-value of 0.03 on the right-tail of the sampling distribution.

This **p-hat** value is a certain distance from the claimed population proportion (**p_{0}**) at the center of this sampling distribution.

Since we're doing a two-tail hypothesis test, we need to additionally test for the **p-hat** value the same distance away on the opposite side of the claimed population proportion. Therefore, we'll reflect our **p-hat** value onto the left-tail, which gives a matching p-value of 0.03 there.

Since our p-values on the right-tail and left-tail (0.03 each) are outside the range of our split alpha level (0.025 each), this indicates that, if the claim were true, the probability of another sample of the same size having a sample proportion (**p-hat**) at least this far from the claimed population proportion (**p_{0}**) is not slim enough to indicate statistical significance, and our result could just be due to random chance.

If your **p-hat** value was instead on the left side of the distribution, we'd still reflect it to the other side of the sampling distribution!
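The two scenarios above (a one-tail p-value of 0.01 versus 0.03) can be sketched in a few lines of Python. The function names here are made up for illustration. The sketch also shows that comparing one tail against half of alpha gives the same answer as doubling the p-value and comparing it against the full alpha.

```python
def fits_split_alpha(one_tail_p: float, alpha: float = 0.05) -> bool:
    """Compare a one-tail p-value against half of alpha (two-tail test)."""
    return one_tail_p < alpha / 2


def two_tail_p_value(one_tail_p: float) -> float:
    """Reflecting onto the other tail doubles the one-tail p-value."""
    return 2 * one_tail_p


for one_tail_p in (0.01, 0.03):
    same_answer = two_tail_p_value(one_tail_p) < 0.05
    print(one_tail_p, fits_split_alpha(one_tail_p), same_answer)
```

With 0.01 both checks come back `True` (statistically significant), and with 0.03 both come back `False`.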

We'll get into this a little more in "Step 4 - Make your concluding statement". Let's move on and calculate our z-score!

#### Plugging in the variables for your z-score

Here, again, is our equation for our z-score:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$

For the sake of focusing on hypothesis tests, I will tell you the final answer here: our z-score equals 2.15.

If, however, you'd like to see how z-score was calculated, click below. Keep in mind, this is *no different* than how we calculated it in What is a z-score, in relation to a sample proportion?, so if you read that article, you're probably good to move on!

Based on the prompt, the sample proportion was 0.44 (because 22 / 50 = 0.44), therefore we'll plug in 0.44 for **p-hat**.

Crammer Nation is claiming that the population proportion is 0.30, therefore we'll plug in 0.30 for **p_{0}**.

Next, for our population proportion of failure (**q_{0}**), we'll do what we did in What is a z-score, in relation to a sample proportion? and utilize the following formula:

**q_{0}** = 1 - **p_{0}**

Since **p_{0}** equals 0.30, this results in **q_{0}** equalling 0.70...

**q_{0}** = 1 - 0.30 = 0.70

...so we'll plug in 0.70 for **q_{0}**!

Lastly, the prompt states that the sample size is 50, therefore we'll plug in 50 for **n**!

When we solve this out, it results in a z-score of 2.15!
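If you want to double-check the arithmetic, here's a rough Python sketch of the plug-in. The function name is hypothetical. One thing to be aware of: without any intermediate rounding the z-score comes out to about 2.16. Rounding the standard error to 0.065 before dividing gives the 2.15 used in this article (that rounding step is my assumption about where 2.15 comes from, not something stated above). The conclusion of the test is the same either way.

```python
from math import sqrt


def z_score_for_proportion(p_hat: float, p0: float, n: int) -> float:
    """z = (p_hat - p0) / sqrt(p0 * q0 / n), where q0 = 1 - p0."""
    q0 = 1 - p0
    standard_error = sqrt(p0 * q0 / n)
    return (p_hat - p0) / standard_error


p_hat = 22 / 50   # sample proportion from the prompt (0.44)
p0 = 0.30         # claimed population proportion
n = 50            # sample size

z = z_score_for_proportion(p_hat, p0, n)
print(round(z, 2))  # 2.16

# Assumed rounding path: round the standard error to 3 decimals first.
rounded_se = round(sqrt(p0 * (1 - p0) / n), 3)   # 0.065
z_with_rounded_se = (p_hat - p0) / rounded_se
print(round(z_with_rounded_se, 2))  # 2.15
```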

### Step 3 - Find your p-value

Finding your p-value works differently between z-scores vs. t-scores. If you want to see how it's done with t-scores, click here to access What is a hypothesis test, in relation to a sample mean?

For the sake of focusing on hypothesis tests, I will tell you the final answer here: the z-table value for our z-score is 0.9842, which leaves a p-value of 0.0158 in the right tail.

If you'd like to see how this p-value was found, click below. Keep in mind, this is *no different* than how we found it in What is a z-score, in relation to a sample proportion?, so if you read that article, you're probably chillin'!

Knowing that our z-score is 2.15, all we need to do is go to our z-table, find "2.1" in the left-hand column (representing the 2.1 in 2.15), and then "0.05" in the top row (representing the remaining 0.05), to locate our p-value of 0.9842!

Wait... a p-value of 0.9842? That's way bigger than our alpha level of 0.05... why?

We need to remember that the z-table displays the area to the *left* of your z-score, so this value of 0.9842 is the area under the curve to the left of our **p-hat** value.

To find the p-value to the *right* of our **p-hat** value, we must subtract 0.9842 from 1.

1.00 - 0.9842 = 0.0158

When we reflect our **p-hat** and p-value onto the left-tail of the distribution, we get 0.0158 in each tail. Each is below our split alpha level of 0.025, so we're able to find that they fit within our alpha level!
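If you don't have a z-table handy, the same lookup can be sketched in Python with `statistics.NormalDist` from the standard library (Python 3.8+). This is just an alternative way of getting the table value, using the article's z-score of 2.15.

```python
from statistics import NormalDist

z = 2.15  # z-score used in this article

# Same thing the z-table reports: the area to the LEFT of z
left_area = NormalDist().cdf(z)

# Area to the RIGHT of z, which is our one-tail p-value
right_area = 1 - left_area

print(round(left_area, 4), round(right_area, 4))  # 0.9842 0.0158
```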

### Step 4 - Make your concluding statement

Your concluding statement is going to center around the alpha level declared in the problem. In most cases, that alpha level will be 0.05. Each problem should explicitly state the alpha level. In our problem, it's 0.05.

As stated in What is a hypothesis test, in relation to a sample mean?...

- If the p-value is **below** the alpha level, then we **reject** the null hypothesis.

- If the p-value is **above** the alpha level, then we **fail to reject** the null hypothesis.

Since we're dealing with a two-tail test here, our alpha level (⍺) was split among the left-tail and right-tail of our sampling distribution.

That meant that we reflected (a.k.a. duplicated, or "x2") our p-value among both tails...

...therefore, our total p-value was:

0.0158 x 2 = 0.0316

A p-value of 0.0316 is below our alpha level of 0.05, therefore we'll reject our null hypothesis!
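Here's the same decision written as a tiny Python sketch, using the numbers above. The variable names are illustrative. The article doesn't say what to do when the p-value exactly equals alpha, so this sketch treats that case as "fail to reject", which is an assumption on my part.

```python
alpha = 0.05              # alpha level from the prompt
right_tail_area = 0.0158  # one-tail p-value found in Step 3

# Two-tail test: reflect the tail area onto the other side (x2)
p_value = 2 * right_tail_area

# Below alpha -> reject; otherwise -> fail to reject (ties assumed here)
decision = "reject" if p_value < alpha else "fail to reject"

print(round(p_value, 4), decision)  # 0.0316 reject
```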

(If you want an example of us failing to reject the null hypothesis, click here to see that in What is a hypothesis test, in relation to a sample mean?)

#### What does it mean to "reject" the null hypothesis?

When we reject the null hypothesis, we're essentially saying that we have enough evidence to support the alternative hypothesis.

Why is this the case?

Because our p-value, the probability of getting sample results at least as extreme as ours if the null hypothesis were true, is below our alpha level!

In other words, if the null hypothesis was actually a truthful claim about the population, then the probability of a sample of the same size producing results at least as extreme as ours (the p-value) was so low (below the alpha level) that it indicates the results of our sample hold statistical significance and are due to something outside of random chance.

Since that probability was so low, we have enough evidence to reject the null hypothesis (the baseline claim about the population parameter) and support the alternative hypothesis.

When you **reject** the null hypothesis, you are saying that the outcome of the sample was **statistically significant** enough to support the alternative. It does not mean you **accept** the alternative hypothesis!

NOTE: we are NOT *accepting* the alternative hypothesis! That would mean that we are 100% certain that the alternative hypothesis is true... which is not the case. Rather, the outcome of our sample provided enough evidence to support the alternative hypothesis we proposed, and therefore we "reject" the null hypothesis.

#### Write your concluding statement (z-score edition)

Here is the template to write your concluding statement with z-scores:

Since our p-value of **p-value** is **less / greater** than our alpha level of **alpha level value**, we **reject / fail to reject** the null hypothesis and **do / don't** have enough evidence to support the alternative hypothesis, implying that **description of alternative hypothesis.**

Based on what we found with the Crammer Nation University Greek Life involvement situation above, here's what our concluding statement would look like:

Since our p-value of 0.0316 is less than our alpha level of 0.05, we reject the null hypothesis and do have enough evidence to support the alternative hypothesis, implying that the proportion of freshmen students at Crammer Nation University who joined Greek Life this year is not equal to 0.30.
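If it helps to see the template as fill-in-the-blanks, here's a small Python sketch that builds the concluding statement from a p-value and an alpha level. The function name and parameters are hypothetical, and a p-value exactly equal to alpha is treated as "fail to reject" here since the template doesn't cover that case.

```python
def concluding_statement(p_value: float, alpha: float, description: str) -> str:
    """Fill in the concluding-statement template for a z-score test."""
    rejected = p_value < alpha  # ties are treated as "fail to reject" (assumption)
    comparison = "less" if rejected else "greater"
    decision = "reject" if rejected else "fail to reject"
    evidence = "do" if rejected else "don't"
    return (
        f"Since our p-value of {p_value} is {comparison} than our alpha level "
        f"of {alpha}, we {decision} the null hypothesis and {evidence} have "
        f"enough evidence to support the alternative hypothesis, implying "
        f"that {description}."
    )


statement = concluding_statement(
    0.0316,
    0.05,
    "the proportion of freshmen students at Crammer Nation University "
    "who joined Greek Life this year is not equal to 0.30",
)
print(statement)
```

Running this with our numbers prints the same concluding statement written out above.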

I’m a Miami University (OH) 2021 alum who majored in Information Systems. At Miami, I tutored students in Python, SQL, JavaScript, and HTML for 2+ years. I’m a huge fantasy football fan, Marvel nerd, and love hanging out with my friends here in Chicago where I currently reside.