What is a hypothesis test, in relation to sample means?

Oftentimes in the world, claims will be made about a given population mean. Through hypothesis testing, we can assess the validity of such claims.

Hypothesis testing with sample means explained

Going off the situation in What is a t-score? and What is a confidence interval, in relation to sample means?, imagine that you've just enrolled at Crammer Nation University and finished your first day of classes. You just called it quits on your high school relationship, and are looking to hop right into the college action.

It's fraternity rush season, and you hear that the brothers of Sigma Apple Pi absolutely pull. The brothers of Sigma Apple Pi claim they get an average of 25 daily Tinder matches. Before you completely ignore every other fraternity and solely rush Sigma Apple Pi for their Tinder clout, you want to gather some data to determine if this claim holds its weight.

You decide to ask 35 random Sigma Apple Pi brothers for their average Tinder matches per day. Keep in mind: you’re not checking to see if every brother has exactly 25 average Tinder matches per day… there’s obviously going to be some wiggle-room. However, if the brothers’ average Tinder matches are consistently below 25… there’s a chance Sigma Apple Pi doesn’t have as much Tinder game as they claim.

In this situation, you’ve essentially set up a hypothesis test!

Hypothesis testing is a way to test a claim about a given population. It enables you to determine whether or not an outcome of a given sample was due to random chance or was statistically significant.

What’s the population in this situation? The brothers of Sigma Apple Pi.

What’s the sample? The 35 brothers that you sampled.

What’s the claim that you’re testing? That the brothers of Sigma Apple Pi actually get an average of less than 25 Tinder matches per day.

Potential outcomes of this hypothesis test

Without digging into the math yet, there are two potential conclusions our hypothesis test will come to based on our sample.

  • We do have enough evidence to reject Sigma Apple Pi's claim of 25 average Tinder matches per day.
  • We don't have enough evidence to reject (a.k.a. "fail to reject") Sigma Apple Pi's claim of 25 average Tinder matches per day.

To arrive at either of the above two conclusions, there are 4 crucial steps that we'll take:

  1. State the hypotheses
  2. Calculate the test statistic
  3. Find the p-value
  4. Make your concluding statement

Let's dig into the key elements of each step!

How to conduct a hypothesis test

Before beginning, let's formalize the above prompt:

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

Now, let's go ahead and start with "Step 1 - State the hypotheses"!

Step 1 - State the hypotheses

There will be two hypotheses that you need to state in any hypothesis test problem. (1) The null hypothesis and (2) the alternative hypothesis.

Each of these hypotheses will be making a claim about the population parameter (in this case, µ). We will not make claims about the sample statistic (in this case, x-bar), because the whole point of us taking the sample is to figure out if we have enough evidence to support a claim made about the population!

Your hypotheses will involve the population parameter (mean "µ" or proportion "p"), not the sample statistic!

Understanding the null hypothesis

Your null hypothesis essentially restates the claim about the given population.

Your null hypothesis (H0) embodies the claim made about the population parameter.

In the case of the situation above, it's that Sigma Apple Pi brothers get an average of 25 Tinder matches per day. We'd write that null hypothesis (H0) like so:

H0: µ = 25

Essentially, what we're saying here is that the true population mean for the average Tinder matches per day of Sigma Apple Pi brothers...

H0: µ = 25

...is equal to 25.

H0: µ = 25

Something else important to note...

The null hypothesis will always have an equal sign.

Why? Because there will always be a claim made that the population parameter equals something.

Understanding the alternative hypothesis

Your alternative hypothesis in essence makes a claim that the population parameter is different from what the null hypothesis says.

Your alternative hypothesis (Ha) makes a claim that the population parameter differs from what the H0 says.

In the case of the situation above, we're claiming that Sigma Apple Pi brothers actually get less than 25 average Tinder matches per day. We'd write that alternative hypothesis (Ha) like so:

H0: µ = 25
Ha: µ < 25

Essentially what we're saying here is that the true population mean for average Tinder matches per day of Sigma Apple Pi brothers...

H0: µ = 25
Ha: µ < 25

...is actually less than 25.

H0: µ = 25
Ha: µ < 25

A helpful tip when writing alternative hypotheses:

The alternative hypothesis can have the greater than (>), less than (<), or not equal to (≠) sign.

In this situation, we're testing the claim that Sigma Apple Pi brothers' average Tinder matches per day are actually less than 25 (<). However... we could've tested that they are greater than 25 (>) or that they do not equal 25 (≠).
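To make the three options concrete, here's a minimal sketch that writes out a pair of hypotheses and notes which tail of the distribution the test would look at. The `Hypotheses` class and `TAILS` mapping are hypothetical names made up for illustration (they aren't from any library), and the right-tail / two-tail labels are the standard convention rather than something spelled out in this article.

```python
from dataclasses import dataclass

# Hypothetical helper, not from the article: maps the sign used in Ha
# to the tail(s) of the sampling distribution the test looks at.
TAILS = {"<": "left tail", ">": "right tail", "!=": "both tails"}


@dataclass
class Hypotheses:
    mu0: float        # claimed population mean from H0
    alternative: str  # one of "<", ">", "!="

    def describe(self) -> str:
        if self.alternative not in TAILS:
            raise ValueError("alternative must be '<', '>' or '!='")
        return (
            f"H0: mu = {self.mu0:g}\n"
            f"Ha: mu {self.alternative} {self.mu0:g}\n"
            f"Tail: {TAILS[self.alternative]}"
        )


# The Sigma Apple Pi situation: claimed mean of 25, hunch that it's lower.
tinder = Hypotheses(mu0=25, alternative="<")
print(tinder.describe())
```

Swapping `alternative` to `">"` or `"!="` gives the other two versions of the test mentioned above.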

Step 2 - Calculating your test statistic

Before even beginning to calculate our test statistic, we have to check our assumptions!

Check your assumptions!

We're working with a sample mean here, so in accordance with Assumptions for sampling distributions, that means we must check the following assumptions:

1. Sample is randomly selected from the population
2. The sample size (n) is less than or equal to 10% of the population size (N)
3. The sample size is greater than or equal to 30, or the population itself is normally distributed

For the sake of zoning in on the hypothesis test, we're going to assume that the assumptions are met and move on.
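If you did want to run through the checklist, it could look something like the sketch below. The function name `check_assumptions` is hypothetical, and the population size of 400 brothers is a made-up number purely for illustration (the prompt never tells us how big Sigma Apple Pi is).

```python
def check_assumptions(n, population_size, is_random, population_is_normal=False):
    """Return each of the three sample-mean assumptions and whether it holds."""
    return {
        "random sample": is_random,
        "n <= 10% of N": n <= 0.10 * population_size,
        "n >= 30 or normal population": n >= 30 or population_is_normal,
    }


# n = 35 comes from the prompt; N = 400 is an assumed chapter size.
checks = check_assumptions(n=35, population_size=400, is_random=True)
for name, ok in checks.items():
    print(f"{name}: {'met' if ok else 'NOT met'}")
```

Notice that with a smaller assumed population (say, 300 brothers), the 10% condition would fail, since 35 is more than 30.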

Which should we use: z-score or t-score?

Your "test statistic" is essentially a z-score or t-score. We'll then utilize this z-score / t-score to locate your p-value in "Step 3 - Find your p-value"!

Based on this graphic...

...since we aren't given population standard deviation in our prompt...

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

...we'll be calculating a t-score with the following formula:

t = (x-bar - µ0) / (s / √n)

What if we had the population standard deviation?

Then we'd use the following formula to calculate the z-score!

z = (x-bar - µ0) / (σ / √n)

Notice the only difference here is that you're plugging in the population standard deviation (σ) instead of the sample standard deviation (s)!
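Here's a rough sketch of that idea in code, assuming the usual one-sample formulas (sample mean minus claimed mean, divided by the standard error). The function name `standardized_score` is hypothetical, and the population standard deviation of 6.0 is invented just to have something to plug in, since our prompt doesn't give one.

```python
import math


def standardized_score(x_bar, mu0, spread, n):
    """(x_bar - mu0) / (spread / sqrt(n)).

    Pass the sample standard deviation s to get a t-score,
    or the population standard deviation sigma to get a z-score.
    """
    return (x_bar - mu0) / (spread / math.sqrt(n))


# Values from the prompt: x_bar = 23.5, mu0 = 25, s = 5.7, n = 35.
t_score = standardized_score(23.5, 25, 5.7, 35)

# sigma = 6.0 is a made-up population standard deviation, for illustration only.
z_score = standardized_score(23.5, 25, 6.0, 35)

print(f"t-score (using s):     {t_score:.3f}")
print(f"z-score (using sigma): {z_score:.3f}")
```

The arithmetic is identical in both calls; only the meaning of the `spread` argument changes.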

Plugging in variables to find z-score is pretty much the exact same as t-score... the only difference really comes in "Step 3 - Find your p-value".

Therefore, if you'd like experience working with z-scores in hypothesis tests, please refer to What is a hypothesis test, in relation to a sample proportion? (Yes, it's with a sample proportion instead of a sample mean, but the same principles hold true in regards to "Step 3 - Find your p-value".)

You'll notice that this formula is extremely similar to the t-score formula in What is a t-score?...

...but now it's µ0 instead of µ.

Long story short, that's because with hypothesis tests, we use µ0 (that little "0" is often read as "naught") to signify that we don't know for sure that it's the population mean. It's the claimed population mean that's being tested within our hypothesis test!

Before plugging in variables into our t-score formula, it's often helpful to understand what's going on visually. Let's dig into that below!

Visualizing our t-score

Since our sample size is 35, our t-distribution for this situation will look like so:

Considering that our alternative hypothesis is making a claim that the true mean daily Tinder matches for Sigma Apple Pi brothers is less than 25...

H0: µ = 25
Ha: µ < 25

...we are going to be assessing whether our p-value (from our t-score) falls below the alpha level of 0.05 (a.k.a. 5%) in the left tail of the sampling distribution.

For example, if our t-score results in a p-value equal to 0.04...

...then, if the claimed mean of 25 were true, the probability of a sample (of the same size) producing a mean this low or lower would be so slim that the result is statistically significant and is likely due to something other than random chance.

If our t-score instead results in a p-value equal to 0.06...

...then, if the claimed mean of 25 were true, the probability of a sample (of the same size) producing a mean this low or lower would not be low enough to indicate statistical significance beyond random chance.

We'll get into this a little more in "Step 4 - Make your concluding statement". Let's move on and calculate our t-score!

Plugging in the variables

Here, again, is our equation for our t-score:

t = (x-bar - µ0) / (s / √n)

For the sake of focusing on hypothesis tests, I will tell you the final answer here: our t-score equals -1.56.

If, however, you'd like to see how t-score was calculated, click below. Keep in mind, this is no different than how we calculated it in What is a t-score?, so if you read that article, you're probably good to move on!

I want a walkthrough of how t-score was calculated.

Based on the prompt, the sample mean is 23.5 daily Tinder matches...

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

...therefore, we'll plug in 23.5 for x-bar.

The claimed population mean of Sigma Apple Pi daily Tinder matches is 25...

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

...therefore, we'll plug in 25 for µ0.

The sample standard deviation is 5.7 matches...

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

...so we'll plug in 5.7 for s.

Lastly, our sample size is 35 brothers...

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

...therefore, we'll plug in 35 for n.

When we solve this out...

...we get a t-score of -1.558!
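If you'd like to double-check the arithmetic, here's a minimal sketch using only the four numbers from the prompt. One thing it surfaces: carrying full precision gives roughly -1.557, while rounding the standard error to 0.963 before dividing gives the -1.558 used in this article. Either way, the conclusion of the test doesn't change. The variable names are just illustrative.

```python
import math

x_bar, mu0, s, n = 23.5, 25, 5.7, 35  # all four values come from the prompt

standard_error = s / math.sqrt(n)  # s / sqrt(n)

# t-score with no intermediate rounding
t_unrounded = (x_bar - mu0) / standard_error

# t-score if the standard error is rounded to 3 decimal places first
t_rounded_se = (x_bar - mu0) / round(standard_error, 3)

print(f"standard error: {standard_error:.4f}")
print(f"t (no intermediate rounding): {t_unrounded:.3f}")
print(f"t (standard error rounded first): {t_rounded_se:.3f}")
```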

Step 3 - Find your p-value

Finding your p-value works differently between z-scores vs. t-scores. To skip ahead and see how it's done with z-scores, click here to access What is a hypothesis test, in relation to sample proportions?

For the sake of focusing on hypothesis tests, I will tell you the final answer here: our p-value will be in the following range:

0.05 < p < 0.10

If you'd like to see how this p-value was found, click below. Keep in mind, this is no different than how we found it in What is a t-score?, so if you read that article, you're probably chillin'!

I want a walkthrough of how p-value was found.

Since we're working with a t-score here, that means we'll utilize the t-table.

Determine your degrees of freedom

Similar to What is a t-score? and What is a confidence interval, in relation to a sample mean?, we must find our degrees of freedom with the following formula:

df = n - 1

Our sample size is 35 brothers...

df = 35 - 1

...therefore our degrees of freedom is 34.

df = 35 - 1 = 34

This means that we'll be zoning in on this row of our t-table:

Identify your t-score range

Ask yourself: what range of t-scores (within the row for 34 degrees of freedom) does our t-score of -1.558 (ignore the negative) fall between?

Why can we ignore the negative sign?

Because the t-distribution is symmetrical!

The t-table currently only contains positive t-score values. However, if you have a negative t-score, it's practically like flipping from the right tail...

...to the left tail!

This works, since we're utilizing a left-tail hypothesis test!

It falls between 1.307 and 1.691!

And, what p-values do these t-scores correspond to?

0.10 and 0.05!

Therefore, we know that the p-value for our t-score of -1.558 falls somewhere between 0.10 and 0.05!

0.05 < p < 0.10
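The table lookup above can be sketched in a few lines of code. The function name `bracket_p_value` is hypothetical. The critical values 1.307 and 1.691 are the ones quoted above; the other three entries in the row are standard one-tail t-table values for 34 degrees of freedom that I've added for illustration. The sketch also assumes the t-score sits in the tail that the alternative hypothesis points to (which is true for our left-tail test).

```python
# Row for df = 34 of a one-tail t-table: {tail area (p-value): critical t}.
# 1.307 and 1.691 are quoted in the article; the remaining entries are
# standard table values added here for illustration.
T_ROW_DF_34 = {0.10: 1.307, 0.05: 1.691, 0.025: 2.032, 0.01: 2.441, 0.005: 2.728}


def bracket_p_value(t_score, row):
    """Return (lower, upper) bounds on the one-tail p-value from a t-table row."""
    t_abs = abs(t_score)     # symmetry lets us ignore the negative sign
    lower, upper = 0.0, 0.5  # a one-tail area in the correct tail is at most 0.5
    for tail_area, critical_t in row.items():
        if t_abs >= critical_t:
            upper = min(upper, tail_area)  # at least this extreme: p <= tail_area
        else:
            lower = max(lower, tail_area)  # less extreme: p > tail_area
    return lower, upper


n = 35
df = n - 1  # degrees of freedom = 34
low, high = bracket_p_value(-1.558, T_ROW_DF_34)
print(f"df = {df}: {low:.2f} < p < {high:.2f}")
```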

Understanding your p-value visually

Our t-score of -1.558...

...falls somewhere in the range of a p-value of 0.05 and 0.10.

And we know that all values between 0.05 and 0.10 are going to fall outside our alpha value of 0.05!

Why do we not need an exact p-value?

Because in hypothesis tests, all that matters is whether or not your p-value is above or below your alpha level!

Knowing that our p-value lies somewhere between 0.05 and 0.10...

0.05 < p < 0.10

...is enough intel for us to determine that our actual p-value corresponding to a t-score of -1.558 is above our alpha level!

As stated above, this means that, if the claimed mean of 25 were true, the probability of a sample of this size producing a mean of 23.5 or lower is not slim enough to indicate statistical significance beyond random chance. Let's figure out how to state that in "Step 4 - Make your concluding statement" below!
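If you're curious what the exact p-value looks like (even though we don't need it), here's a sketch that approximates the left-tail area under the t-distribution using only Python's standard library. This is one possible approach, not the method used in this article: it integrates the Student's t density numerically with Simpson's rule. The function names `t_pdf` and `left_tail_p_value` are made up for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def left_tail_p_value(t_score, df, steps=2000):
    """Approximate P(T <= t_score) via Simpson's rule on [0, |t_score|] plus symmetry."""
    b = abs(t_score)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3     # area between 0 and |t_score|
    lower_tail = 0.5 - central_area  # area beyond |t_score| in one tail
    return lower_tail if t_score <= 0 else 1 - lower_tail


p = left_tail_p_value(-1.558, df=34)
print(f"p-value is about {p:.3f}; above alpha = 0.05? {p > 0.05}")
```

The result should land inside the 0.05 < p < 0.10 range we read off the t-table, which is all we needed to know.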

Step 4 - Make your concluding statement

Your concluding statement is going to center around the alpha level declared in the problem. In most cases, that alpha level will be 0.05. Each problem should explicitly state the alpha level. In our problem, it's 0.05.

Sigma Apple Pi claims that their brothers get on average 25 Tinder matches per day. You have a hunch that their daily Tinder matches are actually lower than that, so you collect a random sample of 35 Sigma Apple Pi brothers' daily Tinder matches. You measure a mean of 23.5 daily matches with a standard deviation of 5.7 matches. Provide support for your claim using a hypothesis test with an alpha level of 0.05.

Put simply...

- If the p-value is below the alpha level, then we reject the null hypothesis
- If the p-value is above the alpha level, then we fail to reject the null hypothesis.

In our case, our p-value is above the alpha level. Remember: all values between 0.05 and 0.10 are greater than 0.05!

0.05 < p < 0.10

This means that we fail to reject our null hypothesis!
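The decision rule is simple enough to sketch in code. Since the t-table only gives us a p-value range, this hypothetical `decide` function takes the low and high ends of that range. The "inconclusive" branch is my own addition for the case where the alpha level falls strictly inside the range (which can't be settled from the table alone); it isn't something this article covers.

```python
def decide(p_low, p_high, alpha=0.05):
    """Apply the reject / fail-to-reject rule to a p-value range from a t-table."""
    if p_high <= alpha:
        # Every value in the range is at or below alpha.
        return "reject the null hypothesis"
    if p_low >= alpha:
        # Every value in the range is above alpha.
        return "fail to reject the null hypothesis"
    return "inconclusive from the table: alpha falls inside the p-value range"


# Our range from Step 3: 0.05 < p < 0.10, with alpha = 0.05.
print(decide(0.05, 0.10, alpha=0.05))
```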

(If you want to jump ahead to the next article where we'll reject the null hypothesis, click here to skip to What is a hypothesis test, in relation to a sample proportion?)

What does it mean to "fail to reject" the null hypothesis?

When we fail to reject the null hypothesis, we're essentially saying that we don't have enough evidence to support the alternative hypothesis.

Why is this the case?

Because our p-value, the probability of getting sample results at least as extreme as ours if the null hypothesis were true, is above our alpha level!

Put in simpler terms, the probability of a sample of the same size producing results like ours (the p-value) was too high (above the alpha level) to hold statistical significance and point to something other than random chance.

Since the probability of our sample results occurring was too high, that means we don't have enough evidence to reject the null hypothesis (the baseline claim about the population parameter) and support the alternative hypothesis.

When you fail to reject the null hypothesis, you are saying that the outcome of the sample was not statistically significant enough to support the alternative. It does not mean you accept the null hypothesis!

NOTE: we are NOT accepting the null hypothesis! That would mean that we are 100% certain that the null hypothesis is true... which is not the case. Rather, the outcome of our sample did not provide enough evidence to support the alternative hypothesis we proposed, and therefore we "fail to reject" the null hypothesis.

Write your concluding statement (t-score edition)

Here is the template to write your concluding statement with t-scores:

Since our p-value range of [p-value range] is [less / greater] than our alpha level of [alpha level value], we [reject / fail to reject] the null hypothesis. We [do / don't] have enough evidence to support the alternative hypothesis, which states that [description of the alternative hypothesis].

Based on what we found with the Sigma Apple Pi Tinder situation above, here's what our concluding statement would look like:

Since our p-value range of 0.05 < p < 0.10 is greater than our alpha level of 0.05, we fail to reject the null hypothesis. We don't have enough evidence to support the alternative hypothesis, which states that the brothers of Sigma Apple Pi have a mean daily Tinder match value less than 25.
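As a small sketch, the template above can be filled in programmatically. The function name `concluding_statement` is hypothetical, and it assumes the alpha level doesn't fall strictly inside the p-value range. `Decimal` is used only so that values like 0.10 print with their trailing zero.

```python
from decimal import Decimal


def concluding_statement(p_low, p_high, alpha, alternative_description):
    """Fill in the concluding-statement template for a t-table p-value range."""
    reject = p_high <= alpha  # the whole range sits at or below alpha
    comparison = "less" if reject else "greater"
    decision = "reject" if reject else "fail to reject"
    evidence = "do" if reject else "don't"
    return (
        f"Since our p-value range of {p_low} < p < {p_high} is {comparison} than "
        f"our alpha level of {alpha}, we {decision} the null hypothesis. "
        f"We {evidence} have enough evidence to support the alternative hypothesis, "
        f"which states that {alternative_description}."
    )


statement = concluding_statement(
    Decimal("0.05"),
    Decimal("0.10"),
    Decimal("0.05"),
    "the brothers of Sigma Apple Pi have a mean daily Tinder match value less than 25",
)
print(statement)
```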
