Chi Square – Goodness of Fit (Hypothesis test)

Question: Crammer Nation University has a vibrant bar scene for 21+ students in its home city of Crammerville. It is known throughout town that student bar preference is 55% Study Street Bar & Grille, 25% Exam Street Social, and 20% Quiz Bar, respectively. You decide to test if this is still an accurate distribution of bar-preference, so you take a sample of 50 students and find that 30 prefer Study Street Bar & Grille, 12 prefer Exam Street Social, and 8 prefer Quiz Bar. Is there evidence, at an alpha level of 0.05, to support the claim that the student bar preference distribution has changed?

| | Study Street Bar & Grille | Exam Street Social | Quiz Bar |
| --- | --- | --- | --- |
| Observed values | 30 | 12 | 8 |

As with all hypothesis test problems, we're going to take the following steps to solve!

  • Step 1 - State the hypotheses
  • Step 2 - Calculate the test statistic
  • Step 3 - Find the p-value
  • Step 4 - Make the concluding statement

Step 1 - State the hypotheses

Let's start with the null hypothesis (H0), then we'll move onto the alternative hypothesis (Ha).

Defining your null hypothesis (H0)

When working with Chi Square - Goodness of Fit, your null hypothesis claims that the model still fits the population.

We will write that like so:

H0: The model fits, the percentages didn't change.

To be clear, the "model" is the distribution stated in the question: 55% Study Street Bar & Grille, 25% Exam Street Social, and 20% Quiz Bar.

This null hypothesis is basically claiming that the true percentages have not changed, and that any differences we see in our sample are just due to chance.

Defining your alternative hypothesis (Ha)

When working with Chi Square - Goodness of Fit, your alternative hypothesis claims that the model no longer fits the population.

We will write that like so:

H0: The model fits, the percentages didn't change.
Ha: The model doesn't fit, the percentages changed.
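
For readers who prefer symbols, the same pair of hypotheses can be sketched as follows. The names p_1, p_2, and p_3 are notation introduced here for illustration (they do not appear in the original problem), standing for the true proportions of students who prefer Study Street Bar & Grille, Exam Street Social, and Quiz Bar.

```latex
\begin{aligned}
H_0 &: p_1 = 0.55,\; p_2 = 0.25,\; p_3 = 0.20 \\
H_a &: \text{at least one } p_i \text{ differs from its value under } H_0
\end{aligned}
```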

In other words, the alternative hypothesis is claiming that the Crammerville student bar preference is no longer the 55% Study Street Bar & Grille, 25% Exam Street Social, and 20% Quiz Bar split stated in the question.

Step 2 - Calculate the test statistic

Before even beginning to calculate our test statistic, we have to check our assumptions!

Check your assumptions

We're working with Chi Square - Goodness of Fit, therefore we need to check the following assumptions:

1. The counts must be for a single categorical variable.
2. The counts must be independent of each other.
3. The counts must be randomly selected from the population.
4. Each expected count must be 5 or greater.

Concerning #1, we are dealing with a single categorical variable: "Bar preference".

Concerning #2, we are going to assume that students' bar preferences are independent of one another. Crammerville students don't let their friends' preferences determine their own favorite bars!

Concerning #3, we took a random sample of 50 students, so the counts were randomly selected from the population.

Lastly, concerning #4, every count is at least 5! The observed counts are 30, 12, and 8, and the expected counts (calculated further down) are 27.5, 12.5, and 10.

Therefore, all of our assumptions are satisfied! Now we can move on to calculating our test statistic!

The Chi Square test statistic formula

We'll utilize the following formula when calculating the X² test statistic for our sample:

X² = Σ (obs - exp)² / exp

Just to be clear here, the "X²" represents the test statistic (also referred to as the "Chi Square test statistic"), "obs" is an observed value, and "exp" is an expected value.

The "Σ" symbol means that we're going to calculate (obs - exp)² / exp for each category in our table (Study Street Bar & Grille, Exam Street Social, and Quiz Bar) and then add the results together.

Calculating the expected values

Let's go ahead and add another row to our data table for us to enter our expected values for each category.

Considering that we have a total of 50 counts (30 + 12 + 8 = 50), we can compute our expected values by multiplying 50 by each of the expected percentage values stated in the question (55%, 25%, and 20%).

When we solve these out, we get the following expected values!

| | Study Street Bar & Grille | Exam Street Social | Quiz Bar |
| --- | --- | --- | --- |
| Observed values | 30 | 12 | 8 |
| Expected values | 50 x 55% = 27.5 | 50 x 25% = 12.5 | 50 x 20% = 10 |
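
The expected-value arithmetic above can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names (`model`, `observed`, `expected`) are made up here, and the final check applies the "5 or greater" condition to the expected counts.

```python
# Model percentages from the question, written as proportions.
model = {
    "Study Street Bar & Grille": 0.55,
    "Exam Street Social": 0.25,
    "Quiz Bar": 0.20,
}

# Observed counts from the sample of 50 students.
observed = {
    "Study Street Bar & Grille": 30,
    "Exam Street Social": 12,
    "Quiz Bar": 8,
}

n = sum(observed.values())  # total sample size: 30 + 12 + 8 = 50

# Expected count for each bar = sample size x model proportion.
expected = {bar: n * share for bar, share in model.items()}

# Condition check: every expected count should be 5 or greater.
all_large_enough = all(count >= 5 for count in expected.values())

for bar, count in expected.items():
    print(f"{bar}: {count:.1f}")
print("All expected counts >= 5:", all_large_enough)
```

Note that the expected counts do not have to be whole numbers; 27.5 and 12.5 are perfectly fine.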

Plugging in Study Street Bar & Grille

For Study Street Bar & Grille, we'd plug in 30 for obs (a.k.a. Observed value) and 27.5 for exp (a.k.a. Expected value) in the (obs - exp)² / exp part of the formula.

When we solve this, we get the following!

(30 - 27.5)² / 27.5 = 6.25 / 27.5 ≈ 0.227

Plugging in Exam Street Social

For Exam Street Social, we'd plug in 12 as our obs value and 12.5 as our exp value.

When we solve this, we get the following!

(12 - 12.5)² / 12.5 = 0.25 / 12.5 = 0.020

Plugging in Quiz Bar

For Quiz Bar, we'd plug in 8 as our obs value and 10 as our exp value.

When we solve this, we get the following!

(8 - 10)² / 10 = 4 / 10 = 0.400

Solving for X²

When we add the three results together (0.227 + 0.020 + 0.400), we get 0.647 for X²!
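
As a quick check on the hand calculation, here is a minimal Python sketch of the same sum. The list names are illustrative, and the expected counts are typed in directly from the table rather than recomputed.

```python
# Observed and expected counts, in the order:
# Study Street Bar & Grille, Exam Street Social, Quiz Bar.
observed = [30, 12, 8]
expected = [27.5, 12.5, 10.0]

# One (obs - exp)^2 / exp term per category.
terms = [(obs - exp) ** 2 / exp for obs, exp in zip(observed, expected)]

# The test statistic is the sum of those terms.
chi_square = sum(terms)

print([round(term, 3) for term in terms])  # [0.227, 0.02, 0.4]
print(round(chi_square, 3))  # 0.647
```

Quiz Bar contributes the largest term (0.4) even though its raw gap of 2 students is smaller than Study Street Bar & Grille's gap of 2.5, because each squared gap is divided by a different expected count.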

Step 3 - Find your p-value

We're going to use the X² table for this. Each column of the table corresponds to a p-value, which is the area to the right of an X² value under the X² curve.

The X² curve is skewed to the right, and it becomes broader as the degrees of freedom increase (for example, going from 5 to 10 to 20 degrees of freedom).

Before we can find our p-value, we must determine our degrees of freedom!

Calculating your degrees of freedom (df)

To calculate your degrees of freedom (df) with a Chi Square - Goodness of Fit test, you'll utilize the following formula:

df = # of cells - 1

In our case, we've got 3 cells (one for each bar), therefore our degrees of freedom will equal 2!

df = 3 - 1 = 2

Recognizing our alpha level (α)

Our alpha level (α) is 0.05, as stated in the question. On our X² curve with 2 degrees of freedom, the alpha level (α) corresponds to the rightmost 5% of the area under the curve, and we'll be assessing whether our p-value is smaller than that area!

Finding our p-value

We know our X² value is 0.647, so that means we'll be using the X² table to find a range of X² values that our X² value fits between.

To start, let's zone in on the row corresponding to 2 degrees of freedom (df).

From here, we can identify that our 0.647 X² value falls between 0.211 and 4.605, which sit in the columns for p-values of 0.90 and 0.10.

This corresponds to a p-value between 0.10 and 0.90!

0.10 < p < 0.90
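
If you're curious where in that range the p-value actually lands, here is a small Python sketch. It relies on a shortcut that only holds for 2 degrees of freedom: the area to the right of a value x under the X² curve is exp(-x / 2). For any other df you would need a table or statistical software, and the variable names are illustrative.

```python
import math

chi_square = 0.647  # test statistic from Step 2

# Shortcut valid ONLY for df = 2: right-tail area = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(round(p_value, 3))

# The exact value should sit inside the range read from the table.
print(0.10 < p_value < 0.90)
```

The result is roughly 0.72, comfortably inside the 0.10 to 0.90 range found from the table.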

Visualizing our p-value

Since the X² table gives the area to the right of an X² value, the area to the right of our X² value of 0.647 (in other words, our p-value) is somewhere between 0.10 and 0.90.

This provides us with all the information that we need to know! It shows us that our p-value (corresponding to our X² value) is greater than our alpha level (α) of 0.05!

Why do we not need an exact p-value?

Because in hypothesis tests, all that matters is whether or not your p-value is above or below your alpha level!

Knowing that our p-value lies somewhere between 0.10 and 0.90...

0.10 < p < 0.90

...is enough intel for us to determine that our actual p-value corresponding to our X² value of 0.647 is above our alpha level!

Step 4 - Make your concluding statement

Your concluding statement is going to center around the alpha level declared in the problem. In most cases, that alpha level will be 0.05. Each problem should explicitly state the alpha level. In our problem, it's 0.05.

As we stated in What is a hypothesis test?...

- If the p-value is below the alpha level, then we reject the null hypothesis
- If the p-value is above the alpha level, then we fail to reject the null hypothesis.

In our case, our p-value range is above the alpha level. Remember: all values between 0.10 and 0.90 are greater than 0.05!

0.10 < p < 0.90

This means that we fail to reject our null hypothesis!
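
The decision rule can also be sketched as a small Python function that works directly from a p-value range. The function name `decide` and its "inconclusive" branch are illustrative additions for the case where a table range straddles the alpha level; that case does not come up in this problem.

```python
def decide(p_low, p_high, alpha):
    """Decide a hypothesis test when the p-value is only known to lie
    strictly between p_low and p_high."""
    if p_low >= alpha:
        # Every possible p-value in the range is above alpha.
        return "fail to reject the null hypothesis"
    if p_high <= alpha:
        # Every possible p-value in the range is below alpha.
        return "reject the null hypothesis"
    # The range contains alpha, so the table alone can't settle it.
    return "inconclusive: a more exact p-value is needed"


decision = decide(0.10, 0.90, alpha=0.05)
print(decision)  # fail to reject the null hypothesis
```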

Applying the Chi Square answer template

If you remember in What is a hypothesis test?, we gave the following answer template when working with t-scores:

Since our p-value range of [p-value range] is [less / greater] than our alpha level of [alpha level value], we [reject / fail to reject] the null hypothesis. We [do / don't] have enough evidence to support the alternative hypothesis, which states that [description of the alternative hypothesis].

We're going to use this same exact template for Chi Square tests!

Applied to our question, this would give us the following answer to our original question!

Since our p-value range of 0.10 < p < 0.90 is greater than our alpha level of 0.05, we fail to reject the null hypothesis. We don't have enough evidence to support the alternative hypothesis, which states that the true percentages of student bar preference at Crammer Nation University have changed.

In other words... we don't have enough evidence to conclude that the true student bar preference at Crammer Nation University is different from what is currently known throughout Crammerville!
