# Assumptions for sampling distributions

Now that you've learned how to calculate z-scores and t-scores, let's dig into the necessary assumptions to check before you calculate them in the future.

You might be wondering... why didn't we focus on this before learning how to calculate z-scores and t-scores?

Frankly, it's just easier to first understand how to calculate them, then understand the assumptions you need to validate before calculating. Learning the assumptions first typically leads to students wondering why they even exist in the first place.

These assumptions ensure that we're working with an approximately normal sampling distribution, which is what allows us to apply z-scores / t-scores.

To be clear, the z-distribution and t-distribution are both bell-shaped and centered at 0. The z-distribution is referred to as the "standard" normal distribution: it's the normal distribution with a mean of 0 and a standard deviation of 1. A t-distribution is not exactly normal, but it looks very similar. It has more values in the tails to account for the extra variability of having to use the sample standard deviation (when not given the population standard deviation).

Check out the below graphic from Scribbr to see this in action.

Keep in mind: this is one version of a t-distribution, at one particular number of degrees of freedom. t-distributions vary depending on the degrees of freedom, and they get closer to the z-distribution as the degrees of freedom increase.

The z-distribution, on the other hand, is constant and never-changing.
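To see the "more values in the tails" idea in numbers, here is a minimal sketch that compares the height of each curve far from the center (at x = 3). The helper name `t_pdf` and the choice of x = 3 are just for illustration; the formula is the standard density of Student's t-distribution.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma (log-gamma) avoids overflow for large degrees of freedom.
    log_coeff = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coeff - (df + 1) / 2 * math.log1p(x * x / df))


z = NormalDist(mu=0, sigma=1)  # the standard normal (z) distribution

# Compare how much density each curve puts far from the center (x = 3).
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: t density at 3 = {t_pdf(3, df):.5f}")
print(f"z density at 3         = {z.pdf(3):.5f}")
```

With few degrees of freedom the t curve sits noticeably higher in the tail than the z curve, and the gap shrinks as the degrees of freedom grow.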

These assumptions, however, differ between sample means and sample proportions. Let's dig into the differences below!

## Assumptions with sample means

For problems involving sample means, we have to check the following 3 assumptions:

1. Sample is randomly selected from the population
2. The sample size (n) is less than or equal to 10% of the population size (N)
3. The sample size is greater than or equal to 30, or the population itself is normally distributed

If our sample does not pass these assumptions, then we cannot proceed with calculating the z-score / t-score.
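If it helps to see the checklist as logic, here is a rough sketch in Python. The function name, its parameters, and the numbers in the usage lines are all hypothetical; they just mirror the three assumptions listed above.

```python
def check_mean_assumptions(n, population_size, randomly_selected,
                           population_is_normal=False):
    """Return the pass/fail status of each sample-mean assumption."""
    return {
        # 1. Sample is randomly selected (taken from the problem statement).
        "random_sample": randomly_selected,
        # 2. Sample size is at most 10% of the population size.
        "ten_percent": n <= 0.10 * population_size,
        # 3. n >= 30, or the population itself is normally distributed.
        "large_or_normal": n >= 30 or population_is_normal,
    }


# Hypothetical problem: 25 items sampled at random from a population of 5000.
small_sample = check_mean_assumptions(25, 5000, randomly_selected=True)
print(small_sample)                # the third check fails: n < 30
print(all(small_sample.values()))  # False, so we cannot proceed

# Same sample, but the problem states the population is normally distributed.
normal_pop = check_mean_assumptions(25, 5000, randomly_selected=True,
                                    population_is_normal=True)
print(all(normal_pop.values()))    # True
```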

Let's practice assessing these assumptions with a quick example:

Question: Amazon ships millions of packages a year using delivery trucks. They want to know the average weight of all packages they ship so that they can determine the maximum capacity of their trucks. They weigh 1000 randomly selected Amazon packages and determine that the mean weight is 2.5 pounds.

### 1. Sample is randomly selected from the population

The problem states that the sample was randomly selected from the population...

...so this assumption is checked.

### 2. The sample size (n) is less than or equal to 10% of the population size (N)

The problem states that Amazon ships millions of packages a year...

...and 1000 packages is less than or equal to 10% of all the packages that Amazon ships (which is the population size)...

...so this assumption is checked.

### 3. The sample size is greater than or equal to 30, or the population itself is normally distributed

The sample size is 1000...

...which is greater than 30. Therefore, this assumption is checked.

Why 30? Because of the Central Limit Theorem! With a large enough sample, the sampling distribution of the sample mean is approximately normal, even when the population itself is not.

You will be faced with some problems in which the sample is smaller than 30. Don't immediately assume it doesn't fit this assumption! Remember... if the population itself is normally distributed (the problem will explicitly state this), then it still passes this assumption!
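To get a feel for why a sample size of 30 is usually enough, here is a small simulation sketch. Everything in it is an assumption made for illustration: the skewed (exponential) population, the seed, and the number of repeated samples. It draws many samples of size 30 and looks at how the sample means behave.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(n, num_samples):
    """Means of num_samples random samples of size n from a skewed population."""
    # expovariate(1.0) draws from an exponential population (mean 1, SD 1),
    # which is strongly right-skewed and clearly not normal.
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


means = sample_means(n=30, num_samples=2000)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# For a normal curve, roughly 95% of values fall within 2 SDs of the center.
within_2_sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f} (population mean is 1)")
print(f"SD of sample means:   {spread:.3f} (sigma / sqrt(n) = {1 / 30 ** 0.5:.3f})")
print(f"share within 2 SDs:   {within_2_sd:.3f}")
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to sigma / sqrt(n), which is the behavior the Central Limit Theorem describes.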

## Assumptions with proportions

For problems involving sample proportions, we have to check the following 3 assumptions:

1. Sample is randomly selected from the population
2. The sample size (n) is less than or equal to 10% of the population size (N)
3. There are at least 10 successes and at least 10 failures in the sample OR np >= 10 and nq >= 10

If our sample does not pass these assumptions, then we cannot proceed with calculating the z-score (remember, you won't calculate t-scores with sample proportions).

p is the proportion of successes. q is the proportion of failures, which can be calculated using the following formula:

q = 1 - p

Don't get caught up in the success vs. failure aspect... q is essentially the opposite of p.

For example, if the proportion of successes were 0.25...

q = 1 - 0.25 = 0.75

...then the proportion of failures would be 0.75.

Let's practice assessing these assumptions with a quick example:

Crazy Christmas is a company that sells Christmas ornaments and decorations. They want to know whether 85% or more of US households celebrate Christmas. They conduct a survey with 1000 randomly selected participants and find that 750 of them do celebrate Christmas.

### 1. Sample is randomly selected from the population

The problem states that the sample was randomly selected from the population...

...so this assumption is checked.

### 2. The sample size (n) is less than or equal to 10% of the population size (N)

There are millions of households in the US (which is our population), so 1000 households is less than 10% of all US households.

Therefore, this assumption is checked.

### 3. There are at least 10 successes and at least 10 failures in the sample OR np >= 10 and nq >= 10

A “success” in this example would be a household that celebrates Christmas and a “failure” would be one that does not. This example has at least 10 successes (houses that celebrate Christmas)...

...and at least 10 failures (houses that don't celebrate Christmas). How many failures did we have? Based on the following equation...

Number of failures = total respondents - number of successes

Number of failures = 1000 - number of successes

...and 750 of those were successes...

Number of failures = 1000 - 750 = 250

Since 750 successes and 250 failures are both at least 10, we can affirm this assumption is checked.
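The three checks for the Crazy Christmas example can be sketched in code like this. The function and parameter names are made up for illustration, and the population size of 1,000,000 is only a stand-in for "millions of households", not a real figure.

```python
def check_proportion_assumptions(n, successes, population_size,
                                 randomly_selected):
    """Return the pass/fail status of each sample-proportion assumption."""
    failures = n - successes
    return {
        # 1. Sample is randomly selected (taken from the problem statement).
        "random_sample": randomly_selected,
        # 2. Sample size is at most 10% of the population size.
        "ten_percent": n <= 0.10 * population_size,
        # 3. At least 10 successes and at least 10 failures.
        "success_failure": successes >= 10 and failures >= 10,
    }


# Crazy Christmas: 1000 respondents, 750 celebrate Christmas.
# population_size is an assumed placeholder for "millions of households".
checks = check_proportion_assumptions(
    n=1000, successes=750, population_size=1_000_000, randomly_selected=True
)
print(checks)
print(all(checks.values()))

# The alternative form of check 3 uses the hypothesized proportion p = 0.85.
n, p = 1000, 0.85
q = 1 - p
print(n * p >= 10 and n * q >= 10)
```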

This is the way that we can ensure the Central Limit Theorem applies when working with proportions instead of means.

For more information on how this works, please refer to What is a z-score, in relation to sample proportions?

## Assumptions when working with two means

In Question #6 and Question #9 of the Practice Midterm, you'll be working with a two-mean test. The assumptions for these differ slightly.

When working with two means, we only need to add on one assumption!

1. Both samples are randomly selected from their respective populations
2. Both sample sizes (n1 and n2) are less than or equal to 10% of their respective population sizes
3. Both sample sizes (n1 and n2) are greater than or equal to 30, or the populations themselves are normally distributed
4. Both samples are independent of each other.

The independence assumption (#4) will either be explicitly stated in the problem, or we can assume it to be true.
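As a rough sketch, the two-mean checklist could look like this in code. The function name, parameter names, and sample numbers are hypothetical and only mirror the four assumptions above.

```python
def check_two_mean_assumptions(n1, n2, pop_size1, pop_size2,
                               both_random, independent,
                               populations_normal=False):
    """Return the pass/fail status of each two-mean assumption."""
    return {
        # 1. Both samples are randomly selected.
        "random_samples": both_random,
        # 2. Each sample is at most 10% of its own population.
        "ten_percent": n1 <= 0.10 * pop_size1 and n2 <= 0.10 * pop_size2,
        # 3. Both n1 and n2 are >= 30, or the populations are normal.
        "large_or_normal": (n1 >= 30 and n2 >= 30) or populations_normal,
        # 4. The two samples are independent of each other.
        "independent": independent,
    }


# Hypothetical problem: samples of 45 and 28 from two large populations.
result = check_two_mean_assumptions(
    n1=45, n2=28, pop_size1=10_000, pop_size2=8_000,
    both_random=True, independent=True,
)
print(result)                # the third check fails because n2 < 30
print(all(result.values()))  # False
```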

## Assumptions with Chi Square

The assumptions for Chi Square - Goodness of Fit & Chi Square - Independence differ slightly.

### Goodness of Fit

In Question #10 of the Practice Midterm, you'll be working with a Chi Square - Goodness of Fit test. You'll need to check the following assumptions:

1. The counts must be for a single categorical variable.
2. The counts must be independent of each other.
3. The counts must be randomly selected from the population.
4. Each expected count must be 5 or greater.
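Here is a minimal sketch of the count check for a Goodness of Fit test, assuming the 5-or-greater rule is applied to the expected counts (the usual textbook form of this condition). The observed counts and hypothesized proportions below are invented for illustration.

```python
def expected_counts(observed, hypothesized_proportions):
    """Expected count per category: total sample size times its proportion."""
    total = sum(observed)
    return [total * p for p in hypothesized_proportions]


# Hypothetical data: 100 observations across 4 categories, and a null
# hypothesis that all 4 categories are equally likely.
observed = [18, 22, 20, 40]
proportions = [0.25, 0.25, 0.25, 0.25]

expected = expected_counts(observed, proportions)
print(expected)
print(all(count >= 5 for count in expected))
```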

### Independence

In Question #11 of the Practice Midterm, you'll be working with a Chi Square - Independence test. You'll need to check the following assumptions:

1. The data are counts for two categorical variables.
2. The counts must be randomly selected from the population.
3. Each expected count must be 5 or greater.
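For an Independence test, the expected count in each cell is (row total x column total) / grand total. The sketch below applies the 5-or-greater check to those expected counts, which is an assumption about how the rule is meant here. The 2x2 table is made-up data.

```python
def expected_table(observed):
    """Expected counts under independence for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


# Hypothetical 2x2 table: rows are one variable, columns are the other.
observed = [
    [30, 20],
    [10, 40],
]

expected = expected_table(observed)
print(expected)
print(all(cell >= 5 for row in expected for cell in row))
```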
