 # Assumptions for sampling distributions | Midterm Exam

Now that you've learned how to calculate z-scores and t-scores, let's dig into the necessary assumptions to check before you calculate them in the future.

You might be wondering... why didn't we zero in on this before learning how to calculate z-scores and t-scores?

Frankly, it's just easier to first understand how to calculate them, then understand the assumptions you need to validate before calculating. Learning the assumptions first typically leads to students wondering why they even exist in the first place.

These assumptions ensure that the sampling distribution is approximately normal, which is what allows us to use it to calculate z-scores / t-scores.

To be clear, the z-distribution and t-distribution are both bell-shaped and symmetric around 0. The z-distribution is referred to as the "standard" normal distribution because it's the normal distribution with a mean of 0 and a standard deviation of 1. t-distributions look very similar, but strictly speaking they aren't normal distributions: they have more values in the tails to account for the extra variability of having to use the sample standard deviation (when not given the population standard deviation).

Can I see this difference between z-distribution and t-distribution visually?

Check out the below graphic from Scribbr to see this in action.

Keep in mind: this is one version of a t-distribution, at one particular number of degrees of freedom. t-distributions vary depending on the degrees of freedom.

The z-distribution, on the other hand, is constant and never-changing.
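As a rough numerical check of the "more values in the tails" idea, the sketch below compares the height of the t-distribution's density with the z-distribution's density at a point far from the center. The function names `normal_pdf` and `t_pdf`, the evaluation point of 3, and the degrees of freedom (5 and 1000) are illustrative choices, not values taken from the graphic.

```python
import math


def normal_pdf(x):
    """Density of the standard normal (z) distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom at x."""
    # lgamma (log-gamma) avoids overflow when df is large.
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef - (df + 1) / 2 * math.log(1 + x * x / df))


# Illustrative values only: compare the densities 3 units from the center.
x = 3.0
print(f"z density at {x}:           {normal_pdf(x):.5f}")
print(f"t density at {x} (df=5):    {t_pdf(x, 5):.5f}")
print(f"t density at {x} (df=1000): {t_pdf(x, 1000):.5f}")
```

With only a few degrees of freedom, the t density in the tail is noticeably larger than the z density. With many degrees of freedom, the two are nearly identical.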

These assumptions, however, differ between sample means and sample proportions. Let's dig into the differences below!

## Assumptions with sample means

For problems involving sample means, we have to check the following 3 assumptions:

1. Sample is randomly selected from the population
2. The sample size (n) is less than or equal to 10% of the population size (N)
3. The sample size is greater than or equal to 30, or the population itself is normally distributed

If our sample does not pass these assumptions, then we cannot proceed with calculating the z-score / t-score.
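The three checks above can be sketched as a small helper. This is only one way to organize them: the function name `check_mean_assumptions`, its parameters, and the example numbers (40 items from a population of 5,000) are hypothetical and chosen for illustration.

```python
def check_mean_assumptions(n, population_size, randomly_selected,
                           population_is_normal=False):
    """Return pass/fail results for the three sample-mean assumptions."""
    return {
        # 1. The sample is randomly selected (taken from the problem statement).
        "random_sample": randomly_selected,
        # 2. n is at most 10% of N (integer form avoids float rounding).
        "ten_percent": 10 * n <= population_size,
        # 3. n is at least 30, or the population itself is normal.
        "large_sample_or_normal": n >= 30 or population_is_normal,
    }


# Hypothetical example: 40 items drawn at random from a population of 5,000.
results = check_mean_assumptions(n=40, population_size=5000,
                                 randomly_selected=True)
for name, passed in results.items():
    print(f"{name}: {'pass' if passed else 'fail'}")
print("OK to calculate a z-score / t-score:", all(results.values()))
```

A sample of 20 would fail the third check unless `population_is_normal=True` is passed, which mirrors the "or the population itself is normally distributed" part of the assumption.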

Let's practice assessing these assumptions with a quick example:

Question: Amazon ships millions of packages a year using delivery trucks. They want to know the average weight of all packages they ship so that they can determine the maximum capacity of their trucks. They weigh 1000 randomly selected Amazon packages and determine that the mean weight is 2.5 pounds.

### 1. Sample is randomly selected from the population

The problem states that the sample was randomly selected from the population...

...so this assumption is checked.

### 2. The sample size (n) is less than or equal to 10% of the population size (N)

The problem states that Amazon ships millions of packages a year...

...and 1000 packages is less than or equal to 10% of all the packages that Amazon ships (which is the population size)...

...so this assumption is checked.

### 3. The sample size is greater than or equal to 30, or the population itself is normally distributed

The sample size is 1000...

...which is greater than 30. Therefore, this assumption is checked.

Why does it matter that the sample size is over 30?

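Because of the Central Limit Theorem! It tells us that the sampling distribution of the sample mean gets closer and closer to a normal distribution as the sample size grows, even when the population itself isn't normal. A sample size of 30 is the usual rule of thumb for "large enough". For more clarity on the Central Limit Theorem, see...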

• What is a z-score, in relation to sample proportions?
• What is a t-score?

You will be faced with some problems in which the sample is smaller than 30. Don't immediately assume it doesn't fit this assumption! Remember... if the population itself is normally distributed (the problem will explicitly state this), then it still passes this assumption!
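To see why the sample size matters under the Central Limit Theorem, here is a minimal simulation sketch. It assumes a right-skewed population (an exponential distribution with mean 1), a sample size of 30, and 2,000 repeated samples. All of these, along with the helper names `skewness` and `simulate_sample_means`, are illustrative choices rather than part of the lesson's examples.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score: near 0 for a symmetric, bell-like shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def simulate_sample_means(sample_size, num_samples, rng):
    """Draw many samples from a skewed population; return each sample's mean."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(42)  # fixed seed so the run is repeatable

# Individual values from the skewed population (population mean is 1).
population_draws = [rng.expovariate(1.0) for _ in range(2000)]
# Means of 2,000 samples, each of size 30.
sample_means = simulate_sample_means(sample_size=30, num_samples=2000, rng=rng)

print(f"skewness of individual values:   {skewness(population_draws):.2f}")
print(f"skewness of sample means (n=30): {skewness(sample_means):.2f}")
print(f"average of the sample means:     {statistics.fmean(sample_means):.2f}")
```

The individual values are strongly skewed, while the sample means are much closer to symmetric and cluster around the population mean. That is the behavior the "n is at least 30" rule of thumb relies on.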

## Assumptions with proportions

For problems involving sample proportions, we have to check the following 3 assumptions:

1. Sample is randomly selected from the population
2. The sample size (n) is less than or equal to 10% of the population size (N)
3. There are at least 10 successes and at least 10 failures in the sample OR np >= 10 and nq >= 10

If our sample does not pass these assumptions, then we cannot proceed with calculating the z-score (remember, you won't calculate t-scores with sample proportions).

p is the proportion of successes. q is the proportion of failures, which can be calculated using the following formula:

q = 1 - p

Don't get caught up in the success vs. failure aspect... q is essentially the opposite of p.

For example, if we had a proportion of 0.25 of successes...

q = 1 - 0.25

...that'd mean the proportion of failures is 0.75.

q = 1 - 0.25 = 0.75
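The relationship between p and q, together with the np >= 10 and nq >= 10 condition, can be sketched in a few lines. The function name `check_success_failure` is made up for illustration. p = 0.25 comes from the example above, while the sample sizes of 100 and 20 are hypothetical.

```python
def check_success_failure(n, p):
    """Return (np, nq, passed) for the condition np >= 10 and nq >= 10."""
    q = 1 - p                      # proportion of failures
    expected_successes = n * p     # np
    expected_failures = n * q      # nq
    passed = expected_successes >= 10 and expected_failures >= 10
    return expected_successes, expected_failures, passed


# p = 0.25 is from the text; n = 100 and n = 20 are hypothetical sample sizes.
for n in (100, 20):
    successes, failures, passed = check_success_failure(n, p=0.25)
    print(f"n={n}: np={successes}, nq={failures}, passes={passed}")
```

With n = 100 both counts clear the threshold of 10, while with n = 20 the expected number of successes (5) is too small, so the condition fails.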

Let's practice assessing these assumptions with a quick example:

Question: Crazy Christmas is a company that sells Christmas ornaments and decorations. They want to know whether 85% or more of US households celebrate Christmas. They survey 1000 randomly selected US households and find that 750 of them do celebrate Christmas.

### 1. Sample is randomly selected from the population

The problem states that the sample was randomly selected from the population...

...so this assumption is checked.

### 2. The sample size (n) is less than or equal to 10% of the population size (N)

There are millions of households in the US (which is our population), so 1000 households is less than 10% of all US households.

Therefore, this assumption is checked.

### 3. There are at least 10 successes and at least 10 failures in the sample OR np >= 10 and nq >= 10

A “success” in this example would be a household that celebrates Christmas and a “failure” would be one that does not. This example has at least 10 successes (houses that celebrate Christmas)...

...and at least 10 failures (houses that don't celebrate Christmas). How many failures did we have? Based on the following equation...

Number of failures = total respondents - number of successes

Number of failures = 1000 - number of successes

...and 750 of those were successes...

Number of failures = 1000 - 750