Central Limit Theorem and Confidence Intervals
Sampling Distributions and Confidence Intervals
- Testing hypotheses
- When do we trust our data?
- Are the differences real?
- Are the data reliable?
- Plus or minus what?
Observed (estimated) vs Actual
- mean: x bar vs. mu
- standard deviation: sd vs. sigma
We want to know the underlying distribution
- Computer failures
- Date rejections
- Grading
- Reaction Time
- How many green balls are in the urn?
How do we do this?
- Get a sample
- Is one observation sufficient?
- How about three?
- How about a thousand?
- More is better
- Bigger sample gives us a better idea of what is going on
- One "test" could be a fluke
Will some distributions demand more "tests" to zero in on the mean?
- Yes
- Which ones?
- Ones with greater variance
What we are really talking about is understanding the sampling distribution.
- The distribution of the means of samples
- If you did this experiment again, would you find the same thing?
- This might take a bit to get your head around.
Sampling Distributions
- The normal distribution can help.
- If you survey N people, your survey will get some mean response X1.
- If you took another survey of N people from the same population, this survey would have a mean X2.
- If you took a bunch of surveys and plotted the means on a histogram, you would find something that looked like a normal distribution.
- Even if the data you are sampling is not normally distributed.
- Sampling Distribution of the Mean.
Demo on skew and normal distribution
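The demo can be approximated in code. The sketch below is an illustration, not the demo used in class. It assumes an exponential population (mean 1, strongly right-skewed) and uses Python's standard `random` and `statistics` modules. For each sample size it draws 5,000 samples, takes their means, and reports the skewness of those means. A skewness near 0 indicates a roughly symmetric, normal-looking histogram.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a skewed exponential population (mean 1, sd 1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(0)  # fixed seed so the demo is repeatable
for n in (1, 5, 30, 100):
    means = sample_means(n, 5000, rng)
    print(f"N={n:4d}  mean of means={statistics.fmean(means):.3f}  skewness={skewness(means):.2f}")
```

For an exponential population the skewness of the sample mean is 2/sqrt(N) in theory, so the printed skewness should shrink toward 0 as N grows, even though the population itself stays just as skewed.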
This is the Central Limit Theorem!
- As the sample size increases, the distribution of the sample means gets closer and closer to a normal distribution.
- But what is the standard deviation of these sample means?
Two things are going to affect the standard deviation of the mean
- The size of the sample
- The standard deviation of the population measure
Lots of Means
- This distribution of survey means would approximately follow a normal distribution.
- Mean of the sampling distribution of the mean = mu (the population mean)
- Standard deviation of the sampling distribution of the mean = $\sigma/\sqrt{N}$
- sigma is the standard deviation of the underlying (population) distribution
Increasing Sample Size
- As N (sample size) increases, the variability in this distribution decreases substantially.
- By N = 1000, the true mean is quite likely to be very close to the mean obtained in the survey.
How big a sample?
- The sampling distribution of the mean
- Mean = mu (the sample means x bar are centered on mu for any N; a bigger N keeps them closer to it)
- standard deviation = $\sigma/\sqrt{N}$
- As N gets larger, the standard deviation of the sampling distribution gets smaller.
- Diminishing returns for additional observations: because of the square root, you must quadruple N to halve the standard deviation.
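The diminishing returns show up clearly when sigma/sqrt(N) is tabulated. This sketch assumes sigma = 10 purely for illustration; each step multiplies N by 4, and each step only halves the standard deviation of the mean.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

sigma = 10  # illustrative population standard deviation (assumed)
for n in (1, 4, 16, 64, 256, 1024):
    print(f"N={n:5d}  sd of mean={standard_error(sigma, n):.3f}")
```

Going from 1 to 4 observations cuts the standard deviation of the mean by 5, while going from 256 to 1024 observations cuts it by only about 0.3.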
Confidence Intervals
- Sampling distributions
- The central limit theorem
- Creating confidence intervals
- Using confidence intervals
Sampling Distributions
- How is the sample mean related to the population mean?
- Requires thinking about sampling distributions.
- Sampling distribution of the mean.
- Distribution of means of samples of a particular size.
Deconstruct this
- If we take a lot of samples of size N
- Look at the mean of these samples
- What will the mean of these means be?
- Approximately the population mean
- How closely this approximates the population mean will depend on the sample size.
- The larger the sample, the more accurate is the estimate of the mean.
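A quick simulation can check both claims above. This is a sketch that assumes a fair six-sided die as the population (mu = 3.5, sigma = sqrt(35/12), about 1.71), which is flat rather than normal. It compares the mean of many sample means with mu, and their spread with sigma/sqrt(N).

```python
import math
import random
import statistics

MU = 3.5                      # population mean of one fair die roll
SIGMA = math.sqrt(35 / 12)    # population sd of one fair die roll (about 1.708)

def means_of_samples(n, reps, rng):
    """Means of `reps` samples, each made of n fair-die rolls."""
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (2, 10, 50):
    means = means_of_samples(n, 4000, rng)
    print(f"N={n:3d}  mean of means={statistics.fmean(means):.3f} (mu={MU})  "
          f"sd of means={statistics.pstdev(means):.3f} (sigma/sqrt(N)={SIGMA / math.sqrt(n):.3f})")
```

The mean of means stays near 3.5 at every N; what improves with larger samples is how tightly the individual sample means cluster around it.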
Central Limit Theorem
- Tells us something neat about means
- For any population with a finite variance
- For large sample sizes, the sampling distribution of the mean is approximately normal.
- Even if the population values are not normally distributed.
Using Sampling Distributions
- The sampling distribution is very useful.
- Generally, we want to know the population mean.
- How can we make a guess about what that mean actually is?
- Collect a sample.
- The mean of that sample is an estimate of the population mean.
The population mean
- Where is the population mean likely to be?
- You can construct an interval that is likely to contain the population mean.
- This interval is called a confidence interval.
- The more confident you want to be that the interval contains the population mean, the wider the interval has to be.
An Example
- Suppose you want to know the average number of raisins in a box of Raisin Bran.
- Kellogg's has said that there is some variability in the number of raisins in a box.
- They list the standard deviation as 26 raisins.
- This is the population standard deviation.
- Take a sample of 100 boxes and count the raisins in each.
- How confident do you want to be that the actual mean falls in the interval?
Demo on loose vs. stringent confidence intervals
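Loose and stringent intervals can be compared side by side. The sketch below assumes the numbers this raisin example goes on to use (a sample mean of 263 raisins and a standard deviation of the mean of 26/sqrt(100) = 2.6) and uses `statistics.NormalDist` from Python's standard library to find the z value for each confidence level.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z critical value: leaves (1 - confidence)/2 of the probability in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

x_bar = 263             # sample mean of 100 boxes
se = 26 / 100 ** 0.5    # sigma / sqrt(N) = 2.6

for conf in (0.50, 0.80, 0.95, 0.99, 0.999):
    z = z_for_confidence(conf)
    lo, hi = x_bar - z * se, x_bar + z * se
    print(f"{conf:.1%} CI: {lo:.1f} to {hi:.1f}  (z = {z:.3f}, width = {hi - lo:.1f})")
```

Higher confidence buys a wider interval: a 50% interval is narrow but misses the true mean half the time, while a 99.9% interval almost never misses but is several times wider.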
Building Confidence Intervals
- A procedure to use when sigma is known
- For the confidence interval
- Use the mean of the sample (263)
- Use the standard deviation of the mean ($26/\sqrt{100} = 2.6$)
- Find the z-score associated with that level of confidence
- z(95%)=1.96 (leaving 2.5% in each tail)
- 95% confidence interval: 263 ± (1.96 * 2.6) = 263 ± 5.1, or about 257.9 to 268.1
Summary
- To construct a confidence interval when the population standard deviation is known
- Collect a sample
- Find the z-score for the desired confidence
- Leave (1 - confidence)/2 of the probability in each tail (2.5% in each tail for 95%)
- The confidence interval
- Mean ± (z * standard deviation of mean)
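The summary procedure fits in a small function. This is a sketch, not a library routine: `z_confidence_interval` is a hypothetical name, and it assumes sigma is known and that the sample mean is approximately normal (by the central limit theorem or because the population itself is normal).

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(data, sigma, confidence=0.95):
    """Confidence interval for the population mean when the population sd (sigma) is known.

    Returns (lower, upper) = x_bar ± z * sigma / sqrt(N).
    """
    n = len(data)
    x_bar = fmean(data)                                  # mean of the collected sample
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z leaving (1 - confidence)/2 in each tail
    margin = z * sigma / math.sqrt(n)                    # z * standard deviation of the mean
    return x_bar - margin, x_bar + margin
```

With the four popcorn counts from the example that follows (50, 40, 60, 50) and sigma = 10, it returns roughly (40.2, 59.8), matching the hand calculation.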
Example
The pop-o-matic company says their microwave popcorn leaves 15 kernels unpopped on average (they aren't sure, though). We KNOW that the standard deviation (sigma) of the number of kernels that don't pop is 10 (i.e., we KNOW the spread; we DON'T KNOW the mean for sure). We observe 50, 40, 60, and 50 unpopped kernels.
What is x bar?
--- 50
What is the standard deviation of the mean?
--- $10/\sqrt{4} = 5$
What is the 95% confidence interval?
--- z(95%) = 1.96
--- margin = 1.96*5 = plus or minus 9.8, so the interval runs from 40.2 to 59.8
Are the pop-o-matic people wrong?
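--- Null hypothesis: the mean number of unpopped kernels equals 15
--- Alternative hypothesis: the mean number of unpopped kernels does not equal 15
--- Calculate z-score: (50 - 15)/5 = 7
--- Conclusion: a sample mean 7 standard deviations of the mean away from 15 has essentially zero chance of happening if the company were right, so reject the null hypothesis (15 also falls far outside the 95% interval of 40.2 to 59.8). More likely aliens will interrupt class.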
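The same calculation can be written as a short sketch using only Python's standard library. The p-value line goes one step beyond the slide: it converts z = 7 into the two-sided probability of a sample mean at least this far from 15 if the company's claim were true.

```python
import math
from statistics import NormalDist, fmean

kernels = [50, 40, 60, 50]   # observed unpopped kernels
sigma = 10                   # known population standard deviation
mu_0 = 15                    # the company's claim (null hypothesis)

x_bar = fmean(kernels)                         # 50.0
se = sigma / math.sqrt(len(kernels))           # 10 / sqrt(4) = 5.0
z = (x_bar - mu_0) / se                        # (50 - 15) / 5 = 7.0
p_two_sided = 2 * NormalDist().cdf(-abs(z))    # chance of |z| this large if mu really is 15

print(f"x_bar = {x_bar}, sd of mean = {se}, z = {z}")
print(f"two-sided p-value = {p_two_sided:.2e}")
```

A p-value this small is far below the usual 0.05 cutoff, which is the formal version of "about a zero chance."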
More examples
1. Jane needs to score in the top 2.5% of an aptitude test to qualify for a job. The mean of the test is 75 and the standard deviation is 10. How high a score does she need?
2. The standard deviation for the number of students attending a class is 5. Over five classes, 25, 35, 30, 26, and 20 students attend the class. What is the mean number of students attending for these five classes? What is the standard deviation associated with this mean?
3. Mia needs to figure out the mean number of hours children spend watching TV per week. She needs the standard deviation associated with the mean to be less than or equal to 2. The standard deviation for how many hours children watch TV is 6 hours per week. How many children does she need to survey?
4. Dave runs an experiment with 1241 subjects. Kate runs the same experiment (on the same population) with 476 subjects. What is Kate's standard deviation of the mean divided by Dave's standard deviation of the mean?
5. The weight of a seal has a standard deviation of 20 pounds. An animal trainer at Sea World is interested in how heavy the seals at the park are on average. Eight seals at the park are weighed and weigh 200, 220, 300, 175, 218, 315, 180, and 200 pounds.
What is the mean weight of the seals?
What is the 95% confidence interval for this mean?
What is the 99% confidence interval for this mean?
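Once the problems are worked by hand, a short script like this one can check the arithmetic. It is a sketch using Python's standard `statistics` module; `sd_of_mean` is a hypothetical helper name, and each problem is assumed to follow the known-sigma, normal-approximation procedure from the summary above.

```python
import math
from statistics import NormalDist, fmean

Z95 = NormalDist().inv_cdf(0.975)   # about 1.96, leaves 2.5% in each tail
Z99 = NormalDist().inv_cdf(0.995)   # about 2.576, leaves 0.5% in each tail

def sd_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

# 1. Score that cuts off the top 2.5% of a normal distribution with mean 75, sd 10
cutoff = NormalDist(75, 10).inv_cdf(0.975)

# 2. Mean attendance and its standard deviation (sigma = 5)
attendance = [25, 35, 30, 26, 20]
att_mean, att_sd = fmean(attendance), sd_of_mean(5, len(attendance))

# 3. Smallest N with 6 / sqrt(N) <= 2, i.e. N >= (6 / 2) ** 2
n_children = math.ceil((6 / 2) ** 2)

# 4. Kate's sd of the mean over Dave's; sigma cancels, leaving sqrt(1241 / 476)
ratio = sd_of_mean(1, 476) / sd_of_mean(1, 1241)

# 5. Seal weights (sigma = 20)
seals = [200, 220, 300, 175, 218, 315, 180, 200]
seal_mean, seal_se = fmean(seals), sd_of_mean(20, len(seals))
ci95 = (seal_mean - Z95 * seal_se, seal_mean + Z95 * seal_se)
ci99 = (seal_mean - Z99 * seal_se, seal_mean + Z99 * seal_se)

print(f"1. Jane needs about {cutoff:.1f}")
print(f"2. mean = {att_mean:.1f}, sd of mean = {att_sd:.2f}")
print(f"3. survey at least {n_children} children")
print(f"4. Kate / Dave = {ratio:.3f}")
print(f"5. mean = {seal_mean:.1f}, 95% CI = {ci95[0]:.1f} to {ci95[1]:.1f}, "
      f"99% CI = {ci99[0]:.1f} to {ci99[1]:.1f}")
```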