Hypothesis Testing
More Stats
- Confidence Intervals
- Margin of Error
- Hypothesis Testing
How many samples do I need?
- the plus or minus margin = z * (sigma / sqrt(n))
- n = (z * sigma / margin)^2
An Example: Sam's weight
- Assume sigma is 3 for this example.
- want to be accurate to 2 pounds (plus or minus 2) with 95% confidence
- z=1.96
- n = (1.96*3/2)^2 = 8.64, so round up to 9
- What if he only wants to be within 3 pounds? n = (1.96*3/3)^2 = 3.84, so 4
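A minimal Python sketch of the sample-size formula, assuming the population sigma is known. The function name `sample_size` is hypothetical. It uses the exact z from `statistics.NormalDist` rather than the rounded 1.96, and it always rounds up because a fractional subject is not possible.

```python
import math
from statistics import NormalDist

def sample_size(sigma, margin, conf=0.95):
    """Smallest n whose z-based margin of error is at most `margin` (sigma known)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)      # about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)   # n = (z * sigma / margin)^2, rounded up

print(sample_size(3, 2))  # Sam's weight, plus or minus 2 pounds
print(sample_size(3, 3))  # plus or minus 3 pounds
```

Because n grows with the square of z * sigma / margin, halving the margin roughly quadruples the number of observations needed.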
Properties of Confidence Intervals
- What is the relationship between
- 50% confidence interval
- 95% confidence interval
- 99% confidence interval
- Which is widest?
- Are they always symmetric?
- Confidence interval and sample size
- You can use your desired width of a confidence interval to decide how many observations you need.
How accurate is the confidence interval?
- If data are well collected
- The interval will contain the true mean with probability p, in the sense that a proportion p of intervals built this way from repeated samples contain it.
- For a 95% confidence interval, 95% of such intervals will contain the true mean
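The "95% of the time" statement can be checked by simulation. This sketch assumes a normal population with a made-up mean of 100 and sigma of 15 (hypothetical values), draws many samples of size 10, and counts how often the z-based interval around each sample mean contains the true mean.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=15.0, n=10, conf=0.95, trials=10_000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))   # one sample mean
        hits += (xbar - half_width <= mu <= xbar + half_width)  # did this interval catch mu?
    return hits / trials

print(coverage(conf=0.95), coverage(conf=0.50))
```

The individual intervals move around from sample to sample; it is the long-run proportion that matches the confidence level.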
Potential Problems
- Bad experiment or survey
- Data might come from a poor sample
- Bias will influence the mean of the sample
- Outliers in the data can distort the estimate of the mean
- The standard deviation of the population has to be known
- We will talk about what to do when the standard deviation is not known soon.
Hypothesis Testing
- Often we want to know whether our data reveal a reliable effect.
- We can test this possibility using the techniques we have described
- This process is called statistical inference or hypothesis testing
- Suppose Kellogg's claimed that there is an average of 250 raisins in each box of Raisin Bran.
- How could we test this claim?
Null Hypothesis
- Start by determining the null hypothesis
- The state of affairs against which we compare our data
- In our case
- H0: mu = 250
- H1: mu ≠ 250
- The alternative hypothesis (H1) is always stated relative to the null hypothesis.
One Tail vs Two Tail
- Two Tail
- Preferable
- H1 is simply different from H0
- One Tail
- Avoid unless you have a specific prediction
- More power (but don't cheat)
- must choose one tail before looking at the data, not after
- can miss big unexpected effects
P-value
- We can use the normal distribution to determine the probability of an observed outcome.
- Calculate the z-score of the observed mean relative to the sampling distribution with the mean in the null hypothesis.
- Find the probability of a point as extreme or more extreme than that (in either direction)
Calculating the p-value
- In our example:
- mu0 = 250
- z(263) = (263 - 250) / 2.6 = 5.00
- 2.6 is the standard error of the mean (from previous lecture)
- probability associated with a z-score of 5.00 is less than .0001
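The same calculation as a short Python sketch, using the numbers from the raisin example (H0 mean 250, observed mean 263, standard error 2.6). It doubles the upper-tail probability because an extreme result in either direction counts against H0.

```python
from statistics import NormalDist

mu0 = 250    # mean under H0 (raisins per box)
xbar = 263   # observed sample mean
se = 2.6     # standard deviation of the sampling distribution of the mean

z = (xbar - mu0) / se                     # 5.00
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed: either direction counts
print(f"z = {z:.2f}, p = {p:.2g}")
```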
Pick a significance level
- How sure do you want to be in your answer?
- The cutoff probability that expresses how sure you want to be is called the alpha level
- In science, the alpha level selected is usually .05
- If the probability of a result as extreme as or more extreme than the observed one (the p-value) is less than the alpha level, then reject the null hypothesis
In our example
- The probability associated with a mean of 263 is less than .0001.
- .0001 is smaller than .05
- Reject the null hypothesis
- That is, reject the statement that the population mean equals 250
Relationships to Confidence Intervals
- Another way to do the same test
- Form a confidence interval around the observed mean (a 95% interval for alpha = .05)
- If the mean from H0 is inside the interval, do not reject H0
- If the mean from H0 is outside the interval, reject H0
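A sketch of the confidence-interval version of the test, using the raisin numbers (observed mean 263, standard error 2.6). The helper `ci_test` is hypothetical; with a 95% interval it reaches the same decision as a two-tailed z-test at alpha = .05.

```python
from statistics import NormalDist

def ci_test(xbar, se, mu0, alpha=0.05):
    """Build a (1 - alpha) z interval around xbar; reject H0 if mu0 lies outside it."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo, hi = xbar - z * se, xbar + z * se
    return (lo, hi), not (lo <= mu0 <= hi)

interval, reject = ci_test(263, 2.6, 250)
print(f"95% CI: {interval[0]:.1f} to {interval[1]:.1f}, reject H0: {reject}")
# prints: 95% CI: 257.9 to 268.1, reject H0: True
```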
What about smaller alpha levels?
- Why do we use an alpha level of .05?
- There is a 5% chance that we will reject H0 when we should not have (Type I error - false positive).
- If we used an alpha level of .01 there would only be a 1% chance that would happen.
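- The alpha level of .05 is a common compromise between Type I and Type II errors
- Type II error - false negative (failing to reject H0 when it is false)
- Making a test too conservative (a very small alpha) raises the chance of a Type II error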
- The Power of a test is 1 minus the probability of a Type II error.
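A small sketch of power for a two-tailed z-test. The scenario (true mean 255, H0 mean 250, sigma 13, n = 25) is hypothetical and only for illustration. Under the true mean the z statistic is normal with mean `shift` and s.d. 1, so power is the probability it lands beyond either critical value; the Type II error rate is 1 minus that.

```python
from statistics import NormalDist

def power_two_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) for a two-tailed z-test when the true mean is mu_true."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Under the true mean, the z statistic is normal with mean `shift` and s.d. 1.
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return nd.cdf(-z_crit - shift) + (1 - nd.cdf(z_crit - shift))

# Hypothetical scenario: true mean 255, H0 mean 250, sigma 13, n = 25.
p05 = power_two_tailed(250, 255, 13, 25, alpha=0.05)
p01 = power_two_tailed(250, 255, 13, 25, alpha=0.01)
print(f"power at alpha .05: {p05:.2f}, at alpha .01: {p01:.2f}")
```

Lowering alpha from .05 to .01 reduces power, which is the trade-off described above. When the true mean equals the H0 mean, the "power" is just alpha.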
Demo on power and hypothesis testing.
Questions
The Frayed Nerves Anxiety Scale (FNAS) is used to measure the anxiety level of people on a scale from 0 to 1000. Four students were randomly chosen and given the FNAS before their second exam. Their scores were 700, 653, 740, and 707. The standard deviation sigma of the test is 40.
1. What is x bar?
2. What is the standard deviation of the mean?
3. What is the 95% confidence interval for x bar?
4. What is the 99% confidence interval for x bar?
5. How many students do you need to survey to get the plus or minus for the 95% confidence interval down to 20?
6. How about for the 99% confidence interval?
7. A TA for the class states that the mean stress level of the class is 750. The professor is not so sure. Take the TA's belief as the null hypothesis. Remember, only four subjects were run.
a. State the null and alternative hypotheses
b. Calculate the p-value for the null hypothesis.
c. Is it significant with alpha equal to .05?
d. Is it significant with alpha equal to .01?
More questions
How many days a year does it rain in Seattle? A local weather person states that the average number of days is 50. Take this as the null hypothesis. The alternative is that mu does not equal 50. Sigma is 12.
Looking at data from the last nine years, the mean is 55 rainy days per year. What is the 95% confidence interval for x bar? What is the p-value for the null hypothesis? Do you reject the null hypothesis with an alpha level of .05? Does the hypothesized mu fall within the 95% confidence interval?
What z-score is associated with a 95% confidence interval or .05 alpha level: z(95%)=1.96
What is the standard deviation associated with the sample mean: 12/sqrt(9)=4
What is the 95% confidence interval: 55 plus or minus 4*1.96 = 55 plus or minus 7.84, i.e., 47.16 to 62.84
How many z-scores apart is x-bar from the null hypothesis mu:
(55-50)/4=1.25
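Is the p-value more or less than the alpha level: more, so do not reject the null hypothesis (consistent with 50 lying inside the 95% confidence interval).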
To be exact:
plus or minus 1.25 covers 79% of the normal distribution, so the p-value is .21
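The Seattle answers can be reproduced with a short Python sketch, using only the numbers given in the question (H0 mean 50, sigma 12, nine years, observed mean 55).

```python
from statistics import NormalDist

mu0, sigma, n, xbar = 50, 12, 9, 55      # H0 mean, known sigma, nine years, observed mean
se = sigma / n ** 0.5                     # 12 / sqrt(9) = 4
z_crit = NormalDist().inv_cdf(0.975)      # about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)
z = (xbar - mu0) / se                     # 1.25
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value
reject = p < 0.05
print(f"CI {ci[0]:.2f} to {ci[1]:.2f}, z = {z:.2f}, p = {p:.3f}, reject H0: {reject}")
```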
T-tests
- Hypothesis testing
- T-tests for one sample
- Comparing means
- Comparing means from independent groups
Hypothesis Testing
- For one sample mean with known population standard deviation
- Set a null hypothesis (H0)
- Create an alternative hypothesis as well (H1)
- Collect a sample
- Find z-score of sample mean relative to mean in H0
- Find probability (p) associated with z-score
- If p < alpha, then reject the null hypothesis
- Otherwise, do not reject H0
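The steps above can be sketched as one Python function. The name `one_sample_z_test` is hypothetical, and the test is two-tailed. The example call uses the data from the power-line question later in these notes (sigma = 4, H0: mu = 15), for which it returns z of about 0.866 and p of about .39.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 when the population sigma is known."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # z-score of the sample mean under H0
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # probability of a result at least this extreme
    return z, p, p < alpha                      # reject H0 only if p < alpha

z, p, reject = one_sample_z_test([19, 15, 17], mu0=15, sigma=4)
print(f"z = {z:.3f}, p = {p:.2f}, reject H0: {reject}")
```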
Extension
- What if the standard deviation is not known?
- In any sample, you have an estimate of the population standard deviation
- The sample standard deviation!
- Maybe you could just use the sample standard deviation and follow the same procedure as before.
- This will not work straightforwardly.
Standard Error of the Mean
- If we estimate the standard deviation of the sampling distribution of the mean using the data, we are using the standard error of the mean
- Some people call both of these the standard error, so don't get confused: just concentrate on what you know and what you need to estimate, and you will always do the right thing.
- It looks like the statistic we have been using.
- standard deviation of the sampling mean: sigma/sqrt(n)
- standard error of the mean: s/sqrt(n)
z and t distributions
- Recall:
- z = (x bar - mu)/(sigma/sqrt(n))
- if we don't know the population standard deviation: t = (x bar - mu)/(s/sqrt(n))
t and z distributions
- They are not the same.
- t is actually a whole family of distributions (no standard t)
How is t related to z (normal distribution)?
- same in the limit (df=infinity)
- more uncertainty for lower df
- t is different for smaller degrees of freedom
- use t for normal populations of unknown sigma.
- so, you need to use a t table instead of a z table!
N(0,1) and t(5)
Degrees of Freedom
- The shape of the t distribution changes with the sample size
- Reflects the extra uncertainty from estimating sigma: s is computed from deviations around the sample mean, which must sum to zero, so only n - 1 of them are free to vary
- Specific curve is identified by the number of degrees of freedom
- As the sample size gets larger, the t distribution looks more like the Normal distribution.
- The family of curves appears in tables in the book.
Demo on comparing t and normal distribution
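A rough stand-in for the demo: the sketch below prints the two-tailed 95% critical t value for several degrees of freedom next to z = 1.96. Python's standard library has no t distribution, so the hypothetical helpers `t_pdf`, `t_cdf`, and `t_crit` integrate the t density numerically (Simpson's rule) and search by bisection; in practice you would read a t table or call a library routine such as `scipy.stats.t`.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(conf, df):
    """Two-tailed critical value t with P(-t <= T <= t) = conf, found by bisection."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z95 = NormalDist().inv_cdf(0.975)   # 1.96
for df in (1, 2, 3, 5, 10, 30, 100):
    print(f"df = {df:3d}: t = {t_crit(0.95, df):.3f}   (z = {z95:.3f})")
```

The t values shrink toward 1.96 as the degrees of freedom grow, matching t(95%, df=3) = 3.18 used in the example below.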
The logic of the one sample t-test
- Same as the tests using the z distribution
- Population standard deviation is unknown
- Use standard error of the mean
- Use the t distribution
- N-1 degrees of freedom
- Mean must be estimated to know standard error
Building a confidence interval with t's
- Same logic as with the normal distribution
- just use t instead of z
- use t when sigma is unknown and must be estimated
An Example - find the 95% confidence interval for x bar.
- Four music fans from Boston are asked to rate how much they like Helium on a 1-123 scale.
- The four ratings are 98, 45, 78, and 67.
- What is x bar? (98 + 45 + 78 + 67)/4 = 72
- What is the standard error of x bar?
- sqrt(var(98,45,78,67))/sqrt(4)=11.1
- What t value should we use?
- degrees of freedom equals 3
- t(95%,df=3)=3.18
- The confidence interval is 72 ± 3.18*11.1 = 72 ± 35.3, i.e., 36.7 to 107.3
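A Python sketch of the same interval. The t helpers are the same hypothetical numerical approximation described for the t demo (Simpson's rule plus bisection), standing in for a t table. Without rounding intermediate values the half-width comes out near 35.2 rather than 35.3.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(conf, df):
    """Two-tailed critical value t with P(-t <= T <= t) = conf, found by bisection."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

ratings = [98, 45, 78, 67]
n = len(ratings)
xbar = statistics.fmean(ratings)                # 72
se = statistics.stdev(ratings) / math.sqrt(n)   # sample s.d. (n - 1 divisor) over sqrt(n)
half_width = t_crit(0.95, n - 1) * se           # t with df = 3, about 3.18
print(f"{xbar:.1f} +/- {half_width:.1f}")
```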
A one sample t-test
- If a morning class drinks caffeine, they will average an 80 on the final exam.
- H0: mu = 80
- H1: mu ≠ 80
- mean = 86
- s.d. = 5.4
- 32 students taking the exam
- standard error of the mean = .955
- t (31) = (86 - 80)/.955 = 6.285
- p < .001, so reject H0
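A sketch of the one sample t-test from summary statistics (mean 86, s.d. 5.4, 32 students, H0 mean 80). The `t_pdf`/`t_cdf` helpers are a hypothetical numerical approximation (Simpson's rule) standing in for a t table or a library routine such as `scipy.stats.t`.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mu0, xbar, s, n = 80, 86, 5.4, 32
se = s / math.sqrt(n)               # standard error of the mean, about .955
t = (xbar - mu0) / se               # about 6.285
df = n - 1                          # the mean is estimated, so one df is used up
p = 2 * (1 - t_cdf(abs(t), df))     # two-tailed p-value
print(f"t({df}) = {t:.3f}, p = {p:.1e}")
```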
Questions
The Frayed Nerves Anxiety Scale (FNAS) is used to measure the anxiety level of people on a scale from 0 to 1000. Four students were randomly chosen and given the FNAS before their second exam. Their scores were 700, 653, 740, and 707. We don't know what sigma is.
1. What is x bar?
2. What is the standard error of the mean?
3. What is the 95% confidence interval for x bar?
4. What is the 99% confidence interval for x bar?
5. What is the 80% confidence interval for x bar?
6. A TA for the class states that the mean stress level of the class is 750. The professor is not so sure. Take the TA's belief as the null hypothesis. Remember, only four subjects were run.
a. State the null and alternative hypotheses
b. Is it significant with alpha equal to .05?
c. Is it significant with alpha equal to .01?
More t-tests.
- You can also test for differences (paired t-test)
- What if I have two sets of observations from the same group?
- Pretest/Posttest
- I might want to know if they differ significantly
- Differences might be due to chance
- The null hypothesis for this test
- H0: mu1 - mu2 = 0
- H1: mu1 - mu2 ≠ 0
- Work with the difference scores: this is a one sample t-test on the differences, using their standard deviation
Again, calculate the statistics
Testing and Caffeine
- Each person in a class takes two exams
- One after drinking regular coffee
- One after drinking decaf
- H0: mu_coffee - mu_decaf = 0
- Mean (regular) = 86
- Mean (decaf) = 83
- s.d. of the differences = 6.42
- 32 students
- t(31) = (86 - 83) / (6.42/sqrt(32)) = 2.64
- p < .05
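A sketch of the paired test as a one sample t-test on the difference scores, using the summary numbers above (mean difference 86 - 83, s.d. of the differences 6.42, 32 students). The t helpers are the same hypothetical Simpson's-rule approximation, not a library API.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mean_diff = 86 - 83     # regular minus decaf, same students
s_diff = 6.42           # s.d. of the 32 difference scores
n = 32
t = mean_diff / (s_diff / math.sqrt(n))   # H0: mean difference = 0
p = 2 * (1 - t_cdf(abs(t), n - 1))        # two-tailed, df = n - 1
print(f"t({n - 1}) = {t:.2f}, p = {p:.3f}")
```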
So, we now know one sample t-tests
- We can answer these questions:
- Is x bar different from zero (paired tests)?
- Is x bar different from some null hypothesis?
- We can also make confidence intervals.
What if we want to see if two samples are different?
The great questions of our time:
- Are boys smarter than girls?
- Drug A or drug B?
What's the null hypothesis here?
What to do with two groups
- Things get more complicated with two groups
- Must assume that the standard deviation is the same for both groups
- Other procedures can drop this assumption; see the book or me if interested
- Standard deviation is found by pooling: pooled var = ((n1 - 1)*s1^2 + (n2 - 1)*s2^2)/(n1 + n2 - 2)
Calculating the t statistic
- Given the pooled standard deviation s_p, t = (x bar1 - x bar2)/(s_p*sqrt(1/n1 + 1/n2))
- Degrees of freedom = n1 + n2 - 2, reflecting that two means must be estimated.
Here's an experiment
- Half a morning class is given regular coffee and the other half is given decaf for a semester.
- H0: mu_regular = mu_decaf
- H1: mu_regular ≠ mu_decaf
- Mean (regular) = 87
- Mean (decaf) = 82
- var (regular) = 6.5, n = 16
- var(decaf) = 7.1, n = 16
- pooled var. = 6.8
- pooled sd=2.61
- t(30)=(87-82)/(2.61*sqrt(1/16+1/16))=5.42
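A sketch of the pooled two-sample t-test from summary statistics, using the coffee/decaf numbers above. The function `pooled_t` and the t helpers are hypothetical; the helpers approximate the t distribution numerically in place of a t table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t-test assuming equal population variances (two-tailed)."""
    df = n1 + n2 - 2                                          # two means estimated
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return t, df, 2 * (1 - t_cdf(abs(t), df))

t, df, p = pooled_t(87, 6.5, 16, 82, 7.1, 16)
print(f"t({df}) = {t:.2f}, p = {p:.1e}")
```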
t test is robust
- Test assumes that variances are the same
- Even if the variances are not the same, the test still works pretty well
- Test assumes data are drawn from a normally distributed population
- Even if the population is not normally distributed, the test still works pretty well.
- Of course, there are limits
Questions
A public health advocate believes children growing up by power lines get sick 15 days per year. Take sigma for the number of days to be 4. A sample of three children is gathered, with 19, 15, and 17 sick days. Do you reject the null hypothesis?
x bar is 17.
standard deviation of the mean is 4/sqrt(3) = 2.31
z score = (17 - 15)/2.31 = .866
p value = .39, do not reject with alpha of .05
Do the same problem, but now sigma is unknown.
variance = 4
s = 2
standard error of the mean = 2/sqrt(3) = 1.15
t(2) = (17 - 15)/1.15 = 1.74
p value = .22, do not reject with alpha of .05
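A sketch of the t version from the raw data, with the same hypothetical numerical t helpers. Carrying full precision gives t of about 1.73 and p of about .23; the 1.74 and .22 above come from rounding the standard error to 1.15. Either way the decision at alpha = .05 is the same.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

days = [19, 15, 17]
mu0 = 15
n = len(days)
xbar = statistics.fmean(days)      # 17
s = statistics.stdev(days)         # sample s.d. = 2 (variance 4)
se = s / math.sqrt(n)              # about 1.15
t = (xbar - mu0) / se
p = 2 * (1 - t_cdf(abs(t), n - 1))
print(f"t({n - 1}) = {t:.2f}, p = {p:.2f}, reject H0: {p < 0.05}")
```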
Another researcher wants to compare the effect on boys and girls. She gets a sample of three girls, 10, 15, 20, as well as a sample of boys, 14, 20, 26. Are boys and girls affected differently by the power lines?
x bar girls = 15
x bar boys = 20
variance for girls = 25
variance for boys = 36
pooled variance = (2*25+2*36)/4 = 30.5
pooled standard deviation = 5.52
t(4) = (15 - 20) / (5.52*sqrt(1/3 + 1/3)) = -1.11
p value = .33, do not reject with alpha of .05
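A sketch of the boys-versus-girls comparison computed from the raw data with the pooled-variance formula. The t helpers are the same hypothetical numerical approximation used earlier. At full precision this gives t of about -1.11 and a two-tailed p of about .33, so H0 is not rejected at alpha = .05.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    s += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

girls = [10, 15, 20]
boys = [14, 20, 26]
n1, n2 = len(girls), len(boys)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * statistics.variance(girls)
              + (n2 - 1) * statistics.variance(boys)) / df    # (2*25 + 2*36)/4 = 30.5
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (statistics.fmean(girls) - statistics.fmean(boys)) / se
p = 2 * (1 - t_cdf(abs(t), df))
print(f"t({df}) = {t:.2f}, p = {p:.2f}")
```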
More questions
An American and an Englishman get in an argument over who is more boring: Canadians or Belgians (subsequently a Belgian and a Canadian get in an argument over the relative intelligence levels of the English and the Americans, but that is a different story). To resolve the debate, the American and the Englishman collect some data on Canadians and Belgians using the North Atlantic Boredom Scale (NABS). A high rating indicates a boring person.
The data for four Canadians are: 25, 5, 24, 14.
The data for four Belgians are: 31, 41, 22, 42
a. State the null and alternative hypotheses.
b. Do Canadians and Belgians differ significantly in their level of boringness (use alpha at .05)?
Online tutorials when you need help.