Scientific Method and Basic Experimental Design
Scientific method
- Observe, hypothesize, experiment, evaluate.
- What are the skills you learn in this course good for?
- Scientific method as an attitude
- Evaluating evidence
- Approaching problems
- "The whole of science is nothing more than a refinement of everyday thinking." - Albert Einstein
Scientific explanations are:
- Empirical (based on observation)
- Predictive (vs. post-hoc)
- Testable (can be falsified)
- Tentative (Aristotle -> Newton -> Einstein -> ?)
- Rigorously evaluated (replication, competing theories)
- Parsimonious and general (simple and powerful)
- Objective (can be replicated)
- Rational (explanations follow from observations)
Scientific thought
- Assumption: Nature is structured by laws that govern its operation
- To manage technology and human decision-making
- We want to be able to predict and control behavior.
- We need to understand laws of behavior.
- Scientific inquiry seems to offer the best possibility.
Other ways of approaching problems:
- Method of authority
- Commonsense explanations
- Rational method
Background
- Pre-modern era: "age of faith" or "age of superstition" - knowledge was based on
- The authority of the Church and philosophers like Aristotle
- Commonsense explanations
- Modern era: "age of reason" or "age of science" - birth of modern science by
- Francis Bacon (1561-1626): Empiricism
- Rene Descartes (1596-1650): Rational method
Method of authority
- Knowledge is based on a trusted authority.
- You believe something an authority figure says.
"The method of authority is wrong because I said so."
Commonsense explanations
- Belief-based
- Based on observation
- Superstitions
- Not systematic, not controlled, not experimental
- "Common sense is the collection of prejudices acquired by age eighteen." - Albert Einstein
- Of course, laypeople understand behavior to the extent that they can coexist with other people.
- They have principles of behavior, but these commonsense principles are subjective - depend on people's perspectives - and only work locally.
- For example, "Absence makes the heart grow fonder" but "out of sight, out of mind."
- Leaning Tower of Pisa
- Aristotle: According to commonsense logic, heavier objects fall faster than lighter objects.
- Galileo: According to experiments, objects of different weights fall at the same rate (the same acceleration in the absence of air resistance) - law of gravity derived through observation.
- Acquiring new language
- Kids are better because they are not afraid of making mistakes.
- Plasticity of the brain.
Rational method
- Rene Descartes (1596-1650): Rationalism.
- You draw inferences based on some established truths.
- The conclusion is true if it follows from the premises - deduction.
- All horses are mammals (premise 1)
- All mammals are animals (premise 2)
- -----
- All horses are animals (conclusion)
- Not exactly how science works.
Valid and true deductive arguments
- A valid deductive argument - if the premises are true, then the conclusion must be true.
- Validity and truth are separate concepts.
- Valid and true
- All horses are mammals (premise 1)
- All mammals are animals (premise 2)
- -----
- All horses are animals (conclusion)
- Valid but not true
- All horses are mammals (premise 1)
- All mammals are trees (premise 2)
- -----
- All horses are trees (conclusion)
Wason selection task
- It is difficult to decide what information is needed to test whether an abstract logical rule holds.
- Rule: If there is a vowel on one side, there is an odd number on the other side.
- Which two cards do you have to turn over to verify the rule?
- [ A ] [ 3 ] [ J ] [ 4 ]
- This task becomes easier when the rule becomes familiar.
- Rule: If you have a beer, you are over 21.
- Which conditions do you need to check to verify the rule?
- [ BEER ] [ 22 ] [ NO BEER ] [ 19 ]
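The vowel/odd-number version of the task can be checked mechanically. The following is a minimal sketch in Python, not part of the original task materials; the helper name `must_turn` is hypothetical. It encodes one idea: a card needs to be turned over only if its hidden side could falsify the rule.

```python
def must_turn(visible: str) -> bool:
    """Return True if the hidden side of this card could falsify the rule
    'if there is a vowel on one side, there is an odd number on the other'."""
    if visible.isalpha():
        # A letter card can break the rule only if it shows a vowel,
        # because the hidden number might then be even.
        return visible.upper() in "AEIOU"
    # A number card can break the rule only if it shows an even number,
    # because the hidden letter might then be a vowel.
    return int(visible) % 2 == 0


cards = ["A", "3", "J", "4"]
to_turn = [card for card in cards if must_turn(card)]
print(to_turn)  # ['A', '4']
```

The odd card (3) is the tempting but uninformative choice: whatever is on its other side, the rule survives. The same logic applied to the drinking rule selects BEER and 19.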
Valid and invalid arguments
- Valid arguments.
- Affirming the antecedent (modus ponens)
- P -> Q (if rain, wet)
- P (rain)
- -----
- Q (wet)
- Denying the consequent (modus tollens)
- P -> Q (if rain, wet)
- Not Q (not wet)
- -----
- Not P (not rain)
- Invalid arguments.
- Affirming the consequent
- P -> Q (if rain, wet)
- Q (wet)
- -----
- P (rain)
- Denying the antecedent
- P -> Q (if rain, wet)
- Not P (not rain)
- -----
- Not Q (not wet)
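The four argument forms above can be checked by brute force over truth values. This is an illustrative sketch in Python (the function names `implies` and `is_valid` are made up for this example): a form is valid if no assignment of true/false to P and Q makes all premises true while the conclusion is false.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional P -> Q."""
    return (not p) or q


def is_valid(premises, conclusion) -> bool:
    """Valid means: no truth assignment makes every premise true
    and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False  # found a counterexample
    return True


rule = implies  # P -> Q (if rain, wet)

forms = {
    "affirming the antecedent": is_valid([rule, lambda p, q: p], lambda p, q: q),
    "denying the consequent": is_valid([rule, lambda p, q: not q], lambda p, q: not p),
    "affirming the consequent": is_valid([rule, lambda p, q: q], lambda p, q: p),
    "denying the antecedent": is_valid([rule, lambda p, q: not p], lambda p, q: not q),
}

for name, valid in forms.items():
    print(f"{name}: {'valid' if valid else 'invalid'}")
```

The counterexample for both invalid forms is the same case: no rain, but the ground is wet anyway (P false, Q true).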
Circular reasoning
- Assuming something to prove the very thing that you assumed.
- X shows aggressive behavior because X has aggressive instinct.
- How do you know X has aggressive instinct?
- Because X shows aggressive behavior!
- Catch-22.
- Combat flight - sanity is a prerequisite to discovering one is insane.
- You got flies in your eyes.
- Job search - need work experience to get a job, but need a job to gain work experience.
- The chicken or the egg.
- Depression and negative thought.
- More jobs and more consumption.
Modern scientific method = empirical induction + rational deduction
- Francis Bacon (1561-1626): Empiricist approach to science.
- Science = direct observation (empiricism) + inductive logic (deriving generalities from particulars)
- Observe, experiment, and apply induction to the results.
- Knowledge should be extracted from observation, not based on authority.
- Observation will avoid basing knowledge on ungrounded assumptions.
- Induction lacks circularity of deduction.
- Empirical method gives "objectivity" in science.
- Empirical approach is tentative, but rigorous application will lead to knowledge.
- What is wrong with rational method?
- According to Bacon, deduction does not lead to new knowledge because
- conclusions are always contained or implied in the premises
- logically valid conclusions may not be true
- Induction does not contain conclusions in the premises and allows for prediction.
- What is wrong with empirical method?
- Descartes did not trust our senses and observations.
Of course, scientific method does not always get it right.
- As a matter of fact, scientific method is always wrong; that is the point.
- "No amount of experimentation can ever prove me right; a single experiment can prove me wrong." - Albert Einstein
- Scientific method is just more right than anything else.
Observations
- Play a major role in scientific method and the advancement of theories.
- Issues associated with observations.
- Observations are based on theories.
- Does your observation include key pieces of information?
- Can you generalize from your observation?
- Not everything is observable.
- People may not behave naturally when observed.
- Your expectation may bias not only people's behavior but also your measure.
- We deal with these issues by
- Minimizing interference - Demand characteristics, reactivity.
- Maximizing precision - Precision and validity.
- Maximizing objectivity - Expectancy and reliability.
Minimizing interference
- Reactivity - Influence of the observer on the situation
- People are affected by being observed
- Fear of being evaluated
- Increased arousal at being evaluated
- Mere presence of the observer rather than some action taken by observer
- Participants will try to figure out what you "want" them to do or say - demand characteristics
- From the newspaper:
- In a recent survey, 90% of people said they give to the homeless. A government official said, "That doesn't mean that 90% of people give to the homeless, it means that 90% of people were afraid to tell a stranger that they don't care about the homeless."
Maximizing precision/reliability
- An apparatus for measurement should be as accurate as possible relative to what you want to measure.
- Accuracy
- Correctness or validity of your measure.
- How close your measure is to the true value.
- There is a point of diminishing return.
- Do the results replicate under similar circumstances?
- Precision
- How closely your measure can be replicated.
- A weight scale + or - 2 pounds
- Bush over Gore 51 to 46 + or - 2 points
- When precision is high, you get the same result when you replicate.
- There is often a tradeoff between precision and ease of measurement.
- Your operational definitions should be specific, otherwise precision and reliability will be low.
Accuracy and precision: Suppose true value = 10, measure 5 times
- [ 15, 15, 15, 15, 15 ] -> low accuracy, high precision
- [ 8, 9, 10, 11, 12 ] -> high accuracy, low precision
- [ 10, 10, 10, 10, 10 ] -> high accuracy, high precision
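One way to put numbers on the three cases above, sketched in Python. The choice of summary statistics is an assumption of this example rather than something the notes prescribe: accuracy is summarized as bias (mean minus true value) and precision as the spread (standard deviation) of the repeated measurements.

```python
from statistics import mean, pstdev

TRUE_VALUE = 10

samples = {
    "low accuracy, high precision": [15, 15, 15, 15, 15],
    "high accuracy, low precision": [8, 9, 10, 11, 12],
    "high accuracy, high precision": [10, 10, 10, 10, 10],
}


def summarize(measurements, true_value=TRUE_VALUE):
    """Return (bias, spread) for a set of repeated measurements."""
    bias = mean(measurements) - true_value  # accuracy: how far the average is from the truth
    spread = pstdev(measurements)           # precision: how much repeated measurements scatter
    return bias, spread


for label, values in samples.items():
    bias, spread = summarize(values)
    print(f"{label}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

The first set has a bias of 5 with zero spread, the second has zero bias with a spread of about 1.41, and the third has neither.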
Operational definitions
- Exact (clear and detailed) descriptions of how to measure (obtain values for) your constructs.
- To make precise measurements, you must know what you are measuring.
- Remove ambiguity by ensuring researchers have the same understanding.
- Constructs are
- Abstract factors (e.g., conflict, anxiety, performance improvement).
- Inferred from observations.
- Made concrete via operational definitions.
Validity
- Operational definitions must be clear and precise.
- Milgram defined "disobedience" - A quantitative value based on the maximum intensity of shock administered before refusing to participate further.
- Our definitions must match what we think we are testing.
- Downhill skiing: Time
- Ski jumping: Distance
- Freestyle skiing: Aesthetic value
- External validity - can your findings be generalized to the population at large?
- Are your participants representative?
- Is there a point to your research?
Types of Validity (external/ecological validity)
- Face validity - asking math questions to assess math ability.
- Content validity - using representative material, such as typing test for typist.
- Criterion-related validity - correlating SAT scores and grades - can one infer a value of some other measure?
- Construct validity - does the measure really capture the underlying construct? Measuring intelligence by feeling for lumps on the head is an example of a measure that lacks it.
Practice with operational definitions
- People who live in clean houses get sick less often than people who live in dirty ones.
- Clean house? Dirty house?
- Sick?
- People with high mathematical ability can play chess better than people with low mathematical ability.
- Physically attractive individuals make better salespeople than physically unattractive individuals.
Problems with dependent measures
- Range effects
- Scores crammed to the top or bottom of the scale.
- When everyone gets very high scores - Ceiling effects.
- When everyone gets very low scores - Floor effects.
- This is a problem because you will not see any difference between groups when a ceiling or floor effect occurs.
- Sensitivity
- Refers to the fineness of your dependent measure.
- When measuring fear
- Screaming may not be a sensitive measure because people may not scream in response to fear.
- Blood pressure will be more sensitive.
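The ceiling effect described above can be illustrated with a small Python sketch. All numbers are made up for illustration: two groups differ in underlying performance, but a scale that tops out too low records the same score for everyone.

```python
from statistics import mean


def observed(true_scores, max_score):
    """Scores as recorded by a measure that cannot register more than max_score."""
    return [min(score, max_score) for score in true_scores]


# Hypothetical underlying performance for two groups (invented values).
control = [11, 12, 13, 14, 15]
treatment = [16, 17, 18, 19, 20]

differences = {}
for max_score in (10, 25):
    diff = mean(observed(treatment, max_score)) - mean(observed(control, max_score))
    differences[max_score] = diff
    print(f"scale maximum {max_score}: group difference = {diff}")
```

With a maximum of 10 every participant scores 10 and the group difference is 0; with a maximum of 25 the real difference of 5 shows up. A floor effect is the mirror image, with `max` and a minimum score in place of `min` and a maximum.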
Maximizing objectivity
- There is always a danger of observer bias
- Expectancy effects - the influences of an experimenter's expectation on the outcome of an experiment
- Bias can affect the observer's actions.
- The experimenter's actions may influence the outcome of an experiment.
- Clever Hans (The Horse of Mr. von Osten) - Expectation can lead to subtle and unintentional cueing.
- The clever horse could work with numbers.
- Only when the experimenter was visible and knew the answer...
- Tension release of the experimenter was an unintentional visual cue for the horse to stop tapping.
- Bias can affect the observer's observations.
- The experimenter may be more sensitive to behaviors that support the hypothesis than to those that disconfirm it.
- Observers involved in a situation may become emotionally involved.
- Ensuring objectivity.
- Naive raters who do not know the point of the study.
- Blind raters who do not know the experimental condition of the data they are scoring.
- Double blind - neither participants nor experimenter know the condition - e.g., using a placebo.
- Subjective measures require a check on inter-rater reliability.
Collecting useful information
- You cannot observe everything.
- Observation requires losing information.
- Some measurements preserve a lot of information.
- Video-tape recordings.
- Some measurements lose a lot of information.
- Response and latency measurements.
- How can we deal with this?
- You need to decide exactly what kind of behavior you will observe.
- You need some method for sampling behavior.
- Sampling influences representativeness of observations, external validity, and quality of the data.
Classes of measurements
- Narrative records
- Videotape, audiotape, written descriptions.
- Requires massive data reduction.
- Coding of behavior
- What events should be recorded?
- Must define the event.
- Frequency method: How often does a particular behavior occur?
- Intervals method: Only look at certain set times.
- Duration method: How long does a particular behavior last?
- Sampling
- Time sampling
- Select particular times to observe.
- Random times - studies with beepers - how are you feeling now?
- Fixed times - diary studies - entries written at a fixed time of day.
- Strength: may capture a range of situations.
- Weakness: may miss infrequent events.
- Event sampling
- Observe a specific event every time it occurs.
- When an event is rare, time sampling may not work.
- Strength: can capture rare events.
- Weakness: is the event easily defined?
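The frequency, duration, and intervals methods listed above can be compared on the same record. This Python sketch uses a hypothetical 60-second observation session with invented episode times, and it reads the intervals method as "did the behavior occur at all during each fixed interval", which is one common interpretation rather than the only one.

```python
# Hypothetical record of one behavior: (start_second, end_second) per occurrence.
episodes = [(3, 8), (20, 22), (24, 29)]
SESSION_LENGTH = 60  # seconds observed (assumed)
INTERVAL = 15        # seconds per observation interval (assumed)

# Frequency method: how often did the behavior occur?
frequency = len(episodes)

# Duration method: how long did the behavior last in total?
total_duration = sum(end - start for start, end in episodes)


# Intervals method: for each fixed interval, did the behavior occur at all?
def occurred_in(interval_start, interval_end):
    return any(start < interval_end and end > interval_start
               for start, end in episodes)


intervals = [occurred_in(t, t + INTERVAL)
             for t in range(0, SESSION_LENGTH, INTERVAL)]

print(frequency)       # 3
print(total_duration)  # 12
print(intervals)       # [True, True, False, False]
```

Note how the second interval contains two separate episodes but is coded only once. Interval coding is cheaper to score but loses that information.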
Techniques for coding data
- Checklists - Limits and checks for particular behaviors
- Binary checklists (happened/didn't happen)
- Frequency and duration
- Objective as long as behaviors are well defined.
- Ratings (How strong was behavior)
- These measures are more subjective.
- These measures require checks of inter-rater reliability
Reliability
- Rater reliability is the extent to which ratings can be reliably replicated.
- Does your rating at time t1 agree with your rating at time t2?
- Does your rating agree with someone else's rating?
- Inter-rater reliability is the extent to which two or more coders agree on their ratings.
- Addresses the consistency of the implementation of a rating system.
- Most open-ended measures require checks of inter-rater reliability.
- Requires multiple raters.
- Important even if naive or blind raters are used.
- Proportion of agreement
- (# of agreements)/(# of possible agreements).
- Low reliability (less than 85%) means the operational definitions are not specific enough.
- There are other measures as well.
- Cohen's kappa - Corrects the agreement between two raters for the agreement expected by chance.
- Fleiss' kappa - works for any fixed number of raters.
- Intra-class correlation coefficient.
- Confidence limits around the mean of the differences between two raters.
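Proportion of agreement and Cohen's kappa can be computed by hand for two raters. The sketch below is in Python with invented ratings; the function names are illustrative. Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies.

```python
from collections import Counter


def proportion_agreement(rater_a, rater_b):
    """(# of agreements) / (# of possible agreements)."""
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.
    Undefined (division by zero) if chance agreement is exactly 1."""
    n = len(rater_a)
    p_observed = proportion_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own base rates.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes from two raters scoring the same ten video segments.
rater_a = ["bored", "bored", "alert", "alert", "bored",
           "alert", "alert", "alert", "bored", "alert"]
rater_b = ["bored", "alert", "alert", "alert", "bored",
           "alert", "alert", "bored", "bored", "alert"]

agreement = proportion_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"proportion of agreement = {agreement:.2f}")
print(f"Cohen's kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 segments (0.80), but because chance alone would produce 0.52 agreement, kappa is only about 0.58. This is why raw agreement can overstate reliability when one category dominates.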
An imaginary study
- Ratings of boredom
- A videotape of a class
- What counts as boredom?
- The number of times students:
- put head on desk.
- look away from screen.
- yawn more than three times in one minute.
- check their watch more than three times in one minute.
- Are these measures reliable?
- Are they really measures of boredom?
Measurement scales
- Nominal scale
- No ordering of values
- e.g., Male or female
- Ordinal scale
- Can infer ordering of values
- e.g., low, medium, and high self-esteem
- Interval scale
- Can infer ordering of values
- The values are evenly spaced
- e.g., Celsius scale, intelligence tests
- Ratio scale
- Same as interval scale, but zero is special
- Can form meaningful ratios (e.g., twice as heavy)
- e.g., Weight, Kelvin scale (absolute zero)
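The difference between interval and ratio scales can be seen by converting the same two temperatures from Celsius to Kelvin. This is a small illustrative sketch in Python with arbitrarily chosen temperatures: differences survive the conversion, ratios do not.

```python
def celsius_to_kelvin(celsius: float) -> float:
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # arbitrary example temperatures

# Differences are meaningful on an interval scale: same gap in either unit.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are meaningful only when zero is a true zero (Kelvin, not Celsius).
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 is an artifact of where zero happens to sit; on the Kelvin scale, 20 C is only about 3.5% warmer than 10 C.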
Pilot study
- Pre-study with a few participants.
- Exploratory.
- You can check whether your hypothesis is reasonable, your operational definitions are working, and so on.
Manipulation check
- Did the manipulation actually change what it was intended to change?
- Are your results consistent with previous results?
- Sanity check.