Factor Analysis
Abstraction
- Multivariate data - scores for multiple manifest, or measured, variables.
- A basic premise of factor analysis is that the data are not completely random but rather have systematic aspects that can be identified.
- Simplify data for better understanding of the structure underlying the data.
Theory and method
- Theory: Statistical models specifying a structure underlying the data.
- Methodology: Computational procedures for analyzing data and revealing the underlying structures.
Some definitions
- Domain: Set of phenomena of interest in a particular research project, e.g., mental abilities, personality traits.
- Manifest variable: Variable that can be directly measured (measured variable) selected from the domain under investigation.
- Latent variable: Variable that cannot be directly observed or measured - hypothetical construct - a latent variable is a factor in factor analysis.
Observation
- Variation of each variable over individuals (variance or sd).
- Covariation in a pair of variables over individuals (correlation coefficient).
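Both kinds of observation can be computed directly from the scores. A minimal sketch in plain Python; the two variables and their scores are made-up illustrative data, not taken from any study:

```python
from math import sqrt


def variance(xs):
    """Sample variance of one variable over individuals (divides by n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient between two variables."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / sqrt(variance(xs) * variance(ys))


# Hypothetical scores of five individuals on two manifest variables.
arithmetic = [2.0, 4.0, 6.0, 8.0, 10.0]
algebra = [1.0, 3.0, 2.0, 5.0, 4.0]

print("variance:", variance(arithmetic))
print("sd:", sqrt(variance(arithmetic)))
print("correlation:", correlation(arithmetic, algebra))
```

With these made-up scores the variance of the first variable is 10 and the correlation between the two is 0.8.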
The objective of factor analysis
- is to uncover and understand the structure that produces the covariances (correlations) in our data.
- For a large number of variables, the pattern of correlations may appear chaotic and difficult to describe or understand.
- To understand the pattern of relationships among the MVs, we could try to describe the interrelationships between manifest variables in terms of the entries in the correlation matrix.
- If the number of MVs (p) is large, the number of distinct correlations, p(p-1)/2, is too large to understand fully - if p = 20 the number of distinct correlations is 190.
- A general principle of factor analysis is that these correlations are structured and can be explained in a simple way.
- There exists a small number of factors that influence MVs and thereby produce the variances and covariances (correlations) between MVs.
- For example, variation on a test of arithmetic skills (MV) is attributable in part to variation in underlying mathematical ability (factor).
- is to identify the number and nature of the factors that produce the observed covariation and variation in the MVs.
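To see how quickly the correlation matrix outgrows direct inspection, the count p(p-1)/2 can be tabulated for a few values of p. A small sketch; the function name is just an illustrative choice:

```python
def n_distinct_correlations(p: int) -> int:
    """Number of distinct off-diagonal entries in a p x p correlation matrix."""
    return p * (p - 1) // 2


for p in (5, 10, 20, 50):
    print(p, n_distinct_correlations(p))
```

For p = 20 this gives the 190 correlations mentioned above; for p = 50 it is already 1225.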
Procedure
- Specify model (matrix, diagram).
- Fit model to data (estimate free parameters that minimize errors).
- Evaluate fit (estimates of free parameters, overall goodness of fit).
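The three steps can be sketched for the simplest case, a single common factor. This is only an illustration under stated assumptions: the correlation matrix is hypothetical (built from assumed loadings so that a one-factor model holds exactly), iterated principal-axis factoring is used as one of several possible estimation methods, and the root mean square residual is used as one simple fit index. The function names are illustrative, not from a particular package.

```python
import numpy as np

# Step 1 - specify the model: one common factor, four manifest variables.
# Hypothetical correlation matrix built from assumed loadings (illustrative only).
assumed_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(assumed_loadings, assumed_loadings)
np.fill_diagonal(R, 1.0)


def fit_one_factor(R, n_iter=200):
    """Step 2 - fit: iterated principal-axis estimate of one-factor loadings."""
    communalities = np.full(R.shape[0], 0.5)        # starting guess
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, communalities)    # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        communalities = loadings ** 2
    # The sign of an eigenvector is arbitrary; report positive loadings.
    return loadings if loadings.sum() >= 0 else -loadings


def rmsr(R, loadings):
    """Step 3 - evaluate: root mean square off-diagonal residual correlation."""
    residual = R - np.outer(loadings, loadings)
    off_diagonal = ~np.eye(R.shape[0], dtype=bool)
    return float(np.sqrt(np.mean(residual[off_diagonal] ** 2)))


loadings = fit_one_factor(R)
print(np.round(loadings, 3))   # estimates of the free parameters
print(rmsr(R, loadings))       # overall fit (0 would be a perfect fit)
```

Because the example data were generated from a one-factor model, the estimated loadings should come back close to the assumed ones and the residual should be near zero. With real data neither would be exact.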
Formulating a model
- We want to determine the number and nature of the underlying factors and their pattern of influence (factor loadings) on the MVs.
- The number of factors is considered to be much smaller than the number of MVs - otherwise there would be little to be gained by doing factor analysis.
- We want to get a simple explanation of relationships in the data using a small number of factors.
- The number of factors can be chosen on the basis of (1) prior hypotheses, (2) the number of eigenvalues of the correlation matrix greater than 1, and (3) interpretability, model fit, and parsimony.
- Factor loadings - the numerical value of a factor loading indicates the strength of the influence of the factor on the manifest variable (0 indicates no influence).
- Factor loadings are equivalent to regression coefficients, representing the influence of a factor (independent variable) on an MV (dependent variable).
- A factor is defined by the subset of MVs that are substantially influenced by the factor.
- Correlations between MVs arise because of their dependencies on one or more of the same factors.
- Planning is required - MVs are selected and a study is designed for the purpose of conducting factor analysis.
- A model that makes sense but does not fit the data is useless.
- A model that fits the data but does not make sense is also useless.
- Exploratory: The researcher has no prior hypotheses about the number and nature of the factors - explore the number and nature of factors.
- Confirmatory: The number and nature of the factors are prespecified - incorporate prior hypotheses into model specification and estimation.
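Two of the points above can be illustrated numerically: correlations between MVs arise from shared factors, and the eigenvalues-greater-than-1 rule suggests a number of factors. The sketch below assumes a hypothetical loading matrix with two uncorrelated factors and standardized MVs; the numbers are made up for illustration.

```python
import numpy as np

# Hypothetical loading matrix: 6 MVs (rows) by 2 uncorrelated factors (columns).
# A loading of 0 means the factor has no influence on that MV.
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.7],
    [0.0, 0.6],
    [0.0, 0.5],
])

# Correlations implied by the model: two MVs correlate only through the
# factors they share (the sum of products of their loadings).
R = loadings @ loadings.T
np.fill_diagonal(R, 1.0)   # standardized MVs have unit variance

# Eigenvalues-greater-than-1 rule applied to the correlation matrix.
eigenvalues = np.linalg.eigvalsh(R)[::-1]   # sorted largest first
n_factors = int(np.sum(eigenvalues > 1.0))

print(np.round(R, 2))
print(np.round(eigenvalues, 2))
print("suggested number of factors:", n_factors)
```

Here the first two MVs correlate 0.8 x 0.7 = 0.56 because they depend on the same factor, while MVs that share no factor correlate 0. The eigenvalue rule recovers two factors for this clean example; with real data it is only one of the considerations listed above.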
Common factor analysis
- Common factors influence more than one MV - they account for the correlations among the MVs and for a portion of the variance of each MV.
- Unique factors influence only one MV - account for the variance in each MV that is not accounted for by common factors (do not account for correlations among MVs).
- Each unique factor has two components: The specific component representing a factor influencing a single MV and the error component representing random error.
- The variance of a given MV = common variance + unique variance
- = common variance + specific variance + error variance
- Relationships with multiple regression
- Regression: DV = f(IVs) + residual
- Factor analysis: MV = f(common factors) + unique factor
- IVs are observed but common factors are unobserved.
- Residuals may be correlated for different DVs but unique factors are uncorrelated for different MVs.
- Regression coefficients = common factor loadings.
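The variance decomposition can be made concrete for a single standardized MV. A minimal sketch, assuming uncorrelated common factors so that the common variance is the sum of squared loadings; the loadings are hypothetical. Splitting the unique variance further into specific and error variance needs extra information (such as a reliability estimate), so it is not attempted here.

```python
def variance_decomposition(loadings):
    """Split the unit variance of a standardized MV into common and unique parts.

    Assumes uncorrelated common factors, so the common variance
    is the sum of the squared loadings.
    """
    common = sum(loading ** 2 for loading in loadings)
    unique = 1.0 - common
    return common, unique


# Hypothetical MV with loadings 0.6 and 0.5 on two common factors.
common, unique = variance_decomposition([0.6, 0.5])
print("common variance:", round(common, 2))
print("unique variance:", round(unique, 2))
```

For these loadings the common variance is 0.36 + 0.25 = 0.61, leaving 0.39 as unique variance.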
An example
Color-Blind Racial Attitudes Scale (CoBRAS).
Participants responded to these questions on a scale of 1 (Not at all appropriate or clear) to 5 (Very appropriate or clear) so the researchers could get a sense of people's attitudes on racial issues (Neville, H. A., Lilly, R. L., Duran, G., Lee, R. M., & Browne, L., 2000).
-- White people in the U.S. have certain advantages because of the color of their skin.
-- Social policies, such as affirmative action, discriminate unfairly against white people.
-- Racial problems in the U.S. are rare, isolated situations.
-- Race is very important in determining who is successful and who is not.
-- White people in the U.S. are discriminated against because of the color of their skin.
-- Talking about racial issues causes unnecessary tension.
-- Race plays an important role in who gets sent to prison.
-- English should be the only official language in the U.S.
-- Racism is a major problem in the U.S.
-- Race plays a major role in the type of social services (such as type of health care or day care) that people receive in the U.S.
-- Due to racial discrimination, programs such as affirmative action are necessary to help create equality.
-- It is important for public schools to teach about the history and contributions of racial and ethnic minorities.
-- Racial and ethnic minorities do not have the same opportunities as white people in the U.S.
-- Racial and ethnic minorities in the U.S. have certain advantages because of the color of their skin.
-- It is important for political leaders to talk about racism to help work through or solve society's problems.
-- Everyone who works hard, no matter what race they are, has an equal chance to become rich.
-- It is important that people begin to think of themselves as American and not African American, Mexican American or Italian American.
-- Racism may have been a problem in the past, but it is not an important problem today.
-- White people are more to blame for racial discrimination than racial and ethnic minorities.
-- Immigrants should try to fit into the culture and values of the U.S.
The researchers used factor analysis to find out which questions on the CoBRAS related most closely to one another. For half the questions, a score of 5 reflected less awareness of racial problems; for the other half, marked with an asterisk (*), a score of 5 reflected greater awareness.
The questions on the CoBRAS fell into three related categories (i.e., the responses to the items were correlated with one another and were relatively unrelated to other items), as indicated below. What labels would you give to each of the factors?
Factor 1
-- *White people in the U.S. have certain advantages because of the color of their skin.
-- *Race is very important in determining who is successful and who is not.
-- *Race plays an important role in who gets sent to prison.
-- *Race plays a major role in the type of social services (such as type of health care or day care) that people receive in the U.S.
-- *Racial and ethnic minorities do not have the same opportunities as white people in the U.S.
-- Everyone who works hard, no matter what race they are, has an equal chance to become rich.
-- *White people are more to blame for racial discrimination than racial and ethnic minorities.
Factor 2
-- Social policies, such as affirmative action, discriminate unfairly against white people.
-- *White people in the U.S. are discriminated against because of the color of their skin.
-- English should be the only official language in the U.S.
-- *Due to racial discrimination, programs such as affirmative action are necessary to help create equality.
-- Racial and ethnic minorities in the U.S. have certain advantages because of the color of their skin.
-- It is important that people begin to think of themselves as American and not African American, Mexican American or Italian American.
-- Immigrants should try to fit into the culture and values of the U.S.
Factor 3
-- Racial problems in the U.S. are rare, isolated situations.
-- Talking about racial issues causes unnecessary tension.
-- *Racism is a major problem in the U.S.
-- *It is important for public schools to teach about the history and contributions of racial and ethnic minorities.
-- *It is important for political leaders to talk about racism to help work through or solve society's problems.
-- Racism may have been a problem in the past, but it is not an important problem today.
Research Planning
Planning is important
- Objectives/goals
- Methods/Analyses
- Variables/Operational definitions
- Hypotheses/predictions
Some pitfalls to avoid
- Don't balk at research because it seems far too "scientific."
- Don't worry about the research design being perfect.
- Don't interview just the successes.
- Don't throw away research results once a report has been generated.
Making sense of data
- Analyzing data (whether from questionnaires, interviews, focus groups, or other sources) - always start by reviewing your research goals, i.e., the reason you undertook the research in the first place - this will help you organize your data and focus your analysis.
- If you want to improve a program by identifying its strengths and weaknesses, you can organize data into program strengths, weaknesses, and suggestions to improve the program.
- If you want to fully understand how your program works, you can organize data in the chronological order in which customers go through your program.
- Interpreting results - attempt to put the information in perspective - compare results to what you expected.
- Reporting results - the level and scope of content depend on the audience for whom the report is intended, e.g., funders/bankers, employees, clients, customers, or the public.
Research report
- Record enough information so that someone outside of the organization can understand what you're researching and how.
- An example.
- Title Page
- Table of Contents
- Executive Summary (one paragraph, concise overview of findings and recommendations)
- Purpose of the Report (what type of research was conducted, what decisions are being aided by the findings of the research, who is making the decision, etc.)
- Background About Organization and Product/Service/Program that is being researched
- Problem Statement
- Overall Goal(s) of Product/Service/Program
- Outcomes (or client/customer impacts) and Performance Measures (that can be measured as indicators toward the outcomes)
- Overall Evaluation Goals (e.g., what questions are being answered by the research)
- Methodology
- Types of data/information that were collected
- How data/information were collected (what instruments were used, etc.)
- How data/information were analyzed
- Operational definitions
- Results (data tables and figures)
- Limitations of the evaluation (e.g., cautions about findings/conclusions and how to use the findings/conclusions, etc.)
- Interpretations and Conclusions (from analysis of the data/information)
- Recommendations (regarding the decisions that must be made about the product/service/program)
- Appendices: content of the appendices depends on the goals of the research report, e.g., instruments used to collect data/information
- Detailed data
- Testimonials, comments made by users of the product/service/program
- Case studies of users of the product/service/program
- Any related literature