Midterm - MGT 718: Multivariate analysis

The main goal of Principal Components Analysis is to describe the variation in a set of *correlated* variables in terms of a new set of uncorrelated variables. Factor Analysis assumes that the observed *relationships* between manifest (directly measured) variables are due to the relationships of these variables to latent variables (factors that cannot be measured directly). These analyses, and multivariate analysis in general, work only when there are multiple variables that are related to one another.

One good practice when analyzing data is to visualize the data first. For multivariate analysis, examine whether any variables correlate with others. If the variables are related to one another, the results of multivariate analysis will be meaningful. With this in mind, please follow the instructions below and answer the questions. Please email me (ysakamot@stevens.edu) if you have any questions.

The following data are from the file airpollw.txt.
The variables are

Rainfall: mean annual precipitation in inches
Education: median school years completed for those over 25
Popden: population density
Nonwhite: percentage of nonwhite population
NOX: nitrogen oxide
SO2: sulfur dioxide
Mortality: total age-adjusted mortality rate

Rainfall Education Popden Nonwhite NOX SO2 Mortality
akronOH 36 11.4 3243 8.8 15 59 921.9
albanyNY 35 11.0 4281 3.5 10 39 997.9
allenPA 44 9.8 4260 0.8 6 33 962.4
atlantGA 47 11.1 3125 27.1 8 24 982.3
baltimMD 43 9.6 6441 24.4 38 206 1071.0
birmhmAL 53 10.2 3325 38.5 32 72 1030.0
bostonMA 43 12.1 4679 3.5 32 62 934.7
bridgeCT 45 10.6 2140 5.3 4 4 899.5
bufaloNY 36 10.5 6582 8.1 12 37 1002.0
cantonOH 36 10.7 4213 6.7 7 20 912.3
chatagTN 52 9.6 2302 22.2 8 27 1018.0
chicagIL 33 10.9 6122 16.3 63 278 1025.0
cinnciOH 40 10.2 4101 13.0 26 146 970.5
clevelOH 35 11.1 3042 14.7 21 64 986.0
colombOH 37 11.9 4259 13.1 9 15 958.8
dallasTX 35 11.8 1441 14.8 1 1 860.1
daytonOH 36 11.4 4029 12.4 4 16 936.2
denverCO 15 12.2 4824 4.7 8 28 871.8
detrotMI 31 10.8 4834 15.8 35 124 959.2
flintMI 30 10.8 3694 13.1 4 11 941.2
ftwortTX 31 11.4 1844 11.5 1 1 891.7
grndraMI 31 10.9 3226 5.1 3 10 871.3
grnborNC 42 10.4 2269 22.7 3 5 971.1
hartfdCT 43 11.5 2909 7.2 3 10 887.5
houstnTX 46 11.4 2647 21.0 5 1 952.5
indianIN 39 11.4 4412 15.6 7 33 968.7
kansasMO 35 12.0 3262 12.6 4 4 919.7
lancasPA 43 9.5 3214 2.9 7 32 844.1
losangCA 11 12.1 4700 7.8 319 130 861.8
louisvKY 30 9.9 4474 13.1 37 193 989.3
memphsTN 50 10.4 3497 36.7 18 34 1006.0
miamiFL 60 11.5 4657 13.5 1 1 861.4
milwauWI 30 11.1 2934 5.8 23 125 929.2
minnplMN 25 12.1 2095 2.0 11 26 857.6
nashvlTN 45 10.1 2082 21.0 14 78 961.0
newhvnCT 46 11.3 3327 8.8 3 8 923.2
neworlLA 54 9.7 3172 31.4 17 1 1113.0
newyrkNY 42 10.7 7462 11.3 26 108 994.6
philadPA 42 10.5 6092 17.5 32 161 1015.0
pittsbPA 36 10.6 3437 8.1 59 263 991.3
portldOR 37 12.0 3387 3.6 21 44 894.0
provdcRI 42 10.1 3508 2.2 4 18 938.5
readngPA 41 9.6 4843 2.7 11 89 946.2
richmdVA 44 11.0 3768 28.6 9 48 1026.0
rochtrNY 32 11.1 4355 5.0 4 18 874.3
stlousMO 34 9.7 5160 17.2 15 68 953.6
sandigCA 10 12.1 3033 5.9 66 20 839.7
sanfrnCA 18 12.2 4253 13.7 171 86 911.7
sanjosCA 13 12.2 2702 3.0 32 3 790.7
seatleWA 35 12.2 3626 5.7 7 20 899.3
springMA 45 11.1 1883 3.4 4 20 904.2
syracuNY 38 11.4 4923 3.8 5 25 950.7
toledoOH 31 10.7 3249 9.5 7 25 972.5
uticaNY 40 10.3 1671 2.5 2 11 912.2
washDC 41 12.3 5308 25.9 28 102 968.8
wichtaKS 28 12.1 3665 7.5 2 1 823.8
wilmtnDE 45 11.3 3152 12.1 11 42 1004.0
worctrMA 45 11.1 3678 1.0 3 8 895.7
yorkPA 42 9.0 9699 4.8 8 49 911.8
youngsOH 38 10.7 3451 11.7 13 39 954.4

1) Paste a scatterplot matrix of the above airpoll data.
2) Which two variables, X and Y, are most strongly correlated?
3) What is the correlation coefficient of X and Y?
4) Is the correlation between X and Y significant? Paste the test results.
5) Is the linear regression of Y on X significant? Paste the results.
6) Paste the scatterplot of X and Y with city names as points.
7) Add a regression line to the scatterplot of X and Y.
8) Perform a principal components analysis using a correlation matrix.
   a) Paste the summary table.
   b) Paste the scree plot.
   c) How many components seem appropriate? Why?
9) Perform an exploratory factor analysis on the airpoll data.
   a) How many factors seem appropriate? Why?
   b) Paste the summary table with the appropriate number of factors.
   c) Interpret the factors in the summary table.
   d) What does a very low uniqueness value in factor analysis suggest?

The correlation matrix below is from the scores of 220 boys in six school subjects: French, English, History, Arithmetic, Algebra, and Geometry.
           French English History Arithmetic Algebra Geometry
French       1.00    0.44    0.41       0.29    0.33     0.25
English      0.44    1.00    0.35       0.35    0.32     0.33
History      0.41    0.35    1.00       0.16    0.19     0.18
Arithmetic   0.29    0.35    0.16       1.00    0.59     0.47
Algebra      0.33    0.32    0.19       0.59    1.00     0.46
Geometry     0.25    0.33    0.18       0.47    0.46     1.00

1) Perform a principal components analysis using a correlation matrix.
   a) Paste the summary table.
   b) Paste the scree plot.
   c) How many components seem appropriate? Why?
2) Perform an exploratory factor analysis.
   a) How many factors seem appropriate? Why?
   b) Paste the summary table with the appropriate number of factors.
   c) Interpret the factors in the summary table.
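
For reference, the quantities behind a principal components summary table and scree plot are the eigenvalues of the correlation matrix: each eigenvalue is the variance of one component, and dividing by the number of variables gives the proportion of variance explained. The sketch below illustrates this for the school-subjects matrix. It assumes plain Python with no statistics package (the software used in the course is not stated here), and `jacobi_eigenvalues` is a hypothetical helper written for this illustration, not a library function.

```python
import math

# Correlation matrix of the six school subjects, copied from the table above.
R = [
    [1.00, 0.44, 0.41, 0.29, 0.33, 0.25],  # French
    [0.44, 1.00, 0.35, 0.35, 0.32, 0.33],  # English
    [0.41, 0.35, 1.00, 0.16, 0.19, 0.18],  # History
    [0.29, 0.35, 0.16, 1.00, 0.59, 0.47],  # Arithmetic
    [0.33, 0.32, 0.19, 0.59, 1.00, 0.46],  # Algebra
    [0.25, 0.33, 0.18, 0.47, 0.46, 1.00],  # Geometry
]


def jacobi_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations.

    Hypothetical helper for illustration only; a statistics package
    would normally do this step for you.
    """
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal entries are numerically zero.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Apply the rotation to columns p and q ...
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # ... and then to rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    # The diagonal now holds the eigenvalues; largest first.
    return sorted((a[i][i] for i in range(n)), reverse=True)


eigenvalues = jacobi_eigenvalues(R)
proportions = [ev / len(R) for ev in eigenvalues]

cumulative = 0.0
for i, (ev, prop) in enumerate(zip(eigenvalues, proportions), start=1):
    cumulative += prop
    print(f"Comp.{i}: variance={ev:.3f} proportion={prop:.3f} cumulative={cumulative:.3f}")
```

Plotting the printed variances against the component number gives the scree plot. Because the matrix is a correlation matrix, the variances sum to the number of variables (six here).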