Final - MGT 718: Multivariate analysis

Please email me (ysakamot@stevens.edu) if you have any questions.

The first step in analyzing data is to visualize the data.

1) Multidimensional scaling (MDS) is an excellent technique for representing an observed proximity matrix geometrically. Using the airpoll data (attached below), create a distance matrix that shows the distance between each pair of cities based on the 7 variables. For example, to find the distance between akronOH and albanyNY, you might calculate (x1 - x2)^2 for each variable and sum the results. Because the variables are measured on different scales, you may want to normalize each variable so that its values range from 0 to 1. Of course, you can come up with other ways to measure distances. Then, find a two-dimensional scaling solution and plot the results.

2) Are there any interesting patterns in the MDS plot? Any clusters? Any outliers?

3) Hierarchical clustering is another good technique for visualizing data. Using the distance matrix you have created above, find a clustering solution and plot the results.

4) Are there any interesting patterns in the hierarchical clustering plot? Any clusters? Any outliers?

5) Do you see any relationship between the MDS solution and the hierarchical clustering solution?

6) Perform a canonical correlation analysis to look at the relationship between the set [NOX SO2 Mortality] and the set [Rainfall Education Popden Nonwhite]. Report the correlation.

7) Create a new binary variable by converting the Mortality variable. In the new variable, values that are >= median(Mortality) are assigned 1 and considered high mortality rate. Values that are < median(Mortality) are assigned 0 and considered low mortality rate. Perform a discriminant analysis on the newly created categorical variable with your choice of explanatory variables. Report the table showing the fit that looks like the example below.
Also report the probability of correct classification for each category (in the example below, the probability of correct categorization is 154/(154+90) = .63 for category 0 and 30/(20+30) = .6 for category 1).

cases     0     1
0       154    90
1        20    30

8) Create two models for predicting the binary variable you have created above (the new mortality score). Perform logistic regression for each model and compare the two models. Report the fit of each model and the test statistics for the model comparison.

The following data are from the airpollw.txt file. The variables are:

Rainfall: mean annual precipitation in inches
Education: median school years completed for those over 25
Popden: population density
Nonwhite: percentage of the population that is nonwhite
NOX: nitrogen oxide
SO2: sulfur dioxide
Mortality: total age-adjusted mortality rate

          Rainfall  Education  Popden  Nonwhite  NOX  SO2  Mortality
akronOH         36       11.4    3243       8.8   15   59      921.9
albanyNY        35       11.0    4281       3.5   10   39      997.9
allenPA         44        9.8    4260       0.8    6   33      962.4
atlantGA        47       11.1    3125      27.1    8   24      982.3
baltimMD        43        9.6    6441      24.4   38  206     1071.0
birmhmAL        53       10.2    3325      38.5   32   72     1030.0
bostonMA        43       12.1    4679       3.5   32   62      934.7
bridgeCT        45       10.6    2140       5.3    4    4      899.5
bufaloNY        36       10.5    6582       8.1   12   37     1002.0
cantonOH        36       10.7    4213       6.7    7   20      912.3
chatagTN        52        9.6    2302      22.2    8   27     1018.0
chicagIL        33       10.9    6122      16.3   63  278     1025.0
cinnciOH        40       10.2    4101      13.0   26  146      970.5
clevelOH        35       11.1    3042      14.7   21   64      986.0
colombOH        37       11.9    4259      13.1    9   15      958.8
dallasTX        35       11.8    1441      14.8    1    1      860.1
daytonOH        36       11.4    4029      12.4    4   16      936.2
denverCO        15       12.2    4824       4.7    8   28      871.8
detrotMI        31       10.8    4834      15.8   35  124      959.2
flintMI         30       10.8    3694      13.1    4   11      941.2
ftwortTX        31       11.4    1844      11.5    1    1      891.7
grndraMI        31       10.9    3226       5.1    3   10      871.3
grnborNC        42       10.4    2269      22.7    3    5      971.1
hartfdCT        43       11.5    2909       7.2    3   10      887.5
houstnTX        46       11.4    2647      21.0    5    1      952.5
indianIN        39       11.4    4412      15.6    7   33      968.7
kansasMO        35       12.0    3262      12.6    4    4      919.7
lancasPA        43        9.5    3214       2.9    7   32      844.1
losangCA        11       12.1    4700       7.8  319  130      861.8
louisvKY        30        9.9    4474      13.1   37  193      989.3
memphsTN        50       10.4    3497      36.7   18   34     1006.0
miamiFL         60       11.5    4657      13.5    1    1      861.4
milwauWI        30       11.1    2934       5.8   23  125      929.2
minnplMN        25       12.1    2095       2.0   11   26      857.6
nashvlTN        45       10.1    2082      21.0   14   78      961.0
newhvnCT        46       11.3    3327       8.8    3    8      923.2
neworlLA        54        9.7    3172      31.4   17    1     1113.0
newyrkNY        42       10.7    7462      11.3   26  108      994.6
philadPA        42       10.5    6092      17.5   32  161     1015.0
pittsbPA        36       10.6    3437       8.1   59  263      991.3
portldOR        37       12.0    3387       3.6   21   44      894.0
provdcRI        42       10.1    3508       2.2    4   18      938.5
readngPA        41        9.6    4843       2.7   11   89      946.2
richmdVA        44       11.0    3768      28.6    9   48     1026.0
rochtrNY        32       11.1    4355       5.0    4   18      874.3
stlousMO        34        9.7    5160      17.2   15   68      953.6
sandigCA        10       12.1    3033       5.9   66   20      839.7
sanfrnCA        18       12.2    4253      13.7  171   86      911.7
sanjosCA        13       12.2    2702       3.0   32    3      790.7
seatleWA        35       12.2    3626       5.7    7   20      899.3
springMA        45       11.1    1883       3.4    4   20      904.2
syracuNY        38       11.4    4923       3.8    5   25      950.7
toledoOH        31       10.7    3249       9.5    7   25      972.5
uticaNY         40       10.3    1671       2.5    2   11      912.2
washDC          41       12.3    5308      25.9   28  102      968.8
wichtaKS        28       12.1    3665       7.5    2    1      823.8
wilmtnDE        45       11.3    3152      12.1   11   42     1004.0
worctrMA        45       11.1    3678       1.0    3    8      895.7
yorkPA          42        9.0    9699       4.8    8   49      911.8
youngsOH        38       10.7    3451      11.7   13   39      954.4
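
The distance measure suggested in question 1 (rescale each variable to the range 0 to 1, then sum the squared differences) might be sketched as follows. This is only one possible approach, not a required solution. The function names `min_max_normalize` and `squared_distance_matrix` are made up for this illustration, and only the first three cities are included to keep the example short, so the minimum and maximum used for scaling here are NOT the ones you would get from the full 60-city table.

```python
# Sketch only: min-max normalization followed by summed squared differences.
# Assumption: only three of the 60 cities are used, so the column ranges
# (and therefore the distances) differ from those of the full data set.
cities = {
    "akronOH":  [36, 11.4, 3243, 8.8, 15, 59, 921.9],
    "albanyNY": [35, 11.0, 4281, 3.5, 10, 39, 997.9],
    "allenPA":  [44, 9.8, 4260, 0.8, 6, 33, 962.4],
}


def min_max_normalize(rows):
    """Rescale each column to [0, 1] using that column's min and max."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column carries no information; map it to 0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance_matrix(rows):
    """Sum of (x1 - x2)^2 over all variables, for every pair of rows."""
    return [
        [sum((a - b) ** 2 for a, b in zip(r1, r2)) for r2 in rows]
        for r1 in rows
    ]


names = list(cities)
scaled = min_max_normalize([cities[n] for n in names])
dist = squared_distance_matrix(scaled)

# Print the symmetric distance matrix, one city per row.
for name, row in zip(names, dist):
    print(name, [round(d, 3) for d in row])
```

The resulting matrix is symmetric with zeros on the diagonal. Taking the square root of each entry would give the ordinary Euclidean distance on the normalized variables, which is one of the other distance choices the question leaves open.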