** CH01PR22.DAT

-- 1.22

Root MSE              3.23403    R-Square     0.9731
Dependent Mean      225.56250    Adj R-Sq     0.9712
Coeff Var             1.43376

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1    168.60000      2.65702      63.45      <.0001
X             1      2.03438      0.09039      22.51      <.0001

(a) Yhat = 168.60 + 2.034 * X. R^2 = 0.9731 indicates a good fit.

       Dependent    Predicted
Obs     Variable        Value    Residual
 17            .     249.9750           .

(b) When X=40, we predict Yhat = 168.60 + 2.034 * 40 = 249.9750
    (SAS computes this from the unrounded estimates).

(c) It does not matter what X is. If X increases by 1 unit, the predicted Y
    always increases by b1 = 2.03438.

-- 2.7

               Parameter     Standard
Variable        Estimate        Error    t Value    Pr > |t|      99% Confidence Limits
Intercept      168.60000      2.65702      63.45      <.0001    160.69046    176.50954
X                2.03438      0.09039      22.51      <.0001      1.76529      2.30346

(a) When X increases by 1 unit, the mean of Y increases by the slope.
    This question asks for a 99% confidence interval for the slope:
    (1.76529, 2.30346)

(b) Since the 99% CI (1.76529, 2.30346) contains 2, we cannot reject
    the manufacturer's claim.

-- 2.16

For a new observation with X=30 and unknown Y, part (a) asks for a confidence
interval for the mean response (clm) and part (b) for a prediction interval (cli).

       Dependent    Predicted       Std Error
Obs     Variable        Value    Mean Predict         98% CL Mean             98% CL Predict
 17            .     229.6313          0.8285    227.4569    231.8056    220.8695    238.3930

(a) (227.4569 , 231.8056)
(b) (220.8695 , 238.3930)

-- 2.26 ;

(a)
                    Analysis of Variance

                             Sum of           Mean
Source             DF       Squares         Square    F Value    Pr > F
Model               1    5297.51250     5297.51250     506.51    <.0001
Error              14     146.42500       10.45893
Corrected Total    15    5443.93750

Root MSE              3.23403    R-Square     0.9731
Dependent Mean      225.56250    Adj R-Sq     0.9712
Coeff Var             1.43376

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1    168.60000      2.65702      63.45      <.0001
X             1      2.03438      0.09039      22.51      <.0001

(b) t^2 = F: 22.51^2 = 506.51 (up to rounding of the t value).
    This is true only when we have a single numerical predictor.

(c) Please skip this part.

(d) R^2 = SSM/SST = 5297.51250/5443.93750 = 0.9731
    Pearson correlation coefficient r = 0.9731^(1/2) = 0.9865
    (positive, because the slope b1 is positive).
    Again, this is true only when we have a single numerical predictor.
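
The arithmetic in 2.26 (b) and (d) can be re-checked by hand or with a few lines of code. The following Python sketch is not part of the original SAS solution; it only re-computes quantities from the numbers printed in the output above, and the variable names are chosen here for illustration.

```python
import math

# Values copied from the SAS output for CH01PR22 (2.26).
ssm = 5297.51250      # Model sum of squares
sse = 146.42500       # Error sum of squares
sst = 5443.93750      # Corrected total sum of squares
df_error = 14
b1 = 2.03438          # slope estimate
se_b1 = 0.09039       # standard error of the slope

# F statistic from the ANOVA table: MSM / MSE (model has 1 df).
mse = sse / df_error
f_value = ssm / mse

# t statistic for the slope; with one predictor, t^2 equals F
# (only approximately here, because the printed values are rounded).
t_slope = b1 / se_b1

# R^2 and the Pearson correlation; r takes the sign of the slope.
r_squared = ssm / sst
r = math.copysign(math.sqrt(r_squared), b1)

print(f"MSE = {mse:.5f}")
print(f"F   = {f_value:.2f}, t^2 = {t_slope ** 2:.2f}")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```

The small gap between t^2 and F comes from rounding in the printed estimate and standard error, not from the identity itself.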

-- 3.6

Boxplot: residuals look symmetric with respect to 0.

      |
   +-----+
   *--+--*
   |     |
   +-----+
      |

Normal qqplot is roughly a straight line.
Residuals vs predicted Y and residuals vs X show no apparent pattern.

-- 4.5(a)

90% Bonferroni intervals for two parameters: each interval should be at level 95%.
The critical value is then T(1-0.05/2=0.975, n-2=14) = 2.145.
The SAS output gives the standard errors of the estimates:

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1    168.60000      2.65702      63.45      <.0001
X             1      2.03438      0.09039      22.51      <.0001

Intervals are
For slope b1:       2.03438 +/- 2.145 * 0.09039
For intercept b0: 168.60000 +/- 2.145 * 2.65702

-- 4.9 (a)

Now it is 90% Bonferroni for 3 predicted response values. Obtain the predicted
values and their standard errors from the SAS output, but ignore the intervals:

       Dependent    Predicted       Std Error
Obs     Variable        Value    Mean Predict         95% CL Mean
 17            .     209.2875          1.0847    206.9610    211.6140
 18            .     229.6313          0.8285    227.8544    231.4081
 19            .     249.9750          1.3529    247.0733    252.8767

90% for 3 intervals means (100-10/3) = 96.7% for each interval.
The critical value is then T(1-0.033/2=0.9833, n-2=14) = 2.360.
90% Bonferroni intervals are:
209.2875 +/- 2.360 * 1.0847 for X=20
229.6313 +/- 2.360 * 0.8285 for X=30
249.9750 +/- 2.360 * 1.3529 for X=40

*****************************************************************************
** CH06PR15.DAT ;

** 6.15 ;

(b)
      Pearson Correlation Coefficients, N = 46

             Y          X1          X2          X3
Y      1.00000    -0.78676    -0.60294    -0.64459
X1    -0.78676     1.00000     0.56795     0.56968
X2    -0.60294     0.56795     1.00000     0.67053
X3    -0.64459     0.56968     0.67053     1.00000

Since Y is correlated with each of the predictors, a regression model may work.

(c)
                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1    158.49125     18.12589       8.74      <.0001
X1            1     -1.14161      0.21480      -5.31      <.0001
X2            1     -0.44200      0.49197      -0.90      0.3741
X3            1    -13.47016      7.09966      -1.90      0.0647

Yhat = 158.49125 - 1.14161*X1 - 0.44200*X2 - 13.47016*X3

Interpretation of b2: when X1 and X3 are held at the same values and X2
increases by 1 unit, the mean response Y is estimated to drop by 0.4420.

(d) From proc univariate of the residuals, we don't have unusually large or
    small residuals.

(e) Normal qqplot shows a reasonably straight line. All residual plots look ok.
    We don't need to add interaction terms in the regression model.

-- 6.16

(a)
                    Analysis of Variance

                             Sum of           Mean
Source             DF       Squares         Square    F Value    Pr > F
Model               3    9120.46367     3040.15456      30.05    <.0001
Error              42    4248.84068      101.16287
Corrected Total    45         13369

H0: beta1 = beta2 = beta3 = 0
Ha: at least one beta is not 0.
Since p-value < 0.0001 < alpha=0.05, reject H0. There is a regression relationship.

(b) 90% Bonferroni for 3 betas again means 96.7% for each interval. N=46.
    Critical value is T(1-0.033/2=0.9833, n-3-1=42) = 2.1995.
    Obtain the SEs from the SAS output:

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1    158.49125     18.12589       8.74      <.0001
X1            1     -1.14161      0.21480      -5.31      <.0001
X2            1     -0.44200      0.49197      -0.90      0.3741
X3            1    -13.47016      7.09966      -1.90      0.0647

beta1:  -1.14161 +/- 2.1995 * 0.21480 = (-1.6141, -0.6691)
beta2:  -0.44200 +/- 2.1995 * 0.49197 = (-1.5242, 0.6402)
beta3: -13.47016 +/- 2.1995 * 7.09966 = (-29.0860, 2.1456)

The intervals for beta2 and beta3 contain 0, which suggests that we can drop
X2 and X3 from the regression model and keep X1.

(c) R^2 = SSM / SST = 9120.46367/13369 = 0.6822 (pdf solution file has a typo here).
    Interpretation of R^2: 68.22% of the variation in Y is explained by the
    three predictors X1, X2, and X3 together.

-- 6.17

(a)
       Dependent    Predicted       Std Error
Obs     Variable        Value    Mean Predict         90% CL Mean
 47            .      69.0103          2.6646     64.5285     73.4920

With 90% confidence, the mean response at X1=35, X2=45 and X3=2.2
falls in (64.5285, 73.4920).

(b)
       Dependent    Predicted       Std Error
Obs     Variable        Value    Mean Predict        90% CL Predict
 47            .      69.0103          2.6646     51.5097     86.5109

For a future observation with X1=35, X2=45 and X3=2.2, the response Y
with 90% confidence will be in (51.5097, 86.5109).

-- 7.5

Please check out the pdf solution file for more details:

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|     Type I SS
Intercept     1    158.49125     18.12589       8.74      <.0001        174353
X2            1     -0.44200      0.49197      -0.90      0.3741    4860.26000
X1            1     -1.14161      0.21480      -5.31      <.0001    3896.04414
X3            1    -13.47016      7.09966      -1.90      0.0647     364.15952

    Test dropX3 Results for Dependent Variable Y

                           Mean
Source             DF    Square       F Value    Pr > F
Numerator           1    364.15952       3.60    0.0647
Denominator        42    101.16287

-- 7.14

(a) We have to run proc reg several times:

** model Y = X1 / pcorr1; gives us

                                                                       Squared
                   Parameter     Standard                              Partial
Variable     DF     Estimate        Error    t Value    Pr > |t|    Corr Type I
Intercept     1    119.94317      7.08475      16.93      <.0001              .
X1            1     -1.52060      0.17985      -8.45      <.0001        0.61898

R^2_Y1 = 0.61898

** model Y = X2 X1 / pcorr1; gives us

                                                                       Squared
                   Parameter     Standard                              Partial
Variable     DF     Estimate        Error    t Value    Pr > |t|    Corr Type I
Intercept     1    156.67186     18.63964       8.41      <.0001              .
X2            1     -0.92079      0.43489      -2.12      0.0401        0.36354
X1            1     -1.26765      0.21035      -6.03      <.0001        0.45787

R^2_Y1|2 = 0.45787

** model Y = X3 X2 X1 / pcorr1; gives us

                                                                       Squared
                   Parameter     Standard                              Partial
Variable     DF     Estimate        Error    t Value    Pr > |t|    Corr Type I
Intercept     1    158.49125     18.12589       8.74      <.0001              .
X3            1    -13.47016      7.09966      -1.90      0.0647        0.41550
X2            1     -0.44200      0.49197      -0.90      0.3741        0.09060
X1            1     -1.14161      0.21480      -5.31      <.0001        0.40211

R^2_Y1|23 = 0.40211

-- 9.9

(a,b)
Number in                            Adjusted
  Model         C(p)    R-Square     R-Square         AIC    Variables in Model
      2       2.8072      0.6761       0.6610    215.0607    X1 X3
      3       4.0000      0.6822       0.6595    216.1850    X1 X2 X3
      2       5.5997      0.6550       0.6389    217.9676    X1 X2
      1       8.3536      0.6190       0.6103    220.5294    X1
      2      30.2471      0.4685       0.4437    237.8450    X2 X3
      1      35.2456      0.4155       0.4022    240.2137    X3
      1      42.1123      0.3635       0.3491    244.1312    X2

(c)
                    Summary of Forward Selection

        Variable    Number     Partial       Model
Step    Entered     Vars In    R-Square    R-Square      C(p)    F Value    Pr > F
   1    X1                1      0.6190      0.6190    8.3536      71.48    <.0001
   2    X3                2      0.0571      0.6761    2.8072       7.58    0.0086
   3    X2                3      0.0061      0.6822    4.0000       0.81    0.3741

-- 10.11

(a) Bonferroni cut-off for studentized deleted residuals:
    T(1-0.1/(2*46)=0.998913, (46-1)-3-1=41) = 3.27
    Any value smaller than -3.27 or greater than 3.27 will be considered an outlier.
    As a rule of thumb, we can simply use 3 as the cut-off.

... ... ...
                     Hat Diag       Cov                 ------------------DFBETAS-----------------
Obs     RStudent            H     Ratio     DFFITS    Intercept         X1         X2         X3
 33       0.0616       0.0450    1.1527     0.0134       0.0072    -0.0028    -0.0010    -0.0043
 34      -1.5422       0.0372    0.9128    -0.3030      -0.1812    -0.1409     0.1542     0.0172
 35       0.0959       0.1030    1.2266     0.0325      -0.0182     0.0103     0.0041     0.0102
 36       1.1763       0.0272    0.9913     0.1968       0.0560    -0.0637    -0.0120     0.0043
 37       1.2278       0.1212    1.0846     0.4560       0.0626    -0.2760    -0.1769     0.3582
 38      -0.5494       0.0706    1.1506    -0.1514       0.0321     0.1210    -0.0236    -0.0690
 39      -0.9870       0.1810    1.2240    -0.4639       0.3366    -0.0180    -0.4045     0.2367
 40      -0.5898       0.0869    1.1659    -0.1819      -0.0375    -0.0753     0.1177    -0.1080
 41       1.1190       0.0380    1.0149     0.2223      -0.0702     0.0654     0.1035    -0.0976
 42      -0.0954       0.1539    1.3003    -0.0407       0.0085     0.0307    -0.0247     0.0112
 43      -1.4222       0.0610    0.9673    -0.3626      -0.0114    -0.0631     0.1794    -0.2491
 44       1.3454       0.0509    0.9761     0.3116       0.0384     0.2095     0.0222    -0.1646
 45      -0.5671       0.0726    1.1509    -0.1587       0.0499     0.0160    -0.1191     0.1102
 46       1.0449       0.0832    1.0812     0.3147       0.1574    -0.0540     0.0296    -0.1788
 47            .       0.3267         .          .            .          .          .          .
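
As an illustration of the outlier rule in 10.11 (a), the Python sketch below applies the two cut-offs to the RStudent values of the rows shown above. It is not part of the original SAS solution. It covers only observations 33 to 46, because the earlier rows are elided in the listing, so it says nothing about observations 1 to 32. The variable names are chosen here for illustration.

```python
# RStudent values copied from the displayed rows (observations 33-46 only).
rstudent = {
    33: 0.0616, 34: -1.5422, 35: 0.0959, 36: 1.1763, 37: 1.2278,
    38: -0.5494, 39: -0.9870, 40: -0.5898, 41: 1.1190, 42: -0.0954,
    43: -1.4222, 44: 1.3454, 45: -0.5671, 46: 1.0449,
}

bonferroni_cutoff = 3.27   # T(0.998913, 41), taken from the text above
rule_of_thumb_cutoff = 3.0

# An observation is flagged when |RStudent| exceeds the cut-off.
flagged_bonferroni = [obs for obs, r in rstudent.items()
                      if abs(r) > bonferroni_cutoff]
flagged_rule_of_thumb = [obs for obs, r in rstudent.items()
                         if abs(r) > rule_of_thumb_cutoff]

# Observation with the largest absolute studentized deleted residual.
largest = max(rstudent.items(), key=lambda item: abs(item[1]))

print("Flagged (Bonferroni):", flagged_bonferroni)
print("Flagged (rule of thumb):", flagged_rule_of_thumb)
print("Largest |RStudent|: obs", largest[0], "value", largest[1])
```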
Please refer to pdf solution file.

********************************************************************************************
** Merge CH08PR16.txt and CH01PR19.DAT

-- 8.16

(a,b) model Y = X1 X2

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1      2.19842      0.33886       6.49      <.0001
X1            1      0.03789      0.01285       2.95      0.0038
X2            1     -0.09430      0.11997      -0.79      0.4334

We fit two parallel lines. They have different intercept terms.
For class X2=0, Yhat = 2.19842 + 0.03789 * X1
For class X2=1, Yhat = (2.19842 - 0.09430) + 0.03789 * X1

(c) t-value = -0.79, p-value = 0.4334. The coefficient of X2 is not
    significant, so we can drop X2.

(d) Please check out the SAS code for the plot. We will include an interaction
    term in 8.20 anyway.

-- 8.20

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1      3.22632      0.54943       5.87      <.0001
X1            1     -0.00276      0.02141      -0.13      0.8977
X2            1     -1.64958      0.67220      -2.45      0.0156
X12           1      0.06224      0.02649       2.35      0.0205

(a,b) The interaction term has p-value = 0.0205 < alpha = 0.05.
      We keep the interaction X1*X2.
      We have two non-parallel lines:
      For class X2=0, Yhat = 3.22632 - 0.00276 * X1
      For class X2=1, Yhat = (3.22632 - 1.64958) + (0.06224 - 0.00276) * X1

**************************************************************************************
** 16.10 17.11 18.7

-- 16.10

(a) The plot suggests that the middle-aged group is different from the other two groups.

                            Sum of
Source             DF      Squares       Mean Square    F Value    Pr > F
Model               2    316.7222222     158.3611111      63.60    <.0001
Error              33     82.1666667       2.4898990
Corrected Total    35    398.8888889

R-Square    Coeff Var    Root MSE      Y Mean
0.794011     6.698808    1.577941    23.55556

Source             DF     Type I SS      Mean Square    F Value    Pr > F
A                   2    316.7222222     158.3611111      63.60    <.0001

       Tukey's Studentized Range (HSD) Test for Y

NOTE: This test controls the Type I experimentwise error rate, but it
      generally has a higher Type II error rate than REGWQ.

Alpha                                      0.05
Error Degrees of Freedom                     33
Error Mean Square                      2.489899
Critical Value of Studentized Range     3.47019
Minimum Significant Difference           1.5807

Means with the same letter are not significantly different.

Tukey Grouping       Mean     N    A
             A    27.7500    12    m

             B    21.5000    12    y
             B
             B    21.4167    12    e

Young and elderly are almost the same.

*********
-- 17.11

(a) Please refer to the SAS code.

(b)
A     N       Mean     99% Confidence Limits
m    12    27.7500     26.5050     28.9950
y    12    21.5000     20.2550     22.7450
e    12    21.4167     20.1716     22.6617

(c)
              The GLM Procedure
              t Tests (LSD) for Y

Comparisons significant at the 0.01 level are indicated by ***.

               Difference
     A            Between       99% Confidence
Comparison          Means           Limits
  e - y           -0.0833     -1.8441     1.6774

(e,f)
       Tukey's Studentized Range (HSD) Test for Y

NOTE: This test controls the Type I experimentwise error rate.

Alpha                                       0.1
Error Degrees of Freedom                     33
Error Mean Square                      2.489899
Critical Value of Studentized Range     3.00649
Minimum Significant Difference           1.3695

Comparisons significant at the 0.1 level are indicated by ***.

               Difference
     A            Between      Simultaneous 90%
Comparison          Means      Confidence Limits
  m - y            6.2500      4.8805     7.6195    ***
  m - e            6.3333      4.9638     7.7028    ***
  y - m           -6.2500     -7.6195    -4.8805    ***
  y - e            0.0833     -1.2862     1.4528
  e - m           -6.3333     -7.7028    -4.9638    ***
  e - y           -0.0833     -1.4528     1.2862

       Bonferroni (Dunn) t Tests for Y

NOTE: This test controls the Type I experimentwise error rate, but it
      generally has a higher Type II error rate than Tukey's for all
      pairwise comparisons.

Alpha                                       0.1
Error Degrees of Freedom                     33
Error Mean Square                      2.489899
Critical Value of t                     2.22091
Minimum Significant Difference           1.4307

Comparisons significant at the 0.1 level are indicated by ***.

               Difference
     A            Between      Simultaneous 90%
Comparison          Means      Confidence Limits
  m - y            6.2500      4.8193     7.6807    ***
  m - e            6.3333      4.9026     7.7640    ***
  y - m           -6.2500     -7.6807    -4.8193    ***
  y - e            0.0833     -1.3474     1.5140
  e - m           -6.3333     -7.7640    -4.9026    ***
  e - y           -0.0833     -1.5140     1.3474

(d)
Contrast         DF    Contrast SS    Mean Square    F Value    Pr > F
2*u2-u1-u3        1    316.6805556    316.6805556     127.19    <.0001

-- 18.7

(a,b) Normality is not a big issue. The constant variance assumption does not hold.
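
The minimum significant differences in the 16.10 and 17.11 output, and the contrast in 17.11 (d), can be re-derived from the printed error mean square, group size, and critical values. The Python sketch below is not part of the original SAS solution. It uses the standard equal-sample-size formulas and assumes that u2 in the contrast label denotes the middle-aged group mean, which is consistent with the printed contrast sum of squares. The variable names are chosen here for illustration.

```python
import math

# Values copied from the SAS output above.
mse = 2.489899          # Error Mean Square
n = 12                  # observations per group
q_05 = 3.47019          # studentized range critical value, alpha = 0.05
q_10 = 3.00649          # studentized range critical value, alpha = 0.1
t_bonf = 2.22091        # Bonferroni critical t, alpha = 0.1
mean_m, mean_y, mean_e = 27.7500, 21.5000, 21.4167

# Tukey HSD with equal group sizes: MSD = q * sqrt(MSE / n).
msd_tukey_05 = q_05 * math.sqrt(mse / n)
msd_tukey_10 = q_10 * math.sqrt(mse / n)

# Bonferroni pairwise comparisons: MSD = t * sqrt(2 * MSE / n).
msd_bonf_10 = t_bonf * math.sqrt(2 * mse / n)

# Contrast 2*u2 - u1 - u3, assuming u2 is the middle-aged group mean.
coefs = (-1, 2, -1)
estimate = 2 * mean_m - mean_y - mean_e
contrast_ss = estimate ** 2 / (sum(c * c for c in coefs) / n)
contrast_f = contrast_ss / mse

print(f"Tukey MSD (alpha=0.05): {msd_tukey_05:.4f}")
print(f"Tukey MSD (alpha=0.1):  {msd_tukey_10:.4f}")
print(f"Bonferroni MSD (0.1):   {msd_bonf_10:.4f}")
print(f"Contrast SS = {contrast_ss:.2f}, F = {contrast_f:.2f}")
```

The contrast values differ from the SAS output in the later decimals because the group means above are rounded to four places.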