A somewhat prevalent fallacy is judging the importance of regression coefficients solely by their individual p-values. If the test \(H_{0} : \beta_2 = 0\) cannot be rejected, we might conclude that \(X2\) serves no purpose in the model. For example, given the following regression output, we might consider \(X1\) to be important but \(X2\) to be unimportant. (Note: the “e” refers to powers of 10. For example, 2.1e-5 = \(2.1 \times 10^{-5}\).)

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -18.80 6.768 -2.78 7.83e-03
## X1 3.97 0.413 9.61 1.14e-12
## X2 3.25 2.441 1.33 1.90e-01
```

However, this reasoning becomes dangerous in some situations if you don’t understand what’s going on behind the scenes. I will offer an example to illustrate this, followed by a brief explanation.

I’ve created a toy analysis where we have variables \(X1\) and \(X2\) and want to predict whether \(y\) belongs to class “black” or “red”. The data looks like this:
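The post’s actual dataset and plot aren’t reproduced here, but its structure can be sketched: two classes that overlap in each coordinate separately, yet are perfectly separated by a line. The sketch below is a hypothetical reconstruction in Python (the post’s analysis is in R); the boundary \(X2 - X1 = 15\) is chosen to match the fitted coefficients later in the post, where \(-26.54 + 1.77(X2 - X1) = 0\) gives \(X2 - X1 \approx 15\).

```python
# Hypothetical separable data (the post's actual data are not shown):
# red points satisfy x2 - x1 = 20, black points x2 - x1 = 10, so the
# line x2 - x1 = 15 separates them perfectly.
red   = [(x1, x1 + 20) for x1 in range(0, 10)]   # x1 in 0..9,  x2 in 20..29
black = [(x1, x1 + 10) for x1 in range(6, 16)]   # x1 in 6..15, x2 in 16..25

# Neither coordinate alone separates the classes:
assert {x1 for x1, _ in red} & {x1 for x1, _ in black}        # x1 ranges overlap
assert min(x2 for _, x2 in red) < max(x2 for _, x2 in black)  # x2 ranges overlap

# But the combination x2 - x1 separates them with no errors:
assert all(x2 - x1 > 15 for x1, x2 in red)
assert all(x2 - x1 < 15 for x1, x2 in black)
```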

If we try to run this analysis with logistic regression (predicting the probability of red), we obtain the following output.

```
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -26.54 132487 -0.000200 1
## x1 -1.77 10556 -0.000168 1
## x2 1.77 3754 0.000472 1
```

By the common misunderstanding, no variable in this model matters! Yet the model fits the data perfectly. For example, let’s calculate the probability that \(y\) = “red” (\(y=1\)) when \(X1 = 0\) and \(X2 = 30\). We compute:

\[ \frac{\exp(-26.54 - 1.77 \times 0 + 1.77 \times 30)}{1 + \exp(-26.54 - 1.77 \times 0 + 1.77 \times 30)} \approx 1\]

The model predicts that this point is red with probability almost 1! This is easily verified by looking at the plot.
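This arithmetic is easy to check directly. A minimal Python sketch (the post’s analysis is in R) plugging the fitted coefficients into the logistic function:

```python
import math

# Predicted probability that y = "red" at X1 = 0, X2 = 30,
# using the fitted coefficients from the output above.
z = -26.54 + (-1.77) * 0 + 1.77 * 30  # linear predictor, approx. 26.56
p = math.exp(z) / (1 + math.exp(z))   # logistic transform
assert p > 1 - 1e-10                  # probability of "red" is effectively 1
```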

So what’s going on? Individually, neither coordinate \(X1\) nor \(X2\) can help you predict black versus red. If all I tell you is \(X1 = 5\), can you tell me if \(y\) is red or black? Definitely not, and nor could you with only \(X2\) (you might argue that a higher \(X2\) leads you to choose red, but just imagine the plot keeps going). The point is, neither variable tells you anything on its own, but together they tell you the full story.

In general, it’s very important to remember joint tests! We really need to test \(H_0: \beta_1 = \beta_2 = 0\). Without doing so, we cannot and should not conclude that “both variables are useless in this model.”

For those who have taken enough statistics, we can carry out (an asymptotic approximation of) the joint test, a likelihood-ratio test, if we look at the full output:

```
##
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.92e-06 -2.40e-06 -8.49e-08 2.31e-06 3.06e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -26.54 132487.24 0 1
## x1 -1.77 10556.20 0 1
## x2 1.77 3754.16 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 5.5452e+01 on 39 degrees of freedom
## Residual deviance: 2.3261e-10 on 37 degrees of freedom
## AIC: 6
##
## Number of Fisher Scoring iterations: 25
```

The difference between the null deviance and residual deviance is 5.5452e+01 - 2.3261e-10 \(\approx 55.45\). Comparing this statistic to the \(\chi^2\) distribution on 2 degrees of freedom (39 - 37), the p-value is approximately zero. So we reject the null that both coefficients are simultaneously zero. That is, we reject the null that the model doesn’t fit.
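This likelihood-ratio computation can be verified by hand. A short Python sketch (the post’s analysis is in R), using the fact that for 2 degrees of freedom the \(\chi^2\) survival function has the closed form \(e^{-x/2}\):

```python
import math

# Likelihood-ratio (deviance) test of beta1 = beta2 = 0, using the
# deviances reported in the glm output above.
null_dev, resid_dev = 5.5452e+01, 2.3261e-10
lr_stat = null_dev - resid_dev        # approx. 55.45, chi-square on 2 df
# For 2 degrees of freedom, P(chi2 > x) = exp(-x/2) exactly.
p_value = math.exp(-lr_stat / 2)
assert abs(lr_stat - 55.45) < 0.01
assert p_value < 1e-11                # effectively zero: reject H0
```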