What statistics or accomplishments have the Hall of Fame voters deemed to be most important? This question can be answered using a technique called logistic regression. The logistic regression model is a binary response model where the response is classified as either a "success" (in this case, being elected to the Hall of Fame) or a "failure" (not being elected to the Hall of Fame). One or more predictor variables are selected and the resulting model can be used to predict the probability of a success given certain values of the predictor(s).
For the Hall of Fame problem, I tried as many predictor variables as I could think of, excluding statistics that have not been kept for most of the NBA's history (e.g., steals). My player pool consisted of players who had played a minimum of 400 NBA games and had been eligible for at least one Hall of Fame election. After trying numerous models, I settled on a final model with eight predictor variables: height, a last season indicator, NBA points per game, NBA rebounds per game, NBA assists per game, NBA All-Star game selections, log(NBA MVP awards won + 0.5), and NBA championships won.
All of the predictors listed above were significant at the 0.03 level except for log(NBA MVP awards won + 0.5), which had a P-value of 0.1408. However, every NBA MVP award winner who is eligible for the Hall of Fame has been elected, so I thought it was important to keep that term. Other than height, all of the predictors had positive coefficients. ABA statistics, honors, and championships were not important predictors of Hall of Fame status, which is why I only used NBA statistics in my final model. I don't like ignoring the ABA statistics, but that's what the voters have apparently done. Keep in mind that my goal was not to determine who should be in the Hall of Fame, but rather who is likely to be in the Hall of Fame.
The table below gives the estimated coefficient for each of the eight predictors:
Predictor                         Coefficient
height                                -0.1719
last season indicator                  4.4667
NBA points per game                    0.5013
NBA rebounds per game                  0.3397
NBA assists per game                   0.4356
NBA All-Star game selections           0.5744
log(NBA MVP awards won + 0.5)          5.2637
NBA championships won                  1.0975
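One common way to read logistic regression coefficients is to exponentiate them, which gives the factor by which the odds of election are multiplied when a predictor increases by one unit and the others are held fixed. The sketch below applies that standard interpretation to the coefficients in the table; the dictionary name `coefficients` and the variable `odds_ratios` are my own labels, not something from the original analysis.

```python
import math

# Coefficients copied from the table above.
coefficients = {
    "height": -0.1719,
    "last season indicator": 4.4667,
    "NBA points per game": 0.5013,
    "NBA rebounds per game": 0.3397,
    "NBA assists per game": 0.4356,
    "NBA All-Star game selections": 0.5744,
    "log(NBA MVP awards won + 0.5)": 5.2637,
    "NBA championships won": 1.0975,
}

# exp(coefficient) is the multiplicative change in the odds of election
# for a one-unit increase in that predictor, all else held equal.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name:32s} {ratio:10.3f}")
```

Note that for the MVP term a "one-unit increase" is in log(MVP awards + 0.5), not in the raw number of awards, so its odds ratio is not directly comparable to the others.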
The parameter estimates given in the table above can be used to obtain the predicted probability of Hall of Fame election for a particular player. I will go through an example using Jo Jo White. Find the values of the eight predictor variables for White (height is in inches, and log is the natural logarithm), multiply them by the coefficients given in the table above, and find the sum of the products:
height                           -0.1719 * 75             = -12.8925
last season indicator             4.4667 * 0              =   0.0000
NBA points per game               0.5013 * 17.20          =   8.6224
NBA rebounds per game             0.3397 * 4.00           =   1.3588
NBA assists per game              0.4356 * 4.89           =   2.1301
NBA All-Star game selections      0.5744 * 7              =   4.0208
log(NBA MVP awards won + 0.5)     5.2637 * log(0 + 0.5)   =  -3.6485
NBA championships won             1.0975 * 2              =   2.1950
---------------------------------------------------------------------
sum                                                           1.7861
To find the predicted probability of Hall of Fame election, do the following:
P(HoF election) = e**1.7861 / (1 + e**1.7861)
= 0.8564
Based on Jo Jo White's statistics and accomplishments, the model's predicted probability that he has been elected to the Hall of Fame is about 0.86.
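The hand calculation for Jo Jo White can be reproduced in a few lines of Python. This is only a sketch: the dictionary keys and the function name `hof_probability` are names I made up for illustration, and the code uses no intercept term because none is listed in the table.

```python
import math

# Coefficients from the table; the key names are illustrative only.
COEFFICIENTS = {
    "height": -0.1719,
    "last_season_indicator": 4.4667,
    "points_per_game": 0.5013,
    "rebounds_per_game": 0.3397,
    "assists_per_game": 0.4356,
    "all_star_selections": 0.5744,
    "log_mvp_awards": 5.2637,
    "championships": 1.0975,
}

def hof_probability(player):
    """Return (sum of products, predicted probability) for one player."""
    values = dict(player)
    # The model uses the natural log of (MVP awards + 0.5), not the raw count.
    values["log_mvp_awards"] = math.log(values.pop("mvp_awards") + 0.5)
    eta = sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)
    return eta, math.exp(eta) / (1.0 + math.exp(eta))

# Jo Jo White's values, as used in the worked example.
jo_jo_white = {
    "height": 75,
    "last_season_indicator": 0,
    "points_per_game": 17.20,
    "rebounds_per_game": 4.00,
    "assists_per_game": 4.89,
    "all_star_selections": 7,
    "mvp_awards": 0,
    "championships": 2,
}

eta, prob = hof_probability(jo_jo_white)
print(f"sum of products = {eta:.4f}, P(HoF election) = {prob:.4f}")
```

Because the code does not round each product to four decimals before summing, the sum can differ from the hand calculation in the last digit.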
Hall of Fame probabilities are presented for all players with a minimum of 400 NBA games played. Although it can be risky to make predictions for active players, you can think of these probabilities as answering the question "If this player retired today, what is the probability he would be elected to the Hall of Fame?"
The model was built using a pool of 668 players, and the jackknife method was used to assess classification accuracy. To implement this method, each observation is temporarily held out and the selected model is fit to the remaining cases. This yields an estimated HoF probability for the held-out case, which is then compared to the actual result. If a player's predicted probability of election using the jackknife method was greater than or equal to 0.5, I predicted he was in the Hall of Fame.
Of the 668 players, 78 had been elected to the Hall of Fame and 590 had not. Of the 78 players in the Hall of Fame, 61 were correctly classified (78.2%) and 17 were not (21.8%). Of the 590 players not in the Hall of Fame, 579 were correctly classified (98.1%) and 11 were not (1.9%). Overall, 640 of the 668 players (95.8%) were correctly classified by the model using the jackknife method.
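The hold-one-out procedure can be sketched as follows. This is a toy illustration and not the code used for the analysis: it uses a single made-up predictor, ten invented data points, and a deliberately simple gradient-ascent fitter (`fit_logistic`) in place of whatever software fit the real eight-predictor model. All names here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Toy fitter: P(y=1) = sigmoid(a + b*x), fit by plain gradient ascent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def jackknife_probabilities(xs, ys):
    """Hold each case out, refit on the rest, and predict the held-out case."""
    probs = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_logistic(train_x, train_y)
        probs.append(sigmoid(a + b * xs[i]))
    return probs

# Invented data: one predictor (think "All-Star selections") and a 0/1 outcome.
xs = [0, 0, 1, 1, 2, 3, 4, 5, 6, 8]
ys = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

probs = jackknife_probabilities(xs, ys)
# Same rule as in the text: predict "elected" when the probability is >= 0.5.
predicted = [1 if p >= 0.5 else 0 for p in probs]
accuracy = sum(int(pred == y) for pred, y in zip(predicted, ys)) / len(ys)
print(f"jackknife classification accuracy on toy data: {accuracy:.1%}")
```

The key point is that each player's probability comes from a model that never saw that player, so the resulting accuracy is a more honest estimate than one computed from a model fit to all cases.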