Box Plus/Minus (BPM) is a box score-based metric for evaluating basketball players' quality and contribution to the team. It is the latest version of a stat previously called Advanced Statistical Plus/Minus; it is NOT a version of Adjusted Plus/Minus, which is a play-by-play regression metric.
BPM relies on a player's box score information and the team's overall performance to estimate a player's performance relative to league average. BPM is a per-100-possession stat, the same scale as Adjusted Plus/Minus: 0.0 is league average, +5 means the player is 5 points better than an average player over 100 possessions (which is about All-NBA level), -2 is replacement level, and -5 is really bad.
To get a feel for the scale:
Note: BPM does not take into account playing time – it is purely a rate stat. Thus, Durant playing 79% of available minutes with a +8.8 BPM was overall slightly more valuable than LeBron's +8.9 BPM for 73% of the available minutes, and both of them were way more valuable than Chris Paul, who missed quite a few games. That playing time aspect is handled by Value over Replacement Player (VORP), which is discussed below.
BPM was created to intentionally only use information that is available historically, going back to 1973-74. More recently there has been more information gathered, both in box scores and via play-by-play, but in order to create a stat with historical usefulness, those stats have been ignored for BPM. In other words – it is possible to create a better stat than BPM for measuring players, but difficult to make a better one that can also be used historically.
There are limitations on all box score stats – if the box score doesn't measure a particular contribution, a box-score-based metric can only approximate that contribution. This is not a great hindrance on the offensive side, as nearly everything of importance on offense is captured by the box score (only missing things like screen-setting), but on defense the box score is quite limited. Blocks, steals, and rebounds, along with minutes and what little information offensive numbers yield about defensive performance are all that is available. Such critical components of defense as positioning, communication, and the other factors that make Kevin Garnett and Tim Duncan elite on defense can't be captured, unfortunately.
What does this mean? Box Plus/Minus is good at measuring offense and solid overall, but the defensive numbers in particular should not be considered definitive. Look at the defensive values as a guide, but don't hesitate to discount them when a player is well known as a good or bad defender.
In order to create a box-score-based player evaluation metric, some basis for the weights given to each statistic must be chosen. A number of different "box-score" stats have been developed over the years: some of the more intricate and well-known include John Hollinger's PER (further explanation at ESPN), Justin Kubatko's Win Shares here at Basketball Reference, and Dave Berri's Wins Produced.
The different composite statistics use a variety of approaches, from pure empiricism to pure theory and even a mix of the two.
The approach followed with Box Plus/Minus leans toward the empirical side, following the concepts of a "Statistical Plus/Minus (SPM)" metric. Please read Neil Paine's review of SPM at the Basketball Reference Blog to understand the background of this approach, originally pioneered by Dan Rosenbaum as a sideline to his Adjusted Plus/Minus (APM) work.
BPM differs from earlier public SPM work as follows:
For more information on APM and RAPM, the stats that form the basis for the BPM regression, please read this Review of Adjusted Plus/Minus and Stabilization.
The complete (with historical numbers) BPM Excel Spreadsheet has been published online. Feel free to download, play around with it, get confused, and (maybe) follow along.
The basis of the BPM regression is an unweighted, 14-year RAPM sample kindly created for this use by Jeremias Engelmann, the creator of Real Plus/Minus and the public-domain leader in the use and calculation of RAPM. This data set includes data from the 2001-2014 NBA seasons. This version of RAPM does not include any box score information, which some of Jeremias' later work (Real Plus/Minus, aka xRAPM) does. Jeremias used a prior of about -2, weighting that to produce the minimum error in cross-validation within sample. This leads to bias among players with small sample sizes; the average player rating declines with lower minutes played, up until the point where the -2.0 prior starts dominating and bringing the result back toward -2.0.
For this reason, the BPM regression was weighted by player possessions played, with an adjustment applied to account for the bias from the prior. The weighting scheme looks like this:
Weight = Possessions*[Possessions/(Possessions + 5000 + 2000(if only 1 team-season played))]^2
The 5000 and 2000 terms were estimated to account for how strong the effect of the prior is on the RAPM output. Only players who had a final weight of over 250 were used, leading to a total of 961 players being included in the regression. They vary from Dirk Nowitzki with a weight of 134,271 down to Brandon Hunter with a weight of 252.
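The weighting scheme can be sketched in a few lines of Python. This is only an illustration of the formula as written; the function names and the sample possession totals are made up for the example and are not taken from the BPM spreadsheet.

```python
MIN_WEIGHT = 250.0  # players at or below this final weight were excluded


def regression_weight(possessions: float, single_team_season: bool) -> float:
    """Weight = Poss * [Poss / (Poss + 5000 + 2000 if only 1 team-season)]^2."""
    prior_strength = 5000.0 + (2000.0 if single_team_season else 0.0)
    shrink = possessions / (possessions + prior_strength)
    return possessions * shrink ** 2


def included_in_regression(possessions: float, single_team_season: bool) -> bool:
    """True when the final weight clears the 250 cutoff."""
    return regression_weight(possessions, single_team_season) > MIN_WEIGHT


# Hypothetical possession totals, just to show how quickly the weight grows.
for poss in (1_000, 5_000, 50_000):
    print(poss, round(regression_weight(poss, False), 1))
```

Low-possession players are discounted twice: once by the raw possession count and again by the squared shrink factor that stands in for the pull of the prior.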
For input into the model, the Team-by-Team Advanced Statistics tables and the Adjusted Team Ratings data from Basketball Reference were used.
The regression for BPM was constructed upon the individual player advanced statistics (such as True Shooting Percentage (TS%) and Assist Percentage (AST%)), with a linear team adjustment constant added so that the player ratings sum to the observed team rating. As mentioned, the target variable for this regression is the 14-year average RAPM for each of the 961 players. To run the regression, BPMs were calculated for each season, each player's values were averaged across the 14 seasons (weighted by minutes played), and the BPM variable weights were iterated to minimize the weighted squared error vs. the 14-year RAPM values.
Variable selection was done by hand, iterating through various models and interaction terms, gradually weeding out the least significant terms, with the objective of getting the lowest squared error while minimizing nonlinear terms and total number of variables. Many constructs were considered; the final product had very nearly the lowest error of all tested, while having relatively few variables and only logical nonlinear terms, reducing the possibility of overfitting or not being valid for outliers.
The final BPM equation looks like this:
Raw BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
Starting at the beginning: ReMPG, ORB%, DRB%, BLK%, STL%, and AST% are all nice, simple, linear terms. They are all context-sensitive input variables, in that they adjust for the number of opportunities a player had to produce the given stat – ORB%, for instance, is offensive rebounds divided by the number of offensive rebounding opportunities the player had.
There was some debate over whether including minutes per game (MPG) in the regression was reasonable. It was clear from the regression, however, that MPG adds to the accuracy of BPM, and that consideration outweighed the desire to be orthogonal from coaching choices. Essentially, the positive (and sizeable) coefficient for MPG means that coaches can see things that the box score can't and that they, in general, are correct in giving minutes to the players they do, even when they give more minutes than the other box score statistics would suggest. In addition, MPG serves as a proxy for quality of opponents – if a player plays 5 MPG, he's probably playing against scrubs, while a player playing 35 MPG is playing against LeBron. To reduce weird effects from players playing only a couple of games with very high MPG (the last two games of the season for a team resting for the playoffs, for instance), MPG was regressed with 4 games of 0 MPG to create the input term, here called ReMPG.
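A minimal sketch of the ReMPG input, assuming that "regressed with 4 games of 0 MPG" means adding four zero-minute games to the denominator. That reading, the function name, and the sample minute totals are assumptions for illustration.

```python
def regressed_mpg(minutes: float, games: int, padding_games: int = 4) -> float:
    """Minutes per game after padding the sample with zero-minute games."""
    return minutes / (games + padding_games)


# Hypothetical players: a two-game cameo at 40 MPG and a full-season starter.
cameo = regressed_mpg(minutes=80, games=2)       # raw MPG would be 40.0
starter = regressed_mpg(minutes=2870, games=82)  # raw MPG would be 35.0
print(round(cameo, 2), round(starter, 2))
```

The padding barely moves a full-season player but pulls a short stint sharply toward zero, which is the stated purpose of the adjustment.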
The rebounding terms are both linear. Various nonlinear formulations were considered, but proved not significant. Using total rebound percentage (TRB%) alone was considered, and even used for early iterations of the stat, but the final formulation yielded slightly better results with the types of rebounds split – which is consistent with the findings of other box score regressions.
Blocks and steals both logically should be linear terms. Steals should be more valuable, and indeed the regression found steals to be much more valuable.
Assist% is a linear term, but assists also figure in both of the interaction terms in the regression, so the specific coefficient for this linear term has no meaning by itself.
Turnover percentage times usage percentage yields turnovers per 100 team possessions, exactly the same scale as the steal percentage term. (Turnover percentage is defined as turnovers per possession used by the player.)
Those terms are not unusual. The scoring term, though, is rather unique.
Essentially, the scoring term is a [Usage * (points produced per possession – threshold points produced per possession)] term. If a player's efficiency is above a certain threshold value, any usage helps the team; below that value hurts the team. Usage is here multiplied by (1-TO%) to remove the usage that is just turnovers. This usage term is therefore just the player's shooting possessions divided by total number of team possessions.
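To make the units concrete, here is a small sketch of how usage splits into turnovers and shooting possessions. It assumes USG% is on a 0-100 scale and TO% is a fraction between 0 and 1, matching the variable formats listed with the coefficients; the function names and sample values are hypothetical.

```python
def turnovers_per_100_team_poss(usg_pct: float, tov_rate: float) -> float:
    """USG% (0-100) times TO% (0-1): turnovers per 100 team possessions."""
    return usg_pct * tov_rate


def shooting_usage(usg_pct: float, tov_rate: float) -> float:
    """Usage left after removing turnovers: shooting possessions per 100 team possessions."""
    return usg_pct * (1.0 - tov_rate)


# Hypothetical player: uses 25% of team possessions, 12% of which end in turnovers.
usg, tov = 25.0, 0.12
print(round(turnovers_per_100_team_poss(usg, tov), 2))
print(round(shooting_usage(usg, tov), 2))
```

The two pieces always add back up to USG%, so the turnover term and the scoring term never count the same possession twice.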
Finally, a positive interaction term between rebounding and assists is included. This can be interpreted a number of ways – athleticism interacting with basketball awareness, size interacting with basketball skills, etc. This term was highly significant, and helped the overall fit of the regression quite a bit. Using the square root of the interaction both makes sense theoretically (maintaining a denominator of opportunities or possessions) and empirically (it is more significant and helps the overall fit of the regression more).
Each player's raw BPM is calculated through the above equation and is then adjusted at the team level.
The team's efficiency differential, adjusted for strength of schedule, is known. It is, by definition, the true sum of all players' contributions. BPM is adjusted such that the minute-weighted sum of individual players' BPM ratings on a team equals the team's rating times 120%. The team adjustment is simply a constant added to each player's raw BPM and is the same for every player on the team. The constant does 3 things: it adds the intercept to the BPM equation, it adjusts roughly at the team level for things that cannot be captured by the box score (primarily defense), and it also adjusts for strength of schedule. The formula for this adjustment looks like this:
BPM_Team_Adjustment = [Team_Rating*120% - Σ(Player_%Min*Player_RawBPM)]/5
So if a team has a rating of +8, and the player BPM terms sum to +7, the team adjustment, applied equally to all players on the team, would bump their rating by [(8)*120%-7]/5=+0.52.
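The same arithmetic as a short Python sketch. It assumes each player's share of minutes is expressed as a fraction of available minutes, so the shares sum to about 5.0 across a roster; the five-player roster is invented so that the weighted raw BPM sum comes to +7, as in the example.

```python
def team_adjustment(team_rating: float, players, lead_factor: float = 1.2) -> float:
    """Constant added to every player's raw BPM on a team.

    players: iterable of (pct_min, raw_bpm) pairs, where pct_min is the
    fraction of available minutes played (assumed to sum to ~5.0 per team).
    """
    weighted_raw_sum = sum(pct_min * raw_bpm for pct_min, raw_bpm in players)
    return (team_rating * lead_factor - weighted_raw_sum) / 5.0


# Hypothetical roster whose minute-weighted raw BPM sums to +7.
roster = [(1.0, 3.0), (1.0, 2.0), (1.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
adj = team_adjustment(team_rating=8.0, players=roster)
print(round(adj, 2))  # 0.52

# After the adjustment, the weighted sum matches 120% of the team rating.
adjusted_sum = sum(pct * (raw + adj) for pct, raw in roster)
print(round(adjusted_sum, 2))  # 9.6
```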
Where did the 120% come from? Jeremias Engelmann has done extensive work on how lineups behave, and he discovered that lineups that are ahead in a game play worse, while lineups that are behind play better – even if the exact same players are playing. Perhaps it's an effort thing? While the source is unclear, the effect is both significant and linear. He incorporates that effect into his RPM model, adjusting to a neutral environment, and BPM does the same. At the team level, a team that is always ahead is actually better than its final results indicate by about 20%, and a team that is usually behind is worse by 20%. Therefore, the team adjustment accounts for this effect.
What the 120% does, effectively, is to translate players from the team they are on onto a league average team, a team that is ahead and behind equal amounts during the season. Because of this adjustment, though, team-level analysis will need to divide that 120% back out – if a particular team would sum to +15 via adding up the BPM values, we would expect that team to actually perform at a +12.5 level – since they would usually be ahead in games. At the lineup level things are more tricky, since it is hard to predict how far ahead or behind a lineup typically is in normal game situations.
Note: For regular seasons before adjusted team offensive & defensive ratings are available (pre-1986), the team's raw efficiency differential is adjusted for strength of schedule using the Simple Rating System's SOS term. In the postseason, the team adjustment just ensures that 5 times the weighted average of individual BPM scores adds up to the team's raw efficiency differential, without an SOS adjustment (this may be tweaked in the future).
Thus you have the final BPM – a box score regression, coupled with a team adjustment. Such a regression will still be inaccurate in attributing value for things not measured by the box score, but unlike APM or RAPM, it is not plagued with multicollinearity/instability.
The actual coefficients for Box Plus/Minus are as follows:
Raw BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
Coeff. | Term | BPM Value | Variable Format |
---|---|---|---|
a | Regr. MPG | 0.123391 | 48.0 |
b | ORB% | 0.119597 | 100.0 |
c | DRB% | -0.151287 | 100.0 |
d | STL% | 1.255644 | 100.0 |
e | BLK% | 0.531838 | 100.0 |
f | AST% | -0.305868 | 100.0 |
g | TO%*USG% | 0.921292 | .000*100.0 |
h | Scoring | 0.711217 | |
 | USG% | | 100.0 |
 | TO% | | .000 |
 | TS% & TmTS% | | .000 |
i | AST Interaction | 0.017022 | 100.0 |
j | 3PAr Interaction | 0.297639 | .000 |
k | Threshold Scoring | 0.213485 | |
l | sqrt(AST%*TRB%) | 0.725930 | 100.0*100.0 |
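As an illustration, the raw BPM equation and the coefficients in this table can be put together as follows. This is a sketch, not the spreadsheet's implementation: the `BoxLine` container and its field names are hypothetical, inputs are assumed to follow the "Variable Format" column (percentages on a 0-100 scale, TO%, TS% and 3PAr as decimals), and the team adjustment is not applied.

```python
import math
from dataclasses import dataclass

# Coefficients a..l copied from the BPM coefficient table.
BPM_COEF = {
    "a": 0.123391, "b": 0.119597, "c": -0.151287, "d": 1.255644,
    "e": 0.531838, "f": -0.305868, "g": 0.921292, "h": 0.711217,
    "i": 0.017022, "j": 0.297639, "k": 0.213485, "l": 0.725930,
}


@dataclass
class BoxLine:
    """Hypothetical container for one player's inputs."""
    re_mpg: float = 0.0   # regressed minutes per game
    orb: float = 0.0      # ORB%, 0-100
    drb: float = 0.0      # DRB%, 0-100
    trb: float = 0.0      # TRB%, 0-100
    stl: float = 0.0      # STL%, 0-100
    blk: float = 0.0      # BLK%, 0-100
    ast: float = 0.0      # AST%, 0-100
    usg: float = 0.0      # USG%, 0-100
    tov: float = 0.0      # TO%, decimal
    ts: float = 0.0       # TS%, decimal
    tm_ts: float = 0.0    # team TS%, decimal
    par3: float = 0.0     # 3PAr, decimal
    lg_par3: float = 0.0  # league 3PAr, decimal


def raw_bpm(p: BoxLine, coef=BPM_COEF) -> float:
    """Raw BPM before the team adjustment, term by term as in the equation."""
    linear = (coef["a"] * p.re_mpg + coef["b"] * p.orb + coef["c"] * p.drb
              + coef["d"] * p.stl + coef["e"] * p.blk + coef["f"] * p.ast)
    turnovers = coef["g"] * p.usg * p.tov
    efficiency = (2.0 * (p.ts - p.tm_ts) + coef["i"] * p.ast
                  + coef["j"] * (p.par3 - p.lg_par3) - coef["k"])
    scoring = coef["h"] * p.usg * (1.0 - p.tov) * efficiency
    interaction = coef["l"] * math.sqrt(p.ast * p.trb)
    return linear - turnovers + scoring + interaction
```

Passing a different coefficient dictionary to `raw_bpm` reuses the same structure for any other coefficient set that shares this equation.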
The 14-year RAPM dataset also includes offensive and defensive RAPM estimates. A second regression was run to estimate offensive and defensive BPM based on those values. This regression split the total BPM value calculated above into offensive and defensive BPMs, while still requiring the total to equal basic BPM. It was tuned to minimize error on both the offensive and defensive RAPM values simultaneously.
Here are the tables of coefficients:
Raw O/D BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
Coeff. | Term | O/D BPM Value | Variable Format |
---|---|---|---|
a | Regr. MPG | 0.064448 | 48.0 |
b | ORB% | 0.211125 | 100.0 |
c | DRB% | -0.107545 | 100.0 |
d | STL% | 0.346513 | 100.0 |
e | BLK% | -0.052476 | 100.0 |
f | AST% | -0.041787 | 100.0 |
g | TO%*USG% | 0.932965 | .000*100.0 |
h | Scoring | 0.687359 | |
 | USG% | | 100.0 |
 | TO% | | .000 |
 | TS% & TmTS% | | .000 |
i | AST Interaction | 0.007952 | 100.0 |
j | 3PAr Interaction | 0.374706 | .000 |
k | Threshold Scoring | -0.181891 | |
l | sqrt(AST%*TRB%) | 0.239862 | 100.0*100.0 |
A team adjustment is added to the results of this regression to force the team sum to equal the adjusted team offensive rating (above or below league average) similar to how the team adjustment is done for overall BPM.
Defensive BPM is simply overall BPM minus offensive BPM. The offensive BPM regression was tuned to minimize weighted squared error on both offensive and defensive RAPM simultaneously.
R^{2} is a measure of how well a regression or model fits the data it is built upon. A value of 0 means that the model doesn't explain the data at all, while a value of 1.0 means that the model explains the variation in the data perfectly. The weighted R^{2} results for the BPM regressions are:
Weighted R^{2} onto 14-year RAPM:
Stat | RAPM | O-RAPM | D-RAPM |
---|---|---|---|
Box Plus/Minus | 0.661 | 0.460 | 0.194 |
Off. Box Plus/Minus | – | 0.791 | – |
Def. Box Plus/Minus | – | – | 0.620 |
RAPM, even a 14-year dataset, has some error associated with it, so even a theoretical "true" measure of performance would still show an R^{2} below 1.0.
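For readers who want to reproduce this kind of figure, one common definition of weighted R^{2} is sketched here. The text does not spell out the exact formula used for BPM, so treat this as an assumption: one minus the weighted residual sum of squares over the weighted total sum of squares around the weighted mean. The function name is hypothetical.

```python
def weighted_r2(actual, predicted, weights) -> float:
    """1 - (weighted residual SS) / (weighted total SS about the weighted mean)."""
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, actual)) / total_weight
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, actual, predicted))
    ss_tot = sum(w * (y - mean) ** 2 for w, y in zip(weights, actual))
    return 1.0 - ss_res / ss_tot


# Hypothetical RAPM targets, model estimates, and player weights.
rapm = [4.0, 1.0, -2.0, -3.0]
estimate = [3.0, 1.5, -1.0, -2.5]
weight = [100.0, 50.0, 20.0, 5.0]
print(round(weighted_r2(rapm, estimate, weight), 3))
```

With this definition, heavily weighted players dominate both the error and the variance being explained, so a miss on a high-possession star costs far more than a miss on a fringe player.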
Perhaps the best way to get a feel for the accuracy of a model is to look at it graphically. The results of all three regressions are presented here, with BPM on one axis and the RAPM basis on the other.
Some things to note on those charts:
Box Plus/Minus can be calculated back to 1974; before that time many stats weren't tabulated. Looking at historical results can help gain confidence in the regression's accuracy.
The top 10 seasons since 1974, by Box Plus/Minus, are heavily populated by Michael Jordan and LeBron James – which certainly passes the smell test:
Rk | Year | Tm | Player | Age | MP | BPM |
---|---|---|---|---|---|---|
1 | 2009 | CLE | LeBron James | 24 | 3054 | 13.0 |
2 | 1989 | CHI | Michael Jordan | 25 | 3255 | 12.6 |
3 | 2010 | CLE | LeBron James | 25 | 2966 | 12.5 |
4 | 1988 | CHI | Michael Jordan | 24 | 3311 | 12.2 |
5 | 2013 | MIA | LeBron James | 28 | 2877 | 11.6 |
6 | 2008 | CLE | LeBron James | 23 | 3027 | 11.2 |
7 | 2009 | NOH | Chris Paul | 23 | 3002 | 11.2 |
8 | 2012 | MIA | LeBron James | 27 | 2326 | 11.0 |
9 | 1994 | SAS | David Robinson | 28 | 3241 | 10.9 |
10 | 1991 | CHI | Michael Jordan | 27 | 3034 | 10.8 |
The Offensive Box Plus/Minus list looks similar – there is a much wider distribution on offense than on defense, which is realistic but also exaggerated by the lack of defensive box score statistics.
Rk | Year | Tm | Player | Age | MP | OBPM |
---|---|---|---|---|---|---|
1 | 1989 | CHI | Michael Jordan | 25 | 3255 | 9.8 |
2 | 1988 | CHI | Michael Jordan | 24 | 3311 | 9.8 |
3 | 2003 | ORL | Tracy McGrady | 23 | 2954 | 9.8 |
4 | 1990 | CHI | Michael Jordan | 26 | 3197 | 9.7 |
5 | 2010 | CLE | LeBron James | 25 | 2966 | 9.7 |
6 | 2009 | CLE | LeBron James | 24 | 3054 | 9.4 |
7 | 2013 | MIA | LeBron James | 28 | 2877 | 9.2 |
8 | 2009 | NOH | Chris Paul | 23 | 3002 | 9.1 |
9 | 2008 | CLE | LeBron James | 23 | 3027 | 9.0 |
10 | 1991 | PHI | Charles Barkley | 27 | 2498 | 9.0 |
The defensive list is rather repetitive – apparently Ben Wallace was really good at playing defense? Defense is only partially captured by the box score, so elite defenders based on position and communication, like Kevin Garnett and Tim Duncan, will not be properly represented. The regression mathematically accounts for that, pulling all of the estimates closer to average.
Rk | Year | Tm | Player | Age | MP | DBPM |
---|---|---|---|---|---|---|
1 | 2003 | DET | Ben Wallace | 28 | 2873 | 7.0 |
2 | 2007 | CHI | Ben Wallace | 32 | 2697 | 6.8 |
3 | 2004 | DET | Ben Wallace | 29 | 3050 | 6.7 |
4 | 2007 | DEN | Marcus Camby | 32 | 2369 | 6.7 |
5 | 2008 | DEN | Marcus Camby | 33 | 2758 | 6.6 |
6 | 2002 | DET | Ben Wallace | 27 | 2921 | 6.5 |
7 | 1985 | UTA | Mark Eaton | 28 | 2813 | 6.5 |
8 | 1986 | WSB | Manute Bol | 23 | 2090 | 6.4 |
9 | 2006 | DET | Ben Wallace | 31 | 2890 | 6.4 |
10 | 1992 | SAS | David Robinson | 26 | 2564 | 6.2 |
Alex at Sports Skeptic compared ASPM (BPM's predecessor) with a number of box score metrics on his blog; it is a good read. BBR's Neil Paine also compared ASPM and other individual metrics in their ability to predict future team performance at APBRmetrics. These tests include the best measure of a metric's validity: its out-of-sample predictive ability. Neil also recently tested BPM in that way; the results are not currently published, but BPM outperformed every other box-score stat except FiveThirtyEight.com's proprietary SPM, which it equaled and which is not publicly available.
Here is how BPM and some other common box score stats compare to the same 14 year average RAPM. The fact that this comparison is in sample for BPM inflates its R^{2} somewhat, but since this regression has relatively few degrees of freedom vs. the sample size, the inflation should not be too large. Note: in the previous exposition of this stat, Wins Produced was included in this table, and had an R^{2} on RAPM significantly worse than PER.
Weighted R^{2} onto 14-year RAPM:
Stat | RAPM | O-RAPM | D-RAPM |
---|---|---|---|
Regr. Minutes/Game | 0.280 | 0.379 | 0.008 |
PER | 0.388 | 0.437 | 0.032 |
Win Shares/48 | 0.525 | 0.385 | 0.140 |
Off. Win Shares/48 | – | 0.571 | – |
Def. Win Shares/48 | – | – | 0.565 |
Box Plus/Minus | 0.661 | 0.460 | 0.194 |
Off. Box Plus/Minus | – | 0.791 | – |
Def. Box Plus/Minus | – | – | 0.620 |
Value over Replacement Player (VORP) converts the BPM rate into an estimate of each player's overall contribution to the team, measured vs. what a theoretical "replacement player" would provide, where the "replacement player" is defined as a player on minimum salary or not a normal member of a team's rotation. A long and comprehensive discussion on defining this level for the NBA was had at Tom Tango's blog, and is worth a read. (Tom Tango is a baseball sabermetrics expert, and one of the originators of the replacement level framework and the Wins Above Replacement methodology common now in baseball.)
The conclusion was to establish -2.0 as replacement level for the NBA, measured in terms of points above or below average per 100 possessions. Unlike in major league baseball, players below replacement level do frequently play, primarily for development purposes. Rookies are frequently below replacement level, but there are no formal minor leagues to act as a development system like major league baseball has, so they end up getting playing time in the NBA in order to develop. Also, some teams tank, and trade for Byron Mullens to help that effort.
If one were to define a "replacement level" for offense and defense, it would be -1.7 on offense and -0.3 on defense – though the concept of replacement level for components doesn't necessarily make sense. Almost all point guards would be well below the -0.3 level on defense, since a guard's role is primarily to focus on offense. The reverse is true of post players. It's an interesting exercise, but ultimately OVORP and DVORP aren't that useful, and will not be displayed here.
So, to calculate VORP, the formula is simply: [BPM – (-2.0)] * (% of minutes played)*(team games/82). This yields the number of points the player is producing over a replacement player, per 100 TEAM possessions over an entire season.
As an example: In 2014, LeBron had a BPM of +8.9, and he played 73% of Miami's minutes. His VORP, then, would be [8.9 – (-2.0)] * 0.73 * 82/82 = 8.0.
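The VORP formula and the LeBron example translate directly into code. This is a minimal sketch; the function and constant names are chosen for the example.

```python
REPLACEMENT_LEVEL = -2.0  # points per 100 possessions relative to league average


def vorp(bpm: float, pct_minutes: float, team_games: int, season_games: int = 82) -> float:
    """[BPM - (-2.0)] * (% of minutes played) * (team games / 82)."""
    return (bpm - REPLACEMENT_LEVEL) * pct_minutes * team_games / season_games


# LeBron James, 2014: +8.9 BPM while playing 73% of Miami's minutes.
lebron_2014 = vorp(bpm=8.9, pct_minutes=0.73, team_games=82)
print(round(lebron_2014, 1))  # 8.0
```

Because `team_games` is an explicit input, the same function covers partial seasons and playoff runs, where fewer games scale the total down proportionally.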
The beauty of VORP is that like WAR in baseball, it should track linearly with salary. A player with a VORP of 4.0 is worth, on the market, about twice what a player of VORP 2.0 is worth. Sometimes good players play few minutes for reasons outside their control, and would be worth more because they should be getting more minutes. Still, for a crude estimate, VORP is valuable. It does measure fairly accurately what a player did produce in terms of value for a given team.
To convert VORP to an estimate of wins over replacement, simply multiply by 2.7. This translates a player's point differential approximately into wins, using the conversion rate near league-average rather than that in the diminishing returns area of the Pythagorean formula. By this methodology, Michael Jordan in 1989 was worth about 32 wins. (In reality, he would quickly push the team into the diminishing returns region of the points-to-wins conversion.)
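The conversion is a single multiplication, sketched here with Michael Jordan's 1989 VORP of 12.0 from the top-10 VORP table. The function and constant names are illustrative only.

```python
WINS_PER_VORP_POINT = 2.7  # conversion rate near league average, per the text


def wins_over_replacement(vorp_value: float) -> float:
    """Approximate wins over replacement from a season VORP."""
    return vorp_value * WINS_PER_VORP_POINT


jordan_1989 = wins_over_replacement(12.0)
print(round(jordan_1989, 1))  # 32.4
```

The linear rate is only an approximation near league average; it does not model the diminishing returns the text mentions for very strong teams.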
Here's a look at the top 10 seasons of all time by VORP. This is the best measure of actual value contributed to the team. Notice that Michael Jordan played a few more minutes per season than LeBron, thus increasing his overall value.
Rk | Year | Tm | Player | Age | MP | VORP |
---|---|---|---|---|---|---|
1 | 1989 | CHI | Michael Jordan | 25 | 3255 | 12.0 |
2 | 1988 | CHI | Michael Jordan | 24 | 3311 | 11.8 |
3 | 2009 | CLE | LeBron James | 24 | 3054 | 11.6 |
4 | 2010 | CLE | LeBron James | 25 | 2966 | 10.9 |
5 | 1994 | SAS | David Robinson | 28 | 3241 | 10.6 |
6 | 1990 | CHI | Michael Jordan | 26 | 3197 | 10.1 |
7 | 2008 | CLE | LeBron James | 23 | 3027 | 10.1 |
8 | 2009 | NOH | Chris Paul | 23 | 3002 | 10.0 |
9 | 2013 | MIA | LeBron James | 28 | 2877 | 9.8 |
10 | 1991 | CHI | Michael Jordan | 27 | 3034 | 9.8 |
Box Plus/Minus for the playoffs is calculated the same way as BPM for the regular season, with a few additions to derive an appropriate team efficiency differential.
The playoff team efficiency, which is used in the team adjustment portion of the BPM calculation, is derived as follows:
As an example, in 2013-2014 San Antonio won the title. They went through 7 games of Dallas (derived strength +4.1), 5 games of Portland (+6.5), 6 games of Oklahoma City (+10.5), and 5 games of Miami (+6.9), so their average strength of schedule was +6.9. Their raw efficiency differential was +10.0 in the playoffs, so their overall adjusted efficiency differential was a spectacular +16.9. That is the same scale as regular season efficiency differential–the Spurs were really dominant.
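The Spurs arithmetic can be checked with a short sketch. It assumes, based only on this worked example, that playoff strength of schedule is the games-weighted average of opponents' derived strength and that it is added to the raw efficiency differential; the function name is hypothetical.

```python
def playoff_strength_of_schedule(series) -> float:
    """Games-weighted average of opponent strength over all playoff series."""
    total_games = sum(games for _, games, _ in series)
    return sum(games * strength for _, games, strength in series) / total_games


# 2014 Spurs: (opponent, games played, opponent derived strength) from the example.
spurs_2014 = [
    ("Dallas", 7, 4.1),
    ("Portland", 5, 6.5),
    ("Oklahoma City", 6, 10.5),
    ("Miami", 5, 6.9),
]
sos = playoff_strength_of_schedule(spurs_2014)
raw_differential = 10.0
adjusted_differential = raw_differential + sos
print(round(sos, 1), round(adjusted_differential, 1))  # 6.9 16.9
```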
(This is reminiscent of Hollinger's playoff ratings. The 2001 Lakers had a playoff adjusted efficiency differential of +20.4 by this method.)
The top 10 playoff BPMs, minimum 500 minutes played in the playoffs:
Rk | Year | Tm | Player | BPM |
---|---|---|---|---|
1 | 2009 | CLE | LeBron James | 18.2 |
2 | 1977 | LAL | Kareem Abdul-Jabbar | 14.8 |
3 | 1990 | CHI | Michael Jordan | 14.3 |
4 | 1991 | CHI | Michael Jordan | 13.8 |
5 | 1989 | CHI | Michael Jordan | 12.8 |
6 | 1976 | NYA | Julius Erving | 12.5 |
7 | 2008 | NOH | Chris Paul | 12.2 |
8 | 1991 | PHI | Charles Barkley | 11.8 |
9 | 1975 | INA | George McGinnis | 11.6 |
10 | 2003 | SAS | Tim Duncan | 11.6 |
LeBron was transcendent in 2009.
Playoff VORP is calculated the same way as regular season VORP, but based on the playoff BPM values calculated above. Be aware–players on teams that played more games will have higher VORP values. A team that swept several rounds may play several games fewer than a team that was taken to seven games a couple of times.
Box Plus/Minus for college basketball is calculated using the same coefficients derived for the NBA. While it may be argued that college basketball is somewhat different, there is no easy way to derive BPM coefficients specifically for college basketball, so the NBA coefficients will have to suffice. A rebound is still a rebound... The coefficient that could be the most questionable would be the MPG coefficient, because it is unclear if minutes distribution at the college level is based upon the same criteria as at the pro level, and the length and pace of games is different. Until further information becomes available, all coefficients have been used as-is.
VORP, on the other hand, does not make sense for college basketball. VORP is derived based on salaries, and in a consistent market, and is primarily useful in relation to evaluating salaries. In college, on the other hand, every school and conference has widely disparate situations, and since there are no salaries, there is neither a rational method nor strong need for deriving or using VORP.
Therefore, BPM will be shown alone for college basketball.
Data for college basketball is currently available only back to the 2011 season. Here are the top 10 seasons in the database, minimum 500 minutes played.
Rk | Year | Team | Player | BPM | OBPM | DBPM |
---|---|---|---|---|---|---|
1 | 2012 | Kentucky | Anthony Davis | 18.7 | 7.8 | 10.9 |
2 | 2013 | Indiana | Victor Oladipo | 17.0 | 9.7 | 7.3 |
3 | 2013 | Louisville | Gorgui Dieng | 15.0 | 4.0 | 11.0 |
4 | 2014 | Kansas | Joel Embiid | 14.9 | 4.7 | 10.2 |
5 | 2014 | Kentucky | Willie Cauley-Stein | 14.9 | 4.1 | 10.8 |
6 | 2012 | Marquette | Jae Crowder | 14.7 | 8.8 | 5.9 |
7 | 2013 | Kentucky | Nerlens Noel | 14.6 | 2.9 | 11.7 |
8 | 2012 | Kansas | Jeff Withey | 13.8 | 2.7 | 11.1 |
9 | 2011 | Michigan State | Draymond Green | 13.6 | 6.3 | 7.3 |
10 | 2013 | Gonzaga | Mike Hart | 13.5 | 7.9 | 5.6 |
A few notes on BPM for college basketball:
This was a bug fix update to address a couple of issues. The most significant change is to correct the weighting scheme used in the regression. The original BPM regression used a sqrt(Poss) weighting, which was incorrect. This version corrects the scheme to a number of possessions weighting system, with an additional correction to account for the effect of the prior on the RAPM values.
This fix generally increases the spread in the BPM values slightly, with the best players getting a bump in the +0.5 range, while below average players saw a marginal reduction in their BPM values.
This fix also adjusted the values slightly, causing efficiency and blocks to be valued somewhat more highly and turnovers to be somewhat worse. Centers and other players with high shooting efficiency and blocks were helped the most by those tweaks (such as Anthony Davis), and players with high usage and relatively low efficiency (like Russell Westbrook) were hurt. The overall BPM gap between those helped the most and hurt the most was about 1.0. For example: 2014 Russell Westbrook saw his BPM drop by 0.2, but 2014 Anthony Davis saw his BPM increase by 0.8.
The other bug fixes were minor issues with the regression logic deriving the Offense/Defense split coefficients. There were two small bugs in the code that do not affect the overall BPM values, but will tweak the OBPM/DBPM split slightly. The effects of these corrections are small.
Comparison of the coefficients:
BPM:
Coeff. | Term | BPM 1.1 Value | Original Value | Difference | Percentage |
---|---|---|---|---|---|
a | Regr. MPG | 0.123391 | 0.120051 | 0.0033 | 2.8% |
b | ORB% | 0.119597 | 0.137600 | -0.0180 | -13.1% |
c | DRB% | -0.151287 | -0.151938 | 0.0007 | -0.4% |
d | STL% | 1.255644 | 1.144182 | 0.1115 | 9.7% |
e | BLK% | 0.531838 | 0.449468 | 0.0824 | 18.3% |
f | AST% | -0.305868 | -0.310548 | 0.0047 | -1.5% |
g | TO%*USG% | 0.921292 | 0.723784 | 0.1975 | 27.3% |
h | Scoring | 0.711217 | 0.610605 | 0.1006 | 16.5% |
i | AST Interaction | 0.017022 | 0.019936 | -0.0029 | -14.6% |
j | 3PAr Interaction | 0.297639 | 0.380536 | -0.0829 | -21.8% |
k | Threshold Scoring | 0.213485 | 0.269667 | -0.0562 | |
l | sqrt(AST%*TRB%) | 0.725930 | 0.691501 | 0.0344 | 5.0% |
OBPM/DBPM Split:
Coeff. | Term | O/D BPM 1.1 Value | Original Value | Difference | Percentage |
---|---|---|---|---|---|
a | Regr. MPG | 0.064448 | 0.059270 | 0.005178 | 8.7% |
b | ORB% | 0.211125 | 0.197487 | 0.013638 | 6.9% |
c | DRB% | -0.107545 | -0.102144 | -0.005401 | 5.3% |
d | STL% | 0.346513 | 0.322082 | 0.024431 | 7.6% |
e | BLK% | -0.052476 | -0.062684 | 0.010208 | -16.3% |
f | AST% | -0.041787 | -0.088460 | 0.046673 | -52.8% |
g | TO%*USG% | 0.932965 | 0.798831 | 0.134134 | 16.8% |
h | Scoring | 0.687359 | 0.606303 | 0.081056 | 13.4% |
i | AST Interaction | 0.007952 | 0.011822 | -0.003870 | -32.7% |
j | 3PAr Interaction | 0.374706 | 0.430225 | -0.055519 | -12.9% |
k | Threshold Scoring | -0.181891 | -0.126574 | -0.055317 | |
l | sqrt(AST%*TRB%) | 0.239862 | 0.262148 | -0.022286 | -8.5% |
This update also includes a revision to how VORP is handled for partial seasons. Previously, partial seasons would show the player's production extrapolated to the full 82 game season–VORP was behaving more as a rate stat. That has now been changed, so that VORP will now act as a counting stat over the season, with each 1 point being equal to 1 point of season-end team point differential (per 82 games).
This was critical also for handling playoff VORP, which has been put on the same scale: [BPM – (-2.0)] * (% of minutes played)*(team games/82).
This update also includes new sections on playoff BPM and BPM for NCAA Men's College Basketball.
A multitude of variables were evaluated via Excel's regression tools to find the best design for BPM. Here is the best regression: highest adjusted R^{2} with every term clearly significant. This is how the variables were finally chosen. Note, this is without the team adjustment – just raw box score stats vs. RAPM.
SUMMARY OUTPUT | ||||||
Regression Statistics | ||||||
Multiple R | 0.7297 | |||||
R Square | 0.5325 | |||||
Adjusted R Square | 0.5261 | |||||
Standard Error | 1.9504 | |||||
Observations | 894 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 12 | 3816.9 | 318.1 | 83.6 | 2.19E-136 | |
Residual | 881 | 3351.4 | 3.804 | |||
Total | 893 | 7168.3 | ||||
Coefficients | Std. Err. | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | -5.6659 | 0.7287 | -7.7754 | 2.096E-14 | -7.0961 | -4.2358 |
AST% | -0.2642 | 0.0468 | -5.6484 | 2.185E-08 | -0.3560 | -0.1724 |
ORB% | 0.1366 | 0.0432 | 3.1600 | 1.632E-03 | 0.0518 | 0.2214 |
DRB% | -0.1278 | 0.0346 | -3.6912 | 2.369E-04 | -0.1957 | -0.0598 |
MPG | 0.1006 | 0.0133 | 7.5401 | 1.168E-13 | 0.0744 | 0.1267 |
STL% | 1.1382 | 0.1459 | 7.8005 | 1.741E-14 | 0.8518 | 1.4246 |
BLK% | 0.3936 | 0.0686 | 5.7349 | 1.341E-08 | 0.2589 | 0.5283 |
Shot% | -0.9725 | 0.0734 | -13.2545 | 1.073E-36 | -1.1165 | -0.8285 |
TS%xShot% | 1.4735 | 0.1225 | 12.0330 | 5.518E-31 | 1.2332 | 1.7138 |
TOV%xUSG% | -0.0075 | 0.0014 | -5.2554 | 1.854E-07 | -0.0103 | -0.0047 |
AST%xShot% | 0.0109 | 0.0021 | 5.2075 | 2.384E-07 | 0.0068 | 0.0150 |
sqrt(TRBxAST) | 0.6262 | 0.0769 | 8.1457 | 1.285E-15 | 0.4753 | 0.7771 |
3PAr*Shot% | 0.2281 | 0.0369 | 6.1795 | 9.819E-10 | 0.1556 | 0.3005 |
Other variables that were investigated but ultimately not selected:
Note that for the NBA from 1974-77, player turnovers were estimated according to the following process created by Justin Kubatko for use in Win Shares (here illustrated using Kareem Abdul-Jabbar of the 1976-77 Los Angeles Lakers as an example):
Estimated turnovers = -0.0005075172 * (minutes played) * (player age) - 0.0873982755 * (field goals) + 0.0925506598 * (field goal attempts) + 0.1566322510 * (free throw attempts) + 0.0449241773 * (total rebounds) + 0.2321637159 * (assists) + 0.2040169400 * (personal fouls)
Note that if this number is less than zero, then it should be rounded up to zero. Plugging Abdul-Jabbar's statistics into the formula above we get an estimate of 280.316 turnovers.
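The turnover estimate can be written as a small function. The coefficients come from the formula in the text; the 1976-77 season totals for Abdul-Jabbar are not given here and are supplied as an assumption, though they do reproduce the 280.316 figure.

```python
def estimated_turnovers(minutes, age, fg, fga, fta, trb, ast, pf) -> float:
    """Estimated player turnovers for seasons without recorded turnovers."""
    estimate = (
        -0.0005075172 * minutes * age
        - 0.0873982755 * fg
        + 0.0925506598 * fga
        + 0.1566322510 * fta
        + 0.0449241773 * trb
        + 0.2321637159 * ast
        + 0.2040169400 * pf
    )
    # Negative estimates are rounded up to zero.
    return max(0.0, estimate)


# Kareem Abdul-Jabbar, 1976-77 Lakers (season totals assumed, not from the text).
kareem = estimated_turnovers(
    minutes=3016, age=29, fg=888, fga=1533, fta=536, trb=1090, ast=319, pf=262
)
print(round(kareem, 3))  # 280.316
```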
The APBRmetrics forum people have been instrumental in the development of this stat over the years since the idea was first posted as Advanced Statistical Plus/Minus back in 2009. (Also see the recovered thread.) Neil Paine in particular has helped often over the years and was an early proponent of ASPM in his writing at ESPN. Thanks to Jeremias Engelmann, who provided the RAPM basis for ASPM and now BPM. Thanks also to editors Michael Myers, Curtis Buck, Kevin Ferrigan, and Andrew Johnson, and to David Corby and Neil Paine who did the database wrangling to make BPM a reality on Basketball Reference.
If you have any comments or questions about the BPM methodology, please contact Daniel directly or use the Basketball-Reference feedback form.