About Box Plus/Minus (BPM)
What is Box Plus/Minus?
Box Plus/Minus (BPM) is a box score-based metric for evaluating basketball players' quality and contribution to the team. It is the latest version of a stat previously called Advanced Statistical Plus/Minus; it is NOT a version of Adjusted Plus/Minus, which is a play-by-play regression metric.
BPM relies on a player's box score information and the team's overall performance to estimate a player's performance relative to league average. BPM is a per-100-possession stat, the same scale as Adjusted Plus/Minus: 0.0 is league average, +5 means the player is 5 points better than an average player over 100 possessions (which is about All-NBA level), -2 is replacement level, and -5 is really bad.
To get a feel for the scale:
- The greatest seasons of all time by BPM are LeBron James' 2009 and 2010 seasons, and Michael Jordan's 1989 tour-de-force. All of those seasons had BPMs between +12.5 and +13.0.
- Some players who over their career were about average (+0.0) include Stephen Jackson, Kurt Thomas, Leandro Barbosa, James Donaldson, and Channing Frye.
- The best player by BPM in 2013-14 was LeBron James, at +8.9, just above MVP Kevin Durant's +8.8. Kevin Love was close behind at +8.3, and Stephen Curry (+7.4) and Chris Paul (+7.4) rounded out the top five.
- Some players at or near +0.0 (average) from the 2013-14 NBA season include: Monta Ellis, Martell Webster, Iman Shumpert, Roy Hibbert, Nene Hilario, Ray Allen, Terrence Ross, and J.R. Smith.
- The worst player who played significant minutes in 2013-14 was Dennis Schroder of Atlanta, with a very poor -8.3 rating. Anthony Bennett, the surprise No. 1 pick in the 2013 draft, followed with a -7.3.
Note: BPM does not take into account playing time – it is purely a rate stat. Thus, Durant playing 79% of available minutes with a +8.8 BPM was overall slightly more valuable than LeBron's +8.9 BPM for 73% of the available minutes, and both of them were way more valuable than Chris Paul, who missed quite a few games. That playing time aspect is handled by Value over Replacement Player (VORP), which is discussed below.
BPM was intentionally built to use only information that is available historically, going back to 1973-74. More information has since been gathered, both in box scores and via play-by-play, but to create a stat with historical usefulness, that information has been ignored for BPM. In other words, it is possible to create a better stat than BPM for measuring players, but difficult to make a better one that can also be used historically.
There are limitations on all box score stats: if the box score doesn't measure a particular contribution, a box-score-based metric can only approximate it. This is not a great hindrance on the offensive side, as nearly everything of importance on offense is captured by the box score (only things like screen-setting are missing), but on defense the box score is quite limited. Blocks, steals, and rebounds, along with minutes and what little information offensive numbers yield about defensive performance, are all that is available. Critical components of defense such as positioning, communication, and the other factors that make Kevin Garnett and Tim Duncan elite defenders unfortunately can't be captured.
What does this mean? Box Plus/Minus is good at measuring offense and solid overall, but the defensive numbers in particular should not be considered definitive. Look at the defensive values as a guide, but don't hesitate to discount them when a player is well known as a good or bad defender.
The Concept of Box Plus/Minus
In order to create a box-score-based player evaluation metric, some basis for the weights given to each statistic must be chosen. A number of different "box-score" stats have been developed over the years: some of the more intricate and well-known include John Hollinger's PER (further explanation at ESPN), Justin Kubatko's Win Shares here at Basketball Reference, and Dave Berri's Wins Produced.
The different composite statistics use a variety of approaches, from pure empiricism to pure theory and even a mix of the two.
The approach followed with Box Plus/Minus leans toward the empirical side, following the concepts of a "Statistical Plus/Minus (SPM)" metric. Please read Neil Paine's review of SPM at the Basketball Reference Blog to understand the background of this approach, originally pioneered by Dan Rosenbaum as a sideline to his Adjusted Plus/Minus (APM) work.
BPM differs from earlier public SPM work as follows:
- A much longer-term APM sample is used in the regression model (reducing random error)
- Ridge-Regressed APM (RAPM) is used rather than pure APM (further reducing random error, albeit introducing slightly more bias, in that all players are pulled toward the population mean of 0.)
- "Advanced Box Score" measures are used rather than simple points and FGAs (advanced rates are more accurate and less skewed by context)
- A couple of nonlinear interactions are modeled – but only ones that are highly statistically significant and make sense.
For more information on APM and RAPM, the stats that form the basis for the BPM regression, please read this Review of Adjusted Plus/Minus and Stabilization.
Calculating Box Plus/Minus
Background
The complete (with historical numbers) BPM Excel Spreadsheet has been published online. Feel free to download, play around with it, get confused, and (maybe) follow along.
The basis of the BPM regression is an unweighted, 14-year RAPM sample kindly created for this use by Jeremias Engelmann, the creator of Real Plus/Minus and the public-domain leader in the use and calculation of RAPM. This data set includes data from the 2001-2014 NBA seasons. This version of RAPM does not include any box score information, which some of Jeremias' later work (Real Plus/Minus, aka xRAPM) does. Jeremias used a prior of about -2, weighting that to produce the minimum error in cross-validation within sample. This leads to bias among players with small sample sizes; the average player rating declines with lower minutes played, up until the point where the -2.0 prior starts dominating and bringing the result back toward -2.0.
For this reason, the BPM regression was weighted by player possessions played, with an adjustment applied to account for the bias from the prior. The weighting scheme looks like this:
Weight = Possessions*[Possessions/(Possessions + 5000 + 2000(if only 1 team-season played))]^2
The 5000 and 2000 terms were estimated to account for how strong the effect of the prior is on the RAPM output. Only players who had a final weight of over 250 were used, leading to a total of 961 players being included in the regression. They vary from Dirk Nowitzki with a weight of 134,271 down to Brandon Hunter with a weight of 252.
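A minimal Python sketch of the weighting rule above follows. The function and parameter names (`regression_weight`, `single_team_season`, `in_regression_sample`) are hypothetical, and the possession counts are made-up inputs rather than real players.

```python
MIN_WEIGHT = 250.0  # only players with a final weight above this were kept


def regression_weight(possessions: float, single_team_season: bool) -> float:
    """Weight = Poss * [Poss / (Poss + 5000 + 2000 if only one team-season)]^2."""
    # The 5000 (plus 2000 for one-season players) damps players whose
    # RAPM is still heavily pulled toward the -2.0 prior.
    damping = 5000.0 + (2000.0 if single_team_season else 0.0)
    return possessions * (possessions / (possessions + damping)) ** 2


def in_regression_sample(possessions: float, single_team_season: bool) -> bool:
    return regression_weight(possessions, single_team_season) > MIN_WEIGHT


# Hypothetical inputs for illustration only.
print(round(regression_weight(10_000, single_team_season=False), 1))
print(round(regression_weight(10_000, single_team_season=True), 1))
print(in_regression_sample(2_000, single_team_season=True))
```

With these inputs, a player with 10,000 possessions spread over several team-seasons gets a weight of about 4,444 (about 3,460 if all in one team-season), while a one-season player with 2,000 possessions falls below the 250 cutoff.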
For input into the model, the Team-by-Team Advanced Statistics tables and the Adjusted Team Ratings data from Basketball Reference were used.
Derivation
The regression for BPM was constructed upon the individual player advanced statistics (such as True Shooting Percentage (TS%) and Assist Percentage (AST%)), with a linear team adjustment constant added so that the player ratings sum to the observed team rating. As mentioned, the target variable for this regression is the 14-year average RAPMs for 961 players. To run the regression, BPMs were calculated for each season, each player's values were averaged across the 14 seasons (weighted by minutes played), and the BPM variable weights were iterated to minimize the weighted squared error vs. the 14-year RAPM values.
Variable selection was done by hand, iterating through various models and interaction terms and gradually weeding out the least significant terms, with the objective of getting the lowest squared error while minimizing the number of nonlinear terms and total variables. Many constructs were considered; the final product had very nearly the lowest error of all those tested while having relatively few variables and only logical nonlinear terms, reducing the risk of overfitting or of breaking down for outliers.
The Equation
The final BPM equation looks like this:
Raw BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
Starting at the beginning: ReMPG, ORB%, DRB%, BLK%, STL%, and AST% are all nice, simple, linear terms. They are all context-sensitive input variables, in that they adjust for the number of opportunities a player had to produce the given stat – ORB%, for instance, is offensive rebounds divided by the number of offensive rebounding opportunities the player had.
There was some debate over whether including minutes per game (MPG) in the regression was reasonable. It was clear from the regression, however, that MPG adds to the accuracy of BPM, and that consideration outweighed the desire to be orthogonal from coaching choices. Essentially, the positive (and sizeable) coefficient for MPG means that coaches can see things that the box score can't and that they, in general, are correct in giving minutes to the players they do, even when they give more minutes than the other box score statistics would suggest. In addition, MPG serves as a proxy for quality of opponents – if a player plays 5 MPG, he's probably playing against scrubs, while a player playing 35 MPG is playing against LeBron. To reduce weird effects from players playing only a couple of games with very high MPG (the last two games of the season for a team resting for the playoffs, for instance), MPG was regressed with 4 games of 0 MPG to create the input term, here called ReMPG.
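A sketch of how ReMPG could be computed, assuming "regressed with 4 games of 0 MPG" means adding four zero-minute games to the player's games played before dividing total minutes. The helper name `regressed_mpg` is hypothetical.

```python
def regressed_mpg(total_minutes: float, games: int, ghost_games: int = 4) -> float:
    """Minutes per game after padding the sample with `ghost_games` games of 0 minutes."""
    return total_minutes / (games + ghost_games)


# A player who logged 40 minutes in each of the last two games of the season...
late_season_cameo = regressed_mpg(80, 2)
# ...versus a full-time starter at 35 MPG over 82 games.
starter = regressed_mpg(35 * 82, 82)
print(f"{late_season_cameo:.1f} vs {starter:.1f}")
```

The cameo's 40 MPG shrinks to about 13.3, while the starter's 35 MPG barely moves (about 33.4), which is the damping effect the text describes.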
The rebounding terms are both linear. Various nonlinear formulations were considered, but proved not significant. Using total rebound percentage (TRB%) alone was considered, and even used for early iterations of the stat, but the final formulation yielded slightly better results with the types of rebounds split – which is consistent with the findings of other box score regressions.
Blocks and steals both logically should be linear terms. Steals should be more valuable, and indeed the regression found steals to be much more valuable.
Assist% is a linear term, but assists also figure in both of the interaction terms in the regression, so the specific coefficient for this linear term has no meaning by itself.
Turnover percentage times usage percentage yields turnovers per 100 team possessions, exactly the same scale as the steal percentage term. (Turnover percentage is defined as turnovers per possession used by the player.)
Those terms are not unusual. The scoring term, though, is rather unique.
Essentially, the scoring term is a [Usage * (points produced per possession - threshold points produced per possession)] term. If a player's efficiency is above a certain threshold value, any usage helps the team; below that value, usage hurts the team. Usage is here multiplied by (1-TO%) to remove the usage that is just turnovers, so this usage term is simply the player's shooting possessions divided by the total number of team possessions.
- 2*TS% is the player's points per shooting possession.
- The -2*TmTS% term adjusts for the team's average shooting, without the player included. If a player is shooting 50% on a terrible team, he's a better player than if he's shooting 50% on a great team.
- The i*AST% term gives a positive value to assists multiplied by usage: a player's assists are worth more if he is also finishing possessions.
- The j*(3PAr - Lg3PAr) term gives a positive value for spacing and for the additional offensive rebounds off of 3-pointers. To help the regression work well historically, it is normalized vs. league average.
- The -k term is the constant threshold value.
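The scoring term can be written as a small Python function. It uses the coefficients h, i, j, and k from the BPM coefficient table in this article and the scales from that table's Variable Format column (USG% and AST% on a 0-100 scale; TO%, TS%, TmTS%, 3PAr, and Lg3PAr as fractions). The function name and the sample inputs are illustrative assumptions.

```python
# Coefficients h, i, j, k from the BPM coefficient table.
H, I, J, K = 0.711217, 0.017022, 0.297639, 0.213485


def scoring_term(usg_pct: float, to_pct: float, ts_pct: float, tm_ts_pct: float,
                 ast_pct: float, three_par: float, lg_three_par: float) -> float:
    """h * USG% * (1 - TO%) * [2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k]."""
    # Usage that ends in a shot attempt rather than a turnover.
    shooting_usage = usg_pct * (1.0 - to_pct)
    # Points per shooting possession relative to the team, plus the assist and
    # spacing adjustments, minus the constant threshold k.
    margin = (2.0 * (ts_pct - tm_ts_pct)
              + I * ast_pct
              + J * (three_par - lg_three_par)
              - K)
    return H * shooting_usage * margin


# Hypothetical efficient, high-usage playmaker.
print(round(scoring_term(30, 0.12, 0.60, 0.54, 30, 0.30, 0.25), 2))
```

Because usage multiplies the bracket, a bracket below zero makes each extra shooting possession cost the team, which is the threshold behavior described above. The large AST% contribution here is offset elsewhere in the full equation by the negative linear AST% coefficient f.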
Finally, a positive interaction term between rebounding and assists is included. This can be interpreted in a number of ways: athleticism interacting with basketball awareness, size interacting with basketball skills, etc. This term was highly significant and helped the overall fit of the regression quite a bit. Using the square root of the interaction makes sense both theoretically (it maintains a denominator of opportunities or possessions) and empirically (it is more significant and helps the overall fit of the regression more).
Each player's raw BPM is calculated through the above equation and is then adjusted at the team level.
The Team Summation
The team's efficiency differential, adjusted for strength of schedule, is known. It is, by definition, the true sum of all players' contributions. BPM is adjusted such that the minute-weighted sum of individual players' BPM ratings on a team equals the team's rating times 120%. The team adjustment is simply a constant added to each player's raw BPM and is the same for every player on the team. The constant does 3 things: it adds the intercept to the BPM equation, it adjusts roughly at the team level for things that cannot be captured by the box score (primarily defense), and it also adjusts for strength of schedule. The formula for this adjustment looks like this:
BPM_Team_Adjustment = [Team_Rating*120% - Σ(Player_%Min*Player_RawBPM)]/5
So if a team has a rating of +8, and the minute-weighted sum of its players' raw BPMs is +7, the team adjustment, applied equally to all players on the team, would bump each player's rating by [8*120% - 7]/5 = +0.52.
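A minimal sketch of the team adjustment, assuming `Player_%Min` is each player's share of the team's available minutes as a fraction (a player on the floor for every minute has 1.0, and a full roster sums to about 5.0). The function names are hypothetical, and the five-man roster is a toy example built to reproduce the +8 / +7 case above.

```python
from typing import List, Sequence, Tuple

LEAD_FACTOR = 1.20  # the 120% multiplier on the team rating


def team_adjustment(team_rating: float,
                    players: Sequence[Tuple[float, float]]) -> float:
    """players: (share_of_team_minutes, raw_bpm) pairs for everyone on the team."""
    weighted_raw = sum(share * raw for share, raw in players)
    return (team_rating * LEAD_FACTOR - weighted_raw) / 5.0


def adjusted_bpms(team_rating: float,
                  players: Sequence[Tuple[float, float]]) -> List[float]:
    """Add the same constant to every player's raw BPM."""
    adj = team_adjustment(team_rating, players)
    return [raw + adj for _, raw in players]


# Toy roster: five players who each played every minute, raw BPMs summing to +7.
roster = [(1.0, 3.0), (1.0, 2.0), (1.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(round(team_adjustment(8.0, roster), 2))  # 0.52
```

After the adjustment, the minute-weighted sum of the players' BPMs equals 1.2 × 8 = 9.6, which is the property the constant is chosen to enforce.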
Where did the 120% come from? Jeremias Engelmann has done extensive work on how lineups behave, and he discovered that lineups that are ahead in a game play worse, while lineups that are behind play better, even if the exact same players are playing. Perhaps it's an effort thing? While the source is unclear, the effect is both significant and linear. He incorporates that effect into his RPM model, adjusting to a neutral environment, and BPM does the same. At the team level, a team that is always ahead is actually better than its final results indicate by about 20%, and a team that is usually behind is worse by about 20%. The team adjustment therefore accounts for this effect.
What the 120% does, effectively, is translate players from the team they are on onto a league-average team, one that is ahead and behind equal amounts during the season. Because of this adjustment, though, team-level analysis will need to divide that 120% back out: if a particular team's BPM values add up to +15, we would expect that team to actually perform at a +12.5 level, since it would usually be ahead in games. At the lineup level things are trickier, since it is hard to predict how far ahead or behind a lineup typically is in normal game situations.
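For full-season team totals, backing out the 120% factor is a single division. A sketch with a hypothetical helper name, reproducing the +15 to +12.5 example:

```python
LEAD_FACTOR = 1.20


def expected_team_rating(summed_bpm: float) -> float:
    """Undo the 120% lead adjustment to get an expected efficiency differential."""
    return summed_bpm / LEAD_FACTOR


print(round(expected_team_rating(15.0), 1))  # 12.5
```

As noted above, this simple division is not reliable at the lineup level, so the helper is only meant for team-season totals.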
Note: For regular seasons before adjusted team offensive & defensive ratings are available (pre-1986), the team's raw efficiency differential is adjusted for strength of schedule using the Simple Rating System's SOS term. In the postseason, the team adjustment just ensures that 5 times the weighted average of individual BPM scores adds up to the team's raw efficiency differential, without an SOS adjustment (this may be tweaked in the future).
Thus you have the final BPM: a box score regression coupled with a team adjustment. Such a regression will still be inaccurate in attributing value for things not measured by the box score, but unlike APM or RAPM, it is not plagued by multicollinearity and the resulting instability.
The Coefficients
The actual coefficients for Box Plus/Minus are as follows:
Raw BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
Coeff. | Term | BPM Value | Variable Format |
---|---|---|---|
a | Regr. MPG | 0.123391 | 48.0 |
b | ORB% | 0.119597 | 100.0 |
c | DRB% | -0.151287 | 100.0 |
d | STL% | 1.255644 | 100.0 |
e | BLK% | 0.531838 | 100.0 |
f | AST% | -0.305868 | 100.0 |
g | TO%*USG% | 0.921292 | .000*100.0 |
h | Scoring | 0.711217 | |
| | USG% | | 100.0 |
| | TO% | | .000 |
| | TS% & TmTS% | | .000 |
i | AST Interaction | 0.017022 | 100.0 |
j | 3PAr Interaction | 0.297639 | .000 |
k | Threshold Scoring | 0.213485 | |
l | sqrt(AST%*TRB%) | 0.725930 | 100.0*100.0 |
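Putting the equation and the coefficient table together, a Python sketch of the raw BPM calculation (before the team adjustment) might look like this. It assumes inputs are already on the scales in the Variable Format column; the `PlayerSeason` class, its field names, and the sample values are hypothetical. The output is a raw value only: the intercept and strength-of-schedule correction enter through the team adjustment, so raw numbers are not on the final BPM scale.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlayerSeason:
    re_mpg: float        # regressed minutes per game (48.0 scale)
    orb_pct: float       # ORB%, DRB%, TRB%, STL%, BLK%, AST%, USG% on a 0-100 scale
    drb_pct: float
    trb_pct: float
    stl_pct: float
    blk_pct: float
    ast_pct: float
    usg_pct: float
    to_pct: float        # TO%, TS%, TmTS%, 3PAr, Lg3PAr as fractions (.000 scale)
    ts_pct: float
    tm_ts_pct: float
    three_par: float
    lg_three_par: float


BPM_COEFFS = {
    "a": 0.123391, "b": 0.119597, "c": -0.151287, "d": 1.255644,
    "e": 0.531838, "f": -0.305868, "g": 0.921292, "h": 0.711217,
    "i": 0.017022, "j": 0.297639, "k": 0.213485, "l": 0.725930,
}


def raw_bpm(p: PlayerSeason, c: dict = BPM_COEFFS) -> float:
    linear = (c["a"] * p.re_mpg + c["b"] * p.orb_pct + c["c"] * p.drb_pct
              + c["d"] * p.stl_pct + c["e"] * p.blk_pct + c["f"] * p.ast_pct)
    # USG% * TO% = turnovers per 100 team possessions
    turnovers = c["g"] * p.usg_pct * p.to_pct
    # Shooting usage times the efficiency margin over the threshold k
    scoring = c["h"] * p.usg_pct * (1.0 - p.to_pct) * (
        2.0 * (p.ts_pct - p.tm_ts_pct)
        + c["i"] * p.ast_pct
        + c["j"] * (p.three_par - p.lg_three_par)
        - c["k"])
    # Rebounding x passing interaction
    interaction = c["l"] * math.sqrt(p.ast_pct * p.trb_pct)
    return linear - turnovers + scoring + interaction


sample = PlayerSeason(re_mpg=34.0, orb_pct=5.0, drb_pct=20.0, trb_pct=12.0,
                      stl_pct=2.0, blk_pct=1.5, ast_pct=30.0, usg_pct=30.0,
                      to_pct=0.12, ts_pct=0.60, tm_ts_pct=0.54,
                      three_par=0.30, lg_three_par=0.25)
extra_steal = replace(sample, stl_pct=sample.stl_pct + 1.0)
print(round(raw_bpm(sample), 2))
print(round(raw_bpm(extra_steal) - raw_bpm(sample), 6))
```

Since the offensive regression shares this functional form, passing a dictionary built from the OBPM table as `c` reuses the same function.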
Offensive and Defensive BPM
The 14-year RAPM dataset also includes offensive and defensive RAPM estimates. A second regression was run to estimate offensive and defensive BPM based on those values. This regression split the total BPM value calculated above into offensive and defensive BPMs, while still requiring the total to equal basic BPM. It was tuned to minimize error on both the offensive and defensive RAPM values simultaneously.
Here are the tables of coefficients:
Offensive BPM
Raw O/D BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)
Coeff. | Term | O/D BPM Value | Variable Format |
---|---|---|---|
a | Regr. MPG | 0.064448 | 48.0 |
b | ORB% | 0.211125 | 100.0 |
c | DRB% | -0.107545 | 100.0 |
d | STL% | 0.346513 | 100.0 |
e | BLK% | -0.052476 | 100.0 |
f | AST% | -0.041787 | 100.0 |
g | TO%*USG% | 0.932965 | .000*100.0 |
h | Scoring | 0.687359 | |
| | USG% | | 100.0 |
| | TO% | | .000 |
| | TS% & TmTS% | | .000 |
i | AST Interaction | 0.007952 | 100.0 |
j | 3PAr Interaction | 0.374706 | .000 |
k | Threshold Scoring | -0.181891 | |
l | sqrt(AST%*TRB%) | 0.239862 | 100.0*100.0 |
A team adjustment is added to the results of this regression to force the team sum to equal the adjusted team offensive rating (above or below league average), similar to how the team adjustment is done for overall BPM.
Defensive BPM
Defensive BPM is simply overall BPM minus offensive BPM. The offensive BPM regression was tuned to minimize weighted squared error on both offensive and defensive RAPM simultaneously.
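Since DBPM is overall BPM minus OBPM and both regressions share one functional form, the defensive weight on any term that enters linearly with a single coefficient (a through g, and l) is simply the difference between the two tables' coefficients. The scoring term does not split this way, because h multiplies i, j, and k. The sketch below (hypothetical names) computes those implied defensive weights from the published coefficients; it is a reading of the two tables, not a separately fitted DBPM regression.

```python
# Linear-term coefficients copied from the BPM and OBPM tables.
BPM = {"a": 0.123391, "b": 0.119597, "c": -0.151287, "d": 1.255644,
       "e": 0.531838, "f": -0.305868, "g": 0.921292, "l": 0.725930}
OBPM = {"a": 0.064448, "b": 0.211125, "c": -0.107545, "d": 0.346513,
        "e": -0.052476, "f": -0.041787, "g": 0.932965, "l": 0.239862}


def implied_dbpm_weights(bpm: dict, obpm: dict) -> dict:
    """DBPM weight = BPM weight - OBPM weight, term by term (linear terms only)."""
    return {term: round(bpm[term] - obpm[term], 6) for term in bpm}


for term, weight in implied_dbpm_weights(BPM, OBPM).items():
    print(term, weight)
```

In this split, most of the weight on steals (d) and blocks (e) lands on the defensive side, while the offensive rebounding weight (b) is negative on defense.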
Evaluating Box Plus/Minus
R^{2} is a measure of how well a regression or model fits the data it is built upon. A value of 0 means that the model doesn't explain the data at all, while a value of 1.0 means that the model explains the variation in the data perfectly. The weighted R^{2} results for the BPM regressions are:
Weighted R^{2} onto 14-year RAPM:
Stat | RAPM | O-RAPM | D-RAPM |
---|---|---|---|
Box Plus/Minus | 0.661 | 0.460 | 0.194 |
Off. Box Plus/Minus | – | 0.791 | – |
Def. Box Plus/Minus | – | – | 0.620 |
RAPM, even a 14-year dataset, has some error associated with it, so even a theoretical "true" measure of performance would still show an R^{2} below 1.0.
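The article does not spell out its exact weighted R^{2} formula. A common definition, assumed in the sketch below, is one minus the weighted residual sum of squares over the weighted total sum of squares around the weighted mean, with the regression weights described earlier as the weights. The function name and toy data are hypothetical.

```python
from typing import Sequence


def weighted_r2(actual: Sequence[float], predicted: Sequence[float],
                weights: Sequence[float]) -> float:
    """1 - sum(w*(y - yhat)^2) / sum(w*(y - weighted_mean)^2)."""
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, actual)) / total_w
    ss_res = sum(w * (y - f) ** 2 for w, y, f in zip(weights, actual, predicted))
    ss_tot = sum(w * (y - mean) ** 2 for w, y in zip(weights, actual))
    return 1.0 - ss_res / ss_tot


# Toy data: RAPM targets, model predictions, and player weights.
rapm = [4.0, 1.0, -1.5, -3.0]
pred = [3.0, 1.5, -1.0, -2.5]
w = [5000.0, 3000.0, 1000.0, 400.0]
print(round(weighted_r2(rapm, pred, w), 3))
```

A perfect model scores 1.0 and a model that always predicts the weighted mean scores 0.0; heavily weighted players dominate both sums.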
Perhaps the best way to get a feel for the accuracy of a model is to look at it graphically. The results of all three regressions are presented here, with BPM on one axis and the RAPM basis on the other.
Some things to note on those charts:
- Players who are extreme outliers in their box score stats appear to be rated less accurately.
- Some elite defensive players, such as Kevin Garnett, Luol Deng, and Andrew Bogut, are not very well captured by the box score.
- Some point guards, in particular, may be underrated on offense – Steve Nash and Mike Conley, for instance.
Looking at Historical Results
Box Plus/Minus can be calculated back to 1974; before that time many stats weren't tabulated. Looking at historical results can help gain confidence in the regression's accuracy.
The top 10 seasons since 1974 by Box Plus/Minus are heavily populated by Michael Jordan and LeBron James, which certainly passes the smell test:
Rk | Year | Tm | Player | Age | MP | BPM |
---|---|---|---|---|---|---|
1 | 2009 | CLE | LeBron James | 24 | 3054 | 13.0 |
2 | 1989 | CHI | Michael Jordan | 25 | 3255 | 12.6 |
3 | 2010 | CLE | LeBron James | 25 | 2966 | 12.5 |
4 | 1988 | CHI | Michael Jordan | 24 | 3311 | 12.2 |
5 | 2013 | MIA | LeBron James | 28 | 2877 | 11.6 |
6 | 2008 | CLE | LeBron James | 23 | 3027 | 11.2 |
7 | 2009 | NOH | Chris Paul | 23 | 3002 | 11.2 |
8 | 2012 | MIA | LeBron James | 27 | 2326 | 11.0 |
9 | 1994 | SAS | David Robinson | 28 | 3241 | 10.9 |
10 | 1991 | CHI | Michael Jordan | 27 | 3034 | 10.8 |
The Offensive Box Plus/Minus list looks similar – there is a much wider distribution on offense than on defense, which is realistic but also exaggerated by the lack of defensive box score statistics.
Rk | Year | Tm | Player | Age | MP | OBPM |
---|---|---|---|---|---|---|
1 | 1989 | CHI | Michael Jordan | 25 | 3255 | 9.8 |
2 | 1988 | CHI | Michael Jordan | 24 | 3311 | 9.8 |
3 | 2003 | ORL | Tracy McGrady | 23 | 2954 | 9.8 |
4 | 1990 | CHI | Michael Jordan | 26 | 3197 | 9.7 |
5 | 2010 | CLE | LeBron James | 25 | 2966 | 9.7 |
6 | 2009 | CLE | LeBron James | 24 | 3054 | 9.4 |
7 | 2013 | MIA | LeBron James | 28 | 2877 | 9.2 |
8 | 2009 | NOH | Chris Paul | 23 | 3002 | 9.1 |
9 | 2008 | CLE | LeBron James | 23 | 3027 | 9.0 |
10 | 1991 | PHI | Charles Barkley | 27 | 2498 | 9.0 |
The defensive list is rather repetitive – apparently Ben Wallace was really good at playing defense? Defense is only partially captured by the box score, so elite defenders based on position and communication, like Kevin Garnett and Tim Duncan, will not be properly represented. The regression mathematically accounts for that, pulling all of the estimates closer to average.
Rk | Year | Tm | Player | Age | MP | DBPM |
---|---|---|---|---|---|---|
1 | 2003 | DET | Ben Wallace | 28 | 2873 | 7.0 |
2 | 2007 | CHI | Ben Wallace | 32 | 2697 | 6.8 |
3 | 2004 | DET | Ben Wallace | 29 | 3050 | 6.7 |
4 | 2007 | DEN | Marcus Camby | 32 | 2369 | 6.7 |
5 | 2008 | DEN | Marcus Camby | 33 | 2758 | 6.6 |
6 | 2002 | DET | Ben Wallace | 27 | 2921 | 6.5 |
7 | 1985 | UTA | Mark Eaton | 28 | 2813 | 6.5 |
8 | 1986 | WSB | Manute Bol | 23 | 2090 | 6.4 |
9 | 2006 | DET | Ben Wallace | 31 | 2890 | 6.4 |
10 | 1992 | SAS | David Robinson | 26 | 2564 | 6.2 |
Comparison to Other Stats
Alex at Sports Skeptic compared ASPM (BPM's predecessor) and a number of box score metrics on his blog; it is a good read. BBR's Neil Paine also compared ASPM and other individual metrics on their ability to predict future team performance at APBRmetrics. These tests use the best measure of a metric's validity: its out-of-sample predictive ability. Neil also recently tested BPM this way; the results have not been published yet, but BPM outperformed every other box-score stat except FiveThirtyEight.com's proprietary, unpublished SPM, which it equaled.
Here is how BPM and some other common box score stats compare to the same 14 year average RAPM. The fact that this comparison is in sample for BPM inflates its R^{2} somewhat, but since this regression has relatively few degrees of freedom vs. the sample size, the inflation should not be too large. Note: in the previous exposition of this stat, Wins Produced was included in this table, and had an R^{2} on RAPM significantly worse than PER.
Weighted R^{2} onto 14-year RAPM:
Stat | RAPM | O-RAPM | D-RAPM |
---|---|---|---|
Regr. Minutes/Game | 0.280 | 0.379 | 0.008 |
PER | 0.388 | 0.437 | 0.032 |
Win Shares/48 | 0.525 | 0.385 | 0.140 |
Off. Win Shares/48 | – | 0.571 | – |
Def. Win Shares/48 | – | – | 0.565 |
Box Plus/Minus | 0.661 | 0.460 | 0.194 |
Off. Box Plus/Minus | – | 0.791 | – |
Def. Box Plus/Minus | – | – | 0.620 |
RAPM, even a 14-year dataset, has some error associated with it, so even a theoretical "true" measure of performance would still show an R^{2} below 1.0.
Value over Replacement Player
Value over Replacement Player (VORP) converts the BPM rate into an estimate of each player's overall contribution to the team, measured vs. what a theoretical "replacement player" would provide, where the "replacement player" is defined as a player on a minimum salary or not a normal member of a team's rotation. A long and comprehensive discussion on defining this level for the NBA took place on Tom Tango's blog and is worth a read. (Tom Tango is a baseball sabermetrics expert and one of the originators of the replacement level framework and the Wins Above Replacement methodology now common in baseball.)
The conclusion was to establish -2.0 as replacement level for the NBA, measured in terms of points above or below average per 100 possessions. Unlike in major league baseball, players below replacement level do frequently play, primarily for development purposes. Rookies are frequently below replacement level, but there are no formal minor leagues to act as a development system like major league baseball has, so they end up getting playing time in the NBA in order to develop. Also, some teams tank, and trade for Byron Mullens to help that effort.
If one were to define a "replacement level" for offense and defense, it would be -1.7 on offense and -0.3 on defense – though the concept of replacement level for components doesn't necessarily make sense. Almost all point guards would be well below the -0.3 level on defense, since a guard's role is primarily to focus on offense. The reverse is true of post players. It's an interesting exercise, but ultimately OVORP and DVORP aren't that useful, and will not be displayed here.
So, to calculate VORP, the formula is simply: [BPM - (-2.0)] * (% of minutes played) * (team games/82). This yields the number of points the player is producing over a replacement player, per 100 TEAM possessions over an entire season.
As an example: In 2014, LeBron had a BPM of +8.9, and he played 73% of Miami's minutes. His VORP, then, would be [8.9 - (-2.0)] * 0.73 * 82/82 = 8.0.
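The VORP formula translates directly into code. A sketch with a hypothetical function name, checked against the LeBron example and against the Durant comparison in the note near the top of this page (79% of minutes at +8.8):

```python
REPLACEMENT_BPM = -2.0


def vorp(bpm: float, share_of_minutes: float, team_games: int = 82) -> float:
    """[BPM - (-2.0)] * (% of team minutes played) * (team games / 82)."""
    return (bpm - REPLACEMENT_BPM) * share_of_minutes * (team_games / 82)


lebron_2014 = vorp(8.9, 0.73)
durant_2014 = vorp(8.8, 0.79)
print(round(lebron_2014, 1), round(durant_2014, 1))  # 8.0 8.5
```

Durant's extra playing time outweighs LeBron's slightly higher rate, and the team_games/82 factor scales the result for seasons shorter than 82 games.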
The beauty of VORP is that, like WAR in baseball, it should track linearly with salary. A player with a VORP of 4.0 is worth, on the market, about twice what a player with a VORP of 2.0 is worth. Sometimes good players play few minutes for reasons outside their control, and they would be worth more than their VORP suggests because they should be getting more minutes. Still, for a crude estimate, VORP is valuable: it measures fairly accurately the value a player actually produced for a given team.
To convert VORP to an estimate of wins over replacement, simply multiply by 2.7. This translates a player's point differential approximately into wins, using the conversion rate near league-average rather than that in the diminishing returns area of the Pythagorean formula. By this methodology, Michael Jordan in 1989 was worth about 32 wins. (In reality, he would quickly push the team into the diminishing returns region of the points-to-wins conversion.)
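The wins conversion is a single multiplication. A sketch (hypothetical names) using Michael Jordan's 1989 VORP of 12.0 from the VORP leaderboard:

```python
WINS_PER_VORP = 2.7  # points-to-wins rate near league average


def wins_over_replacement(vorp_value: float) -> float:
    """Linear conversion; ignores diminishing returns for extreme players."""
    return vorp_value * WINS_PER_VORP


print(round(wins_over_replacement(12.0)))  # 32
```

Because the conversion is linear, it overstates the real win impact of the very best seasons, as the parenthetical above points out.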
Here's a look at the top 10 seasons of all time by VORP. This is the best measure of actual value contributed to the team. Notice that Michael Jordan played a few more minutes per season than LeBron, thus increasing his overall value.
Rk | Year | Tm | Player | Age | MP | VORP |
---|---|---|---|---|---|---|
1 | 1989 | CHI | Michael Jordan | 25 | 3255 | 12.0 |
2 | 1988 | CHI | Michael Jordan | 24 | 3311 | 11.8 |
3 | 2009 | CLE | LeBron James | 24 | 3054 | 11.6 |
4 | 2010 | CLE | LeBron James | 25 | 2966 | 10.9 |
5 | 1994 | SAS | David Robinson | 28 | 3241 | 10.6 |
6 | 1990 | CHI | Michael Jordan | 26 | 3197 | 10.1 |
7 | 2008 | CLE | LeBron James | 23 | 3027 | 10.1 |
8 | 2009 | NOH | Chris Paul | 23 | 3002 | 10.0 |
9 | 2013 | MIA | LeBron James | 28 | 2877 | 9.8 |
10 | 1991 | CHI | Michael Jordan | 27 | 3034 | 9.8 |
Playoff Box Plus/Minus and VORP
Box Plus/Minus for the playoffs is calculated the same way as BPM for the regular season, with a few additions to derive an appropriate team efficiency differential.
The playoff team efficiency, which is used in the team adjustment portion of the BPM calculation, is derived as follows:
- Use playoff minutes distribution for each team along with each player's regular season BPM value to generate a "playoff team strength." In general, playoff rotations are shortened, so teams are stronger than in the regular season. If a player had fewer than 200 minutes of regular season play on the given team, they were assumed to be replacement level for this calculation.
- Count games played against each opponent in the playoffs.
- Strength of Schedule is the average opponent team rating for the duration of the playoffs.
- Add actual playoff efficiency differential to the calculated strength of schedule to get the adjusted team efficiency differential used in the BPM calculation.
As an example, in 2013-14 San Antonio won the title. They played 7 games against Dallas (derived strength +4.1), 5 against Portland (+6.5), 6 against Oklahoma City (+10.5), and 5 against Miami (+6.9), so their games-weighted average strength of schedule was +6.9. Their raw efficiency differential was +10.0 in the playoffs, so their overall adjusted efficiency differential was a spectacular +16.9. That is on the same scale as the regular season efficiency differential; the Spurs were really dominant.
(This is reminiscent of Hollinger's playoff ratings. The 2001 Lakers had a playoff adjusted efficiency differential of +20.4 by this method.)
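The strength-of-schedule step is a games-weighted average of opponent ratings. The sketch below (hypothetical names) reproduces the 2014 Spurs numbers quoted above.

```python
from typing import Sequence, Tuple


def playoff_sos(series: Sequence[Tuple[int, float]]) -> float:
    """series: (games played, opponent's derived playoff strength) per opponent."""
    games = sum(n for n, _ in series)
    return sum(n * strength for n, strength in series) / games


def adjusted_playoff_differential(raw_diff: float,
                                  series: Sequence[Tuple[int, float]]) -> float:
    """Raw playoff efficiency differential plus strength of schedule."""
    return raw_diff + playoff_sos(series)


spurs_2014 = [(7, 4.1), (5, 6.5), (6, 10.5), (5, 6.9)]  # DAL, POR, OKC, MIA
print(round(playoff_sos(spurs_2014), 1),
      round(adjusted_playoff_differential(10.0, spurs_2014), 1))  # 6.9 16.9
```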
The top 10 playoff BPMs, minimum 500 minutes played in the playoffs:
Rk | Year | Tm | Player | BPM |
---|---|---|---|---|
1 | 2009 | CLE | LeBron James | 18.2 |
2 | 1977 | LAL | Kareem Abdul-Jabbar | 14.8 |
3 | 1990 | CHI | Michael Jordan | 14.3 |
4 | 1991 | CHI | Michael Jordan | 13.8 |
5 | 1989 | CHI | Michael Jordan | 12.8 |
6 | 1976 | NYA | Julius Erving | 12.5 |
7 | 2008 | NOH | Chris Paul | 12.2 |
8 | 1991 | PHI | Charles Barkley | 11.8 |
9 | 1975 | INA | George McGinnis | 11.6 |
10 | 2003 | SAS | Tim Duncan | 11.6 |
LeBron was transcendent in 2009.
Playoff VORP is calculated the same way as regular season VORP, but based on the playoff BPM values calculated above. Be aware that players on teams that played more games will have higher VORP values: a team that swept several rounds may play several fewer games than a team that was taken to seven games a couple of times.
College Basketball Box Plus/Minus
Box Plus/Minus for college basketball is calculated using the same coefficients derived for the NBA. While it may be argued that college basketball is somewhat different, there is no easy way to derive BPM coefficients specifically for college basketball, so the NBA coefficients will have to suffice. A rebound is still a rebound... The coefficient that could be the most questionable would be the MPG coefficient, because it is unclear if minutes distribution at the college level is based upon the same criteria as at the pro level, and the length and pace of games is different. Until further information becomes available, all coefficients have been used as-is.
VORP, on the other hand, does not make sense for college basketball. VORP is tied to salaries in a consistent market and is primarily useful for evaluating salaries. In college, every school and conference has a widely disparate situation, and since there are no salaries, there is neither a rational method for deriving VORP nor a strong need to use it.
Therefore, BPM will be shown alone for college basketball.
Data for college basketball is currently available only back to the 2011 season. Here are the top 10 seasons in the database, minimum 500 minutes played.
Rk | Year | Team | Player | BPM | OBPM | DBPM |
---|---|---|---|---|---|---|
1 | 2012 | Kentucky | Anthony Davis | 18.7 | 7.8 | 10.9 |
2 | 2013 | Indiana | Victor Oladipo | 17.0 | 9.7 | 7.3 |
3 | 2013 | Louisville | Gorgui Dieng | 15.0 | 4.0 | 11.0 |
4 | 2014 | Kansas | Joel Embiid | 14.9 | 4.7 | 10.2 |
5 | 2014 | Kentucky | Willie Cauley-Stein | 14.9 | 4.1 | 10.8 |
6 | 2012 | Marquette | Jae Crowder | 14.7 | 8.8 | 5.9 |
7 | 2013 | Kentucky | Nerlens Noel | 14.6 | 2.9 | 11.7 |
8 | 2012 | Kansas | Jeff Withey | 13.8 | 2.7 | 11.1 |
9 | 2011 | Michigan State | Draymond Green | 13.6 | 6.3 | 7.3 |
10 | 2013 | Gonzaga | Mike Hart | 13.5 | 7.9 | 5.6 |
A few notes on BPM for college basketball:
- Big men tend to rank more highly than guards–it appears that a big man can dominate on defense more in college than in the NBA. In college, there are some ridiculous block rates for elite centers. Are they overrated, or does this reflect reality? There is no easy way to know for sure.
- Beware of partial season results. Because of imbalanced schedules, with many easy games early in the season for top teams, players who compile great stats early in the season (along with their whole team) but then get hurt and miss conference play, where their teammates' stats drop down, will often see inflated BPM numbers because their numbers look so much better than their teammates.
- Beware of crazy outliers. Mike Hart, above, is one. A "quintessential glue guy," he rarely shot the ball, but made a high percentage of the few shots he did take. He almost never turned the ball over, yet he rebounded, passed, and got a lot of steals. His numbers stretch the interaction terms in BPM past their breaking point, particularly on offense. He had just enough minutes to qualify for the list above.
Updates
Box Plus/Minus Version 1.1: January 7, 2015:
This was a bug-fix update that addressed a couple of issues. The most significant change corrects the weighting scheme used in the regression. The original BPM regression weighted each observation by sqrt(Poss), which was incorrect. This version weights each observation by its number of possessions, with an additional correction to account for the effect of the prior on the RAPM values.
This fix generally increases the spread of BPM values slightly: the best players gained roughly +0.5, while below-average players saw a marginal reduction in their BPM values.
The fix also adjusted the coefficients slightly, valuing efficiency and blocks somewhat more highly and penalizing turnovers somewhat more. Centers and other players with high shooting efficiency and many blocks (such as Anthony Davis) were helped the most by these tweaks, while players with high usage and relatively low efficiency (like Russell Westbrook) were hurt. The overall BPM gap between those helped the most and those hurt the most was about 1.0. For example, 2014 Russell Westbrook saw his BPM drop by 0.2, while 2014 Anthony Davis saw his BPM increase by 0.8.
The other fixes addressed two small bugs in the regression logic that derives the offense/defense split coefficients. These bugs do not affect the overall BPM values, but correcting them shifts the OBPM/DBPM split slightly.
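To make the weighting change concrete, here is a minimal plain-Python sketch of weighted least squares. It assumes each observation is one player-season whose box-score features are regressed onto RAPM, weighted either by possessions (the v1.1 scheme) or by sqrt(Poss) (the original scheme). The function name `weighted_least_squares` and the toy data are hypothetical illustrations; the original work used Excel's regression tools, and the additional correction for the RAPM prior is not modeled here.

```python
import math


def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations (X^T W X) beta = X^T W y.

    X: list of rows; each row starts with 1.0 for the intercept.
    y: list of targets (for BPM, the RAPM values).
    w: list of non-negative weights (for BPM 1.1, possessions played).
    """
    p = len(X[0])
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for row, target, weight in zip(X, y, w):
        for i in range(p):
            b[i] += weight * row[i] * target
            for j in range(p):
                A[i][j] += weight * row[i] * row[j]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        acc = b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = acc / A[i][i]
    return beta


# Hypothetical toy data: one box-score feature, a RAPM target, and possessions.
X = [[1.0, x] for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
y = [1.0, 3.5, 2.5, 5.0, 4.0]
poss = [4000.0, 500.0, 3000.0, 800.0, 2500.0]

by_possessions = weighted_least_squares(X, y, poss)  # v1.1-style weights
by_sqrt_poss = weighted_least_squares(X, y, [math.sqrt(v) for v in poss])  # original-style weights
print(by_possessions, by_sqrt_poss)
```

Weighting by possessions lets high-minute player-seasons, whose RAPM estimates are more reliable, pull the fit harder than sqrt(Poss) weighting does; with identical weights the function reduces to ordinary least squares.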
Comparison of the coefficients:
BPM:
Coeff. | Term | BPM 1.1 Value | Original Value | Difference | % Change |
---|---|---|---|---|---|
a | Regr. MPG | 0.123391 | 0.120051 | 0.0033 | 2.8% |
b | ORB% | 0.119597 | 0.137600 | -0.0180 | -13.1% |
c | DRB% | -0.151287 | -0.151938 | 0.0007 | -0.4% |
d | STL% | 1.255644 | 1.144182 | 0.1115 | 9.7% |
e | BLK% | 0.531838 | 0.449468 | 0.0824 | 18.3% |
f | AST% | -0.305868 | -0.310548 | 0.0047 | -1.5% |
g | TO%*USG% | 0.921292 | 0.723784 | 0.1975 | 27.3% |
h | Scoring | 0.711217 | 0.610605 | 0.1006 | 16.5% |
i | AST Interaction | 0.017022 | 0.019936 | -0.0029 | -14.6% |
j | 3PAr Interaction | 0.297639 | 0.380536 | -0.0829 | -21.8% |
k | Threshold Scoring | 0.213485 | 0.269667 | -0.0562 | |
l | sqrt(AST%*TRB%) | 0.725930 | 0.691501 | 0.0344 | 5.0% |
OBPM/DBPM Split:
Coeff. | Term | O/D BPM 1.1 Value | Original Value | Difference | % Change |
---|---|---|---|---|---|
a | Regr. MPG | 0.064448 | 0.059270 | 0.005178 | 8.7% |
b | ORB% | 0.211125 | 0.197487 | 0.013638 | 6.9% |
c | DRB% | -0.107545 | -0.102144 | -0.005401 | 5.3% |
d | STL% | 0.346513 | 0.322082 | 0.024431 | 7.6% |
e | BLK% | -0.052476 | -0.062684 | 0.010208 | -16.3% |
f | AST% | -0.041787 | -0.088460 | 0.046673 | -52.8% |
g | TO%*USG% | 0.932965 | 0.798831 | 0.134134 | 16.8% |
h | Scoring | 0.687359 | 0.606303 | 0.081056 | 13.4% |
i | AST Interaction | 0.007952 | 0.011822 | -0.003870 | -32.7% |
j | 3PAr Interaction | 0.374706 | 0.430225 | -0.055519 | -12.9% |
k | Threshold Scoring | -0.181891 | -0.126574 | -0.055317 | |
l | sqrt(AST%*TRB%) | 0.239862 | 0.262148 | -0.022286 | -8.5% |
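The last two columns of both tables follow one convention: the difference is the new value minus the original value, and the percentage is that difference divided by the original value, keeping the original's sign. A small sketch that reproduces a few rows from the tables (the helper name `coefficient_change` is hypothetical):

```python
def coefficient_change(new, original):
    """Return (difference, percentage change) as used in the comparison tables."""
    diff = new - original
    pct = 100.0 * diff / original  # relative to the original value, sign included
    return diff, pct


rows = {
    "a Regr. MPG (BPM)": (0.123391, 0.120051),
    "c DRB% (BPM)": (-0.151287, -0.151938),
    "e BLK% (BPM)": (0.531838, 0.449468),
    "f AST% (O/D split)": (-0.041787, -0.088460),
}
for term, (new, orig) in rows.items():
    diff, pct = coefficient_change(new, orig)
    print(f"{term}: {diff:+.4f} ({pct:+.1f}%)")
```

Because the percentage is taken relative to a signed original value, a positive difference on a negative coefficient (such as DRB%) shows up as a negative percentage.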
This update also revises how VORP is handled for partial seasons. Previously, partial seasons showed the player's production extrapolated to a full 82-game season, so VORP behaved more like a rate stat. VORP now acts as a counting stat over the season, with each point equal to 1 point of season-end team point differential (per 82 games).
This change was also critical for handling playoff VORP, which has been put on the same scale: VORP = [BPM - (-2.0)] * (% of minutes played) * (team games / 82).
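The VORP formula above translates directly into a one-line function. This is a minimal sketch that assumes "% of minutes played" is passed as a fraction (0 to 1) of the team's available minutes and "team games" is the number of games the team played in the period (regular season or playoffs); the names `vorp` and `REPLACEMENT_LEVEL_BPM` are illustrative.

```python
REPLACEMENT_LEVEL_BPM = -2.0  # replacement level on the BPM scale


def vorp(bpm, minutes_fraction, team_games):
    """VORP as a counting stat: [BPM - (-2.0)] * (% of minutes) * (team games / 82)."""
    return (bpm - REPLACEMENT_LEVEL_BPM) * minutes_fraction * team_games / 82


# 2013-14 figures quoted at the top of this page, assuming 82 team games each.
print(round(vorp(8.8, 0.79, 82), 2))  # Durant
print(round(vorp(8.9, 0.73, 82), 2))  # LeBron
# A hypothetical +5.0 player who plays 80% of the minutes in a 16-game playoff run.
print(round(vorp(5.0, 0.80, 16), 2))
```

This reproduces the point made earlier: Durant's slightly lower BPM over more minutes yields more value than LeBron's. Dividing by 82 keeps playoff VORP on the regular-season scale, so a short playoff run produces a proportionally small total.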
This update also includes new sections on playoff BPM and BPM for NCAA Men's College Basketball.
Footnotes
Selecting Variables
A multitude of variables were evaluated with Excel's regression tools to find the best design for BPM. Below is the best regression found: the highest adjusted R² with every term clearly significant. This is how the variables were finally chosen. Note that this is without the team adjustment, just raw box score stats vs. RAPM.
SUMMARY OUTPUT
Regression Statistics | Value |
---|---|
Multiple R | 0.7297 |
R Square | 0.5325 |
Adjusted R Square | 0.5261 |
Standard Error | 1.9504 |
Observations | 894 |
ANOVA
Source | df | SS | MS | F | Significance F |
---|---|---|---|---|---|
Regression | 12 | 3816.9 | 318.1 | 83.6 | 2.19E-136 |
Residual | 881 | 3351.4 | 3.804 | | |
Total | 893 | 7168.3 | | | |
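The headline statistics in the summary are all functions of the ANOVA sums of squares, which gives a quick consistency check on the output. A minimal sketch using the standard OLS identities (the function name `regression_summary` is hypothetical, and the Significance F p-value is omitted because it needs an F-distribution routine):

```python
import math


def regression_summary(ss_reg, ss_res, n_obs, n_predictors):
    """Recompute an OLS summary's headline statistics from its ANOVA table."""
    ss_tot = ss_reg + ss_res
    df_reg = n_predictors
    df_res = n_obs - n_predictors - 1  # one extra degree of freedom for the intercept
    r2 = ss_reg / ss_tot
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "Multiple R": math.sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": 1 - (1 - r2) * (n_obs - 1) / df_res,
        "Standard Error": math.sqrt(ms_res),
        "MS Regression": ms_reg,
        "MS Residual": ms_res,
        "F": ms_reg / ms_res,
    }


stats = regression_summary(3816.9, 3351.4, 894, 12)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The recomputed values agree with the table to the precision shown, for example R² = 3816.9 / 7168.3 ≈ 0.5325 and a standard error of sqrt(3.804) ≈ 1.9504 points per 100 possessions.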
Term | Coefficient | Std. Err. | t Stat | P-value | Lower 95% | Upper 95% |
---|---|---|---|---|---|---|
Intercept | -5.6659 | 0.7287 | -7.7754 | 2.096E-14 | -7.0961 | -4.2358 |
AST% | -0.2642 | 0.0468 | -5.6484 | 2.185E-08 | -0.3560 | -0.1724 |
ORB% | 0.1366 | 0.0432 | 3.1600 | 1.632E-03 | 0.0518 | 0.2214 |
DRB% | -0.1278 | 0.0346 | -3.6912 | 2.369E-04 | -0.1957 | -0.0598 |
MPG | 0.1006 | 0.0133 | 7.5401 | 1.168E-13 | 0.0744 | 0.1267 |
STL% | 1.1382 | 0.1459 | 7.8005 | 1.741E-14 | 0.8518 | 1.4246 |
BLK% | 0.3936 | 0.0686 | 5.7349 | 1.341E-08 | 0.2589 | 0.5283 |
Shot% | -0.9725 | 0.0734 | -13.2545 | 1.073E-36 | -1.1165 | -0.8285 |
TS%xShot% | 1.4735 | 0.1225 | 12.0330 | 5.518E-31 | 1.2332 | 1.7138 |
TOV%xUSG% | -0.0075 | 0.0014 | -5.2554 | 1.854E-07 | -0.0103 | -0.0047 |
AST%xShot% | 0.0109 | 0.0021 | 5.2075 | 2.384E-07 | 0.0068 | 0.0150 |
sqrt(TRBxAST) | 0.6262 | 0.0769 | 8.1457 | 1.285E-15 | 0.4753 | 0.7771 |
3PAr*Shot% | 0.2281 | 0.0369 | 6.1795 | 9.819E-10 | 0.1556 | 0.3005 |
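Each row of the coefficient table follows the usual OLS relationships: t Stat = Coefficient / Std. Err., and the 95% bounds are Coefficient ± t_crit × Std. Err., where t_crit is the two-sided 95% critical value of Student's t with the 881 residual degrees of freedom. A minimal check is sketched below. The constant `T_CRIT_881 ≈ 1.9627` is an assumed approximation rather than a value from the source, and `coefficient_check` is a hypothetical helper.

```python
# Assumption: two-sided 95% critical value of Student's t with 881 df (about 1.9627).
T_CRIT_881 = 1.9627


def coefficient_check(coef, std_err, t_crit=T_CRIT_881):
    """Return (t statistic, lower 95% bound, upper 95% bound) for one coefficient."""
    half_width = t_crit * std_err
    return coef / std_err, coef - half_width, coef + half_width


rows = {
    "Intercept": (-5.6659, 0.7287),
    "AST%": (-0.2642, 0.0468),
    "STL%": (1.1382, 0.1459),
}
results = {term: coefficient_check(c, se) for term, (c, se) in rows.items()}
for term, (t, lo, hi) in results.items():
    print(f"{term}: t = {t:.3f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

Because the table's coefficients and standard errors are rounded to four decimals, recomputed t statistics drift slightly for rows with very small standard errors (TOV%xUSG%, for example); the rows above agree closely.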
Other variables that were investigated but ultimately not selected:
- Quadratic, cubic, power, and other transformations of TRB%, along with linear TRB% alone.
- Free throw rate.
- All standard interaction terms between the primary variables listed above.
- All quadratic terms of the primary variables listed above (e.g. USG%²). Usage squared was actually in BPM's precursor, Advanced Statistical Plus/Minus (ASPM), but with the revised basis and better techniques it was no longer significant.
- Height would have been significant (it raises R² by 0.01) but was excluded on principle. It does not directly reflect any performance on the court and could bias results; Yao Ming could be a problem for a regression that included it!
Estimating Pre-1978 Turnovers
Note that for the NBA from 1974-77, player turnovers were estimated according to the following process created by Justin Kubatko for use in Win Shares (here illustrated using Kareem Abdul-Jabbar of the 1976-77 Los Angeles Lakers as an example):
- Obtain an initial estimate of the player's turnovers. To do this, use the following formula:
  Estimated TOV = -0.0005075172 * (minutes played) * (player age) - 0.0873982755 * (field goals) + 0.0925506598 * (field goal attempts) + 0.1566322510 * (free throw attempts) + 0.0449241773 * (total rebounds) + 0.2321637159 * (assists) + 0.2040169400 * (personal fouls)
  If this number is less than zero, set it to zero. Plugging Abdul-Jabbar's statistics into the formula above, we get an estimate of 280.316 turnovers.
- Find the sum of estimated turnovers for the players on the given team. The sum for the players on the 1976-77 Lakers is 1448.057.
- Calculate the player's share of this total. Abdul-Jabbar's share of the team total is 280.316 / 1448.057 = 0.194.
- Multiply the team's turnovers (adjusted for team turnovers) by the player's share. In these seasons the NBA tracked turnovers only at the team level, and the team totals include team turnovers (i.e., turnovers that are not attributed to an individual player). Thus, we multiply the team's turnovers by 0.985, then multiply this adjusted figure by the player's share. For Abdul-Jabbar this is 1538 * 0.985 * 0.194 = 293.9, which we round to 294.
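The four steps above can be sketched in a few lines. This is an illustrative translation of the described process, not Kubatko's own code; the function names and the `TEAM_TOV_FACTOR` constant are hypothetical labels for the formula coefficients and the 0.985 adjustment given in the text.

```python
TEAM_TOV_FACTOR = 0.985  # removes team (unattributed) turnovers from the team total


def initial_tov_estimate(mp, age, fg, fga, fta, trb, ast, pf):
    """Step 1: first-pass estimate of a player's season turnovers."""
    est = (-0.0005075172 * mp * age
           - 0.0873982755 * fg
           + 0.0925506598 * fga
           + 0.1566322510 * fta
           + 0.0449241773 * trb
           + 0.2321637159 * ast
           + 0.2040169400 * pf)
    return max(est, 0.0)  # negative estimates are set to zero


def allocate_team_tov(player_est, team_est_sum, team_tov):
    """Steps 2-4: give the player his share of the team's individual turnovers."""
    share = player_est / team_est_sum
    return team_tov * TEAM_TOV_FACTOR * share


# Figures from the Abdul-Jabbar example (1976-77 Lakers).
exact = allocate_team_tov(280.316, 1448.057, 1538)
print(round(exact, 1))  # 293.3 (unrounded share)
print(round(1538 * TEAM_TOV_FACTOR * round(280.316 / 1448.057, 3), 1))  # 293.9 (share rounded to 0.194)
```

Rounding the share to 0.194 first, as the worked example does, gives 293.9 and hence 294; carrying the unrounded share through gives about 293.3.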
Acknowledgements
The APBRmetrics forum people have been instrumental in the development of this stat over the years since the idea was first posted as Advanced Statistical Plus/Minus back in 2009. (Also see the recovered thread.) Neil Paine in particular has helped often over the years and was an early proponent of ASPM in his writing at ESPN. Thanks to Jeremias Engelmann, who provided the RAPM basis for ASPM and now BPM. Thanks also to editors Michael Myers, Curtis Buck, Kevin Ferrigan, and Andrew Johnson, and to David Corby and Neil Paine, who did the database wrangling to make BPM a reality on Basketball Reference.
Feedback
If you have any comments or questions about the BPM methodology, please contact Daniel directly or use the Basketball-Reference feedback form.