Alternatives to the BBR Rankings: The Maximum Likelihood Method

Posted by Neil Paine on January 13, 2010

If you didn't catch it the first time around (since I know all of you read the PFR Blog now that I occasionally post over there), I highly recommend that you check out this series of posts that Doug Drinen wrote about various computer ranking systems and the methods behind them:

A very simple ranking system (If you ever wondered where the ubiquitous SRS comes from, this is it)
Another ranking system
Another rating system: maximum likelihood

That last link is the topic I wanted to talk about today.

Every Friday in the BBR Rankings, I combine a pure won-lost rating with a strength of schedule component that factors in the point margin of each game. I combine the two this way because I think it's fair -- it rewards teams for wins and doesn't give undue credit to blowouts, while acknowledging that the best indicator of a team's "true" strength is still its margin of victory/defeat. However, logically and mathematically, this method is not exactly the most rigorous one in the world. Because the two elements are combined in a somewhat arbitrary fashion, the aim of the rating is not crystal clear -- it's certainly not predictive (nor is it intended as such), and while I call it retrodictive, it's not purely that either, because it does include elements of predictiveness.

Obviously I'm still going to post the BBR Rankings every week, but I also wanted to show you an alternative method that is the most purely retrodictive rating possible. It's called "maximum likelihood", and it seeks the set of team ratings under which the results we have already seen were most likely to happen.

Think about the way the season has progressed so far, starting with last night's game between Orlando and Sacramento. The Magic beat the Kings, which is a data point for any rating system to work with, and it implies that Orlando is better than Sacramento. Therefore, all else being equal, the system would seek to create a rating that ranked Orlando ahead of Sacramento. However, all else is not equal -- Orlando also has lost this season to Indiana, Washington, Utah, and Oklahoma City, all of whom Sacramento has beaten. Because the computer can't find a perfect ranking that is 100% consistent with every past result, the best it can do is find the ratings that fit those results as well as possible. It does this by assigning a probability to each game's actual result, and then multiplying these probabilities together for the entire season, producing the likelihood that, given a certain set of ratings, the season would have played out exactly the way it has in real life. In essence, we want to try different combinations of ratings until we maximize that likelihood, hence the name of the method.
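To make the "multiply the probabilities together" idea concrete, here is a tiny Python sketch. The probabilities are made up purely for illustration (they don't come from any real ratings), and the names candidate_a and candidate_b are hypothetical; each list holds the probability that one candidate set of ratings gives to what actually happened in three games.

```python
import math

# Hypothetical numbers: the probability each candidate set of ratings
# assigns to the ACTUAL result of three games (not real NBA data).
candidate_a = [0.70, 0.60, 0.40]
candidate_b = [0.55, 0.65, 0.60]

# The likelihood of the whole "season" is the product of the per-game values.
likelihood_a = math.prod(candidate_a)
likelihood_b = math.prod(candidate_b)

# The candidate with the larger product explains the past better.
better = "A" if likelihood_a > likelihood_b else "B"
print(likelihood_a, likelihood_b, better)
```

Candidate A is more confident about the first game, but candidate B does a better job across all three, so B ends up with the higher likelihood. Maximum likelihood is just a search for the set of ratings that wins this comparison against every other set.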

If you want to know the math, for each game we'll assume the probability of the home team winning is p(hW) = exp(rH - rA + HC)/(exp(rH - rA + HC) + 1), where rH = the home team's rating, rA = the away team's rating, and HC = a home-court advantage term. When the home team wins in real life, the "likelihood" of the result is p(hW). When the road team wins, the likelihood is (1 - p(hW)). Now, instead of taking the product of all those game probabilities (which quickly becomes a very small number), you can work with the natural logarithm of each individual game probability and sum them for the entire season. Because the logarithm is an increasing function, the set of ratings that maximizes that sum also maximizes the product, so it is the set that best retrodicts the past. (If you have Excel, you can use the Solver tool to do this, telling it to maximize the sum of the natural logs by changing the team ratings and the home court term while keeping the sum of all ratings equal to zero.) This season, you get these ratings from the maximum likelihood method:

Team Rating W L WPct
CLE 1.32637 30 10 0.750
LAL 1.22496 29 9 0.763
BOS 1.07509 26 10 0.722
DAL 0.98798 25 12 0.676
ATL 0.81961 24 13 0.649
ORL 0.81689 26 12 0.684
PHO 0.80430 24 14 0.632
HOU 0.63461 21 17 0.553
DEN 0.50077 24 14 0.632
SAS 0.46512 23 13 0.639
POR 0.44561 23 16 0.590
OKC 0.40487 21 16 0.568
UTA 0.29150 21 17 0.553
NOH 0.21510 19 17 0.528
MEM 0.17557 19 18 0.514
TOR 0.11373 19 20 0.487
MIA 0.08994 18 18 0.500
LAC -0.12693 17 19 0.472
CHA -0.17451 17 19 0.472
CHI -0.31822 16 20 0.444
MIL -0.42008 15 20 0.429
SAC -0.42290 15 22 0.405
NYK -0.55804 15 22 0.405
DET -0.69794 12 25 0.324
GSW -0.78356 11 25 0.306
PHI -0.81500 12 25 0.324
WAS -0.83306 12 24 0.333
IND -0.96396 12 25 0.324
MIN -1.55545 8 31 0.205
NJN -2.72235 3 34 0.081
HCA (home-court term) 0.61325

(Note: No, they don't all add to zero, but the solver will find the solution that both maximizes the sum of the natural logs and gets the average as close as possible to zero.)
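If you don't have Excel, the Solver step described above can be roughly sketched in Python. This is only an illustration under stated assumptions: plain gradient ascent stands in for Solver, the eight games between teams "A", "B" and "C" are invented toy data, and the function names (home_win_prob, log_likelihood, fit) are hypothetical. It is not the spreadsheet that produced the table above.

```python
import math

def home_win_prob(r_home, r_away, hca):
    """p(hW) = exp(rH - rA + HC) / (exp(rH - rA + HC) + 1)."""
    x = r_home - r_away + hca
    return 1.0 / (1.0 + math.exp(-x))  # algebraically the same formula

def log_likelihood(games, ratings, hca):
    """Sum of the natural logs of the probability of each actual result."""
    total = 0.0
    for home, away, home_won in games:
        p = home_win_prob(ratings[home], ratings[away], hca)
        total += math.log(p if home_won else 1.0 - p)
    return total

def fit(games, steps=5000, lr=0.05):
    """Gradient ascent on the log-likelihood (a stand-in for Excel's Solver)."""
    teams = sorted({team for game in games for team in game[:2]})
    ratings = {team: 0.0 for team in teams}
    hca = 0.0
    for _ in range(steps):
        grad = {team: 0.0 for team in teams}
        grad_hca = 0.0
        for home, away, home_won in games:
            p = home_win_prob(ratings[home], ratings[away], hca)
            err = (1.0 if home_won else 0.0) - p  # d(log-likelihood)/d(rH - rA + HC)
            grad[home] += err
            grad[away] -= err
            grad_hca += err
        for team in teams:
            ratings[team] += lr * grad[team]
        hca += lr * grad_hca
        # Re-center so the ratings keep summing to zero.
        mean = sum(ratings.values()) / len(teams)
        for team in teams:
            ratings[team] -= mean
    return ratings, hca

# Toy schedule (made up): (home team, away team, did the home team win?)
games = [
    ("A", "B", True), ("B", "A", False), ("A", "C", True), ("C", "A", True),
    ("B", "C", True), ("C", "B", False), ("B", "A", True), ("C", "B", True),
]

ratings, hca = fit(games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 5))
print("HCA", round(hca, 5))
```

The same home_win_prob function can be used to read the table above. Plugging in the published numbers for Orlando at home against Sacramento, home_win_prob(0.81689, -0.42290, 0.61325), comes out to roughly 0.86, meaning these ratings regard a Magic home win over the Kings as about an 86% proposition.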

So these are the ratings that best "retrodict" the past. They are only concerned with past wins and losses (even the SOS adjustment and the HCA term are based purely on W-L), which is the polar opposite of the SRS, which only concerns itself with point differential and is chiefly interested in predicting future outcomes. And the BBR Rankings, I suppose, are a hybrid of both approaches. As always, the approach that's best depends on the philosophical goal you're trying to achieve with the rankings.

8 Responses to “Alternatives to the BBR Rankings: The Maximum Likelihood Method”

  1. Ryan J. Parker Says:

    Yay for MLE :)

    It might have just been easier to say that this is a logistic regression, but I like getting people to think about the idea of maximum likelihood.

  2. Ryan Says:

    Neil, I know this isn't the appropriate place to post this question, but I'm on the run. I'll get around to reading & responding to this blog tonight or tomorrow.

    I've noticed an increase, especially on BBR's blog, of measuring a player's greatness/talent by Win Shares. Win Shares, to my understanding, is incredibly team-reliant. Wouldn't it make more sense, perhaps, to measure a player's individual ability and impact by WS%? Adding a WS% column to BBR's advanced stats, IMO, would be beneficial. If Jordan's team won 50 games, while James' won 66, yet both have comparable WS then the disparity should be made readily available in an alternative form of WS (in this case, WS%).


    Unless of course, my understanding of WS is completely wrong.

  3. Walter Says:

    I love the use of MLE here. I think it would be interesting to use MLE with margin of victory instead of the binary win/loss. I doubt the results would be significantly different but it would be interesting nonetheless.

    Keep up the good work.

  4. Justin Kubatko Says:

    Ryan wrote:

    I've noticed an increase, especially on BBR's blog, of measuring a player's greatness/talent by Win Shares. Win Shares, to my understanding, is incredibly team-reliant.

    This is a common misperception. Two players (playing on different teams) with the same playing time, same number of possessions, same offensive rating, and same defensive rating will have the same number of Win Shares. Now, defensive rating does have a team component, but all in all I would not call Win Shares "incredibly team-reliant". More details are available here.

  5. Mike G Says:

    According to this page:

    Hedo Turkoglu, from '03 in Sac, to '04 in SA, to '05 in Orl, had his DRtg go from 102 to 94 to 110; his DWS from 1.5 to 4.5 to 1.0.
    DWS per 484 minutes (.50 = avg) moved from .62 to 1.05 to .28 in this time, while his OWS/484 rose steadily from .47 to .52 to .60 .

    Last year to this year, his DWS/484 has gone from .76 to .04, Orl to Tor.
    This is "incredibly team-reliant".

    Meanwhile, what about a column for WS per X minutes?

  6. Neil Paine Says:

    Mike, I don't really see how you can properly factor in defense as half of the game (as WS does) and not have fluctuations like that when a player moves from a good defensive team to a bad one. How does eWins handle defense? If it doesn't make a team adjustment at some point, it's essentially saying, "100% of a player's defensive ability is described by his blocks, steals, and DReb." I don't know how Justin feels, but I'd rather be "wrong" on a few players by adjusting the defensive component of the stat for the team's defensive rating than claim that blocks, steals, and DReb describe 100% of a player's defensive contribution.

  7. Mike G Says:

    "How does eWins handle defense? If it doesn't make a team adjustment at some point, it's essentially saying, '100% of a player's defensive ability is described by his blocks, steals, and DReb.'"

    eWins makes team adjustments all along the process. Points and assists are scaled to opponent points, rebounds to opp. reb.

    Monta Ellis is averaging 26 PPG, but for a team that allows 112 PPG. So, that 26 Pts are just (100/112) 89% as much a contribution as they'd be for an avg team. For Hou he might be expected to avg 23; for Cha, perhaps 21.5.

    eWins doesn't bother to distinguish between offense and defense per se. Rather, productivity is scaled to 'rest of the league' performance in the games a player is in : vs GSW, in Ellis' case.

  8. Jesse Says:

    How would one go about translating the power ratings given above to points? Expressing power ratings as points or wins or what-have-you... something real... seems to be more understandable than using a dimensionless number.

    The idea I had was to find the league-wide HCA, and then translate the power ratings into points based on that. So if the HCA league-wide is 3 points, then, for example, Cleveland's rating is +6.49 points.

    Also, I know that Neil said the ratings don't add to zero exactly, but if the sum is as close to zero as it is (0.00002), it's not worth worrying about.