
Great Expectations, Part III

Posted by Neil Paine on November 4, 2008

That's right, it's all expectations all the time here. To refresh everyone's memories, on Friday we looked at a very simple way to set up preseason expectations for each team using a linear regression model with the previous season's SRS. Then, yesterday we took that same dataset and added team minute-weighted age as a variable, which helped to (marginally) improve the model's fit.

Today, we're going to work on something a commenter named Mountain suggested last week: incorporating month-by-month performance as a predictor of improvement or decline in the following season. The idea here is pretty simple: we would expect a "breakout" team in year Y (the year we're setting expectations for) to have at least given some hint of its future greatness in the previous season (year Y-1). The assumption is that young, improving teams should get better as the season goes on; consequently, teams that play their best in the second half (specifically the months of March/April) may be good candidates to make "the leap" the following season.

Is this true, though? Well, just like we did when testing the significance of SRS and age in year Y-1, we can build a regression model that uses month-by-month performance as variables and see if it improves our ability to create accurate expectations for year Y. And that's exactly what I did: I took every NBA season since 1962-63 and found each team's average margin of victory/defeat for each month of the season. Out of necessity, I lumped October games in with those of November, and grouped March & April into one combined "month" (teams didn't start playing regular-season games in April until 1975). Likewise, results from the lockout-shortened 1999 season were thrown out completely, as no games were played until February. Then I regressed wins in season Y on average age and the month-by-month point differentials in season Y-1. The resulting equation:

Wins_Y = 41 - (0.592 * agaa_Y-1) + (0.479 * oct/nov_Y-1) + (0.165 * dec_Y-1) + (0.297 * jan_Y-1) + (0.360 * feb_Y-1) + (0.564 * mar/apr_Y-1)
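For anyone who wants to replicate this at home, here's a minimal sketch of the fit in Python. The DataFrame and its column names (wins_y, agaa, oct_nov, and so on) are hypothetical stand-ins for however you've assembled the data, not my actual dataset:

# A sketch of the month-by-month regression, assuming a DataFrame with one
# row per team-season: wins in year Y plus the predictors from year Y-1.
# Column names here are illustrative, not from the actual dataset.
import pandas as pd
import statsmodels.api as sm

def fit_monthly_model(df: pd.DataFrame):
    """Regress year-Y wins on year Y-1 age and monthly point differentials."""
    predictors = ["agaa", "oct_nov", "dec", "jan", "feb", "mar_apr"]
    X = sm.add_constant(df[predictors])  # adds the intercept term (~41 above)
    y = df["wins_y"]
    return sm.OLS(y, X).fit()

# results = fit_monthly_model(df)
# print(results.params)     # should roughly match the coefficients above
# print(results.rsquared)   # ~0.44 on this sample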

That's an interesting result, to say the least. Remember, the monthly variables all represent point differentials on the same scale, so we can gauge their relative importance simply by comparing the coefficients -- and it appears Mountain was on to something: a team's performance in the final 2 months of the season is fairly important in predicting its wins the following season. I had feared that the "tanking" phenomenon we've witnessed in recent seasons would dampen the impact of late-season games (and it may well have done so to some degree), but their predictive power clearly comes through nonetheless. Also of interest is the fact that performance in October/November was the 2nd-most important factor in predicting success the next season, perhaps because every roster is relatively intact in the early going -- this year's Blazers notwithstanding, in-season injuries don't typically begin to take a huge toll that early in the year.

So that's the good news: Mountain's intuitive hypothesis about month-by-month performance looks to be true in terms of predicting wins the next season. The bad news, though, is that including this data doesn't really improve on Monday's SRS-plus-age model. The r-squared on that model, if you recall, was 0.4443, while the r-squared of the regression we just performed is 0.4428 -- better than simply using the previous year's SRS alone, but not quite as good as a model that incorporates both SRS and average age.

Why did this happen, since there is clearly a relationship between performance late in the year and success the following season? Mainly because we had to use unadjusted point differential and not SRS. Calculating simple ratings on a month-by-month basis is of questionable value because of the small sample sizes involved, so we had to use average point differential by month as variables -- and while that is a better metric than straight W%, it doesn't take into account opponent strength like the SRS does, and therefore doesn't describe a team's "true talent" as well as the SRS.
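(For reference, the SRS is just a team's average margin of victory plus its strength of schedule, and one common way to compute it is by iterating until the ratings settle. Here's a bare-bones sketch; the `games` input is a hypothetical list of (team, opponent, margin) tuples, with each game appearing once from each team's perspective:)

# Minimal iterative SRS solver: each team's rating is its average margin
# plus the average rating of its opponents. In practice this iteration
# settles quickly for a full NBA schedule.
from collections import defaultdict

def simple_ratings(games, n_iter=100):
    margins, opps = defaultdict(list), defaultdict(list)
    for team, opp, margin in games:
        margins[team].append(margin)
        opps[team].append(opp)
    mov = {t: sum(m) / len(m) for t, m in margins.items()}
    srs = dict(mov)  # start from raw margin of victory
    for _ in range(n_iter):
        srs = {t: mov[t] + sum(srs[o] for o in opps[t]) / len(opps[t])
               for t in srs}
    return srs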

Even so, here are the biggest overachievers since 1963-64 by the month-by-month regression:

Year	Team	agaa_Y-1	oct/nov_Y-1	dec_Y-1	jan_Y-1	feb_Y-1	mar/apr_Y-1	Wins	xWins	Diff
1998	SAS	 2.22	-7.07	-5.00	-6.00  -11.21	-8.96	56.0	24.6	31.4
1980	BOS	 1.23	-5.86	 2.31	-7.53	-0.75	-9.48	61.0	30.0	31.0
2008	BOS	-3.07	 0.14	-3.67	-5.50	-5.83	-2.76	66.0	37.0	29.0
1990	SAS	-1.27	-3.46	-6.50	-4.60  -15.38	-7.11	56.0	28.1	27.9
1972	LAL	 1.65	 5.10	 3.59	 2.88	 5.94	-4.83	69.0	43.3	25.7
2005	PHO	-2.56	-0.07	-5.47	-4.71	-2.58	-4.86	62.0	36.5	25.5
1970	MIL	-0.17	-8.14	-6.37	-3.79	-0.47	-5.23	56.0	31.9	24.1
1996	CHI	 0.92	 1.92	 6.00	 4.60	-0.46	 8.38	72.0	48.3	23.7
1989	PHO	 0.65	-4.30	-2.40	-7.13	-3.14	-4.89	55.0	32.1	22.9
2002	NJN	-0.21	-1.67	-7.87	-6.47	-2.58	-5.65	52.0	33.0	19.0

And the biggest underachievers (win totals from shortened seasons appear to be pro-rated to 82 games, hence fractional values like 17.4):

Year	Team	agaa_Y-1	oct/nov_Y-1	dec_Y-1	jan_Y-1	feb_Y-1	mar/apr_Y-1	Wins	xWins	Diff
1965	SFW	-0.89	 4.00	 3.40	 6.44	 5.78	 6.20	17.4	51.5	-34.1
1997	SAS	 1.87	 6.00	 7.60	 5.00	 5.57	 6.78	20.0	51.3	-31.3
1999	CHI	 4.01	 3.00	 9.07	 5.81	 8.38	 8.96	21.3	51.4	-30.0
1983	HOU	 2.14	-5.53	-0.08	 0.92	 6.86	-0.64	14.0	39.5	-25.5
2007	MEM	 1.56	 4.13	 3.71	 1.33	-1.77	 7.68	22.0	46.8	-24.8
1973	PHI	 1.63	-3.26	-5.50	-0.36	-4.79	-4.40	 9.0	33.3	-24.3
1985	NYK	 0.86	 3.18	 2.64	 8.75	 3.44	 2.78	24.0	47.9	-23.9
2008	MIA	 2.74	-6.80	-3.20	 0.40	 4.17	 0.76	15.0	37.6	-22.6
1991	DEN	 2.40	 5.14	 3.80	-1.54	 0.62	-0.04	20.0	42.4	-22.4
1998	LAC	-2.18	-2.63	-4.50	-2.42	 1.67	-2.82	17.0	38.6	-21.6

We're pretty much reshuffling the same teams we saw on our earlier lists at this point, although the 1997-98 Clippers join the ranks of the disappointments (who would have thought Loy Vaught, Bo Outlaw, and Malik Sealy were so important?). And as always, here are the model's expectations for this season:

Year	Team	agaa_Y-1	oct/nov_Y-1	dec_Y-1	jan_Y-1	feb_Y-1	mar/apr_Y-1	xWins
2009	BOS	 1.15	13.73	13.86	 5.67	 5.38	11.44	59.3
2009	UTA	-1.46	 8.82	-0.63	 8.46	 4.62	11.04	56.4
2009	LAL	-0.09	 4.13	 5.14	 6.93	12.53	 7.48	54.7
2009	ORL	-0.08	 6.94	-0.20	 2.93	 3.77	10.73	52.6
2009	DET	 1.49	 5.64	14.65	 0.47	 9.50	 6.54	52.5
2009	NOH	 0.28	 3.47	 3.93	12.50	 0.83	 5.36	50.2
2009	HOU	 1.48	 2.00	-0.57	 4.71	12.62	 5.33	49.9
2009	DAL	 2.12	 4.94	 1.73	 8.14	 1.64	 5.65	48.6
2009	SAS	 4.64	 8.82	 4.42	 0.00	 8.45	 3.73	48.4
2009	PHO	 2.49	 5.38	 5.93	 6.56	-0.36	 5.75	48.1
2009	TOR	-0.46	 4.44	-0.81	 7.38	 7.92	-0.44	48.1
2009	DEN	 1.74	 3.82	 2.77	-0.33	 5.77	 5.54	47.4
2009	GSW	-1.27	 0.73	 4.00	 0.87	 4.60	 1.72	45.6
2009	PHI	-1.50	-2.80	 1.00	-4.40	 4.69	 2.83	42.7
2009	CLE	 0.33	-3.24	-4.36	 5.29	-0.21	 0.70	40.4
2009	WAS	 0.46	-0.63	 4.77	-0.20	-4.43	-0.58	39.2
2009	POR	-2.70	-5.31	 5.53	 2.50	-5.29	-1.70	38.8
2009	ATL	-2.61	-2.13	 1.31	-3.60	-3.79	-1.08	38.7
2009	IND	-0.11	-1.76	 0.40	-4.57	-4.38	 1.30	38.1
2009	SAC	 0.50	-4.60	-2.71	 0.40	-2.57	-2.00	36.1
2009	CHI	-1.04	-8.31	-0.44	-1.38	-1.77	-3.88	34.3
2009	CHA	-0.45	-4.07	-6.07	-1.29  -14.67	-0.54	32.3
2009	NJN	 0.58	-6.80	-3.13	-7.80	-0.15	-6.13	31.1
2009	MEM	-2.32	-3.00	-6.93	-3.44  -11.67	-6.88	30.7
2009	MIN	-2.13	-7.57	-9.81	-6.33	-4.92	-5.56	30.2
2009	MIL	-0.97	-2.79	-9.38	-7.71	-4.82	-8.00	30.1
2009	NYK	-1.09	-8.40	-8.00	-2.38	-4.62	-8.58	29.1
2009	LAC	 1.78	-5.29	-4.67	-3.08	-5.64  -12.81	26.5
2009	MIA	 0.62	-3.67	-6.81  -10.86	-7.08  -12.32	25.0
2009	OKC	-1.39	-8.59	-3.29  -10.67	-6.83  -11.83	24.9
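If you want to check any of the table rows yourself, the equation is easy to apply by hand. Here's a quick sketch that reproduces the Celtics' line above (rounding in the printed coefficients may shift the last decimal on some rows):

# Expected wins from the regression equation above; the inputs are the
# year Y-1 values straight out of the table.
def expected_wins(agaa, oct_nov, dec, jan, feb, mar_apr):
    return (41 - 0.592 * agaa + 0.479 * oct_nov + 0.165 * dec
            + 0.297 * jan + 0.360 * feb + 0.564 * mar_apr)

# 2009 Celtics: matches the 59.3 xWins in the table
print(round(expected_wins(1.15, 13.73, 13.86, 5.67, 5.38, 11.44), 1))  # 59.3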

And, yes, I promise that this is the last post on preseason expectations for a while. But we will probably revisit these later in the season, just to see which teams outperformed and underperformed the expectations of our 3 regression models.

3 Responses to “Great Expectations, Part III”

  1. Mountain Says:

    Thanks for the substantial follow-thru.
    Appreciate the strong work.

  2. Kevin Pelton Says:

    Neil, what does the history say about the meaning of the different coefficients? Are they statistically significant? While the explanation for October/November makes sense, I can't see why December would be such a relatively poor predictor.

  3. Neil Paine Says:

    They're all significant at the 5% level, but I don't really get it either... except that maybe performance during the midseason grind before the All-Star break isn't representative of players' and teams' true abilities. Baseball has its "dog days" in late July and August -- could the NBA's version be happening in December & January? Also, I think nagging injuries probably start to hit after a month or two of play. You play through them, but they definitely affect your performance.