Great Expectations, Part II

Posted by Neil Paine on November 2, 2008

On Friday, in an effort to establish preseason "expectations" for each team, we built a very simple model for projecting future performance at the team level. We included both W-L records and SRS scores from the 5 previous seasons in a linear regression, and we discovered that in both cases the only past season that's significant at the 5% level is the year directly before the one we're trying to predict (year "Y-1"). We also found that past SRS scores (which are essentially average scoring margins, but adjusted for strength of schedule) better predict future W-L than past W-L do -- which just confirms what people like John Hollinger have been saying for a long time. Finally, using our regression model, we ranked the biggest positive ('08 Celtics) and negative ('99 Bulls) surprises in NBA history, and showed what the model predicted for the current season as well.
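The r² values quoted throughout this series measure the share of the variance in actual wins that a model's predictions account for. Here is a minimal pure-Python sketch of that statistic. It assumes you already have parallel lists of observed and predicted wins; the function name `r_squared` is illustrative, not something from the original analysis.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    mean = sum(observed) / len(observed)
    # Total variation of actual wins around their mean.
    ss_tot = sum((y - mean) ** 2 for y in observed)
    # Variation left over after subtracting the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot
```

One caveat when comparing nested models with this number: for an ordinary least-squares fit with an intercept, adding a regressor can never lower the in-sample r², so a small increase says little on its own. The significance test on the new coefficient is the more informative check.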

However, the model's fit (r^2 of 0.44) still left something to be desired, and two commenters (Ben and Mountain) had some suggestions that could potentially make our set of "expectations" more accurate. Ben said:

One variable to consider adding would be the previous year’s win share weighted average age. That could be one number that might pick up the direction a team’s headed in.

And I think that's a pretty good idea. After all, if two teams (one old and one young) had similar SRS scores in a season, you would naturally expect the younger team to improve the next season and the older one to decline -- but our old model would make no distinction between the two teams. So let's add minute-weighted (not Win Share-weighted, for reasons explained here) team age into the mix as a variable.
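A minute-weighted team age counts each player's age in proportion to the minutes he played, so a 35-year-old who barely plays moves the number very little. The sketch below assumes each roster is a list of hypothetical `(age, minutes)` pairs and that the league average age is supplied separately. The exact league-average definition used in the post is not stated here, so treat that input as an assumption.

```python
def minute_weighted_age(roster):
    """Average player age, weighting each player by minutes played.

    roster: list of (age, minutes) pairs for one team-season (hypothetical format).
    """
    total_minutes = sum(minutes for _, minutes in roster)
    if total_minutes == 0:
        raise ValueError("roster has no minutes played")
    return sum(age * minutes for age, minutes in roster) / total_minutes


def age_above_league_average(team_age, league_age):
    """agaa: the team's minute-weighted age minus the league's average age."""
    return team_age - league_age


# Hypothetical two-man example: a 30-year-old with 3000 minutes and a
# 24-year-old with 1000 minutes average out to 28.5.
print(minute_weighted_age([(30, 3000), (24, 1000)]))
```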

Then Mountain, always a valuable source for fresh ideas, had this to say:

I wonder what you’d find if you weighted previous year performance by month in a way that gave somewhat greater weight to later months.

I like this thought too, because intuitively you would expect a young, "up-and-coming" team to improve from month to month, while an old team whose performance declined over the course of the season might clue us in to an impending collapse.
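One simple way to give later months more weight is a linearly increasing weight per month. The linear scheme below is purely a hypothetical choice for illustration; the weighting actually used when this idea is tested may differ. It takes a list of monthly average scoring margins in calendar order.

```python
def late_weighted_margin(monthly_margins):
    """Weighted mean of monthly scoring margins.

    Month k (1-based, in season order) gets weight k, so the last month
    counts the most. This linear scheme is a hypothetical choice.
    """
    if not monthly_margins:
        raise ValueError("need at least one month")
    weights = range(1, len(monthly_margins) + 1)
    total = sum(w * m for w, m in zip(weights, monthly_margins))
    return total / sum(weights)


# Two hypothetical teams with the same plain average margin (+1.0):
improving = [-4, -2, 0, 2, 4, 6]
declining = [6, 4, 2, 0, -2, -4]
print(late_weighted_margin(improving), late_weighted_margin(declining))
```

Both example teams average +1.0 per game over the full season, but the improving team scores about +2.7 under this weighting and the declining team about -0.7. That is the distinction a season-long SRS cannot make.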

Anyway, over the next two posts, let's put both of these suggestions to the test and see if they help us build a better system. The model I'll create today basically extends our model from Friday: instead of using SRS from the previous season (SRS_Y-1) as our lone variable, we'll also add the previous year's minute-weighted age minus the league's average age (agaa_Y-1) to the equation. Regressing wins in year Y on those two variables for every season since 1963, when the NBA started tracking split-season stats for traded players, we get this equation:

wins_Y = 41 + (1.956 * SRS_Y-1) - (0.549 * agaa_Y-1)
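Applying the equation is just arithmetic. A team with an SRS of 0 and exactly league-average age projects to 41 wins, a .500 record. Every point of SRS adds about 2 wins, and every year of age above the league average subtracts about half a win. The sketch below simply transcribes the fitted coefficients; the function name is illustrative.

```python
def expected_wins(srs_prev, agaa_prev):
    """Projected wins in year Y from year Y-1 SRS and age above league average.

    Coefficients are the fitted values from the regression equation above.
    """
    return 41 + 1.956 * srs_prev - 0.549 * agaa_prev


# 2008 Celtics, using their 2007 SRS (-3.706) and agaa (-3.07):
print(round(expected_wins(-3.706, -3.07), 1))  # 35.4
```

This reproduces the xWins column in the tables that follow, for example 58.6 for the 2009 Celtics (SRS 9.307, agaa 1.15).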

Both of these variables are significant at the 5% level, so Ben's intuition was correct: adding age as a variable does improve the model's predictive power. Unfortunately, it doesn't improve it by much; the r-squared value for our original equation was 0.4404, while the r^2 for this new model is 0.4443. No matter, though. Here are the new Top 10 surprise teams:

Year	Team	SRS_Y-1	agaa_Y-1	xWins	Wins	Diff
1998	SAS	-7.926	 2.22	24.3	56.0	31.7
2008	BOS	-3.706	-3.07	35.4	66.0	30.6
1980	BOS	-4.775	 1.23	31.0	61.0	30.0
1990	SAS	-7.450	-1.27	27.1	56.0	28.9
2005	PHO	-2.941	-2.56	36.7	62.0	25.3
1970	MIL	-5.067	-0.17	31.2	56.0	24.8
1989	PHO	-4.801	 0.65	31.3	55.0	23.7
1996	CHI	 4.311	 0.92	48.9	72.0	23.1
1972	LAL	 3.264	 1.65	46.5	69.0	22.5
2002	NJN	-5.303	-0.21	30.7	52.0	21.3
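The Diff column is actual wins minus the model's expected wins, and the list is that difference sorted from largest to smallest. The sketch below reproduces the ranking for three rows copied from the table above. The row layout and function names are illustrative assumptions, not the code used to build the table.

```python
def expected_wins(srs_prev, agaa_prev):
    """Projected wins from the fitted equation: 41 + 1.956*SRS - 0.549*agaa."""
    return 41 + 1.956 * srs_prev - 0.549 * agaa_prev


# (year, team, SRS_Y-1, agaa_Y-1, actual wins), copied from the table above.
rows = [
    (2008, "BOS", -3.706, -3.07, 66.0),
    (1998, "SAS", -7.926, 2.22, 56.0),
    (1996, "CHI", 4.311, 0.92, 72.0),
]


def surprises(rows):
    """Return (year, team, wins - xWins) tuples, biggest positive surprise first."""
    scored = [
        (year, team, wins - expected_wins(srs, agaa))
        for year, team, srs, agaa, wins in rows
    ]
    return sorted(scored, key=lambda r: r[2], reverse=True)


for year, team, diff in surprises(rows):
    print(year, team, round(diff, 1))
```

Because Diff is computed from the unrounded xWins, it can occasionally differ by 0.1 from subtracting the rounded columns by hand.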

And our new biggest disappointments:

Year	Team	SRS_Y-1	agaa_Y-1	xWins	Wins	Diff
1965	SFW	 4.390	-0.89	50.1	17.4	-32.7
1997	SAS	 5.975	 1.87	51.7	20.0	-31.7
1999	CHI	 7.244	 4.01	53.0	21.3	-31.6
2007	MEM	 3.738	 1.56	47.5	22.0	-25.5
1983	HOU	-0.393	 2.14	39.1	14.0	-25.1
1973	PHI	-3.441	 1.63	33.4	 9.0	-24.4
1985	NYK	 3.789	 0.86	47.9	24.0	-23.9
1991	DEN	 1.562	 2.40	42.7	20.0	-22.7
2008	MIA	-1.209	 2.74	37.1	15.0	-22.1
1998	TOR	-2.555	-2.54	37.4	16.0	-21.4
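The fractional Wins values in this table (17.4 for the 1965 Warriors, 21.3 for the 1999 Bulls) appear to be prorated to an 82-game schedule. That would be the case if, for example, the Bulls' entry comes from their 13-37 record in the 50-game lockout season and the Warriors' from 17-63 over 80 games. This reading is an inference from the numbers, so treat the sketch below as an assumption about how the column was built.

```python
def prorated_wins(wins, losses, schedule=82):
    """Scale a win total to a full schedule (82 games by default)."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins * schedule / games


def expected_wins(srs_prev, agaa_prev):
    """Projected wins from the fitted equation: 41 + 1.956*SRS - 0.549*agaa."""
    return 41 + 1.956 * srs_prev - 0.549 * agaa_prev


# 1999 Bulls: assumed 13-37 record, with 1998 SRS 7.244 and agaa 4.01.
bulls_wins = prorated_wins(13, 37)
print(round(bulls_wins, 1), round(bulls_wins - expected_wins(7.244, 4.01), 1))
```

Under this assumption both the Wins and Diff columns for those two teams are reproduced.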

So the age variable does make a difference, but we're seeing basically the same teams as in Friday's lists, albeit in a slightly different order. And the two newcomers, the 2002 Nets and the 1998 Raptors, fit the schemata we laid out on Friday: New Jersey added a superstar in Jason Kidd, while all of the Raps' best players from Y-1 had bad years at the same time. For curiosity's sake, here is how our new model sets the expectations for the 2008-09 season:

Year	Team	SRS_Y-1	agaa_Y-1	xWins
2009	BOS	 9.307	 1.15	58.6
2009	LAL	 7.344	-0.09	55.4
2009	UTA	 6.867	-1.46	55.2
2009	DET	 6.671	 1.49	53.2
2009	NOH	 5.464	 0.28	51.5
2009	ORL	 4.788	-0.08	50.4
2009	PHO	 5.138	 2.49	49.7
2009	HOU	 4.835	 1.48	49.6
2009	DAL	 4.702	 2.12	49.0
2009	SAS	 5.104	 4.64	48.4
2009	DEN	 3.739	 1.74	47.4
2009	GSW	 2.381	-1.27	46.4
2009	TOR	 2.469	-0.46	46.1
2009	PHI	 0.188	-1.50	42.2
2009	POR	-0.520	-2.70	41.5
2009	CLE	-0.525	 0.33	39.8
2009	WAS	-0.605	 0.46	39.6
2009	ATL	-2.228	-2.61	38.1
2009	IND	-1.864	-0.11	37.4
2009	SAC	-1.854	 0.50	37.1
2009	CHI	-3.191	-1.04	35.3
2009	CHA	-4.484	-0.45	32.5
2009	MEM	-5.752	-2.32	31.0
2009	NJN	-5.146	 0.58	30.6
2009	MIN	-6.254	-2.13	29.9
2009	NYK	-6.543	-1.09	28.8
2009	MIL	-6.912	-0.97	28.0
2009	LAC	-6.561	 1.78	27.2
2009	SEA	-8.037	-1.39	26.0
2009	MIA	-8.530	 0.62	24.0

Tomorrow, we'll depart from the SRS for a bit, but we'll still use average point margin and age as variables, and we'll also try to incorporate Mountain's idea about month-by-month performance into our model. Stay tuned!

This entry was posted on Sunday, November 2nd, 2008 at 11:54 pm and is filed under General, SRS.

One Response to “Great Expectations, Part II”

Ben Says:
November 3rd, 2008 at 12:05 pm
Wow, I didn't expect it to improve r^2 by much, but thought it might do better than that. :) I don't suppose quadratic or cubic terms would make much of a difference. Also, the effect with minutes is so small, I don't imagine a performance weighted average would make much of an improvement either. Nonetheless, I like the effect on the 2009 predictions - it does what you'd want with Phoenix, Dallas, and San Antonio.
