Posted by Neil Paine on September 17, 2009
This tool has been live on the site since the preseason of 2008, but I'm not sure many people are aware of it yet... It's called the Simple Projection System, and it's a pretty unique feature as far as basketball projections on the internet are concerned. Plenty of other sites have projections, of course, but most of them are either completely non-scientific (read: "guesswork") or based on heavily guarded methods so secretive and requiring so much proprietary data that no layman could possibly hope to recreate them for themselves. The SPS, though, was born out of the same spirit that led sabermetrician Tom Tango to create the Marcel projection system for baseball, so named because it was simple enough that a monkey (in this case, Ross Geller's pet monkey on Friends) could replicate its results. Similarly, our SPS at Basketball-Reference doesn't need to use fancy similarity scores or umpteen-thousand obscure variables in order to spit out a series of projected per-minute rates for every player who played a game the year before. Instead, it simply uses past results, a heavy regression to the mean, and a simple aging adjustment, creating surprisingly credible results with this no-frills approach.
How credible, you ask? Well, just like the Marcels, the SPS has stood toe-to-toe with far more complicated systems and proven itself to be nothing if not competent. The success of "monkey systems" like these proves that for groups of players, nothing really beats the simple, time-honored approach of weighting an average of the past three seasons and regressing to the mean. Since the best predictor of future performance truly is the past, and since basketball players' stats are actually significantly more stable year-to-year than those of their baseball counterparts, the SPS is probably even more informative for hoops than Marcel is for hardball.
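For anyone who wants to see the general shape of a "monkey system," here is a minimal Python sketch of the three ingredients described above: a weighted average of up to three past seasons, regression to the league mean, and an aging adjustment. To be clear, this is not the actual SPS formula. The 6/3/1 season weights, the 5,000 weighted minutes of league-average ballast, the peak age of 27, and the aging slopes are all hypothetical numbers picked for illustration, as are the names `Season` and `project_rate`.

```python
from dataclasses import dataclass


@dataclass
class Season:
    minutes: float
    points: float  # any counting stat would work the same way


# All constants below are illustrative assumptions, not the SPS values.
SEASON_WEIGHTS = (6, 3, 1)      # most recent season counts the most
BALLAST_MINUTES = 5000.0        # weighted minutes of league-average play added
PEAK_AGE = 27
YOUNG_SLOPE = 0.004             # yearly improvement before the peak
OLD_SLOPE = 0.002               # yearly decline after the peak


def age_multiplier(age: int) -> float:
    """Nudge the rate up for players younger than the peak, down for older."""
    slope = YOUNG_SLOPE if age < PEAK_AGE else OLD_SLOPE
    return 1.0 + (PEAK_AGE - age) * slope


def project_rate(seasons: list[Season], league_rate: float, age: int) -> float:
    """Project a per-minute rate from up to three seasons, most recent first."""
    weighted_stat = 0.0
    weighted_minutes = 0.0
    for weight, season in zip(SEASON_WEIGHTS, seasons):
        weighted_stat += weight * season.points
        weighted_minutes += weight * season.minutes

    # Regression to the mean: blend in a fixed amount of league-average play.
    # Players with few minutes get pulled hardest toward the league rate.
    regressed = (weighted_stat + league_rate * BALLAST_MINUTES) / (
        weighted_minutes + BALLAST_MINUTES
    )
    return regressed * age_multiplier(age)


# A made-up player who scored 0.6 points per minute three years running.
star = [Season(2500, 1500), Season(2400, 1440), Season(2000, 1200)]
star_projection = project_rate(star, league_rate=0.4, age=27)
print(f"Projected points per minute: {star_projection:.3f}")
```

Notice that the made-up star's projection lands below his 0.6 track record and above the 0.4 league rate. That pull toward the middle is the regression at work, and it is why systems like this never forecast a career year.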
But... (and you knew there'd be a "but") I'd also like to quote something Tom wrote regarding home run performances in this article from 2006:
"Look at all the guys forecasted for 28 to 30 home runs: Adrian Beltre, Gary Sheffield, Carlos Delgado, Mark Teixeira, Andruw Jones, Alfonso Soriano, Miguel Tejada, Todd Helton, Lance Berkman, Paul Konerko, Rafael Palmeiro, Jeromy Burnitz, Carlos Beltran. Half of those guys will hit more than 29 home runs, and half will hit less.
But, you, me, and everyone else has no idea who will hit 30 or 35 home runs. Bad luck, good luck, injuries, whatever ... everything plays a role in this. Marcel's best guess is that those 13 hitters will average 29 home runs. If you wanted Marcel to forecast number of home runs without attaching names to it, that'd be a lot easier, and the range would be wider. Think of these forecasts as over/unders.
The highest forecasted RBIs [in 2005] were 112 (Tejada), 110 (Pujols), and 108 (Ortiz). What is this, the 1980s? If you had wanted me to only forecast RBIs, and not tell you who would do it, I would have said 150. Why would I give a number like that? Because from 2001 to 2004, the four highest RBI totals were 160, 150, 146, 145. It would therefore be reasonable to think that the league leader will be around 150. The league leader in 2005 had 148 RBI. So, I would have been pretty close, as an over/under.
But, how sure could I have been that it would be Ortiz? You could come up with a reasonable list of 15 or 20 players that would lead the league in RBI. But, that's not what we are trying to figure out. We are trying to come up with reasonable over/unders, numbers that you could find equal reasons where the player will over-perform and under-perform. Injuries, as we know with Bonds, can devastate any forecast.
So, what to do? Trust the forecast for a group of players, but don't go betting on any one forecast. There's not a single person in the whole world who can help you there. There's no book, there's no program, there's nothing to help you with any single forecast. That's why we play the darn game, and that's why we love its drama."
It's a bit like what we encountered when projecting minutes here... Systems like this are going to give you a reasonable mean projection that closely matches historical trends given a population of hundreds of players. And across that entire population, they will be accurate -- some guys will be too low, some too high, but everything will average out in the end. However, if we want to look at specific individual forecasts, all we have is a baseline (or as Tom says, an "over/under"). You're not going to get breakout performances here, and you're not going to get sudden catastrophic decreases in production, either. By regressing to the mean, these possibilities are built into the overall projection, but they're (rightfully) considered improbable. Projections like this are meant to decrease random noise as much as possible -- but real-life results introduce a significant element of randomness to the numbers by definition. The SPS will not tell you playing time, either. Playing time in basketball is so difficult to project, between injuries, constantly-changing roles, the allocation of minutes between starters & bench players, etc., that we simply offer per-minute rates and leave the playing time predictions up to you.
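To make the group-versus-individual point concrete, here is a small Python simulation. Everything in it is invented for illustration: 500 hypothetical players, projected per-minute rates drawn around 0.40, and actual results equal to the projection plus independent random noise. It assumes the projections are unbiased, which is the best case for any system, and it still shows how far off a single forecast can be even when the group average comes out nearly perfect.

```python
import random
from statistics import mean


def simulate(num_players: int = 500, noise_sd: float = 0.08, seed: int = 2009):
    """Compare the error of the group average with the typical single-player miss.

    All parameters are hypothetical; nothing here is fitted to real NBA data.
    """
    rng = random.Random(seed)

    # Each player's projected per-minute rate (the "over/under").
    projections = [rng.gauss(0.40, 0.10) for _ in range(num_players)]

    # What actually happens: the projection plus luck, injuries, role changes...
    actuals = [p + rng.gauss(0.0, noise_sd) for p in projections]

    # How far off was the average projection across the whole population?
    group_error = abs(mean(actuals) - mean(projections))

    # How far off was the projection for a typical individual player?
    individual_error = mean(abs(a - p) for a, p in zip(actuals, projections))
    return group_error, individual_error


group_error, individual_error = simulate()
print(f"Error in the group average:      {group_error:.4f}")
print(f"Typical miss on a single player: {individual_error:.4f}")
```

The misses on individual players largely cancel when you average over hundreds of them, so the group number comes out much closer than any one forecast. That is the sense in which a projection is an over/under rather than a prediction.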
Keeping these caveats in mind, though, you should derive a lot of use from the SPS. Not only does it offer projections for next season, but it also can give you "beforehand" projections for every season since 1981. So if you wanted to get an alternate take on what Michael Jordan would have been expected to score in 1993-94, check that feature out too. And be on the lookout for the SPS when we roll out our NBA preview over the next month and a half, because you'll definitely see it on our team pages.