Posted by Neil Paine on February 21, 2011
Everyone who has spent time studying historical player stats knows this phenomenon: You've seen a player's name for years, but you only know him as a series of numbers on a page. He retired before you were born, and you haven't even seen so much as a trading card with his picture on it... yet, instinctively wanting to humanize him, you imagine what he was like. You know his height, weight, all of the vital stats, everything except his ethnicity. So you make an educated guess based on his name. You now have an idealized picture in your mind's eye of the player in his prime, a man to go alongside the numbers.
The only problem comes when you do see him for the first time -- and he looks quite a bit different from the imaginary version you created years ago.
This is the concept behind Bill Simmons' Reggie Cleveland All-Stars, a "list of sports figures whose names would seem to indicate that they are of a different race or ethnicity than they actually are". Its namesake? Reggie Cleveland, a 1970s-era pitcher who Simmons just assumed was black until learning otherwise when Cleveland joined the Red Sox.
For me, that player was Dick "Richie" Allen, most notably of the Phillies and White Sox. Because he retired 8½ years before I was born, all I knew of Allen was his impressive stat sheet, including 1972 and 1964 campaigns that were among the best ever in my primitive homebrewed version of Win Shares. Shamefully, I always assumed he was white because of his name, and was very surprised when I finally saw his picture and learned of my mistake. The error is especially ironic in retrospect because Allen was outspoken in his rebellion against MLB's conservative power structure, and also suffered a great deal of racism throughout his career. In other words, I couldn't have been more wrong about Dick Allen.
Of course, this phenomenon doesn't always have such trivial consequences. Studies have found that similar assumptions frequently lead to discrimination in job hiring, causing some to even disguise their ethnicity on resumes to make themselves more attractive to employers. So it's a serious topic -- but hopefully one that we can also learn more about by examining our own ways of thinking, as well as analyzing some data. To that end, I took John Grasso's biographical database (which lists players as "white", "black", or "other" -- i.e., Asian) and calculated the expected probability of a player being a certain ethnicity, based on the listed ethnicity of every player in NBA history who shared his first or last name. Based on Simmons' logic, the players least likely to be their actual race should be considered "Reggie Cleveland All-Stars".
Take for example the aptly nicknamed Jason "White Chocolate" Williams (most recently of the Memphis Grizzlies). In NBA history, 14 players went by the first name "Jason". Four were considered "white" in Grasso's data and 10 were considered "black", so the probability of an NBA player named Jason being white is 28.6%. Meanwhile, 57 players have had the surname "Williams" and, shockingly, Jason Williams is the only one to be considered "white" in Grasso's database, meaning the probability of an NBA player named Williams being white is 1.8%. Using Bayes' Theorem, we find that the probability of Jason Williams being white is 2.3%, making him the most unlikely player in NBA history to be his actual ethnicity.
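For the curious, here is one way the combination step could be sketched in Python. This is not necessarily the exact calculation behind the 2.3% figure: it assumes a simple two-way split (white vs. not white), treats first and last names as independent given race, and needs a league-wide base rate that isn't given in this post. The function name `combine_name_probs`, its parameters, and the 0.5 prior in the usage lines are all illustrative placeholders.

```python
def combine_name_probs(p_first, p_last, prior_white):
    """Naive-Bayes-style estimate of P(white | first name, last name).

    Assumptions (not from the post): a binary white / not-white split,
    first and last names independent given race, and every input
    strictly between 0 and 1.
    """
    def odds(p):
        return p / (1.0 - p)

    # Each name's odds already include the base rate once, so divide
    # the product by the prior odds to avoid counting it twice.
    posterior_odds = odds(p_first) * odds(p_last) / odds(prior_white)
    return posterior_odds / (1.0 + posterior_odds)


# Counts quoted above: 4 of 14 Jasons and 1 of 57 Williamses listed as white.
p_jason = 4 / 14
p_williams = 1 / 57

# prior_white=0.5 is a placeholder, NOT the actual league-wide share.
estimate = combine_name_probs(p_jason, p_williams, prior_white=0.5)
```

The result depends heavily on the base rate you plug in: a lower share of white players league-wide pushes the combined estimate up, and a higher share pushes it down.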
Repeating this for all players, we get this list of the quintessential NBA "Reggie Cleveland All-Stars":
|First Name||Last Name||Full Name|
According to this data, a future generation of fans will list David Lee as a member of their Reggie Cleveland All-Star team. We probably consider that unthinkable today... but I suppose older fans would never have imagined such confusion could exist for someone like Dick Allen, either.