Revisiting the facts that everyone knows

How a bored night watchman led to a best-selling book and a Brad Pitt movie

Jan 25, 2022

Baseball became a distinct, definable sport at some point in the mid-1800s. Before that there were similar games (rounders, cricket, etc.) that hailed from Europe and came over with immigrants. The rules were slowly updated and changed until we got the rules that have been in force for pretty much the last 150 years. One of the unique things about baseball is that, even from the beginning, it’s been ripe for statistical analysis. At the core of the game, you have one pitcher throwing to one batter. There’s no way to keep batting your best hitter, and there’s little that a teammate can do to help the batter at the plate. It’s a one-on-one matchup, and that makes statistical analysis easier than in other sports where the interactions aren’t as distinct.

Our history of baseball statistical analysis starts with the writer Henry Chadwick. His two major contributions to baseball statistics were Earned Run Average (ERA) and batting average. With batting average he wanted to isolate the skill of the batter, so that meant focusing on hits and attempts and ignoring runs and even walks. The simple formula is the number of Hits divided by the number of At Bats. At Bats were defined as all the times a hitter went to the plate and either got a hit or made an out (strikeouts included). A ball in play that was ruled an error counted as an At Bat but not as a Hit. Similar statistics in cricket focused more on runs and outs, but those numbers were affected by teammates, and that wasn’t what Chadwick was interested in.
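To make the formula concrete, here is a minimal Python sketch of batting average as described above. The function name and the sample numbers are made up for illustration, and returning 0.0 for a hitter with no at bats is just a convenience I chose.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits divided by At Bats. Walks are not At Bats, so they never appear here."""
    if at_bats == 0:
        # Illustrative convention: a hitter with no at bats gets 0.0.
        return 0.0
    return hits / at_bats


# Hypothetical season line: 150 hits in 500 at bats.
print(f"{batting_average(150, 500):.3f}")  # 0.300
```

Notice that a hitter who walks 100 times and one who never walks get exactly the same number here, which is the blind spot that comes up later in the story.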

Earned Run Average focuses on the number of Runs given up and the number of Innings a pitcher pitches. Chadwick wanted to focus only on things in a pitcher’s control, a theme we’ll come back to later, so he ignored Runs scored as a result of an Error. He set it up to show the number of runs per 9 innings. Relievers had just started being used, so it made sense to scale the stat to a full game. Previously you could just use wins or runs allowed to judge a pitcher, but as fewer pitchers completed games, the numbers needed to be scaled to compare them.
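The scaling to 9 innings is easy to see in a small Python sketch. The names and the two pitcher lines below are hypothetical, and innings are passed as a plain decimal number (so 60 and a third innings would be 60 + 1/3) rather than in box score notation.

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per 9 innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical starter: 70 earned runs over 210 innings.
starter = earned_run_average(70, 210.0)
# Hypothetical reliever: 20 earned runs over 60 innings.
reliever = earned_run_average(20, 60.0)

print(f"{starter:.2f} {reliever:.2f}")  # 3.00 3.00
```

The raw run totals are very different (70 versus 20), but once both are scaled to a full game the two pitchers look the same, which is the comparison the stat was built for.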

These were the two biggest stats for most of the first hundred years of baseball. There were some people (such as Earnshaw Cook) who came up with interesting ideas during this time, but nothing caught on. Entering the 1970s, baseball players were still mostly defined by Batting Average or Earned Run Average.

George William (Bill) James was a recent graduate of the University of Kansas when he was drafted during the Vietnam War. He never actually made it to Vietnam and, after serving his tour of duty, came back to Kansas. He took a job as a night watchman at a pork and beans cannery. James had always wanted to be a writer and decided that, with his spare time on the job, he would write about his obsession, baseball. While James was not a mathematician, he was able to wring enough information from box scores to answer the questions he was interested in. Those questions quite often challenged deeply held beliefs and assumptions about baseball. James was able to show that Chadwick had been wrong about walks: taking walks is a skill that some batters have more of than others.

Where Batting Average only focuses on hits and ignores walks, On Base Percentage focuses on all plate appearances. So, instead of Hits divided by At Bats, On Base Percentage takes all Plate Appearances (every time a hitter goes to the plate) and measures the percentage of them in which the hitter did not make an out. This has the advantage of focusing on what is important: not making an out. Unlike other sports, where time is the major constraint, baseball has no time element at all. When a team is batting, they stay batting until they make three outs. Bill James was able to revisit what everyone knew about player offense and prove that the common wisdom was wrong.
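Here is a rough Python sketch of the difference. It uses the standard scoring formula for On Base Percentage, (H + BB + HBP) / (AB + BB + HBP + SF), which is a bit more specific than the description above; the function name and both hitters are invented for illustration.

```python
def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sacrifice_flies: int) -> float:
    """Times on base divided by the plate appearances that count toward OBP."""
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    if opportunities == 0:
        # Illustrative convention for a hitter with no plate appearances.
        return 0.0
    return (hits + walks + hit_by_pitch) / opportunities


# Two hypothetical hitters with the same .300 batting average (150 hits in 500 at bats).
free_swinger = on_base_percentage(hits=150, walks=30, hit_by_pitch=0,
                                  at_bats=500, sacrifice_flies=0)
patient_hitter = on_base_percentage(hits=150, walks=100, hit_by_pitch=0,
                                    at_bats=500, sacrifice_flies=0)

print(f"{free_swinger:.3f} {patient_hitter:.3f}")  # 0.340 0.417
```

Batting average calls these two hitters identical. On Base Percentage shows that the patient one avoids an out far more often.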

By the late 90s, there was a whole community of amateur sabermetricians who had grown up reading Bill James and had taken on not only his findings but also his approach of questioning the common baseball wisdom. This group also used the latest technology of the day, Internet newsgroups, to communicate with each other. Voros McCracken was one of this new group. He first came to prominence with his paean to small sample sizes, Voros’ Law: “Any player can hit just about anything in around 60 at-bats”. This was a reminder that the Kevin Maases of the world are just on a hot streak, and their first 60 at bats don’t represent their talent level.
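One back-of-the-envelope way to see why 60 at bats tell you so little is the following Python sketch. It rests on a simplifying assumption that is mine, not McCracken’s: every at bat is treated as an independent coin flip with the same chance of a hit, so the spread of an observed batting average follows the usual binomial standard error. The .260 "true" average is an arbitrary illustrative value.

```python
import math


def batting_average_std_error(true_average: float, at_bats: int) -> float:
    """Binomial standard error of an observed batting average (simplified model)."""
    return math.sqrt(true_average * (1 - true_average) / at_bats)


# Hypothetical hitter whose real talent level is .260.
for at_bats in (60, 600):
    spread = batting_average_std_error(0.260, at_bats)
    print(f"{at_bats} at bats: plus or minus about {spread:.3f}")
```

Under that assumption the spread is roughly 57 points of batting average after 60 at bats and roughly 18 points after 600, so an ordinary hitter looking like a star for a few weeks is not surprising at all.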

Voros then started down a path of thought on pitching and came out with Defense Independent Pitching Statistics (DIPS). The basic concept was that once you take away strikeouts, walks and home runs (plays where there is no fielder involved), the pitcher has little control over the percentage of balls which drop in as hits. People reacted either by insisting that he had to be wrong or by declaring that this would be the biggest reconceptualization of the role of the pitcher in baseball. DIPS was the talk of sabermetricians everywhere and everyone, including Bill James, got involved in trying to prove or disprove it. After years of analysis and debate, it was determined that Voros was mostly correct. This was the biggest change in analyzing pitchers since Chadwick’s original ERA stat. Where previously the impact of fielding on pitching was ignored (except for errors), with DIPS there was a better way to compare pitchers regardless of the quality of the defense behind them. By revisiting the common wisdom on pitching, Voros was able to redefine pitching measurements.
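The "percentage of balls which drop in as hits" can be put into a short Python sketch. This uses the stat commonly called batting average on balls in play (BABIP), with the usual formula (H - HR) / (AB - K - HR + SF). It is not the full DIPS calculation, which is more involved, and the function name and the pitcher’s line are hypothetical.

```python
def babip_allowed(hits: int, home_runs: int, at_bats: int,
                  strikeouts: int, sacrifice_flies: int) -> float:
    """Share of balls put in play against a pitcher that fell in for hits."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    # Home runs are removed from both sides because no fielder is involved.
    return (hits - home_runs) / balls_in_play


# Hypothetical season against one pitcher:
# 200 hits, 20 home runs, 800 at bats, 180 strikeouts, 5 sacrifice flies.
print(f"{babip_allowed(200, 20, 800, 180, 5):.3f}")  # 0.298
```

The DIPS claim is that this number mostly reflects the defense and luck rather than the pitcher, while the things stripped out of it (strikeouts, walks and home runs) are where the pitcher’s skill shows.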

All this revisiting of the stats had barely made any impact on the actual game itself until the Oakland A’s owner decided to cut spending. GM Sandy Alderson had led the A’s to multiple division titles and the 1989 World Series, but in 1995 he had to figure out how to win while cutting costs. First Alderson, then his protégé, Billy Beane, started reading and incorporating the new statistics, and the A’s slowly started winning.

By 2002, the Oakland A’s were an embarrassment to baseball. In 1999 MLB had commissioned a Blue Ribbon report on baseball economics, which concluded that only the teams with higher payrolls had a chance to win. However, the A’s had increased their wins for 4 straight seasons and had made the playoffs the last two years, while having one of the lowest payrolls in baseball. MLB was trying to contain player costs, in the name of equality, but the A’s were proving to be a stubborn counterexample. Everyone believed that this would have to change. After the 2001 season, the A’s were losing three of their best players to free agency: perennial MVP candidate Jason Giambi, closer Jason Isringhausen and CF Johnny Damon. Because they were losing these three players, the A’s also had 7 first round draft picks.

Into this mess came writer Michael Lewis. He sat with GM Billy Beane as the A’s tried to figure out how to replace 3 stars and draft 7 first round players on a tight budget. The A’s ended up increasing their wins again, winning 103 games, and Michael Lewis’ book Moneyball became a best seller and led to a 2011 movie starring Brad Pitt as Billy Beane. All of this because a night watchman got bored in the 1970s and started writing about baseball.

Since then, teams have not only started using sabermetric principles in the game but have also hired some of the amateur sabermetricians to help them figure out what should change next. The changes started by Bill James and continued by Voros McCracken changed the game of baseball. And the biggest thing they did was challenge the current wisdom about what is important in baseball statistics.

Never accept that the common wisdom is correct. Always check whether there are better ways to do things. You never know which change will transform your industry, but I wouldn’t hold out for it leading to a Brad Pitt movie.

