Statistical Analysis, Baseball, and Unbridled Hostility

This morning, as I often do before launching into an inevitably fun-filled day of clinical protocols, biostatistics, and study reports, I paid a visit to the online blog Fire Joe Morgan (hereafter FJM). I’m an enormous fan of FJM, because its writers are intelligent, passionate, and incisive, and also I wish that ESPN would, in fact, Fire Joe Morgan. Anyway, the blog entry for yesterday (Thursday, April 10, 2008) was a deconstruction of a commentary piece written by something called a Jim Armstrong of AOL; you can read the actual piece here, if you overate at breakfast and need something to stimulate a quick purge.

Now, in and of itself, this article is not particularly remarkable. Jim Armstrong is a terrible, terrible writer, of course; his commentary reads like an 11-year-old’s attempt at an edgy opinion essay. But the thing is, Armstrong’s article - which, in case you have the good taste not to have read it, is a very hostile attack on statistical analysis in baseball, replete with the customary references to “mother’s basements” (in reference to the guy who created VORP, Keith Woolner, who in case anyone is wondering works not out of his mother’s basement but out of the front office of the Cleveland Indians. Which Jim Armstrong does not do. But I digress. And can you believe this is the same sentence that started with “but the thing is?”) - is not the first, or second, or tenth or twentieth, of its kind that I’ve seen dissected on FJM.

I don’t know why this one put me over the top, but it did. So I come here to pose the question:

*Why is it that contemporary statistical analysis creates such incredible hostility in so many fans and (especially) mediocre sportswriters?*

It makes no sense. I mean, I have trouble understanding folks who dismiss statistical analysis. Most of the stats used by the modern sabermetric crowd are way more intuitive than the ones that have been in use for decades. OBP and OPS+ make so much more sense, just plain intuitive sense, as a measure of offensive performance than a weird manufactured statistic like batting average. We had a thread a month or two ago in which a few posters acted as if Range Factor, quite possibly the simplest statistic ever invented, was some complex and elaborate theorem. I think good statistics can deepen and broaden appreciation for the human element of the game, by giving it context (liberal use of predictive statistics can also make you look like a plain old genius to people who don’t use them).

But if they’re not your bag, hey, that’s OK.

What I’m asking about here is the intense, white-hot, snarling, drooling, frantic hatred that some folks - again, notably folks in the mainstream media - seem to have for a numbers-based approach. The venom, the need to belittle folks who work with statistics with that sad hackneyed “mother’s basement” nonsense, the broad insinuations that stat-based analysis has nothing to add to the game even as a General Manager inspired by and currently employing Bill James has won two World Series in five years for a franchise that went eight decades without a ring, the gleeful lolling about in ignorance: these things I don’t understand.

Thoughts?

I’d say a lot of the hostility would come from the idea that statistics alone can tell you whether or not a player is good, one doesn’t actually have to watch the game being played, he only needs the box score.

Probably the most striking example of this is the claim by statisticians that Derek Jeter, who won 3 consecutive Gold Gloves, is among the worst fielders at his position. I’d presume that people who spend their lives in baseball, like managers and sportswriters who have been praising DJ over the years, won’t take kindly to a “basement dweller” telling them they were 100% wrong: not only isn’t he one of the best, he’s the worst.

Range Factor, am I right that it’s just the number of assists and put outs per game? That’s simple, maybe too simple, it doesn’t seem to factor in a reduced or increased number of chances based on your pitching staff. Strike outs alone can make a big difference to this number.
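For what it's worth, the per-game version described above fits in a few lines of Python. This is only a sketch: the function names and the sample numbers are made up for illustration, and the per-nine-innings variant is just one common way to normalize for playing time. Neither version corrects for the pitching-staff effect raised above.

```python
def range_factor_per_game(putouts: int, assists: int, games: int) -> float:
    """Range Factor per game: (putouts + assists) / games played."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (putouts + assists) / games


def range_factor_per_nine(putouts: int, assists: int, innings: float) -> float:
    """Range Factor per nine innings: 9 * (putouts + assists) / innings played."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * (putouts + assists) / innings


# Hypothetical shortstop season (numbers invented for illustration).
rf_game = range_factor_per_game(putouts=250, assists=450, games=150)
rf_nine = range_factor_per_nine(putouts=250, assists=450, innings=1300.0)
print(f"RF/G = {rf_game:.2f}, RF/9 = {rf_nine:.2f}")
```

A fielder behind a high-strikeout staff sees fewer balls in play, so both numbers would come out lower for the same fielding skill, which is exactly the objection above.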

Baseball is more open to statistical interpretation than other sports, I think, and it also has more of the “this is how they did it in the old days” quality.

Some of the “new” stats are also a little abstract, and I think that makes people suspicious, too. I know I felt that way, and sometimes still do. Range factor is a 100 percent sensible stat, in my opinion, but it can tell you something counterintuitive - the player you saw just make a great play might NOT be better than the guy who doesn’t make a lot of great plays, precisely because the second guy is good enough to make it look easier. That’s hard for some people to buy into.

As a third factor, the more I read, the more I think that the writers who cover baseball are just dumber than any other sportswriters. I don’t know why it is.

I think there are some fundamental aspects of human nature at work here, some of which have already been touched on:

  • Once people form an opinion, it’s absurdly hard to change, even with facts. You see it in the political sphere all the time. Fans and sportswriters, especially older ones, have had years to cement their opinions based on the stats of the time: batting average, RBIs, wins, etc. A sportswriter who has spent thirty years writing that the best pitcher is the one with the most wins will be loath to change that opinion no matter how many compelling arguments you make about run support.

  • People fear the unknown, and cling to the known. Baseball is hardly unique in this regard. When faced with paradigm-changing realities, some folks tend to lash out, seeing the new thinking as attacks on their sensibilities. You see it in the realms of religion, philosophy and science, as well as baseball.

  • People prefer the warmth of romanticism to the cold of statistics – and the newer, more advanced statistics exacerbate the perceived dichotomy. The new statistics often end up debunking cherished notions – that Derek Jeter is a great fielder, or that Tony Gwynn was the best hitter of our generation, or that teams that bunt are playing “smarter” than teams that don’t, or that hackers are more desirable than walkers. Writers and fans, like most humans, have an instinctive adverse reaction to having their romantic bubbles popped.

  • People don’t instinctively understand the problem of drawing conclusions from small samples. A Yankee fan, or a NY sportswriter, or even a manager, who sees 160 games of Derek Jeter and 20 games of every other shortstop, will see Derek make 8 times as many spectacular plays as any other, and conclude he’s a gold-glover. Sure, he looks awesome in the field – any starting SS in the majors is one of the best 30 at his job on the planet, and is going to regularly perform amazing athletic feats. But the folks in this example won’t realize that Jeter’s awesome is only 70% as good as Orlando Cabrera’s awesome, and only 85% as good as the awesome of the average AL shortstop. So, advanced statistics essentially tell people they’re wrong, and so the people get angry.

It’s easy for ‘hardcore’ baseball fans to become frustrated about these things, and start advocating for Joe Morgan’s firing, etc. I know I do. Every time I hear an announcer tell me a leadoff hitter’s HR and RBI, but not his OBP, my blood pressure ticks up a smidge. But then I tell myself: people used to believe the Earth was flat, and that the sun went around it, because that’s what was clear to unsophisticated observation. And one day, the average baseball fan will understand that ERA is a better indicator than Wins of a pitcher’s ability.

One day.

Of course, when that day comes, a younger generation will be arguing how flawed a statistic ERA is, and writers will be writing about how fans who don’t use ERA, don’t see the beauty of the game.

ERA seems to be “on the outs” anyway, in favor of adjusted ERA. And WHIP, of course. I don’t know how Armstrong hadn’t heard of WHIP or OPS - I’m hardly a stat nerd and I’ve known about them for years.

Sabermetrics can be an excellent analysis tool. Going over past seasons for patterns and issues is something it does quite well.

It is, however, piss poor at anything else. It cannot predict particularly well, and cannot explain anything worth a damn. This makes it of limited use to anyone in a baseball community. While it’s nice to know that someone hit .333 against lefthanders in scoring position over his career, that doesn’t really help when you’re trying to figure out what’s happening this year.

As far as explaining, the biggest problem with sabermetrics is that it only has one explanation for anything: regression. A player plays more poorly one year, and that’s the only reason for it. Sabermetrics refuses to accept things like injuries, adjustments, etc.

For instance, take Oliver Perez. Two years ago, a sabermetrician would have said, “He’s lost it. His ERA has been consistently over 5.00 except for 2004, which looks like a fluke.” Yet Perez was traded to the Mets and has become a good major league starter. Nothing about his sabermetrics would have predicted that.

OTOH, a mechanical analysis of Perez’s pitching motion indicated that he had changed it considerably after 2004, all for the worse. By fixing the motion, Perez was able to become a good starter. Sabermetrics could not have predicted this, other than maybe saying there was a 1% probability of that happening. And this is the type of analysis baseball insiders need: not how a player performed in the past, but how his performance can be improved in the future.

There is also Mike Pelfrey. His sabermetrics looked horrible, but it turned out he was tipping his pitches. If he fixes that problem – certainly possible – then any PECOTA (Pulling Every Conclusion Out The Ass) for this season is wrong (note, too, that sabermetricians never say PECOTA is wrong – it can’t be! It’s Sabermetrics! – but rather the player did better or worse than predicted).

Also, Sabermetrics does not take into account things like injuries (other than those that put someone on the DL – which they take as a demerit no matter what the reason. If you’re out with a bad rotator cuff, it’s one thing, but if you’re out because someone spiked your finger, it’s not likely to be a long-term issue). It assumes that psychological factors don’t exist (admittedly, they’re overstated, but that doesn’t mean that there aren’t any).

I do like a lot of their statistics (I even tend to think that WHIP may be better than ERA, since it lets you analyze relievers). But, really, any system that has only one answer to explain change in performance by a multitude of individuals and problems that might affect how that player plays, is seriously flawed.

I agree that ERA is itself a flawed statistic. ERA+ is better, though it has the disadvantage that it cannot be easily calculated by a casual fan. And ERA, for all its flaws, is miles better than Wins when it comes to evaluating a pitcher’s current and predicted ability to prevent runs.

(Another reason, I think, for the hostility of the establishment; it’s annoying to know the “best” stats are ones you have to rely on someone else to determine. I think WARP is a great stat, but I can’t construct it from whole cloth myself without half a day in front of a spreadsheet. For that reason, I do concede that there is some inherent value in simplicity, when you’re talking about stats you’d like to see enter the broader discussion.)

Also, RA and RA+ are probably better than ERA and ERA+, because errors themselves are such an arbitrary thing. A pitcher who already has objectively better fielders (i.e. greater range) may also have more errors committed behind him, since the fielders are able to make plays on more balls. Removing the questionable distinction between earned and unearned runs makes for a better statistic, I think.
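To make the ERA versus RA distinction concrete, here is a rough Python sketch of the raw (unadjusted) versions, plus WHIP since it keeps coming up. The function names and the sample pitcher are invented for illustration, and innings are assumed to be true fractions (200 and a third is 200.333..., not the box-score shorthand 200.1). The "+" versions need league and park adjustments that are not attempted here.

```python
def per_nine(runs: int, innings: float) -> float:
    """Scale a run total to a nine-inning rate."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * runs / innings


def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return per_nine(earned_runs, innings)


def ra(runs: int, innings: float) -> float:
    """Run average: all runs allowed (earned or not) per nine innings."""
    return per_nine(runs, innings)


def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits per inning pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (walks + hits) / innings


# Hypothetical pitcher: 200 IP, 80 runs (70 earned), 60 walks, 180 hits.
print(f"ERA  = {era(70, 200.0):.2f}")
print(f"RA   = {ra(80, 200.0):.2f}")
print(f"WHIP = {whip(60, 180, 200.0):.2f}")
```

The gap between the first two numbers is entirely the scorer's earned/unearned judgment, which is the part being questioned above.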

Agreed.

A lot of these “new stats” are nice. But a lot of them really are confusing and don’t make much sense without a phone-book-sized book of charts and figures. And this makes them pointless to most fans.

Especially when a lot of fans (I think rightly) believe that you can get a similar feel for a guy just by watching him play a lot of games.

Hey, math is hard. Explaining math can be even harder. And God forbid there should ever be a place in this man’s sport for any pencil-necked bed-wetting slide-rule using geeks!!

Even something like WHIP relies too much on one’s team defense and luck to be a very reliable measure of pitching ability. At any rate, my point was that no matter what stats we choose to embrace, younger generations will find better ones that we will have a hard time accepting.

But they aren’t really more complicated. Well, some are, but a lot of them are more straightforward than traditional numbers. They are just new.

Read this, because I couldn’t hope to put it as well.

http://joeposnanski.com/JoeBlog/2008/03/09/statheads-and-true-wins/

OK, but in many cases they are. Look at the Wikipedia writeup for VORP: Value over replacement player - Wikipedia

That thing is a monstrous explanation for a stat that I actually find very interesting. But it feels useless to a large degree because it requires some serious calculations to figure out. Whereas something like Batting Average (the most complex “base” stat) only requires knowing a player’s total number of at bats (minus walks) and total number of hits, all things that can be observed just by watching the game.

That’s where the aversion to new stats comes in.
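For comparison, here is a small Python sketch of batting average next to on-base percentage, using the standard counting-stat definitions. The function names and the sample season are made up for illustration. Both need nothing beyond totals you could tally from box scores, which is a long way from something like VORP.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Batting average: hits / at bats (walks are already excluded from at bats)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sac_flies
    if denominator <= 0:
        raise ValueError("no qualifying plate appearances")
    return (hits + walks + hit_by_pitch) / denominator


# Hypothetical season: 150 hits in 500 at bats, 50 walks, 5 HBP, 5 sac flies.
avg = batting_average(150, 500)
obp = on_base_percentage(150, 50, 5, 500, 5)
print(f"AVG = {avg:.3f}, OBP = {obp:.3f}")
```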

DIPS tends to work fairly well. I’ve actually used that the last couple of years in determining keepers and who to draft in fantasy baseball (Nerd alert!) and it has worked out well.

Reading the comments at the end of Armstrong’s article is sort of the icing on the cake that is the FJM article. The sheer idiocy of Armstrong when compared to the non-Luddite world view of everyone else is mesmerizing.

Do any of these writers not know that Branch Rickey was a huge proponent of statistical analysis? Do they not realize that some “conventional” stats like batting average actually fell out of vogue for a time because people realized their limitations? I’d love to talk to Armstrong and ask him why he thinks stats invented generations ago tell us all we need to know and that nothing new can be learned over time. His house, car, job, and life are all reliant on things invented long ago that have improved with time as people have learned how to do more and better with the base elements at hand.

Okay, we are good so far.

Huh wa? You seem to have a very limited view of what Sabermetrics is. There are stats that are good at telling us how good players were. There are additionally statistics that are good at telling us how good players will be. The stat you quoted is one that is worthless in predicting the future due to sample size issues. That doesn’t mean there aren’t predictive stats. If a pitcher’s BABIP (batting avg on balls in play) is very high one year, he probably was unlucky, and likely to put up better numbers the following year.

Again, sabermetrics has lots of explanation for things. It doesn’t predict everyone will go 81-81 every year.

Seriously no. Just no. Please attempt to understand something better before you attempt to explain what it would do in a given situation. The suggestion that a Sabermetrician would see a high ERA, and say that player is worthless is ludicrous. Perez completely lost his control, but other indicators were positive. I don’t think you will find a stathead in the world who thought Pitt made a good deal.

Everyone who struggles has a mechanical issue. How many years now has Zito changed his mechanics? They say things like this, because it is better than saying this guy just sucks now.

See, my above point. It is usually just another excuse. Also, your use of the word Sabermetrics is interesting.

Pulling Every Conclusion Out The Ass is the exact opposite of what Pecota does. It systematically predicts stats based on many factors. If factors turn out to not do what is previously thought, it will adjust. Of course, it is often wrong, but it has been consistently among the best predictors out there.

Why do you think Sabermetricians are idiots? Of course they will take into account injuries or any other effect that can be measured. What it won’t take into account is psychological factors that can’t be measured. It isn’t that they believe they don’t exist, it is that we can’t see what they are. Lo Duca was a great clubhouse leader, except when he was a clubhouse cancer. They exist, but neither you nor I can tell what they are from watching TV or through the media.

Like I said, there are a lot of answers, and a lot of questions. What you seem to not understand is that the numbers do not replace common sense.

I’m going to take issue with a lot of what you have to say here, and in fact I disagree with the third sentence above. But for the moment, and in the spirit of my initial question, let’s say I assume that you are correct, and that statistical analysis is good only for studying and characterizing what has happened in the past.

So what? My pencil is good for writing on paper. It’s piss poor at anything else. It cannot stab someone particularly well, and it can’t write on glass worth a damn.

But people don’t hate pencils because of that. To use a more germane example, David Ortiz can’t play the field or run well; he’s still a useful player. So feeling loathing for statistical analysis for what it (hypothetically) doesn’t do well makes no sense, at least not to me.

But of course, I disagree with you on the rest, too.

If you believe this, then I offer you the following wager. Every year, Baseball Prospectus projects final records for each of the 30 major league teams. If you think that statistical analysis is a poor predictive tool, then surely there must be someone who projects or predicts final records that can outperform BP. You choose any five non-stat-based predictors you like; I’ll take BP. When the season ends, if any of your predictors outperforms BP, I’ll concede the point. If BP outperforms your predictor, you do the opposite.
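The wager doesn't say how "outperforms" would be scored, so here is one possible way to settle it, sketched in Python: mean absolute error between projected and actual win totals. The metric choice, the function name, and every number below are assumptions for illustration, not anything BP or anyone else has published.

```python
from typing import Mapping


def mean_abs_error(projected: Mapping[str, int], actual: Mapping[str, int]) -> float:
    """Average number of wins by which a set of projections missed."""
    teams = projected.keys() & actual.keys()
    if not teams:
        raise ValueError("no teams in common")
    return sum(abs(projected[t] - actual[t]) for t in teams) / len(teams)


# Entirely hypothetical three-team league.
actual = {"A": 95, "B": 81, "C": 70}
stat_based = {"A": 92, "B": 84, "C": 72}
pundit = {"A": 88, "B": 90, "C": 75}

stat_error = mean_abs_error(stat_based, actual)
pundit_error = mean_abs_error(pundit, actual)
print(f"stat-based missed by {stat_error:.2f} wins per team")
print(f"pundit missed by {pundit_error:.2f} wins per team")
```

Lower is better; whoever has the smaller number at season's end wins the bet under this scoring.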

Statistics can predict any number of things. If a pitcher has an ERA of 3.50 but a low strikeout rate and a BABIP of .245 in a given season, I can reasonably predict that his ERA will increase substantially in the following season.
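Here is a rough Python sketch of that reasoning. BABIP is usually computed as (H - HR) / (AB - K - HR + SF). The stat line is invented to land near the .245 mentioned above, and the .300 league baseline and .030 margin are assumptions for illustration, not an established rule.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def looks_lucky(pitcher_babip: float, league_babip: float = 0.300,
                margin: float = 0.030) -> bool:
    """Flag a pitcher whose BABIP allowed sits well below the assumed league baseline."""
    return pitcher_babip < league_babip - margin


# Hypothetical pitcher: 180 hits and 20 homers allowed, 780 at bats against,
# 110 strikeouts, 5 sacrifice flies.
value = babip(hits=180, home_runs=20, at_bats=780, strikeouts=110, sac_flies=5)
print(f"BABIP allowed = {value:.3f}, likely to regress: {looks_lucky(value)}")
```

A flag like this only says the results probably had help from luck or defense; it says nothing about how much the ERA will move.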

I don’t know where you get this idea. I read Baseball Prospectus every year. Their entries for every player take into account all of the things you’re discussing. Injuries, adjustments, learning new pitches, changing managers or coaches; it’s all in there. The above-quoted paragraph is just simply, factually untrue.

Not true. I can’t remember a single stat-focused analyst who didn’t like the acquisition of Perez by the Mets. Perez always had a strong strikeout rate; he needed to improve his control.

Well, I mean, it’s a prediction. If they could nail it every time, they’d be a lot wealthier than they are. PECOTA is still the most accurate predictive tool I’ve ever seen for projecting player statistics. Once again, if you know of one that’s more accurate, I’ll gladly offer the same wager I did above.

Again, this is just not true. I don’t know how to respond to it other than that.

I have no dog in this fight since I do not care anything about baseball, but I am a statistician, and this makes absolutely no sense whatsoever.

Regression isn’t an explanation, it is one estimation technique among many used to understand the relationship between things in the real world. It is only as good as the data, the model specification, and the interpretation of the results. If a modeler fails to control for an important variable, his predictions become biased. He might have abused the technique in one instance, but this in no way undermines the enterprise or the validity of the tools.

Ok, I’ve been a baseball fan for 25 years or so. Back in the day I subscribed to the Sporting News mainly to look over the week’s worth of box scores (I cancelled my subscription when they dropped the box scores). I watch Baseball Tonight on ESPN fairly regularly. I pay a subscription fee to stream MLB over the internet. I’ve even dabbled in fantasy baseball a bit.

I never once in my life heard the term “sabermetrics.” Until this year, where I’ve heard it about, oh, 100 times. Is this some sort of new meme?

It’s gotten to be a more mainstream way of thinking in the past 5 years but it’s been around in some form for 100 years. Though, it didn’t really have a name or a face until Bill James started doing some of his more well-known work in the 80s and 90s.

As for the streaming MLB, how is the quality? I’ve considered it but never done it.

This is a really strange thing to say. I’m not sure you know what “Sabermetrics” means.

Really? You really think that’s true? See, there’s two problems here:

  1. “Sabermetrics” means “the study of baseball using objective evidence.” PECOTA isn’t “Sabermetrics,” it’s one method that was invented by a couple of guys and is used primarily by one publication/website which is no more infallible than any other. Asserting that sabermetrics doesn’t work because some PECOTA projection was wrong is exactly equivalent to saying that you don’t think physics is useful because one physicist was wrong about something at one point.

  2. It’s simply, inarguably a falsehood. BP’s own writers will often write, in their little player blurbs, things like “PECOTA looks way low here, he’ll outperform that” or “PECOTA isn’t accounting for Shlabotnik’s arm injury, so he’ll likely hit a lot more homers than you see above.”