For Serious Baseball Stat Geeks: Does Your Gut EVER Overrule Your Numbers?

Well, it’s hard to separate “thinking they’re better” from just liking or disliking a player. I absolutely despise Lyle Overbay. Technically I guess he was an okay hitter last year; his numbers were not terrible, but I just hate him, and dread filled me whenever he stepped to the plate.

Of course, there’s also the fact that the stats just aren’t perfect, especially when it comes to fielding. The stats say that Jose Bautista is a terrible fielder but I have to admit I’m skeptical; he looks good to me. If there were some supporting evidence Bautista was a bad fielder maybe my mind would be changed, but it’s just that UZR et al. say one thing and yet he looks fine. If anything the supporting evidence kind of backs me up (Bautista’s team won more games than the analytical stats say they should have - a sign something is being missed somewhere, and the something could very well be fielding.)

That said, there’s such a long history of fielding stats being wrong, that might just be geekery. When the methods of the 1990s said Roberto Alomar was the Worst Second Baseman Evar!!!111! I just didn’t for an instant believe them. He was visibly NOT a bad second baseman, and the surrounding information just seemed really weird, most notably the fact that the methods of the time claimed Alomar was a fine fielder in 1988-1990 with the Padres, and a fine fielder 1996-on with the Orioles, but somehow, in the prime of his career and for no explained reason, sucked in between in Toronto. That screams “illusion of context,” doesn’t it? And yet a lot of stats geeks simply would not budge from the methods of the time, insisting they had to be right.

I’m not sure I believe the numbers now. While looking up various assessments of Robbie Alomar’s defense, I looked up some of his teams. According to WARP, the 1993 Toronto Blue Jays were basically no better a fielding team than a Triple-A ballclub; collectively all their fielders were just 1.4 games better than a team of replacement level players with the glove. Amazingly, according to WARP, the best defensive player on the Blue Jays was Devon White. Okay, no surprise there, but the second best was… Rob Butler. Rob Butler, who played a grand total of 113 innings (about 13 games) in the outfield, making 32 catches, one error and no assists, was worth the better part of a game over a replacement player. In just 13 games. The rest of the team were mostly sub-major-league-quality fielders. This was, mind you, the World Series champion, a team that sure as hell LOOKED like they could field the ball.

Well, I don’t believe those numbers for an instant. They’re fucking insane. Rob Butler, in 113 innings and with no other evidence to suggest it, was equal in defensive skill to Willie Mays, and one of the only good fielders on the best team in the league? Ed Sprague was a better fielder than Roberto Alomar? I’m sorry, but I am absolutely convinced the metrics are wrong.

The problem is I still qualify for my geek hat because I can prove they’re wrong. Here’s another problem I see. According to the WARP method the Blue Jays (95-67) were 42.4 wins above replacement level - 27.8 from their hitters, 13.2 from their pitchers, 1.4 from the fielders, led by Devon White and Rob Butler. Okay.

Let us now examine the Baltimore Orioles (85-77.) It seems to me that, since the Orioles won 10 fewer games, they should be “worth” 10 fewer WARP. They played in the same league and in the same division, so that just makes sense, right? But no; according to WARP, the Orioles were 37.4 WARP.

That makes no sense to me. If the Blue Jays won 95 games and the Orioles won 85, the total value of Blue Jay players has to add up to 95 wins, and the total value of Orioles players has to add up to 85 (or however many WARP those win totals are above replacement level.) Value is winning. You cannot logically say a team that won ten fewer games was worth just five fewer games. They obviously were not.
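To put numbers on that complaint, here is a quick sketch in Python. It uses only the records and WARP totals quoted above; the helper name is made up for the illustration and is not part of how WARP is actually computed.

```python
# Records and WARP totals are the figures quoted in the posts above.
# The helper name is hypothetical, invented for this illustration.
TEAMS = {
    "Blue Jays": {"wins": 95, "warp": 42.4},
    "Orioles": {"wins": 85, "warp": 37.4},
}


def implied_replacement_wins(wins, warp):
    """Wins left for a replacement-level roster if wins = baseline + WARP."""
    return wins - warp


jays, orioles = TEAMS["Blue Jays"], TEAMS["Orioles"]
win_gap = jays["wins"] - orioles["wins"]
warp_gap = jays["warp"] - orioles["warp"]

print(f"Gap in actual wins: {win_gap}")
print(f"Gap in WARP: {warp_gap:.1f}")
print(f"Wins the WARP totals do not account for: {win_gap - warp_gap:.1f}")
for name, team in TEAMS.items():
    baseline = implied_replacement_wins(team["wins"], team["warp"])
    print(f"{name}: implied replacement-level wins = {baseline:.1f}")
```

Read this way, the two teams end up with different implied replacement baselines (52.6 wins versus 47.6) despite sharing a league and a division, which is the inconsistency being described.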

I think some of that missing value was in Robbie Alomar’s glove.

<shakes fist in anger>You did that???
:smiley:

I have a related question that I hope won’t prevent anyone from answering the OP, but it’s directed at the same crowd.

IMO (and I’m happy to be shown that I’m wrong,) most of the newer sabermetric-type stats seem to do a very good job at analyzing past performance, but are a little less accurate in predicting future performance. Would you say that’s fair, or completely off-base? Maybe they aren’t intended for that purpose, and maybe they work better at predicting long-term outcome, rather than next season outcome?

Yep. You can blame me. I won’t speculate as to the reason for the Yankees winning in '09 tho.

RickJay, that’s one reason I prefer Win Shares, which does explicitly tie individual value to overall team value. WAR is like something a committee of 100 people who never talked to each other would design, and it shows. People assume that WAR has been balanced, checked for both reliability and validity, and such and so-forth, but there are always a few niggling irregularities like what you wrote about which makes me darkly suspicious. Yet WAR has won the day, and WS languishes forgotten in the closet, for good or bad.

I think that’s absolutely true, but then, it’s to be expected. Past performance is a set of established facts. Future performance is always going to be hard to predict, because there will always be a guy who just loses his curveball, or a Jose Bautista, or someone who gets hurt, or a guy who gains ten pounds and it makes him just a bit too slow at second base. Predicting future performance is always guesswork; there are general trends you can bet on but some people will buck the trend.

[QUOTE=John DiFool]
RickJay, that’s one reason I prefer Win Shares, which does explicitly tie individual value to overall team value. WAR is like something a committee of 100 people who never talked to each other would design, and it shows. People assume that WAR has been balanced, checked for both reliability and validity, and such and so-forth, but there are always a few niggling irregularities like what you wrote about which makes me darkly suspicious. Yet WAR has won the day, and WS languishes forgotten in the closet, for good or bad.
[/QUOTE]

Which I think is a shame; Win Shares was a brilliant invention. I don’t think any of the overall stats have it quite right, and prefer to examine them all. I think they all have glaring flaws, and WAR’s disconnection from actual team performance is a pretty glaring one.

RickJay:

What do Alomar’s defensive stats show when measured in Win Shares? I haven’t looked it up, but (you and I have disagreed on him before, based on which part of his career we watched more closely):

  1. if it doesn’t show that he stunk on D while playing in NY then I’ll agree that it doesn’t measure much–he was timid on the DP, and displayed a range the width of an index card

  2. if it shows him phenomenally inept on D in NY but phenomenally ept earlier in his career, then I’ll simply concede his performance was excellent the whole time I didn’t have my eyes on him

  3. if it shows him nothing special on D for his entire career, then I’ll resume my assertion that you were blinded by his offense and attributed to him defensive skills he never had.

Based on his brief career in NYC, which is all I saw regularly, I’m opposed to all and any honors he may receive, but am willing to be persuaded otherwise if shown convincing stats demonstrating that his Mets career was an aberration.

That’s funny, I don’t like win shares at all! My big problem with win shares is that teams often win due to luck. We may not like to admit it, but teams can have anomalies like high strand rates for their pitchers, win a ton of 1-run games, lucky HR/FB rates, etc. In these cases, the team would win more games than would otherwise be expected based on their underlying stats. The problem is that a team could perform exactly the same two consecutive years, but due to changes in luck the team could win 5 fewer games one year than the other. This would then be reflected in the team’s win shares, even though the players performed exactly the same.

I like WAR because it’s a luck-independent stat that just looks at the player’s peripheral stats to derive their value.

[QUOTE=markdash]
That’s funny, I don’t like win shares at all! My big problem with win shares is that teams often win due to luck. We may not like to admit it, but teams can have anomalies like high strand rates for their pitchers, win a ton of 1-run games, lucky HR/FB rates, etc.
[/QUOTE]

That’s true, which is why it’s important to distinguish between actual value and future expectations.

To use a non-analytical stat, last year Jose Bautista hit 54 home runs. I don’t think there is a person on earth, including Bautista himself, who thinks Bautista will hit 54 home runs again. While he is a legitimately good hitter and will probably hit quite a few homers again if he stays healthy, he’s not going to hit 54. 54 home runs was a fluke. He was lucky. So in terms of projecting how Bautista will do in 2011 and so on, it would be stupid to assume his 2010 performance will be repeated forever.

But in admitting that he was lucky in 2010, you don’t discount the fact that he actually did it. Whatever Jose Bautista’s hitting abilities really are and how they translate into his performance in 2011, 2012, etc., you cannot examine his 2010 performance and say that because it was flukey, he DIDN’T hit 54 home runs. The fact is that he did, in fact, hit 54 home runs, and an analysis of his value in 2010 must account for the fact that he hit 54 home runs, no matter if it was lucky. He hit 54 homers.

To my mind, that must logically extend to team performance. The 1993 Blue Jays, just to keep using the same example, won 95 games. They did not win 90 games, which is what their WAR suggests. They did not win 91 games, which is what their Pythagorean projection suggests. They won 95, and they lost 67. Any analysis of the Blue Jays’ performance in 1993 must begin with the absolutely undeniable fact that they went 95-67; furthermore, that their winning 95 games is quite literally the single most important statistic there is. Winning games is the entire point. Just as you cannot examine Jose Bautista’s 2010 denying that he hit 54 homers, you cannot examine Toronto’s performance in 1993 and somehow conclude that they did not win 95 games.
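For anyone who hasn’t seen it, the Pythagorean projection mentioned above estimates a team’s wins from runs scored and runs allowed. A minimal sketch follows, assuming the classic exponent of 2 (other exponents are also in use). The run totals in the example are made-up placeholders, not the 1993 Blue Jays’ actual totals, and the function names are mine.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from run totals, using the Pythagorean expectation."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return games * scored / (scored + allowed)


def wins_above_expectation(actual_wins, runs_scored, runs_allowed, games=162):
    """Actual wins minus Pythagorean wins: the part usually put down to luck."""
    return actual_wins - pythagorean_wins(runs_scored, runs_allowed, games)


# Hypothetical team: 800 runs scored, 700 allowed, 95 actual wins.
expected = pythagorean_wins(800, 700)
surplus = wins_above_expectation(95, 800, 700)
print(f"Expected wins: {expected:.1f}, surplus over expectation: {surplus:.1f}")
```

The surplus is exactly the quantity in dispute here: a bottom-up method treats it as noise, while the argument above is that it happened and somebody on the roster has to get credit for it.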

Bill James’s point, which I think has a lot of logic to it, is that in examining a player’s past value, the value of a team’s players must add up to the value of the team’s performance. That strikes me as being eminently sensible. Yes, the 1993 Blue Jays might have been slightly lucky in some respects, but (a) the results are what they are, (b) it’s sensible to distinguish between how the Jays did in 1993 and how you expected them to do in 1994, and (c) we can’t be absolutely positive that we are acquainted with all the facts and so it’s a lot smarter to start with the team’s performance and go from there than it is to build up from nothing, hope you get there, and shrug off the discrepancies. As James points out in Win Shares, at least when it came to Linear Weights, disregarding team performance resulted in some numbers, especially regarding fielding ability, that were just insanely stupid.
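The “players must add up to the team” idea can be sketched in a few lines. This is only an illustration of the top-down principle, not James’s actual procedure: the one detail taken from Win Shares is that a team gets three shares per win, while the proportional split, the function name, and the player figures are all assumptions made up for the example.

```python
def top_down_shares(team_wins, raw_contributions, shares_per_win=3):
    """Scale bottom-up player estimates so they sum to the team's actual total.

    raw_contributions maps a player name to any non-negative bottom-up
    estimate of value. The proportional split is an assumption for this
    sketch, not the real Win Shares allocation.
    """
    team_total = team_wins * shares_per_win
    raw_total = sum(raw_contributions.values())
    if raw_total <= 0:
        raise ValueError("need a positive total of raw contributions")
    return {
        name: team_total * value / raw_total
        for name, value in raw_contributions.items()
    }


# Hypothetical three-player "team" with made-up bottom-up estimates.
raw = {"Player A": 6.0, "Player B": 3.0, "Player C": 1.0}
shares = top_down_shares(95, raw)
for name, value in shares.items():
    print(f"{name}: {value:.1f}")
print(f"Team total: {sum(shares.values()):.1f}")
```

Whatever the bottom-up estimates say, the total is pinned to 95 wins’ worth of shares, so nothing the team actually accomplished can go missing.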

Again, I don’t trust any one number and I’m not saying Win Shares always gets it right, either.

As I recall - I can’t find my copy of the book - they showed that he was a very good defensive player, but not brilliant, for most of his career. How he fared in his short stint with New York I don’t recall, but he was in his late 30s so I’d be shocked if he was good at that point.

I’m sorry he played poorly in New York, but old players do that. Willie Mays did not suck because he played badly for the Mets. I know you have a real hate on for Alomar, but come on, you don’t REALLY think he played that way his whole career, do you?

I think that’s a mild case of begging the question, though, RickJay. I mean, to the extent that the question is “does winning equal actual value,” that is. Because I would argue that actual value does correspond with future expectations very closely, and in fact that’s the advantage of having stats that can figure out actual value.

I can’t think of a good Blue Jays example, so I’ll use the Phillies. Cole Hamels in 2009 was 10-11 with a 4.32 ERA. On the face of it, he had a terrible, aberrant season. He even admitted himself that he thought he had basically a wasted year because of all the post-World Series hoopla which had screwed up his mindset. If value is defined in terms of wins, which is defined by runs allowed, more or less, for a pitcher, since he can’t really affect the other part of the equation, then Cole Hamels was way less valuable to his team in 2009 than any other season of his career.

Analytically, though, he was pretty much the same pitcher in 2009 as 2008 and 2010, as it turned out. He had about the same number of home runs, slightly fewer strikeouts and a bunch fewer walks, which meant slightly more balls put in play, a lower percentage of extra base hits/hits, and so on. But his batting average allowed on balls in play was absurdly higher – up 60 points from '08 to '09. On the field, he actually had to be as good as he was in 2008 just to have a 10-11 record with a 4.32 ERA. If he had gotten tangibly worse the team would have been worse.
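For reference, batting average on balls in play is usually computed as (H - HR) / (AB - K - HR + SF). The sketch below uses two made-up stat lines, not Hamels’ real numbers, to show how identical strikeout and home run totals can sit alongside a 60-point BABIP swing.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Two hypothetical seasons: same strikeouts and home runs allowed,
# but more balls in play fell for hits in season B.
season_a = dict(hits=190, home_runs=25, at_bats=800, strikeouts=180, sac_flies=5)
season_b = dict(hits=226, home_runs=25, at_bats=800, strikeouts=180, sac_flies=5)

a, b = babip(**season_a), babip(**season_b)
print(f"Season A BABIP: {a:.3f}")
print(f"Season B BABIP: {b:.3f}")
print(f"Change: {round((b - a) * 1000)} points")
```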

I’d argue that what this means is that Hamels’ value was the same from year to year, and things external to his value were the only things that changed appreciably. At the end of 2009, I think you could have said (and many people did say) that Hamels was just unlucky, and would be much better in 2010. And he was. And I think that means that his value was the same the whole time, and the winning was a separate issue.

(Which isn’t to say that I think WAR or anything else solves that problem by itself, either. Hamels had a pretty crap WAR in 2009, too.)

No, I don’t. He couldn’t have made a MLB team if he had. But I have to relate how maddening it was to watch him sucking donkey dick in the field and have METS FANS explain to me how my eyes were faulty, he really was hustling, he actually displays great range, that botched DP wasn’t his fault, etc. because he was, after all, a Gold Glove second baseman, and an all-around fabulous fielder by acclamation.

Then when I would point out how his defensive stats, particularly range factor, kinda sucked, they’d say to me “Well, you have a small sample size there, that will all even out when he’s played a whole season.”

After a whole season, though, everyone hated him and were glad to see him go for a bag of goat gonads. So I’m suspicious of his reputation for stellar fielding.

FWIW, unless you’ve looked at a HR tracker for Jose Bautista and found that an unusually large percentage of his homers barely made it over the wall, luck has little to no part to play. For the most part, there isn’t a whole lot of luck in hitters hitting home runs.

Whether it was a fluke is another question entirely, of course.

It’s entirely possible Hamels was in fact as valuable in 2009 as he was in 2008. But he might not have been.

I don’t think we’re disagreeing, really. Hamels’s ERA in 2009 might in fact be deceiving. Perhaps a lesser pitcher would have posted an ERA of 5.60 in the same innings. ERA could be a stat where the issue is what marginal effect the pitcher has, just like W-L. It may simply be that Hamels pitched when Philadelphia’s defense sucked; I don’t have the numbers.

On the other hand, if Hamels just happened to be giving up hits into the gap more often than not - and that does happen - then in fact I’d argue he did not pitch with the same value in 2009.

I realize BABIP is, for the most part, not a repeatable skill. But, again, I argue that one must distinguish between past performance and expected future results. If Hamels gave up more hits into the gaps in 2009, then that’s what he did, and that had an effect on Philadelphia’s ability to win games. A similar case is clutch hitting. Clutch hitting isn’t a repeatable skill, but if a player one season bats .420 in clutch situations, then that’s what he did, and it helped his team win. Just because we do not expect him to do it next year does not mean he doesn’t deserve credit for doing it this year.

If you want to take a really extreme example, Reggie Jackson hit 5 homers in the 1977 World Series. Obviously, that is a fluke. Reggie Jackson was a good hitter but he couldn’t hit 5 homers every six games. But in those six games he DID hit 5 homers, and he deserves credit for it. A few years later Dave Winfield went 1-for-22 in the 1981 World Series. Obviously, that was a fluke; Dave Winfield wasn’t really an .045 hitter. But he was in those six games, and he really did hurt his team. Maybe 15 of his outs were shrieking line drives right into someone’s glove - but you can’t pretend that helped the Yankees.

BTW, it’s a perfect example of the OP that I, probably the best-documented stat geek on the SD, HATE Roberto Alomar no matter what RickJay’s stats will show.

My stats? Dude, look up the regular stats.

Well, I looked up his dWAR and, as you said, it showed him to be a poor fielder. You say you’ve got other stats that say otherwise? I probably won’t buy them either but I’d be interested in seeing if they even exist, or if you’re a believer in the face of ALL of the evidence.

Which Mets fans? Alomar was a scrub on the Mets and his tenure was so terrible that a lot of Mets fans (myself included) can’t believe he was with the team for only a year and a half. It felt like five torturous years of suck.

He was the starting second baseman and batted in the 1-3 spots every day he was on the Mets–he just PLAYED like a clumsy, timid, clueless scrub, but every Met fan that I knew and every Met official for sure asserted that he was a skilled, gifted All-Star and that my eyes were deceiving me, exactly as RickJay maintains. I believe RickJay that he deserved his All-Star status with the Blue Jays, but it IS the same argument. His offensive stats with Toronto convince me that he used to be an effective hitter, but I’m not seeing any evidence that he was a superior fielder, and I maintain that Gold Gloves are too frequently earned for offensive prowess (and the playing time that prowess earns for you.)

It amazes me that you assert he was a scrub–again, this is a Met fan minimizing his suckitude.

I’m not minimizing anything. He played like a scrub for the Mets. You’re right, he should have been better, but he played like a scrub.

This post simply asserts that he WAS a scrub, not that he played LIKE one, which to me is minimizing his poor numbers.

Well, we can examine some of these “leading predictors”, including how many “wall-squeakers” he hit:

Home runs per flyball: spiked to 21.7%, which is huge, and much higher than his career baseline (13.8%). Pujols, by way of comparison, was at 19.9% for his career and maxed out at 22.5% in 2006. Bautista’s overall flyball % jumped too (to 54.5 from 45.8).

According to HitTracker, he had 13 “just enoughs” for home runs in 2010, leading the majors, tho in % terms that isn’t big (13/54 < 25%).

So it is very likely that he will regress, but he also will probably hold on to enough of these abilities to get somewhere around 30-40 homers anyway.
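Here is the back-of-envelope arithmetic behind those figures, as a rough Python sketch. The rates and counts are the ones quoted above; the regression step assumes he hits the same number of fly balls as in 2010 and simply reverts to his career HR/FB rate, which is my simplification and not a real projection system.

```python
# Figures quoted in the post above.
HOME_RUNS_2010 = 54
JUST_ENOUGH = 13
HR_PER_FB_2010 = 0.217
HR_PER_FB_CAREER = 0.138

# Share of his homers that only just cleared the wall.
just_enough_share = JUST_ENOUGH / HOME_RUNS_2010

# Assumption: same number of fly balls as 2010, career HR/FB rate.
implied_fly_balls = HOME_RUNS_2010 / HR_PER_FB_2010
regressed_home_runs = implied_fly_balls * HR_PER_FB_CAREER

print(f"Just-enough share: {just_enough_share:.1%}")
print(f"Homers at career HR/FB on the same fly balls: {regressed_home_runs:.1f}")
```

Full regression to the career rate comes out around 34 homers, inside the 30-40 range guessed at above.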