I have read the Wikipedia page on this multiple times, and have come to the realization I may never quite fully understand how it works, specifically the part about the average replacement player. But is it safe to say that, unlike batting average, ERA, WHIP, etc., WAR is not 100% based on actual player stats, but has at least some element of the hypothetical to it? Thanks!
Wins Above Replacement (WAR) is hypothetical in the sense that a pitcher whose team has lost literally every game that pitcher appeared in can nevertheless have positive numbers.
If the team lost every game, then there is no real-world possibility that the existence of that pitcher on the mound, hurling fastballs and fanning batters, could have contributed to that team winning, compared to a random replacement player from Triple-A, precisely because the team did not win. So… what is it trying to study?
The numbers from WAR come directly from game stats and player performance, so they’re not hypothetical in that sense. But the numbers are then digested and regurgitated in a way that projects forward onto hypothesized future games, rather than just spitting out the actual “win-output” of what the player actually did. So if Madison Bumgarner pitches an extremely good game, giving up one earned run in eight innings, and the Giants then lose 1-0, MadBum did not actually contribute to the team winning, because the team did not win. Some random Triple-A newb could’ve given up 6 runs in those same eight innings, and the Giants would’ve lost 6-0 instead of 1-0. But they still would’ve lost. The ace did not contribute a win in that situation.
But WAR isn’t designed to tell you the past, but rather to predict the future. Imagine a repeat of that particular pitching performance. If a magical fairy came down and told you that if you start MadBum tonight, he’ll deliver eight solid innings and give up only one run, do you start him instead of Mr Replacement? Fuck yes, you do. Of course you do. WAR is an attempt to be that sort of fairy, in a probabilistic sense: if you have the guy available for future games, and he gives similar performances in those future games, then how many more games do you expect you would win with the star pitcher?
If I’m a GM who’s trying to project forward, then I want something like WAR as one tool (among others!) helping to guide my decisions for the future. Which is exactly why the stat is so popular.
The issue, though, comes up with things like MVP voting when people are comparing the WAR of various players.
What does it actually mean to be the Most Valuable Player? Well, it should bloody well mean you actually added value. And what is “value”? Contributing to your team winning, obvs. So we had Aaron Judge recently comparing favorably with Altuve in the 2017 AL MVP race based on WAR, but this was a bit ridiculous. The Yankees had great stats. If we did a rewind of that year, and started it again from Game 1 after making only random changes, they almost certainly would’ve won a lot more games. Judge played great; they just slipped on some banana peels. Do you want Aaron Judge on your team after that year? Hell yeah, you do. The high WAR was fully justified.
But was he nearly as valuable as Altuve based on the reality of that season? No. Fucking. Way. Not even close. Something like “Win Shares” developed by Bill James gives you a much better idea, after the fact, of how much value a player added. People really should be looking more in that direction when deciding things like the MVP awards. WAR in contrast is geared toward prediction, which is what GMs need. I think this is why it’s applied, even in contexts when it arguably should not be.
No, not obvs. This is the annual debate when MVP voting comes up every year. I’m firmly in the other camp, where “value” isn’t tied to a team statistic, or reliant upon his other teammates’ abilities. A player who hits 100 home runs and hits .400, but is on a team that gives up 8 runs a game isn’t “valuable” in your interpretation, because his team struggles to win 40 games a year.
If there were a hypothetical auction involving the top 25 players each year, the “most valuable” (i.e. the one receiving the most money) would most likely be the one with the highest WAR. If Bumgarner pitches 8 innings every outing and gives up 0 or 1 runs, he’s absurdly valuable. I don’t give a shit what the rest of his team does - MVP is an individual award, there’s no reason to involve team stats in the decision.
Just to be clear, I’d give that guy the MVP. Fans show up to see a star player on a shit team, and that can be considered “value” as well, and rightfully so.
I wasn’t trying to give a dogmatic treatise, just give a perspective that contrasted with the reasoning behind WAR.
Gotcha. I think you gave a really good explanation of WAR, to be honest. I just have a raw nerve when people start debating the meaning of “valuable” in the MVP race.
For the OP - a few years ago, I dug into the WAR equation, to see what it was doing under the hood a little. It’s fun to set up a hypothetical player and start fiddling with stats. For instance, when you start to drill down and look at individual plays: how does WAR value a double versus a walk and a stolen base? Or a walk/SB versus a single and a stolen base? It does account for that - a double is worth more than a single/SB, which is worth more than a walk/SB. And for obvious reasons when you think about it - there’s less risk of screwing up on the basepaths with a double, and you’re improving your odds of driving in a run with an extra-base hit.
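To make that concrete, here’s a toy sketch using ballpark linear-weight run values. The numbers are purely illustrative - not the actual coefficients bWAR or fWAR use - but they show the kind of comparison the stat is making:

```python
# Rough, illustrative run values for a few offensive events (ballpark figures only;
# real WAR implementations derive their own coefficients from play-by-play data).
RUN_VALUES = {
    "walk": 0.30,
    "single": 0.45,
    "double": 0.75,
    "stolen_base": 0.20,
}

def runs(*events):
    """Sum the approximate run value of a sequence of offensive events."""
    return sum(RUN_VALUES[e] for e in events)

print(runs("double"))                  # ~0.75 runs
print(runs("single", "stolen_base"))   # ~0.65 runs
print(runs("walk", "stolen_base"))     # ~0.50 runs
```

Same ordering as above: double > single + SB > walk + SB.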
In fact, one other way of looking at it is as “win probability added”. Hitting a homer may not contribute towards a win when you’re down by 12 runs, but it certainly moves the needle towards the odds of winning. The Angels may have a 41% chance of winning a playoff game - but if they didn’t have Mike Trout, those odds might be 29%.
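If you want to see the “win probability added” bookkeeping in miniature, here’s a sketch with made-up win probabilities - just the accounting, not real numbers:

```python
# Toy Win Probability Added: credit the player with the change in his team's
# win expectancy across each of his plate appearances. All probabilities here
# are invented for illustration.
plate_appearances = [
    {"wp_before": 0.41, "wp_after": 0.47},  # leadoff double in a close game
    {"wp_before": 0.35, "wp_after": 0.62},  # go-ahead homer
    {"wp_before": 0.02, "wp_after": 0.03},  # solo shot while down by 12
]

wpa = sum(pa["wp_after"] - pa["wp_before"] for pa in plate_appearances)
print(f"WPA for the game: {wpa:+.2f}")
```

Notice the homer while down by 12 barely moves the needle, which is exactly the point.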
Thanks! Obviously the major league player’s actual stats go into his WAR, but who determines what the stats would be for a hypothetical replacement player, who didn’t actually play and isn’t real, so did not generate real game stats?
The more detailed discussions of WAR on Baseball-Reference.com indicate that they use the average offensive performance for each league, and each season, in order to figure out what an “average” player would be. They also note that they exclude the stats for pitchers batting in their calculations.
So, for example, to calculate Mike Trout’s WAR for 2018, they would be comparing against the average performance for American League batters in 2018.
Details: Position Player WAR Calculations and Details | Baseball-Reference.com
It’s set at an arbitrary number.
The Baseball Reference default for the “Replacement Player” is that a team full of those guys should only win about 30% of their games. They don’t look at minor league talent or anything like that, they just arbitrarily say 30%, and then try to measure what sorts of stats (from real players) would average out to winning less than a third of your games.
But the whole “didn’t actually play and isn’t real” thing is, theoretically, not quite fair. Good players do get injured, sent to the IL, and ballclubs pull up replacements on the 40-man squad from Triple-A. If you wanted, you could actually try to measure the average quality of this sort of player, who is sent back down just as soon as the regular player is healed. I mean… they don’t actually do that. They just flatly state a certain quality level and then stick to it. But they could. Replacement-level players are a real thing. It’d just get messy to measure them directly, because different clubs have very different depth in the minors, different rates of injuries, etc. You’d get a number that’s more faithful to the literal meaning of WAR for a given year, but very volatile from year to year, without adding much insight. Better to move beyond the literal, and aim for year-over-year consistency and insight instead.
So a team full of WAR=0 players should win about 30% of their games. That’s how the stat is calibrated.
I think you’ve slightly misread that. Or else I’m slightly misreading your post.
Look at the beginning of that cite’s explanation:
You get a score of 0 within each of those first five (out of six) categories for being average. However, if you actually get that average score for every category, you’ll absolutely have positive WAR. Summing a bunch of zeroes gives a positive number in this case, because the sixth category - the replacement-level adjustment - is always positive.
A team full of average players should be 0.500, that is to say, they should win about 81 games in a typical season. But a team full of “replacement players” should win only about 30% of their games, or about 48 games.
A team full of players who are achieving average statistics in all of those categories (meaning they’re getting a score of 0 on the first five of six of those categories) should have (on average) about 33 more wins than a team full of “replacements”. That’s 33 WAR to be divided among the players of the team. A player who literally is average on every quality is going to have a WAR over 1.0 pretty easily. Being average is valuable. It sure beats being below average, which about half the players are.
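Spelled out as arithmetic, using the round 30% figure from upthread (the sites actually pin replacement level at a specific winning percentage in that neighborhood):

```python
# Calibration arithmetic for a 162-game season (approximate figures).
games = 162
average_team_wins = 0.500 * games        # 81 wins for an all-average team
replacement_team_wins = 0.30 * games     # ~48.6 wins for an all-replacement team

wins_above_replacement = average_team_wins - replacement_team_wins
print(wins_above_replacement)            # ~32-33 wins to divide among the roster
```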
Of course, in some sense it doesn’t really matter where you’re setting the baseline for “replacement players”, as long as you’re consistent: Stats like WAR don’t mean anything by themselves; you’re always comparing one player’s stats to another’s.
Has the formula for WAR ever been revised? Perhaps at some point, the sabermetricians discovered that some stat that they previously overlooked was a decent predictor for wins?
Possibly both.
I did scratch my head a bit at the “based on the average offensive output for that league and season,” too, because, I agree, that would seem to yield replacements which would finish at about .500. But, the BR posts seemed to be pretty clear that their calculation for an individual player’s WAR for a particular year is done using their offensive averages for the league and year in which that player played, which was mostly the point I was (inelegantly) trying to make.
My educated guess is that the WAR formula uses the league averages for the year in question, and then adjusts those averages downward (which is, I expect, done in a somewhat arbitrary fashion) to yield the level of “replacement player” you note (in which a team of them would only be expected to win 30% of their games).
That’s not true, no. WAR is a measure of value in the past. When the stats tell you Mookie Betts was worth 10.8 WAR in 2018, that means he was worth 10.8 in 2018, not that he will be worth that in the future. It is a reasonably predictive stat (i.e. it consistently remains good or bad for the same player), but a lot of stats are. Home runs are very predictive, too.
It wasn’t, no. Judge’s WAR was clearly too high, at least based on how Baseball Reference and Fangraphs do it, because they base a player’s value on the team’s Pythagorean record, NOT its actual record. bWAR and fWAR are, with due respect to the great work those folks do, incoherent; they neither predict the future nor accurately describe the past (in cases where a team misses its Pythagorean.) Had the Yankee figures been adjusted correctly, the way Win Shares are, Judge would have been significantly below Altuve.
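For anyone following along, the Pythagorean record mentioned here is just an expected winning percentage computed from runs scored and runs allowed. A quick sketch with made-up run totals:

```python
# Pythagorean expectation (Bill James's original exponent of 2; some versions
# use an exponent closer to 1.83). Run totals below are invented for illustration.
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    return runs_scored**exponent / (runs_scored**exponent + runs_allowed**exponent)

pct = pythagorean_win_pct(800, 700)
print(f"{pct:.3f} -> about {pct * 162:.0f} expected wins")   # ~.566, ~92 wins
```

A team that beats (or falls short of) that expectation is where bWAR/fWAR and the actual record part ways.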
Most players are below average, if you think about it. The average player on the field at any given moment of a game in progress is, by definition, average, but most players in MLB are below average. Above-average players play more; guys like Christian Yelich play every day, and guys like Justin Verlander get regular innings. Guys like Joe Shlabotnik get cups of coffee. MLB is made up of a small number of average and above-average players who get the lion’s share of playing time, and many, many below-average players who get the scraps. That’s why a guy who’s average is actually really valuable.
But that player can still help his team win games. If his shit team went 40-122, but without that guy would have gone 22-140, he was worth 18 wins (which would make him the greatest player who ever lived.)
No team actually wins NO games. All teams in modern baseball win games; you cannot find a single example of an exceptional player on a bad team who created no wins. Last year Jacob deGrom was worth at least 8-10 additional wins to the Mets. They were bad but would have been WAY worse without him; just look at his game log and it’s plainly obvious he won them a lot of ballgames they would otherwise have lost. That is value.
Is “number of increased wins” even linear? Like, at least theoretically, if you swapped out one player for a new one with a WAR that was 5 greater than the old one, you’d expect (to within the limits of random variation) to win about 5 more games the next season. But if you swapped out five players, each for a new player with a WAR 5 greater, would you expect to win 25 more games? More than that, or less? Likewise, would you expect that player to add 5 wins to a team that’s otherwise very good, as well as to a team that’s otherwise very bad? Are there positions that are synergistic with each other, where having both be good is more than twice as good as only one of them being good?
That is the reasoning that kept getting Randy Johnson screwed out of the Cy Young Award. Great pitching plus bad offense = mediocre win/loss.
I don’t think this is accurate. Imagine a team that scores 2 runs every game and a team that scores 0 runs 80% of the time and 10 runs 20% of the time. Same averages, but one team wins 80% of the games when they play each other. Mean stats do not imply mean wins.
Or imagine a team that gets 1 hit every single inning; it’ll be unlikely to score many runs, while a team that averages 9 hits a game but clusters them up will score more.
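A quick-and-dirty simulation of that first example, just to show the gap (toy numbers, both teams averaging 2 runs per game):

```python
# Team A scores exactly 2 runs every game. Team B scores 0 runs 80% of the
# time and 10 runs 20% of the time. Same average, very different head-to-head results.
import random

def simulate(games=100_000):
    a_wins = 0
    for _ in range(games):
        a = 2
        b = 10 if random.random() < 0.2 else 0
        a_wins += a > b
    return a_wins / games

print(simulate())   # ~0.80: Team A wins about 80% of the meetings
```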
But are there actually any teams that consistently cluster their statistics in that way? What would the mechanisms be that would cause that? Sure, there are probably some that historically have happened to be clustered, but would you expect them to continue to be clustered in the future?
This is exactly the point I was trying to make here, which you elided:

if you have the guy available for future games, and he gives similar performances in those future games, then how many more games do you expect you would win with the star pitcher?
Underline added. I tried to explicitly specify what kind of “prediction” WAR is intending to make.
The problem here is that the English language isn’t exactly designed to describe the extremely weird, counterfactual, rewind-history kind of philosophy that WAR embodies. Part of the reason for this is that the idea behind WAR itself is a sort of rarefied exemplar of a certain kind of statistical philosophy, taken to the absolute extreme. That extreme can be seen as very strange, even if the philosophy behind it is not.
And really, the philosophy behind it also strikes many as strange, even though it honestly shouldn’t. From here we get into those underpinnings-of-statistics issues that bore a lot of people, but that I find fascinating. I’d go so far as to say it’s intellectually fundamental. Baseball is actually a great place to talk about it. There’s a reason Nate Silver started as a sports statistician. But that’s a huge discussion.

It wasn’t, no. Judge’s WAR was clearly too high, at least based on how Baseball Reference and Fangraphs do it, because they base a player’s value on the team’s Pythagorean record, NOT its actual record.
The philosophy behind WAR is to judge based on idealized record, rather than actual record.
For anyone who accepts the premise behind it (which you don’t seem to and don’t have to), they should probably also accept the result that Judge’s WAR in 2017 was in the right neighborhood.

bWAR and fWAR are, with due respect to the great work those folks do, incoherent; they neither predict the future nor accurately describe the past (in cases where a team misses its Pythagorean.) Had the Yankee figures been adjusted correctly, the way Win Shares are, Judge would have been significantly below Altuve.
This is the same sort of thing.
I’m not out to defend WAR itself, because I haven’t personally gotten my hands dirty with the data. Maybe I’d disagree with choices they’d made. But if they’re “not adjusting correctly”, as you style it, that’s because they fundamentally disagree about what’s the correct way to adjust. That’s where the issue is. Why they disagree is a huge issue, because it cuts to the heart of what statistical analysis is supposed to be. But that would take quite a lot of discussion.
I actually started writing up some of those ideas, and I got to fifteen hundred words of gobbledygook that I might be able to salvage in a few days. Suffice it to say that I think WAR represents a sort of lofty statistical ideal that’s abstruse enough to rub people the wrong way. It serves a very particular purpose that not everyone cares about.
That does not mean it’s “incorrectly adjusted”. They adjust it as they do quite deliberately in order to meet that ideal.

But are there actually any teams that consistently cluster their statistics in that way? What would the mechanisms be that would cause that? Sure, there are probably some that historically have happened to be clustered, but would you expect them to continue to be clustered in the future?
Batting averages are not uniformly distributed on a team. Consider a team that has every player with a 0.250 batting average. Consider a second team that has 3 players at the top of the order that bat 0.750 and the rest with 0.000. The latter will have hits clustered around where the top of the order is at bat vs the first team.
I haven’t done a simulation, but it seems to me that a team with outlier players in both the good and bad direction would tend to beat a team with mean stats.
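Here’s a rough version of that simulation, under a deliberately crude scoring model (every hit is a single, and an inning scores max(0, hits - 2) runs). It’s a sketch meant to illustrate the clustering question, not settle it:

```python
# Compare average runs per game for a uniform lineup vs. a clustered one.
# Toy model: each batter either singles (probability = his average) or makes
# an out; an inning's runs are approximated as max(0, hits - 2).
import random

def sim_game(lineup, innings=9):
    runs, batter = 0, 0
    for _ in range(innings):
        hits = outs = 0
        while outs < 3:
            if random.random() < lineup[batter % 9]:
                hits += 1
            else:
                outs += 1
            batter += 1
        runs += max(0, hits - 2)
    return runs

uniform   = [0.250] * 9                  # everyone hits .250
clustered = [0.750] * 3 + [0.000] * 6    # three .750 hitters, six .000 hitters

games = 20_000
print(sum(sim_game(uniform)   for _ in range(games)) / games)
print(sum(sim_game(clustered) for _ in range(games)) / games)
```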
To put it simply: What you really want is a prediction of who will win games. But you can’t get that. The best you can do is an estimate based on the information you have, which is what happened in previous games. Prior outcomes do not guarantee future results, but they’re still the best you can do. WAR is a way of combining past statistics to most accurately (or so the statisticians believe and hope) predict the outcomes of future games.
Question, and no offense to anyone :) - if WAR does cause such friction among stat lovers, due to the number of variables, the obtuseness, and the fact that it is based on a hypothetical baseline (which, yes, is beside the point, because that baseline, even if you don’t agree with it, is the same for everyone and therefore fair) - why not stick to actual measurable stats?
Point being, WAR says Mike Trout is a god, but ANY stat like WAR, if based solely on the real stat history of the player - as long as it is a reasonable mixture of positives minus negatives, whatever they are (doubles, plus homers, plus batting average, minus strikeouts, minus GIDPs, etc.) - would, I am sure, end up showing Trout is among the best ever, and all other players with high WAR would look just as good. IOW, WAR doesn’t seem to provide any deep insight that other such stats don’t - it’s not like WAR has revealed that, instead of being average, Doug DeCinces is actually the best player of the modern era.
So I guess, why make the stat so difficult, when simpler ones give you the same results - the same best and worst players, just maybe in a slightly different order? I mean, is WAR really a better indicator of how good a pitcher is than WHIP or ERA? Every pitcher with a WHIP under one or close to it is HOF material; a bad pitcher with a WHIP under one does not exist.
Does WAR as it is produce something other stats don’t, other than its title, wins above replacement? I get that it’s fun and interesting, but it seems odd to put in all that effort to show Trout Rules, when OPS and other (easier-to-figure) stats show the same thing?