So, in yet another one of my creative attempts to avoid doing any real work, i was playing around on Retrosheet and decided to download all their game logs and import them into an Access database table.
I now have a single table with 184,147 records, encompassing regular season games starting in 1871 and going all the way through to the end of 2004. It’s a lot of fun playing around with it, asking it for the sort of meaningless statistics that TV commentators seem to love so much.
For example, did you know that, in night games played between the Baltimore Orioles and National League teams in the years 1998-2004, Baltimore pitchers pitched a total of 10 complete games? Of those 10 games, 9 were at home and 1 was on the road, and 9 were victories while 1 was a loss.
Or, that during Cal Ripken’s career, in games where Ripken played shortstop and batted at number 5 in the order, the Orioles won 63 games by 4 runs or more?
So, tell me what you want to know. I’ll do my best to come up with queries (either using the Access query builder, or trying to write my own SQL phrase) that will answer your question.
To give you an idea of the information at my disposal, here is a guide to the statistics contained in the Retrosheet game logs.
By the way, if there are any database gurus out there who have any suggestions as to how i might organize this data more efficiently than in just one massive table, i’d be happy to hear it. Part of the reason to download this data in the first place was to give me a chance to practice my rudimentary knowledge of SQL. I’m really no database expert.
Heh, mhendo, that’s pretty dedgum cool. Yes, you’ve got hundreds of future work hours quite suddenly at risk.
Let’s see… how about something germane to tonite’s game? Help me formulate something pertinent cuz left to my own devices my question’s gonna suck.
Ummm… how many times (or what percentage of the time) does an NL wildcard team win the LCS in 5, 6 and 7 games? If that needs to be improved upon, please be my guest.
You both asked questions that the data can’t answer. It doesn’t contain individual player stats, so i can’t answer World Eater’s question. And it’s regular season games only (no postseason), so lieu’s question is out.
Have a look at the second link in the OP and it will give you an idea of what stats are available.
Well, it took a bit of messing around, but i’ve finally got some figures.
In order to make it manageable (at least for someone with my low level of database knowledge), i restricted the time period to the years since 1970. That way, i didn’t have to deal with a whole bunch of franchise name changes. There were a few, but not enough to make life much more difficult.
Anyway, in the period since 1970, the team with the best record in the second game of a double header is the Florida Marlins, with a record of 14-8, or .636. The team with the worst record in these situations is the Tampa Bay Devil Rays, with a record of 3-8, or .273.
Now, it’s clear that neither of those teams has played too many double headers. If we look only at teams that have played more than 100 such games, then the team with the best record is the New York Yankees at 132-80 (.623), and the worst is the Houston Astros at 55-79 (.410).
Here’s the full sheet, in order from worst to best:
Team  Wins  Losses  Percentage
TBA      3       8  0.272727273
HOU     55      79  0.410447761
TOR     43      60  0.417475728
ANA     57      76  0.428571429
SEA     28      36  0.4375
ATL     89     111  0.445
CHN     92     114  0.446601942
DET     95     113  0.456730769
MIL    107     125  0.461206897
COL     14      16  0.466666667
CLE    140     153  0.4778157
NYN    126     137  0.479087452
TEX     92     100  0.479166667
CHA    108     117  0.48
OAK     94      98  0.489583333
SDN     97     100  0.492385787
PHI    114     114  0.5
MON    119     118  0.502109705
CIN     87      82  0.514792899
BAL    297     278  0.516521739
KCA    100      91  0.523560209
SFN     91      82  0.526011561
LAN     51      45  0.53125
MIN     97      83  0.538888889
BOS    114      96  0.542857143
SLN     99      78  0.559322034
PIT    138     105  0.567901235
ARI      4       3  0.571428571
NYA    132      80  0.622641509
FLO     14       8  0.636363636
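For anyone who wants to try the same kind of tally, here is a rough sketch of how the second-game records could be computed. It is not the query actually used for the sheet above. The table name gamelogs, every column name, and the sample rows are made up for illustration, and it assumes dates are stored as yyyymmdd text and that the game-number field uses 2 for the second game of a double header. It runs the SQL through Python's sqlite3 module rather than Access, and it leaves out tie games.

```python
import sqlite3

# Hypothetical schema: the real import would have many more columns,
# and these names are assumptions, not Retrosheet's own field names.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gamelogs (
        game_date     TEXT,     -- assumed yyyymmdd
        game_number   INTEGER,  -- assumed: 0 single game, 1 first of DH, 2 second of DH
        visitor       TEXT,
        home          TEXT,
        visitor_score INTEGER,
        home_score    INTEGER
    )
""")

# Made-up rows, only there to exercise the query.
conn.executemany("INSERT INTO gamelogs VALUES (?, ?, ?, ?, ?, ?)", [
    ("19990704", 1, "AAA", "BBB", 3, 2),
    ("19990704", 2, "AAA", "BBB", 1, 5),
    ("19990801", 2, "BBB", "CCC", 4, 0),
    ("19990815", 0, "AAA", "CCC", 7, 6),
])

# Stack each second game twice, once from the home side and once from the
# visiting side, so every team gets one row per game it played.
QUERY = """
    SELECT team,
           SUM(won)                            AS wins,
           COUNT(*) - SUM(won)                 AS losses,
           ROUND(1.0 * SUM(won) / COUNT(*), 3) AS pct
    FROM (
        SELECT home AS team, home_score > visitor_score AS won
        FROM gamelogs
        WHERE game_number = 2 AND game_date >= '19700101'
          AND home_score <> visitor_score
        UNION ALL
        SELECT visitor AS team, visitor_score > home_score AS won
        FROM gamelogs
        WHERE game_number = 2 AND game_date >= '19700101'
          AND home_score <> visitor_score
    )
    GROUP BY team
    ORDER BY pct, team
"""

rows = conn.execute(QUERY).fetchall()
for team, wins, losses, pct in rows:
    print(team, wins, losses, pct)
```

The UNION ALL is the main design choice: because each game row holds both teams, a single GROUP BY on one team column would only ever count home games or road games, not both.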
Well as usual the Mets are losers. Since I’ve been riding Yankee coattails for sometime though, this is good news. I wonder what the correlation, if any, is.
There was a game this year in which the Yankees beat the Devil Rays 20 - 11. I noticed that in that game, both teams had 8-run leads at some point. In another thread, I asked if anyone knew the last time there was a game in which both teams had 8-run leads. Rufus Xavier pointed me to a 1999 game between Cleveland and Tampa Bay in which this happened. I guess there’s something about Tampa Bay.
Can you get me stats on how often this has happened in MLB history? I suppose a good start is to look at games in which at least 24 runs were scored, with at least 16 by one of the teams.
Can you provide a complete list of games since 1973, in AL parks, where a manager elected not to use a DH at the start of the game and allowed the pitcher to bat? I’m aware that Ken Brett did it a couple of times for the White Sox and I think Fergie Jenkins did it at least once but I don’t know if there were more.
It’s a difficult thing to do because of the way that the inning by inning stats are laid out. They are in a single cell, as one long number. For example, the home team line score will be in one cell, and might look like 41000301x, with the visiting team line score in another cell saying 200030110. Because of this layout, i don’t know how to track the inning by inning swings within each game.
I did look for all games in which the winning team scored at least 16 runs and the losing team scored at least 8. That gave me about 700 games in total. Now, the number that i had to look through was reduced by the fact that very few of the games before WWI have inning by inning line scores available. When you only have the final result, it’s impossible to say what the in-game swings were.
For the games where i did have the line scores, i scanned them one by one, looking for cases where the losing team went up by a big margin early in the game. So i basically looked for line scores in which the losing team had big numbers early, and the winning team had small numbers early.
I don’t claim that this method was perfect, or that i couldn’t have missed something, but as far as i could see the only game in which both teams led by 8 runs or more was the Tampa-Cleveland game you mentioned.
There were some games in which the losing team jumped out early, but the winning team were never actually eight runs behind. For example, on July 11, 1947, St. Louis scored 9 in the first two innings and ended up losing to the New York Giants 17-9. But St. Louis never led by 8, because the Giants scored 11 of their 17 runs in the first two innings also.
Sorry i can’t be more definitive. If anyone knows how i might extract this sort of data from within those line score categories, i’d love to hear it.
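One possible way to get at the in-game swings is to pull the line score strings out of the database and walk through them inning by inning in a short script. The sketch below is only an illustration under some assumptions: each inning is a single digit, a double-digit inning is written in parentheses such as (10), and x marks a half-inning that was not played. The function names are invented for this example.

```python
import re


def parse_line_score(line):
    """Turn a line score such as '41000301x' into a list of runs per inning.

    Assumed format: one digit per inning, '(10)' style parentheses for a
    double-digit inning, and 'x' for a half-inning that was not played.
    """
    runs = []
    for token in re.findall(r"\(\d+\)|\d|x", line.lower()):
        if token == "x":
            continue  # unplayed half-inning, nothing to add
        runs.append(int(token.strip("()")))
    return runs


def max_leads(visitor_line, home_line):
    """Return (largest visitor lead, largest home lead) reached in the game."""
    visitor = parse_line_score(visitor_line)
    home = parse_line_score(home_line)
    v_total = h_total = 0
    v_best = h_best = 0
    for inning in range(max(len(visitor), len(home))):
        if inning < len(visitor):  # top half: visitors bat first
            v_total += visitor[inning]
            v_best = max(v_best, v_total - h_total)
        if inning < len(home):     # bottom half: home team bats
            h_total += home[inning]
            h_best = max(h_best, h_total - v_total)
    return v_best, h_best


def both_led_by(visitor_line, home_line, margin=8):
    """True if each team led by at least `margin` runs at some point."""
    v_best, h_best = max_leads(visitor_line, home_line)
    return v_best >= margin and h_best >= margin


# The two example line scores quoted above (visitor first, then home).
print(max_leads("200030110", "41000301x"))
print(both_led_by("200030110", "41000301x"))
```

Checking the margin at the end of each half-inning is enough, because only the batting team can score during a half-inning, so its lead is never larger in the middle of the half than at the end of it. Games with no line score recorded would still have to be skipped.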
Well, if i’ve done this right, then i’ve come up with four games. You were correct about Brett and Jenkins. But there was one more.
Date            Teams    Pitcher/Team           No. in Batting Order
July 6, 1976    CHA@BOS  Ken Brett (CHA)        9
Sept. 23, 1976  MIN@CHA  Ken Brett (CHA)        8
Sept. 27, 1975  CAL@OAK  Ken Holtzman (OAK)     9
Oct. 2, 1974    TEX@MIN  Fergie Jenkins (TEX)   9
OK, this got me out of Lurker mode. I agree this is fascinating.
I just downloaded the All Star game data so I could see what it looked like.
I have a couple of requests for you:
How many games did Willie Mays (maysw101) start, but not in centerfield (position code 8)? We could do this for any number of players. In fact, how many games did Babe Ruth start at 1st base (position code 3)?
How many games in each year 1900-1930 were won by a pitcher other than the starting pitcher? My accuracy challenged perception is that there were not a whole lotta relief pitchers back in the day, so pitchers were starters and they either won or lost. Actually that same stat for the years 2000-2004 would be an interesting counterpoint, too.
Unfortunately, gaps in the data provided by Retrosheet mean that i was only able to answer parts of your questions.
I put together a query asking how many games Willie Mays started at a position other than center field. Then, just to be sure, i also put together a query asking how many games he DID start in center.
The totals were:
1691 in center
80 in a different position.
But this gives a total well short of Mays’ total of 2992.
The problem is that all position players are only listed from 1960 onwards. Those fields are completely blank for the period before. So my figures for Mays’ positions miss the first nine years of his career.
For the same reason, i can’t tell you how many games the Babe started at first.
Also, the only listing for pitchers in the early 20th century is for starting pitchers. Winning and losing pitchers aren’t listed for that period, so i’m afraid i can’t answer that question either. Sorry.
This spreadsheet/database really is strongest for the period since 1960. Retrosheet is adding new data all the time, but there are still many gaps for the earlier years.
I was able to answer the last part, though:
For the period 2000-2004, there were 12,142 games played altogether in the majors.
Of these, 3,652 (~30%) were won by a pitcher other than a starting pitcher.
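A count like this comes down to comparing the winning pitcher against the two starting pitchers in each game. The following is a minimal sketch of that comparison, not the query that produced the figures above. The table name, column names, player IDs, and sample rows are all hypothetical, dates are assumed to be yyyymmdd text, and skipping games with no winning pitcher recorded is an assumption of this sketch.

```python
import sqlite3

# Hypothetical, cut-down table; column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gamelogs (
        game_date          TEXT,  -- assumed yyyymmdd
        visitor_starter_id TEXT,
        home_starter_id    TEXT,
        winning_pitcher_id TEXT
    )
""")

# Made-up rows: two starter wins, two relief wins, one game outside the range.
conn.executemany("INSERT INTO gamelogs VALUES (?, ?, ?, ?)", [
    ("20000405", "p01", "p02", "p01"),
    ("20010610", "p03", "p04", "p09"),
    ("20040930", "p05", "p06", "p06"),
    ("20020815", "p07", "p08", "p10"),
    ("19990815", "p01", "p02", "p11"),
])

# A win goes to a non-starter when the winning pitcher matches neither starter.
QUERY = """
    SELECT COUNT(*) AS games,
           SUM(winning_pitcher_id NOT IN (visitor_starter_id, home_starter_id))
               AS non_starter_wins
    FROM gamelogs
    WHERE game_date BETWEEN '20000101' AND '20041231'
      AND winning_pitcher_id IS NOT NULL
      AND winning_pitcher_id <> ''
"""

games, non_starter_wins = conn.execute(QUERY).fetchone()
print(games, non_starter_wins, round(100.0 * non_starter_wins / games, 1))
```

Applied to the figures quoted above, the same division gives 3,652 / 12,142, or roughly 30%.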
Thanks mhendo. 80 games at some position other than Center for Mays is very interesting. And I am surprised that only 30% are won by other than the starter. Really fascinating data.