Ask the guy who's downloaded all the Retrosheet baseball data into a database.

So, in yet another one of my creative attempts to avoid doing any real work, i was playing around on Retrosheet and decided to download all their game logs and import them into an Access database table.

I now have a single table with 184,147 records, encompassing regular season games starting in 1871 and going all the way through to the end of 2004. It’s a lot of fun playing around with it, asking it for the sort of meaningless statistics that TV commentators seem to love so much.

For example, did you know that, in night games played between the Baltimore Orioles and National League teams in the years 1998-2004, Baltimore pitchers threw a total of 10 complete games? Of those 10 games, 9 were at home and 1 was on the road, and 9 were victories while 1 was a loss.

Or, that during Cal Ripken’s career, in games where Ripken played shortstop and batted at number 5 in the order, the Orioles won 63 games by 4 runs or more?

So, tell me what you want to know. I’ll do my best to come up with queries (either using the Access query builder, or trying to write my own SQL statements) that will answer your question.

To give you an idea of the information at my disposal, here is a guide to the statistics contained in the Retrosheet game logs.

By the way, if there are any database gurus out there who have any suggestions as to how i might organize this data more efficiently than in just one massive table, i’d be happy to hear them. Part of the reason to download this data in the first place was to give me a chance to practice my rudimentary knowledge of SQL. I’m really no database expert.
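For what it's worth, the usual database answer to "one massive table" is to normalize it: a lookup table for team codes, one row per game, and one row per starting-lineup slot instead of 18 sets of lineup columns. Here's a minimal sketch using Python's built-in sqlite3 module. Every table and column name is made up for illustration; none of them come from Retrosheet or from the Access database described above.

```python
import sqlite3

# Hypothetical normalized layout (all names are made up for illustration):
# a lookup of team codes, one row per game, and one row per starting-lineup
# slot, instead of a single wide table with 18 sets of lineup columns.
SCHEMA = """
CREATE TABLE franchise (
    team_code    TEXT PRIMARY KEY,    -- e.g. 'BAL', 'NYA'
    franchise_id TEXT NOT NULL        -- groups renamed or relocated teams
);
CREATE TABLE game (
    game_id   INTEGER PRIMARY KEY,
    game_date TEXT NOT NULL,          -- 'YYYY-MM-DD'
    game_num  INTEGER NOT NULL,       -- 0 = single game, 1/2 = doubleheader
    day_night TEXT,                   -- 'D' or 'N'
    visitor   TEXT NOT NULL REFERENCES franchise(team_code),
    home      TEXT NOT NULL REFERENCES franchise(team_code),
    vis_runs  INTEGER,
    home_runs INTEGER
);
CREATE TABLE lineup (
    game_id   INTEGER NOT NULL REFERENCES game(game_id),
    team      TEXT NOT NULL,
    bat_slot  INTEGER NOT NULL CHECK (bat_slot BETWEEN 1 AND 9),
    player_id TEXT NOT NULL,          -- e.g. 'maysw101'
    position  INTEGER,                -- scoring number, e.g. 1 = P, 8 = CF
    PRIMARY KEY (game_id, team, bat_slot)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables in an open SQLite connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print([row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```

The lineup table is the main payoff: "games where player X started at position Y" becomes a single WHERE clause instead of a check across 18 columns.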

How many home runs were hit by National League pitchers during Tuesday night games between 1957 and 1983? :stuck_out_tongue:

Heh, mhendo, that’s pretty dedgum cool. Yes, you’ve got hundreds of future work hours quite suddenly at risk.

Let’s see… how about something germane to tonite’s game? Help me formulate something pertinent cuz left to my own devices my question’s gonna suck.

Ummm… how many times (or what percentage of the time) does an NL wildcard team win the LCS in 5, 6 and 7 games? If that needs to be improved upon, please be my guest.

Sorry guys.

You both asked questions that the data can’t answer. It doesn’t contain individual player stats, so i can’t answer World Eater’s question. And it’s regular season games only (no postseason), so lieu’s question is out.

Have a look at the second link in the OP and it will give you an idea of what stats are available.

How many of them were with full counts?

How many with full counts, a runner on second, and 1 out?

Ok, ok I’ll quit while I’m ahead. I have a sinking feeling you’re going to punch me next time we go for a drink!

Ok, a serious question. Is it possible to determine which team historically has the best winning percentage in the second game of a doubleheader?

Wow. I think that it might be. The main limitation on this will be my SQL query-building skill. Let me see what i can do.

Well, it took a bit of messing around, but i’ve finally got some figures.

In order to make it manageable (at least for someone with my low level of database knowledge), i restricted the time period to the years since 1970. That way, i didn’t have to deal with a whole bunch of franchise name changes. There were a few, but not enough to make life much more difficult.
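One way to handle franchise moves without restricting the years is a small lookup that maps each older team code to the code the same franchise used at the end of 2004. A sketch, assuming Retrosheet's CAL and WS2 codes for the California Angels and the 1961-71 Washington Senators; the mapping is illustrative and deliberately incomplete, so it would need checking against Retrosheet's own team list. In Access, the same idea would be a two-column table joined to the game logs.

```python
# Hypothetical, deliberately incomplete mapping from older team codes to the
# code the same franchise used at the end of 2004.
FRANCHISE_OF = {
    "CAL": "ANA",  # California Angels, later the Anaheim Angels
    "WS2": "TEX",  # 1961-71 Washington Senators, moved to Texas for 1972
}


def franchise(team_code: str) -> str:
    """Return the franchise's 2004 code; unchanged codes pass through."""
    return FRANCHISE_OF.get(team_code, team_code)


# Usage: tally records by franchise rather than by raw team code.
print(franchise("CAL"), franchise("NYA"))  # ANA NYA
```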

Anyway, in the period since 1970, the team with the best record in the second game of a doubleheader is the Florida Marlins, with a record of 14-8, or .636. The team with the worst record in these situations is the Tampa Bay Devil Rays, with a record of 3-8, or .273.

Now, it’s clear that neither of those teams has played too many doubleheaders. If we look only at teams who have played more than 100 such games, then the team with the best record is the New York Yankees at 132-80 (.623), and the worst is the Houston Astros at 55-79 (.410).

Here’s the full sheet, in order from worst to best:

Team  Wins  Losses   Pct.
TBA      3       8   .273
HOU     55      79   .410
TOR     43      60   .417
ANA     57      76   .429
SEA     28      36   .438
ATL     89     111   .445
CHN     92     114   .447
DET     95     113   .457
MIL    107     125   .461
COL     14      16   .467
CLE    140     153   .478
NYN    126     137   .479
TEX     92     100   .479
CHA    108     117   .480
OAK     94      98   .490
SDN     97     100   .492
PHI    114     114   .500
MON    119     118   .502
CIN     87      82   .515
BAL    297     278   .517
KCA    100      91   .524
SFN     91      82   .526
LAN     51      45   .531
MIN     97      83   .539
BOS    114      96   .543
SLN     99      78   .559
PIT    138     105   .568
ARI      4       3   .571
NYA    132      80   .623
FLO     14       8   .636
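For anyone who wants to reproduce a sheet like this, here's a sketch of the query in SQLite, run from Python. It assumes a games table with hypothetical column names; game_num stands in for the Retrosheet field that numbers the games of a doubleheader (2 for the second game). Each game is counted once for each team via UNION ALL, ties count as neither a win nor a loss, and min_games=101 reproduces the "more than 100 games" cut. As far as I know, Access SQL doesn't support WITH, so there the UNION part would be saved as a separate query.

```python
import sqlite3

# Assumed table: games(year, game_num, visitor, home, vis_runs, home_runs).
# game_num mirrors Retrosheet's game-number field: 2 = second game of a
# doubleheader. Column names are hypothetical.
QUERY = """
WITH results AS (
    SELECT visitor AS team,
           CASE WHEN vis_runs > home_runs THEN 1 ELSE 0 END AS won,
           CASE WHEN vis_runs < home_runs THEN 1 ELSE 0 END AS lost
    FROM games
    WHERE game_num = 2 AND year >= :since
    UNION ALL
    SELECT home,
           CASE WHEN home_runs > vis_runs THEN 1 ELSE 0 END,
           CASE WHEN home_runs < vis_runs THEN 1 ELSE 0 END
    FROM games
    WHERE game_num = 2 AND year >= :since
)
SELECT team,
       SUM(won)  AS wins,
       SUM(lost) AS losses,
       ROUND(1.0 * SUM(won) / (SUM(won) + SUM(lost)), 3) AS pct
FROM results
GROUP BY team
HAVING SUM(won) + SUM(lost) >= :min_games
ORDER BY pct, team
"""


def second_game_records(conn, since=1970, min_games=1):
    """Return (team, wins, losses, pct) rows, worst record first."""
    return conn.execute(QUERY, {"since": since, "min_games": min_games}).fetchall()
```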

Well, as usual, the Mets are losers. :stuck_out_tongue: Since I’ve been riding Yankee coattails for some time though, this is good news. I wonder what the correlation, if any, is.

There was a game this year in which the Yankees beat the Devil Rays 20-11. I noticed that in that game, both teams had 8-run leads at some point. In another thread, I asked if anyone knew the last time there was a game in which both teams had 8-run leads. Rufus Xavier pointed me to a 1999 game between Cleveland and Tampa Bay in which this happened. I guess there’s something about Tampa Bay.

Can you get me stats on how often this has happened in MLB history? I suppose a good start is to look at games in which at least 24 runs were scored, with at least 16 by one of the teams.

Can you provide a complete list of games since 1973, in AL parks, where a manager elected not to use a DH at the start of the game and allowed the pitcher to bat? I’m aware that Ken Brett did it a couple of times for the White Sox and I think Fergie Jenkins did it at least once but I don’t know if there were more.

It’s a difficult thing to do because of the way that the inning-by-inning stats are laid out. They are in a single cell, as one long number. For example, the home team line score will be in one cell, and might look like 41000301x, with the visiting team line score in another cell saying 200030110. Because of this layout, i don’t know how to track the inning-by-inning swings within each game.
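One way to get inside those cells is to split each line score string into a list of runs per inning. A sketch in Python; the "x" handling follows the example above, while treating a parenthesized number such as (10) as a single inning of 10 or more runs is an assumption about how Retrosheet writes big innings and should be checked against their documentation.

```python
from __future__ import annotations

import re

# One token per inning: a single digit, "x" for an unplayed bottom half, or
# (assumption) a parenthesized number for an inning of 10 or more runs.
TOKEN = re.compile(r"\((\d+)\)|(\d)|([xX])")


def parse_line_score(line: str) -> list[int | None]:
    """Split a line score such as '41000301x' into runs per inning.

    Returns a list with one entry per inning; None marks an inning that
    was not played.
    """
    innings: list[int | None] = []
    pos = 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected character {line[pos]!r} in {line!r}")
        multi, single, _unplayed = m.groups()
        if multi is not None:
            innings.append(int(multi))
        elif single is not None:
            innings.append(int(single))
        else:
            innings.append(None)
        pos = m.end()
    return innings


print(parse_line_score("41000301x"))  # [4, 1, 0, 0, 0, 3, 0, 1, None]
print(parse_line_score("200030110"))  # [2, 0, 0, 0, 3, 0, 1, 1, 0]
```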

I did look for all games in which the winning team scored at least 16 runs and the losing team scored at least 8. That gave me about 700 games in total. Now, the number that i had to look through was reduced by the fact that very few of the games before WWI have inning by inning line scores available. When you only have the final result, it’s impossible to say what the in-game swings were.
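The filter itself is straightforward. A sketch in SQLite with hypothetical column names; the two-argument MAX() and MIN() are SQLite's per-row scalar functions, so in Access the same test would need IIf() instead. A tied game that somehow reached 16-16 would also slip through, which seems harmless for this purpose.

```python
import sqlite3

# Assumed table: games(game_date, visitor, home, vis_runs, home_runs).
# MAX() and MIN() with two arguments are SQLite's per-row scalar functions.
SLUGFEST_QUERY = """
SELECT game_date, visitor, home, vis_runs, home_runs
FROM games
WHERE MAX(vis_runs, home_runs) >= :win_min
  AND MIN(vis_runs, home_runs) >= :lose_min
ORDER BY game_date
"""


def slugfests(conn, win_min=16, lose_min=8):
    """Games where the higher score is >= win_min and the lower >= lose_min."""
    return conn.execute(
        SLUGFEST_QUERY, {"win_min": win_min, "lose_min": lose_min}
    ).fetchall()
```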

For the games where i did have the line scores, i scanned them one by one, looking for cases where the losing team went up by a big margin early in the game. So i basically looked for line scores in which the losing team had big numbers early, and the winning team had small numbers early.

I don’t claim that this method was perfect, or that i couldn’t have missed something, but as far as i could see the only game in which both teams led by 8 runs or more was the Tampa-Cleveland game you mentioned.

There were some games in which the losing team jumped out early, but the winning team were never actually eight runs behind. For example, on July 11, 1947, St. Louis scored 9 in the first two innings and ended up losing to the New York Giants 17-9. But St. Louis never led by 8, because the Giants scored 11 of their 17 runs in the first two innings also.

Sorry i can’t be more definitive. If anyone knows how i might extract this sort of data from within those line score categories, i’d love to hear it.
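One possible approach: once a line score is split into runs per inning, the biggest lead each team held can be computed exactly from half-inning totals. Only one team bats in a half-inning, so a team's lead can only grow during its own turn at bat. The sketch below takes each team's runs per inning as lists of integers, with None for an unplayed bottom half (the x in a home line score); turning the strings into those lists is assumed to happen beforehand. A game answers the question when both returned values are 8 or more.

```python
from __future__ import annotations


def max_leads(visitor: list[int], home: list[int | None]) -> tuple[int, int]:
    """Return (largest visitor lead, largest home lead) during a game.

    visitor and home hold runs per inning; None marks an unplayed bottom
    half. Only one team bats per half-inning, so each team's lead peaks at
    the end of one of its own half-innings; checking after every half is
    therefore enough.
    """
    vis_total = home_total = 0
    vis_best = home_best = 0
    for inning, vis_runs in enumerate(visitor):
        vis_total += vis_runs  # top half
        vis_best = max(vis_best, vis_total - home_total)
        if inning < len(home) and home[inning] is not None:
            home_total += home[inning]  # bottom half
            home_best = max(home_best, home_total - vis_total)
    return vis_best, home_best


print(max_leads([2, 0, 0, 0, 3, 0, 1, 1, 0], [4, 1, 0, 0, 0, 3, 0, 1, None]))  # (2, 3)
```

Using the two line scores quoted earlier in the thread, the visitors' biggest lead was 2 and the home team's was 3, so that game wouldn't qualify.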

Well, if i’ve done this right, then i’ve come up with four games. You were correct about Brett and Jenkins. But there was one more.

Date            Teams      Pitcher (Team)         Batting order
July 6, 1976    CHA@BOS    Ken Brett (CHA)        9
Sept. 23, 1976  MIN@CHA    Ken Brett (CHA)        8
Sept. 27, 1975  CAL@OAK    Ken Holtzman (OAK)     9
Oct. 2, 1974    TEX@MIN    Fergie Jenkins (TEX)   9
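The check behind a list like this boils down to: AL home park, 1973 or later (the first DH season), and a player listed at position 1 (pitcher) in a starting batting order. A sketch in SQLite, assuming the lineups have been split out into one row per batting slot; all table and column names are hypothetical. With the original one-table layout, the same test has to be repeated across all 18 lineup position columns.

```python
import sqlite3

# Assumed tables (hypothetical names):
#   games(game_id, game_date, year, visitor, home, home_league)
#   lineup(game_id, team, bat_slot, player_id, position)
# with one lineup row per starting batter. Position 1 is the pitcher.
PITCHER_BATTING_QUERY = """
SELECT g.game_date, g.visitor, g.home, l.team, l.player_id, l.bat_slot
FROM games AS g
JOIN lineup AS l ON l.game_id = g.game_id
WHERE g.home_league = 'AL'
  AND g.year >= 1973          -- first season of the AL designated hitter
  AND l.position = 1          -- starting pitcher appears in the batting order
ORDER BY g.game_date
"""


def pitchers_batting_in_al_parks(conn: sqlite3.Connection) -> list:
    """Return one row per starting pitcher listed in an AL-park lineup."""
    return conn.execute(PITCHER_BATTING_QUERY).fetchall()
```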

OK, this got me out of Lurker mode. I agree this is fascinating.

I just downloaded the All Star game data so I could see what it looked like.

I have a couple of requests for you:

How many games did Willie Mays (maysw101) start, but not in centerfield (position code 8)? We could do this for any number of players. In fact, how many games did Babe Ruth start at 1st base (position code 3)?

How many games in each year 1900-1930 were won by a pitcher other than the starting pitcher? My accuracy-challenged perception is that there were not a whole lotta relief pitchers back in the day, so pitchers were starters and they either won or lost. Actually, that same stat for the years 2000-2004 would be an interesting counterpoint, too.


Mmmm. All three were pitchers, all three played for one or the other Chicago team. I don’t think that’s a coincidence.

Unfortunately, gaps in the data provided by Retrosheet mean that i was only able to answer parts of your questions.

I put together a query asking how many games Willie Mays started at a position other than center field. Then, just to be sure, i also put together a query asking how many games he DID start in center.

The totals were:

1691 in center
80 in a different position.

But that adds up to only 1,771 games, well short of Mays’ career total of 2,992.

The problem is that all position players are only listed from 1960 onwards. Those fields are completely blank for the period before. So my figures for Mays’ positions miss the first nine years of his career.
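A quick way to make that gap visible is to ask for the earliest year a player shows up in the lineup fields alongside the position counts. A sketch, assuming a hypothetical starts table with one row per game started; SQLite evaluates position = 8 as 1 or 0, so SUM() counts the matches.

```python
import sqlite3

# Assumed table: starts(player_id, year, position), one row per game a
# player started, built from the lineup fields. Position uses the usual
# scoring numbers (8 = center field). Names are hypothetical.
POSITION_QUERY = """
SELECT SUM(position = 8)  AS in_center,
       SUM(position <> 8) AS elsewhere,
       MIN(year)          AS first_year_listed
FROM starts
WHERE player_id = :player
"""


def start_positions(conn, player_id="maysw101"):
    """Return (starts in CF, starts elsewhere, first year with lineup data)."""
    return conn.execute(POSITION_QUERY, {"player": player_id}).fetchone()
```

For a player whose career began before 1960, a first_year_listed of 1960 is the tell that the early seasons are missing rather than spent elsewhere.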

For the same reason, i can’t tell you how many games the Babe started at first.

Also, the only listing for pitchers in the early 20th century is for starting pitchers. Winning and losing pitchers aren’t listed for that period, so i’m afraid i can’t answer that question either. Sorry.

This spreadsheet/database really is strongest for the period since 1960. Retrosheet is adding new data all the time, but there are still many gaps for the earlier years.

I was able to answer the last part, though:

For the period 2000-2004, there were 12,142 games played altogether in the majors.

Of these, 3,652 (~30%) were won by a pitcher other than a starting pitcher.
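The test behind that figure is whether the winning pitcher differs from the winning team's starting pitcher. A sketch in SQLite with hypothetical column names; games with a blank (NULL) winning pitcher, as in the early years, simply don't count as relief wins. With the figures above, 3,652 / 12,142 ≈ 0.301, or about 30%.

```python
import sqlite3

# Assumed columns (hypothetical): year, vis_runs, home_runs, vis_starter,
# home_starter, win_pitcher, with pitchers stored as Retrosheet player IDs.
RELIEF_WIN_QUERY = """
SELECT COUNT(*) AS games,
       SUM(CASE
             WHEN home_runs > vis_runs AND win_pitcher <> home_starter THEN 1
             WHEN vis_runs > home_runs AND win_pitcher <> vis_starter THEN 1
             ELSE 0
           END) AS relief_wins
FROM games
WHERE year BETWEEN :first AND :last
"""


def relief_win_share(conn, first=2000, last=2004):
    """Return (games, relief wins, share of games won by a non-starter)."""
    games, relief = conn.execute(
        RELIEF_WIN_QUERY, {"first": first, "last": last}
    ).fetchone()
    return games, relief, (relief / games if games else 0.0)
```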

Now this is what I call a delurk. 16 posts in 5 years, wow!

And I pay!

Thanks mhendo. 80 games at some position other than center for Mays is very interesting. And I am surprised that only 30% are won by someone other than the starter. Really fascinating data.


What percentage of doubleheaders are split?

Thanks mhendo. I certainly appreciate the effort.