Microsoft Access and the Lahman Baseball Database

If someone knows what I am talking about, I will be amazed, but armed only with my small brain and an Access Manual, I can’t figure this out.

The database is here.

I want to get a list of the career stats for all active players. In other words, I want to narrow the field to all active players and for each player to have each year of his active stats. If someone knows the database and Access, could you clue me in? I am lost.


Ahhh, Lahman.

I have a copy on my computer that I have done a few things with. Let me play with it for a little bit and I’ll see what I can come up with.

I have gone through the initial steps. Perhaps there are better ways to do this, but what I did works to some extent. It will leave you with queries populated with the data you want. Creating the reports to pretty them up (and give you such things as career totals, batting average, OBP, SLG%, etc) I leave up to you. Just use the following SQL statements and paste them into a table or query as noted. Each post will be a separate SQL statement for a separate table or query.

Confused yet? I might be. Let me know if this helps or makes sense.

Since hitters and pitchers are a different breed, I used 2 different actions to get the relevant data.

To isolate hitters active in 2001, I used the following SQL statement and saved it as a query (name it whatever you want):

SELECT Batting.LahmanID, Batting.Year, Master.LastName, Master.FirstName INTO 2001Hitters
FROM Batting RIGHT JOIN Master ON Batting.LahmanID = Master.LahmanID
WHERE (((Batting.Year)=2001))
ORDER BY Master.LastName;

On to the pitching stats for 2001. On a side note, I am using the Lahman database only updated through 2001. If there is a newer version with 2002 numbers, adjust all SQL statements that use 2001 and change it to 2002. Obviously the last table will have pitchers in it, but it is going to be used just for their hitting stats (although outside of Mike Hampton, most of those are pretty useless).

This will isolate pitchers active in 2001. Save it as a query:

SELECT Pitching.LahmanID, Pitching.Year, Master.LastName, Master.FirstName INTO 2001Pitchers
FROM Pitching RIGHT JOIN Master ON Pitching.LahmanID = Master.LahmanID
WHERE (((Pitching.Year)=2001));

At this point, you now have 2 new tables that contain only pitchers or hitters that played in 2001. To do that requires another pair of queries that will take the isolated players and bring in their yearly career stats.

For the hitters, create a new query and use the following SQL statement:
SELECT [2001Hitters].LastName, [2001Hitters].FirstName, Teammaster.City, Teammaster.Nickname, Batting.Year, Batting.G, Batting.AB, Batting.H, Batting.[2B], Batting.[3B], Batting.TB, Batting.HR, Batting.RBI, Batting.R, Batting.SB, Batting.CS, Batting.BB, Batting.SO, Batting.HBP
FROM (2001Hitters INNER JOIN Batting ON [2001Hitters].LahmanID = Batting.LahmanID) INNER JOIN Teammaster ON Batting.Team = Teammaster.Team
ORDER BY [2001Hitters].LastName, [2001Hitters].FirstName;
I didn’t include some stats (Sac flies for instance). Obviously, you can remove or add what you want, but this gives a pretty good overview of the most noted total stats (note: this does not include averages. That’s to be written in the report.)

And now, for the pitchers. Once again, plop the SQL into a query. Most commonly referred to stats are included. Change as you wish.

SELECT [2001Pitchers].LastName, [2001Pitchers].FirstName, Teammaster.City, Teammaster.Nickname, Pitching.Year, Pitching.G, Pitching.W, Pitching.L, Pitching.GS, Pitching.CG, Pitching.SH, Pitching.SV, Pitching.IP, Pitching.H, Pitching.ER, Pitching.HR, Pitching.BB, Pitching.SO, Pitching.ERA
FROM (2001Pitchers INNER JOIN Pitching ON [2001Pitchers].LahmanID = Pitching.LahmanID) INNER JOIN Teammaster ON Pitching.Team = Teammaster.Team
ORDER BY [2001Pitchers].LastName, [2001Pitchers].FirstName;
What do these 4 queries leave you with? A pair of tables that contain the career stats, on a year by year basis, for all players active in 2001. You will now need to write reports to pretty things up. In the reports, I would suggest grouping by the following (specifically using hitters).
Group by last name (set visibility property to “No”)
Group by first name (leave visible. Put the first and last name fields in this section)
Group by year (put year and all other fields in here. This is also where you can calculate yearly averages, OBP, OPS, etc)
Use the first name footer section where you can calculate the career stats based off of the yearly stats. Use the same formulas for Avg, etc that you created above. Just paste them in this section.

This should work, but I flew through it pretty quickly. Let me know how things turn out.

Mullinator, I will work on this tonight.
::crosses his fingers::

Hercules! Hercules!

It worked. Of course, now I have to get greedy and try to get the position added. For some reason when I do that (from the fielding menu, which is the only place a see the position field) I end up getting nine entries for each year for each player. Any ideas what I am doing wrong?

Here is the SQL (although I had to cheat and do it using the wizard because I didn’t know how to input the correct parameters):

SELECT [2002 Hitters].playerID, Batting.yearID, Batting.teamID, Batting.lgID, Batting.G, Batting.AB, Batting.R, Batting.H, Batting.[2B], Batting.[3B], Batting.HR, Batting.RBI, Batting.SB, Batting.CS, Batting.BB, Batting.SO, Master.nameLast, Master.nameFirst
FROM (Master INNER JOIN [2002 Hitters] ON Master.playerID = [2002 Hitters].playerID) INNER JOIN Batting ON Master.playerID = Batting.playerID

I named this query “2002 Hitters Cumulative.”
When I try to add the position from the fielding table, I get this:

SELECT [2002 Hitters Cumulative].playerID, [2002 Hitters Cumulative].nameFirst, [2002 Hitters Cumulative].nameLast, [2002 Hitters Cumulative].yearID, [2002 Hitters Cumulative].teamID, [2002 Hitters Cumulative].lgID, Fielding.POS, [2002 Hitters Cumulative].G, [2002 Hitters Cumulative].AB, [2002 Hitters Cumulative].R, [2002 Hitters Cumulative].H, [2002 Hitters Cumulative].[2B], [2002 Hitters Cumulative].[3B], [2002 Hitters Cumulative].HR, [2002 Hitters Cumulative].RBI, [2002 Hitters Cumulative].SB, [2002 Hitters Cumulative].CS, [2002 Hitters Cumulative].BB, [2002 Hitters Cumulative].SO
FROM (Master INNER JOIN [2002 Hitters Cumulative] ON Master.playerID = [2002 Hitters Cumulative].playerID) INNER JOIN Fielding ON Master.playerID = Fielding.playerID;

Thanks Mullinator. You are a genius!

Tried to do some quick troubleshooting on your above problem, but I am getting some weird stuff showing up so I might try to play around with it in a different way.

Methinks I have moved to the top of someone’s favorite poster list.

If I figure out another query for position, I’ll post it. But, I have tended to find that including position makes things needlessly complicated based on how Lahman has the database set up.

I also think this thread has brought me the most joy on this board, even if we are the only 2 reading it.

Clarification question on position.

Are you looking to show their primary position or to break down their stats by position played? For instance, take Kurt Abbott in 1993 (for no reason other than that he appears at the top of the query results).

In the fielding table, the '93 Abbott played 2 games at 2B, 13 at OF, 6 at SS.
In the Master table, his position is listed as SS, which is his primary career position.

Obviously, the choice will greatly affect how the query is written and the subsequent results.

Just curious as someone that writes a lot of SQL for ASA and Oracle databases. …

Could you save yourself some typing by using aliases such as:
SELECT [a].playerID, [a].nameFirst, [a].nameLast, [a].yearID, [a].teamID, [a].lgID, b.POS, [a].G, [a].AB, [a].R, [a].H, [a].[2B], [a].[3B], [a].HR, [a].RBI, [a].SB, [a].CS, [a].BB, [a].SO
FROM (Master INNER JOIN [2002 Hitters Cumulative][a] ON Master.playerID = [a].playerID) INNER JOIN Fielding b ON Master.playerID = b.playerID;

I don’t use Access but I thought their syntax was to put the column names, not table names, in . Is this incorrect?

Ideally, I would like for it to show the primary position played during a given year. I don’t need stats broken out by position.

I don’t know because I use the design wizard and then look at the results in the SQL view. I tried it to see what happens, but when I try to run the query, it asks me for a parameter value for [a].playerID, etc. and I don’t know what to put to get complete results.