I’m a member of a database message board, and i may ask this question there later, but i know people here better, and i know there are people here with database experience, so i thought i’d give the Dope a try first.
I’m putting together a little database of websites related to history (which is my field of study and teaching). In particular, i’m focusing on sites that have primary sources in history (documents, letters, manuscripts, video, audio, etc.). I would like this database to be searchable in a variety of ways, using a variety of different categories.
I may come back later with questions about how to implement the more complex stuff, but right now i want to start with more basic questions of setting up the tables, and of organizing the data properly.
So, first of all, the table containing the websites themselves will look something like this:
Sites
SiteID
SiteName
SiteURL
SiteHost
StartYear
EndYear
I’m not an expert, but it seems to me that this conforms with the ideals of 1NF. (ETA: actually, SiteHost should probably be moved off to a table by itself, because a particular university or company might host more than 1 site)
The SiteID will be an autoincrement, and will also be the Primary Key for the table. I could, theoretically, make the URL the PK (as URLs are unique), but the length and complexity of some URLs would make that too messy, i think.
The StartYear and EndYear functions will contain the range of dates covered by each website. So, for example, a site that covers the Great Depression, New Deal and recovery might have StartYear=1929 and EndYear=1945. I figure that this will make it easy for the user (me) to search for sites within a particular date range.
Anyway, in addition to the main table, i want to have other tables that break each site into categories for classification and searching. Categories i would like to have include:
[ul]
[li]Geographical area: which region, nation, and states/provinces does the site cover?[/li][li]Time periods: these would be named time periods (in contrast to the StartYear and EndYear values), such as “Revolutionary,” “Antebellum,” “Progressive Era,” etc.[/li][li]Topic areas or themes: using standard historians’ terms, such as “Social History,” “Sex/Gender,” “Race,” “Economic History,” “Historiography,” etc.[/li][li]Document types: a selection like the list i gave in the introduction, including Letters, Maps, Images, etc., etc.[/li][/ul]
There may (and probably will) be other things, but you get the idea.
A key problem i’m confronting here is that the relationships between the main Sites table and these category tables will have to be many-to-many. After all, a site might cover more than one geographical region, more than one time period, etc., etc.
I’m cognizant of what needs to be done to set up a many-to-many relationship. I need an intermediate table with foreign keys that correspond to the primary keys of the two main tables. So, to link my Sites table in a many-to-many relationship with, say, a Nations table, i would have:
Sites SiteNation Nations
SiteID SiteID (fk) NationID
SiteName NationID (fk) NationName
SiteURL
SiteHost
StartYear
EndYear
My first issue here is one that i can’t seem to find a definitive answer on anywhere. It’s the question of whether the linking table, SiteNation, needs to have its own auto-increment primary key, or whether it’s sufficient to just use the two columns themselves to form a compound primary key. Some sites and books i’ve read insist that each table should always have a single-column primary key, while others say that a composite one is fine, especially for tables like this. Anyone have any advice on this issue?
Second, and much more difficult, is the problem i’m having working out how to deal with the question of geographic locations in my database. I’d like to identify, where possible, which nations AND which states/provinces a website covers. So, for example, a site about relations between Europeans and Indians in the Great Lakes region might cover the nations France, England, Canada, and the United States, as well as the States/Provinces of Ontario, Quebec, Michigan, Illinois, etc., etc. You get the picture.
What i’m wondering is how to deal with this geographical information in tables. I could, theoretically, have a single table with all geographic entities: nations, states, etc. It would be a single long list, and would have a many-to-many relationship with the Sites table.
But one problem with this is that it doesn’t establish any relationship between the states and the nations. It seems to me that the data should describe, somewhere, the fact that Michigan is not only a geographic entity in its own right, but that it is a part of the United States of America. Of course, i could have a table with nations, and then a bunch of other tables, each containing the states of one nation, and each nation could them be linked to its sub-table. But i don’t know how to do this, and also to connect those sub-tables back to the main Sites table.
Am i being clear about what i’m trying to do here, and what my problems are? If not, please ask for clarification, and i’ll try to explain further.
I’ll leave it there for now. If anyone can offer advice, or point me to somewhere that i can find some answers, i’d be grateful.