Hello,
I hope I can get a bit of advice regarding database design. I’m a complete amateur at this, but being the only one with some skills, it’s fallen to me to do the implementation. Hopefully I can explain my question in an organized fashion:
Information to track
There are five (though there will soon be more) surveys. Each survey has some slight variation. In short, the survey questions can be divided into three categories:
[ul]
[li] Biographical information (name, address, etc.)[/li][li]Survey questions (e.g. “do you like filling out surveys?”, and “do you own any pets?”)[/li][li]Item checkboxes (e.g. "Which of the following pets do you own Cat__ Dog__ Llama __ Platypus __)[/li][/ul]
The biographical information fields will never change and are identical across surveys.
The survey questions may change in the future, but right now half are all identical across all surveys and half vary between surveys, though there is some overlap.
The item checkboxes will definitely change over time. There is some overlap between surveys (i.e. “cat” appears on all surveys at the moment, but some have “cat” and “dog” while others have “cat” and “llama”).
How it’s set up now
There is a main table to collect the common biographical information, and a sub-table for each individual survey.
[ul]
[li]tbl_main has all biographical fields and all questions that are identical on all surveys[/li][li]tbl_survey1 … 2 … 3 … N have fields for those questions that vary and for the checkboxes of the respective survey. There is also a foreign key that relates the record to tbl_main. [/li][/ul]
This seems somehow inelegant, but I’m not quite proficient enough to say why. It feels wrong to have several fields repeating across sub-tables. What if I want a query that, for example, pulls all people in California who don’t like surveys and have cats or llamas? I think the problems with that are somewhat self evident: I’d need to know/remember the fields of all tables in order to know which to include in the query.
But I’m not sure what the best way to redesign the structure would be (hence, why I’m posting here).
I’m leaning towards just two tables: One table with the unchanging, permanent biographical information in it. Then a second table with a field for each question, regardless of whether the question repeats across surveys. However… wouldn’t doing it this way mean that over time, the second table will grow and grow and grow in the number of fields?
Or should each question get its own table? That is, a tbl_cats with the only field being the primary key of tbl_main to relate it to a responder. Each one-field table would get populated by a responder-ID. However… wouldn’t that mean I’ll eventually end up with hundreds of tables? Some complicated queries might be a bit, well, extra complicated, no?
So… are any of the above methods the “proper” way to design this structure? Is there another, better way I should go about this?
Oh, if it makes a difference, this database is built with PHP and MYSQL, interacting through the web.
Thanks for any help you may have—even if it’s just an encouraging word!
Rhythm