Question about database best practice design. Number IDs or Names?

Definitely use a surrogate key with some sort of auto-increment. Aside from the previously discussed technical problems with using SSN as an ID, there are also confidentiality issues. That’s why all the big corps have switched to only using last-four-digits of the SSN as a verification, but not as a primary ID.

If you’re actually handling personally identifiable medical data in the USA…you are up on HIPAA requirements for electronic data, n’est-ce pas? And all of the tracking requirements?

This +1 - thou shalt not use an intelligent key. It sounds right, but it is a bad idea. I have lost track of the times I have had to try to explain this to a non-technical manager.

I can’t remember the name of the guy who published the Rules of Database Design but this violates one of them.

Regards,
Shodan

I work in a healthcare software business. Do not use SSN as a key for anything in healthcare.
What we ran into many many times, is the problem of illegal aliens coming in for healthcare and passing around the same SSN card as their ID.
We would have up to 25 persons with the “same SSN”.
This is bad, especially when it comes to lab orders, pharmacy, and medical history.

I’m not actually doing anything related to patients or health care. I just thought it would be a convenient example.

Thanks for everyone’s contributions. Was an interesting and helpful thread to read.

SQL Server tends to optimize indexes better if they are numeric. So I recommend that you create an identity field as a patient id and use it as the primary key for the patient record. You could also have indexes on the SSN, etc., to aid searches. All relationships would be established using the patient id number, and the relationships would be better optimized.

Noooo!! HIPAA requires that data to be effectively protected (and, for most laws - like HIPAA - you need data truncation at minimum). Of course, truncating SSNs increases the likelihood of data collision. Also, a healthcare institution cannot legally deny critical care to someone just because there’s an apparent collision with another patient’s SSN (like… I stole your SSN, but my arm fell off - please stop the bleeding).

Data encryption is the best solution for any fields that may include PHI/PII. The master key/unique identifier can be left plaintext.

Please don’t build a healthcare (or other sensitive DB) without at least field level encryption… learn from the lessons that others have breached (errr… “teached”) us.

Codd, and the OP should read up on Normal Forms.

I don’t think this is going to happen. I read somewhere that the US and Canada like to keep their jurisdictional abbreviations distinct, so that e.g. someone in the US can write a letter to “123 Main St, Calgary, AB” and forget to specify “CANADA”, and not have it accidentally sent to Alabama. According to: Postal Abbreviations for Provinces in Canada , “PE” is the postal code for Prince Edward Island.

reifly mentioned (and linked above) but there are several important database design criteria:

Normal FOrm essentially means removing redundant data. Taken to the extreme (3rd Normal Form, IIRC) this means that a name in your database is simply a list of keys: This guy is
000001 0075 8003
where another table lists just names:
0075 John
8003 Smith
ANd 000001 is that particular John Smith’s special ID.
So we don’t waste space repeating “John” 50,000 times.
Logically, for things like state names, this is obvious; PA or NY repeated for all 1,000,000 clients is a lot less wasted space than spelling them out in full; and a simple table can list the 50 states and their codes (and PR, DC, and 12 provinces and territories of Canada, etc.) Similarly, you would have a table of Medical History - say a list of pairs - 00001 066 Where 066 is a key in the table of medical procedures - “oh, look, this pair says John SMith has had a mastectomy.”

So for simple normalization, great - you eliminate a lot of redundant data. That’s one aspect of database design. So much for theory.

You also seem to ask about separating the simple from the sensitive; one table for ID info, another for medical history. Good idea and practical. Practical database design, as many examples above show, has may pitfalls:
I worked with a fully normalized database. Manual debugging is almost impossible, you ar forever looking up keys. God help you if something scrambles the data and you don’t notice for a while…
Every table is either data and a unique key, or tuples - key from Table A matched to key from table B (i.e. patient identity and medical procedure). When your design includes a lot of these tuples, it is a good idea to keep the keys as small as possile.
Many posts above point out the pitfalls of using real data for composite keys - SSNs are NOT unique, even sex can change, not to mention name, address, eye colour (contacts); DoB may even be misrepresented.

The obvious question is - what do you need? If the patient shows up at a desk, the reception/nurse will need to ask them who they are. Perhaps it becomes a game of 20 questions - what’s your name, DoB, when were you here last, current or previous address/phone, last procedure?(so you keep address/phone history list similar to procedure list). Short of DNA (which identical twins can share) there is nothing unique about a person except fingerprints or retina pattern; however, we hope your customers are not trying to conceal their identities as they might in a police line-up. Plus, anticipated customer base (2,000? Or 10,000,000?) has a bearing on how complex the

Most databases have the ability to generate unique keys. That’s the best situation, rather than trying to find a real-world unique key. Also, I find a lot of effort often spent programming for really wild cases; why not give someone a tool to correct patient info (heavily logged) rather than trying to code for sex-change possibility?

Indeed. I’ve seen some examples of databases that are probably perfectly 4th normal form, but that were an absolute and total pain to deal with because some simple operations required way more joins that was absolutely necessary.

E.g. making a city and state table, and joining from the employee table to it by some sort of code is just dumb, compared to just storing the guy’s address on his record, but you see that kind of thing all the time.

And when you start dividing your data even more finely, you can run into big problems when you encounter data that doesn’t match your preconceived notion of how that data is structured.

e.g.

If you store “Name” as a free text field, you can pretty much store any name that is in Roman characters, and if you make it a Unicode field (e.g. NVarchar in SQL Server), you can cover 99.9% of all personal names in the world.

If you have separate fields for “first”, “middle”, and “last”, you run into a problem when John Peter James Xavier MacDonald IV demands that he be listed in your database with his full moniker. What do you do then, fill in “Peter James Xavier” as his middle name?

If you are in North America and decide on an address structure with State/Province/Territory and postal code (smartly coded to handle both US and Canadian postal codes), you leave your DBA or system user scratching their head when the first Bolivian customer walks in the door who doesn’t have a state, provincial, or territorial code in his address. With a free text “Address” field, or a more generic “street address”, “national subdivision”, “country” and “postal code”, you get more flexibility, but make it more difficult to organize your data efficiently.

Erk. One project I was on ran into a customer rep complaining that our product, then in final Release Candidate checkout and Early Adopter delivery, wasn’t at his preferred spec of database normalization. IIRC, he threw a real hissy fit over it.

I have a friend who, after her marriage (next year, perhaps) will have a name something like “Lucretia April Fawn Borgia Longfellow Colburg.” Try fitting THAT into three 8-character fields.

Or to use a real-world-ish example, try updating a record from “Kate Middleton” to “HRH Catherine Elizabeth Middleton Mountbatten-Windsor, Duchess of Cambridge, &c.” And Og help you if you forget any part of her title.