Is it ever a good idea to use a meaningful primary key? If so, could you list some examples?
Such as social security number? Or MLS number?
IMHO it’s better to err on the side of caution and just use an auto-generated, incremental ID column as a primary key…but that’s just me. You just never know when things are going to change, or if you want to use the same structure for a different application with different types of (or no) numbers.
I’ve heard (possibly apocryphal) that SSN is a bad idea becasue they get reused, and in rare cases this can result in duplicates. And nowadays with identity theft I’m sure it’s much more common.
As for MLS number, that reminds me of an example mentioned in the original discussion thread on a programmer board. (I’m fighting the good fight and stating flat-out that you should never use one.) The example offered was ISBN. The fact that the ISBN format changed after 30+ years (from 9 to 13 digits) seems to be compelling reason to never use a meaningful number.
Who’s to say that MLS number will never change format?
Ah, yeas. This reminds me of a debate I had with another database designer regarding the importance of completely meaningless primary keys. It finally came down to the question of what does it mean to be meaningless at which point his head exploded.
Short answer is make 'em meaningless (auto-generated).
Although a duplicate SSN throwing some kind of flag or warning would be prudent, as long as it does not bring the whole system to a screeching halt…
Since ISBN#'s were mentioned, FYI Scholastic Inc has an internal inventory system that assigns incremented item numbers and does not use ISBN. In our case we often purchased the same book from multiple vendors at slightly different price points with the same ISBN. Proper return on investment calcs require being able to differentiate the stock in the system.
I use autogenerated IDs. I’ve revamped a number of dbs I’ve inherited or taken over to remove meaningful IDs, due to the problems caused.
Frex, duplicates when the clients originally swore there wouldn’t be, changes in the ID when they swore there wouldn’t be, etc.
A common one I’ve run into is apps set up with meaningful IDs based on internal usage only (e.g., using the network login as the db ID) and then later the clients decide they want to allow external access - but of course the external users won’t have network IDs.
I won’t say it’s never a good idea, but the times when it is are extremely few and far between IMO.
Around my office this is what’s known as a “religious argument.” Those who feel strongly on either side are rarely swayed by arguments by the other side. We’ve gone back and forth on meaningful vs. meaningless keys countless times.
Damnation to hell aside, what’s wrong with a hybrid key – that is, an auto-generated key prefixed to a meaningful key. Seems that if you want meaning to the key, any query that uses that info can easily truncate the left X digits (the auto part). If they change the ISBN or whatever format at a later date, you can always ignore the right X digits of the key.
'Course, I’m not sure what benefit that would have… in a large database, why would anyone want a meaningful key in the first place? If the saving of a single field has that much of an effect on performance, isn’t that suggesting another problem?
A primary key’s purpose is to prevent duplicate records in the database. If you have a meaningless primary key, then you are only preventing suplicates to the extent that your meaningless data is never duplicated…but what good is that?
using meaningless, auto-generated keys means I can insert the same record time and time again into the DB and it will always think there are no duplicates.
The examples you chose are simply the result of laziness on the part of the application development teams. While it is not a trivial matter to redesign a database to include a different primary key, it certainly results in a database that is designed in a way that makes it easier to understand and model.
I’ve also heard that they can be re-assigned (legally, by the government), but I’ve looked briefly and never found a cite to back it up.
A meaningful primary key can be a good idea over a small set of static data. For example, in a table that contains a list of all 50 states, it might be appropriate to use the two letter code as the primary key. We presume there is no chance of duplicates (although not unimaginable, if the table gets overloaded for other purposes). So for the purposes of performance and maintenance, you won’t get much if any gain out of a surrogate key, may as well go with what’s human-readable. But this example aside, I’m from the school that primary keys should be system generated by default, you can always set up a secondary key to protect data integrity.
I disagree… I say used an auto-generated primary, and slap a Unique Constraint on the column representing the Business Key. It had better already be an Index, anyway.
Why? Because, frex, phone numbers change. Do you really want to run through all the tables using the phone number as a foreign key every time a customer’s changes? Or lose all the data attached to the customer every time their phone number changes?
Don’t use data that is likely to change as a primary key then. Using Mo, Ks, Co as primary keys is good because the data isn’t going to change. Who uses phone numbers as primary keys anyway?
If I had a dollar for every time I’ve seen it done…
And I’m not even a DBA!
The use of surrogate keys for the primary key values is the way to go.
If other fields are supposed to be unique in your application, use unique constraints on those fields.
The reason to use surrogate keys is because primary key values are often duplicated in child tables, and the use of surrogate keys prevents you from having to modify a value in the parent table and the same value in all the tables that have a foreign key relationship to the parent table.
You are still going to have to change the information in child tables… right?
Are you saying you don’t use foreign keys at all? (Cobb rolls over in his grave)
And if I had a dollar for every time clients swore that data wouldn’t change, and then it did…
What Arnold Winkelried sez.
I’ve despised all the apps I’ve seen that used data for PK values, and each had other problems showing it was obviously not the product of a sane DB-minded developer.
One used the company name as a PK (fixing typos or company re-names therefore caused immense havoc), and another used a SSN for a PK. For the industry it was used for, employees now may refuse to provide a SSN to their employer on privacy grounds, making me wonder what’ll happen if/when someone declines to provide one.
Not what he’s saying. If you use meaningful info as the primary key, and then use that as a foreign key in a child table…then when the primary key data changes, you have to update all the fields in the child tables to match.
Not a problem if you used a generated key because the ID wouldn’t have changed, just some of the data associated with it.
Been there, done that, more than once.
No. To expand on what others have said, let me take an example.
A. Use of surrogate primary keys:
employee table
. employee # - meaningless number (sequentially generated, any other method) - primary key
. social security # - unique constraint
. first name
. last name
department table
. department # - primary key
. department name
employee - department table
. department # - foreign key to department
. employee # - foreign key to employee
B. Use of “intelligent” primary keys:
employee table
. social security # - primary key
. first name
. last name
department table
. department name - primary key
employee - department table
. department name - foreign key to department
. social security number - foreign key to employee
Suppose I am using method B and intelligent primary keys
One of the departments changes its name. I have to change the department table and update every row in the employee-department table to reflect the new department name.
Someone made a mistake entering the social security number for employee John Smith. I have to change the employee table and update every row in the employee-department table to correct the social security number.
If I were using method A and surrogate primary keys, I would not only need to update one table to correct a social security number or change a department name.
Expand the relational integrity down to a chain of 10 or 15 tables with thousands of rows and the task becomes time-consuming. (instead of employee ids and departments, think customers and customer orders.)
I use my surrogate primary key to ensure that the application will update / delete only one row, the correct row. I use the primary key for foreign key relationships.
I use unique constraints to prevent duplicate rows in the table.