Database design: Meaningful primary keys?

Never say ‘never’ - it’s just another way of asking to be proven wrong.

There’s nothing to stop you repeating fields in different tables for quick reference or indexing - it goes against the principles of normalization, but it’s a lesser sin than using a real piece of data as a key, and it’s unlikely to cause dire consequences further down the road.

Now that’s a good point. IIRC, all the textbooks and tutorials I’ve seen have meaningful keys and content since they’re trying to teach set relationships and show how DBs work, not teach maintainable and proper coding practices.

I concur - all the examples I can think of that used meaningful keys were probably just using them as simple illustrative shorthand to demonstrate the concept of relational tables. I think they typically go on to suggest using an autonumber or other generated key once they start talking about real data.

I have some of my textbooks, but I don’t remember exactly where I saw it. I don’t think it said “in EVERY case always use meaningful primary keys.” We hardly discussed meaningful vs. meaningless keys at all (unfortunately). I just have a specific memory of looking at a “student” table, in either a textbook or an in-class example, that used SSN as the primary key. Shoot - I might be making up memories.

The short answer remains the same - use meaningless keys.

The only legitimate exceptions as I see it are:

(a) When you are dealing with huge databases - hundreds of millions of records - and lots of concurrent users.

(b) Data warehousing, which by its very nature sacrifices flexibility for speed.

(c) Lookup tables such as States (AL, AK, TX, NY, etc.) or Sex (M, F) where the data is reasonably immutable.
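The rule of thumb and the lookup-table exception can be sketched together. This is a toy SQLite example of my own (the table and column names are invented, not from the thread): the immutable lookup table keeps its natural key, while the mutable real-world table gets a meaningless autonumber and keeps SSN as an ordinary unique column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Immutable lookup data: a natural two-letter code is a reasonably safe PK.
cur.execute("CREATE TABLE States (code TEXT PRIMARY KEY, name TEXT NOT NULL)")
cur.executemany("INSERT INTO States VALUES (?, ?)",
                [("TX", "Texas"), ("NY", "New York")])

# Mutable real-world data: use a meaningless autonumber as the key and keep
# the natural identifier (SSN) as an ordinary unique column.
cur.execute("""
    CREATE TABLE Employees (
        emp_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- meaningless key
        ssn    TEXT NOT NULL UNIQUE,               -- real data, not the key
        state  TEXT REFERENCES States(code)
    )
""")
cur.execute("INSERT INTO Employees (ssn, state) VALUES ('123-45-6789', 'TX')")

# Correcting a mistyped SSN is now a one-column update; no keys move.
cur.execute("UPDATE Employees SET ssn = '123-45-6780' WHERE emp_id = 1")
print(cur.execute("SELECT emp_id, ssn FROM Employees").fetchall())
# -> [(1, '123-45-6780')]
```

Nothing referencing Employees ever has to know about the SSN correction, which is the whole point.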

Confession: In my previous post here, I mentioned that I once hotly debated the issue of meaningless vs. meaningful keys. I was the one arguing for meaningful keys. I was wrong.

Right, and it’s not a straightforward change, either. Depending on the DBMS, you can’t just change the SSN in the Employees table because you’ve got child tables with SSN FKs that are dependent on the SSN PK. Nor can you just change the SSN in the Department table, because the new SSN won’t point to a legitimate key in the Employees table.

You have to either turn off the FK integrity (or whatever it’s called), make your changes and turn it back on, or you have to first create a whole new Employees record with the correct SSN, cascade those changes to any child tables, and delete the original Employees record. Lots of room for error that way.
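For the curious, the update-blocking behavior described above can be reproduced in miniature with SQLite (a hypothetical Employees/Timesheets schema of my own, not anyone’s real data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs on request

conn.execute("CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE Timesheets (
        id  INTEGER PRIMARY KEY,
        ssn TEXT REFERENCES Employees(ssn)   -- no ON UPDATE CASCADE declared
    )
""")
conn.execute("INSERT INTO Employees VALUES ('123-45-6789', 'Alice')")
conn.execute("INSERT INTO Timesheets VALUES (1, '123-45-6789')")

# Correcting the parent PK fails while child rows still point at it.
try:
    conn.execute("UPDATE Employees SET ssn = '987-65-4321'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
print("update blocked:", blocked)   # -> update blocked: True
```

Declaring the FK with ON UPDATE CASCADE would let the DBMS propagate the change automatically, and disabling FK enforcement is the “turn it off and back on” route mentioned above - both are workarounds, not cures.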

I have seen this in action (use of a “meaningful” primary key for a DB table) and I have seen the pain it can cause. Sure, you may get away with it for a while, maybe even forever, but what really is the downside to taking out simple insurance against the pain?

Here’s a simple 2x2 boolean grid that should make things clear:

Use Meaningful Primary Key? [Y/N]
(Where “N” = instead using a unique constraint and clustered index on the intended primary value)
Will This Key Eventually Change In Nature/Value Set? [Y/N]

If you use a meaningful primary key [Y]…
…and the key never changes its nature or value set [N]…
…you come out OK.

If you use a meaningful primary key [Y]…
…and the key unexpectedly changes its nature or value set [Y]…
…you come out Hosed Up The Wazoo.

If you use a meaningless primary key [N]…
…and the key never changes its nature or value set [N]…
…you come out OK.

If you use a meaningless primary key [N]…
…and the key unexpectedly changes its nature or value set [Y]…
…you come out OK.

What conclusion do you think is the prudent one?

I don’t see how this is a legitimate exception. What is your rationale?

We almost always use “meaningless” numbers. I can’t tell you the number of times I’ve had to tell a user, “It’s just a number, it doesn’t mean anything. We use numbers because we’ve got lots of numbers.”

Same as for data warehousing. Sacrificing flexibility for speed.

Actually, I’ve never worked with a database large enough to justify that, but there are some out there.

Interesting. Is there an actual performance gain from using a “native primary key” field versus a “clustered unique index” built on a field when querying a table?

Does it depend on the DB vendor?

Referential integrity.

Okay, what the fuck? Is this topic really not covered in school? I always assumed it was, but I wouldn’t know. I’d think it would be one of the very first things covered.

Probably. Re-read the entire quote I posted in [post=8899720]post 32[/post].

I spoke to a coworker of mine who graduated from the same university in the same semester as me. He doesn’t recall a discussion about this either. I don’t think it is brought up as its own topic. I think, if you have a good professor, they probably bring it up as an aside. Unfortunately, the prof we both had for the database design class didn’t really bring it up.

I hope I am not forgetting the discussion from school, but I think I remember the discussions about choosing a PK, and I don’t remember this…consideration being mentioned.

As a data point: one of our customers called us to report a problem with slow performance of our software. They had a single (non-partitioned) table in an Oracle database that was 2.5 TB in size! This was 4 or 5 years ago. I hate to think how big that table is now.

From the little I know about cluster keys, they are a proprietary feature of MS SQL Server. The basic idea is that you can declare a cluster key on a table, and the table will be physically ordered on that key. The benefit is that if you use the cluster key in a query, the amount of disk head movement is reduced, thus giving a faster response.

I would imagine, however, that there is significant overhead involved by the DBMS in keeping the physical order as the table grows.

As it was explained to me, the data storage strategy is similar to a library sort, where gaps are left in the table so that each insertion doesn’t require repositioning every subsequent record.
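A toy sketch of that gap-leaving idea, in plain Python (my own illustration of a library sort, not any vendor’s actual storage engine):

```python
def insert(table, rec):
    """Insert rec in key order; shift records only as far as the nearest gap.

    `table` is a sorted list in which None marks an empty slot left for growth.
    """
    # Position of the first occupied slot that sorts after rec.
    i = next((j for j, v in enumerate(table) if v is not None and v > rec),
             len(table))
    # Nearest gap at or after that position; grow the table if there is none.
    g = next((j for j in range(i, len(table)) if table[j] is None), None)
    if g is None:
        table.append(None)
        g = len(table) - 1
    # Shift only the short run [i, g) right by one, then drop rec into place.
    for j in range(g, i, -1):
        table[j] = table[j - 1]
    table[i] = rec

table = [10, None, 30, None, 50, None]   # sorted records with gaps
insert(table, 20)
print([v for v in table if v is not None])  # -> [10, 20, 30, 50]
```

Without the gaps, that insert would have shifted every record after position 1; the overhead mentioned above is the cost of re-spreading the gaps as they fill up.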

I’ve primarily used Sybase as a DBMS and it certainly supports use of a clustered index. In fact I thought it was a standard feature of a DBMS.

Well, Sybase and MS SQL Server were originally the same product; they just ran on different platforms, so you would expect that there would still be similarities.

So far as I know clustered indexes are not a standard DBMS feature.

Oracle has similar features re clustered tables, including “Multi-table Index Cluster Tables” where parent-child records (ie from two or more related tables) have their records stored physically as close as possible so that the parent and children are picked up with one disk read (or as few as possible).

Funny - this is exactly how IBM built their hierarchical databases in the ’70s.

No, it’s just that in this case locating a cited answer was relatively easy to do. My apologies for any misunderstanding.