Efficient implementation of RDB seek keys

When you need to use a relatively long string as a key to several database tables, it is usually a good idea to implement a seek key strategy. A seek key is just a “shorthand” or “hash” of the original key. What is the best (most efficient) way to implement seek keys?

Have a separate Seek-Key table that has the original long string as the key, then query that table using the long string to look up the seek key?

Something else?

OK, I will assume my example is the best way.

(Obligatory bump.)

Your idea sounds pretty good.

I would run the hash algorithm on the original key to calculate the seek key. If the original key is so long that it makes the hash slow, then I would do exactly what you suggested, create a seek-key table, and consider it an optimization of the hash algorithm.

One disadvantage of the seek-key table is that you have an extra database look-up and an extra table to maintain and keep in-sync. A trade-off between CPU and IO.

One advantage of the seek-key table is that you could handle hash collisions almost transparently. When you add an entry to the seek-key table, you would detect a hash collision and choose another seek-key. Since all subsequent look-ups refer to this table, the collision would be a thing of the past. Of course, your hash algorithm would minimize collisions so this should be a small advantage.

Actually, now that I am typing this out - if you have a seek-key table, why use a hash algorithm at all? You could just assign the next integer as your seek-key. Since you always look-up the seek-key first, it doesn’t matter if the seek-keys are distributed or sequential. In fact, the database table is probably already doing a hash for you to implement the seek-table and you would just be duplicating the effort.

BTW, I am not a DB programmer so take my info with a grain of salt. However, I have seen the seek-table mechanism used, but it was for an embedded system without a database and the implementation was basically a home-grown database.

I guess the seek-key could reduce your storage quite a bit if you use a long key for a bunch of tables.

I believe the implementation I worked on defined the seek-key as a quad-word, potentially a very, very big number. You NEVER have to worry about “wrap-around” (assuming you are judicious about allocating seek-keys). There was a Next-Seek-Key table that contained the next seek-key number to be assigned, this table was locked and updated any time a new seek-key was allocated.

Then there was a table that was used to translate the original-key to the seek-key. This table contained only those two columns and had a new row inserted for every new (original-key, seek-key) pair.

The seek-key was the primary key used in many of the other tables in the database. The amount of time and space saved by using the seek-key as opposed to the original-key was enormous. The original-key was 250 characters long, generating a large potential number of seek-keys (36 ** 250, 36 being the number of characters in the alphabet plus the numbers 0 through 9). But you can imagine the savings of processing time, and key cache size, for key comparison searches by not using a key field 250 characters long.

I am facing another project where I believe this technique will maximize the speed of retrievals, and minimize the size of the database. Creation of new rows may be a bit slower, but that time should be negligible as compared to the overall performance of the database.

I just wanted to run this idea by the good folks here and make sure I am on the right track.

Thanks for the comments.

ccwaterback, what you described makes tons of sense. When I originally read that the seek-key was a hash, I was thinking you meant to run a hash algorithm on the long-key, but it sounds like you are just incrementing a quad-word and using that.

Another advantage of your seek-table strategy happens if you ever need to look-up a record from multiple tables during a single transaction. Getting the simpler seek-key and using it multiple times should be lots quicker.