What is the origin of the nine-character limit for name fields found in some clients?

Hello Teeming Millions,

This is my first ever post on The Straight Dope forum, though I’ve been a fan of straightdope.com for many years. This question will likely be easy to answer for any computer guru – especially server admins who haven’t quite fossilized.

My question is of a somewhat personal nature. My first name is 11 letters long and because of this I have become aware of a certain conspiracy against those with long names. A conspiracy with unknown origins, though seemingly realized by software engineers.

When interfacing with certain (computer) clients that require me to input my name, there is a 9 character limit imposed on entries into name fields. So my question is this: why? My first guess is that this is some kind of holdover.

A subtle aspect of this problem (which in fact makes it more personal) is the fact that when my name is truncated to 9 letters, it is transformed into another name, leading coworkers and bosses alike to believe my name is something other than it is. Over the years, it has become tiresome to explain that my name is [insert 11-letter name] and not [insert 9-letter name].

Any insight into this matter would be much appreciated!
Pham Nuwen

I work with mainframe datasets, so I’m no stranger to small fields.

It’s just space in the dataset, or lack thereof. Space is cheap today - disk space, RAM, processor cache, etc., but not that long ago, space was expensive. Some of the mainframe applications I work with are so old that they used to run on systems with hand-made core memory, and that stuff was wildly expensive. If your back-end application has a 256 byte limit per record, and you need to hold a name, address, phone number, customer number, etc, in that 256 bytes, you may need to limit your field size.
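To make the record-budget point concrete, here is a minimal sketch of a fixed-width 256-byte record. The field names and widths are made up for illustration (only the 256-byte total and the 9-character name come from this thread), and real mainframe layouts would use their own encodings and copybooks.

```python
import struct

# Hypothetical layout: the widths are illustrative, chosen to total 256 bytes.
FIELDS = [("first", 9), ("last", 20), ("address", 55),
          ("phone", 10), ("customer_id", 9), ("other", 153)]

# "<" disables alignment padding so the size is exactly the sum of the widths.
RECORD = struct.Struct("<" + "".join(f"{width}s" for _, width in FIELDS))

def fit(value, width):
    """Truncate or space-pad a string to exactly `width` bytes."""
    return value.encode("ascii")[:width].ljust(width)

def pack_record(**values):
    """Build one fixed-width record; missing fields are left blank."""
    return RECORD.pack(*(fit(values.get(name, ""), width)
                         for name, width in FIELDS))

def unpack_first_name(record):
    """Read the first-name field back, dropping the padding."""
    return RECORD.unpack(record)[0].decode("ascii").rstrip()

rec = pack_record(first="Christopher", last="Smith")
print(len(rec), unpack_first_name(rec))
```

Every byte given to one field has to come out of another, which is why an 11-letter name quietly comes back as a 9-letter one.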

Most US/English first names are nine characters or less, so for the vast majority of users, it’s not a problem. You just happen to be one of that 0.2% or so that have a longer name.

Ask Mr Cartwright-Chickering, who got fired from his job because his name wouldn’t fit on a punchcard, and it was easier to get rid of the problem than fix it. (Scroll down on this webpage for a little more information.) As I recall he did manage to keep his job – and name – intact, though it was a bit messy.

In your case, if it is only one client, perhaps you could chop your name somewhere in the middle and be known by that name to their computers? Or change it completely to some really cool name like, I don’t know, “Batman”?

And if, after years of your explaining, your coworkers and bosses are still confused about two people with the same or similar names, I have no answer. It sounds like a running joke to them that needs to be retired.

An 8 character limit would make sense because it is a power of 2. For example, in my processing environment, we are limited to 8 character variable names for memory allocation reasons.

But 9 characters? Seems very odd to me.

See gotpasswords’ explanation, above. The limitation may be a record limit, and records are composed of multiple fields which cannot add up to more than the record allows. In this case, each field may be shorter than ideal, but the programmer had to compromise somewhere, and thought 9 was enough, but 10 was too many. Setting the length of something in high-level languages and/or databases is not constrained by powers of 2.

What I meant was that 8 characters would be an expected limit to come across in multiple different systems, as the OP indicated. But coming across 9 all the time seems highly coincidental, if it isn’t really a fundamental thing and just a choice of the programmer. That’s all I meant.

I work for one of the largest health insurance companies in the state, and our COBOL-based mainframe (in the process of being replaced at mammoth expense because it’s not ICD-10 compliant) has a 9 character limit for names, ID numbers are normally 9 digits, etc.

At this point, I’m just guessing, but it occurs to me that the reason different systems might have this in common might be due to a common development system or programmer. IOW, not significant for any deeper reason like a power of 2.

I once created a specialty programming language for some test equipment. Someone noted that the key words and symbols I used were a lot like a contemporary database language, and asked why. I said if I have to choose a symbol set from scratch, why not use the one I was most familiar with as long as it works?

This is probably a good guess. A sort of “group think” effect going on. Someone somewhere probably came up with a system that imposed a 9 character limitation on field entries based on a variety of factors, and other systems based off of it or inspired by it somehow kept the same sort of implementation.

There’s another factor too.

Even today, there are innumerable printed forms which we must fill out by hand, and then someone else enters the data into a computer. Unfortunately, many people have handwriting which is very difficult to decipher, even when they’re supposed to be using block letters. So in order to help the accuracy of this procedure, many such forms are designed with actual rectangles, with the expectation that a person will be forced to place exactly one letter into each block.

So, even though data space may be incredibly inexpensive nowadays, that’s only once it’s inside the computer. Real estate on the paper form is as expensive as it’s always been, and the form designer must often choose between tiny boxes, or a smaller number of boxes – or both.

Why 9 instead of 8 or 10? 9 is bigger than 8, but it’s only one keystroke. Typing 10 takes sooo much longer. And if you’re not using a numeric keypad, those damn numbers are on opposite sides.

I’m only half-joking. Programmers make similar decisions for trivial (or stupid) reasons all the time.

Following the implementation might not have been much of a choice.

If we need two computer systems to synchronize data, and one of them has a 9-character name field already, our easiest solution is to make the second computer use a 9-character name as well. If the new system uses a 10-character name, we have to spend quite a bit of effort adding that extra letter by hand, and we have to add it manually again every time a name change in the 9-character system updates into the 10-character system. Or… we have to upgrade the old 9-character system to permit more characters.

So when you’re talking about large organizations that might have multiple databases, or data that needs to be shared between multiple organizations, the solution is often to base everything off the existing system, even if that’s not ideal.
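A rough sketch of why the wider system can't just fill in the missing letters on its own: once a name has passed through the 9-character field, distinct names can become indistinguishable. The function names and sample names here are hypothetical, purely for illustration.

```python
LEGACY_WIDTH = 9  # width of the older system's name field, per this thread

def to_legacy(name):
    """What the 9-character system stores for a given name."""
    return name[:LEGACY_WIDTH]

def round_trips(name):
    """True if the name survives a trip through the 9-character system."""
    return to_legacy(name) == name

def collisions(names):
    """Group distinct names that the legacy field cannot tell apart."""
    seen = {}
    for name in names:
        seen.setdefault(to_legacy(name), set()).add(name)
    return {key: group for key, group in seen.items() if len(group) > 1}

print(collisions(["Christopher", "Christoph", "Jane"]))
```

Anything longer than the legacy width fails `round_trips`, so someone has to re-enter the full name by hand after every sync.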

I can dig it. :slight_smile:

I am conditioned to expect powers of 2 more often than other numbers, but I was surprised when I heard that the most common data tape in open reel days (before my time) was 9 track. Even more surprising considering it was storing 8-bit bytes. (Doubly surprising, since audio tape was 2, 4, 8, or 16 tracks.) Then I found out it was 8 bits plus parity, which made perfect sense.
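For anyone who hasn't met parity before, a minimal sketch of "8 bits plus parity" follows. It assumes odd parity by default and puts the parity bit on top of the data bits; both are assumptions for illustration, not a description of any particular tape format.

```python
def with_parity(byte, even=False):
    """Return a 9-bit frame: 8 data bits plus one parity bit in bit 8."""
    data = byte & 0xFF
    ones = bin(data).count("1")
    # Odd parity: total number of 1 bits in the frame is odd.
    # Even parity: total number of 1 bits in the frame is even.
    parity = ones % 2 if even else (ones + 1) % 2
    return (parity << 8) | data

def parity_ok(frame, even=False):
    """Check a 9-bit frame against the chosen parity convention."""
    ones = bin(frame & 0x1FF).count("1")
    return ones % 2 == (0 if even else 1)

frame = with_parity(ord("A"))
print(parity_ok(frame), parity_ok(frame ^ 0x04))
```

Flipping any single bit changes the count of 1 bits by one, so the check fails and the error is caught (though not corrected).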

There’s another explanation that nobody has mentioned yet. Have all the older generation of computer programmers and users died off?

It was common for certain short character strings (file names, for example) to be in multiples of three characters. Many machines in the DEC line (the PDP-10 in particular, but others too) did this. It was DEC, recall, that also gave us the filenames with three-character extensions.

There was a subset of the entire character set, consisting of UPPER CASE letters, the digits, and a few punctuation marks, that were allowed in file names. With a total of 40 characters, you could encode them in a base-40 way that squeezed 3 characters into 16 bits. This format was sometimes called “Squoze” character strings.

Thus, if you allocated *n* 16-bit words for your file name (or other similar symbol-table usage), you got 3*n* characters, or 1.5 characters per byte of storage.
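Here is a small sketch of the base-40 packing idea. The exact 40-symbol alphabet below is an assumption for illustration (space, upper-case letters, three punctuation marks, digits); real systems each defined their own table. It works because 40³ = 64,000, which fits in the 65,536 values of a 16-bit word.

```python
# Hypothetical 40-symbol alphabet: space, A-Z, three punctuation marks, 0-9.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def pack3(text):
    """Pack up to three characters into one 16-bit value, base 40."""
    value = 0
    for ch in text.upper().ljust(3)[:3]:
        value = value * 40 + ALPHABET.index(ch)
    return value

def unpack3(word):
    """Recover the three characters from a packed 16-bit value."""
    chars = []
    for _ in range(3):
        word, digit = divmod(word, 40)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

def pack_name(name):
    """Pack a name into a list of 16-bit words, three characters per word."""
    return [pack3(name[i:i + 3]) for i in range(0, len(name), 3)]

words = pack_name("PEREGRINE")
print(len(words), "".join(unpack3(w) for w in words))
```

Under this scheme three words hold exactly nine characters, so a 9-character field costs 6 bytes instead of 9.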

ETA: Okay, Wikipedia has pages on everything. I may be remembering a few details wrong – the PDP-10 machine has 36-bit words, and got 6 characters squozen in there; 3 characters per 18-bit half-word. This article discusses other similar codings for 16-bit based words.

But you’re assuming there’s a solid, rational reason, related to some existing parameter. Which may indeed be the case, but I’d like to see the evidence.

Or it might be as innocent as…

“Let’s make this field 24 characters, and we’ll have 40 for the address. It’s 64 max.”

“No, I like 55 for the address. 9 chars for the name will work just fine.”

“OK, I can live with that. Hey, it’s quitting time. Want a beer?”

A quick skim through our HR system seems to back up my earlier suspicions. Lots of Tim, Jane, Rick, etc. and just a few at 8 or 9 characters.

Angelica
Dominique
Elizabeth
Gwendolyn
Jennifer
Michelle
Nicholas
Roseanne
Stephanie

Hey Pham, just be happy you’re not a Tine. Talk about a database name collision hassle!

First Name: Peregrine
First Name: Wic
First Name: Kwc
First Name: Rac
First Name: [del]Rum[/del]
Last Name: [del]Wickwrackrum[/del]
First Name: Scar
Last Name: Wrickwrackscar
Full Name: [del]Peregrine Wickwrackrum[/del]
Full Name: Peregrine Wickwrackscar

:slight_smile:

In the (COBOL) mainframe world it just comes down to how the field length is defined in the actual database. Over decades that field length will be replicated in variables all over the place in working storage in the COBOL, common libraries and interfaces etc, and that same restriction has to remain in all the web forms and other applications that ultimately must conform to that field that was designed in 1972.

It’s easy enough to redefine a DB field length (although the re-org can take a while and that’s time core systems are offline) but finding and changing everything else can quickly mount up to a Major Project for little benefit.

thousand yard stare

I could see the indexing of the memory access being 0 to 8, or 0000 to 1000 in binary, which fits in half a byte. They could index two stored 9-character strings with one byte. You were always looking to use every bit in the programming if you could. Half a byte was often a good divider.
Clear offset to 0
Load data from address + offset
Display character stored there
Increment offset
Loop until offset passes 1000 (that is, reaches 1001)
You have now accessed 9 characters stored in memory.
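The steps above could be sketched roughly like this. The function names and the flat `memory` buffer are hypothetical, and this only illustrates the half-byte idea; nothing here claims any real system worked this way.

```python
def pack_offsets(a, b):
    """Store two offsets (each 0..15) in the high and low nibble of one byte."""
    assert 0 <= a <= 0xF and 0 <= b <= 0xF
    return (a << 4) | b

def unpack_offsets(byte):
    """Split one byte back into its two 4-bit offsets."""
    return byte >> 4, byte & 0xF

def read_field(memory, address, length=9):
    """Walk offsets 0..length-1 from a base address and collect the characters."""
    chars = []
    offset = 0
    while offset < length:  # for length 9 the last offset read is 8 (binary 1000)
        chars.append(chr(memory[address + offset]))
        offset += 1
    return "".join(chars)

memory = bytearray(b"ELIZABETHGWENDOLYN")
print(read_field(memory, 0), read_field(memory, 9))
```

Offsets 0 through 8 all fit in four bits, so two of them share a byte with room to spare.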

Not important, or even applicable, in high-level languages.