Need way to hide serial numbers in string

So I have an application that is assigning sequential numbers to an item. It’s a 10-digit number, so I’d like to shorten it by using a range of digits and upper- and lower-case letters. That way I could use 4 or 5 characters to represent the large number; e.g., 5345729927 could be expressed as “bU5_”.

Here is the hitch, though. I don’t want the strings to be assigned in a way that makes it obvious what order they were issued in; e.g., aaaa1, aaaa2, etc.

Any ideas? I’m afraid I am not being clear.

Are you saving this data in a database? You could program a secondary key that is a random 5-character string, save it to the record and then recall the data using the secondary key, not the primary key (the item’s 10-digit ID). Your function for creating the random 5-char string would also need to check against the database to ensure that it’s not duplicating an already-used string.
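A minimal sketch of that secondary-key idea, assuming an SQLite table named `items` with a `code` column (both names are made up for illustration) and a 62-character alphabet. Rather than querying first, it lets a UNIQUE constraint reject duplicates and retries with a fresh code:

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # assumed 62-character set
CODE_LENGTH = 5

def random_code(length=CODE_LENGTH):
    """Return a random string drawn uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def assign_code(conn, item_id, max_attempts=10):
    """Insert item_id with a fresh random code, retrying if the insert is rejected.

    Note: a duplicate item_id also raises IntegrityError, so that case ends
    in the RuntimeError below after max_attempts tries.
    """
    for _ in range(max_attempts):
        code = random_code()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (id, code) VALUES (?, ?)",
                    (item_id, code),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # most likely the code is already in use: draw again
    raise RuntimeError("could not find an unused code")

# Hypothetical schema: the sequential ID is the primary key,
# the random code is the unique secondary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL)")

code = assign_code(conn, 5345729927)
row = conn.execute("SELECT id FROM items WHERE code = ?", (code,)).fetchone()
print(code, row[0])
```

Looking records up by `code` keeps the sequential ID out of anything the user sees.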

Fixed typo in thread title.

If it is a 5 character string, there would be 90^5 combinations. Almost 6,000 million. So would you need to check? The odds of getting a duplicate would be insignificant.
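To put rough numbers on that, here is a small sketch using the standard birthday approximation. It assumes every code is drawn uniformly and independently from the 90^5 pool; the batch sizes are arbitrary examples. A single draw is very unlikely to collide, but the chance that some pair collides grows with the square of the number of codes issued:

```python
import math

POOL = 90 ** 5  # 5,904,900,000 possible 5-character codes

def collision_probability(issued, pool=POOL):
    """Approximate chance that at least two of `issued` random codes coincide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    return -math.expm1(-issued * (issued - 1) / (2 * pool))

for issued in (1_000, 10_000, 100_000):
    print(issued, round(collision_probability(issued), 4))
```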

What level of “obviousness” do you want to obfuscate? The least obvious is to use a hash function (say MD5) and truncate it to the appropriate number of characters. Of course, then you run into the chance of collisions. So depending on your application, maybe you require an exact one-to-one correspondence between numbers and their codes.
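As a rough illustration of the hash-and-truncate approach, the sketch below hashes the decimal form of the number with MD5 and keeps the first 5 hex digits. The function name and the choice of hex output are assumptions; hex gives only 16^5 (about a million) distinct codes, so collisions show up much sooner than with a wider alphabet.

```python
import hashlib

def hashed_code(serial, length=5):
    """Return the first `length` hex digits of the MD5 of the serial number."""
    digest = hashlib.md5(str(serial).encode("ascii")).hexdigest()
    return digest[:length]

# Consecutive serial numbers give unrelated-looking codes.
codes = [hashed_code(n) for n in range(1, 6)]
print(codes)
```

There is no way to get the number back from the code, so the code would have to be stored alongside the number.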

In that case, how about taking each number, multiplying it by a large constant, and XORing it with some magic key value, and outputting the result in hex (or base64, or whatever). For example, if you had a key of 56463 and a constant of 12345, you’d get:

1: ecb6
2: bcfd
3: 4c24
4: 1c6b
5: 2d92
6: 1fdd9
7: 18d00
8: 15d47
9: 16e8e
10: 13eb5

Reversing the operation is easy: decode the hex value, XOR it with 56463, and divide by 12345.
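Here is a short sketch of that scheme using the same key and constant; the function names are made up for illustration. It reproduces the table above and shows the round trip:

```python
KEY = 56463         # the "magic" XOR key from the example
MULTIPLIER = 12345  # the large constant from the example

def encode(n):
    """Multiply by the constant, XOR with the key, and print as hex."""
    return format((n * MULTIPLIER) ^ KEY, "x")

def decode(code):
    """Undo encode(): parse the hex, XOR with the key, divide by the constant."""
    product = int(code, 16) ^ KEY
    quotient, remainder = divmod(product, MULTIPLIER)
    if remainder:
        raise ValueError(f"{code!r} is not a valid code")
    return quotient

for n in range(1, 11):
    print(f"{n}: {encode(n)}")

print(decode("ecb6"))  # 1
```

The remainder check means most mistyped or made-up codes fail to decode instead of mapping to some other number.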

This is of course just obfuscation, and is not suitable for any kind of security application. You need a real cryptographic solution for that.

But if you’re inserting it into a database with a unique key, you only need that 1 in 6,000 million to b0rk your transaction as it will fail on a duplicate key.

There are lots of ways to do this. But why would you want to?

You’d presumably need to create a function to encode & decode the strings. The challenge is getting that to work perfectly in the corner cases. If this is for some in-house app with a known and benign domain you’d be fairly safe. But if it’s for public consumption, now you need to worry about how that string looks in Arabic or Chinese, or how it encodes or decodes on a server running the Korean version of Windows.

Base64 encoding is a standard technique for transmitting binary information using a simple printable subset of ASCII (later ANSI and now Unicode) characters. But it results in a longer string, not a shorter one.
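A quick way to see the size effect, assuming the serial number is packed into 5 bytes (enough for any 10-digit number). Base64 emits 4 characters for every 3 input bytes, so the output is always longer than the bytes it encodes:

```python
import base64

serial = 5345729927

# Pack the number into 5 raw bytes, then Base64-encode those bytes.
raw = serial.to_bytes(5, "big")
encoded = base64.b64encode(raw).decode("ascii")
print(len(raw), len(encoded))  # 5 8

# Encoding the 10 decimal digits as text is longer still.
as_text = base64.b64encode(str(serial).encode("ascii")).decode("ascii")
print(len(str(serial)), len(as_text))  # 10 16
```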

If you’re trying to do something like obfuscate an order number in a web app, so one customer can’t figure out how to access another customer’s order, well you’re just planning on getting hacked. This idea is waaay weak.

Better to use a GUID as your order identity; any one value gives the bad guys very little information about any nearby values.
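For what it’s worth, generating one is a one-liner in most languages. A sketch in Python, assuming random (version 4) identifiers are acceptable for the application:

```python
import uuid

# Each call draws a fresh random 128-bit identifier; consecutive values
# have no visible relationship to one another.
order_ids = [uuid.uuid4() for _ in range(3)]
for order_id in order_ids:
    print(order_id)
```

The trade-off is length: the usual text form is 36 characters, far from the 4 or 5 the original question asked for.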

I’d say more, but I’m out of time …
ETA: the other replies weren’t here when I started typing, but what they said too.

1. Take a set of characters you will use to represent the sequential number, e.g. this set which has 95 different characters:
   abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`~!@#$%^&*()-+={[}]|:;"’<,>.?/1234567890

2. Arrange those characters in some “random” order, like
   Hz{1eT_ … \S

3. Convert your 10-digit number to a number in base 95 (base 95 because I specified 95 characters chosen in step 1).

4. Create your own internal method of writing numbers in base 95, using the “order” chosen in step 2:
    0 = H
    1 = z
    2 = {
    3 = 1
    4 = e
    5 = T
    6 = _
    …
    93 = \
    94 = S

Example:
    96 = 11 (in base 95) = zz
    97 = 12 (in base 95) = z{
    98 = 13 (in base 95) = z1
    194 = 24 (in base 95) = {e
    285 = 30 (in base 95) = 1H
    5198304 = 6 * 95^3 + 5 * 95^2 + 93 * 95 + 94 = (6)(5)(93)(94) in base 95 = _T\S
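The steps above can be sketched roughly as follows. The character set (all 95 printable ASCII characters, including the space) and the shuffle seed are assumptions made for illustration, so the resulting strings will not match the “Hz{1eT_” ordering used in the example; only the base-95 digit values do.

```python
import random
import string

# Step 1: assumed character set, the 95 printable ASCII characters (space included).
CHARSET = string.ascii_letters + string.digits + string.punctuation + " "

# Step 2: put the characters in a "random" order. The seed is a made-up
# secret; in practice it is safer to store the shuffled string itself,
# since shuffle results are not guaranteed to match across Python versions.
_shuffled = list(CHARSET)
random.Random(20240101).shuffle(_shuffled)
ALPHABET = "".join(_shuffled)
BASE = len(ALPHABET)  # 95

def to_digits(n, base=BASE):
    """Step 3: return the digits of n in the given base, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    return digits[::-1]

def encode(n):
    """Step 4: write each base-95 digit using the shuffled alphabet."""
    return "".join(ALPHABET[d] for d in to_digits(n))

def decode(code):
    """Reverse of encode(): map characters back to digits and sum them up."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(to_digits(5198304))  # [6, 5, 93, 94]
code = encode(5345729927)
print(repr(code), decode(code))
```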

Wow, a bunch of great ideas to digest. Thanks.

If you’re using alphabetic characters, there’s something else you should consider. Some of the combinations will be readable words, and some of those will be NSFW. This might not be a problem for a purely internal application, but you wouldn’t want a consumer product to go out with the serial number “FUK69” … .