Googling around, I see people occasionally claiming that UUIDs have had collisions, but I can’t seem to trace any of these claims far enough to find an actual collision.
Are there any known (documented) instances of a UUID collision appearing in the wild with a valid (I acknowledge that this is the crux of the issue) implementation of (any of) the UUID generation algorithm(s)?
UUIDs are 128 bits long. That means there are 2 to the 128th (which is more than 10 to the 38th) different ones. The rule (the birthday bound) is that you have to randomly choose a set approximately the size of the square root of the whole set you’re choosing from before you can expect a duplicate. So what we want to know is whether that square root’s worth (2 to the 64th) of UUIDs has ever been chosen. Four years is about 2 to the 27th seconds, so even if a thousand (about 2 to the 10th) UUIDs had been randomly chosen every second on one million (about 2 to the 20th) computers over the last four years, that is only about 2 to the 57th UUIDs, and you still wouldn’t expect there to be any duplications.
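To put rough numbers on that square-root rule, here is a minimal Python sketch of the usual birthday approximation. The function name and the chosen set sizes are mine, and it assumes every value is drawn uniformly at random from the whole space.

```python
import math

def collision_probability(n: int, bits: int = 128) -> float:
    """Approximate chance of at least one duplicate among n uniformly
    random values drawn from a space of 2**bits possibilities."""
    space = 2.0 ** bits
    # Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * space)).
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-n * (n - 1) / (2.0 * space))

for exponent in (40, 57, 64):
    p = collision_probability(2 ** exponent)
    print(f"2^{exponent} random 128-bit values: p = {p:.3g}")
```

Random (version 4) UUIDs fix 6 of their 128 bits, so passing `bits=122` is the closer model for them; the conclusion does not change much.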
It’s worse than that, even. That’s just the chance of a collision existing. But it’s far less likely yet that one would ever be noticed and documented, as the OP asks for. I doubt there’s a master database anywhere of all of the UUIDs that have ever been generated.
All of that assumes that your random number generator is truly random. A truly random number generator doesn’t exist on most computers. Instead a pseudo-random sequence is typically used to generate random-ish numbers. Some pseudo-random generators are better than others, and this is where things like UUID can fail.
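As a sketch of how a poor generator defeats the math, the Python below builds version-4-shaped UUIDs from a seeded, non-cryptographic generator. The helper `uuid4_from` and the seed value are made up for illustration; two "machines" that end up with the same seed produce the same IDs.

```python
import random
import uuid

def uuid4_from(rng: random.Random) -> uuid.UUID:
    """Build a version-4-shaped UUID from the given pseudo-random generator
    (hypothetical helper, not how uuid.uuid4() itself works)."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)

# Two independent processes that happen to seed identically.
machine_a = random.Random(1234)
machine_b = random.Random(1234)

ids_a = [uuid4_from(machine_a) for _ in range(3)]
ids_b = [uuid4_from(machine_b) for _ in range(3)]

print(ids_a == ids_b)  # every "random" UUID collides with its twin
```

As far as I know, CPython's own `uuid.uuid4()` draws from the operating system's random source rather than from `random`, which is the kind of generator you want here.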
UUID by its very nature is not guaranteed not to collide. The only real collision-proofing you have is that the chance of a collision is really, really small (but, importantly, it’s not zero). A lot of programmers just assume that, since the chance is so small, it can’t happen, and program accordingly. As a result, in the very rare cases when you do have a UUID collision, chances are it isn’t going to be detected.
Unless someone has a major problem with their pseudo-random generator, I doubt that you will ever see a documented UUID collision out in the real world, even when they do occur.
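For what checking instead of assuming might look like, here is a small sketch in Python. The `Registry` class is hypothetical; the point is only that a duplicate key raises an error instead of silently overwriting the earlier record.

```python
import uuid

class Registry:
    """Toy in-memory store that refuses to overwrite an existing ID."""

    def __init__(self):
        self._items = {}

    def add(self, value, key=None):
        # Generate an ID unless the caller supplies one (e.g. imported data).
        if key is None:
            key = uuid.uuid4()
        if key in self._items:
            # A collision is detected and reported, not silently ignored.
            raise KeyError(f"duplicate id {key}")
        self._items[key] = value
        return key

registry = Registry()
first = registry.add("original record")
try:
    registry.add("imported record", key=first)
except KeyError as err:
    print("collision caught:", err)
```

In a real database the same effect usually comes from a unique constraint on the ID column.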
I have had issues with code that assumes that UUIDs are unique and suffered data loss, but that was due to an actual block-for-block copy of a hard disk.
Basically we had a hypervisor that was running the same version of Ubuntu that an admin needed for a guest. So he just took a snapshot (really a dd of a snapshot) of the hypervisor’s root disk and built his machine.
When the hypervisor, which had mounted its file systems based on their UUIDs, scanned the LVM volumes at some later time, it saw the new volume and replaced the /dev/disk/by-uuid/ entry with the guest’s volume. This happened at run-time and caused severe data loss.
This is still not handled properly. Even a normal user can insert a USB drive with a duplicate UUID, and if that UUID is used to mount any volumes, the USB drive will then become that device, because the device mapper and LVM scan the entire /dev tree by default.
I know this wasn’t a generated UUID issue but it is a non-rare issue.
I should state the most common way this happens: someone gets a new hard drive, adds it to their machine and boots to a live CD.
They use dd to copy their old boot drive to their new disk and reboot.
or
They back up their computer by booting to a Live CD and dd their drive to a USB drive.
When they reboot, whichever device is scanned last by dbus will own that UUID, so their “live” / volume could be the one on the USB drive.
If you are running LVM or have multipathing enabled, this can happen during runtime.
Here, I just reproduced this for you to show you the security issues.
First, let’s run blkid just to show you how things are. blkid typically does need to be run as root.
But any user can see fstab, so let’s go grab the UUID of the root FS and, as a normal user, see what device that UUID is linked to. (sda4)
OK, here I am using sudo to run tune2fs, but you could set the UUID on a remote system or a guest where you do have root. sdb is just a USB thumb drive, and all of this can be done without root on the target machine.
EEK, now that by-uuid entry is pointing at our USB drive.
If you reboot, rescan your LVM volumes or are running multipathing in the default configuration, there is a very good chance that it will try to use /dev/sdb1 as the root volume.
So IMHO this is the type of situation where you would have collisions, not from /dev/urandom but from normal admin or malicious actions.
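A simple guard against this is to look for filesystem UUIDs that show up on more than one device. The Python sketch below scans text shaped like typical blkid output (`device: UUID="..." TYPE="..."`). The sample lines and their UUIDs are invented for illustration and are not output from the machine above.

```python
import re
from collections import defaultdict

# Matches the device name and the filesystem UUID on one blkid-style line.
# \b keeps PARTUUID="..." from being mistaken for UUID="...".
LINE = re.compile(r'^(?P<dev>[^:]+):.*?\bUUID="(?P<uuid>[^"]+)"')

def duplicate_uuids(blkid_text: str) -> dict:
    """Return {uuid: [devices]} for every UUID seen on more than one device."""
    by_uuid = defaultdict(list)
    for line in blkid_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            by_uuid[match.group("uuid")].append(match.group("dev"))
    return {u: devs for u, devs in by_uuid.items() if len(devs) > 1}

# Made-up sample: sdb1 carries a copy of sda4's filesystem UUID.
sample = '''
/dev/sda1: UUID="bbbb-2222" TYPE="vfat"
/dev/sda4: UUID="aaaaaaaa-0000-4000-8000-000000000001" TYPE="ext4" PARTUUID="1111-04"
/dev/sdb1: UUID="aaaaaaaa-0000-4000-8000-000000000001" TYPE="ext4"
'''

for fs_uuid, devices in duplicate_uuids(sample).items():
    print(f"{fs_uuid} appears on {', '.join(devices)}")
```

On a real system you would feed it the text that blkid prints. Running something like this after cloning a disk with dd would flag the copy before the next reboot or rescan.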
For what it’s worth, Microsoft guarantees uniqueness of GUIDs. (Not quite the same thing.) From what I understand, GUIDs are a bit more unique because they incorporate the timestamp and MAC address of the generating computer/device in the algorithm.
Not sure if Microsoft’s claim is true or not, but if a collision is a once-in-a-millennium thing they can take a chance with the “guarantee.”
EDIT: I should mention that the catch pointed out above, where an imaged drive will have the same ID, could easily happen with Microsoft GUIDs as well.
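The timestamp-plus-MAC scheme mentioned above corresponds to what the UUID spec calls version 1. As a rough illustration (Python's standard `uuid` module, not Microsoft's implementation), you can see both ingredients in a generated ID. The fixed node value below is an arbitrary made-up address.

```python
import uuid

u = uuid.uuid1()
print(u.version)    # 1: time-based UUID
print(hex(u.node))  # 48-bit node ID, normally the machine's MAC address
print(u.time)       # 60-bit count of 100 ns ticks since 1582-10-15

# Pinning the node and clock sequence shows what keeps IDs apart on a
# single machine: only the timestamp changes between calls.
a = uuid.uuid1(node=0x001122334455, clock_seq=0)
b = uuid.uuid1(node=0x001122334455, clock_seq=0)
print(a.node == b.node, a != b)
```

If Python cannot find a hardware address it falls back to a random node value, so the "unique per machine" part is only as good as the address it finds.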