I’m not sure how recently you’re talking about. As far as I’m aware, most systems still use either Manchester or GCR, or one of their derivatives, and GCR was invented in the 1970s.
If I wanted to learn this stuff from scratch, I’d start off with the simplest codes, then look at what problems the more complicated ones are trying to solve.
If you have a shared external clock, you can use NRZ, which is as easy and efficient as it gets. The obvious problem is needing an external clock.
RZ is the simplest self-clocking signal. It should be easy to see how it gets by without a clock, but with a little more reflection you can see how jitter and DC wander cause problems. Still, it’s good enough for a lot of telecom applications.
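To make that concrete, here’s a minimal sketch of RZ in Python. I’m assuming the bipolar (three-level) flavor, where a 1 is a positive pulse and a 0 is a negative pulse, and modeling each bit cell as two half-cell samples; the function names are just mine for illustration.

```python
def rz_encode(bits):
    """Bipolar RZ (assumed variant): pulse in the first half-cell, back to 0 in the second."""
    levels = []
    for b in bits:
        levels.extend([1 if b else -1, 0])
    return levels


def rz_decode(levels):
    """Read the sign of the pulse in the first half of every cell."""
    return [1 if levels[i] > 0 else 0 for i in range(0, len(levels), 2)]


data = [1, 0, 0, 0, 0, 0, 0, 1]
line = rz_encode(data)

# Every cell contains a pulse, so the receiver always has an edge to time from.
pulses = sum(1 for x in line if x != 0)

# The average level depends on the data, which is where DC wander comes from.
dc_offset = sum(line) / len(line)
```

Here `pulses` equals the number of bits, but `dc_offset` comes out negative because the data is mostly zeros, so a long lopsided message drags the baseline away from zero.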
Manchester is a relatively simple solution to RZ’s robustness problems (as is BMC, used in some digital audio applications).
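A rough sketch of Manchester, again with two samples per bit cell. I’m assuming the IEEE 802.3 convention (1 is low-to-high, 0 is high-to-low); the other convention just swaps the two. The helper names are made up for the example.

```python
from itertools import groupby


def manchester_encode(bits):
    """Assumed IEEE 802.3 convention: 1 -> low, high; 0 -> high, low."""
    levels = []
    for b in bits:
        levels.extend([0, 1] if b else [1, 0])
    return levels


def manchester_decode(levels):
    """A rising mid-cell edge is a 1, a falling one is a 0."""
    return [1 if (levels[i], levels[i + 1]) == (0, 1) else 0
            for i in range(0, len(levels), 2)]


def longest_run(levels):
    """Length of the longest stretch with no transition."""
    return max(len(list(group)) for _, group in groupby(levels))


data = [0] * 8 + [1] * 8          # worst case for plain NRZ: long runs
line = manchester_encode(data)

nrz_run = longest_run(data)        # NRZ just sends the bits as levels
manchester_run = longest_run(line)
highs = sum(line)                  # exactly one high half-cell per bit
```

The line never sits at one level for more than two half-cells regardless of the data, and it spends exactly half its time high, which takes care of both the timing and the DC problems. The price is two line symbols per bit.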
One problem that all of these have is polarity. If you turn the signal upside-down (e.g., you’re sending through home power lines, and someone plugs the receiver in upside-down), every bit flips. Differential Manchester, and similar variations on other protocols, solve this by encoding data in transitions rather than absolute levels.
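Here’s a small sketch of why the differential version doesn’t care which way up the signal is. I’m assuming one common convention (a transition at the start of the cell means 0, no transition means 1; some systems use the opposite), and I prepend one idle sample so the receiver has a reference level. Both of those are my choices for the example, not part of any particular standard.

```python
def dm_encode(bits, idle=1):
    """Differential Manchester, assumed convention: start-of-cell transition = 0."""
    level = idle
    levels = [idle]              # one idle sample as a reference for the receiver
    for b in bits:
        if b == 0:
            level ^= 1           # data-dependent transition at the cell boundary
        levels.append(level)
        level ^= 1               # mandatory mid-cell transition carries the clock
        levels.append(level)
    return levels


def dm_decode(levels):
    """Only compares neighboring samples, never looks at absolute levels."""
    prev = levels[0]
    bits = []
    for i in range(1, len(levels), 2):
        bits.append(0 if levels[i] != prev else 1)
        prev = levels[i + 1]
    return bits


data = [1, 0, 1, 1, 0]
line = dm_encode(data)
flipped = [1 - x for x in line]  # receiver wired upside-down
```

Both `dm_decode(line)` and `dm_decode(flipped)` give back the original bits, because inverting every sample leaves the positions of the transitions unchanged.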
Once you get all that, take a look at GCR. The idea behind GCR is that in many cases, you don’t need to do Manchester-style self-clocking as long as you can do clock recovery. The way to do this is to encode each group of data bits into a slightly longer symbol in such a way that no sequence of valid symbols has more than, say, 6 highs or lows in a row. You can do this with under 25% overhead, instead of Manchester’s 100%. But it’s much simpler to go with exactly 25% overhead (e.g., 4 data bits into 5, or 8 into 10), and this gives you some spare symbols for control codes, to maintain DC balance, etc.
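As a sketch of the idea, here’s a toy 4-into-5 code in Python. I build the codebook by brute force: keep the 5-bit words that can be concatenated freely without ever giving three 0s in a row, then send them NRZI-style (a 1 toggles the line, a 0 holds it). The particular nibble-to-codeword assignment below is arbitrary and is not the table from any real GCR or 4b/5b standard.

```python
from itertools import groupby


def is_valid(word):
    """At most one leading 0, one trailing 0, and no '000' inside."""
    return not (word.startswith("00") or word.endswith("00") or "000" in word)


CODEWORDS = [w for w in (format(i, "05b") for i in range(32)) if is_valid(w)]
ENCODE = {format(n, "04b"): CODEWORDS[n] for n in range(16)}  # arbitrary mapping
DECODE = {code: nibble for nibble, code in ENCODE.items()}


def gcr_encode(bits):
    """bits: string of '0'/'1' whose length is a multiple of 4."""
    return "".join(ENCODE[bits[i:i + 4]] for i in range(0, len(bits), 4))


def gcr_decode(code):
    return "".join(DECODE[code[i:i + 5]] for i in range(0, len(code), 5))


def nrzi(code, level=0):
    """A '1' toggles the line level, a '0' leaves it alone."""
    levels = []
    for c in code:
        if c == "1":
            level ^= 1
        levels.append(level)
    return levels


def longest_run(levels):
    return max(len(list(group)) for _, group in groupby(levels))


data = "".join(format(b, "08b") for b in range(256))
code = gcr_encode(data)
worst_run = longest_run(nrzi(code))
spare = len(CODEWORDS) - 16      # words left over for control codes
```

With this constraint the line never holds one level for more than three symbol times, the encoded stream is exactly 1.25 times the length of the data, and there is one codeword to spare. Real codes work the same way with bigger tables and looser or tighter run limits.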
8/10 (aka 8b/10b) is a specific version of GCR. By an accident of history, IBM got a patent on 8/10 that also covered most other implementations of GCR. So, in 1983, everyone basically stopped using GCR except in applications where it was worth licensing 8/10 (like DAT tapes). Then, when the patent expired, 8/10 became a sort of de facto standard.
There are other run-length limited codes out there; CDs use one (EFM, eight-to-fourteen modulation), and there are a few applications where the tiny bit of extra efficiency in a 10/12 code (20% instead of 25% overhead) is worth the huge extra complexity. But nowadays, most things are either 8/10, or use Manchester or BMC or one of their variants.