How do multi-lane serial links avoid the problems with parallel links?

The trend in computing recently seems to be towards serial interfaces (like PCI Express replacing PCI, or SATA and SAS replacing EIDE and SCSI). However, in PCI Express multiple serial lanes can be aggregated together to increase the bandwidth (so an x8 PCIe card uses 8 PCIe lanes), and XAUI specifies 4 lanes, each capable of doing 3.125 Gb/s. What’s the difference between a parallel interface and multiple serial interfaces being used in parallel? How does the multi-lane serial interface avoid the pitfalls that plague parallel interfaces?

Also, RAM does not seem to be following the trend towards serial communication – DDR3 still uses a 64-bit wide parallel bus. Why is that?

For the most part, SATA and SAS solutions are combating a problem with cabling, not a problem with the bus. When you use connectors and relatively long lengths of cable (greater than 3-4 inches), inter-wire interference becomes an issue at high frequencies. If you do not have to worry about that interference, you can use a far simpler cable and still obtain much higher speeds.

Parallel interference problems do affect motherboard designs as well (the layout of boards with the original PCI spec was a nightmare due to reflections and signal degradation).
RAM can still see these issues, but as most boards are designed with very tight coupling between the RAM and the CPU/memory controller, it’s not so bad.

High-speed serial interfaces also use differential signalling, as opposed to the single-ended signalling used on most parallel busses. Differential signalling, which uses a pair of conductors for each channel, is much less susceptible to degradation, impressed noise, and cross-coupling interference than single-ended signalling.
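As a rough illustration of why that helps, here is a minimal Python sketch (the voltage levels and noise amplitude are made-up numbers, not from any standard): the interference couples onto both wires of the pair as common-mode noise and cancels when the receiver subtracts them.

```python
# Illustrative only: common-mode noise cancels out when the receiver
# takes the difference between the two wires of a differential pair.
import random

def receive_single_ended(bit, noise):
    # One wire referenced to ground: the noise adds directly to the signal.
    level = 1.0 if bit else 0.0
    return level + noise

def receive_differential(bit, noise):
    # Two wires driven in opposite directions; the same noise couples onto
    # both and is rejected by the subtraction at the receiver.
    p = (0.5 if bit else -0.5) + noise
    n = (-0.5 if bit else 0.5) + noise
    return p - n  # +1.0 for a 1, -1.0 for a 0, noise cancelled

for bit in (0, 1):
    noise = random.uniform(-0.4, 0.4)   # coupled interference
    print(bit,
          round(receive_single_ended(bit, noise), 2),
          round(receive_differential(bit, noise), 2))
```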

The critical issue is clock skew. When you signal down a parallel bus, the signal on each wire must stabilise before you can sample the entire set of signals. For very fast signals, even very slight differences in wire length and environment (especially capacitive coupling) make the differences in signal propagation delay significant. This places an upper limit on the clocking speed of the bus.
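To make that limit concrete, a back-of-the-envelope sketch in Python (the skew and margin figures are assumed for illustration, not taken from any real bus specification):

```python
# If the slowest and fastest wires of a parallel bus can differ by `skew`
# seconds, the bit period must be comfortably longer than that skew, or the
# sampled word will mix old and new bits from different wires.
skew = 500e-12          # 500 ps worst-case wire-to-wire skew (assumed)
margin = 0.25           # fraction of the bit period we allow skew to consume
max_bit_period = skew / margin
max_clock = 1.0 / max_bit_period
print(f"max parallel clock ~ {max_clock / 1e6:.0f} MHz")   # ~ 500 MHz
```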

A serial bus does not have this problem - with a single wire there is nothing to be skewed against.

This produces a simple set of tradeoffs. A parallel bus can use more wires, so its bandwidth is multiplied by the number of wires, but a serial bus can clock much faster. Where you can control the physical wiring very tightly - for instance on a printed circuit board - very fast parallel busses still work, for instance a CPU talking to memory or to peripheral interface controllers (like a PCIe controller). But once you get into longer lengths and things like flexible cables, it gets too difficult and serial starts to win out.
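A rough comparison of the two approaches, with purely illustrative numbers (the serial figure is a raw line rate before coding overhead, not a payload rate):

```python
# Illustrative tradeoff: a wide, skew-limited parallel bus vs. a few fast
# serial lanes.  Neither set of numbers is taken from a real spec sheet.
parallel_width = 64            # wires
parallel_clock = 200e6         # Hz, limited by skew across all 64 wires
serial_lanes = 4
serial_rate = 3.125e9          # bits/s per lane, raw line rate

print("parallel:", parallel_width * parallel_clock / 1e9, "Gb/s")   # 12.8
print("serial:  ", serial_lanes * serial_rate / 1e9, "Gb/s")        # 12.5
```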

This is all independent of the underlying signalling mechanism. There are parallel busses that use differential signalling, SCSI-3 for instance. The usual form is LVDS (low voltage differential signalling). There are also many busses for which the higher level protocol may be reticulated over a number of physical busses. Myrinet, for instance, was available over both parallel and serial links. SCSI is reticulated over Fibre Channel. Serial busses will often be more than a single signal wire - many carry the clock signal as well, and may actually use up to 8 wires (4 to send, 4 to receive, with each set of 4 made up of 2 differential pairs, one for the signal and one for the clock). Often the signal and clock are sent as a logical combination on each of the pairs - something that improves noise immunity.
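One classic example of sending clock and data as a logical combination on the same pair is Manchester encoding; the answer does not name a specific scheme, so this Python sketch is only an illustration (real high-speed links more often use codes like 8b/10b or 64b/66b):

```python
# Manchester encoding: each bit cell carries clock XOR data, so the line
# transitions in the middle of every cell and the receiver can recover the
# clock from the data stream itself.
def manchester_encode(bits):
    out = []
    for b in bits:
        out += [b ^ 1, b ^ 0]   # first half = inverted data, second half = data
    return out

def manchester_decode(halves):
    # The second half of each bit cell carries the data value.
    return [halves[i + 1] for i in range(0, len(halves), 2)]

data = [1, 0, 1, 1, 0, 0, 1]
line = manchester_encode(data)
assert manchester_decode(line) == data
print(line)
```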

Since everyone else has ignored this and only explained why serial links have regained popularity:

The answer is that multi-lane serial links stripe bytes across the lanes at the sender and buffer them back into contiguous memory at the receiver (as appropriate). This maintains the resistance to propagation time differences, while increasing bandwidth.
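A minimal Python sketch of the idea (round-robin byte striping; real links such as PCIe add framing, scrambling, and per-lane elastic buffering that are omitted here):

```python
# Stripe bytes round-robin across N lanes at the sender, then reassemble
# them into a contiguous buffer at the receiver.  Because each lane is a
# self-clocked serial stream, skew between lanes only has to be absorbed
# once per buffered symbol, not on every bit.
from itertools import zip_longest

def stripe(data: bytes, lanes: int):
    # Lane i gets bytes i, i+lanes, i+2*lanes, ...
    return [data[i::lanes] for i in range(lanes)]

def unstripe(lane_streams):
    out = bytearray()
    for group in zip_longest(*lane_streams, fillvalue=None):
        out.extend(b for b in group if b is not None)
    return bytes(out)

payload = b"multi-lane serial link"
streams = stripe(payload, 4)
assert unstripe(streams) == payload
print(streams)
```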

Ah, ok. So basically they make clock skew manageable by only requiring the different lanes to synchronize once every n bits instead of for every bit.