I can’t really comment on the hardware or overall software system that Google uses, since it’s orders of magnitude larger than anything I’ve used, but I have designed and worked with smallish (8 to 20 node) distributed application frameworks, so I’m going to throw my assessment in anyway.
It’s not that difficult from an engineering point of view, but most programmers just aren’t used to doing things massively parallel (even on a single machine with many processors), and unreliable networks and nodes complicate things quite a bit if you want to do it right.
It does require a “paradigm shift” in the way you design your systems. You can’t rely on any complex operation to succeed, fail, or even give you any indication of failure, so you have to rely on timers, over-parallelizing, or other tricks, preferably using some general framework that watches all the tasks being done.
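To illustrate the “over-parallelizing plus timers” trick: one minimal sketch is to launch the same request against several replicas and take whichever answers first, with a timeout as the only failure signal. Everything here (the `flaky_fetch` stand-in, the function names) is hypothetical, not any real framework’s API:

```python
import concurrent.futures
import random
import time

def flaky_fetch(replica_id):
    """Hypothetical stand-in for a remote call with unpredictable latency."""
    time.sleep(random.uniform(0.01, 0.2))
    return f"result-from-{replica_id}"

def first_successful(task, replicas, timeout=2.0):
    """Run the same task on several replicas; return the first answer.

    We never ask a replica whether it failed -- we simply stop waiting
    for it once someone else has answered, or once the timer expires.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(task, r) for r in replicas]
        done, _ = concurrent.futures.wait(
            futures,
            timeout=timeout,
            return_when=concurrent.futures.FIRST_COMPLETED,
        )
        if not done:
            raise TimeoutError("no replica answered in time")
        return next(iter(done)).result()

print(first_successful(flaky_fetch, [1, 2, 3]))
```

The point of the sketch is the shape, not the code: redundancy replaces per-call error handling, and the timer is the only thing that turns silence into a decision.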
Really large-scale distributed operations can only be done using fairly restricted algorithms. For instance, you can’t use any operation that examines every data point on a single node, since you realistically cannot even access all that data from one place.
A relatively easy-to-understand technique (which Google has helped to popularize) is a map/reduce system. Map operations transform each data point into some (or no) output data without taking into account any other data point, so no side effects like counting data points or the like. Reduce operations aggregate a subset of data points (which may themselves be the results of earlier map or reduce operations) into single new data points, also without side effects outside the single bunch of data points that are input to the reduce operation.
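A minimal single-machine sketch of that shape, using the classic word-count example (the function names and data here are mine, purely for illustration, and there’s no actual distribution going on):

```python
from collections import defaultdict

def map_phase(documents, map_fn):
    """Apply map_fn to each record independently -- no shared state,
    so each record could be processed on a different node."""
    intermediate = []
    for doc in documents:
        intermediate.extend(map_fn(doc))
    return intermediate

def reduce_phase(pairs, reduce_fn):
    """Group intermediate (key, value) pairs by key, then reduce each
    group independently -- again, each group could live on its own node."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Word count: map emits (word, 1) pairs, reduce sums the ones per word.
def map_words(doc):
    return [(word, 1) for word in doc.split()]

def sum_counts(word, counts):
    return sum(counts)

docs = ["the quick fox", "the lazy dog", "the fox"]
counts = reduce_phase(map_phase(docs, map_words), sum_counts)
print(counts["the"])  # → 3
```

The “no side effects” restriction is what makes this trivially parallel: because `map_words` looks at one document and `sum_counts` looks at one key’s group, a scheduler can split, replicate, and rerun the pieces anywhere without changing the answer.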
A design like that places a bunch of restrictions on the data model, and also implicitly requires an eventually consistent data model: a new site entered into Google’s search engine will not show up at the next search, but only once all the processing is done.
That said, there are fairly easy-to-understand frameworks that are freely available and do much of the coordination required for building these large systems. Using those, a talented and interested small group of programmers could certainly build a system that is at least comparable at the design level to the Google search or Gmail systems in, say, a year (depending on how complex Google’s ranking system has become). The distributed frameworks themselves could be built by a handful of talented programmers in a year or so, including putting them into production, fixing all the bugs, and making them reasonably easy to maintain.
Replicating all the interesting stuff Google has built, at the scale it runs at, would be a multi-year, multi-billion-dollar project at least, though.
ETA: if you’re a programmer and interested in seeing how map/reduce works, with or without actual distribution, I suggest you take a look at CouchDB.