Large IT companies and their servers (question about data integrity)

Let’s take Google as an example.
I assume that Google has many servers around the world.

When someone accesses their email, their files in Google Drive, or any other information in their Google account, do they always connect to the same server in the same location?

For example, if I reside in Europe, would I always be connecting to the same server? If the answer is no, then how do big IT companies ensure that all of their data centers hold the same information?

I read on Stack Overflow that there are three types of servers: web servers, application servers, and database servers. Is it possible that an IT company has them in different locations but the database server is central? What if the database server is not central and there are many database servers instead? How do they make sure the data on the different servers corresponds, so that it is all the same? Do they use some sort of version control logic for the data, like Git or SVN?

Cloud data is often replicated in many locations or “regions”. Within a region, there will often be many replicas as well. For example, your Gmail data may be replicated 3 times in the Europe region, 3 times in the US region, 3 times in the Asia region, and so on. When you connect, the service will try to get the data from the region closest to you. So if you’re in Europe, it will start looking for your data on Europe server 1. If that server doesn’t respond or doesn’t have it, it will go to Europe server 2. If none of the Europe servers have it, it will look at the servers in the next region (the US). If those servers don’t have it, it will move on to the next region.
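
In rough pseudo-Python, that fallback order looks something like the sketch below. Everything here is made up for illustration (the region names, server names, and the fetch() stub); real systems route requests with load balancers and anycast rather than a literal loop, but the “nearest first, then fall back” idea is the same.

```python
# Illustrative sketch only -- region names, server names, and fetch() are
# all invented. Real routing uses load balancers and anycast, not a loop.

REPLICAS = {
    "europe": ["eu-1", "eu-2", "eu-3"],
    "us":     ["us-1", "us-2", "us-3"],
    "asia":   ["asia-1", "asia-2", "asia-3"],
}
DOWN = {"eu-1"}  # pretend this server is unreachable

def fetch(server, user_id):
    # Stand-in for a network call; None means "no answer from this server".
    if server in DOWN:
        return None
    return f"{user_id}'s mailbox, served by {server}"

def lookup(user_id, regions_nearest_first):
    # Walk regions from nearest to farthest, trying each replica in turn.
    for region in regions_nearest_first:
        for server in REPLICAS[region]:
            data = fetch(server, user_id)
            if data is not None:
                return data
    raise RuntimeError("no replica could serve this request")

print(lookup("alice", ["europe", "us", "asia"]))  # falls back to eu-2
```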

The databases are continually synced up as needed. As things happen to your email (a new email arrives, an email gets deleted, etc.), those actions eventually get propagated to each replica of your data. Sometimes you may hit a replica which is out of sync, in which case what you see may be slightly out of date. This can happen when your primary region is down and you’re accessing the data from a backup region. Eventually things get back in sync once the primary region comes back online.
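
Here’s a toy illustration of that propagation in Python. The Replica class and its inbox are invented for the example (this is not how Gmail actually works), but it shows why a read from a replica that hasn’t applied its pending updates yet can be slightly stale:

```python
# Toy model: an action is applied at one replica first and propagated to
# the others later, so a replica can briefly lag behind.

class Replica:
    def __init__(self, name):
        self.name = name
        self.inbox = []    # updates received but not yet applied
        self.mailbox = []  # the applied state

    def receive(self, action):
        self.inbox.append(action)

    def sync(self):
        # Apply everything that has been propagated to us so far.
        while self.inbox:
            self.mailbox.append(self.inbox.pop(0))

replicas = [Replica("eu-1"), Replica("us-1"), Replica("asia-1")]
primary = replicas[0]

# A new email arrives: the primary applies it, then propagates it.
primary.mailbox.append("new email")
for r in replicas[1:]:
    r.receive("new email")

print(replicas[1].mailbox)  # [] -- stale: update received, not yet applied
replicas[1].sync()
print(replicas[1].mailbox)  # ['new email'] -- now caught up
```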

This is all true, but I just wanted to add that each of the regional mirror “servers”, in the case of something like Google, is actually racks and racks of individual machines all doing the same thing. They then access the database on a SAN (storage area network), which is itself made up of thousands of hard drives, and the whole SAN is ALSO mirrored at different regional sites.

There’s actually an interesting read if you’re so inclined: Google’s Site Reliability Engineering book.

At Google/Amazon/Twitter scale there are lots of replicas of everything, plus systems built around maintaining replication factors for the data, noticing when services go down or come under excessive load, and bringing up new or additional instances to compensate.

Rather than having a computer that is “a web server” or “a database server”, their data centres are a big pile of machines, and the individual services can be deployed to any of them, or moved between them when machines go down or new ones come up.
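
A hedged sketch of that idea in Python: a little “reconcile” loop that compares each service’s target replication factor against the machines actually running it and starts replacements on whatever is free. The names and numbers are invented, and real cluster schedulers (Google’s Borg, Kubernetes, etc.) are enormously more sophisticated, but the core loop has this shape:

```python
# Invented example of a reconcile loop. Machines are generic: any service
# can land on any idle machine, and lost instances get replaced elsewhere.

TARGET_REPLICAS = {"web": 3, "db": 3}

# machine -> service it runs, or None if idle
machines = {f"m{i}": None for i in range(1, 8)}

def healthy_instances(service):
    return [m for m, s in machines.items() if s == service]

def reconcile():
    for service, target in TARGET_REPLICAS.items():
        missing = target - len(healthy_instances(service))
        for _ in range(missing):
            idle = next((m for m, s in machines.items() if s is None), None)
            if idle is None:
                break  # no capacity left
            machines[idle] = service
            print(f"started {service} on {idle}")

reconcile()          # brings each service up to its target of 3 instances
del machines["m1"]   # simulate a machine dying entirely
reconcile()          # the missing instance is restarted on a free machine
```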

Yeah. There’s not one of anything. There/s many of everything and there’s no one pure copy of the “truth”. As long as all the distributed versions are continuously converging towards the future all is well. Even if no two of them are pairwise identical at any given moment.