This is a GQ, but it’s about online gaming, so I figured it belonged in Cafe Society. Mods, feel free to move it around as you please.
Every Tuesday morning, all U.S. World of Warcraft servers are taken down for maintenance for six hours (5:00am to 11:00am Pacific). I’ve spent many years working in computers and networking, and I’ve never seen anything like this. With 187 servers (“Realms” in WoW parlance), why don’t they take them down in rotation, so that players with alts on multiple servers can play pretty much any time?
And what kind of server maintenance takes six hours a week, during which the whole server has to be down? What are they actually DOING?
It’s not so much that they’re doing something as reserving a big chunk of time in all the crack addicts’ lives so that if they do have a five-hour upgrade to install, they can do so without people screaming bloody murder.
Doing rolling updates might be a more customer-service oriented way of doing things, but it would put extra stress on the servers everybody moves to while the update is being performed.
What they’re actually doing: Probably rebooting, clearing off some colossal log files onto DVD backups, installing a zillion fiddly little patches for stuff you’d never notice in game, and doing a detailed scan of the PC data to see if they can tell whether or not people are cheating and, if so, how.
They are also doing the weekly honor calculations during that time. It may be easier to just do it for all servers all at once than to do it piecemeal. And who knows, once the expansion comes out it may not be necessary to take them down for as long, what with the revamping of the whole honor system.
Personally, (and I say this since this isn’t GQ) I prefer this to the CoH system of being down for an hour every single day. Once a week on Wednesday, during work hours, just ain’t bad.
Don’t count on it. They’ve been doing this for as long as the game has been live, long before the honour system was put into place. I think they just use the downtime as a good time to insert things like that–all patches are released on Tuesdays as well.
I suspect the main reason is what Ethilrist said–if they do rolling restarts then people who would normally play on a server that is down will create an alt on one that’s up, causing extra stress. You see this already on days when not all the servers come up at the same time.
I think one of the reasons they don’t do rolling updates is that whenever a server goes down unexpectedly, players on that server immediately create new characters on other servers and play them until their server is back up again.
So if you had rolling updates, you’d constantly be having new characters created on new servers, filling up those server quotas yet potentially only ever being used for 3 hours.
It’s not about the server load on the server they’ve hopped on to, but the limit to the number of characters that can play on a server before they have to open a new server.
So if people are constantly creating new characters, they’ll be constantly opening new servers which is going to impact on their profit (if the existing servers are not being fully utilised due to people starting and not continuing with a character).
Best answer yet, although lisacurl’s is good, too.
I see Ethilrist’s point about reserving the time for when they do need a 5-hour chunk (it’s been 6 hours lately, BTW), but that just seems like an excessive amount of time. If the patches have been tested properly, they could be installed on a mirror and switched in a few minutes. A restart of the WoW server software probably takes long enough to swap out the log files while it’s happening (create new files, swap the pointers, and then back up the old ones at your leisure). I fail to understand why the honor calculations and cheat detection can’t happen while the server is up.
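The log swap described above (create a new file, repoint the writer, back up the old file later) can be sketched roughly like this. It is only an illustration of the general technique, not anything known about Blizzard's setup; `RotatingLog`, `realm.log` and the timestamp suffix are made-up names for the example.

```python
import os
import tempfile
import time


class RotatingLog:
    """Append-only log whose file can be swapped out without stopping the writer."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def write(self, line):
        self._fh.write(line + "\n")
        self._fh.flush()

    def rotate(self):
        """Move the current file aside, start a fresh one, return the old path."""
        self._fh.close()
        archived = f"{self.path}.{time.time_ns()}"  # hypothetical naming scheme
        os.rename(self.path, archived)
        self._fh = open(self.path, "a", encoding="utf-8")
        return archived  # can be backed up whenever convenient

    def close(self):
        self._fh.close()


# Demo in a throwaway directory.
workdir = tempfile.mkdtemp()
log = RotatingLog(os.path.join(workdir, "realm.log"))
log.write("before rotation")
old_path = log.rotate()
log.write("after rotation")
log.close()
```

The writer is only interrupted for the time it takes to rename one file and open another; copying `old_path` off to backup media happens afterwards, with the server running.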
As for increased loads during rotating server outages, I don’t buy it. There were 187 servers in the U.S. last time I counted. There are 168 hours in a week. Even if the maintenance needs to be 5 hours (which I don’t believe), then they’d never need to have more than 5 or 6 servers down at a time. If every single person who was going to play on the downed servers switched to another, it would be an increased load of about 3%. They certainly ought to have an extra 3% load capacity.
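The back-of-envelope numbers above can be checked with a few lines. This assumes, as the post does, that outages are spread evenly over all 168 hours of the week and that every displaced player moves to a realm that is still up; `rolling_downtime` is just a name for the example.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def rolling_downtime(realms, hours_each):
    """Return (average realms down at once, fractional extra load on the rest)."""
    realm_hours = realms * hours_each          # total downtime to schedule per week
    avg_down = realm_hours / HOURS_PER_WEEK    # spread evenly around the clock
    extra_load = avg_down / (realms - avg_down)  # displaced players / realms still up
    return avg_down, extra_load


avg_down, extra_load = rolling_downtime(realms=187, hours_each=5)
print(f"{avg_down:.2f} realms down on average, {extra_load:.1%} extra load")
```

With 187 realms and 5 hours each, that works out to between 5 and 6 realms down at any moment and roughly 3% extra load on the rest.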
It’s even weirder that they do it all in one chunk because of just how many physical boxes we’re probably talking about: You enter an instance, you are jumping onto a different server. You go from Kalimdor to the Eastern Kingdoms, you are on a different server. You enter a battleground, you are on a different server.
I have no idea how many servers there are per realm, but I’m fairly confident that it’s at least “several.”
What I can’t understand is how it isn’t actually more difficult for Blizzard to get all their maintenance done on so many servers in the same damn maintenance window.
Which leads me, in a roundabout fashion, to suspect that the amount of “maintenance” that blizz does in these gaps is probably minimal. What they probably do is swap the main server for the mirror, take the mirror offline, and do maintenance on THAT. It may even be that they have three sets of hardware per realm and rotate them.
That’s a bit of a bad way of looking at it. They’re not going to have servers going down in the evening if they can avoid it, because that’s prime playing time. The reason they do it Tuesday mornings is because there aren’t very many people playing at that time of day. If even one server went down during the evenings (this could screw raiding guilds a bit–even if it’s the same night every week that’s one less night available to raid, and some nights are better than others depending on people’s schedules) Blizzard would get an earful. It would probably be better to think of it in terms of there being seven days in a week, and that’s assuming they would take a server offline on a weekend. Although you do have a point: in the mornings the load is light enough that the other servers could probably handle it.
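Redoing the arithmetic with that restriction changes the picture quite a bit. The sketch below assumes one off-peak morning window per day, each realm maintained once a week in a single window, and the same worst case where every displaced player hops to another realm; the function names are made up for the example.

```python
import math


def realms_per_window(realms, windows_per_week):
    """Realms that must share the busiest window if each is done once a week."""
    return math.ceil(realms / windows_per_week)


def extra_load(realms, down):
    """Fractional extra load on the remaining realms if everyone displaced moves."""
    return down / (realms - down)


for windows in (7, 5):  # every morning vs. weekday mornings only
    down = realms_per_window(187, windows)
    print(f"{windows} windows: {down} realms down, {extra_load(187, down):.0%} extra load")
```

With seven morning windows that is 27 realms down at once (roughly 17% extra load in the worst case); with weekdays only it is 38 realms and about a quarter extra. Since few people play in the mornings, the real extra load would be well below that worst case.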
From what I know of clustering (which isn’t much) this sounds like a fairly reasonable explanation.
Dropping a set of servers from a cluster would just reduce capacity a bit and could be done during off-peak hours. Of course, loading a patch and having it propagate across all servers in probably hundreds of clusters could take a while. In addition, I’m sure that there is a wide variety of hardware involved and it gives them some time to resolve any quirks that develop secondary to any hardware incompatibilities that may crop up during the patching.
This might actually help to explain it, rather than make it more perplexing.
If they set it up so that servers specialize in, say, a certain group of instances, then that server could be running all instances for multiple realms. In that case, in order to take down that server, you have to disable multiple realms at the same time. It’s probably easiest just to take the whole network down in that case.
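To make that concrete, here is a toy model of the idea. The host names, realm names and layout are entirely hypothetical (nobody in the thread knows the real topology); the point is only that taking down one shared box touches every realm it serves.

```python
# Hypothetical layout: which realms each shared instance host serves.
instance_hosts = {
    "instance-host-a": {"Realm 1", "Realm 2", "Realm 3"},
    "instance-host-b": {"Realm 3", "Realm 4"},
}


def realms_affected(hosts_down, layout):
    """Return every realm that loses some instances when these hosts go offline."""
    affected = set()
    for host in hosts_down:
        affected |= layout[host]
    return affected


print(sorted(realms_affected(["instance-host-a"], instance_hosts)))
```

In this made-up layout, maintaining a single host already disrupts three realms, and doing both hosts disrupts all four, which is the kind of coupling that would make a per-realm rotation awkward.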
It’s probably technically feasible for them to create a rolling system, and all that - but if they’re only disrupting a small number of people who play on Tuesday mornings (they probably did a statistical analysis to figure out the least disruptive time), they’re just going with the easy thing that affects few people and offers them flexibility in the future, rather than working out a complex rolling restart system that would benefit a small group of people.
I started playing WoW in July, after becoming bored with CoH. Exactly one week after I started playing, my work started giving me Tuesdays off. I was so damn annoyed. All I wanted to do was spend the time trying to catch up to my boyfriend since he gets to play while I’m at work.
Just recently, I started getting mostly Mondays and Thursdays off. I don’t have to deal with the stupid maintenance any more. BUT, my condo association has a set schedule for doing lawn maintenance and Monday is my turn for mowing. If it’s not one thing, it’s another. I just can’t win.
No idea how many “servers” make up a “realm,” but in EverQuest it was along the lines of 70 or more. Each individual server only controlled 3 or so zones.
I would guess that with WoW it’s a hell of a lot less, considering the basic differences in the 2 games. I wouldn’t be surprised to learn that a WoW realm was only a few linked servers; under 10 would be my bet.