Did someone actually get fired over the "Google Safe" flub yesterday morning?

I put this in GQ just on the off chance we have access to the factual answer to the question.

I also want to discuss the question of whether someone should have gotten fired over it. I’ve seen lots of expressions of certainty that someone did, and even some expressions of hope that they were fired. Myself, I’d be surprised if someone got fired, and my hope is that no one did. It sounds like it was about on the level of a simple keystroke error, the kind of thing that really could happen to any unlucky person. I can’t see how it would be good business to fire someone for something like that. If it were part of a pattern of behavior, sure, but a single unlucky incident? That’s not evidence that the person will tend to make the company lose money in the long run.

-FrL-

Short answer: No. I’d bet the farm on it.

Long answer: With very highly visible companies, it is accepted that mistakes happen. Everyone screws up in some magnificently spectacular way sooner or later. I was told on my first day on the job that I would, someday, bring down an entire product line. shrug

I tried to google “google safe” to figure out what you’re talking about, but only found things about safe search. Wanna expand for the curious?

Someone checked a stray “/” into the file of flagged URLs, and since “/” matches every URL, Google’s Safe Browsing feature flagged every single search result as possible malware.
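To see why one character could take out everything, here’s a minimal sketch of a prefix-matched blacklist. This is hypothetical code, not Google’s actual matching logic, but it shows the failure mode: “/” is a prefix of every URL path.

```python
# Hypothetical sketch: a blacklist checked by URL-path prefix.
# An entry of "/" is a prefix of every path, so everything matches.

def is_flagged(url_path, blacklist):
    """Return True if any blacklist entry is a prefix of url_path."""
    return any(url_path.startswith(entry) for entry in blacklist)

blacklist = ["/badsite/malware.exe", "/phish/login"]
print(is_flagged("/straightdope/faq", blacklist))  # False

blacklist.append("/")  # the fateful one-character entry
print(is_flagged("/straightdope/faq", blacklist))  # True: "/" matches everything
```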

second

not all of us are up-to-date on such things as “news” :wink:

ETA: ah, sorry then, got it in just before my post, hehe

Yeah, very likely not, especially since with vast and complex software systems like Google’s, the error might not be a specific mistake attributable to one person anyway. When Google’s search system finally achieves sentience, it’ll probably still produce errors like this from time to time. And what are you going to do then, fire the entire product? :stuck_out_tongue:

Here’s a link that includes screenshots of Google marking every site as possibly harmful. Here’s Google’s explanation of what happened.

Here’s a quote from the blog…

It’s just the classic “it’s just config! we don’t need QA!” programmer fallacy. Probably the single most common programmer cause of crashing products. :smiley:

I once worked for what used to be a top-five website in terms of traffic, and pushed a change to production that broke one of the primary revenue-generating systems by screwing up a parameter in the links in some advertisements. Then I went on my lunch break.

Suffice it to say, the mistake was quickly realized, and we fixed it, and altered our processes to make sure it wouldn’t happen again. I also ended up owing several people some beer.

People don’t generally get fired for a gigantic screwup unless they were deliberately negligent or not doing what they were supposed to do, or it’s an issue of public safety, or they were looking for an excuse to get rid of that guy anyway.

I guess that’s better than doing it at 4:55.

Y’know, I was trying to figure out why Google flagged a Microsoft knowledgebase page as malware…

I offer this as an example of the same kind of “human error” (translation: didn’t follow procedure) that makes Google’s error look like a burp. I don’t know if anyone was fired over this one.

> On Wednesday night, July 16, [1997,] during the computer-generation of the
> Internet top-level domain zone files, an Ingres database failure resulted
> in corrupt .COM and .NET zone files. Despite alarms raised by Network
> Solutions’ quality assurance schemes, at approximately 2:30 a.m. (Eastern
> Time), a system administrator released the zone file without regenerating
> the file and verifying its integrity. Network Solutions corrected the
> problem and reissued the zone file by 6:30 a.m. (Eastern Time).
>
> Thank you.
> David H. Holtzman
> Sr VP Engineering, Network Solutions
> dholtz@internic.net

Interesting.

What’s a zone file?

-FrL-

A file containing information about zones. Duh.

It’s a bigass list that maps domain names to the addresses of their authoritative nameservers. This famous incident resulted in the nameservers for .com and .net giving the wrong answers to queries for several hours.
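For the curious, a zone file fragment looks roughly like this (made-up names and addresses, in BIND master-file syntax; the real .com zone is the same idea at enormous scale):

```
; Hypothetical fragment of a TLD-style zone file.
; Each NS line delegates a domain to its authoritative nameservers;
; the A lines ("glue" records) give the addresses of those nameservers.
example.com.        IN  NS  ns1.example.com.
example.com.        IN  NS  ns2.example.com.
ns1.example.com.    IN  A   192.0.2.1
ns2.example.com.    IN  A   192.0.2.2
```

Corrupt that file and every lookup that passes through it comes back wrong or not at all, which is what happened in the 1997 incident.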

Why do you do this? I remember when you came on board years ago wanting to help everybody with everything, now you ridicule and belittle people with perfectly valid questions.

Maybe he meant it as a joke but it didn’t read too well—

What’s a lensgear?
It’s a lens with a gear in it, duh.

(I thought he was just kidding)

-FrL-

When I see something like this, it makes me think they have controls problems somewhere. Any change that goes out to the production systems should be reviewed by a second person, just to prevent this kind of mistake. (In my experience, web companies often skimp on this kind of process, right up until the time someone makes a big screw-up.)

My guess as to the failure is that it’s due to this file being huge and having a large number of entries, since it lists every one-off phishing site. So even assuming it ends up being reviewed, the reviewing is something like “oh, 900 new entries, that looks about right”.

Firing someone wouldn’t solve any problems. I would want answers to these questions:

  1. Why wasn’t this caught?
  2. Why is an illogical value allowed in the file?
  3. Why didn’t we have a test case for this?

And then from those answers you would work to a solution.
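Question 2 in particular suggests the cheapest fix: validate the file mechanically before it ships, instead of relying on a human eyeballing 900 new entries. A hypothetical sanity check (sketch only, with made-up rules for what counts as “illogical”) might look like:

```python
# Hypothetical pre-release validation for a URL blacklist file.
# Rejects entries that are obviously illogical, like a bare "/"
# that would match every URL on the web.

def validate_blacklist(entries):
    """Return a list of error strings; an empty list means the file looks sane."""
    errors = []
    for i, entry in enumerate(entries):
        stripped = entry.strip()
        if stripped in ("", "/"):
            errors.append(f"line {i + 1}: entry {entry!r} matches everything")
        elif not stripped.startswith("/") and "://" not in stripped:
            errors.append(f"line {i + 1}: entry {entry!r} is not a path or URL")
    return errors

print(validate_blacklist(["/badsite/malware.exe"]))  # [] -- looks sane
print(validate_blacklist(["/"]))                     # one error: matches everything
```

The point isn’t these exact rules; it’s that a release process which runs *any* such check answers question 1 automatically.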

At the time, Network Solutions was the government-authorized entity responsible for maintaining the master database of all second-level domain names under the .com, .net, .org, and .edu top-level domains. That responsibility included maintaining the root servers. The root servers are the worldwide authoritative source for DNS information for those domains (that is, they translate a domain name into an IP address). When this error occurred, it meant that requests to connect to a URL got a garbled IP address or maybe no IP address (I’m not sure technically what the result was). It basically broke DNS for a few hours, although most networks cache DNS data, so the effect was not a total meltdown. This wasn’t just a glitch in their business model; it showed how Internet DNS had a single point of failure. (I stress the DNS part because the Internet itself doesn’t require DNS; you could still connect if you knew the IP address, but for all practical purposes, for the average guy, this broke the Internet.)
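The caching point is why the outage was patchy rather than total: resolvers hold on to answers for a record’s time-to-live (TTL), and while a cached answer is still fresh they never ask the (possibly broken) authoritative servers at all. A toy cache illustrating the idea (hypothetical, nothing like real resolver code):

```python
import time

# Toy DNS cache keyed by name, storing (address, expiry_time).
# Fresh entries are served without re-querying upstream servers.

class DnsCache:
    def __init__(self):
        self._store = {}

    def put(self, name, address, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._store[name] = (address, now + ttl_seconds)

    def get(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(name)
        if entry and now < entry[1]:
            return entry[0]  # still fresh: served from cache
        return None          # expired or unknown: must re-query upstream

cache = DnsCache()
cache.put("example.com", "192.0.2.1", ttl_seconds=3600, now=0)
print(cache.get("example.com", now=1800))  # 192.0.2.1 (cache hit)
print(cache.get("example.com", now=7200))  # None (TTL expired, would re-query)
```

Any resolver that had cached good .com data before the corrupt zone file went out kept answering correctly until those TTLs ran down.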

BTW, based on my reading of Q.E.D.'s past posts, I think the above remark was meant as an ironic joke, in the sense of, “It’s a file with zones, but nobody knows what that is either.” But sometimes jokes don’t come across in print quite the same way as conceived.