Bad middle-of-night performance now

I was happy to see this brought up, but dismayed to see it turn into a flamefest by certain individuals. But I want to throw my data in so the admins can see what we’re talking about. I’m a reasonably good network troubleshooter, and I know how to tell when the problem is the site I’m trying to get to versus something in between.

All times are in Board Time.

The facts:

Before the Magical Weekend Reconfiguration which made everything faster, I had worked out that midday was pretty useless, but if I got on after 2am, it was usually fine. The one exception: if I tried a request just after 3:30am, I usually got the message “the administrator has closed the board”, which only lasted 10-20 minutes. (Note: not until 6am as jdavis indicated. It may in fact be that a backup is happening until 6am, but it’s never had an impact on the board before. Normally at 3:30, I’d just go browse memepool.com for 10 minutes and come back and continue reading SDMB.)

After the Magical Weekend Reconfiguration, I was delighted to note that the daytime performance was a bazillion times better, but I’m still in the habit of reading it at night, and the nighttime performance has gone completely to hell. I get instant responses from http://www.straightdope.com, but the response from boards.straightdope.com is even more unbearable than daytime performance used to be. I will point out that the routes to these two machines are identical, and they’re on the same network.

When I say “unbearable”, I mean I can telnet to port 80, type “GET /sdmb/ HTTP/1.0”, hit return twice, and wait 20 minutes for any text to come back, and even then, I get a few lines of HTML and then it goes to sleep again for a while. (The “telnet to port 80 and do a request by hand” technique is useful for ruling out browser configuration issues on the client end.)
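For anyone who wants to repeat that test without typing into telnet, here is a minimal Python sketch of the same idea. It sends exactly the request quoted above and times how long the first bytes take to come back. The function names and the 30-minute default timeout are illustrative assumptions, not anything the board or its admins provide.

```python
import socket
import time


def build_request(path: str = "/sdmb/") -> bytes:
    # Same text as typed by hand: the request line, then return twice.
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def time_first_byte(host: str, port: int = 80, path: str = "/sdmb/",
                    timeout: float = 1800.0):
    """Return (seconds to connect, seconds waited for first data, first chunk).

    The timeout is an assumed value, long enough to sit through a 20-minute
    stall. A timeout raises an exception; an empty chunk means the server
    closed the connection without sending anything.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connected = time.monotonic()
        sock.sendall(build_request(path))
        chunk = sock.recv(4096)  # blocks until some data arrives
        first_data = time.monotonic()
    return connected - start, first_data - connected, chunk
```

Calling `time_first_byte("boards.straightdope.com")` and then `time_first_byte("www.straightdope.com", path="/")` gives two sets of numbers to compare. A fast connect followed by a long wait for data would point at the server rather than the network in between.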

I was hoping that this was part of some experimentation that the admins were doing post-reconfig, and that it would go away after a few days, but that doesn’t seem to be the case.

We’re in no position to demand that this be fixed, but if you actually want to address the problem, I want to help you understand the trouble I’m having. Blaming network congestion in this case is not the right thing.

Hope that helps.

I have to agree, this AM (8/4) from about 3:40AM onwards, the board was completely inaccessible. Every time I tried to connect, it would hang for about 10 minutes and then I get a “connection refused” message. And I’m still getting them occasionally today mid-day. Something is broken.
BTW, am I the only one who doesn’t see any difference in performance during the day? It seems as sluggish as ever.

Oh good…I thought it was just me. That’s why I popped in to see if anything is going on or to see if maybe a recent download of windows 5.5 was causing the slowdown. I seem to have problems day and night with speed.

Something may be wrong with the backup process. It’s not going through in the automated fashion as it should. Network traffic is very bad as well during the early morning hours and it could be related to some synchronization issue with the backup. This morning the outward bound bandwidth was essentially flatlined from 03:30 until 09:00. Do I know why? No. Will I look into it? Yeah, eventually… I also have 100 other things I should be doing as well and SDMB performance is never very high on the list. Sorry but that’s just the priorities one can expect when dealing with a free community resource.

The “fix it! I’m not going to stand for this” approach that was displayed yesterday will just make me less likely to do anything about it because I’m a stubborn bastard and instead of hopping to when somebody gets in my face I slow down on purpose just to piss them off. Bad character trait I know but we all have to manage to get by in this world and this is how I personally deal with confrontation.

Please note the following is not directed towards anyone in this thread but is meant to explain what I’m talking about:

First off, I never said it was a virus or that it wasn’t something local to the web server. Anything is possible and I’ve learned that definitive statements just get you in trouble. That being the case I’m even open to the possibility that a small tribe of garden gnomes have taken up residence in the server and are using it either to contact alien life forms or perform arcane rituals in the early morning hours. :wink: If so, I’ll attempt to negotiate with them…

To clarify my position since it keeps getting misinterpreted, the web server does normally perform a backup between 03:30 and 06:00. The boards themselves are taken offline from about 03:30 to about 03:45 to allow the automated process to dump the contents of the database without it being modified at the same time. Once the dump is complete the boards are brought back online and the backup to tape process begins. This occurs in the background for multiple hours and so when I say the board is backing up at 05:00 it is in fact backing up. I don’t want to quibble but I know what I’m stating is correct. It’s that people cannot see it and thus assume it’s not occurring.

Anyway…yes, I do understand that something out of the ordinary is happening and while the backups in the past did not affect performance (as much) something does seem to be affecting performance now.

Wasn’t complaining, just relieved it wasn’t on my end.

jdavis, if you’ll post a mailing address, I’ll send you a sixpack of the beer of your choice to help you ignore a certain ignorant [sub]but according to him, a highly intelligent[/sub] poster.

Of course, I suppose if all the people who appreciate your gallant efforts sent you a sixpack, we wouldn’t hear from you or be able to read the boards until you sobered up.

I notice the board is timing out a lot when ping’ed. It’s possible I screwed the data cable up on Wednesday when I had the server on its side to install more memory. When I righted it I found the cable to be loose because it had unfortunately gotten stretched in the process. My bad. Would that explain the performance problem during the early morning hours? Unlikely…but the board is sluggish on a Saturday which is inconsistent with it having been smoking at times during a weekday.

I’ll mosey down to the office and fiddle around with it. Expect the board to be down for a few minutes over the next few hours.

Jerry

What a guy! Promise him a sixpack and he goes to work on a Saturday! Itellyathatsdedication.

Hey, Jer! I got you a sixpack, and tried posting to tell you early this morning. Had one of your beers while waiting for page to load. Urp! (Well, maybe just one more, Jerry won’t notice).

Anyway, board is flying right now but I will have to owe you the beer. Thanks for being a mensch.

jdavis - Thanks for filling us in.

Since my usual browsing hours are 1:00 am to 4:30 am Chicago Reader time, I too have been banging my head against the wall these last few days.

Glad to know it’s not just me.

Noticed that the server was experiencing about 30% packet loss when pinging our router. Saw packet loss from home to the server earlier in the day. Surmised it might be the network cable and was prepared to swap it out after doing some tests.

First test was to perform a cold reboot of the server and check for packet loss again. After the reboot the packet loss is at 0%. Data cable? Unlikely. Probably something else is going on but for the time being the board is once again zipping along and the packet loss anomaly is gone. I’ll watch it occasionally over the rest of the weekend and see if the sluggishness and packet loss come back.

Jerry
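For readers who want to put a number on packet loss the way Jerry describes, here is a rough Python sketch. It is not necessarily how he measured it. It assumes a Unix-style `ping` that accepts `-c` for the probe count and prints a summary line of the form "N packets transmitted, M received"; the function names are made up for illustration.

```python
import re
import subprocess

# Matches the summary line printed by common Unix ping implementations.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_percent(ping_output: str) -> float:
    """Percent of probes that got no reply, taken from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


def measure_loss(host: str, count: int = 20) -> float:
    # Assumption: -c sets the number of probes (Linux, macOS, and the BSDs).
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True)
    return loss_percent(result.stdout)
```

Running `measure_loss` before and after a change such as the reboot gives two comparable figures, which is more useful than a general impression that the board "feels faster".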

Judging from the lightning fast speed of the board in the last fifteen minutes jdavis musta jiggled that wire just right.

jdavis, I’m sure I can speak for the vast majority of the posters in saying this: we understand the situation, and we appreciate all of the work you have done and will continue to do.

Just so’s you know, overnight US time = prime time evening Australia time.

3.30am (when the boards do that downtime thing) is 7.30pm AUS time.

Kind of sucky, but we just gotta tolerate it.

When working correctly the board should go offline between 03:30 and about 03:45 every night. From 04:00 to about 06:00 a backup to a remote LAN tape drive should be performed. It needs to copy/verify about 1.7 GB of data. Beyond the 03:30 to 03:45 downtime the effect of the backup on performance shouldn’t be that bad. It’s a lot of data that has to be pumped over the network but even when I run it at primetime it doesn’t kill the board’s performance, just reduces it.

Not being on the boards that often at 04:00 I can’t really say what the performance is normally like but having to wait 20 minutes or timing out completely is pointing to some other problem. I’ll eventually figure out what is going on and hopefully get you back to your normal response times.

Jerry
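To get a feel for how much network load the backup Jerry describes would add, here is a back-of-the-envelope calculation in Python. It only uses the figures from his post (about 1.7 GB, 04:00 to about 06:00). Two things are assumptions: that 1 GB means 10**9 bytes, and that the verify step sends all the data over the network a second time.

```python
def required_mbit_per_s(gigabytes: float, hours: float, passes: int = 1) -> float:
    """Average network rate needed to move the data within the window, in Mbit/s."""
    bits = gigabytes * 1e9 * 8 * passes  # assumes 1 GB = 10**9 bytes
    seconds = hours * 3600
    return bits / seconds / 1e6


copy_only = required_mbit_per_s(1.7, 2.0)           # one pass over the wire
copy_and_verify = required_mbit_per_s(1.7, 2.0, 2)  # assumes verify re-sends everything
print(f"{copy_only:.2f} Mbit/s copy only, {copy_and_verify:.2f} Mbit/s with verify")
```

That works out to roughly 1.9 Mbit/s for a single pass and about 3.8 Mbit/s if verify doubles the traffic. Either figure is an average over the two hours; what it means for the board depends on the capacity of the link to the tape drive, which the thread does not state.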

Sounds like the timeouts and failure to connect are more related to the packet loss. The “Connection Refused” msg is just Netscape’s braindead way of saying it couldn’t connect in general. It also makes sense because I got a lot of pages that loaded slowly, drew halfway and then died.
But apparently that situation is solved for now. Let us hope you have not discovered “The Intermittent Law” which states “Intermittent problems disappear whenever the technician appears, and reappear when he disappears.” This phenomenon is also known as “random success” (as opposed to a random failure).
Thanks for all your work on the systems. I’m sure glad it’s not me that has to keep SDMB running.

Garden gnome negotiation hint: they love peanut butter. This makes it easy to tell if you have an infestation because, if you do, you’ll find little sticky peanut butter handprints around the place. It also means you can use pb+j sandwiches to tempt the little suckers out. Use a trap that won’t hurt them since, once domesticated, gnomes make great household staff (garden gnomes are already halfway tame, so you’ve got a head start there).

Sincerely, thank you for all the work you do to keep the boards running. It’s a great place to hang out :slight_smile:

Thanks, Jerry. We appreciate your efforts. I hope you didn’t take my statement about the backup from 3:30 to 6:00 as implying that you were wrong about the backup happening. I merely meant that any maintenance that was happening previously only had a noticeable impact for about 15 minutes, regardless of whether or not it actually took 3 hours.

If you want some more data gathered while the problem is occurring (i.e. side-by-side pings of the last router and boards.straightdope.com, or traceroutes), I’ll be happy to oblige.
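A side-by-side ping of that kind could be sketched in Python roughly as follows. This is only an illustration: it assumes a Unix-style `ping` that accepts `-c`, and the function names are invented for the example. The point is that both targets are probed over the same time window, so their results can be compared directly.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ping(host: str, count: int = 20) -> str:
    # Assumption: -c sets the number of probes (Linux, macOS, and the BSDs).
    result = subprocess.run(["ping", "-c", str(count), host],
                            capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    return "\n".join(lines[-2:])  # tail of the output, where ping prints its summary


def side_by_side(hosts, probe=run_ping):
    """Probe every host over the same time window; return {host: summary}."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(zip(hosts, pool.map(probe, hosts)))
```

Passing `boards.straightdope.com` together with the address of the last router before it (taken from a traceroute, since it is not given in this thread) would show whether loss appears only on the final hop to the server.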

One more poster that’s firmly on your side, jdavis. You may be interested to know that I’ve had much more difficulty accessing the boards lately on Netscape. On IE, it works great. Maybe it’s just me…

HEY, YOU FIXED IT!!!

Here it is 2:55 and the board is back up and running good.

Thank you Thank you Thank you.

It’s real hard to make it through a night shift with the SDMB.

Oh, did I mention…Thank you

with = without