[list="1"][li]Sorry, Opal![/li][li]Sorry, Opal![/li][li]Hi, Opal![/li][/list]
Would I get off the hook if I let you share Tuba’s chocolate?
I guess I just can’t shut up on this topic, so here are some rambling comments/observations.
[generic admin/tech hat on]
First, from boardreader.com’s submit site FAQ:
Therefore, if Tuba didn’t know about it, some non-official SDMB person must have submitted it.
However, it looks like boardreader.com does not verify WHO submits WHAT before spidering, so anyone could have done it. (Maybe a little unethical, or maybe someone thought they were doing a good thing as a favor to the mods.)
I just submitted my own Board to them, and before I had time to open a new window and run a search to test, they had already spidered the site!
But the initial spidering was only topic headers. This explains why, checking my server stats, I found only 9 accesses from boardreader today. That about matches the number of forums, so I imagine they read each forum list once, but went no further. I haven’t checked to see if they dipped into the forum archives yet.
It looks like boardreader indexes deeper than just headers for other sites, though, so my guess is they go back and do it again more thoroughly later.
I certainly hope that their spidering schedule is very low-key and intelligent. The last thing any admin wants to see is 20% of their stats full of robots on a regular basis. Maybe this is configurable on request?
[generic admin/tech hat off]
[user hat on]
From a user’s point of view, it is a terrific feature to be able to search multiple forums/boards at once! I may get flak for this statement, but BBSes and Usenet are dead. I support a bunch of clients, mostly computer newbies, and they have never heard of Usenet. I think this is the long-term trend, as web forums take their place. Conventional search engines index Usenet, but not all of them spider the boards, although I see no reason why they can’t.
[user hat off]
I hope there is something useful in here.
Correction: The initial spidering was only topic headers and user names. Still, they managed to do it in one sweep. Doesn’t sound like much, but there’s a lot of poorly-written software out there that would have taken 2 * 10[sup]23[/sup] requests to do the same thing.
boardreader is a nice service…
but,
Musicat, you’re saying there are 2*10[sup]23[/sup] requests every time??!
Did you notice any slowdown in site performance, in terms of bandwidth or system resource utilization?
I wonder if they just hit only thread pages at a time…
No, Musicat is saying that some poorly designed programs would take a great many requests (I presume that 2*10[sup]23[/sup] was an exaggeration), but that Boardreader appears (to Musicat, at least) to be a well-designed program which does not cause too many problems.
Any updates on that, Musicat? Have they finished their sweep of your site yet, and if so, how much of a hit was it?
Haven’t heard back from the Reader (they’ve been frying other fish, I do believe), I’ll ask 'em again.
And this week I’ll go look at the site to see what we can retrieve, I’d like to do that if we can. No promises, though.
your humble TubaDiva
Administrator
How are you guys finding old posts? When I search all I get are links back to this board, complete with missing posts (if that makes any sense).
If the thread was lost on straightdope, it should be archived on boardreader.com.
Maybe we don’t need to rescue all 987 LOTR threads, but I just printed out several threads I’ve thought about since the crash.
Remember all of the threads on the “Who Wrote the Bible” Staff Reports? They’re all there. So are all of the CCC threads. Even if we can’t justify recapturing MPSIMS, maybe we can make a case for the column-related stuff.
This is a good thing.
I understand that it should be archived there. But how do I access their archives is the question. When I perform the search all the search results point back here which obviously does no good.
eris: Here’s how I found the “Who wrote the Bible threads.” I logged onto boardreader.com. At the top of the page is a search box. I typed in: Dex Euty Who Wrote The Bible Part. I didn’t specify SDMB or any special search. The result is a screen of links to about 10 or so threads, all with the titles I remember seeing in CSR. When clicked on, the text of the old threads is there (minus links and some coding).
I’m gonna try a link direct to one of the archived threads.
Works in preview.
Right you are, Sir. 2*10[sup]23[/sup] was intended to be an exaggeration and should have been followed by the ubiquitous smiley.
My initial enthusiasm might have been premature. Searching for random data (message headers only) from my own Board on Boardreader turned up NOTHING! Perhaps they deleted it from their spidering list? I didn’t get any notices from them.
I’ll try submitting it again and see what happens.
Uh, Tasmisr, please read my lengthy post carefully, as I think it answers your questions.
And you might not be aware that the SDMB sends out an email message to all those who posted in a thread whenever a new post happens. So emailing me about your post isn’t really needed; your email just shows up alongside the automated SDMB ones.
I submitted my site once again to Boardreader and have configured a stats watch to keep an eye on their activity, assuming it comes from boardreader.com. We’ll see what happens over the next few days.
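For anyone who wants to set up a similar stats watch, here is a minimal sketch in Python. It assumes the server writes Common or Combined Log Format lines and that the crawler can be recognized by the string "boardreader" somewhere in the line (host name or user agent). Both are assumptions, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Common/Combined Log Format: host ident user [dd/Mon/yyyy:time zone] "request" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "([^"]*)"')

def crawler_hits_per_day(lines, marker="boardreader"):
    """Count log lines per day that mention `marker` (case-insensitive).

    `marker` is an assumed way of spotting the crawler; adjust it to
    whatever the spider actually shows up as in your own logs.
    """
    hits = Counter()
    for line in lines:
        if marker.lower() not in line.lower():
            continue  # not from the crawler we are watching
        match = LOG_LINE.match(line)
        if match:
            hits[match.group(2)] += 1  # group 2 is the dd/Mon/yyyy date
    return hits
```

Feeding it an open log file, e.g. `crawler_hits_per_day(open("access.log"))` with whatever path your server uses, gives a per-day count that makes a sudden jump in robot traffic easy to spot.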
(I didn’t want to struggle with our board’s long-link-text-truncating feature, so my ‘links’ are just blue. Is there a way to avoid it?)
I repeated your search query. Clicking on “Who wrote the Bible? (Part 1)” (link http://www.boardreader.com/scripts/texis.exe/search/+txDde9Ymw/redirect.html?query=Dex+Euty+Who+Wrote+The+Bible+Part&ttype=research&postid=3ced7bcc120&posturl=http%3A//boards.straightdope.com/sdmb/showthread.php%3Fthreadid%3D108842%26highlight%3DDex%2BEuty%2BWho%2BWrote%2BThe%2BBible%2BPart)
redirects to http://www.boardreader.com/scripts/texis.exe/viewthread?query=Dex+Euty+Who+Wrote+The+Bible+Part&ttype=research&postid=3ced7bcc90
and it indeed displays their cached version.
Clicking on “Who wrote the “Who Wrote the Bible” column -OR- Where’s Cecil?”, however and for instance, redirects to http://boards.straightdope.com/sdmb/showthread.php?threadid=108842&highlight=Dex+Euty+Who+Wrote+The+Bible+Part, our very own site.
So their giving you a link doesn’t necessarily mean they will redirect you to their cache when you click on it, and you have no way of finding out beforehand. Strange. Maybe they don’t keep all of the threads they indexed in their cache for us to read?
But… taking their postID out of the above non-caching link and feeding them something like http://www.boardreader.com/scripts/texis.exe/viewthread?postid=3ced7bcc120 makes them display their version. So they do have it cached. Did I mention that’s strange?
I don’t see an easy way of getting a list of what they actually have, short of asking them directly.
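The postid trick can be done mechanically. This is a rough Python sketch that pulls the `postid` parameter out of a search-result link and builds the matching `viewthread` address. It relies only on the URL patterns quoted in this post; whether every postid resolves to a cached copy is an assumption, and the function name is made up.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Base address taken from the viewthread links quoted in this post.
VIEWTHREAD = "http://www.boardreader.com/scripts/texis.exe/viewthread"

def cached_url(search_result_url):
    """Return the viewthread URL for the postid in a search-result link.

    Returns None when the link carries no postid parameter.
    """
    query = parse_qs(urlsplit(search_result_url).query)
    postids = query.get("postid")
    if not postids:
        return None
    return VIEWTHREAD + "?" + urlencode({"postid": postids[0]})
```

Called on the redirect link for the thread with postid 3ced7bcc120, it returns the same `viewthread?postid=3ced7bcc120` address I typed in by hand.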
BTW, on Christmas 2001 I took a snapshot of some of our forum display pages. I read only 10 pages, or a few days’ worth, of MPSIMS, for instance, but my GQ, CCC + Mailbag lists start at 26.12.2001 and go back beyond 07.12 (the point at which the posts start missing).
looks like this: threadid, replies, views, date, thread title, you get it
Any use for that?
Thanks, Musicat. Sorry about the email if it was annoying.
femtosecond, I think they have all the threads cached in their version, but it displays only if the board is NOT available on the original site…
Hello All,
I’m one of the founders of Boardreader.com. Nice long thread here. Great community action on a lot of different subjects. Keep it up!
‘Musicat’ from doorbell.net contacted me today to submit his site and to tell me about this thread, and invited me to post. There are a lot of questions I can see here. Let me try and answer some.
Boardreader.com is a search engine designed and developed to index and keep track of dynamic web pages (ever-changing forums, comments, etc.).
Tasmisr is correct with his posted comment: “I think they have all the threads cached in their version, but it displays only if the board is NOT available on the original site…”
We ALWAYS try our best to connect to sites, but the problem with crawling message boards is that the forum file path or the CGI path is always changing. New forums come up and old forums drop off, creating a very dynamic page to index and connect to. We are always trying new ways to keep on top of these boards. Sometimes you get the archive because the time to connect is too long (connection timeout), or sometimes the forum has been removed or whatever.
We are going to be launching a new layout providing better displays of threads. Keep a look out. Comments are always welcome.
I hope this helps and thank you to those of you who made many nice comments.
Best,
Scott
I have a question for jdavis and spurdon. Jerry, would it be a major problem to add to the present database structure some or all of the material lost between the last complete backup and the board vandalism/crash, if it were provided in a compatible format? Scott, would it be possible to provide that material in a format compatible with VBB 2.x?
Admin folks, if there is a charge for Scott supplying the material, would the Reader consider using member contributions (without any strings attached) to subsidize that recovery?
Polycarp,
I think we could provide this data if we knew what exactly you guys wanted. I’m thinking we could just give you an XML feed for you to pick up via FTP. I’m not sure if it would fit right in. No cost, of course. Let me know what I can do…
scott
Wow, stuff from the Great Whiteout?
Cool beans!
–Nenya
The other archiving site does something completely different but very similar to that by redirecting to the original files if they aren’t available at their site. They know best if they are. But for someone to know whether a page is available at the original site, he would have to access that page there each time it is requested, to make sure.
spurdon, you say every time someone clicks on one of your search results, BoardReader does look up at the original board if the page is available?
Well, obviously it does, because I just did a query in the half hour when our board is shut down for maintenance, and BoardReader redirected to its archive. Afterwards it redirected to the original site again.
That’s convenient, because it reliably prevents people from getting displayed “page not available” errors, but maybe consider adding to your search results two rows of little buttons directly leading to your archive and the original pages respectively. Available or not, just to give the link for completeness’ sake, if people want (or need, see below) to follow them directly.
What’s your timeout on deciding the page isn’t available on the original board? Because this one sometimes (or, sadly, more than sometimes) needs a few minutes to eventually spit out a thread.
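To make the timeout question concrete, this is how I picture the redirect decision, as a Python sketch. It is only my guess at the behavior seen from outside, not BoardReader’s actual code; the function names and the 10-second default are invented for illustration.

```python
import urllib.request

def fetch(url, timeout):
    """Fetch a URL; raises an OSError subclass (URLError, timeout) on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def choose_target(original_url, archive_url, timeout=10.0, fetcher=fetch):
    """Return the original URL if it answers within `timeout` seconds.

    Falls back to the archive URL on a timeout or connection/HTTP error.
    `fetcher` is injectable so the decision can be tried without a network.
    """
    try:
        fetcher(original_url, timeout)
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return archive_url
    return original_url
```

With a short timeout, a board that needs minutes to return a thread would always be treated as down, which is why the actual value matters for us.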
And, like erislover’s problem shows, some threads seem available at our board, but have posts missing. Make sure your software doesn’t overwrite its complete copy with our incomplete one, thinking it’s more recent. :eek: Also, some threads may appear to have different content, because some threadIDs that completely went missing have been re-assigned by now.
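A guard against that could look roughly like the Python sketch below. The data layout (a title plus a dict of post ID to text) is purely hypothetical, since I don’t know how the archive is stored, and comparing titles is only a crude stand-in for detecting a re-assigned thread ID.

```python
def merge_thread(archived, fresh):
    """Merge a fresh crawl into the archived copy without dropping posts.

    Both arguments are dicts of the hypothetical form
    {"title": str, "posts": {post_id: text}}.
    """
    if archived["title"] != fresh["title"]:
        # Different title under the same thread ID: probably a re-assigned
        # ID, so keep the archived thread untouched.
        return archived
    posts = dict(archived["posts"])   # start from everything already archived
    posts.update(fresh["posts"])      # add new posts, refresh edited ones
    return {"title": archived["title"], "posts": posts}
```

Posts that vanished from the live board stay in the merged copy, because nothing is ever removed from the archived set.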