Tricky Computer Sorting Question

A couple of years ago I was in an improvisational webboard-based roleplaying group, and in a spate of insanity I agreed to take charge of the “historical archives” (i.e., all our old posts).

Many of these posts were made to Guestbook-styled webboards, which displayed them in reverse chronological order. The original archivist saved the posts in that order, so I’m left with somewhere around 40 to 50 files that I need to reorganize.

The posts look something like this:

  • Post 10 -
    More stuff happens.

  • Post 9 -
    Stuff happens.

etc.

Other than cut and paste, is there some way of reshuffling these files? And no, I can’t bring up the original page that this stuff resided on - it’s long since been deleted.

Anything with a decent regex engine should be able to split it up and reverse the order relatively easily. If you provide a more detailed spec about what the files actually look like I could whip up a quick Perl program to do it.
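To give an idea of what I mean, here is a rough sketch in Python rather than Perl. It assumes every post starts with a line like "• Post 10 -", as in the example above; the real files will almost certainly need a different pattern. The names POST_START and reverse_posts are made up for this illustration.

```python
import re

# Assumption: a post begins on a line that, after optional indentation
# and an optional bullet, reads "Post <number> -".
POST_START = re.compile(r"^[ \t]*(?:•[ \t]*)?Post \d+ -", re.MULTILINE)

def reverse_posts(text):
    """Return text with its posts in the opposite order."""
    starts = [m.start() for m in POST_START.finditer(text)]
    if not starts:
        return text  # no recognizable posts, leave the text alone
    header = text[:starts[0]]  # anything before the first post stays on top
    bounds = starts + [len(text)]
    # Each post runs from its own start marker up to the next one.
    posts = [text[bounds[i]:bounds[i + 1]].rstrip() for i in range(len(starts))]
    return header + "\n\n".join(reversed(posts)) + "\n"

sample = (
    "  • Post 10 -\n"
    "    More stuff happens.\n"
    "\n"
    "  • Post 9 -\n"
    "    Stuff happens.\n"
)
print(reverse_posts(sample))
```

One caveat: anything that follows the last post in a file (a page footer, say) is treated as part of that post, so it ends up at the top after reversing. It would need trimming first.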

Sample 1 (when the RP was based off a personal site’s webboard)

Sample 2 (when the RP moved to Lycos Guestgear)

What we really need to know is how these files are stored. Are they in a database (MS Access, SQL Server, Oracle, etc.)? Are they stored as ASCII text files? Are they handled as proprietary files by the message board software?

The answer on how to sort depends on the type of file.

If the .html files are all you have, then it could be somewhat complicated. If you could get the original administrator to do a database dump, then it would be easier.

Even if you just have the html…hmm…it should be possible to come up with something to parse all the subject lines. The dates are all in the “correct” TimeStamp format already. So we go through the file line by line and use a combination of an HTML tag and the word “Subject” to identify the start of a new message…
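Something along these lines, as a rough Python sketch only. I am guessing at the details: it assumes a new message starts on a line where an HTML tag is immediately followed by the word "Subject", and that each message contains a timestamp shaped like 2001-05-01 09:00:00. Both patterns, and all the function names, are placeholders to adjust against the real files.

```python
import re
from datetime import datetime

# Assumption: a tag directly followed by "Subject" marks a new message.
SUBJECT = re.compile(r"<[^>]+>\s*Subject", re.IGNORECASE)
# Assumption: timestamps look like "YYYY-MM-DD HH:MM:SS".
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def split_messages(lines):
    """Group lines into messages, starting a new one at each Subject line."""
    messages, current = [], []
    for line in lines:
        if SUBJECT.search(line) and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)
    return messages

def message_time(message):
    """Return the first timestamp in a message, or datetime.min if none."""
    match = STAMP.search("\n".join(message))
    if match is None:
        return datetime.min  # undated chunks (e.g. page header) sort first
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")

def sort_messages(lines):
    """Return the messages ordered oldest first."""
    return sorted(split_messages(lines), key=message_time)

page = [
    "<b>Subject:</b> Second",
    "Posted 2001-05-02 10:00:00",
    "<b>Subject:</b> First",
    "Posted 2001-05-01 09:00:00",
]
for message in sort_messages(page):
    print("\n".join(message))
```

Sorting by date rather than simply reversing has the advantage that it still works if a file is not in perfect reverse order to begin with.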

I have no clue. As I said, the original data is all kaput. What remains is this rather messy HTML.

I don’t mind doing it the old fashioned way (i.e., manual cut and paste), but I was just wondering if there were an easier method to this madness.

Although…hrm…I may have thought of something…

Yes! It does work! It’s still a bit cumbersome, but it saves a lot of sanity…

Off to paste the blocks of html into Excel and abuse the sort function! Whee! :smiley: