I’ve just finished scanning in several issues of a defunct fanzine and some newsletters from years ago. I want to put them on the web for people who weren’t around then and want to see what they looked like. I scanned every one twice. The first time I made separate image files of each page, and the second time I OCR’d the pages to get text files.
After I clean up the .bmp files (crop, straighten if they’re slightly off-kilter), I want to convert them to pdf files, so people can see what the original fanzine looked like. However, I hate pages whose text is available ONLY as pdfs. The .txt files need to be cleaned up too (the OCR’ing leaves much to be desired, but it’s better than typing in everything from scratch), and then I want to add html code and lay them out on a web page as close as possible to the original layout, including picture placement (I’ll extract the pictures from the bmp files).
I know basic html so I’m not concerned with that. I assume style sheets are the way to go in layout. There are a dozen issues with each issue having a couple dozen pages, so style sheets would enable me to have a uniform look, but I don’t know much about style sheets. I know a little bit, enough not to be 100% clueless, but not enough to know the best way to do this project. Can anyone give any tips on doing this? Perhaps a link to a page where this has been done? Once I see how someone else did it I can grasp the concept. Since you have to add css code to the html, I’d rather know what to add before I start adding html to the raw text.
Sure, it would be easier to just put up the pdfs, but I want searchable/copy/pasteable text too, so if I’m going to put up the pure text, I might as well try to lay it out close to the original, just so it looks nice. I don’t want it to be complicated though. It’s just a fan project and no big deal.
I have access to CUTEhtml (my primary html program, but you have to do css by hand), FrontPage (which I’ve barely used), TopStyle (I’ve never used it, but it seems cool), Expressions (which I’ve never used but it seems cool too) and UltraEdit (which I mention mainly because I do most of my html coding by hand in it).
Would anyone have any tips or suggestions, please? Thank you in advance.
No, no worries there. Both the person who did the fanzine and the artist who sent out the newsletters are friends of mine and are fine with me doing this.
I’ve been looking at how to do style sheets and it seems complicated and not supported by all browsers. I’ll probably just use html tables for the layout, and put the pages together by hand, but if anyone has a better idea, I’m all ears. It’s a very simple layout, your average fanzine stuff. Here’s a small image of a couple of example pages (the web and pdf versions would be larger of course, and only one page would be viewable at a time).
The simplest, quickest way would be to make either GIFs or JPGs of the images, then post them sequentially on one or more pages. No style sheets needed, and a half-dozen lines of HTML code will suffice.
Might look un-fancy, but it will get the job done. PDFs might or might not be smaller in size, require an additional reader that some people don’t care to use, and might take an extra processing step on your part.
Use GIFs if the images have limited colors (B&W, especially); JPGs otherwise. Please, for the sake of everyone’s eyes, use compression values that degrade the images a minimal amount. B&W GIFs can be made with a 2-color table (8 is better) and they can be very small, but keeping the file size down is less important now than it was in the days of dialup dominance.
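As a rough sketch of what that half-dozen lines might look like (the file names and alt text are made-up placeholders, not real files):

```html
<!-- Minimal sketch: page scans shown one after another.
     page01.gif etc. are hypothetical file names. -->
<html>
<head>
<title>Fanzine title - page scans</title>
</head>
<body>
<p><img src="page01.gif" alt="Page 1"></p>
<p><img src="page02.gif" alt="Page 2"></p>
<p><img src="page03.gif" alt="Page 3"></p>
</body>
</html>
```

Add one img line per scanned page, or split a long issue across several such pages.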
OK so let me get this straight. You are grabbing the raw text from the PDF and are going to try to recreate the look and feel of the original layout with the raw text (which you have edited since the OCR text output isn’t perfect).
So in the example image you provided, we see two pages of an open magazine. Do you want to set it up to be viewing just one page at a time, or two pages, like an open magazine? (I would recommend one page).
Anyway on the first page you have two columns, headings, and some images. You seem to have several different styles of heading and text, so this is where using CSS would be good. If most of the zine uses a 2-column layout like the example, that would also be good for css. You could lay it out in divs or in a table.
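Here is a rough sketch of the div approach for one two-column page. The class names (page, column, photo, clear), the pixel widths, and the image file name are all made up for illustration, so adjust them to taste:

```html
<html>
<head>
<title>Two-column page sketch</title>
<style type="text/css">
/* Hypothetical class names; the widths are arbitrary. */
.page   { width: 700px; margin: 0 auto; }  /* centered page area */
.column { float: left; width: 330px; margin-right: 20px; }  /* two of these fill 700px */
.photo  { display: block; margin: 10px 0; }
.clear  { clear: both; }  /* ends the floated columns */
</style>
</head>
<body>
<div class="page">
  <div class="column">
    <h3>Left column heading</h3>
    <p>Left column text</p>
  </div>
  <div class="column">
    <h3>Right column heading</h3>
    <p>Right column text</p>
    <img class="photo" src="photo1.jpg" alt="Photo from the original page">
  </div>
  <div class="clear"></div>
</div>
</body>
</html>
```

Each column is floated left so the two sit side by side, and the empty "clear" div makes sure anything after the columns starts below them.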
Basically, let’s say that in any given issue there are 5 different fonts used in regular text, and 5 different heading styles… css is very good for this because instead of styling this all in the html, you put it in the stylesheet and just put something like <div class="heading1">Heading goes here</div> or <div class="textstyle1">This is some text</div>.
You can control any number of aspects like this, and it’s all included in that simple code that grabs the style off the stylesheet. So you can reuse styles over and over without having to retype it into the html each time.
I would be more than happy to help you. I could throw together a basic template for you to start with, if you like. Maybe you could send some more example pages? My email is in my profile.
Let me expand a bit more on what I was trying to say. Let’s say you have a section of text that you want to be bold, in a certain font of a certain size, a certain line height, a certain color, on a certain background color, of a certain style such as underline or italics, etc. Instead of having to style the text with a bunch of crap in the html, you just define all of this in the stylesheet and name it whatever you want, such as “textstyle1” and just wrap the short code around the text you want to style and voila! You are done. And you can reuse this text style over and over again without having to put the html code around each bit of text you want to style.
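A minimal sketch of that idea, using the “textstyle1” name from above. The particular font, sizes, and colors are arbitrary values picked for the example:

```html
<html>
<head>
<title>Reusable text style sketch</title>
<style type="text/css">
/* Everything about the style is defined once, here. */
.textstyle1 {
  font-family: Georgia, serif;
  font-size: 14px;
  font-weight: bold;
  font-style: italic;
  line-height: 1.5;
  color: #333333;
  background-color: #EEEEEE;
}
</style>
</head>
<body>
<!-- The html only names the style; it carries no styling of its own. -->
<div class="textstyle1">This is some text</div>
<div class="textstyle1">The same style, reused with no extra markup</div>
</body>
</html>
```

To share the style across every issue, the rules inside the style block can be moved into a separate file and pulled into each page with a link tag such as <link rel="stylesheet" type="text/css" href="fanzine.css"> (the file name is hypothetical). Changing that one file then changes every page.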
I just had a thought. You could put them out on the web as PDFs. At some point, Google is going to index them and then convert them on its own to give the “View As HTML” feature. That may be good enough for most purposes, but you could go grab those pages on your own and then post them straight onto your website.
If you want to preserve the appearance and layout of the original, only a scanned graphic image will do, unless you have access to the source files (not likely if they are older than 20 years). Is that your intention?
Thank you for the suggestion, but I’d like it to be searchable as soon as possible. If all the text (of each issue) is all on one page then a site visitor could search from the top. Better for me to just do it all now than wait. I have some time now, whereas I may not in the future. The fact that all the text will be indexed at some point and Googleable is important for the future, but I’m impatient.
I am making pdfs though. I didn’t know that Google had a “View as HTML” feature for pdfs! I never use Google.
The biggest concern is that the content is accessible in every way I can make it accessible. That means searchable text, and text that’s able to be highlighted, copied, and pasted elsewhere. So I wouldn’t want to use just pdfs or just image files. It’d be easier, but inaccessible.
You’re right, not everyone will have or want to use an Adobe reader, so I will also put up jpgs. Color optimization is not a problem; all of these are in black and white.
I have the original fanzines, but not the original files that went into them, but that’s not a problem (though I do wish I had copies of the original photos that were used.)
I’m grabbing the raw text from the original, not the pdf. I scanned each page twice, once to get a .bmp file (which I cleaned up and saved as a .jpg, which I’ll put up as per Musicat’s suggestion, thanks) and a second time to get the raw text into a text file. I plan to make pdf files out of the .bmp files.
I know that there’s a way to OCR from pdfs, but I don’t know enough about Acrobat to do it, so I’ve already done it the regular way.
Here’s a quick and dirty layout (NOT the one I plan on using):
<html>
<head>
<title>Fanzine title</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#" vlink="#" alink="#">
<h1 align="center">Fanzine title</h1>
<h2 align="center">#3 Fall/Winter 1994</h2>
<a href="">Link to index of all issues</a>
<a href="">Link to Previous issue</a><a href="">Link to Next issue</a>
<a href="">Link to pdf of issue</a>
<a href="">Link to jpgs of issue</a>
<h3>First header</h3> <h3>Fourth header</h3>
<p>Content</p> <p>Content</p>
<h3>Second header</h3> <h3>Fifth header</h3>
<p>Content</p> <p>Content</p>
<h3>Third header</h3> <p><img src="">picture</p>
<p>Content</p>
</body>
</html>
But that’s not going to work, so it has to be tables or css. As you can see, I want there to be links to the issue before it, the issue after it, the pdf file, and the image file (because as Musicat points out, not everyone will have or want to use an Adobe Reader, and I still want them to see what the original looked like).
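For comparison, here is roughly how the two-column part of that quick layout might look as a plain html table. This is only a sketch: the 700-pixel width, the padding, and the photo.jpg file name are guesses for illustration, not anything decided:

```html
<html>
<head>
<title>Fanzine title</title>
</head>
<body>
<!-- One row, two cells: each cell is one column of the page. -->
<table width="700" cellpadding="10" cellspacing="0" border="0">
<tr valign="top">
<td width="50%">
<h3>First header</h3>
<p>Content</p>
<h3>Second header</h3>
<p>Content</p>
<h3>Third header</h3>
<p>Content</p>
</td>
<td width="50%">
<h3>Fourth header</h3>
<p>Content</p>
<h3>Fifth header</h3>
<p>Content</p>
<p><img src="photo.jpg" alt="picture"></p>
</td>
</tr>
</table>
</body>
</html>
```

The valign="top" on the row keeps both columns starting at the top even when one is longer than the other.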
Just one page at a time, yes. I hate scrolling from side to side when I’m reading something on the web.
I don’t need to re-create the various fonts she used (and she used a different font on every page, it sometimes seems!), it just needs to be web readable and in roughly the same layout. There’s also various clip art she used that I don’t think I need to include. I do want to include the photos though, in roughly the same placement.
WOW, thank you! I’ll write you and give you the url to some real-size samples. A template would be wonderful, and very very helpful. A template would also help me understand css better.
Thank you. It’s a fannish thing that I should have done years ago (the woman who did the fanzine isn’t a computer type). I’m doing more fannish things recently like transcribing radio interviews so it just seemed like something I should get done and up on the web for the folks who are interested.
Thank you DrDeth, Musicat, Shagnasty and nyctea scandiaca!
Music-related. They’re for a specific musician. I’ve just had them stuffed in a box for years (the last issue was in 1998) and I recently dug them out because I knew there was a specific bit of information I needed in one of them. As I was reading through them I thought ‘there’s some really good info and photos here’ and decided to put them up on the web. The woman who put out the fanzine said fine. I have more resources than she does so she’s grateful.
It was really simple to do and I think it looks great! Everything can easily be customized - the fonts, the text size, the link colors, widths of stuff, anything.