I’m working on a history project. I just acquired a few hundred very fragile, yellowing (sometimes browning) newspaper clippings of various sizes. The clippings themselves are not valuable to me, but the information they contain is.
Here’s my plan. I want to convert them into a form that can be handled and read easily. I want to paste them up on sheets of paper (some of them are fraying and broken into pieces), then scan them, color-correct the scans to remove the yellow/brown background, then print them out on a b/w laser printer. (Please note that I’m not looking to OCR them, at least not at this point.)
My question regards the color-correcting step. I have a “lite” version of Photoshop, the kind that usually comes bundled with a flatbed scanner, and I know the basics of using it.
What fast and easy settings should I apply to the scanned images to make the yellowed/browned background “disappear,” while keeping the black text dark and sharp? I want “fast and easy” settings because I can’t be fussing too much with each and every scanned image – like I said, I’ve got a few hundred of these clippings.
Hmm. Change them to grayscale, then up the brightness and the contrast. If your version has an adjustment for “levels” you can fiddle with that too, making the dark side very dark and the light side very light and losing the mushy stuff in between.
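To make the grayscale-plus-levels idea concrete, here is a rough sketch in plain Python. It is not what Photoshop does internally. The luma weights are the common Rec. 601 ones, and the names `to_gray`, `apply_levels`, `clean_scan` and the default black/white points (60 and 180) are made up for illustration. Everything darker than the black point becomes pure black, everything lighter than the white point becomes pure white, and the in-between values are stretched across the full range.

```python
def to_gray(rgb):
    """Convert one (r, g, b) pixel to a 0-255 gray value (Rec. 601 luma weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def apply_levels(gray, black_point=60, white_point=180):
    """Clip to the black/white points and stretch what is left to 0-255.

    The default points are illustrative guesses, not recommended settings.
    """
    if gray <= black_point:
        return 0
    if gray >= white_point:
        return 255
    return round((gray - black_point) * 255 / (white_point - black_point))


def clean_scan(pixels, black_point=60, white_point=180):
    """Apply grayscale conversion and levels to a grid of (r, g, b) pixels."""
    return [
        [apply_levels(to_gray(p), black_point, white_point) for p in row]
        for row in pixels
    ]


if __name__ == "__main__":
    # One yellowed-paper pixel and one ink pixel (made-up sample values).
    row = [(225, 200, 150), (40, 35, 30)]
    print(clean_scan([row]))
```

The only things to tune per batch are the two points: raise the black point if the ink looks gray, lower the white point if the paper tone still shows.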
Something like what capybara suggests should work. But if you want to speed this up as much as possible, here’s what I suggest. When you open the first image you’re going to work on, make each change through Layer → New Adjustment Layer (Hue/Saturation and Brightness/Contrast may be all you need, but Levels and/or Color Balance might be necessary depending on how bad the problem is). Then, when everything looks the way you want it to, Shift+click to select all of the adjustment layers on the ‘Layers’ tab. Go to the little drop-down menu and select ‘New Group from Layers’. This will make a layer group that you can just drag onto the other images with your mouse, so you won’t have to go through the steps each time.
(Note: I have Photoshop CS4 which is a ‘complete’ version of the software, so I apologize if any of what I suggest is impossible in Elements or whatever you have.)
You may be able to bypass some adjustment stages. Rather than scan in color, creating a large file, then converting to grayscale, just scan in grayscale. Or, better yet, scan in 2-shade black & white. If all the images have about the same color tone, and you carefully adjust the point where the scanner decides a pixel is either black or white, the resulting image will have the background dropped out and the text sharp. The file will also be far smaller (1-bit pixels take 1/24 the space of 24-bit color even before compression), so you might not need lossy compression to save it (retaining the highest quality).
One word of caution…if you ever resize the image, convert to grayscale first. The interpolation between pixels will be a lot better than if you leave it in B/W mode.
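A small sketch of the black-or-white cutoff described above, in plain Python. The function names and the default cutoff of 128 are assumptions for illustration; a real scanner driver exposes this as a threshold slider. The second function just does the arithmetic on uncompressed sizes so you can compare bit depths.

```python
def threshold(gray_rows, cutoff=128):
    """Turn 0-255 gray values into 1 (paper/white) or 0 (ink/black).

    The cutoff is the value at which a pixel flips from black to white.
    """
    return [[1 if g >= cutoff else 0 for g in row] for row in gray_rows]


def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed image size, with each row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


if __name__ == "__main__":
    print(threshold([[200, 90], [127, 128]]))
    color = raw_size_bytes(1000, 1000, 24)  # 24-bit color
    gray = raw_size_bytes(1000, 1000, 8)    # 8-bit grayscale
    bw = raw_size_bytes(1000, 1000, 1)      # 1-bit black & white
    print(color, gray, bw)
```

If the paper tone varies a lot from clipping to clipping, one fixed cutoff will not suit them all, which is the main argument for scanning in grayscale and deciding the cutoff afterwards.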
I’ve done something similar with XnView, which is a photo manipulator. I scanned in grayscale and adjusted, and the results were good. Scanning in B&W works too, but the results were very harsh. But that is a personal choice.
If the OP wants to OCR them, though, is there anything else he should do?
I ask because I know the OP said he didn’t want to OCR them, “at least not at this point.”
It seems it’d be better to scan them in an OCR-ready form NOW, so he won’t have to go back and do it again later on.
I’ve run into the problem of correcting background colors a few times. I know you can select the background with an eyedropper (this seems to be the icon in each type of software I’ve used) and then substitute another color, such as white. That much is obvious. What I can’t remember is whether you can tell the software to select not only the pixels you pick with the dropper, but also a range of similar colors on either side of that color.
I think there is a way to do this but I know that most of the time what I have done instead is sampled from multiple spots and done a conversion after each one. This gives the same result but depending on the source image, you can find the color you are changing in places other than the background. That’s a problem I’ve never been able to get around.
Yes. You can click the eyedropper with the + sign next to it and then move the eyedropper all over the background (which may end up picking up colors in the text as well), and/or adjust the fuzziness slider to take in more or fewer colors around what you’ve selected.
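For anyone curious what “sample a color, then widen the range” amounts to, here is a simplified stand-in in plain Python. It is not how Photoshop’s fuzziness slider actually works; the `tolerance` parameter, the per-channel comparison, and the name `replace_color` are assumptions made for illustration. Any pixel whose red, green and blue values are all within `tolerance` of the sampled color is replaced with white.

```python
def replace_color(pixels, sample, tolerance=40, new_color=(255, 255, 255)):
    """Replace every pixel close to `sample` with `new_color`.

    "Close" here means each channel differs by at most `tolerance`,
    a crude stand-in for a fuzziness setting.
    """
    def is_close(pixel):
        return all(abs(a - b) <= tolerance for a, b in zip(pixel, sample))

    return [
        [new_color if is_close(p) else p for p in row]
        for row in pixels
    ]


if __name__ == "__main__":
    # Two slightly different paper tones and one ink pixel (made-up values).
    row = [(225, 200, 150), (210, 190, 140), (40, 35, 30)]
    print(replace_color([row], sample=(225, 200, 150), tolerance=20))
```

The weakness mentioned above shows up here too: set the tolerance wide enough to catch all of the stained paper and it starts catching faded ink as well.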
XnView (shareware) has gamma correction, which allows the user to skew darker and lighter shades in both directions at the same time. This will make the ink darker and the lesser shades lighter. Within the same control feature are adjustments for the 3 colors (red, green, blue), which could be used to neutralize the yellow color somewhat before converting to grayscale. That, of course, requires that it be scanned in color first.
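As a rough illustration of neutralizing the yellow with the three color channels, here is a plain-Python sketch. It is not XnView’s control, just one simple way to do the same kind of thing: sample a patch of bare paper and scale each channel so that the sample comes out white. The names `channel_gains` and `neutralize_pixel` and the sample values are made up for the example.

```python
def channel_gains(paper_sample):
    """Per-channel multipliers that turn the sampled paper color into white."""
    return tuple(255 / max(1, c) for c in paper_sample)


def neutralize_pixel(pixel, paper_sample):
    """Scale one (r, g, b) pixel by the paper-based gains, clamped to 255."""
    gains = channel_gains(paper_sample)
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))


def neutralize(pixels, paper_sample):
    """Apply the correction to a grid of (r, g, b) pixels."""
    return [[neutralize_pixel(p, paper_sample) for p in row] for row in pixels]


if __name__ == "__main__":
    paper = (225, 200, 150)  # made-up yellowed-paper sample
    print(neutralize([[paper, (40, 35, 30)]], paper))
```

Blue gets the biggest boost because yellowing is mostly a loss of blue. After this step the paper is neutral, so a grayscale conversion no longer leaves a gray cast where the yellow was.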
Instead of scanning, what if he used a high-resolution camera? Then he could adjust the light source to bring out the text and suppress the background. This is the technique they use with old manuscripts and palimpsests.
OP, do you have an example of one of these files that you can post? By looking at one of them we might be able to figure out some settings for you to use.
dzero and Joey P, you suggest grabbing a sample color and substituting another color, like white. This technique intrigues me. How do you make the substitution after you’ve defined the first (unwanted) color? I’ve never heard of this (and I fear that my simplified version of Photoshop isn’t equipped to do it).
I have something called Fade Correction in Paint Shop Pro that seems to remove yellow tint from stuff that was white before. It’s very simple, and you’ve probably got that function in Photoshop.
Personally, I’ve never really had great luck using color selection, but in the Selection menu there’s something called “Color Selection” or “Color Range” or something along those lines (I’m not at my computer). You would use that, and when you hit OK, everything of that color is selected. From there you treat it just like you would any other selection. In this case you would fill it in with white.
I mainly use GIMP and another freeware product. I’ve used Photoshop, but it’s been a while.
Anyway, I don’t recall ever doing a selection. You use the dropper to point to the pixels that are the color you want to change. Then you select the color to substitute. Then you use the paint bucket icon to change the color to the selected color.
I think you can limit the change to selected areas, and I did this once by accident, but I don’t recall how, I’m afraid.
Kind of a hijack to this thread, but is there anything that needs to be done to prepare for OCR? I’ve never really worked with that before, other than to touch on it.
High resolution, high contrast, hopefully no background textures. At the printshop I managed we’d use a digital copier’s high speed document feeder to get multi-page documents as 600 dpi lineart and run the OCR on that. Works well enough.
If the page has a lot of graphics, stray marks, funky fonts, busy formatting etc, it’s going to give you a lot of junk. You might want to block those parts out with masking tape before you scan.
I’m not sure if you can limit it to selected areas, but it only works on a single layer at a time. So you can certainly select an area and make a new layer out of that area. Of course, selecting everything but the text (or only the text) is what we are trying to accomplish in the first place.