.pdf to .jpg question

I used the office copier to scan a couple of pages of a magazine, and send the images to myself. It creates the scan as a .pdf file. I did not see a .tiff option. The idea is that I’ll make the .pdf images into .jpeg (or .tiff) files, and then manipulate them in Photoshop.

When I look at the .pdf file, everything looks normal. When I copy the image and paste it into a Word document or an Outlook email, it looks normal except for one thing: There’s a table on one of the pages. The table has parameters on the left side of it, and numbers on the right. Between them are periods. For example:

GROSS WEIGHT … 2,300 lbs.

In Word and Outlook, only the dots are visible. It’s like it just erased both columns and left the guide dots alone.

I want to make it clear that the columns in the table do exist when I open the .pdf file. Everything else on the pages – the text, titles, and photos – exists when I paste the images into Word or Outlook. The table is where it’s supposed to be, but everything except most of the dots is missing from it. I’ve tried just selecting the table area and pasting it into a Word document. Same thing. If I copy any part of the article except the table (e.g., a couple of paragraphs) and paste it into Word, it pastes normally. If I copy and paste any part of the table into Word, it’s as if I’d copied a blank piece of 45-year-old yellowing magazine paper.

Why is Microsoft erasing columns from the table? Why does it only remove the columns from the table, instead of refusing to paste the whole image? I mean, the table is an image just like the rest of the spread.

I don’t think it’s the fault of the Microsoft Office software.

If you can copy and paste from a scanned document, then the document must have been run through some Optical Character Recognition (OCR) software at some stage.

My bet is that the OCR software recognizes (and therefore renders) the text and the dots, but not the lines of the table. This is generally how it’s designed to work, because the main purpose of OCR software is to make the text in the document searchable.

But I can see the columns in the .pdf document. None of the text in the article is selectable. So the entire page(s) must be recorded as an image, right?

Maybe not the answer you want, but there are lots of online sites that will convert pdfs to doc or jpgs etc. Applications too, but I find most of the free ones of these contain malware.

I don’t want a doc. I just want a picture of the pages.

If it comes down to it, I’ll just scan the pages as .jpeg at home. This is a PITA because the HP printer won’t print to a file on my MacBook. Or maybe that was the previous HP printer. And ISTR I couldn’t get it to scan a file to a flash drive either. I had to pull the memory card out of my camera and plug it into the printer, and it would save to that. (Now that I’m thinking about it, I think it was the previous HP printer.)

Or I’ll just type the table in.

It seems very stupid that some text on a page will vanish, while all the other text is where it should be. This never happened with my 35 mm film camera, and I don’t think it would happen with my little pocket digital camera. It’s astounding that a multi-thousand dollar machine can’t do a simple thing like scan (‘take a picture’) of a page that doesn’t have bits disappearing from it.

You mentioned Photoshop in the OP. Do you have access to this program?

If you do, just open the PDF in Photoshop, then save it as a JPG.

This.

Putting Word in the loop for anything but text editing is a guarantee of disaster. At a minimum, you will end up with bloated files that are broken in some way.

If you have the photocopier’s software and drivers installed on your computer, go into the software package and try setting up the scan from your computer. Other than that, we use Picasa > Import > select copier; Copier on > Scan > Remote… then it’s waiting for the command from Picasa.

If you have Windows 7 or 8, use the snipping tool to copy the .pdf image. You can save the picture as a .png, .gif, .jpg, or .mht file.

Thirding the recommendation to just open the PDF in Photoshop, edit it, then save it in your preferred image format.

If you choose JPEG, only save it after you are completely finished, as it is a lossy format. TIFF is an older format, and you will likely get smaller files if you save as PNG. Only use TIFF if you have to.

PDFs of scanned and OCR’ed documents will often be in two parts - the bitmap image of the document, and then overlain with the OCR’ed text, which is rendered transparent. So you see the image, and if the OCR software has done a good job, you get the illusion that you are able to select the scanned text. But if the software has not been able to gain traction on the document you don’t get anything useful.
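If you want to check which kind of PDF the copier produced, here is a rough Python sketch. It is a heuristic, not a real PDF parser: it assumes page content sits in plain or Flate-compressed streams, looks for the text-showing operators (Tj / TJ), and treats text rendering mode 3 ("3 Tr", invisible text) as the sign of a hidden OCR layer. The function name and the three labels it returns are made up for this example.

```python
import re
import zlib

# Grab everything between "stream" and "endstream" (crude, but enough for a sketch).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ...) that actually shows a string with Tj or TJ.
TEXT_SHOW_RE = re.compile(rb"\bBT\b.*?(Tj|TJ)\b", re.DOTALL)
# Text rendering mode 3 means "draw nothing", i.e. invisible text.
INVISIBLE_RE = re.compile(rb"\b3\s+Tr\b")


def text_layer_status(pdf_bytes):
    """Guess whether a PDF is image only, has visible text, or hidden OCR text."""
    has_text = False
    has_invisible = False
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        if raw.startswith(b"\xff\xd8"):
            continue  # looks like an embedded JPEG, not page content
        try:
            # decompressobj tolerates the trailing newline before "endstream"
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # assume the stream was stored uncompressed
        if TEXT_SHOW_RE.search(content):
            has_text = True
            if INVISIBLE_RE.search(content):
                has_invisible = True
    if not has_text:
        return "image only"
    return "hidden OCR text" if has_invisible else "visible text"
```

To try it, read the scan in binary mode and pass the bytes in, e.g. text_layer_status(open("scan.pdf", "rb").read()), where scan.pdf is a placeholder name. PDFs that pack their objects differently will fool it, so treat the answer as a hint only.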

Inside the PDF are a number of containers for different components. Indeed a PDF can be little more than a container for a range of different image representations. You can wrap a PDF around a jpeg, and even get the jpeg out again with the right tools.
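As a rough illustration of getting the jpeg back out, here is a minimal Python sketch. It assumes the scanner stored each page as a JPEG stream (PDF marks those with the /DCTDecode filter) in an ordinary, unencrypted object; the function name is made up for this example, and a proper PDF library will handle far more cases than this does.

```python
import re

# One PDF object: "12 0 obj ... endobj" (crude, but enough for a sketch).
OBJECT_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)


def extract_jpegs(pdf_bytes):
    """Return the raw bytes of each JPEG-encoded image stream found in a PDF."""
    jpegs = []
    for match in OBJECT_RE.finditer(pdf_bytes):
        body = match.group(1)
        # Split the object into its dictionary and whatever follows "stream".
        header, keyword, rest = body.partition(b"stream")
        if not keyword or b"/DCTDecode" not in header:
            continue  # no stream, or not JPEG-compressed
        end = rest.rfind(b"endstream")
        data = rest[:end] if end != -1 else rest
        data = data.strip(b"\r\n")  # drop the line breaks around the data
        if data.startswith(b"\xff\xd8"):  # JPEG files start with these two bytes
            jpegs.append(data)
    return jpegs
```

Each item in the returned list can be written straight to disk in binary mode as a .jpg file, with no re-compression, so nothing is lost beyond what the scanner already threw away.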

If you scan a document, a jpeg is not the right format anyway. The jpeg format is tuned for real-life images, not b/w lines, and you will get compression artefacts.

This is exactly what I needed.

I tried to save the .pdf as .jpeg on my office computer, which is a PC. Now that I’m home, I saved a TIFF file from Photoshop. When I have time, I’ll return to the other thread to find out how to make one image out of two images.

Thanks! :slight_smile:

Or, if you’re on a Mac, do it in GraphicConverter, which is shareware (hence free during the trial period). http://www.lemkesoft.de/

Johnny: Did you actually print out the doc? Sometimes the MS Word preview doesn’t show everything on the computer screen, but the printout is fine.

Also, if you weren’t able to save the PDF as a JPG in Photoshop, try going to Image>Flatten, then save as JPG. IIRC, Photoshop will open a PDF as two layers. The type & images will be in the top layer, which is transparent, and the background layer will be all white. You can’t save layered artwork as JPG unless you flatten it first.

You could also try the “Save for web and devices” option without doing the flattening. You will get another window which is independent of the regular JPG process. You’ll get several options for file formats such as JPG, PNG, GIF, etc. Any of them will work, but JPG is the default.

No, I didn’t attempt to print it.

In any case, I can save as a JPEG from Photoshop on my Mac as mhendo suggested. So I’m good here.

That’s two steps when you can do it in one. Import the PDF to PS. Now you have a bitmapped image just as if it was TIF to start with.

Which is what it looks like you’ve figured out by now.