My recollection is that when PDFs first came on the scene, they were touted for how they’d look the same regardless of platform or device. Portable WYSIWYG: What YOU, the person creating the PDF, see is what THEY, the recipients, get.
I work with PDF documents all day long; it’s the format in which most participating agencies send their reports to our organization. Earlier today I had a colleague from one of those agencies respond to my inquiry, in which I pointed out that they’d listed their total elsewhere as 151 items dispensed, but their detailed PDF form only showed 39. Colleague writes back, “Weird. It was showing 151 on my screen. Well, I don’t know what happened but here it is again”.
I open the attachment. It says 39.
Every once in a while I get PDFs that look blank in Adobe Reader or Apple Preview but if I open them in a web browser I see the numbers. On a hunch I open this PDF in Brave and yep, 151, and furthermore all the other fillable fields have totally different values than what Adobe Reader shows. This isn’t good… if I receive a blank form I always try it in a web browser, but if it has values how’m I supposed to intuit that maybe the ones I’m seeing aren’t the ones they intended?
Other PDF weirdnesses: I often open a PDF, transcribe the data*, then close it without having made any changes to the document, but I’m asked whether I want to save changes. Or I open it and it immediately brings up a Print dialog as if I’d said I wanted to print it. Or the boilerplate text that is definitely not configured to be editable shows up W ITHTHECH ARAC TERS V ERYO D DLY SPAC ED.
My instinct is to blame the rollout of editable PDFs. Back in the days when PDFs were new, they were, by definition, not editable. I do see the utilitarian advantage of being able to create a form where the structure of the form can’t be edited but it has fillable parts so people can fill them out and send them back. So I’m not against that on principle or anything. It just doesn’t seem to work reliably.
———
*Yeah I know, you’re thinking “That’s inefficient, sending you data on PDFs that you then have to transcribe.” You’re right, of course. I’ve converted the larger orgs to sending me the same data on spreadsheets so I can just import it directly. I didn’t set this up. I’m also making good money on their design inefficiency since they pay me to do this. It used to be worse – they had orgs sending us scans of paper forms with choice bubbles and freehand text fields that all required interpretation with Optical Character Recognition technology, and it was only ~80% reliable. When I first took this job my primary responsibility was correcting the pea-soup chaos that the OCR software spat out.
I haven’t noticed many actual discrepancies in PDFs myself, but this is pretty concerning. Your theory makes sense.
We rely on them for various authorizations, as well as for numerous reports and obviously invoicing. (Then there are all those contractors who still send me invoices in .doc or .xls format!!!)
What is your colleague using to generate PDFs? If it’s a Microsoft product, they’re notorious for embedding trash because users don’t know what they’re doing. There are preflight-check apps that flag potential problems with PDF generation. Adobe has them, MS doesn’t.
Also, there might be some issue with Reader if you’re looking at PDFs that have filled-in text fields. You’d be better off using Acrobat Pro.
I saw someone explain the other day that Adobe doesn’t want to get into legal battles over copyrighted fonts, so they’ll sub out the typeface sometimes. Not a number-switching thing, but another way that what you see is potentially not what others see.
It’s not something we dictate, since they are separate agencies, and so it’s probably different from organization to organization. We created the original form that they fill out, except that some of them recreate the form in their own software and fill that out and save as PDF. The PDF that we distribute has fillable parts of the form, which means they’re all saving the results on their own platform in whatever software they used to open a PDF.
You see what I mean about the shakiness of it as a standard.
And let’s go back to that thing about the text that is definitely not configured to be editable showing up W ITHTHECH ARAC TERS V ERYO D DLY SPAC ED. This is our form that we distribute coming back to us filled out. And the uneditable boilerplate text is coming back like that just from them filling the forms out and saving changes? I agree that it’s somehow a font-substitution issue but… why would saving their answers affect the boilerplate text?
PDF is an ISO standard. But which one? PDF 2.0? 1.7? Do the documents you get even conform to some version of the standard? It is easy enough to generate something that claims to be valid PDF but is not.
You should require that you be sent (for instance) valid PDF/A and see if that solves your problem. [Not that all validators always agree on what is a conforming document!?!]
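The "which version?" question can at least be checked mechanically. Here is a minimal Python sketch that reads the version a file claims in its `%PDF-x.y` header. The function name is made up for illustration, and keep in mind this only reports what the file says about itself: the document catalog can override the header version, and none of this tells you whether the file actually conforms to anything. That part still needs a real validator.

```python
import re

# A PDF header looks like "%PDF-1.7" or "%PDF-2.0".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def claimed_pdf_version(data: bytes):
    """Return the (major, minor) version the header claims, or None.

    Hypothetical helper for illustration. It searches the first 1024
    bytes rather than only byte 0, because some files carry leading
    junk that many viewers tolerate.
    """
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

To try it on a received report, something like `claimed_pdf_version(open("report.pdf", "rb").read(1024))` would do, where `report.pdf` stands in for whatever file you were sent.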
This is the thing most people misunderstand with regard to PDF files. They often see a PDF as a word processor document and believe that it works that way. This is not true; it is more like a painting, in that it looks a certain way because it was created to look that way. If you don’t have the same paints that were used to make the original painting, then editing it will make it look weird and not match.
There is no requirement that what you see in the PDF be editable, nor that the underlying text match what you see on the screen. I have to deal with extracting “data” from and editing PDF files all the time, and I have to remind people that you can’t just replace some text, because that is not how it works, nor was it ever intended to work. If you aren’t producing a PDF file from some existing content, then the results will be mixed.
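To make that concrete for fillable forms: a form field keeps its stored value (the `/V` entry) separately from a pre-drawn picture of the field (the `/AP` appearance stream), and nothing forces the two to agree. Below is a toy Python model of that idea, not a real PDF parser. The class and function names are invented for illustration, and it is only a guess that this is what produced the 151-versus-39 form mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class FormField:
    """Toy stand-in for a fillable field (hypothetical, not a PDF library type)."""
    name: str
    value: str       # the stored field value (the /V entry)
    appearance: str  # the text baked into the pre-drawn appearance (the /AP stream)


def render_from_appearance(field: FormField) -> str:
    """A viewer that trusts the saved appearance shows this."""
    return field.appearance


def render_from_value(field: FormField) -> str:
    """A viewer that redraws the field from its stored value shows this."""
    return field.value


def find_mismatches(fields):
    """Names of fields where two such viewers would disagree."""
    return [f.name for f in fields if f.value != f.appearance]


fields = [
    FormField("agency", value="Example Agency", appearance="Example Agency"),
    FormField("total_dispensed", value="151", appearance="39"),
]
print(find_mismatches(fields))  # ['total_dispensed']
```

If something like this is going on, both viewers are showing data that is really in the file, which is why a form with plausible-looking values gives no hint that anything is off.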
I do not think that this is a failure of the standard itself, but a failure of people’s understanding of it. That is why there now exist a few variations of the PDF standard, like PDF/X and PDF/A, which are for specific purposes and have to follow specific rules about what can be part of the PDF file. PDF/A in particular is for archiving content, and therefore certain things aren’t allowed to be part of the document, and you get a warning, if you are even allowed to edit, that editing it will cause it to no longer comply with the standard.
Dealing with PDF files can be a pain, especially when you have to remind people all the time that a PDF is not a Word document and that they should not expect it to behave like one.
Adobe created a double-edged sword when they made PDFs editable. It was meant for minor edits like a misspelled word, but some users thought they could reuse them, replace content, and make totally different documents without having to go through designer apps such as MS Word, InDesign, etc. That just leads to all sorts of trouble.
I’d have to know more about what you’re using to generate PDFs to offer suggestions.