Anyway, the tribulations described by x-ray vision in this thread made me think: why the hell do people use the blasted format on the web?
Only reason I can come up with is copy/print protection. But that’s easy enough to circumvent if you know what you’re doing.*
So really, why the hell is the format so popular? The reasons listed here don’t seem to cut it; methinks that about 99% of the time, whatever needs doing could best be handled by a regular image file (for, say, brochures) or a nicely put together form + php/perl/asp/cfm/jsp script.
So what gives?
Sorry if this is IMHO material. If I could get quasi-empirical reasons, I’d appreciate it.
*Unlike about half the people in government and education who make them. I recently had to deal with a college application that was read-only and a job application “form” that woulda been better looking if I’d have printed it and filled the fucker in with crayons.
We use 'em at work for most on-line memos, procedures, etc.
The main advantage is that the document looks the same to everyone regardless of monitor resolution, browser, window size, font, etc…
If we are scanning a memo, then they are just really big image files. But if we print to Acrobat (say from Word), the files are crisp, clean, and relatively small.
Joe Behindthetimes with the most ancient Acrobat-Reader-compatible browser possible and Tom Uptothemoment with a 25" monitor and a beta-test version of the newest browser out will see exactly the same image. And for matters where the content needs to be protected, that can be vitally important.
PDF is smaller than an image of the document. That alone should be a compelling reason to use it. It’s also a vector format rather than a bitmap one, so it’s scalable (on-screen zoom without going all blocky).
Portability is next. PDF support, either directly in a browser or via plugin, is pretty much universal.
Some systems throw up red flags at active scripting for security reasons. PDF is, by comparison, inert.
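To show what I mean by "blocky", here is a rough Python sketch. It is only a toy and not how any PDF viewer actually works: the function names are made up for illustration, and the "vector shape" is just a diagonal line. Zooming a bitmap repeats existing pixels, while zooming a vector description re-renders the shape at the new size.

```python
def rasterize_diagonal(size):
    """Render the vector shape 'diagonal line' onto a size x size pixel grid."""
    return [[1 if x == y else 0 for x in range(size)] for y in range(size)]


def upscale_nearest(bitmap, factor):
    """Enlarge a bitmap by repeating each pixel factor x factor times."""
    height = len(bitmap) * factor
    width = len(bitmap[0]) * factor
    return [
        [bitmap[y // factor][x // factor] for x in range(width)]
        for y in range(height)
    ]


def show(bitmap):
    """Turn a pixel grid into text: '#' for ink, '.' for background."""
    return "\n".join("".join("#" if p else "." for p in row) for row in bitmap)


small = rasterize_diagonal(4)
blocky = upscale_nearest(small, 2)  # bitmap zoom: every pixel becomes a 2x2 block
crisp = rasterize_diagonal(8)       # vector zoom: shape is redrawn at the new size

print(show(blocky))
print()
print(show(crisp))
```

The first grid comes out as a staircase of 2x2 blocks; the second is still a one-pixel-wide line.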
I’ve never run into an unprintable PDF, but I agree with you on the forms fill-in function. There are a lot of forms that someone forgot to turn on whatever function allows text entry in fields.
On the other hand, when it’s set up right, it’s wonderful. I used to plug numbers into some Federal cash transaction reporting forms all the time, save them, email them and print for audit records.
The California online tax form this year was also a PDF. The Franchise Tax Board paid attention to the programming on this one - it was pretty complex in that it pulled in data from hither and yon and it worked quite nicely.
PDF is an excellent format for dealing with documents that need to be rendered very specifically, such as things that are going to end up on paper. PDF software is available for pretty much any platform (if you go outside of Adobe’s official products) and the results are very predictable. It should NOT, under any circumstances, be used for documents that only exist electronically. There is absolutely no reason not to use a more light-weight, more accessible format like HTML for that. Most of the PDFs you see on the web are probably there because the same material is available in print form, and rather than do the work to make a nice-looking HTML version, they just stick up the existing PDF file.
The vast majority of the time that PDF is used, HTML would be much, much better. The bottom line is that, while there are legitimate applications for PDF, like any technology (*cough*Flash*cough*) it’s grossly overused. Most of the time, it doesn’t matter if a document doesn’t display exactly the same on every system. No matter what browser you use, simple HTML like the kind that would replace a PDF document will render nearly identically. I think the main issue is that people get carried away with buzzword technologies and use them even when there are better, simpler options.
Not to beat a dead equine, but this isn’t really true. A “small” HTML font in Netscape looks much smaller than a “small” HTML font in IE. If you’re trying to indent a document or list, the difference in font sizes will kill you. And if somebody has a comment on line 2 of the second paragraph, his line 2 may not be the same as yours.
And we won’t even get into how they print out compared to how they look on-screen.
At a job I once held, I had to get a large archive of 16-24 page newspapers online. They were all done in Aldus PageMaker, and they had most of the original PageMaker files and their support files available, right back to grotty old PageMaker 3.0 files. I was very easily able to upgrade them to PageMaker 6.0 and then export them as PDFs that looked very, very good, didn’t take up huge amounts of bandwidth as image files would have, and were also a hell of a lot better looking than scans of the printed pieces would have been. And the exporting process went very, very fast.
Converting all of that to HTML would have taken AGES.
Of course, this was an archival file and no interactivity was needed. And that’s why we used PDF files … SO SUE US!
Even if that were true you still would lose the ability to take that document and save it as a single file. With HTML you’d need to save any images separately from the HTML document, no?
I use PDF when I’m writing dance choreography sheets, where I have to have stuff on three separate lines (step, footwork, timing) line up vertically. And they are already in page layout for printing, so by putting them up in PDF, people can print them and get the same result. For anything you want people to be able to print in a predesigned format, HTML sucks.
In some other parts of my website, I have lists where I give the HTML version for normal reading, with a link to a PDF version designed for printing.
I did a site for an international company and the Europeans were very stiff on wanting all documents to look exactly the same to everyone. We had to use PDF (at their request). I ended up making some really sharp HTML pages that looked 99% like the PDFs, which were lighter sized and up-to-the-minute with the database, but we still had to get PDFWriter and do a database-to-PDF batch process every hour on the server.
I must say, though, even though my partner and I bitch and complain all the time about PDFs, we use them all the time now! lol
This is why I said “most of the time.” While in some cases, sure, looking the same on all platforms is necessary, in the vast majority of cases it’s at best unnecessary, and at worst a hindrance.
In my experience, text sizes are comparable between IE and Netscape. I’ve tested with NS4, IE6, and various Mozilla versions and haven’t seen simple pages rendered wildly differently between them.
I’ll remind you guys that the OP discussed using PDFs on the web. In general, if you can’t accomplish something using W3C-Standard HTML, you probably shouldn’t be doing it in the first place. There are, of course, legitimate uses for the PDF format. Documents published to the web, however, generally work and look much better for all concerned in HTML with in-line images. Using PDF destroys the benefits you get from working with an online format, notably content that conforms to the size of your window, the ability to select and copy text and images, and the ability to make a webpage conform to your color and font preferences. Not to mention that PDF files are many, many times larger than an equivalent HTML document with images.
I am required by law to provide documentation that is accessible to all and remains intact, without changes. While a Word document may first appear to be a better choice, the meager PDF security meets our legal requirements while Word does not.
PDF documents render exactly as intended. Individual Word/web users modify on their end and this is unacceptable.
It is not cost-effective to create/forever maintain HTML files from 1,000-page PDF documents. (Yes, some of our documents are that large.)
Some of these documents change on a regular basis (monthly, yearly) and it would be a full-time job just to maintain these documents as HTML files. We cannot justify this cost level.
This is not true. Because PDF rendering involves so much passing of formatted data, it’s quite possible for someone to embed a buffer-overflow attack in a PDF document. I just spent a long while completely ripping KDE out of my system and replacing it, because a potential vulnerability along these lines was present in several parts of the environment, so I know what I’m talking about here.
By comparison, plain static HTML (without scripts or dynamic inclusions) is much, much safer in that department.
Not necessarily. XHTML/XML is the way to go if you want to write a browser-only document. However, they are not any good if you want to print them out, so you will also need something else, which is where PDF comes in.
It is also much, much easier for publishers such as Nature and The Lancet to convert their existing files into PDF than to write HTML files from scratch. HTML also doesn’t work very well with equations, formulae and molecular structures.
You are comparing a known security risk with a potential, yet unproven, security risk. Buffer overflow has nothing to do with whether the data is formatted or not, either.
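To make that last point concrete, here is a small sketch of my own. It is not taken from any real PDF reader: the record layout (a 4-byte big-endian length followed by the payload), the function name and the size limit are all assumptions for illustration. The danger lies in a parser trusting a length the file declares about itself, whatever the format. Python will not actually overflow a buffer, so this only shows the check that a C parser would need before copying.

```python
import struct


def read_record(data: bytes, max_len: int = 1024) -> bytes:
    """Parse a hypothetical record: 4-byte big-endian length, then payload."""
    if len(data) < 4:
        raise ValueError("truncated header")
    (declared,) = struct.unpack(">I", data[:4])
    payload = data[4:]
    # The declared length is attacker-controlled, so check it against both a
    # sane upper bound and the number of bytes that are really present.
    if declared > max_len or declared > len(payload):
        raise ValueError("declared length does not match the data")
    return payload[:declared]


honest = struct.pack(">I", 3) + b"abc"
lying = struct.pack(">I", 9999) + b"abc"  # claims far more data than it carries

print(read_record(honest))
try:
    read_record(lying)
except ValueError as err:
    print("rejected:", err)
```

Skip that check in a language without bounds checking and you have your overflow, whether the bytes came from a PDF, an image or anything else.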