I’m going through thirty gigs of files from various sources trying to sort them. A lot of the original PDF files have been modified by someone adding their name to the first page. That’s easy to spot. But this is only one case of many I expect to encounter.
The two files I'm looking at now are both 16 pages of schematics. I see no difference other than the file size. Konqueror with KPDF opens both of them and only shows a difference in the modified date. PDFEdit will not open the smaller file.
To do a very low-level comparison (for which you will probably need to know a bit about the PDF file format, which is a bit like PostScript), you can use pdftk's uncompress operation to decompress the page streams, so that the page content becomes readable text you can inspect or diff.
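Here is a rough sketch of that low-level comparison, in case it helps. It assumes pdftk and a diff that accepts -a (GNU diff does) are installed. The function name pdf_lowlevel_diff and the temporary file names are made up for illustration; foo.pdf and bar.pdf stand in for your two files.

```shell
# Sketch only: decompress both PDFs with pdftk, then diff the results as text.
# pdf_lowlevel_diff is a hypothetical helper name, not part of pdftk.
pdf_lowlevel_diff() {
    a=$1
    b=$2
    tmp=$(mktemp -d) || return 2

    # "uncompress" rewrites the file with its page streams decompressed.
    # "diff -a" forces a text comparison even if binary data remains.
    pdftk "$a" output "$tmp/a.pdf" uncompress &&
    pdftk "$b" output "$tmp/b.pdf" uncompress &&
    diff -a "$tmp/a.pdf" "$tmp/b.pdf"
    status=$?

    rm -rf "$tmp"
    # 0 = no differences found, 1 = differences found, anything else = error
    return $status
}

# Usage:
#   pdf_lowlevel_diff foo.pdf bar.pdf | less
```

Expect some noise in the output: two files with identical pages can still differ in metadata, object numbering or embedded binary data, so this is more useful for finding out what changed than for proving two files are the same.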
Oh, and I just thought of a way to do visual page-by-page comparisons, which works even with pages that have graphics. You will end up with a PDF with the differences highlighted in red. Assume your two PDFs are called foo.pdf and bar.pdf and each has N pages (replace N with the actual page count in the commands). Then do the following (in bash):
$ pdftk foo.pdf burst output foo_%d.pdf
$ pdftk bar.pdf burst output bar_%d.pdf
$ for f in {1..N}; do compare "foo_$f.pdf" "bar_$f.pdf" "foobar_$f.pdf"; done
$ pdftk foobar_{1..N}.pdf cat output foobar.pdf
These commands use pdftk's burst operation to split each PDF into one file per page. The loop then runs ImageMagick's compare tool on each pair of pages to produce a "difference" image. Finally, pdftk's cat operation stitches the difference images back together into a PDF called foobar.pdf. View that PDF in your favourite PDF viewer and you'll see all the parts that are the same in grey, and all the parts that are different in red.
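If you have many pairs to check, the same steps can be wrapped in a function so you don't have to type N by hand. This is only a sketch under a few assumptions: both files have the same number of pages, ImageMagick can read PDFs on your system (it relies on Ghostscript for that), and the page count is read from the NumberOfPages line of pdftk's dump_data report. The name pdf_visual_diff and the temporary file names are invented for this example.

```shell
# Sketch only: visual page-by-page diff of two PDFs with the same page count.
# pdf_visual_diff is a hypothetical helper name, not part of pdftk or ImageMagick.
pdf_visual_diff() {
    a=$1
    b=$2
    out=$3

    # Read the page count from pdftk's report (a line like "NumberOfPages: 16").
    n=$(pdftk "$a" dump_data | awk '/^NumberOfPages:/ {print $2}')
    [ -n "$n" ] || return 2

    tmp=$(mktemp -d) || return 2

    # Split both documents into one file per page.
    pdftk "$a" burst output "$tmp/a_%d.pdf" &&
    pdftk "$b" burst output "$tmp/b_%d.pdf" || { rm -rf "$tmp"; return 2; }

    # Collect the difference pages, in order, as positional parameters.
    set --
    f=1
    while [ "$f" -le "$n" ]; do
        # compare exits with 1 when the pages differ; that is not an error here.
        compare "$tmp/a_$f.pdf" "$tmp/b_$f.pdf" "$tmp/d_$f.pdf" ||
            [ $? -eq 1 ] || { rm -rf "$tmp"; return 2; }
        set -- "$@" "$tmp/d_$f.pdf"
        f=$((f + 1))
    done

    # Stitch the difference pages back into a single PDF.
    pdftk "$@" cat output "$out"
    status=$?

    rm -rf "$tmp"
    return $status
}

# Usage:
#   pdf_visual_diff foo.pdf bar.pdf foobar.pdf
```

Working in a temporary directory keeps the per-page files out of the folder you are sorting, which matters when you repeat this over a lot of files.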