I have a PDF file and a text file that lists titles and their page numbers in the PDF. Is there any program that can edit the PDF and insert these as bookmarks? I can do it manually in Foxit, but there are over 400 entries, so I am looking for a way to do it programmatically. I looked at a PDF file in a text editor and the format is not really human-readable, so I can’t cook up something myself.
If I were charged with this task… yeesh…
I’d do it on a Mac. I’d use FileMaker to import your text file that lists titles and page numbers, parsing it so that each record holds a title and its starting page number. I’d run a looping script that walks from the last record to the first (i.e. start at the last record and use Go to Record [Previous, Exit After Last]). On each iteration I’d call Perform AppleScript and have AppleScript tell Adobe Acrobat to delete page y and every page after it, where y is the starting page of the previously active record (which, since we’re looping in reverse order, is the start of the next chapter), and also every page before x, where x is the starting page of the currently active record. Then save the resulting file under the title name, reopen the original file, and continue the loop.
Then in Acrobat, combine them all into a single file. The source files would automatically become bookmarks, with their filenames as the bookmark titles.
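The bookkeeping behind that reverse loop (each title ends on the page before the next one starts, and the last title runs to the end of the book) can be sketched in Python. This covers only the page-range arithmetic, assuming 1-based, inclusive page numbers; the function name `chapter_ranges` is made up for illustration, and the actual splitting would still need Acrobat or a PDF library.

```python
from typing import List, Tuple


def chapter_ranges(entries: List[Tuple[str, int]], last_page: int) -> List[Tuple[str, int, int]]:
    """Turn (title, start_page) pairs into (title, first_page, last_page) ranges.

    Pages are 1-based and inclusive. Each entry ends on the page before the
    next entry starts; the final entry runs to last_page.
    """
    ordered = sorted(entries, key=lambda e: e[1])
    ranges = []
    end = last_page
    # Walk backwards, like the reverse FileMaker loop, so each entry
    # already knows where the following one begins.
    for title, start in reversed(ordered):
        if start > end:
            raise ValueError(f"{title!r} starts on page {start}, after page {end}")
        ranges.append((title, start, end))
        end = start - 1
    ranges.reverse()
    return ranges
```

Two titles that start on the same page can't be split into separate files, so the sketch raises an error instead of producing an empty range.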
I doubt that’s the most efficient way, but it’s what springs to mind. I’m curious to see what other folks come up with.
Not really sure exactly what you need, but you could look at pypdf. I have had some luck with it doing reasonably simple stuff.
Trouble with PDF is that it is a programming language. Short of executing it, there isn’t a reliable way of knowing what it puts on a page. But pages, at least, are reasonably well defined in most files, so operations relative to pages can often work.
By programmatically, do you mean with a programming language? You might be able to do something with the PDFTron SDK: PDFTron Systems Inc. | Documentation
There is also pdf-lib, a free JavaScript library that can do this, but it lacks a high-level API for bookmarks, so it requires some manual logic: How to create a table of contents (document outline)? · Issue #127 · Hopding/pdf-lib · GitHub
pdftk should be able to do it
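pdftk reads bookmarks from a plain-text data file in the same format its `dump_data` operation prints: a `BookmarkBegin` line followed by `BookmarkTitle`, `BookmarkLevel`, and `BookmarkPageNumber` lines, with 1-based page numbers. Here is a small Python sketch that renders (title, page) pairs in that format; the function name is hypothetical, and every entry is written as a top-level (level 1) bookmark by default.

```python
from typing import Iterable, Tuple


def pdftk_bookmarks(entries: Iterable[Tuple[str, int]], level: int = 1) -> str:
    """Render (title, page) pairs as a pdftk bookmark data file."""
    lines = []
    for title, page in entries:
        lines += [
            "BookmarkBegin",
            # Collapse runs of whitespace (including newlines) so each title
            # stays on a single line of the data file.
            f"BookmarkTitle: {' '.join(title.split())}",
            f"BookmarkLevel: {level}",
            f"BookmarkPageNumber: {page}",
        ]
    return "\n".join(lines) + "\n"
```

Save the text as UTF-8 and apply it with something like `pdftk book.pdf update_info_utf8 bookmarks.txt output book-bookmarked.pdf` (file names are placeholders). Plain `update_info` expects non-ASCII characters as XML numeric entities instead.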
This looks great! Much simpler.
I haven’t reviewed all the answers yet, but I’ll give you the whole story here.
I have a collection of lead sheets for songs that I have compiled into a single PDF file; each song came from a separate PDF file. I use this file in a mobile app called iGigBook. You can also upload an index to iGigBook with the song name, composer, and page number of each song. In the app, you can search for a song title and it will list every occurrence of the song in all available books. If you tap one, it opens that song at the page given in the index. The app ships with built-in indexes for many popular music books, and you can also upload your own custom book with an accompanying index.
So I have a PDF file and a text index that lists each song and its page number.
I would like to be able to also use this PDF file standalone without going through the app. To make it useful, it would need bookmarks, which act as a hyperlinked table of contents. To do this manually in Foxit, I can go to a page, then click Add Bookmark and type in the title. A bookmark will be added for that page. (You can also have a hierarchy of bookmarks, like a TOC with subsections, but that isn’t necessary here.)
So that’s the whole sordid story. I would like some way of using the text index I already have to create the bookmarks, either with an app that already knows how to do this or by writing some code to edit the PDF file. (I am not hopeful about the latter, since I don’t have the first clue about the file format; it appears to be binary.)
We’re on the right track here, but this looks like Linux and I do not have a Linux setup. I once set up a Linux partition so I could boot my machine into Linux, but I never ended up using it, so I didn’t keep it when I rebuilt the machine.
PDFtk worked beautifully and did exactly what I need! (I’m taking a pass on whatever that guy has on GitHub.) I wrote some quick VBA to take my index data and write the data file for PDFtk to process.
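The VBA wasn’t posted, but the parsing half of such a script might look like this in Python. The index layout is an assumption (CSV rows of title, composer, page, possibly with a header row), since the iGigBook format isn’t spelled out in this thread; `read_index` and its `offset` parameter are hypothetical. The offset covers the common case where the book’s page 1 isn’t the PDF’s first page.

```python
import csv
import io
from typing import List, Tuple


def read_index(text: str, offset: int = 0) -> List[Tuple[str, int]]:
    """Parse an index assumed to be CSV rows of: title, composer, page.

    `offset` is added to each page number, for books whose printed page 1
    is not the first page of the PDF. Entries are returned sorted by page.
    """
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 3 or not row[2].strip().isdigit():
            continue  # skip blank lines and a header row, if any
        title, _composer, page = (field.strip() for field in row[:3])
        entries.append((title, int(page) + offset))
    entries.sort(key=lambda e: e[1])  # stable: same-page titles keep file order
    return entries
```

The result is a list of (title, page) pairs in page order, which is the shape a bookmark writer for either pdftk or pypdf would consume.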
I am amazed that I went from “no idea how to do this” to “mission accomplished” in less than 24 hours with my fellow Dopers. Thanks to @DPRK for your help and thanks to all others who responded.