Problems converting MS Publisher files to PDF

I’m trying to convert a bunch of my old newsletter files from MS Publisher to PDF. The issue I have is that, depending on the method I use (Save As, Export, or Print), I get different results, none of which is exactly what I want.

The main problem is that with the first two methods, any text hyperlinks, which are blue in the original, end up gray in the PDF, although they remain working hyperlinks. OTOH, if I use Print to PDF, the links are blue, but don’t actually work as links.

I’ve tried deleting and reinstalling the PDF print driver, and using an older version, and nothing seems to work.

The annoying thing is that a few years ago, before I retired and stopped publishing, I was able to get working blue links in the PDFs somehow, but now I can’t for some reason.

I’ve spent most of the day trying different options, including online conversion services. Three or four of them took so long to do the conversion that I just gave up. The only one that actually returned a file completely screwed up all the fonts and layout.

Windows 10, Publisher 2016, Adobe Acrobat Pro 2020, Dell XPS-13 laptop.

Thanks for any ideas and suggestions.

Just a thought: Have you tried converting it to some intermediary format first (like a Word doc or Microsoft’s own XPS) and then to PDF from that intermediary format?

If you can email me an example Publisher file, I’ll take a look too. I’ll PM you in a sec.

I don’t see your file type specifically, but this site can do a lot of conversions - I’ve linked to a page for converting to PDF. I use this site all the time and it almost always comes through.

Is it possible that your links are gray because your computer has noticed you’ve already visited the site? I know my PDFs will sometimes do that if I’ve visited those sites previously. Since I’m guessing you are linking to sites you have been to, that could be it. Try just making up a site and see what happens.

I thought of that, too, and I haven’t tried opening the file on another computer, but all of the many links in the document are gray, and I haven’t even thought about clicking on the vast majority of them.

But I will try opening it on another machine.

You can try Zamzar. It’s an online converter. Here’s the page that converts PUB to PDF.

Publisher is going to be discontinued in 2026 according to this article.

Thanks, but this was the site that worked, but screwed up the fonts and layout.

Thanks, but this was one of the ones that took forever, then finally reported “Sorry, we hit an issue when converting your file.”

@Edward_The_Head, thanks for the suggestion, but the files are displaying the same on another computer.

Trying online services was just an act of curiosity, but it wouldn’t have been practical even if any had worked, since I have about 220 files to process, and uploading them and waiting for them all to be processed would have been a nightmare. To say nothing of the fact that I would have had to pay for a process that should work quickly and easily on my own computer.

My next step will be to see if Adobe can be any help. But given my history with them, I’m not hopeful.

Thanks again for all the ideas. If anyone has anything else, please let me know.

Do you have Acrobat Pro?

If so, you can try saving the PUBs as EPS files, then open them in Acrobat Pro. The Distiller app will give you the spinning wheel for a minute, then convert the EPS to PDF.

If you have access to Adobe Illustrator, you can open the EPS files and save them as PDFs. You can also open EPS files in Photoshop, but you’ll wind up with image files.

If no Adobe products are available, you can try Inkscape, a free vector design tool, similar to Illustrator. I haven’t used it, but I assume you can open EPS files with it and save them as PDFs.

This is probably more work than you’re willing to do, but PUB is a dying format and you’re unlikely to find support for it.

Thanks. I do have Acrobat Pro. Publisher allowed me to save them as .PS files, not .EPS. Distiller converted the .PS files and yielded the same unsatisfactory results.

You can convert the grey-colored links to blue in Acrobat, but you have to do each link individually.

If the hyperlinks aren’t working, you may have to relink them manually. The URLs could also have changed or been discontinued.

I know, and I tried it, but waaaayyy too much trouble. There are dozens of links in each issue, and 220 or so issues. Not happening.

It’s not that the links are bad, it’s that in the version where they’re blue, they’re not links, just blue underlined text. In the version where they’re gray, they work fine. But I want them to be blue so it’s obvious they’re links.

I signed up for Office 365’s one-month trial, which includes a newer version of Publisher. It seems to be able to save them to PDF just fine, blue and working: recolored.pdf - Google Drive. The two Google links in that example are just default text colors. The orange one is me changing the color on purpose.

Perhaps you can open up the old files in the newer version and export them from there?

(Also got your DM with the example file, but couldn’t access your GDrive share… will try it on my side once I get access).

Thanks again for all the suggestions, especially @Reply, who kindly provided additional off-line assistance. I found a workaround that worked, and later remembered that only a relatively small number of issues contained active links, so all this fuss wasn’t really worth the effort. Thanks, anyway.

Here’s my new challenge. As I mentioned in the OP, I have 216 issues of a newsletter I published over 24 years that I’m putting into the public domain with a Creative Commons license. To make them as useful as possible, I want them to be searchable singly and as a whole.

I’ve added embedded indexes* to all of them, per the instructions from Adobe that I found here.

However, the instructions for indexing multiple files (creating a catalog, in their terms) are much less clear to me. As is the question of how I might distribute them.

The run of the newsletter consists of 24 volumes (years) with between six and thirteen issues each. It would be natural to group them by volumes, rather than have one large folder with all 216 files. Adobe talks about creating a directory structure for distribution on CD, but that’s not happening.

The most obvious option would seem to be to create a zip file that I can host for download on my website. Would that be as simple as creating 24 directories with the files in them, cataloging each directory, and zipping the whole lot? Would that allow searching all 24 at once as well as each one individually? Or do I also have to create a catalog of all of them?

Also, how do the embedded indexes and catalogs actually work? I’ve just tried searching in File Explorer for words in the PDFs I’ve indexed, and get no results. Do you have to use Acrobat or Reader? But if you have those programs, do you need embedded indexes? Can’t they do word searches on files without indexes?

Finally, what would be involved in trying to set up a search function on my WordPress site that would allow visitors to search the full run of the newsletter? I doubt it will be worth the effort or expense involved; simply allowing people to download the files should be sufficient. But if it could be done cheaply and easily, I suppose I might consider it.

Thanks for your ideas.

*Yes, I know the proper plural of index is indices. I’m using the other version for the teeming masses that don’t.

Not 100% sure on this, but I don’t think the Adobe embedded index stuff is actually part of the broader PDF standard. It’s one of (many) proprietary extensions Adobe likes to add to their PDFs, but many of those can only be utilized by recent versions of Acrobat (Pro or Reader). That means they’re useless in most web contexts. Google scrapes PDFs its own way, for example, and macOS and Windows both have their own proprietary full-text search engines that most likely ignore Adobe’s embedded index. Again, not 100% sure, but I wouldn’t count on that index having any use at all without further proof…

OK, so anyway, as for how to make them searchable… some ideas (not an exhaustive list):

  1. Easy but expensive: If you want to guarantee searchability, you have to scrape and index the PDFs yourself, in a way that works with WordPress. For example, the SearchWP plugin can do this, but it is not free ($99). There might be other, cheaper (maybe free?) plugins, but I haven’t looked too much.

  2. A little more complicated but free: Upload all your PDF files and make sure Google can index them (manually submit them if necessary and test them using the URL inspection tool). Then just drop a Google custom search widget into your WordPress site.

  3. Very complicated and expensive: The “best practice” version of this situation is really to create alternative HTML versions of the PDFs, not only for easier searching but also for long-term preservation and accessibility. But that is a complicated process if you need to preserve the original Publisher formatting to some degree (maybe you don’t? The HTML can be minimally or completely unformatted, since the PDFs are just a click away).

  4. Put it outside WordPress altogether. If you can put this stuff into the public domain (or maybe just Creative Commons license it), maybe you can put it into the Internet Archive, which can handle the PDF parsing and searching for you (example). Makes your job easier, and gives them an interesting collection to preserve for posterity. There are also paid services like Issuu, but those typically charge a monthly fee.

I don’t think putting up a big ZIP file of them is going to do anything for searchability. It might make it easier for people to download the entire collection at once, but it’s not easily searchable.
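For the download-convenience part, here is a minimal Python sketch of zipping a folder of PDFs while keeping one subfolder per volume. The function name `zip_newsletter_tree` and the folder layout (`vol01/issue01.pdf` and so on) are made up for illustration; adjust them to whatever your files are actually called. It does nothing for searchability, it only bundles the files.

```python
import zipfile
from pathlib import Path


def zip_newsletter_tree(root, archive):
    """Zip every PDF under `root`, keeping the per-volume folder structure.

    Returns the list of paths as they are stored inside the archive.
    """
    root = Path(root)
    stored = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # rglob walks all subfolders; sorting keeps the archive order predictable.
        for pdf in sorted(root.rglob("*.pdf")):
            # Store paths relative to the root so the zip unpacks as vol01/..., vol02/...
            arcname = pdf.relative_to(root).as_posix()
            zf.write(pdf, arcname)
            stored.append(arcname)
    return stored


# Hypothetical usage (both paths are placeholders):
# zip_newsletter_tree("newsletters", "newsletters-complete.zip")
```

Anyone who unzips the result gets the same volume folders back, and can then point their own OS search (or Acrobat) at the unpacked folder.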

And just to be explicit about this: As far as I know, each operating system does full-text file search in its own way. It should be automatic on newer Windows and Macs, but older versions may need special “ifilter” plugins. And you might have to make sure the PDF is in an indexed folder.

Edit: Oh, and you also have to make sure your generated PDFs still contain real text and didn’t accidentally get “rasterized” into pixels. If you can copy & paste text out of it, it’s likely fine (assuming you’re not on a Mac, which will transparently recognize text even inside rasterized PDFs).

Simplest is to just drag the folder into Documents (if it’s not already there), wait a few hours (indexing usually happens in the background while the computer is idle) and see if it works after that. If not, you might have to Google for specific instructions for your particular OS and version.

However, note that this has no bearing on whether WordPress and/or Google can index those same files. On the web, search indexes live in a database alongside the website itself, not inside the individual files being searched. To generate that index you need special software (like the WordPress plugin). There are free tools (for coders) that can do the same thing, but if you need to hire someone to set that up for you, it’s probably going to cost more than $99.
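To make the "index lives outside the files" idea concrete, here is a toy Python sketch of an inverted index: a lookup table from each word to the files that contain it. This is only an illustration of the general technique, not how SearchWP or Google actually work. The file names and text are invented, and it assumes the text has already been extracted from each PDF by some other tool (not shown).

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(texts):
    """Map each lowercase word to the set of file names whose text contains it.

    `texts` is a dict of {file name: already-extracted text}.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return dict(index)


def search(index, query):
    """Return the sorted file names that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    # A file matches only if it appears in the posting set of every query word.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Invented sample data standing in for text pulled out of three issues.
sample = {
    "vol01/issue01.pdf": "Spring planting guide",
    "vol01/issue02.pdf": "Summer planting and watering",
    "vol02/issue01.pdf": "Winter storage",
}
index = build_index(sample)
results = search(index, "summer planting")
```

The point is that `index` is a separate data structure the site has to build and store somewhere. Nothing in the PDFs themselves changes, which is why an index embedded by Acrobat doesn't help a website search.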

Thank you, thank you, thank you! This is an excellent idea that I didn’t even know was possible. This is precisely what I want, because 1) as I mentioned, I was already planning to give them a CC license, and B) I want them to be easily and freely available in as useful a form as possible, for as long as possible, and iii) I don’t want the expense and hassle of maintaining my website in perpetuity.

No, the idea of the zip file was just to make it easier than downloading 216 separate files. And I imagined that Adobe’s index function would allow someone to do their own searching on the files once they were on a local drive. But apparently not, really:

Great. Thanks, Adobe. That may explain this note I found on a page linked to the one I included above:

With December 2018 release of Acrobat and Acrobat Reader, the embedded index in the PDF is no longer used for searching. If you still want to enable the index for searching, see How to enable the embedded index in a PDF for searching.

That link takes you to a page that says nothing about indexing.

Once I’m finished processing all the files (it’s taking longer than I thought), I’ll post them on my website to let people download them directly, then create an Internet Archive Collection as described on the page you linked.

Thanks again and again, @Reply.

You’re welcome!

Sorry, I’m bad at noticing & remembering usernames, and I didn’t realize (at first) that you were the same person involved in the preservation project, or I’d have mentioned the Internet Archive sooner. It’s too easy to get sucked into looking at some microscopic technical issue and forget the bigger picture :)

But anyway, yeah, the Internet Archive is a fantastic resource! When I worked for a museum, the question of “how to preserve our digital collections” was always a toughie, for the exact same reasons you find it difficult. The Internet Archive always seemed like the best steward for such a thing, more than the museum itself even.

They went through some legal issues recently (during COVID, they scanned and redistributed a bunch of books and got in trouble with the publishers), which sucked significant time and resources away from their normal, less controversial archiving functions. Hopefully they’ve dealt with that by now and are able to survive into the distant future…

It’s crazy to me that basically the entire world’s digital heritage is kept alive only by this one small nonprofit. Nobody else is even attempting something on a similar scale, as far as I know. During that lawsuit, the Wayback Machine – the internet’s collective memory of the 90s and 2000s – briefly went offline, and people feared those sites were lost to time forever. Thankfully it came back after a bit.