Tagging text strings in MS Word

Never mind the why and wherefore, here is a quick rundown of what I’m trying to do. Any suggestions that might point me in the right direction would be greatly appreciated.

I’ve got a large block of text in a Word document that I am about to import into a database. Certain segments of the text should be boldfaced, italicized, etc., and are correctly formatted when the process begins. Since the result will be displayed as a webpage, I can insert HTML coding, which is what I normally do. But it’s a tedious process, pasting formatting codes at the beginning and end of each little segment. Is there a way to have Word do this for me automatically, without inserting all of the other crap that normally comes up when Word “webifies” the whole document?

Just as an example, here’s the method I’m currently using.

  1. Use search and replace to identify all italicized text in the document and color it red for easier identification.
  2. Go through the document manually and type an open bracket at the beginning of each italicized segment and a closed bracket at the end of each. There may be hundreds of these.
  3. Use search and replace to swap each open bracket for “<i>” and each closed bracket for “</i>”.
  4. Copy and paste text into database form.
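Step 3 is also easy to script outside Word. Here is a minimal sketch in Python. Like the method itself, it assumes square brackets appear in the text only as the markers typed in step 2. The function name is made up for illustration:

```python
def brackets_to_tags(text: str, tag: str = "i") -> str:
    """Swap every "[" for <tag> and every "]" for </tag>, as in step 3.

    Assumption: brackets occur only as the hand-typed markers from step 2;
    any other brackets in the text would be converted too.
    """
    return text.replace("[", f"<{tag}>").replace("]", f"</{tag}>")


print(brackets_to_tags("I saw [Casablanca] and [Vertigo] again."))
# I saw <i>Casablanca</i> and <i>Vertigo</i> again.
```

Step 2, typing the brackets by hand, is still the slow part. That is the step the answers below try to eliminate.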

Suggestions?

What version of Word?

Pending that, here’s one way to do it in Word 2003, which should be pretty easily macro-able for other markups:

Use search and replace. In the “Find what:” box, type ^? (any character), then set the format to whatever you’re doing on this iteration (italic, bold, underline, what have you). In the “Replace with:” box, type <x>^&</x>. Since ^& stands for whatever was found, each matching character gets replaced with itself wrapped in the appropriate HTML tags (replace the xs, obviously, with the appropriate HTML tag).

So, where you had this in bold, you’ll now have <b>t</b><b>h</b><b>i</b><b>s</b>.

Then do search and replace again, searching for every instance of </x><x> and replacing it with nothing. This clears out the intervening tags and leaves you with <b>this</b>.
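You can model the two passes on a plain string to see why they end up as a single tag pair. In this sketch, the list of (character, is_formatted) pairs is a hypothetical stand-in for the formatting information Word's Find uses. In Word itself, ^? and ^& do the real work:

```python
def tag_per_character(chars, tag):
    """Pass 1: wrap each formatted character on its own, like ^? -> <x>^&</x>.

    `chars` is a hypothetical list of (character, is_formatted) pairs.
    """
    return "".join(f"<{tag}>{c}</{tag}>" if fmt else c for c, fmt in chars)


def collapse_adjacent(html, tag):
    """Pass 2: delete every </x><x> seam so neighboring tags merge."""
    return html.replace(f"</{tag}><{tag}>", "")


chars = [(c, True) for c in "this"] + [(c, False) for c in " is it"]
step1 = tag_per_character(chars, "b")
print(step1)                          # <b>t</b><b>h</b><b>i</b><b>s</b> is it
print(collapse_adjacent(step1, "b"))  # <b>this</b> is it
```

Unformatted characters break the chain of seams. As a result, each separately formatted segment keeps its own tag pair.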

I do not rule out the possibility of there being simpler ways.

ETA: Should work the same in 2007. :slight_smile:

<deleted, I see now why what I wrote was not going to work>

Under the File menu, choose Save as Web Page. That will create a .mht file. Now go at that with a plain text editor. The file will start with a couple hundred lines of junk, but look beyond that and you should be able to find the paragraphs you want, with the appropriate tags already around them. Still a bit of cut-and-paste, but at least the tags are there.
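A .mht file is a MIME archive (multipart/related, the same container format email uses). Python's standard email package can therefore pull out the HTML part and undo its transfer encoding before you start cutting and pasting. Word typically stores that part quoted-printable, which is why a text editor shows =3D where the HTML has =. This is a sketch, assuming the archive contains a text/html part with a charset Python recognizes. The function name is hypothetical:

```python
import email


def html_from_mht(mht_bytes: bytes) -> str:
    """Return the first text/html part of an MHT (MIME multipart/related) archive."""
    msg = email.message_from_bytes(mht_bytes)
    for part in msg.walk():  # walks the container and every part inside it
        if part.get_content_type() == "text/html":
            # decode=True undoes quoted-printable/base64 and returns bytes.
            raw = part.get_payload(decode=True)
            # Assumption: trust the declared charset, fall back to UTF-8.
            charset = part.get_content_charset() or "utf-8"
            return raw.decode(charset, errors="replace")
    raise ValueError("no text/html part found")
```

To use it, read the saved file in binary mode and pass the bytes in, e.g. `html_from_mht(open("yourfile.mht", "rb").read())` (file name hypothetical). The result still contains Word's markup around your paragraphs, but without the encoding noise.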

Damn, that’s brilliant. I think it may be exactly what I was looking for. I’ll test it out first thing tomorrow.

Happy to help. :slight_smile:

I haven’t tried it but I think if you search on * instead of ? it will make the change to the whole block of characters instead of one at a time, eliminating the second step.

I get “^* is not a valid special character for the Find What box.”

Oh, you have to check “Use wildcards”. When you use wildcards you don’t use the caret. And I determined that it won’t work (apparently * is “generous” instead of “greedy”), but if you use

<*>

as the search string it will tag each word. But I haven’t figured out how to make it tag the whole string à la regular expressions.
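In regular-expression terms, Word's < and > wildcards anchor the start and end of a word, so <*> behaves roughly like \b\w+\b. One way to get from per-word tags to a single tag around the whole string is the same seam-removal trick as the two-pass method above, this time allowing whitespace between the closing and opening tags. This is a sketch using Python's re on a plain string standing in for one formatted segment. A plain-text regex can't see Word's formatting, and the function names are hypothetical:

```python
import re


def tag_each_word(text: str, tag: str) -> str:
    """Rough analogue of Word's <*> wildcard: wrap every word separately."""
    return re.sub(r"\b(\w+)\b", rf"<{tag}>\1</{tag}>", text)


def merge_across_spaces(html: str, tag: str) -> str:
    """Drop each </x> whitespace <x> seam, keeping the whitespace itself."""
    return re.sub(rf"</{tag}>(\s+)<{tag}>", r"\1", html)


step1 = tag_each_word("some italic words", "i")
print(step1)                            # <i>some</i> <i>italic</i> <i>words</i>
print(merge_across_spaces(step1, "i"))  # <i>some italic words</i>
```

Only whitespace seams are merged. Punctuation inside a segment (an apostrophe, a comma) would still leave separate tag pairs on either side of it.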

Most pre-built HTML editors for Web sites (like one you’d use in a CMS app) have a nice “clean up Word” feature in them.

This one from Telerik is the best I’ve seen lately.

You can use their demo to strip out Word formatting if you want.

Go to the demo in the link and delete the existing content in the text area. Press Ctrl-V to paste your Word content in there. A JS confirm will pop up asking if you want to clean up Word formatting; just click OK. Then click the HTML button underneath the text area to see your cleaned-up HTML.
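If you'd rather not paste into a third-party demo, you can sketch the core of a “clean up Word” pass with Python's standard html.parser. This is not Telerik's implementation, just an illustration of the idea:

* Keep a short whitelist of tags.
* Drop their attributes (Word's class=MsoNormal, style=..., lang=...).
* Unwrap everything else, such as span and o:p.
* Discard the contents of style, script, title and Word's xml blocks.

The whitelist is an assumption you'd adjust to taste:

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: tags worth keeping; every other tag is unwrapped.
KEEP = {"p", "b", "i", "u", "strong", "em"}
# Elements whose contents are boilerplate rather than document text.
DROP_CONTENT = {"style", "script", "title", "xml"}


class WordCleaner(HTMLParser):
    # HTML comments (including Word's conditional comments) are ignored
    # because handle_comment is not overridden.
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-escape &, <, >


def clean_word_html(html: str) -> str:
    parser = WordCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Fed something like `<p class=MsoNormal><span lang=EN-US>Some <i>italic</i> text<o:p></o:p></span></p>`, it should return `<p>Some <i>italic</i> text</p>`.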

KneadToKnow, you are a genius. It worked like a charm. I’d upload you a beer, but my USB-powered liquid intake device is on the blink. I hope you will be content to bask in the glow of my admiration, because I cannot tell you how much time your suggestion is going to save me.

:cool:

Like I tell people at work, the secret to having a good answer at the ready is to have been asked that question once before. :slight_smile:

Glad it’s a workable solution for you, Kizarvexius.