I have two different problems with HTML-generating software:
First Problem
I scan many documents, often into HTML. I use or have used the following OCR software:
• The HP OCR software that came with my HP C3180 All-In-One (poor)
• Nuance/ScanSoft OmniPage Pro 16 (buggy for me)
• Abbyy FineReader 9 (trial only until I decide if it’s worth $400)
• Microsoft Office 2007 Office Document Scanning
But they all seem to have the same general problem when creating HTML (and other formats, such as PDF, Word, etc.): They try too hard!
• If the OCR thinks (mistakenly) that a few letters or words are in a different typeface/font – even though they’re not – it’ll go ahead and choose a different font for them in the output!
• If the OCR thinks (mistakenly) that a few letters or words are in a different size – even though they’re not – it’ll go ahead and choose a different size for them in the output!
• If the OCR thinks (mistakenly) that a few letters or words are in a different style (italic / bold / superscript, etc.) – even though they’re not – it’ll go ahead and choose the wrong style for them in the output!
So you’ll get things like this:
You get the idea.
Second Problem
The two HTML editors I own (DreamWeaver 8, Namo Web Editor 6) often generate over-complicated, over-cryptic, and/or highly redundant code. DreamWeaver has a built-in HTML cleaner/simplifier (Clean Up XHTML / Clean Up Word HTML), but to put it politely, it’s very poor at its job. For example, much of what I enter will use no more than 2-3 fonts and no more than 2-3 sizes and styles, but if you look at the source, it’ll have 20 or more CSS varieties and so forth, many of them redundant (e.g., <i>this </i><i>is </i><i>dumb.</i>). This gets particularly bad if your input was OCR.
Question
Does anyone know of any effective software for Windows XP that can simplify the HTML output? I’d like something that would do things such as the following:
• Change all fonts within a selection to a single specified font
• Change all font sizes within a selection to a single specified size
• Change all styles (bold, italic, etc.) in a selection to a single specified style/no style
• Change all colors in a selection to the same color
Afterwards, eliminate as much HTML/CSS/XHTML redundancy as possible.
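To make the kind of cleanup I mean concrete, here is a rough Python sketch. It only illustrates the transformations; it is not a tool I actually have. The function names (merge_adjacent, strip_font_tags) and the list of tags are made up for the example, and it assumes the tags carry no attributes and that regular expressions are good enough, which a real cleaner built on a proper HTML parser would not assume.

```python
import re

# Inline tags whose back-to-back repeats are assumed safe to merge.
INLINE_TAGS = ("i", "b", "em", "strong", "u")


def merge_adjacent(html: str) -> str:
    """Collapse a closing tag followed directly by the same opening tag.

    Only attribute-free tags are handled; whitespace between the two
    tags is kept so words do not run together.
    """
    for tag in INLINE_TAGS:
        pattern = re.compile(rf"</{tag}>(\s*)<{tag}>", re.IGNORECASE)
        html = pattern.sub(r"\1", html)
    return html


def strip_font_tags(html: str) -> str:
    """Remove <font ...> and </font> wrappers, keeping the text inside."""
    return re.sub(r"</?font\b[^>]*>", "", html, flags=re.IGNORECASE)


if __name__ == "__main__":
    messy = '<i>this </i><font size="3"><i>is </i></font><i>dumb.</i>'
    print(merge_adjacent(strip_font_tags(messy)))
```

Stripping the stray font wrappers first matters here, because it is what makes the repeated italic tags adjacent so they can be merged.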
You’d think something like this would be simple and reasonably straightforward, but oh, no! DreamWeaver in particular ignores 80% of my attempts to do this because it’s too damned smart and thinks to itself “Surely the user didn’t want to do that! I’ll just do whatever the hell I please.” Namo’s not much better, and Word isn’t either. They’re all too damned “smart” to give me the total control I want!
Any help, please?
(Sorry for the verbosity, but I hope that by posting all this detail I won’t have to do much clarification).
One last thing: Please don’t suggest that I do all my own coding by hand, okay? I’d bow down to your magnificent hand-coding skill, but my back’s out, and I just can’t do it.
Thanks!