How do I kill MS-HTML crap?

I’ve got an Excel spreadsheet I want to convert to a table in an HTML page. But all the extra crap that Microsoft products throw in there is disrupting the page layout.

Anyone know of any way to convert MS-HTML to just regular HTML?

I’m sure someone will come along with better ideas, but what I do is use an older version of the MS products. Word 97 converts a DOC file to a relatively simple HTML file which I can then clean up some more with a text editor.

However, are you viewing the HTML output with a fairly new browser? Office 2000’s HTML files don’t work well with older browsers. It’s a compromise either way.

  • You could try using StarOffice to open the spreadsheet and then convert it to a web page from there. I haven’t tried it myself, but it might help.
  • Or there’s Notepad. Much of the web work I do at my internship involves MS Office documents that display perfectly fine when saved as a “web page” and inserted into FrontPage, but that won’t actually display in real browsers, not even IE. And no, I don’t understand the point of it either.

I ran into the same problem when I wanted to generate an HTML table in Excel. I didn’t like the big HTML files that Excel produced for just a simple table. What I do now is write the HTML tags into their own cells, with the data to be displayed in the adjacent cells. For example, my spreadsheet may look something like this:



    A          B              C      D
1   <table>
2   <tr><td>   <b>Date</b>    <td>   <b>Amount</b>
3   <tr><td>   =Data!A1       <td>   =Data!B1
4   <tr><td>   =Data!A2       <td>   =Data!B2
5   <tr><td>   =Data!A3       <td>   =Data!B3
6   </table>


I then save this as a space-delimited .prn file and rename it with an .html extension. I used to have to go back through this file and strip out the extra spaces that were generated, but a fellow Doper was kind enough to write a little program for me to take care of this whole process automatically. It may seem like a PITA and the long way around, but the HTML files are nice and small without all that extraneous code.
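For anyone without a helpful Doper on hand, here is a minimal Python sketch of what such a space-stripping step might look like. It is not the program mentioned above; the function name `squeeze_prn` and the sample rows are made up for illustration, and it assumes the .prn text only needs its column padding collapsed and its blank lines dropped.

```python
import re


def squeeze_prn(text: str) -> str:
    """Collapse the column padding left by a space-delimited .prn export."""
    lines = []
    for raw in text.splitlines():
        # Turn every run of spaces/tabs into one space, then trim the ends.
        line = re.sub(r"[ \t]+", " ", raw).strip()
        if line:  # skip rows that were empty in the sheet
            lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample of what the padded export might contain.
    sample = (
        "<table>                                  \n"
        "<tr><td> <b>Date</b>    <td> <b>Amount</b>\n"
        "<tr><td> 1/5/2001       <td> 12.50        \n"
        "</table>\n"
    )
    print(squeeze_prn(sample), end="")
```

Browsers collapse runs of whitespace in table cells anyway, so squeezing the padding only shrinks the file; it should not change how the table renders.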

I bet there’s a PC printer driver that “prints” any document in any program to HTML. There is for the Mac. That would bypass Microsoftisms.

If anybody in your office uses Dreamweaver, it’s got a nifty “Clean up Word HTML” feature. I think it would work for Excel as well.
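If nobody has Dreamweaver, a rough do-it-yourself cleanup is possible with a short script. The Python sketch below is not what Dreamweaver does internally; it is a regex-based guess at the usual Office leftovers (conditional comments, embedded `<xml>` and `<style>` blocks, namespaced tags like `<o:p>`, and `class`, `style`, `lang`, and `x:`-type attributes). The function name `clean_office_html` is made up, and regexes can trip on unusual markup (for example a `>` inside an attribute value), so treat it as a starting point and check the output.

```python
import re

# Matches any opening tag, e.g. <td class=xl24 style='...'>
TAG = re.compile(r"<[a-zA-Z][^>]*>")

# Attributes to drop: class, style, lang, and namespaced ones such as x:num.
ATTR = re.compile(
    r"""\s+(?:class|style|lang|[\w-]+:[\w-]+)(?![\w:-])"""
    r"""(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?""",
    re.I,
)


def clean_office_html(html: str) -> str:
    """Strip common Office-specific markup, keeping the visible content."""
    # Conditional comments: <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.S | re.I)
    # Embedded <xml>...</xml> and <style>...</style> blocks
    html = re.sub(r"<(xml|style)\b.*?</\1>", "", html, flags=re.S | re.I)
    # Namespaced tags such as <o:p> and </o:p>; their contents are kept
    html = re.sub(r"</?\w+:\w+[^>]*>", "", html)
    # Presentational and namespaced attributes inside the remaining tags
    return TAG.sub(lambda m: ATTR.sub("", m.group(0)), html)


if __name__ == "__main__":
    cell = "<td class=xl24 style='mso-number-format:General' x:num>12.5<o:p></o:p></td>"
    print(clean_office_html(cell))
```

Removing the `<style>` block and the `class` attributes throws away Excel's formatting along with the junk, which is the point here: the page's own stylesheet takes over.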

<slight hijack>
We had a similar problem exporting listings of our environmental (NEPA) documents from Access databases for posting to our external website. Some offices were generating a Report, then exporting it to HTML. To try to preserve the formatting, Access would insert tons of junk code, and these simple tables of maybe 250 rows would come out as 400 to 750 KB HTML documents! By getting everybody to switch to a Query rather than a Report, then pasting the resulting HTML table into a template, I was able to get them down to 40 to 50 KB or so.
</sh>
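The same "bare table dropped into a template" idea can be sketched in a few lines of Python, for anyone who can get the query results out as plain rows. This is only an illustration, not the Access workflow described above: the names `rows_to_table` and `PAGE`, the template layout, and the sample row are all hypothetical.

```python
from html import escape
from string import Template

# Hypothetical page template; $title and $table are filled in at render time.
PAGE = Template("<html><body>\n<h1>$title</h1>\n$table\n</body></html>\n")


def rows_to_table(headers, rows):
    """Render headers and rows as a plain HTML table with no styling."""
    out = ["<table>"]
    out.append("<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(c))}</td>" for c in row)
        out.append("<tr>" + cells + "</tr>")
    out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample data standing in for the query output.
    table = rows_to_table(["Document", "Date"], [["EA & FONSI", "2001-03-01"]])
    print(PAGE.substitute(title="NEPA Documents", table=table), end="")
```

Because each row costs only its tags plus its data, the file size grows with the content rather than with per-cell formatting.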