What do you call a program that will eliminate unnecessary HTML code?

Say I have the HTML code:
X<big></big>Y
The code between the X and Y is unnecessary. This might as well just be:
XY
But what type of program will get rid of the “<big></big>”? I’ve been googling words like compressor, minimizer, reducer, etc., but when I enter such code in free programs online, it doesn’t do anything. It still has the “<big></big>”.
Does anyone know of a free or cheap program that will do this? Or even an expensive program, I don’t even know where to start…
Thanks for any help!! :)

Try ‘optimizer’. It’s generally not an important issue. I see lots of generated HTML with empty tags like that. Why do you need to do this?

Thanks, I tried “optimizer” and it brought up a few more results I didn’t find before, but they’re still doing the same thing (nothing). I’m really baffled here, this should be simple and obvious. And I can’t possibly pay for a program just to try it to see if it does what I want.

I’m not quite sure I get your question. It’s a universal principle across all of computer science and programming that you should have neat, organized, efficient code that’s easy to read, view, maintain, edit, load, run, etc. HTML code that’s twice as long will therefore take twice the time to download and parse, will occupy twice as much viewing space when editing, will take twice as much mental effort to read through, etc.

With HTML this is all especially relevant when you’re using WYSIWYG (What You See Is What You Get) editors. These can make your code 80 times as long (80 times harder to read, download, parse, etc.) if your WYSIWYG editor doesn’t know exactly what it’s doing. You could hit Ctrl-B 100 times and a stupid program may insert <b></b> 50 times, etc.

I found a couple using:

“software to remove excess html tags”

which led me to googling:

“html cleaner” (as many used that term)

The second really didn’t find more than the first search.

Note, I’ve never used any of the software listed in the results, but HTML Tidy is on www.w3.org which is a fairly reputable site as things go.

Maybe “parser” is the search term you were looking for? Here’s a list you may find useful, and it includes the aforementioned HTML Tidy.

An empty tag is not always removable. For example, you couldn’t remove “<br></br>”.

But if you had a list of tags that you knew were safe to remove, you could just use a simple string substitution utility. On Linux, you could do something like:



sed 's:<big></big>::g' < file.html > newfile.html
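If there are several tags on the "safe to remove" list, the same idea can be sketched in Python. This is only a rough illustration: the tag list and the function name `strip_empty_tags` are made up for the example, it only touches strictly empty pairs with no attributes, and it loops so that nested empties like <b><i></i></b> collapse too.

```python
import re

# Hypothetical whitelist: inline formatting tags assumed safe to drop when empty.
# Anything not listed here (e.g. <br></br>) is left alone.
SAFE_EMPTY_TAGS = ("big", "b", "i", "em", "strong", "small", "u")

# Matches an opening tag with no attributes immediately followed by its closing tag.
_EMPTY_PAIR = re.compile(r"<(%s)></\1>" % "|".join(SAFE_EMPTY_TAGS))


def strip_empty_tags(html):
    """Delete empty whitelisted tag pairs, repeating until nothing changes."""
    while True:
        cleaned = _EMPTY_PAIR.sub("", html)
        if cleaned == html:
            return cleaned
        html = cleaned


if __name__ == "__main__":
    print(strip_empty_tags("X<big></big>Y"))
```

Like the sed one-liner, this is plain text substitution and knows nothing about HTML structure, so it is only as safe as the list you give it.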


–Mark

Try using regular expressions to parse HTML.

It’s a universal principle across the modern software *business* that fast and crappy stuff auto-generated by WYSIWYG editors at low cost is better = cheaper than tight code crafted by hand. Machine cycles are pennies the billion. Anal retentive programmers are multiple dollars per line of delivered code. Always waste machine time to save people time.

Pisses me right off too. But it *is* the way the business works.

If you can incorporate something like HTML Tidy into your build automation it *might* save you people time somewhere in the lifecycle. Until it removes something mistakenly. Like a tidbit of HTML that’s really quoted text in a JavaScript function. Oops. Or some empty tag that’s necessary to overcome a quirk in some obscure browser version you’ve never heard of but which took the dev who fixed it a hundred hours of testing to find.
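To make the quoted-text case concrete, here is a small Python sketch. The sample page and the helper `strip_outside_scripts` are invented for illustration. It shows a blind global replace changing a JavaScript string, and one crude way to skip script blocks. The script detection is itself just a regex, so treat it as a demonstration of the problem, not a fix you can rely on.

```python
import re

# Made-up sample page: the same empty pair appears in the markup and in a JS string.
page = '''<p>X<big></big>Y</p>
<script>
var template = "<big></big>";  // filled in later by script
</script>'''

# Blind global substitution, same spirit as a sed one-liner.
naive = page.replace("<big></big>", "")

# Rough pattern for a whole <script>...</script> block (an approximation, not a parser).
SCRIPT_BLOCK = re.compile(r"(<script\b.*?</script>)", re.IGNORECASE | re.DOTALL)


def strip_outside_scripts(html, pair="<big></big>"):
    """Remove `pair` everywhere except inside script blocks."""
    parts = SCRIPT_BLOCK.split(html)
    # With one capture group, odd indexes are the script blocks themselves.
    return "".join(p if i % 2 else p.replace(pair, "") for i, p in enumerate(parts))


if __name__ == "__main__":
    print(naive)                        # the JS string has been emptied too
    print(strip_outside_scripts(page))  # the JS string is left intact
```

Even this misses plenty (inline event handlers, comments, templates loaded from elsewhere), which is the point: the tool cannot know what the rest of the code expects.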

Are you SURE those tags aren’t being used at some point? Maybe they’re useless; maybe they’re being referenced and/or generated by some JavaScript. I wouldn’t muck with it unless you are super duper sure they really are useless. I certainly wouldn’t run HTML through some automated process that removes stuff, unless (once again) you are absolutely sure no code is expecting them to be there.