Looking for a simple text file utility

Can someone please point me to a utility that takes extra spaces out of a text file or, for my purposes, an HTML file?

Here is my situation: I maintain a web site to which I add several new HTML files each week (survey responses). Each of these files is generated in Excel. I prefer to build the HTML code myself, since these files are pretty simple and contain mostly table tags. If I export these tables to HTML directly from Excel, the files are quite large and contain a lot of overhead that I don’t need. To get around this, I have created an Excel "template" file with HTML tags placed in certain cells. I then merge source data from other files into this template. Next I save the worksheet as a .prn file (plain text with space delimiters). The resulting file works as an HTML file when renamed, but it contains a lot of extra spaces that are generated by having the original data laid out in columns.

To eliminate these extra spaces I edit this file and find/replace all occurrences of two consecutive spaces with nothing, and I replace all occurrences of " <" with "<" and "> " with ">" to eliminate the extra spaces that precede and follow the HTML tags. This system has worked for me, and I have done this enough times to have gotten pretty fast with the repetitive keystrokes that this task entails, but it does get tedious and time-consuming.

I would like a simple text utility that I could run from the command line and use in a batch file to perform these tasks automatically. I have searched download.com and Tucows extensively for such a utility, and I have even used Google, but I haven’t had any success in locating anything like this.

http://wheel.compose.cs.cmu.edu:8001/cgi-bin/browse/objweb

By no means would anyone call it simple, but perl will do everything that you need and more. It’s not a text editor, though. I’m guessing that you have a Windows machine, so go to http://www.activeperl.com

The Mac has BBEdit, and even the free “Lite” version of it is reason enough to switch to Macintosh, in and of itself.

It sounds like you need to use a macro. That’s what I would do anyway.

The text editor I use is called Textpad; it is shareware. It has macro capability.

In a pinch, you may even be able to do it in Word.

Perl is as simple or as complex as you need it to be. For example:


print "Hello, world!\n";

is good Perl. Not very useful, but the point is that Perl makes simple things easy and hard things possible. Printing a line to a screen and exiting is simple. Massaging a text file is also simple, at least with Perl.

If you have to do much at all with text files, especially if you want to automate common tasks with a single language that scales from the trivial to the monumental, learn Perl. It will make your life a lot easier, especially if you have to migrate from one platform to another (Perl is a high-level language and is very portable).

Good Perl books:
[ul]
[li]Learning Perl - An introduction that takes you from code like my example to fairly advanced programs. Does not cover object-oriented code, even though Perl supports it. (You can do most of what you want in Perl without OO, IMHO.)[/li]
[li]Programming Perl - The Camel Book. The Perl bible. More than you ever wanted to know, from the Perl interpreter’s innards to the nitty-gritty of how to write secrure object-oriented Perl programs. Every instruction explained individually, with copious amounts of example code.[/li]
[li]The Perl Cookbook - Example code by the buttload. Organized into topic areas with a ‘problem-solution-discussion’ layout that presents a problem, gives a solution, and discusses why the solution works and, sometimes, better ways to solve or prevent the problem.[/li]
[/ul]
All of the above books are published by O’Reilly.


while (<>) { s/secrure/secure/g; print; }

A simple program to replace all instances of ‘secrure’ with ‘secure’ in a given text file read in through standard input. Yep, it’s Perl.

Yep, I made a typo in my first post to this thread.

dwc, send me a copy of a file both before and after you edit it, so I can see exactly which spaces you want left when it is done. I could probably write this for you in Java in a few minutes.

My profile has my email in it.

This would just be a command line version used like:

java SpaceRemover file.txt or
java SpaceRemover *.txt

Here is what would make more sense: if you can get the first file in comma-delimited form or something similar, it wouldn’t be too hard to write something in Java that takes an HTML template, merges in the data and outputs the file correctly. Skip Excel entirely.
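A rough sketch of that template-merge idea, written in Python rather than Java. It assumes the survey data can be saved from Excel as a comma-delimited file whose first row holds the column headings. The names `csv_to_table` and `merge`, the `{table}` placeholder and the sample data are all invented for illustration; none of them come from the original poster's files.

```python
import csv
import html
import io


def csv_to_table(csv_text):
    """Turn comma-delimited text into an HTML table with no padding spaces."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    # First row becomes the heading row; html.escape guards against & and <.
    lines.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def merge(template, csv_text):
    """Fill the hypothetical {table} placeholder in an HTML template."""
    return template.replace("{table}", csv_to_table(csv_text))


if __name__ == "__main__":
    template = "<html><body>\n{table}\n</body></html>"
    data = "Question,Yes,No\nLike the site?,12,3\n"
    print(merge(template, data))
```

Because the tags are written directly around each value, there are no column-alignment spaces to clean up afterwards.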

Well, as long as we’re slinging code around, the OP’s problem could be solved in one swell foop by the following perl one-liner:



perl -i.bak -pe 's/  //g; s/ (<|>)/$1/g' *.htm 


Type that in at the command line and it’ll process all of the .htm files in the current directory. It will even make backups of the original files (with .bak added to each name).

[There seems to be an extraneous space inserted after the ‘>’. My code is correct as I wrote it, though.]
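For comparison, here are the same three find/replace passes from the original question, sketched in Python rather than Perl. This is only a sketch under assumptions: the names `strip_extra_spaces` and `clean_file` are made up for illustration, the `.bak` backup imitates the `-i.bak` switch, and the files are assumed to be plain single-byte (Latin-1) text. Unlike the one-liner as it is displayed above, it also removes the space that follows a '>', which the original question asked for.

```python
import glob
import shutil
import sys


def strip_extra_spaces(text):
    """Apply the three find/replace passes described in the original question."""
    text = text.replace("  ", "")   # drop every pair of consecutive spaces
    text = text.replace(" <", "<")  # drop a single space left before a tag
    text = text.replace("> ", ">")  # drop a single space left after a tag
    return text


def clean_file(path):
    """Rewrite one file in place, keeping the original as path + '.bak'."""
    shutil.copyfile(path, path + ".bak")
    # newline="" leaves the file's existing line endings untouched.
    with open(path, encoding="latin-1", newline="") as f:
        cleaned = strip_extra_spaces(f.read())
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(cleaned)


if __name__ == "__main__":
    # The Windows command prompt does not expand wildcards, so expand them here.
    for pattern in sys.argv[1:]:
        for path in glob.glob(pattern):
            clean_file(path)
```

Saved under a hypothetical name such as spaceremover.py, it could be called from a batch file as `python spaceremover.py *.htm`. One caveat, which applies to the manual procedure as well: deleting every pair of spaces also glues together any visible text that happens to contain a double space, for example after a period.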

Good one!

Um…sorry. Carry on now.

A macro in Word could be an easy way to do it - you can cut and paste back and forth from Word if you are using a plain text editor.

I just made one using the Macro record feature and it seems to work:

Sub Macro2()

' Macro2 Macro
' Macro recorded 3/21/2002 by istara

    ' Pass 1: delete every pair of consecutive spaces
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "  "
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Pass 2: remove the space left in front of a tag
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = " <"
        .Replacement.Text = "<"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    ' Pass 3: remove the space left after a tag
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "> "
        .Replacement.Text = ">"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Your best bet is to get someone to write a utility for you. I could create a standalone executable in C for you; I’d just need a before and after file to play with. It should only take about 10 minutes to make. Then you can put it in a batch file or run it manually to convert the files.

I love Perl, but I wouldn’t recommend it for this person. A very good text editor is all that’s needed. The program NoteTab ( http://www.notetab.com ) is my favorite text editor. It lets you do global search and replaces, even with regular expressions.

I use NoteTab for all my HTML editing. One especially nice feature is the “clipbook.” Using the HTML clipbook (by indenting the HTML tab at the bottom), I can highlight a word or paragraph, hit the Esc key and a little drop-down box appears where I’m typing. I start typing “italics” until I see that “Italics” appears in the box, then hit Enter. The text I highlighted is suddenly surrounded by <I> and </I>. All the HTML tags work this way. When you insert an image, the clipbook prompts you for the file name, and if it exists, the clipbook automatically looks at the file and puts HEIGHT and WIDTH attributes in the tag to correspond to that image.

For multiple files, it has tabs across the top that let you select which one you’re working with, similar to Opera’s tabbed interface. I think I paid $10 for the upgraded version of NoteTab, and it’s the best software investment I’ve made.

Thanks to everyone who has responded to this question. I will examine the options that I have been given and determine which one best suits my needs and works the most efficiently. I don’t have the time to learn Perl, but I know that being familiar with it would be useful for this and other purposes.