Looking for AWK experts...

I’ve banged my head on this long enough.

I have a text file in the format:

col1 col2 col3 col4 col5 Then there’s a big ol’ string at the end

and I want to get it in the form:

col1,col2,col3,col4,col5,“Then there’s a big ol’ string at the end”

Each of the first columns can be variable length, as can the string at the end. Fine, I think. AWK knows how to parse columns. Which it does, just fine. I can say awk '{ print $1 "," $2 "," $3 "," $4 "," $5 "," }' and it does just what I need.

However, that last string at the end is giving me somewhat of a problem. As it’s variable length, I can’t just dump the fields. Plus, I’d like to preserve any spaces between the words. Anyone have any suggestions?

Seems more suited to sed. You could do it with one edit command and \( \) marked fields, but, if I understand what you want, it seems simpler just to replace the space separators five times, using the last one to also insert the double quote, then inserting the quote on the end:

sed -e “s/^ *//” -e “s/ */,/” -e “s/ */,/” -e “s/ */,/” -e “s/ */,/” -e “s/ */,”/" -e “s/$/”/"

I also stuck in a command to remove any leading spaces. Note the TWO spaces in the rest of the searches - otherwise you match 0 spaces. Some regular expression libraries support “+” to mean “1 or more occurrences”.
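To see why the doubled space matters, here is a small sketch. The sample string and the function names are mine, purely for illustration. A single space followed by "*" can match zero spaces, so it matches the empty string at the very start of the line. The -E form assumes a sed that accepts extended regular expressions (GNU and BSD sed both do, as far as I know).

```shell
# " *" matches zero or more spaces, so it matches the empty string at
# the start of the line and the comma lands in front of everything.
one_space() { printf '%s\n' "$1" | sed -e 's/ */,/'; }

# "  *" (space, space, star) needs at least one space.
two_spaces() { printf '%s\n' "$1" | sed -e 's/  */,/'; }

# Same idea with "+", assuming your sed accepts -E.
plus_form() { printf '%s\n' "$1" | sed -E -e 's/ +/,/'; }

one_space "a b"    # prints ,a b
two_spaces "a b"   # prints a,b
plus_form "a b"    # prints a,b
```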

It’s been a while since I’ve had to use AWK, but I think the following should work:

{
    bigstring = ""
    for (i = 6; i <= NF; i++)
        bigstring = bigstring (i > 6 ? " " : "") $i
    print $1 "," $2 "," $3 "," $4 "," $5 ",\"" bigstring "\""
}

Hope this helps

Darn. I forgot to escape it to preserve the spaces. Let’s try that again:


sed -e "s/^ *//" -e "s/  */,/" -e "s/  */,/" -e "s/  */,/" -e "s/  */,/" -e "s/  */,\"/" -e "s/$/\"/"
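For reference, a sketch of how that command behaves on the sample line from the question. The function name to_csv is just a label I picked. The sed expressions are the same ones, written with single quotes so the inner double quotes need no backslashes, which assumes a Unix-style shell.

```shell
# Strip leading spaces, turn the first four separator runs into commas,
# turn the fifth into ," and append a closing quote.
to_csv() {
    sed -e 's/^ *//' \
        -e 's/  */,/' -e 's/  */,/' -e 's/  */,/' -e 's/  */,/' \
        -e 's/  */,"/' -e 's/$/"/'
}

printf '%s\n' "col1 col2 col3 col4 col5 Then there's a big ol' string at the end" | to_csv
# prints col1,col2,col3,col4,col5,"Then there's a big ol' string at the end"
```

Note that the fifth substitution swallows the whole run of spaces before the trailing text, so any indentation there is lost.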

Smartass answer: Use perl.

Somewhat more useful answer:
You can manipulate fields by assigning them to be something, for example
{$1 = ""; print} prints the entire line except the first field. However, this doesn’t get rid of your spaces.
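A quick sketch of what that assignment does, with a made-up input string and a made-up function name. When you assign to a field, awk rebuilds the line from its fields joined by OFS (a single space by default), so the blanked field leaves one separator behind and any runs of spaces elsewhere get collapsed.

```shell
# Assigning to any field makes awk rebuild $0 from the fields,
# joined by OFS (a single space by default).
blank_first() { printf '%s\n' "$1" | awk '{ $1 = ""; print }'; }

blank_first "a   b    c"   # prints " b c" (one leading space, runs collapsed)
```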

So, you want to use substr() or something. Here is an awk command that worked on your test input:

Input file:


col1 col2 col3 col4 col5 excess text goes here
column1 column2 column3 column4 column5 more excess text

Command & output:


D:\temp> awk '{printf("%s,%s,%s,%s,%s,",$1,$2,$3,$4,$5);$1="";$2="";$3="";$4="";$5="";$foo = substr($0, 6);printf("\"%s\"\n",$foo)}' in.txt
col1,col2,col3,col4,col5,"excess text goes here"
column1,column2,column3,column4,column5,"more excess text"

The ‘6’ in the substr was a surprise to me - I expected ‘5’, but this worked for me.
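My best guess at why 6 is right, as a sketch (the function name and the square brackets in the output are only there for illustration): blanking five fields makes awk rebuild the line as five empty fields plus the rest, joined by single spaces, so the line now starts with exactly five spaces. substr() counts from 1, so the text starts at position 6. Also, foo is never set in the command above, so $foo is really $0 and the assignment overwrites the whole line; a plain variable would do the same job.

```shell
# After blanking fields 1-5, $0 is five empty fields plus the rest, joined
# by OFS, so it starts with exactly five spaces. substr() counts from 1,
# so the remaining text begins at position 6.
tail_after_five() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= 5; i++) $i = ""
        printf("[%s]\n", substr($0, 6))
    }'
}

tail_after_five "col1 col2 col3 col4 col5 excess text goes here"
# prints [excess text goes here]
```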
That being said, I’d still rather use perl.
Perl code:


while (<>) {
    @fields = split(/\s/);
    for ($i = 0; $i <= 4; $i++) {
	print (shift(@fields) . ",");
    }
    print '"' . join(" ", @fields) . '"' . "\n";
}

That could pretty easily be made into a one-liner, but I’m too lazy and emacs is too easy to use.

Good luck!

I love you guys!

yabob and douglips, both your answers work - almost. The only remaining issue is that some of that excess text at the end has leading spaces that I need to preserve, and both your scripts take it off.


col1 col2 col3 col4 col5 Some excess text
col1 col2 col3 col4 col5      Some indented excess text
col1 col2 col3 col4 col5          Even more indented excess text

I’ll play with your scripts and see if I can figure it out. Otherwise, if you figure it out first, couldya post it here?

Once again, thanks - this is great!

Doug confused.

Wow, that’s weird. Anyway, I’m not going to continue wading through the foggy mists of my awk memories, here is a tightened up version of my perl code that I think does what you want.

Input file:


col1 col2 col3 col4 col5   excess text goes here
column1 column2 column3 column4 column5      more excess text
column1 column2 column3 column4 column5            more indented excess text

Perl command:


perl -e "while(<>){$i=0;chomp;@a=split(/ /);foreach$f(@a){print$f.($i<$#a?($i<5?',':' '):'').($i==4?'\"':'');$i++;}print'\"'.\"\n\";}" in.txt

Output:


col1,col2,col3,col4,col5,"  excess text goes here"
column1,column2,column3,column4,column5,"     more excess text"
column1,column2,column3,column4,column5,"           more indented excess text"

NOTE: You may not need all the escapes (\) I use, I’m on Windows which has a braindead shell. You could just write a perl script based on the one I’ve posted above (some changes need to be made for leading space preservation), or you could use some kind of sed thing.

Good luck!

Thanks, douglips. I know it’s weird, but I gotta do it.

Now I just gotta go locate a #!@! copy of perl…!

For the sed variant, if you want to preserve the spaces in the trailing text, just take the second space and the “*” off the search string in the next-to-last substitution (so you just substitute one space with ," rather than any number of spaces).
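Spelled out, that variant might look like the sketch below. The function name is mine, and I'm using single quotes, which assumes a Unix-style shell. Only the next-to-last expression differs from the earlier command: it matches exactly one space, so everything after the first separator space survives.

```shell
# Same as the earlier command, except the fifth separator substitution
# replaces exactly one space, leaving the rest of the indentation alone.
to_csv_keep_spaces() {
    sed -e 's/^ *//' \
        -e 's/  */,/' -e 's/  */,/' -e 's/  */,/' -e 's/  */,/' \
        -e 's/ /,"/' -e 's/$/"/'
}

printf '%s\n' 'col1 col2 col3 col4 col5 Some excess text' | to_csv_keep_spaces
# prints col1,col2,col3,col4,col5,"Some excess text"
```

With more than one space after col5, one space is consumed as the separator and the remainder ends up inside the quotes.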

I never used awk much. My feeling has always been that when the task reaches the level of complexity that I need real programming logic rather than just mucking around with utilities like sed, cut and grep, I’m better off simply writing a program in a decent language. My personal oddity is that I’ve done a lot of little file-syntax processing tools using lex - it’s intended for building lexical analyzers, but I’ve always found it to be a very convenient way to express a large class of nasty little file massagers.

Just for kicks, here’s a one-statement perl version:


perl -pe 's/^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+) ?(.*)/$1,$2,$3,$4,$5,"$6"/'