Linux Bash Script Question

Hi linux scripting gurus. I have a very simple script I need to make, but I just can’t figure out how to do it. Here’s the deal:

I need to know how I can make a script to take a file that looks something like this:

12345678
91234567
56789012
34751021

and turn it into a file that looks like this:

1 12345678 12345678 03
2 91234567 91234567 03
3 56789012 56789012 03
4 34751021 34751021 03

With a couple of stipulations. I need to be able to start the numbering at any arbitrary value, so it would be a variable I would pass the script when running it. I know how to handle variables, and so I know if my variable was $1, the numbering would just be something like

$1
$1+1
$1+2
$1+3

The other stipulation is that I don’t know how many lines are necessarily going to be in each file. So I need to get the line count first, probably store it as a variable, and do a for loop. How do I get just a line count for a file to store as a variable? I’m also hazy on the syntax of for loops in bash scripts.

And perhaps most embarrassingly, I’m not sure of how to actually copy the text of each line and then append it so that I have two copies of the number like I need on each line.
And lastly, the 03 was just an example, but this would actually be a variable entered when running the script, so I need to tack on that at the end as well.

So, basically, here’s my pseudocode for what I think I need to do.

$1 = line_number
$2 = batch_number
linecount{filename} = line_count

for i=0; i<=line_count; i++
concatenate {line_number+i," ",line i of filename, " ", line i of filename, " ", batch_number} >> newfile
I hope that makes sense. If you need clarification on anything please let me know.

I would just use an awk filter:

awk 'BEGINFILE { kk = 1 } { print kk, $0, $0, "03"; kk += 1 }'

Here I’ve just hardcoded the variables (1 and 03).

There are other ways to do this if you don’t have awk for some bizarre reason, but you almost certainly will. Be sure you note the difference between comma and semicolon here.

It’s not helpful here because of the additional output required, but I would note that cat -n will prepend a line number to every line.

Thanks. This is almost doing what I want… however, it’s not printing the line number on the first line (it starts printing kk on the 2nd line). Also, it doesn’t seem to matter what I put in for kk=… it always seems to start at whatever number I put in for kk += #. I think the kk+=# is overriding whatever kk is getting set to at the beginning for some reason, because when I set it to kk+=2, it starts at 2 and increases by 2.

Oops. I think BEGINFILE is a “gawkism.”

Try BEGIN instead.
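
For reference, here is a minimal sketch of the same filter with BEGIN in place of BEGINFILE, still with the start number (1) and the batch (03) hardcoded. The function name number_lines is only illustrative, and the sample input is piped in with printf so the example is self-contained; normally you would name your file as the last argument.

```shell
# Number each input line, print the line twice, and append a fixed batch string.
# BEGIN runs once before any input is read, so kk is already set for line one.
# number_lines is a hypothetical name, not something from the thread.
number_lines() {
    awk 'BEGIN { kk = 1 } { print kk, $0, $0, "03"; kk += 1 }' "$@"
}

printf '%s\n' 12345678 91234567 56789012 34751021 | number_lines
```

In an awk that does not know BEGINFILE, that word is just an unset variable used as a pattern, so its block never runs and kk starts out empty. That would match the symptom described above.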

Ok, BEGIN works. When I try out that awk command in the terminal, with hardcoded numbers, it works beautifully.

Now, the trouble I’m having is trying to pass my variables into the awk command within my script. It doesn’t seem to want to use them. For example:

line_number=$2
batch_number=$3

awk 'BEGIN{kk=$line_number}{print kk,$0,$0,$batch_number;kk+=1}' filename

That’s what I tried coding in my script, but the awk command isn’t seeing those variables at all so it’s not printing things correctly. How do I accurately pass the variables from my script into the awk commands/parameters?

Never mind. I figured out how to get the variables. It’s all working now except one minor point.

I need to be able to enter the batch numbers with a leading 0 sometimes, but when I pass the script a “03” it converts it to a 3 automatically. How do I make sure it accepts that as a string with the leading 0 in my script? I can’t just prepend a 0 no matter what is entered, because if the batch number is 12 for example, I don’t want a leading 0.

Try:

awk "BEGIN { kk=$2; batch_number=$3 } { print kk, \$0, \$0, batch_number; kk += 1 }"

Don’t forget the backslashes!
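
An alternative that avoids the backslashes, offered as a sketch rather than the one right way: keep the awk program in single quotes and hand the shell values over with awk's -v option. The names renumber, start and batch are made up for this example. A value passed with -v arrives as text, so a batch of 03 should print as 03 as long as no arithmetic is done on it.

```shell
# Hypothetical usage: renumber START BATCH [FILE...]
renumber() {
    local start=$1 batch=$2
    shift 2
    # -v assigns awk variables before the program runs. The single quotes
    # keep the shell from expanding $0, so no backslashes are needed.
    awk -v start="$start" -v batch="$batch" \
        'BEGIN { kk = start + 0 } { print kk, $0, $0, batch; kk += 1 }' "$@"
}

printf '%s\n' 12345678 91234567 | renumber 5 03
```

The start + 0 forces the counter to be a number, while batch is left untouched so it is printed exactly as it was typed.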

Thanks Derleth, I actually got it working a slightly different way… now I’m just trying to figure out how to make it keep and print a 03 if that’s what I enter, instead of converting it to a 3.

I don’t remember the exact syntax offhand, but you’ll want to use sprintf instead of print – which will allow you to specify a printing format. You should use the “%s” format for your printouts of $0.

It’s midnight here, so I’m going to sleep :) If nobody has come up with the exact syntax by morning I’ll check the sprintf syntax when I get into the office and have access to a Unix machine. You can probably figure it out using google, though.

Yeah I can’t seem to make heads or tails out of the sprintf command syntax :\

So close yet so far away!

Ok I figured out a workaround. I’m all done with my script now guys! Thank you all for the help.

For this sort of task, a one-line awk script would have been my first idea too.

But it could also be done entirely with bash commands without using awk. I’ll leave it to OP to look up the details of how these commands work and their syntax. But to help OP further his already-estimable bash skills, here are some useful points:

  1. There is a read built-in command that reads one line from standard input. It can put the entire line into one shell variable, or it can put separate words into several shell variables.

  2. This read command, IIRC, also sets the last-command exit status as usual: 0 (for success), 1 (for failure, like when it hits end-of-file).

  3. Because read returns an exit status, it can be used to control a while loop. Thus, you can build a loop that reads successive lines from standard input, doing something with each line, and the loop will quit at end-of-file.

  4. You can group several consecutive shell commands in parentheses, which causes that group of commands to run kinda-sorta like it was a separate shell script. This group of commands, as a whole, can have its own standard input and standard output, separate from that of the rest of the script. This enables you to read a file (other than the overall script’s standard input) line by line, and write output to a different file (other than the overall script’s standard output).

  5. bash also has several kinds of syntax for doing arithmetic with shell variables that have numeric values. That enables you to do the arithmetic you need to increment the line counter variable.

  6. I’m not sure how you would force that other variable to be written as a two-digit number (with leading zero) even when its value is less than 10. But if you enter it as a parameter on the original command line where you start the script, and you enter it there with a leading zero, I think it should be treated as a string value rather than a numeric value. And since you are never doing any arithmetic with that number, I think the leading zero should never be lost, as long as you enter it with a leading zero in the first place.
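
The points above can be put together roughly like this. It is a sketch under a few assumptions: the function name renumber_bash is invented, the start number is a plain decimal without leading zeros (the shell would read 08 as bad octal), and a function with redirected input stands in for the parenthesized group from point 4.

```shell
# Hypothetical usage: renumber_bash START BATCH < infile > outfile
renumber_bash() {
    local n=$1 batch=$2 line
    # read returns non-zero at end-of-file, which ends the loop (points 2 and 3).
    # IFS= and -r keep each line exactly as it appears in the input.
    while IFS= read -r line; do
        # %s prints batch as a string, so a leading zero survives (point 6).
        printf '%s %s %s %s\n' "$n" "$line" "$line" "$batch"
        n=$((n + 1))    # shell arithmetic (point 5)
    done
}

# The function reads whatever is connected to its standard input.
printf '%s\n' 12345678 91234567 | renumber_bash 5 03
```

Note that a final line with no trailing newline would be skipped by this loop, because read reports failure for it.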

These are all just given as suggestions for useful things for drewtwo99 to learn about, as they allow all sorts of useful stuff to be done in shell scripts. For the immediate problem, as shown in all the posts above, that one-line awk script is the simplest.

(I’ve been away from this kind of work for several years and thus have forgotten most of the nitty-gritty command syntax details, so I’ll leave that as an exercise for the OP.)

Thanks Senegoid. The print command does in fact only print out “4” even if the variable was stored as “04”. There is evidently a way to pad a number with 0’s using sprintf but I couldn’t figure out the syntax. Instead I just used an if-then statement to check how big the batch number was, and to print an extra 0 in front of it in the awk command if it was less than 10. It’s working nicely now that way :)
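
For anyone who finds this later, the zero padding can be sketched with awk's printf statement instead of an if test (sprintf is the function form, which returns the formatted string rather than printing it). The format %02d pads a number to at least two digits. This assumes the batch number is always a non-negative integer, and pad_batch is just an illustrative name.

```shell
# Hypothetical usage: pad_batch START BATCH < infile
pad_batch() {
    # %d prints the counter, %s prints each line untouched,
    # and %02d pads the batch number with a leading zero when it is below 10.
    awk -v kk="$1" -v batch="$2" \
        '{ printf "%d %s %s %02d\n", kk, $0, $0, batch; kk += 1 }'
}

printf '%s\n' 12345678 91234567 | pad_batch 1 3
```

With a batch of 12 the same format prints 12, so there is no need to test for values below 10.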

Perl one-liner:


X=5 perl -lne 'printf "%02d $_ $_ 03\n", $ENV{X}++' filename.txt


You can change the X= value to start the numbering at whatever number you choose. This will print the numbers with a minimum of 2 digits: 01, 02, 03, etc. If you would like 3, change the “%02d” to “%03d”.

Why did you want to get an initial line count for the file?

For drewtwo99 or anyone else who is working with programs or shell scripts, I have a trivially simple program I wrote that has been immensely helpful in working out arcane details about how command-line arguments are parsed and passed to the program.

Like the echo command, it simply prints all the arguments it got, but in a somewhat more informative format: it prints each argument on a separate line. The lines are numbered. And the argument itself is enclosed in <…> characters, which helps you to see if there are any leading or trailing blank spaces in the argument.

I call the program parg. Its source file, parg.c, is trivially simple. I recommend that every programmer alive should write such a program. Or I could send my source to anyone who wants it.

Example usage and output:



$ parg What have we here
parg: Argument count: 5
   0: <parg>
   1: <What>
   2: <have>
   3: <we>
   4: <here>


You can play around, typing commands with arguments having all kinds of combinations of shell variables, single quotes, double quotes, back-quotes, wild-cards, and any other shell syntax you want to investigate. The output will show you exactly what arguments the program actually sees.
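
For quick experiments without compiling anything, a rough stand-in can be written as a shell function. This is only a sketch, not parg itself: show_args is an invented name, and a function has no argument 0, so the numbering starts at 1.

```shell
# show_args: print the argument count, then each argument on its own
# numbered line, wrapped in <...> so stray spaces are visible.
show_args() {
    local i=1 arg
    printf 'Argument count: %d\n' "$#"
    for arg in "$@"; do
        printf '%4d: <%s>\n' "$i" "$arg"
        i=$((i + 1))
    done
}

greeting="What have"
# Unquoted, $greeting is split into two words; the quoted string stays whole.
show_args $greeting "we here"
```

Comparing show_args $greeting with show_args "$greeting" is a quick way to see what quoting a variable changes.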

It’s instructive to do this under Unix and Linux, and also in DOS command windows, and compare the results. In particular, DOS handles wild-cards on a command-line entirely differently than Unix and Linux shells do.

What the heck, I’ll just post the entire source code for parg.c here, since it’s so short. As I wrote above, I’ve found this simple program immensely helpful when I write shell scripts (both for Unix/Linux and DOS), in order to study exactly how command-line arguments are parsed and transmitted to a program. (This particular source code is dated June 5, 2010, but actually I’ve had some version of this around since I was a Unix sysadmin in the mid-1980’s.)



/*  parg.c          Print arguments (one per line, numbered)        */

/*  June 5, 2010          */

#include <stdio.h>

int main ( int argn, char* argc[] )
{
    int iarg ;

    printf("parg: Argument count: %d\n", argn) ;
    for ( iarg = 0 ;  iarg < argn ;  iarg++ )  {
        printf("%4d: <%s>\n", iarg, argc[iarg]) ;
    }
    printf("-----------------------------------------------------------\n") ;
}



More thoughts about printing numbers with leading zeros.

A character string that is a number with a leading zero will typically lose the leading zero if it gets converted from a string to a numeric value and then back to a string to be printed. So I guess that’s what’s happening with your number inside the awk script. This typically happens when you actually do some arithmetic with the number. You never did any arithmetic with that number, but I suppose awk must have done a string->number->string conversion for you anyway. You might have been able to prevent that by putting some kind of quotes around it.
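
The conversion can be seen directly in awk. In this small sketch the same characters are assigned once as a bare constant and once inside double quotes. When the shell pastes a script argument such as 03 straight into the awk program text, awk sees the first form. The variable names are only for illustration.

```shell
# Unquoted, 03 is a numeric constant in the awk program; quoted, it is a string.
as_number=$(awk 'BEGIN { b = 03;   print b }')
as_string=$(awk 'BEGIN { b = "03"; print b }')

echo "numeric constant prints as: $as_number"
echo "quoted string prints as:    $as_string"
```

The first line shows 3 and the second shows 03, which is the difference the quotes make.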

Here’s a general plan I’ve used to print numbers with leading zeros. You may be able to do something equivalent in just about any language.

Say you have a number that could be anything from 0 to 9999, and you want to print it with exactly four digits (including leading zeros as needed). Simply add 10000 to the number, convert that to a character string, and then print out the right-most four characters of that string. Most languages have enough functionality that you can do all these steps one way or another.
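
In the shell, that plan might look like the sketch below. It assumes the input is a plain decimal number from 0 to 9999 with no leading zeros of its own, and pad4 is a made-up name.

```shell
# pad4: print a number from 0 to 9999 with exactly four digits.
pad4() {
    local shifted=$(( $1 + 10000 ))   # e.g. 42 becomes 10042
    # The sum always has five characters, so dropping the first one
    # leaves the right-most four.
    printf '%s\n' "${shifted#?}"
}

pad4 42
pad4 7
pad4 1234
```

This prints 0042, 0007 and 1234. Where printf is available, a format such as %04d does the same job in one step.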

Senegoid: Add a ‘return 0;’ to that and it’s golden. You can also declare iarg in the for loop preamble, which has been part of the C standard since C99 and has been available for a lot longer as a nonstandard extension in compilers like gcc. With your variable names, that would look like ‘for (int iarg = 0; iarg < argn; iarg++)’. This helps document that the loop counter is limited to that loop and does not get reused, which is enforced by the scoping rules.