Help writing a linux script... again

drewtwo99 · May 19, 2012, 12:07am

Hello again, straight dope. You may remember me from last time, asking for help on writing a linux script, and now I humbly come to you for help once again. First let me describe what my task is, and then I’ll put up my ideas in pseudo-code for how I think it could be done.

So we have these things called disklists, and there are a lot of them, they are just simple text files. An example disklist would be “disklist215001702” where the 21500 is a variable called r_line, and the 1702 is a variable called r_stat. These disk lists advance by twos (they are all even). So the previous file would be disklist215001700, disklist215001698, etc.

Now, what we end up having to do with these disklists is to cat ranges of them into a single file. Example: All the disklists between & including 215001224 and 215001262 get cat’d together into a single file called disklist215001224-215001262. There is similarly a disklist215001264-215001302. So on and so on until the last file, disklist215001664-215001702 (12 total disklist-range files for every r_line).

The r_stat ranges/values are the same for every r_line, so here’s what I need the script to do. I need the user to be able to pass the parameter r_line to the script, and then the script goes through and cats together all the disklists in the current directory of that given r_line and creates the given ranged-disklist files.

I’m pretty sure I know the theory behind how to code this, but I’m not so great with the linux script syntax and the cat command and was hoping you could provide me with some pointers. Here is my idea.

r_line = user input

r_stat = 1224 [this is always the first r_stat value and will increase by 2 until reaching a maximum value of 1702]

r_stat_first = 1224 [this variable will be used as the first r_stat value in the output file name, starting at 1224 and advancing by 40 each time until 1664]

r_stat_last = 1262 [this variable will be used as the last r_stat value in the output file name, starting at 1262 and advancing by 40 each time until 1702]

r_count = 1 [This variable will count the number of r_stats added to a given disklist-range file, so that after 20 have been added, it will quit and start a new file]

t_count = 1 [This variable will count the total number of disklist files output, so when it gets to 12 total it will quit the loop. There are always 12 disklist-range files created]

for(t_count <= 12, t_count = t_count + 1)
    for(r_count <= 20, r_count = r_count + 1)
        cat {"disklist" + string(r_line) + string(r_stat)} > {"disklist" + string(r_line) + string(r_stat_first) + "-" + string(r_line) + string(r_stat_last)}
        r_stat = r_stat + 2
    r_stat_first = r_stat_first + 40
    r_stat_last = r_stat_last + 40

I’m worried that the cat stuff I have up there is going to be overwriting the same output file every time, and I’m not sure how to get around it. So if you could analyze that above piece of pseudo-code and let me know if you think it would work for what I need, that’d be great. And then if you could help me translate it into linux scripting language, it’d be doubly great! I’ve done my best to be as explicit as possible, but let me just reiterate in simplified form what I need done, using 21500 as an example r_line:

cat disklist215001224 disklist215001226 … disklist215001262 > disklist215001224-215001262
cat disklist215001264 disklist215001266 … disklist215001302 > disklist215001264-215001302
cat disklist215001304 … disklist215001342 > disklist215001304-215001342
cat disklist215001344 … disklist215001382 > disklist215001344-215001382
cat … > disklist215001384-215001422
cat … > disklist215001424-215001462
cat … > disklist215001464-215001502
cat … > disklist215001504-215001542
cat … > disklist215001544-215001582
cat … > disklist215001584-215001622
cat … > disklist215001624-215001662
cat … > disklist215001664-215001702

Jragon · May 19, 2012, 12:19am

By Linux script do you mean shell script? If so, what shell (bash, cshell/csh, korn shell/ksh, tcsh etc)? (Granted, it shouldn’t really matter for such a simple script).

Either way, to solve your overwriting problem instead of

cat x > file

Use

cat x >> file

“>” Means “output to”, which overwrites any current data in the file

“>>” means “append to”, which adds the output to the end of the file.
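A quick way to see the difference for yourself, as a minimal sketch assuming bash. The scratch directory and the file name demo.txt are made up for illustration:

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/demo.txt"   # hypothetical file name

echo "first"  > "$demo_file"     # creates the file
echo "second" > "$demo_file"     # overwrites: only "second" is left
after_overwrite=$(cat "$demo_file")

echo "third" >> "$demo_file"     # appends: "second" then "third"
after_append_lines=$(grep -c '' "$demo_file")   # count lines in the file

echo "after >:  $after_overwrite"
echo "after >>: $after_append_lines lines"
rm -rf "$demo_dir"
```

The second ">" throws away "first", while ">>" keeps what was already there and adds a line.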

There are a few notes on your syntax in general, for instance ${variable_name} is the usual syntax to access a variable, but that depends on whether “linux script” means “shell script” or something else I’m not familiar with.

Also, if I were you, I’d begin the file with a check to see if the desired output file already exists, and if it does, either removing it or otherwise handling it (i.e. making a backup by moving it to <filename>.old) in some way.

drewtwo99 · May 19, 2012, 12:26am

Let’s go with bash script. Think that’s what I need. Thanks for the tip on using the >> instead of >. If I use >> and the file doesn’t yet exist, will it create it like a > will?

Jragon · May 19, 2012, 12:40am

Yeah, “>>” will create a file if none exists. The reason I recommended handling an already existing file (using

if [ -e "<filename>" ]; then

[…]

fi )

is because if the file already exists, it will append it to the end of the pre-existing file, which could get messy or confusing.

So here’s some basic things:

r_line=$1

Will get you the first command line argument. You may want to check that it’s the correct format, but if you leave that to your user, whatever.

It’s been a while since I’ve done bash extensively, but I think

r_line = $1 will give you an error, you NEED to have no spaces (yes, it IS annoying and terrible).

So by extension

r_stat=1224

<etc>

for t_count in {1..12}

Is the syntax for “t_count from 1 to 12 inclusive”

cat "disklist${r_line}${r_stat}" >> "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}"

SHOULD be the syntax you want for the file output, though I haven’t tested it. ${variable} accesses the value of a variable ($variable works too, but bash will get confused in larger expressions sometimes).

For incrementing:

r_stat_first=$((r_stat_first + 40))
I haven’t tested any of the lines, but I’m pretty confident they should work. That should give you enough info to get started, if you post your finished script I can look at it and help give it a sanity check.

ETA: Also, for for loops

for […]; do […] done

In other words, you have to put “do” before the loop body and “done” after it to tell bash where your for loop starts and finishes.
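Putting those pieces together, here is a rough sketch of the whole loop, assuming bash. It takes r_line as the first argument (the 21500 default is only there so the sketch runs on its own), and it creates stand-in disklist files in a scratch directory, which you would drop when running against the real ones. The variable name out and the choice to simply delete an old range file are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: builds the 12 range files for one r_line.
r_line=${1:-21500}            # default is only so the sketch runs standalone

# Scratch directory with stand-in disklists (assumption: the real ones
# already sit in the current directory, so this setup would be removed).
cd "$(mktemp -d)" || exit 1
for ((s = 1224; s <= 1702; s += 2)); do
  echo "entry $s" > "disklist${r_line}${s}"
done

r_stat=1224
r_stat_first=1224
r_stat_last=1262

for t_count in {1..12}; do
  out="disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}"
  rm -f "$out"                # simplest handling: discard any old copy
  for r_count in {1..20}; do
    cat "disklist${r_line}${r_stat}" >> "$out"
    r_stat=$((r_stat + 2))
  done
  r_stat_first=$((r_stat_first + 40))
  r_stat_last=$((r_stat_last + 40))
done

# Count the range files that were produced.
ls disklist"${r_line}"????-* | wc -l
```

Because r_stat keeps counting across the outer loop, the inner loop never needs resetting: each pass just picks up the next 20 disklists.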

drewtwo99 · May 19, 2012, 1:17am

Jragon! You’re the bomb! It’s working perfectly. Thanks so so so so so so much! It took a few tries and I had a few kinks to work out but it’s working now.

So a few questions. I didn’t quite understand how to implement the file check to see if it exists before appending, or what to do if it already exists. Could someone more explicitly explain/describe how to check to see if file exists and what I should tell the script to do if it doesn’t?

I don’t quite understand the whole if [-e <filename>] syntax or theory. Thanks again for everything, you have no idea how much time and headache this will save!

drewtwo99 · May 19, 2012, 1:30am

Probably going to miss the edit window, but I have another question.

After all the disklist-range files have been created, we do the following command:

ls disklist21500???-* > 21500_replica_import.txt

Which creates a text file containing a list of all the disklist-range files. Is it possible to create that import in the script as well? I just tried it, and unfortunately it’s literally searching for a file with the ??? and the * in the name, and not using those as special wildcard characters as they should be. Is there a way to denote in a bash script that a character is a special character, and to not interpret it literally?

Finally, one last thing…

With that import.txt file, we go in and do the following in vi. Is there a way to automate this inside a script once the file has been created? (can scripts edit text files other than just appending them)?

delete the first 8 characters from each line.
copy/buffer the now first 9 characters from each line
add a space to the end of each line
paste the copied characters to the end of each line.

Can this be done in a bash script? We usually set up a macro and do it in vi, which isn’t terribly slow, but if we can automate it with a script that would be great.

Jragon · May 19, 2012, 1:35am

Essentially since bash scripts rely on bash commands with a little bit of added syntactic sugar to allow for if (while, for) statements, they added a few more bits of sugar to make it easier.

The idea is that there’s a bash command called “test” (for more information, look up its man page), but to bring the syntax in line with more traditional languages like C, there’s a second name for test, “[”, which lets you put the expression between brackets.

The if statement in bash takes its “then” branch if the command it runs exits with status zero (success, i.e. true) and skips it for any other status (false), which is the reverse of how C treats numbers. Test exits with “0” if the check given by its arguments and options succeeds, and “1” if it fails. “-e” is the option for “exists”.

The other thing is that a semi-colon can be used instead of a newline to separate statements

So


if test -e <filename> ; then

<code>

fi


if [ -e <filename> ]; then

<code>

fi


if [ -e <filename> ]
then

<code>

fi

and


if [ -e filename ] ; then <code> ; fi

Are all functionally equivalent
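If you want to see what the if is actually looking at, you can print the exit status that test and [ leave behind in $?. A small sketch, assuming bash; the scratch directory and the names present and missing are made up:

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)
touch "$scratch/present"          # hypothetical file that exists

test -e "$scratch/present"; status_test=$?      # 0: file exists
[ -e "$scratch/present" ];  status_bracket=$?   # 0: same check, bracket form
[ -e "$scratch/missing" ];  status_missing=$?   # 1: no such file

echo "test: $status_test  bracket: $status_bracket  missing: $status_missing"

if [ -e "$scratch/present" ]; then
  echo "took the then branch because the exit status was 0"
fi
rm -rf "$scratch"
```

An exit status of 0 means success, so the then branch runs exactly when the check passes.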

For checking if a file exists, I’d recommend


if [ -e "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}" ] ; then

 mv "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}" "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}.old"

fi

Which will back up the old file. I’d do this both before the first for loop, and again every time you increment r_stat_first and r_stat_last.

Alternatively, if you don’t care about backing it up and consider the old files garbage, you can use


if [ -e "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}" ] ; then

 rm "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}"

fi

Which will delete it.

One note: it’s VERY important you have a space between “[” and “-e” and a space between the end of your expression and “]” or it won’t work.

Note that the test syntax ( “[ -<option> <arguments> ]”) works for while loops too.
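As a sketch of both ideas together (assuming bash), here is the backup check wrapped in a small function and driven by a while loop that uses the same bracket syntax. The function name backup_if_exists is made up for this example, and the leftover file is faked in a scratch directory:

```shell
#!/usr/bin/env bash
# backup_if_exists is a hypothetical helper name, not a bash builtin.
backup_if_exists() {
  if [ -e "$1" ]; then
    mv "$1" "$1.old"
  fi
}

cd "$(mktemp -d)" || exit 1
r_line=21500
echo "stale data" > "disklist${r_line}1224-${r_line}1262"   # pretend leftover

r_stat_first=1224
r_stat_last=1262
# Same [ ... ] test syntax, this time as a while condition.
while [ "$r_stat_first" -le 1664 ]; do
  backup_if_exists "disklist${r_line}${r_stat_first}-${r_line}${r_stat_last}"
  r_stat_first=$((r_stat_first + 40))
  r_stat_last=$((r_stat_last + 40))
done

ls
```

Only the one leftover file gets renamed to .old; range files that don't exist yet are skipped quietly.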

Jragon · May 19, 2012, 1:50am

Try

ls "disklist21500???-*" > 21500_replica_import.txt

Though I’m not 100% certain that will work, bash can be finicky with wildcard characters and I never quite figured out all the rules.

As for the second half of your post, it’s definitely possible, I’d do this


for filename in `ls "disklist21500????-*"`
do
  echo ${filename:8} >> 21500_replica_import.txt
  echo >> 21500_replica_import.txt
  echo ${filename:8:9} >> 21500_replica_import.txt
done

If that’s too many newlines, delete the second echo line

Here’s the theory:

The grave (`) will use the output of a command as an argument. In this case, it will take each file listed by ls that matches the given pattern as an argument.

Bash has a bunch of pattern matching features which, while useful, are a bit odd.

Let’s say you have a variable called string which contains a string.

${string:<number>}

Will give you every element of the string starting at the integer number

${string:<number>:<length>}

Will give you length number of characters from string starting at the index number.

So

${filename:8} gives you every character after the 8th, and ${filename:8:9} gives you the first 9 characters after the 8th.
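Here is that substring syntax applied to one of the range-file names from this thread, as a minimal sketch assuming bash. The variable names rest and first9 are made up:

```shell
#!/usr/bin/env bash
filename="disklist215001224-215001262"   # example name from this thread

rest=${filename:8}       # skip the 8 characters of "disklist"
first9=${filename:8:9}   # skip the same 8, then keep only the next 9

echo "$rest"     # 215001224-215001262
echo "$first9"   # 215001224
```

Offsets count from zero, so an offset of 8 starts right after the 8 letters of "disklist".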

Jragon · May 19, 2012, 1:56am

Nevermind, quotes definitely won’t work, it should work fine with disklist21500???-*

I’m not sure what’s wrong there.

drewtwo99 · May 19, 2012, 2:48am

Jragon! You have struck gold once again. Thank you so much. It’s almost all working perfectly. I figured out the syntax needed in order to get those wildcards working…

for filename in `ls "disklist${r_line}"????"-"*` worked perfectly. I just had to isolate all the strings from the ??? and the * by putting them in quotes but leaving the ??? and * outside of any quotes.
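A small sketch of why that works, assuming bash. It loops over the pattern directly instead of going through ls, and the two empty files in a scratch directory are stand-ins for real range files:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
r_line=21500
touch "disklist${r_line}1224-${r_line}1262" "disklist${r_line}1264-${r_line}1302"

# Wildcards inside the quotes: the pattern is passed through literally,
# so the loop sees one "name" that is not a real file.
quoted_count=0
for filename in "disklist${r_line}????-*"; do
  if [ -e "$filename" ]; then
    quoted_count=$((quoted_count + 1))
  fi
done

# Wildcards outside the quotes: bash expands them to the matching names.
unquoted_count=0
for filename in "disklist${r_line}"????-*; do
  if [ -e "$filename" ]; then
    unquoted_count=$((unquoted_count + 1))
  fi
done

echo "quoted matched $quoted_count, unquoted matched $unquoted_count"
```

Quoting the variable part is still worthwhile, since it protects against spaces in the value while leaving the wildcards free to expand.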

Your test to see if the file already exists also works wonders, tested it out, got a bunch of .old files. So yay!!

The last part that doesn’t seem to be working quite correctly is the creation of the replica_import.txt file.

It creates it, almost as expected but instead of this:

215001224-215001262 215001224
215001264-215001302 215001264
…
…

I end up getting this:

215001224-215001262
215001224
215001264-215001302
215001264
…
…

I understand why this is happening, but I don’t know how to fix it. Is there a way to change it slightly so that instead of appending the echo ${filename:8:9} to a new line in the file, it adds a space to the end of the line and then appends it to the end of the line after the space?
Or could we add a bit of code that goes back and deletes the “new line” character on each line?

drewtwo99 · May 19, 2012, 2:54am

Never mind! I got it figured out. Thanks again good sir! I owe you a beer. If there’s any way I can repay you for your help please let me know. You’re the best!!!
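For anyone reading later: the fix isn't spelled out above. One way to get both pieces on the same line, as a sketch assuming bash, is a single echo per file instead of three. The stand-in files, the scratch directory, and the variable name import_file are only for illustration:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
r_line=21500
touch "disklist${r_line}1224-${r_line}1262" "disklist${r_line}1264-${r_line}1302"

import_file="${r_line}_replica_import.txt"
rm -f "$import_file"          # start fresh so reruns don't double up
for filename in "disklist${r_line}"????-*; do
  # the range, a space, then the first 9 characters of the range
  echo "${filename:8} ${filename:8:9}" >> "$import_file"
done

cat "$import_file"
```

Each echo ends with exactly one newline, so putting both expansions in one quoted string keeps them on one line with a space between.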