Need help writing another bash script

Hello everyone, I need help with another script.

So, I have a bunch of files called trace_checker_b00, trace_checker_b01, etc…

These files look something like this:



rec_id               comptype               seqno             id_check
11111111                  1               872131                0.00
11111111                  2               872131                0.00
11111111                  3               872131                0.00
11111111                  4               872131                0.00
22222222                  1               862315                0.00
22222222                  2               862315                0.00
22222222                  3               862315                0.00
22222222                  4               862315                0.00
33333333                  1               872131                0.00
33333333                  2               872131                0.00
33333333                  3               852999                0.00
33333333                  4               840001                0.00
....


and so on.

I’m trying to write a script that will go through and check two things.

  1. id_check is always equal to 0.00. If it’s not, I want the script to spit out the rec_id of any row where id_check is not 0.00.

  2. For each rec_id, the seqno value is the same across all of its comptype rows. In the example given above, rec_id 11111111 and 22222222 are fine because all four comptype rows share one seqno, but rec_id 33333333 has different seqno values across its comptypes. If the 4 comptypes do not all have the same seqno, I want the script to spit out that rec_id.

Now, one thing to note. The program that generates these text files does not tab delimit them. The fields are separated by spaces and there is no way to change this. I believe this makes a difference in how the fields can be read in.

I believe this could be handled simply with awk but I’m not very good with bash scripting period, let alone awk which always boggles my mind.

I know that basically I need to do a for loop that will loop through all files matching trace_checker*.

I know that with the awk command I have 4 fields, $1=rec_id, $2=comptype, $3=seqno, $4=id_check.

But I’m lost on exactly what to do with it all. Because predictably there are ALWAYS 4 comptypes, it makes it a little easier to do the checking, but I’m not sure of how to check that all the seqno values are the same for a given rec_id.

Any help is appreciated, and thank you in advance!

Does it HAVE to be a Bash script? This seems like the sort of thing Ruby or Perl were basically made for. Just match each line against a regexp, and keep a small map keyed by rec_id with a counter for each seqno.

Looks like I do have perl and ruby installed so those are options.

My language skills are a bit rusty, but the algorithm is simple enough.

Before the loop:
Create a map or hash table (depending on language, Ruby has Hash for this purpose). I’m going to use the syntax Hash[key]value to denote the type you need:

Hash[string]Hash[string]int

That is, a hash that maps a string to a hash that maps a string to an int.

Disregard the first line of each file (the header).
Loop:
For each remaining line in each file, parse it according to the regexp

{\d+}\s+{\d}\s+{\d+}\s+{\d+\.\d+}

The \. escapes the period so that it matches a literal dot. The braces mean to store what’s inside them in a variable (note: this isn’t necessarily the actual syntax, I’m using it as shorthand for “use whatever your language’s method for doing this is”). We’ll refer to the variables by your column headings, for clarity.
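If you want to try the capture idea straight from the shell, here is a minimal sketch using sed -E. POSIX extended regexps have no \d or \s, so it uses [0-9] and [[:space:]] instead. parse_line is a made-up helper name, not something from your setup.

```shell
# parse_line LINE: hypothetical helper that prints the four captured
# fields separated by single spaces, or nothing if the line does not match.
parse_line() {
  printf '%s\n' "$1" | sed -nE \
    's/^[[:space:]]*([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([0-9]+\.[0-9]+)[[:space:]]*$/\1 \2 \3 \4/p'
}

# A data line matches; the header line would print nothing.
parse_line '11111111                  1               872131                0.00'
# prints: 11111111 1 872131 0.00
```

The -n flag plus the trailing p means only matching lines are printed, so the header row is dropped without a special case.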

Now, if id_check != “0.00”, print the rec_id and that the id_check is wrong.

Now increment the value myMap[rec_id][seqno].
After the loop is finished, iterate through each rec_id in the table, and check the number of seqno keys in its inner map. If it’s > 1, then you have a seqno conflict, so print out the rec_id.

Pseudocode:



myMap = Hash[string]Hash[string]int
file = <read in file>

lines = file.Delimit('\n')  // Break the file into lines at each newline character

for line in lines[1:]  // Every line after the header (assuming 0-indexed)
  rec_id, comptype, seqno, id_check = line.Parse("{\d+}\s+{\d}\s+{\d+}\s+{\d+\.\d+}")

  if id_check != "0.00"
    print "Value of id_check for ", rec_id, " is not 0.00!"
  end

  myMap[rec_id][seqno]++  // Count how often each seqno appears for this rec_id
end

for rec_id in myMap.keySet()
  if myMap[rec_id].keySet().Length() != 1  // More than one distinct seqno
    print rec_id, " has inconsistent seqno values"
  end
end
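A runnable take on the pseudocode, as a rough sketch in awk wrapped in a shell function. awk splits on runs of spaces by default, so no regexp is needed here. check_traces is a hypothetical name, the id_check test is numeric rather than a string compare, and a rec_id is treated as the same record even if it shows up in more than one file; adjust if that assumption doesn't hold for your data.

```shell
# check_traces FILE...: hypothetical wrapper, pass it one or more trace files.
check_traces() {
  awk '
    FNR == 1 || NF < 4 { next }        # skip each header and any short or blank line

    # Check 1: id_check must be numerically zero.
    $4 + 0 != 0 { print $1 ": id_check is " $4 }

    # Check 2 bookkeeping: seen[] stands in for myMap[rec_id][seqno].
    {
      if (!(($1, $3) in seen)) {
        seen[$1, $3] = 1
        if (distinct[$1]++ == 0) order[++n] = $1   # remember first-seen order
      }
    }

    END {
      # Any rec_id with more than one distinct seqno is a conflict.
      for (i = 1; i <= n; i++)
        if (distinct[order[i]] > 1)
          print order[i] ": inconsistent seqno values"
    }
  ' "$@"
}

# Example call: check_traces trace_checker_b*
```

The id_check complaints come out as the lines are read, and the seqno conflicts are listed at the end in the order the rec_ids first appeared.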


In awk:


awk '$4 != 0.00 && $4 != "id_check" { print $1 }' trace_checker_b*

This only achieves your first goal.
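For the second goal, one possible companion is sketched below. It remembers the first seqno seen for each rec_id within a file and prints the rec_id once if a later row disagrees. seqno_mismatches is a name I made up, and keying on the file name assumes a rec_id should only be compared against rows in the same file.

```shell
# seqno_mismatches FILE...: hypothetical helper that prints each rec_id
# whose seqno differs from the first seqno seen for it in the same file.
seqno_mismatches() {
  awk '
    FNR == 1 || NF < 4 { next }          # skip headers and short or blank lines
    {
      key = FILENAME SUBSEP $1           # compare rows within one file only
      if (!(key in first)) {
        first[key] = $3                  # first seqno for this rec_id
      } else if ($3 != first[key] && !reported[key]++) {
        print $1                         # report each bad rec_id once
      }
    }
  ' "$@"
}

# Example call: seqno_mismatches trace_checker_b*
```

Because it reports as soon as a mismatch turns up, there is no END pass and the output follows the order of the input.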

nm

I got it working guys! Thanks for the tips :)

I ended up using perl, but went about it a slightly different (and possibly clumsier but easier for me to understand) way.

I’ve tested it pretty extensively, and it seems to be doing the job BEAUTIFULLY and INSTANTLY.