Hi guys, I need help from the Linux gurus, or any guru, really. I have a specialized task that I think could be accomplished with a script, but I’m not sure. OK, here it goes.
So here’s the situation. In a folder I’ve got a bunch of files that look like this…
filename_RECID1
filename_RECID2
filename_RECID3
etc.
The “filename” part is the same for every file. Only the RECID changes. These files are very short: just 2 lines of text. The first line is the header titles, and the 2nd line is the data.
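For example (made-up values, just to show the shape), one of these files might contain:
Date      Site     Reading
01/05     A12      42.7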
I need to combine all of these into a single file just called “filename”, so what I need is something to go through, read each file, capture the entire 2nd line of each file, and append it to a new file named “filename”.
This should be trivially easy to do, but first a question:
Do you need the files to be read and combined in any particular order?
You could do it with 1 (one) simple Linux command . . .
Oops… Just re-read your OP. You want only the second line of the file?
I was going to write:
cat filename* > new_file_name
but that gets both lines of all files. To get just the second line, and assuming that each file really does have EXACTLY two lines, no more and no less, try this:
tail -q -n 1 filename* > new_file_name
This reads the last line of each file and writes them all to one standard-output stream, which goes to new_file_name. The -q (quiet) option matters here: when tail is given more than one file, it normally prints a “==> filename_RECID1 <==” style header before each file’s output, and -q suppresses that. Be sure you choose a name for new_file_name that CANNOT be mistaken for one of the input files, or you may have trouble with this.
Both -q and the -n 1 spelling work with GNU coreutils tail (which essentially every Linux distribution ships) and with BSD tail. The old-style tail -1 still works in most implementations, but it is considered obsolete.
ETA: I asked above, does the order matter, and forgot to discuss that. By using the wildcard:
filename*
you will get the files in the order the shell’s expansion produces, which is sorted (essentially alphabetical) order. Note that alphabetical is not numeric: filename_RECID10 sorts before filename_RECID2. If you need to order them according to some other rule, then you will need some extra work to deal with that.
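One sketch of the numeric case, if your system has GNU ls: its -v flag sorts names containing numbers in numeric order, so RECID2 comes before RECID10. (This assumes the file names contain no spaces.)
tail -q -n 1 $(ls -v filename*) > new_file_name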
I see that even as I am typing this addendum, someone has come in with another way also.
My one-line method works provided the expanded wild-card list (that is, all your file names listed one after another with one blank space separating them) does not exceed the system’s maximum command-line length (ARG_MAX). On modern Linux that limit is typically around 2 MB, so it takes a very large number of files to hit it.
If you have a bazillion files, however, you need SmartAlecCat’s solution.
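Or, as a stopgap, you can hand the names to tail in batches. A sketch, assuming no spaces or newlines in the names: because printf is a shell builtin, expanding the wildcard for it never hits the command-line limit, and xargs then invokes tail with as many names as will fit at a time.
printf '%s\n' filename* | xargs tail -q -n 1 > new_file_name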
The order of the files doesn’t matter, but if it does it alphabetically, that’s great. I can’t wait to give this a try. I’ll come back with results in a bit! Gotta create all my header files first
That should be easy with ANY text editor, of which there are many from which to choose. If you are using Linux with Gnome, you should have gedit – that is probably among the easiest, most Windows-Notepad-like editors.
Let’s make sure I got your question right:
(a) Line #1 in every file is a header line.
(b) Line #1 in every file is identical to Line #1 in every other file?
(c) So you want to put just ONE copy of Line #1 from any one of these files at the beginning of your combined file?
Is this just a one-time job? If so, just use an editor. With gedit and any other GUI-style editor, you should be able to open your big combined file in one tab, open one of the source files in another tab, and just cut-and-paste a line.
Is this a job that you want to automate so you can do it over and over?
Okay, if you want the entire task automated, like if you are going to need to do something like this over and over, I think these two lines will do it:
head -n 1 `ls filename* | head -n 1` > output_file
tail -q -n 1 filename* >> output_file
Note carefully! Those two single-quote-like marks in the first line are backquotes, also known as grave accents – the character on the same key as the tilde ~ on most keyboards. They tell the shell to run the enclosed command and substitute its output into the line.
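If those are hard to read or type, any POSIX shell also accepts the $( ) form of command substitution, which does exactly the same thing:
head -n 1 $(ls filename* | head -n 1) > output_file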
ETA: Now, are you wanting to do this, say, every day, with a new set of data files? Will you want to use a different output file every day? If so, you might want to make it into a script with a command-line parameter to name the output file:
#!/bin/bash
# Usage: give the output file name as the one command-line argument ($1).
# Grab the header (line 1) from the first matching file...
head -n 1 `ls filename* | head -n 1` > "$1"
# ...then append the data line (line 2) of every matching file.
tail -q -n 1 filename* >> "$1"
Oh, I see from an earlier post…
You DID say you want to automate this completely, since you will do it repeatedly.
So use the method shown just above, and consider the script idea where you can give the output name as a command-line parameter.
To run it: Suppose you put the above script into a file named catdata (You might be able to cut-and-paste it directly from the above post!)
And suppose, to keep it simple, it’s in the current directory. (We Linux folks don’t call it a folder.)
Then, do this: chmod 755 catdata (You only need to do this once after you create the file, to make it executable.)
To run it, type: ./catdata my_new_file
where my_new_file is the name of the new output file you’d like to create for the day’s run.
What happens to the original files after you’ve done this with them? You could add to the script to delete them. Or, you could move them to a different directory for safekeeping, leaving the current directory devoid of your source files, ready to begin collecting tomorrow’s new files.
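For example, a sketch of the safekeeping variant (the directory name saved is just an illustration):
mkdir -p saved       # create the safekeeping directory if it does not exist yet
mv filename_* saved/ # move today’s source files out of the way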
You wrote that all your source files have names like:
filename_RECIDx
and you want to put them into an output file named filename
This requires a bit of care. The script I showed above uses the wildcard filename* which will take all files beginning with filename – if the output file begins with those same characters, it will be swept up as an input file the next time you run the script.
If the input files are all like filename_RECIDx (including that underscore character) and the output file will be filename (without the underscore), then include the underscore in the wildcard, in both places where it appears in the script. That is, filename_* instead of just filename*
And DON’T include the underscore in the output file name.
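So the script becomes (identical to the one above, with the underscore added):
#!/bin/bash
# Usage: ./catdata output_file  (output name must NOT begin with filename_)
head -n 1 `ls filename_* | head -n 1` > "$1"
tail -q -n 1 filename_* >> "$1"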
Will you have a different filename part of those names for every day’s run? Or will that be the same every day? Will the combined output file be the same name every day?
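Here’s another way to do it, using find instead of a shell wildcard. This is a sketch: it assumes the files sit in the current directory (hence -maxdepth 1), that your find supports -maxdepth and -and (GNU and BSD both do), and that the output file is literally named filename:
find . -maxdepth 1 -name "filename*" -and -not -name "filename" -exec sh -c '
    if [ -e filename ]; then
        tail -n 1 "$1" >> filename   # output already exists: append just the data line
    else
        head -n 2 "$1" > filename    # first file: keep its header line and data line
    fi' sh {} \;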
This does not rely on globbing (wildcard expansion by the shell), and this should work on an arbitrary number of matching files. So, you avoid the “too many arguments” issue.
It will also include the first two lines of the first matched file, then only the second line of subsequent files, as you wanted.
It decides whether to include the first two lines of matched files based on the existence of the output file. If it doesn’t exist, it will create it and include the first two lines of the file. If it already exists, it will append the second line of the current file. So, remove the output file before re-running the command.
ETA: This also avoids the possibility of unintentionally processing the output file as an input file, even if the naming conventions are similar. The ‘-and -not -name “filename”’ accomplishes that.
Thank you so much you guys! You are all life savers. I have it all implemented now and working quite well. You are a wealth of information and I really really cannot tell you how much I appreciate the help and the explanations. And all the options! So nice.