I need to mangle a large datafile. Since it is a fixed text file what I would love to be able to do is to say “take column 6 and move it to colum 50, then take column 8 and move it to column 51”. I have a great text editor that lets me do this with a manual clipping and pasting action, but that in itself leaves too much room for errors. Can anyone suggest a text or data editor that would let me manipulate in this way? Ideally freeware and ideally Windows based.
I’m not sure if this will be helpful in your specific situation, but you might want to look into Windows ports of the “sed” and “awk” Unix utilities.
These two tools are extremely useful for performing some fancy operation on every line in a huge file.
My boss recently asked me if there was a way to do global search and replace on a 200MB text file. Rather than attempt to open this file in any editor, I showed him how to use “sed” to process the file and pipe it to a new file.
My first impression is to say MS Access or any other database system. If you don’t already own it though Access is not cheap. I searvched online and couldn’t find any freeware DBS. Is it possible your company has Access or something like it?
Shuffling rows is easy in Excel; import the data into a spreadsheet, populate a blank column with =RAND() all the way down alongside your data, then highlight the whole lot and sort by the column containing the random number, then export it as plain text.
Shuffling columns should be the same, but the problem is going to be importing the data as one character per cell.
Good suggestion, but sed and awk are a bit high on the learning curve, and “cut” is pretty freaken simple to use and is up for the job in the OP. Pretty much all the unix tools are available from http://www.cygwin.com/ as one big download.
In your case, say that the colums were separated by tabs (it also works if the columns are a constant characters wide, or if they’re separated by any other character). Then your cut command to swap columns 6 and 55 and also swap columns 8 and 51 is simply:
cut -f 1-5,55,7,51,9-50,8,53-54,6,55- (filename) > (new_filename)
(take columns 1-5, then 55, then 7, then 51, then 9-50 in order, 8 … then 55 and up).
-lv
People often overlook the “cut” and “paste” utilities available on Unix and in Unix-like toolkits for Windows. These should do exactly what the OP wants - “cut” chops vertical slices out of files, based either on column position or a delimeter, and “paste” glues files together as columns. “cut” in particular is often more convenient for the cases it covers than constructing an edit command for sed.
Thanks, LordVor and yabob!
I wasn’t aware that cut could do that – I always thought it was only used for trimming the left or right side of a text file.
Yet one more reason to never underestimate the power of Unix command-line utilities.
I think there’s a minor typo in LordVor’s example - to cut using character positions, you use the -c option. The -f option is used to specify field positions, and is used in conjunction with -d specifying the delimter character:
cut -c 1-5,10- (produce characters positions 1 through 5, and 10 through the end)
cut -d : -f 4-7 (produce the fourth through 7th colon-delimited columns)