How do I edit a 630MB file?

I need to edit and sort a 630MB webserver log. Trying to open it in Notepad or Metapad causes crashes or hangs. Is there an app that can deal with a file this size, and/or is there a way to split this file down into manageable chunks so I can do so?

The webserver config has been changed to rotate the logs on a monthly basis so this doesn't happen again, but I need to delve in to check out some log shonkiness.

BTW, anyone else getting hammered with MSNbot? I’ve already set robots.txt to deny it any access. Seems to have pulled 1.2G of traffic last month off our site with 200MB of content…

capn

Do you have a Linux machine? I think you could page through the file using the more command with no problems.
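For example, assuming the log lives on the Linux box and is called access.log (substitute the real filename):

more access.log

And if you want to break it into manageable chunks, split will carve it up by line count:

split -l 1000000 access.log chunk_

That gives you chunk_aa, chunk_ab, and so on, each a million lines long.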

There are specialized programs for log editing/viewing. Why don’t you use one of those?

Very slowly? :slight_smile:

Yeah, I actually use Analog. What I’m trying to do, however, is show my employer exactly how many bytes msnbot used in our last billing cycle, something I don’t think Analog can tell us. If you know otherwise, do tell!

We are on a 5G/month plan, and last month we were over by 1.2G. I haven’t seen the bill, but the boss isn’t happy. The overage charge is well into three digits. Obviously, we’re now looking at hosting the site elsewhere. And maybe sending MS a bill.

Regards,
capn

Dog80 -

On re-reading your post, I see you said “log editing” programs, whereas I thought you meant “log analysis”. What programs do you speak of?

And mblackwell, I just burned my Mandrake ISOs tonight :wink:

capn

Um, the file size capacity of Notepad is measured in kilobytes. :smack: Windows will automatically try to load a text file that's too big for Notepad into WordPad. Of course, if you're opening Notepad, and then trying to import your monster file, it ain't gonna work. Trust me. :smack: :dubious:

You might try downloading a copy of NoteTab Light (freeware version). The creators say that any version of this program will open files larger than 16MB. I dunno if it will work on a file 40X that or not, but at price = $0, it's worth trying. :wink: And worth paying the piddling price they ask for one of the fancier versions, if it does what you want!

It has some features you’ll have to see to believe: Frex, not just total word counts, but a comprehensive listing of word frequencies, by word.

Aaaannnnd . . . . .

From the Help module:

And it is absolutely the most flexible text editor I’ve ever seen. Which is, frankly, almost the only thing I’ve ever had the nerve to use it for. It’s not the program’s fault I’m chicken, though. :wink:

A little byte at a time?

Sorry.

I was thinking along the same lines. Wouldn't vi in UNIX work? Although if you don't know vi, it would be a bit of a pain.

Well, as the others said, if the file is on a UNIX/Linux machine then you can use ‘vi’ to edit it easily.
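A rough sketch, again assuming the file is called access.log:

vi access.log

Then /msnbot jumps to the first matching line, n moves to the next match, and :w extract.log writes whatever you've trimmed the buffer down to out to a new file. (Whether plain vi copes with a file that big depends on the implementation and how much memory you have.)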

If you're on Windows then the best to use is UltraEdit; I open files up to 1GB on a daily basis with it. It doesn't load the whole file into memory at once (as Notepad does), so you don't need a lot of memory to use it.

And of course if you don't want to edit it but just want to use the data within it, then Analog is a very powerful tool, with several GUI frontends to make it a little easier to use :slight_smile:

It sounds like you want to analyse the data, so heavy-duty text editors might not be what you are looking for. You can, however, import it into MS Access as a fixed-width file if each record has a consistent layout. Analysing it then will be a snap, and you can edit or delete records if you need to.

Try TextPad for viewing large files like that. http://www.textpad.com/

I have had good success in using it to open files that Notepad couldn't handle. I don't think I've tried it on a file as large as yours, but it's worth a try.

Sorry, I only skimmed the posts above, but I think someone mentioned loading it into a database. I do this all the time to view truly large output. I work in an Oracle shop, so it's pretty standard. If you are a home user, you might want to try Access.

Good luck.

I'm betting even Access will choke a bit on the 10-million-ish records we're talking about. Yes, it'll do it, but it might be painfully slow.

Here’s a MUCH easier way if the log file is a text file and you’re just trying to extract a small fraction of the lines in it according to simple per-line criteria.

Use the CMD FIND program.

The critical failing of all the typical Windows-based editors is that they try to read the whole thing, when you really just need to look at each row independently.

Open a CMD window and then enter
C:\>FIND /i "msn.com" MyHuge630MBFile.log >SmallExtract.log

It’ll read through the file, one record at a time, copying only those records containing “msn.com” to the SmallExtract.log file. The one record at a time feature means the file size is limited only by disk space and the /i switch means case-insensitive comparison. It may take a few minutes to run through the 10 million-ish records you have, but it’ll do it.

Then you can use some smarter log analysis tool (or Access or a spreadsheet) to look at the details of the surviving records.

If there are several different selection criteria, you can daisy-chain FINDs to achieve AND logic. You can conduct parallel FINDs and merge the results to achieve OR logic.
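For example, piping one FIND into another gives you the AND (both strings must appear on the same line), while running two FINDs into separate files and concatenating them gives you the OR. Using the same hypothetical filenames as above (the search strings here are just placeholders):

C:\>FIND /i "msnbot" MyHuge630MBFile.log | FIND /i "GET /images" >BothCriteria.log

C:\>FIND /i "msnbot" MyHuge630MBFile.log >Hits1.log
C:\>FIND /i "slurp" MyHuge630MBFile.log >Hits2.log
C:\>COPY Hits1.log+Hits2.log Merged.log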

All this operates only on the per-record level of selection, and it requires that the log file contain just plain ASCII text, but that’s probably all you need.

FIND was added to MS-DOS in version 2.0 and is still as useful as ever. We all tend to forget about the basic tools sometimes. For basic jobs they're often the best bet.

Nice post LSLGuy! I could actually benefit from this! Very handy indeed.

I think you would benefit from getting grep. Download the GNU tools for Windows. grep will give you better searching ability than find.

http://unxutils.sourceforge.net/
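A quick sketch of the earlier FIND extraction redone with grep, using the same hypothetical filenames (and placeholder search strings):

grep -i "msnbot" MyHuge630MBFile.log >SmallExtract.log
grep -ic "msnbot" MyHuge630MBFile.log

The first line does what the FIND example does; the second just counts the matching lines. And since grep takes regular expressions, something like grep -iE "msnbot|someotherbot" handles the OR logic in a single pass.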

I have found that vi will not handle really large files. I have been told that vim will work for really large files but I have not tried it personally.
http://www.vim.org/download.php#pc

If you are a computer professional you really need to get a text editor that is better than Notepad. There have been good suggestions here. Vim is used by a lot of people where I work and they like it. TextPad is good.

I looked at the information on the site, and a) it plainly doesn’t do as much as NoteTab, b) they don’t have a free version. If there’s information about file sizes, I didn’t find it, but NoteTab clearly states it will handle files in excess of 16MB.

As far as that goes, I'm not sure that anyone else's text editor can compete on versatility or ease of use. Since the company is in Switzerland, it should be obvious that I have no personal interest in it. The closest connection I have is that a friend's sister lives there. It's just that someone suggested this text editor to me some years ago, and I've been happily using it - and recommending it - ever since.

LSLGuy's suggestion sounds like a good first step, given that the OP's got what can be called, without fear of hyperbole, an enormous task to perform. :slight_smile: However, after that's done, he will still need to use a text editor. And for that purpose, I am strongly of the opinion that NoteTab is the most likely to cope with what will still be a humongous file.

I second this. Going from find to grep will help in many ways, all of them related to a more flexible and powerful pattern-matching engine.
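And since the OP wants the total bytes msnbot pulled in a billing cycle, the same GNU tools package should also include gawk, so a sketch like this would do the arithmetic, assuming the log is in the usual Apache common/combined format where the response size is the tenth whitespace-separated field (check your own log and adjust $10 if it isn't; lines where the size is "-" simply count as zero):

grep -i "msnbot" MyHuge630MBFile.log | gawk "{ sum += $10 } END { print sum }"

(That's CMD-style quoting; on a Unix box, put the gawk script in single quotes instead.)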

Which vi? I heard that the closed-source one that shipped on old Solaris machines was pretty horrible with large files, but that’s one of many vi implementations in the world.

In any case, Emacs is probably better for new users than vi, and I know that GNU Emacs has no arbitrary limitations.

http://www.gnu.org/software/emacs/emacs.html
Emacs FAQ for MS Windows - GNU Project - Free Software Foundation (FSF) – GNU Emacs on Windows 95 to XP, as opposed to x86-DOS/Windows 1-3.

Seconded, thirded, amen-ed, and fuckin’-a-d. :wink:

It was old Solaris vi that we used.

We also could not get these files into Emacs. They were multi-gig files. Chip netlists get pretty large.

Ok, this is a bit odd, and I don’t know if you can blame it on the version or flavor of Emacs. Maybe (probably) they were so much larger than the installed physical RAM that the OS wouldn’t allow the program to malloc that much core (some OSes will overcommit to hell and back so a malloc may never actually fail, but not all), so there was nothing left for Emacs to do.

I do know that GNU Emacs, at least, does not impose any arbitrary limitations on filesize. But if the OS won’t cooperate, there you are. grep and the other fileutils get around this by operating on files a chunk at a time, so they never request a large amount of memory at any one time. I don’t know of any (interactive, as opposed to sed) editors which do this.

I wholeheartedly second UltraEdit. It's not freeware, but the price is downright cheap for its power and functionality. And working with a 630MB file would be no sweat.

Got a Mac handy?

BBEdit, or BBEdit Lite for that matter, is built to handle humongous text files like this. I don’t happen to have a 630 MB text file lying around but I just sicced BBEdit on a half-gig VirtualPC hard disk image, and had it convert line endings on-the-fly on open. Takes a moment to open. The basic info window took a moment to calculate when I asked it to summarize the file, but here you go:

http://home.earthlink.net/~ahunter/Upload_Download/BBEdit_HalfGig.jpg

BBEdit is decently powerful and very easy to use with no appreciable learning curve.