SELECTing from database too big to load into Access

I have a series of very large databases stored as text files (I think comma separated, but could be tab separated or fixed width). For the sake of example let’s say the databases contain a Unique ID Number (UIN) and associated information such as the date of birth. I have twenty files, one for each year going back twenty years. I also have a long list (~40,000) of UINs of my company’s “subscribers.” I want to go through the large databases picking out all the records for every person in my list of subscribers. Since the files are too big to get into the database software I have handy, perhaps the best way to do this would be a text search. Wouldn’t the Unix Grep command (using cygwin) work for this, and maybe one could write a shell script to search each of the twenty database files… but I’m not sure how you would do that. Windows XP. Any suggestions?
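grep would indeed work: `grep -F -f subscribers.txt 1990data.txt` treats each line of subscribers.txt as a fixed string and prints every matching line, and a one-line loop over the twenty files would cover the lot under cygwin. One catch is that grep matches anywhere on the line, so a UIN that happens to appear inside some other field would be a false positive. Here's a minimal Python sketch of the same idea that checks the UIN field explicitly (filenames and the column position are made up for illustration):

```python
# Stream one big yearly file, keeping only the lines whose UIN field
# appears in the subscriber list. Never loads the big file into memory.
import csv

def filter_file(big_file, subscriber_file, out_file, uin_col=0, delimiter=","):
    # ~40,000 UINs fit easily in a set, giving constant-time lookups.
    with open(subscriber_file) as f:
        uins = {line.strip() for line in f if line.strip()}
    with open(big_file, newline="") as src, open(out_file, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=delimiter)
        writer = csv.writer(dst, delimiter=delimiter)
        for row in reader:
            if row and row[uin_col] in uins:
                writer.writerow(row)
```

Run it once per yearly file (or wrap it in a loop over all twenty); pass `delimiter="\t"` if the files turn out to be tab-separated.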

Why are the files too large to get into Access? Did it tell you so? An Access database can be up to 2 GB, with the same limit on table size. How much data do you have in your files?

You could also get the data into a freeware RDBMS like MySQL or PostgreSQL, both of which will very easily handle the volumes that you are talking about. MySQL on Windows (with NTFS as your file system) can handle databases up to 2 TB. PostgreSQL has no limit on database size, with a 32 TB limit on individual tables.

That might be the simplest way to go: import into one of these products (both run on Windows), join against your subscriber list, and export the result in whatever format you wish.

ETA: There’s also Oracle XE and SQL Server Express, but they both have limits on database size (4 GB and 2 GB respectively). Both are also free.

If the files are too large to fit in a database, the odds of searching them line by line in a reasonable amount of time are pretty low. (Of course, maybe that doesn’t matter.)

For Windows, there is Windows Grep which is a nice program to have even if it doesn’t work for you. I’ve never tested it with large files, but I imagine that it can search them fine.

If you intend to be doing this regularly, I’d have to agree with Dervorin: you’re better off getting a real database application like MySQL or PostgreSQL. With these you can add indexes, so searches on indexed columns are essentially instantaneous, and of course you can add and delete records without having to rewrite multi-gigabyte files.

Honestly, the best thing to do is to take the time to get a real database engine (Access is crap), load the data in, create some appropriate indices on the tables, and run the query that way. Modern database systems are pretty smart about how they execute queries; a shell script would have to be very clever to beat the combination of a good query engine and good indices, and if you later want to run a different query, you’d have to come up with an equally clever new script.
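To make the "load it in, index it, query it" advice concrete, here's a sketch using SQLite, which ships with Python so there's nothing to install; the table and column names are invented for the example, and the same shape carries over to MySQL or PostgreSQL using their bulk-load commands (LOAD DATA INFILE and COPY respectively):

```python
# Load comma-separated yearly files plus the subscriber list into SQLite,
# index the UIN column, and pull out just the subscribers' records.
# Table/column names are illustrative, not from the original post.
import csv
import sqlite3

def extract_subscribers(big_files, subscriber_file, db_path=":memory:"):
    """big_files is a list of (year, path) pairs; returns matching rows."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE records (uin TEXT, dob TEXT, year INTEGER)")
    con.execute("CREATE TABLE subscribers (uin TEXT PRIMARY KEY)")
    for year, path in big_files:
        with open(path, newline="") as f:
            con.executemany(
                "INSERT INTO records VALUES (?, ?, ?)",
                ((row[0], row[1], year) for row in csv.reader(f) if row),
            )
    with open(subscriber_file) as f:
        con.executemany(
            "INSERT OR IGNORE INTO subscribers VALUES (?)",
            ((line.strip(),) for line in f if line.strip()),
        )
    # The index is what makes the join fast: one lookup per record
    # instead of comparing each record against all 40,000 subscribers.
    con.execute("CREATE INDEX idx_records_uin ON records (uin)")
    con.commit()
    return con.execute(
        "SELECT r.uin, r.dob, r.year FROM records r "
        "JOIN subscribers s ON s.uin = r.uin ORDER BY r.uin, r.year"
    ).fetchall()
```

Point `db_path` at a file instead of `:memory:` and you can load next year's file later without re-importing the previous twenty.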