What approaches should I consider to write a program that will build a growing data file over the years, in a robust way?
In my original source, data appear every day in a pretty continuous way. This is kind of like stock prices or weather data: the same variables keep getting updated, second by second, day by day, over time. My source is an online data system to which I can submit SQL queries, though I can’t modify the system in any way. It holds a couple of years’ worth of data, and older data age out of the system. When I submit a query, after a little while I get back a .csv file.
I am doing this because I regularly run a proprietary analysis program, which works just fine. If you can live with the stock market analogy, let’s say my program creates buy and sell recommendations. My analysis method uses all the data I can get my hands on, several years’ worth, so I use data that have aged out of the source system in addition to data that are still available from it.
Because I want to use data that are no longer available, and because downloading much of the data set takes hours, I keep a local copy of the data, and occasionally download the stuff that’s new since the last download, and append it to my local copy. So far, so good.
But I want to automate this to run once a day. Extending my current approach would basically mean that, every day, I download yesterday’s data, append it to my multi-gigabyte master data file, and then do today’s analysis with the new master file.
Is this wise? It seems, I dunno, kinda amateurish to keep doing that. Shouldn’t I be updating the master file less frequently? Like maybe append yesterday to a data file for this month, then append the month to the master every time the month changes, or something like that?
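To make the per-month idea concrete, here is a rough sketch of what I have in mind. It assumes each downloaded .csv has a header row and a column of ISO-style timestamps ("YYYY-MM-DD hh:mm:ss"); the function name `append_to_monthly_files`, the column name `timestamp`, and the `YYYY-MM.csv` file naming are all made up for illustration, not anything my source system dictates.

```python
import csv
from pathlib import Path


def append_to_monthly_files(new_csv, data_dir, ts_column="timestamp"):
    """Append each row of new_csv to data_dir/YYYY-MM.csv, chosen by the row's timestamp.

    Assumes timestamps are ISO-style strings, so the first 7 characters
    ("YYYY-MM") identify the month. Returns a dict of rows written per month.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # month -> open file handle
    writers = {}  # month -> csv.DictWriter
    counts = {}   # month -> number of rows appended
    try:
        with open(new_csv, newline="") as src:
            reader = csv.DictReader(src)
            for row in reader:
                month = row[ts_column][:7]
                if month not in writers:
                    target = data_dir / f"{month}.csv"
                    # Write the header only when starting a new (or empty) file.
                    is_new = not target.exists() or target.stat().st_size == 0
                    handles[month] = target.open("a", newline="")
                    writers[month] = csv.DictWriter(
                        handles[month], fieldnames=reader.fieldnames
                    )
                    if is_new:
                        writers[month].writeheader()
                writers[month].writerow(row)
                counts[month] = counts.get(month, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

With this layout, the daily job would only ever touch the current month's file, and the older monthly files would stay untouched. Note that this sketch does nothing to prevent the same download from being appended twice.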
Or am I being silly? I could keep backup copies of the master at various stages, and add some diagnostics that look for something going wrong (like trying to append data that are older than the last thing that was already appended).
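As an example of the kind of diagnostic I mean, here is a minimal sketch of a check that would run before anything is appended. It assumes the new .csv has a header row and a column of ISO-style timestamps, and that I keep track of the last timestamp already appended somewhere; the name `validate_batch` and the column name `timestamp` are hypothetical.

```python
import csv
from datetime import datetime


def validate_batch(new_csv, last_appended, ts_column="timestamp"):
    """Check that new_csv contains no row older than what came before it.

    last_appended is the datetime of the last row already in the master
    file, or None if the master is empty. Returns the newest timestamp seen
    (to be remembered for the next run) and raises ValueError on the first
    row that goes backwards in time. Equal timestamps are allowed, on the
    assumption that several variables can share the same second.
    """
    previous = last_appended
    with open(new_csv, newline="") as fh:
        # Line 1 is the header, so data rows start at line 2
        # (assuming no field contains an embedded newline).
        for line_no, row in enumerate(csv.DictReader(fh), start=2):
            ts = datetime.fromisoformat(row[ts_column])
            if previous is not None and ts < previous:
                raise ValueError(
                    f"{new_csv}, line {line_no}: {ts} is older than {previous}"
                )
            previous = ts
    return previous
```

The idea would be to call this first and only append if it returns without raising, so a bad download never touches the master file.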
Is there some kind of recommended practice for this sort of thing, that goes beyond fiddling with things until it looks like it’s working?