I assume your brother already knows that there is specialized software for this sort of thing (including the MATLAB Reservoir Simulation Toolbox)?
Yes. He also uses very specialised applications like Eclipse.
The practical limit for Access is about 2GB. My team has run databases that size…the real limitation has been how clean their SQL is.
To add to this, for every cell with a formula in it, Excel has to look at that cell and convert =A1*A2 into something that the computer can actually understand, and it has to do that for each and every cell. So if you have multiple cells, it has to do that over and over, which takes time.
In Matlab, if I have two columns of data in a matrix A and want to write the results to column 1 of a matrix B, I would type in once
B(:,1) = A(:,1) .* A(:,2);
Matlab would only need to evaluate what that means once, and then go and do it.
Basically, the multiplication itself is really fast; what takes the time in Excel is parsing each cell, over and over, into something the computer understands and can act on.
Sidenote: Isn’t Access going away, if not already gone?
There’s no indication it’s going anywhere.
At the end of the day, Excel is an electronic version of data written on index cards and neatly drawered with no indexing other than whatever order the user has put them in.

Let’s set up a quick data set as an example. I own a bar and I keep daily records of how many pints of draft beer I sell. On one card I record the date, the name of the beer, how many pints sold, and how much I made selling them. When it comes time to organize them into the drawer I can sort them by any of those points, but all the others are spread throughout the set. Obviously I’ll organize them by whatever order I’ll be most frequently looking them up, go with by brand for now. The problem is that when I want to know how much draft I sold on St Patty’s day I still have to look at every card to know if it applies.

With a few tens of thousands of cards this isn’t much of a challenge for Excel. Scale it up to millions and things really start bogging down. With a properly designed and indexed database in Access or mySQL pulling the relevant data is trivial and near instantaneous.
Well, I read it as “How can **a** brother speed up his spreadsheets”.
At which point I thought “that’s a cute title…or is this whole racism-everywhere thing getting a bit out of hand?”
Indeed, but there’s no point in indexing the data. He’s doing calculations on all of it, not picking out bits.
I’ve had that problem. It’s like they just lie there…!
Then he really should look at Matlab - I had the same issue (it was multiple channels of oil well drilling data from multiple sources, all on a 1-second basis) and I was trying some new algorithms on some data that hadn’t been acquired before.
I made the switch, and other than some learning, dealing with time formats (see rant in the workplace ranting thread), and making graphs (which is much more powerful, if a little arcane - still no worse than gnuplot), I am so glad I did. Matrix manipulation is what Matlab does really well, and with multiple tabs I suspect he is trying to simulate a 3D matrix, just using tabs as the 3rd dimension.
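If that is what he is doing, one option (a rough sketch with made-up dimensions, assuming every tab has the same rows-by-columns layout) is to stack the tabs into a single 3D array and operate on all of them at once:

nRows = 1e5; nCols = 5; nTabs = 12;      % made-up sizes for illustration
pages = rand(nRows, nCols, nTabs);       % pages(:,:,k) stands in for the k-th Excel tab
% one vectorised statement then runs across every "tab" at once,
% e.g. multiply column 1 by column 2 on every page:
result = pages(:,1,:) .* pages(:,2,:);   % nRows-by-1-by-nTabs result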
If they can afford a Schlumberger software license they could easily spring for a Matlab one.
Still, if giving up Excel and the time sunk into what he has there is too much, the only thing would be to try to put as much calculation as possible into single cells. I used to use multiple columns, with each column being a step in the calculation; it helped with development and spotting errors, but I think I got some improvement by pushing the steps into one column where I could.

Indeed, but there’s no point in indexing the data. He’s doing calculations on all of it, not picking out bits.
I was actually answering Ruken’s question about why Access would be faster, but this bit of info sheds new light on your problem. Without a use for indexing, Access would add little but prettier reports and a cleaner interface to the project. The actual calculations might even be slower.
Is there a chance of getting a sample up on Google Docs or even a complete sanitized spreadsheet for us data monkeys to pick apart?

In Matlab, if I have two columns of data in a matrix A and want to write the results to column 1 of a matrix B, I would type in once
B(:,1) = A(:,1) .* A(:,2);
I guess some of the more complex stuff he’s doing would be expressed as
B(:,1) = SomeProcedure (A(:,1))
where the procedure is a pageful of code.
Would Matlab be as fast as Excel?
Pure speculation on what is in that spreadsheet, but quite often you will see VLOOKUP and INDEX functions, etc. that bring together related data. These are relational functions and imply that a relational database would be the correct tool.
A relational database (even Access, believe it or not) can bring together millions of rows from multiple tables in an amazingly short time.

Pure speculation on what is in that spreadsheet, but quite often you will see VLOOKUP and INDEX functions, etc. that bring together related data. These are relational functions and imply that a relational database would be the correct tool.
Sigh. A database is NOT the answer. I’ve passed on the recommendation of Matlab, but think there would be a lot of work for him to do to even start getting results.

I guess some of the more complex stuff he’s doing would be expressed as
B(:,1) = SomeProcedure (A(:,1))
where the procedure is a pageful of code.
Would Matlab be as fast as Excel?
Tough to say without knowing exactly what the code is. I suspect so, even if just through getting rid of all the graphics that Excel is handling (i.e. displaying those millions of cells of information). One thing to watch: with Matlab expressions, if you forget to put a ‘;’ at the end of the line it outputs the calculation to the screen, so you see thousands of numbers scroll by; once I figured that out, calculations sped up even more.
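For example (the variable and sizes here are just illustrative):

x = rand(1e6,1) * 2      % no trailing ';' - Matlab echoes a million results to the screen
x = rand(1e6,1) * 2;     % same calculation with the ';' - no console output, much quicker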
Another area which may need thought is if he has a lot of IF THEN ELSE statements in the calculations, and loops (i.e. looping through several million lines performing the operation); you may need to rethink that, as using for i = 1 to 1,000,000 style loops can take a while to execute.
I basically used the ‘find’ operation, which is really fast at finding the index values of whatever the test condition is.
e.g. I want to find all values in column 3 of A that are less than 1:
idx = find(A(:,3) < 1);
idy = find(A(:,3) >= 1);
That gives me, in idx, the index values of the entries less than 1, and idy holds all the others, greater than or equal to one.
Then if I want to do something specific to those values, say multiply by 10, I would say
B(idx,1) = A(idx,3) * 10;   % multiply all the less-than-1 values by 10
B(idy,1) = A(idy,3);        % leave all the other values alone
You can also set up a bunch of logical 1-or-0 flags and then do logic operations on those to get the same effect as an AND/OR/XOR statement, if that is needed (sometimes you need to perform operations only on the bits of data that meet certain criteria, or use different constants) - see the sketch below.
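A minimal sketch of that logical-flag approach, with made-up data and conditions:

A = rand(1e5, 3) * 2;             % stand-in data: 100,000 rows, 3 columns
B = zeros(size(A,1), 1);          % preallocate the output column
lowFlag = A(:,3) < 1;             % logical flag: true where column 3 < 1
bigFlag = A(:,2) > 0.5;           % second, made-up condition on column 2
both = lowFlag & bigFlag;         % element-wise AND of the two flags
B(both)  = A(both,3) * 10;        % scale only the rows meeting both criteria
B(~both) = A(~both,3);            % leave every other row alone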
so yeah there is a learning curve, but if you can get everything into some sort of matrix operation it is super fast.
There are some good forums and a lot of user-made routines to help out. The forums can be a little SDMB-like: if you ask a question, you will get many ‘the question is wrong’ answers or ‘you don’t want to do that, you want to do something completely different’.

Another area which may need thought is if he has a lot of IF THEN ELSE statements in the calculations, and loops (i.e. looping through several million lines performing the operation); you may need to rethink that, as using for i = 1 to 1,000,000 style loops can take a while to execute.
Yes, there’s a fair amount of this. In one sample of code I’ve seen, the code does a calculation, compares the result of that calculation, then performs one of two calculations depending upon the result of the first. Many of these things could be expanded out into standard spreadsheet formulae, but using VBA has the considerable advantage of clarity.
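If I understand the earlier advice, that calculate-compare-branch pattern is the kind of thing Matlab can collapse into logical indexing rather than a per-row loop. A minimal sketch, with a made-up rule standing in for his actual calculations:

A = rand(1e6, 1);                 % stand-in input data, one value per row
x = A * 2;                        % hypothetical first calculation
result = zeros(size(x));          % preallocate the output
flag = x >= 1;                    % the comparison, done for every row at once
result(flag)  = x(flag) .^ 2;     % the "true" branch, applied only to flagged rows
result(~flag) = x(~flag) / 2;     % the "false" branch, applied to the rest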
If you have gigabytes of data, doesn’t that imply that a lot of the data is static? Is there no way he can precalculate things and put them in a smaller table? Especially since you say indexing is pointless, meaning he’s just using raw data and not going back and looking up specific values from specific cells.
For a really simplified example: let’s say you have a database of grades, where you add them all up and divide by the number of possible points. To save calculation time, you could store both the running total and the total possible points, and use only those for your calculations, rather than re-adding all the cells every time.
While that’s a simplified example, it just seems like something with GBs of data would have those sorts of optimizations as a possibility. Simply avoid repeating the same calculations that you’ve already done before.
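For what it’s worth, here is what that idea might look like as a minimal Matlab sketch, with made-up grade data - compute the totals once, cache them, and update them incrementally instead of re-summing everything:

scores   = randi(10, 1e6, 1);          % points earned per assignment (made-up data)
possible = 10 * ones(1e6, 1);          % points available per assignment
totEarned   = sum(scores);             % computed once and cached
totPossible = sum(possible);           % computed once and cached
grade = totEarned / totPossible;       % cheap to recompute from the cached totals

newScores = randi(10, 1000, 1);                    % a new batch of data arrives
totEarned   = totEarned + sum(newScores);          % update the cache...
totPossible = totPossible + 10 * numel(newScores); % ...instead of re-summing millions of rows
grade = totEarned / totPossible;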

Sigh. A database is NOT the answer. I’ve passed on the recommendation of Matlab, but think there would be a lot of work for him to do to even start getting results.
As a former DBA I have to say that a database is ALWAYS the answer.

If you have gigabytes of data, doesn’t that imply that a lot of the data is static? Is there no way he can precalculate things and put them in a smaller table?
Yes, the source data is static after import, but he generates similarly large amounts of data. He also uses a few tricks like putting the sum of one column in another column (or even in a different sheet) rather than at the top or bottom of the column, avoiding INDIRECT and deep dependency chains, and much more.