How powerful a PC will I need to properly run this spreadsheet?

I have written many Excel spreadsheets but nothing nearly as large as the one I am currently constructing.

My current computer is really struggling to load and manage what I have constructed so far, and I’m only about 25% finished.

I’ve made sure to reduce the workload on my computer by using an .xlsb file, no formatting, and spreading formulas through multiple cells instead of building huge, nested formulas all crammed into one cell, etc…

When complete, the spreadsheet will have user-entered numerical data in approx. 35 million cells and formulas that crunch that data in approx. 85 million cells. Yes, 35,000,000 and 85,000,000 cells, and yes, it’s necessary. I don’t really want to get into that aspect of the discussion. And, no, I’m not going to manually enter 35 million pieces of data. That will be automated.

Anyway, the “good” news is that most of the formulas are quite simple:

36 million are the product of three numbers.
36 million contain 10 nested “IF statements” each.
12 million are simply the sum of a range of cells.
1 million each contain a formula that is 176 characters long, but the calculations within are quite simple.

Currently, at 25% completion, the spreadsheet consumes about 1.5 GB of memory when open. About half the time now, as I try to build on it, it locks up, gets stuck in a recovery/repair loop, crashes completely, etc… It’s getting to be a pain.

When it’s complete I want to be able to change up to thirty pieces of data and then recalculate the results through crunching the 85 million formulas.

I would also like to have some nice graphs running as well, with one possible graph showing 3.5 million pieces of data.

So, what kind of a PC do I need to build to finish the spreadsheet in a reasonable manner, and then allow it to function and recalculate at a timely pace?

How much memory? What kind of CPU? Do video cards matter? I know they are used for crunching numbers, at times… Do I need one of those pro machines with the dual Xeon processors, etc?

Thanks for your help, it’s greatly appreciated.

I am a Systems Analyst and I thought that I had built some big spreadsheets before but nothing that ever remotely approached that. My first thought is that your basic architecture isn’t right. Spreadsheets are not really designed to handle that much data and I am not sure you can fix that through better hardware. A true database like MySQL, Oracle or SQL Server is much better equipped to handle data of that size and would offer many other advantages as well such as the ability to perform incremental backups and recovery from failures. Setting up a rather simple database isn’t that hard overall and would likely suit your needs much better.

I know that you don’t want to get into “that aspect of the discussion”, but the fact is that you’re using spreadsheets far beyond their intended use case. You should have no expectation that manipulating tens of millions of records at a time in a spreadsheet will work at all.

That said, it’s very likely that RAM is your limiting factor. Spreadsheets tend not to be very CPU limited, but they can eat memory, as you’ve found. Depending on what you have now, you might be able to upgrade your current system to 8 or 16 GB and be fine. The video card doesn’t matter at all, and the hard drive–although it will have an effect on load/save times–won’t matter much once the spreadsheet is loaded.

Shagnasty is right that something like a SQL server is your best bet. I suspect that the time you waste trying to use the wrong tool for the job will exceed the time spent on learning the right tool.

What are the specifications of your current computer?

I handle data that has so many rows that it breaks our spreadsheets all the time. If you can learn a simple scripting language, and export the data to a CSV file, or get a package that reads .xls files directly, you can handle that data in seconds.
It doesn’t sound all that complicated.

Thanks for the feedback, I appreciate it.

I guess I’m kind of surprised to hear that I’m expecting too much from Excel.

A spreadsheet offers over one million rows and some 16 thousand columns, which works out to upwards of 17 billion cells. Of course, that may be wishful thinking on Microsoft’s part, and in all practicality it may be far beyond home computation to engage even a majority of those cells. I certainly wouldn’t expect to cram the Apollo 11 moonshot trajectory calcs into each and every one of those cells, but my sheet, at roughly 120 million cells, is under one percent of what is suggested as possible.

The sheet I’m building will have a hair over 1 million rows (granted, near the max) but only 150 columns. There are millions of formulas involved, true, but they deal in simple arithmetic for the most part. Another detail I should have included is that the structure of the calculation is “massively parallel”, I’ll call it, with only four or five formula steps between raw data and final results, done 3.5 million times. That may be relevant info…

The computer I’m currently using is a modest $500 ACER ES-571-391E with an i3-4005U 1.7 GHz CPU and 6 GB of RAM. It was my expectation that a machine composed of upper-limit consumer PC components would solve my problems and not that I was pushing the limits of Excel by some orders of magnitude.

I’ll consider a different approach, as suggested above. Thanks for the suggestions in that respect.

I do some similar things with bucketloads of data: 650,000+ rows, 50-100 columns. I had an Intel 3770K (3.9 GHz) and 16 GB of memory. It was painful, to the point of being impossible, so I gave up and went to Matlab.
There may be some free Matlab clones out there.

It’s possible that you could construct a PC that will let Excel crunch away happily at that level, but to be honest you are using the wrong tool. The base data needs to be in a proper database, e.g. MySQL or SQL Server. You aren’t going to need Oracle or anything for that much data.

Edit: said everyone else heh

Several responses mention MySQL. I went and had a look at the website and browsed the wiki entry, as well. There is a lot going on there and I know nothing about the program or databases in general. That’s ok, I did a lot of recreational programming 20-some years ago and I’m not afraid of getting my hands dirty learning new stuff.

How would the alternative process using MySQL generally work? The data is stored in MySQL and then fired off to Excel, which does the calculating and then puts the results back into the database, thus freeing Excel from having to carry all the data? Or are the calculations done in MySQL as well?

Sorry, basic questions. Just trying to get an overall grasp of the process. Thanks again.

Are you running the 32-bit or 64-bit version of Excel? Take a look at https://technet.microsoft.com/en-us/library/ee681792.aspx and you’ll see that the 32-bit version has a hard limit of 2 GB while the 64-bit version does not.

Also take a look at PowerPivot, a free add-on which is an in-memory compressed column store for large data sets.

If the workbook functions OK with manual calculations, my usual approach is (using VBA):

Copy the formulas to another tab (just one row of them; if you have a table where the formulas are inconsistent row by row, you built the table incorrectly)

Write a sub to copy the formulas into the table and then convert the table from live formulas to values, something like

Sub mcr_values()
    ' Find the last used row on the data sheet
    Dim lastrow As Long
    Sheets("data").Activate
    lastrow = Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
    ' Copy the single row of live formulas down the whole table...
    Sheets("formulas").Range("BA2:CA2").Copy Sheets("data").Range("BA2:CA" & lastrow)
    ' ...then convert the copied formulas to static values
    Range("BA2:CA" & lastrow).Select
    With Selection
        .Value = .Value
    End With
End Sub

I haven’t tested that code, obviously.

Excel usually handles 30 or 40 million cells of values without much delay, maybe some when you open the book. Trouble begins when you have hundreds of thousands of functions that recalculate each time you change a cell value, and it gets worse when you have volatile functions (NOW, OFFSET, TODAY, etc).

My experience is that complex nested formulas don’t hurt performance all that much, but they’re a pain to evaluate and fix.

Usually when performance begins to choke the first option is to limit the data range (quarterly or monthly workbooks rather than yearly or indefinite, for example).

If that’s not a go, the next option is aggregating data in another workbook. You do the data cleanse (TRIM, FIND, etc) and range finding (EOMONTH, NETWORKDAYS, etc) in that book and only bring in the data the user needs.

Finally, if none of that works, I’ll put the data in MS Access, create a crosstab query, and link that to the calc workbook. I’ve only had to do this a few times since Excel moved beyond the 65k row limit, such as when working with call center transaction data where a single day can create thousands of records. You calculate the basics, such as AHT (average handling time) in Access, and do the analysis in Excel. If you’re good with SQL or can get somebody to write the query for you, yes, by all means link Excel to the database. The problem there usually is getting access to the data.

@FinsToTheLeft

Thanks for pointing that out. I am, in fact, running 32-bit Excel. It seems there is an option to uninstall it and reinstall the 64-bit version. I’ll take a look at PowerPivot as well. Thanks for the tips.

MySQL is set up as a server. You’ll just be running it locally, but it means you can connect to it via a variety of clients.

The most straightforward way to use it is through a GUI such as MySQL Workbench. Workbench doesn’t have any native programming support; everything is done through SQL queries. SQL is a powerful language and can do everything you need, although it is not necessarily the best fit. Still, the idea here is that your queries can work on an existing table (your input data), do some processing on it, and either output that table directly or store the results in another table for further processing.

Another approach is to connect to the server using bindings to another language. Just about every common programming language has support for connecting to MySQL. This approach is the most flexible, because you can alter where the computation takes place. It could range from everything being done in SQL (with the program just there to format the output), to just using SQL to output all the data unaltered (and using the program to do any processing).
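As a very rough sketch of that second approach (untested; the table, column, and login names here are just placeholders), the Python side might look like:

import mysql.connector  # pip install mysql-connector-python

# Connect to the local MySQL server (credentials are placeholders)
conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="sheetdata")
cur = conn.cursor()

# Let SQL do the grouping and summing, then finish the arithmetic in Python
cur.execute("SELECT group_id, SUM(value) FROM inputs GROUP BY group_id")
for group_id, total in cur:
    result = total / 40.0  # stand-in for whatever the real formula is
    print(group_id, result)

conn.close()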

It sounds like the main thing you want to avoid is needing all the data loaded into memory at once. If your row calculations are mostly independent, this shouldn’t be a problem.

If you could give a bit more details on your problem, I (and others) could probably give better advice. Voyager may be right that a scripting language (without using MySQL) may be sufficient. It depends somewhat on how much reprocessing of that data you’ll need to do.

Python with NumPy and Pandas can handle this. You could use an SQL database to store and manipulate your data separately (using the python-sql package), but honestly, handling large data analytics is exactly what Pandas is designed for, and it has extensive functions for data translation, storage (using the common HDF5 format), analysis, and visualization. And it is completely free: the basic Anaconda distribution has everything you would need to do pretty much any technical or financial analytics you could possibly imagine, along with the Spyder GUI, which frankly blows Matlab right out of the water.

There are others (Octave, SciLab), but none as robust, as well documented, or as well supported by a large technical user community and several third-party support and training companies.
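Just to give a sense of it (the data and column names below are invented, not your actual layout), the kinds of formulas you listed collapse into a handful of vectorized operations in pandas:

import numpy as np
import pandas as pd

# Fake data standing in for the user-entered values: 3.5 million rows of ten small numbers
df = pd.DataFrame(np.random.randint(1, 100, size=(3_500_000, 10)),
                  columns=[f"x{i}" for i in range(10)])

# "Product of three numbers", done for every row at once
df["prod"] = df["x0"] * df["x1"] * df["x2"]

# A 10-deep nested IF usually reduces to a binned lookup
df["band"] = pd.cut(df["x3"], bins=range(0, 110, 10), labels=False)

# "Sum of a range of cells" is a single call
total = df["prod"].sum()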

Stranger

Thanks for the input, Dave. Some of what you wrote is somewhat beyond me but I think I understand some of the basics you shared. I’ll explore those ideas further. I appreciate your response.

Thanks for the response. What I understand from it is that you can have MySQL working hand-in-hand with another program, passing info back and forth as needed, with one of them doing the heavy lifting. If I got that right, that sounds promising…

But as you asked, here are more details about my spreadsheet:

There are (roughly) 35 million pieces of data, each a small number, in groups of 10. So, 3.5 million groups. There is also one additional group of 40 pieces of data that are also each a small number.

The sheet will produce 3.5 million pieces of data that are each calculated from one 10-number group and the 40-number group, which is common.

While I theoretically could have gone from the input data to the final result in one step, the necessary formula would have had approx. 15.5 thousand characters in it. Well, Excel only allows 8,000-or-so characters in any one formula, and monster formulas are a bad practice anyway, so I broke the process down into a few steps.

The first step involves 10 formulas, each containing 10 nested if statements.

The second step involves one formula of simple arithmetic.

The third step involves one formula containing division by a sum of cells.

The fourth step involves a formula of 176 characters but it includes only basic arithmetic.

Now, this is all very basic and worked perfectly when I was first constructing the sheet and only asked it to do this a handful of times. The problem is that I’m trying to do this 3.5 million times (once for each 10-group), all at once, which is necessary as each of the 3.5 million results is compared to the others.

Well, that’s it in a nutshell. I’ve got around 20 million formulas written into the sheet so far and it and my computer are not happy about it.

Thanks, all.

Thanks, Stranger. From what I can gather as the completely uninitiated, what you suggest sounds very promising! Now I just have to go figure out what all that stuff is. :slight_smile:

Thanks for your help.

Python and pandas do have a bit of a learning curve (albeit not nearly as steep as trying to program in C or Java), but once you’ve got your footing you will be able to do transformations, data munging, and analysis with a tiny fraction of the effort required to do it in Excel, and it is much easier to verify operations and debug problems. It is worth the time to learn the functionality, and there are plenty of tutorial and library resources freely available, as well as the many excellent books from O’Reilly and No Starch.

Stranger

Ok, so I just watched a YouTube video where a guy writes a quick program that transports data from an Excel file into MySQL using Python. It’s evident that this stuff is quite powerful and flexible and I’m going to begin again and approach it from this new angle.
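If I followed it correctly, the heart of it was only a few lines, roughly like this (the file, table, and login names are just my placeholders; I haven’t actually tried it yet):

import pandas as pd
from sqlalchemy import create_engine

# Read the existing workbook (.xlsb apparently needs the pyxlsb package installed)
df = pd.read_excel("my_data.xlsb", sheet_name="data", engine="pyxlsb")

# Push the whole thing into a local MySQL table in one call
engine = create_engine("mysql+pymysql://user:password@localhost/sheetdata")
df.to_sql("raw_inputs", engine, if_exists="replace", index=False)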

Thanks for all your help.

Is this data going to have to be stored and pulled out on demand, and get input from lots of sources, or do you have more or less static spreadsheets which get generated elsewhere and are processed? In the former case MySQL is best. My data is in an Oracle database, is about a terabyte at this point, and gets pulled out by lots of people all the time. If the latter, a simple scripting language would be easiest. Your calculation doesn’t look all that difficult, and if you remember programming you can knock it off in half the time it would take to even read the documentation of a new system. I use Perl, but almost anything will do. Ten layers of nested IFs might be a big deal in Excel, but they are simple in any decent language, and you might find an easier solution with more programming power. Don’t worry about speed: I’m betting that your coding time is going to be greater than the sum of all the runs you will ever do.
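To give you an idea, one of your 10-deep nested IFs probably reduces to an ordinary function plus a loop. In Python, say, it might look something like this (I’m only guessing at what the IFs actually test):

# Hypothetical stand-in for one 10-level nested IF: map a value to a score
def score(value):
    # Each (limit, result) pair here is one level of the old nested IF
    cutoffs = [(10, 1), (20, 2), (30, 3), (40, 4), (50, 5),
               (60, 6), (70, 7), (80, 8), (90, 9)]
    for limit, result in cutoffs:
        if value < limit:
            return result
    return 10  # the final "else"

# One of the 3.5 million 10-number groups, plus the shared 40-number group
group = [3, 17, 42, 8, 55, 61, 90, 12, 33, 70]
common = list(range(1, 41))

# A made-up combining step, just to show the shape of the calculation
result = sum(score(v) for v in group) / sum(common)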

The 35 million pieces of data will need to be stored and extracted on demand for use. The values themselves are static and will not change. The data in the single 40-piece group is open to change. This data will be used in calculations that will produce 3.5 million new pieces of data that will be compared to one another. As the data in the 40-piece group changes, new calculations will be made and another 3.5 million pieces of data will be created and compared.

I will be the only person using this data and it doesn’t need to be available to anyone else.

Having poked around, learning about some of the avenues suggested to me in this thread, it’s clear to me now that it’ll be best for me to abandon the spreadsheet route and implement my plan using a database and programming.

Thanks for your help!