I have an Excel 2007 workbook (spreadsheet) file with 166,000 rows and about 30 columns. The spreadsheet is built row by row using VBA code and contains permutations of different words that are taken from a different tab (worksheet) of the spreadsheet file. When I work on the workbook, I frequently have to paste one or more columns which takes a long time and I often get out of memory errors.
The separate worksheets use formulas that leverage the Concatenate or the RAND function.
The whole purpose of the workbook is to generate a tab-delimited file which gets read by another program.
The workbook is painful to work on. The VBA script sometimes freezes, and I’ve had to cut down on the columns to make this thing semi workable. When I calculate the formulas (which I have set to do manually) it takes 10+ minutes.
I run this on a Lenovo T61 laptop with 4GB of memory and Windows XP. What options do I have to make this more bearable?
I’ve thought about buying a 64 bit laptop with more memory and running Excel 2010 Beta. I’ve read performance is improved on that. Are there other options? Is there something else I can port it to that would give me similar functionality that wouldn’t require a lot of re-writing?
What about MS Access? Would I have to spend a lot of time re-writing my VBA scripts?
Can you provide a specific example of what you’re trying to do, including sample formulas?
Speaking more generally, it sounds like you have a sheet of source data and a giant sheet of formulas that references the source data. You tell your sheet of formulas to Recalculate, it (eventually) populates all the cells, and you Save As a text file. Is this at all accurate? If so, rather than doing this in Excel, can you write a script in your language of choice, have it read the source data into a 2d array, and perform the manipulations to spit out a properly-formatted text file? This way, you bypass Excel entirely. Is there some advantage you’re hoping to leverage by doing it in Excel/VBA?
A specific example of what you’re trying to do would be very useful.
Hi, basically the formula takes nouns (occasions), numbers, and adjectives and creates permutations and text strings. In the spreadsheet I have the
As an example, if I had the years 1-50, the nouns “paper airplane” and “tissue paper”, and the adjectives “wet”, “moist”, and fluffy, the script would write out records that said,
1st wet paper airplane
2nd wet paper airplane
3rd wet paper airplane
1st fluffy toilet paper
etc., etc.
All the permutations taken together equal about 166,000.
The VBA script below gives you a sense of how the rows are built. The base values are stored in other tabs of the worksheet.
Yes, that’s pretty accurate, except I have part of the spreadsheet built by the VBA script below, and another part is calculated by formulas that take other text snippets and strings them together in a random way.
Yes, I could do that. Except I have considerable sweat equity in the current method, and I suspect it would take me weeks to learn how to write a script that would do all this. I’m not that familiar with Access, but I downloaded the 2010 Beta and have started to import the separate worksheets into tables, and see if that gets me anywhere…
The VBA script is too long to post (over the 20000 character limit, but I’ve included a snippet below…
Sub create_WM_spreadsheet()
Application.ScreenUpdating = False
Dim Year As Range
Dim YearLookup As Integer
Dim YearNumber As Range
<SNIP>
Worksheets("Year").Activate
Set YearRangePrefix = Range("A1", Range("B1").End(xlDown))
Set YearRange = Range("A1", Range("A1").End(xlDown))
Worksheets("Adjective").Activate
Set AdjectiveRange = Range("A1", Range("A1").End(xlDown))
Worksheets("BDay_Anniv_LastWords").Activate
Set BDay_Anniv_LastWordsRange = Range("A1", Range("A1").End(xlDown))
Worksheets("Descriptions").Activate
Set DescriptionsRange = Range("A1", Range("A1").End(xlDown))
Worksheets("Occasion").Activate
Set OccasionRange = Range("A1", Range("A1").End(xlDown))
Worksheets("Sheet1").Activate
' Build Spreadsheet Headers
Worksheets("Sheet1").Activate
Range("Sheet1!A1").Select
ActiveCell.FormulaR1C1 = "Campaign"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Occasion"
ActiveCell.Offset(0, 1).Select
<SNIP>
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Full-Image-Path"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy Logo"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy Events"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 2"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 3"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 4"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 5"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 6"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 7"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 8"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 9"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 10"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 11"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 12"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "Image Copy 13"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "AQ URL"
ActiveCell.Offset(0, 1).Select
ActiveCell.FormulaR1C1 = "#!/bin/sh"
ActiveCell.Offset(0, 1).Select
Range("A2").Select
For Each Occasion In OccasionRange.Cells
For Each BDay_Anniv_LastWords In BDay_Anniv_LastWordsRange.Cells
For Each Adjective In AdjectiveRange.Cells
For Each Year In YearRange.Cells
YearVariable = Year
Set YearPrefix = Year.Offset(0, 1)
Set FirstLowerCase = Year.Offset(0, 2)
Set OneLowerCase = Year.Offset(0, 3)
Set FirstProperCase = Year.Offset(0, 4)
Set OneProperCase = Year.Offset(0, 5)
' If Occasion = "Anniversary" And Year > 60 Then GoTo EndAnniversary
Directory = Replace(Adjective & "/" & Occasion & "/" & BDay_Anniv_LastWords & "/" & Year & YearPrefix & "/", " ", "-") ' Directory
BaseFileName = Replace(Adjective & "-" & Year & YearPrefix & "-" & Occasion & "-" & BDay_Anniv_LastWords, " ", "-") ' BaseFileName
ActiveCell.Formula = Adjective ' Campaign
ActiveCell.Offset(0, 1).Select
ActiveCell.Formula = Occasion ' Occasion
ActiveCell.Offset(0, 1).Select
ActiveCell.Formula = LCase(Occasion) ' OccasionLowerCase
ActiveCell.Offset(0, 1).Select
ActiveCell.Formula = BDay_Anniv_LastWords ' Ad Group
ActiveCell.Offset(0, 1).Select
ActiveCell.Formula = Occasion & " " & BDay_Anniv_LastWords ' WMIndexBreak
ActiveCell.Offset(0, 1).Select
ActiveCell.Formula = Replace("Sitemap" & "-" & Occasion & " " & BDay_Anniv_LastWords, " ", "-") ' WMIndexFileName
ActiveCell.Offset(0, 1).Select
ActiveCell.Formula = "{KeyWord:" & Adjective & " " & Year & YearPrefix & " " & Occasion & " " & BDay_Anniv_LastWords & "}" ' Headline with Keyword
<SNIP>
Next
' Next
Next
End Sub
10 minutes? You lucky dog, I’ve had to run Excel calculations overnight. One of them was still running when I got back to work the next morning. I’m going in tonight to restart another report. The 8GB RAM, 900+MHz monster computer I’m getting soon (I hope) will end those problems, with any luck. But then my problems are more with sets of hyperlinked fields and a slow proprietary database program; the first will be fixed when I learn enough SQL to render the hyperlinks unnecessary, but the second we’re stuck with until I can write a new database program myself, and if that ever happened…well anyway.
To be honest, I’m not 100% sure a VBscript would run any faster. I don’t see anything obvious that would help your script to run more efficiently (although as you say you haven’t included everything…I’m sure you had to set the Application.ScreenUpdating back to True at the end of the sub, for example). Problem is, there’s just no getting around the fact that your program is making close to 166,000*30 RAND calls. That’s going to eat up a lot of time no matter how efficient your coding is. Maybe you could find some way to seed a pseudo-random number? For example, could you set it in some way to seed to the internal clock–really old-school computing, I know, but perhaps it would work a little faster?
Other than a faster computer (and maybe Excel 2010 Beta…I’m going to go to that soon myself) I’m not sure what else to suggest.
First, upgrade your o/s to win7. Win XP only uses 1 gb ram regardless of how much is installed. Win7 (and Vista) allows excel to access all the o/s addressable RAM.
excel2010 is also faster, 64 bit enabled, has more columns & rows.
If you’re seriously looking for horsepower, there is also HPC powered excel and SQL powered excel (aka PowerPivot). Investment banking traders for example use these.
Yeah, the iterations are going to get you no matter what. It might be possible to script this out so that it commits certain arrays to disk periodically, thereby avoiding filling up RAM (if RAM is even an issue, which it may not be). However, if Rnd() / RAND is being called at certain points and we need to check for duplicates, this could get tricky in some places. And things are still going to be CPU-bound, so at a minimum more processing power is needed to make the routine execute faster.
OP, what’s your hardware upgrade budget looking like?
PowerPivot is usable for Excel 2010, so I’d highly recommend it if the OP does upgrade to Office 2010 Beta. But, as Caldazar says, that’s not going to surmount the main problem, which is the millions of RAND calls.
I think it would be easy to do what you are doing with SAS. Just yesterday I was creating random combinations of five variables, some discrete and some continuous, and calculating two results,with a total of about 25 or so intermediate variables, and 200,000 observations (in Excel that would be 25 columns and 200,000 rows). Each one would run a minute or two on my 2 year old laptop. I’ve run narrower datasets with a few million observations, and datasets with thousands of variables and a hundred or so observations, no problem. I can’t imagine trying to make Excel do that sort of thing (though I admit I don’t know much about Excel and can’t really use it, as it never seemed worth learning). When did they get rid of the 64,000 row limit?
I understand you have the investment in work till now and it would be extra effort to learn something new. Still, SAS would make this a cinch.
I only heard about R recently and haven’t used it, but think it too would do this just fine. It’s free and Wikipedia says something like “R has become the default tool for statisticians”.
Cite? 32bit Win XP will in theory address 2^32-1 Mb of RAM (4gb), but various other components (onboard video etc) will use up some, usually leaving you with about 3.5Gb. See this page from Microsoft.
Over the past day or so I downloaded Access 2010 Beta. It seems like that will do the trick. I can modify my VBA script, and use query’s to list all the columns I need downloaded and perform on-the-fly randomization. I’m sure SAS and some of the other tools will work, but this allows me to tweak my existing VBA scipt.
I may still run into a roadblock, but thanks for all the advice!
That sounds like a trivial program to write in Perl, and it would scale probably at least 2 more orders of magnitude on a small computer by today’s standards, without any memory issues at all, as there would be no need to keep it all in memory at the same time.