I have a rather large spread sheet (50k lines) and Excel keeps chewing up large amounts of memory (and I get the spinny circle) when I try to use: “=COUNTIF($A$2:A2,A2)”. It works eventually but God forbid I then try to delete out rows and let it recalc. I’m hoping to just get everything arranged, use the countif formula, then “Copy → Paste Values” in that column - but if I need to change anything it won’t recalc (duh) - but I guess I can always redo the formula and copy->paste values again.
Is there another formula that might work? Situation:
Column A is Product IDs (could be multiple values of same PID). Column B is Version Number. I want to put this formula into Column C to get ranked values (1 through X) of the Version Number for each PID (spreadsheet is already sorted by PID then a date column to get the proper order)
I have the same situation w/ Excel when I need to do analysis on a bunch of data. I have concluded that it’s because Excel is simply not good at handling large datasets when using a standard PC. When dealing with large datasets, a programming environment that is able to efficiently perform calculations on matrices is a much better choice (e.g. MATLAB).
That is a stunningly inefficient way to approach your problem.
Assuming I correctly understand that all the rows with a common PID are grouped together, I suggest instead a column C formula like this assuming your data begins in A2. =IF(A1=A2, C1+1, 1)
Then simply copy that formula from C2 all the way to the bottom of your table. The formula amounts to: count upwards by 1 from the value in the C cell just above, but reset the count to start over at 1 whenever the PID has just changed versus the row above.
I just tested this on a 50K row worksheet on my Windows tablet and the recalc time is an almost indiscernable fraction of a second.
If you do add or delete rows in the middle, you’ll need to recopy this formula from C2 all the way to the bottom to continue to get valid results. Which is itself just a couple of keystrokes: select cell C2, then Ctrl-C then hold down [shift] and type [End][DownArrow] Crl-V and you’re done.
None of this ought to take any time unless your PC or your version of Excel dates from the 1990s. There’s certainly no need to configure your worksheet to recalc manually for acceptable performance.
I went ahead and did your way in column D in addition to my way in column C. On my tablet with current version Excel 365 a full recalc takes about 8 seconds. The real problem with your way is that down near the bottom of the sheet it’s checking the PIDs of e.g. 49,000 rows above to find the last e.g. half-dozen that match the current PID.
You’re doing roughly 50,000 * 50,000 / 2 = 1.25 billion comparisons when you only really need 50,000 doing it my way.
Yep, that did the trick. Thanks. In fairness, it wasn’t “my way”. I was completely stumped how to accomplish this and googled it awhile back and that was the only solution I could find. As you correctly surmised (and I realized after posting this and looking at it closer), it is the solution for if your PIDs are not grouped together and it has to search thru 50k rows.
Thanks for this, too. This could have been a decent workaround if LSLGuy hadn’t a provided a much more elegant solution. I completely spaced on the manual calculation option
It’s not really a job for a flat-file format like a spreadsheet.
After a lot of up close and personal experience with Access, I understand why people whose only database (and/or only database experience) is with Access don’t leap at the chance to do it in Access instead of Excel. But I would totally reach for FileMaker. Doing stuff in FileMaker is as easy as doing things (the normal flat-file kinds of everyday things) in Excel, unlike doing the equivalent in Access, which is like pulling your own teeth with very bad-tasting pliers.