Eliminating blanks and condensing data in Excel

I have a set of data that’s a few columns and maybe 600 rows. Most of the rows are blank (at least in the section I’m interested in), and some of them contain duplicate information. Most of the columns contain a code or description of a product, and one of the columns is numerical, as it is the count of a certain product. I would like to be able to select all of the data, then create a condensed version that A) eliminates the blanks, B) adds up the count for products that are the same, and C) automatically updates. So for example if in the original it looked this:

A B C D 4

F G H I 7

A B C D 6

I would like to be able to automatically transform it into this:

A B C D 10
F G H I 7

Does anyone know of a (relatively) easy way to do this? I know the basics of excel, but I generally don’t use queries or more advanced functions, so I’m not sure if this is something I can do myself without wasting a bunch of time.

Use a Pivot table

Agreed that a PivotTable is exactly what the o.p. needs, although you do have to manually Refresh it when the source data changes. PivotTables are often described as some kind of special magic that only Sage Priesthood of the Cult of Excel can invoke but they are actually quite easy once you’ve spent a few minutes playing with them, and this is essentially the most trivial example of their usage. Just select the data, create a PivotTable in a new tab, and play around with the sorting keys.

However, it sounds as if Excel is being used as an ad hoc database, and if he or she wants to use it repeatedly it is probably advisable to look into using Access or some other database format. I have a labor hours tracker I set up in Excel for doing this kind of thing, although it is more complicated because it has to query by pay period, contract, employee, and charge number, and while it would have been almost trivial to do in a MySQL database using Python or Perl and an appropriate microwebserver framework, I ended up doing it in Excel so that the other people using it could directly cut and paste tables into their documents instead of having to seal with ‘raw’ CSV files. (The alternative was to build an OpenPyXL routine to dump out Excel formatted data but at that point it’s just spite-work at people forcing me to misuse Excel.) Anyway, it works well enough and the most fussy part are the tables for validation fields, which is not something the o.p. has to worry about.

Stranger

Based on the description, the primary problem the OP will have with using a pivot table will be the blank rows (and possibly blank columns).

Is it always
A B C D #
or are there some
A Q C D 7
or
A B _ L 2
or
A G H D 4
?

Basically, does A always go with B & C & D?
And if not:

  • is there ever going to be a blank in any of the columns?
  • if there are differences, do you want it only to add when all of the columns match, or when certain columns match

Also, you want the counts to update when the data updates - but how does the data get updated? is someone manually typing things in? is it getting updated automatically by a different program?

Finally, what do you mean by “the section you’re interested in”? how do you identify that from the section you’re not interested in?

He should be able to just filer out blanks. In fact, I just tried it with my labor spreadsheet and using “Recommended PivotTables” and it automagically filtered out blank lines without my even having to go in and uncheck ‘blanks’. This is probably a one-step task for the o.p if all he needs to to is filter out blank lines and summarize records with the same ID. Of course, he could also set up filters and sort/search/sum macros but that is all of the nonsesnse that PivotTables were made to eliminate. In fact, if I didn’t use Python and Pandas so much for data wrangling almost everything, I’d probably make more extensive use of PivotTables; they seem really slick, and I don’t say that about anything else in Excel. Apparently it can also reference an external data source, and as much as I hate cross-linking files, this is one application where it might make sense if this is data coming from an external file.

Stranger

True, but was more thinking of the OP either clicking on cell A1, launching a pivot table and finding that it’s default range is truncated at the first blank row and column.

The point being they don’t have the contiguous block of data (ie a database format) on which a pivot table depends, though it should not be difficult to do the workaround.

I don’t seem to have that problem, although I will note that a PivotTable requires that the first row have column titles to reference so if they aren’t in the data already the o.p. will have to add them.

Stranger

Thank you all very much - My googling was turning up some fairly complicated formula/macro based fixes, and the PivotTable seems much more my speed. My only (minor) issue would be that ideally I would like to display AB 4, rather than having it broken down by A/B individually (ie have the data broken down by column A, but have column B displayed because it’s a description of what A is). That’s a very minor quibble though, so if there’s not an easy fix anyone suggests I’ll leave it alone (and I’ll look up some PivotTable tutorials - I think they could be very useful for some other things as well).

I’ll agree with those saying that Excel is not an ideal solution for this sort of thing, but it’s a document that has a number of different requirements and elements, some of which are suitable for excel and some of which aren’t. And besides, it’s a collaborative document and I doubt I’ll be able to convince others to find a more suitable software.

Basically, does A always go with B & C & D?
And if not:

  • is there ever going to be a blank in any of the columns?
  • if there are differences, do you want it only to add when all of the columns match, or when certain columns match

A will always go with B, but C and D may change. If A is blank, the whole row will be blank. If A is filled out, the rest will be filled out as well. I would prefer to have it so that only A has to match.

Also, you want the counts to update when the data updates - but how does the data get updated? is someone manually typing things in? is it getting updated automatically by a different program?

Someone will be manually making a change in excel. I know that they could update it manually, but there are a number of users, some of whom are not very good with excel (even compared to me, who had never heard of a PivotTable), so I would prefer to keep it as simple as possible. There seems to be a fairly easy copy/paste macro based fix to this issue in PivotTables, so I may attempt that.

Finally, what do you mean by “the section you’re interested in”? how do you identify that from the section you’re not interested in?

The document pulls together data from a number of different departments; I’m interested in the data from a specific department. That data is in a specific place, color-coded, and identified by a header.

Add a helper column to your data table.
Use a formula like =A2&“-”&B2 to populate
Use that column in your Pivot table rather than the A & B source columns

Will the data from your department be interspersed with the data from other departments, but just color coded differently?
Would I ever see
Yellow: A B C D 5
Red : A B C D 7

on the same worksheet where the yellow matters but the red doesn’t? Or will the red (meaningless to you stuff) always be on a separate worksheet?

Also, do you know all of the first & second columns in already? Or might you wake up one morning and find an:
Φ Ω K E 7
in there?

To eliminate completely blank rows:

Insert a column (let’s say A)
autonumber this for every row.
(Put 1,2,in first two rows col A- highlight them and drag to fill the rest with sequence numbers.
Or use a formula - pout 1 in A1, then A2 is "=A1+1} and copy that all the way down to the end.

Then highlight column A, copy, paste into A with “Paste values” to convert to just number, not formula
All the column A are actual hard-coded numbers now, not relative formulas )

Now sort on all other columns - B, then C, then D, then E etc. descending.
With descending, all blank values are at the bottom if I remember my sorts.
All rows with B blank will be at the bottom.
For B blank, all rows with B and C blank will be at the bottom.
B,C,and D blank lower still.
Any row all blank (except A) will be at the bottom.
Delete these unwanted blank rows…
Now sort by A to get rows back to the original sequence.
For tidiness, delete column A.

1 a x s d f
3 b q x o
7 x o x o
12 o x
2 z
4 <===delete from here down
5
13
8
9
10
11

Well yes, that works, but seems to me rather unwieldy compared to:

  1. Turn on filters
  2. Select on columns A, B, C & D for blank
  3. Delete the blank rows.

Need to bear in mind the comment in the OP “(at least in the section I’m interested in)”
There may not be a contingent block of data, or that deleting some rows may not be a viable option. With filtering in the worksheet or the pivot it’s not necessary to delete.

I learned Excel before they started doing PivotTables, or maybe I simply wasn’t aware of them at the time. But everything I’ve heard since is exactly as you describe. They seem to be oh so cool, but I just can’t figure them out. Can anyone direct me to a good online tutorial? I don’t mean help screens and similar reference works, but an actual tutorial which will teach me to “type this, enter that, and then click here.” Thank you!

I don’t have an offhand reference because I just learned to use them by applying one to some data and playing around with it but you can find plenty of tutorials on line and videos on YouTube. Here is a step-by-step written tutorial that covers the basics. The entire point of a PivotTable is to simplify the sorting, filtering, and presentation of data, and although the term PivotTable is a Microsoft trademark pivot tables or pivot data tools have been around long before it was implemented in Excel; indeed, I remember something similar in Quattro Pro although I never bothered with using it.

Stranger

Are you trying to clean up a trove of data you’ve come in possession of?
Or make sense of an ongoing sheet (regularly updated) that must remain the way it is?
Or is it something generated regularly from a non-excel source?

I was in the same position and kind of avoided them until my company hired an idiot who really loved pivot tables and wouldn’t listen to new information unless it was accompanied by a pivot table and also constructed his badly so I had to go in and fix them. It’s an expensive way to learn excel, as it costs you portions of your sanity and your company a six figure salary + benefits, and some lost contracts and a damaged reputation (he really was an idiot and there’s only so much a fixed pivot table can do), but it was effective. The actual MS tutorial isn’t bad.

It turns out they are cool and can be really cool - but they have limits. The color coding the OP mentions may be one of the limits.

The o.p. doesn’t mention color coding, but overuse of PivotTables is a problem; some people think that they are the solution for every problem, when what they are really intended to be is a tool specifically for summarizing compound data into an easily manipulated table. There is nothing you can do with a PivotTable that cannot be done with filters and macros, and there are a lot of things that cannot be done with PivotTable functionality, like performing statistical analysis (although I would argue you shouldn’t be doing anything more than calculating a mean in Excel, anyway). But for what the o.p. is trying to do–take a collection of data and summarize it to count up quantities relating to specific indices–it is purpose-designed and would take just a few steps (applying the table to a data set, selecting the index and sort fields, perhaps checking or unchecking blanks or unwanted keys, and maybe having to apply headers to the data), and it is readily updated without breaking functionality which is always a problem when using cell formulas, conditional formatting, and manually applying filters and sorts to data.

Stranger

For this, and because the OP doesn’t control the data, neither he nor his teammates have very much excel experience, the number of records, ease of handling updates, I’d probably end up recommending SUMIFS rather than a pivot table. It’s dirtier, but faster and simpler, and still has the color coding problem, but I think it would be easier for the OP to set up and to copy-and-paste out of (or end up using data out of)

But the OP hasn’t returned for a while - I wonder what they decided.

My approach to eliminating blank lines is to copy and paste the data into Word as plain text, and then to do a search for ^p^p (two paragraph marks in sequence) and replacing that with ^p (a single paragraph mark).