Excel: Which products did customers *not* buy?

Let’s pretend I sell treatments for various conditions. I have lots of data on what customers purchased in the past year. I sell multiple products to help treat each condition. And the more products you use, the better your results. But some customers will only purchase one or two of my products for a given disease. I’d like to quickly identify customers who did not purchase the full suite so I can market the rest of the relevant products to them. I don’t want to waste effort marketing products for conditions they don’t need to treat.

Here are some fake data on what people purchased:

Customer Product
Joe p1
Joe p2
Joe p3
Joe p6
Bob p4
Bob p5
Bob p7

And here are the products used to treat various conditions:

Condition Product
malingering p1
malingering p2
malingering p3
malingering p4
itchy teeth p5
itchy teeth p6
halitosis p7
halitosis p8

I’d like some sort of output that shows me that Joe skipped p4 and p5, and that Bob skipped p1, p2, p3, p6, and p8. I don’t care that Joe sought no product for halitosis, because those data would overwhelm everything else.

I feel like solutions might either involve an array FILTER if I can figure out how to generate the correct arrays, or a merge in power query that gives me useful nulls. But I’m stumped.

Any ideas?

Have you tried selecting the range and adding a pivot table?

Yeah, PivotTables are the answer to doing this quickly and effectively. They may seem arcane when you first try to use this functionality but they really simplify data manipulation and summary statistics versus trying to do all of this manually with filters and formulas.

Stranger

Planning long term. it may be worth importing into MySQL.

You’ll have more options to create reports. You can even use a report tool like Crystal Reports to run against the database.

Once setup, you can quickly create reports in Crystal. You start by selecting and linking the tables. You can group items, create subtotals by group. It’s data selection is very versatile. I used it nearly every day at work for several years.

It depends on your reporting needs.

That’s the drum!

Crystal Reports is obsolete. Power Query is built into Excel and can do all of this, or use Power BI Desktop which is also free.

Anything in particular you have in mind? I pivot daily but I’m not seeing anything obvious here. It’s easy to get this:

Row Labels
Joe
itchy teeth
p6
malingering
p1
p2
p3
Bob
halitosis
p7
itchy teeth
p5
malingering
p4
Grand Total

Which is what they bought. But I don’t know how to show something more along the lines of:

Row Labels
Joe
itchy teeth
p5
malingering
p4
Bob
halitosis
p8
itchy teeth
p6
malingering
p1
p2
p3
Grand Total

Which is what they didn’t buy for conditions they were treating.

You should be able to do a count by product and organize by customer (or vice versa, since you are interested in which customers bought what products) and then you can distinguish between the customers who just bought one or two products, and those who bought most or all.

I don’t have Excel in front of me for reference and I mostly do data analysis like this in Pandas these days (which has its own pivot table functionality) unless I’m building a spreadsheet for someone else to use but you should be able to generate a summary table of what person purchased which products and of how many using Excel’s PivotTable function. You may have to use Add Measure or create another column in your spreadsheet that is a joint conditional to link the product and condition being treated; there are multiple ways to skin that cat and unfortunately Excel doesn’t have anything like Python’s lambda function so it’ll be a little clunky but you should be able to work it out.

Stranger

I tested this out in Google Sheets rather than Excel, but I think they should be equivalent here.

Say you have two sheets already, CustomerMap and ProductMap (data starts at row 2).

Make a third sheet, CustomerToConditions, with A2 containing:
=UNIQUE(CustomerMap!A2:A)

And B2 containing:
=TRANSPOSE(UNIQUE(FILTER(ProductMap!A$2:A, MATCH(ProductMap!B$2:B, FILTER(CustomerMap!B$2:B, CustomerMap!A$2:A=A2), 0))))

Extend B2 down to all rows. That gets you all conditions for each customer stretched horizontally.

Make a fourth sheet, CustomerToProducts. A2 contains:
=UNIQUE(CustomerToConditions!A2:A)

And B2 contains:
=TRANSPOSE(FILTER(ProductMap!B$2:B, ISNUMBER(MATCH(ProductMap!A$2:A, CustomerToConditions!B2:2, 0))))

Again extend B2 down to all rows. This contains all products suitable for treating that customer’s set of conditions.

Finally, a sheet with CustomerToProductsFiltered. A2 contains:
=UNIQUE(CustomerToConditions!A2:A)

And B2 onward has:
=FILTER(CustomerToProducts!B2:2, ISNA(MATCH(CustomerToProducts!B2:2, FILTER(CustomerMap!B2:B, CustomerMap!A2:A=A2), 0)))

This correctly shows that Joe should be suggested p4 and p5, while Bob needs p1, p2, p3, p6, and p8.

I think that shows that using Excel (or Google Sheets) this way requires a lot of complicated articulations to do basic data analysis, and also makes my head ache. But if I understand what the o.p. is trying todo, I’m pretty sure it can be done just using a PivotTable without all of this business of adding a bunch of extra worksheets. I know I could do this with less than ten lines of Python code including calling libraries and importing data but I know how much people hate hearing how easy it is to do some data mangling task in Python when they’re just trying to make Excel work the way it seems like it should.

Stranger

I mean, I’d use SQL instead, but yeah, a baseline spreadsheet program is not that ideal. I don’t know enough about PivotTables to say if there’s a solution there.

ETA: The solution above could undoubtedly be simplified by combining the intermediate steps, but for debugging purposes it’s useful to see the results, and there’s not much harm in just having extra sheets.

But to use SQL you’d first have to put this data in a database. Which, if you are doing this repeatedly and want to keep track of how it evolves over time, is the right solution but it comes with a lot of overhead if you just want to do a quick one-off assessment.

Stranger

Am I correct in the observation that you only know if a patient has a condition if they have purchased the medication? And that while they may have bought different treatments for the same condition, any treatment is used only for one condition?

In that case, in a pivot I’d use patients as rows, medications as columns and either count or sum purchases as the value

I think the o.p. is starting with records of individual sales and trying to summarize them into the kind of table you suggest building, which is what Excel’s PivotTable functionality (and pivot tables in general) do automagically. Of course, anything that can be done with a PivotTable can also be done through a set of cell formulae, macros, or processing data from one sheet into another as @Dr.Strangelove did above, but the point of a PivotTable is to make it easy to do and foolproof in just selecting the data and doing a few configurations.

Stranger

While I cannot currently use python and the real data on the same system, I may use python for the mock data here.

Correct.

For this contrived example, yes. For the actual scenario, which has nothing to do with treatments or sales, no. But I figured I’d start simple.

So this is a version of “if you liked products 1, 2, or 3, you may also like products A, B, and C (that you haven’t yet bought”?

Again recognizing the contrived nature of your products & sales example.

Looking at it slightly differently, each “product” is a member of a set of related products. And one product may be a member of multiple such sets, but that does not make the sets themselves transitive.

I think that’s a pretty big jump in complexity, though. If I understand correctly, in the actual problem, if you’ve got a “customer” associated with one “product”, then that might be a part of any of multiple category of “products”, and you can’t know which category they’re actually in. And even if they buy two, and there’s only one category that includes both of those products, they might still not be in that category, but just two different categories.