What is the best method of determining predictors of an outcome in this big data question?

First off, this is not a homework question. I’m a business owner. I have data on 10,000 unique customers spanning 3 years. Every time someone comes to my store, they have the option of buying any combination of 15 items. I have data on whether or not they purchased each one of the 15 items or not for each time they visited, and how much they spent on each item.

I want to find some “predictor” that foretells what’s going to happen to sales of Item Y based on shifts in Items A, B, C… and/or N. What is the best method of determining a predictor or gaining some insight into behavior? Ideally, I’d be looking for something along the lines of “when a customer starts spending less on (or stops buying) Item B’s, it is a sign that they will also start spending less on (or stop buying) Item Y’s in the future.”

My gut says that a Neural Network analysis is what I want, but I want to see what others think is the best way to make sense of this data.

Neural networks would definitely be one way to go. They have the advantage that they can model a lot of complicated relationships, and don’t require a whole lot of insight into the problem before you sent them up but they have the disadvantage that they are pretty much black boxes as far as determining the specifics of the modeling. The network will tell you that a given patron is likely to stop buying B, but it won’t tell you why. If you want something a little less opaque, you may want to look at a logistic or other [generalized regression](Generalized linear model). This will make it much easier to determine the relationship between your variables and a customer response.

In either case, depending on how many variables you are putting into the model you may need to be careful of over-fitting. If there are just a few variables then you should be fine, but if you start having on the order of dozens than your model space may be so large that even 10,000 observations might be too few. To make sure your model is performing well, do some cross-validation training the model on a subset (say 90%) of the data and seeing how it performs on the remaining samples.

I think what you’re looking for is Affinity Analysis or Market Basket Analysis; neural networks are one way of doing the actual computations that you need to do. I’m not aware of any free tools that will allow you to do this, and the actual calculations can be complex and very sensitive to various parameters.

Writing your own neural network based MBA model could be a challenging but fascinating way to do this, or you might want to purchase a commercial product (perhaps with a free trial) to see if that suits your needs better.

Thanks to both of you. This is very helpful.