Statistical/mathematical method to find useful tests

I’ve got a software testing system with hundreds of individual tests (and the number is regularly growing). In total the tests take many hours to run, and I’d like to come up with a subset of tests that takes much less time, but still gives me good coverage in case something breaks. The tests have two possible results: Pass and Fail.

I have historical results of many runs of the tests, so I’m looking for a way to analyze them to determine which tests would be the best ones to put in this subset.

In many cases, the tests overlap quite a bit. For example, if I have three tests, A, B, and C, I might find that most of the time when test A fails, test B also fails, but test C tends to fail independently. In that case, I could probably get away with running just test A or test B, not both of them.
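To make that concrete, here's a rough sketch (Python, with a made-up toy history matrix; the names and values are just placeholders) of how that overlap could be measured as pairwise correlation between the fail/pass columns of the historical results:

```python
import numpy as np

# Hypothetical history: one row per historical run, one column per test,
# 1 = the test failed in that run, 0 = it passed.
history = np.array([
    [1, 1, 0],   # run 1: A and B failed together, C passed
    [1, 1, 1],   # run 2: everything failed
    [0, 0, 1],   # run 3: only C failed
    [0, 0, 0],   # run 4: everything passed
])
test_names = ["A", "B", "C"]

# Pairwise correlation (phi coefficient) between the fail/pass columns.
# A value near +1 means two tests nearly always fail together, so running
# both adds little information.
corr = np.corrcoef(history, rowvar=False)

for i in range(len(test_names)):
    for j in range(i + 1, len(test_names)):
        print(f"{test_names[i]} vs {test_names[j]}: {corr[i, j]:+.2f}")
```

With that toy data, A vs B comes out at +1.00 while A vs C and B vs C come out at 0, which is exactly the "A and B are redundant, C isn't" situation I described.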

I’m thinking that there’s some kind of clustering or dimensionality-reduction approach to this that will give me a number (or possibly a set of N numbers for N dimensions) for each test indicating how effective/predictive it is, and then I can select the most predictive subset of tests that fits within my smaller time budget.
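As a sketch of the kind of thing I'm imagining (not claiming this is the right algorithm, just one plausible approach): cluster tests whose historical failure patterns are highly correlated, then keep one cheap representative per cluster. The `history`, `test_names`, and `runtimes` values below are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def pick_subset(history, test_names, runtimes, n_clusters):
    """Group tests whose historical fail patterns are highly correlated,
    then keep the cheapest test from each group.

    history    : (runs x tests) array of 0/1 fail indicators
    test_names : one name per column of history
    runtimes   : per-test runtime in seconds
    n_clusters : how many representative tests to keep
    """
    corr = np.corrcoef(history, rowvar=False)
    corr = np.nan_to_num(corr)           # tests that never fail have no variance
    dist = 1.0 - np.abs(corr)            # strongly correlated tests are "close"
    np.fill_diagonal(dist, 0.0)

    # Hierarchical clustering on the condensed distance matrix.
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    subset = []
    for cluster_id in np.unique(labels):
        members = np.where(labels == cluster_id)[0]
        cheapest = min(members, key=lambda i: runtimes[i])
        subset.append(test_names[cheapest])
    return subset

# Toy example: A and B always fail together, C fails on its own.
history = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])
print(pick_subset(history, ["A", "B", "C"], [30, 5, 60], n_clusters=2))
# -> ['B', 'C']  (B stands in for the A/B cluster because it's cheaper)
```

The number of clusters would really be driven by the time budget rather than picked by hand, but hopefully this gets across the shape of what I'm after.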

Hopefully, although I haven’t quite used the right terminology, I’ve explained the problem well enough that someone can point me at a formula or Wikipedia page that can get me started. Thanks :slight_smile:

It was almost three years ago that you asked an almost identical question. You said you liked my suggestion. I guess it didn’t work out though. Did you try to understand, using manual analysis, where the algorithm was going wrong?

Ha! Sorry, no, the project just got shelved and I didn’t remember asking about it. I’ll go read that thread again and report back.

The OP realized he had posted this before and asked that it be closed.

samclem, moderator