Last year, I built an Excel spreadsheet for the purpose of picking football games against the point spread according to a certain formula. The spreadsheet is based on the advanced statistics from Football Outsiders. (Incidentally, FO is the best football site on the internet by a huge margin – in the near future, all serious football fans will need at least a passing familiarity with their work).
For each game, the spreadsheet uses the respective DVOA numbers of the two teams, along with several other variables, to determine what the point spread “should” be. It then compares that value to the actual point spread to generate a pick for the game.
I am currently in the process of testing and refining the formula to improve accuracy. Since the spreadsheet has now logged picks for over 800 games (going back to 2003), manually tabulating results is no longer an option – if I’m going test changes to the formula, the spreadsheet must do this automatically.
For most types of results (overall success rate, correlation coefficients, etc.) this is not a problem. However, I can’t figure out a way to get the spreadsheet to calculate automatically the single most important measure of its success: its performance specifically in the games for which it returns a strong opinion.
I’ll now give a quick explanation of how the relevant parts of the spreadsheet are set up, and, hopefully, someone here can tell me how to solve my problem. The first person (or persons) to come up with a workable solution will get my sincere gratitude, as well as a copy of the spreadsheet for personal use only – the system is indeed profitable, and I am in the (very) early stages of working out a partnership with FO, so whoever gets the spreadsheet would have to promise not to share it (or its picks) with anyone else.
Ok, on to the details:
The spreadsheet calculates its own point spread, then simply subtracts from it the actual point spread. If the resulting Difference is positive, then the system is picking the favorite – if negative, the underdog. The larger the Difference, the stronger the system feels about the game. Once the game is over, the spreadsheet compares its pick to the score and generates a TRUE/FALSE statement indicating whether or not it picked correctly.
When the Difference between the actual point spread and the system’s point spread is close to zero, the result is (for my purposes) irrelevant. I’m only interested in games for which the Difference is above a certain threshold – say, for instance, three points. What I need is for the spreadsheet to automatically separate (or duplicate) the results for games where the difference is greater than 3 or less than -3, and then generate TRUE/FALSE statements for only those games in a separate column. That way, when I change certain variables I can see immediately how the changes affect the system’s meaningful picks.
I apologize for the length of this post. Thanks in advance to all those who give my problem some thought.
– Jeremy