Need help with Excel. Reward involved.

Last year, I built an Excel spreadsheet for the purpose of picking football games against the point spread according to a certain formula. The spreadsheet is based on the advanced statistics from Football Outsiders. (Incidentally, FO is the best football site on the internet by a huge margin – in the near future, all serious football fans will need at least a passing familiarity with their work).

For each game, the spreadsheet uses the respective DVOA numbers of the two teams, along with several other variables, to determine what the point spread “should” be. It then compares that value to the actual point spread to generate a pick for the game.

I am currently in the process of testing and refining the formula to improve accuracy. Since the spreadsheet has now logged picks for over 800 games (going back to 2003), manually tabulating results is no longer an option – if I’m going test changes to the formula, the spreadsheet must do this automatically.

For most types of results (overall success rate, correlation coefficients, etc.) this is not a problem. However, I can’t figure out a way to get the spreadsheet to calculate automatically the single most important measure of its success: its performance specifically in the games for which it returns a strong opinion.

I’ll now give a quick explanation of how the relevant parts of the spreadsheet are set up, and, hopefully, someone here can tell me how to solve my problem. The first person (or persons) to come up with a workable solution will get my sincere gratitude, as well as a copy of the spreadsheet for personal use only – the system is indeed profitable, and I am in the (very) early stages of working out a partnership with FO, so whoever gets the spreadsheet would have to promise not to share it (or its picks) with anyone else.
Ok, on to the details:

The spreadsheet calculates its own point spread, then simply subtracts from it the actual point spread. If the resulting Difference is positive, then the system is picking the favorite – if negative, the underdog. The larger the Difference, the stronger the system feels about the game. Once the game is over, the spreadsheet compares its pick to the score and generates a TRUE/FALSE statement indicating whether or not it picked correctly.

When the Difference between the actual point spread and the system’s point spread is close to zero, the result is (for my purposes) irrelevant. I’m only interested in games for which the Difference is above a certain threshold – say, for instance, three points. What I need is for the spreadsheet to automatically separate (or duplicate) the results for games where the difference is greater than 3 or less than -3, and then generate TRUE/FALSE statements for only those games in a separate column. That way, when I change certain variables I can see immediately how the changes affect the system’s meaningful picks.
I apologize for the length of this post. Thanks in advance to all those who give my problem some thought.
– Jeremy

Try this.

suppose your results are in column E and you want the TRUE/FALSE to be in column F.
Put the following formula in E1 and copy down the row.

=IF((E1>3),TRUE,IF((E1<-3),FALSE,""))

This guy seems to be quite the Excel black belt - our very own intention

Peter Morris is on the right track, but I think he’s misinterpreted the question.

I assume what you have is a column (let’s say column E) of “Difference” and another column (let’s say column F) of TRUE/FALSE statements indicating a correct or incorrect pick. I think what you want is:

=IF(ABS(E2)>=3,F2,"")

That would copy over the TRUE/FALSE statements from column F only if the difference in column E is greater than or equal to three, and leave the cell blank otherwise.

(Assuming your TRUE/FALSE statements are logical TRUE/FALSE statements, you can then take an average of the final column, too, to optimize your formula.)

zut, thanks much, that’s helpful – I didn’t know how to refer to a cell’s absolute value. In retrospect, I should have tried googling that first. (Yep, googling “excel absolute value” yields the answer in the first link.)

Unfortunately, I’m not done: when taking the average of the new column’s TRUE/FALSE statements, Excel is counting the empty cells as if they were FALSE. Any way to get around this? How do you get an IF function to leave a cell truly blank if the test is FALSE?

That was weird. I swear I posted this an hour ago, but it didn’t show up. Luckily it’s still in my cache:

Sorry about replying to myself, but apparently this isn’t true. Bizzarely, if you use the SUM or AVERAGE command on a column of logical TRUE/ FALSE statements, Excel interprets all the TRUE/ FALSE statements as zeroes, giving a sum of zero and an undefined average. If, instead, you add them one at a time (=F2+F3+F4+F5…), Excel treats the logical statements as one for TRUE and zero for FALSE and reports a numerical sum correctly. I find that strange.

Anyway, if I understand your intention correctly, I would suggest changing your logical TRUE/FALSE statements at some point into ones and zeroes, so you can easily calculate how your expected winning percentage on high-difference picks changes as you tweak your formula. For example, this:

=IF(ABS(E2)>=3,IF(F2=TRUE,1,0),"")

should give a blank for low-difference picks, and give 1’s and 0’s on place of the logical TRUE and FALSE statements. Then you can take an average of the column.

If that helps at all.

If you’re taking the average, then you’re the one counting wrong (don’t worry, I’ll explain). Excel will add up all of the TRUEs and count each one as 1. So if your “success when more than 3 points are at stake” is column G, you’d want to calculate

=(SUM(G1:G9000)/ the_number_of_filled_cells)

You’re probably dividing by the total number of games, which is an irrelevant number. Try calculating

=(SUM(G1:G9000)/COUNT(G1:G9000))

and you’ll get your percentage score. A more sophisticated spreadsheet would calculate what you would have won or lost by wagering a dollar on each of those games, and divide your net gains by the number of games wagered on. If you calculate this for each system under consideration, you’ll discover what percentage you can make on a given amount of money. If you find something that beats 10% in the long run, let me know.

Edit: you say SUM() doesn’t work on t/f? I agree - convert to 1’s and 0’s and then use traditional math.

Excel shouldn’t be counting empty cells as if they were FALSE. I guess the answer depends on just what you’re doing. Note my post above that the AVERAGE function seems to not work on logical values.

However, if you have a mix of 1’s, 0’s, and blank cells, Excel’s AVERAGE function should work correctly, ignoring the blank spaces. If I understand your intention correctly, that’s what I’d do: convert the TRUEs and FALSEs back to 1’s and 0’s and go from there.

Right, but Excel also has an “AVERAGEA” command, which gives the mean of a series of T/F statements. That’s what I’ve been using to tally overall success, but it doesn’t work for a column created in the way you described in Post #4.

Yep, that works perfectly. Thanks very much. I’ll send along the spreadsheet if you’re interested, probably in a day or two.

[QUOTE=VarlosZ when taking the average of the new column’s TRUE/FALSE statements, Excel is counting the empty cells as if they were FALSE. Any way to get around this? How do you get an IF function to leave a cell truly blank if the test is FALSE?[/QUOTE]

maybe something like this:
=IF(ISBLANK(B1),"",IF(ABS(B1)>=3,TRUE(),FALSE()))

I did not notice how old this post was - I’m really, very sorry.