September 12, 2011
I have a question whose answer will help inform a Web page about crossword puzzles (which, in case you’re interested, starts at Crossword Monograph) and will also satisfy my curiosity.
This is not an emergency. For example, if Ben Bernanke also has a question for you, do him first.
My grade of C in Business Stats at the University of Kansas back in 1975 hasn’t helped. I’ve tried to noodle out this problem several times over the last few years, and I’ve asked people who thought they knew, but I still don’t have a defensible answer, so now I’m turning to experts for help. If you don’t have an answer either, can you recommend where I should turn next?
========
Consider a list of 100 data points (they happen to be my finishing times for playing crossword puzzles) sorted from earliest to latest. Assume the trend is obviously if not steeply upward, i.e., on average I’m getting a little slower over time.
Now consider that a contiguous block of 20 of the data points is in fact missing. For example, I have the values for 1 through 35 and I have the values for 56 through 100, but I do not have the values for 36 through 55.
What is the most mathematically sensible way to fill in those 20 missing data?
========
It occurs to me that there are two possibilities. In the simpler one, a single value is calculated and used for all 20 of the missing data points. If that’s the better possibility, how do I calculate that value?
The other possibility, unlike the one above, would somehow account for the obvious upward trend by filling in the missing 20 values with a series of different values based on the known values before and after the gap. If this is the better possibility, how do I calculate those missing values?
If it’s neither of these, what is it? To repeat, what is the most defensible way of somehow filling in those 20 missing data, and what is the rationale?
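To make the two possibilities concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the times are made up (a perfectly straight upward drift, not my real data), the function names are hypothetical, the single value is taken to be the plain mean of the known points, and the trend is taken to be a straight least-squares line. Neither choice is offered as the defensible answer, only as one way each possibility might be computed.

```python
# Made-up finishing times (minutes) with a slight upward drift; NOT real data.
# Points 36 through 55 (list indices 35..54) are missing, marked as None.
times = [10.0 + 0.05 * i for i in range(100)]
for i in range(35, 55):
    times[i] = None


def fill_constant(values):
    """Possibility 1: use one value, the mean of the known points, for every gap slot."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def fill_linear_trend(values):
    """Possibility 2: fit a least-squares line to the known points
    and read the missing values off that line."""
    pts = [(i, v) for i, v in enumerate(values) if v is not None]
    n = len(pts)
    mean_x = sum(i for i, _ in pts) / n
    mean_y = sum(v for _, v in pts) / n
    sxx = sum((i - mean_x) ** 2 for i, _ in pts)
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * i if v is None else v
            for i, v in enumerate(values)]


const_filled = fill_constant(times)
trend_filled = fill_linear_trend(times)

# Compare what each approach puts at the start and end of the gap.
print(const_filled[35], const_filled[54])
print(trend_filled[35], trend_filled[54])
```

With the first function every missing slot gets the same number, so the filled-in stretch is flat; with the second the filled-in values rise across the gap along with the fitted line. Both leave the known values untouched.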
(The real-world example of this problem, in case you’re interested, can be seen in Excel 2000 format at http://barelybad.com/xwd_times_194.xls. The gap starts at Row 173 and continues to Row 204. You’ll see it’s not one list of times but seven, one column for each day of the week. You’ll also find a second gap starting at Row 360, but I assume whatever answer you give for the first gap will apply to the second one.)
Thanks for any ideas you can offer.
–Johnny
barelybad.com (Laugh Think)