Date puzzle

Yesterday, a computer program I wrote more than ten years ago ran into a snag. I won’t go into detail, but what tripped it up is the fact that two consecutive dates, when written as YYYYMMDD, accidentally formed a third date.

January 20, 2012, was written as 20120120.
January 21 became 20120121.
Joined together, they formed 2012012020120121, which, as you see, contained the string for February 1.

That messed up my program.

I could fix the program in two minutes, but I’d have to spend two days getting approval and filling out forms in order to change a program that’s been in production for years. And it’s not worth it if this is a once-in-a-lifetime occurrence.

So I need to figure out when, if ever, the juxtaposition of two dates in YYYYMMDD format will accidentally form a future date. The two dates don’t have to be consecutive, but they do have to be in order, and the third date has to follow both of them; and all three dates should fall within a twelve-month window.

If it’s not going to happen again for another twenty years, I’ll rest easy. I doubt the program will still be in use then, and even if it is, I’ll be retired.

I’m looking for either the next pair of dates that fit the criteria, or an algorithm for finding them. Any suggestions, other than brute force (i.e., actually checking every possible pair of dates between now and 2032)?
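For scale, brute force is cheaper than it sounds. With a twelve-month window there are at most a few hundred candidate second dates per first date, and only seven 8-character windows straddle the boundary of each pair. Below is a minimal Python sketch, not the OP's program. It assumes "twelve-month window" means the hidden date falls at most 365 days after the first date.

```python
from datetime import date, timedelta

# Assumption: "twelve-month window" means the hidden date is at most 365 days after the first.
WINDOW = timedelta(days=365)

def parse_yyyymmdd(s):
    """Return a date for an 8-digit YYYYMMDD string, or None if it is not a valid date."""
    try:
        return date(int(s[0:4]), int(s[4:6]), int(s[6:8]))
    except ValueError:
        return None

def find_collisions(start, end):
    """Yield (first, second, hidden) for every first date in [start, end] and every later
    second date such that first+second, written as YYYYMMDD and concatenated, contains a
    valid hidden date that falls after second and within WINDOW of first."""
    one_day = timedelta(days=1)
    first = start
    while first <= end:
        second = first + one_day
        while second - first < WINDOW:
            joined = first.strftime("%Y%m%d") + second.strftime("%Y%m%d")
            # Offsets 1..7 are the 8-character windows that straddle the two dates.
            for offset in range(1, 8):
                hidden = parse_yyyymmdd(joined[offset:offset + 8])
                if hidden is not None and second < hidden <= first + WINDOW:
                    yield first, second, hidden
            second += one_day
        first += one_day
```

For example, next(find_collisions(date(2012, 1, 20), date(2012, 12, 31))) returns the OP's case: January 20 and January 21, 2012, hiding February 1, 2012. To look for the next occurrence, start the search from the date of interest. A twenty-year scan covers a few million pairs, so expect it to take a while in pure Python.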

Nvm…

I don’t know the answer to your question but I am intrigued that a program could be parsing two dates written as 16 consecutive digits and then, arbitrarily in the middle of that string, suddenly find a valid date. Every program I’ve ever seen that reads data like this in ASCII strings depends either on delimiters or character positions. How could this program not care that it had to read past 3 valid digits, stop parsing that date, and then start parsing a new date starting in position 4?

I suspect that this behavior may also yield other types of bugs in other situations. My advice is to buckle under and fix the code, but I know that’s not your question.

Put another way, that strange parsing behavior (or something similar) might occur in any number of other places throughout the code. Fixing this one instance would fix just this one instance, while vaguely similar problems could still arise elsewhere in the program.

Here’s my story: I did 1st- and 2nd-tier customer support for a company with a software app. Very few programmers I’ve ever met, including the ones there, understood that decimal fractions typically can’t be represented exactly in a binary floating-point variable, no matter what precision you use. Throughout the app there were cases of comparing floating-point numbers and getting the wrong result. Their solution was to convert each float to a character string with the relevant number of decimal places (the convert-to-string library function rounds as it goes) and then compare the strings.
But this was happening in many, many places throughout the code, and they only fixed one instance at a time, as the bug reports trickled in over the years. I wonder whether the OP has a situation something like this.
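Here is a small Python illustration of both the problem and the string workaround. Python's f-string formatting stands in for whatever convert-to-string library function the app actually called; that substitution is an assumption.

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value stored for the literal 0.1.
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625

a = 0.1 + 0.2
b = 0.3
print(a == b)  # False: the two doubles differ in their last bits

# The workaround described above: round both to a fixed number of places as strings.
print(f"{a:.2f}" == f"{b:.2f}")  # True: both become "0.30"
```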

Here’s a programming exercise every programmer should try at least once in his/her career:

Code and run this program in as many languages as you can, correcting the syntax as necessary for each language (C/C++, Java, JavaScript, PHP, T-SQL, bash, perl, Fortran, Algol, Cobol, Pascal, Visual Basic, even bare-bones assembly languages). Change the print statements to display the output any way that’s convenient. Compare the results. Do all compilers and interpreters produce the same result? Can you explain what is happening?

#include <stdio.h>

int main(void) {
    float tenth, unity, addEmUp;
    tenth = 0.1f;
    unity = 1.0f;
    /* Ten additions of one-tenth: mathematically exactly 1. */
    addEmUp = tenth + tenth + tenth + tenth + tenth + tenth + tenth + tenth + tenth + tenth;
    if (addEmUp == unity) {
        printf("There is sanity in the world.\n");
    } else {
        printf("The gods must be crazy.\n");
    }
    return 0;
}
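For comparison, here is the same exercise in Python. Python's float is a C double, so this run uses double precision rather than the single-precision float declared above. math.fsum is included because it compensates for rounding during summation and returns the correctly rounded total.

```python
import math

tenth = 0.1
unity = 1.0
# Same left-to-right chain of additions as the exercise.
add_em_up = tenth + tenth + tenth + tenth + tenth + tenth + tenth + tenth + tenth + tenth

if add_em_up == unity:
    print("There is sanity in the world.")
else:
    print("The gods must be crazy.")

print(repr(add_em_up))                    # 0.9999999999999999
print(math.fsum([tenth] * 10) == unity)  # True: fsum tracks the lost low-order bits
```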

I think you’ll find that the date Jan. 21, 2012, followed by any date from Jan. 22 through Nov. 30, 2012, will produce the following string:

201201212012mmdd (where mmdd is the string for a date from Jan. 22 through Nov. 30).

which always includes Dec. 1, 2012 (20121201, starting at the fourth digit), whatever mmdd is.
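KG's observation is easy to check mechanically. The digits 20121201 come entirely from the first date's trailing 20121 and the second date's leading 201, so the mmdd part never matters. Here is a short Python check over every second date in the range (hidden_position is an illustrative helper, not from the thread):

```python
from datetime import date, timedelta

def hidden_position(first, second, hidden="20121201"):
    """Index of hidden (default: Dec. 1, 2012) in first+second written as YYYYMMDD, or -1."""
    return (first.strftime("%Y%m%d") + second.strftime("%Y%m%d")).find(hidden)

first = date(2012, 1, 21)
last = date(2012, 11, 30)
# Every second date from Jan. 22 through Nov. 30, 2012, inclusive.
seconds = [first + timedelta(days=n) for n in range(1, (last - first).days + 1)]

print(all(hidden_position(first, s) == 3 for s in seconds))  # True
```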

I taught design and programming for a couple of years inside my company, and I was surprised at how many people who make their living as programmers did not understand this. I had to show them how 1/3 can’t be represented exactly as a decimal before it dawned on them what I was trying to say. (This was Ada programming in 1990 and was part of my lecture on safe numbers and model numbers.)

Same here. I am also wondering how it gets two dates put together like that, unless somebody entered them like that (even then, the positioning thing doesn’t make much sense; if it did that on a valid date, it would only read the last 5 digits and get nonsense). If so, the error can be avoided by making sure you can’t enter more than 8 digits or only reading the first 8. On second thought, maybe a 16 digit string is used for some other type of entry that has a date and some other characters (3 characters, 8 digit date, 5 characters).

Bad planning, I admit. The programs that read the file know that each date string is 8 bytes long. But at one time, there was a problem with duplicate dates in the record. Instead of checking to see if the last date entered matched the next date to be concatenated, I got lazy and just used a function to see whether the date already exists anywhere in the string. As a result, it didn’t add 20120201 to the string because it thought it was already there. Didn’t cause any problems until now.
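The failure mode is easy to reproduce. Below is a minimal Python sketch. append_date_buggy is a hypothetical stand-in for the OP's actual function, which we haven't seen.

```python
def append_date_buggy(record, yyyymmdd):
    # Hypothetical: skips the date if it appears ANYWHERE in the record,
    # including a match that straddles two 8-character fields.
    if yyyymmdd in record:
        return record
    return record + yyyymmdd

record = ""
for d in ["20120120", "20120121", "20120201"]:
    record = append_date_buggy(record, d)
print(record)  # 2012012020120121 (February 1 was silently dropped)
```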

Thanks, KG. That’s enough to convince me that it’s worth the time to fix the program.

Just to add a little detail for those who were curious: My program reads a SQL database, creates a string of dates that meet certain criteria, and inserts it into a fixed-length record. The file is later read by a COBOL program and the dates are parsed from the record. The problem was caused because my program, which builds the string, thought there was a duplicate date, so it didn’t insert it. The solution, which as I said would take two minutes to write, is to only check the last 8 bytes in the string for a duplicate date instead of scanning the entire string. But making any kind of change like that to a program that’s been working in production for over ten years is a paperwork headache, unless there’s evidence that it’s necessary.
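Here is a sketch of the fix as described. The names append_date and split_dates are hypothetical, and the code assumes duplicates can only arrive back to back. The reader side slices on 8-character boundaries the way a fixed-layout reader would. Record padding is left out.

```python
FIELD = 8  # each date occupies exactly 8 characters

def append_date(record, yyyymmdd):
    # Only the most recently appended date can be a duplicate, so compare the last field.
    if record[-FIELD:] == yyyymmdd:
        return record
    return record + yyyymmdd

def split_dates(record):
    # Reader side: slice on fixed 8-character boundaries, never by searching.
    return [record[i:i + FIELD] for i in range(0, len(record), FIELD)]

record = ""
for d in ["20120120", "20120121", "20120201", "20120201"]:
    record = append_date(record, d)
print(split_dates(record))  # ['20120120', '20120121', '20120201']
```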

Seems you got sort of lucky, as there were several years this could have happened in before now, though 2012 has a few extra hits.

For this decade, double dates from the same year will look like:
201xmnde201xMNDE

A date in the middle gets created if these formats match :
xmnd/e2/01 - possible if x<=3, e<=1; mn low gives near future dates
mnde/20/1x - no valid dates
nde2/01/xM - possible if x<=3; n=2, de low gives near future dates
de20/1x/MN - possible for x<=2, (M is constrained); de = 20 gives closest future dates
e201/xM/ND - possible if x<=1 and N <= 3 only creates long past/far future dates

2020, of course, will have a whole lot of past dates in it, but after next year, most near future dates would not have been produced.
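The windows in the list above can be generated mechanically. The idea is to slide an 8-character window across the symbolic string and split each slice as year/month/day. Offsets 1 and 2 are skipped, as in the post, because they start with the 0 and 1 of the year and so cannot give a plausible near-term year. Here is a small Python sketch:

```python
def windows(pattern, offsets=range(3, 8)):
    """Split each 8-character slice of pattern into a year/month/day template."""
    result = []
    for i in offsets:
        w = pattern[i:i + 8]
        result.append(f"{w[0:4]}/{w[4:6]}/{w[6:8]}")
    return result

# Symbolic digits from the post: 201x mn de followed by 201x MN DE.
for template in windows("201xmnde201xMNDE"):
    print(template)
```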

I’ve seen something like the OP’s scenario in a piece of code designed to search ‘all fields’ of a database - the query behind it just concatenated all fields in a row, then bookended the entered search value with wildcard characters and presented it as a criterion.

Thus, if the table had seven text columns, and one row happens to contain:

The, pen, is, mightier, than, the, sword

The search query would concatenate this into:
Thepenismightierthanthesword

And would then return the row as a search hit for ‘penis’ - despite it not actually being there in the data.

One really simple solution is just to add separator characters into the concatenation.
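Here is a minimal Python sketch of both the false hit and the separator fix. search_all_fields is a hypothetical helper. Python's `in` operator stands in for the SQL wildcard match the original query used. The separator should be a character users can't type into the search box; otherwise a search for "pen|is" would still match.

```python
def search_all_fields(row, term, sep=""):
    # Mimics the query: glue every field together, then match term anywhere,
    # much like LIKE '%term%' in SQL.
    return term in sep.join(row)

row = ["The", "pen", "is", "mightier", "than", "the", "sword"]
print(search_all_fields(row, "penis"))           # True: a false hit across fields
print(search_all_fields(row, "penis", sep="|"))  # False once fields are separated
```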

When I’m doing a floating-point comparison, I always make sure to either check for “greater than or equal” or “less than or equal”, depending on context, or if I absolutely have to check for equality, I do " if (abs(x-y) < epsilon) ", where epsilon is a small number defined previously.
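Here is a minimal Python version of that test. The epsilon of 1e-9 is an arbitrary assumption. Python's standard library also has math.isclose, which by default uses a relative tolerance (rel_tol=1e-09), so it scales with the size of the numbers being compared.

```python
import math

def nearly_equal(x, y, epsilon=1e-9):
    # Absolute-tolerance test: |x - y| < epsilon (epsilon is an assumed value).
    return abs(x - y) < epsilon

a = 0.1 * 3
b = 0.3
print(a == b)              # False
print(nearly_equal(a, b))  # True
print(math.isclose(a, b))  # True
```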

Chronos, you’re right about using abs() that way for a “near-equal” test. The programmers I worked with didn’t have that concept (it helps if you know some calculus, just to have that idea), so they did that silly thing of converting the numbers to character strings (which automatically got rounded to some specified number of digits) and then comparing the strings. Blech. (In my code, where floats have 14 sig figs, and which dealt mostly with money amounts, I usually used 0.0000001 for epsilon.)

You might want to think again about how you use < and <= and > and >= to compare floats. For example, suppose that an (a >= b) test should “logically” succeed (that is, the numbers are “nearly” equal and logically should be exactly equal), but the test could take the “wrong” branch if the round-off error puts one number just slightly on the wrong side of the other. Think about whether your program should really take the “equal” branch when that happens.

If so, the correct way to test for < and <= and > and >= is to test first for “nearly-equal” and if so, take the equal branch. Else, test for < or > next and take the appropriate branch for that. If you think about this for a while, you can find a way to do all this with just one test, and without using the abs() function!
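Here is one reading of the "one test, no abs()" hint, offered as a sketch rather than the poster's own solution. It shifts the threshold by epsilon in the direction that sends near-equal values to the equal side. The epsilon value and function names are assumptions.

```python
EPSILON = 1e-9  # assumed tolerance; pick one suited to the magnitudes involved

def tolerant_ge(a, b, eps=EPSILON):
    # a >= b, counting values within eps of each other as equal: one comparison.
    return a > b - eps

def tolerant_gt(a, b, eps=EPSILON):
    # a > b, counting values within eps of each other as equal (so NOT greater).
    return a > b + eps

x = 0.7 + 0.1  # logically 0.8, actually 0.7999999999999999
print(x >= 0.8, tolerant_ge(x, 0.8))   # False True
print(0.8 > x, tolerant_gt(0.8, x))    # True False
```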

ETA: All things considered, I think I’d really rather go back to training dolphins in Hawaii.

We meet again, Trebek!

Anyway, if anyone here wants to see bad programming on a daily basis, The Daily WTF is the place for you.

In most of the code I write, if you’re so close to the threshold that you can’t tell which branch you should take, it ends up not mattering which branch to take. And really, checking for an “almost-equal” condition first doesn’t actually change anything: You still end up with a razor-thin threshold where you behave differently above or below it; it just moves that threshold by an amount epsilon.