Traditional rounding 'bias'. I don't get it.

Y’all are looking at it far too granularly. Let’s expand this to something else you’re used to - dollars and cents. Now, instead of 1, 1.1, 1.2, etc., you get 1000 numbers:

Sum all of those up, and you get 999.495; if you round them off, you get 999.50.

Now, let’s go another decimal place - 1.0001 etc.
Now, let’s go another decimal place - 1.00001 etc.
Now, let’s go another decimal place - 1.000001 etc.

This goes on forever. If you are looking at real numbers (theoretically), the chance of hitting EXACTLY 1.0 or EXACTLY 1.5 is infinitessimally small, so there is no “bias” in rounding .5 up to the next highest integer. It looks bad when you’re looking at 1 decimal point, but the universe of real numbers does not end at 1 decimal point.

OK, let’s look at reality. When you’re doing calculations on numbers, you quite often will get “normal” differences like 1.0 or 1.5. If you’re rounding to a whole number at that point, YOU HAD BETTER BE DEALING WITH MUCH LARGER NUMBERS - anyone dealing with actual dollars on a calculation isn’t going to be concerned about a difference between $1,000,001 and $1,000,002 because of rounding pennies. If you’re at the cash register at the local Kroger, you’re going to notice the difference between $1.00 and $2.00.

The problem of “rounding bias” comes when you are rounding to too few decimal places. I don’t know how often I’ve seen spreadsheets (or programs!) from people where they round intermediate results to the penny. Let’s say you are calculating future manufacturing costs. Right now, you can create 100 widgets at a cost of $149.50, and you’re thinking of franchising out to 10,000 people. Your cost per widget is 1.495. If you do something extremely silly and round that off to the nearest dollar, you’re going to estimate that your new manufacturing costs will be $1 * 100 * 10,000 = 1,000,000, when your real costs should be $1,495,000.

I worked for a company that developed cash register software. You’d hope a cash register would always get it right.

The programmers there had no understanding whatsoever of the kinds of rounding problems that arise when using floating point arithmetic in a computer. Our customers were incessantly coming up with cases where a total on someone’s receipt, or the sales tax computation, or the interest on a house charge account, or something, was off by a penny.

Why would a cash register be using floating point for anything? Just track everything as an integer number of cents.

Okay, you had me convinced up to the last sentence. I know, for example, that 3.1415 is truncated and that there is something more after the 5, but I don’t know what that is. Because the next digit in pi is a 9, it works out nicely to round it to 3.1416. But if the idea is that we don’t know what is next, and assume for the purposes of argument that the next digit of pi was a 1, then we made a mistake by rounding up to 3.1416.

It seems like we are back to our original question. I know that I have 3.1415 and that there is something added on to that but when I round, is that extra something closer to 3.1415 or 3.1416? And even with that something extra 3.14151 is better expressed as 3.1415 than 3.1416. It just so happens that pi made your example work when in general, as you said, we don’t know the value of the next number so that shouldn’t count as a positive to always round a 5 up.

In Chronos’ scenario, he wants to round 3.1415 (assumed to itself be an approximation to a higher-precision number) to just three digits to the right of the decimal point, i.e. to either 3.141 or 3.142.

If the 3.1415 had been prepared by rounding, then it’s an even-money guess whether 3.141 or 3.142 is the better approximation to the original number. But if it was prepared by truncation, then 3.142 is the better approximation.

Even if there’s just a 20% chance that 3.1415 was prepared by truncation, 3.142 is likelier (60% vs 40%) to be the better approximation.

In this case Chronos’ method and the Even Digit Convention yield the same result. Substitute 3.1405 for 3.1415 to see a difference.

(FWIW, I personally do not recall anyone who approximated pi as 3.1415.)

Ah. Understood now. So if I have 3.1415 and want to round the 5, and I’m pretty sure there is something after the 5, then that is closer to 3.142 so that’s what I should go with.

IOW, with the 5 as the last digit, we know that it is at LEAST 5 (of whatever unit) and most likely was rounded down to 5, so we round up to give the most accurate result. Understood.

So then shouldn’t the rule then expressly not apply when we know that we are using discrete numbers? For example, if the United States recalls all pennies and nickles from circulation and we interpret an old contract that calls for a payment of $122.75 then the same rationale would not apply, correct?

You would certainly interpret $122.75 as a whole number of cents (that’s what it says in the contract, right? Not $122.75123) , and round it off to $122.8 (making it even) if you want to knock off one significant digit. But, naturally, if it were a monthly payment I would certainly object to paying the extra 5 cents each and every time and demand the payments be re-calculated…

Some countries do exactly that when paying cash is concerned. E.g. in Finland any amount is round to the nearest 5 Euro-cent when paying cash.

In my working practice, the main concern when rounding is, that averages slightly go up which is why we use bankers rounding.

Thailand has 25- and 50-satang coins (quarter- and half-baht) which are seen only rarely. Rounding is almost always done against the customer!

For example, if your 7-Eleven bill totals ฿40.25 you have to pay ฿41 and may receive no change if the till has no small coins! (One time they gave me the baht back and the receipt had a printed notation that I was receiving a special 25-satang discount.)

If this cash register was in a food store, and the customer bought some produce by weight, that could easily lead to fractions of a cent.

In my programming days, we once had to deal with an invoice which appeared on two different reports, with a five-cent difference. How could the very same invoice appear with two different amounts? At first glance we couldn’t find any errors in either report. This was the kind of puzzle that makes me love programming. It turns out that this invoice had about 20 line items on it, and the correct report was calculating the grand total, and calculating the sales tax on that. The other report was calculating the sales tax on each line, and rounding them individually. The rounding errors tend to cancel each other out, but in this particular case it accumulated to a very-noticeable five cents.

A weight of produce could (and probably would) result in a non-integer number of cents, but when that happens, you round to an integer number of cents before you do any further calculations with those cents.

I designed tax calculations for interest payments, and had similar problems. The official advice from the tax office was that they weren’t going to sweat the small stuff :slight_smile: You could be a couple of dollars out on your totals, and they wouldn’t worry about it.

That might seem very sensible to you, but it was a bit of a shock to me, because in most of accounting, part of the check to avoid theft and errors is that all the rows and columns have to /exactly/ add up /exactly/. – You expect that if you pay 5% on every line item, then it will be 5% of the total.

The rule for the GST tax was the other way around! They didn’t care if the % was correct, as long as the totals added up correctly. (They had to change that a bit, because the ambiguity about the way the % could be calculated was too painful for a lot of people).

A few things:

Xero and Quickbooks Online use different methods to calculate sales tax. I don’t know what exactly they are, but for some reason I was tasked with manually entering invoices from Xero into QBO, and on multiple occasions the sales tax came out different by 1 cent after entering invoice amounts and markable line items as taxable. I can’t remember for sure at this point, but it might even have been higher in one case and lower in another. I suspect this might be due to the same thing Keeve mentioned, as they were all on invoices with multiple items. Clearly rounding is an issue that not all programmers are 100% at doing consistently still.

The guy whose name is the name of the firm I work for (no longer in charge, but still works a little) does banker’s rounding all the time on tax items, even though every tax authority I know of says to always round 50 cents up, presumably because on average it doesn’t matter because sometimes rounding an amount up is an advantage for the taxpayer, and sometimes it’s a disadvantage. We’re obviously not going to break him of this habit, and it rarely matters anyway, but it is a bit disturbing to see as a regular trend.

For calculation of amortization tables, rounding is generally of lesser significance than simply the choice of method of interpreting the interest rate. We have an application that can calculate amortization schedules that we use when we have to determine how much interest to include in a period when the loan is not covered by a bank statement, but it always calculates on a month-to-month basis, or effectively a 360/30 basis. Whenever something is covered by a bank statement, there are good odds that I can’t get the statement to match to any amortization table, even trying to go day-by-day in month increments in Excel. The most recent cause for this is the fact that the amount of interest charged during a billing period depended precisely on how many days since the last payment, but payments are only processed on banking days, so there’s a good likelihood that weekends or holidays get involved and there’s a nonstandard number of days in each month that can’t reasonably be replicated by a standard Excel computation. Even without that quirk, amortization schedules we see set up on car loans are almost always to the extreme disadvantage of the borrower. While it may say a 5% interest rate, it’s invariably higher than that, because they use a 360/Actual computation basis, meaning they charge you for 365 or 366 days of interest a year when the daily interest rate assumes there are only 360 days in the year. How they get away with this I don’t know, but it’s apparently standard in commercial car loans. On the other hand, I’m pretty sure US treasuries pay interest on an Actual/Actual basis, meaning that if you set up a amortization table that counts the number of days between payments, you need to use a different interest rate per day for leap years and common years. It’s much easier to set up 365/Actual in Excel than Actual/Actual.

I implemented a bunch of different interest rate calculations when I was doing Treasury stuff. Each market used to have it’s own calculation method, and American markets used to often be 30/360 calculation – obviously to make it easier to work out by hand using a pencil and paper. So that stuff was gradually disappearing.

With a 30/360 calculation 6% pa is 0.5% per month (not compounding), and the 0.5% per month is 0.5/30 per day (not compounding) February, you get a couple of days extra interest on the last day of the month: March, you get no interest for the last day of the month. Each interest day is 5%/360. Contract dates were never the end of the month, because that would have been too much trouble – standard contract dates would be something like the 15th of the month, and the daily rates were only used to calculate present value – which as a result had flat spots which the market avoided.

Then you’ve got Indian contracts, which are rounded to 1000, and FX, which used fixed point arithmetic, and real estate valuation using ‘rent day’, 365-365, 365-366, decimal months, Swiss, German, London exchanges…

I’m not **nate **, but I also found borschevsky’s explanation to be the first one that clicked. I’ll think about your question and reply with my findings. :slight_smile:

5 and\\Here is how I look at it. Rounding 1.1 to 1 introduces an error of 1 part iin 11, about 9%. Similarly rounding 1.2 to 1 introduces an error of about 17%. Rounding 1.3 gives 23% and 1.4 gives 28%. These add up to 77%. Leave 1.5 aside for the moment. The results of rounding 1.6 to 1.9 up introduce a total error of 58%. Now whichever way you round 1.5 adds 33% to the error, but it seems best add it to the side that has the smaller total error.

What is going on here is the same phenomenon that cases the middle of a slide rule to come between 3 and 4. The scale is essentially logarithmic. Of course, it instead of 1.?, it is 100.? the actual percentages will be much smaller, but the principle remains.

Yes, but don’t forget that many or most real-world numbers will follow an exponential distribution (cf. Benford’s Law). Given this distribution, the good estimate of 4.5 — given that that number was prepared from a higher-precision number by rounding — is 4.4998, I think.

This reminds me of two (little-known?) tricks useful in lossy compression methods like Jpeg. (Careful readers will note that the tricks are in conflict with each other!)

Trick A) When an AC coefficient in a Jpeg file is, say ‘1’, that means the compressor found a value 0.50 < x < 1.50 and rounded it to 1. HOWEVER, 1.0 is NOT the reconstructed value which will minimize average[sup]*[/sup] squared error. That is because x doesn’t come from a uniform distribution: it comes from a bell-shape with a very high steeple at zero.

Thus a decompressor which treats x=1 as 0.9 (or less) will produce reconstructed images with LESS squared error (relative to the original before compression) than a “normal” decompressor. (The same principle applies, to a lesser extent, with x=2, etc.)

    • Minimize average squared error. x = 1 will still minimize maximum squared error.

Trick B) Suppose you’re a Jpeg compressor and encounter an AC coefficient like x = 0.6. What should you do? The spec calls for you to emit x = 1, but that will spend several bits in the output file, compared with x = 0 which is almost free. Moreover, an in-spec decompressor will reconstruct this x=1 as 1, whereas the real value (x=0.6) is almost as close to zero as to one. A compressor which “spends its bits wisely” will achieve better fidelity by quantizing more finely, but rounding to zero more often.
(Disclaimer: At some point it may have been illegal for you to use the “tricks” described: they probably violated claims in some U.S. patents (now expired?). IANAL and I’m certainly N your L.)

I am so sad. :frowning: I divulge little-known insights into lossy compression methods, which will improve image quality for any given compression ratio, and there is no response — Zero. The tricks are explicitly about rounding, the topic of the thread. Perhaps I should have worked the name of Kellyanne Conway or Monty Hall into the post! :slight_smile:

Perhaps you are more familiar with JPEG and related standards- I have not read any of them all the way through, so cannot say what “tricks” are legal according to the standard. JPEG is pretty old-school and I haven’t checked if there is a simple (or even horribly expensive) way to squeeze a bit more out of the compression stage (also remember that following quantization there is also Huffman or arithmetic encoding which will affect the final file size, and that the desired ultimate results are what looks better and not what minimizes some simple error metric), but you must take into account that the decompressor may be any baseline JPEG and does not know how you might have massaged the quantization step.

BTW, same disclaimer about how this might be wrong and does not constitute legal advice, but I would not worry that your (especially non-commercial) code violates any patents since or cannot usually patent “abstract ideas”, computer programs or algorithms as such. A patent lawyer might argue that “image quantization via topsy-turvy rounding” is a patentable technical method, but even that dubious claim would not make rounding methods patentable instead of being an object of mathematical study.