 # Traditional rounding 'bias'. I don't get it.

You would, but your original numbers total 45, and your rounded numbers total 50.

Yes, thank you. This is it… now I understand. The ‘bias’ is seen when recreating the sum from the original values compared with the rounded values: with the traditional technique, the sum of the rounded values will always be equal to or greater than the sum of the original values.

Suppose you are generating a random floating-point number in the range [0…1] and then round that to the nearest integer (0 or 1). Now do this over a million iterations. You should get the same result as in your experiment.
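If you want to actually run that experiment, here is a minimal sketch in Python (the fixed seed is just so the run is repeatable; any seed gives essentially the same ratio):

```python
import random

# The experiment described above: "round" a million uniform floats in
# [0, 1) to the nearest integer and count how many round up to 1.
random.seed(1)  # fixed seed so the run is repeatable
n = 1_000_000
ups = sum(1 for _ in range(n) if random.random() >= 0.5)
print(ups / n)  # very close to 0.5; an exact 0.5 essentially never occurs
```

Note that the tie case (an exact 0.5) is so improbable here that which way `>=` vs `>` breaks it makes no measurable difference.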

There is “theoretical” bias in the case where you get exactly 0.5, if you always round that one way. If this were to happen a lot, you’d have an actual bias. Thus the rule: if the number to be rounded is exactly .5, you round to whichever neighbor is even. This should, on average, lead to about half of your .5’s being rounded up and half being rounded down.

In actual practice, how often do you think a randomly generated float would come up exactly .5? Vanishingly rarely. So it probably doesn’t really matter what you do with .5, because it’s so rare in the first place.

Now consider a Cray-1 supercomputer running at 160 megaflops, and you have an atomic simulation that runs for an entire 4-day weekend. How many floating-point ops does that amount to? (This problem is left as an exercise for the reader.)
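For anyone who doesn’t want to do the exercise by hand, the back-of-envelope arithmetic (assuming a full 96 hours of sustained 160 MFLOPS) looks like:

```python
# Back-of-envelope: 160 MFLOPS sustained over a 4-day (96-hour) weekend.
flops = 160e6            # floating-point ops per second
seconds = 4 * 24 * 3600  # 4 days = 345,600 seconds
total = flops * seconds
print(f"{total:.4e}")    # about 5.5e13, i.e. roughly 55 trillion operations
```

Fifty-odd trillion operations is plenty for a tiny per-operation rounding bias to accumulate into something visible.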

The Cray-1 has a 64-bit word, but can also do half-word (32-bit) computations. That gives plenty enough precision for the simulations they were doing, but 32-bit ops run twice as fast, so that’s what they did.

The computer does automatic rounding of floating-point ops in the last binary place. You’d think that would never amount to much. But I heard the programmers talking about it. They discovered it didn’t do the rounding properly when using 32-bit ops. I interpreted this to mean it always rounded down. (Note that in a binary number system, the last bit is always 0 or 1, where 1 is exactly one-half the value of the next place to the left. In effect, it was always 0 or .5. The machine was supposed to compute an extra binary place and use that to decide whether it should round up or down, and I gathered that they were complaining it wasn’t doing that.)

So after running a 320-mflop 32-bit simulation for 4 days, the programmers complained that their simulations tended to run “downhill”. As far as I know, they concluded that they just couldn’t use 32-bit operations and had to use the full 64-bit ops, even though those only ran at 160 mflops.

I was a computer operator at the laboratory where this was happening in the mid-1970’s.

The bias will be either significant or insignificant depending on the changes being made. So if, say, you are rounding a number from four decimal places to a whole number, then assuming a true random distribution, you will certainly have a much smaller error if you always round 0.5000 up… After all, that’s only going to show up around 1:10000 times.

But if your dataset consists of some numbers given to the nearest tenth, and some to the nearest hundredth… Well, fixing it at the nearest tenth means it doesn’t take a lot of data for that rounding bias to become fairly significant.

With apologies to Seymour Cray and Cray Research. Upon thinking twice, I’m recalling that the above discussion referred to the Control Data STAR computer, not the Cray-1. We had one or two of each. The CDC STAR was a monstrosity of a computer, which is what happens when they aren’t designed by Cray. It filled half of an entire large room and was notoriously unreliable. I think it was down for preventive maintenance something like six hours every day.

The Cray-1 was a compact little machine with real-time self-correcting memory parity checking. If a board was suspected of being bad, you could just pull it out and stick in a new one, even while the machine was running, and it had enough memory redundancy that it could self-correct the missing memory and continue running programs on the fly while the memory board was out.

The stuff about 0 not being rounded is getting into the weeds a bit. It’s true in a sense, but you’re not really describing why.

What’s more important is pointing out that the issue with rounding is with sums, not counts. The point with these sums is to try to get close to the same number with rounding as you would if you didn’t round. Assuming you have equally distributed numbers, you get a bunch of numbers that “cancel out.”

0.1 + 0.9 = 1.0, rounding to 0 + 1 = 1
0.2 + 0.8 = 1.0, rounding to 0 + 1 = 1
0.3 + 0.7 = 1.0, rounding to 0 + 1 = 1
0.4 + 0.6 = 1.0, rounding to 0 + 1 = 1
0.5 + 0.5 = 1.0, rounding to… wait, that last one doesn’t work. Either you get 0 + 0 = 0, or 1 + 1 = 2. Either way, you’re off by 1.

The trick of using rounding to evens fixes this by having 0.5 + 0.5 round to 0 + 0, but 1.5 + 1.5 round to 2 + 2. So you get 0.5 + 0.5 + 1.5 + 1.5 = 4 and 0 + 0 + 2 + 2 = 4
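As it happens, Python’s built-in `round()` uses this round-half-to-even rule, so the cancellation above can be checked directly (0.5 and 1.5 are exactly representable in binary floating point, so there’s no representation issue muddying the waters here):

```python
# Python's round() ties to the nearest even integer (banker's rounding),
# so 0.5 rounds down to 0 while 1.5 rounds up to 2.
halves = [0.5, 0.5, 1.5, 1.5]
print([round(x) for x in halves])                  # [0, 0, 2, 2]
print(sum(halves), sum(round(x) for x in halves))  # 4.0 4
```

The rounded sum matches the true sum exactly, which is the whole point of the trick.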

Now what about 0.0 and 1.0? Well, they don’t need any counterparts to work. They are already exact. So, in that sense, they don’t round up or down.

But the meat is the other part.

I hope that makes sense; I’m pretty tired. But I had to get this out of my brain.

PS: ironically, the floor and ceiling functions (which always round down or up, respectively) lack this bias. I leave it as an exercise to the reader to use the same logic to figure out why.

Yep. This was called the “Even Digit Convention” six decades ago. We had a thread on the topic 3 years ago and then another thread 2.5 years ago. IIRC I Googled for “Even Digit Convention” and found only a technical write-up from the 1940’s or 1950’s.

Is it a largely forgotten “rule” nowadays? How old are you, Whack-a-Mole?

Nope. The arithmetic mean of {1.00, 1.01, 1.02, …, 1.99} is 1.495, but if we round them your way we get an average of 1.50. Similarly, the average of {1.00, 1.01, 1.02, …, 2.00} is 1.50, but rounded your way we get about 1.505.

But those examples do not exercise even-oddness. Consider the 1001 elements of {0.00, 0.01, …, 10.00}. The true mean is 5.00 and that is the result if the numbers are all first rounded following the Even Digit Convention. Your way, the mean shows as 5.004995.
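Those figures are easy to verify mechanically. A sketch, working in integer hundredths to sidestep binary-float representation issues (so `i` from 0 to 1000 stands for 0.00 to 10.00, rounded to the nearest tenth):

```python
# Verify the means above: 0.00..10.00 in hundredths, rounded to tenths.

def round_half_up(i):
    q, r = divmod(i, 10)
    return (q + 1) * 10 if r >= 5 else q * 10

def round_half_even(i):
    q, r = divmod(i, 10)
    if r > 5 or (r == 5 and q % 2 == 1):  # tie goes to the even tenth
        return (q + 1) * 10
    return q * 10

vals = range(1001)  # 0.00, 0.01, ..., 10.00 in hundredths
true_mean = sum(vals) / 100 / 1001
up_mean = sum(round_half_up(i) for i in vals) / 100 / 1001
even_mean = sum(round_half_even(i) for i in vals) / 100 / 1001
print(true_mean, even_mean, up_mean)  # 5.0 5.0 5.004995...
```

The Even Digit Convention reproduces the true mean of 5.00 exactly, while always-round-up lands on the 5.004995 figure quoted above.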

ETA: BigT started the thread from 3 years ago, and seems to have changed his position now! Well done! Who said SDMB never fights ignorance?

Let’s rephrase that: if you have a uniform distribution of 0 to 9, is it fair to say that half of them are closer to zero, and half are closer to ten?

In my view, no, it’s not fair to say that. The first five (0,1,2,3,4) are definitely closer to zero than to ten, and the last four (6,7,8,9) are definitely closer to ten. But one cannot say that “Five is closer to ten than it is to zero”, because it’s NOT closer - it is the same distance away. And yet, isn’t “closer” what rounding is all about?

If my previous post wasn’t convincing, try this:

If you have a uniform distribution of the nine numbers from 1 to 9, traditional rounding would say that fewer of them (the four numbers 1,2,3,4) are closer to zero, and more of them (the five numbers 5,6,7,8,9) are closer to ten.

But is that reasonable? In the group of numbers “1 2 3 4 5 6 7 8 9”, it is ridiculous to say that more of them are closer to ten. What we want to say is that they are evenly distributed, and that half are closer to zero, and half are closer to ten. Unfortunately, although we’d like to say that, we’re unable to, because there are an odd number of elements in this group, and it can’t be split in half fairly. The middle element is a problem that has to be dealt with somehow. And there’s the bias.

I think it may be clear by now, but I’ll say it once more:
The issue with rounding fives only occurs when there are no more digits after the five; when all the info you have is that the measurement is halfway between the two numbers you’re rounding to.
A full floating point random number will not have that happen often, if you’re rounding to one or two digits, so you won’t see any bias in a test with RND().

But imagine you’ve got a bunch of measurements made by someone with a stick marked only in cm, and it’s not a precise environment so all they can do is estimate things to the nearest half cm.
Your list is 15 cm; 17 1/2 cm; 18 cm; 12 1/2 cm; etc.
If you want to work in whole cms, and you round all those 1/2’s up, you’re introducing a bias, making everything on average 1/4 cm too high.

And here is where your reasoning is flawed: you’re tossing one of the datapoints.

You have ten numbers, right? Five of them become 1. Five of them become 2. No bias.

You even rounded 1.0 yourself, changing it to 1 in your example. It’s trivial mathematically, but you have made the change of expressing it as a whole number instead of a decimal.

So over a distribution of numbers, half of them go to the lower value (“1” in this case) and half of them go to the higher value (“2” in this case).

The number of datapoints rounded up vs. rounded down is not an indication of bias. The average rounding error is. Unbiased rounding of a set of random datapoints should have an average rounding error that approaches zero. Let’s look at Whack-a-Mole’s example, where x.5 always rounds up:

1.0 rounds to 1 => rounding error = 0.0
1.1 rounds to 1 => rounding error = 0.1
1.2 rounds to 1 => rounding error = 0.2
1.3 rounds to 1 => rounding error = 0.3
1.4 rounds to 1 => rounding error = 0.4
1.5 rounds to 2 => rounding error = -0.5
1.6 rounds to 2 => rounding error = -0.4
1.7 rounds to 2 => rounding error = -0.3
1.8 rounds to 2 => rounding error = -0.2
1.9 rounds to 2 => rounding error = -0.1

The average rounding error in this case is -0.05, not 0.0. Now change the rounding scheme such that x.5 rounds up half the time and down half the time. The average rounding error for x.5 then approaches 0.0 for large numbers of samples:
1.0 rounds to 1 => rounding error = 0.0
1.1 rounds to 1 => rounding error = 0.1
1.2 rounds to 1 => rounding error = 0.2
1.3 rounds to 1 => rounding error = 0.3
1.4 rounds to 1 => rounding error = 0.4
1.5 rounds to 1 or 2 => rounding error = 0.0 (over many samples)
1.6 rounds to 2 => rounding error = -0.4
1.7 rounds to 2 => rounding error = -0.3
1.8 rounds to 2 => rounding error = -0.2
1.9 rounds to 2 => rounding error = -0.1

The average rounding error is now 0.0.

The trick is how to decide which direction to round x.5 on a sample-by-sample basis. The trick of basing the direction of rounding on the even/oddness of the preceding digit works well if the data is uniformly random in that digit - perhaps from noise in the measurement. If that digit is not random then some other means must be used to round without bias.
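The two error tables above can be reproduced with Python’s `decimal` module, which implements both tie-breaking rules. A sketch over the hundred values 0.0, 0.1, …, 9.9 (rather than a single decade) so the even/odd alternation of the ties is visible:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Average rounding error (original minus rounded) over 0.0, 0.1, ..., 9.9.
vals = [Decimal(i) / 10 for i in range(100)]

def avg_error(rounding):
    errs = [v - v.quantize(Decimal("1"), rounding=rounding) for v in vals]
    return sum(errs) / len(errs)

print(avg_error(ROUND_HALF_UP))    # -0.05: always-up ties drag the average
print(avg_error(ROUND_HALF_EVEN))  # 0:     ties alternate up/down and cancel
```

With half-even, 0.5 rounds to 0 but 1.5 rounds to 2, 2.5 to 2, 3.5 to 4, and so on; the tie errors alternate between +0.5 and -0.5 and cancel exactly.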

Of course you have to round 0, not just leave it the same. 1.07 rounded to the nearest integer is 1, not 1.07.

The question really comes down to: what were your numbers originally? If they started life as real numbers, then it doesn’t matter what you do with 0.5000…, because 0.5000… will come up only infinitely rarely (0.5001… and 0.4999… will come up, but there’s no ambiguity about how to round those). On the other hand, if they started life as fixed-precision numbers (such as rounding cents to dollars), then always rounding .5 (or .50 or .500 or whatever) in the same direction will introduce bias, because that exact number will sometimes come up (less often the more digits you originally had, but sometimes).

One solution to this is “banker’s rounding”, which always rounds an exact .5 to the nearest even number (so 10.5 rounds to 10, but 11.5 rounds to 12). This is sometimes up and sometimes down, which removes the bias.
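Python’s built-in `round()` happens to follow this same convention, so those two examples are easy to check (10.5, 11.5, and 12.5 are all exact in binary floating point, so there’s no representation surprise):

```python
# Banker's rounding: exact .5 ties go to the nearest even integer,
# so 10.5 rounds down while 11.5 and 12.5 both round to 12.
print(round(10.5), round(11.5), round(12.5))  # 10 12 12
```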

So why is the standard to round up? Because you don’t always know where the numbers came from, and they might have come from someone else who previously performed a rounding step on them. Or possibly from someone who didn’t know what they were doing, and incorrectly applied a truncation step, instead of rounding.

For instance, suppose I want pi, to four digits. I ask you, “What’s the value of pi?”. You tell me “It’s 3.1415” (those being the first five digits, which would continue as 3.14159265358979…). Now, if it were exactly 3.1415 and I wanted to round it one more digit, I wouldn’t know which way to go. If I knew that it wasn’t exactly 3.1415 but that you had rounded, I still wouldn’t know which way to go (though, in actuality, this isn’t what happened, because if you had rounded, you should have told me 3.1416). But, if you truncated the number instead of rounding (as you did), then the actual number must have been greater than 3.1415, and so I know to round up.
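The truncation-versus-rounding difference is easy to see with `math.pi` (using trunc-after-scaling as a stand-in for the careless truncation described):

```python
import math

# Truncating vs. rounding pi (3.14159265...) to four decimal places.
truncated = math.trunc(math.pi * 10**4) / 10**4  # just drops digits
rounded = round(math.pi, 4)                      # rounds the fifth digit
print(truncated, rounded)  # 3.1415 vs 3.1416
```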

Yeah. I was a bit rusty on my math and forgot that the standard rounding procedure rounds .5 to the even digit.

You said almost exactly the same thing in one of the years-ago threads I linked to above. (I didn’t scroll to see if I also objected to it then.)

I hope I’ll be forgiven if I rebut your claim by making a caricature of it! :) “Setting 5+3 to 7 has merit because the ‘5’ may have come from one of our sub-accountants, most of whom think 2+2 = 5.” Trying to guess what arithmetic errors might already have occurred may be appropriate in the real world, especially in innumerate America, but unless otherwise specified I think we should assume we’re operating in Puzzleville, where everyone’s a perfect logician and everything!

Speaking for myself, I frequently reduce precision when I post percentages (or such) in various SDMB threads, and I always go to the bother of rounding rather than truncating. I DO apply the Even Digit Convention when I’m reducing precision by exactly one digit (or by 2 digits in the case of .50 exactly).

Standards… IEEE 754 defines five rounding rules, the default being rounding to the nearest value, with ties going to the value with the even least-significant digit. Also, correct rounding of the values of mathematical functions may require the computation of intermediate results to extra digits of precision, as mentioned above; it is not at all surprising that some random buggy old computer had buggy floating point, which is one of the reasons these modern standards were developed in the first place.

In general, if the precision of the numbers is much tighter than the degree of rounding, it’s not necessary to worry about how to round numbers exactly half-way between, because they are a very small proportion of the numbers. The trouble comes when the precision (say \$0.001) is close to the degree of rounding (often \$0.01). It also commonly happens using the nearly full precision of measurements from a device; for example, measuring voltage to the nearest tenth of a V when the device outputs mV.

Note that this particular digit-rounding problem is an artifact of using an even base for our numbers (decimal, binary, hexadecimal, octal, etc). By using an odd base, it’s always clear which way to round. For example in base-9, 1-4 are closer to 0 and 5-8 are closer to 10. Four up, four down. Even easier, in a balanced odd base rounding is identical to truncating.
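A tiny sketch of why an odd base has no ties, enumerating the possible final base-9 digits (for each digit, compare its distance down to 0 against its distance up to the next base-9 “10”, i.e. nine):

```python
base = 9  # any odd base works the same way
digits = range(1, base)  # possible nonzero final digits
down = [d for d in digits if d < base - d]   # closer to 0
up   = [d for d in digits if d > base - d]   # closer to the next '10'
ties = [d for d in digits if d == base - d]  # exactly halfway
print(down, up, ties)  # [1, 2, 3, 4] [5, 6, 7, 8] [] -- no tie to break
```

Since 2d = 9 has no integer solution, the halfway case simply cannot arise.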

To expand on this, in almost all applications, numbers stored on a digital computer are not real. They are rational numbers with finite precision. Maybe the precision is tight enough to ignore the problem, or maybe not. Only symbolic processing avoids the problem altogether.

In Puzzleville, where everyone is a perfect logician, always rounding .5 up would be exactly as good (or as bad) as always rounding it down, and the inhabitants of Puzzleville tend to be the sort who contrive situations where any given deterministic method of deciding will result in a bias. Given that the behavior of Puzzlevillains can give us no guidance on whether to round up or down, we might as well make our decision based on the behavior of the people of Realworldia, and Realworldians do often truncate when they ought to round.

This is fascinating from a teacher’s point of view. To my eye, the many earlier explanations said this in many ways. Yet when borschevsky explained it in yet another way it clicked.

Was it that borschevsky gave a why as well as the result? But Marvin the Martian gave the same why, just expressed slightly differently.

Why did one why work and not the other?