Proof for the divide-by-three rule and other rules of thumb

Well, first of all, on many chip architectures an integer division of a 32-bit number takes 32 cycles, while most other integer instructions take only 1. Also, you can unroll the loop quite easily, since you know exactly how large the number can be.

And you can analyze modulo arithmetic on any base, to find a convenient relationship. So, for example, the number 17.

1 = 1 mod 17, and 1 = -16 mod 17
2 = 2 mod 17, and 2 = -15 mod 17
4 = 4 mod 17, and 4 = -13 mod 17
8 = 8 mod 17, and 8 = -9 mod 17
16 = 16 mod 17, and 16 = -1 mod 17

So, with any integer n represented by two integers i & j where n = i * 16 + j, and 0 <= j < 16, then if n = 0 mod 17, then i * 16 + j = 0 mod 17, so i * -1 + j = 0 mod 17, so if i - j is evenly divisible by 17, then n is evenly divisible by 17.

So, I choose 16 to be the base, which is even more convenient because base-16 (hexadecimal) is a convenient format to show numbers in. If you’re unfamiliar with base 16, every digit is multiplied by a power of 16 for every step left of the ones digit, and the letters a through 4 represent the numbers 10 through 15. for example, the number 482154 would be represented as 75b6a in hexadecimal, usually denoted as 0x75b6a, to make it clear that everything to the right of the “0x” is a hexadecimal number.

So, in hexidecimal, 7 + b + a = 1C, and 5 + 6 = B. And 1C - B = 11, which is 17 in base-16, so 482154 is evenly divisible by 17. That took 3 adds and 1 subtract, each of which only take one cycle to compute, compared to a divide operation that could take up to a cycle per bit.

Continuing the example, in the MIPS instruction set, with the original number in register r1, the code to check the divisibility of a number by 17 would look like this:



andi r2, r1, 0x0f0f
srl r3, r1, 16
andi r3, r3, 0x0f0f
addu r2, r2, r3
srl r3, r2, 8
addu r4, r2, r3
srl r2, r1, 4
andi r2, r2, 0x0f0f
srl r3, r1, 20
andi r3, r3, 0x0f0f
addu r2, r2, r3
srl r3, r2, 8
addu r2, r2, r3
subu r2, r2, r4
srl r3, r2, 7
andi r3, r3, 0x0001
subu r2, r2, r3
srl r3, r2, 4
xor r2, r2, r3
andi r2, r2, 0x000f


…while to do the equivalent using divide would be:



addi r2, r0, 0x0011
divu r1, r2
mfhi r2


While the first code takes 20 instructions to set r2 to 0 if r1 is evenly divisible by 17, it would take exactly 20 cycles to perform. The second code, while only 3 instructions to do the same thing, would take 34 cycles. So, a 40% speedup going from the division to the modular-math technique.

The discrepency on 64-bit architectures is even larger, since the divide would take twice as long, but the modular-math technique would only take a few more instructions.

In considering whether these rules would ever be useful on a computer, it’s worth noting that there are a couple of common situations where the appropriate arithmetic operations are often not “hard-wired”.

One, which has already been mentioned, is dealing with very large integers. For example, the kinds of factorization that is at the heart of cryptography could benefit from quick tests for divisibility by the smaller primes. The suitability of these particular rules is limited in this case by the fact that the underlying representation is binary, not decimal, as ftg and others have noted.

Another situation not yet mentioned is the use of fixed-point calculations, which often do employ an underlying decimal representation. Now, the most common applications of fixed-point would be monetary and divisibility/modulo calculations would, I think, be pretty uncommon in those applications. But in those rare circumstances when you did need it, these rules could come in handy.

That should be, “…and the letters a through f represent the numbers 10 through 15.” Sorry for making the post even more confusing than it already was.