How did scientists and engineers actually use slide rules, in the old days?

It looks like they used the table to iterate the entire division, two bits at a time instead of one. This cut the time for division in half, compared to the much simpler but slower method of iterating one bit at a time. For a 64-bit number, which has a 53-bit mantissa, this would mean 27 clock cycles instead of 53. One would think that time could be saved if the remainder zeroed out, finishing the operation early, but I suspect that the logic to implement that might not be worth the potential savings (division is a comparatively low-traffic operation in computers).