How will Ben’s reverse method affect writes? It seems to me sequential writing of a file backwards would perform poorly. Sorry if this has been mentioned upthread and I missed it.
I’m not at all sure what you mean. Can you be more specific? Don’t forget that interleaving (1-x-2-x-3) will run the disk at half-speed in normal cases.
Asked and answered upthread. See #46 and #55.
Interleaving allows time for the OS to make the request for N+1 and have the disk read/write before it rotates past N+1, so overall performance is improved (depending on specific workload).
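To make that concrete, here is a toy simulation (my own sketch, nothing from the thread; 17 sectors per track is just a typical MFM-era figure). If the host needs some processing time between consecutive sectors, 1:1 interleave misses the next sector and loses a whole revolution per sector, while 2:1 interleave gives one sector of slack, at the cost of halving the best-case rate:

```python
def revolutions_to_read_track(sectors=17, interleave=1, proc_time=0.0):
    """How many revolutions to read one full track in logical order.

    Time unit: one sector passing under the head, so a revolution is
    `sectors` units. `proc_time` is how long the host needs after each
    sector before it can request the next one.
    """
    # Physical slot of each logical sector under k:1 interleave
    # (this simple mapping needs gcd(interleave, sectors) == 1).
    slot = [(i * interleave) % sectors for i in range(sectors)]
    t = 0.0
    for logical in range(sectors):
        # wait for the target slot to rotate under the head
        t += (slot[logical] - t) % sectors
        t += 1 + proc_time          # read the sector, then process it
    return t / sectors

print(revolutions_to_read_track(proc_time=0.0))                # 1 rev
print(revolutions_to_read_track(proc_time=0.5))                # ~17 revs!
print(revolutions_to_read_track(interleave=2, proc_time=0.5))  # ~2 revs
print(revolutions_to_read_track(interleave=2, proc_time=0.0))  # ~2 revs (half speed)
```

The slow host loses a full revolution per sector at 1:1 but keeps up at 2:1, which is why interleaving helped in the 80’s; a fast host at 2:1 still pays two revolutions per track, which is the “half-speed” point.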
We’re going around and around in circles. First let me repeat a sentence you quoted but apparently ignored:
Don’t forget that interleaving (1-x-2-x-3) will run the disk at half-speed in normal cases.
- In any case, the interleaving granule would have to be precisely equal to the application’s data granule for the concept to make sense.
- Moreover and contrary to the needs of your method, in a multi-processing environment the time for application to process a data granule is not predictable.
- Finally, the Unix operating systems I was familiar with in the 1980’s did not offer interleaving at the file system level as a user-level option. Some sophisticated applications did control the details of file allocation rather than leaving it up to the kernel, but AFAIK this was rather rare; certainly nobody at Acme ever mentioned this case as a concern for my own work, or in connection with the high-performance disk controller whose firmware was under design.
Hope this helps.
When you wrote “half-speed”, I assumed that you didn’t really mean that the speed of rotation was reduced, but rather that the effect for SOME workloads was a reduction in performance. This entire discussion is workload-dependent: no system is better or worse in absolute terms, only when compared against specific workloads.
Based on my googling, interleaving did give better performance in general back in the 80’s, and that’s the time frame you mentioned for Ben’s method, so it seemed appropriate when comparing methods.
And that is not correct. Interleaving is usually used as a necessary evil, because the controller or some other component is too slow. You use interleaving to make the disk drive work as fast as possible.
(I’m answering RaftPeople. AFAICT Quartz is engaging in weird semantic games.)
There are two cases: single-user operation, where the disk can stay on-track while a single application processes a single file; and multi-tasking operation, where the disk seeks back and forth servicing several files concurrently. The latter case is more interesting, but the former is simpler to analyze and the one you are focused on, so let’s discuss it.
The fastest you can read from a track is to read the entire track in one revolution; likewise, the fastest you can write to a track is to write the entire track in one revolution. With 2-way interleave, each of these operations instead requires two revolutions. Are you with me so far?
Obviously application details will affect disk details. Random-access applications are basically a wash so we ignore them. Small files are also of relatively little interest even if accessed sequentially. Perhaps the most common and important application pattern is a filter, which reads one large file sequentially, while writing another large file sequentially. The most efficient way to do this is to read a very large chunk (perhaps exactly one track but it doesn’t really matter) and then to write a very large chunk. As these chunks become larger and larger, contiguous allocation becomes, asymptotically, twice as fast as 2-way interleave. Are you still with me?
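To put rough numbers on that asymptotic claim, here is a back-of-envelope model (mine, with simplified figures): reaching each chunk costs about half a revolution of rotational latency on average, and transferring it costs one revolution per track contiguous, or two revolutions per track with 2-way interleave:

```python
def rel_time(chunk_tracks, interleave):
    # average half a revolution of rotational latency to reach the chunk,
    # plus `interleave` revolutions of transfer per track of data
    return 0.5 + chunk_tracks * interleave

for chunk in (0.25, 1, 4, 16, 64):
    ratio = rel_time(chunk, 2) / rel_time(chunk, 1)
    print(f"chunk = {chunk:6} tracks: interleaved/contiguous = {ratio:.2f}")
# the ratio climbs from about 1.33 toward 2.0 as chunks grow
```

Small chunks are dominated by the latency both layouts share; as chunks grow, the transfer term dominates and contiguous allocation approaches twice the speed of 2-way interleave.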
And finally, let’s consider a typical case: A filter which “should” use large chunks but instead is programmed to use small chunks. The kernel and/or disk-controller firmware will cooperate to concatenate these small chunks into large chunks. For reads, this “concatenation” is assisted by controller “pre-fetching” whether using Ben’s method or not. So we consider writes.
Specifically we suppose the application issues writes for blocks {1}, {2}, {3}, {4}, {5}, … Although Linux isn’t Unix, it’s easier to Google nowadays and I find these:
Re-read these, please. Note that, according to the 2nd quote even if the application requests that a write be “synchronous”, that process will unblock as soon as the data to be written is copied to kernel cache. (For “asynchronous” write it doesn’t even wait that long to unblock. :eek: ) The application does not wait for {1} to actually be inscribed on the magnetic disk before issuing the request for {2}, and then {3}.
The disk controller may not initiate the seek for {1} until after, say, {2} is already in its queue. And write requests {3}, {4} may arrive while that seek is in progress. If the application is speedy enough, a track’s worth of data can be laid down in about a revolution and a half (or less, I think on average). This is about the same as with normal sequential data and probably twice as fast as with interleaving. If the application is NOT speedy enough, the disk system will run at application speed, or seek off to service a separate process.
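The “revolution and a half (or less)” figure is easy to check with a quick Monte Carlo sketch (my own, under the simplifying assumption that the seek lands at a uniformly random angular position on the track):

```python
import random

def expected_revs(trials=100_000, opportunistic=False, seed=1):
    """Average revolutions to lay down one track of queued writes
    after a seek lands at a uniformly random angular position.

    opportunistic=False: writing must begin at the track's first sector,
    so we wait for it to come around, then write one full revolution.
    opportunistic=True: the whole track is already queued, so the
    controller can start with whatever sector slot comes up next.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pos = rng.random()          # fraction of a revolution past sector 0
        wait = 0.0 if opportunistic else 1.0 - pos
        total += wait + 1.0         # rotational latency + one revolution of writing
    return total / trials

print(expected_revs())                    # ≈ 1.5 revolutions
print(expected_revs(opportunistic=True))  # exactly 1.0
```

The half-revolution average wait plus one revolution of writing gives the “about a revolution and a half”; a controller that can start writing mid-track (the “or less” case) gets it down to one.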
I’ve tried to explain this carefully, but obviously I’m not good at explanation. I will give up now; if you have follow-on questions I hope someone else will “pick up the torch.”
I’m sorry if this sounded snippy. But I honestly have zero idea what your point is. I did understand your tautology, but don’t know what relevance it has to the discussion. AFAIK, interleaving is seldom used with today’s high electronic speeds, but RaftPeople wanted a comparison.
If you have some relevant comment or question, of course we’ll be delighted to hear from you.
It may not wait that long for synchronous writes, either. The write function can return before the copy from user space to kernel space is complete.
But how does that work, you ask. What if the user overwrites the buffer with different data before the copy has finished? That would break the synchronous write semantics.
The answer is that the write function can mark the user pages as read-only. If the user tries to write to them, it triggers a page fault, and the kernel can use that as a signal to finish the copy before allowing the user to continue. If there’s no page fault, then the user just did something else in the meantime and we got asynchronous behavior with synchronous calls.
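For what it’s worth, here is a toy Python model of that trick (mine, purely illustrative: Python can’t trap real memory accesses, so the “page fault” is an explicit hook rather than actual page protection):

```python
class GuardedBuffer:
    """Toy model of the deferred-copy trick for "synchronous" writes.

    write() returns immediately, as if the kernel had merely marked the
    user's pages read-only. Any later attempt to modify the buffer first
    runs our stand-in for the page-fault handler, which completes the
    pending copy so the write's semantics are preserved. (A real kernel
    would use actual page protection; here the "fault" is a hook in
    __setitem__.)
    """

    def __init__(self, data):
        self._data = bytearray(data)
        self._pending_copy = None
        self._protected = False

    def write(self):                 # "synchronous" write: returns at once
        self._pending_copy = None    # copy deferred, not yet performed
        self._protected = True

    def _fault(self):                # stand-in for the page-fault handler
        if self._pending_copy is None:
            self._pending_copy = bytes(self._data)  # finish the copy now
        self._protected = False

    def __setitem__(self, i, value):
        if self._protected:
            self._fault()            # kernel copies before user modifies
        self._data[i] = value

    def kernel_copy(self):
        """What eventually reaches the disk: data as of write() time."""
        if self._pending_copy is None:
            self._pending_copy = bytes(self._data)
        return self._pending_copy

buf = GuardedBuffer(b"hello")
buf.write()               # returns immediately, no copy made yet
buf[0] = ord("H")         # overwrite triggers the "fault": copy happens first
print(buf.kernel_copy())  # b'hello' - synchronous semantics preserved
```

If the user never touches the buffer again, the copy happens lazily and we got asynchronous behavior for free, exactly as described above.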
I don’t know if Linux uses this trick, but it’s been done before in file IO and lots of similar situations (network writes, graphics rendering, etc.).
Everything I’ve found on-line says that interleaving was used in the 80’s because it resulted in the best overall performance for typical workloads.
It seems a bit misleading to call a higher-performing method “half-speed” even if the result for a subset of scenarios is worse performance.
Ok, let’s walk through the scenario you just posted using Ben’s method:
1 - Request to write #1 arrives
2 - Request to write #2 arrives
3 - Seek initiated, and because we are using Ben’s method the seek is to #2 because we are going to write in reverse and we are aware of both #1 and #2 requests (correct?)
4 - Request to write #3 arrives
5 - Request to write #4 arrives
6 - Seek to #4 is initiated (so we can write #4 then #3), but due to Ben’s method, that sector is almost a full revolution away because it’s behind #2 and #1 (e.g. #4, #3, #2, #1)
It seems like this would be slower than writing 1,2,3,4
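A toy calculation of step 6 (my own sketch; 17 sectors per track is just a typical MFM-era figure) shows the rotational cost being described:

```python
N = 17  # sectors per track, a typical MFM-era figure

def wait_sectors(from_slot, to_slot, n=N):
    """Rotational distance, in sector times, from just past `from_slot`
    to the start of `to_slot`."""
    return (to_slot - from_slot - 1) % n

# Forward layout: block i lives at slot i, ascending.
fwd = {1: 1, 2: 2, 3: 3, 4: 4}
# Ben's reverse layout: blocks run backwards, so the physical
# rotation order is #4, #3, #2, #1.
rev = {4: 0, 3: 1, 2: 2, 1: 3}

# Forward: we just wrote #1 then #2; next pending block is #3.
print(wait_sectors(fwd[2], fwd[3]))  # 0 sectors - it's the very next slot
# Reverse: we just wrote #2 then #1; next pending block is #4.
print(wait_sectors(rev[1], rev[4]))  # 13 sectors - nearly a full revolution
```

With forward layout the late-arriving #3 is the next slot under the head; with reverse layout #4 sits just behind the head, so it is nearly a full revolution away.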
Ok, let’s walk through the scenario you just posted using Ben’s method:
1 - Request to write #1 arrives
2 - Request to write #2 arrives
3 - Seek initiated, and because we are using Ben’s method the seek is to #2 because we are going to write in reverse and we are aware of both #1 and #2 requests (correct?)
4 - Request to write #3 arrives
Does this arrive before or after the seek completes? If before … no problem. If after, does it arrive before or after the rotational delay to block #2?
If before, you may be able to write #3 now. If after, then the seek landed at an “unfortunate” part of the track, something which can also happen with normal ordering.
… It seems like this would be slower than writing 1,2,3,4
It will be slower in many cases; equal or even faster in other cases. Detailed analysis is non-trivial.
But, again, this worrisome case need not occur at all. The controller, especially if “smart” enough to see or guess that sequential writes are arriving, can wait until several arrive before beginning any of them. (Please review the discussion of asynchronous writes, and note that even “synchronous” writes can be done asynchronously at the controller level.) Meanwhile, reads or non-sequential writes pending in the request queue are serviced while a large chunk of sequential writes accumulates.
If the queue is empty, i.e. there is no other work for the disk to do besides starting those writes, then it goes ahead and starts them. But in that idle-queue case the loss of a (single) revolution won’t matter; the disk had nothing else to do anyway.
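Here is a minimal sketch of that batching idea (my own toy model, not any real controller firmware): sequential writes accumulate into one chunk, other requests are serviced in the meantime, and an idle queue just starts the writes:

```python
class CoalescingController:
    """Toy write-behind controller that batches sequential writes."""

    def __init__(self, batch=4):
        self.batch = batch       # how many sequential writes to accumulate
        self.pending = []        # the chunk being accumulated
        self.issued = []         # what actually went to the platter

    def submit(self, req):
        kind, block = req
        if kind == "w" and (not self.pending or block == self.pending[-1] + 1):
            self.pending.append(block)      # extends the sequential run
            if len(self.pending) >= self.batch:
                self._flush()               # big chunk ready: write it
        else:
            self.issued.append(req)         # serviced while chunk accumulates

    def idle(self):
        if self.pending:         # queue empty, nothing else to do: just start
            self._flush()

    def _flush(self):
        self.issued.append(("w-chunk", tuple(self.pending)))
        self.pending = []

ctl = CoalescingController(batch=4)
ctl.submit(("w", 1))
ctl.submit(("w", 2))
ctl.submit(("r", 99))   # a read from another process, done while we wait
ctl.submit(("w", 3))
ctl.submit(("w", 4))    # fourth write completes the chunk
ctl.idle()
print(ctl.issued)       # [('r', 99), ('w-chunk', (1, 2, 3, 4))]
```

The read slips in while the chunk accumulates, and the four small writes reach the platter as one large sequential transfer, which is the case analyzed above.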
I don’t believe I have ever claimed that Ben’s method provides superior results in all cases, nor even that Acme “should” have adopted this method. But I hope we all agree by now that Ben’s method is very nifty, quite novel, and possibly has merit. (Ben’s intention was that the method be selectable on a per-cylinder basis. Obviously a special shifting-copy would be required to change that status on a cylinder that already had data written.)
I don’t work on disk controller software, so I’m not a good judge of what is or isn’t novel in that field, but I certainly understand the argument that laying it out backwards provides an advantage for some scenarios, and it doesn’t seem obvious (I didn’t figure it out).
For me, I enjoy understanding the pros and cons of things like this: you optimize for one thing and at the same time de-optimize for something else, which puts pressure on the various scenarios. I work on similar types of problems at work (which I really enjoy), but it’s distribution center software: optimizing picking, which involves location assignment, pick walk sequence, order batching, conveyor capacity, zone capacity, putwall capacity, etc. (lots of variables, which makes it fun and interesting).
Yes. I’m glad you agree Ben’s method is “nifty.” BTW, one argument for Ben I forgot to mention is that reads (which benefit) outnumber writes (potentially troublesome). I don’t remember (or never knew) what a typical ratio of disk reads to disk writes is, but clearly reads predominate, except among users eager to record things they’ll never actually care about in future!
I’ve had low financial motivation throughout my life; this may be one reason the cleverness of an invention appeals to me much more than its actual economic value. The inventions of which I’m proudest were never patented, while my one “million-dollar idea” seems so mundane and trivial I’m reluctant to discuss it when asked.