What is the oldest line of code still used within Windows?

And how do you even find that right order, without sorting?

Let’s say that I have a database consisting of “apple”, “butterfly”, “camel”, “dunkleosteus”, “eschatology”. And then, somewhere along the line, for some unanticipated reason, someone decides that it’s useful to sort those words by their third letter. How do you create the indices for the third-letter sort?

You scan the database one item at a time, look at the 3rd letter of each item, and create an ID:data pair in an index table, using that 3rd letter as the ID and the item itself as the data. In most applications, where the database item is larger and more complex, the data would be an object ID referring to the item rather than a copy of it (or might be an object ID anyway for any number of reasons). When the entry is added to the table, it lands in the correct order to begin with, because the table is a tree of physically non-contiguous nodes that can be navigated to find the location of a data item from an arbitrary ID string, whether or not that ID already exists in the table. The table behaves like memory that can store and return data at addresses of arbitrary length.
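To make that concrete, here is a minimal sketch in Python using the example words from the thread. A plain dict plus a walk of its keys in sorted order is only a stand-in for the B-tree the post describes; it just shows what the ID:data pairs look like.

```python
# Minimal sketch of the third-letter index described above. In a real
# database the values would usually be object IDs, not copies of the items.
words = ["apple", "butterfly", "camel", "dunkleosteus", "eschatology"]

index = {}
for word in words:
    key = word[2]                               # the 3rd letter is the ID
    index.setdefault(key, []).append(word)      # the item (or its ID) is the data

for key in sorted(index):                       # walk the index in ID order
    print(key, index[key])
# c ['eschatology']
# m ['camel']
# n ['dunkleosteus']
# p ['apple']
# t ['butterfly']
```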

OK, so I assign “apple” the ID of “p”, “butterfly” the ID of “t”, “camel” the ID of “m”, “dunkleosteus” the ID of “n”, and “eschatology” the ID of “c”. How does that put them in the right order?

If I’m understanding this correctly, you’re using an insertion sort, but pretending that it’s O(N), because you’re taking some of the overhead of insertion sort and offloading it into the data structure, so you have to pay that overhead every time you access the data, instead of just during storage.

A couple of things:

  • A sort via B-tree could certainly be thought of as a fast insertion sort (see the sketch after this list). Since a correctly implemented B-tree has insert/delete times of O(log N), a complete sort will be O(N log N).
  • log N is pretty close to constant for large N. So in a big data sense, it’s not entirely wrong to approximate the sort as O(N).
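Here is the promised sketch: a toy tree sort in Python. A plain, unbalanced binary search tree stands in for a real B-tree (which stays balanced, so its O(log N) insert holds even in the worst case, not just on average); the point is only that N ordered inserts followed by an in-order walk yield a sorted result.

```python
# Toy "tree sort": insert every key into a binary search tree, then read
# the keys back with an in-order traversal. Each insert is O(log N) on
# average, so sorting N items this way costs O(N log N) overall.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for word in ["apple", "butterfly", "camel", "dunkleosteus", "eschatology"]:
    root = insert(root, word[2])       # index on the 3rd letter, as above

print(in_order(root))                  # ['c', 'm', 'n', 'p', 't']
```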

Those entries are stored so they are optimally accessible by that 3rd-letter ID. In this case it’s alphabetical order for simplicity. The IDs are actually bit strings that happen to be small in this example; they’re stored logically in binary numeric order, or in higher-level orderings such as decimal numeric order if desired.

The work of an initial sort is effectively duplicated in keeping the B-tree in logical order, because you are creating a new index. From then on, insertions and changes to that index go in at the correct logical position. Otherwise you have to perform some kind of sort or merge operation to add items; this is often done with a temporary table that accumulates new data and is periodically sorted and merged into the main table, which requires a two-level search whenever the main table is not up to date. The initial creation of a table is similar in both cases, but after that a B-tree never has to repeat the whole process.
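For contrast, a rough sketch of the sort-and-merge alternative described above: new rows accumulate in a small delta table that is periodically sorted and merged into the already sorted main table. The table contents are made up, and heapq.merge is just a convenient way to express the merge of two sorted runs.

```python
import heapq

# Main table is already sorted by key; new rows pile up in a delta table.
main_table = [("c", "eschatology"), ("m", "camel"), ("p", "apple")]
delta      = [("t", "butterfly"), ("n", "dunkleosteus")]

# Periodic maintenance: sort the small delta, then merge two sorted runs.
delta.sort()
main_table = list(heapq.merge(main_table, delta))
print(main_table)

# Until that merge runs, a lookup has to check both tables -- the
# "two level search" mentioned above. A B-tree avoids this because
# every insert already lands in its correct logical position.
```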

Access times are superior; databases run non-stop with no need for table maintenance; stored size is often smaller than the raw data size thanks to compression of the indexed data; and they are far more scalable than other database storage schemes. The ability to handle data of arbitrary length is a natural outgrowth of the technique, allowing objects to grow new capabilities over time without the need to create new physical structures.

Exactly right. My thought is this - every piece that needs to work flawlessly - sorted tables, dozens of index files - is another way to go wrong. Many of the early PC support issues were files that did not save correctly or somehow became corrupt, power failures that corrupted files, device malfunctions, etc. The simplest technique was the most reliable. So, for example, file headers were simply written to fill the next available entry in the directory structure, and file contents filled the next available space on the disk. (There was/is even a simple task you could run to optimize file content positioning.)

Early PC programs evolved with the arrogant but logical assumption that they were the only task running on the processor. If it took a second to re-sort the folder contents - just sort them on the fly by whatever is needed. If I want to filter the contents of the hard drive for th*.jpg, then it gets done on the fly. It’s not competing with the guy reformatting a book for printing in the next office, or the other fellow doing report generation from a database, or 30 users in the CompSci class compiling their next assignment. The computer is dedicated to me. Working in microseconds, it has plenty of time to do what it wants.

(Task Manager, for example, can be re-sorting the list of running programs every second to put the most active process on top. That’s easier done from the raw data than updating and writing an index file and then reading it back every second. It essentially does the same thing - only, the index is re-created “on the fly” for the GUI in memory, and forgotten when you close the program. What’s the difference?)
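A rough sketch of that on-the-fly approach, with made-up numbers standing in for whatever Task Manager actually samples on each refresh:

```python
# Made-up snapshot of (process name, CPU %) pairs; a real tool would
# re-sample these from the OS on every refresh tick.
snapshot = [("notepad", 0.1), ("compiler", 42.0), ("browser", 7.3)]

# Rebuild the "index" in memory each refresh: sort by CPU, descending.
for name, cpu in sorted(snapshot, key=lambda row: row[1], reverse=True):
    print(f"{name:10s} {cpu:5.1f}%")

# Nothing is written to disk; the ordering is recreated every second
# and simply forgotten when the program closes.
```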

I would also suggest that there may be elements of the clock program that are very old. Does Windows still calculate from 1/1/1980?

Isn’t it the number of 100 ns intervals elapsed since 12:00 AM January 1, 1601?
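That is the Windows FILETIME convention: a 64-bit count of 100 ns intervals since January 1, 1601. A quick sketch of how such a tick count relates to the more familiar Unix epoch (only the 1601 origin and the tick size are assumed):

```python
from datetime import datetime

# Seconds between the 1601-01-01 origin and the Unix epoch (1970-01-01).
EPOCH_GAP = (datetime(1970, 1, 1) - datetime(1601, 1, 1)).total_seconds()
print(EPOCH_GAP)                       # 11644473600.0

def ticks_to_unix(ticks: int) -> float:
    """Convert 100 ns intervals since 1601 to seconds since 1970."""
    return ticks / 10_000_000 - EPOCH_GAP

print(ticks_to_unix(132_000_000_000_000_000))   # a moment in 2019
```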

IIRC, the old BIOS thing counted from 1/1/1980, but each field of the date (two digits for the year, of course) and time was stored separately, and in BCD! This had nothing to do with the MS-Windows internal mechanism.
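For anyone who hasn’t met BCD: each decimal digit gets its own four bits, so a register reading 0x59 means 59, not 89. A tiny decoding sketch (the register values are invented for illustration):

```python
def bcd_to_int(b: int) -> int:
    """Decode one packed-BCD byte: high nibble is tens, low nibble is ones."""
    return (b >> 4) * 10 + (b & 0x0F)

print(bcd_to_int(0x59))           # 59, e.g. a minutes register
print(1900 + bcd_to_int(0x84))    # 1984, a two-digit BCD year plus assumed century
```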

Google is being particularly helpful in leading me to sites that tell me all about setting the OS time and date and not the internal representation regardless of quotes and such.

The start is generally termed the epoch.

Useful list:

I find this thread fascinating.

To the extent I can understand it.

Thanks, everyone.

First I’ve heard of this. Any logical reason? 1600, like 2000, was a leap year, while 1700, 1800 and 1900 were not, which complicates the math. More likely, we use the Windows calculation, but will have a Y2K-like problem in Y21C.
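The leap-year rule in question is the standard Gregorian one: every fourth year, except century years, except centuries divisible by 400. In code form, for reference:

```python
def is_leap(year: int) -> bool:
    """Gregorian rule: divisible by 4, and centuries only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print([y for y in (1600, 1700, 1800, 1900, 2000) if is_leap(y)])   # [1600, 2000]
```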

I was thinking of a novel, I think by Vernor Vinge, where they mention the oddity that, centuries in the future, calendars in the computer age count not from the birth of electronic computers but from 1/1/1980 (thanks to Microsoft MS-DOS).

I can only guess why 1601 (perhaps the real, official reason is documented somewhere?), but it is not any kind of Y2K or Y2.1K problem: the value is a (possibly signed) 64-bit integer, so you are good for a few tens of thousands of years.
ETA: clicking on the link inside @Francis_Vaughan’s link yields

but I have no idea what “math come[s] out nicely” means; that’s a strange thing to say. Maybe they saved a single addition operation somewhere? Note that the code presumably has to deal with different calendars and time zones and leap seconds and whatnot anyway.
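A quick back-of-the-envelope check on the “few tens of thousands of years” claim above, assuming 100 ns ticks in a signed 64-bit field:

```python
MAX_SIGNED_64 = 2**63 - 1
TICKS_PER_SECOND = 10_000_000          # one tick = 100 ns
SECONDS_PER_YEAR = 365.2425 * 86400    # average Gregorian year

years = MAX_SIGNED_64 / TICKS_PER_SECOND / SECONDS_PER_YEAR
print(f"{years:,.0f} years of range")  # roughly 29,000 years
```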

No, it is idiotic. They were probably re-purposing existing code and didn’t want the additional work of making it work correctly.

I have worked with a system that uses 100 ns ticks since the real Gregorian epoch in October 1582. The less-than-amusing problem with this is that, whilst it fits inside a 64-bit value easily, it won’t fit inside a JavaScript number. So neither will the NT clock. No wonder MS hate JavaScript.
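The underlying limit is that a JavaScript number is an IEEE 754 double, which only represents integers exactly up to 2^53. At 100 ns per tick that ceiling is crossed after less than 30 years, so a tick count dating from 1582 (or 1601) has long since outgrown it. Rough numbers:

```python
MAX_EXACT_INT = 2**53                  # largest integer a double stores exactly
TICKS_PER_SECOND = 10_000_000          # 100 ns ticks
SECONDS_PER_YEAR = 365.2425 * 86400

print(MAX_EXACT_INT / TICKS_PER_SECOND / SECONDS_PER_YEAR)   # ~28.5 years
# Ticks from 1601 to the 2020s come to roughly 1.3e17, far beyond
# 2**53 (about 9.0e15), so the low-order bits get rounded away.
```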

Same here.

I used to be really into computers in the mid-80s. As a matter of fact, I was one of only a couple of kids in the class to have one, a present from my parents on my twelfth birthday. I used to do some extremely rudimentary but fun programming. Then, I lost interest and moved on to other hobbies.

Nowadays, I really don’t care much for computers apart from day-to-day use, but for some reason, I find stories of the early days when it all started fascinating.