Mr. Svin's stupid computer questions thread

(Your question seems directed at Digital S., but I’ll sneak in an answer anyway.)

The computer’s processor, a.k.a. the CPU, is the thing that interprets program instructions and makes everything on a computer happen. In your own machine the processor is probably an Intel Pentium of some sort. (You might even have two of them, working in tandem.) In other machines it’s a PowerPC, or a Sparc, or a MIPS, or something else. But all of them work the same way at a low level: they all read bit patterns from memory, execute the instructions that those bit patterns encode, and then go back for more. That’s what a processor spends all its time doing, from the moment the power comes on to the moment it goes off.

CPU instructions are very primitive compared to the operations of a high-level programming language, such as JavaScript — or C or C++ or Visual BASIC, the languages that most commercial Windows applications are probably written in. Instead, a CPU’s instruction set is more like the language of a programmable calculator. You have a small and fixed set of registers; you can copy small groups of bytes between those registers and memory; and you can perform operations on the registers’ values — operations like basic arithmetic, bit twiddling, bit shifting, comparisons, branching, and so on. Everything a computer ever does is ultimately built up from these fundamental instructions.

… Instructions which, like everything else in a computer, are represented by sequences of bytes. Bytes in files, and bytes in memory, and bytes transmitted over the network. The processor, however, can execute them only from the computer’s memory. Hence, though a program like Word is kept around in a file, that file (or at least a portion of it) must first be copied into memory before it can be run.

Yep, zeros and ones. That’s all there is to the damn things.

By the processor. At its most basic level, a computer program is a big list of instructions. The type of chip (architecture) you have determines which instructions you are able to use (instruction set). This is part of the reason that a program for a Mac just won’t run on a PC. Macs use PPC chips, which have a completely different set of instructions than the x86 chips made by Intel and AMD and used in Windows machines.

So, we’ve got this big list of “instructions.” Every instruction set includes some obvious ones, like add, subtract, load, store, shift, etc. And, technically, that’s all you need to have. But there are also lots of esoteric instructions that can do a bunch of stuff all at once. If your program needs to do all that stuff at once, then those are a big time saver. But let’s just stick with add, subtract, load, and store. (In fact, you don’t even need subtract; you can just add negative numbers.)
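To make that last parenthetical concrete, here is a rough Python sketch of subtracting by adding. It assumes 8-bit registers and two's-complement negatives, neither of which is specified above, and the names `BITS`, `negate` and `subtract_by_adding` are made up for the example.

```python
BITS = 8                 # assumed register width for this sketch
MASK = (1 << BITS) - 1   # 0b11111111, keeps results inside 8 bits

def negate(x):
    """Two's-complement negative: flip every bit, then add one."""
    return ((x ^ MASK) + 1) & MASK

def subtract_by_adding(a, b):
    """Compute a - b using only addition (plus the negation above)."""
    return (a + negate(b)) & MASK

print(subtract_by_adding(9, 4))   # 5
print(format(negate(4), "08b"))   # 11111100, the 8-bit pattern for -4
```

The final `& MASK` just throws away the carry out of the top bit, which is what a fixed-width register would do anyway.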

The other thing you need for a processor besides instructions is some place to store your calculations. A scratch pad. These are called registers, and are generally denoted r0, r1, r2… (computers start counting at 0).

Consider a Really Simple Computer. It has only four instructions, and they can be used this way:

add r0, r1, r2 means take the values in registers 0 and 1, add them, and put the result in register 2
subtract r0, r1, r2 means r2 = r0 - r1
load r0, r1 means there’s an address in r0. Look at that address in memory and put whatever data is there into r1
store r0, r1 means the opposite. Put whatever data is in r1 into the spot in memory addressed by r0.

Now, with these four instructions and (let’s say) 4 registers, r0…r3, we can write a nifty little program.

load r0, r1 // gets a value from memory
add r1, r1, r1 // doubles it
store r0, r1 // puts it back

yay! We’ve just doubled a certain number (whatever was there when we started). Ok, it doesn’t do much, but it does do something!
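If you want to watch the Really Simple Computer run, here is a minimal Python sketch of it. The tuple format, the `run` function, and the choice of address 5 holding 21 are all my own assumptions for illustration; only the four instructions and their meanings come from the description above.

```python
def run(program, registers, memory):
    """Execute a list of (op, a, b, c) tuples on the Really Simple Computer."""
    for op, a, b, c in program:
        if op == "add":          # add ra, rb, rc  ->  rc = ra + rb
            registers[c] = registers[a] + registers[b]
        elif op == "subtract":   # subtract ra, rb, rc  ->  rc = ra - rb
            registers[c] = registers[a] - registers[b]
        elif op == "load":       # load ra, rb  ->  rb = memory[ra]
            registers[b] = memory[registers[a]]
        elif op == "store":      # store ra, rb  ->  memory[ra] = rb
            memory[registers[a]] = registers[b]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers, memory

# The three-line program from above; unused operand slots are None.
double_it = [
    ("load", 0, 1, None),
    ("add", 1, 1, 1),
    ("store", 0, 1, None),
]

registers = [5, 0, 0, 0]               # r0 holds the address 5 (arbitrary)
memory = [0, 0, 0, 0, 0, 21, 0, 0]     # the value 21 sits at address 5
run(double_it, registers, memory)
print(memory[5])                       # 42
```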

Now look at how this little program would be stored in a computer. Everything is encoded as 0s and 1s, so we have to figure out an encoding for this, too:
load = 00, store = 01, add = 10, subtract = 11
r0 = 00, r1 = 01, r2 = 10, r3 = 11

Now our program looks like:
00000100 // padded with 00 at the end so that all operations are 8 bits long
10010101
01000100

Except there aren’t any line breaks on a hard disk
000001001001010101000100

All just 0s and 1s.
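Here is a small Python sketch of that encoding step, using the opcode and register tables given above. The `assemble` function and its text-parsing approach are just one hypothetical way to do it.

```python
OPCODES = {"load": "00", "store": "01", "add": "10", "subtract": "11"}
REGISTERS = {"r0": "00", "r1": "01", "r2": "10", "r3": "11"}

def assemble(line):
    """Turn one line such as 'add r1, r1, r1' into an 8-character bit string."""
    op, _, operands = line.partition(" ")
    bits = OPCODES[op] + "".join(REGISTERS[r.strip()] for r in operands.split(","))
    return bits.ljust(8, "0")    # two-operand instructions get padded with 00

program = ["load r0, r1", "add r1, r1, r1", "store r0, r1"]
encoded = [assemble(line) for line in program]
print(encoded)            # ['00000100', '10010101', '01000100']
print("".join(encoded))   # 000001001001010101000100
```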

The processor is designed, then, to take that stream of data in 8-bit chunks (modern processors use 32-bit or 64-bit chunks) and do what is meant to be done.
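Going the other direction, this Python sketch chops the bit stream into 8-bit chunks and works out what each one means, which is roughly the "figure out what is meant" half of the processor's job. The `decode` function is hypothetical; a real processor does this in circuitry, not by building strings.

```python
OPS = {"00": "load", "01": "store", "10": "add", "11": "subtract"}

def decode(stream):
    """Split a bit string into 8-bit chunks and name each instruction."""
    decoded = []
    for i in range(0, len(stream), 8):
        chunk = stream[i:i + 8]
        op = OPS[chunk[0:2]]
        # The three 2-bit fields after the opcode are register numbers.
        regs = [int(chunk[j:j + 2], 2) for j in (2, 4, 6)]
        if op in ("load", "store"):
            regs = regs[:2]      # the last field is only padding
        decoded.append(op + " " + ", ".join(f"r{r}" for r in regs))
    return decoded

print(decode("000001001001010101000100"))
# ['load r0, r1', 'add r1, r1, r1', 'store r0, r1']
```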

I think at this point we’ve covered day one+ of CompSci 101. Please see your T.A. regarding any further questions.

Topic for tomorrow: What differentiates a “High level” programming language from a “Low Level” programming language? Plus: Northbridge vs. Southbridge; how NVidia has leveraged its GPU know-how into motherboard money-making.

See me during office hours.

Then there’s the CPU itself. It’s a complex piece of circuitry that is designed to do specific things to data as long as the data is of a recognizable type (i.e., consists of instructions that it “knows”). Now, we think of programs (or applications, if you prefer) as instructions, but so are documents (which are interpreted by the applications), and so are operating systems (which interpret applications), and all that is in software.

But CPUs are essentially instructions, too, even though they exist in hardware. Those circuits do certain things under certain specified circumstances. The ones and zeros of operating systems and all that follows are sent into those circuits as on-off sequences of electricity. Because those sequences match patterns the CPU is hardwired to interpret, it switches other circuits on or off in sequences that create a flow of ones and zeros and generate a result. Then, when appropriate, it responds to that result, until the sequence reaches a “done” stage.

All of those operations can be thought of as a set of instructions that could be written down as a program, and that’s exactly what a certain type of specialized program is: an emulator is a program that does in software what some other kind of CPU does in hardware.

Thus, you can emulate a Macintosh computer’s CPU on your PC if you install the emulator program, and then you can execute (run) the instructions of a Macintosh operating system on your PC.

Real emulators fake not only the behavior of the CPU but also reference something (usually a file) that they treat as a hard disk, and they fake the behavior of other chips like video card chips and sound card chips so as to mimic the computer they’re trying to pretend to be. An emulator can be a simple one that just lets you run games and word processors of another operating system (for example, vMac will let your PC pretend to be a Mac Plus but with no printing or networking or external devices) or it can be a more complex one that does a good job of pretending to be the other computer in rich detail (on the PC the most impressive emulator is probably PearPC, which will let you run MacOS X and go online with Mac programs, emulating a Macintosh’s network ports and converting the instructions so as to use your PC’s networking hardware in real life and passing the results back to the imitation Mac environment).

Because hardware is faster, emulators are slower than what you’d have if you had the actual machine you’re pretending to be, unless the machine you’re pretending to be is older and/or slower than the machine you’re running the emulator on.

Yes, I believe you’ve got it. Now, before I say anything else, I feel the need to point out that when I said:

I was slightly inaccurate. The problem is the “4GB”. First of all, 2^32 bits is 4Gb (gigabits), which is only 512MB (megabytes), since there are 8 bits in a byte. Second, in this case, I was talking about disk space. A 32-bit address space gives you the ability to access 2^32 unique clusters. Each cluster, as described earlier and as I believe you understand, is a group of bytes, generally ranging from 4K to 16K (although it could be more or less). If you have clusters of 64K, you have the ability to access a total of 2^32 × 64K = 2^48 bytes, or 262,144GB (256TB). I humbly admit my mistake and hope I’ve rectified it.

Now, you don’t actually get the full 262,144GB to store files in the above example. You also need space for the FAT and some other data. For a really detailed accounting, check this page. I’d also like to point out that there’s a correlation between cluster size and “wasted” disk space. It turns out that most files on a computer system are relatively small; if you store a small file in a big cluster, you end up wasting space. If you happen to feel like doing some academic reading, I’d suggest a classic paper on file systems: A Fast File System for UNIX. If you can wade through that, you’ll fully grasp the basics of UNIX file systems.
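For anyone who wants to check the bits-versus-bytes arithmetic and see the wasted-space effect in numbers, here is a small Python sketch. The `slack` function and the 1,000-byte example file are hypothetical, and it ignores the FAT and other overhead entirely.

```python
KB = 1024

# 2**32 bits expressed in megabytes (8 bits per byte, 2**20 bytes per MB).
print(2**32 // 8 // 2**20)       # 512

def slack(file_size, cluster_size):
    """Bytes wasted when a file is stored in whole clusters."""
    clusters = (file_size + cluster_size - 1) // cluster_size   # round up
    return clusters * cluster_size - file_size

# A hypothetical 1,000-byte file under two cluster sizes.
print(slack(1000, 4 * KB))       # 3096
print(slack(1000, 64 * KB))      # 64536
```

Same file, same contents, but the bigger cluster leaves far more of its space unused.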

Okay, I’m just going to back-track a bit here, now, if that’s okay, and continue asking my stupid questions as I re-read the thread. Starting here, originally posted by Digital Stimulus:

I think I understand some of this, but just for clarification –

Can you give me a couple of concrete examples of what you mean with “system data?” If I run REGEDIT on Windows XP, for example, a window pops up with a list of folders entitled:

HKEY_CLASSES_ROOT
HKEY_CURRENT_USER
HKEY_LOCAL_MACHINE
HKEY_USERS
HKEY_CURRENT_CONFIG

Now, this must be the database that contains all the information regarding how my particular computer and its various programs are configured. If I open the first folder, I discover hundreds and hundreds of subfolders labelled stuff like *, .323, .386, 3g4, and so forth. Under * are two other folders, one entitled “OpenWithList” and the other entitled “shellex.” Can you explain a little more what it is, exactly, that I’m looking at? Or would I need a Ph.D. in Computer Science to understand it?

Why is it an advantage to know exactly where my system data is stored?

Why are registry entries called “keys?”

What does the term “root-level” mean?

Manny:

I was going to ask about .dll files. I now strongly suspect that .dll = dynamic link library.

So…what the heck’s a dynamic link library file?

Ah, yes, to boot or not to boot, that is the question. Whether 'tis nobler in the mind to suffer the ones and zeros of outrageous binary, or, to take arms against a sea of data and, by opposing, so end them.

But, before I continue waxing poetic about booting, maybe somebody could explain to me exactly what the phrase “to boot” actually means. I know I must sound like an idiot, but I’ve been fooling around with computers for ten years or more and I still have no idea what’s up with all this booting that seems to be going on around me.

I’m using “system data” to refer to any information the computer system uses. Sorry for being vague, but I think that it’s such an expansive term that the vagueness is warranted (and necessary). Now, as to why it’s advantageous. This gets into a debate about centralized vs. decentralized systems. If I, as a programmer, know exactly where information is on your computer, I save a lot of work by not having to write code to determine where things are. Generally, at least in my view, it makes things easier, but at the cost of robustness.

“Key” is just a generic computer science term for an “identifier”. In a hashtable, the “key” is used to access the associated “value”. In a database, the “key” is used to locate a particular “record” or “entry” in a table. “Root” is UNIX-speak for an “Administrator” on Windows systems; it is the conventional name given to the “super-user” that has privileges to all the system resources.
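A quick Python sketch of the key/value idea, using a dict (Python's built-in hash table). The entries are invented for illustration and are not real registry data.

```python
# Each key identifies exactly one value.
settings = {
    "wallpaper": "clouds.bmp",
    "screen_width": 1024,
}

print(settings["wallpaper"])       # clouds.bmp
settings["screen_width"] = 1280    # same key, new value
print("volume" in settings)        # False: no entry under that key
```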

“Boot” is short for “boot-strap”; this is the process of incrementally loading and processing (in stages) enough information to have a fully functioning system.

A library is a set of commonly used functions packaged together in one place. For example, the standard C math library (the one you get at through math.h) contains implementations of lots of math functions. Rather than every programmer duplicating the work to calculate a cosine, they all just use the standard implementation. However, such libraries still have to be compiled into the program and take up space. If my library to do something cool takes up 10 MB of space, and I have 100 programs that want to do that cool thing, there will be 100 copies of the library inside all those programs, taking up about one Gig of space.

A dynamic link library is a shared library (would be called a shared object in *nix). Basically, rather than compiling the library into each separate program, you compile it once, put it in some standard place, and all programs that need it know where to find it. There are (I’m guessing) some tradeoffs to be made here. You save disk space, but things probably run a little slower. Someone else may be able to explain that, because we’ve reached the edge of my knowledge.
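To see a program borrow a function from a shared library at run time, here is a hedged Python sketch using the standard `ctypes` module. It assumes a Unix-like machine where the C math library can be found under the conventional name "m"; on Windows or a stripped-down system the lookup may fail, in which case the hypothetical `shared_cos` helper simply returns None.

```python
import ctypes
import ctypes.util

def shared_cos(x):
    """Call cos() from the system's shared C math library, if it can be found."""
    name = ctypes.util.find_library("m")   # may be None if nothing is found
    if name is None:
        return None
    try:
        libm = ctypes.CDLL(name)           # load the shared library at run time
        libm.cos.argtypes = [ctypes.c_double]
        libm.cos.restype = ctypes.c_double
        return libm.cos(x)
    except (OSError, AttributeError):
        return None                        # library or symbol not available

print(shared_cos(0.0))
```

Nothing from the math library was compiled into this script; the code for cos lives in one shared file that any program on the machine can load.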

Another reason to use .dlls is that they let you release an implementation of something without revealing how you did it. A library handed out as source code is in clear text: any programmer can look at it and figure out how it works. A dll is compiled code, so it is much harder to just look at it and figure out what it’s doing.

Digital Stimulus:

To expand on that, it comes from the phrase/notion “lifting one’s self by one’s own bootstraps”, a feat that’s impossible in real life, just as a perpetual-motion machine is impossible. We use it metaphorically to refer to things like a totally impoverished homeless person making it to millionaire without any help from others.

It’s sort of an old in-joke among computer programmers — between the time you hit the on-button and the time that you’re staring at icons and Desktop, your computer starts off with only the extremely limited hard-wired programming that’s on the CPU itself and ends up with a very rich and complicated hierarchy of instructions that it is implementing or is able to implement as soon as you issue a command. How does it get from there to here?

The very first thing it does when the power comes on is send signals down certain pathways in search of a physical disk. If it finds one, many computers look next in a special hardware chip containing instructions that tell it which of several possible disks or files to try to boot from. A PC has something called a BIOS (basic input/output system), which can be edited to tell the computer to boot from C or boot from D or boot from a CDROM or whatever; a Macintosh has something called PRAM (parameter RAM), which can be set with a Control Panel to tell the computer to boot from the MacOS X system that’s on Susie’s Hard Drive or to boot from the MacOS 9 system that’s on Disk II.

With some computers, if the boot volume specified can’t be found or the files it needs are messed up when it tries to use them, it polls other drives until it finds something it can boot from.

Either way, the computer’s next step is to try reading information out of those files, and that information consists of instructions: how to understand the rest of the hierarchy of files and folders, how to draw a screen, how to understand the electrical signals coming from the ethernet port, the USB port, and other places; how to draw a window, display text using fonts, how to spray pixels up on a screen at a certain bit depth, resolution, and refresh rate; the computer may know from pretty early on some really basic things about how to understand signals from the keyboard (and, on some machines, the mouse), but more info about keyboard and mouse signals gets loaded.

Then drivers get loaded that tell the computer how to deal with other devices that are plugged in or added on that are apart from the bare-bones hardware: printers, scanners, 3rd-party drives like DVD burners and Zip-disk drives, MIDI ports, and so on.

Each layer of instruction makes it possible for the computer to make sense out of the next layer, so it builds upon what it “knows” to decipher the next “lesson” until finally it’s ready for you to be able to use it.

By itself, back when you first turned it on, your PC doesn’t know how to “be” Windows. In fact, if you install Linux instead, the very same CPU loads up a totally different set of instructions. The very limited nature of the instructions that are hardwired to your processor makes the processor flexible — it can do a very wide variety of different things if given the instructions rather than being a one-trick pony. Dell or Gateway or Apple could, hypothetically, put the operations of the operating system and a double-handful of applications right there on the chip, and then you wouldn’t need an operating system or programs in software, but then you’d never be able to upgrade or add anything.

Anyhow, this process of starting off with just the simplest of bare-bones information about tasks it can do, and loading more instructions that tell it how to read in and make use of more instructions and so on was thought to be akin to lifting one’s self by one’s own bootstraps, and that’s why we speak of “booting” up the computer.