1950s computers: question #1

Every aspect of the 1950s computer environment was dramatically different from anything we see today. Not just the size. In 1957 I was a Customer Engineer on the IBM 704, 705 and 709 at Lockheed Burbank and JPL Pasadena. Lockheed had two 704s and one 705 in the same room. These were supported by 9 computer operators, 12 technicians, and a couple of senior guys who were available for consultation on tough problems. We worked three 8-hour rotating shifts for 24-hour coverage. The computers were rented for $350 an hour.

Each computer had a card reader, a card punch and a printer that was a modified 407 accounting machine. These peripherals were big oily things that required lots of preventive maintenance. Then there were the tape drives. Each drive was almost the size of a phone booth. The reels fed tape into vacuum columns, and the tape was moved over the R/W head by prolay units, solenoid-operated spinning wheels that could start and stop the tape in milliseconds. All of this generated oxide dust from the tape that had to be constantly removed.

The 704 was a 32-bit-accumulator (31 + sign) binary system. Accumulator operations were parallel; it had a 32-bit adder. The system at JPL had extended precision: a second accumulator (larger than a fridge) was attached to the main frame for 64-bit parallel data. If I remember correctly, it had a 13.5-microsecond add time.

The 705 was a variable-word-length BCD machine. The A–F codes were used for beginning of word, end of word, decimal point, etc. It was an easy system to program.

These systems required stable power, so the feed from the power company was attached to a motor-generator with a big flywheel. To turn the system on from a cold start, you pushed the button on the contactor and waited for the motor-generator to run up and stabilize. Then you turned on the computer power supply, which started the power-up sequence. The first thing up was the vacuum-tube (VT) filament voltages. These were arrayed in a balanced bridge, so if a filament was out you got a loud ding and the system shut down. Then you could apply filament voltage only and look for dark filaments. When the filaments were good, you could restart the power-up sequence, which took about 15 minutes if all was well.

Support personnel had a work area for scopes, equipment, and parts. I think we had 4 long work benches. Also part of the uniform, in addition to the three-piece suit, white shirt, and tie, was a tool kit housed in a leather briefcase with your name in gold.

A dropped bit would be an operator problem. The operator had a listing of each program. A dropped bit would stop execution with a parity error and the contents of all registers would be displayed on the control panel. The operator would read the program listing to find the missing bit, replace it and figure out the best way to continue execution.

Running the computer was a hands-on activity. You watched and listened to all of the components. There were lots of visual and audio clues to potential problems. A tape drive would reread a word if it got a read check. That made a sound like a pop. Standing in the middle of the computer floor, you could hear the read checks as pop, pop, pop here and there. The sounds would lead you to the drives with weak tubes, and you could change them on the fly. Most systems had 8 to 12 drives on line. The address of each drive was set by the operator with a dial on the drive door, and the operator would take a drive off line with the number switch. We had to do frequent visual checks to see which drives were actually on line and do quick fixes.

Running those computers was a lot like what I imagine driving a horse-drawn wagon to be. There would always be 3 or more people assisting when the system was running. There was always the issue of using non-IBM cards and tapes. The consent decree forbade us from telling the customer that his programs wouldn’t load or run because he was using Crosby tape or cheap cards. We could adjust the card reader and punch to accept the cheap cards, but we would get more card jams. A card jam in a punch that was running payroll checks is a nightmare: you have to recover every piece of every card and put them back together.

With tape it was a confrontation with the customer. He said he was getting read errors; I could only say that the tape drives were working properly. The cure was that he had to run diagnostics on his tape reels. That could take hours of computer time, which he paid for if the tapes did not pass. In the case I am thinking of, after 12 hours of testing the customer returned the Crosby tapes and went back to IBM.

There is really no comparison between those systems and today’s computers. IBM was service and reliability. We described our products as “cut the speed in half, double the price, paint it grey and put IBM on the side”.

Each?

Customer Engineer … did the $350 an hour include your salary?

Thanks for taking the time to do that great response.

As Wolfpup points out, the price was set by contract for each system. I remember the $350-per-hour number because it was part of the tape test issue. That was for each system. We were paid by IBM.

At JPL we supported an attempt by the Army to achieve orbital velocity with a Redstone missile. The input to the computer was position data that came in from Goldstone and Australia on a teletype. The teletype output was paper tape. That was fed into the computer room where it was listed. A data entry person typed it into a card punch. Those cards were given to another person at a verifier. The verified deck was fed into the computer.

The environment was more dangerous than it appeared. I was in the emergency room twice in my first year. The space behind the tape drives was very crowded, and one night I managed to slice open my forehead on the sharp edge of the plexiglass relay cover. When I staggered out from the tape drives, I thought the operator was going to pass out. My face was covered with blood that was dripping onto my white shirt. Shoulda been wearing one of those grey smocks with IBM on the back.

Yes, there are a number of “cute tricks” that squeezed extra data out of floating point.

I remember (vaguely) the Computational Mathematics course, which got into ring theory, and then the prof demonstrated a few bad algorithms where iterations of a loop, instead of squeezing out the rounding error, compounded it. Imagine each float as X + ΔX, where ΔX is the rounding and precision lost in the conversion to binary. If each calculation increases ΔX as a proportion of the number, you could end up with a bad algorithm.
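
A minimal sketch of that kind of compounding (the textbook example, not anything from the course): the integrals I_n = ∫₀¹ xⁿ·e^(x−1) dx satisfy I_n = 1 − n·I_{n−1}, so running the recurrence forward multiplies whatever rounding error is in I_0 by n! while the true values quietly shrink toward zero.

```python
import math

# Forward recurrence I_n = 1 - n*I_(n-1) with I_0 = 1 - 1/e.
# The true values shrink roughly like 1/(n+2), but each pass through the loop
# scales the accumulated rounding error by n, so by n = 20 the error dominates.
I = 1.0 - 1.0 / math.e            # I_0, correct to about 1 part in 1e16
for n in range(1, 21):
    I = 1.0 - n * I
    if n % 5 == 0:
        print(f"I_{n:2d} = {I: .6f}   (true value is roughly {1.0 / (n + 2):.6f})")
```

Run the same recurrence backward, I_{n−1} = (1 − I_n)/n from a rough guess at a large n, and the loop divides the error by n each step instead: that is the “squeezing out” case.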

(In the class they mentioned the first attempt to use a computer to calculate the size of the wing spar needed for a passenger-jet design, and the algorithm spit out “11 feet thick”.)

Numerical stability is an immense subject with numerous pitfalls. It’s actually surprising to me that FP math works as well as it does, even without special effort. Somehow it mostly works.

The A380 wing is about 11 feet thick at the root, but something tells me you’re thinking of a smaller craft :slight_smile: .

Numerical methods used to be a mandatory second-year CS subject when I went through, numerical analysis an optional third-year one, and advanced numerical analysis a fourth-year one.

We all thought it was a waste of time. Until we hit real life.

When I was in the supercomputer game about half of my client researchers were, in the end, performing numerical solutions of PDEs. The rest were doing lattice gauge QCD or numerical QED for quantum chemistry.

Numerical stability was a watchword for every single one of these. Sitting under every single one of these was BLAS in one form or another. The big data-parallel machine (the CM-5) had its own highly optimised version. Gaussian (aka The Chemistry Virus) was built directly over the standard distribution. The QCD guys split the difference.

Entire research careers disappeared into getting these codes solid. Really some of the unsung heroes of the modern computational sciences.

Par for the course these days. NVIDIA has cuBLAS (CUDA implementation of BLAS).

There are lots of things where rolling your own version is a fine idea. Optimized, stable matrix math isn’t one of them.

Also note the inflation-adjusted value. Taking 1955 as a midpoint of the 1950s, $350 then was equal to more than $4200 today. $4200 per wall clock hour for time on a mainframe seems reasonable. Assuming 70% productive utilization, the rest being needed for maintenance and other overhead, that’s just short of $90,000 per month, which is completely in line with the sorts of prices that customers were used to paying for IBM’s Big Iron. And the computing power those machines had totally pales in comparison to what’s sitting on my desk right now!

Of course, one of the most valued characteristics of operating systems of the time was efficiency, both in execution time and memory use. Today, we need massive amounts of memory and processing power just to shoulder the burden that Microsoft’s massively bloated crap imposes on the hardware!

Doesn’t IBM still support hexadecimal and decimal floating point (and, of course, binary as well)?

Huh! So they do. At least up to the Telum/z16 chip. Well, can’t weasel my way out of that one. It’s a genuine counterexample, implemented in hardware.

Uh, can I add “consumer-level hardware” to my caveats?

I was making about $175 a week, so $350 an hour looked like big bucks to me.

I think that’s mostly just because most applications require much less precision than the floating-point precision of most data types, so that even when error does grow, it still usually doesn’t grow big enough to be a problem.

Usually.

Back when I was working with a legacy Fortran stellar-simulation code, there was one graph that always came out jagged and noisy-looking, which my advisor said was due to quantum effects. But I traced through the code, and realized that the calculation behind that graph had a catastrophic loss of precision. I hand-made some higher-precision arithmetic routines and slotted them in, and suddenly that graph was smooth. Which was obviously wrong, because that’s not what the graph looks like.

The 701 certainly did calculate floating point. They already had machine-code libraries for floating point and would use them, but the programmers would be worried about programming errors, since the machine code could have a tedious interface.

They developed Speedcode for (well, at the same time as) the IBM 701. This was a high-level language; the idea arose from the difficulty of programming the IBM SSEC machine when Backus was hired to calculate astronomical positions in early 1950.[3] The Speedcoding system was an interpreter and focused on ease of use at the expense of system resources.

This meant that it ran like a programmable calculator: not powerful in terms of data and file I/O or function calls (e.g., difficult to build a library, ABI, or SDK), but you could get a tight loop running to get a calculation done.

The Wikipedia article says that a program in Speedcode would take 20 times longer to run than if it were coded directly in machine code. It was fast for the programmer to get a result, not fast in terms of FLOPS.

The next IBM, the 704, had the floating point in hardware. Keep in mind that such a thing would then use microcode, so the transcendental calculations (square root, sine, etc.) would be handed over to it, but it wouldn’t come back with an answer in the same time as an add: it would run a microcode routine that implemented the polynomial approximation in 5 or 10 steps. Still, that beats the 100 for machine code or the 2000 for the Speedcode interpreter doing it.
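
Not the 704’s actual circuitry, just a modern sketch of what “a polynomial approximation in a handful of steps” looks like: a truncated series for sin(x) on [−π/2, π/2], evaluated with Horner’s rule, costs four multiply-adds plus one final multiply.

```python
import math

# Illustration only, not the 704's routine: sin(x) ~= x*(1 - x^2/6 + x^4/120 - ...),
# evaluated with Horner's rule over x^2.
SIN_COEFFS = [1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040, 1.0 / 362880]

def sin_poly(x):
    x2 = x * x
    acc = SIN_COEFFS[-1]
    for c in reversed(SIN_COEFFS[:-1]):   # four multiply-add steps
        acc = acc * x2 + c
    return x * acc

for x in (0.1, 0.5, 1.0, 1.5):
    print(f"x={x}: poly={sin_poly(x):.9f}  math.sin={math.sin(x):.9f}")
```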

Just saying the Speedcode Wikipedia page has the answer: what they actually did with floating point on the IBM 701.

Yeah. That was me and my failing memory. I had a different product line in mind.

The capability of these machines was really pretty extraordinary. The lengths that coders went to to achieve remarkable results with very little are quite something. And the pace of advances similarly so.

Here is a link to the Speedcoding manual for the IBM 701. It had opcodes for square root, sine, arctan, exp, and ln in addition to the elementary arithmetic operators. The manual also has a detailed description of the 701 itself.

Speedcode required a card deck prepended to the program. I had assumed it was a rudimentary assembler, but an interpreter makes sense as well.

A side note, but I recall an ad for a piece of software that would turn your 386 into a 486 so your games played faster. A little parsing of the ad suggested it replaced the software floating-point library (on computers without math coprocessors). By reducing the precision of trig and other floating-point functions by over half, it cut the processing time per math instruction significantly. The logic being: on a 640x480 screen, how much floating-point precision is necessary for calculating screens and motion in games?

And there was a process-control program I helped debug once: it turned out the programmer had used a linear function to calculate a square root. Basically, for the range of values the program was dealing with, a line was sufficiently close to that part of the parabola that it was “good enough”, and on an 8088 it saved a lot of processing time.
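
Something in that spirit (hypothetical numbers, not the original process-control code): fit a straight line through the endpoints of the operating range and see how far it strays from the real square root.

```python
import math

# Assumed operating range for the measured value; the original program's range is unknown.
LO, HI = 100.0, 200.0
slope = (math.sqrt(HI) - math.sqrt(LO)) / (HI - LO)
intercept = math.sqrt(LO) - slope * LO

def sqrt_linear(x):
    return slope * x + intercept          # one multiply and one add: cheap on an 8088

worst = 0.0
for i in range(1001):
    x = LO + i * (HI - LO) / 1000
    worst = max(worst, abs(sqrt_linear(x) - math.sqrt(x)) / math.sqrt(x))
print(f"worst relative error over [{LO}, {HI}]: {worst:.2%}")    # about 1.5% here
```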

Every (analytic) function is linear at a small enough scale…

Also, cos(x) is 1 for small x, and e^x is 1+x.

Indeed. My adviser had another project on interval arithmetic, which computed the range of values for which the answer to a calculation was still valid, a range that of course shrinks due to the limited precision of most computations.

The Army Corps of Engineers did a lot of calculations to figure out where and how much of the Mississippi bed to dig to keep it from doing what it wants to do and changing course, which would be somewhat embarrassing for Baton Rouge or New Orleans. The project found that the results of the ACE calculations were more or less random numbers, with the valid interval basically zero.

I once made a half-hearted effort at writing an interval arithmetic library, but gave up when I realized the intervals expand very rapidly until they become useless. The trouble is that the interval tracking doesn’t really “know” if you have something well-behaved or not. So essentially it gives you the worst-case result, which can be very bad.

IMO, a Monte Carlo approach is better. Perturb your inputs within reasonable bounds and see what happens to the results. Are the perturbed results smooth or random-looking? Something like the Mandelbrot Set will be the latter.
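
A toy contrast between the two approaches (nothing like a real interval library): a bare-bones interval type applied to a computation that is mathematically a no-op triples its width on every pass, because it can’t “see” that the two occurrences of z are the same quantity, while perturbing the input Monte Carlo style shows the actual spread staying tiny.

```python
import random

# Toy interval type: every operation takes the worst case over the endpoints.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        ps = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(ps), max(ps))
    def width(self):
        return self.hi - self.lo

def chain(z, two):
    # Mathematically 2*z - z == z, but the interval version treats the two z's
    # as independent, so its width is multiplied by 3 on every iteration.
    for _ in range(20):
        z = two * z - z
    return z

# Interval view: a value known to within +/- 1e-6 "grows" an uncertainty of thousands.
iv = chain(Interval(1.0 - 1e-6, 1.0 + 1e-6), Interval(2.0, 2.0))
print(f"interval width after 20 steps: {iv.width():.3g}")        # about 7e3

# Monte Carlo view: perturb the input within the same +/- 1e-6 and watch the spread.
results = [chain(1.0 + random.uniform(-1e-6, 1e-6), 2.0) for _ in range(1000)]
print(f"Monte Carlo spread after 20 steps: {max(results) - min(results):.3g}")  # about 2e-6
```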

Chaos theory has similar origins:

Edward Lorenz was an early pioneer of the theory. His interest in chaos came about accidentally through his work on weather prediction in 1961.[82][13] Lorenz and his collaborator Ellen Fetter and Margaret Hamilton[83] were using a simple digital computer, a Royal McBee LGP-30, to run weather simulations. They wanted to see a sequence of data again, and to save time they started the simulation in the middle of its course. They did this by entering a printout of the data that corresponded to conditions in the middle of the original simulation. To their surprise, the weather the machine began to predict was completely different from the previous calculation. They tracked this down to the computer printout. The computer worked with 6-digit precision, but the printout rounded variables off to a 3-digit number, so a value like 0.506127 printed as 0.506. This difference is tiny, and the consensus at the time would have been that it should have no practical effect. However, Lorenz discovered that small changes in initial conditions produced large changes in long-term outcome.[84] Lorenz’s discovery, which gave its name to Lorenz attractors, showed that even detailed atmospheric modeling cannot, in general, make precise long-term weather predictions.
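
A quick sketch of that sensitivity, using the classic Lorenz equations with the standard textbook parameters rather than the LGP-30 weather model: two runs whose starting points differ only in the fourth decimal place track each other for a while and then go their separate ways.

```python
# Classic Lorenz system (sigma=10, rho=28, beta=8/3), integrated with a crude
# fixed-step Euler method; good enough to show the divergence.
def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)

a = (1.0, 1.0, 1.0)       # "full precision" starting point
b = (1.0001, 1.0, 1.0)    # the same point, rounded in the fourth decimal place

for step in range(1, 40001):
    a = lorenz_step(a)
    b = lorenz_step(b)
    if step % 10000 == 0:  # report at t = 10, 20, 30, 40
        print(f"t={step * 0.001:5.1f}  x_a={a[0]:8.3f}  x_b={b[0]:8.3f}  "
              f"|diff|={abs(a[0] - b[0]):.3f}")
```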

That happens, but also a lot of common calculations just manage to avoid the most serious issues. They’re linear, or use exponents close to 1, like sqrt() or quadratics. You mostly don’t get into trouble with these.

I’d guess that the most dangerous common operation is taking a numeric derivative. It hits the catastrophic cancellation problem, where you divide the difference between two close numbers by another small number. I.e., (f(x+h) - f(x))/h. If you aren’t careful, you can lose most of the bits in your result (or all of them). But you can fiddle with it until it works, and it’ll probably be ok.
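
A small demonstration of that trade-off, with f(x) = sin(x) so the exact derivative cos(x) is known: the truncation error shrinks as h does, but below roughly the square root of machine epsilon the subtraction f(x+h) − f(x) cancels away most of the significant bits.

```python
import math

# Forward-difference derivative of sin at x = 1; the exact answer is cos(1).
x = 1.0
exact = math.cos(x)
for exp in range(1, 17):
    h = 10.0 ** -exp
    approx = (math.sin(x + h) - math.sin(x)) / h
    print(f"h=1e-{exp:02d}  error={abs(approx - exact):.3e}")
```

The error bottoms out around h ≈ 1e-8 and then climbs back up as cancellation takes over; a central difference, (f(x+h) − f(x−h)) / 2h, buys a couple more digits, but the cancellation never really goes away.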