Why was the stored program concept developed?

No-reboot servicing is bog-standard stuff and has been for a long, long time. There are hard cases where it can’t be used, but it’s increasingly the default way to do things, with reboot-required servicing being the second choice.

It’s definitely for advanced users only.

There’s another kind of hybrid version for embedded systems. These are often Harvard architectures, but with the code residing in flash. It’s writable, but slow; you wouldn’t want to do it within an inner loop or anything.

But it’s often useful to have a bootloader that can write a new firmware image to the device. On boot, it checks whether a new image is available and, if so, writes it to flash (ideally, you keep two copies around, so that you can jump back to the old one if something fails).

It’s self-modifying in that you’re definitely writing code that you then go on to execute from the same memory space, but probably most people wouldn’t think of it that way.
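
To make that concrete, here’s a minimal C sketch of the boot-time check. The flash_* calls, the slot addresses, the header layout, and the magic number are all made up for illustration; they stand in for whatever HAL and image format a real device would use:

```c
/* Hypothetical dual-image layout: SLOT_A runs, SLOT_B stages updates. */
#include <stdbool.h>
#include <stdint.h>

#define SLOT_A      0x08008000u  /* active image (made-up address) */
#define SLOT_B      0x08040000u  /* staging area for a newly downloaded image */
#define IMAGE_MAGIC 0xB00710ADu

typedef struct {
    uint32_t magic;        /* marks a valid image header */
    uint32_t length;       /* image body size in bytes */
    uint32_t crc32;        /* integrity check over the body */
    uint32_t entry_point;  /* where to jump after copying */
} image_header_t;

/* Assumed HAL calls: placeholders, not any real vendor's API. */
extern bool     flash_erase(uint32_t addr, uint32_t len);
extern bool     flash_write(uint32_t dst, const void *src, uint32_t len);
extern uint32_t crc32_region(uint32_t addr, uint32_t len);

static void jump_to(uint32_t entry)
{
    void (*app)(void) = (void (*)(void))entry;
    app();  /* never returns */
}

void bootloader_main(void)
{
    const image_header_t *staged = (const image_header_t *)SLOT_B;

    if (staged->magic == IMAGE_MAGIC &&
        crc32_region(SLOT_B + sizeof *staged, staged->length) == staged->crc32) {
        /* A valid new firmware is staged: copy it over the active slot.
         * Because SLOT_B still holds a complete image, a failed or
         * interrupted copy can be retried (or fallen back from) on the
         * next boot. */
        uint32_t total = staged->length + sizeof *staged;
        flash_erase(SLOT_A, total);
        flash_write(SLOT_A, (const void *)SLOT_B, total);
    }

    jump_to(((const image_header_t *)SLOT_A)->entry_point);
}
```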

I’ll take a shot at it and then read the thread:

I assume Von Neumann designed the single-bus computer primarily to reduce the number of components in the system.

When I was working for Sun, I had to reboot my workstation maybe once every few years, and the O/S was definitely patched during that time. Our bank and Wall Street customers would not be happy with computers going down for rebooting during trading. So, nothing new.

Would load-and-go software also count in this view? I was thinking of software which modifies itself without any help from the outside.

We’re getting into “are tacos sandwiches?” territory here. Even the patching stuff is sort of an edge case… after all, any time you load a program, the OS reads it from storage and writes into memory. Taken as a whole, an OS and all the programs under it are constantly using self-modifying code, and couldn’t work at all with a Harvard architecture. But I think patching just barely slips in because you’re altering the behavior of an existing piece of code into something else.

Which might depend on your definition of piece of code. The O/S is patching itself, so if you count the O/S as a single piece of code, it qualifies. However if the code that does the patching is not the same as the code that gets patched, it isn’t.
I don’t know the details - this wasn’t a thing when I taught operating systems.
I can hear Eckert and Mauchly say “if only we had such problems.”

I believe you have a chicken-and-egg problem. That’s an advantage of the architecture, but a single-bus architecture was not necessary. The NSC COP420 was a Harvard processor with indexed addressing, and the Analog Devices SHARC has that and more.


Please remind me of where I said anything about that level of architecture. I was interested in the stored program concept, which is independent of architecture, pretty much.

I thought the Harvard architecture couldn’t do stored programs?

They couldn’t spare a row of holes along one edge for indexing the cards sequentially? For any not-too-huge number of cards you could do a binary sort rather quickly; see Edge Notched Card.

He was my oldest brother (and older than me by a long shot), and this was in the early days of doing things like this. Not everyone had computers, and few had experience with them. You learn the hard way, and future people learn from those hard-earned lessons.

He had two bachelor’s degrees, two master’s, and a PhD, and was doing some tests at SLAC (the Stanford Linear Accelerator Center) at the time. He was not a dumb person. But sometimes the simple solutions slip by.

Heck… maybe they did that and it was still a shitty process to put back together. Having been up for three days getting it all together, I can see how it would be upsetting to spill them.

I had to do a little bit of research to lock down the definition of a “stored program computer” but I haven’t searched for an answer yet nor checked if one has been posted to the thread.

I’m not a hardware engineer, but I understand that the general principle of all engineering is to simplify things and make them generally usable.

For example, if I could have a function that sorts, that’s more useful than a function that specifically takes frog sprites and orders them left to right through some complicated decision tree of preferences. You can still get to the latter from the former, if done right, and you haven’t locked yourself in. Bundling sorting and frog comparison together isn’t as good as having them separate and free to be mixed and matched as needed.
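
A tiny C sketch of that separation, using the standard library’s qsort as the generic piece and a made-up frog comparator as the specific piece (frog_sprite and compare_frogs_by_x are just illustrative names):

```c
/* Generic sorting machinery (qsort) plus a problem-specific comparator. */
#include <stdlib.h>

typedef struct {
    int x;          /* horizontal screen position */
    int preference; /* whatever the complicated decision tree would use */
} frog_sprite;

static int compare_frogs_by_x(const void *a, const void *b)
{
    const frog_sprite *fa = a, *fb = b;
    return (fa->x > fb->x) - (fa->x < fb->x);
}

void order_frogs_left_to_right(frog_sprite *frogs, size_t n)
{
    /* The sorting is generic; only the comparison is frog-specific. */
    qsort(frogs, n, sizeof *frogs, compare_frogs_by_x);
}
```

Swap in a different comparator and the same sorting machinery orders the frogs by preference instead; nothing about the sort itself has to change.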

In general, a computer is going to have some form of input and some form of output. Having those together is worse. Turing’s tape, which is both, is a mess.

And likely, you need some intermediate zone that’s neither input nor output. Ideally, that would be its own separate piece.

Overall, we need four pieces that are going to be glued together in some way: a processor, an input source, an output sink, and an intermediate state buffer.

In terms of performance, the intermediate state is in the strongest need of tight coupling to the processor.

Now, I happen to know that modern architectures generally map input and output into memory locations. If I want to “read” data, I go to a particular place in memory and move data from it into the processor. Successive reads from the same location reveal new data, despite no updates being sent there. Likewise, when I want to write data, I send it to some memory location and that will go out to the hard drive that’s been mapped to that memory address.

This makes sense. I want to simplify how my processor interacts with other components. If everything can look the same then that provides an easier paradigm of interoperation.
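
Roughly how that looks from the programmer’s side, as a hedged C sketch; the addresses, register names, and bit layout here are invented for illustration, not taken from any particular device:

```c
/* Memory-mapped I/O sketch: made-up register addresses and bits. */
#include <stdint.h>

#define DEV_STATUS (*(volatile uint32_t *)0x4000A000u) /* hypothetical status register */
#define DEV_DATA   (*(volatile uint32_t *)0x4000A004u) /* hypothetical data register   */
#define RX_READY   0x1u
#define TX_READY   0x2u

/* "Reading" input: successive reads of the same address yield new data,
   because the hardware behind that address keeps refilling it. */
uint8_t read_byte(void)
{
    while (!(DEV_STATUS & RX_READY))
        ;                        /* spin until the device has a byte for us */
    return (uint8_t)DEV_DATA;
}

/* "Writing" output: storing to the address pushes the byte out to the device. */
void write_byte(uint8_t b)
{
    while (!(DEV_STATUS & TX_READY))
        ;
    DEV_DATA = b;
}
```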

So now the question is whether I want to map my entire input into memory, with free-form read-head movability, to try to mimic the rest of RAM. Well, given the hardware of the time, that would be hard. It also doesn’t really work for output, since that could be anything from a terminal to a printer to a light bulb.

Where RAM is fairly abstract and idealizable, with a certain conceptual purity, all other hardware is complicated and needs to be dealt with individually. That’s why we have drivers today, to try and map the quirks of the devices that interact with the outside world onto a few known and accepted paradigms.

In general, we should be able to handle a situation where we’ve got really bad I/O and can’t do much more than read or write single bytes at a time. We would expect this to be slow and horrible.

But, we can envision that there might be some devices that are good for I/O, where we can blitz large quantities of data from tape into memory all at once through a single operation.

So I can tune my processor to expect instructions one by one (which is universal but locks me down to the worst possible performance in all cases), I can make it adjust for either (which is complicated and bad, since who knows how many variations I’ll need to support), or I can try to get the best of both worlds by telling the processor to run operations out of RAM, from a bulk cache. This isn’t quite as straightforward as the single-byte strategy, but it lets me separate concerns and achieve a good result when it’s available. If I have bad I/O, I just spend the time populating RAM, byte by byte. If I have a blitz function in the hardware, then I can load up and run much faster.
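
A sketch of that “populate RAM, then run out of RAM” idea in modern POSIX-flavored C. fetch_next_byte() and IMAGE_SIZE are placeholders for whatever slow channel feeds the code in, and a bulk or DMA “blitz” transfer would simply replace the fill loop:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

extern uint8_t fetch_next_byte(void);  /* placeholder: slow, byte-at-a-time input */
#define IMAGE_SIZE 4096                /* assumed size of the program being loaded */

int load_and_run(void)
{
    /* Stage 1: grab a writable region and fill it byte by byte (the "bad I/O"
       path).  A bulk transfer would simply replace this loop. */
    uint8_t *buf = mmap(NULL, IMAGE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < IMAGE_SIZE; i++)
        buf[i] = fetch_next_byte();

    /* Stage 2: flip the region to executable and run it; the modern,
       W^X-respecting version of "the processor just runs what's in RAM". */
    if (mprotect(buf, IMAGE_SIZE, PROT_READ | PROT_EXEC) != 0)
        return -1;

    int (*entry)(void) = (int (*)(void))buf;
    return entry();
}
```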

Right, but not particularly modern - this was being done about 50 years ago.
Engineering is all about tradeoffs. What drives these is the relative cost of hardware versus software. Before microprocessors, hardware was very expensive relative to software, and so you had microprogramming which allowed simpler, cheaper, hardware with microcode written to implement the machine’s instruction set. Doing this made for slower execution than doing it in hardware, which was why the slower, cheaper models of the System 360 line were microprogrammed while the faster, more expensive ones were not.
For I/O, while a programmer saw memory mapped I/O as you said, this was usually not done by hardware but by parts of the O/S written to handle interrupts from the I/O. (Does anyone teach interrupt handling to CS majors?)
As hardware got cheaper (a microprocessor you mass produce is a lot cheaper than a minicomputer which had to be assembled) more and more capability was put in hardware thanks to cheaper transistors. First, bigger caches. Then, better ways of scheduling instructions to make better use of resources. (Starting around 1962 with the CDC 6600 scoreboard.) Then bringing off-chip bus controllers and interface protocols on chip. Nothing new there either; IBM moved I/O handling around a long time ago. Now we’ve run out of steam in making more complex chips, so it pays to make multiple CPUs on a chip.
As for blitzing large amounts of I/O, you are limited by the number of data ports on your chip. We used to use 8, 16, or even 32 pins for data transfer. That started to cause problems due to crosstalk between the signals which slowed down the maximum transfer rate. Today data is sent into a chip on a serial port with a clock which gets recovered from the data by a very complex bunch of hardware on chip. The data comes with a complex protocol. (This always seemed to be where the bugs were on our chip designs.) The single signal can load data faster than parallel input lines. However, once it gets on chip you still can’t blitz it because the internal memory has only a limited amount of data inputs, and routing and timing issues limit how many signals you can bring from the I/O to an embedded memory.
It gets even more complicated when it comes to testing this stuff.

Yep, at least 54 years! I think the first computer architecture – or certainly one of the first – to map device registers to memory addresses was the PDP-11. The first model, the PDP-11/20, was released in 1970. In all models, the upper 4K word segment of physical address space was the “I/O page”.

That may have been true for some machines, but I’m pretty sure this was never true for any model of PDP-11. Besides, most or all models after the 11/20 had memory management hardware as standard equipment. User programs ran in user mode with 32K words of virtual address space and had no access to the I/O page, which only existed in kernel mode address space. User programs did I/O through OS system calls.

As someone who once had to write a device driver for RSX-11M for some custom hardware, my feeble memory of those days nevertheless assures me that when you read or write a device register in the I/O page, you are talking directly to the hardware. It’s an elegant artifact of the PDP-11 architecture. As I recall, when a device interrupt occurs, the CPU hardware saves the PC and PSW and reloads them from values in the device-specific interrupt vector, causing control to transfer to the appropriate location in kernel address space, e.g., directly to my device driver’s interrupt handler. There is no OS software mediation.
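
For flavor, here’s roughly what “talking directly to the hardware” looks like, written as a C sketch against the PDP-11 console (DL11) registers. The octal addresses and bit positions are quoted from memory, and a real RSX-11M driver would of course be interrupt-driven rather than a polled loop like this:

```c
/* PDP-11 console (DL11) registers in the I/O page; addresses from memory. */
#include <stdint.h>

#define RCSR (*(volatile uint16_t *)0177560)  /* receiver status    */
#define RBUF (*(volatile uint16_t *)0177562)  /* receiver buffer    */
#define XCSR (*(volatile uint16_t *)0177564)  /* transmitter status */
#define XBUF (*(volatile uint16_t *)0177566)  /* transmitter buffer */
#define RX_DONE  0200  /* bit 7: a character has arrived           */
#define TX_READY 0200  /* bit 7: transmitter can take a character  */

/* Polled echo loop: nothing but loads and stores to the I/O page,
   with no OS call in sight. */
void echo_console(void)
{
    for (;;) {
        while (!(RCSR & RX_DONE))
            ;                        /* spin until the device has a byte  */
        uint16_t c = RBUF & 0177;    /* read it straight off the hardware */
        while (!(XCSR & TX_READY))
            ;
        XBUF = c;                    /* write it straight to the hardware */
    }
}
```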

That’s my take.

I’d use “self-modifying” solely to mean: a single program or module alters instructions within its own image, AND those changes are not persisted after that program is unloaded from RAM or its moral equivalent. The altered instructions can be thought of as just one more entry in the global dynamic data state of the program. And, except for constant definitions, that data state doesn’t exist once the program is unloaded.

Altering a branch to proceed to [there] rather than [over here] or altering a memory reference instruction to refer to [this] rather than [that] address are the pure cases of “self-modifying code”. Anything much beyond that muddies the definition of the term almost immediately into uselessness.

IIRC some of the early-ish PCs (say, 286 era) had a feature where, during startup, the BIOS, which was stored in ROM, was copied verbatim into fast RAM, and thereafter the OS or userspace code used the RAM copy. Which of course survived only until the next reboot. You can argue that that case represents self-modifying code, in that the RAM did contain all zeros or garbage, then due to programmatic action it was altered to contain a sensible instruction stream that later got executed. It’s logically stepwise correct, but IMO that stretches the definition unto uselessness.

YMMV of course.

If you define it that way, the way instances on the cloud work qualify. For example, at my work, I have a fixed image that defines a virtual machine. It gets spun up on demand, does its thing, and then is gone. Every time it gets spun up, it’s in the exact same state. But while it’s operating, it is modifying itself, based on its inputs. The changes are all ephemeral to that instance.

I don’t agree with this. The issue is not about persistence, it’s about what is being modified. In your example of a virtual machine, it contains a great deal of state information, just like a physical computing environment, but that state information is generally going to be all data. That is, it’s not the result of programs busily modifying themselves, it’s the result of programs transforming and creating data. When I’m actively writing something in Word or photo editing in Photoshop, I’m producing a great deal of new state information in RAM, but I’m not altering the code one iota. Even if I go into options settings and change various settings affecting how the programs behave, I’m just changing data tables, not actual code.

In fact, this distinction between instructions and data was crucially important when writing commonly used programs for timesharing systems. You wanted the program code to be “pure” so that it was shareable, i.e., only one copy needed to exist in memory for many users. Only the data portion was unique to each individual user. We generally don’t care much about that today from an efficiency perspective, but it’s still good, structured programming practice.
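
A rough modern echo of that discipline, as a C sketch with invented names: all mutable state lives in a per-user context passed in as data, so the code itself (and its constants) could be mapped read-only and shared among everyone:

```c
/* Pure, shareable code: the routine mutates only the data it is handed. */
typedef struct {
    long balance;       /* per-user state: one copy per user */
    long transactions;
} user_context;

static const long FEE = 25;  /* constant: lives with the shared, read-only code */

long withdraw(user_context *u, long amount)
{
    u->transactions++;
    u->balance -= amount + FEE;  /* modifies the user's data, never the code */
    return u->balance;
}
```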

The example of updating/patching an operating system or an application is a red herring, because from the standpoint of whatever agent is doing the updating, the target code is just data.

I think we’re kinda wandering away from the original point, which is that code that modifies itself, with the exception of trivial stuff like modifying instruction address fields (which is logically the same as using index registers), is in general very bad programming practice and should be avoided. Some ancient machine I once briefly worked with in my youth (I think it was the IBM 7040) had an “Execute” instruction (XEC). It meant “execute the instruction at the given address, then continue, unless the instruction is a transfer (branch) instruction.” I never understood what possessed the designers to include such an instruction, and I imagined all kinds of programming pitfalls that it could create.
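
For anyone who never met an Execute instruction, here’s a toy-interpreter sketch of those semantics in C. It’s an illustration of the idea as described above, not a faithful model of the IBM 7040’s instruction set:

```c
/* A made-up miniature instruction set, just to show the control flow. */
#include <stdbool.h>

typedef struct { int opcode; int operand; } insn;

enum { OP_LOAD, OP_ADD, OP_BRANCH, OP_XEC, OP_HALT };

/* Runs one instruction.  Returns true if it transferred control
   (and set *pc itself); false means "fall through to the next one". */
static bool run_one(const insn *mem, int addr, int *pc, int *acc)
{
    const insn *i = &mem[addr];
    switch (i->opcode) {
    case OP_LOAD:   *acc  = i->operand;  return false;
    case OP_ADD:    *acc += i->operand;  return false;
    case OP_BRANCH: *pc   = i->operand;  return true;
    case OP_XEC:
        /* Execute the instruction at the given address in place.  Only if
           that instruction is a branch does control go elsewhere; otherwise
           we simply continue after the XEC itself. */
        return run_one(mem, i->operand, pc, acc);
    default:        return false;
    }
}

void run(const insn *mem, int start, int *acc)
{
    int pc = start;
    while (mem[pc].opcode != OP_HALT) {
        int cur = pc;
        if (!run_one(mem, cur, &pc, acc))
            pc = cur + 1;    /* no transfer: fall through */
    }
}
```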

The question you posed was “Why was the stored program concept developed?”, not “What was the aha moment for the stored program computer?” Your example dealt with indexed addressing as a limitation of Harvard processors. It isn’t one, and I gave examples of Harvard architectures that include indexed addressing.

Oh yeah, a computer is an adder and a memory map, size doesn’t count.

The stored program computer was developed because of its greater utility and lower parts count.

Thanks for your anecdote on Von Neumann’s aha moment, and I apologize for having missed your point.

The instances I run are most definitely modifying their own code. The code builds “filters” on the fly based on the data and then applies those ad-hoc filters to the data.

We can argue the semantics, since from the cloud’s point of view my whole virtual machine is just a blob of data that the cloud manipulates. But from this developer’s point of view, I build a machine that writes code to deal with each dataset as it’s processed.
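
One common shape this takes, sketched in C with invented names: the “filters” get assembled at runtime from the data as a pipeline of function pointers. Whether you call that pipeline code the program wrote or just data that drives fixed code is exactly the semantic line being argued here:

```c
/* Build a per-dataset filter pipeline at runtime, then apply it. */
#include <stddef.h>

typedef double (*filter_fn)(double);

static double clamp_positive(double x) { return x < 0 ? 0 : x; }
static double square(double x)         { return x * x; }
static double halve(double x)          { return x / 2; }

/* Choose the pipeline based on properties of the dataset (placeholder rules). */
static size_t build_pipeline(const double *data, size_t n,
                             filter_fn *out, size_t max)
{
    size_t count = 0;
    int has_negatives = 0;
    for (size_t i = 0; i < n; i++)
        if (data[i] < 0) { has_negatives = 1; break; }

    if (has_negatives && count < max) out[count++] = clamp_positive;
    if (n > 1000 && count < max)      out[count++] = halve;   /* arbitrary rule */
    if (count < max)                  out[count++] = square;
    return count;
}

void apply_filters(double *data, size_t n)
{
    filter_fn pipeline[8];
    size_t stages = build_pipeline(data, n, pipeline, 8);

    for (size_t i = 0; i < n; i++)
        for (size_t s = 0; s < stages; s++)
            data[i] = pipeline[s](data[i]);
}
```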

I learned it somewhere but I was never a CS major.