Anyway, here’s what I would propose as a simplified description of the actual machine in front of me along the lines of your description. Keep in mind that many different architectures are possible; this is just one kind of design that happens to be used. Also, keep in mind that this description is a compromise between the full details of the real world and simplicity:
There is a sequence of numbered boxes, each of which contains some fixed number of bits called a “word” (which can be thought of as representing a number up to some fixed size). [The only reason we bother basing most of our architecture on words rather than individual bits is optimization.] The total number of boxes is small enough to be represented by a word. These boxes are collectively known as the “memory”; I’ll also loosely call them the “heap”, though in real systems that word usually names just one region of memory.
Certain sequences of words specify certain “instructions”. More on this later.
There are also some special boxes apart from that sequence, known as the registers. Most of the machine instructions act directly on the registers, rather than on the memory. [To some extent, this is also an optimization decision.]
One particular register is the instruction pointer. At the start of each “step”, the machine first reads the number P in the instruction pointer, then reads the instruction J stored starting at the memory location P, then does whatever J says to do. After this, it starts the next “step”, doing the same thing.
Among the possible instructions are the following. (Unless otherwise specified, all of the following instructions conclude by having the machine increment the value in the instruction pointer by their own length, so as to point to the next instruction consecutively):
ADD-Reg1-Reg2: This instruction consists of an initial “ADD” code, followed by codes specifying two particular registers. The end result of this instruction is that the value in the first specified register is incremented by the value in the second specified register. [I could just as well have had two specified input registers and one specified output register, or have always fixed the same registers to be used in addition, or have had addition work directly on the heap, or other things. I’m just illustrating one possible design found in the world.]
MUL-Reg1-Reg2, SUB-Reg1-Reg2: These are just like ADD, only with a different arithmetic operation (multiplication and subtraction, respectively).
READ-Reg1-Reg2: This instruction copies into Reg1 the value in the memory location specified by Reg2. Thus, this allows us to move data from the heap into the registers.
WRITE-Reg1-Reg2: This instruction writes the value in Reg1 into the memory location specified by Reg2. Thus, this allows us to move data from the registers into the heap.
COPY-Reg1-Reg2: This instruction copies the value in Reg2 into Reg1. Thus, this allows us to move data between registers.
LOAD-Reg1-Value: The code for this instruction consists of a “LOAD” code followed by the code for a register (Reg1) followed by an arbitrary word (Value). The effect of this instruction is to copy Value into Reg1. Thus, this allows us to put specific constant data into a register.
BRANCHZERO-Reg1-Reg2: This instruction examines the value in Reg1; if that value is non-zero, then nothing happens except the usual incrementing of the instruction pointer. However, if the value in Reg1 is zero, then the usual incrementing of the instruction pointer is suppressed, and, instead, the value in Reg2 is copied into the instruction pointer. This instruction allows “conditional jumps”.
INPUT-Reg1: This instruction asks the input device for one word of data, which is stored in Reg1. Presumably, the input device knows how to produce that word (perhaps by reading it off of a queue where it builds up).
OUTPUT-Reg1: This instruction sends to the output device the value in Reg1. Presumably, the output device knows what to do with this.
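To make the instruction list above concrete, here is a minimal sketch of such a machine in Python. The opcode numbers, the 16-bit word size, the four registers, the 256-word memory, and the HALT instruction are all assumptions made for illustration; none of them come from the description above. The input and output devices are stood in for by a simple queue and a list.

```python
from collections import deque

WORD_BITS = 16                 # assumed word size
MASK = (1 << WORD_BITS) - 1    # keeps every value within one word

# Hypothetical opcode numbers; the description above does not assign any.
# HALT is an extra instruction added so that a program can stop.
(ADD, SUB, MUL, READ, WRITE, COPY, LOAD,
 BRANCHZERO, INPUT, OUTPUT, HALT) = range(11)

# Length in words of each instruction (opcode word plus operand words).
LENGTH = {ADD: 3, SUB: 3, MUL: 3, READ: 3, WRITE: 3, COPY: 3, LOAD: 3,
          BRANCHZERO: 3, INPUT: 2, OUTPUT: 2, HALT: 1}


def run(program, inputs=(), memory_size=256, num_regs=4, max_steps=10_000):
    """Run `program` (a list of words) and return the list of output words."""
    memory = list(program) + [0] * (memory_size - len(program))
    regs = [0] * num_regs
    ip = 0                   # the instruction pointer
    queue = deque(inputs)    # stands in for the input device
    out = []                 # stands in for the output device

    for _ in range(max_steps):
        # One "step": read the instruction stored starting at location ip.
        op = memory[ip]
        a = memory[ip + 1] if LENGTH[op] > 1 else 0
        b = memory[ip + 2] if LENGTH[op] > 2 else 0
        next_ip = ip + LENGTH[op]    # default: the next instruction in sequence

        if op == HALT:
            return out
        elif op == ADD:
            regs[a] = (regs[a] + regs[b]) & MASK
        elif op == SUB:
            regs[a] = (regs[a] - regs[b]) & MASK
        elif op == MUL:
            regs[a] = (regs[a] * regs[b]) & MASK
        elif op == READ:
            regs[a] = memory[regs[b]]      # memory -> register
        elif op == WRITE:
            memory[regs[b]] = regs[a]      # register -> memory
        elif op == COPY:
            regs[a] = regs[b]              # register -> register
        elif op == LOAD:
            regs[a] = b & MASK             # here b is a constant, not a register
        elif op == BRANCHZERO:
            if regs[a] == 0:
                next_ip = regs[b]          # jump instead of advancing
        elif op == INPUT:
            regs[a] = queue.popleft() & MASK
        elif op == OUTPUT:
            out.append(regs[a])
        ip = next_ip
    raise RuntimeError("step limit reached without HALT")


# Example program: read two words, add them, output the sum.
program = [
    INPUT, 0,      # R0 <- first input word
    INPUT, 1,      # R1 <- second input word
    ADD, 0, 1,     # R0 <- R0 + R1
    OUTPUT, 0,     # send R0 to the output device
    HALT,
]
print(run(program, inputs=[2, 3]))   # prints [5]
```

Note that the program and its data share the same memory, so a WRITE to the right address can overwrite instructions; the sketch does nothing to prevent that, which matches the description above.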
Most of these instructions are fairly realistic, except for the two INPUT/OUTPUT commands. A real-world machine would have many more instructions of many kinds: in particular, more kinds of jumping [e.g., you can accomplish unconditional jumping with the above, but it’s nice to have a built-in command for it], codes designed specifically for manipulating the so-called “stack” (a part of memory that helps organize the data used in nested function calls), and so on. There are also features I haven’t gotten into, such as flags, interrupts, realistic input/output design (the above is nothing like it, just enough to get you going), the cache, and so on.
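As a small illustration of the remark about unconditional jumping, one way to build it from the instructions above is to load zero into a register and then branch on that register. The sketch below just produces the words for such a sequence; the opcode numbers and register indices are hypothetical, and the cost of doing it this way is that two registers get overwritten, which is one reason a built-in jump is nicer.

```python
# Hypothetical opcode numbers and register indices, for illustration only.
LOAD, BRANCHZERO = 6, 7
R_ZERO, R_TARGET = 0, 1    # two scratch registers that the jump overwrites


def jump(target):
    """Return the words encoding an unconditional jump to address `target`."""
    return [
        LOAD, R_ZERO, 0,                # R_ZERO <- 0, so the branch is always taken
        LOAD, R_TARGET, target,         # R_TARGET <- destination address
        BRANCHZERO, R_ZERO, R_TARGET,   # R_ZERO is zero, so IP <- R_TARGET
    ]


print(jump(40))   # prints [6, 0, 0, 6, 1, 40, 7, 0, 1]
```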
Anyway, the above design isn’t Turing-complete in itself, because its memory is finite. However, if the input/output commands are hooked up to a peripheral memory device that can grow without bound, then it becomes Turing-complete. (But it’s very easy to be Turing-complete; a Turing machine is nothing more than a finite state machine hooked up to an unbounded memory device.)
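To illustrate that last parenthetical, here is a minimal sketch of a Turing machine in Python: the finite state machine is just a lookup table, and the unbounded memory is a tape that grows whenever the head moves onto a new cell. The function name, the rule format, and the example rules (which add one to a binary number) are my own choices for illustration.

```python
from collections import defaultdict


def run_tm(rules, tape_str, state="start", blank="_", max_steps=10_000):
    """Run the machine on `tape_str` and return the final tape contents."""
    # The tape is a dict from position to symbol; unseen cells read as blank,
    # so it can grow in either direction as far as the machine wanders.
    tape = defaultdict(lambda: blank, enumerate(tape_str))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        # The finite control: (state, symbol read) -> (new state, symbol, move).
        new_state, symbol, move = rules[(state, tape[head])]
        tape[head] = symbol
        head += move
        state = new_state
    else:
        raise RuntimeError("step limit reached without halting")
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape[i] for i in cells).strip(blank)


# Example rules: walk right to the end of the number, then add one with carry.
increment = {
    ("start", "0"): ("start", "0", +1),
    ("start", "1"): ("start", "1", +1),
    ("start", "_"): ("carry", "_", -1),
    ("carry", "1"): ("carry", "0", -1),   # 1 + carry = 0, carry continues left
    ("carry", "0"): ("halt", "1", 0),     # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", 0),     # ran off the left end: new digit
}

print(run_tm(increment, "1011"))   # prints 1100
print(run_tm(increment, "111"))    # prints 1000
```

In the second example the tape grows one cell to the left of where it started, which is exactly what a fixed-size memory could not accommodate in general.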