Reinforcement learning in particular is an area where Python starts to fall apart. For big data tasks or even big matrix computations, sure, 99% of your time is spent somewhere in GPU or FORTRAN/BLAS-land, and the wrapping around that doesn’t matter as much.
RL tends to require really involved strategies and algorithms around these computations, though, as well as environments that may have to do non-trivial computation to generate the next state. Developing non-trivial RL environments is almost always done in something like C++, Rust, or C, because it’s closer to a form of game development than anything. I view this as a weakness of Python as the de-facto ML environment, when for one aspect of ML (RL) people have to step outside of it so drastically. Hell, I’ve seen people develop bizarre strategies where they like… write data from a Java environment to a memory mapped file shared with Python just to write a faster environment (note: I do not recommend this).
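For the morbidly curious, the Python side of that memory-mapped-file hack looks roughly like this. This is a sketch only: the file name and the four-float64 layout are invented for illustration, and the writer on the other side could be Java or anything else.

```python
import mmap
import struct

# Sketch only: assumes some external process (Java, C++, whatever)
# repeatedly writes four float64 state values to the start of a
# shared file called env_state.bin. Name and layout are made up.
STATE_FORMAT = "<4d"
STATE_SIZE = struct.calcsize(STATE_FORMAT)

with open("env_state.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), STATE_SIZE) as shared:
        # Each read picks up whatever the environment last wrote.
        state = struct.unpack(STATE_FORMAT, shared[:STATE_SIZE])
        print(state)
```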
Also on the whole “it’s fast once you get into C-land” thing.
IMO, this also often enforces a really weird coding standard. Since you can only get into C-land by using the “right” functions, you can pick any of several obvious ways to do something in numpy and still be screwed because you ended up spending too much time in Python.
For instance, a common newbie mistake is using a lot of for loops, or even np.apply_along_axis (which is just a Python for loop internally). Instead, numpy has a ton of really niche functions that each do something extremely specific (not complex enough to require BLAS), and they’re still much faster than anything you could write yourself because they farm out what’s essentially the exact same loop you just wrote to C.
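A rough illustration of what I mean, with np.bincount standing in for the “niche function you didn’t know you needed” (the sizes and bin count here are arbitrary):

```python
import numpy as np

values = np.random.rand(1_000_000)
bins = np.random.randint(0, 100, size=values.size)

# The "obvious" way: a Python-level loop. Every iteration bounces
# back through the interpreter, so this is painfully slow.
totals_slow = np.zeros(100)
for b, v in zip(bins, values):
    totals_slow[b] += v

# The "right" function: np.bincount does the exact same grouping in C.
totals_fast = np.bincount(bins, weights=values, minlength=100)

assert np.allclose(totals_slow, totals_fast)
```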
But now I’m straying from parallelism concerns.
E: Also, Samuel A., my concern isn’t with the multiprocessing library per se. Yeah, using raw OS threads the pthreads way is often madness. I just find it funny that there’s a multithreading package in Python, but it’s effectively broken by the GIL, so there’s an entirely separate second package that spawns multiple Python processes and uses file- or network-based message passing (abstracted away for you). There’s no reason a similar library couldn’t work with multithreading. Hell, Erlang, Go, and Rust all have safe channel-based message-passing paradigms to various degrees and use OS threads (though in Erlang and Go they’re green threads that use OS threads under the hood).
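To be concrete about what channel-style message passing between threads could look like, here’s a minimal sketch using the stdlib queue module. The GIL still serializes the actual CPU work, which is the whole complaint, but the programming model itself is not the problem.

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Receive messages on one "channel", reply on another, Go/Erlang
    # style, except these are plain OS threads under the GIL.
    while True:
        item = inbox.get()
        if item is None:        # sentinel tells the worker to stop
            break
        outbox.put(item * item)

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)
t.join()

while not outbox.empty():
    print(outbox.get())
```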
They were all developed on systems that never saw a Windows kernel: either Unix variants or custom OS kernels. The big driver for a lot of this has actually been OLTP, with web commerce historically being a big part of it, but high-performance numeric work as well.
There was a serious amount of really good stuff happening in the late ’80s and early ’90s, both hardware- and OS-wise. Much of it was swept away by x86, Windows, and Linux. Machines like the Encore Multimax ran research and commercial systems on multiprocessor designs, and Cray had their commercial systems division (later sold to Sun). And of course we then saw companies like Thinking Machines with the CM-2 and CM-5, plus the Tera MTA. There were so many interesting things going on. Sigh.
The alternate implementations like Stackless and PyPy have dried up, but it wasn’t because they were inferior – it was because they didn’t get the love.
You say that like efficient context switching is nothing. In fact, context switching is most of what the machine is doing, and the core is partially parallel and multi-tasking anyway. The fact that hyperthreading has almost no value for almost everybody shouldn’t lead people to confuse “true multitasking with shared resources” with “faked” multitasking.
Earlier than that, surely. FPUs were invented for the microcomputers of the 1970s, but before that mainframes had coprocessors that handled other tasks such as I/O.
As stated this is completely wrong. It is true that running a “multi-tasking” operating system on single-core CPUs simply gives the illusion of concurrent execution. Maybe that is what the teacher meant. However, concurrent asynchronous execution (ie true multi-tasking) has been used since the early 1960s. Since the mid-1990s it has been fairly common in the PC and workstation space.
Multi-tasking is just running multiple code paths concurrently, whether threads or processes. This can be on asymmetric or symmetric multiprocessing computers. If asymmetric, certain tasks (often OS or I/O) are often assigned to one CPU, if symmetric they are all even and any task can run on any CPU.
IBM mainframes used symmetric multi-processing since the mid-1960s.
In 1967 the GE-645 running the Multics operating system used symmetrical multi-processing.
In the mid-1970s, some versions of the DECSystem-10 used symmetrical multi-processing.
Around 1978, Data General released the M600, an asymmetric multi-processor mini-computer.
DEC released the dual-processor VAX-11/782 in 1982, but it was not truly symmetric multiprocessing from a scheduling standpoint.
Around 1991 Sun Microsystems released their first multi-processor server, the SPARCserver 600MP. In 1992 Sun’s first desktop multi-processor workstation was the SPARCstation 10.
In the PC era before modern multi-core CPUs, multi-socket computers were used. Windows NT ran on several multi-socket computers, and around 1994 Microsoft demonstrated Oracle running on a 16-socket Intel server.
In 1995 the Intel Pentium Pro supported up to four sockets without any external “glue” logic. This simplified motherboard design, and there were several multi-socket workstations and servers during that period which ran Windows NT.
Those all were truly “multi-tasking” in that multiple code paths were in simultaneous asynchronous execution.
We could potentially get into a squabble about definitions here, but IMO throwing around “multitasking” and “multiprocessing” as if they were synonymous has never been standard usage in computer science and muddling the two is likely to be extremely confusing to someone asking about this stuff. OTOH, language can be inherently confusing because words are often misused, which IMO is all the more reason to use technical terms correctly.
Multitasking in standard computer science usage means the interleaving of multiple tasks, or processes, so that they execute under control of a scheduler and run as resources become available to them rather than sequentially running to completion. It embodies the suite of techniques by which the OS saves and restores process context efficiently and transparently so that many processes are in execution states in the scheduling queue at the same time, and therefore resources are optimally utilized. It has absolutely nothing to do with having multiple processors, for which the term “multiprocessing” is reserved. The perceived concurrency of multitasking is not an “illusion”; it was a very important advance in computing efficiency and greatly increased the productivity of expensive mainframes, and is still used everywhere today. Moreover, a multiprocessing system is usually also a multitasking system – that is, it schedules many tasks on two or more processors. Such an OS can usually run many tasks on just one processor, too, in which case it’s multitasking, but not multiprocessing.
Here is Wikipedia’s definition:
In computing, multitasking is the concurrent execution of multiple tasks (also known as processes) over a certain period of time. New tasks can interrupt already started ones before they finish, instead of waiting for them to end. As a result, a computer executes segments of multiple tasks in an interleaved manner, while the tasks share common processing resources such as central processing units (CPUs) and main memory. Multitasking automatically interrupts the running program, saving its state (partial results, memory contents and computer register contents) and loading the saved state of another program and transferring control to it. This “context switch” may be initiated at fixed time intervals (pre-emptive multitasking), or the running program may be coded to signal to the supervisory software when it can be interrupted (cooperative multitasking).
Multitasking does not require parallel execution of multiple tasks at exactly the same time; instead, it allows more than one task to advance over a given period of time. Even on multiprocessor computers, multitasking allows many more tasks to be run than there are CPUs.
Here is a similar definition from Encyclopedia Britannica:
Multitasking involves overlapping and interleaving the execution of several programs. This is often achieved by capitalizing on the difference between a computer’s rapid processing capacity and the slower rates of its input/output devices. While the computer is reading data from a magnetic disk at a fairly limited rate, for example, its powerful central processor can execute at high speed another program that involves extensive calculations but very little input. Operating systems coordinate the competing demands of various programs in a variety of ways …
None of this has anything to do with multiple processors. Multitasking is the fundamental technique by which virtually all large computers from around the 60s onward executed programs efficiently, often in either timesharing configurations as I mentioned earlier (the path that companies like DEC took) or in concurrently scheduled batch-job configurations, typical of IBM. IBM’s early System/360 releases included a single-task (single-partition) base system called PCP, plus the multitasking systems MFT (multiprogramming with a fixed number of tasks) and MVT (multiprogramming with a variable number of tasks), the latter allocating tasks variable-size memory regions as needed. These were all single-CPU systems, needless to say.
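To make the distinction concrete, here is a toy sketch of cooperative multitasking on a single processor. The generator machinery stands in for the context save/restore that a real OS scheduler does; this is an illustration of the concept, not how any particular OS implements it.

```python
from collections import deque

def task(name, steps):
    # Each yield is a voluntary "you may interrupt me here" point,
    # i.e. cooperative multitasking.
    for i in range(steps):
        print(f"{name}: step {i}")
        yield

# A toy round-robin scheduler: many tasks are in execution states at
# once, interleaved on one CPU, with no second processor in sight.
ready = deque([task("A", 3), task("B", 3)])
while ready:
    current = ready.popleft()
    try:
        next(current)
        ready.append(current)
    except StopIteration:
        pass
```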
Conversely, here is how multiprocessing is defined:
Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
Also, not to nitpick, but I take issue with your statement that “IBM mainframes used symmetric multi-processing since the mid-1960s”. Virtually none of them did. The only IBM mainframe of that era to my knowledge that offered multiprocessing was the System/360 Model 65 MP, and that wasn’t even announced until 1968. The System/370 Models 158 and 168 also had MP options, but they weren’t announced until 1972. The DEC PDP-6 had that beat by introducing timesharing back in 1964, and MIT did pioneering development on timesharing with the establishment of Project MAC in 1963.
It’s been around for a while. For consumer CPUs since the Pentium 4 era (2002).
So a 4-core, double hyper-threaded CPU could be running 8 simultaneous tasks, not counting all the stuff going on in GPUs and elsewhere.
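For what it’s worth, a quick way to check what your own machine reports (logical CPUs, so hyper-threading counts double):

```python
import os

# os.cpu_count() counts logical CPUs, so a 4-core CPU with
# two-way hyper-threading will typically report 8 here.
print(os.cpu_count())
```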
Hyper-threading doesn’t always speed things up; it depends on the nature of the computation. So it is falling into disfavor for certain applications where the silicon could be put to better use, for example as extra cores. So you can buy a brand new CPU that is not hyper-threaded but is faster than the previous model.
Also: Very, very, very few great sequential programmers are good at multiprogramming. It just requires a special mindset that a lot of people don’t have. This has really hurt the development of programs to take advantage of parallel opportunities (threads, cores, distributed, etc).
Please ignore my last sentence in my post #29. It was a half-constructed comment originally meant to go in the previous paragraph. What I was trying to say, in the context of multitasking, is that while IBM was struggling to make MVT work (along with some mostly unsuccessful forays into different approaches to timesharing like TSO and TSS/360) both DEC and MIT’s Project MAC had developed working timesharing systems years earlier, on a PDP-6 and a modified IBM 7094, respectively. These were all examples of sophisticated multitasking on single-processor systems.
Exactly. This has nothing to do with Windows or UIs but rather back-end OLTP.
It was well known in the early 90’s that rapid increases in processor clock speeds were not going to continue. SMT and multiple cores (introduced in 2002 and 2004 respectively) were natural responses to that constraint to continue increasing server performance.
Multi-processor systems certainly existed already, but required advanced/expensive memory management across CPUs to be effective. Increasing per-CPU efficiency was a natural result.
Seymour Cray built the first RISC-based supercomputer, the Control Data 6600, in the early 1960’s (announced 1964) by using a main CPU + 10 Peripheral Processing Units (PPU’s), which were actual computers in themselves (cut-down 12-bit machines along the lines of the CDC 160-A), to do I/O processing. Also, the 6600 CPU included 10 parallel functional units to do things like add, multiply, divide, shift, etc. (Something of a predecessor to the FPU & GPU chips used in modern PCs.)
So the main CPU can be doing a computation on part of the task, while PPU2 fetches the next item into memory, and PPU6 is writing the last result into storage – that should count as multi-tasking, since multiple processors are working on parts of the program task at once.
And by 1967, they were selling the CDC 6500, which was literally 2 CDC 6400 CPUs in one box. Certainly multi-tasking at that point, since the 2 CPUs could be running completely different programs at the same time.
So it’s sort-of a definition problem: when multiple processors within the computer system are working on different parts of the same problem, do you count that as multi-tasking? Or do you say they have to be working on separate problems to count as multi-tasking?
Even then, one could claim the CDC 6600 met that definition. It was common for the CPU to be working on a problem, while the operating system had some of the PPUs busy loading the next program into memory.
Those are examples of multiprogramming and/or parallel processing, not multi-tasking.
Multi-tasking is a bit of a different concept, where something (typically the operating system) is able to keep track of multiple tasks and switch from one to the other without losing information.
So different levels of parallel computing such as bit-level, instruction-level, and task-level, with their long history, have been covered. It may be worth mentioning textbook constraints like Amdahl’s Law, which show that there is some theoretical limit on how fast one can execute a given “task”. This may or may not have been explained well by the computer-science teacher in question.
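As a quick sketch of what Amdahl’s Law actually says (the 95%-parallelizable figure below is just an example):

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Amdahl's Law: the serial fraction bounds the achievable speedup."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# A task that is 95% parallelizable can never run more than 20x
# faster, no matter how many processors you throw at it.
print(amdahl_speedup(0.95, 8))      # ~5.9
print(amdahl_speedup(0.95, 1024))   # ~19.6
```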
Exactly what you said, with the caveat that “multiprogramming” is usually synonymous with “multitasking”.
Multitasking has a long and storied history in computer science, encompassing the evolution of fundamental technologies like timesharing, realtime multitasking, and even how operating systems and their services are structured, and it’s silly that we’re having a debate about it with various folks offering up incorrect information when the definition is clear (see post #29). Multitasking specifically has nothing to do with having multiple processors.
In general:
Multiprocessing: Running processes on two or more tightly-coupled processors with common memory. (In my view and in commonly accepted terminology, running processes on two or more loosely-coupled network-connected processors – including clusters – is not multiprocessing, it is distributed computing.)
Multitasking: Running processes under an OS scheduler that context switches between them based on timeslicing, event-based or priority-based preemption, or cooperative scheduling.
Multiprogramming: In most contexts, same as “multitasking”. For example, old single-processor System/360 OS variants were given the designations MFT and MVT – multiprogramming with a fixed [number of] tasks / multiprogramming with variable tasks.
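Tying this back to the Python discussion upthread: the stdlib multiprocessing module gives you multiprocessing in roughly this sense, in that the separate worker processes can be scheduled onto all the CPUs of a shared-memory machine at once. A minimal sketch:

```python
from multiprocessing import Pool

def crunch(n: int) -> int:
    # Stand-in for a CPU-bound piece of work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Four worker processes; the OS is free to run them on four CPUs
    # at once, which no amount of GIL-bound threading will give you.
    with Pool(processes=4) as pool:
        print(pool.map(crunch, [100_000] * 8))
```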