Recursion in Python - How Does It Know to Unwind?

That certainly makes sense.

The IA-64 (Itanium) architecture has a rotating register file that serves much the same purpose. The same hardware gets repurposed to do software pipelining of loops, too.

Interesting. Can’t say I ever learned much about that architecture; it became obvious the Itanic was going down before I got around to it.

First I’ll say that everything is so fast and huge now, I don’t bother looking at the details anymore. But based on the conditions a few years ago, I’ll say this:

Fast cache has reduced the need for expanding register capability, but you probably don't need L2 cache for registers because access to register windows isn't very random, the addresses involved are smaller, and you don't want to use up cache resources. The SPARC design is quite old, and the idea, as Derleth says, was to pack a lot of register memory on the chip. There are just a lot of ways to implement fast register memory now, on chip and off.

The architecture is fascinating to study, actually – it’s like HP and Intel threw everything but the kitchen sink into it: rotating register files, predication, VLIW, the works!

The way they use the rotating register file for software pipelining, in particular, is very cool: each iteration of the loop gets a different mapping of registers, but the mappings overlap, so the physical register one iteration calls r11 may be called r10 by the next iteration.
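Roughly, in toy form (a Python sketch of the renaming idea only; the rotation size and register numbers here are made up, not real IA-64 detail):

# Toy model of a rotating register file.  A logical register name maps to
# a different physical slot on each iteration, and adjacent iterations'
# mappings overlap by one slot.
NUM_ROTATING = 96  # hypothetical size of the rotating region

def physical_slot(logical_reg, iteration):
    """Physical slot backing a logical rotating register on a given iteration."""
    return (logical_reg + iteration) % NUM_ROTATING

# Whatever iteration 0 wrote through "r11" is exactly what iteration 1
# reads through "r10" -- values pass between pipeline stages with no copying.
assert physical_slot(11, 0) == physical_slot(10, 1)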

If you only had one application, you’d be able to use the window effectively to stay out of RAM but as soon as you have, say, 6 applications going in parallel, and each one with an average of 3-4 deep call stack, then you’ll be swapping your windows out to RAM every function call anyways. True, multitasking is only the norm for PCs, not embedded devices, but possibly the PC market sets the standard for CPU design?

Yes, very true, and given that SPARC was designed to be used in workstations this is a valid objection.

Not anymore, and not really back then, either: Back then, ‘embedded’ meant an eight-bit chip, like an 8085 or a PIC. Now, those are still used, but all the various ARM designs count for a lot more, from industrial applications all the way up to smart phones. Meanwhile, desktop systems have been largely x86 since… what? The late 1980s? Ever since the Apple II and Commodore 64 died off.

The embedded world is still big. It’s the dies that have gotten small.

The SPARC register windows were a bit more sneaky than described. It wasn't just that you had a stack of registers; you actually had three groups of registers visible at any one time. The point is that the SPARC was optimised for C-style function calling, and the optimisation was also about the speed of parameter passing. Normally parameters are passed on the stack: you push them on before making the call. (It is a matter of choice whose responsibility it is to remove them again, caller or callee - something that has some other implications.)

Anyway, most functions don't need a lot of parameters, so it becomes possible to think about passing them all in registers as well. If your register window mechanism is set up right, a set of registers above your working registers in the window can be used to accept parameters, and once the call is made, the window is slid up so that those same registers are now visible (renamed) as the incoming registers for the new function. So the common case of a C-style function call with not many parameters, and not a lot needing to be done in the function (and thus fitting into registers), is well catered for, and fast.
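If it helps, here is the overlap trick in toy Python form (a sketch of the renaming only; the window layout is simplified and the numbers are illustrative, not the actual SPARC register file):

# Toy simulation of overlapping register windows.  Each window has 8 "in",
# 8 "local" and 8 "out" registers; the caller's outs are the same physical
# registers as the callee's ins, so parameters placed in the outs are
# already sitting in the callee's ins once the call slides the window.
REGFILE = [0] * 128      # hypothetical physical register file
WINDOW_SHIFT = 16        # ins + locals; the 8 outs overlap the next window

def reg(window, kind, n):
    """Physical index of register n of kind 'in', 'local' or 'out' in a window."""
    offset = {"in": 0, "local": 8, "out": 16}[kind]
    return window * WINDOW_SHIFT + offset + n

REGFILE[reg(0, "out", 0)] = 42            # caller puts a parameter in its out 0
assert reg(0, "out", 0) == reg(1, "in", 0)
print(REGFILE[reg(1, "in", 0)])           # callee reads 42 from its in 0 -- no copy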

Where Sparc had problems was when things were not the common case. If it ran out of register windows it had to spill a set of registers to memory. (At least when going to cache that was not too bad, but it was not cheap.) Worse was the early handling of exceptions, including context switching. Then the system had to spill all the registers. Worse, the machine was unable to cope with a second exception while spilling - so registers had to be spilled to physical, not virtual, memory, and the spill code had to walk the translation tables manually, so as to avoid generating a second fault. All these problems were solved by the application of more transistors, and the current Sparc chips are actually pretty impressive. Given the dominance of x86, Sparc is targeting niche markets with its highly threaded multi-core design. One would guess that they have extended the idea of managing a very large number of registers to even greater extremes, and context switch directly between banks.

The dominance of x86 really is a pity. Given the same resources and development effort, just about any of the competition would have delivered a much better result. But it hasn't been about the hardware for rather a while. Worth noting that both gcc and MSVC provide a fastcall convention, where the first two parameters can be passed in registers on 32-bit x86. If you are crafting code to extract the last drop of performance, this makes a difference.

Substantially more than this on x86-64, in fact, and clang is generating pretty fair code as well these days.

(Interesting information about SPARC, too.)

Indeed, finally getting enough registers to be able to breathe makes a big difference.

Clang/LLVM is seriously good stuff. It is truly a generation ahead of gcc.

OP brings up a point that I have groused about for nigh 40 years, and it's essentially the same point that math teachers debate: With everyone having calculators, is there any reason why school children need to learn elementary arithmetic anymore? Why not just start them on algebra right from the get-go? Likewise, with high-level languages that do everything from recursion to strict type-checking to wiping the programmer's nose, is there any reason to learn the nitty-gritty of how data structures, function linkage, recursion, etc. really work?

The day Pascal got invented, I started to notice that lower-division comp sci students were being taught an abstractified programming regime, where you could just write a function that called itself and it just worked. Magic! As Post #2 (I think) says!

Just two years before that, when I took Programming Techniques and Data Structures, we did the ENTIRE class in assembly language! We studied data structures (arrays, lists, FIFO’s, LIFO’s, stacks, trees, doubly-linked lists, etc), techniques for manipulating those, various ways to build and search symbol tables, parsing of expressions, subroutine linkage, and yes, even how to do recursion.

And mind you, this was on a machine that had NO inherent support for stacks, as all modern machines do. We had to roll our own.

My point here: For damned sure, we learned how all those things work, from the ground up. To be sure, doing all of the above in assembly language is tedious as all hell, and nobody would do it that way today. But I don't think anybody who knows only modern high-level languages really has a clue about, or any appreciation for, how computer programs REALLY work anymore.

You can write whole operating systems or web browsers or the entire Facebook back-end software without really knowing the behind-the-scenes nitty gritty, just like you can drive a car without knowing too much of the internal mechanics. But if you really think you want to understand programming, you won’t get that by writing recursive functions in high-level languages.

(The astute reader will readily infer from the above that I also think children still need to learn arithmetic.)

Amen.

CDC Cyber 172 series?

Okay, I wrote the above screed about how the art/science/skill of programming has gone straight down the tubes, even as it has advanced astronomically. Now for some practical advice (as I see it) for OP:

Screw recursion. For now, at least. Just forget you ever heard of it.

It appears you are a beginning programming student, or self-student. Recursion is commonly presented as an elementary technique that you should learn early and use as often as you can, even to the point that it should be your first choice of algorithm rather than your choice of last resort.

I say, bull f*cking horse manure. Recursion should be about the LAST thing you learn (although ultimately you do need to know about it), and even then it should always be your choice of last resort in case you just can’t think of a better non-recursive way. And you CAN always find a non-recursive way (not necessarily always easy). And your non-recursive way, if done reasonably well, will ALWAYS be more efficient and faster and MAY eat through MUCH LESS memory. This is because all those recursive function calls are time-consuming and memory-consuming, and are best avoided if possible.
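To show the flavour of it with the simplest possible case (a toy Python example, nothing more):

# The textbook recursive factorial, and a plain loop doing the same job.
def fact_recursive(n):
    if n <= 1:
        return 1
    return n * fact_recursive(n - 1)

def fact_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

assert fact_recursive(10) == fact_iterative(10) == 3628800

The loop version makes no function calls at all, and it never hits any recursion-depth limit.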

Also, I will argue that by the time you analyze your algorithm in such depth that you can come up with the non-recursive way, you will FOR DAMN SURE know your algorithm! I claim that modern programmers blithely write recursive algorithms because it is so easy – TOO easy – to whip out an algorithm without even having to think too much about what you just wrote. Indeed, that is the whole beauty of recursive programming. But I don't feel that's a good way to go about making algorithms.

I will post yet another reply, elaborating on this point, describing my personal experience with Ackermann's function, just as soon as I can Google it up and remind myself of how it works.

Almost. CDC 6400, back at U. C. Bezerkely in 1969.

[hijack]
Are you related to all the other computer programmer Vaughans? There was Vance Vaughan of U. C. Berkeley, early 1970's, later chief pooh-bah of Mt. Xinu, purveyor of Berkeley Unix.

More recently, there is Chris Vaughan, instructor of Comp Sci at Modesto JC, where I recently finished a certificate in network administration.
[/hijack]

Okay, here’s something non-trivial you can do with recursion. But if you actually think a while, you could do better with a non-recursive algorithm.

Look at the basic definition of Ackermann's function in Wikipedia, that is, the first of the many variations shown. There is a table of values there showing some of the smaller numbers. They get really really big, really really fast. There is also a table showing how those functions get expanded, sort of like the explanation that Chronos gave for the factorial function, several posts above. Ack(4,2) is said to be 19,729 digits long. Past that, they get astronomical and the time required to compute them approaches (or maybe surpasses) the estimated lifetime of the universe.

Well, I thought and thought about that function (this was way back in 1969) and figured out what it’s really doing (as shown in the Expansion part of that Wiki page), and wrote a NON-recursive program to compute as many numbers in that table as I could.

The point I am leading to: I was able to compute all of the not-too-terribly-big numbers in that table, which was beyond what any recursive algorithm could do.

Suggested exercise for OP: Now you try it!

And when you finish that – try it in assembly language! Use the largest number of bytes for an integer that your machine can easily do – 8 bytes maybe, or even 16 (128 bits) if you can do 128-bit ints.
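For anyone who wants to check their work, here is a rough Python sketch of both approaches, assuming the standard two-argument Ackermann-Peter definition (the closed forms for small m are the ones in that expansion table):

# Textbook recursive Ackermann.  Even modest inputs blow the call stack
# or take essentially forever.
def ack_recursive(m, n):
    if m == 0:
        return n + 1
    if n == 0:
        return ack_recursive(m - 1, 1)
    return ack_recursive(m - 1, ack_recursive(m, n - 1))

# Non-recursive version using the well-known closed forms for small m:
#   A(0, n) = n + 1
#   A(1, n) = n + 2
#   A(2, n) = 2n + 3
#   A(3, n) = 2**(n + 3) - 3
def ack_closed_form(m, n):
    if m == 0:
        return n + 1
    if m == 1:
        return n + 2
    if m == 2:
        return 2 * n + 3
    if m == 3:
        return 2 ** (n + 3) - 3
    raise ValueError("m >= 4 grows as a power tower; not handled in this sketch")

print(ack_closed_form(3, 10))      # 8189, instantly
print(len(str(2 ** 65536 - 3)))    # A(4, 2) really is 19,729 digits long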

(emphasis added)
THIS.

You know what I’m going to say. If you want to extract the last drop of performance, do it in assembly language. No, you don’t have to write the whole program in asm. Just some small critical parts, usually.

Places where they ran humongous numerical simulations on super-computers had substantial parts of the math library hand-coded in asm to get every possible machine cycle squozen out. We're talking about simulations that ran for anywhere from 8 to 100 hours on the fastest super-computers of the day (1970's).

I wrote a small piece of asm that sped up the printer spooler by almost 50%. It had a loop to process all the spool files in the queue. Inside that, there was a loop to read each line of the file and thus process all the lines. Inside that, there was a loop to scan all the characters of the line, converting each character from 6-bit ASCII to 8-bit ASCII. So this inner-most loop was running for every character of all the files in the queue. I re-wrote just that little piece in asm.
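The shape of it, in rough Python form (the queue handling and the conversion table here are invented for illustration; the real inner loop was of course the bit worth hand-coding in asm):

# Illustrative only: three nested loops, with the innermost one running
# once per character of every file in the queue.
CONVERT = {code: chr(code + 32) for code in range(64)}   # stand-in 6-bit -> 8-bit table

def process_queue(spool_files):
    for spool_file in spool_files:                        # every file in the queue
        for line in spool_file:                           # every line of the file
            converted = "".join(CONVERT[c] for c in line) # every character
            print(converted)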

Ah! I almost guessed that; your mention of Pascal suggested a few years later, but it is a fine line. Fine machines. I still have a set of 6400 series logic modules. Discrete transistors lovingly designed by Seymour. (Got a couple of boards from a Cray 1 too; they are quite something.)

Not that I know of. I’m on the other side of the planet from them anyway.

This is only true insofar as your non-recursive way might include maintaining and manipulating a stack (e.g., a non-recursive way to parse XML). At that point, your non-recursive algorithm is going to basically be simulating the behavior of a recursive algorithm.
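For example (a toy sketch, not real XML handling): walking a nested structure with an explicit stack does exactly the bookkeeping the call stack would have done for you.

# Summing the leaves of a nested list without recursion: the explicit
# stack plays the role of the call stack.
def sum_leaves(tree):
    total = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)     # defer the children, like pending recursive calls
        else:
            total += node
    return total

assert sum_leaves([1, [2, [3, 4]], 5]) == 15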

Well-optimized tail-recursion is going to perform more or less exactly the same as the equivalent iterative algorithm. In fact, the generated assembly code is going to look almost like an equivalent while loop’s assembly.
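For instance, a tail-recursive gcd and its loop form (bearing in mind that CPython itself does not eliminate tail calls, so this is really a point about compilers that do):

# A compiler that optimises tail calls turns the first version into
# essentially the second.
def gcd_tail(a, b):
    if b == 0:
        return a
    return gcd_tail(b, a % b)      # tail call: nothing left to do afterwards

def gcd_loop(a, b):
    while b != 0:
        a, b = b, a % b
    return a

assert gcd_tail(1071, 462) == gcd_loop(1071, 462) == 21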

Yes. I’ll encourage the wrath of the high and mighty once again and state this:

Algorithms are poorly defined implementations.

Largely true, I agree, particularly for any really interesting stuff (like parsing XML).

Computational formulas with multiple recursion are awful. Fibonacci numbers are a simple example. Computing, say, fibo(50) with the usual recursive function entails a tremendous amount of duplicated effort. An iterative algorithm would be vastly superior. Ackermann's function is an X-treme example of this.
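To put rough numbers on it (a quick Python sketch):

# Naive doubly-recursive Fibonacci vs. the obvious loop.  The recursive
# version recomputes the same subproblems over and over -- the number of
# calls itself grows like fibo(n) -- while the loop takes n steps.
def fibo_recursive(n):
    if n < 2:
        return n
    return fibo_recursive(n - 1) + fibo_recursive(n - 2)

def fibo_iterative(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert fibo_iterative(50) == 12586269025   # instant; the naive recursion
                                           # would need tens of billions of calls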