Recursion in Python - How Does It Know to Unwind?

Senegoid, you say that nobody who doesn’t do assembly programming really understands how programming works. That’s true, but far too weak a statement, since most assembly programmers don’t really understand programming, either. Do you understand what’s going on at the logic gate level? How about at the transistor level? Or the semiconductor junction level? Have you ever tried to implement an algorithm by selectively doping a crystal of silicon? There are just too many layers to computers; nobody can actually understand all of them. At some point, you just have to set a floor to what you’re going to try to tackle, and it makes sense for different people to set that floor in different places.

My senior year, I took an algorithms class, just for fun. At one point, the professor asked which of two algorithms would be more efficient for some task. I answered that it depended on which had higher overhead, a for loop or function recursion. Everyone in the room, the professor included, gave me a dumbfounded look. “Oh, right, you’re not a CS student, you wouldn’t have realized how expensive function calls are.”
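The overhead the professor had in mind is easy to measure yourself. A rough sketch in Python (the function names `sum_iter` and `sum_rec` are mine, and the exact ratio will vary by machine and interpreter):

```python
import timeit

def sum_iter(n):
    # Simple loop: no function-call overhead per step.
    total = 0
    for i in range(n):
        total += i
    return total

def sum_rec(n):
    # One function call (and one stack frame) per step.
    if n == 0:
        return 0
    return (n - 1) + sum_rec(n - 1)

n = 500  # small enough to stay well under CPython's recursion limit
assert sum_iter(n) == sum_rec(n)

t_iter = timeit.timeit(lambda: sum_iter(n), number=2000)
t_rec = timeit.timeit(lambda: sum_rec(n), number=2000)
print(f"iterative: {t_iter:.3f}s  recursive: {t_rec:.3f}s")
```

On CPython the recursive version typically loses by a healthy constant factor, which is exactly the kind of overhead that matters in systems work and gets waved away in asymptotic analysis.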

Not algebra, but something higher-level, because it’s more important to have people who can think than people who can count back change: The latter is a parlor trick, the former is essential to our continued survival.

Not usually, because it’s rarely relevant to how to solve any problems and, if it is, such trivia can be taught on an as-needed basis.

Also, your ideas about recursion are founded in an outmoded concern for low-level efficiency, if not a distaste borne of the fact you still don’t understand it and are therefore not comfortable with it. Relegating recursion to a method of last resort is fatal to understanding any data structure more complex than a list, not to mention being just another dumb superstition in a field already too full of them.

That’s actually not the response I would have expected in an algorithms class (in an architecture or compilers or other systems class, sure). In an algorithms class, I would have expected an “Overhead? Who cares about constant factors!?” response :slight_smile:

When I used to teach the intro data-structures and algorithms class, I set an assignment which had the students implement a simple container class in a number of ways and then measure the results, with a specific requirement that they push the implementations hard. The whole point was to get them to see how things that might be O(n), O(log(n)), O(n^2), etc. really behaved. Typically they got some very amusing performance graphs, where the various implementations crossed over in speed at various points, sometimes in unexpected ways. Hash-based implementations are a great way to get unexpected glitches in performance.
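A miniature version of that assignment can be sketched in a few lines of Python: time membership tests against a list (an O(n) linear scan) and a hash-based set (O(1) expected) at growing sizes, and watch where the curves sit. The helper names here are mine, and the actual timings will depend on your machine:

```python
import timeit

def build_list(n):
    return list(range(n))

def build_set(n):
    return set(range(n))

# Membership test: O(n) scan for the list, O(1) expected for the hash-based set.
for n in [10, 100, 1000, 10000]:
    lst, st = build_list(n), build_set(n)
    target = n - 1  # worst case for the linear scan
    t_list = timeit.timeit(lambda: target in lst, number=5000)
    t_set = timeit.timeit(lambda: target in st, number=5000)
    print(f"n={n:6d}  list: {t_list:.4f}s  set: {t_set:.4f}s")
```

At tiny sizes the two are close (hashing has its own constant overhead); as n grows, the linear scan falls off a cliff while the set stays flat, give or take the occasional hash-resize glitch.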

Bingo!

Bingo!

Bingo!

(Just wanted to note that I agree wholeheartedly with everything in this post)

This probably has nothing to do with recursive vs. non-recursive; it’s just the difference between built-in hardware support for addition and addition computed by iterated successor. Of course adding numbers with iterated successor is much slower than adding numbers with hardware addition!
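To make the contrast concrete, here is addition by iterated successor next to the hardware-supported `+`, sketched in Python (the function name is mine):

```python
def add_successor(a, b):
    # Addition computed by iterated successor: b applications of "+1".
    total = a
    for _ in range(b):
        total += 1  # each step is one "successor" operation
    return total

# Hardware addition does the same job in (essentially) one machine instruction:
assert add_successor(1000, 2345) == 1000 + 2345
```

The successor version takes time proportional to b; the built-in addition doesn't. Nothing about recursion vs. iteration enters into it.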

If you use the recursive definition,



```haskell
ack 0 n = n + 1
ack 1 n = n + 2 -- For hardware addition rather than iterated successor
ack m 0 = ack (m - 1) 1 -- For m > 1
ack m n = ack (m - 1) (ack m (n - 1)) -- For m > 1 and n > 0
```


you can also compute all the not-too-terribly big numbers in that table in a couple seconds on a perfectly average desktop computer.

If you take advantage of hardware multiplication as well, with the recursive definition



```haskell
ack 0 n = n + 1
ack 1 n = n + 2 -- For hardware addition rather than iterated successor
ack 2 n = 2 * n + 3 -- For hardware multiplication rather than iterated addition
ack m 0 = ack (m - 1) 1 -- For m > 2
ack m n = ack (m - 1) (ack m (n - 1)) -- For m > 2 and n > 0
```


every small value in the table is computed indistinguishably from instantly, and even all 19729 digits of the bignum ack 4 2 are computed on my machine in about 8 seconds (run through an interpreter rather than an optimizing compiler).

(Less than 1 second for the bignum when compiled with optimization with the Glasgow Haskell Compiler)

Er, I suppose “2 * n + 3” isn’t taking too much advantage of hardware multiplication; that’s more like just computing “n + n + 3” with hardware addition. But, anyway, you get the point. We could also give a recursive definition of the “hyper” function with built-in hardware support for addition and multiplication, and then express the Ackermann function in terms of that, and get it even faster. But all we’re doing is taking advantage of the fact that our hardware has built-in support for certain operations on bitstrings which would otherwise be much slower to calculate by iteration/recursion. There’s nothing wrong with recursion; a for loop to calculate sums is as bad as a recursive calculation of sums.
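The same shortcut definition carries over directly to Python, for anyone who wants to play along in the thread’s title language. A sketch (the m = 1 and m = 2 base cases are what keep the recursion shallow; without them, `ack(3, 3)` alone would recurse thousands of calls deep):

```python
def ack(m, n):
    # Ackermann function with closed-form shortcuts for the first rows,
    # mirroring the recursive definition above.
    if m == 0:
        return n + 1
    if m == 1:
        return n + 2          # hardware addition instead of iterated successor
    if m == 2:
        return 2 * n + 3      # hardware multiplication instead of iterated addition
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

print(ack(3, 3))   # 61
print(ack(4, 1))   # 65533
```

CPython’s default recursion limit keeps the really big values (like ack 4 2) out of reach without raising the limit, which is itself a nice illustration of the stack-depth discussion elsewhere in this thread.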

(Or, alternatively, our bignum library has built-in routines for certain operations on digit-strings which would otherwise be much slower to calculate by iteration/recursion.)

Technically, stack frame and activation record aren’t interchangeable. The stack frame is on, well, the stack, and contains space for anything that needs to be backed up; the activation record also includes information held in registers.

More specifically, if you have what’s called a “leaf function” (a function that does not call another function), you can perform what’s called a “leaf function optimization”: if you have only up to as many local variables as can be stored in caller-saved registers (along with the return address register), you can have an activation record (return address in the ra register, variables in caller-saved registers, result in the return-value register) without a stack frame, since you don’t need the space in main memory. Whether this optimization is possible for a given function call is, of course, architecture-dependent, but that’s not the point; the point is that the activation record encompasses the stack frame, but the stack frame doesn’t constitute the whole of the activation record.

Sometimes, depending on the calling convention. The stack frame does encompass it, but it doesn’t have to be used if you’re not either:

A. Storing values in scratch registers before calling a function or
B. Storing values in preserved (callee-saved) registers while you’re in a function.

Eh, it heavily depends on the language. In Haskell there isn’t any iteration, but recursion in Haskell isn’t really recursion at the assembly level the way C recursion is, in that it doesn’t grow the stack, etc., every time you recurse. I’m not sure about Prolog’s inner workings, but you’re more or less obligated to use recursion there too.
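You can see the C-style behavior directly in CPython, which pushes a new frame on every call even when the call is in tail position; there’s no tail-call optimization, so a tail-recursive function eventually blows the limit. A small demo (the function name is mine):

```python
import sys

def countdown(n):
    # Tail-recursive in form, but CPython still pushes a frame per call.
    if n == 0:
        return "done"
    return countdown(n - 1)

print(countdown(100))  # fine: depth well under the limit

sys.setrecursionlimit(1000)
try:
    countdown(10_000)  # exceeds the limit: no tail-call optimization here
except RecursionError:
    print("RecursionError: the stack really does grow on every call")
```

A Haskell compiler would turn the equivalent tail call into a jump and run it in constant stack space.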

Recursion is a tool like any other, and the best advice is to always consider WHAT you’re doing. And either way, efficiency < correctness <= clarity. No point making efficient iterative code if you can’t maintain it or if it’s a landmine for bugs. If you’re doing something performance intensive (graphics, physics simulator, hardcore calculations) then of course it’s something to consider, but generally a pretty recursive solution is better than a gnarled, jumbled mess of an iterative one (and iterative unfolding will often turn into a gnarled mess, especially the ones that basically use a stack as mentioned above).

I’m not saying use it haphazardly; unfortunately the easiest recursive examples (Fibonacci, factorial) are also rather easy to implement iteratively, so it kind of gets you in the wrong mindset to start with. You rarely run into out-of-stack-memory errors or function-call-overhead problems with most recursive code, and by the time you’re solving a problem difficult and intensive enough that this actually might become a problem, you’ll probably have a good idea of when to use recursion and what its problems are anyway. Perhaps the best argument against recursion is that it’s comparatively a giant pain to get a big-Theta analysis of it :p.

Odd, at my university in the CS department function call overhead is mentioned rather often. I mean, maybe not in the bare bones level course, but once you get into the Object Oriented Design, Discrete Structures Analysis, C, and Assembly required courses it becomes rather frequent, often mentioning algorithms that avoid too many function calls (even if they’re abstracted away into constant time for formal mathematical analysis), or design patterns that lessen the burden. What we DON’T go over is the cost of pointer dereferencing, so sometimes people get a bit object-happy…

It’s not really about the language, I say pedantically. It’s about the language implementation: how the compiler/interpreter/whatever chooses to implement recursion. This can be influenced by design choices in the language, but it isn’t dictated by it. You could write a C compiler that produced efficient tail calls and a Haskell compiler that didn’t, if you liked. (Though you’d have to be careful, in the C compiler, about making sure not to optimize away a function’s stack frame if you’ve managed to leak away pointers to its local variables and need them to stay accessible during the tail call. That’s the sort of problem that you only get when working with such a low-level language…)

Do you think 1st graders can learn high-level math? Without learning basic arithmetic? And that this is the only means by which people can think? Maybe you could clarify that a little bit.

You can learn what addition and multiplication mean, and understand them perfectly well, without worrying about memorizing the details of, and practicing the quick and flawless execution by hand of, some particular semi-efficient algorithms to turn decimal strings into decimal strings. Why not? Plenty of people understand square roots, logarithms, sines, etc., without worrying about how to compute them (in terms of decimal representations) efficiently. And, of course, those who do care can always learn such things later, as they desire.

(Particularly since such memorization tends to be introduced by rote, and thus does very little except teach students that mathematics is both boring and pointless. And by the time such algorithms needn’t be memorized purely by rote, there’s hardly any need to memorize them at all: once one understands the distributivity of multiplication over addition, etc., the traditional multiplication algorithm falls right out. What’s the harm in understanding polynomials before doing precise multiplication of large numbers by hand? After all, there are, let’s be honest, zero situations in the world, tip calculations be damned, where the latter is a necessary or even useful skill. Calculators are ubiquitous, even if many of them are called “phones” now. Learning to use calculators effectively would be a far more useful skill than barring calculators from math class just for tradition’s sake.)

True, but aside from C, C++, and LISP*, languages are often defined more by their compilers or interpreters than by their syntax. C and C++ have a lot of proprietary compilers, or alternative ones, or modded versions of gcc, or whatever. In Haskell you basically have GHC and Hugs, so pretty much any argument about “Haskell” generally entails “GHC and Hugs.” Besides, I’m talking in generalities: in general, C compilers won’t be optimized for tail recursion, because C programmers generally use less of it (bit of a vicious cycle there, I guess). Haskell compilers had better be optimized for it, because you damn well ain’t getting anything else from the syntax.

  * Among more frequently used modern languages at least, I’m not counting things that are really only around for maintenance or specialized usage like FORTRAN and Pascal. Most older languages have tons of compilers; it just seems that this is a relic of older languages, whereas newer languages are higher level and thus their users usually use whatever’s available rather than rolling their own compiler to optimize certain things.

I think we SHOULD start higher level math early, at the very least, first grade tests really should move beyond rote:

“1+1 = ?
2 + 5 = ?
3x3 = ?”

To the more conceptual:

“Explain why 1 + 2 and 2 + 1 are the same.” Note: not REPLACE; they should just spend less time on rote work, cut down the half-semester spent on adding two-digit numbers, and add some conceptual thinking. Some basic properties should be drilled in rotely (10 * &lt;x&gt; = &lt;x&gt;0) for convenience, but most things can be done with calculators, so I think going conceptual earlier would be to society’s benefit. But hell, I don’t have a degree in Child Development or Education, so for all I know their tiny malleable brains would explode on the spot with this.

Yes, that makes sense. But of course for most people they need a little practice with the basics, and need to get a little further in life before taking on the higher level concepts. I’d agree that there may be too much time spent on arithmetic in primary education, probably just as an artifact of an earlier time when these were more essential skills. But there probably has to be some amount of practice at using calculators and computers for simpler math skills before jumping into calculus. I asked for clarification because the statement could be interpreted in different ways, and I doubted the intent was for the extreme.

ETA: I wrote this as a response to Indistinguishable, but it applies to Jragon’s as well. And the rote of arithmetic tables is overdone.

Of course, to answer a more general version of the OP’s question: recursion in language A =/= recursion in language B. Unwinding is done in different ways, in GENERAL:

FORTRAN 77:
Statically allocates space at compile time. Unfortunately, this means it has no recursion nor, in fact, any dynamic memory allocation at all. This does make it a bit quicker at runtime, however.

C/C++/Java:
Stack allocation, which is what’s discussed in this thread. This is generally the paradigm for a lot of other common languages (like Python), especially since most of these newer languages are WRITTEN in C and thus inherit this a bit. Unwinding is done by simply moving the stack pointer up when some sort of “return” command is reached at the end of a block of code.

LISP flavors (Scheme, ANSI Common, Allegro etc):
Activation records are allocated on the heap, so memory allocation and deallocation can be done in any order. This is rather flexible; however, it incurs a rather large overhead, since a Garbage Collector is required to reclaim memory. Recursion here doesn’t really “unwind” in the traditional sense so much as it just forgets it was in the function call to begin with and then drunkenly wanders over and shoots it in the face some time down the road.

It most certainly does have recursion. I wrote my first Tower of Hanoi program in Fortran. Lack of dynamic memory allocation means it may be inefficient in a multi-processing environment, but nobody has unlimited memory.

You are making assumptions about the implementation based on a logical description of the language here.

A good figurative description, but again you are assuming implementation details about the language. Recursion as a logical construct works the same way in all languages, and within different versions of the same language call-return ordering may implemented in different ways.

Hmm, I was assured by a professor that FORTRAN 77 (specifically) has no recursion, at least as of when he used it back when it came out, while he was an ECE major in India.

He may have been saying that the facilities that make it easy to recurse were not available. You have to allocate arrays and maintain your own level indices to keep local data at each level. Depending on the version of FORTRAN, you also have to protect your parameters at each level. In pre-77 versions of FORTRAN you would have to implement your own stack-level index, but 77 uses a conventional call stack. My FORTRAN usage predates 77, so I can’t tell you every facility available there.

Or he was referring to some of the earlier versions, which did not have a conventional stack; what they didn’t support was re-entrant code. However, recursion can be implemented without re-entrancy. Recursion is a technique which can be accomplished in any useful programming language.
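What “recursion without language support” looks like in practice: here is a sketch, in Python for readability, of Tower of Hanoi run off an explicit, hand-managed stack instead of the language’s call stack — essentially what those pre-77 FORTRAN programmers were doing with arrays and level indices. The function name is mine:

```python
def hanoi_explicit(n, src, dst, aux):
    # Tower of Hanoi without language-level recursion: pending "calls" are
    # pushed onto our own stack, much as you would with FORTRAN arrays and
    # a hand-maintained level index.
    moves = []
    stack = [(n, src, dst, aux)]
    while stack:
        n, src, dst, aux = stack.pop()
        if n == 1:
            moves.append((src, dst))
        else:
            # Pushed in reverse order so they pop in the right order:
            stack.append((n - 1, aux, dst, src))   # then: move n-1 from aux to dst
            stack.append((1, src, dst, aux))       # then: move the biggest disc
            stack.append((n - 1, src, aux, dst))   # first: move n-1 from src to aux
    return moves

print(len(hanoi_explicit(4, "A", "C", "B")))  # 15 moves, i.e. 2**4 - 1
```

No stack frame ever grows here; the “unwinding” is just our own list shrinking, which is the whole point: the technique survives even when the language gives you nothing.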