Can someone explain monads to me like I'm an idiot?

Frylock: Well, the thread sort of belongs to everyone interested, I suppose, so till the OP comes back, addressing your questions is useful too.

Well, if you know anything about mathematics, you’re in luck; monads were studied in that context long before Haskell was a twinkle in Philip Wadler’s eye. A great many mathematical constructions turn out to be monads; for some random, unilluminating-in-isolation examples, “powerset” is a monad, as are all closure operators (such as the idea of the span of a collection of vectors, or of the logical consequences of a collection of axioms). I’ll try to show how monads are useful in analyzing these concepts from outside of programming as well.

You are describing one particular monad: the Maybe monad. But remember, having the concept “monad” is of no great use if one is only interested in one particular monad. The usefulness of the concept is entirely in seeing the common structure between many different examples. (Just as there is not much purpose in knowing what a ring is if one only cares about integer arithmetic (and, indeed, the vast majority of people understand the latter without having heard of the former). One studies rings in order to appreciate the general structure which integers, complex numbers, modular arithmetic, linear transformations, the sentences of propositional logic, etc., have in common, but not in order to study any particular one of those in isolation.) This is why I keep making the disclaimer that understanding the abstract concept of a monad is not a prerequisite to understanding how to write side-effecting code in Haskell.

Whoops, the OP came back. Let me respond to him first.

Instantiating is the (or, at least, a) right word, but instantiating the IO constructor at a particular type gives us a type, not a function.

That is, just as Int, [Int], Bool, Int -> Bool, (Bool, Int -> [Bool]), and so on are types (I assume you are familiar with the Haskell syntax here? If not, I can switch to whatever you prefer. [Int] is the type of lists of integers, Int -> Bool is the type of functions which take in integers and output booleans, (Bool, Int -> [Bool]) is the type of pairs of a boolean and a function from integers to lists of booleans, etc.), there are also types IO Int, IO [Int], IO Bool, and so on.

Now, of course, types are associated with values; the values of type Int are things like 3, -9, 10, and so forth. The values of type [Bool] are things like [True, False, False], [False, True], and so on. And the values of type Int -> Int are things like “Take in input, add 3 to it, return that as output”, “Take in input, square it, subtract 9, return that as output”, “Take in input: if it is prime, return 4; otherwise, return its negation”.
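If it helps to see this in actual Haskell notation, a few such values written out as declarations would look like this (just a minimal sketch; the names are made up for illustration):

three :: Int
three = 3

flags :: [Bool]
flags = [True, False, False]

addThree :: Int -> Int
addThree n = n + 3                         -- “Take in input, add 3 to it, return that as output”

oddPair :: (Bool, Int -> [Bool])
oddPair = (True, \n -> replicate n True)   -- a pair of a boolean and a function from integers to lists of booleans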

But what of types like IO Int? What are the values of these types?

Well, the values of type IO Int are blueprints for how to generate an integer with side-effecting instructions, the values of type IO Bool are blueprints for how to generate a boolean with side-effecting instructions, etc. For example, one useful value of type IO [Char] is “Print to the screen ‘Welcome. Please type your name.’ and then beep twice. Wait for the user to enter some text followed by Enter. The list of characters I generate will be the text entered by the user.” One useful value of type IO Int is “Query the system clock for the number of seconds which have elapsed since January 1, 1970. The integer I generate will be this value divided by 1000, rounded down”.
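For the record, here’s roughly what such blueprints look like when written in actual Haskell (a minimal sketch using the standard putStrLn and getLine, plus getPOSIXTime from the “time” package for the clock; the beeping is left out, and the names askName and weirdClock are made up for illustration):

import Data.Time.Clock.POSIX (getPOSIXTime)   -- from the “time” package

askName :: IO String            -- i.e., IO [Char]
askName = do
  putStrLn "Welcome. Please type your name."
  getLine                       -- the string generated is whatever the user types

weirdClock :: IO Int
weirdClock = do
  t <- getPOSIXTime             -- seconds elapsed since January 1, 1970
  return (floor (t / 1000))     -- that value divided by 1000, rounded down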

One other useful value of type IO Int is “Draw a bunch of triangles on the screen in pretty patterns. The integer I generate will be 8.” Of course, no one cares about this blueprint because they want to know what the integer generated by the specified procedure is… they only care about it because carrying out the instructions mentioned in the blueprint will cause some pleasant actions to occur.

But, the key thing is this: (you can adopt the perspective that) Haskell code only ever generates these blueprints. It has no means to actually carry out the instructions specified in these blueprints.

[Sorry, I keep getting interrupted and delayed by frustrating visiting-family-domestic-strife-bullshit. I’ll finish this post later…]

I await this eagerly.

-FrL-

So, for example, suppose we have some function promptForInput :: String -> IO String with the following behavior: promptForInput x = the blueprint for the action “Print the string x to the screen, then wait for the user to type some text followed by Enter. This typed text will be the string I generate.”
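In actual Haskell, such a promptForInput could be written roughly like so (just a minimal sketch, using the standard putStr and getLine):

promptForInput :: String -> IO String
promptForInput x = do
  putStr x      -- print the string x to the screen
  getLine       -- the string generated is the text the user types, up to Enter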

You might think ‘Well, then, sometimes (promptForInput “What is your name?”) will equal “friedo” and sometimes (promptForInput “What is your name?”) will equal “Frylock”, depending on what gets typed in.’

But that’s actually not quite correct. promptForInput “What is your name?” will never equal “friedo”, and it will never equal “Frylock”. It will always equal the blueprint for the action “Print the string ‘What is your name?’ to the screen, then wait for the user to type some text followed by Enter. This typed text will be the string I generate.” It’s not actually a string; it’s just a blueprint for how to generate a string. We can see this because the type of (promptForInput “What is your name?”) isn’t String; its type is IO String.

And, the interesting thing is, in some sense, Haskell code can only draw up these blueprints; it can’t actually run them. There’s no Haskell function of type IO String -> String which takes in a blueprint for generating strings, and outputs a string according to the specified procedure.
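What Haskell does give you, instead, are ways of gluing blueprints together into bigger blueprints, so that the string generated by one blueprint can be used in the later instructions of a larger blueprint, without ever escaping into ordinary Haskell as a plain String. A minimal sketch (using do-notation and the promptForInput sketched above; greet is just a made-up name):

greet :: IO ()
greet = do
  name <- promptForInput "What is your name? "   -- a sub-blueprint; name is only usable within this larger blueprint
  putStrLn ("Hello, " ++ name)                   -- a further instruction which uses the generated string

The whole thing is itself just another blueprint, of type IO (); at no point does anything of type IO String get converted into a plain String.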

It’s as though we said to ourselves “Hm… Haskell code can’t do certain kinds of things. But I know Java code can. So I’ll just make a datatype in Haskell which represents Java programs. Whenever I want to do something which Java can do but Haskell can’t, instead of futilely trying to accomplish it directly in Haskell, instead, I’ll just write some Haskell which generates the Java code that does what I want. Then I’ll feed the output of my Haskell interpreter to a Java interpreter.” Only, instead of Java, we’re talking about some special language extending normal Haskell, which we might call IO-Haskell. And every Haskell interpreter comes bundled with an IO-Haskell interpreter as well, in case you want to hand things off in this manner.

This way of looking at things probably isn’t actually what happens under the hood when you run your Haskell programs. But it’s a lot like what happens. It’s so much like what happens that it could be what happens. For you as a Haskell programmer, it’s indistinguishable from what actually happens; if the clever compiler-writers happen to actually be doing something else under the hood, well, that needn’t bother you. And that’s the point: the Haskell IO system is set up in such a way as that you can pretend this is what happens, with all “real” Haskell functions remaining pure and side-effect free, even though many of them exist solely to generate code for another language which has side effects.

Note how the above explanation of Haskell IO can take place without a single invocation of the concept of monads; you don’t need to know what monads are to understand how the Haskell IO system is set up.

One wants certain constructions and rules of reasoning to be available for the generation and combination of “blueprints” specified in IO-Haskell. A particular subset of these constructions and rules of reasoning happen to match the definition of the abstract mathematical concept called a “monad”. But, while this is both interesting and illuminating for many purposes, it’s not vital to know how to separate those particular constructions and rules from the various other potentially idiosyncratic properties of IO-Haskell in order to be able to grok Haskell’s system of manipulating side-effecting code.
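For concreteness, the constructions in question amount, in Haskell’s terms, to roughly the following type class (a sketch of the classic presentation; in modern GHC, Monad additionally sits below Functor and Applicative, but the idea is the same), plus a handful of laws saying that return and (>>=) interact sensibly:

class Monad m where
  return :: a -> m a                  -- reinterpret an ordinary value as a trivial “blueprint”
  (>>=)  :: m a -> (a -> m b) -> m b  -- sequence a blueprint with a function producing the next one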

(It’s still worth learning what monads are, of course, even for many purely programming purposes, if only because knowledge of generalizations and common structure is generally worth learning. Such knowledge just isn’t all that helpful for the particular purpose of learning how to craft side-effecting programs in Haskell, despite the common impression. [Because I love analogies, let’s try “… just as knowing what context-free grammars are isn’t all that helpful for learning to speak Spanish.”])

For what it’s worth, it’s possibly more convenient for you to learn SML first, which is very similar to Haskell, though without the ideological trappings about side-effect purity and other complications like type-classes and laziness (though SML has a very restricted form of type-classing in equality types). Once you get the basics down of Hindley-Milner type inference, recursion, inductive datatypes etc., moving over to Haskell should be easier.

That is the route that I took when learning FP. You also shouldn’t consider learning SML a mere stepping stone: SML’s module system is considered state-of-the-art, still, and is an area in which Haskell cannot yet compare.

It’s supposed to! :stuck_out_tongue: Judging by other examples, academic language design is generally 40-50 years ahead of mainstream languages. Stuff invented in the 60’s is only just making its way into languages that software engineers actually use on a day to day basis.

What’s lost in all the academic talk about the mathematics of monads is the fact that monads make certain tasks a lot nicer to do, monadic parsing being one example.

Well, the OP’s left again (I fear I’ve done him a disservice by adhering to the letter of his question even while repeatedly explaining the distinction between that and the spirit of his query. I should probably instead have first stuck to expounding upon the “It’s like writing a Haskell program which outputs Java code to be shipped off to a separate Java interpreter” metaphor and even explaining how to actually write Haskell code using IO, while only much later, if ever, actually dealing with defining the concept of a monad. Oh well… more practice for next time.), but, in the hopes that the thread isn’t entirely moribund, let me try addressing this. Trying to learn from my mistakes, I’ll first give a concrete example, and then move to a discussion in abstract terms.

The concrete example I want to look at is one which can perhaps be appreciated both by mathematicians and programmers. Specifically, let’s talk about probabilistic maps.

A normal function is, as we all know, a deterministic method of associating input values with output values. By a probabilistic map, in contrast, I mean a potentially non-deterministic mapping of this sort, with various probabilities of going this way and that; thus, while normal functions include such maps as “Take in a word and output the number of letters in it”, “Take in an angle and output its cosine”, and “Take in a person and output his father”, probabilistic maps include such examples as “Take in a person and output one of his parents at uniform random”, “Take in a vector and output the result of scaling it by a random factor with a standard normal distribution”, and “Take in two cards and output the first with probability 65% and the second with probability 35%”.

Of course, there are many, many applications where probabilistic maps arise naturally (studying games of chance, thermodynamics, financial markets, …). But so much of our familiarity and mathematical machinery is with traditional functions; it would accordingly be quite useful if, instead of having to develop the tools to analyze and manipulate probabilistic maps entirely anew, we could instead find a simple, clean, compositional way to interpret these in terms of the functions we already know and love.

And, of course, in practice, this is just what we do. We specify a probabilistic map from Xs to Ys by giving a function from Xs to probability distributions of Ys. And we know, in terms of these representations as ordinary functions, just how to sequence series of probabilistic maps together (given a function from A to probability distributions on B, and a function from B to probability distributions on C, and so on down to Z, we know how to collapse this into a function from A to probability distributions on Z). We even know how to turn ordinary functions into very special kinds of probabilistic maps (using probabilities of 100% and 0%).

The combination of all this data (the type-constructor “probability distribution on _”, and the descriptions of how to carry out sequencing and reinterpretation of ordinary functions as above) is a monad. [In language-abusing shorthand, we generally refer to the whole monad by just the name of its type-constructor; thus, we say that “probability distribution on _” is a monad]. Given all this data, one doesn’t need to be able to directly comprehend probabilistic maps as such; instead, one can interpret discussion about probabilistic maps in terms of the language of traditional mathematical functions (in a simple, clean, compositional manner). Indeed, one needn’t have ever even heard of probabilistic maps before; one can reconstruct the notion directly from this data.
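For the programmers following along, here is a rough sketch of this monad in Haskell, representing a distribution naively as a list of value/probability pairs (the names Dist, certainly, and andThen are made up for illustration):

newtype Dist a = Dist [(a, Double)]   -- a probability distribution on a: possible values paired with probabilities

-- an ordinary value reinterpreted as a (trivial) distribution: probability 100%
certainly :: a -> Dist a
certainly x = Dist [(x, 1.0)]

-- sequencing: feed each possible outcome through a probabilistic map, multiplying probabilities along the way
andThen :: Dist a -> (a -> Dist b) -> Dist b
andThen (Dist xs) f = Dist [ (y, p * q) | (x, p) <- xs, let Dist ys = f x, (y, q) <- ys ]

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure = certainly
  df <*> dx = andThen df (\f -> fmap f dx)

instance Monad Dist where
  return = certainly
  (>>=)  = andThen

-- e.g., a fair coin, and the distribution of two independent flips both coming up heads:
coin :: Dist Bool
coin = Dist [(True, 0.5), (False, 0.5)]

twoHeads :: Dist Bool
twoHeads = coin `andThen` \a -> coin `andThen` \b -> certainly (a && b)

(A more careful treatment would collapse duplicate outcomes and perhaps use exact rationals, but the monadic structure is exactly the same.)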

Does that help clear things up? Monads are many things, but from the particular definitional perspective I’ve taken throughout this thread, their importance, and the reason functional programmers care about them, is that they serve as bridges between different notions of mapping (or of “functions” or “programs” or “operations” or “morphisms” or whatever word you care to use), just as we saw with the above example.

Of course, the kind of connection via a monad we just saw is not the only possible flavor of connection two different notions of mapping could have. But it is one which is very common. To see why, I will next delve into the aforementioned abstract discussion (unless there are any objections. As always, the way to make explanations better is through the back-and-forth of feedback/discussion)…

Oops. You aren’t asking about this Monad.

Nevermind… :smack:

Ah, I’m still working on this. Figuring out the best pedagogical approach is just taking some time…

I mean, you can probably see how “powerset of _” acts like “set of probability distributions on _” in modelling non-deterministic maps (just “possibilistically” rather than probabilistically), and is thus a monad. But this on its own is not particularly interesting; at least, not above and beyond what’s already within the scope of programming. What I would like to do is explore the rich variety of ways the concept of “monad” arises in mathematics, even when far removed from this particular application/definition of “A way to augment the notion of function by augmenting the allowed ‘output behaviors’” (for example, monads are intimately connected with algebra, and indeed, this allows for a thoroughly different definition of monad as something like “single-sorted infinitary algebraic theory”; in another vein, the etymology of the name can be explored by showing how “monad = monoid [interpreted in an unusual way]”; and, yes, there is the fact that monads are nought but generalized closure operators) … but describing all this in the spirit of idiot-readable English is a bit of a challenge. But a good one.
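To spell out the possibilistic analogue in code, at least, here is a rough sketch using lists in place of genuine sets (the names PossMap, exactly, and compose are made up for illustration):

-- a “possibilistic map” from a to b: for each input, the list of possible outputs
type PossMap a b = a -> [b]

-- an ordinary function reinterpreted as a possibilistic map with exactly one possible output
exactly :: (a -> b) -> PossMap a b
exactly f = \x -> [f x]

-- sequencing two possibilistic maps, just as we sequenced the probabilistic ones;
-- this is precisely (>=>) for the standard list monad
compose :: PossMap a b -> PossMap b c -> PossMap a c
compose f g = \x -> [ z | y <- f x, z <- g y ]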

[sub]Bumping this as I’m finally getting around to learning FP stuff again now that I’m quitting my job that sucks balls*[/sub]

I think this explanation is making things more clear. Where I was thinking of promptForInput as a function which can return a different thing each time it’s called, it’s actually a function that generates a brand new function for taking input, which is used exactly once. Is that on the right track? Keep in mind, I’m still thinking about things in Perl, which of course does not have purely functional constraints. But I imagine it to be akin to this:



sub promptForInput {
    # build an anonymous sub that prints its argument and reads one line from stdin...
    my $fn = sub {
        print $_[0];
        return scalar <>;
    };
    # ...and immediately call it, handing it promptForInput's own first argument (the prompt)
    return $fn->( $_[0] );
}

my $input = promptForInput( "say something: " );
print "you said $input";


In this example, each time promptForInput is called, a new function is created which is used to get the input, and then its value is returned.

But it occurs to me that if you think of promptForInput as a black box, then it’s still the same as if it were reading stdin without constructing the new function. So maybe it should really be this:



sub promptForInput {
    my $p = $_[0];        # capture the prompt string in a lexical variable...
    my $fn = sub {
        print $p;         # ...so the closure can refer to it when it is eventually called
        return scalar <>;
    };
    return $fn;           # return the closure itself, without calling it
}

my $input = promptForInput( "say something: " );
print "you said " . $input->();


This uses a closure to make a new function that prompts the user, which is returned to the caller, who then calls that function to get the input. That seems more FP-ish. But it still seems like I’m missing a large piece of the puzzle: though promptForInput now returns a function, it’s still a different function each time.

[sub]* The job itself does not actually involve ball-sucking.[/sub]

In what sense is it a different function each time? promptForInput("say something: ") always returns the same function <<print "say something: "; obtain input from the user and return it>>, doesn’t it?

Anyway, yes, what’s going on is something like that. Except, instead of thinking of promptForInput as returning a function directly, think of it as returning only an “encrypted” description of a function (e.g., think of it as returning the Java source code for a function). And there’s no way to “decrypt” this description from within Haskell and actually run the function (i.e., there’s no Haskell function which actually interprets Java source code, as such). But I’m probably just confusing things again, so I’ll just sit back and wait for the questions to come rather than trying to pre-emptively answer them.

“encrypted” might not be the best word, but the idea is just that the description of the function isn’t actually in the form of native Haskell code, per se, but instead is given in some other language.

:smack: Yes, that’s right.

Thanks for your help so far. It is becoming clear to me why so many people who call themselves programmers suck so bad at programming. They’re unwilling to do the work necessary to learn about more powerful abstractions. FP has been very taxing on me, especially since my math background basically ends at flunking integral calculus three times.

No problem; I’m always happy to see more people interested in learning about this stuff.

Ah, but at least you’re trying, which, as you noted, already puts you way ahead of the curve.

So, any more questions yet, or still working through it on your own?

I must say I absolutely love it too. Also, band name I’m afraid.
Still, cargo-culting code. Rolls right off the tongue, describes something oh so very common… I’m keeping this one.

It’s a nice phrase, I agree, but, lest anyone think elsewise, it wasn’t coined in this thread.

Not yet – I’ll have more time to mess with stuff over the weekend though.

Just popping in to say that the proximity of the monads thread to the nomads thread on the forum page is confusing the hell out of me. I guess I need new contact lenses…

If I were a particularly malicious moderator, now would be a perfect time for an impromptu thread merge…