Can someone explain monads to me like I'm an idiot?

I know there’s a few FP gurus out there in doperland. I’ve been trying to learn a lot more about functional programming in general and have had great successes with things like closures, currying and higher order functions, lazy evaluation, and so on.

After reading about ten million words over the past six weeks about the topic of monads, however, I have learned exactly two things:

  1. In a purely functional language, all functions must be idempotent, and therefore things which have side-effects or rely on outside conditions are impossible.
  2. Monads make it possible.

At that point I’m completely lost, as everything I’ve read either descends into unintelligible reams of mathematical nonsense or giant listings of Haskell or Scheme code which I don’t understand.

Can anybody explain it in English?

First, one nitpick: I don’t think “idempotent” is exactly the word you’re looking for in condition 1, though I guess what you mean by it is something like “Every time you call this function with the same input, it does the same thing and gives the same output”.

Secondly, yes, I wager I could explain what monads are. But first, let’s figure out if that’s exactly what you want to figure out right away. That is, there is one thing I want to stress highly: you no more need to know what a “monad” is or understand the abstract theory of monads in order to do input/output and so forth in Haskell than you need to know what a “commutative ring” is or understand the abstract theory of commutative rings in order to do integer arithmetic in C or Java.

So, the question is, what is your primary concern? Understanding what monads are or learning to do input/output and so forth in Haskell? (Of course, we can address both).

I primarily want to understand what monads are. I’m just beginning to learn Haskell, and I’m perfectly comfortable just copying syntax examples from a book without entirely understanding WTF is going on (this is how I learned every programming language I work with.)

But eventually I have to learn WTF is going on if I’m to move beyond cargo-culting code to solve problems, and so that’s why I’m asking.

Hm, the above post of mine might have been unclear. Let me clarify it.

Do you know what a type constructor is? (If the answer is “Yes, of course”, I apologize, but I’ve been asked to treat you like an idiot). For our purposes, a type constructor is some way of taking in one type and producing another; e.g., “list of _” (as in “list of integers”, “list of Booleans”, “list of lists of integers”) or “function from integers to _” or “pair of a Boolean and _” or “binary tree whose nodes and leaves are marked with _”. That sort of thing.
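(To make that a bit more concrete, here is roughly what a couple of those look like written out in Haskell; the names PairWithBool and Tree are just ones I’m making up for illustration, and “list of _” is built in, written [_]:)

```haskell
-- "pair of a Boolean and _": takes a type a, produces the type PairWithBool a
data PairWithBool a = PairWithBool Bool a

-- "binary tree whose nodes and leaves are marked with _"
data Tree a = Leaf a | Node a (Tree a) (Tree a)
```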

If a type constructor (with some associated structure) has certain properties, then we call it a monad. For example, it turns out that “list of _” is a monad. I’m not going to tell you what a monad is right now.

There is a very particular type constructor in Haskell called IO; values of the type IO t are blueprints for procedures to run in order to generate a value of type t. For example, one value of the type IO String is the blueprint for the procedure “Ask the user to type in some text, then return this text” and one value of the type IO Integer is the blueprint for the procedure “Beep twice and turn the screen red, then figure out how many seconds have passed since noon and return that value”. As you might begin to see, this begins to allow us a way out of the problem of carrying out side-effects in Haskell, where side-effects are technically verboten. Whenever we want to create a value of type t using a side-effect, we’ll instead technically just create a blueprint (of type IO t). Creating the blueprint doesn’t involve actually carrying out any side-effects; it just tells us how we would carry them out, if we cared to.
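(For a concrete taste, here are two blueprints built from real Prelude functions; note that merely defining them performs no I/O, since a blueprint is just a description. The names askForLine and readConfig, and the path "config.txt", are made up for this sketch:)

```haskell
askForLine :: IO String            -- blueprint: "read a line typed by the user and return it"
askForLine = getLine

readConfig :: IO String            -- blueprint: "read this file's contents and return them"
readConfig = readFile "config.txt" -- "config.txt" is just an example path
```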

As it turns out, the IO type constructor is a monad as well. You could hardly be expected to realize this, because I haven’t bothered to tell you what a monad is.

Key point: Just as you were perfectly well able to go about your life manipulating lists in Haskell without having to “understand monads”, you will be perfectly well able to go about your life manipulating IO types in Haskell without having to “understand monads”. Learning the theory of monads is almost orthogonal to learning how to carry out input/output and other side-effects in Haskell.

That having been said: There are certain useful-to-observe similarities between “list of _” and “a blueprint for carrying out a (possibly “impure”) procedure for generating a value of type _” and many other type constructors. Obviously, the similarities I am talking about here are the ones which define “monads”. Studying these similarities can A) change your way of viewing things, bring you further insight in understanding techniques you already employ (perhaps by seeing them as specific instances of more general ones), open the doors of new programming tricks, etc., as well as B) allow you to write and employ code which works generally for any such type constructor, instead of having to rewrite essentially the same boilerplate in each instance. (Just as it is useful to be able to write a single polymorphic function that acts on “list of _” without having to prespecify what _ is, it can be useful to work at such a level of abstraction, writing code that works on arbitrary monads without having to prespecify which one).

But, remember the key point: You don’t need any insight into what a monad is in order to understand IO (which is the vehicle of carrying out input/output and other side-effects in Haskell). Rather, learning about monads simply allows you to see the connections between IO, “list of _”, and other type constructors, just as learning about commutative rings allows you to see the connections between integer arithmetic, floating-point arithmetic, complex number arithmetic, etc. It allows you to think at a higher level of abstract generality. The reason monads often look scary is because abstract generality often looks scary.

Alright, that’s enough beating around the bush. Next, I’ll start to explain just what a monad is, now that I’m sufficiently comfortable that enough disclaimers have been given as to what this understanding is and isn’t good for and necessary for.

Er, locked out of the edit window (as I so often am):

Thus, we can technically pretend that our code doesn’t actually have any side-effects, which makes many things easier to reason about. But, all the same, cleverly, trickily, when it comes time to execute our code, as the very last step, after having generated such a blueprint, the Haskell interpreter is designed to actually carry out the procedure specified in the blueprint, side effects and all. Thus, we have our cake and eat it too.
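(In concrete terms, the one blueprint the runtime actually carries out is whatever you bind to the name main; a minimal sketch:)

```haskell
-- Defining main just builds a blueprint; the side effect happens only when the program is run.
main :: IO ()
main = putStrLn "Hello from inside the blueprint"
```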

(Incidentally, re-reading my posts above, I note that the text is very clunky in some places (for some reason, I often rush to compose and submit posts quickly rather than leisurely waiting till I’ve gotten them just right, with the expected results), so, if anything in my posts is confusing, please ask me to clarify)

Let’s toss another example of a monad (the term still at the moment undefined) into the mix: Maybe. A value of type Maybe T is either a value of type T, tagged with the label “Just”, or a special value called “Nothing”. Thus, the values of type Maybe Boolean are Just True, Just False, and Nothing; the values of type Maybe Integer are Nothing, Just 0, Just 1, Just -1, Just 2, Just -2, and so on. The reason I introduce this is because it’s another nice example of a monad (actually, every type constructor I’ve mentioned so far happens to give a monad, but for our pedagogical purposes, some are more natural than others).
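(In Haskell notation, the whole definition is just this one line from the Prelude:)

```haskell
data Maybe a = Nothing | Just a   -- a value of type Maybe a is either Nothing, or Just x for some x of type a
```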

So, the three monads I want to talk about right now are Maybe, List, and IO. What is it that these all have in common?

Answer*: For each of these type constructors M, we can naturally think of normal functions from the type A to the type M B as some special kind of “generalized functions” from the type A to the type B. For example, a function of type A -> Maybe B is like a function from A to B with the special ability to “abort” and return no value at all (i.e., to return Nothing). A function of type A -> List of Bs is like a multi-valued function from A to B, which can return more than one value at a time. And, of course, a function from A to IO B is like a special function from A to B with the ability to be nondeterministic, carry out side-effects, and all the usual craziness. Whenever we have a type constructor which can be viewed in this manner as giving a notion of “generalized function”, we call it a monad.
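(To make the three flavors of “generalized function” concrete, here are some small examples; all of the names are made up for this sketch:)

```haskell
safeRecip :: Double -> Maybe Double        -- may "abort" and return no value at all
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

divisorsOf :: Integer -> [Integer]         -- "multi-valued": returns several values at once
divisorsOf n = [d | d <- [1 .. n], n `mod` d == 0]

askUserFor :: String -> IO String          -- a blueprint involving side effects
askUserFor prompt = putStrLn prompt >> getLine
```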

Alright, let’s make this informal definition more precise. What makes for a suitable notion of “generalized function”? Well, let’s say there are just two requirements:

  1. Just as normal functions include do-nothing “identity functions”, there should also be “identity generalized functions”.
  2. Just as normal functions can be composed together, with the output of one becoming the input to another, we should also have a way to compose together generalized functions.

Thus, our more precise definition of a monad is this:
A) A type constructor M. [For convenience’s sake, we shall say “M-function from A to B” to mean “normal function from A to M B”].
B) An M-function from A to A which “does nothing” (at all types A). [In Haskell, this function is named “return”]
C) A way to compose M-functions from B to C and from A to B into one from A to C, as though chaining them together (again, at all types). [Let’s denote the composition of g and f as g # f. This isn’t actual Haskell syntax, but whatever. (As we’ll see later, Haskell has special syntactic sugar that allows one to work with monads as though simply writing in a typical imperative programming language)]

Formally, the “do-nothing” condition in (B) is the condition that f, (return # f), and (f # return) are all the same. Furthermore, we require that composition be associative, which is to say, (h # g) # f should be the same as h # (g # f).

And there you have it. That is the entire definition of a monad. To recap, a monad is a type constructor M along with a way to see “function from A to M B” as “generalized function from A to B”, complete with an associative notion of composition which also has identity maps.
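(Incidentally, although # isn’t real Haskell syntax, the standard libraries do provide this exact “generalized composition” under the name <=< in Control.Monad, so you can already play with it; for example, chaining together two real Prelude IO-functions:)

```haskell
import Control.Monad ((<=<))

-- putStrLn <=< readFile  is our  putStrLn # readFile:
-- "first read the file, then print its contents", as one blueprint-producing function.
readAndEcho :: FilePath -> IO ()
readAndEcho = putStrLn <=< readFile
```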

[In the next post, I will illustrate this definition by demonstrating how Maybe and List are monads.]

[*: Incidentally, another way to look at all this is that Maybe, List, and IO all act like “container classes”; that is, for each of these type constructors M, a value of type M T is like a container which, in some way, can be thought of as holding values of type T. With Maybe, the container can contain 0 or 1 many things; with List, the container can obviously contain a whole sequence of things; and with IO, the container will store things in recipe-form. If you think about it, this isn’t far removed from the perspective used above: for suitable notions of “container”, we can naturally think of “container-valued functions” as generalized functions, and, conversely, can think of generalized functions taking 0 arguments of input as containers. Thus, if you like, you can think “monad = suitable notion of container” instead of “monad = suitable notion of generalized function”.]

Thanks for all the posts so far, Indistinguishable. I still have some questions but I’m going to read over them a couple more times first.

I appreciate all the effort you put into your replies.

I have to apologize; the definition of “monad” I gave is lacking one technical condition (I had somehow convinced myself it was redundant, but now think I was mistaken). I am torn as to how best to express this last condition, but I guess I might just as well list all the equivalent ways I am choosing between. The basic idea is that we want the structure of generalized function composition/application to be “compatible” with that of normal function composition/application, allowing us to think of it as a true generalization. That is, we want to think of every normal function as corresponding to a boring generalized function that just doesn’t take advantage of the wider possibilities. However, when switching between looking at something as a normal function or a boring generalized function, it shouldn’t matter whether we carry out a composition/evaluation from the one viewpoint or the other.

So, here are a bunch of equivalent ways to formally state the mistakenly left-out condition and thus complete the definition of a monad [where, unless otherwise stated, each statement is meant to hold as generally as makes sense (i.e., for all values which make the types work out)]:

Way 1: (n # m) . f = n # (m . f) [where the dot denotes normal function composition (as in actual Haskell syntax)]. Thus, generalized composition and normal composition are so compatible that we can even kind of get an associative law going between them…

Way 2: There is a binary operator =<< such that (n # m) x = n =<< (m x). [I apologize for picking such a terrible-looking name as =<<, but this is actual Haskell syntax as well]. Think of this operator as “generalized application”, just as # was “generalized composition”.

Way 3: If m x = w y, then (n # m) x = (n # w) y

Way 4: return . (g . f) = (return . g) # (return . f)

Way 5: return (f x) = (return . f) =<< (return x)

Right, so, this must all look like precisely the gobbledygook symbol-soup you were hoping to avoid; I’ll work on figuring out a good way to express this condition in intuitive English and hopefully clear it up. (I’ll also try to motivate the definition some more too (explain why we would require that composition be associative, for example), and of course deliver on my promise of illustrating how Maybe and List actually are monads. And then, perhaps, we can move on to the separate topic of how to get stuff done with Haskell IO.). But I just wanted to get the quick correction in now, before anyone called me out on it…

So, again, the definition of “monad” I gave above was almost complete, but missing one condition. I’ve described that missing condition in this post, but you shouldn’t worry if this post is nigh-unreadable; after I’ve rested and got some free time again, I’ll hopefully be able to explain this final condition better, and you can pretend this post never happened.

The more I look at the above post, the more I think you’d be best off ignoring it and waiting for the “idiot-readable”/“English” one instead (not because you’re an idiot, but because post #8 is poorly written, and I’d like to hew more strictly to your guidelines). Just know that I mistakenly left something out of the definition of a monad, and that I’ll be fixing it in my next post.

First, the term you are looking for is not idempotence, but referential transparency. What monads allow you to do is to recover equational reasoning about programs (not just functional programs, they were introduced by Moggi for reasoning about imperative programs) in the presence of side-effects. This is a good thing, not just for program design, but also for verifying software, to ensure it does what it’s supposed to.

Now, having said that, I originally wrote a huge post showing how monads can be developed from the ground up. Halfway through, I realised I was just repeating what this guy had said, and gave up.

(PS. You may find it easier to understand monads if you replace the word “monad” with “effect wrapper” wherever you see it, or similar. It’s recognised now that using the word “monad” in Haskell has been a mistake.)

The two necessary operations for monadic programming are 1) creating a monad from a pure value and 2) stringing existing monads together, so you can sequence computations. If you bear that in mind, then everything else is just the necessary type theory required to make these things work.
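(In Haskell these two operations are exactly what the Monad type class provides: return does (1), and the bind operator >>= does (2). A slightly simplified restatement of the class from the Prelude:)

```haskell
import Prelude hiding (Monad, return, (>>=))

-- Simplified: the real Prelude class has a bit more to it, but these are the two essential operations.
class Monad m where
  return :: a -> m a                    -- (1) create a monadic value from a pure value
  (>>=)  :: m a -> (a -> m b) -> m b    -- (2) "bind": string computations together in sequence
```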

Just being very nitpicky about the terminology, but surely you mean to say something like “1) creating a monadic value from a pure value and 2) stringing existing monadic values together, so you can sequence computations”. That is, things like Just 5, [True, False, False], and getLine are not themselves monads; they are just monadic values. It is the type constructors Maybe, List, and IO (along with their associated functions) which are actually the monads.

Not that the terminology matters all that much. Indeed, not that knowing what “monads” are matters all that much. Oh, I tried to dissuade him, I really did. But he was so insistent… :slight_smile:

Oh, my – what a wonderful phraseology. I hope you don’t mind if I steal it.

Now I’m going to go back and attempt to grok the other posts…

Yes, you are right. I added that part as an edit, and had to type it quickly. It’s also worth noting that the IO monad is slightly different from all other monads, in that it is built into the language itself, and forms part of the monadic IO sublanguage of Haskell, as opposed to being defined in the Prelude or by the programmer.

I do want to re-emphasize, over and over, that one needn’t understand what monads are in order to understand how Haskell IO works, how it accomplishes what is desired of it, and how to use it at a level beyond cargo-culting code, just as one needn’t understand what monads are in order to intelligently work with lists or binary trees, one needn’t understand what a category is in order to grok Java or matrix multiplication, a child needn’t understand the defining characteristics of the clade Mammalia in order to know what a cow is, etc. In all of these cases, the knowledge of how the specific case is an instance of a more general structure can be illuminating, but it is not at all prerequisite, and conceivably even distracting. Maybe I should put this disclaimer at the top of every post, just to make absolutely sure no one gets the wrong idea. :slight_smile:

[Next actually substantive post coming soon]

The IO monad for people who simply don’t care.

Ok, at last, the (hopefully) long-awaited next posts.

Standard disclaimer: You should not want to know what monads are! (Well, maybe that’s a bit too strong. :))

Now, as for what monads are… Let’s recap (and correct one small omission). (If you want to skip the recap, I’ve put the stuff I’ve said before in blue)

The idea is that a monad is a type constructor M, along with a way to see normal functions from A to M B as “generalized functions” from A to B. For example, List-functions can be seen as having the ability to return multiple values, Maybe-functions can be seen as having the ability to abort and return nothing, IO-functions can be seen as having the ability to carry out impure side effects, and so on. (Of course, this is just one view we can take of terms of type A -> M B; we can also, of course, look at these as normal functions. For example, a term of type A -> List B doesn’t have to be thought of as returning multiple values; we also can, and usually do, think of it as just returning a single value which happens to be a list. A term of type A -> IO B doesn’t have to be thought of as carrying out side effects; we can also say it chugs along purely normally and just happens to return a blueprint for how one could carry out side effects. And so on. The usefulness of monads is in the ability to switch between these two perspectives.)

So what do we mean when we say we want to be able to regard normal functions from A to M B as “generalized functions” (what we’ll call “M-functions”) from A to B?

Well, as far as we’re concerned, the essence of functions is in the ability to sequence them together. What this means is that we want notions of “do this, then that” as well as “do nothing” for M-functions; thus, a notion of composition of M-functions, as well as an identity M-function, which satisfy the usual properties. More on what “the usual properties” are and why this is the essence of functions will come later (they comprise the definition of what is known as a “category”, a concept of great use in studying programming languages).

There’s also one thing I forgot to mention before: M-functions should generalize normal functions, in some sense. Thus, for every normal function f from A to B, there should be a corresponding M-function from A to B as well.

So, our precise, slightly corrected definition of a monad is the following:
A) A type constructor M [We want to be able to think of normal functions from A to M B as “M-functions” from A to B].
B) A way to compose M-functions from B to C and from A to B into one from A to C, as though chaining them together. [We denote the M-function composition of g and f as g # f, meaning “first do f, then do g to the result”; for composition of normal functions, we will write . instead of #]. Formally, we demand that composition be well-behaved in the sense that h # g # f should be the same no matter how it is parenthesized [i.e., it doesn’t matter how one breaks a sequencing of operations down].
C) An identity M-function from A to A. [We’ll call this “return”]. Formally, we demand that this do nothing in the sense that f should be the same as return # f and f # return [i.e., it doesn’t matter if one introduces extraneous “do nothings” into a sequencing of operations].
D) A way to turn normal functions from A to B into corresponding M-functions from A to B [When f is the normal function, we’ll denote the corresponding M-function by <f>]. Formally, we demand that these actually correspond in the sense that g # <f> should be the same as g . f [i.e., when this kind of choice is available, it doesn’t matter if one sequences from the perspective that one is dealing with M-functions or the perspective that one is dealing with normal functions].
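(If you’re curious how this dictionary lines up with real Haskell: the library operator <=< from Control.Monad plays the role of our # in (B), the return of (C) is literally the Prelude’s return, and the <f> of (D) can simply be written return . f. A tiny sketch, where liftToM is a name I’m making up:)

```haskell
-- (D): turn a normal function into the corresponding M-function (our <f>).
liftToM :: Monad m => (a -> b) -> (a -> m b)
liftToM f = return . f
```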

Great, that completes the definition of a monad. Now, as I promised, let’s show some actual examples, such as illustrating how Maybe and List are monads.

Why is Maybe a monad? Well…

A) Maybe is a type constructor. A Maybe-function is meant to be a function which has the special ability to abort and return nothing.
B) Given g :: B -> Maybe C and f :: A -> Maybe B, we can sequence them together into g # f :: A -> Maybe C by the following definition: (g # f) x is Nothing if f x is Nothing; otherwise, if f x has the form Just y, then (g # f) x is g y. The idea here is that, when sequencing together computations which may abort and return nothing, if such abortion happens at any step, then the whole sequence is considered to have aborted. Otherwise, everything just chugs through in the usual manner. With this definition, the behavior of h # g # f is independent of its parenthesization; thus, our associativity law is satisfied.
C) We can define a do-nothing function return :: A -> Maybe A by return x = Just x. This just passes along the input it gets as output (in the appropriate wrapper), with no possibility of aborting and returning nothing. Clearly, inserting extraneous instances of this into a sequence of computations changes nothing, given our notion of sequencing; thus, our identity laws are satisfied.
D) Given a function f :: A -> B, we can define the corresponding <f> :: A -> Maybe B by <f> x = Just (f x). On any input, this simply outputs the same thing that f would (in the appropriate wrapper), with no possibility of aborting and returning nothing. Given our notion of sequencing, the behavior of g # <f> is to apply f to its input, then apply g to the result, which is the same as the behavior of (g . f); thus, our normal-function-generalization law is satisfied.

So there you have it. Maybe is a monad, naturally allowing us to express and sequence together computations which may “abort” without producing a value.
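(And here is the same demonstration compressed into actual Haskell; apart from Nothing and Just, all the names are made up for this sketch:)

```haskell
-- (B): sequencing two Maybe-functions; if either step aborts, the whole chain aborts
composeMaybe :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c)
composeMaybe g f x = case f x of
  Nothing -> Nothing
  Just y  -> g y

-- (C): the do-nothing Maybe-function
returnMaybe :: a -> Maybe a
returnMaybe x = Just x

-- (D): the Maybe-function corresponding to a normal function f
liftMaybe :: (a -> b) -> (a -> Maybe b)
liftMaybe f x = Just (f x)
```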

Before moving on, I figured I should probably ask:

Are the above styles of posts helpful? Is there anything unclear that I could discuss more or otherwise help clarify? Should I show more details or less? Should I be organizing the explanation in some other way? Explain or motivate the definition of monad more before going on? Show “how to use” monads, complete with Haskell special syntax? Should I start explaining what I meant by the fundamental importance of associative composition (i.e., “categories”) in analyzing programming? Have you taken my disclaimers to heart and realized you don’t actually care to learn about monads after all? :slight_smile:

If you give me some feedback, I can better suit your needs. (That’s the one advantage we have here over reading any of the many tutorials and explanations of monads available elsewhere on the web)

I think I understand what a monad is. I just don’t understand how it’s useful. I see statements about what they’re used for, but (perhaps because I know nothing about Haskell) don’t quite follow them. An illustration–maybe solving a problem without and then with a monad or monads–would help me. (But my question is tangential to the OP’s so this would be a slight hijack I suppose.)

It may be I’m just wrong in thinking I know what a monad is*, but in that case as well the kind of illustration I requested would still help.

-Kris

*Seems like a monad is basically a way to make a superset out of a set, where the superset includes “abort” element(s)? With some constraints to make sure the superset behaves just like the set in most circumstances? Maybe I totally don’t get it after all.

Whew, I finally got around to checking out this thread again. Thanks for all the information so far, Indistinguishable.

I’m afraid, after going over your introductory posts again, that there are some leaps of logic that I’m just not getting.

For example, in post #4, you say:

But I don’t see how this allows us to carry out side-effects. If instantiating (is that the right word?) an IO constructor returns a new function for, say, getting input from the user, and that function returns the input, then it still seems like we’re getting a different result for the same function call each time (the IO constructor.) I think I understand the idea of constructing a brand new function each time we want to cause a side-effect, though.

Capt. Ridley’s Shooting Party, thanks for the link – that was actually one of the many things I read before posting here, but unfortunately I got lost about halfway through when the author starts spewing a lot of Haskell code (as I mentioned I’m still very much the Haskell n00b.)

Maybe I’m approaching this the wrong way, and should learn a lot more Haskell before I try to learn the functional theory.

As I mentioned I’ve had a lot of good luck putting basic FP concepts to use in Perl, but the advanced stuff is making my brain melt. :frowning: