This is a short, fast, and somewhat loose post to connect the dots between monads in math and monads in functional programming hopefully in a way that illuminates both and makes sense to readers from all parts of the math-programming spectrum.

Monads are an idea from category theory that have an equivalent but slightly different formulation in functional programming. A monoid is a set with a way to mush (multiply/mappend) two values together to make another value of the same type. A monad is a function with a way to mush two applications of itself together into one. So a monad is a function that behaves like a monoid. Monoids are ubiquitous: integers with addition, integers with multiplication, strings (of characters) with concatenation, booleans with `or`, booleans with `and`. The canonical example of a monad is `List`: `List` is a function that, given any datatype (set of values) `d`, returns another datatype `List d` whose values are lists of things of type `d`; `List` is a monad because we can always mush a list of lists into a list no matter what those lists are lists of. In general, a function `M` is a monad if it has a mushing map `M M -> M`, that is, a map `M (M d) -> M d` that works for all `d`.

Another monad is `Maybe`: a value of type `Maybe Integer` is either something of type `Just Integer` or it is `Nothing`. Since `Nothing` is one of its possible values, you can perform operations that fail, but still return legitimate values, and then handle them in your program without having to anticipate failure (if…else…) at every turn. The mushing map `Maybe (Maybe d) -> Maybe d` is obvious: `Nothing` of anything and anything of `Nothing` map to `Nothing`, `Just (Just d)` maps to `Just d`; you only succeed if everything succeeds.

Functional programmers use a slightly different formulation of monads because they want to compose functions that don’t exactly start and end in the right places. They want to do things like compose two functions `f:a —> Maybe b` and `g:b —> Maybe c`, each of which might fail, without having to repeat the logic each time of how to handle failure. In terms of a general monad `M`, this means that given functions `f: a —> M b` and `g:b —> M c`, we want a way to stick an `M b` into `g` even though `g` only knows how to eat a `b`. The trick is: apply `M` to `g` to make a new function `M g: M b —> M (M c)` which knows how to eat an `M b`, then use the mushing map `M M —> M` to turn the resulting `M (M c)` into an `M c`. This operation, called `bind` and written `>>=` in Haskell, is what functional programmers consider the defining characteristic of a monad.

A quick example before we see what “apply `M` to a function” means in general. Suppose` readLog: LogFile -> List String` is a function that takes a web log and produces a list of urls and `tokenize: String -> List String` tokenizes a url by splitting it on ‘/’ so `tokenize 'yes.com/no/maybe?q=unsure' = ['yes.com','no','maybe?q=unsure']`. Monad-composing these applies `tokenize` to every element of our list of urls–that’s what “apply `M` to `tokenize`” means when `M` is `List`–then mushes the resulting list of lists into a list. Monads aren’t fancy, they just abstract a common pattern so you can program at a high level without repeating yourself.

So, what does “apply `M` to a function” mean and how is this related to category theory? A category (in the sense of Eilenberg and  Mac Lane) consists of three things: objects, functions, and composition. A map between categories is called a functor; a functor `F:C -> D` has to map objects to objects, functions to functions, and composition to composition. The last two parts mean that `F` maps functions `h: a -> b` and `k:b -> c` in `C` to functions `F h: F a -> F b` and `F k: F b -> F c` in `D` and that this mapping of functions must satisfy `F(k.h) = (F k).(F h)`: if you compose first in `C` and then apply `F`, you must get the same thing as if you first apply `F`, then compose in `D`. So, the punchline is: the set of all datatypes is a category and “functions on datatypes” like `List` and `Maybe` are functors from this category to itself. Since they are functors, you can apply them not only to datatypes, but also to functions between datatypes: that’s what “apply `M` to a function” means. A monad is a functor from a category to itself that has a mushing map `M M —> M`. That `List` and `Maybe` are functors means that they map functions like `h` and `k` to functions `List h:List a —> List b` and `List k:List b —> List c` (and similarly for `Maybe`)* that you can compose as usual because they, too, start and end in the right places, but the fact that they are monads means that you can also use them to “compose” functions like `f:a —> Maybe b `and `g:b —> Maybe c` that don’t.

* When `M` is `List`,  `List h [x, y, z] = [h x, h y, h z]`; when `M` is `Maybe`,  `Maybe h (Just h) = Just (h x)` and `Maybe h Nothing = Nothing`.
Notes
• Inspiration and references. This post is inspired by Beckman’s Don’t Fear the Monad, a very beginner-friendly video which goes more thoroughly through monoids and monads as a tool for composition. Another wonderfully intuitive and thorough introduction to monads in programming is You Could Have Invented Monads! (And Maybe You Already Have). For more technical connectings of dots between ideas in category theory and ideas in Haskell, see Notions of Computations as Monoids and Category Theory for Programmers.
• List again, List again, jiggity jig. Mathematicians: `List d` is a monoid–the free monoid on the set `d`–and `List` is the free monoid functor: that’s why we get a monad. Non-mathematicians: every free functor comes with a forgetful functor and the two together always make a monad.
• Haskell punchline. F`or g: b -> M c`, if we call the mushing map `join` and write `fmap g` instead of `M g`, we said that (>>= g) = join . (fmap g). See Control.Monad and page 10 of  Notions of Computations as Monoids.
• Neglected. Monoids also have an identity element/mempty value mushing with which is the identity map. The analogous thing for monads is a map `d -> M d`, called `return` in Haskell, which is similarly an identity for mushing.
• Thank yous. This conversation started as an email exchange with my dad and continued at the white board with my colleague Scott. I reap maximum joy from these conversations so thank you both.