Jump to:
Introduction
Why do we Need Monads?
The Monad Laws
Examples
Category Theory
Introduction
I'd like to preface this article by explaining why I think many people find monads confusing. In the
context of programming, a monad is not fundamentally unlike an interface; it's just that the functions
that a monad must implement (and perhaps the properties that the functions must satisfy) seem alien at
first. Many examples of a certain monad in action can be given, but they all seem unrelated, which can
be confusing. It's ultimately the existence of these many examples that makes monads such a
critical idea - it turns out that many useful notions of computation (to
use Moggi's terminology) can be encoded in a pure functional language as monads.
In other words: I think you think monads are confusing because you are looking for a motivation for the
definition, but the usefulness of the definition comes after we start applying it.
A pure functional program is simply a composition of functions. That is, a sequence of functions whose
domains and codomains match up so that input can be passed through consecutive functions. A pure
imperative program is a sequence of commands for the retrieval and manipulation of data. It's difficult
to see how commands like this could be composed in the way pure functions can be. A monad is a special
kind of operator on types that allows us to compose imperative actions in the same way that we compose functions. This is why monads are important - they allow us to do things like perform side effects, maintain state, and throw exceptions in a pure functional language.
It's best to know a little bit about Haskell syntax before reading this article. If you know any other
functional language you can probably catch on pretty easily. The only Haskell features that pop up are
lambdas, pattern matching, and type signatures. If you're totally new to Haskell, I would recommend
checking out Real World Haskell or
Miran Lipovača's Learn You a Haskell for Great
Good!
Why do we Need Monads?
Monads answer the question of how to get imperative behavior in a pure functional language. What
distinguishes functional behavior from imperative behavior? In a pure functional language the only
thing you can actually do is compose and
apply functions, where functions are always look-up tables that don't change and do nothing but return
a value whenever they are passed an argument of the correct type. Just like in all forms of
programming, it is common to solve a problem by
composing many functions, each of which solves part of the problem and is more understandable to the
programmer. Most compilers for imperative languages aren't optimized for this sort of behavior in its
purest form, which always uses recursion
instead of loops. Imperative programs are sometimes written as a sequence of commands, each of which takes no input and returns no output. A crude example:
void main() {
getInput();
processInput();
printOutput();
}
You could of course make processInput()
a pure function, bind the input to a
variable, pass that variable to your new pure function, and then print out the result. However, the acts of getting input and printing output are still side effects. They can't be composed like ordinary functions. This doesn't mean there isn't any way to compose them at all. Let's take a look at the definition of a monad in Haskell:
class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
What the first line of this declaration is saying is that there exists a class called Monad
whose instances, m
, act as wrappers for an arbitrary type a
. Below the declaration are the functions that this wrapper must implement. So it's kind of like an interface in object-oriented programming.
The first function that our monad, m
, must implement is called return
, not at
all to be confused with the return statement in imperative programming. The type signature for this
function says that it goes from the arbitrary type a
into the monadic value of that
type, m a
, i.e. the monad wrapping that type. This is like the generic constructor that
classes/structs/objects have in object-oriented programming. That is to say, every monad must have at
least one type constructor. We'll get into what exactly is special about return when I talk about the
constraints that these functions must satisfy.
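For instance, in the Maybe and list monads defined in the Examples section below, return does the most basic wrapping possible:
-- return 5 :: Maybe Int  ==  Just 5
-- return 5 :: [Int]      ==  [5]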
The next weird-looking function is called the bind operator. Looking at its type signature, we see
that it takes in m a
, followed by a function from a
to m b
, and
then outputs something in m b
. What could that be useful for? Well, let's return to the example of the imperative program with side effects. There is a monad called IO, which deals with input/output actions. There is a value in IO String called getLine, which secretly holds the value of the string that a user enters at the console. There is also a
function putStrLn
with type String -> IO ()
that simply takes a string as
input and outputs a monad action that prints it to the console. Recall that () is the type of tuples
that wrap no values - i.e. the type with only one element. Then I could write a Haskell program that
looks like this:
main = getLine >>= putStrLn . processInput
In this case, processInput
must be a function that takes any string to a string.
The "." represents the composition of processInput
and putStrLn
. So this
program executes the action of getting the input from the console, the bind operator then unwraps it
and sends it to a function that processes it and then prints the output to the console. I like to think
of the >>=
symbol as squeezing the value out of the monadic value and then injecting it
into the function on the right. Regardless of metaphor, what we've done is compose imperative actions
even though they aren't ordinary pure functions! This may look very alien and non-imperative, but Haskell has
something called "do notation" that can be used to make particularly long sequences of monad actions
more readable.
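For example, the one-liner above can be rewritten in do notation like this (still assuming the same processInput :: String -> String):
main = do
    line <- getLine
    putStrLn (processInput line)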
Finally, >>
gives us a way to compose arbitrarily many monad actions of different types
(the monad must remain the same but the type that it wraps can be different for each of the actions
being composed), without having to explicitly unwrap any data. This means that >>
can
literally be read as "and then do" in code. For example:
main = (getLine >>= putStrLn . processInput) >> main
This is the exact same program that we constructed before, except that it repeats itself every time after
printing. I like to think of the >>
symbol as an arrow telling the program where to go
next.
The Monad Laws
You can't just make any old parametrized type that implements functions with the above type signatures
and call it a monad. Well, you could in Haskell; GHC doesn't check that an instance of Monad obeys these rules - that would be extremely difficult - so it's up to the programmer. If you wrote an instance of Monad that didn't satisfy the properties described below, then it wouldn't be a monad by the mathematical definition, and it probably wouldn't be as useful as an actual monad. For a monad to be a monad, its functions have to satisfy the following "monad laws:"
Left Identity: return a >>= f = f a
Right Identity: m >>= return = m
Associativity: (m >>= f) >>= g = m >>= (\x -> f x >>= g)
These requirements are a lot simpler than they appear. The first one says that if we wrap up a
value using return
, then we want >>=
to be able to unwrap that same value and
feed it to the next function. The second
condition says something similar; it just looks stranger because there aren't any non-monadic values
in the definition. It says that if >>=
unwraps a value and feeds it to return
,
then we'll get the same wrapped
value back. Hopefully it makes sense that both of these things would be something desirable for a
default type constructor, which is basically what return
is.
As for the associativity condition, since function composition is associative, i.e. (f . g) . h
= f . (g . h)
, we desire the same thing to be true when we compose imperative actions.
The left side of the equation represents feeding the value inside m
into f
and
then feeding the result into g
; the right side of the equation represents
first constructing a function (the lambda) that takes x
then feeds it into f
then into g
and then feeding the value inside m
into this lambda. The
associativity condition ensures that these two ways of feeding the value inside m
through
the functions are the same.
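To make the first two laws concrete, here is how they play out for the Maybe monad defined in the Examples section below, with a hypothetical function f:
f :: Int -> Maybe Int
f x = Just (x + 1)

-- Left identity:  return 3 >>= f   ==  Just 3 >>= f  ==  f 3  ==  Just 4
-- Right identity: Just 3 >>= return  ==  return 3    ==  Just 3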
Examples
The example that I've already brought up is the IO
monad. IO stands for
Input/Output, so its role is pretty self-explanatory - monad actions get compiled to input/output
actions.
Another common example of a monad that isn't often referred to as a monad is [a]
,
the type of lists with elements from a
. Here's what the monad functions look like for
lists:
return x = [x]
xs >>= f = concat (map f xs)
xs >> ys = xs >>= \_ -> ys
return
simply takes a value and makes a list that contains only that value. The bind
operator maps the function onto the list that is passed to it, which makes a list of lists, and then
concat
flattens that list. The
then operator replaces each element of the first list with a copy of the second list and concatenates the results. This may seem useless, but it's not quite useless. In
functional programming the great thing about the type of lists is that it is an instance of Functor
.
This is the typeclass which gives us the map
function for lists, which takes a list and a
function and applies the function to every element of the list. It turns out that the monad structure
of the list type isn't completely essential for applications, but map
can be written in
terms of >>=
and return
(how?).
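To see the list monad's bind in action, here is a small example (the function here is arbitrary):
example :: [Int]
example = [1, 2, 3] >>= \x -> [x, 10 * x]
-- map gives [[1,10],[2,20],[3,30]]; concat flattens it to [1,10,2,20,3,30]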
Maybe a
is a type that is probably less familiar to object-oriented programmers but it's
not at all difficult to understand:
data Maybe a = Just a | Nothing
This means that Maybe a
, for some type a
, is either a box called
Just
with a value from a
inside it or it's a box called Nothing
with nothing inside it. This wrapper for types is excellent for representing computations that could
fail. Here's how it works as a monad:
return x = Just x
Nothing >>= f = Nothing
Just x >>= f = f x
Nothing >> _ = Nothing
Just _ >> m = m
In practice, this type's monad functionality is extremely useful. Imagine this: you
have an important function, foo :: String -> Maybe String
, and it's broken into steps:
foo = stepThree . stepTwo . stepOne
But the failure that causes the output to be Nothing
could happen at any step,
i.e.:
stepOne :: String -> Maybe MyType
stepTwo :: MyType -> Maybe [Int]
stepThree :: [Int] -> Maybe String
So since the domains and ranges don't match up, foo = stepThree . stepTwo . stepOne
would actually cause a type error. What we can do, however, is say
foo = \x -> stepOne x >>= stepTwo >>= stepThree
And huzzah! We have a way
of composing computations that could fail without having to manually unwrap the value for each step!
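Here is a minimal, self-contained sketch of this pattern, with simplified placeholder types and step functions invented purely for illustration, showing how a Nothing at any step short-circuits the whole pipeline:
stepOne :: String -> Maybe Int
stepOne s = if null s then Nothing else Just (length s)

stepTwo :: Int -> Maybe Int
stepTwo n = if n > 100 then Nothing else Just (2 * n)

stepThree :: Int -> Maybe String
stepThree n = Just ("result: " ++ show n)

foo :: String -> Maybe String
foo = \x -> stepOne x >>= stepTwo >>= stepThree
-- foo ""      == Nothing             (stepOne fails)
-- foo "hello" == Just "result: 10"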
I want to conclude this section by mentioning that if the metaphor of >>= unwrapping the value in its first argument and injecting that value into the function that is its second argument doesn't work for you, that's okay. The Control.Monad
library provides a few variations of the bind operator:
(=<<) :: (a -> m b) -> m a -> m b
(>=>) :: (a -> m b) -> (b -> m c) -> a -> m c
The first operator can be thought of as "lifting" a function a -> m b
to a function m a -> m b
. And the second operator can be thought of as a literal composition of functions, so we can rewrite our important function like so:
foo = stepOne >=> stepTwo >=> stepThree
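And the original IO one-liner from earlier can likewise be flipped around with =<< so that it reads like ordinary function application:
main = putStrLn . processInput =<< getLine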
Category Theory
Everything that I've covered so far should allow you to effectively use monads as a programmer. But
there is more to the definition of a monad. I'm assuming you're familiar with basic category-theoretic
terminology from this point on.
If you aren't, I'd recommend Bartosz Milewski's online book Category
Theory for Programmers, which has lots of C++ examples, or his
YouTube
series, which makes more use of Haskell.
The first thing we should discuss is monoids. Monoids are usually formulated in terms of sets.
Specifically, a monoid is a set $S$ with an operation $(\cdot) : S \times S \rightarrow S$ and $1\in S$
such that $$\forall s\in S \quad 1\cdot
s = s \cdot 1 = s$$ $$\forall a,b,c\in S \quad (a\cdot b)\cdot c = a \cdot (b\cdot c)$$ One familiar
example of a monoid is the set of lists over any fixed type of elements: the identity, $1$, is the empty list and the binary operation is concatenation.
Another example of interest is the set of all functions from a given set to itself; the identity is the
identity function and the operation is composition of functions.
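Both of these examples can be written down with Haskell's Monoid typeclass. Here is a sketch, where Endo' is a wrapper type invented here for illustration (the standard library has an equivalent called Endo in Data.Monoid):
-- Lists are already a Monoid in the standard library:
-- mempty = [], and the operation (<>) is (++).

-- Functions from a type to itself form a monoid under composition:
newtype Endo' a = Endo' (a -> a)

instance Semigroup (Endo' a) where
  Endo' f <> Endo' g = Endo' (f . g)

instance Monoid (Endo' a) where
  mempty = Endo' id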
In category theory however, we don't like talking about sets. We want to construct a monoid based on
the morphisms of some category. The first thing we have to do is define a monoidal category. A monoidal
category $\mathbb{C}$ is one
with a bifunctor, called the tensor product, $\otimes : \mathbb{C}\times \mathbb{C} \rightarrow \mathbb{C}$, and an "identity object," $I$, such that $\forall A,\, B, \, C\in Ob(\mathbb{C})$ there is
an isomorphism $\alpha_{A,B,C} : (A \otimes
B)\otimes C \rightarrow A\otimes (B\otimes C)$ (associativity) as well as isomorphisms $\lambda_A :
I\otimes A \rightarrow A$ (left identity), and $\, \rho_A : A\otimes I \rightarrow A$ (right identity).
We can generalize the idea of a monoid categorically by defining a monoid in a monoidal category.
A monoid in the monoidal category $\mathbb{C}$ is an object $M$ in $\mathbb{C}$ with a morphism $\mu:
M\otimes M \rightarrow M$ and
another morphism $\eta: I \rightarrow M$. These morphisms must satisfy the following "coherence
conditions:" $$\mu \circ (id_M\otimes \mu) \circ \alpha_{M, M, M} = \mu \circ (\mu \otimes id_M)$$
$$\mu \circ (\eta \otimes id_M) = \lambda_M$$ $$\mu \circ (id_M \otimes \eta) = \rho_M$$ Recall that $\alpha$ is an isomorphism that "reassociates" the
tensor product, so the first condition is analogous to associativity. The second and third conditions
are left identity and right identity, respectively.
The meaning of these conditions is made much clearer with diagrams, something I do not have but that
Wikipedia does. If one assumes
that the underlying category is the category
of sets, with $\otimes$ being the Cartesian product and $I$ being a set with one element, then the
original algebraic definition of a monoid is recovered.
But again, sets are boring. What's another example of a monoidal category? The category of endofunctors
on some category $\mathbb{C}$. $\otimes$ is identified as the composition of endofunctors (which, like
the composition of all morphisms,
is associative) and the identity object is $id_{\mathbb{C}}$ - the identity endofunctor. So what would
a monoid in this category look like? Well, it would be an endofunctor $T$ with a natural transformation
$\mu:T\circ T \rightarrow T$ and another
natural transformation $\eta : id_{\mathbb{C}} \rightarrow T$ that satisfy the conditions for a monoid;
that's what a monad is!
For a monad, the coherence conditions are as follows: $$\mu \circ T \eta = \mu \circ \eta T = id_T$$
$$\mu \circ T \mu = \mu \circ \mu T$$ In this case, the left and right identity morphisms, $\lambda_T$
and $\rho_T$, are the identity
on $T$. Here, $(T\eta)_x=T(\eta_x)$ and $(\eta T)_x = \eta_{T\, x}$; likewise for $T\mu$ and $\mu T$.
But wait, how could any of that possibly relate to all the programming? To see, let's make a
Haskell type class:
class MyMonad m where
  fun :: (a -> b) -> (m a -> m b)
  mu  :: m (m a) -> m a
  eta :: a -> m a
fun
is the endofunctor itself. The endofunctor is acting on the category of all Haskell
types. A functor acts on both objects (types) and morphisms (functions). When the functor acts on a
morphism $f$ it yields a morphism whose domain is the functor acting on the domain of $f$, likewise for
the codomain. This explains the type signature of the first function in our class. The others shouldn't
be too difficult to understand as they are equivalent to $\mu:M\circ M \rightarrow M$ and $\eta :
id_{\mathbb{C}} \rightarrow M$. The difference is that in the category-theoretic notation we consider
the endofunctor as acting on the entire category but in Haskell notation we are considering it as
acting on a particular object.
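For a concrete instance, here is what these three functions look like for Maybe (a sketch with names invented here, matching the Maybe monad from the Examples section):
-- fun is just fmap for Maybe:
funMaybe :: (a -> b) -> Maybe a -> Maybe b
funMaybe _ Nothing  = Nothing
funMaybe f (Just x) = Just (f x)

-- mu collapses one layer of wrapping (this is "join"):
muMaybe :: Maybe (Maybe a) -> Maybe a
muMaybe Nothing  = Nothing
muMaybe (Just m) = m

-- eta wraps a value, just like return:
etaMaybe :: a -> Maybe a
etaMaybe = Just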
We also demand that eta and mu be natural transformations and that they satisfy the coherence conditions for a monad given above. In Haskell notation, naturality can be written like so:
eta . f = fun f . eta
mu . fun (fun f) = fun f . mu
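On top of naturality, the coherence conditions themselves translate into this notation as follows (a sketch, spelled out with the class methods above):
mu . eta     = id        -- mu composed with (eta T) is the identity
mu . fun eta = id        -- mu composed with (T eta) is the identity
mu . fun mu  = mu . mu   -- mu composed with (T mu) equals mu composed with (mu T)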
Now that we have that set up, let's show that it's equivalent to the definition of monad in
Haskell. Haskell monads are actually called Kleisli triples in category theory, so let's call our next
class that:
class KleisliTriple t where
  ret    :: a -> t a
  (>>==) :: t a -> (a -> t b) -> t b
Notice that the then operator, >>
, is not defined in a Kleisli triple.
That's because it actually has the following default implementation:
m1 >> m2 = m1 >>= (\_ -> m2)
The KleisliTriple
class must satisfy the "monad laws" discussed earlier in the
article. Now all that's left to do is the following:
instance KleisliTriple t => MyMonad t where
  fun f m = m >>== (ret . f)
  eta     = ret
  mu m    = m >>== id

instance MyMonad t => KleisliTriple t where
  ret      = eta
  m >>== f = mu (fun f m)
One can verify that if the conditions for either instance are assumed, the conditions for the
other follow. I won't show that here since I think it's a useful exercise for the reader. Once that's been done, however, you've proven that Haskell monads are monoids in the category of endofunctors on the underlying category of Haskell types!