Earlier today I pasted a snippet of Haskell code into our work chat that contained a multi-parameter type class instance that looked more or less like this:
a ~ b in this declaration is a type equality constraint: it means that
b must be the same type. One of my coworkers saw this and remarked, more or less, “Why did you write it in that obtuse way? You could have just declared an instance for
C a a, which means the same thing.”1
Those two declarations actually don’t mean the same thing—but I can’t blame said coworker for thinking they did, because the difference between the two is subtle and probably won’t come up unless you’ve done a fair amount of fiddling with type classes. To that end, I want to show an example where those two differ, and then explain why.
Let’s say we have this (more than a little contrived) multi-parameter type class (which will require the
I then define an instance that looks like this (which will require the
So far, so good! Now let’s try a particular use of
thePair in a (probably even more contrived) definition:
What do I expect to happen in this code snippet? I would hope that
obtuseMempty ends up being the same as
mempty. In particular, I want
thePair to resolve to the instance I defined above, and consequently evaluate to
(mempty, mempty), and then,
fst (mempty, mempty) == mempty. What actually happens when I compile this?
• Could not deduce (PairOf a b0) arising from a use of ‘thePair’ from the context: Monoid a bound by the type signature for: obtuseMempty :: Monoid a => a at sample.hs:11:1-29 The type variable ‘b0’ is ambiguous Relevant bindings include obtuseMempty :: a (bound at sample.hs:12:1) These potential instance exist: instance Monoid a => PairOf a a -- Defined at sample.hs:8:10 • In the first argument of ‘fst’, namely ‘thePair’ In the expression: fst thePair In an equation for ‘obtuseMempty’: obtuseMempty = fst thePair
Well, that’s not what I had wanted, but I guess it makes sense:
thePair has to be a pair (because of the use of
fst), but since we never use the second element in
thePair, there’s no reason for GHC to believe that it’s the same type as the first. However, let’s try tweaking the definition a bit: instead of
PairOf a a, let’s define it as
PairOf a b and then include a constraint that keeps
b equal (which will require either
If we try compiling
obtuseMempty with this definition, it works without any errors! But why does one work while the other doesn’t? What’s the difference between the two? The answer has to do with a subtlety of how type class resolution works, a subtlety that can be hard to observe right until it stares you in the face.
How Type Class Resolution Works
When GHC searches for an appropriate instance of a type class, it has to check against all the instances you’ve defined and find one that “matches” your use of the type class. This is pretty easy when you define instances for a basic type like
Bool. For example, using the following (pretty useless) type class:
If I use
myValue in a context where it’s being interpreted as
Int, then Haskell will search through the available instances and find our declaration of
MyClass Int. That’s simple enough! Things get a bit more complicated when we add types that include type parameters. For example, this is a definition of
MyClass for pairs of values:
As a point of vocabulary, everything before the
=> in this definition is called the context, and everything after it is called the head. So now, if I use
myValue in a context where it’s being interpreted as
(Int, Bool), then GHC will find the matching instance
MyClass (a, b), and then in turn tries to search for two other instances: one for
MyClass Int and another for
Notice the precise way I phrased that, because this is where the basic intuition around type class instances can diverge from the actual implementation: it looks like the instance definition is saying, “If we have an instance for
MyClass a and an instance for
MyClass b, then we can also have an instance for
MyClass (a, b).” This is a reasonable intuition, but it’s precisely backwards from what Haskell is actually doing. What happens instead is that, when it sees
myValue being used as a pair, GHC will first select the instance
MyClass (a, b) without having even examined the constraints in the context. Only after it’s already committed to using the instance
MyClass (a, b) will it then examine the context and try to satisfy the
MyClass a and
MyClass b constraints. Or, to say it in a more concise but jargon-heavy way, it first matches the head, and once it has done so, attempts to satisfy the context.
Consider what happens if write the following code without having defined an instance of the form
The precise error GHC gives us is: “No instance for
(MyClass ()) arising from a use of
myVal.” GHC has already settled on the
MyClass (a, b) instance, has found a satisfying instance for
MyClass Int, but then cannot find an instance for
In this case, making that distinction seems a little bit pedantic. In code like this, it amounts to simply an implementation detail! It shouldn’t really matter what order GHC is using underneath the surface: either way, the problem is that we don’t have an instance for
MyClass (), and as long as GHC reports that error to us in some way, it doesn’t matter whether it first tries to satify the context before committing to to
MyClass (a, b), or if it commits first and then tries to satisfy the context: we’ll run into the same problem either way! But knowing the precise way GHC tries to instantiate type classes can be very useful in other contexts.
Repeated Types in Instance Heads
Let’s look at our first problematic
Remember: GHC has to match the instance head first, and after that it solves the remaining constraints from the context! So, in the definition of
obtuseMempty, what is the inferred type of the use of
It should be obvious, but for good measure, let’s replace
thePair with a hole and find out what type GHC is assigning based on the surrounding context:
• Found hole: _ :: (a, b0) Where: ‘a’ is a rigid type variable bound by the type signature for: obtuseMempty :: forall a. Monoid a => a at sample.hs:11:17 ‘b0’ is an ambiguous type variable • In the first argument of ‘fst’, namely ‘_’ In the expression: fst _ In an equation for ‘obtuseMempty’: obtuseMempty = fst _
So GHC needs to find an instance of the form
MyPair t b0. Let’s look at our instances: do we have one that has a matching head? …well, no, we only have
MyPair a a.
“But wait!” you say. “We don’t do anything with the second part of the pair, so why can’t GHC choose
MyPair a a as a valid instance?” But GHC can’t read your mind, and there’s nothing in your program to indicate that the type variables
b0 are supposed to be the same. As soon as you somehow indicate that, GHC will comply and find your
MyPair a a instance:
In this definition, it’s clear that
thePair is supposed to have two fields of the same type, because we’re clearly using
mappend to combine them. But is there a way of making it work even if we don’t give GHC that extra nudge at the use site?
Type Equality to the Rescue
Let’s move on to the second
MyPair instance I described above:
Using typical informal reasoning, it seems like this should be identical to the other instance definition: “It’s an instance where both parameters are the same type, and that type needs to be a
Monoid”. But remember how GHC chooses instances: first by matching the head, then by elaborating the constraints. And this definition is different in an important way: the fact that the two type parameters are the same type is no longer a feature of the head, but rather a feature of the context! So when we revisit our definition of
obtuseMempty with this new definition, GHC behaves differently:
thePair in this context still has the inferred type
PairOf t b0 => (t, b0), but now we have an instance for
PairOf a b, which means GHC can unify those and select the instance we want! And now that it has chosen that instance, GHC can take the constraints associated with that instance and try to solve them: the expression
thePair now has the inferred type
(Monoid t, t ~ b0) => (t, b0), which GHC can solve as:
(Monoid t) => (t, t). After that, our function type-checks successfully!
So What’s The Lesson?
A rough intuition is that you can think of type equality in code like this as helping you propagate the fact that types are equal in some contexts. In the examples above, we as programmers intended the two fields of
thePair should have the same type, but the first definition—the
MyPair a a one—meant we could only use
myPair when GHC already knew those types were equal. By moving the type equality to the context, we can use
myPair in places where the two types aren’t already known to be distinct and introduce that fact. This isn’t the only use of type equality—type equality is very important in the context of type families, for example—but it’s nevertheless an important use.
GHC type class resolution is really very straightforward, even in the face of extensions like the ones we enabled in this post. In fact, part of the problem is that it’s sometimes tempting to imagine GHC is doing something more clever than it actually is! Methodically stepping through the way GHC processes your code can really help understand situations like these, where intial intuitions sometimes lead you down the wrong path.
Well, okay, they ACTUALLY assumed I wrote an excessively obtuse type signature as an elaborate way of trolling them. And to be fair, the code snippet I was pasting was a deliberately silly and ill-advised piece of code—but the use of type equality wasn’t part of the joke.
…well, if you MUST know, it was an instance definition very similar to the following:
this allows you to use numeric literals as functions that take another function and apply that function to themselves. This gives you some cute syntactic sugar:
This is a disgusting and terrible, and please never do it.↩