Haskell Type Equality Constraints

Earlier today I pasted a snippet of Haskell code into our work chat that contained a multi-parameter type class instance that looked more or less like this:

instance a ~ b => C a b where {- ... -}

The constraint a ~ b in this declaration is a type equality constraint: it means that a and b must be the same type. One of my coworkers saw this and remarked, more or less, "Why did you write it in that obtuse way? You could have just declared an instance for C a a, which means the same thing."1

Those two declarations actually don't mean the same thing—but I can't blame said coworker for thinking they did, because the difference between the two is subtle and probably won't come up unless you've done a fair amount of fiddling with type classes. To that end, I want to show an example where those two differ, and then explain why.

The Example

Let's say we have this (more than a little contrived) multi-parameter type class (which will require the MultiParamTypeclasses extension):

class PairOf a b where
  thePair :: (a, b)

I then define an instance that looks like this (which will require the FlexibleInstances extension):

instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)

So far, so good! Now let's try a particular use of thePair in a (probably even more contrived) definition:

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

What do I expect to happen in this code snippet? I would hope that obtuseMempty ends up being the same as mempty. In particular, I want thePair to resolve to the instance I defined above, and consequently evaluate to (mempty, mempty), and then, fst (mempty, mempty) == mempty. What actually happens when I compile this?

• Could not deduce (PairOf a b0) arising from a use of ‘thePair’
  from the context: Monoid a
    bound by the type signature for:
               obtuseMempty :: Monoid a => a
    at sample.hs:11:1-29
  The type variable ‘b0’ is ambiguous
  Relevant bindings include obtuseMempty :: a (bound at sample.hs:12:1)
  These potential instance exist:
    instance Monoid a => PairOf a a -- Defined at sample.hs:8:10
• In the first argument of ‘fst’, namely ‘thePair’
  In the expression: fst thePair
  In an equation for ‘obtuseMempty’: obtuseMempty = fst thePair

Well, that's not what I had wanted, but I guess it makes sense: thePair has to be a pair (because of the use of fst), but since we never use the second element in thePair, there's no reason for GHC to believe that it's the same type as the first. However, let's try tweaking the definition a bit: instead of PairOf a a, let's define it as PairOf a b and then include a constraint that keeps a and b equal (which will require either GADTs or TypeFamilies enabled):

instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)

If we try compiling obtuseMempty with this definition, it works without any errors! But why does one work while the other doesn't? What's the difference between the two? The answer has to do with a subtlety of how type class resolution works, a subtlety that can be hard to observe right until it stares you in the face.

How Type Class Resolution Works

When GHC searches for an appropriate instance of a type class, it has to check against all the instances you've defined and find one that "matches" your use of the type class. This is pretty easy when you define instances for a basic type like Int or Bool. For example, using the following (pretty useless) type class:

class MyClass t where
  myValue :: t

instance MyClass Int where myValue = 0
instance MyClass Bool where myValue = True

If I use myValue in a context where it's being interpreted as Int, then Haskell will search through the available instances and find our declaration of MyClass Int. That's simple enough! Things get a bit more complicated when we add types that include type parameters. For example, this is a definition of MyClass for pairs of values:

instance (MyClass a, MyClass b) => MyClass (a, b) where
  myValue = (myValue, myValue)

As a point of vocabulary, everything before the => in this definition is called the context, and everything after it is called the head. So now, if I use myValue in a context where it's being interpreted as (Int, Bool), then GHC will find the matching instance MyClass (a, b), and then in turn tries to search for two other instances: one for MyClass Int and another for MyClass Bool.

Notice the precise way I phrased that, because this is where the basic intuition around type class instances can diverge from the actual implementation: it looks like the instance definition is saying, "If we have an instance for MyClass a and an instance for MyClass b, then we can also have an instance for MyClass (a, b)." This is a reasonable intuition, but it's precisely backwards from what Haskell is actually doing. What happens instead is that, when it sees myValue being used as a pair, GHC will first select the instance MyClass (a, b) without having even examined the constraints in the context. Only after it's already committed to using the instance MyClass (a, b) will it then examine the context and try to satisfy the MyClass a and MyClass b constraints. Or, to say it in a more concise but jargon-heavy way, it first matches the head, and once it has done so, attempts to satisfy the context.

Consider what happens if write the following code without having defined an instance of the form MyClass ():

blah :: (Int, ())
blah = myValue

The precise error GHC gives us is: "No instance for (MyClass ()) arising from a use of myVal." GHC has already settled on the MyClass (a, b) instance, has found a satisfying instance for MyClass Int, but then cannot find an instance for MyClass ().

In this case, making that distinction seems a little bit pedantic. In code like this, it amounts to simply an implementation detail! It shouldn't really matter what order GHC is using underneath the surface: either way, the problem is that we don't have an instance for MyClass (), and as long as GHC reports that error to us in some way, it doesn't matter whether it first tries to satify the context before committing to to MyClass (a, b), or if it commits first and then tries to satisfy the context: we'll run into the same problem either way! But knowing the precise way GHC tries to instantiate type classes can be very useful in other contexts.

Repeated Types in Instance Heads

Let's look at our first problematic PairOf instance:

instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)

Remember: GHC has to match the instance head first, and after that it solves the remaining constraints from the context! So, in the definition of obtuseMempty, what is the inferred type of the use of thePair?

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

It should be obvious, but for good measure, let's replace thePair with a hole and find out what type GHC is assigning based on the surrounding context:

• Found hole: _ :: (a, b0)
  Where: ‘a’ is a rigid type variable bound by
           the type signature for:
             obtuseMempty :: forall a. Monoid a => a
           at sample.hs:11:17
         ‘b0’ is an ambiguous type variable
• In the first argument of ‘fst’, namely ‘_’
  In the expression: fst _
  In an equation for ‘obtuseMempty’: obtuseMempty = fst _

So GHC needs to find an instance of the form MyPair t b0. Let's look at our instances: do we have one that has a matching head? …well, no, we only have MyPair a a.

"But wait!" you say. "We don't do anything with the second part of the pair, so why can't GHC choose MyPair a a as a valid instance?" But GHC can't read your mind, and there's nothing in your program to indicate that the type variables a and b0 are supposed to be the same. As soon as you somehow indicate that, GHC will comply and find your MyPair a a instance:

evenMoreObtuseMempty :: Monoid t => t
evenMoreObtuseMempty = let p = thePair in (fst p `mappend` snd p)

In this definition, it's clear that thePair is supposed to have two fields of the same type, because we're clearly using mappend to combine them. But is there a way of making it work even if we don't give GHC that extra nudge at the use site?

Type Equality to the Rescue

Let's move on to the second MyPair instance I described above:

instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)

Using typical informal reasoning, it seems like this should be identical to the other instance definition: "It's an instance where both parameters are the same type, and that type needs to be a Monoid". But remember how GHC chooses instances: first by matching the head, then by elaborating the constraints. And this definition is different in an important way: the fact that the two type parameters are the same type is no longer a feature of the head, but rather a feature of the context! So when we revisit our definition of obtuseMempty with this new definition, GHC behaves differently:

obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair

The value thePair in this context still has the inferred type PairOf t b0 => (t, b0), but now we have an instance for PairOf a b, which means GHC can unify those and select the instance we want! And now that it has chosen that instance, GHC can take the constraints associated with that instance and try to solve them: the expression thePair now has the inferred type (Monoid t, t ~ b0) => (t, b0), which GHC can solve as: (Monoid t) => (t, t). After that, our function type-checks successfully!

So What's The Lesson?

A rough intuition is that you can think of type equality in code like this as helping you propagate the fact that types are equal in some contexts. In the examples above, we as programmers intended the two fields of thePair should have the same type, but the first definition—the MyPair a a one—meant we could only use myPair when GHC already knew those types were equal. By moving the type equality to the context, we can use myPair in places where the two types aren't already known to be distinct and introduce that fact. This isn't the only use of type equality—type equality is very important in the context of type families, for example—but it's nevertheless an important use.

GHC type class resolution is really very straightforward, even in the face of extensions like the ones we enabled in this post. In fact, part of the problem is that it's sometimes tempting to imagine GHC is doing something more clever than it actually is! Methodically stepping through the way GHC processes your code can really help understand situations like these, where intial intuitions sometimes lead you down the wrong path.

  1. Well, okay, they ACTUALLY assumed I wrote an excessively obtuse type signature as an elaborate way of trolling them. And to be fair, the code snippet I was pasting was a deliberately silly and ill-advised piece of code—but the use of type equality wasn't part of the joke.

    …well, if you MUST know, it was an instance definition very similar to the following:

    instance (a ~ b) => Num ((Integer -> a) -> b) where
      fromIntegral x f = f x
      {- the rest of this is not important here -}

    this allows you to use numeric literals as functions that take another function and apply that function to themselves. This gives you some cute syntactic sugar:

    newtype Pixels = Pixels { fromPixels :: Integer }
    width, height :: Pixels
    width = 320 Pixels
    height = 240 Pixels

    This is a disgusting and terrible, and please never do it.