posts/tc-post.md.bak - documents (master)

Tree @master (Download .tar.gz)

tc-post.md.bak @master — raw · history · blame

Earlier today I pasted a snippet of Haskell code into our work chat that contained a multi-parameter typeclass instance that looked more or less like this:

```.haskell
instance a ~ b => C a b where {- ... -}
```

The constraint `a ~ b` in this declaration is a type equality constraint: it means that `a` and `b` must be the same type. One of my coworkers saw this and remarked, more or less, "Why did you write it in that obtuse way? You could have just declared an instance for `C a a`, which means the same thing."[^ref]

[^ref]: Well, okay, they ACTUALLY assumed I wrote an excessively obtuse type signature as an elaborate way of trolling them. And to be fair, the code snippet I was pasting was a deliberately jokey and ill-advised piece of code: it's just that the type equality wasn't part of the joke.

    …well, if you MUST know, it was the following instance definition:

    ```
    instance (a ~ b) => Num ((Integer -> a) -> b) where
      fromIntegral x f = f x
      {- the rest of this is not important here -}
    ```

    this allows you to use numeric literals as functions that take another function and apply that function to themselves. This gives you some cute syntactic sugar:

    ```
    newtype Pixels = Pixels { fromPixels :: Integer }

    width, height :: Pixels
    width = 320 Pixels
    height = 240 Pixels
    ```

    This is a disgusting and terrible, and please never do it.

However, the two actually _don't_ mean the same thing—but I can't blame said coworker for thinking they did, because the difference between the two is subtle and probably won't come up unless you've done a fair amount of typeclass fiddling. To that end, I want to show an example where those two differ, and then explain why.

# The Example

Let's say we have this (more than a little contrived) multi-parameter typeclass (which will require the `MultiParamTypeclasses` extension):

```.haskell
class PairOf a b where
  thePair :: (a, b)
```

I then define an instance that looks like this (which will require the `FlexibleInstances` extension):

```.haskell
instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)
```

So far, so good! Now let's try a particular use of `thePair` in a (probably even more contrived) definition:

```.haskell
obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair
```

What do I expect to happen in this code snippet? I would hope that `obtuseMempty` ends up being the same as `mempty`. In particular, I want `thePair` to resolve to the instance I defined above, and consequently evaluate to `(mempty, mempty)`, and then, `fst (mempty, mempty) == mempty`. What _actually_ happens when I compile this?

```
• Could not deduce (PairOf a b0) arising from a use of ‘thePair’
  from the context: Monoid a
    bound by the type signature for:
               obtuseMempty :: Monoid a => a
    at sample.hs:11:1-29
  The type variable ‘b0’ is ambiguous
  Relevant bindings include obtuseMempty :: a (bound at sample.hs:12:1)
  These potential instance exist:
    instance Monoid a => PairOf a a -- Defined at sample.hs:8:10
• In the first argument of ‘fst’, namely ‘thePair’
  In the expression: fst thePair
  In an equation for ‘obtuseMempty’: obtuseMempty = fst thePair
```

Well, _that's_ not what I had wanted, but I guess it makes sense: `thePair` has to be a pair (because of the use of `fst`), but since we never use the second element in `thePair`, there's no reason for GHC to believe that it's the same type as the first. However, let's try tweaking the definition a bit: instead of `PairOf a a`, let's define it as `PairOf a b` and then include a constraint that keeps `a` and `b` equal (which will require either `GADTs` or `TypeFamilies` enabled):

```.haskell
instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)
```

If we try compiling `obtuseMempty` with this definition, it works without any errors! But _why_ does one work while the other doesn't? What's the difference between the two? The answer has to do with a subtlety of how typeclass resolution works, a subtlety that can be hard to observe right until it stares you in the face.

# How Typeclass Resolution Works

When GHC searches for an appropriate instance of a typeclass, it has to check against all the instances you've defined and find one that "matches" your use of the typeclass. This is pretty easy when you define instances for a basic type like `Int` or `Bool`. For example, using the following (pretty useless) typeclass:

```.haskell
class MyClass t where
  myValue :: t

instance MyClass Int where myValue = 0
instance MyClass Bool where myValue = True
```

If I use `myValue` in a context where it's being interpreted as `Int`, then Haskell will search through the available instances and find our declaration of `MyClass Int`. That's simple enough! Things get a bit more complicated when we add types that include type parameters. For example, this is a definition of `MyClass` for pairs of values:

```.haskell
instance (MyClass a, MyClass b) => MyClass (a, b) where
  myValue = (myValue, myValue)
```

As a point of vocabulary, everything before the `=>` in this definition is called the _context_, and everything after it is called the _head_. So now, if I use `myValue` in a context where it's being interpreted as `(Int, Bool)`, then GHC will find the matching instance `MyClass (a, b)`, and then in turn tries to search for two other instances: one for `MyClass Int` and another for `MyClass Bool`.

Notice the precise way I phrased that, because this is where the basic intuition around typeclass instances can diverge from the actual implementation: it looks like the instance definition is saying, "If we have an instance for `MyClass a` and an instance for `MyClass b`, then we can also have an instance for `MyClass (a, b)`." This is a reasonable intuition, but it's _precisely backwards_ from what Haskell is actually doing. What happens instead is that, given `myValue` used as a pair, Haskell first selects the instance `MyClass (a, b)` _regardless of whether the other constraints are met_. It then examines the constraints for that instance and tries to satisfy them _after having already committed to that instance definition_. Or, to say it in a more concise but jargon-heavy way, it first matches the head, and then handles satisfying the context.

Consider what happens if write the following code without having defined an instance of the form `MyClass ()`:

```.haskell
blah :: (Int, ())
blah = myValue
```

the precise error GHC gives us is: "No instance for `(MyClass ())` arising from a use of `myVal`." GHC has already settled on the `MyClass (a, b)` instance, has found a satisfying instance for `MyClass Int`, and then cannot find an instance for `MyClass ()`.

In this case, making that distinction seems a little bit pedantic. In this instance, it's really just an implementation detail! It shouldn't really matter _what_ order GHC is using underneath the surface: either way, the problem is that we don't have an instance for `MyClass ()`, and as long as GHC reports that error to us, it doesn't matter whether it commits to `MyClass (a, b)` first, or tries to solve the other constraints first.

# Repeated Types in Instance Heads

Let's look at our first problematic `PairOf` instance:

```.haskell
instance Monoid a => PairOf a a where
  thePair = (mempty, mempty)
```

Remember: GHC has to match the instance head first, and after that it solves the remaining constraints from the context! So, in the definition of `obtuseMempty`, what is the inferred type of the use of `thePair`?

```.haskell
obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair
```

It should be obvious, but for good measure, let's replace `thePair` with a hole and find out what type GHC is assigning based on the surrounding context:

```
• Found hole: _ :: (a, b0)
  Where: ‘a’ is a rigid type variable bound by
           the type signature for:
             obtuseMempty :: forall a. Monoid a => a
           at sample.hs:11:17
         ‘b0’ is an ambiguous type variable
• In the first argument of ‘fst’, namely ‘_’
  In the expression: fst _
  In an equation for ‘obtuseMempty’: obtuseMempty = fst _
```

So GHC needs to find an instance of the form `MyPair t b0`. Let's look at our instances: do we have one that has a matching head? …well, no, we only have `MyPair a a`.

"But wait!" you say. "We don't do anything with the second part of the pair, so why _can't_ GHC choose `MyPair a a` as a valid instance?" But GHC can't read your mind, and there's nothing in your program to indicate that the type variables `a` and `b0` are _supposed_ to be the same. As soon as you somehow indicate that, GHC will comply and find your `MyPair a a` instance:

```
evenMoreObtuseMempty :: Monoid t => t
evenMoreObtuseMempty = let p = thePair in (fst p `mappend` snd p)
```

In this definition, it's clear that `thePair` is supposed to have two fields of the same type, because we're clearly using `mappend` to combine them. But is there a way of making it work even if we don't give it that clue?

# Type Equality to the Rescue

Let's move on to the second `MyPair` instance I described above:

```
instance (Monoid a, a ~ b) => PairOf a b where
  thePair = (mempty, mempty)
```

Using typical informal reasoning, it seems like this should be identical to the other instance definition: "It's an instance where both parameters are the same type, and that type needs to be a `Monoid`". But remember how GHC chooses instances: _first_ by matching the head, _then_ by elaborating the constraints. And this definition is different in an important way: the fact that the two type parameters are the same type is no longer a feature of the _head_, but rather a feature of the _context_! So when we revisit our definition of `obtuseMempty` with this new definition, GHC behaves differently:

```
obtuseMempty :: Monoid t => t
obtuseMempty = fst thePair
```

The value `thePair` in this context still has the inferred type `PairOf t b0 => (t, b0)`, but now we have an instance for `PairOf a b`, which means GHC can unify those and select the instance we want! And _now that it has chosen that instance_, GHC can take the constraints associated with that instance and try to solve them: the expression `thePair` now has the inferred type `(Monoid t, t ~ b0) => (t, b0)`, which GHC can solve as: `(Monoid t) => (t, t)`. After that, our function typechecks successfully!

# So What's The Lesson?

A rough intuition is that you can think of many of the uses of type equality in practice as helping you _propagate_ the fact that types are equal in some contexts. In the simple example above, we as programmers intended the two fields of `thePair` should have the same type, but the `MyPair a a` definition meant we could only use `myPair` when GHC _already knew_ those types were equal. By moving the type equality to the context, we can use `myPair` in places where the two types aren't already known to be distinct and _introduce_ the fact that they are. This isn't the _only_ use of type equality—for example, type equality is very important in the context of type families, for example—but it's nevertheless an important one.

GHC typeclass resolution is really very straightforward, even in the fact of extensions like the ones we enabled in this post. In fact, part of the problem is that it's _very_ straightforward: in cases like these, it's sometimes tempting to imagine GHC is doing something more clever than it actually is!