posts/semigroup.md - documents (master)

Tree @master (Download .tar.gz)

semigroup.md @master

cb58185

As part of recent type-refactoring efforts in Haskell,
a discussion about adding
[`Semigroup` as a parent class of `Monoid`](https://mail.haskell.org/pipermail/libraries/2015-March/025381.html)
has been bouncing around the mailing list.
From a theoretical point of view, this is a great idea:
it is more flexible than the current approach that would allow
for greater expressibility.

From a _practical_ point of view, however, I am inclined to oppose it.
Not because it is _in itself_ a bad change—it's a very reasonable
change that has advantages for new code—but
because I have, in the past, had to update large systems
written in Haskell after GHC updates, and therefore I know that
_this kind of change has a cost_. The Applicative-Monad changes
seem to have made way for the Foldable-Traversable Prelude, but
those have in turn inspired a number of suggestions for
modifications to the Haskell standard library, each one of which,
taken on its own, is reasonable, but taken as a mass, mean _much
more work_ for maintainers. This is _especially_ true if we continue
on this path of making minor refactorings at each release: each year
a project will need changes, or it will quickly bit-rot beyond
utility.

# Default Superclass Instances

There is, however, an alternative I would like to discuss.
Some of these changes—especially the `Semigroup`/`Monoid`
split—seem to involve taking the functionality of
a class and splitting it into multiple smaller classes. For
example, we can think of the `Semigroup`/`Monoid` change as
converting

~~~~{.haskell}
class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m
~~~~

into[^semi]

~~~~{.haskell}
class Semigroup m where
  mappend :: m -> m -> m

class Semigroup m => Monoid m where
  mempty :: m
~~~~

[Something that has been proposed before](https://ghc.haskell.org/trac/ghc/wiki/DefaultSuperclassInstances)
(in a [few](https://mail.haskell.org/pipermail/haskell-prime/2006-August/001587.html)
[different](https://wiki.haskell.org/Superclass_defaults)
[forms](https://wiki.haskell.org/Class_system_extension_proposal))
and which I suggest be
more actively considered if changes like these are to become
common is to allow _superclass instances to be declared within
a subclass declaration_. This would allow you to write a single
`instance` declaration for a class, and in that body _also include
implementations for methods belong to a superclass of that
class_ by some means[^note]:

[^note]: This isn't a concrete proposal, so maybe the actual syntax
or semantics of these things should be changed! I want to focus
on the _feature_ and not the _instantiation_.

~~~~{.haskell}
newtype MyId a = MyId a

instance Monad MyId where
  -- Functor method
  fmap f (MyId x) = MyId (f x)

  -- Applicative methods
  pure = MyId
  MyId f <*> MyId x = MyId (f x)

  -- Monad methods
  return = MyId
  MyId x >>= f = f x
~~~~

For the `Monoid`/`Semigroup` proposal, this would mean that any
`Monoid` instances that exist would continue to
work unchanged, but new instances could (optionally) split apart
their declarations. Under this proposal, either of these would
be acceptable:

~~~~{.haskell}
class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- combined `instance` declarations:
instance Monoid [a] where
  mempty = []
  mappend = (++)
~~~~

or, equivalently,

~~~~{.haskell}
class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- split apart `instance` declarations
instance Semigroup [a] where
  mappend = (++)

instance Monoid [a] where
  mempty = []
~~~~

And because the `Monoid` declaration for `[]`
[is already written like the former](http://hackage.haskell.org/package/base-4.8.0.0/docs/src/GHC-Base.html#line-227),
we can make the `Semigroup`/`Monoid` split without having to rewrite
the instance declarations!

Because this lowers the cost of updating for new versions, various
_other_ useful changes might be considered that would otherwise
involve far too much breakage. For example, we could consider
splitting `Num` apart into small constituent parts[^num]:

~~~~{.haskell}
class Add a  where (+)  :: a -> a -> a
class Sub a  where (-)  :: a -> a -> a
class Zero a where zero :: a

class Mul a  where (*)  :: a -> a -> a
class One a  where one  :: a

class FromInteger a where
  fromInteger :: Integer -> a

  instance Zero a where zero = fromInteger 0
  instance One a where one = fromInteger 1

class Signed a where
  negate :: a -> a
  abs    :: a -> a
  signum :: a -> a

class ( Eq a
      , Show a
      , Add a
      , Sub a
      , Mul a
      , FromInteger a
      , Signed a) => Num a where
~~~~

which would allow certain numeric types to only implement a
subset of the relevant operations:

~~~~{.haskell}
data Nat = Zero | Succ Nat

instance Add Nat where
  Z   + y = s
  S x + y = S (x + y)

{- et cetera --- but no implementations for e.g. Signed,
 - which is not meaningful for `Nat`!
 -}
~~~~

and also allow current `Num` functions to have a looser set of
constraints than they do at present:

~~~~{.haskell}
sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum []     = zero

prod :: (One a, Mul a) => [a] -> a
prod (x:xs) = x * prod xs
prod []     = one
~~~~

We could also consider splitting `Arrow`[^arr] into distinct
components:

~~~~{.haskell}
class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***) :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&) :: a b c -> a b c' -> a b (c, c')

class Pairwise a => Arrow a where
  arr :: (b -> c) -> a b c
~~~~

or even (dare I dream) splitting `Bits` into
[something that is not a 22-method monstrosity](https://downloads.haskell.org/~ghc/7.8.2/docs/html/libraries/base-4.7.0.0/Data-Bits.html#t:Bits)!

# Potential Drawbacks

On the other hand, this proposal does have some down-sides:

##  Grepping for Instance Declarations

Right now, I can often find an instance declaration for a type `T` by
grepping for `instance C T` (modulo some line noise) whereas with this
change, it's possible that there _is_ no declaration for
`instance C T`, because all of `C`'s methods are declared by a
subclass `C'` instead. The compiler ought to be able to deal with
this without problem, which means that tools like Haddock documentation
should somewhat alleviate this problem, but a user might be confused.

## Introduces New Possible Errors

The declarations below are of course nonsensical, and would be
rejected by the compiler—but the fact that this change would
_introduce new failure conditions at all_ is a drawback
of the proposal.

~~~~{.haskell}
instance Semigroup Int where
  mappend = (+)

instance Monoid Int where
  -- this conflicts with an existing declaration
  mappend = (*)
  mempty  = 1
~~~~

## A Pragma-Less Extension

In order to be _really_ useful, we'd want to use this without a
`LANGUAGE` pragma. After all, we're arguing for it on the basis of
preserving backwards-compatibility, but that argument is much less
compelling if we still have to change the source files to make use
of it! On the other hand, that means we'd have included a GHC
_extension_ that takes effect despite not being enabled, which
is _also_ a worrying precedent!

It still might be a useful extension even if it had to be enabled by
a `LANGUAGE` pragma, as it is easier to add said pragma to a source
file than to manually break apart dozens of instance declarations,
but it makes this feature less compelling in general.

# In Conclusion

As I said before, my primary objection to changes of the above nature is
that they are _breaking_. I don't want to have to modify a handful of
miscellaneous instance declarations on a yearly basis as people
discover new levels of abstraction or become dissatisfied with current
choices, especially as those changes will grow more extensive
as I build more projects in Haskell! But with an extension like this,
we could grow the typeclass ecosystem gradually and fix what we see
as past warts _while maintaining backwards-compatibility_, which
would be a very powerful tool to have.

[^arr]: I have on numerous occasions had reason to use the
`Arrow` abstraction, but haven't had a sensible implementation
of `arr`. To use a contrived example, I could define a GADT that
can describe the structure of boolean circuits in a way that
resembles an Arrow, but has no way of expressing `arr`:

    ~~~~{.haskell}
    data Circ a b where
      BoolFunc :: (Bool -> Bool) -> Circ Bool Bool
      Ident    :: Circ a a
      Compose  :: Circ a b -> Circ b c -> Circ a c
      First    :: Circ a b -> Circ (a, d) (b, d)
      Second   :: Circ a b -> Circ (d, a) (d, b)
      Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b')
      Fanout   :: Circ a b -> Circ a  b' -> Circ a (b, b')

    instance Category Circ where
      id  = Ident
      (.) = flip Compose

    instance Arrow Circ where
      first  = First
      second = Second
      (***)  = Parallel
      (&&&)  = Fanout
      arr    = error "Nothing sensible to put here!"

    -- for example, using this definition, we can write xor:
    xor :: BoolCircuit (Bool, Bool) Bool
    xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or

    -- ...and using xor, write a half-adder:
    halfAdder :: BoolCircuit (Bool, Bool) (Bool, Bool)
    halfAdder = xor &&& And
    ~~~~

    This is not an unreasonable definition—it would be nice to
abstract over such a definition using existing tools—but the structure
of the `Arrow` typeclass makes it difficult!

[^num]: For this example, I added `Zero` and `One` classes so that a
given type might implement an additive and multiplicative unit while
not necessarily having a sensible `FromInteger` implementation. For
example, it might not make sense to implement `fromInteger` for a
complex number, but complex numbers clearly have an additive unit:

    ~~~~{.haskell}
    data Complex = Complex Float Float
      deriving (Eq, Show)

    instance Zero Complex where
      zero = Complex (0.0, 0.0)
    ~~~~

    This means that the `Sum` and `Product` monoids could be rewritten like:

    ~~~~{.haskell}
    newtype Product a = Product { getProduct :: a }
      deriving (Eq, Show)

    instance (One a, Mult a) => Monoid (Product a) where
      mempty        = Product one
      x `mappend` y = Product (getProduct x * getProduct y)
    ~~~~

    Notice that I added `Zero` and `One` in such a way that an existing
`Num` instance declaration will not have to change anything to implement
those classes!

[^semi]: This is perhaps more simplistic than we want: we can also use
the existing `Semigroup` class from
[the `semigroup` package](http://hackage.haskell.org/package/Semigroup-0.0.7/docs/Data-Semigroup.html#t:Semigroup)
and then, in the `Monoid` class declaration, explain how to derive the
methods of the superclass. This would look like:

    ~~~~{.haskell}
    class Semigroup m => Monoid m where
      mempty  :: m
      mappend :: m -> m -> m

      instance Semigroup m where
        (.++.) = mappend
    ~~~~

    The example above is slightly simpler, which is why I relegated this
version to a footnote.