As part of recent type-refactoring efforts in Haskell,
a discussion about adding
[`Semigroup` as a parent class of `Monoid`](https://mail.haskell.org/pipermail/libraries/2015-March/025381.html)
has been bouncing around the mailing list.
From a theoretical point of view, this is a great idea:
it is more flexible than the current approach and would allow
for greater expressiveness.
From a _practical_ point of view, however, I am inclined to oppose it.
Not because it is _in itself_ a bad change—it's a very reasonable
change that has advantages for new code—but
because I have, in the past, had to update large systems
written in Haskell after GHC updates, and therefore I know that
_this kind of change has a cost_. The Applicative-Monad changes
seem to have made way for the Foldable-Traversable Prelude, but
those have in turn inspired a number of suggestions for
modifications to the Haskell standard library, each one of which,
taken on its own, is reasonable, but which, taken as a mass, mean _much
more work_ for maintainers. This is _especially_ true if we continue
on this path of making minor refactorings at each release: each year
a project will need changes, or it will quickly bit-rot beyond
utility.
# Default Superclass Instances
There is, however, an alternative I would like to discuss.
Some of these changes—especially the `Semigroup`/`Monoid`
split—seem to involve taking the functionality of
a class and splitting it into multiple smaller classes. For
example, we can think of the `Semigroup`/`Monoid` change as
converting
~~~~{.haskell}
class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m
~~~~
into[^semi]
~~~~{.haskell}
class Semigroup m where
  mappend :: m -> m -> m

class Semigroup m => Monoid m where
  mempty :: m
~~~~
[Something that has been proposed before](https://ghc.haskell.org/trac/ghc/wiki/DefaultSuperclassInstances)
(in a [few](https://mail.haskell.org/pipermail/haskell-prime/2006-August/001587.html)
[different](https://wiki.haskell.org/Superclass_defaults)
[forms](https://wiki.haskell.org/Class_system_extension_proposal))
and which I suggest be
more actively considered if changes like these are to become
common is to allow _superclass instances to be declared within
a subclass declaration_. This would allow you to write a single
`instance` declaration for a class, and in that body _also include
implementations for methods belonging to a superclass of that
class_ by some means[^note]:
[^note]: This isn't a concrete proposal, so maybe the actual syntax
or semantics of these things should be changed! I want to focus
on the _feature_ and not the _instantiation_.
~~~~{.haskell}
newtype MyId a = MyId a

instance Monad MyId where
  -- Functor method
  fmap f (MyId x) = MyId (f x)

  -- Applicative methods
  pure = MyId
  MyId f <*> MyId x = MyId (f x)

  -- Monad methods
  return = MyId
  MyId x >>= f = f x
~~~~
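
For comparison, here is a minimal sketch of what the same type needs in GHC since the Applicative-Monad change: three separate instance declarations. It assumes the standard `liftM` and `ap` helpers from `Control.Monad`, which derive the `Functor` and `Applicative` methods from the `Monad` ones. `runMyId` is a hypothetical helper added only so that results can be inspected.

~~~~{.haskell}
import Control.Monad (ap, liftM)

newtype MyId a = MyId a

-- Hypothetical helper (not part of the example above): unwrap a value.
runMyId :: MyId a -> a
runMyId (MyId x) = x

-- Each superclass needs its own declaration, even though every
-- method here can be recovered from the Monad instance.
instance Functor MyId where
  fmap = liftM

instance Applicative MyId where
  pure  = MyId
  (<*>) = ap

instance Monad MyId where
  MyId x >>= f = f x
~~~~

This is the boilerplate that a combined declaration would let existing code avoid.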
For the `Monoid`/`Semigroup` proposal, this would mean that any
`Monoid` instances that exist would continue to
work unchanged, but new instances could (optionally) split apart
their declarations. Under this proposal, either of these would
be acceptable:
~~~~{.haskell}
class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m
-- combined `instance` declarations:
instance Monoid [a] where
  mempty  = []
  mappend = (++)
~~~~
or, equivalently,
~~~~{.haskell}
class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m
-- split apart `instance` declarations
instance Semigroup [a] where
  mappend = (++)

instance Monoid [a] where
  mempty = []
~~~~
And because the `Monoid` declaration for `[]`
[is already written like the former](http://hackage.haskell.org/package/base-4.8.0.0/docs/src/GHC-Base.html#line-227),
we can make the `Semigroup`/`Monoid` split without having to rewrite
the instance declarations!
Because this lowers the cost of updating for new versions, various
_other_ useful changes might be considered that would otherwise
involve far too much breakage. For example, we could consider
splitting `Num` apart into small constituent parts[^num]:
~~~~{.haskell}
class Add a where (+) :: a -> a -> a
class Sub a where (-) :: a -> a -> a
class Zero a where zero :: a
class Mul a where (*) :: a -> a -> a
class One a where one :: a
class FromInteger a where
  fromInteger :: Integer -> a
  -- proposed syntax: default instances derived from `fromInteger`
  instance Zero a where zero = fromInteger 0
  instance One a where one = fromInteger 1

class Signed a where
  negate :: a -> a
  abs    :: a -> a
  signum :: a -> a

class ( Eq a
      , Show a
      , Add a
      , Sub a
      , Mul a
      , FromInteger a
      , Signed a) => Num a where
~~~~
which would allow certain numeric types to only implement a
subset of the relevant operations:
~~~~{.haskell}
data Nat = Z | S Nat

instance Add Nat where
  Z   + y = y
  S x + y = S (x + y)

{- et cetera --- but no implementations for e.g. Signed,
 - which is not meaningful for `Nat`!
 -}
~~~~
and also allow current `Num` functions to have a looser set of
constraints than they do at present:
~~~~{.haskell}
sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum [] = zero
prod :: (One a, Mul a) => [a] -> a
prod (x:xs) = x * prod xs
prod [] = one
~~~~
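
To make the looser constraints concrete, the following is a minimal, self-contained sketch that compiles in today's Haskell. It hides the Prelude's `(+)` and `sum` so that the hypothetical `Add` and `Zero` classes from above can reuse those names. The `deriving` clause, the fixity declaration, and the name `three` are my own additions for illustration and not part of the proposal.

~~~~{.haskell}
import Prelude hiding (sum, (+))

-- Same fixity as the Prelude's (+); an assumption for this sketch.
infixl 6 +

class Add a where (+) :: a -> a -> a
class Zero a where zero :: a

-- Peano naturals: addition and zero make sense, negation does not.
data Nat = Z | S Nat
  deriving (Eq, Show)

instance Add Nat where
  Z   + y = y
  S x + y = S (x + y)

instance Zero Nat where
  zero = Z

-- Needs only an additive unit and addition, not all of `Num`.
sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum []     = zero

-- 1 + 2 on naturals, with no `Num Nat` instance in sight.
three :: Nat
three = sum [S Z, S (S Z)]
~~~~

Here `sum` works on `Nat` even though `Nat` never implements `Signed`, `Mul`, or `FromInteger`.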
We could also consider splitting `Arrow`[^arr] into distinct
components:
~~~~{.haskell}
class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***)  :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&)  :: a b c -> a b c' -> a b (c, c')

class Pairwise a => Arrow a where
  arr :: (b -> c) -> a b c
~~~~
or even (dare I dream) splitting `Bits` into
[something that is not a 22-method monstrosity](https://downloads.haskell.org/~ghc/7.8.2/docs/html/libraries/base-4.7.0.0/Data-Bits.html#t:Bits)!
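
As a rough illustration of the `Arrow` split, the sketch below defines the hypothetical `Pairwise` class in today's Haskell and gives an instance for a small circuit type that has no sensible `arr`. The constructors `Not`, `And`, and `Or`, the evaluator `run`, and the fixity declaration are assumptions made for this example only.

~~~~{.haskell}
{-# LANGUAGE GADTs #-}
import Prelude hiding (id, (.))
import Control.Category (Category (..), (>>>))

-- Same fixity Control.Arrow uses for these operators.
infixr 3 ***, &&&

class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***)  :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&)  :: a b c -> a b c' -> a b (c, c')

-- A circuit description: a few primitive gates plus wiring.
data Circ a b where
  Not      :: Circ Bool Bool
  And      :: Circ (Bool, Bool) Bool
  Or       :: Circ (Bool, Bool) Bool
  Ident    :: Circ a a
  Compose  :: Circ a b -> Circ b c -> Circ a c
  First    :: Circ a b -> Circ (a, d) (b, d)
  Second   :: Circ a b -> Circ (d, a) (d, b)
  Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b')
  Fanout   :: Circ a b -> Circ a b' -> Circ a (b, b')

instance Category Circ where
  id  = Ident
  (.) = flip Compose

-- No `arr` is required, so nothing has to be left undefined.
instance Pairwise Circ where
  first  = First
  second = Second
  (***)  = Parallel
  (&&&)  = Fanout

-- Interpret a circuit as an ordinary function.
run :: Circ a b -> a -> b
run Not            x      = not x
run And            (x, y) = x && y
run Or             (x, y) = x || y
run Ident          x      = x
run (Compose f g)  x      = run g (run f x)
run (First f)      (x, d) = (run f x, d)
run (Second f)     (d, x) = (d, run f x)
run (Parallel f g) (x, y) = (run f x, run g y)
run (Fanout f g)   x      = (run f x, run g x)

-- (not a && b) || (a && not b)
xor :: Circ (Bool, Bool) Bool
xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or
~~~~

With these definitions, `run xor (True, False)` evaluates to `True` and `run xor (True, True)` to `False`.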
# Potential Drawbacks
On the other hand, this proposal does have some down-sides:
## Grepping for Instance Declarations
Right now, I can often find an instance declaration for a type `T` by
grepping for `instance C T` (modulo some line noise) whereas with this
change, it's possible that there _is_ no declaration for
`instance C T`, because all of `C`'s methods are declared by a
subclass `C'` instead. The compiler ought to be able to deal with
this without problem, so documentation generated by tools like Haddock
should somewhat alleviate the issue, but a user might still be confused.
## Introduces New Possible Errors
The declarations below are of course nonsensical, and would be
rejected by the compiler—but the fact that this change would
_introduce new failure conditions at all_ is a drawback
of the proposal.
~~~~{.haskell}
instance Semigroup Int where
  mappend = (+)

instance Monoid Int where
  -- this conflicts with an existing declaration
  mappend = (*)
  mempty  = 1
~~~~
## A Pragma-Less Extension
In order to be _really_ useful, we'd want to use this without a
`LANGUAGE` pragma. After all, we're arguing for it on the basis of
preserving backwards-compatibility, but that argument is much less
compelling if we still have to change the source files to make use
of it! On the other hand, that means we'd have included a GHC
_extension_ that takes effect despite not being enabled, which
is _also_ a worrying precedent!
It still might be a useful extension even if it had to be enabled by
a `LANGUAGE` pragma, as it is easier to add said pragma to a source
file than to manually break apart dozens of instance declarations,
but it makes this feature less compelling in general.
# In Conclusion
As I said before, my primary objection to changes of the above nature is
that they are _breaking_. I don't want to have to modify a handful of
miscellaneous instance declarations on a yearly basis as people
discover new levels of abstraction or become dissatisfied with current
choices, especially as those changes will grow more extensive
as I build more projects in Haskell! But with an extension like this,
we could grow the typeclass ecosystem gradually and fix what we see
as past warts _while maintaining backwards-compatibility_, which
would be a very powerful tool to have.
[^arr]: I have on numerous occasions had reason to use the
`Arrow` abstraction, but haven't had a sensible implementation
of `arr`. To use a contrived example, I could define a GADT that
can describe the structure of boolean circuits in a way that
resembles an Arrow, but has no way of expressing `arr`:
~~~~{.haskell}
data Circ a b where
  BoolFunc :: (Bool -> Bool) -> Circ Bool Bool
  Ident    :: Circ a a
  Compose  :: Circ a b -> Circ b c -> Circ a c
  First    :: Circ a b -> Circ (a, d) (b, d)
  Second   :: Circ a b -> Circ (d, a) (d, b)
  Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b')
  Fanout   :: Circ a b -> Circ a b' -> Circ a (b, b')

instance Category Circ where
  id  = Ident
  (.) = flip Compose

instance Arrow Circ where
  first  = First
  second = Second
  (***)  = Parallel
  (&&&)  = Fanout
  arr    = error "Nothing sensible to put here!"

-- for example, using this definition, we can write xor
-- (`Not`, `And`, and `Or` are assumed to be primitive gates of
-- type `Circ`, omitted from the declaration above):
xor :: Circ (Bool, Bool) Bool
xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or

-- ...and using xor, write a half-adder:
halfAdder :: Circ (Bool, Bool) (Bool, Bool)
halfAdder = xor &&& And
~~~~
This is not an unreasonable definition—it would be nice to
abstract over such a definition using existing tools—but the structure
of the `Arrow` typeclass makes it difficult!
[^num]: For this example, I added `Zero` and `One` classes so that a
given type might implement an additive and multiplicative unit while
not necessarily having a sensible `FromInteger` implementation. For
example, it might not make sense to implement `fromInteger` for a
complex number, but complex numbers clearly have an additive unit:
~~~~{.haskell}
data Complex = Complex Float Float
  deriving (Eq, Show)

instance Zero Complex where
  zero = Complex 0.0 0.0
~~~~
This means that the `Sum` and `Product` monoids could be rewritten like:
~~~~{.haskell}
newtype Product a = Product { getProduct :: a }
  deriving (Eq, Show)

instance (One a, Mul a) => Monoid (Product a) where
  mempty = Product one
  x `mappend` y = Product (getProduct x * getProduct y)
~~~~
Notice that I added `Zero` and `One` in such a way that an existing
`Num` instance declaration will not have to change anything to implement
those classes!
[^semi]: This is perhaps more simplistic than we want: we can also use
the existing `Semigroup` class from
[the `Semigroup` package](http://hackage.haskell.org/package/Semigroup-0.0.7/docs/Data-Semigroup.html#t:Semigroup)
and then, in the `Monoid` class declaration, explain how to derive the
methods of the superclass. This would look like:
~~~~{.haskell}
class Semigroup m => Monoid m where
  mempty  :: m
  mappend :: m -> m -> m

  instance Semigroup m where
    (.++.) = mappend
~~~~
The example above is slightly simpler, which is why I relegated this
version to a footnote.