gdritter repos documents / master posts / semigroup.md
master

Tree @master (Download .tar.gz)

semigroup.md @masterview markup · raw · history · blame

As part of recent type-refactoring efforts in Haskell, a discussion about adding Semigroup as a parent class of Monoid has been bouncing around the mailing list. From a theoretical point of view, this is a great idea: it is more flexible than the current approach that would allow for greater expressibility.

From a practical point of view, however, I am inclined to oppose it. Not because it is in itself a bad change—it's a very reasonable change that has advantages for new code—but because I have, in the past, had to update large systems written in Haskell after GHC updates, and therefore I know that this kind of change has a cost. The Applicative-Monad changes seem to have made way for the Foldable-Traversable Prelude, but those have in turn inspired a number of suggestions for modifications to the Haskell standard library, each one of which, taken on its own, is reasonable, but taken as a mass, mean much more work for maintainers. This is especially true if we continue on this path of making minor refactorings at each release: each year a project will need changes, or it will quickly bit-rot beyond utility.

Default Superclass Instances

There is, however, an alternative I would like to discuss. Some of these changes—especially the Semigroup/Monoid split—seem to involve taking the functionality of a class and splitting it into multiple smaller classes. For example, we can think of the Semigroup/Monoid change as converting

class Monoid m where
  mempty  :: m
  mappend :: m -> m -> m

into4

class Semigroup m where
  mappend :: m -> m -> m

class Semigroup m => Monoid m where
  mempty :: m

Something that has been proposed before (in a few different forms) and which I suggest be more actively considered if changes like these are to become common is to allow superclass instances to be declared within a subclass declaration. This would allow you to write a single instance declaration for a class, and in that body also include implementations for methods belong to a superclass of that class by some means1:

newtype MyId a = MyId a

instance Monad MyId where
  -- Functor method
  fmap f (MyId x) = MyId (f x)

  -- Applicative methods
  pure = MyId
  MyId f <*> MyId x = MyId (f x)

  -- Monad methods
  return = MyId
  MyId x >>= f = f x

For the Monoid/Semigroup proposal, this would mean that any Monoid instances that exist would continue to work unchanged, but new instances could (optionally) split apart their declarations. Under this proposal, either of these would be acceptable:

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- combined `instance` declarations:
instance Monoid [a] where
  mempty = []
  mappend = (++)

or, equivalently,

class Semigroup m where mappend :: m -> m -> m
class Semigroup m => Monoid m where mempty :: m

-- split apart `instance` declarations
instance Semigroup [a] where
  mappend = (++)

instance Monoid [a] where
  mempty = []

And because the Monoid declaration for [] is already written like the former, we can make the Semigroup/Monoid split without having to rewrite the instance declarations!

Because this lowers the cost of updating for new versions, various other useful changes might be considered that would otherwise involve far too much breakage. For example, we could consider splitting Num apart into small constituent parts3:

class Add a  where (+)  :: a -> a -> a
class Sub a  where (-)  :: a -> a -> a
class Zero a where zero :: a

class Mul a  where (*)  :: a -> a -> a
class One a  where one  :: a

class FromInteger a where
  fromInteger :: Integer -> a

  instance Zero a where zero = fromInteger 0
  instance One a where one = fromInteger 1

class Signed a where
  negate :: a -> a
  abs    :: a -> a
  signum :: a -> a

class ( Eq a
      , Show a
      , Add a
      , Sub a
      , Mul a
      , FromInteger a
      , Signed a) => Num a where

which would allow certain numeric types to only implement a subset of the relevant operations:

data Nat = Zero | Succ Nat

instance Add Nat where
  Z   + y = s
  S x + y = S (x + y)

{- et cetera --- but no implementations for e.g. Signed,
 - which is not meaningful for `Nat`!
 -}

and also allow current Num functions to have a looser set of constraints than they do at present:

sum :: (Zero a, Add a) => [a] -> a
sum (x:xs) = x + sum xs
sum []     = zero

prod :: (One a, Mul a) => [a] -> a
prod (x:xs) = x * prod xs
prod []     = one

We could also consider splitting Arrow2 into distinct components:

class Category a => Pairwise a where
  first  :: a b c -> a (b, d) (c, d)
  second :: a b c -> a (d, b) (d, c)
  (***) :: a b c -> a b' c' -> a (b, b') (c, c')
  (&&&) :: a b c -> a b c' -> a b (c, c')

class Pairwise a => Arrow a where
  arr :: (b -> c) -> a b c

or even (dare I dream) splitting Bits into something that is not a 22-method monstrosity!

Potential Drawbacks

On the other hand, this proposal does have some down-sides:

Grepping for Instance Declarations

Right now, I can often find an instance declaration for a type T by grepping for instance C T (modulo some line noise) whereas with this change, it's possible that there is no declaration for instance C T, because all of C's methods are declared by a subclass C' instead. The compiler ought to be able to deal with this without problem, which means that tools like Haddock documentation should somewhat alleviate this problem, but a user might be confused.

Introduces New Possible Errors

The declarations below are of course nonsensical, and would be rejected by the compiler—but the fact that this change would introduce new failure conditions at all is a drawback of the proposal.

instance Semigroup Int where
  mappend = (+)

instance Monoid Int where
  -- this conflicts with an existing declaration
  mappend = (*)
  mempty  = 1

A Pragma-Less Extension

In order to be really useful, we'd want to use this without a LANGUAGE pragma. After all, we're arguing for it on the basis of preserving backwards-compatibility, but that argument is much less compelling if we still have to change the source files to make use of it! On the other hand, that means we'd have included a GHC extension that takes effect despite not being enabled, which is also a worrying precedent!

It still might be a useful extension even if it had to be enabled by a LANGUAGE pragma, as it is easier to add said pragma to a source file than to manually break apart dozens of instance declarations, but it makes this feature less compelling in general.

In Conclusion

As I said before, my primary objection to changes of the above nature is that they are breaking. I don't want to have to modify a handful of miscellaneous instance declarations on a yearly basis as people discover new levels of abstraction or become dissatisfied with current choices, especially as those changes will grow more extensive as I build more projects in Haskell! But with an extension like this, we could grow the typeclass ecosystem gradually and fix what we see as past warts while maintaining backwards-compatibility, which would be a very powerful tool to have.

abstract over such a definition using existing tools—but the structure of the Arrow typeclass makes it difficult!

Num instance declaration will not have to change anything to implement those classes!

version to a footnote.


  1. This isn't a concrete proposal, so maybe the actual syntax or semantics of these things should be changed! I want to focus on the feature and not the instantiation

  2. I have on numerous occasions had reason to use the Arrow abstraction, but haven't had a sensible implementation of arr. To use a contrived example, I could define a GADT that can describe the structure of boolean circuits in a way that resembles an Arrow, but has no way of expressing arr:

    ~~~~{.haskell} data Circ a b where BoolFunc :: (Bool -> Bool) -> Circ Bool Bool Ident :: Circ a a Compose :: Circ a b -> Circ b c -> Circ a c First :: Circ a b -> Circ (a, d) (b, d) Second :: Circ a b -> Circ (d, a) (d, b) Parallel :: Circ a b -> Circ a' b' -> Circ (a, a') (b, b') Fanout :: Circ a b -> Circ a b' -> Circ a (b, b')

    instance Category Circ where id = Ident (.) = flip Compose

    instance Arrow Circ where first = First second = Second (***) = Parallel (&&&) = Fanout arr = error "Nothing sensible to put here!"

    -- for example, using this definition, we can write xor: xor :: BoolCircuit (Bool, Bool) Bool xor = ((first Not >>> And) &&& (second Not >>> And)) >>> Or

    -- ...and using xor, write a half-adder: halfAdder :: BoolCircuit (Bool, Bool) (Bool, Bool) halfAdder = xor &&& And ~~~~

    This is not an unreasonable definition—it would be nice to 

  3. For this example, I added Zero and One classes so that a given type might implement an additive and multiplicative unit while not necessarily having a sensible FromInteger implementation. For example, it might not make sense to implement fromInteger for a complex number, but complex numbers clearly have an additive unit:

    ~~~~{.haskell} data Complex = Complex Float Float deriving (Eq, Show)

    instance Zero Complex where zero = Complex (0.0, 0.0) ~~~~

    This means that the Sum and Product monoids could be rewritten like:

    ~~~~{.haskell} newtype Product a = Product { getProduct :: a } deriving (Eq, Show)

    instance (One a, Mult a) => Monoid (Product a) where mempty = Product one x mappend y = Product (getProduct x * getProduct y) ~~~~

    Notice that I added Zero and One in such a way that an existing 

  4. This is perhaps more simplistic than we want: we can also use the existing Semigroup class from the semigroup package and then, in the Monoid class declaration, explain how to derive the methods of the superclass. This would look like:

    ~~~~{.haskell} class Semigroup m => Monoid m where mempty :: m mappend :: m -> m -> m

    instance Semigroup m where (.++.) = mappend ~~~~

    The example above is slightly simpler, which is why I relegated this