gdritter repos documents / master posts / sexprs.md
master

Tree @master (Download .tar.gz)

sexprs.md @masterview rendered · raw · history · blame

I did not like any of the S-Expression libraries that are available for
Haskell, for various reasons. Sometimes the API was quite poor. Sometimes
the implementation was wonky. Sometimes it was just a license issue.
Whatever it was, I felt like it was possible to do better.

So I did.

In the process, I _completely_ over-engineered the library. It does a
lot, perhaps more than necessary. It understands several flavors of
S-Expressions (Common Lisp's, Scheme's, Rivest's, and Clojure's),
exposing them in different ways depending on how the consumer wants
to use them, with optional support for various reader macros. It
should be easy for a library user to extend it in various ways.

That said, I hope it should be useful enough!

# S-Expression Representations

S-Expressions are generally built up of cons pairs, which are written

~~~~
  ( x . y )
~~~~

and empty lists, which can be written differently depending on the
language

~~~~
  NIL ;; Common Lisp
  ()  ;; also Common Lisp
  '() ;; Scheme, which is sugar for
  (quote ()) ;; also Scheme
  ;; etc.
~~~~

So one way of exposing an S-Expression based API is to use a data type
like:

~~~~{.haskell}
data SExpr
  = Atom {- some atom representation -}
  | Cons SExpr Sexpr
  | Nil
~~~~

…but this also isn't how S-Expressions are usually used. You rarely see
dotted pairs in practice; instead, most S-Expressions are well-formed
lists, which are sugar for sequences of cons pairs that end in an
empty list:

~~~~
  (a b c d) === (a . (b . (c . (d . ()))))
~~~~

So perhaps we want this representation:

~~~~{.haskell}
data SExpr
  = Atom {- ... -}
  | List [SExpr]
~~~~

but while this is acceptable for some uses, this API means we can't
address the presence of dotted lists at all! So perhaps we should
include them as a special case, making them easy to throw away in
processing:

~~~~{.haskell}
data SExpr
  = Atom {- ... -}
  | List [SExpr] -- for (a b ... c)
  | DotList [SExpr] Sexpr -- for (a b ... c . d)
~~~~

[library] always uses the first representation internally, but
exposes a version of each of these data types. The first is
called `SExpr`, while the latter two are called `WellFormedSExpr`
and `RichSExpr`. It also exposes, for each of these, a set of
pattern synonyms so that they can be used interchangeably.
For example, the code will work regardless of which S-Expression
type is being used:

~~~~{.haskell}
import Data.SExpression.Repr.{- any -}

sum (Atom n)   = n
sum (Cons x y) = sum x + sum y
sum Nil        = 0
~~~~

# Which Atoms?

There is no "standard S-Expression" grammar, and the various kinds of
S-Expressions understood by different tools have general commonality
but can vary wildly in the specifics. For example, the data types
understood by Common Lisp and Scheme are broadly similar but have
mild differences, while the Rivest Internet Draft about S-Expressions
as a transport mechanism has a very idiosyncratic set of atom types
used to represent octet strings.

[Library] does not specify one, but instead exposes several. The
modules `Data.SExpression.CommonLisp`, `Data.SExpression.Scheme`,
and `Data.SExpression.Rivest` all understand various flavors of
atoms: each defines its own `Atom` type and functions to parse
and serialize S-expressions using that `Atom`.

However, the machinery of S-expressions themselves (i.e. the part
that is basically just nested parens) is generic and shared
between the implementations. The `SExpr` type is parameterized
by an atom type, which could be one of the built-in atom types:

~~~~
Prelude> import qualified Data.SExpression.CommonLisp as CL
Prelude CL> import qualified Data.SExpression.Scheme as S
Prelude CL S> :t CL.parse
CL.parse :: Text -> SExpr CL.Atom
Prelude CL S> :t S.parse
S.parse :: Text -> SExpr S.Atom
~~~~

Additionally, interfaces are present which would allow a user to
extend the library with their own notion of an atom. We can use
any type we want as an atom, so long as we can provide an
AttoParsec parser for it and a printer for it, we can build
up a `SExprSpec` value, which describes a custom S-expression
format:

~~~~{.haskell}
import Control.Applicative
import Data.SExpression.General
import Data.Text (Text, pack)

mySpec :: SExprSpec SExpr Int
mySpec = mkSExprSpec parseNum showNum
  where parseNum = read <$> many1 digit
        showNum  = pack . show

myParse :: Text -> Either String (SExpr Int)
myParse = parseGeneral mySpec

myEmit :: SExpr Int -> Text
myEmit = emitGeneral mySpec

myParseRich :: Text -> Either String (RichSExpr Int)
myParseRich = parseGeneral (toRich mySpec)

main = do
  assert $ myParse "(1 2 3)" == SCons 1 (SCons 2 (SCons 3 Nil))
  assert $ pack "(1 2 3)" == myEmit (SCons 1 (SCons 2 (SCons 3 Nil)))
~~~~

For the most part, you shouldn't need to do this, but it is possible
in case you'd want to.

# Reader Macros

Simple S-expressions are a common data storage format, but lots of
Lisp-based uses of S-expressions include some facility for
_reader macros_ as well. The most commonly seen reader macro in
most Lisps is introduce by the quote, which allows you to execute
a