gdritter repos documents / master posts / sexprs.md
master

Tree @master (Download .tar.gz)

sexprs.md @masterview markup · raw · history · blame

I did not like any of the S-Expression libraries that are available for Haskell, for various reasons. Sometimes the API was quite poor. Sometimes the implementation was wonky. Sometimes it was just a license issue. Whatever it was, I felt like it was possible to do better.

So I did.

In the process, I completely over-engineered the library. It does a lot, perhaps more than necessary. It understands several flavors of S-Expressions (Common Lisp's, Scheme's, Rivest's, and Clojure's), exposing them in different ways depending on how the consumer wants to use them, with optional support for various reader macros. It should be easy for a library user to extend it in various ways.

That said, I hope it should be useful enough!

S-Expression Representations

S-Expressions are generally built up of cons pairs, which are written

  ( x . y )

and empty lists, which can be written differently depending on the language

  NIL ;; Common Lisp
  ()  ;; also Common Lisp
  '() ;; Scheme, which is sugar for
  (quote ()) ;; also Scheme
  ;; etc.

So one way of exposing an S-Expression based API is to use a data type like:

data SExpr
  = Atom {- some atom representation -}
  | Cons SExpr Sexpr
  | Nil

…but this also isn't how S-Expressions are usually used. You rarely see dotted pairs in practice; instead, most S-Expressions are well-formed lists, which are sugar for sequences of cons pairs that end in an empty list:

  (a b c d) === (a . (b . (c . (d . ()))))

So perhaps we want this representation:

data SExpr
  = Atom {- ... -}
  | List [SExpr]

…but while this is acceptable for some uses, this API means we can't address the presence of dotted lists at all! So perhaps we should include them as a special case, making them easy to throw away in processing:

data SExpr
  = Atom {- ... -}
  | List [SExpr] -- for (a b ... c)
  | DotList [SExpr] Sexpr -- for (a b ... c . d)

[library] always uses the first representation internally, but exposes a version of each of these data types. The first is called SExpr, while the latter two are called WellFormedSExpr and RichSExpr. It also exposes, for each of these, a set of pattern synonyms so that they can be used interchangeably. For example, the code will work regardless of which S-Expression type is being used:

import Data.SExpression.Repr.{- any -}

sum (Atom n)   = n
sum (Cons x y) = sum x + sum y
sum Nil        = 0

Which Atoms?

There is no "standard S-Expression" grammar, and the various kinds of S-Expressions understood by different tools have general commonality but can vary wildly in the specifics. For example, the data types understood by Common Lisp and Scheme are broadly similar but have mild differences, while the Rivest Internet Draft about S-Expressions as a transport mechanism has a very idiosyncratic set of atom types used to represent octet strings.

[Library] does not specify one, but instead exposes several. The modules Data.SExpression.CommonLisp, Data.SExpression.Scheme, and Data.SExpression.Rivest all understand various flavors of atoms: each defines its own Atom type and functions to parse and serialize S-expressions using that Atom.

However, the machinery of S-expressions themselves (i.e. the part that is basically just nested parens) is generic and shared between the implementations. The SExpr type is parameterized by an atom type, which could be one of the built-in atom types:

Prelude> import qualified Data.SExpression.CommonLisp as CL
Prelude CL> import qualified Data.SExpression.Scheme as S
Prelude CL S> :t CL.parse
CL.parse :: Text -> SExpr CL.Atom
Prelude CL S> :t S.parse
S.parse :: Text -> SExpr S.Atom

Additionally, interfaces are present which would allow a user to extend the library with their own notion of an atom. We can use any type we want as an atom, so long as we can provide an AttoParsec parser for it and a printer for it, we can build up a SExprSpec value, which describes a custom S-expression format:

import Control.Applicative
import Data.SExpression.General
import Data.Text (Text, pack)

mySpec :: SExprSpec SExpr Int
mySpec = mkSExprSpec parseNum showNum
  where parseNum = read <$> many1 digit
        showNum  = pack . show

myParse :: Text -> Either String (SExpr Int)
myParse = parseGeneral mySpec

myEmit :: SExpr Int -> Text
myEmit = emitGeneral mySpec

myParseRich :: Text -> Either String (RichSExpr Int)
myParseRich = parseGeneral (toRich mySpec)

main = do
  assert $ myParse "(1 2 3)" == SCons 1 (SCons 2 (SCons 3 Nil))
  assert $ pack "(1 2 3)" == myEmit (SCons 1 (SCons 2 (SCons 3 Nil)))

For the most part, you shouldn't need to do this, but it is possible in case you'd want to.

Reader Macros

Simple S-expressions are a common data storage format, but lots of Lisp-based uses of S-expressions include some facility for reader macros as well. The most commonly seen reader macro in most Lisps is introduce by the quote, which allows you to execute a