Tree @master (Download .tar.gz)
- ..
- 2014-02-03
- 4d-chess.md
- a-small-haskell-record-complaint
- cherenkov.md
- codata
- commonplace
- daemontools.html
- daemontools.md
- data-files-and-backpack.md
- distributed-git.md
- file-system
- formats.md
- higher-rank-trait-bounds.md
- idiomaticity.md
- in-defense-of-hm
- madlibs-intro.md
- madlibs-ml.md
- madlibs-scheme.md
- multiwhile.md
- orwell.md
- resources-laziness-and-cps.md
- script-based-data-analysis
- semigroup.html
- semigroup.md
- sexprs.md
- shift-resolve-parsing.md
- some-rust-errors.md
- static-versus-dyn
- tansu
- tc-post.md
- tc-post.md.bak
- telml.md
- typeclasses.md
- writer-leak.md
sexprs.md @master — view markup · raw · history · blame
I did not like any of the S-Expression libraries that are available for Haskell, for various reasons. Sometimes the API was quite poor. Sometimes the implementation was wonky. Sometimes it was just a license issue. Whatever it was, I felt like it was possible to do better.
So I did.
In the process, I completely over-engineered the library. It does a lot, perhaps more than necessary. It understands several flavors of S-Expressions (Common Lisp's, Scheme's, Rivest's, and Clojure's), exposing them in different ways depending on how the consumer wants to use them, with optional support for various reader macros. It should be easy for a library user to extend it in various ways.
That said, I hope it should be useful enough!
S-Expression Representations
S-Expressions are generally built up of cons pairs, which are written
( x . y )
and empty lists, which can be written differently depending on the language
NIL ;; Common Lisp
() ;; also Common Lisp
'() ;; Scheme, which is sugar for
(quote ()) ;; also Scheme
;; etc.
So one way of exposing an S-Expression based API is to use a data type like:
data SExpr
= Atom {- some atom representation -}
| Cons SExpr Sexpr
| Nil
…but this also isn't how S-Expressions are usually used. You rarely see dotted pairs in practice; instead, most S-Expressions are well-formed lists, which are sugar for sequences of cons pairs that end in an empty list:
(a b c d) === (a . (b . (c . (d . ()))))
So perhaps we want this representation:
data SExpr
= Atom {- ... -}
| List [SExpr]
…but while this is acceptable for some uses, this API means we can't address the presence of dotted lists at all! So perhaps we should include them as a special case, making them easy to throw away in processing:
data SExpr
= Atom {- ... -}
| List [SExpr] -- for (a b ... c)
| DotList [SExpr] Sexpr -- for (a b ... c . d)
[library] always uses the first representation internally, but
exposes a version of each of these data types. The first is
called SExpr
, while the latter two are called WellFormedSExpr
and RichSExpr
. It also exposes, for each of these, a set of
pattern synonyms so that they can be used interchangeably.
For example, the code will work regardless of which S-Expression
type is being used:
import Data.SExpression.Repr.{- any -}
sum (Atom n) = n
sum (Cons x y) = sum x + sum y
sum Nil = 0
Which Atoms?
There is no "standard S-Expression" grammar, and the various kinds of S-Expressions understood by different tools have general commonality but can vary wildly in the specifics. For example, the data types understood by Common Lisp and Scheme are broadly similar but have mild differences, while the Rivest Internet Draft about S-Expressions as a transport mechanism has a very idiosyncratic set of atom types used to represent octet strings.
[Library] does not specify one, but instead exposes several. The
modules Data.SExpression.CommonLisp
, Data.SExpression.Scheme
,
and Data.SExpression.Rivest
all understand various flavors of
atoms: each defines its own Atom
type and functions to parse
and serialize S-expressions using that Atom
.
However, the machinery of S-expressions themselves (i.e. the part
that is basically just nested parens) is generic and shared
between the implementations. The SExpr
type is parameterized
by an atom type, which could be one of the built-in atom types:
Prelude> import qualified Data.SExpression.CommonLisp as CL
Prelude CL> import qualified Data.SExpression.Scheme as S
Prelude CL S> :t CL.parse
CL.parse :: Text -> SExpr CL.Atom
Prelude CL S> :t S.parse
S.parse :: Text -> SExpr S.Atom
Additionally, interfaces are present which would allow a user to
extend the library with their own notion of an atom. We can use
any type we want as an atom, so long as we can provide an
AttoParsec parser for it and a printer for it, we can build
up a SExprSpec
value, which describes a custom S-expression
format:
import Control.Applicative
import Data.SExpression.General
import Data.Text (Text, pack)
mySpec :: SExprSpec SExpr Int
mySpec = mkSExprSpec parseNum showNum
where parseNum = read <$> many1 digit
showNum = pack . show
myParse :: Text -> Either String (SExpr Int)
myParse = parseGeneral mySpec
myEmit :: SExpr Int -> Text
myEmit = emitGeneral mySpec
myParseRich :: Text -> Either String (RichSExpr Int)
myParseRich = parseGeneral (toRich mySpec)
main = do
assert $ myParse "(1 2 3)" == SCons 1 (SCons 2 (SCons 3 Nil))
assert $ pack "(1 2 3)" == myEmit (SCons 1 (SCons 2 (SCons 3 Nil)))
For the most part, you shouldn't need to do this, but it is possible in case you'd want to.
Reader Macros
Simple S-expressions are a common data storage format, but lots of Lisp-based uses of S-expressions include some facility for reader macros as well. The most commonly seen reader macro in most Lisps is introduce by the quote, which allows you to execute a