More explanation on the README
Getty Ritter
10 years ago
| 25 | 25 | someSpec :: SExprSpec atom carrier |
| 26 | 26 | ~~~~ |
| 27 | 27 | |
| 28 | Various functions will be provided that modify the carrier type (i.e. the | |
| 29 | output type of parsing or input type of serialization) or the language | |
| 30 | recognized by the parsing. Examples will be shown below. | |
| 31 | ||
| 32 | ## Representing S-expressions | |
| 33 | ||
| 28 | 34 | There are three built-in representations of S-expression lists: two of them |
| 29 | 35 | are isomorphic, as one or the other might be better for processing |
| 30 | 36 | S-expression data, and the third represents only a subset of possible |
| 32 | 38 | |
| 33 | 39 | ~~~~ |
| 34 | 40 | -- cons-based representation |
| 35 | | data SExpr atom |
| 41 | data SExpr atom | |
| 42 | = SCons (SExpr atom) (SExpr atom) | |
| 43 | | SNil | |
| 44 | | SAtom atom | |
| 36 | 45 | |
| 37 | 46 | -- list-based representation |
| 38 | 47 | data RichSExpr atom |
| 39 | 48 | = RSList [RichSExpr atom] |
| 49 | | RSDotted [RichSExpr atom] atom | |
| 50 | | RSAtom atom | |
| 51 | ||
| 52 | -- well-formed representation | |
| 53 | data WellFormedSExpr atom | |
| 54 | = WFSList [WellFormedSExpr atom] | |
| 55 | | WFSAtom atom | |
| 40 | 56 | ~~~~ |
| 57 | ||
| 58 | In the above, an `RSList [a, b, c]` and a | |
| 59 | `WFSList [a, b, c]` both correspond to the structure | |
| 60 | `SCons a (SCons b (SCons c SNil))`, which corresponds to an | |
| 61 | S-expression which can be written as | |
| 62 | `(a b c)` or as `(a . (b . (c . ())))`. An `RSDotted` | |
| 63 | corresponds to a sequence of conses that does not terminate | |
| 64 | in an empty list, e.g. `RSDotted [a, b] c` corresponds to | |
| 65 | `SCons a (SCons b (SAtom c))`, which in turn corresponds to | |
| 66 | a structure like `(a b . c)` or `(a . (b . c))`. | |
| 67 | ||
| 68 | Functions for converting back and forth between | |
| 69 | representations are provided, but you can also modify a | |
| 70 | `SExprSpec` to parse to or serialize from a particular | |
| 71 | representation using the `asRich` and `asWellFormed` | |
| 72 | functions. | |
| 73 | ||
| 74 | ~~~~ | |
| 75 | *Data.SCargot.General> decode spec "(a b c)" | |
| 76 | Right [SCons (SAtom "a") (SCons (SAtom "b") (SCons (SAtom "c") SNil))] | |
| 77 | *Data.SCargot.General> decode (asRich spec) "(a b c)" | |
| 78 | Right [RSList [RSAtom "a",RSAtom "b",RSAtom "c"]] | |
| 79 | *Data.SCargot.General> decode (asWellFormed spec) "(a b c)" | |
| 80 | Right [WFSList [WFSAtom "a",WFSAtom "b",WFSAtom "c"]] | |
| 81 | *Data.SCargot.General> decode spec "(a . b)" | |
| 82 | Right [SCons (SAtom "a") (SAtom "b")] | |
| 83 | *Data.SCargot.General> decode (asRich spec) "(a . b)" | |
| 84 | Right [RSDotted [RSAtom "a"] "b"] | |
| 85 | *Data.SCargot.General> decode (asWellFormed spec) "(a . b)" | |
| 86 | Left "Found atom in cdr position" | |
| 87 | ~~~~ | |
| 88 | ||
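The correspondence between the three representations can be sketched in plain Haskell. The block below is a minimal illustration, not the library's own conversion code: it redeclares the three types locally (spelling the dotted constructor `RSDotted`, as in the REPL output above), and the function names `toRich`, `fromRich`, and `toWellFormed` are chosen here for illustration and may not match what S-Cargot exports.

```haskell
-- Local copies of the three representations, with Eq/Show for experimenting.
data SExpr atom
  = SCons (SExpr atom) (SExpr atom)
  | SNil
  | SAtom atom
  deriving (Eq, Show)

data RichSExpr atom
  = RSList [RichSExpr atom]
  | RSDotted [RichSExpr atom] atom
  | RSAtom atom
  deriving (Eq, Show)

data WellFormedSExpr atom
  = WFSList [WellFormedSExpr atom]
  | WFSAtom atom
  deriving (Eq, Show)

-- | Collapse a chain of conses into a list-based value. A chain ending in
-- SNil becomes an RSList; a chain ending in an atom becomes an RSDotted.
toRich :: SExpr atom -> RichSExpr atom
toRich (SAtom a) = RSAtom a
toRich SNil = RSList []
toRich expr = go expr []
  where
    go (SCons x rest) acc = go rest (toRich x : acc)
    go SNil acc = RSList (reverse acc)
    go (SAtom a) acc = RSDotted (reverse acc) a

-- | Rebuild the cons chain; the only difference between the two list
-- constructors is what terminates the chain.
fromRich :: RichSExpr atom -> SExpr atom
fromRich (RSAtom a) = SAtom a
fromRich (RSList xs) = foldr (SCons . fromRich) SNil xs
fromRich (RSDotted xs a) = foldr (SCons . fromRich) (SAtom a) xs

-- | The well-formed representation has no dotted case, so this conversion
-- fails on any chain that ends in an atom.
toWellFormed :: SExpr atom -> Either String (WellFormedSExpr atom)
toWellFormed (SAtom a) = Right (WFSAtom a)
toWellFormed SNil = Right (WFSList [])
toWellFormed expr = WFSList <$> go expr
  where
    go (SCons x rest) = (:) <$> toWellFormed x <*> go rest
    go SNil = Right []
    go (SAtom _) = Left "Found atom in cdr position"
```

In this sketch `fromRich . toRich` is the identity on cons-based values, which is the sense in which the first two representations are isomorphic, while `toWellFormed` is partial because the third representation covers only proper lists.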
| 89 | # Comments | |
| 90 | ||
| 91 | By default, an S-expression spec does not include a comment syntax, but | |
| 92 | the provided `withSemicolonComments` function will cause it to understand | |
| 93 | traditional Lisp line-oriented comments that begin with a semicolon: | |
| 94 | ||
| 95 | ~~~~ | |
| 96 | *Data.SCargot.General> decode spec "(this ; has a comment\n inside)\n" | |
| 97 | Left "Failed reading: takeWhile1" | |
| 98 | *Data.SCargot.General> decode (withSemicolonComments spec) "(this ; has a comment\n inside)\n" | |
| 99 | Right [SCons (SAtom "this") (SCons (SAtom "inside") SNil)] | |
| 100 | ~~~~ | |
| 101 | ||
| 102 | Additionally, you can provide your own comment syntax in the form of an | |
| 103 | Attoparsec parser. Any Attoparsec parser can be used, so long as it meets | |
| 104 | the following criteria: | |
| 105 | - it is capable of failing (as it is called repeatedly until S-Cargot believes that there | |
| 106 | are no more comments) | |
| 107 | - it does not consume any input in the case of failure, which may involve | |
| 108 | wrapping the parser in a call to `try` | |
| 109 | ||
| 110 | For example, the following adds C++-style comments to an S-expression format: | |
| 111 | ||
| 112 | ~~~~ | |
| 113 | *Data.SCargot.General> let cppComment = string "//" >> takeWhile (/= '\n') >> return () | |
| 114 | *Data.SCargot.General> decode (setComment cppComment spec) "(a //comment\n b)\n" | |
| 115 | Right [SCons (SAtom "a") (SCons (SAtom "b") SNil)] | |
| 116 | ~~~~ | |
| 117 | ||
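The two criteria listed above follow from how a comment parser is used: it is retried until it fails. The block below is a simplified sketch of that loop using only `base`. It does not use Attoparsec or S-Cargot's real internals; `CommentParser`, `semicolonComment`, `cppComment`, and `skipJunk` are hypothetical names introduced for this illustration.

```haskell
import Data.Char (isSpace)
import Data.List (stripPrefix)

-- | A toy comment parser: Nothing means "no comment starts here",
-- Just rest means "a comment was skipped and rest is what follows it".
-- Failure returns no new input position, so nothing is consumed on failure.
type CommentParser = String -> Maybe String

-- | Lisp-style line comment starting with a semicolon.
semicolonComment :: CommentParser
semicolonComment (';' : rest) = Just (dropWhile (/= '\n') rest)
semicolonComment _ = Nothing

-- | C++-style line comment starting with two slashes.
cppComment :: CommentParser
cppComment input = dropWhile (/= '\n') <$> stripPrefix "//" input

-- | Skip whitespace and comments until the comment parser fails.
-- A comment parser that can never fail would make this loop forever.
skipJunk :: CommentParser -> String -> String
skipJunk comment input =
  let trimmed = dropWhile isSpace input
  in case comment trimmed of
       Just rest -> skipJunk comment rest
       Nothing -> trimmed
```

For instance, `skipJunk cppComment " //comment\n b)"` leaves `"b)"` for the expression parser. In a real Attoparsec parser, not consuming input on failure is a property the parser must provide itself, which is why the criteria mention `try`.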
| 118 | # Reader Macros | |
| 119 | ||
| 120 | A _reader macro_ is a Lisp macro which is invoked during read time. This | |
| 121 | allows the _lexical_ syntax of a Lisp to be modified. The most commonly | |
| 122 | seen reader macro is the quote, which allows the syntax `'expr` to stand | |
| 123 | in for the S-expression `(quote expr)`. The S-Cargot library enables this | |
| 124 | by keeping a map of characters to Attoparsec parsers that can be used as | |
| 125 | readers. There is a special case for the aforementioned quote, but that | |
| 126 | could easily be written by hand as | |
| 127 | ||
| 128 | ~~~~ | |
| 129 | *Data.SCargot.General> let mySpec = addReader '\'' (fmap go) spec | |
| 130 | where go c = SCons (SAtom "quote") (SCons c SNil) | |
| 131 | *Data.SCargot.General> decode (asRich mySpec) "(1 2 '(3 4))" | |
| 132 | Right [RSList [RSAtom "1",RSAtom "2",RSList [RSAtom "quote",RSList [RSAtom "3",RSAtom "4"]]]] | |
| 133 | ~~~~ | |
| 134 | ||
| 135 | A reader macro is passed the parser that invoked it, so that it can | |
| 136 | perform recursive calls, and can return any `SExpr` it would like. It | |
| 137 | may also take as much or as little of the remaining parse stream as it | |
| 138 | would like; for example, the following reader macro does not bother | |
| 139 | parsing anything else and merely returns a new token: | |
| 140 | ||
| 141 | ~~~~ | |
| 142 | *Data.SCargot.General> decode (addReader '?' (const (pure (SAtom "huh"))) mySpec) "(?1 2)" | |
| 143 | Right [SCons (SAtom "huh") (SCons (SAtom "1") (SCons (SAtom "2") SNil))] | |
| 144 | ~~~~ | |
| 145 | ||
| 146 | Reader macros in S-Cargot can be used to define common bits of Lisp | |
| 147 | syntax that are not typically considered the purview of S-expression | |
| 148 | parsers. For example, to allow square brackets as a substitute for | |
| 149 | proper lists, we could define a reader macro that is triggered by the | |
| 150 | `[` character and repeatedly calls the parser until a `]` character | |
| 151 | is reached: | |
| 152 | ||
| 153 | ~~~~ | |
| 154 | *Data.SCargot.General> let pVec p = (char ']' *> pure SNil) <|> (SCons <$> p <*> pVec p) | |
| 155 | *Data.SCargot.General> let vec = addReader '[' pVec | |
| 156 | *Data.SCargot.General> decode (asRich (vec mySpec)) "(1 [2 3])" | |
| 157 | Right [RSList [RSAtom "1",RSList [RSAtom "2",RSAtom "3"]]] | |
| 158 | ~~~~ | |
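
To make the dispatch mechanism concrete, here is a small self-contained model of a character-indexed reader table in which each reader receives the expression parser that invoked it. It is a sketch under simplifying assumptions: the parser is a hand-rolled `String -> Maybe` function rather than Attoparsec, the table is an association list rather than a map, and the names `Parser`, `Reader`, `ReaderTable`, `expr`, `quoteReader`, `vecReader`, and `demoTable` are hypothetical and not part of S-Cargot.

```haskell
import Data.Char (isSpace)

data SExpr = SCons SExpr SExpr | SNil | SAtom String
  deriving (Eq, Show)

-- | A toy parser: consume a prefix of the input or fail with Nothing.
type Parser a = String -> Maybe (a, String)

-- | A reader macro is handed the expression parser that invoked it,
-- so it can parse sub-expressions recursively.
type Reader = Parser SExpr -> Parser SExpr

type ReaderTable = [(Char, Reader)]

-- | Parse one expression. Reader characters are checked before atoms.
expr :: ReaderTable -> Parser SExpr
expr table input =
  case dropWhile isSpace input of
    '(' : rest -> list rest
    c : rest
      | Just reader <- lookup c table -> reader (expr table) rest
      | isAtomChar c ->
          let (name, rest') = span isAtomChar (c : rest)
          in Just (SAtom name, rest')
    _ -> Nothing
  where
    -- Parse list elements up to the closing parenthesis.
    list s = case dropWhile isSpace s of
      ')' : rest -> Just (SNil, rest)
      s' -> do
        (x, rest) <- expr table s'
        (xs, rest') <- list rest
        Just (SCons x xs, rest')

isAtomChar :: Char -> Bool
isAtomChar c = not (isSpace c) && c `notElem` "()[]'"

-- | 'x reads as (quote x).
quoteReader :: Reader
quoteReader p input = do
  (x, rest) <- p input
  Just (SCons (SAtom "quote") (SCons x SNil), rest)

-- | [a b c] reads as the proper list (a b c).
vecReader :: Reader
vecReader p input =
  case dropWhile isSpace input of
    ']' : rest -> Just (SNil, rest)
    s -> do
      (x, rest) <- p s
      (xs, rest') <- vecReader p rest
      Just (SCons x xs, rest')

demoTable :: ReaderTable
demoTable = [('\'', quoteReader), ('[', vecReader)]
```

With this table, `expr demoTable "[2 3]"` and `expr demoTable "(2 3)"` produce the same cons structure, mirroring the square-bracket example above.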