README changes including formatting changes and addition of Atom module docs
Getty Ritter
8 years ago
[![Hackage](https://img.shields.io/hackage/v/s-cargot.svg)](https://hackage.haskell.org/package/s-cargot)

S-Cargot is a library for parsing and emitting S-expressions, designed to be flexible, customizable, and extensible. Different uses of S-expressions often understand subtly different variations on what an S-expression is. The goal of S-Cargot is to create several reusable components that can be repurposed to nearly any S-expression variant.

S-Cargot does _not_ aim to be the fastest or most efficient s-expression library. If you need speed, then it would probably be best to roll your own [AttoParsec](https://hackage.haskell.org/package/attoparsec) parser. Wherever there's a choice, S-Cargot errs on the side of maximum flexibility, which means that it should be easy to plug together components to understand various existing flavors of s-expressions or to extend it in various ways to accommodate new flavors.

## What Are S-Expressions?

S-expressions were originally the data representation format in Lisp implementations, but have found broad uses outside of that as a data representation and storage format. S-expressions are often understood as a representation for binary trees with optional values in the leaf nodes: an empty leaf is represented with empty parens `()`, a non-empty leaf is represented as the scalar value it contains (often tokens like `x` or other programming language literals), and an internal node is represented as `(x . y)` where `x` and `y` are standing in for other s-expressions. In Lisp parlance, an internal node is called a _cons cell_, and the first and second elements inside it are called the _car_ and the _cdr_, for historical reasons. Non-empty leaf nodes are referred to in the s-cargot library as _atoms_.
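
As a concrete picture of that tree, here is a minimal Haskell sketch. It is a local definition written for illustration rather than imported from S-Cargot (although it reuses the `SCons`/`SAtom`/`SNil` constructor names that appear in the library examples later in this README), and `car`, `cdr`, and `pairXY` are hypothetical helpers. `car` and `cdr` return `Maybe` because only a cons cell has a car and a cdr:

~~~~.haskell
-- A local sketch of a cons-based s-expression tree (illustrative only;
-- not imported from s-cargot).
data SExpr atom
  = SCons (SExpr atom) (SExpr atom) -- internal node / cons cell: (x . y)
  | SAtom atom                      -- non-empty leaf: an atom such as x
  | SNil                            -- empty leaf: ()
  deriving (Eq, Show)

-- The car is the first (left) element of a cons cell.
car :: SExpr atom -> Maybe (SExpr atom)
car (SCons x _) = Just x
car _           = Nothing

-- The cdr is the second (right) element of a cons cell.
cdr :: SExpr atom -> Maybe (SExpr atom)
cdr (SCons _ y) = Just y
cdr _           = Nothing

-- The s-expression (x . y)
pairXY :: SExpr String
pairXY = SCons (SAtom "x") (SAtom "y")
~~~~

Here `car pairXY` is `Just (SAtom "x")` and `cdr pairXY` is `Just (SAtom "y")`, while asking for the car of an atom or of `()` gives `Nothing`.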

Often, s-expressions are used to represent lists, in which case the list is treated as a right-branching tree with an empty leaf as the far right child of the tree. S-expression languages have a shorthand way of representing these lists: instead of writing successively nested pairs, as in `(1 . (2 . (3 . ())))`, they allow the sugar `(1 2 3)`. This is the most common way of writing s-expressions, even in languages that allow raw cons cells (or "dotted pairs") to be written.
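
The correspondence between the two notations can be sketched as a pair of printers over the same tree. Again, this uses a local illustrative type rather than S-Cargot's, and `fromList`, `showDotted`, and `showSugared` are hypothetical names:

~~~~.haskell
-- Local illustrative tree type (not s-cargot's definition).
data SExpr atom = SCons (SExpr atom) (SExpr atom) | SAtom atom | SNil

-- Build a list as a right-branching chain of cons cells ending in ().
fromList :: [atom] -> SExpr atom
fromList = foldr (SCons . SAtom) SNil

-- Print every cons cell explicitly as a dotted pair.
showDotted :: SExpr String -> String
showDotted (SAtom a)   = a
showDotted SNil        = "()"
showDotted (SCons x y) = "(" ++ showDotted x ++ " . " ++ showDotted y ++ ")"

-- Print using list sugar, falling back to " . " only when a chain
-- ends in an atom instead of ().
showSugared :: SExpr String -> String
showSugared (SAtom a)    = a
showSugared SNil         = "()"
showSugared (SCons x xs) = "(" ++ showSugared x ++ rest xs
  where
    rest SNil         = ")"
    rest (SCons y ys) = " " ++ showSugared y ++ rest ys
    rest (SAtom a)    = " . " ++ a ++ ")"
~~~~

With this sketch, `showDotted (fromList ["1","2","3"])` yields `(1 . (2 . (3 . ())))` while `showSugared` on the same tree yields `(1 2 3)`. A chain that ends in an atom, such as `SCons (SAtom "1") (SAtom "2")`, can only be printed in dotted form, as `(1 . 2)`.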

The s-cargot library refers to expressions where every right-branching sequence ends in an empty leaf as _well-formed s-expressions_. Note that any s-expression which can be written without using a dotted pair is necessarily well-formed.
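
That definition translates directly into a recursive check. The sketch below uses a local stand-in for the cons-based type, and `isWellFormed` is a hypothetical name rather than a library function:

~~~~.haskell
-- Local illustrative tree type (not s-cargot's definition).
data SExpr atom = SCons (SExpr atom) (SExpr atom) | SAtom atom | SNil

-- True when every right-branching chain of cons cells ends in SNil.
isWellFormed :: SExpr atom -> Bool
isWellFormed (SAtom _)    = True
isWellFormed SNil         = True
isWellFormed (SCons x xs) = isWellFormed x && spine xs
  where
    -- follow the cdr chain; each car must itself be well-formed
    spine SNil         = True
    spine (SAtom _)    = False -- an atom in cdr position, as in (a . b)
    spine (SCons y ys) = isWellFormed y && spine ys
~~~~

For example, `(a b)` passes, while `(a . b)` and `((a . b) c)` fail, because each of them contains a right-branching chain that ends in an atom.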

Unfortunately, while they are in common use, s-expressions do not have a single formal standard. They are often defined in an ad-hoc way, which means that s-expressions used in different contexts will, despite sharing a common parentheses-delimited structure, differ in various respects. Additionally, because s-expressions are used as the concrete syntax for languages of the Lisp family, they often have conveniences (such as comment syntaxes) and other bits of syntactic sugar (such as _reader macros_, which are described more fully later) that make parsing them much more complicated. Even ignoring those features, the _atoms_ recognized by a given s-expression variation can differ widely.

The s-cargot library was designed to accommodate several different kinds of s-expression formats, so that an s-expression format can be easily expressed as a combination of existing features. It includes a few basic variations on s-expression languages as well as the tools for parsing and emitting more elaborate s-expression variations without having to reimplement the basic plumbing yourself.

## Using the Library

The central way of interacting with the S-Cargot library is by creating and modifying datatypes which represent specifications for parsing and printing s-expressions. Each of those types has two type parameters, which are often called `atom` and `carrier`:

~~~~
+------ the type that represents an atom or value
printer :: SExprPrinter atom carrier
~~~~

Various functions are provided that modify the carrier type (i.e. the output type of parsing or input type of serialization) or the language recognized by the parser.

## Representing S-expressions

There are three built-in representations of S-expression lists: two of them are isomorphic, as one or the other might be convenient for working with S-expression data in a particular circumstance, while the third represents only the "well-formed" subset of possible S-expressions, which is often convenient when using s-expressions for configuration or data storage.

~~~~.haskell
-- cons-based representation
  | WFSAtom atom
~~~~

The `WellFormedSExpr` representation should be structurally identical to the `RichSExpr` representation in all cases where no improper lists appear in the source. Both of those representations are often more convenient than writing multiple nested `SCons` constructors, in the same way that the `[1,2,3]` syntax in Haskell is often less tedious than writing `1:2:3:[]`.

Functions for converting back and forth between representations are provided, but you can also modify a `SExprSpec` to parse to or serialize from a particular representation using the `asRich` and `asWellFormed` functions.

~~~~.haskell
>>> decode basicParser "(a b)"
Left "Found atom in cdr position"
~~~~
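
Conceptually, converting from the cons-based representation to the rich one just walks each cdr chain and notes whether it ends in `()` or in an atom. The following is a minimal sketch of that idea with local stand-in types, not the library's implementation. The `RSDotted` shape (the leading elements plus the final atom) is an assumption about how improper lists are modeled, and `toRich`/`fromRich` here are local definitions:

~~~~.haskell
-- Local stand-ins for the cons-based and list-based representations.
-- The RSDotted shape (leading elements plus a final atom) is an assumption.
data SExpr atom = SCons (SExpr atom) (SExpr atom) | SAtom atom | SNil
  deriving (Eq, Show)

data RichSExpr atom
  = RSList [RichSExpr atom]        -- a proper list: (a b c)
  | RSDotted [RichSExpr atom] atom -- an improper list: (a b . c)
  | RSAtom atom
  deriving (Eq, Show)

-- Walk each cdr chain, collecting elements until it ends in SNil
-- (a proper list) or in an atom (a dotted list).
toRich :: SExpr atom -> RichSExpr atom
toRich (SAtom a)    = RSAtom a
toRich SNil         = RSList []
toRich (SCons x xs) = go [toRich x] xs
  where
    go acc SNil         = RSList (reverse acc)
    go acc (SAtom a)    = RSDotted (reverse acc) a
    go acc (SCons y ys) = go (toRich y : acc) ys

-- Rebuild the nested cons cells: a proper list ends in SNil and a
-- dotted list ends in its final atom.
fromRich :: RichSExpr atom -> SExpr atom
fromRich (RSAtom a)      = SAtom a
fromRich (RSList xs)     = foldr (SCons . fromRich) SNil xs
fromRich (RSDotted xs a) = foldr (SCons . fromRich) (SAtom a) xs
~~~~

A conversion to the well-formed representation would work the same way, except that reaching an atom in cdr position becomes an error, much like the `Left "Found atom in cdr position"` result above.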

These names and patterns can be quite long, especially when you're constructing or matching on S-expression representations in Haskell source, so S-Cargot also exports several pattern synonyms that can be used both as expressions and in pattern-matching. These are each contained in their own module, as their names conflict with each other, so it's recommended to only import the module corresponding to the type that you plan on working with:

~~~~.haskell
>>> import Data.SCargot.Repr.Basic
9
~~~~

If you are using GHC 7.10 or later, several of these will be powerful bidirectional pattern synonyms that allow both constructing and pattern-matching on s-expressions in non-trivial ways:

~~~~.haskell
>>> import Data.SCargot.Repr.Basic
~~~~

## Atom Types

Any type can serve as an underlying atom type in an S-expression parser or serializer, provided that it has a Parsec parser (for parsing) or a serializer, i.e. a way of turning it into `Text` (for printing). For these examples, I'm going to use a very simple atom type that is roughly like the one found in `Data.SCargot.Basic`, which parses symbolic tokens of letters, numbers, and some punctuation characters directly into `Text`. This means that the 'serializer' here is just the identity function which returns the relevant `Text` value:

~~~~.haskell
parser :: SExprParser Text (SExpr Text)
printer = flatPrint id
~~~~

A more elaborate atom type might distinguish between different varieties of token. A small example (that understands just alphabetic identifiers and decimal numbers) would look like this:

~~~~.haskell
import Data.Text (Text, pack)
myPrinter = flatPrint sAtom
~~~~
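
The same split between recognizing a token and printing it back can be seen in a dependency-free sketch that works on one whole token at a time instead of using Parsec. `MyAtom`, `classify`, and `render` are hypothetical names, and this only approximates what a Parsec-based atom parser and an `sAtom`-style serializer would do:

~~~~.haskell
import Data.Char (isAlpha, isDigit)

-- Hypothetical atom type: alphabetic identifiers or decimal numbers.
data MyAtom = Ident String | Num Integer
  deriving (Eq, Show)

-- Recognize one whole token as an atom, or reject it.
classify :: String -> Maybe MyAtom
classify tok
  | not (null tok) && all isAlpha tok = Just (Ident tok)
  | not (null tok) && all isDigit tok = Just (Num (read tok))
  | otherwise                         = Nothing

-- Serialize an atom back to its textual form.
render :: MyAtom -> String
render (Ident s) = s
render (Num n)   = show n
~~~~

Note that printing normalizes numbers: `classify "007"` yields `Num 7`, which renders back as `7`.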

We can then use this newly created atom type within an S-expression for both parsing and serialization:

~~~~.haskell
>>> decode myParser "(foo 1)"
"(0 bar)"
~~~~

Several common atom types appear in the module [`Data.SCargot.Common`](https://hackage.haskell.org/package/s-cargot-0.1.0.0/docs/Data-SCargot-Common.html), including various kinds of identifiers and number literals. The long-term plan for S-Cargot is to include more and more kinds of built-in atoms, in order to make putting together an S-Expression parser even easier. If you have a common syntax for an atom type that you think should be represented there, please [suggest it in an issue](https://github.com/aisamanra/s-cargot/issues)!

To make it easier to build up parsers for atom types without having to use Parsec manually, S-Cargot also exports `Data.SCargot.Atom`, which provides a shorthand way of building up a `SExprParser` from a list of parser-constructor pairs:

~~~~.haskell
import Data.SCargot (SExprParser)
import Data.SCargot.Atom (atom, mkParserFromAtoms)
import Data.SCargot.Common (parseR7RSIdent, signedDecNumber)
import Data.SCargot.Repr (SExpr)
import Data.Text (Text)

-- we want our atom type to understand R7RS identifiers and
-- signed decimal numbers
data Atom
  = Ident Text
  | Num Integer
  deriving (Eq, Show)

myParser :: SExprParser Atom (SExpr Atom)
myParser = mkParserFromAtoms
  [ atom Ident parseR7RSIdent
  , atom Num signedDecNumber
  ]
~~~~

## Carrier Types

As pointed out above, there are three different "carrier" types that are used to represent S-expressions by the library, but you can use any type as a carrier type for a spec. This is particularly useful when you want to parse into your own custom tree-like type. For example, if we wanted to parse a small S-expression-based arithmetic language, we could define a data type and transformations from and to an S-expression type:

~~~~.haskell
import Data.Char (isDigit)
fromExpr (Num n) = A (T.pack (show n))
~~~~

Then we could use the `setCarrier` function to add this directly to the parser:

~~~~.haskell
>>> let parser' = setCarrier toExpr (asRich myParser)
~~~~

## Comments

By default, an S-expression parser does not include a comment syntax, but the provided `withLispComments` function will cause it to understand traditional Lisp line-oriented comments that begin with a semicolon:

~~~~.haskell
>>> decode basicParser "(this ; has a comment\n inside)\n"
Right [SCons (SAtom "this") (SCons (SAtom "inside") SNil)]
~~~~

Additionally, you can provide your own comment syntax in the form of a Parsec parser. Any Parsec parser can be used, so long as it meets the following criteria:
- it is capable of failing (as it is called repeatedly until S-Cargot determines that there are no more comments)
- it does not consume any input in the case of failure, which may involve wrapping the parser in a call to `try`
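
The reason for both requirements is that the comment parser is invoked in a loop: it keeps being called as long as it succeeds, so it must eventually fail, and a failure must leave the input untouched so that ordinary parsing can resume. The following Parsec-free sketch models that loop over a plain `String`, representing a comment parser as a function that returns `Nothing` on failure. `CommentParser`, `lineCommentSketch`, and `skipComments` are hypothetical names, not part of S-Cargot:

~~~~.haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- A comment parser either consumes one comment and returns the remaining
-- input, or fails with Nothing. Because the input is only returned on
-- success, a failure can never consume anything.
type CommentParser = String -> Maybe String

-- Hypothetical helper: a comment runs from the marker to the end of the line.
-- An empty marker is rejected, since it would "succeed" forever on "".
lineCommentSketch :: String -> CommentParser
lineCommentSketch marker s
  | not (null marker) && marker `isPrefixOf` s =
      Just (dropWhile (/= '\n') (drop (length marker) s))
  | otherwise = Nothing

-- Skip whitespace and comments, calling the comment parser repeatedly
-- until it fails. A parser that could never fail would loop forever here.
skipComments :: CommentParser -> String -> String
skipComments comment s =
  let s' = dropWhile isSpace s
  in case comment s' of
       Just rest -> skipComments comment rest
       Nothing   -> s'
~~~~

For example, `skipComments (lineCommentSketch "//") "  // one\n// two\n  b)"` skips both comments and leaves `b)` for the rest of the parser.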

For example, the following adds C++-style comments to an S-expression format:

~~~~.haskell
Right [SCons (SAtom "a") (SCons (SAtom "b") SNil)]
~~~~

The [`Data.SCargot.Comments`](https://hackage.haskell.org/package/s-cargot/docs/Data-SCargot-Comments.html) module defines some helper functions for creating comment syntaxes, so the `cppComment` parser above could be defined as simply

~~~~.haskell
>>> let cppComment = lineComment "//"
Right [SCons (SAtom "a") (SCons (SAtom "b") SNil)]
~~~~

Additionally, a handful of common comment syntaxes are defined in [`Data.SCargot.Comments`](https://hackage.haskell.org/package/s-cargot/docs/Data-SCargot-Comments.html), including C-style, Haskell-style, and generic scripting-language-style comments, so in practice, we could write the above example as

~~~~.haskell
>>> decode (withCLikeLineComments basicParser) "(a //comment\n b)\n"
~~~~

## Reader Macros

In Lisp variants, a _reader macro_ is a macro---a function that operates on syntactic structures---which is invoked during the _scanning_, or lexing, phase of a Lisp parser. This allows the _lexical_ syntax of a Lisp to be modified. A very common reader macro in most Lisp variants is the single quote, which allows the syntax `'expr` to stand as sugar for the literal s-expression `(quote expr)`. The S-Cargot library accommodates this by keeping a map from characters to Haskell functions that can be used analogously to reader macros. This is a common enough special case that there are shorthand ways of writing this, but we could support the `'expr` syntax by creating a Haskell function to turn `expr` into `(quote expr)` and adding that as a reader macro associated with the character `'`:

~~~~.haskell
>>> let quote expr = SCons (SAtom "quote") (SCons expr SNil)
>>> :t quote
quote :: IsString atom => SExpr atom -> SExpr atom
>>> let addQuoteReader = addReader '\'' (\ parse -> fmap quote parse)
>>> :t addQuoteReader
addQuoteReader :: IsString atom => SExprParser atom c -> SExprParser atom c
>>> decode (addQuoteReader basicParser) "'foo"
Right [SCons (SAtom "quote") (SCons (SAtom "foo") SNil)]
~~~~
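
The essential shape of a reader macro, a function that is handed the expression parser so that it can recurse, can be shown without Parsec. The sketch below is a tiny hand-written reader over `String` with a lookup table from characters to reader macros. `Parser`, `ReaderMacro`, `parseExpr`, and `quoteReader` are hypothetical names for illustration and do not correspond to S-Cargot's actual types:

~~~~.haskell
import Data.Char (isSpace)

-- Local cons-based tree with string atoms (illustrative only).
data SExpr = SCons SExpr SExpr | SAtom String | SNil
  deriving (Eq, Show)

-- A parser consumes a prefix of the input and returns the result plus
-- the leftover input, or Nothing on failure.
type Parser = String -> Maybe (SExpr, String)

-- A reader macro receives the full expression parser (so it can parse
-- nested expressions) and the input that follows its trigger character.
type ReaderMacro = Parser -> Parser

-- 'expr  ==>  (quote expr)
quoteReader :: ReaderMacro
quoteReader parse input = do
  (e, rest) <- parse input
  pure (SCons (SAtom "quote") (SCons e SNil), rest)

-- Parse one expression, checking the reader-macro table before the
-- built-in list and atom syntax.
parseExpr :: [(Char, ReaderMacro)] -> Parser
parseExpr readers = expr
  where
    expr s = case dropWhile isSpace s of
      (c : rest) | Just macro <- lookup c readers -> macro expr rest
      ('(' : rest) -> list rest
      other -> atom other

    -- elements of a proper list, up to the closing paren
    list s = case dropWhile isSpace s of
      (')' : rest) -> Just (SNil, rest)
      other -> do
        (x, rest) <- expr other
        (xs, rest') <- list rest
        pure (SCons x xs, rest')

    -- an atom is a maximal run of non-space, non-paren characters
    atom s =
      let (tok, rest) = break (\c -> isSpace c || c `elem` "()") s
      in if null tok then Nothing else Just (SAtom tok, rest)
~~~~

With `[('\'', quoteReader)]` as the table, `parseExpr` turns `'foo` into the same `(quote foo)` structure as the S-Cargot example above, and `(a 'b)` into `(a (quote b))`. With an empty table, `'foo` is simply read as one atom.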

A reader macro is passed an s-expression parser so that it can perform recursive parse calls, and it can return any `SExpr` it would like. It may also take as much or as little of the remaining parse stream as it would like. For example, the following reader macro does not bother parsing anything else and merely returns a new token:

~~~~.haskell
>>> let qmReader = addReader '?' (\ _ -> pure (SAtom "huh"))
Right [SCons (SAtom "huh") (SCons (SAtom "1") (SCons (SAtom "2") SNil))]
~~~~

We can define a similar reader macro directly in Common Lisp, although it's important to note that the Common Lisp reader converts symbol names to uppercase by default, and also that the quote in line `[3]` is necessary so that the Common Lisp REPL doesn't attempt to evaluate `(huh 1 2)` as code:

~~~~.lisp
[1]> (defun qm-reader (stream char) 'huh)
QM-READER
[2]> (set-macro-character #\? #'qm-reader)
T
[3]> '(?1 2)
(HUH 1 2)
~~~~

Reader macros in S-Cargot can be used to define bits of Lisp syntax that are not typically considered the purview of S-expression parsers. For example, some Lisp-derived languages allow square brackets as a substitute for proper lists, and to support this we could define a reader macro that is indicated by the `[` character and repeatedly calls the parser until a `]` character is reached:

~~~~.haskell
>>> let vec p = (char ']' *> pure SNil) <|> (SCons <$> p <*> vec p)
>>> :t vec
vec
  :: Stream s m Char =>
     ParsecT s u m (SExpr atom) -> ParsecT s u m (SExpr atom)
>>> let withVecReader = addReader '[' vec
>>> decode (asRich (withVecReader basicParser)) "(1 [2 3])"
Right [RSList [RSAtom "1",RSList [RSAtom "2",RSAtom "3"]]]
~~~~

## Pretty-Printing and Indentation

The s-cargot library also includes a simple but often adequate pretty-printing system for S-expressions. A printer that prints a single-line s-expression is created with `flatPrint`:

~~~~.haskell
>>> let printer = flatPrint id
(foo bar)
~~~~

A printer that tries to pretty-print an s-expression to fit attractively within an 80-character limit can be created with `basicPrint`:

~~~~.haskell
>>> let printer = basicPrint id
s-expression)
~~~~

A printer created with `basicPrint` will "swing" things that are too long onto the subsequent lines, indenting them a fixed number of spaces. We can modify the number of spaces with `setIndentAmount`:

~~~~.haskell
>>> let printer = setIndentAmount 4 (basicPrint id)
s-expression)
~~~~

We can also modify what counts as the 'maximum width', which for a `basicPrint` printer is 80 by default:

~~~~.haskell
>>> let printer = setMaxWidth 8 (basicPrint id)
three)
~~~~

Or remove the maximum, which will always put the whole s-expression onto one line, regardless of its length:

~~~~.haskell
>>> let printer = removeMaxWidth (basicPrint id)
(this stupendously preposterously supercalifragilisticexpialidociously long s-expression)
~~~~

We can also specify an _indentation strategy_, which decides how to indent subsequent expressions based on the head of a given expression. The default is to always "swing" subsequent expressions to the next line, but we could also specify the `Align` constructor, which will print the first two expressions on the same line and then any subsequent expressions horizontally aligned with the second one, like so:

~~~~.haskell
>>> let printer = setIndentStrategy (\ _ -> Align) (setMaxWidth 8 (basicPrint id))
four)
~~~~

Or we could choose to keep some number of expressions on the same line and afterwards swing the subsequent ones:

~~~~.haskell
>>> let printer = setIndentStrategy (\ _ -> SwingAfter 1) (setMaxWidth 8 (basicPrint id))
four)
~~~~
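
The three strategies can be summarized in one small layout function. This is a simplified sketch rather than S-Cargot's printer: it uses a hypothetical list-only tree type, local `Swing`/`SwingAfter`/`Align` constructors, and a `pretty` function whose arguments loosely mirror `setIndentStrategy`, `setMaxWidth`, and `setIndentAmount`. It assumes that `SwingAfter n` keeps the head plus `n` further expressions on the first line, and it ignores trailing closing parentheses when checking the width:

~~~~.haskell
-- Hypothetical list-only tree type (illustrative; not s-cargot's).
data SExpr = Atom String | List [SExpr]

-- Local mirrors of the strategies described above.
data Indent = Swing | SwingAfter Int | Align

-- Single-line rendering, used both as output and to measure width.
flat :: SExpr -> String
flat (Atom a)  = a
flat (List xs) = "(" ++ unwords (map flat xs) ++ ")"

-- pretty strategy maxWidth indentAmount expr
pretty :: (SExpr -> Indent) -> Int -> Int -> SExpr -> String
pretty strategy width amount = go 0
  where
    -- go col e renders e assuming it starts at column col; anything that
    -- fits on the rest of the line is printed flat
    go col e | col + length (flat e) <= width = flat e
    go _ (Atom a)  = a
    go _ (List []) = "()"
    go col (List (h : rest)) = case strategy h of
      Swing        -> swingAfter col h rest 0
      SwingAfter n -> swingAfter col h rest n
      Align        -> align col h rest

    -- the head and the next n expressions stay on the first line; each
    -- later expression goes on its own line, indented by `amount`
    swingAfter col h rest n =
      let (same, later) = splitAt n rest
          inner = col + amount
          firstLine = unwords (go (col + 1) h : map flat same)
          line x = "\n" ++ replicate inner ' ' ++ go inner x
      in "(" ++ firstLine ++ concatMap line later ++ ")"

    -- the head and first argument share a line; later arguments start
    -- in the same column as the first argument
    align col h rest =
      let hd = go (col + 1) h
          argCol = col + 2 + length hd
          line y = "\n" ++ replicate argCol ' ' ++ go argCol y
      in case rest of
           []       -> "(" ++ hd ++ ")"
           (x : xs) -> "(" ++ hd ++ " " ++ go argCol x ++ concatMap line xs ++ ")"
~~~~

For instance, with a maximum width of 8, `Align` renders `(one two three four)` with `three` and `four` lined up under `two`, while `SwingAfter 1` puts them on their own lines indented by the indent amount. Passing a function that returns `SwingAfter 2` for heads starting with `def` and `Align` otherwise gives a rough version of the `defun` layout described next.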

In many situations, we might want to choose a different indentation strategy based on the first expression within a proper list: for example, Common Lisp source code is often formatted so that, following a `defun` token, the function name and arguments are on the same line, and then the body of the function is indented a fixed amount. We can express an approximation of that strategy like this:

~~~~.haskell
>>> let strategy (A ident) | "def" `Text.isPrefixOf` ident = SwingAfter 2; strategy _ = Align
~~~~

## Putting It All Together

Here is a final example which implements a limited arithmetic language with Haskell-style line comments and a special reader macro to understand hex literals:

~~~~.haskell
{-# LANGUAGE OverloadedStrings #-}
[EOp Add (EOp Mul (ENum 2) (ENum 20)) (ENum 10),EOp Mul (ENum 10) (ENum 10)]
~~~~

Keep in mind that you often won't need to write all this by hand, as you can often use a variety of built-in atom types, reader macros, comment types, and representations, but it's a useful illustration of all the options that are available to you should you need them!