gdritter repos when-computer / 2c0c184
Some grammar problems Getty Ritter 7 years ago
1 changed file(s) with 1 addition(s) and 1 deletion(s).
3232
3333 Structural regular expressions build on a similar but non-identical command language, but the \em{first} deficiency identified in traditional Unix regexp-ey tools was that they were \em{necessarily} line-oriented. This isn't a feature of the theory of regular languages, but rather a practical API choice for Unix programs, which often deal with newline-delimited text files. While practical for some applications, this does create a weird edge case for regular expressions where some hopefully-straightforward uses of regular expressions don't suffice: for example, I might want to write a short script to search my prose for accidentally repeated instances of common words like \em{the}: a regex like \tt{/the +the/} would suffice for most cases, but would completely fail to match the string \tt{"the\\nthe"}.
3434
35 Structural regular expressions begin by tossing out line-orientedness: a regular expression like \tt{.*} would match the entire file, newlines and all. The regular expression allow for the escape sequence \tt{ \\n } to represent a newline, so if I wanted to match a single line, I could write the regular expression \tt{.*\\n} to describe it; consequently, I can handle the \tt{"the\\nthe"} case by writing \tt{/the[ \\n]+the/}, and replace all instances of repeated \em{the}—even across newlines—with the command\ref{sam}
35 Structural regular expressions begin by tossing out line-orientedness: a regular expression like \tt{.*} would match the entire file, newlines and all. Structural regular expressions use the escape sequence \tt{ \\n } to represent a newline, so if I wanted to match a single line, I could use the regular expression \tt{.*\\n} to describe it. Consequently, I can handle the \tt{"the\\nthe"} case by writing \tt{/the[ \\n]+the/}, and replace all instances of repeated \em{the}—even across newlines—with the command\ref{sam}
3636 \sidenote{I'm marking these snippets with \link{http://doc.cat-v.org/plan_9/4th_edition/papers/sam/|\tt{sam}}, which is the \tt{ed}- and \tt{ex}-inspired stream editor that appeared in Plan 9. There's a bit more complexity to actually using \tt{sam} which I'm eliding for the sake of explanation.}
3737
3838 \code{\ttcom{(sam)} \ttkw{s}/the[ \\n]+the/the/g}
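
The difference between the two patterns discussed above can be checked outside of \tt{sam}. What follows is only a rough sketch in Python, whose \tt{re} module stands in for \tt{sam}'s regular expressions; the sample text and the variable names are made up for illustration, and nothing here is \tt{sam}'s actual interface.

```python
import re

# Hypothetical sample prose: one repeat on a single line, one split across a newline.
text = "I saw the the cat, and then the\nthe dog."

# Spaces only, like /the +the/: cannot see a repeat that spans a line break.
same_line = re.compile(r"the +the")

# Spaces or newlines, like /the[ \n]+the/: also catches the split repeat.
across_lines = re.compile(r"the[ \n]+the")

same_line_hits = same_line.findall(text)
across_hits = across_lines.findall(text)

# Rough analogue of s/the[ \n]+the/the/g: collapse every repeat to a single word.
fixed = across_lines.sub("the", text)

print(same_line_hits)
print(across_hits)
print(fixed)
```

Neither pattern anchors on word boundaries, so, like the command above, this sketch would also rewrite a phrase such as "bathe then"; a more careful version might wrap each \tt{the} in \tt{\\b}.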