Commit 2c0c184f6714db0fe51ece18c81811d17c22ec30 - when-computer - gdritter repos

Some grammar problems Getty Ritter 9 years ago

1 changed file(s) with 1 addition(s) and 1 deletion(s).

drafts/structural-res.telml

32	32
33	33	Structural regular expressions build on a similar but non-identical command language, but the \em{first} deficiency identified in traditional Unix regexp-ey tools was that they were \em{necessarily} line-oriented. This isn't a feature of the theory of regular languages, but rather a practical API choice for Unix programs, which often deal with newline-delimited text files. While practical for some applications, this does create a weird edge case for regular expressions where some hopefully-straightforward uses of regular expressions don't suffice: for example, I might want to write a short script to search my prose for accidentally repeated instances of common words like \em{the}: a regex like \tt{/the +the/} would suffice for most cases, but would completely fail to match the string \tt{"the\\nthe"}.
34	34
35		Structural regular expressions begin by tossing out line-orientedness: a regular expression like \tt{.*} would match the entire file, newlines and all. ~~The regular expression allow for the escape sequence \tt{ \\n } to represent a newline, so if I wanted to match a single line, I could write the regular expression \tt{.*\\n} to describe it; c~~onsequently, I can handle the \tt{"the\\nthe"} case by writing \tt{/the[ \\n]+the/}, and replace all instances of repeated \em{the}—even across newlines—with the command\ref{sam}
	35	Structural regular expressions begin by tossing out line-orientedness: a regular expression like \tt{.*} would match the entire file, newlines and all. Structural regular expressions use the escape sequence \tt{ \\n } to represent a newline, so if I wanted to match a single line, I could use the regular expression \tt{.*\\n} to describe it. Consequently, I can handle the \tt{"the\\nthe"} case by writing \tt{/the[ \\n]+the/}, and replace all instances of repeated \em{the}—even across newlines—with the command\ref{sam}
36	36	\sidenote{I'm marking these snippets with \link{http://doc.cat-v.org/plan_9/4th_edition/papers/sam/\|\tt{sam}}, which is the \tt{ed}- and \tt{ex}-inspired stream editor that appeared in Plan 9. There's a bit more complexity to actually using \tt{sam} which I'm eliding for the sake of explanation.}
37	37
38	38	\code{\ttcom{(sam)} \ttkw{s}/the[ \\n]+the/the/g}
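
The difference between the two patterns in the draft can be checked outside of sam. The following is a rough sketch, assuming Python's re module as a stand-in for sam's regexp engine (the two are not identical); the sample text and the collapse helper are hypothetical and do not come from the draft.

```python
import re

# Hypothetical sample text: one repeat on a single line, one split across a newline.
text = "I saw the the cat on the\nthe mat."

# The line-oriented pattern from the draft: only spaces may separate the repeats.
space_only = re.compile(r"the +the")

# The newline-aware pattern from the draft: spaces or newlines may separate them.
space_or_newline = re.compile(r"the[ \n]+the")


def collapse(pattern, s):
    """Replace every match of pattern in s with a single 'the'."""
    return pattern.sub("the", s)


# Assumption: re.sub over the whole string roughly mirrors sam's s/.../the/g.
space_only_result = collapse(space_only, text)
newline_aware_result = collapse(space_or_newline, text)

print(repr(space_only_result))
print(repr(newline_aware_result))
```

With this input, the space-only pattern fixes the same-line repeat but leaves the repeat that straddles the newline, while the newline-aware pattern collapses both.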