\documentclass[twocolumn]{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[cm]{fullpage}
\usepackage[usennames,dvipsnames]{xcolor}
\usepackage{listings}
\lstdefinelanguage{latka}
  { morekeywords={puts,import,data,let,in,as,fixed}
  , sensitive=false
  , morecomment=[l]{--}
  , morestring=[b]"
  }
\lstset{language=latka}
\lstdefinestyle{lkstyle}
  { basicstyle=\ttfamily\small
  , commentstyle=\color{Gray}
  , keywordstyle=\color{MidnightBlue}
  , stringstyle=\color{ForestGreen}
  , showstringspaces=false
  }
\lstset{style=lkstyle}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use upquote if available, for straight quotes in lstlisting environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[utf8]{inputenc}
\else % if luatex or xelatex
  \usepackage{fontspec}
  \ifxetex
    \usepackage{xltxtra,xunicode}
  \fi
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \newcommand{\euro}{€}
\fi
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex]{hyperref}
\else
  \usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={Getty D. Ritter},
            pdftitle={Latka: A Language For Random Text Generation},
            colorlinks=true,
            urlcolor=blue,
            linkcolor=magenta,
            pdfborder={0 0 0}}
\urlstyle{same}  % don't use monospace font for urls
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
\setcounter{secnumdepth}{0}

\title{Latka: A Language For Random Text Generation}
\author{Getty D. Ritter}
\date{}

\begin{document}
\maketitle

Latka is a total, strongly typed functional programming language for
generating random text according to predefined patterns. To this end,
Latka incorporates weighted random choice as an effect in its expression
language and provides a set of combinators for constructing high-level
semantic constructs such as sentences and paragraphs. The eventual
purpose of Latka is to produce code that can be embedded into other
systems such as video games.

The primary operators of the expression language are concatenation,
written by juxtaposing two expressions (e.g., \texttt{e1 e2}), and
choice, written with a vertical bar to mimic BNF notation (e.g.,
\texttt{e1 \textbar{} e2}). Weighted choice,
\texttt{n: e1 \textbar{} m: e2} (where \texttt{n} and \texttt{m} are
\texttt{Nat}s), is provided as convenient syntactic sugar. Repetition is
another piece of syntactic sugar: \texttt{n @ e} (where \texttt{n} is an
expression of type \texttt{Nat}) stands for \texttt{n} repetitions of
\texttt{e}. A simple program can be built out of just these
primitives:

\begin{lstlisting}
consonant, vowel, syllable : String
consonant = "p" | "t" | "k" | "w"
          | "h" | "m" | "n"
vowel     = "a" | "e" | "i" | "o" | "u"
syllable  = let v = 5: vowel | vowel "'" in
              5: consonant v | 2: v

-- e.g., pate'hai, aku, e'epoto'
puts (2 | 3 | 4 | 5) @ syllable
\end{lstlisting}
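To make the semantics of these operators concrete, the following Python
sketch models an expression as a function from a random number generator
to a value. It is an illustration only, not Latka's implementation, and
it assumes that \texttt{n: e1 \textbar{} m: e2} selects \texttt{e1} with
probability $n/(n+m)$ and that an alternative without an explicit weight
has weight 1. The helper names \texttt{lit}, \texttt{cat},
\texttt{choice}, and \texttt{rep} are hypothetical.

\begin{lstlisting}[language=Python]
import random

# Illustrative model only: an expression is a
# function from an RNG to a value.

def lit(x):
    # A literal always evaluates to itself.
    return lambda rng: x

def cat(*es):
    # Juxtaposition: evaluate each part in order
    # and concatenate the resulting strings.
    return lambda rng: "".join(
        e(rng) for e in es)

def choice(*alts):
    # Each alternative is an expression or a
    # (weight, expression) pair; bare ones get
    # weight 1 (an assumption).
    pairs = [a if isinstance(a, tuple)
             else (1, a) for a in alts]
    weights = [w for w, _ in pairs]
    exprs = [e for _, e in pairs]
    def run(rng):
        e = rng.choices(exprs, weights)[0]
        return e(rng)
    return run

def rep(n, e):
    # n @ e: the count is itself an expression,
    # and e is re-evaluated for every repetition.
    return lambda rng: "".join(
        e(rng) for _ in range(n(rng)))

consonant = choice(*map(lit, "ptkwhmn"))
vowel = choice(*map(lit, "aeiou"))
v = choice((5, vowel), cat(vowel, lit("'")))
syllable = choice((5, cat(consonant, v)),
                  (2, v))
word = rep(choice(*map(lit, [2, 3, 4, 5])),
           syllable)

rng = random.Random(0)
print([word(rng) for _ in range(3)])
\end{lstlisting}

Under these assumptions, a syllable begins with a consonant with
probability $5/7$, and each vowel is followed by an apostrophe with
probability $1/6$. Because \texttt{v} is an expression rather than a
stored value, each use of it makes a fresh choice, matching the default
evaluation order described next.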

By default, evaluation is call-by-name, so each use of a name
re-evaluates its definition and may therefore produce a different value.
Prefixing a binding with the keyword \texttt{fixed} selects
call-by-value evaluation instead: the value is chosen once, and every
use of the name within its scope refers to that same value:

\begin{lstlisting}
-- Can evaluate to aa, ab, ba, or bb
puts let x = "a" | "b" in x x

-- Can only ever evaluate to aa or bb
puts let fixed x = "a" | "b" in x x
\end{lstlisting}
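The difference between the two evaluation orders can be modeled in
Python with thunks, i.e., zero-argument functions. In this sketch, which
illustrates the semantics rather than Latka's implementation, a
call-by-name binding passes the unevaluated thunk to the body, so each
use of \texttt{x} makes a fresh choice, whereas a \texttt{fixed} binding
evaluates the thunk once and reuses its result. The function names are
hypothetical.

\begin{lstlisting}[language=Python]
import random

# Illustrative model only: a binding is a thunk
# (a zero-argument function) passed to the body.

def a_or_b(rng):
    # The expression "a" | "b" as a thunk.
    return lambda: rng.choice(["a", "b"])

def let_by_name(thunk, body):
    # Default: each use of the name re-runs the
    # thunk, so each use may choose differently.
    return body(thunk)

def let_fixed(thunk, body):
    # `fixed`: evaluate once, then give the body
    # a thunk that always returns that value.
    value = thunk()
    return body(lambda: value)

def x_x(x):
    # The body  x x
    return x() + x()

rng = random.Random(42)
by_name = {let_by_name(a_or_b(rng), x_x)
           for _ in range(200)}
fixed = {let_fixed(a_or_b(rng), x_x)
         for _ in range(200)}
print(sorted(by_name), sorted(fixed))
\end{lstlisting}

Here \texttt{fixed} can only ever contain \texttt{aa} or \texttt{bb},
while \texttt{by\_name} may also contain \texttt{ab} and \texttt{ba}.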

Like other strongly typed functional languages such as Haskell or OCaml,
Latka provides named sums as its datatype mechanism, pattern matching,
and typeclasses for ad-hoc polymorphism. As in Coq or Agda, recursive
functions are restricted to structural recursion, which ensures that
embedded Latka programs terminate. Latka also includes a set of
functions for structuring larger blocks of text. These rely on a feature
of the type system that allows for variadic polymorphism. In the example
below, \texttt{sent} is one such function: it takes a tuple of any
length whose elements can be coerced to sentence fragments and converts
it into a correctly capitalized and punctuated sentence.\footnote{Latka's
function invocation syntax is an infixed \texttt{.} operator, borrowed
from the notation Edsger W. Dijkstra describes in his note EWD1300. This
operator has the highest binding power and is left-associative, so that
\texttt{f.x.y == (f.x).y}.}

\begin{lstlisting}
samp : String * Word * Sentence
samp = ("one",wd."two",sent.("three","four"))

-- prints "One two three four."
puts sent.samp
\end{lstlisting}
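One way to picture the behavior of \texttt{sent} is the following Python
sketch, which treats a sentence as a list of unpunctuated fragments and
adds capitalization and the final period only when it is rendered; this
is why the nested sentence in \texttt{samp} contributes plain words. The
classes \texttt{Word} and \texttt{Sentence} mirror the Latka types by
name only. The coercion rules and the \texttt{coerce} and
\texttt{render} helpers are assumptions, and Python's arbitrary-length
tuples stand in for Latka's variadic polymorphism.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass

# Illustrative model only, not Latka's actual
# coercion rules.

@dataclass
class Word:
    text: str

@dataclass
class Sentence:
    parts: list  # unpunctuated fragments

def wd(s):
    return Word(s)

def coerce(x):
    # Turn one tuple element into fragments.
    if isinstance(x, str):
        return [x]
    if isinstance(x, Word):
        return [x.text]
    if isinstance(x, Sentence):
        # A nested sentence contributes its
        # words, not its capital or its period.
        return list(x.parts)
    raise TypeError(f"bad fragment: {x!r}")

def sent(items):
    # Accepts a tuple of any length.
    parts = []
    for item in items:
        parts.extend(coerce(item))
    return Sentence(parts)

def render(s):
    # Capitalize and punctuate at output time.
    text = " ".join(s.parts)
    return text[:1].upper() + text[1:] + "."

samp = ("one", wd("two"),
        sent(("three", "four")))
# prints: One two three four.
print(render(sent(samp)))
\end{lstlisting}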

The following program combines several of these features.

\begin{lstlisting}
import Language.Natural.Latin as Latin

data Gender = Male | Female

pronoun : Gender -> Word
pronoun . Male   = wd."he"
pronoun . Female = wd."she"

noun : Gender -> Word
noun . Male   = wd."man"
noun . Female = wd."woman"

puts let fixed g = Male | Female in
  para.( sent.( "You see a Roman"
              , noun.g
              , "from"
              , proper_noun.(Latin/cityName)
              )
       , sent.( pronoun.g
              , "has"
              , ("brown"|"black"|"blonde")
              , "hair and carries"
              , range.50.500
              , "denarii"
              )
       )
-- It might print, for example, "You see a
-- Roman woman from Arucapa. She has black
-- hair and carries 433 denarii."
\end{lstlisting}
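The same program can be approximated in Python to show why \texttt{g}
is bound with \texttt{fixed}: the gender is chosen once, so the noun in
the first sentence and the pronoun in the second always agree. In this
sketch the \texttt{CITIES} list is a hypothetical placeholder for
\texttt{Latin/cityName}, \texttt{range.50.500} is assumed to include
both bounds, and the simplified \texttt{sent} takes its fragments as
Python varargs.

\begin{lstlisting}[language=Python]
import random

# Illustrative model only. CITIES stands in for
# Latin/cityName (hypothetical placeholder).
CITIES = ["Arucapa", "Capua", "Ostia"]
PRONOUN = {"Male": "he", "Female": "she"}
NOUN = {"Male": "man", "Female": "woman"}

def sent(*parts):
    # Join fragments, capitalize, add a period.
    text = " ".join(str(p) for p in parts)
    return text[:1].upper() + text[1:] + "."

def para(*sentences):
    return " ".join(sentences)

def roman(rng):
    # `let fixed g`: choose the gender once and
    # reuse it, so noun and pronoun agree.
    g = rng.choice(["Male", "Female"])
    hair = rng.choice(["brown", "black",
                       "blonde"])
    return para(
        sent("You see a Roman", NOUN[g],
             "from", rng.choice(CITIES)),
        sent(PRONOUN[g], "has", hair,
             "hair and carries",
             rng.randint(50, 500),
             "denarii"))

print(roman(random.Random(7)))
\end{lstlisting}

Binding \texttt{g} by name instead would correspond to calling
\texttt{rng.choice} separately for the noun and the pronoun, which could
produce a sentence pairing \texttt{woman} with \texttt{he}.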

\end{document}