gdritter repos when-computer / aa90bab
Added algebraic data types post Getty Ritter 8 years ago
2 changed file(s) with 182 addition(s) and 0 deletion(s). Collapse all Expand all
1 \meta{( "early-algebraic-data-types" "early algebraic data types" ("programming"))}
2 Algebraic data types are a very convenient and expressive way
3 of modeling data. You're familiar with them if you've ever
4 used any ML-like language: you have \em{products},
5 which are pairs or records:
6
7 \code{("hello", 7) : String * Int}
8
9 and \em{sums}, which are discriminated unions:
10
11 \code{Left "hello" : String + Int}
12 \code{Right 7 : String + Int}
13
14 Usually, modern ML-like languages provide a mechanism for
15 creating \em{named} sums, where you provide your own names for
16 each discriminant, or \em{constructor}:
17
18 \code{\ttkw{data} \ttcn{IntList} = \ttcn{Cons} (Int * IntList) \| \ttcn{Nil}}
19
20 You also have the so-called \em{unit} or \em{top} type, which
21 is the type with a single possible value, usually written \tt{()},
22 and—at least in the underlying theory—the so-called \em{bottom}
23 type, which has no values. The \em{top} and \em{bottom} types
24 are sometimes written with the symbols \\(\\top\\) and \\(\\bot\\), and sometimes
25 with the numerals \tt{1} and \tt{0}\ref{alg}.
26 \sidenote{
27 This is because
28 \link{http://chris-taylor.github.io/blog/2013/02/10/the-algebra-of-algebraic-data-types/|they act as the identity elements in the algebra of types.}
29 The types \tt{a+0} and \tt{0+a} are both isomorphic to \tt{a}.
30 Likewise, \tt{a*1} and \tt{1*a} are also both isomorphic to \tt{a}.
31 }
32
33 Algebraic data types first appeared as a usable feature in the
34 \link{https://en.wikipedia.org/wiki/Hope_(programming_language)|Hope programming language},
35 which was a very early statically typed functional
36 language. Despite some syntactic weirdness, they worked very
37 much like modern algebraic data types: the operator for product
38 types was written as \tt{#} instead of \tt{*}, and sum constructors
39 were written with \tt{++} in between variants, but
40 \link{https://web.archive.org/web/20140825021150/http://www.soi.city.ac.uk/~ross/Hope/hope_tut/node18.html|Hope type declarations}
41 still look remarkably familiar 45 years later:
42
43 \code{\ttkw{data} \ttcn{numlist} == \ttcn{cons} ( num # numlist ) ++ \ttcn{nil} ;}
44
45 Pattern-matching on those types also looks remarkably similar to
46 modern functional languages, complete with definition-level
47 pattern-matching as found in languages like Haskell:
48
49 \code{\ttkw{dec} \ttcn{sum} : numlist -> num ;
50 --- \ttcn{sum} ( nil ) <= 0 ;
51 --- \ttcn{sum} ( cons ( x, xs ) ) <= x + sum xs ;}
52
53 Algebraic data types also featured in early ML implemenations, but in a
54 very awkward way. The first DEC-10 ML implementation didn't have a
55 mechanism for defining \em{named} sums; instead, you had to use
56 the built-in sum operator \tt{+} with its constructors\ref{syn}:
57 \sidenote{The symbols \tt{*} and \tt{**} in these signatures are
58 type variables, like \tt{'a} in modern versions of ML.}
59
60 \code{\ttcn{inl} : * -> * + **
61 \ttcn{inr} : ** -> * + **}
62
63 You couldn't pattern-match on those values: you had to use the
64 accessor functions \tt{outl} and \tt{outr} instead, which would throw
65 exceptions if you tried to access the wrong value. You could
66 explicitly check the tag with the function \tt{isl}:
67
68 \code{\ttcn{outl} : * + ** -> *
69 \ttcn{outr} : * + ** -> **
70 \ttcn{isl} : * + ** -> bool}
71
72 We could create a data type by giving the algebraic
73 representation of the type in terms of products, sums, and existing
74 types. Declarations used the \tt{\ttkw{abstype}} or
75 \tt{\ttkw{absrectype}} keywords, depending on whether the type was
76 recursive or not. Like Hope, ML used \tt{#} for pairs, and
77 also used the symbol \tt{.} to stand for \tt{unit}:
78
79 \code{\ttkw{absrectype} \ttcn{intlist} = (int # intlist) + . ;;}
80
81 This definition would create a new type as well as two
82 functions, \tt{absintlist} and \tt{repintlist}, with
83 the following types:
84
85 \code{\ttcn{absintlist} : ((int # intlist) + .) -> intlist
86 \ttcn{repintlist} : intlist -> ((int # intlist) + .)}
87
88 Whenever we defined a type \tt{T} as isomorphic to a
89 representation \tt{R}, the language would
90 define the functions \tt{absT : R -> T} and
91 \tt{repT : T -> R}, filling in whatever particular type
92 name and representation we give.
93 Those functions allow us to, for example, construct values
94 of type \tt{intlist} by "wrapping" a value of the
95 representation type:
96
97 \code{\ttkw{val} \ttcn{mylist} = absintlist(inl(1, absintlist(inr ())));;}
98
99 Of course, constructing all our values like that would be
100 a nightmare, so DEC-10 ML programs often used the
101 \tt{with} construct.
102 The \tt{with}
103 keyword is preceeded by a set of declarations and followed
104 by a different set of declarations. The value definitions
105 before the \tt{with} keyword are available to the
106 rest of the declarations in the \tt{with} construct, but
107 only the ones after the \tt{with} keyword are introduced
108 into the surrounding scope. In our case, this allows us
109 to write a type with a set of constructor and accessor
110 functions, but keep the \tt{abs} and \tt{rep} functions
111 hidden:
112
113 \code{\ttkw{absrectype} \ttcn{intlist} = (int # intlist) + .
114 \ttkw{with} \ttcn{cons}(x, xs) = absintlist(inl(x, xs))
115 \ttkw{and} \ttcn{nil} = absintlist(inr())
116 \ttkw{and} \ttcn{head} lst = fst(outl(repintlist lst))
117 \ttkw{and} \ttcn{tail} lst = snd(outl(repintlist lst))
118 \ttkw{and} \ttcn{isempty} lst = not(isl(repintlist lst));;}
119
120 The semantics of \tt{with} ensure that the \tt{absintlist} and
121 \tt{repintlist} functions don't escape the declaration block,
122 and lists can \em{only} be constructed the user-written
123 constructor functions. We can then construct our values much more
124 tersely:
125
126 \code{\ttkw{val} \ttcn{mylist} = cons(1, nil)}
127
128 The successor to DEC-10 ML, VAX ML\ref{vax}, did include the same datatype
129 \sidenote{
130 DEC-10 ML and VAX ML are both collectively referred to as Edinburgh ML.
131 The names come via Luca Cardelli, who wrote VAX ML as a successor to and
132 \link{http://sml-family.org/history/macqueen-lucafest.pdf|improvement on} the original DEC-10 ML.
133 }
134 mechanism as described above,
135 but also included both named sums and named products. Named products
136 corresponded to what are called \em{structs} or \em{records}, but
137 unlike structs in languages like C or Pascal, they didn't need to
138 have their types named or declared, as structural descriptions of
139 the types would suffice. The VAX ML syntax used
140 so-called "decorated parentheses":
141
142 \code{(\| a = "foo"; b = 7 \|) : (\| a: string; b: int \|)}
143
144 The named sum mechanism \em{also} did not need any declarations: the
145 type instead was written as the set of possible constructor names along
146 with their argument types:
147
148 \code{[\| c = "foo" \|] : [\| c: string; d: int \|]
149 [\| d = 7 \|] : [\| c: string; d: int \|]}
150
151 Unlike sums and products in DEC-10 ML, these sums and products
152 \em{could} be picked apart by pattern matching, using a syntax
153 very similar to the syntax used to create the values:
154
155 \code{\ttkw{case} e : [\| c: string; d: int \|] \ttkw{of}
156 [\| c = s . 0;
157 d = n . n + n
158 \|]}
159
160 The effort to create Standard ML started in the early 80's, and
161 it drew on features from both Hope and the existing ML languages. In
162 \link{http://sml-family.org/history/SML-proposal-11-83.pdf|\em{A Proposal for Standard ML}}, Robin Milner explains that:
163
164 \blockquote{
165 …a difficult decision had to be made concerning HOPE's treatment of
166 data types—present only in embryonic form in the original ML—and
167 the labelled records and variants which Cardelli introduced in his
168 VAX version. The latter have definite advantages which the former
169 lack; on the other hand, the HOPE treatment is well-rounded in its
170 own terms. Though a combination of these features is possible, it
171 seemed (at least to me, but some disagreed!) to entail too rich a
172 language for the present proposal. Thus the HOPE treatment is
173 fully adopted here.
174 }
175
176 Consequently, the first versions of SML drew their algebraic data
177 types from Hope's implemenation, abandoning the tedious and difficult
178 DEC-10 ML versions. Later versions did end up drawing on Cardelli's
179 named products to create records, with a change in notation
180 from \tt{(\| ... \|)} to \tt{ \{ ... \} }, and thus we get the
181 algebraic data types we all know and love.
1 ../drafts/dec-types.telml