New Blog Post: "A Very Early History of Algebraic Data Types"
Last week I said that this week's newsletter would be a brief history of algebraic data types.
I was wrong.
That history is now a 3500-word blog post.
I'm speaking at P99 Conf!
My talk, "Designing Low-Latency Systems with TLA+", is happening 10/23 at 11:30 central time. It's an online conf and the talk's only 16 minutes, so come check it out!
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.
This is a misunderstanding of what the word "tagged" means that runs throughout the essay. An untagged union is one where values do not "remember" the variant used to construct them. The most well-known example is C unions:

    union { int i; char *s; } v = { .s = "0" };

I cannot query v to see if it was created with .i or .s. I instead would need to keep track of that separately. If you bundle that "separately" with the data itself, you get a tagged union:

    struct { enum { INT, STR } tag; union { int i; char *s; } data; } v = { .tag = STR, .data = { .s = "0" } };

Now I can query .tag to see which variant it is, then access the correct member of .data based on that. Higher-level languages do this same bundling for you "under the hood" for their tagged unions. (This is similar to how C arrays do not store the length alongside the array, but Java arrays do.)
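For illustration, here is a minimal runnable sketch of that dispatch, with the tagged union wrapped in a hypothetical struct tagged and a print_tagged helper (both names are mine, not from the essay):

    #include <stdio.h>

    /* The tagged union from above: the tag travels with the data. */
    struct tagged {
        enum { INT, STR } tag;
        union { int i; char *s; } data;
    };

    /* Check the tag before touching the union, so we never read
       .data through the wrong member. */
    void print_tagged(struct tagged v) {
        switch (v.tag) {
        case INT: printf("int: %d\n", v.data.i); break;
        case STR: printf("str: %s\n", v.data.s); break;
        }
    }

    int main(void) {
        struct tagged v = { .tag = STR, .data = { .s = "0" } };
        print_tagged(v);   /* prints "str: 0" */
        return 0;
    }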
You used "tag" to refer to the names given to each variant:
i
ands
. But tag actually refers to the fieldtag
. Note that it also doesn't refer to the enum labelsINT
andSTR
themselves either;tag
could have typeint
with values0
and1
instead. "Tag" really refers to the extra storage needed to store the tag value in run-time memory, compared to the lack thereof in the untagged union.The word you're looking for seems to be "labelled", used at least in the first three documents here: https://smlfamily.github.io/history/. These refer to the syntactic names that make things easier for programmers reading the code. A type like
Str + Str
lacks labels but still has an integer tag to distinguishLeft ":("
fromRight ":("
at run-time ((0, ":(")
vs(1, ":(")
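To make the distinction concrete, here is a rough C sketch of what an unlabelled Str + Str could look like in memory; the names sum_str and payload are mine, purely for illustration:

    #include <stdio.h>

    /* An unlabelled Str + Str: no names for the two variants, just an
       integer tag (0 = left, 1 = right) stored alongside the payload. */
    struct sum_str {
        int tag;        /* which injection was used: 0 or 1 */
        char *payload;
    };

    int main(void) {
        struct sum_str left  = { 0, ":(" };   /* (0, ":(") */
        struct sum_str right = { 1, ":(" };   /* (1, ":(") */

        /* The payloads are identical; only the tag tells them apart. */
        printf("%d %s\n", left.tag,  left.payload);
        printf("%d %s\n", right.tag, right.payload);
        return 0;
    }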
Pascal, in your linked paper, has both tagged and untagged unions: "discriminated" and "free" unions, respectively. While Wirth thought that having only tagged unions and no untagged ones would be inflexible, he recommends using tagged unions when possible in the very next sentence (in parentheses).
There was this though (from Milner 1978, bottom-right of page 123/PDF page 5):
    absrectype α list = . + (α × α list)
        with nil = abslist(inl())
        and $.(x,ℓ) = abslist(inr(x,ℓ))
        and null(ℓ) = isl(replist(ℓ))
        and hd(ℓ) = fst(outr(replist(ℓ)))
        and tl(ℓ) = snd(outr(replist(ℓ)))
So you could do:
    absrectype contact = str + str
        with inemail(e) = abscontact(inl(e))
        and inaddress(a) = abscontact(inr(a))
        and isemail(c) = isl(repcontact(c))
        and isaddress(c) = isr(repcontact(c))
        and outemail(c) = outl(repcontact(c))
        and outaddress(c) = outr(repcontact(c))
The built-in sum type only supported using functions like isl and outr rather than pattern matching, so custom sum types aren't any worse than the built-in one in this case.
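As a loose C analogy (the struct and function bodies are mine; only the interface mirrors the ML above), that constructor/discriminator/projection style, with no pattern matching anywhere, might look like:

    #include <stdio.h>

    /* contact = str + str, accessed only through functions, mirroring
       inemail / isemail / outemail etc. from the absrectype above. */
    struct contact {
        int tag;        /* 0 = email, 1 = address */
        char *payload;
    };

    struct contact inemail(char *e)   { return (struct contact){ 0, e }; }
    struct contact inaddress(char *a) { return (struct contact){ 1, a }; }
    int   isemail(struct contact c)    { return c.tag == 0; }
    int   isaddress(struct contact c)  { return c.tag == 1; }
    char *outemail(struct contact c)   { return c.payload; }
    char *outaddress(struct contact c) { return c.payload; }

    int main(void) {
        struct contact c = inemail("hi@example.com");
        if (isemail(c))
            printf("email: %s\n", outemail(c));
        return 0;
    }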
While this post appropriately focused on research about "real" programming languages, it'd be interesting for someone to also look into how this was discussed in fields "in between" set theory and real PLs, possibly either type theory or the theory of type systems. When I saw your Bluesky post, I decided to check out Martin-Löf's Pi and Sigma types, since those are obviously named after mathematical products and sums, and found out the paper came out in 1972. But interestingly, while he does use the + symbol for a binary sum, he doesn't use the word "sum"; he instead says "disjoint union". So did theorists start calling them "sum types" before the papers you mentioned about real PLs did? (Or maybe PL theory didn't even exist in the same way it does today? IDK.)