New Blog Post: "A Very Early History of Algebraic Data Types"
Last week I said that this week's newsletter would be a brief history of algebraic data types.
I was wrong.
That history is now a 3500-word blog post.
I'm speaking at P99 Conf!
My talk, "Designing Low-Latency Systems with TLA+", is happening 10/23 at 11:30 central time. It's an online conf and the talk's only 16 minutes, so come check it out!
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.
This is a misunderstanding of what the word "tagged" means that runs throughout the essay. An untagged union is one where values do not "remember" the variant used to construct them. The most well-known example is C unions:

    union { int i; char *s; } v = { .s = "0" };

I cannot query v to see if it was created with .i or .s. I instead would need to keep track of that separately. If you bundle that "separately" with the data itself, you get a tagged union:

    struct { enum { INT, STR } tag; union { int i; char *s; } data; } v = { .tag = STR, .data = { .s = "0" } };

Now I can query .tag to see which variant it is, then access the correct member of .data based on that. Higher-level languages do this same bundling for you "under the hood" for their tagged unions. (This is similar to how C arrays do not store the length alongside the array, but Java arrays do.)
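For illustration, here is a minimal runnable sketch of that dispatch, with the tagged union wrapped in a hypothetical struct tagged and a print_tagged helper (both names are mine, not from the essay):

    #include <stdio.h>

    /* The tagged union from above: the tag travels with the data. */
    struct tagged {
        enum { INT, STR } tag;
        union { int i; char *s; } data;
    };

    /* Check the tag before touching the union, so we never read
       .data through the wrong member. */
    void print_tagged(struct tagged v) {
        switch (v.tag) {
        case INT: printf("int: %d\n", v.data.i); break;
        case STR: printf("str: %s\n", v.data.s); break;
        }
    }

    int main(void) {
        struct tagged v = { .tag = STR, .data = { .s = "0" } };
        print_tagged(v);   /* prints "str: 0" */
        return 0;
    }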
You used "tag" to refer to the names given to each variant:
i
ands
. But tag actually refers to the fieldtag
. Note that it also doesn't refer to the enum labelsINT
andSTR
themselves either;tag
could have typeint
with values0
and1
instead. "Tag" really refers to the extra storage needed to store the tag value in run-time memory, compared to the lack thereof in the untagged union.The word you're looking for seems to be "labelled", used at least in the first three documents here: https://smlfamily.github.io/history/. These refer to the syntactic names that make things easier for programmers reading the code. A type like
Str + Str
lacks labels but still has an integer tag to distinguishLeft ":("
fromRight ":("
at run-time ((0, ":(")
vs(1, ":(")
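To make the distinction concrete, here is a rough C sketch of what an unlabelled Str + Str could look like in memory; the names sum_str and payload are mine, purely for illustration:

    #include <stdio.h>

    /* An unlabelled Str + Str: no names for the two variants, just an
       integer tag (0 = left, 1 = right) stored alongside the payload. */
    struct sum_str {
        int tag;        /* which injection was used: 0 or 1 */
        char *payload;
    };

    int main(void) {
        struct sum_str left  = { 0, ":(" };   /* (0, ":(") */
        struct sum_str right = { 1, ":(" };   /* (1, ":(") */

        /* The payloads are identical; only the tag tells them apart. */
        printf("%d %s\n", left.tag,  left.payload);
        printf("%d %s\n", right.tag, right.payload);
        return 0;
    }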
Pascal, in your linked paper, has both tagged and untagged unions: "discriminated" and "free" unions, respectively. While Wirth thought that having only tagged unions and no untagged ones would be inflexible, he recommends using tagged unions when possible in the very next sentence (in parentheses).
There was this though (from Milner 1978, bottom-right of page 123/PDF page 5):
    absrectype α list = . + (α × α list)
        with nil = abslist(inl())
        and $.(x,ℓ) = abslist(inr(x,ℓ))
        and null(ℓ) = isl(replist(ℓ))
        and hd(ℓ) = fst(outr(replist(ℓ)))
        and tl(ℓ) = snd(outr(replist(ℓ)))
So you could do:
    absrectype contact = str + str
        with inemail(e) = abscontact(inl(e))
        and inaddress(a) = abscontact(inr(a))
        and isemail(c) = isl(repcontact(c))
        and isaddress(c) = isr(repcontact(c))
        and outemail(c) = outl(repcontact(c))
        and outaddress(c) = outr(repcontact(c))
The built-in sum type only supported using functions like isl and outr rather than pattern matching, so custom sum types aren't any worse than the built-in one in this case.
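As a loose C analogy (the struct and function bodies are mine; only the interface mirrors the ML above), that constructor/discriminator/projection style, with no pattern matching anywhere, might look like:

    #include <stdio.h>

    /* contact = str + str, accessed only through functions, mirroring
       inemail / isemail / outemail etc. from the absrectype above. */
    struct contact {
        int tag;        /* 0 = email, 1 = address */
        char *payload;
    };

    struct contact inemail(char *e)   { return (struct contact){ 0, e }; }
    struct contact inaddress(char *a) { return (struct contact){ 1, a }; }
    int   isemail(struct contact c)    { return c.tag == 0; }
    int   isaddress(struct contact c)  { return c.tag == 1; }
    char *outemail(struct contact c)   { return c.payload; }
    char *outaddress(struct contact c) { return c.payload; }

    int main(void) {
        struct contact c = inemail("hi@example.com");
        if (isemail(c))
            printf("email: %s\n", outemail(c));
        return 0;
    }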
While this post appropriately focused on research about "real" programming languages, it'd be interesting for someone to also look into how this was discussed in fields "in between" set theory and real PLs, possibly either type theory or the theory of type systems. When I saw your Bluesky post, I decided to check out Martin-Löf's Pi and Sigma types, since those are obviously named after mathematical products and sums, and found out the paper came out in 1972. But interestingly, while he does use the + symbol for a binary sum, he doesn't use the word "sum"; he instead says "disjoint union". So did theorists start calling them "sum types" before the papers you mentioned about real PLs did? (Or maybe PL theory didn't even exist in the same way it does today? IDK.)