What's between a set and a sequence?
Sets -> {???, ???} -> Sequences
I finally updated the alloydocs to Alloy 6. The docs now cover how to use temporal operators in Alloy.
To celebrate, let's talk something completely unrelated. The simplest kind of collection is the set: an unordered collection of unique elements. All branches of mathematics use sets somewhere, and in fact you can bootstrap all other collection types from just sets. Many programming languages have set types or something that mimics sets, but it's less common than sequences, which are ordered collections of repeatable elements. ≈All programming languages have some form of sequence.
If sequences are sets + two properties, what are sets + one property?
Sets plus repetition: Bags
Also called multisets, counters, or accumulations. A bag is a set where elements can appear many times, aka have a "multiplicity". The set {a, b, c} can be considered equivalent to the bag {a¹, b¹, c¹}. In programming, if your standard library doesn't have a counter type, you can represent them with a mapping, where the bag {a², b⁵, c¹} is the map {a: 2, b: 5, c: 1}.
Most set operations map cleanly to bag operations. For example, b₁ is a subbag of b₂ if b₂ has every element, with an equal or greater multiplicity, than b₁ has. Other operations "split" prismlike into multiple related operations. The only way to merge {a, b} and {b, c} is the union operation (giving {a, b, c}), while there are four useful ways to merge {a¹, b²} with {b¹, c¹}: max-union ({a¹, b², c¹}), left merge (same), right merge ({a¹, b¹, c¹}), and sum ({a¹, b³, c¹}).
I often see bags used for analytics, like "how many times did we log this event" or "how many times have we called this function."
Sets plus ordering: ordered sets???
This isn't really a thing common in mathematics. You can apply a "total ordering" to a set, but that's ≈always based on properties of the elements: there's no easy way to say to write "the set {a, b, c} where the order is [c, a, b]". Usually mathematicians call it a "pairwise disjoint sequence" and leave it at that.
But predicative solutions are not conducive to good programming, so there's a distinct programming notion of the "ordered set". It seems this is most commonly called… "ordered set". Some operations like lookup are more efficient on ordered sets, so they're used in some efficient data structure implementations. A more "interesting" use I found is that Kubernetes uses an ordered set to import plugins from multiple sources, in the proper order, without double-initializing a plugin.
Why this matters
We should strive to use the least powerful collection our data needs. When you store your data in a set, you are guaranteeing the data is unique and unordered. This means
- You never have to worry about a duplicate element bug
- There are fewer edge cases you need to test
- Other people can more easily reason about, and refactor, your code.
(If you're doing formal methods, you want to pay extra attention to using the weakest possible collection. More general types are harder to prove stuff about, and they generate more states during model checking, so they're less efficient too.)
Ordered sets and bags are useful as intermediate power-steps between sets and sequences. Sometimes you can't use a set because you need multiplicity, but you don't need ordering, so you should use a bag instead of a sequence. That said, I much more often encounter sequences-that-could-be-sets than sequences-that-could-be-bags-slash-ordered-sets.
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.