Unusual basis types in programming languages
All languages have numbers, booleans, strings, and lists. What else is out there?
TLA+ Workshop
TLA+ workshop on Feb 12! Learn how to find complex bugs in software systems before you start building them. I've been saying the code NEWSLETTERDISCOUNT gives $50 off, but that's wrong because I actually set it up for $100 off. Enjoy!
Unusual basis types in programming languages
Here are the five essential "basis" types in all modern programming languages:
- Booleans
- Numbers
- Strings
- Lists
- Structs
(I'm talking about these as general overarching concepts, so numbers includes doubles and integers, lists include arrays and tuples, etc. Languages often support several types in each category.)
These types are always supported with dedicated syntax, as opposed to being built out of the language abstractions. Some domains have additional basis types: systems languages have pointers, dynamic languages have key-value mappings, FP languages have first-class functions, etc. Outside of these, languages rarely add special syntax for other types, because syntax is precious and you can build everything out of structs anyway.
So it's pretty neat when it does happen. Here are some unusual basis types I've run into. This list isn't comprehensive or rigorous, just stuff I find interesting.
Sets
Seen in: Python, most formal methods languages
Sets are collections of unordered, unique elements. Sets are nice because you can add and subtract them from each other without worrying about duplicates. Many languages have them in the standard library but Python also gives them special syntax (shared with dictionaries).
# Python
>>> ["a", "b"] + ["a", "c"]
['a', 'b', 'a', 'c']
>>> {"a", "b"} | {"a", "c"}
{'c', 'b', 'a'}
>>> ["a", "b"] - ["b"]
TypeError: unsupported operand type(s) for -: 'list' and 'list'
>>> {"a", "b"} - {"b"}
{'a'}
Lots of things we use lists for are better represented with sets. Using a set tells the reader that ordering and duplicates don't matter to your program, which you can't always figure out from context. They're probably the thing I miss most in the average language, with symbols coming in a close second.
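One wrinkle of sharing syntax with dictionaries is that empty braces were already taken, so the empty set has to be spelled out with the constructor. A small sketch in Python (the variable names are just for illustration):

```python
# Braces around bare elements build a set; braces around key: value pairs build a dict.
letters = {"a", "b"}
ages = {"a": 1, "b": 2}

# Empty braces mean an empty dict, so the empty set needs set().
empty_dict = {}
empty_set = set()

print(type(letters).__name__, type(ages).__name__)
print(type(empty_dict).__name__, type(empty_set).__name__)
```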
Symbols
Seen in: Lisps, Ruby, logic programming
Symbols are identifiers that are only equal to themselves. Most languages use strings for the same purpose, the difference being that strings can be manipulated while symbols can (in essence) only be compared.[1]
# Ruby
> puts "string1" + "string2"
string1string2
> puts :symbol1 + :symbol2
undefined method `+' for :symbol1:Symbol (NoMethodError)
So why use symbols if you already have strings? In Ruby it's because strings are mutable while symbols are not, so every use of a given symbol can share a single object, which makes symbols more space-efficient. This can really add up if you're making a lot of hash tables. The broader reason is that strings already do too many things. We use strings for identifiers, human writing, structured data, and grammars. If you instead use symbols for identifiers then you can be more confident a given string isn't an identifier.
Some logic programming languages, at least those that stem from Prolog, have symbols (called atoms) and not native strings. SWI-Prolog added proper strings in 2014.
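Python has no symbol type, but you can get some of the same separation with enum.Enum. This is only an approximation I'm using for illustration, not a real symbol, and the Status class and its members are made up. The members compare equal only to themselves and don't support string operations:

```python
from enum import Enum, auto

# Hypothetical identifiers, standing in for symbols like :active and :banned.
class Status(Enum):
    ACTIVE = auto()
    BANNED = auto()

# A member is equal to itself...
same = Status.ACTIVE == Status.ACTIVE
# ...but not to the string that happens to spell its name.
equals_string = Status.ACTIVE == "ACTIVE"

# Unlike strings, members can't be concatenated.
try:
    Status.ACTIVE + Status.BANNED
    concat_error = None
except TypeError as err:
    concat_error = err

print(same, equals_string, type(concat_error).__name__)
```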
Pairs
Raku actually has a few different unusual primitives, but pairs are the one that's most generally useful.[2] A pair is a single key-value mapping, such that a hash table is just a set of pairs.
# Raku
> (1 => 2){1}
2
> {"a" => 1, "b" => 2}.kv
(b 2 a 1)
I like pairs a lot. First of all, they satisfy the mathematician in me. You can pull an element out of a list, so why not pull a kv pair out of a mapping? Raku also does a lot of cool stuff with pairs. For example, :$x is shorthand for 'x' => $x and :foo is short for foo => True. These make it easy to add optional flags to a function. I think keyword arguments are implemented as pairs, too.
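Python doesn't have a pair type, but the "a hash table is just a set of pairs" view can be sketched with (key, value) tuples. This is an analogy rather than how Raku does it, and the variable names are mine:

```python
mapping = {"a": 1, "b": 2}

# items() yields each key-value pair as a (key, value) tuple.
pairs = set(mapping.items())

# A set of pairs with distinct keys converts straight back into a mapping.
rebuilt = dict(pairs)

# The items view is itself set-like, so set operators work on it directly.
shared = mapping.items() & {("a", 1), ("c", 3)}

print(rebuilt == mapping, shared)
```

The set conversion only works here because the values are hashable, which is one place where the analogy is weaker than a real pair type.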
Alloy also has sets of pairs ("relations") and a lot of cool ways to manipulate them. ^rel returns the transitive closure of rel, while set <: rel returns only the pairs in rel that begin with elements of set.
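To give a feel for those two Alloy operators, here is a rough Python sketch over a plain set of tuples. The function names transitive_closure and domain_restrict are my own, and the closure is computed by naive repeated joining, which is fine for small relations but is not how Alloy evaluates it:

```python
def transitive_closure(rel):
    """Roughly ^rel: keep joining pairs end to start until nothing new appears."""
    closure = set(rel)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def domain_restrict(s, rel):
    """Roughly s <: rel: keep only the pairs whose first element is in s."""
    return {(a, b) for (a, b) in rel if a in s}


rel = {(1, 2), (2, 3), (3, 4)}
closed = transitive_closure(rel)
restricted = domain_restrict({1, 3}, rel)
print(sorted(closed))
print(sorted(restricted))
```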
N-arrays
Seen in: APLs (like Dyalog, J, BQN), maybe Matlab?
In most languages a "2D" array is just an array containing more arrays. In APLs, a 2D array is just an array with two dimensions. Unlike the list-of-lists model, there's no "preferred axis", and you can mess with columns just as easily as you mess with rows.
NB. J
]x =: i. 4 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
NB. first row
0 { x
0 1 2 3
NB. first column
0 {"1 x
0 4 8 12
NB. grab the middle
(1 1 ,: 2 2) [;.0 x
5 6
9 10
Other things you can do: sort the rows by a specific column. Sort the columns by a specific row. Transpose the array so the rows are now columns. Reshape it into a 2x8 array. Multiply every element by 6.
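For contrast, here is a sketch of the same 4x4 grid in the list-of-lists model, using plain Python lists. Rows are the preferred axis here: a row is a single index, while a column or a middle block means walking every row.

```python
x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11],
     [12, 13, 14, 15]]

# A row is one index away.
first_row = x[0]

# A column has to be assembled by visiting each row.
first_column = [row[0] for row in x]

# The middle 2x2 block needs a slice of rows and a slice inside each row.
middle = [row[1:3] for row in x[1:3]]

# Transposing means regrouping the elements by position.
transposed = [list(col) for col in zip(*x)]

print(first_row, first_column, middle)
```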
Arrays can have any number of dimensions, though I find more than 2 dimensions awkward. If I have 3 or more I'm usually better off with a list of 2D arrays (via boxing). But 2D arrays are quite useful! Which is why the most popular 2D array by far is the dataframe.
Dataframes
Seen in: lil, Nushell, PowerShell, R (I think?)
Dataframes are sort of a mix of a 2D array, a mapping, and a database table. They allow querying on both indexes (rows) and column names.
Python does not have dataframes as a basis type but pandas is the most popular dataframe library in use, so I'll use that for the demo:
# Python (pandas)
import pandas as pd
df = pd.DataFrame({
    'airport': ['ORD', 'LHR', 'MEL'],
    'country': ['USA', 'UK', 'Australia'],
    'year': [1944, 1946, 1970]
})
>>> df['airport']
0    ORD
1    LHR
2    MEL
>>> df.iloc[1]
airport     LHR
country      UK
year       1946
>>> df[df['year']>1960]
   airport    country  year
2      MEL  Australia  1970
Dataframes are unsurprisingly extremely popular in data science, where you want to slice and dice data in all sorts of weird ways. But you also see them in some shell languages because being able to sort and filter ps/ls/grep is really really nice.
Things missing
- I don't know any stack or assembly languages, and I didn't look at NoCode/LowCode environments or DSLs.
- I bet there's a lot of unusual stuff in Mathematica, but I'm not paying $400 just to check.
- There are a lot more basis types I've heard of in languages I don't know: Go channels, Lua userdata, SNOBOL patterns, raw bytes and byte streams.
- No syntax for graphs, which is unsurprising because there's no library support for graphs either. Working on the writeup of why now.
Feel free to email me with weird basis types you know about that I didn't list here. The criteria are pretty loose but the very broad qualifier is "is there syntax specifically for this type?"
[1] In essence. Ruby supports a bunch of methods like upcase on symbols, which first convert the symbol to a string, upcase it, then convert it back to a symbol.
[2] The other really unusual type is the junction, which is a sort of superposition of boolean conditionals. They exist so you can do if ($x == 3|5) instead of if $x == 3 or $x == 5.