Array logic operates by a
fill
algorithm. Fill the ds with the smaller rank to the be the same shape as the ds with the larger.
I recently released—so to speak—a new project that has occupied the last several weeks of my free time: a calculator program, called EC.
I decided to build it for a few reasons. Probably the first reason is that it seemed to be a good use-case for the project I had been working on for the prior few weeks: Fugue, an object system for the Janet programming language. I had settled on the basic approach and feature set, and built a little proof of concept, but obviously if I was going to herald this as the spiritual successor to CLOS it would behoove me to put it through its paces more substantially.
A little desk calculator struck me as a nice, mid-sized project, with enough potential for object-orientation that it could potentially benefit from what Fugue offers. It’s quite difficult to think of useful software to write; it’s much easier to think of useful software libraries, that will help other people make useful software.
After I thought of it, it seemed like I could create a sequel to a
previous project of mine, ad
1. I use ad
all the
time whenever I need to do one-off arithmetic, so I actually have a
vague sense of what I’d like to add to it.
The J language, of which I am a great admirer, though essentially hopeless as a practitioner, has a thorough and powerful array orientation. You can do all kinds of brilliant things with arrays and matrices, manipulating multi-dimensional groups of numbers just as though they were individual numbers. Here’s a tiny snippet:
(2 3 $ 3 0 0) + 1
4 1 1
4 1 1
In the first line, we take the array 3 0 0
and shape it into the
shape 2 3
, that is, into a two-dimensional matrix with two rows and
three columns. We then add 1 to that entire matrix, which is
equivalent to adding 1 to each element of the matrix.
In J, there isn’t any need to map a function like (lambda (x) (+ x 1))
over the matrix; the +
operator is “matrix-aware” and has a
built-in semantics for how to apply itself to a 2d matrix on the left
side and a single number on the right.2
It would be useful to be able to incorporate this kind of behaviour
into ad
‘s successor. EC is still an RPN, stack-based program, and
thus we can’t have J’s pervasive array-orientation; in the J example
we get an array “for free” any time we separate two or more numbers
with a space. We need spaces to separate all tokens in a stack-based
language. Nevertheless, the ability to add, multiply, square,
etc. operands of any dimensions would be very much in keeping for a
desk calculator designed to be powerful and convenient.
When I decided I’d like to build this program, my mind set off on it and wouldn’t be dissuaded. The quotation at the top of this article is a text I sent to myself on February 17 at 11:35pm, after a sleepless hour or so (typo included). Surprisingly, the initial commit is not until the next day. So I had started entirely conceptually.
In point of fact, I’m quite weak when it comes to algorithmic thinking. My mind starts swimming and losing track of things very easily. And I had no idea how you might accomplish this kind of “dimension-agnostic” logic; it seemed kind of magic the last time I had played around in the J REPL.
That said, I did come primed with one piece of knowledge: in J, arrays
have a shape, which is itself a 1-dimensional array listing the
length of each dimension of the original array. For instance, as
above, an array with shape 2 3
is a 2-by-3 matrix; one with 2 3 3
is a 3-dimensional, 2-by-3-by-3 one. We can follow that to its logical
conclusion: if a simple list is a 1-dimensional array, its shape
should also be a list, with a single element, which is the length of
the list. Finally, that means that the shape of a simple number should
be the empty list.
The key to the algorithm that occurred to me, like so many of them, feels kind of like a cheat. I simply can’t conceptualize of a way to “apply” an array of dimensionality N to an array of dimensionality M. Addition, to take the simplest operation I can think of, only makes sense to me when applied to two numbers. And trying to somehow imagine how to “do the right thing” when one of the operands is a list and the other is a 4-dimensional matrix just hurts my head. Therefore, let’s approach that problem from the other direction; instead of figuring out how to add a list and a matrix, or a matrix and a number, what if we just made them both the same exact shape? Then you’d have a one-to-one correspondence between elements and the actual addition would be trivial.
Again, I can barely conceptualize what it would mean to “resize” a
list into a 4-d matrix. But it’s pretty easy to conceptualize what
resizing 1
looks like in the example (2 3 $ 3 0 0) + 1
. You just
repeat 1
a bunch of times until you’ve filled out the whole
shape. It seems incredibly wasteful to create all those 1
s, but
conceptually it’s quite simple.
If we follow that thread, we can start to generalize the concept
without hurting our brains too much. If filling 1
out to the size of
a 6-element list is simply repeating 1
6 times, then filling 3 0 0
to a matrix of shape 2 3
should be simply repeating 3 0 0
twice. And indeed it is:
2 3 $ 3 0 0
3 0 0
3 0 0
At this point we depart from J as a useful model. In fact, J has a sophisticated and flexible approach to applying dyadic operators between arrays of differing shapes. It allows the user to select the “rank” across which they want to apply the operator, so that the behaviour implemented in EC is only one of the possible solutions to the problem.
That said, the fill algorithm provides a generalizable approach to the problem of applying an operation to operands of differing shape. Given any two vector-or-numbers, fill the smaller-dimensioned one to the shape of the greater-dimensioned one and then apply the operation pairwise.
It’s important to note that this imposes a significant constraint: the
shape of the smaller one has to be a suffix of the shape of the
greater one. For instance, a vector of shape 3
could be filled to a
matrix of shape 2 3
by repeating it twice; however, a vector of
shape 4
couldn’t be filled to 2 3
.
I’m happy with this. It addresses the most important case, that of
operating on a matrix of any arbitrary size and a single number (the
shape of a single number is []
, the empty vector, which is trivially
a suffix of any other shape), and there’s no special-casing for any
specific dimensionality.
fill
in JanetOnce I had that basic conceptual orientation, I actually found myself articulating the entire algorithm (in prose) in texts to myself. It may simply be a trick of the mind, but to me there is some proof in the “rightness” (I hesitate to say “correctness”!) of the algorithm in the fact that it seemed to unfold naturally, at a conceptual level, once I had oriented my thinking correctly around the topic. The resulting code is a very faithful translation of the texts I sent to myself.3
What leaves me feeling good about this as an approach is that the code
for fill
is concise and elegant. Here is the function that, given a
Vector object and a shape to fill it to (expressed as an array of
numbers), returns a new Vector of the original data filled to the new
shape:
(defmethod fill Vector
[self shape-to-fill]
(let [x (array ;(shape self))
y (array ;shape-to-fill)]
(while (not (empty? x))
(let [xi (array/pop x)
yi (array/pop y)]
(unless (= xi yi)
(errorf "Shape error: can't fill vector with shape %q to %q"
(shape self) shape-to-fill))))
(reduce (fn [acc length]
(let [new-shape (tuple length ;(shape acc))
new-data (array/new-filled length acc)]
(:new Vector new-shape new-data)))
self
(reverse y))))
The algorithm is very short:
n
times on each step.I can now demonstrate the above J behaviour as translated into EC.
EC, it should be noted, is written in RPN notation, rather than J’s
infix notation, so the operations should be read from left to
right. EC Vectors are denoted with square brackets []
, as spaces
separate stack tokens. The value in between the angle brackets <>
on
the left side of the prompt show the current top value in the stack.
<> $ [3 0 0] [2 3] fill
<[[3 0 0] [3 0 0]]> $ 1 +
<[[4 1 1] [4 1 1]]> $
I’ve done the operation in two steps, so you can see the matrix that
1 +
is applied against; it can just as easily be written [3 0 0] [2 3] fill 1 +
. We see the same sequence as in the J code: you can
fill
any data structure to any shape, subject to the suffix
constraint named above; and any two data structures can be applied to
an operator like +
as long as the smaller-shaped of the two—in this
case, the number 1
—is fillable to shape of the larger.
We see above an example of the Fugue framework in the form of
defmethod
. In the EC code, Vector
is a Fugue prototype, and
fill
is a method specialized to Vector
. Janet’s OO functionality
is built on prototypal inheritance (as opposed to class-based
inheritance), and thus in Fugue one defines prototype hierarchies. In
EC, Vector
is a child of Quotation
, which is a child of Element
.
Fugue offers two ways to define prototype-based behaviour, corresponding to the two major types of behaviour specialization in object-oriented systems: methods and multimethods.
Methods are functions which are assigned directly to prototype
objects (in prototypal inheritance, there are no distinct classes,
only objects which other objects designate as their prototypes), and
which are therefore inherited by any descendant object of the
prototype. Janet provides a native way of calling these assigned
functions; syntax like (:foo bar baz)
, where :foo
is a keyword
instead of a regular symbol, translates to ((get bar :foo) bar baz)
.
In EC, we implement fill
as a method on Number
and Vector
, so a
call like (:fill (:new Integer 4) [2])
will be
prototype-aware. (Integer
, is a child of Number
, of course!)
The defmethod
macro also defines fill
as a function that calls
:fill
, for added convenience.
Multimethods can be specialized against the types (or prototypes) of
all of their arguments. Instead of being defined for some prototype
(and inserted as a method in that prototype table, to be inherited by
its descendants), any individual multimethod instance is defined for
the type of each of its arguments. Thus, for instance, a multimethod
might be defined separately against [Quotation Number]
and [Vector Number]
.4 5
EC has turned out to be an excellent testbed for developing and testing Fugue; it offered some natural opportunities to use almost all of Fugue’s features, and over the course of developing EC I hit a bunch of new problems that suggested natural extensions of the existing feature set. By the time I was finished, I felt quite comfortable defining the feature set of Fugue 1.0 as “that which is sufficient to build EC”.
ad
is so-called because it comes after bc
, the
venerable calculator program. My partner suggested ce
as further
development of the naming scheme, which I think is quite
good. Somewhere, however, between that suggestion and my creating
the repo, I unconsciously transposed the letters, and didn’t
notice until a little while later. ↩
In fact (unsurprisingly), the array behaviours of J are far, far richer than in this example, and far richer than what will be demonstrated in the rest of the post. While it’s possible that we might be able to incrementally increase the power of shaping and filling in EC, we will certainly not take a language as powerful and subtle as J as our point of comparison. ↩
I think of this as operating along the horizontal axis rather than the vertical; multimethods trade the ability to exploit inheritance (a vertical orientation) in dispatch for the ability to dispatch across the entire (horizontal) sequence of argument types together. ↩
For completeness’s sake, I’ll mention that multimethods come in two flavors in Fugue: closed and open. Closed multimethods are defined within a single module and can’t be extended by other callers; thus, they’re roughly analogous to a multi-function-head, pattern-matching system like what you get in Erlang and Elixir.
Open multimethods, on the other hand, are declared once and then can be extended with new cases from anywhere in the codebase. They’re thus closer to what are sometimes known as protocols, in that they provide a way to extend the behaviour of a function with respect to data types that didn’t exist when the function was declared.
In EC there are examples of both: pretty-printing terms is defined
as a multimethod, while the push
function (which handles pushing
some term to the stack) is an open multimethod that picks up new
definitions as new data types are developed. ↩