We designed ClojureDart maps to be more regular and deterministic than Clojure’s, making it easier to optimize batch operations.
For a while, that potential sat untapped, until a couple of months ago (yeah this issue sat in draft for too long).
Today we are going to talk maps, accelerating merge
and introducing a new function, a real killer: join-with
.
Clojure maps are conceptually tries with a 32 fan out. (The clever parts are how we navigate this trie using a hash and how we store sparse children arrays.)
Doing boolean operations on tries is conceptually easy: for a given child slot in two nodes, process the matching subtrees recursively, possibly short-circuiting on identical nodes or nil
s.
In practice Clojure hash maps have three kinds of nodes: bitmap nodes, array nodes, collision nodes. Plus a node may need to be compared to an inline key-value pair. This makes a lot of cases to account for — and that's without counting on the fact that small maps are not hash maps but array maps.
For ClojureDart I wanted to have a simpler hash map implementation:
What about small maps? In ClojureDart they are hash maps too, however we have one trick: literal maps for which keys are stable (strings, symbols, numbers, booleans etc.) have their layout determined at compilation time (by a macro).
This means that for many small maps (where there's no collision on the lowest hash code 5 bits — Birthday paradox applies: 50% of 7 entries maps) we allocate an array containing the interleaved keys and values in the right order (determined at compile time) with the right bitmap (determined at compile time). Making it almost as cheap as allocating an array map.
If your map requires two nodes or more they will be predetermined in the same way.
If not all your keys have safe hash code, then the infringing keys will be added at runtime using normal transient association.
merge
The first function to benefit from ClojureDart simpler maps was merge
— or conj
between maps, as it's often overlooked that (conj {:a 1} {:b 2 :c 3})
is legit.
What does it mean for merge
to be accelerated? First it means that (merge small-map big-map)
is as fast as
(merge big-map small-map)
.
It also means that a lot of hash computation and bit-twiddling and pointer traversal is avoided.
Together with predetermined literal maps this has some surprising results: (conj m {:a x :b y})
is now the fastest way to associate two keys on a map!
merge-with
Instead of accelerating merge-with
, we accelerated a new function join-with
which is "merge-with
with too many functions".
(join-with combine a b)
(join-with combine fa a b)
(join-with combine fa fb a b)
(join-with combine fa fb where? a b)
join-with
can express any boolean operation on maps (and sets) combined with transformation and filtering, in a single lockstep walk of both maps.
(combine va vb)
combines values when k
belongs to maps a
and b
.
(fa va)
is the value when k
belongs only to map a
.
(fb vb)
is the value when k
belongs only to map b
.
where?
is a predicate applied to values returned by combine
, fa
or fb
to determine if the entry should be included in the result.
Any of combine
, fa
or fb
can be nil
to exclude the corresponding keysets. identity
is a special value for fa
and fb
: it's recognized and treated more efficiently than (fn [x] x)
.
join-with
do?Obviously you can express merge-with
:
(defn merge-with [f a b]
(join-with f identity identity a b))
More surprisingly update-vals
:
(defn update-vals [f m]
(join-with nil f m {}))
Despite update-vals
taking only one map, the more efficient traversal and building of the output is still a gain!
Another surprise is select-keys
: when the keys to preserve are given by a set (instead of a vector)
(defn select-keys* [m ks]
(assert (set? ks))
(join-with (fn [v _] v) m ks)))
You can implement diff
, patch
, merge3
and so on:
(defn diff [from to]
(join-with
(fn [from to]
(when-not (= from to)
[:<> from to]))
(fn [from] [:- from])
(fn [to] [:+ to])
some?
from to))
(defn patch [m diffs]
(join-with
(fn [v [op a b]]
(assert (#{:- :<>} op) "unexpected op")
(assert (= a v) "unexpected value")
(case op
:<> b
:- diffs)) ; reused as sentinel as it can contain itself
identity
(fn [_] (assert false "unexpected op"))
(fn [x] (not (identical? diffs x)))
m
diffs))
(defn merge3
[resolve ancestor child1 child2]
(patch ancestor
(merge-with
(fn [d1 d2]
(if (= d1 d2)
d1
(resolve d1 d2)))
(diff ancestor child1)
(diff ancestor child2))))
Behind join-with
there's -join-with
in IJoinable
.
-join-with
was interesting to design: like equals
or equiv
it expects another argument "compatible" with this
. We are at the limits of single dispatch. Anyway the solution we chose is to hard code compatibilities (hash maps and hash sets are compatible but this can't be extended) and to have -join-with
to return nil
on incompatible arguments so that slower path or exceptions could be thrown.
join-with
is a very versatile function and we'd like to extend it to sorted collections. However we would also like to explore other ways to process several maps at once, like a sequence or iterator which allows to seek
up to another sequence or iterator in a compatible collection (here compatible meaning iteration order).