June 26, 2025, 5:54 p.m.

Accelerating maps with join-with

Curiosities by Tensegritics

We designed ClojureDart maps to be more regular and deterministic than Clojure’s, making it easier to optimize batch operations.

For a while, that potential sat untapped, until a couple of months ago (yeah this issue sat in draft for too long).

Today we are going to talk maps, accelerating merge and introducing a new function, a real killer: join-with.

Join_With.png

We’re currently available for new work — short or long term.
You can work with either one of us, or both, depending on what you need. From a quick brain-pick session to a full prototype, MVP, or long-term project: we’d love to hear what you’re working on.

Just hit reply or get in touch!

When we’re not helping clients, we’re working on ClojureDart and Paktol — the positive spending tracker where money goes up 📈, we’ve just released a major new feature (anticipations) which makes it a complete and cohesive product.

Maps implementations differences

Clojure maps are conceptually tries with a 32 fan out. (The clever parts are how we navigate this trie using a hash and how we store sparse children arrays.)

Doing boolean operations on tries is conceptually easy: for a given child slot in two nodes, process the matching subtrees recursively, possibly short-circuiting on identical nodes or nils.

Clojure maps

In practice Clojure hash maps have three kinds of nodes: bitmap nodes, array nodes, collision nodes. Plus a node may need to be compared to an inline key-value pair. This makes a lot of cases to account for — and that's without counting on the fact that small maps are not hash maps but array maps.

ClojureDart maps

For ClojureDart I wanted to have a simpler hash map implementation:

  • No internal polymorphism: it has a single type of nodes: bitmap nodes (they are even reused for collision nodes — a collision node is just a node at maximum depth).
  • History independence: for a given keyset the layout is deterministic.
  • Nodes store the number of items they contain (this speeds up merging disjoint maps a lot).
  • Determining if a node is editable (owned by the transient) is done in a stateless way.

ClojureDart literal maps

What about small maps? In ClojureDart they are hash maps too, however we have one trick: literal maps for which keys are stable (strings, symbols, numbers, booleans etc.) have their layout determined at compilation time (by a macro).

This means that for many small maps (where there's no collision on the lowest hash code 5 bits — Birthday paradox applies: 50% of 7 entries maps) we allocate an array containing the interleaved keys and values in the right order (determined at compile time) with the right bitmap (determined at compile time). Making it almost as cheap as allocating an array map.

If your map requires two nodes or more they will be predetermined in the same way.

If not all your keys have safe hash code, then the infringing keys will be added at runtime using normal transient association.

Fast structural merge

The first function to benefit from ClojureDart simpler maps was merge — or conj between maps, as it's often overlooked that (conj {:a 1} {:b 2 :c 3}) is legit.

What does it mean for merge to be accelerated? First it means that (merge small-map big-map) is as fast as (merge big-map small-map).

It also means that a lot of hash computation and bit-twiddling and pointer traversal is avoided.

Together with predetermined literal maps this has some surprising results: (conj m {:a x :b y}) is now the fastest way to associate two keys on a map!

Generalizing merge-with

Instead of accelerating merge-with, we accelerated a new function join-with which is "merge-with with too many functions".

(join-with combine a b)
(join-with combine fa a b)
(join-with combine fa fb a b)
(join-with combine fa fb where? a b)

join-with can express any boolean operation on maps (and sets) combined with transformation and filtering, in a single lockstep walk of both maps.

join_venn.png

(combine va vb) combines values when k belongs to maps a and b.

(fa va) is the value when k belongs only to map a.

(fb vb) is the value when k belongs only to map b.

where? is a predicate applied to values returned by combine, fa or fb to determine if the entry should be included in the result.

Any of combine, fa or fb can be nil to exclude the corresponding keysets. identity is a special value for fa and fb: it's recognized and treated more efficiently than (fn [x] x).

What can join-with do?

Obviously you can express merge-with:

(defn merge-with [f a b]
  (join-with f identity identity a b))

More surprisingly update-vals:

(defn update-vals [f m]
  (join-with nil f m {}))

Despite update-vals taking only one map, the more efficient traversal and building of the output is still a gain!

Another surprise is select-keys: when the keys to preserve are given by a set (instead of a vector)

(defn select-keys* [m ks]
  (assert (set? ks))
  (join-with (fn [v _] v) m ks)))

You can implement diff, patch, merge3 and so on:

(defn diff [from to]
  (join-with
    (fn [from to]
      (when-not (= from to)
        [:<> from to]))
    (fn [from] [:- from])
    (fn [to] [:+ to])
    some?
    from to))

(defn patch [m diffs]
  (join-with
    (fn [v [op a b]]
      (assert (#{:- :<>} op) "unexpected op")
      (assert (= a v) "unexpected value")
      (case op
        :<> b
        :- diffs)) ; reused as sentinel as it can contain itself
      identity
      (fn [_] (assert false "unexpected op"))
      (fn [x] (not (identical? diffs x)))
    m
    diffs))

(defn merge3
  [resolve ancestor child1 child2]
  (patch ancestor
    (merge-with
      (fn [d1 d2]
        (if (= d1 d2)
          d1
          (resolve d1 d2)))
      (diff ancestor child1)
      (diff ancestor child2))))

The protocol behind

Behind join-with there's -join-with in IJoinable.

-join-with was interesting to design: like equals or equiv it expects another argument "compatible" with this. We are at the limits of single dispatch. Anyway the solution we chose is to hard code compatibilities (hash maps and hash sets are compatible but this can't be extended) and to have -join-with to return nil on incompatible arguments so that slower path or exceptions could be thrown.

Conclusion

join-with is a very versatile function and we'd like to extend it to sorted collections. However we would also like to explore other ways to process several maps at once, like a sequence or iterator which allows to seek up to another sequence or iterator in a compatible collection (here compatible meaning iteration order).

You just read issue #15 of Curiosities by Tensegritics. You can also browse the full archives of this newsletter.

Share on Twitter Share on LinkedIn Share on Hacker News Share on Reddit Share on Mastodon
Powered by Buttondown, the easiest way to start and grow your newsletter.