Today to change from heavy interop and frameworks, let's do some light coding exercise and implement the most amateur datalog engine by taking any shortcut we see fit!
Don't forget when we're not busy writing silly Datalog implementations, we are available to help you with Clojure or working to get our app Paktol (The positive spending tracker where money goes up!) off the ground.
The first question is how to represent facts, rules and databases.
A fact will be represented by a vector of simple values (but no symbols, we reserve symbols for variables).
[:father :bart :homer]
A rule will be represented by a list of patterns. The first pattern being the head of the rule (its "conclusion"), the rest of the rule being its body. A pattern is a vector of simples values (this time, including symbols for variables).
;; parent(c, p) :- father(c, p)
([:parent c p] [:father c p])
;; parent(c, p) :- mother(c, p)
([:parent c p] [:mother c p])
The database in Datalog is conceptually split in two: the EDB (the extensional database, every recorded fact — tables in SQL) and the IDB (the intensional database, every deduced fact — views in SQL). Let's lump them together as a single set of facts! What could possibly go wrong? 🤷♂️
The result of matching a pattern against a fact is going to be nil (no match) or a map of variables to their values (let's call these bindings an environment).
However further matches in a rule will have to obey environments created by earlier matches. That's why we also pass an upstream environment to match
: (defn match "Returns updated env or nil." [pattern fact env] ...)
(defn match [pattern fact env]
(when (= (count pattern) (count fact))
(reduce (fn [env [p v]]
(let [p-or-v (env p p)]
(cond
(= p '_) env
(= p-or-v v) env
(symbol? p-or-v) (assoc env p v)
:else (reduced nil))))
env (map vector pattern fact))))
You may have realized that unlike textbook datalog the predicate name isn't special and is treated as any other slot of facts vectors. This will prove useful when implementing q
at the end.
Now we need to match a patterns chain against all known facts:
(defn match-patterns [patterns facts]
(reduce
(fn [envs pattern]
(-> (for [fact facts env envs] (match pattern fact env))
set (disj nil)))
#{{}} patterns))
From this we can infer new facts by turning envs into facts by simply replacing vars appearing in the head by their values in environments.
(defn match-rule [facts [head & patterns]]
(for [env (match-patterns patterns facts)]
(into [] (map #(env % %)) head)))
We repeatedly apply all rules until no new facts can be derived. This saturation is detected by comparing sizes because it's cheaper than an equality check.
(defn saturate [facts rules]
(let [facts' (into facts (mapcat #(match-rule facts %)) rules)]
(if (< (count facts) (count facts'))
(recur facts' rules)
facts)))
A query is just an anonymous rule that we run after the others so that we get only its results.
(defn q [facts query rules]
(-> facts (saturate rules) (match-rule query) set))
The Simpsons family:
(def edb
#{[:father :bart :homer]
[:mother :bart :marge]
[:father :lisa :homer]
[:mother :lisa :marge]
[:father :maggie :homer]
[:mother :maggie :marge]
[:father :homer :abe]
[:mother :homer :mona]})
Some sample rules:
(def rules
'[([:parent c p] [:father c p])
([:parent c p] [:mother c p])
([:grand-parent c gp] [:parent p gp] [:parent c p])
([:ancestor c p] [:parent c p])
([:ancestor c ancp] [:ancestor c anc] [:parent anc ancp])])
And now the queries:
user=> (q edb
'([gp] [:grand-parent :bart gp])
rules)
#{[:mona] [:abe]}
user=> (q edb
'([anc] [:ancestor :bart anc])
rules)
#{[:homer] [:marge] [:mona] [:abe]}
🎉 Here we are: datalog in 26 lines of code!
Datalog is minimal—almost to a fault. It’s like a game of Jenga: stack too many extensions on top, and the whole thing could come crashing down. That’s why there’s a rich literature on adding just enough to make it useful without tipping over.
So here we are with a seed implementation of a seed language. We have a starting point and can grow it in any directions we are interested in.
Share online if you'd like to see more open ended exploratory code (including growing this datalog)!
(defn match [pattern fact env]
(when (= (count pattern) (count fact))
(reduce (fn [env [p v]]
(let [p-or-v (env p p)]
(cond
(= p '_) env
(= p-or-v v) env
(symbol? p-or-v) (assoc env p v)
:else (reduced nil))))
env (map vector pattern fact))))
(defn match-patterns [patterns facts]
(reduce
(fn [envs pattern]
(-> (for [fact facts env envs] (match pattern fact env))
set (disj nil)))
#{{}} patterns))
(defn match-rule [facts [head & patterns]]
(for [env (match-patterns patterns facts)]
(into [] (map #(env % %)) head)))
(defn saturate [facts rules]
(let [facts' (into facts (mapcat #(match-rule facts %)) rules)]
(if (< (count facts) (count facts'))
(recur facts' rules)
facts)))
(defn q [facts query rules]
(-> facts (saturate rules) (match-rule query) set))