Today, as promised, a follow-up to Half Dumb Datalog in 30 loc.
In the aforementioned article we complained that it was impossible for our Datalog to not conclude that Bart Simpson is his own sibling. To fix this situation we need to introduce constraints (=
and not=
for a start) into our toy Datalog to be able to state that Bart's sibling can't be "equal" to himself.
Don't forget: when we're not busy writing toy Datalog implementations, we are here to help: from letting you pick our brains for a couple of hours, to handling a whole project or writing a prototype or a MVP. Get in touch!
Else we are working on ClojureDart or on our app Paktol (The positive spending tracker where money goes up!)βa big feature is coming π€«.
Before even considering implementation we should consider expression: currently we can't even state (not= x y)
, it would be interpreted as matching any triple, with not=
being considered a variable name.
In the first installment of this Datalog series, I wrote:
A pattern is a vector of simple values (this time, including symbols for variables).
So match
working on patterns represented by list is just an accident: let's reclaim lists for constraints!
A constraint is thus represented by a list whose first item is a symbol identifying the constraint type, further items are simple values (including symbols for variables).
Let's assume a constrain
function taking two arguments [env constraint]
and returning an environment with the constraint or nil when the constraint fails to apply.
Equipped with constrain
, we can update match-patterns
:
(defn constrain [env constraint] ; TODO
nil)
(defn match-patterns [patterns dfacts facts]
(reduce
(fn [[envs denvs] pattern]
(if (seq? pattern) ; π if constraint, apply constraintπ
[(->> @envs (keep #(constrain % pattern)) set delay)
(->> denvs (keep #(constrain % pattern)) set)]
[(-> #{} (into (for [fact facts env @envs] (match pattern fact env))) (disj nil) delay)
(-> #{}
(into (for [fact facts env denvs] (match pattern fact env)))
(into (for [fact dfacts env denvs] (match pattern fact env)))
(into (for [fact dfacts env @envs] (match pattern fact env)))
(disj nil))]))
[(delay #{{}}) #{}] patterns))
Now we have to find a way to represent constraints inside environments maps.
Up to now, environment maps were mapping variable (symbols) to values. Now they are going to map to values or constraints sets. For simplicity we'll represent constraints sets by actual sets so it means that sets are not valid values any more -- we should generally consider that values can't be collections.
A key insight is that when one wants to add a constraint (say (not= a b)
) to an environment then this constraint won't be evaluated until both a
and b
are bound. Thus we can bind the constraint to the first free (unbound) variable:
(defn constrain [env [op & args]]
(let [args (map #(let [v (env % %)] (if (set? v) % v)) args)]
(if-some [free-var (->> args (filter symbol?) first)]
(update env free-var (fnil conj #{}) (cons op args))
(when (apply (case op not= not= = =) args) env))))
Now we need to consider what will happen when a value is bound to a constrained variable but, first, let's extract bind
out of match
:
(defn bind [env p v]
(let [p-or-v (env p p)]
(cond
(= p '_) env
(= p-or-v v) env
(symbol? p-or-v) (assoc env p v))))
(defn match [pattern fact env]
(when (= (count pattern) (count fact))
(reduce (fn [env [p v]] (or (bind env p v) (reduced nil)))
env (map vector pattern fact))))
When we try to bind a value to constrained variable V
, we find in the environment map a set of constraints whose first free variable is V
(by definition of constrain
). Thus if we bind V
to an actual value, we can re-add all constraints (this will either assign them to other free variables or evaluate the constraint if all variables are bound):
(defn bind [env p v]
(let [p-or-v (env p p)]
(cond
(= p '_) env
(= p-or-v v) env
(symbol? p-or-v) (assoc env p v)
(set? p-or-v) (reduce constrain (assoc env p v) p-or-v)))) ; π
If you are not convinced, see:
=> (constrain {} '(not= a b))
{a #{(not= a b)}} ; a is the first free var of the constraint, so it's the one being constrained
=> (bind *1 'a 42)
{a 42, b #{(not= 42 b)}} ; a is now bound and the partially evaluated constraint is put on b which has become the first free var
=> (bind *1 'b 33)
{a 42, b 33} ; success: b is bound to a value for which the constraint holds
=> (bind *2 'b 42) ; mind the *2: it's {a 42, b #{(not= 42 b)}} again
nil ; failure: b is bound to a value for which the constraint doesn't hold
We can now compute the siblings of Bart without having him among the results:
=> (q edb
'([s] [:sibling :bart s])
'[([:parent c p] [:father c p])
([:parent c p] [:mother c p])
([:sibling c c'] [:parent c p] (not= c c') [:parent c' p])])
#{[:lisa] [:maggie]}
I've deliberately put the constraint between the two "calls" to :parent
to insist on the fact that its position doesn't matter, that we can constrain variables without knowing the evaluation order of the query. It's a desirable property as it makes the language more declarative.
We were able to add declarative constraints to our toy Datalog without changing the code too much.
In next installments of this series I'd like to cover among other ideas negation, indices, magic sets and aggregations.
Negation and aggregation would enhance the expressiveness of our Datalog while indices and magic sets would make it practically usable.
If you like this series (or this newsletter in general) don't forget to share it widely.
(defn constrain [env [op & args]]
(let [args (map #(let [v (env % %)] (if (set? v) % v)) args)]
(if-some [free-var (->> args (filter symbol?) first)]
(update env free-var (fnil conj #{}) (cons op args))
(when (apply (case op not= not= = =) args) env))))
(defn bind [env p v]
(let [p-or-v (env p p)]
(cond
(= p '_) env
(= p-or-v v) env
(symbol? p-or-v) (assoc env p v)
(set? p-or-v) (reduce constrain (assoc env p v) p-or-v))))
(defn match [pattern fact env]
(when (= (count pattern) (count fact))
(reduce (fn [env [p v]] (or (bind env p v) (reduced nil)))
env (map vector pattern fact))))
(defn match-patterns [patterns dfacts facts]
(reduce
(fn [[envs denvs] pattern]
(if (seq? pattern)
[(->> @envs (keep #(constrain % pattern)) set delay)
(->> denvs (keep #(constrain % pattern)) set)]
[(-> #{} (into (for [fact facts env @envs] (match pattern fact env))) (disj nil) delay)
(-> #{}
(into (for [fact facts env denvs] (match pattern fact env)))
(into (for [fact dfacts env denvs] (match pattern fact env)))
(into (for [fact dfacts env @envs] (match pattern fact env)))
(disj nil))]))
[(delay #{{}}) #{}] patterns))
(defn match-rule [dfacts facts [head & patterns]]
(for [env (second (match-patterns patterns dfacts facts))]
(into [] (map #(env % %)) head)))
(defn saturate [facts rules]
(loop [dfacts facts, facts #{}]
(let [facts' (into facts dfacts)
dfacts' (into #{} (comp (mapcat #(match-rule dfacts facts %)) (remove facts')) rules)]
(cond->> facts' (seq dfacts') (recur dfacts')))))
(defn q [facts query rules]
(-> facts (saturate rules) (match-rule #{} query) set))