The Official NULL BITMAP Glossary: Graph Theory Edition
Last week a post of mine made it to God’s favourite website and one thing I was struck by was how many people disagreed about basic graph theory terminology. Some people also seemed to complain about the lack of an introduction. To be clear: I have no intention of changing the expositional style of NULL BITMAP, at least explicitly or consciously, to satisfy anyone’s tastes but my own. NULL BITMAP is definitionally for you if you enjoy it and there isn’t really another criteria I have in mind. NULL BITMAP is also decidedly intended to be "entertainment" and not "educational," which influences how much exposition I am willing to devote to background knowledge.
That said, this week we are going to do an official NULL BITMAP glossary of terms related to graph theory. Because I thought that would be fun. And I want to relive my peak glory days of taking a graph theory class. These are the terms and definitions I use and I generally try to use them consistently. If you know these ideas by different names or these names by different ideas then you are valid, you are heard, but in my house, you are wrong.
A simple graph is a pair (V, E)
of a vertex set and an edge set. The edge set is a set of unordered pairs of vertices. We can draw a graph like so:
In the words of my graph theory professor Kevin, “the definition is what a graph is. You can not ever forget that. But when I and everyone else in the world think of a graph, we think of this picture.”
That guy actually shaped a whole lot of my personal mathematical thinking. I'm not sure if he realized that or if that was his goal or what, but I can't think of a professor I had that had a bigger influence on how I think about math and programming and problems in general. I remember one time he was telling us what an equivalence relation was. And he was like, man, everyone always says oh, yeah, you gotta have reflexivity, transitivity, commutativity, all this crap (I'm paraphrasing). That's not way to think about it. Look, an equivalence relation just says, you got a bunch of buckets. Every one of your objects, you toss it in a bucket. The buckets are called equivalence classes and two objects in the same bucket are equivalent. Done.
You should generally strive to have this kind of understanding of mathematical ideas (this is not an uncommon belief): you need to understand the underlying machinery to resolve times when your intuitions fail you, but for actual day-to-day thinking about ideas we operate on abstractions. Anyway, Kevin was cool. Shouts out to that guy.
Generally when we say “graph” devoid of context and without any other qualification, we mean a simple graph. It’s pretty easy to see how to tweak this definition to get other kinds of graphs, though. A directed graph changes the edge set to be over ordered pairs, rather than unordered pairs. A hypergraph says that the edges can be sets of any size (and corresponding edges are called hyperedges).
An edge is said to be incident to the vertices it contains. Two vertices are adjacent if there is an edge joining them. The number of edges incident to a given vertex is its degree (in a directed graph, we distinguish between the indegree and outdegree).
A loop is an edge which starts and ends at the same vertex. Two edges that cover the same vertices are called parallel. Note that neither of these can occur in a simple graph, but are natural in some settings.
A walk is a sequence of vertices where each pair of subsequent vertices are adjacent. A path is a walk that does not repeat any vertices. A cycle is a walk where the only repeated vertex occurs at the first and last position.
In a simple graph, “there is a path from vertex u to vertex v” is an equivalence relation. Such an equivalence class is called a component of a graph.
A graph is called connected if it has exactly one component, or equivalently, if there is a path between every pair of vertices.
A graph is called acyclic if it does not contain any cycles. A simple acyclic graph is called a forest. A connected forest is called a tree. If you take a tree and pick your favourite vertex and call it the root, the result is a rooted tree.
A conjunctive query is a particular kind of database query, in practical terms, it’s a join of a set of relations combined with a set of equality predicates. The predicates can either equate two columns or equate a column and a constant.
We have to tie this back to databases somehow, so let's keep going.
The query graph of a conjunctive query is constructed by setting the vertex set of the graph to be the set of relations being joined and the edge set to be all pairs of relations which have a predicate that relates them. In general this requires the use of hyperedges to fully represent, but you can get a usable definition with just a simple graph.
Since joins are commutative and associative, we can compute the query graph piece-by-piece by picking our two favourite relations and joining them, and replacing them with a new relation. This transforms the query graph by way of identifying vertices (note that the verb “to identify” here means “to consider identical” rather than “to recognize” or something). We assert that two vertices are now the same vertex, and the new combined vertex is a part of all the edges both of the previous ones were. This is like a quotient, if you are familiar with that from abstract algebra.
The shape of the query graph says a lot about how easy a given join is to compute. If the query graph is a tree, it's much simpler to compute efficiently. If it contains cycles, we need much heavier machinery to do so.
A join order is a sequence of vertices to be identified. A join order without cross products is one in which we only ever identify adjacent vertices.
I will update this issue if necessary. I wanted to make sure we are on the same page because we will talk about graphs in the future and I wanted to make sure we are all on the same page here and I felt like telling that equivalence relation story about Kevin. Have a good one.