Bill Kentstravaganza
Bill Kent is one of the unsung greats of software engineering. I get why Data and Reality is obscure: the 2nd edition is out of circulation and the 3rd edition (which he didn't write) is a pale imitation of the classic. It's only because of a random GitHub repo that we even have access to the 2nd edition. Read it. It's good. Very good.
So that's why D&R is obscure. But what about the rest of his stuff? It's all online on his website. Anyone can read it! I think people are scared off because it's all on legacy tech from ~1990. Not exactly stuff useful to us today!
Reminder: Data and Reality is from 1978. His writing is timeless. That said, it's also in much rougher shape than D&R. Here's a quick rundown of the highlights. Most of it focuses on Kent's specialties, which are 1) data in information systems, and 2) object databases. In no particular order:
-
What is an Object?: "This is a 'naive' exploration of object orientation. On the face of it, it could simply be a tutorial for the uninitiated. But it is also a sanity check for practitioners." If I have to describe Kent's style, it's exactly that: 'naive' explorations that question our first principles. In this essay, Kent builds up the motivation behind objects (encapsulation and polymorphism) and then all of the consequences that follow from it, such as the distinction between subjects and representation, between construct and 'the world'. Important here is the 'inversion of concepts': how much of our conception of an 'object' is metaphoric sugar versus necessity? My favorite quote from this:
We could turn our mental model of state inside out. Instead of state being in an object, we can say that the information system as a whole has a state in which objects are embedded. The state of any object is some subset of the whole state, but the states of individual objects often overlap. The image starts to look more like overlapping envelopes in a graph.
See also Fundamental Concepts. I haven't thoroughly read it, but it seems a deeper treatment of the same ideas.
-
The Semantics of Object Identity: Kent starts from "what does it mean for an object to have an identity" and derives the consequences of equality and a whole bunch of other stuff. My favorite thing here is how he relates consistent-identity to subtyping relations. If two jobs held by the same person are represented by two Employee objects, then Employee cannot subtype-inherit from Person. Otherwise you have inconsistent equality.
-
The Breakdown of the Information Model in Multi-Database Systems: How almost all of our assumptions about how data works fall apart in the face of multiple database systems. No more identity! No more database constraints! No more clear delineation between subject and representation! See also Spheres of Knowledge, one of his attempts at dealing with this.
-
A Simple Guide to Five Normal Forms in Relational Database Theory: Exactly what it says on the tin. Kent explains it better than any of the tutorials out there. And he doesn't treat normalization like it's always the right choice: he talks about the tradeoffs, too. Point of note: Edgar Codd (as in "the inventor of the relational model") edited this essay.
-
A Taxonomy for Entity-Relationship Models: There is no one "ER model", but hundreds of equally-valid "variants", based on how you treat implicit assumptions. Kent lists 45 such choices. The goal isn't to figure out which choices are "right" but to get people to be explicit about which ones they make. While the essay is specifically about the ER model, its core thesis applies to all diagrams and notations.
-
Profile Functions And Bag Theory: An attempt to reconstruct relational databases on a multiset basis instead of a set basis, so as to logically handle duplicate data. Unsuccessful, but still an interesting mathematical experiment. See also The Hyper-Join.
-
The Type and Class Definition Game: I'm just gonna let Kent summarize this one.
Here's a parlor game you can play at your next object-oriented party. The goal is to intimidate as many people as you can into accepting your definitions.
The Playing Pieces are a set of statements (shown later on) containing the words "type", "class", and "group". "Group" is a wild-card word which can be replaced by either "type" or "class" to generate one or more new statements. Your first goal is to generate all the statements which are true about types and classes. That part is easy, of course, since you know all the right answers. Your second goal is to achieve consensus with the other players, getting all to agree on the same set of true statements. This part will also be easy, assuming that everyone else knows the same set of right answers (hah!).
-
About Time: It's like one of those "Falsehoods Programmers Believe About Time" listicles, except it actually covers the fundamentals of time and why it's so hard to model. Favorite quote:
There are two sequences of states: the real world and the database. A single state of the database describes many states of the real world. That's precisely the significance of memory: information about many past states of the real world can be retrieved from a single present state of the data base.
See also his shorter summary, The Essence of Time.
-
The Many Forms of a Single Fact: 36 different ways to represent "A salesman serves a territory" in a database. Some are better ideas than others, but you're still likely to encounter the "worse" ideas in the real world. Some discussion of the consequences of having many possible representations, even if only one is used in a system.
-
The Null Wars: Much Ado About Something: An argument for why "null" is distinct from "void" and the challenges of making that model coherent. Sure it sucks, but the alternatives are even worse (to him).
-
Measurement Data (Archive Report): One of the best finds in the archive. Discussion on the challenges of representing measurements, both fundamental (different-but-compatible units, dimensional analysis pitfalls) and specific (angle addition is sometimes cyclic). Abandoned as a rough draft, still excellent. See also My Height: A Model For Numeric Information, which is complete but much smaller in scope.
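To make the measurement problems a bit more concrete, here is a minimal sketch in Python. It is not taken from Kent's report: the Length class, the METRES_PER_UNIT table, and the add_heading and add_rotation helpers are all hypothetical names made up for illustration. It only shows two of the issues mentioned above, namely adding values in different-but-compatible units, and angle addition that wraps around in some contexts but not in others.

```python
from dataclasses import dataclass

# Hypothetical conversion table (an assumption for this sketch): metres per unit.
METRES_PER_UNIT = {"m": 1.0, "cm": 0.01, "in": 0.0254}


@dataclass(frozen=True)
class Length:
    """A number that is meaningless without its unit."""
    value: float
    unit: str

    def to(self, unit: str) -> "Length":
        # Go through metres so any two known units can be converted.
        metres = self.value * METRES_PER_UNIT[self.unit]
        return Length(metres / METRES_PER_UNIT[unit], unit)

    def __add__(self, other: "Length") -> "Length":
        # Different-but-compatible units: convert the right operand
        # into the left operand's unit before adding the raw numbers.
        return Length(self.value + other.to(self.unit).value, self.unit)


def add_heading(a_deg: float, b_deg: float) -> float:
    """Compass headings are cyclic: the sum wraps around at 360 degrees."""
    return (a_deg + b_deg) % 360.0


def add_rotation(a_deg: float, b_deg: float) -> float:
    """Accumulated rotation is not cyclic: 370 degrees is more than a full turn."""
    return a_deg + b_deg


if __name__ == "__main__":
    print(Length(1, "m") + Length(50, "cm"))
    print(add_heading(350.0, 20.0))
    print(add_rotation(350.0, 20.0))
```

The same two numbers, 350 and 20, add up to different answers depending on what kind of angle they measure, which is roughly why a bare float column is not enough to represent a measurement.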
That's just a few of the most obviously-interesting things he wrote. You can read the rest of the archives here. Be aware that some of his earlier stuff is aggregated or elaborated on in his later essays. Happy reading!