Why Feedback Control Interests Me + Discussion Group
I got enough positive interest in some variety of book club that I would like to move forward with that. I want to discuss Feedback Control for Computer Systems: Introducing Control Theory to Enterprise Programmers. A couple people expressed interest in discussing the query optimization book but also rightly pointed out that the ongoing CMU course somewhat obviates having a parallel discussion track. Maybe it's something we can do in the future.
I have put logistical information and a link to Discord at the end of this newsletter but I first want to explain what appeals to me about this topic and this book.
I first learned about the existence of Control from a friend who was using it to regulate resources in a distributed system. He described another coworker (whom we both respected quite a lot) as describing PID controllers as "magical." I didn't know what any of that meant but I filed it away as something to look into someday.
In the time since, I kept encountering problems that seemed like they could be solved with some kind of soft "let me push you back into the right zone" tool. So I cracked open a control theory book for engineers. But it was not written for the kind of engineer I am: it expected me to know all kinds of physics that I don't (like heat transfer). So I gave up. Then I read a bunch of this book:
Which I enjoyed, but I couldn't really tell how to turn any of it into anything that actually does anything. So I became dejected and decided the only way to learn how to actually do this in a practical way was to work with someone who had already done it or to take a class on it, and I gave up for a bit.
Through talking to people, I slowly learned that what most people mean by "control theory" is "vaguely understanding the framework of control, and then learning how to tune a PID controller." This thought really crystallized for me while reading this excellent review of a queuing theory book, when I decided that if the formal theory of queuing is decidedly not useful for this group of people, it throws the whole premise out the window. Perhaps the mythical practitioners for whom this kind of theory is actually valuable don't exist.
However! That doesn't mean the field, or even study of the field, is not valuable: perhaps we need the formalisms and the ability to write concrete mathematical problems about a field in order to understand what it is, to train our brains on what shape of problems we're solving. Then, once we've acquired the intuition, we can go off and do informal reasoning and simulation, and never think about another actual continuous-time Markov chain.
I don't know if this view is correct but it's the hypothesis I'm working off of for the time being.
A few months ago I came across this book:
You know who is an enterprise programmer? Me! So I read this book and I liked it! It was exactly what I was looking for. But I think it merits a more involved read-through and discussion and maybe a revisiting of the other, harder theory. So I would like to read through it with some NULL BITMAP readers.
To be clear: I am not any kind of expert on this topic but I would like to get closer to that. But here is my sales pitch on why this topic is cool.
Why Feedback is Good
The thing that this book finally made clear to me is that control isn't some weird framework built on generating functions and strange abstractions (those might be useful tools within it, though). If you start from certain premises, all of the vocabulary and terminology and setting of control are very natural. It's a cliche example but the one that motivated everything for me in the cleanest way was "how does an air conditioner keep a room at a particular temperature."
The way an air conditioner cools things down is vaguely: you push some amount of power into the cooling mechanism, and that spits out some vaguely proportional amount of coldness. So, given the target temperature for an air conditioner, how much power should you send?
If asked to solve this problem, my mindset might have been to start from a position of "let me do a bunch of experiments and control all the variables and that will let me understand the relationship between the amount of power that I push into the system and the intensity of cold air that's pushed out." Basically: you build a model.
This approach is sort of doomed to fail, though: the cooling units produced for air conditioners aren't uniform enough that the experiments you perform on one of them will carry over exactly to another one. Even the same unit will degrade over time so that it requires more power to achieve the same amount of cold output. What if someone doing repairs replaces the cooling unit with a different make that has slightly different behaviour?
What's worse, the amount of power required to cool a larger room will be more than the amount of power required to cool a smaller room. Or a room with a window open. Or a room with a second air conditioner in it. Or a room in the summer versus a room in the winter:
There are simply too many variables involved to build a model that works correctly in all the situations that someone might want to use an air conditioner in. The approach is flawed. We need a tool that is robust against that kind of variance. So how can you approach this problem in a way that incorporates all of the possible variables that could influence the output? You step out in front of them by giving the air conditioner the ability to monitor the actual temperature of the room:
This is, in some sense, incorporating the only real truth that exists: what the actual state of the outside world is. Get used to this shape of diagram because it shows up a lot.
Everything else about control (z-transforms, PID, whatever) is a consequence of this chain of reasoning, not the starting point.
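To make the contrast concrete, here is a minimal sketch of the two approaches on a toy room. Everything in it is an assumption made up for illustration and none of it comes from the book: the room model (heat leaks in from outside, the AC removes heat in proportion to power), the `leak` and `cool` coefficients, and the crude "nudge the power in the direction of the error" feedback rule. It also ignores that a real AC can't run at negative power. The open-loop setting is calibrated on room A and then reused in room B; the feedback rule never gets told anything about either room.

```python
TARGET = 22.0

def simulate(leak, cool, outside, choose_power, steps=5000, dt=0.1):
    """Toy room: heat leaks in from outside, the AC removes heat in proportion to power."""
    temp, power = outside, 0.0
    for _ in range(steps):
        power = choose_power(temp, power, dt)
        temp += dt * (leak * (outside - temp) - cool * power)
    return temp

# Two made-up rooms with different (and, to the controller, unknown) characteristics.
ROOM_A = dict(leak=0.1, cool=0.5, outside=30.0)
ROOM_B = dict(leak=0.3, cool=0.2, outside=30.0)

# "Build a model": solve for the steady-state power that hits TARGET in room A.
CALIBRATED_POWER = ROOM_A["leak"] * (ROOM_A["outside"] - TARGET) / ROOM_A["cool"]

def open_loop(temp, power, dt):
    # Ignores the measured temperature entirely.
    return CALIBRATED_POWER

def feedback(temp, power, dt, gain=0.05):
    # Too warm -> push power up a bit; too cold -> back off a bit.
    return power + gain * (temp - TARGET) * dt

results = {
    (name, fn.__name__): simulate(**room, choose_power=fn)
    for name, room in (("A", ROOM_A), ("B", ROOM_B))
    for fn in (open_loop, feedback)
}

for (room, strategy), final_temp in results.items():
    print(f"room {room}, {strategy}: final temperature {final_temp:.2f}")
```

In this toy setup the calibrated power is right for room A and badly wrong for room B, while the feedback rule settles at the target in both without knowing which room it is in.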
I think this solves two general shapes of problems at once, and these are the kinds of problems that pushed me into being interested in this stuff:
- We don't really know what the relationship is between the amount of power pushed into the system and the amount of coldness that comes out. We know there's a positive relationship, in that more power equates to more coldness, but beyond that, it's a little unclear and sort of fundamentally unknowable due to things like manufacturing variation.
- We'd like the system to be able to adjust to a changing environment. If it gets colder outside, the AC needs to work less hard, and vice versa.
It's easy and fun—really fun! If you don't know how to get started you can even ask LLMs to write you these simulations and then screw with the numbers and have a grand old time—to simulate this kind of thing. Here is some code that simulates an air conditioner along with a PI controller feedback loop to try to adjust its power level. Here's a plot of the simulation that moves the target temperature partway through and how the system responds.
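If you want something to start screwing with right away, the following is a minimal sketch of that kind of simulation. It is not the code linked above, just an illustration under assumptions: the room model and its constants are invented, the `PIController` class and its `kp`/`ki` gains are hypothetical names and values I picked by hand, and the target moves from 24 to 21 halfway through the run.

```python
from dataclasses import dataclass

@dataclass
class PIController:
    kp: float  # proportional gain: react to the current error
    ki: float  # integral gain: react to the accumulated error
    integral: float = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral

def run(steps=6000, dt=0.1):
    # Made-up room: heat leaks in from outside, the AC removes heat per unit of power.
    outside, leak, cool = 30.0, 0.1, 0.5
    temp = outside
    controller = PIController(kp=0.5, ki=0.05)
    history = []
    for step in range(steps):
        # Move the target partway through the run.
        target = 24.0 if step < steps // 2 else 21.0
        error = temp - target  # positive when the room is too warm
        power = max(0.0, controller.update(error, dt))  # an AC can't run at negative power
        temp += dt * (leak * (outside - temp) - cool * power)
        history.append((step * dt, target, temp, power))
    return history

history = run()
for t, target, temp, power in history[::500]:
    print(f"t={t:6.1f}  target={target:.1f}  temp={temp:6.2f}  power={power:5.2f}")
```

Try changing `kp` and `ki`, or the room constants, and watch how the response changes. The `history` list holds `(time, target, temperature, power)` tuples, so it is easy to hand to whatever plotting tool you like.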
Lots of computer and specifically database-related things have these characteristics:
- There's some kind of vaguely positive relationship between the number of workers doing some IO-bound task and the throughput.
- There's some kind of vaguely positive relationship between the number of servers available to serve a request and how quickly a given request is served.
- There's some kind of vaguely positive relationship between the amount of memory allocated to a buffer pool and the hit rate for queries.
- There's some kind of vaguely positive relationship between the maximum level of queuing in a system and the average latency a successful query experiences.
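As a sketch of how the first of these might look with feedback, here is a toy loop that adjusts a worker count until the observed throughput reaches a target. All of it is hypothetical: `measured_throughput` is a made-up stand-in for a real measurement (with diminishing returns per worker), and `adjust_workers`, the gain, and the limits are names and numbers invented for illustration. The controller only relies on "more workers means more throughput," not on the shape of the curve.

```python
def measured_throughput(workers: int) -> float:
    """Hypothetical stand-in for a real measurement: diminishing returns per worker."""
    return 1000.0 * workers / (workers + 8.0)

def adjust_workers(target: float, steps: int = 200, gain: float = 0.01,
                   max_workers: int = 64) -> int:
    # The controller's state is a float; the pool would use the rounded value.
    workers = 1.0
    for _ in range(steps):
        observed = measured_throughput(round(workers))
        error = target - observed  # positive when we are too slow
        # Nudge the worker count in the direction of the error, within limits.
        workers = min(float(max_workers), max(1.0, workers + gain * error))
    return round(workers)

chosen = adjust_workers(target=600.0)
print(f"workers chosen for a target of 600 units/s: {chosen}")
```

Because worker counts are whole numbers, a target that no integer count hits exactly would likely leave a loop like this hovering between two neighboring counts rather than settling, which is one of the practical wrinkles that makes this more than a one-liner.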
These problems all resemble each other, but there isn't a one-size-fits-all approach! Once you identify that using feedback is the way to solve a problem, there are many knobs and tricks and tools to get to a working solution. And I don't know them! I don't really know what I'm doing here. I would classify it broadly as "screwing around," and I would like to crystallize my understanding a little better, maybe build some more real-looking programs and more sophisticated simulations that incorporate feedback in this way. And I think discussing this stuff with other interested (and maybe more informed) people is the best way to do it.
Logistics
My ideal structure for this is the following: starting in March, I will read one or two chapters of the book each week (they are fairly tight, so I'm leaning towards two), or every other week, and I will share some thoughts and some code. Code, because I think this is a topic that lends itself very well to getting your hands dirty, writing simulations, and tweaking numbers to see how things behave. I would love it if any readers would also share their thoughts and their code, and I have two ways I'd like you to do this.
One: I would love some medium-to-long form writing of your thoughts on any particular chapter. These could be in whatever form you find convenient—email them to me and I can post them here, or, even better, blog about them yourself so I can link to them (even BETTER if you start a blog in order to write them!).
Two: I would like some kind of more real-time discussion. I thought a bit about the best way to do this, and while the officially endorsed NULL BITMAP place-online-to-go-to-discuss-databases is Phil Eaton's software internals, in the interest of having a place to discuss more specifically topics related to this book club and maybe topics of interest specifically to newsletter readers I have made a NULL BITMAP Discord.