Keeping Things Simple Is Not Always Simple, If Ever
Looking at our AWS bills, one particular line stood out like a sore thumb: data transfer. It seemed way out of proportion.
“Doh!” I hear you bemoan, “everyone knows that the AWS data transfer bill is always larger than you expect.” However, this wasn’t our production account, which actually does have significant usage and whose costs are in line with what we expected. This was in our QA and DEV environments.
So we have this setup in which there are these two services that communicate a lot with each other, but they happen to be in different execution units (pods in Kubernetes parlance). This means they might get spun up in different Availability Zones (which are different data centers within an AWS region), which in turn means they are subject to the data transfer charges for cross-AZ traffic.
The reasons why they communicate so much and why they are separate execution units are another story and not relevant to this one. There are good and bad reasons for all of this, but nothing that currently justifies a complete rewrite.
But anyhow
The data transfer between AZs costs real money. Not an awful lot, but enough that if you do silly things you easily create an opportunity to optimize costs. In this case, all I needed was one pretty straightforward algebraic calculation to justify spending a couple of hours cutting the excessive data transfer. Add to that the fact that this particular data transfer is basically unnecessary.
A quick back-of-the-envelope calculation of the data transfer rate between these two components gave me the hunch that the excessive babbling between these two deployments could be the cause of the data transfer cost.
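To put some illustrative, made-up numbers on it: cross-AZ traffic within an AWS region is typically billed at around $0.01 per GB in each direction, so roughly $0.02 per GB in total. If two chatty components exchange, say, 50 GB a day, that’s about a dollar a day, or on the order of $30 a month, per environment. Not a fortune, but for traffic that has no real need to cross an AZ boundary, it is still money thrown away.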
Yes, I know, that was the first mistake.
Of course, you need to make sure that this is actually the case. Luckily, AWS has a nice UI to explore the costs. Yes, it is called the AWS Cost Explorer. And yes, it is easy to plot the cost of data transfer as a function of time.
In this case I didn’t need to be a regular Sherlock to figure out that those spikes did in fact coincide with the days when comprehensive load testing was going on. It was actually the load tests that generated the data transfer bill. (That’s yet another story.)
Unfortunately I found this out only after I had taken a small sidestep. Fortunately, the sidestep did have an extra lesson in it anyhow.
The Attempted, but Wrong Solution
Now, you can run both of these components as containers in the same pod, which would mean they could communicate inside the pod and therefore would not generate traffic between AZs, but in this case this type of rewrite wasn’t a viable option. (Curiously enough, I don’t remember all the details of why this was the case, but let’s just say that it wouldn’t have been the most natural coupling anyhow. That’s not important now. We can pretend that this was the case if it really wasn’t – if you know what I mean.)
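For illustration only, here is a minimal sketch of what that kind of co-location could look like – the names and images are hypothetical, not our actual services:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatty-pair                    # hypothetical name, for illustration
spec:
  replicas: 1
  selector:
    matchLabels:
      app: chatty-pair
  template:
    metadata:
      labels:
        app: chatty-pair
    spec:
      containers:
        - name: service-a              # the two chatty components run side by side...
          image: example.com/service-a:latest
        - name: service-b              # ...and can talk to each other over localhost
          image: example.com/service-b:latest
```

Both containers always land on the same node, and therefore in the same AZ, so the chatter never leaves the machine.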
But yeah, there’s also the concept of affinity in Kubernetes. It means that you can define two different Deployments to have an affinity or an anti-affinity, the former meaning that the scheduler tries to, for example, put their pods in the same Availability Zone, and the latter that it tries to keep them apart.
I’m sure there are situations where using affinities makes sense, but let’s just say that this wasn’t one. If you look at the documentation, it is pretty complicated conceptually. I mean, it doesn’t require a lot of configuration, but I didn’t get the feeling that I really knew what was actually going to happen if a Deployment’s affinity rule was preferredDuringSchedulingIgnoredDuringExecution. I can make guesses, but do I really know what happens in each possible scenario? I doubt it.
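For reference, such a rule looks roughly like the sketch below – a minimal example with a hypothetical app: service-a label, not our actual manifests:

```yaml
# Goes into Deployment B's pod template spec: prefer scheduling B's pods into
# the same Availability Zone as pods labelled app: service-a (hypothetical label).
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: service-a
          topologyKey: topology.kubernetes.io/zone
```

The requiredDuringSchedulingIgnoredDuringExecution variant turns that preference into a hard constraint instead.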
But since it was a quick job to try this out, I nevertheless decided to make these rules and see if it worked out. It took some effort to make them work in the first place, but eventually there they were: always in the same AZ. Nice.
But! There’s an additional complication.
The system also has an autoscaling system (provided by the platform) which spawns new instances of the components that are the performance bottlenecks, and new nodes for the cluster if need be. It works just fine.
It also has a feature of scaling down when there’s no need for the extra capacity, so it packs the execution units onto as few machines as possible. Therefore the situation usually is such that the machines (or nodes) are pretty much fully utilized. It’s like a game of Tetris that the autoscaler is playing there in the background.
So when a Deployment is installed, sometimes the autoscaler needs to spin up a new machine onto which the deployment gets scheduled. And this seems to work pretty much flawlessly.
Except in the case of my Deployments with these affinities. I managed to create a situation where Deployment A would fit the current capacity, but when Deployment B had an affinity to A, neither of them was scheduled! The autoscaler/scheduler just insisted that these two Deployments didn’t fit, but didn’t think that a new machine needed to be provisioned. This is unacceptable, because this situation might occur when there’s no one around to fix it manually.
Now, it might have been a misconfiguration on my part, but the reason I backed out of this solution was the uneasy feeling I got from introducing this type of complexity for a thing that is usually so simple and seems to work quite well. It didn’t feel right. Fortunately I did take a better look at the bill, which suggested that this type of fix wasn’t actually needed.
I don’t know the real moral of the story, but it has something to do with teamwork and not sticking with the first solution that comes to mind. So there.
It would be nice to be able to say that I’m such a strict adherent to simplicity that I did this out of principle, but the fact is that I got kind of bailed out by the revelation that the cross-AZ traffic wasn’t the cause of the excess cost. The load tests were.
But I do think that when you get that feeling of introducing too much complexity, and especially if you are not quite sure how the system as a whole will behave, you should take extra steps to think the situation through. Visualize it, discuss it with your colleagues. Try to find simpler solutions. Keep on googling. Don’t go for the first solution that comes to mind.
I don’t think you can always find a solution that you can honestly call simple. But I think you should feel at least a little guilty for introducing complexity that is not absolutely necessary. I know I do. Every single day.