Hard is Concurrent Programming (fuck, it's hot)
12:22 PM.
I'm sitting outside on a cafe patio. Depending on what app you ask, the temperature ranges from a breezy 28 °C to 34 °C. Vancouver is in the midst of a historic heat wave, bringing temperatures we've never seen before. I'm lucky enough to live in a basement, so I haven't felt too much of the heat. Whenever I go outside, however, I definitely feel it.
Apparently, the temperature is supposed to crawl up to around 40 °C on Monday, with a "feels like" temperature in the high 40s. This is pretty scary; I don't think I've felt temperatures this high since my last trip to Korea in 2010. What I worry about now is whether this type of "historic heat wave" will become the norm. It certainly looks like it's heading that way, given how extreme temperatures are becoming a fact of life.
I'll be in the lab tomorrow. Twitter has an org-wide "day of rest," and I feel like spending some time in the air-conditioned paradise of the lab would be a good use of my time.
Last week, I spent some time writing up some very simple experimental code that I'm going to use for my project. The general idea is to make a logging/analytics framework for our team's services. This involves me sending messages to Twitter's instance of HDFS (Hadoop Distributed File System) across their data centers. Of course, this involves a fun bit of concurrent programming. By "fun bit," I really mean me pulling my hair out only to realize that I was being a complete idiot.
The issue
Here's some dummy code to describe the general problem I ran into:
import com.twitter.util.Future

trait Scribe {
  // Kicks off a write; the Future completes once the message has been published
  def publish(msg: String): Future[Unit]
}

class DataCenterScribe(logCategory: String) extends Scribe
What I've done here is describe a publisher to HDFS. I don't want to couple any code to a specific implementation of a publisher, so I've defined a top-level Scribe trait that I'll use for all touch-points.
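To make that decoupling concrete, here's a minimal sketch of what another implementation could look like, say a console-backed one for local testing. ConsoleScribe is a name I've made up for illustration; it isn't part of the real code:

import com.twitter.util.Future

// Hypothetical Scribe that prints to stdout instead of writing to HDFS
class ConsoleScribe(logCategory: String) extends Scribe {
  def publish(msg: String): Future[Unit] = {
    println(s"[$logCategory] $msg")
    Future.Done // already satisfied, since this "write" happened synchronously
  }
}

Anything that takes a Scribe doesn't care which of these it gets. Below is basically the executable that I've written to test out this publisher: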
object Main {
  def main(args: Array[String]): Unit = {
    val testCategory = "test_publishing"
    val dataCenterScribe = new DataCenterScribe(testCategory)
    val message = "hello"

    // Publish messages
    dataCenterScribe.publish(message)
  }
}
See any problems? Maybe you do, because you're smart. I'm dumb, so I didn't see any issues with this. Imagine my frustration when I saw some messages show up while others got completely lost.
The issue, really
After a day or two of debugging on HDFS, including steps like:
- Making sure my VPN was fast enough.
- Making sure my ssh tunnel was stable enough.
- Generating a binary, scp'ing it to an HDFS machine in the data center, and executing it on the metal.
I soon realized my issue. The return type of Scribe#publish is Future[Unit], which means it's an asynchronous operation. Whenever I ran Main, there was absolutely no guarantee that the call to Scribe#publish would finish before the main thread terminated. In fact, it was basically a miracle that I saw some messages show up on HDFS at all, since the main thread is such a short-lived process.
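To see why, here's a minimal sketch of the failure mode. SlowScribe is a toy publisher I've made up for illustration (it's not the real DataCenterScribe); its "write" happens on a daemon thread, so nothing keeps the JVM alive while it finishes:

import com.twitter.util.{Future, Promise}

// Toy publisher whose write completes later, on a daemon thread
class SlowScribe extends Scribe {
  def publish(msg: String): Future[Unit] = {
    val done = new Promise[Unit]
    val worker = new Thread(() => {
      Thread.sleep(500) // pretend this is the network hop to HDFS
      println(s"published: $msg")
      done.setDone()
    })
    worker.setDaemon(true) // a daemon thread won't keep the JVM running
    worker.start()
    done
  }
}

object RaceDemo {
  def main(args: Array[String]): Unit = {
    new SlowScribe().publish("hello") // fire-and-forget: the Future is dropped
    // main returns immediately; the JVM exits and the message is lost
  }
}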
Years of TA'ing a course that's heavy on async programming with JavaScript, and I got absolutely roasted by the JVM. Thank you, JVM, very cool.
The solution*
For longer-lived JVM processes, this isn't an issue. The message would easily finish publishing to HDFS and things would be great. Unfortunately, since this is some experimental code that just verifies whether we can write stuff to HDFS, I'd have to use some unorthodox methods.
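For what it's worth, in that long-lived world you'd just keep composing on the Future rather than blocking. A rough sketch of what that might look like, assuming Twitter's Future combinators (onSuccess/onFailure from com.twitter.util.Future); publishWithLogging is a made-up helper name:

import com.twitter.util.Future

// Sketch: hand the Future back to the caller instead of dropping it
def publishWithLogging(scribe: Scribe, msg: String): Future[Unit] =
  scribe.publish(msg)
    .onSuccess(_ => println(s"published: $msg"))
    .onFailure(e => println(s"failed to publish '$msg': ${e.getMessage}"))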
Here, though, using blocking I/O in the form of Await was the solution. I'm not concerned at all about blocking/performance issues, since I'm just running the equivalent of a command-line app. Here's the code that made sure to block until I was done writing to HDFS:
import com.twitter.util.Await

object Main {
  def main(args: Array[String]): Unit = {
    val testCategory = "test_publishing"
    val dataCenterScribe = new DataCenterScribe(testCategory)
    val message = "hello"

    // Publish messages and BLOCK until the write completes
    val futureWrite = dataCenterScribe.publish(message)
    Await.result(futureWrite)
  }
}
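If this workaround sticks around, I'd probably give the block an upper bound so a bad network day doesn't hang the app forever. Twitter's Await.result has an overload that takes a timeout; the 30 seconds below is an arbitrary number I picked for illustration:

import com.twitter.util.{Await, Duration}

// Give up (with a TimeoutException) after 30 seconds instead of waiting forever
Await.result(futureWrite, Duration.fromSeconds(30))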
With the blocking call in place, all messages show up on HDFS and now I'm cooking with gas. The next few weeks will be a bit more involved, code-wise, so I'm excited to see how that'll work out.
Stay hydrated!
*not really a solution, it's more of a workaround.