Trusting Software and Developer Workflows


                July 25, 2021

This past week, I spent roughly three to four hours trying to make sense of some strange behaviour I'd been seeing in my code.
I'm currently developing a generic logging framework. As part of that work, I'm using a pre-made logging pipeline library. This library uses the pub/sub pattern, so clients can publish events, buffer them, and close a connection as needed. Here's where things get a little interesting. I'm using this library in a short-lived JVM process, so I expect it to finish within a couple of minutes at most (usually a minute and a half).
As soon as I began to use the library, the JVM process would hang instead of exiting, unless I added an explicit sys.exit(0) call (which isn't a great practice). It turned out that I was forgetting to call the close(...) method on the publisher after I was done.
I added a call to close(...) that runs once the Future[...] returned from publishing has completed. But the same issue persisted. The JVM process was still hanging, even after I made sure that close(...) was being called. This made me call into question my entire understanding of Scala's multithreading/Future model, as well as my understanding of the JVM's threading model.
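For anyone who hasn't run into this before, here's a minimal sketch of the pattern I was aiming for. Everything in it is an assumption for illustration: SketchPublisher, publish, close, and isClosed are made-up names and not the real library's API. The sketch only shows one common reason a JVM refuses to exit (a pool of non-daemon threads that is never shut down) and how chaining close() onto the publishing Future releases it. I'm not claiming this is what the library does internally.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for a publisher; NOT the real library's API.
final class SketchPublisher {
  // Executors.newFixedThreadPool creates non-daemon threads by default,
  // so the JVM cannot exit on its own while this pool is still running.
  private val pool = Executors.newFixedThreadPool(1)
  private implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(pool)

  // Pretend to publish an event; the result is just the payload length.
  def publish(event: String): Future[Int] = Future(event.length)

  // Shutting the pool down lets its worker thread terminate.
  def close(): Unit = pool.shutdown()

  def isClosed: Boolean = pool.isShutdown
}

object CloseAfterPublish {
  def run(): (Int, Boolean) = {
    // The global execution context uses daemon threads, so running the
    // callback on it does not keep the JVM alive by itself.
    import ExecutionContext.Implicits.global

    val publisher = new SketchPublisher

    // andThen runs close() whether publishing succeeded or failed, and the
    // Future it returns completes only after that callback has run.
    val done = publisher.publish("hello").andThen { case _ => publisher.close() }

    val published = Await.result(done, 5.seconds)
    (published, publisher.isClosed)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With a publisher shaped like this one, main returns and the process exits without needing sys.exit(0). If the pool were never shut down, the same program would sit there indefinitely, which is the symptom I was seeing.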
After spending a couple of hours debugging my own code, I decided to drop into the library to see what was up. This meant stepping through decompiled Java bytecode, with variables like var1, var2, ..., varN. It turns out the bug wasn't in my own code at all: the error was lower down, inside the library itself.
Trusting Trust
In "Reflections on Trusting Trust," his Turing Award lecture published in 1984, Ken Thompson asked "to what extent should one trust a statement that a program is free of Trojan horses?" Obviously, comparing the issue I found in the underlying logging library to a Trojan horse would be a bit of a stretch, but I still think the general sentiment is the same: how far can you trust code you didn't write? This bug may not have caused a serious security vulnerability, but I can't have been the only person affected by it. Debugging code is expensive, and going through the whole workflow of reporting a bug, investigating it, patching it, and deploying the fix to production is even more expensive.
Developer Workflows
Something else I've been thinking about is the software development process, and how poorly it's documented. There are an incredible number of artifacts that are affected or produced by developing a feature or investigating defects. At a very granular level, a developer can expect to review and create artifacts like:

source code files in the IDE
conversations in Slack or other IM services (Skype, MS Teams, or, god forbid, Atlassian HipChat)
Jira tickets
README files

Distributed across some or all of these artifacts is a fragmented history of how a design decision or a software feature came to be. The fact that a developer often has to traverse these artifacts manually is annoying at best. I wonder if there's a way to automate artifact collection to piece together a general workflow of how a development task was completed.
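To make the idea a bit more tangible, here's a rough sketch of what the very first step could look like: normalise every artifact into one record shape, then group by task and sort by time. All of the names here (Artifact, ArtifactSource, Timeline, taskId) are hypothetical, and the sketch assumes each artifact can already be tagged with a task identifier and a timestamp, which is probably the hard part in practice.

```scala
import java.time.Instant

// Hypothetical tags for where an artifact came from.
sealed trait ArtifactSource
case object Commit extends ArtifactSource
case object ChatMessage extends ArtifactSource
case object Ticket extends ArtifactSource
case object Readme extends ArtifactSource

// One normalised record per artifact, whatever tool produced it.
final case class Artifact(
  source: ArtifactSource,
  taskId: String,   // assumed: some identifier shared across tools
  at: Instant,      // when the artifact was created or changed
  summary: String
)

object Timeline {
  // Group artifacts by task, then order each group chronologically.
  def byTask(artifacts: Seq[Artifact]): Map[String, Seq[Artifact]] =
    artifacts
      .groupBy(_.taskId)
      .map { case (id, items) => id -> items.sortBy(_.at.toEpochMilli) }
}
```

Reading the resulting sequence for one task from top to bottom would give a crude history of how that task unfolded, without having to hop between tools by hand.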
