Honing In
It's another cloudy Sunday (classic Vancouver), and I'm back at my usual cafe. There's a staff member who must have overheard me speak Korean once, because they always speak to me in Korean, regardless of who they were helping before. I guess it gives me a chance to practice my very bad Korean (I usually only speak it when I'm calling my mom), although I'm not getting much practice out of it, since "coffee" sounds like "coffee" in Korean.
I had a meeting with Gail last week to discuss the papers I've read, and I think we've found something that I can investigate for the rest of my time here. I've mentioned this in a previous newsletter, but I'm still surprised at how little code search has evolved in the last twenty or so years. I'm a pretty big skeptic when it comes to new ideas or concepts; no matter how excited I was about rethinking code search, it wasn't long before I asked:
"Maybe it hasn't changed because it just works."
The State of Code Search
Modern IDEs like IntelliJ and Visual Studio/Visual Studio Code usually let developers search in a variety of ways. The most popular are:
- Structural
  - A syntax-aware search idiom, e.g., find all usages of a function or method, or all usages of a loop or another control-flow construct.
- Symbolic
  - A search idiom that's largely orthogonal to the structure of the source code, e.g., find all places where this string/symbol appears.
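To make the difference concrete, here's a rough sketch using Python's standard `ast` module. The sample source and the two function names (`symbolic_search`, `structural_search`) are made up for illustration; real IDEs do this with their own indexes, not like this.

```python
import ast

# Hypothetical sample file, used only for this illustration.
SOURCE = '''
def parse(text):
    return text.split()

# parse is called below
tokens = parse("a b c")
label = "parse failed"
'''


def symbolic_search(source: str, needle: str) -> list[int]:
    """Return 1-based line numbers where the raw text `needle` appears."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def structural_search(source: str, func_name: str) -> list[int]:
    """Return 1-based line numbers of direct calls to `func_name`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


print(symbolic_search(SOURCE, "parse"))    # definition, comment, call, string
print(structural_search(SOURCE, "parse"))  # only the actual call site
```

The symbolic search matches the definition, the comment, and the string literal as well as the call, while the structural one only reports the call, because it looks at the syntax tree instead of the text.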
A huge proportion of developers run at least one of these kinds of search, often multiple times a day. However, IDEs have not changed much in the past 20 years in how they present results to a developer. Something Gail and I discussed was how good IDEs are at getting results; some IDEs and code search services have DSLs that let developers specify exactly what they're searching for in great detail. However, when it comes to actually showing results, they largely just list them in some sort of dropdown or table. I'm gonna call this the "vomit view" from now on, since the IDE just appears to barf the results up and call it a day.
One of the papers I read this week showed that most developers didn't click on any of the results from the very first query in a search session. Instead, they began a new search, presumably using the results from the first try to inform their next query. The same paper also showed that many developers wanted code search to become more interactive and dialogue-based. If images of the infamous Clippy from 1997 pop up, you're not alone. I very much doubt that developers would enjoy having an anthropomorphic paperclip give them suggestions on what to search for.
So what's the idea?
If I'm being honest, I'm not sure this idea is fully formed yet, but I might as well write it down so I can read this in a few months and laugh at how wrong I was.
Code search is a very iterative process for developers. They will often run a search, read the results, and then modify their query or filter their results by some condition. Some of these filter conditions may be:
- "Find only usages in directory foo"
- "Find only usages in source, not comments"
- "Find only usages in *.scala, not *.java"
What if instead of having these options be deeply hidden away in menus that developers have to manually find and trigger (or, more realistically, not find at all), a search tool surfaced these as basically the first options that developers must choose? A developer could open a search dialogue, put in a query string, and the IDE would surface a number of options that would group search results and categorize them at a higher level.
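Here's a minimal sketch of what "surfacing the options first" could look like, assuming the search backend already hands back a flat list of hits. Everything here (`Hit`, `facet_counts`, `narrow`, the sample paths) is a hypothetical name I made up, not any IDE's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Hit:
    path: str         # file containing the match
    line: int         # 1-based line number
    in_comment: bool  # True if the match sits inside a comment


def facet_counts(hits):
    """Count hits per directory, file extension, and source/comment kind."""
    facets = {
        "directory": defaultdict(int),
        "extension": defaultdict(int),
        "kind": defaultdict(int),
    }
    for hit in hits:
        path = PurePosixPath(hit.path)
        facets["directory"][str(path.parent)] += 1
        facets["extension"][path.suffix] += 1
        facets["kind"]["comment" if hit.in_comment else "source"] += 1
    return {name: dict(counts) for name, counts in facets.items()}


def narrow(hits, *, directory=None, extension=None, in_comment=None):
    """Keep only hits matching every facet the developer picked."""
    kept = []
    for hit in hits:
        path = PurePosixPath(hit.path)
        if directory is not None and PurePosixPath(directory) not in path.parents:
            continue
        if extension is not None and path.suffix != extension:
            continue
        if in_comment is not None and hit.in_comment != in_comment:
            continue
        kept.append(hit)
    return kept


HITS = [
    Hit("foo/Parser.scala", 10, False),
    Hit("foo/Parser.scala", 3, True),
    Hit("bar/Lexer.java", 7, False),
    Hit("foo/util/Tokens.java", 42, False),
]

print(facet_counts(HITS))
print(narrow(HITS, directory="foo", extension=".scala", in_comment=False))
```

The idea is that the counts from `facet_counts` would be the first thing the developer sees, and picking one of them calls something like `narrow` instead of dumping the whole list.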
Of course, the filter conditions I've suggested above are very rudimentary. We could also explore displaying results based on how "close" they are to the search origin, or by their fan-in value in code (basically how many other modules call into it).
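As a rough sketch of the fan-in part, assuming we already have a call graph as a plain mapping from each module to the modules it calls (the `CALLS` data and both function names below are invented for illustration), ranking could be as simple as:

```python
from collections import Counter

# Hypothetical call graph: caller module -> modules it calls into.
CALLS = {
    "parser": {"lexer", "ast"},
    "cli": {"parser", "logging"},
    "lexer": {"logging"},
    "ast": {"logging"},
}


def fan_in(calls):
    """Count, for each module, how many other modules call into it."""
    counts = Counter()
    for caller, callees in calls.items():
        for callee in callees:
            if callee != caller:  # a module calling itself doesn't count
                counts[callee] += 1
    return counts


def rank_by_fan_in(hit_modules, calls):
    """Order modules containing hits by fan-in, highest first, ties by name."""
    counts = fan_in(calls)
    return sorted(hit_modules, key=lambda module: (-counts[module], module))


print(rank_by_fan_in(["cli", "lexer", "logging", "parser"], CALLS))
```

Whether high fan-in should float to the top or sink to the bottom is an open question; a heavily used utility module might be exactly the noise a developer wants to skip.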
It's also surprising to me that code search results don't take into account the context of what a developer might be working on. For example, let's say a developer is working on a task related to refactoring a parsing module in a system. In the middle of their refactoring task, they start a search session. The search session ends with the IDE returning results from across the entire codebase, including code that isn't related to their parsing task, i.e., modules that aren't even reachable from their parsing module. This introduces another set of irrelevant results that developers must sift through, and ultimately spend time ignoring.
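One way to read "reachable from their parsing module" is as plain graph reachability over module dependencies. The sketch below assumes a dependency map is available; `DEPS`, `reachable_from`, and `filter_by_context` are hypothetical names, and a real tool would need a much better notion of task context than a single starting module.

```python
from collections import deque

# Hypothetical dependency graph: module -> modules it depends on.
DEPS = {
    "parser": ["lexer", "ast"],
    "lexer": ["chars"],
    "billing": ["db"],
    "ui": ["parser"],
}


def reachable_from(start, deps):
    """Breadth-first search: every module reachable from `start`, inclusive."""
    seen = {start}
    queue = deque([start])
    while queue:
        module = queue.popleft()
        for dep in deps.get(module, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def filter_by_context(hits, task_module, deps):
    """Drop hits in modules the task module cannot reach."""
    in_scope = reachable_from(task_module, deps)
    return [(module, line) for module, line in hits if module in in_scope]


# Each hit is a (module, line) pair.
HITS = [("lexer", 12), ("billing", 40), ("chars", 3), ("ui", 8)]
print(filter_by_context(HITS, "parser", DEPS))
```

Note that `ui` is dropped here even though it depends on `parser`, because the search only follows dependencies outward from the task module. Whether callers should count as "related" too is a design choice I haven't settled.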
Obviously this idea is in its very early stages so far, but it's good to get it down on paper.
Papers
Here are some of the papers I read last week:
- "Using Task Context to Improve Programmer Productivity"
- "How Developers Search for Code: A Case Study"
  - This is the paper I mentioned where I got some actual evidence behind my hunch that developers use code search more than any other tool.
- "Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behaviour"
  - This is a paper that Gail recommended I read. They take a similar approach to "inverting" debugging by making the hypotheses that most developers make about defective programs first-class citizens.