Honing In

September 19, 2021

It's another cloudy Sunday (classic Vancouver), and I'm back at my usual cafe.
One of the staff members must have overheard me speaking Korean once, because
they always speak to me in Korean, regardless of who they were helping before.
I guess it gives me a chance to practice my very bad Korean (I usually only
speak it when I'm calling my mom), although I'm not getting much help, since
"coffee" sounds like "coffee" in Korean.

I had a meeting with Gail last week to discuss the papers I've read, and I
think we've found something I can investigate for the rest of my time here.
I've mentioned this in a previous newsletter, but I'm still surprised at how
little code search has evolved in the last twenty or so years. I'm a pretty
big skeptic when it comes to new ideas or concepts; no matter how excited I was
about rethinking code search, it wasn't long before I asked:

"Maybe it hasn't changed because it just works."

The State of Code Search
Modern IDEs¹ like IntelliJ and Visual Studio/Visual Studio Code² usually let
developers search in a variety of ways. The most popular kinds of search are:

Structural
A syntax-aware search idiom. E.g., find all usages of a function or
  method, find all usages of a loop or another control-flow construct, etc.

Symbolic
A search idiom that's largely orthogonal to the syntax of the source code.
  E.g., find all places where this string/symbol appears.
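
To make the distinction concrete, here's a rough Python sketch (my own illustration, not how IntelliJ or VS Code actually implement search). The "symbolic" search is a whole-word regex over raw text, so it also matches comments and string literals; the "structural" one parses the code with Python's standard `ast` module and only reports real call sites. The sample snippet and the names `symbolic_search`/`structural_search` are hypothetical.

```python
import ast
import re

# Hypothetical snippet to search; the leading newline makes line 1 empty.
SOURCE = '''
def parse(text):
    return text.split()

# parse is called below
result = parse("a b c")
label = "parse"
'''


def symbolic_search(source: str, symbol: str) -> list[int]:
    """Text-only search: every line where `symbol` appears as a whole word,
    whether in code, a comment, or a string literal."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def structural_search(source: str, func_name: str) -> list[int]:
    """Syntax-aware search: only lines where `func_name` is actually called,
    found by walking the parsed syntax tree."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


print(symbolic_search(SOURCE, "parse"))    # [2, 5, 6, 7]
print(structural_search(SOURCE, "parse"))  # [6]
```

The gap between the two result lists (the definition, the comment, and the string literal) is exactly the noise a syntax-aware search filters out for you.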

A huge proportion of developers use at least one of these types of search,
often multiple times a day. However, IDEs have not changed much in the past 20
years in how they present results to a developer. One thing Gail and I
discussed was how good IDEs are at getting results; some IDEs and code search
services have domain-specific languages (DSLs)³ that let developers specify
exactly what they're searching for in great detail. When it comes to actually
showing results, though, they largely just list them in some sort of dropdown
or table. I'm gonna call this the "vomit view" from now on, since the IDE just
appears to barf the results up and call it a day.

One of the papers I read this week showed that most developers didn't click
on any of the results from their very first query in a search session.
Instead, they started a new search, presumably using the results of the first
attempt to inform their next query. The same paper also found that many
developers wanted code search to become more interactive and dialogue-based.
If images of the infamous Clippy from 1997 pop up, you're not alone. I very
much doubt developers would enjoy having an anthropomorphic paperclip suggest
what to search for.

So what's the idea?
If I'm being honest, I'm not sure this idea is fully formed yet, but I might
as well write it down so I can read this in a few months and laugh at how wrong
I was.

Code search is a very iterative process for developers. They will often run a
search, read the results, and then modify their query or filter their results by
some condition. Some of these filter conditions may be:

"Find only usages in directory foo"
"Find only usages in source, not comments"
"Find only usages in *.scala, not *.java"

What if, instead of having these options hidden deep in menus that developers
have to find and trigger manually (or, more realistically, never find at all),
a search tool surfaced them as the very first choices a developer makes? A
developer could open a search dialogue, type in a query string, and the IDE
would surface a number of options that group the search results and categorize
them at a higher level.
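
As a rough sketch of what "grouping at a higher level" might mean, the result set could be summarized into facets with counts before any individual hit is shown. This is a hypothetical Python illustration: the two facets mirror the directory and file-type filters above, and `facets`/`facet_options` aren't from any real IDE API.

```python
from collections import Counter
from pathlib import PurePosixPath


def facets(paths: list[str]) -> dict[str, Counter]:
    """Group result paths into higher-level categories with counts."""
    return {
        "directory": Counter(str(PurePosixPath(p).parent) for p in paths),
        "extension": Counter(PurePosixPath(p).suffix or "(none)" for p in paths),
    }


def facet_options(groups: dict[str, Counter]) -> list[str]:
    """Render each facet as one line of choices, largest group first."""
    return [
        f"{name}: " + ", ".join(f"{value} ({count})" for value, count in counts.most_common())
        for name, counts in groups.items()
    ]


# Hypothetical result paths for a single query.
paths = ["foo/Parser.scala", "foo/Lexer.scala", "foo/Legacy.java", "bar/Main.scala"]
for option in facet_options(facets(paths)):
    print(option)
# directory: foo (3), bar (1)
# extension: .scala (3), .java (1)
```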

Of course, the filter conditions I've suggested above are very rudimentary.
We could also explore displaying results based on how "close" they are to the
search origin, or on their fan-in (basically, how many other modules call into
them).
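
Fan-in is cheap to compute once you have a module dependency graph; extracting that graph from a real codebase is the hard part. A small Python sketch, assuming the graph is already available as a hypothetical list of `(caller, callee)` module pairs:

```python
from collections import Counter


def fan_in(call_edges: list[tuple[str, str]]) -> Counter:
    """Count how many distinct *other* modules call into each module."""
    return Counter(
        callee for caller, callee in set(call_edges) if caller != callee
    )


def rank_by_fan_in(results: list[str], call_edges: list[tuple[str, str]]) -> list[str]:
    """Order result modules by fan-in, highest first; ties keep their original order."""
    counts = fan_in(call_edges)
    return sorted(results, key=lambda module: counts[module], reverse=True)


# Hypothetical (caller, callee) pairs; the duplicate edge is only counted once.
edges = [
    ("parser", "lexer"),
    ("typer", "lexer"),
    ("codegen", "lexer"),
    ("typer", "parser"),
    ("parser", "lexer"),
]
print(rank_by_fan_in(["parser", "codegen", "lexer"], edges))
# ['lexer', 'parser', 'codegen']
```

Sorting by fan-in pushes heavily used modules to the top, which may or may not be what a developer wants in a given session; it's probably one signal among several rather than the ordering.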

It's also surprising to me that code search results don't take into account the
context of what a developer might be working on. For example, let's say a
developer is refactoring a parsing module in a system. In the middle of their
refactoring task, they start a search session, and the IDE returns results from
across the entire codebase, including code that has nothing to do with their
parsing task, i.e., modules that aren't even reachable from their parsing
module. This adds yet another set of irrelevant results that developers must
sift through and ultimately spend time ignoring.
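
One way to scope results to the task would be a reachability check: starting from the module being refactored, walk its dependencies breadth-first and drop any result the walk never reaches. The hop count from that walk could also serve as the "closeness" ordering mentioned earlier. A hypothetical Python sketch, assuming module dependencies are available as a plain adjacency dict (the module names are made up):

```python
from collections import deque


def reachable_from(start: str, deps: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first search over module dependencies.
    Returns every module reachable from `start`, mapped to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        module = queue.popleft()
        for dep in deps.get(module, []):
            if dep not in distance:
                distance[dep] = distance[module] + 1
                queue.append(dep)
    return distance


def scope_to_task(results: list[str], start: str, deps: dict[str, list[str]]) -> list[str]:
    """Drop results in modules the task can't reach; sort the rest by closeness."""
    distance = reachable_from(start, deps)
    return sorted((m for m in results if m in distance), key=distance.__getitem__)


# Hypothetical module graph: billing and db are unrelated to the parsing task.
deps = {
    "parser": ["lexer", "ast"],
    "lexer": ["unicode"],
    "billing": ["db"],
}
print(scope_to_task(["billing", "unicode", "parser", "lexer", "db"], "parser", deps))
# ['parser', 'lexer', 'unicode']
```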

Obviously this idea is still in its very early stages, but it's good to get it
down on paper.

Papers
Here are some of the papers I read last week:

"Using Task Context to Improve Programmer Productivity"
"How Developers Search for Code: A Case Study"
This is the paper I mentioned, which gave me some actual evidence for my
  hunch that developers use code search more than any other tool.

"Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behaviour"
This is a paper that Gail recommended I read. The authors take a similar
  approach, "inverting" debugging by making the hypotheses that most developers
  form about defective programs first-class citizens.

¹ Integrated Development Environment

² One can argue that VS Code isn't a full-fledged IDE but a widely extensible
  text editor; that's a discussion for another time.

³ Sourcegraph Query Syntax

Don't miss what's next. Subscribe to Echoes from 308: