Syntax highlighting is a waste of an information channel

in general

July 20, 2020

No newsletter next week
Running the TLA+ workshop. No way I'm gonna have any brainpower after that.
Syntax highlighting is a waste of an information channel
No, not a waste in general. Syntax highlighting is quite useful. I'm saying it's a waste of an information channel. Here's a quick demonstration of what I mean. Here are 399 squares and one circle. Where's the circle?

Round two. Where's the circle?

Color carries a huge amount of information. Color draws our attention. Color distinguishes things. And we just use it to distinguish syntax.
Nothing wrong with distinguishing syntax. It's the "just" that bothers me. Highlighting syntax is not always the most important thing to us. The information we want from code depends on what we're trying to do. I'm interested in different things if I'm writing greenfield code vs optimizing code vs debugging code vs doing a code review. I should be able to swap different highlighting rules in and out depending on what I need. I should be able to combine different rules into task-level overlays that I can toggle on and off.
I've listed some examples of what we could do with this. If this is something that already exists I included a link. Otherwise I included a mockup. Some of the examples have implementation issues beyond what I discussed; they're just demonstrations of what highlighting could be. All examples are Pythonish unless otherwise noted.

Some Use Cases
Rainbow parentheses
This is a pretty common one. We can use different colors to mark how deeply nested a set of parentheses is. From here.

Context Highlighting
Highlight different levels of nesting. From here.

Import highlighting
Highlight identifiers imported from a different file.

Variations:

Highlight imported functions and classes differently
Highlight qualified imports
Highlight imports from particular trees
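
As a rough sketch of how the basic rule might be computed, the snippet below uses Python's standard `ast` module to find every place an imported name is used. The function name `imported_name_spans` is made up for illustration, and the sketch assumes a name bound by an import is never shadowed by a later assignment, which real code does not guarantee.

```python
import ast

def imported_name_spans(source):
    """Return sorted (line, column, name) tuples for each use of an imported name."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as b` binds `b`.
                imported.add(alias.asname or alias.name.split(".")[0])
    # Assumption: no shadowing, so any matching Name node counts as a use.
    return sorted(
        (node.lineno, node.col_offset, node.id)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id in imported
    )

example = "import os\nfrom math import sqrt\n\ndef f(x):\n    return sqrt(x) + len(os.sep)\n"
print(imported_name_spans(example))
```

An editor would still have to turn these positions into colored ranges. The variations mostly come down to changing which aliases go into the `imported` set.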

Argument Highlighting
Arguments passed into the function are highlighted differently from local variables or global identifiers.

Variations:

Carry it through to aliases (if we assign the argument to another value, highlight that too)
Highlight local variables only
Highlight values that will be assigned to something
Highlight variables used in loops
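
The base rule here (not the aliasing variation) could be approximated along these lines, again with the standard `ast` module. `argument_uses` is a hypothetical helper, and the sketch deliberately ignores reassignment and aliasing: it only reports where a parameter's own name appears in the function body.

```python
import ast

def argument_uses(source):
    """Return sorted (function, line, column, name) for each use of a parameter."""
    results = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        a = func.args
        params = {p.arg for p in a.posonlyargs + a.args + a.kwonlyargs}
        if a.vararg:
            params.add(a.vararg.arg)
        if a.kwarg:
            params.add(a.kwarg.arg)
        # Only the body is searched, so default values are not reported.
        for stmt in func.body:
            for node in ast.walk(stmt):
                if isinstance(node, ast.Name) and node.id in params:
                    results.append((func.name, node.lineno, node.col_offset, node.id))
    return sorted(results)

example = "LIMIT = 10\n\ndef clamp(value, low=0):\n    result = max(value, low)\n    return min(result, LIMIT)\n"
print(argument_uses(example))
```

In the example, `value` and `low` are reported, while the local `result` and the global `LIMIT` are not.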

Type Highlighting
Highlight all list variables and integer variables with different colors.

Variations:

Highlight all iterables
Highlight all functions returning option types
Highlight all variables that could be one of two types
Highlight all polymorphic types parameterized to integers

Exception Highlighting
Highlight functions that raise errors not caught in their body. 

Variations:

Highlight all functions with try blocks
Highlight functions that raise user-defined exceptions
Highlight functions that raise a specific exception
Highlight functions that catch a specific exception
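
A very simplified version of the base rule can be sketched with `ast`. The helper names are hypothetical, and the approximation is crude on purpose: a `raise` counts as caught if it sits inside the body of any `try` that has at least one `except` clause, regardless of exception type. Exceptions raised by called functions are ignored, as are `match` statements and `except*` blocks.

```python
import ast

def _has_uncaught_raise(stmts, guarded=False):
    """True if some `raise` in stmts is outside every try body that has handlers."""
    for stmt in stmts:
        if isinstance(stmt, ast.Raise) and not guarded:
            return True
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # nested definitions are judged on their own
        if isinstance(stmt, ast.Try):
            if _has_uncaught_raise(stmt.body, guarded or bool(stmt.handlers)):
                return True
            # Handlers, else and finally are not protected by this try.
            rest = [h.body for h in stmt.handlers] + [stmt.orelse, stmt.finalbody]
            if any(_has_uncaught_raise(block, guarded) for block in rest):
                return True
            continue
        for field in ("body", "orelse"):  # if / for / while / with blocks
            block = getattr(stmt, field, None)
            if isinstance(block, list) and _has_uncaught_raise(block, guarded):
                return True
    return False

def functions_with_uncaught_raise(source):
    """Return the sorted names of functions that contain an unguarded raise."""
    return sorted(
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and _has_uncaught_raise(node.body)
    )
```

Matching handlers against the raised exception type, which several of the variations need, would require resolving names to classes and is left out here.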

Metadata Highlighting
Highlight functions that were directly called in the bodies of tests that failed in the last test run.

Highlight functions without precondition decorators
Highlight functions that are part of a certain stacktrace
Highlight functions which are defined in our branch but not the master branch

Random other ideas I didn't mock up

All functions that transitively call functions that make an http call
All variable identifiers we assign to twice
All classes with more than 10 user-defined methods
All functions more than 100 lines long
All functions without docstrings
All lines last edited by a particular member of the team
All identifiers marked "deprecated" in a certain design document
All functions with a # TODO comment inside them
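
Two of these (missing docstrings and overlong functions) need nothing beyond the AST, so they make a convenient illustration. This is only a sketch: `flag_functions` and its `max_lines` parameter are invented names, and it relies on `end_lineno`, which `ast` provides from Python 3.8 onward.

```python
import ast

def flag_functions(source, max_lines=100):
    """Map function name -> list of reasons it would be highlighted."""
    flags = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            reasons = []
            if ast.get_docstring(node) is None:
                reasons.append("no docstring")
            # Length is counted from the `def` line to the last body line.
            if node.end_lineno - node.lineno + 1 > max_lines:
                reasons.append("too long")
            if reasons:
                flags[node.name] = reasons
    return flags
```

The other ideas in the list need information from outside the file (blame data, design documents, the call graph), which is part of why they are harder.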

Issues
Why aren't things this way? There are both essential and coincidental challenges that make fully leveraging color a lot harder than just having syntax highlighting.
First is actually implementing rules. Some of these require access to the code's AST, some require broader knowledge of the project, some require runtime information. Some of the ideas are even infeasible; accurately tracking aliasing is an open problem for most languages. Syntax highlighting, by contrast, is usually a matter of regexes and hierarchical state machines. That's how Pygments does it. Semantic highlighting would have to be made from scratch for each language.
Second is highlighting conflicts. What if something needs to be colored two different ways for two different reasons? In syntax highlighting this is less of a problem because you have an ordered list of matchers. But with semantic highlighting we might have dynamic priorities, where rule A is more important to us now while rule B is more important to us later. Things get even more complicated if we have multiple distinct overlays, which themselves can have priority conflicts. Semantic highlighting would need a much more complex design and implementation than simple syntax highlighting does.
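To make the priority problem concrete, here is one possible way to resolve overlapping highlights, assuming each rule emits character ranges and the user supplies an ordered list of active rules. The `Highlight` class and `resolve` function are hypothetical and do not come from any existing editor. Changing the order of the list changes which rule wins, and leaving a rule out toggles it off.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Highlight:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    color: str
    rule: str    # name of the rule that produced this highlight

def resolve(highlights, priority):
    """Pick one color per character; the rule listed earliest in `priority` wins."""
    rank = {rule: i for i, rule in enumerate(priority)}
    winner = {}
    for h in highlights:
        if h.rule not in rank:
            continue  # rule is not in the active list, so it is toggled off
        for pos in range(h.start, h.end):
            current = winner.get(pos)
            if current is None or rank[h.rule] < rank[current.rule]:
                winner[pos] = h
    return {pos: h.color for pos, h in sorted(winner.items())}

marks = [Highlight(0, 5, "blue", "argument"), Highlight(3, 8, "red", "exception")]
print(resolve(marks, ["exception", "argument"]))  # debugging: exceptions win
print(resolve(marks, ["argument", "exception"]))  # refactoring: arguments win
```

A per-character map is the simplest thing that works for a sketch; a real implementation would more likely merge intervals, and this does nothing about conflicts between whole overlays.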
Finally, existing editors just aren't well set up to handle this. Vim's syntax highlighting is a mess of regular expressions and special cases. VSCode and (I believe) Atom use TextMate grammars, which assume a single canonical tokenization per file. VSCode recently added semantic highlighting but it seems more oriented to augment the existing syntax highlighting, not radically rethink it. I have no idea what Emacs does.
So I think this is something we'll eventually have, because the potential advantages are too great to ignore forever. But it will take us a long time to get there. Maybe we'll see it first with toy languages where the AST is simple enough and the expressiveness is low enough to make semantic highlighting easy.
Update for the influx of new readers
This was a newsletter post; you can subscribe here.

