Know (of) the right tool for the job
Plus: Using LLMs for data queries
Last week's Picat essay did pretty well online! One of the common responses was "This looks really interesting but I don't know where I'd use it."
For Picat there was a gap of five years between me hearing about it and actually learning it. The same is true for most of the weird stuff I encounter: I keep it in the back of my head for years until I finally run into the problem which justifies using it. I think this is an important programming skill: being able to recognize when a tool you know of but don't know solves a problem you have.
It's the opposite of Paul Graham's Blub paradox:
Blub falls right in the middle of the abstractness continuum. … when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.
When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn't even have y.
I always thought it was funny that Paul ends this with "and so Lisp is the most powerful language", as if he's not himself a Blub programmer. Regardless, this is definitely a real paradox I've experienced. A programmer once told me that Chapel was just a Julia knockoff, when he had never seen Chapel before and only studied Julia for a week. It's just really hard to think of how things you don't already know might help you. But if you can do this, you're more likely to learn the right tool for the job instead of sticking with the wrong one.1
Some Examples
I categorized Picat as "logic + imperative + constraint solving". I figured it'd be useful if I needed to do one of those things with a sprinkle of another. In the end I was doing a constraint problem and needed "a more expressive constraint solver", which means constraints + a sprinkling of LP. So I learned Picat and it solved the problem. As part of exploring the language I discovered planning, which is now another tool on the backburner for when I hit the right problem.
Sometimes it doesn't work out. Either the unknown tool doesn't actually solve my specific problem or it works in theory but has a problem in practice. At one point I wanted to analyze a state space graph and remembered that Gephi was a graph analysis software suite. But when I actually learned it, I found it was too limited in capabilities to be useful.
That's one thing that worked and one thing that didn't. Something that I haven't tried yet is lisps, in particular macros. I regularly run into problems which seem solvable with macros:
- A case statement that automatically raises an error if nothing matches
- A special `if` statement that treats domain-specific values as true or false, which my language doesn't support
- Defining multiple functions that all share a specific set of keyword parameters
Do any of these justify learning a lisp? Nah. They might even be terrible ideas in practice! But at least it keeps me thinking about the uses of tools I don't know.
How to develop the skill
Well, the first step is to read about lots of different software topics. That's not enough, because you can just go on a grand sneer; you also have to be open to thinking about how those topics could augment what you already know.
Past that I have no idea. I think my experience comes from being a teacher. Whenever I learn of a new topic I think about what motivating examples I'd use to teach it, and whenever I come across something "interesting"2 in real life I pattern-match against a bunch of topics to see if it'd make a good example. I just find explaining things fun.
Bonus: LLMs for querying data
As part of the graph project I wanted to see if Neo4j was getting more or less popular. I figured one good proxy would be the number of StackOverflow questions asked per month. It's imperfect (maybe fewer questions just means the low-hanging fruit has already been picked) but good enough for my purposes. StackOverflow's data is public. My blocker is the same as with every source of public data: I don't know the schema! But maybe GPT-4 does:
Give me a query that returns the number of stack overflow questions tagged "neo4j", by month. Use the schema in the "stackexchange data explorer"
```sql
SELECT
  FORMAT(CreationDate, 'yyyy-MM') AS [Month],
  COUNT(*) AS [Number of Neo4j Questions]
FROM
  Posts
WHERE
  Tags LIKE '%<neo4j>%'
  AND PostTypeId = 1 -- 1 indicates a question
GROUP BY
  FORMAT(CreationDate, 'yyyy-MM')
ORDER BY
  FORMAT(CreationDate, 'yyyy-MM') ASC;
```
The result worked! This could be a great use of LLMs! Drunk with power, I tried it on the final boss of public datasets, Wikidata:
Give me a wikidata query for "number of dogs for each dog breed". No explanation, just the query.
```sparql
SELECT ?breed ?breedLabel (COUNT(?dog) AS ?numberOfDogs) WHERE {
  ?dog wdt:P31 wd:Q144;
       wdt:P31/wdt:P279* ?breed.
  ?breed wdt:P279* wd:Q144.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?breed ?breedLabel
```
This worked... less well. It doesn't deduplicate dogs, and it matches every dog-like class, not just breeds. We actually want the line `?breed wdt:P31 wd:Q39367`.3 Other experiments show that GPT often gets Wikidata identifiers wrong, sometimes giving ones that don't correspond to anything at all.
That said, I was still able to kludge something mostly workable in ten minutes by modifying the LLM output, which is still a lot faster than I could have done on my own. This follows my general heuristic for applying AI: use it on things that are hard to do, easy to check, and easy to fix.
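For reference, here's a sketch of the repaired query, applying the fixes described above: count each dog only once with `COUNT(DISTINCT ...)`, and restrict `?breed` to instances of "dog breed" (`Q39367`) rather than any subclass of dog. I haven't exhaustively verified its output, so treat it as a starting point:

```sparql
# Count each dog once, grouped by actual breeds only.
SELECT ?breed ?breedLabel (COUNT(DISTINCT ?dog) AS ?numberOfDogs) WHERE {
  ?breed wdt:P31 wd:Q39367.        # ?breed is an instance of "dog breed"
  ?dog wdt:P31/wdt:P279* ?breed.   # ?dog is an instance of that breed
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?breed ?breedLabel
```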
1. Though using the wrong tool can work just fine. And sometimes it's not worth the time to learn something you don't know if it'll only be slightly better than what you have now. ↩
2. You know, flight layovers, self-checkout machines, rat-shaped potholes, bank teller queues... ↩
3. This doesn't get all of the data either, but this is a problem with Wikidata and not the LLM. Many specific breeds are listed as instances of dog type (Q38672), which also includes things like "hunting dog" and "sighthound". ↩
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.