GitHub Search for research and learning
Also, new blog post!
Hi everyone!
I have a new blog post out: An RNG that runs in your brain. It's a mix of cool tricks and math analysis done with an exotic gremlin language. Patreon is here. Also TLA+ workshop on Feb 12 etc etc use the code NEWSLETTERDISCOUNT for $100 off etc
Anyway I've been all over the place this week wrt contracts, writing projects, and errands where no matter what I'm working on it feels like there's something more important that I should be doing instead. One of the projects I'm jumping between is the "Why don't we have graph types" writeup. I'm trying (trying) to get the first draft done by the end of the month, and as part of that I'm researching how people use graph libraries in production codebases. And that means a lot of time with GitHub code search.
Code search is amazing. First of all, it covers 200 million public repositories. Second, it can filter on language and filepath. Third, it supports regular expressions. To find files that use Python's graphlib package, I just need to search
/(from|import) graphlib/ language:python
After that I can sift through the results to separate real-life use cases from false positives and toy problems.
I also used search for another project I'm jumping between, the logic book. One programming trick you get from logic is that you can rewrite not (all x: P) or (some x: Q) as (some x: Q or not P), which is clearer and faster to evaluate.[1] I wanted a real-life example of code that can be rewritten this way. So I searched
/all\((.*)\) or any\(.*\):$/ language:python
And got like 300 code samples that could be simplified. I used a similar search to get examples of nested quantifiers for a different topic in the book. Now if only I could stop researching and start writing...
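Here's a minimal Python sketch of that rewrite, assuming P and Q are plain predicates over the same collection. The names original, rewritten, and forms_agree are made up for illustration. It checks by brute force that the two forms agree on every small input.

```python
from itertools import product


def original(xs, p, q):
    # not (all x: P) or (some x: Q): up to two passes over xs
    return not all(p(x) for x in xs) or any(q(x) for x in xs)


def rewritten(xs, p, q):
    # some x: Q or not P: a single pass that stops at the first hit
    return any(q(x) or not p(x) for x in xs)


def forms_agree(max_len=3):
    # Each element is a (P holds, Q holds) pair, so this tries every
    # combination of truth values for collections of up to max_len items.
    outcomes = list(product([False, True], repeat=2))
    for n in range(max_len + 1):
        for xs in product(outcomes, repeat=n):
            p = lambda x: x[0]
            q = lambda x: x[1]
            if original(xs, p, q) != rewritten(xs, p, q):
                return False
    return True


print(forms_agree())  # True
```

The empty collection is the edge case worth checking by hand: not all([]) is False and any([]) is False, so both forms give False.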
Code search is potentially even more useful as an education tool. You can learn a lot about how something works by reading code that uses it. For example, let's say you're using hypothesis for property-based testing, and you want to get better at composites. You can find examples by searching
/(from|import) hypothesis/ composite language:python path:test_*.py
Oh yeah you can include multiple regexes. Want to find files that have two composites? Replace composite with /composite(.|\s)*composite/. Just about the only things that don't seem to work are group matching and lookarounds like (?!).
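To see what the two-composites pattern is doing, here's a small sketch that runs the same regex locally with Python's re module. It assumes GitHub's regex dialect treats this pattern the same way Python does, which may not hold in general. The sample strings are invented.

```python
import re

# Same pattern as in the search. In Python, "." does not match a newline
# by default, so (.|\s) is what lets the match span multiple lines.
TWO_COMPOSITES = re.compile(r"composite(.|\s)*composite")

one_use = "@composite\ndef points(draw):\n    return draw(integers())\n"
two_uses = one_use + "\n@composite\ndef lines(draw):\n    return draw(points())\n"

print(bool(TWO_COMPOSITES.search(one_use)))   # False
print(bool(TWO_COMPOSITES.search(two_uses)))  # True
```

One caveat for local use: both "." and "\s" match a space, so in Python this pattern can backtrack heavily on long files that don't match. Compiling composite.*composite with re.DOTALL should be a safer local equivalent.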
So far I've just done searches from the website, but there's also a code search API. You could feasibly run a query, dump a hundred matches into files, and then do more complicated filtering and munging before spitting out a courseload's worth of case studies.
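A rough sketch of what that could look like against GitHub's REST code search endpoint, using only the standard library. Some assumptions: you have a personal access token (the endpoint requires authentication), and the REST endpoint may not accept the regex syntax from the website, so the example query is a plain keyword search. The helper names build_request, summarize, and search are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.github.com/search/code"


def build_request(query, token, per_page=100):
    # per_page tops out at 100, which matches "dump a hundred matches".
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    return urllib.request.Request(
        f"{API_URL}?{params}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def summarize(payload):
    # Keep just enough of each match to decide whether it's worth reading.
    return [
        (item["repository"]["full_name"], item["path"], item["html_url"])
        for item in payload.get("items", [])
    ]


def search(query, token):
    # Performs the actual network call; needs a valid token to succeed.
    with urllib.request.urlopen(build_request(query, token)) as resp:
        return summarize(json.load(resp))
```

Calling search("graphlib language:python", token) would then return a list of (repository, path, URL) tuples that you can filter and munge further before fetching the files themselves.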
You can read more about the search syntax here. Try it out!
-
[1] It's even better when you can rewrite not P in a clear way! Like replacing not (x < 0) with x >= 0.
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.