German verbs & AI thoughts
I managed to write quite a lot by my standards during this xmas break! Here’s a new data story, plus some more links at the bottom.
Initials of German verbs
You can read this one in full on my blog here. But here are the main parts.
I am learning German and one of the things I do is keep track of the verbs I master. I do it the old school way: on paper, alphabetically, one sheet per initial letter. It sounds crazy but it is really satisfying to see my verbs pile up and count them.
German builds some of its verbs via prefixes, which either add nuance to the original meaning or alter it completely. Those in the latter class are the hard ones to absorb, at least for me. Out of struggling to distinguish my aufmachen from my ausmachen, I started this exercise: at the very start, I made a list of all verbs I could think of whose meaning and usage I was sure of; now, I just add one every time I feel like I've learned it, as in, I've stored it in my (hopefully) long-term memory and I can comfortably use it.
The whole of German is wonderfully Lego-brickable, but I only do this for verbs because with no other part of speech have I had such struggles, and also because without a rock-solid foundation in verbs it's hard to build coherent discourse.
I am between B1 and B2 now and the below is what I've got. It doesn't look like many, but there will be several (many?) verbs that I somewhat know but am not really sure about. Also, there may well be verbs I forgot to add. And I am not tracking all possible easy combinations from a root (the first class from above): I haven't added wegrennen (run away) for instance, despite understanding it very well and intuitively.
These days the adding-to-list frequency is quite low. Maybe I'm approaching the stage where I need to level up and/or diversify my sources to get a new burst of novelty like in the early glorious days of this.

S really is queen! We have all the sein, singen, stattfinden, schenken, schicken, ...
There's 0, nichts, for C. If I really think hard now, nothing more than an English loan like chatten, should it exist at all (not sure), comes to mind. X and Y have the same fate but that's not surprising, as most words there will be actual loans.
Only 1 entry for J, that's joggen, also a loan.
Likely, and this is important, my verbs are the most common ones (because I'm a learner and don't yet consistently consume texts aimed at natives), so this isn't necessarily representative of the whole vocabulary of German verbs.
Can I pull this data from actual German?
This is a more techy part with a little code - see it here if you want the details, but basically I’ve pulled a bunch of text from Wikipedia (German edition), passed it through POS (part-of-speech) tagging to isolate the verbs, and counted how often each initial occurs among them. This is the result:

S is not dominating anymore although it's still quite high, and in fact (this was unexpected for me), A is the top one! There are a few prefixes starting with A (ab-, an-, auf-, aus-) so maybe that's why, and maybe I haven't encountered - or learned - those verbs yet. My K seems inflated by comparison, maybe because common verbs are particularly concentrated there. For the rest, it's arguably not that different.
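For the curious, the counting step can be sketched roughly like this. It is a minimal illustration, not the exact code from the blog post: it assumes the text has already been run through some POS tagger that yields (word, tag) pairs with "VERB" as the verb tag, and that each word is a lemma (so a split form like "macht ... auf" would already have been mapped back to aufmachen). The function name and the sample data are made up for the example.

```python
from collections import Counter


def count_verb_initials(tagged_tokens):
    """Count the first letter of every token tagged as a verb.

    tagged_tokens: iterable of (word, pos_tag) pairs. The "VERB" tag is an
    assumption here; other taggers may use a different tag set.
    """
    counts = Counter()
    for word, pos in tagged_tokens:
        if pos != "VERB" or not word:
            continue
        initial = word[0].upper()
        # Skip anything that does not start with a letter (digits, symbols).
        if initial.isalpha():
            counts[initial] += 1
    return counts


# Hypothetical tagger output, written by hand for illustration only.
sample = [
    ("sie", "PRON"), ("singen", "VERB"), ("und", "CCONJ"),
    ("schicken", "VERB"), ("Brief", "NOUN"), ("aufmachen", "VERB"),
    ("ausmachen", "VERB"), ("joggen", "VERB"), ("Sonne", "NOUN"),
]

counts = count_verb_initials(sample)
for letter in sorted(counts):
    print(letter, counts[letter])
```

Note that with this approach umlauts count as their own initials (Ä is kept separate from A), which may or may not be what you want.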
Comments? Thoughts? Tell me on Bluesky - some people have even offered to count the verbs they use in a day for me :).
Other stuff I wrote and thought about
As I said, I’ve been quite prolific lately. Here are a few things that I did:
A (techy, but not too much) exploration of how to build myself a movie recommender based on similarity, so that when I want something with a similar vibe I can get a suggestion based on a movie I like
Reflections on how LLMs are flipping the paradigm of experimentation and how we need to apply (human) critical sense
My personal list of best movies, shows, books and tools in 2024
Hope you’ve enjoyed this, let me know.