Vim is Turing-Complete
No, not VIMSCRIPT. Vim KEYSTROKES are Turing-complete.
No, not vimscript. Vim keystrokes are Turing-complete. Here's how:
lhxxjp*wywgg$pjxgg
I'll explain how this works, but first, a very brief overview of what I mean by "Turing-complete".
A Brief Overview of Turing-Completeness
Very roughly: A decision problem is a standard math problem. We have some predicate P(x) ("x is divisible by 2", "x contains an A", etc) and we want to come up with an algorithm that correctly determine P for all possible finite inputs.1 If we can come up with an algorithm, we can then "implement" it in some mathematical system. One class of "implementation systems" are automata, and a Turing machine is one kind of automata. The Church-Turing Thesis hypothesizes that Turing machines are the upper bound of power: if you cannot implement the algorithm in a Turing machine, you cannot implement the algorithm in any system. This isn't formally proven, but it's empirically held up for a long-ass time. Nothing can be more powerful than a Turing machine.
Plenty of things are as powerful.2 We say a model of computation S is Turing-complete if it can solve any problem a Turing machine can, aka S is maximally powerful. The most direct way to prove S is Turing complete by showing that any Turing machine can be converted into an S-construct. Unfortunately Turing machines are a huge pain to work with, so we try to avoid that.3 Instead, the most common way to prove Turing-completeness is to write an interpreter for a system that we already know is Turing-complete, preferably a system that's more agreeable. If we could write a C interpreter entirely in Vim keystrokes, then we've proven that Vim is Turing complete.
We're not going to do that. C is even more of a pain than Turing machines are. It's got too much stuff in it designed for "actually writing real programs" that makes it hard to encode. One of the more popular targets is Brainfuck, an esolang (esoteric language) from the 90s. BF is TC with only one datatype and eight commands. Esolangs commonly prove their own TCness by implementing a Brainfuck interpreter.
I've already written a Brainfuck interpreter in Vim keystrokes, so I could stop right there and call victory. But I don't think it's a good demonstration! I used way too many Vim Tricks (tm), like macro/register equivalence, that make understanding how it works really annoying.4 I wouldn't be able to explain it well and this wouldn't be a very informative newsletter.
So let's talk tag systems.
Tag Systems
Tag systems are one of my favorite automata, which is easily the third or fourth nerdiest thing I've said all month. We have a string of characters and a dictionary that maps single characters to strings:
abc
a cbac
b acc
c
We cut the first two characters in the string,5 then look up the mapping for the second character we removed and append it to the end of our string. In the above example, we'd remove ab
, then look up b
to get acc
, then append it to the end to get cacc
. The next five steps are:
cccbac
cbac
acacc
acc
c
We halt if we have fewer than two characters remaining. We can encode the truth-value of the decision problem in any number of ways: whether we end with zero or one characters, what the last character is, whether we halt at all. Hao Wang proved it's Turing-complete.6
This is a good candidate for Vim because there's no "bookkeeping" internal state. We never change how we interpret the production rules. The input string is also our data tape, and we don't need to emulate explicit conditionals. It's all just text manipulation.
The Script
Here's the Vim keystrokes again:
lhxxjp*wywgg$pjxgg
We can assume it's loaded into a register, say @q
, for use as a macro. The "source file" should be in the same format as the example: input string on the first line, then a couple newlines, then the mappings. Every invocation of the macro corresponds to one step of the computation. Let's break it down:
lh
Move right, and then move left. We'll come back to this later.
lh xx
Delete the first two characters of the string. Vim automatically loads deleted characters into the default register. Since we do this twice, the first deleted character is clobbered, and the default register only has the second deleted character.
lhxx jp*
Paste the lasted deleted character on the line below, and then search for the next instance of that "word". *
only looks for whole words, which in this case is a whitespace-separated single character. We now have our corresponding production rule.
lhxxjp* wyw
Move from the key to the value and copy it into the default register.
lhxxjp*wyw gg$p
Go to the first line of the file, jump to the end, and paste. The production is now appended to the end of the data string.
lhxxjp*wywgg$p jxgg
Cleanup. Delete the scratch character on the line below and go back to the beginning of the file. We have now completed one step of evaluation.
If this is in the q
register, we can append @q
to the macro to make it recursive. Then it will run until the macro errors out, which we want to associate with the halting state. For our purposes, defining the halting state as "less than two characters in the string" is fine. That's what these two characters are for:
lh xxjp*wywgg$pjxgg
Vim macros define errors as "an error message or beep".7 Trying to move right at the end of the line causes a beep, which cancels the macro. Since we start at the beginning of the line, the way for lh
to error is if we have one or zero characters, giving us the halting condition. Otherwise, lh
is a noop.
With this we've fully implemented a 2-tag system in Vim keystrokes, proving that Vim are Turing-complete.
Warning: If you use a recursive macro and your cyclic tag input doesn't halt, the editor will freeze. Use at your own risk.
Why do this?
Why not?
Update 2021-03-09
To my shock and horror, someone found an error in the proof:
On the usage of * to find production rule. What if you’ve got rules like this
— Ruslan Prakapchuk (@northern_witch) March 9, 2021
a c
c zz
and the current symbol to search is c. Then * will find the first (production) c rather than the second (prefix). Which makes your macro not obviously equivalent to the described tag system.
Ruslan suggested fixing it with /^<C-r>
. That would work at the cost of introducing two new concepts (generalized searching, regular expressions). A simpler patch would be to use a sentinel value, like p
:
ba
p
pa b
pb a
We add the rule that p
cannot be part of the alphabet, so that pa
cannot appear in a production rule. Then instead of running *
on a
, we run it on pa
, guaranteeing we find the prefix instead. The new vim string should be
lhxxjp*3lyv$gg$pj$xgg
I also patched a small bug when you have multiple empty production rules. Prefixes for empty production rules need exactly one space after them.
-
This branch of math is called formal languages. Conventionally inputs are restricted to binary strings. ↩
-
There's also plenty of systems that are less powerful, aka there are computable decision problems they cannot solve. Lots of people are interested in these systems because they're more tractable: there are more mathematical properties we can assert of the class as a whole. For example, we can look at a deterministic finite automata and immediately know if it can be simplified further- if there exists a smaller DFA that solves the same decision problem. We can't do that with pushdown automata, let alone Turing machines. ↩
-
Turing machines are really clunky for a Turing-complete system, and there are lots of smaller, more elegant systems with the same power. The main benefit of Turing machines is that it's relatively easy to construct a Universal Turing Machine, a TM that takes encodings of TMs as inputs and simulates them. ...I think. Please don't quote me on that.
There's also the whole "Turing invented them" thing ↩
-
Also I didn't "really" implement brainfuck, since I didn't bother with
,
or.
. ↩ -
This makes it a 2-tag system. A 3-tag system would be the same, except we'd delete three characters and look at the last one. An m-tag system is Turing complete for all m ≥ 2. ↩
-
Update: on more careful rereading, the timeline is a bit confusing. If I understand this correctly, first Marvin Minsky proved 6-tag systems were TC, then Minsky wrote a memo that 2-tag systems were TC but didn't publish it, then Hao Wang published an improvement on it (and cited it), and a year later Minsky published a paper based off the memo. So technically Minsky was the first to prove that 2-tags were TC, while Wang was the first to publish a proof. ↩
-
:help map-error
↩
If you're reading this on the web, you can subscribe here. Updates are once a week. My main website is here.
My new book, Logic for Programmers, is now in early access! Get it here.