Apologies for the delays in this newsletter. With the baby and the coup, the newsletter took a momentary backseat. I also wrote a few drafts (not about LaTeX) that I was unhappy with and scrapped completely.
What’s not to like about LaTeX?
TeX/LaTeX (and its cousins XeTeX, LuaTeX, etc.) is the de facto typesetting system for mathematicians. Essentially all math papers and books are written in it, and its “math mode” syntax has been adopted by the most widely used web-based math display systems, such as MathJax.
TeX was written by Donald Knuth and first released in 1978, and LaTeX (a system of macros written on top of TeX) was written by Leslie Lamport in the early 1980s. Development has continued on LaTeX, while the underlying TeX system has effectively been marked “complete.”
TeX/LaTeX extends far beyond math equation typesetting. It is a complete typesetting engine and can be used to typeset entire books, ready to print. I’ve personally done this twice. And I should stress this: LaTeX was designed for print. It shines at print-specific problems like page layout and automatically tracking complex cross-section or cross-chapter references to theorems, diagrams, etc.
So what’s not to like? TeX/LaTeX is a software masterpiece. It’s a 40-year-old piece of software that still has millions of active users.
That said, even masterpieces deserve criticism. Receiving thoughtful, respectful criticism from invested stakeholders is an honor. It means they care in a world that makes it too easy to burn villages down. I endeavor to provide honorable criticism about things that matter to me.
A state-heavy state machine
As mentioned, TeX is the core typesetting system, and LaTeX is a system of preprocessor macros built on top of TeX. LaTeX is a state-heavy state machine. In particular, on each run, various parts of LaTeX generate auxiliary files that are used by later runs of LaTeX to compile the final document. This is why when compiling a document, most people need to run something like
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex
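The repetition exists because each pass writes information that later passes consume. As a rough sketch, here is the kind of thing a first pdflatex run leaves in main.aux (the citation key, label name, and numbers here are hypothetical, but the commands are roughly what LaTeX actually writes):

\relax
\citation{knuth1984}            % recorded so bibtex knows which entries to pull
\bibdata{references}            % the .bib file named by \bibliography
\bibstyle{plain}                % the style named by \bibliographystyle
\@writefile{toc}{\contentsline {section}{\numberline {3}Results}{12}}
\newlabel{thm:main}{{3.1}{12}}  % \ref{thm:main} resolves to 3.1, \pageref to 12

bibtex reads the \citation, \bibdata, and \bibstyle lines to produce a .bbl file, and the subsequent pdflatex runs read the .aux and .bbl back in, which is why cross-references and citation numbers only stabilize after the extra passes.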
If you use one of the monolithic LaTeX IDEs, or a tool like latexmk, this repetition step is handled for you. Still, it can cause problems. Some steps of the typesetting chain are skipped when old auxiliary files are present on the file system. When something goes wrong (i.e., when my changes aren’t showing up in the compiled document) and I can’t figure out what happened, deleting random auxiliary files and recompiling often fixes it.
This is the “turn it off and turn it back on again” method of fixing problems, and it makes people with principles like mine angry at the tool. Based on Lamport’s writing and the era in which LaTeX was designed, this decision was almost certainly due to the computational constraints of the time. Those constraints don’t exist today.
The tradeoff for this choice is instability and confusion. The environment in which LaTeX runs is not hermetically sealed. As the modern software discipline emphasizes, there are more problems that can occur due to this than are dreamt of in anyone’s philosophy.
Unnecessary Turing-completeness
At its core, TeX is a macro preprocessor. This macro language is Turing complete, which means you can use it to write arbitrary programs. People have taken this too far. LaTeX has packages implementing a presentation language, a drawing language, a magnetic field plotter, a genealogy tree plotter, and much, much more.
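To make “arbitrary programs” concrete, here is a toy sketch (not from any real package) of a loop built from nothing but a count register and macro recursion:

\documentclass{article}
\begin{document}
\newcount\steps
\def\countdown{%
  \ifnum\steps>0
    \the\steps\ % typeset the register's current value, then a space
    \advance\steps by -1
    \countdown % recurse until the register hits zero
  \fi}
\steps=5 \countdown % typesets "5 4 3 2 1"
\end{document}

The drawing and plotting packages mentioned above push this same machinery much, much further.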
In some ways this is great, and part of TeX’s killer feature of extensibility. In some ways, this is a disaster, and none of these things should be done in TeX. It makes LaTeX monolithic.
Tools should do one thing, do it well, and be easy to compose with other tools. Plotting magnetic fields, while cool, is not within that scope. Embedding it in TeX also limits the usefulness of the magnetic field plotter, since it can only be used in the context of document typesetting.
More to the point, the decision to make TeX’s macro language Turing complete could have been made differently. Someone with deeper knowledge of LaTeX than I have would be better placed to judge whether the most widely used parts of LaTeX could have been written without a Turing-complete macro system. From experience, I suspect Turing-completeness is not essential.
The consequence of this design choice is that it’s harder to write extensions for TeX, and it’s harder to analyze LaTeX source statically. By that I mean, any tool that hopes to transform a TeX file into another document format, or automatically detect problems in TeX code, cannot be correct without actually compiling the document to see how all the macros expand. This adds friction to the wider ecosystem, and seems like a big part of why we don’t have good tools to turn a beautiful TeX document into a beautiful webpage. To the extent that we do, those tools actually ignore the macro system and cross-compile the high-level, most-used macros directly.
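Here is a contrived sketch of why (the macro and counter names are made up): a macro whose expansion depends on document state cannot be resolved without effectively running TeX.

\documentclass{article}
\newcounter{calls}
% Produces a numbered section the first time it is used, and an unnumbered
% subsection every time after that. A converter or linter cannot know which
% one a given call yields without tracking the counter, i.e., without
% executing the document.
\newcommand{\headingOrAside}[1]{%
  \stepcounter{calls}%
  \ifnum\value{calls}=1 \section{#1}\else \subsection*{#1}\fi}
\begin{document}
\headingOrAside{Introduction} % expands to \section{Introduction}
\headingOrAside{A remark}     % expands to \subsection*{A remark}
\end{document}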
This can also be read as a criticism of TeX’s complexity: if converting from the underlying TeX primitives were tractable, we could write a converter once and every LaTeX package would be supported automatically.
These kinds of problems were probably inconceivable to Knuth at the time he wrote TeX. “Other document formats” like Word, HTML, and PDF did not exist yet. Nor did the concept of massively distributed collaboration, or even “lazy developers”: at the time, using TeX or LaTeX required buying and reading a book. Today most LaTeX users have to be cajoled into reading anything beyond the top answer on TeX StackExchange. Since TeX was also written for print books, most of its focus (much of which now matters less than, say, the math mode sub-language) is on the constraints of the physical page.
A culture of avoidance
The complexities of the LaTeX ecosystem promote a culture of avoiding problems in weird ways.
For instance, Tim Gowers, world-renowned mathematician and Fields medalist, admits in this thread that, to avoid a typesetting problem with the end-of-proof tombstone, he would inject meaningless phrases like “which proves the theorem.” The suggested solutions to his problem also showcase how stupid LaTeX solutions can be: they involve violating the encapsulation of the amsthm package (which defines the theorem/proof macros) to manipulate a stack defined for the purpose of nesting proofs within proofs.
The general culture of LaTeX includes the following habits (many of which I do or used to do often):
- Copying StackExchange snippets
- Ignoring compiler warnings
- Ignoring compiler errors (see next paragraph)
- Using the wrong language construct for the job because learning the right one is too daunting. E.g., repeated use of \\ instead of \vspace, and \ \ \ \ instead of \hspace or any one of the dozens of spacing commands.1 (A short illustration follows this list.)
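As promised, here is a minimal sketch of that last misuse; the specific lengths are arbitrary and the text is filler:

\documentclass{article}
\begin{document}
First paragraph.\\[2\baselineskip] % abusing a line break to fake vertical space
Still the same paragraph, pushed down a line.

A new paragraph.

\vspace{2\baselineskip} % the command actually meant for vertical space
Another paragraph, pushed down the same amount.

left\ \ \ \ right % stacking control spaces to fake a horizontal gap

left\hspace{2em}right % the command actually meant for horizontal space
\end{document}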
These days most new LaTeX users use Overleaf, a web IDE for TeX that doubles as cloud document storage. One of Overleaf’s “helpful” features is to compile a document, when possible, even if there are compilation errors. This was likely done to remove roadblocks for new users, who would stop using the product if it were too cumbersome. This is a classic kind of decision made all over modern software, and it speaks to the difference between Knuth’s time and ours. User growth often outweighs principle, because what use is strong principle when there are no users for it to benefit? Or when the competition moves faster and snatches up all the users before they can experience your genius?
As a consequence, various researchers I follow on Twitter have complained about “kids these days.” Right before a conference deadline, their students send them TeX that can’t be submitted because it doesn’t compile, and this was detected so late because it was all done in Overleaf.
As someone who thinks people should invest heavily in the proper use of their tools, I agree with the researchers. Kids these days. At the same time, a good system lets its users grow along with it. LaTeX has too many inscrutable barriers to doing even simple things, and the benefit of learning more about LaTeX is not clear, except to fix bugs in LaTeX packages or find slightly better workarounds. Either way, it’s more about working around annoyances in the tool than learning new abilities.
TeX’s core is fossilized
The core TeX engine itself is a curious software artifact worth peeking at. Here is its source. As you can see, it’s written in a language called WEB. WEB, which compiles to Pascal, was designed by Knuth to promote his vision of literate programming.
In brief, literate programming is the idea that programs can have such nicely written comments that the program and its comments are the documentation. To that end, WEB programs can be compiled into webpages and books. To the best of my knowledge, TeX is the only serious program written in WEB. Anyone who actually uses TeX’s source first cross-compiles it to C. Meanwhile, the book that TeX is compiled to is 500 pages long, and goes unread by all TeX users.
I could expand on why I think literate programming failed, but perhaps another time. In short, it’s not practical. Better design, encapsulation, and stronger safety measures are more effective in all cases, and expecting everyone to “think deeper and write better” is foolish.
Back to the point: the TeX source, partly because it is written in WEB, and partly due to its orientation around Pascal and the limitations of computers in the era when it was written, is inscrutably tangled. There is no easy way to understand the overarching architecture of the program, nor does its documentation allow you to compartmentalize details. See this blog post for another perspective on that. This is a huge barrier to understanding it, and hence to improving it.
Finally, Knuth forbids anyone to modify TeX, and considers TeX finished (as seen by its version numbers converging to π):
% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved.
% Copying of this file is authorized only if (1) you are D. E. Knuth, or if
% (2) you make absolutely no changes to your copy.
...
% Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002).
% Version 3.1415926 was a general cleanup with minor fixes (February 2008).
% Version 3.14159265 was similar (January 2014).
In interviews, Knuth has stated that any remaining bugs should be considered features. In one sense this is reasonable: Hyrum’s Law says that any observable property of a system eventually becomes depended on, and 40-year-old programs are very sensitive to this.
That said, the fastest way to turn someone off from reading your program is to threaten legal punishment for modification, and to say, “I accept no changes in the future.” Any other benefits of reading the TeX source (such as learning its efficiency tricks) are likely obsolete due to modern computational power and abstractions, though I have not read the entire book myself to confirm.
Altogether, being written in WEB and wrapped in legal threats probably contributed to the complexity of the ecosystem. If TeX were written in C, or rewritten in a higher-level language, it could at the very least be refactored and organized for readability. Better, the core engine could admit extensions. I believe LuaTeX does something along these lines.
1. Seriously, who thought it was a good idea to have so many spacing commands? I suppose it’s like how some human languages have more words for culturally important concepts, though I recently heard the “Eskimo words for snow” version of that idea is a myth.