Why is English like that?
I’m still working on the format of this newsletter, and I’m going to try starting with the news part of the letter this month.
News
The thing I alluded to last month is official. I’ve signed the contract, and my translation of Aiki Mira’s “Schritt ins Leere” will be published in Amplitudes: Stories of Queer and Trans Futurity (ed. Lee Mandelo) by Erewhon Press as “A Step into Emptiness” in Summer 2025. (Publishing is slowwwww.)
Why is English like that?
As part of my preparation for Filling Your Worlds with Words, I’ve been reading (or re-reading) a bunch of linguistics research, and one of the books I read is Millennia of Language Change: Sociolinguistic Studies in Deep Historical Linguistics by Peter Trudgill (2020). Trudgill is a renowned, indeed venerable, dialectologist, and in recent years, he’s turned his focus to a sociolinguistic account of historical linguistics. First let’s define a couple things.
Sociolinguistics is an umbrella term for a hard-to-define field that studies the intersection of language and society: why this person chose this way to say this thing and what that means socially. The way people talk can say a lot about their identity, and identity creation is only one facet of the field. Sociolinguists also study variation and how that can affect the way language changes over time.
Historical linguistics is focused on identifying patterns in language change over time. Some historical linguists work on reconstructing previous forms of language based on existing written records; others compare previous forms of the same language with each other.
Since both of these fields study language change over time, it’s not a wild leap to come up with sociolinguistic analysis of historical language change. (My MA thesis was a historical sociolinguistic analysis of 13 German verbs.)
So, Trudgill 2020 is a collection of eight of his articles in this research area that have been revised and made more book-like. If you’re a layperson, they may get deeper into the weeds than you can follow (chapter 5, for example, is about an argument I didn’t even know existed), but if you’re already an armchair linguist, or if you’re fine with stopping to google things or check Wikipedia a lot, I think it’s a pretty good read. (I enjoyed the heck out of it, even the chapter that put me to sleep.)
I want to tell you about everything in the book, because it’s so cool, but I promised to talk about why English is like that, so I’ll focus on two of the chapters about English. One of the things Trudgill repeatedly emphasizes is that English as we speak it today has been shaped over millennia by contact with other languages, both within and without the Indo-European family. The earliest instances of language contact were during the prehistoric migrations, which we don’t have any written records of, but we can see their influence in the written records we have and in the language we speak today.
Three verbs in a trenchcoat
The English verb to be is weird. The infinitive form doesn’t even appear in any of the conjugated forms, and the past tense (was/were) doesn’t even look like the present tense (am/is/are). That doesn’t make a lick of sense.
Even the other irregular verbs, like to have, have forms that are at least similar enough to each other that you can go, “Oh, sure, I can see that.” The he/she/it form in Old English (OE) was hæfþ (hafth), which had simplified to hath by Early Modern English (the Shakespearean era). You can see how that would work, right? The /fth/ cluster is really awkward to pronounce, and the /f/ went away over time, leaving only the /th/, like in fifth. Similarly, the past tense in Old English was hæfde- (havde- plus the inflectional ending, e.g. thou hæfdest), and you can see how “he havde” could simplify over time to “he had.” These make sense, or can at least be logicked through if you think about it. But to be? There’s no logic in any of that. Let’s take a look at a chart.
to be | | to have | |
present | past | present | past |
I am | I was | I have | I had |
you are | you were | you have | you had |
he/she/it is | he/she/it was | he/she/it has | he/she/it had |
we/they are | we/they were | we/they have | we/they had |
(Side note: one really fun way that linguists and cognitive scientists understand the process of language acquisition (pre-puberty) and learning (post-puberty) is by observing babies as they grow up and start using language. A lot of the mistakes toddlers make come from overgeneralizing a rule (“I maked you a cookie but I eated it”), and their caregivers then correct them. Adult language learners make the same mistakes! It’s really cool and it gives researchers a lot of hypotheses to work with.)
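If you like seeing things spelled out, here is a minimal sketch of what overgeneralization looks like as a procedure. It is a toy, not a model of how anyone actually learns language: the function names and the tiny exception table are made up for illustration, and the "regular rule" is simplified to "add -ed, or just -d after an e."

```python
# Toy illustration of overgeneralization (all names here are hypothetical).
# A learner who has not memorized any exceptions applies the regular rule
# to every verb; a learner who has memorized them uses the stored form.

IRREGULAR_PAST = {"make": "made", "eat": "ate"}  # exceptions learned later


def regular_past(verb: str) -> str:
    """Simplified regular past-tense rule: add -ed, or -d after a final e."""
    return verb + "d" if verb.endswith("e") else verb + "ed"


def past_tense(verb: str, known_irregulars: dict) -> str:
    """Use a memorized irregular form if there is one, else the regular rule."""
    return known_irregulars.get(verb, regular_past(verb))


toddler = {}             # no exceptions memorized yet
adult = IRREGULAR_PAST   # exceptions memorized

for verb in ("make", "eat", "walk"):
    print(verb, past_tense(verb, toddler), past_tense(verb, adult))
```

With an empty exception table, the sketch produces "maked" and "eated," exactly the toddler forms quoted above; once the exceptions are stored, it produces "made" and "ate." Regular verbs like "walk" come out the same either way.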
The reason for this weirdness is that “to be” has three different Proto-Indo-European (PIE) roots, which I am copy-pasting from Wiktionary because I don’t feel like dealing with formatting: *bʰúHt (to appear, to come into being; gives us the forms that start with b), *h₂wes- (to reside; gives us was/were), and *h₁es- (to be; gives us am/is/are). The H and h(n) are different kinds of laryngeal Hs that influenced what character the neighboring vowels took on as the language changed; they are not important for this discussion.
In Old English (ca. 800-1100 CE), there were two different copulas (linking verbs; a fancy way of saying ‘to be’) in active use, and both were still in use in Middle English. Then, at some point, all of the be-forms went away, leaving only am/is/are in the present tense. No more “he be-eth my brother,” only “he is my brother.” (Except in some dialects, where the b-forms were kept instead of the a-forms, apparently; I’m nowhere near expert enough on this, so I’m going to take renowned dialectologist Peter Trudgill’s word for it.)
Why were there two different forms, and why did one of them go away? The theory Trudgill puts forth, based on a theory Tolkien proposed in 1963 (I always get a little nerdy thrill when I see Tolkien cited in philology) that has been tested by researchers since, is contact with the Celtic languages, specifically Welsh – which itself got its parallel copulas from contact with Latin.
The two forms had different usages: the be-forms were used for habitual meanings, and the am/is/are forms for non-habitual meanings. Over time, however, the distinction blurred. In Middle English, the two verbs mixed together, and in standard modern English, the be-forms have been lost from the present tense.
The linguistic term for this type of lexical grafting is suppletion. The only other suppletive verb in English is “to go/went,” where went is an entirely different verb (“to wend”) grafted on (which resulted in the past tense of “to wend” becoming “wended”). Historically, the past tense of “to go” was also suppletive, but with a different root verb; OE had gān/ēode.
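For the programmers in the audience, here is one rough way to picture suppletion: tag every form of a verb with the root it comes from, and call the verb suppletive if more than one root shows up. This is only a sketch under some loud assumptions. The root labels for “to be” are the ones given above, “go” and “wend” simply stand in for the two verbs involved, and "have-root" is a placeholder label I made up, not a reconstruction. The function names are hypothetical too.

```python
from collections import defaultdict

# Each form is tagged with the root the essay assigns it to.
# "have-root" is a placeholder label, NOT a reconstructed root.
PARADIGMS = {
    "to be": {
        "be": "*bʰúHt", "been": "*bʰúHt",
        "was": "*h₂wes-", "were": "*h₂wes-",
        "am": "*h₁es-", "is": "*h₁es-", "are": "*h₁es-",
    },
    "to go": {"go": "go", "goes": "go", "gone": "go", "went": "wend"},
    "to have": {"have": "have-root", "has": "have-root", "had": "have-root"},
}


def forms_by_root(paradigm):
    """Group a verb's forms under the root each one is tagged with."""
    grouped = defaultdict(list)
    for form, root in paradigm.items():
        grouped[root].append(form)
    return dict(grouped)


def is_suppletive(paradigm):
    """A paradigm counts as suppletive here if it draws on more than one root."""
    return len(set(paradigm.values())) > 1


for verb, paradigm in PARADIGMS.items():
    print(verb, is_suppletive(paradigm), forms_by_root(paradigm))
```

In this toy version, “to be” comes out with three roots, “to go” with two, and “to have” with one. That is the whole trenchcoat joke in data form.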
Language change is cool and etymology is fun! But for me, it’s even more fun to try and figure out the why.
Mystery vocabulary
When etymologists and historical linguists try to figure out what root a word came from, they look both at previous forms in the same language and at older forms in different languages. Let’s make a chart to make it easier. (Wiktionary)
West Germanic (living) | | | North Germanic (living) | | |
English | German | Dutch | Norwegian (bokmål), Danish | Swedish | Icelandic, Norwegian (nynorsk) |
I | ich | ik | jeg | jag | ég, eg |
Y’all can see how these are similar, right? Let’s go back a thousand years and then into prehistory. Notice the similarities to each other and to today’s forms.
Old English | Old High German | Old Norse | Gothic (4th C) | Proto-Germanic (reconstructed) | PIE (reconstructed) |
ih, ic, iċ | ih | ek | ik | *ek, *ik | *éǵh₂ |
Historical linguists also look at other branches of Indo-European, like Italic (Latin), Celtic, Slavic, and Indic, for evidence for their reconstructions, but this essay is already too long, so I will simply commend to you the Wiktionary article linked on the PIE reconstruction. Words that can be traced to the same root are called cognates, and most of the basic vocabulary (things like mother, father, eat, drink) of the I-E languages is cognate across the family.
But there are some words in the Germanic languages that don’t have any known cognates in the other Indo-European languages – basic vocabulary that would have been in regular use in the Bronze Age, like sword, ship, king, calf, boat, “and many others” (Trudgill p. 53). I’m not going to do etymology on all those words, just a couple. King is of uncertain origin and appears in a variety of other languages, probably through borrowing from Germanic.
One of my favorite fun facts is that ship and skiff have the same uncertain root, and that skiff entered English via Old French esquif, which probably entered French via contact with a Germanic language. Maybe the Danes (who settled in Normandy, after all), maybe the Franks; we can’t say for sure. A pair of words like this, which share a root but entered the language by different routes, is called a doublet. (As you may be guessing, English has a ton.)
Why would the Germanic languages lack Indo-European words for perfectly normal, everyday things, when other Indo-European languages have them? That’s a really good question that we don’t have a definitive answer to, but prehistoric contact with non-Indo-European tribes is a pretty plausible one. Archaeological evidence can provide an additional window into contact between peoples, and some historical linguists use findings like Kurgan mounds and corded ware to place speakers of various (proto-)languages in particular places at particular times, and to work out whether they encountered people who already lived there.
The languages that those earlier inhabitants spoke influenced the languages of the migrants.
It’s language contact all the way down
In addition to the unknown languages that the Germanic tribes encountered, they also re-encountered speakers of other I-E languages, namely the Celtic ones. Once the Angles, Saxons, and Jutes landed in Britain, that was inevitable, but there was an earlier period of contact while the tribes were still on the continent. This is when the word iron came into the Germanic languages, because the Celts had the technology, and the Germanic tribes did not.
The next major language contact occurred when the Danes settled in northern and eastern England in what became, after a lot of fighting and a treaty, the Danelaw. Old English and Old Norse were fairly similar, much more similar than modern English and Norwegian or Danish are, and speakers of one language could kinda sorta understand the other. Jackson Crawford and Simon Roper have a great video about this, if you want more details (and to see how much I simplified it). But they weren’t the same, and there were some things that were just different enough that it ended up being easier to do away entirely with grammatical gender.
There are a handful of features of English that do not occur in the other West Germanic languages (German, Dutch) but do occur in the North Germanic (Scandinavian) ones, like preposition stranding (“that’s something he didn’t think of”) and splitting the infinitive (“to boldly go”). These are very likely to be borrowed from Old Norse, and when one language borrows grammar from another, it’s a sign of very close contact over a long period of time – which was, in fact, the case in the Danelaw! But people move around, and Britain is a very small island, so these innovations spread into the southeastern dialects that would become standard modern English.
Then the Normans invaded England in 1066, and the nobles started speaking French, while the peasants continued to speak English. Trudgill dismisses the influence of French on English grammar outright. It’s true that some 40% of English vocabulary is borrowed or otherwise sourced from French, but other than some sounds and some prefixes and suffixes, “any role for French in the shaping of English grammar would seem, on the face of it, to be unlikely” (p. 64). Despite 17th- and 18th-century grammarians’ attempts to impose Latin grammar on English, English remains a Germanic language. In fact, a lot of the grammar “rules” you hear, like “never split the infinitive,” were imposed on English during this time, on the reasoning that Latin didn’t do that, so English shouldn’t either.
In summary
The history of English is full of contact with speakers of other languages, and that is why it’s the beautiful mess it is today.
Additional resources
Peter Trudgill, Millennia of Language Change
Elly van Gelderen, A History of the English Language