What AI and Bible Translators have in common - Corpus Linguistics
ChatGPT and other "Large Language Models" (LLMs) are trained on collections of texts, called corpora (singular: corpus), to build statistical models of how the language in those texts works. They can then produce highly natural prose in the language(s) of their corpora. While AI brought corpora into the public spotlight this year, they are nothing new. Linguists and translators have used corpora in their work for centuries.
Computers, however, have significantly altered how we do our work. The advent of modern computers ushered in a new era of corpus linguistics and corpus translation that can ensure higher quality translations of all types, including Bible translations.
This newsletter gives a window into how corpus linguistics can increase the quality of Bible translations and how the Spoken English Bible translation team harnesses the power of corpus linguistics.
Corpus Linguistics and Translation
Corpus linguistics is a branch of linguistics that relies upon corpora, large collections of written or transcribed texts. These corpora are searchable on multiple levels. For example, the Corpus of Contemporary American English (COCA) has over one billion words from eight different types of sources (blogs, TV, spoken transcripts, etc.), covering a time span from 1990 to 2019. Linguists can search the corpus for examples of a word ('right'), a phrase ('do what is right'), and even a phrase within a specific type of source ('do what is right' in blogs). Such corpora give linguists access to the real contexts in which a word or phrase is used.
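To make the three kinds of search concrete, here is a minimal Python sketch of a word search, a phrase search, and a phrase search limited to one type of source. It does not use COCA or any real corpus interface: the `Doc` class, the `search` function, and the four sample sentences are all invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    genre: str  # type of source, e.g. "blog" or "spoken"
    year: int
    text: str


# Toy sample sentences invented for this sketch; they are NOT taken from COCA.
CORPUS = [
    Doc("blog", 2012, "Sometimes you have to do what is right, even when it costs you."),
    Doc("spoken", 2005, "Turn right at the light and you will see it."),
    Doc("news", 1998, "She said the council must do what is right for the town."),
    Doc("blog", 2016, "He was right about the weather."),
]


def search(corpus, query, genre=None, window=3):
    """Find a word or phrase and return each hit with a few words of context.

    If `genre` is given, only documents of that source type are searched.
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for doc in corpus:
        if genre is not None and doc.genre != genre:
            continue
        for match in pattern.finditer(doc.text):
            # Keep up to `window` whitespace-separated tokens on each side.
            left = " ".join(doc.text[:match.start()].split()[-window:])
            right = " ".join(doc.text[match.end():].split()[:window])
            hits.append((doc.genre, doc.year, left, match.group(0), right))
    return hits


word_hits = search(CORPUS, "right")                           # a word
phrase_hits = search(CORPUS, "do what is right")              # a phrase
blog_hits = search(CORPUS, "do what is right", genre="blog")  # a phrase in blogs
```

Even in this tiny sample, the word search returns directional uses ("Turn right") alongside moral ones, while the phrase search returns only the moral sense. The surrounding context is what lets a translator tell the two apart.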
Why does this matter for Bible translation? The four pillars of Bible translation are that the text is clear, accurate, natural, and acceptable (Barnwell, 2020, chapter 5). Corpus-linguistic approaches help translators check that their translation has all of these qualities.
For example, naturalness is foundational for translation quality because people always fill in context for language they read or hear, automatically constructing a “natural” setting for the statements they take in.
As an example, think of the different scenes that pop into your head with these sentences:
- "Ma’am, speak clearly into the microphone and tell us exactly what happened."
- "Lady, use the mic to tell us exactly what happened, and speak clearly."
- "Woman, use the mic, speak clearly, and tell us just what happened."
These sentences, while similar, cause very different images to appear for an American English audience. In the same way, the slight differences between words that a translation uses can result in large differences in the meaning of the text. This is where corpus linguistics comes in handy. When translators have access to large corpora of natural texts, they can see the precise circumstances in which words and phrases are generally used. These contexts give the translators the opportunity to determine the connotations of their word and phrase choices.
How the Spoken English Bible Translation Team Harnesses Corpus Linguistics
The Spoken English Bible translation team is using corpus-linguistic approaches to ensure that we translate the most important concepts of the Bible naturally for modern American audiences. Recently we have been wrestling with the concept of "righteous." This concept is common in our American culture, but the word "righteous" appears almost exclusively in religious discourse and thus does not resonate with many secular/unchurched audiences. Recognizing this challenge, our team brainstormed alternatives and came up with "good person/woman/man," "do what is right," and "moral."
Once we came up with these options, we needed to test whether they communicated the correct meaning to native speakers of American English. Teams often spend weeks, or even months, on this community testing process, consulting with dozens of people in their community. Corpus-linguistic approaches can significantly speed up this process because the corpora give translators direct access to how people already use the terms they want to test!
We used the Corpus of Contemporary American English to determine which of these phrases most closely aligned with the biblical concept "righteous." We found that American English speakers use "do what is right" to denote when someone does what is morally/socially expected despite its potential cost. This is exactly what the translation team wants to get across.
The corpus we are using also allows us to see what year the phrase was used and the contexts in which it was used. "Do what is right" has been steadily used from the 1990s through the 2010s and appears predominantly in colloquial contexts, such as blogs, websites, and transcripts of conversations.
The contracted form "do what's right" appears even more frequently in spoken contexts.
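The kind of breakdown described above, how often a phrase occurs in each decade and in each type of source, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented toy data rather than COCA results, and `RECORDS`, `PHRASE`, and `tally` are hypothetical names chosen for this sketch.

```python
import re
from collections import Counter

# Toy records (source type, year, text) invented for this sketch; NOT COCA data.
RECORDS = [
    ("spoken", 1994, "You just do what's right and move on."),
    ("blog", 2012, "Do what is right, not what is easy."),
    ("web", 2012, "People respect leaders who do what is right."),
    ("spoken", 2008, "I told him, do what's right."),
    ("academic", 2003, "The right to counsel is discussed below."),
]

# Match both the full form and the contraction (straight or curly apostrophe).
PHRASE = re.compile(r"\bdo what(?: is|['’]s) right\b", re.IGNORECASE)


def tally(records):
    """Count phrase occurrences per decade and per source type."""
    by_decade = Counter()
    by_genre = Counter()
    for genre, year, text in records:
        n = len(PHRASE.findall(text))
        if n:
            decade = f"{year // 10 * 10}s"  # e.g. 1994 -> "1990s"
            by_decade[decade] += n
            by_genre[genre] += n
    return by_decade, by_genre


by_decade, by_genre = tally(RECORDS)
```

These are raw counts. With a real corpus, the sections for each decade and source type differ in size, so counts are usually normalized (for example, to occurrences per million words) before they are compared.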
The consistent use of this phrase in colloquial contexts makes it an ideal default translation for the concept of "righteous" in the Bible.
Think about how it sounds in this well-known Bible verse about Zechariah and Elizabeth, the parents of John the Baptist:
Luke 1:6: "and both of them did everything that God considered to be right. And they followed all of God's commands exactly."
Beyond the Spoken English Bible Translation
Corpus-linguistic approaches to translating the Bible can directly benefit translation into most of the world's major languages. For languages that already have large corpora, like English, Chinese, Russian, Spanish, and Swahili, a few minutes of corpus searches can save hours of revisions on the back end.
We in the Spoken English Bible translation team are excited to use corpus-linguistic approaches to make the most accurate and natural translation possible!
TL;DR
If the email was too long to read, here are the main points:
- AI models like ChatGPT rely upon large bodies of texts called corpora.
- Linguists and translators used corpora long before AI was mainstream.
- Corpus-linguistic approaches equip translators to choose the most accurate and natural equivalents.
- The team translating the Spoken English Bible (SEB) is using corpus linguistics as one tool to ensure the quality of their translation.
Challenge for You
Make your own free account on English-Corpora.org and play around.
References
Barnwell, K. (2020). Bible Translation: An Introductory Course in Translation Principles (4th ed.). Summer Institute of Linguistics, Academic Publications.