Exploring LDAB: X. Computational Textual Criticism

The Back Story

There is a Chinese saying, “You never regret how little knowledge you have until the time comes to use it”. Many times in the past two years, I was painfully reminded of that truth.

When I started writing this “Exploring LDAB” series last year, I ventured into fields that were totally strange to me, and was forced to learn many things I had never thought of learning before — and all that just out of incorrigible curiosity. One of those fields is called New Testament Textual Criticism [1], which I didn’t know existed until a few years ago, thanks to the controversies caused by New York Times bestselling scholars in that field.

I find Textual Criticism (TC) intriguing, mainly because it is somewhat similar to my own field, which is genomic research. Both involve the study of information-bearing sequences, their changes over time, the resulting differences (mutation) and similarities (conservation) between these sequences, and their causes and ramifications. One can leverage the concepts and computational tools developed and well-tested in one field to gain insights in the other.

The Origin of the Gospel of John

One goal of TC is to establish a text as close to the autograph as possible. This is somewhat similar to the so-called “common ancestor” problem in biology. As far as I know, nobody has solved that problem yet; that is, nobody has been able to establish the genomic sequence of an organism from which all known species have descended. In other words, despite claims to the contrary, we actually don’t know “The Origin of Species”.

Nevertheless, it is intriguing to try to determine the relationships between all the different transcripts of a New Testament book and to fit them into a structure resembling a phylogenetic tree.

For this exercise, I chose the Gospel of John, partly because it has the best manuscript coverage of all the New Testament books in the earliest centuries, and partly because transcriptions of the Gospel of John are accessible online. To quantify the differences between two transcripts, I use edit distance as the metric: the minimum number of single-character operations (insertion, deletion, substitution) required to transform one transcript into the other. This is a rather crude and naive approach, for it ignores the information encoded in the transcript and treats it as merely a sequence of characters. But it is at least a good place to start.
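To make the metric concrete, here is a minimal sketch of edit distance (the classic Levenshtein dynamic-programming recurrence) in Python. It is an illustration of the definition above, not necessarily the code used for this post. The function name `edit_distance` is my own choice, and the Greek variant in the usage note is invented for illustration rather than taken from any real manuscript.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3, and comparing "εν αρχη ην ο λογος" with a made-up reading that drops the article, "εν αρχη ην λογος", gives 2 (one deleted letter and one deleted space). That second case shows the crudeness of the metric: a lost article and a two-letter spelling slip count the same.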

After calculating pairwise distances verse by verse across all 31 papyri of the Gospel of John, I generated the following graph from the resulting distance matrix. The manuscripts are color-coded according to their dates.
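The verse-by-verse comparison can be sketched roughly as follows. Several details here are assumptions of mine that the description above leaves open: each manuscript is represented as a dictionary mapping a verse reference to its transcribed text, only verses present in both manuscripts are compared (papyri are fragmentary), the per-verse edit distances are summed and divided by the total length of the longer readings, and pairs with no shared verses get `None`. The names `manuscript_distance` and `distance_matrix` and the toy data are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def manuscript_distance(ms_a, ms_b):
    """Normalized distance over the verses both manuscripts contain.

    Returns None when the two manuscripts share no verse (assumption:
    such pairs are simply treated as not comparable).
    """
    shared = ms_a.keys() & ms_b.keys()
    if not shared:
        return None
    edits = sum(edit_distance(ms_a[v], ms_b[v]) for v in shared)
    length = sum(max(len(ms_a[v]), len(ms_b[v])) for v in shared)
    return edits / length if length else 0.0


def distance_matrix(manuscripts):
    """Symmetric matrix, as nested dicts, of pairwise distances."""
    names = sorted(manuscripts)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = manuscript_distance(manuscripts[a], manuscripts[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Toy data only; these are not real transcriptions.
toy = {
    "A": {"1:1": "abcd", "1:2": "efgh"},
    "B": {"1:1": "abcf", "1:2": "efgh"},
    "C": {"1:3": "xyz"},
}
matrix = distance_matrix(toy)
```

With the toy data, `matrix["A"]["B"]` is 0.125 (one edit over eight characters), while `matrix["A"]["C"]` is `None` because A and C share no verse. A matrix like this is the input a tree-building method would work from; which method one uses, and how missing pairs are handled, are separate choices not covered by this sketch.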

Figure: Papyri of the Gospel of John (tree form)

Interestingly, there is no sharp divide between manuscripts of the early (2nd and 3rd) and late (6th and 7th) centuries, which might suggest that the Gospel of John has not changed much. For example, the 6th-century manuscript P63 (in blue) is closer to the 3rd-century manuscripts P66 and P75 than other 3rd-century manuscripts of John (in orange) are.

The figure below is the same graph in “radial” rather than tree form, for viewing convenience.

Figure: Papyri of the Gospel of John (radial form)

Notes:
[1] Dr. Daniel B. Wallace’s video lecture, “The Basics of New Testament Textual Criticism”, is a helpful introduction to TC. YouTube. April 09, 2018. Accessed April 16, 2019. https://www.youtube.com/watch?v=Doi8JxJOtgE.
