The University of Queensland, School of ITEE

Multi-version Documents

'Let's figure out where we want to go, and that will show us how to get there.' - Alan Kay

This project is about refining the digital form of literary text. It is about figuring out, in the Internet Age, how to accurately represent the printed or handwritten works of the past, and how to preserve the electronic works of the future.

Digital text was first seen as a cheaper means of producing printed texts, just as Gutenberg's first printing press was seen as a cheap means of producing manuscripts. The formatting codes inserted into early digital text to reproduce typographical features of the print medium were later replaced by 'generic' codes. These recorded the same printed structures as before, but their realisation was deferred: an external 'stylesheet' or program was used to interpret them, in the belief that this made the digital text independent of its printed form. The generic codes were later given the hierarchical structure of formal languages, which had been invented by linguists in the 1950s. This facilitated processing by computer because it allowed the automatic detection of structural errors.

With the arrival of the Internet this form of digital text took over the world, and it has allowed a whole generation to grow up without having to read a book. But most people have forgotten the roots of markup, its close relationship with the print medium, and its structural limitations. What is needed for the literary, philosophical and documentary texts of the past or the future is something more flexible, which can represent the text itself rather than one possible form of its expression.

Like the books of the Eloi in H.G. Wells' The Time Machine, our own printed books will eventually turn to dust. If we can't convert them successfully into digital form, they will disappear forever. This is what happened during the transition from papyrus to the parchment codex nearly 2,000 years ago. It was simply too expensive and too difficult to copy most of the works of the Greeks and Romans, and so they were lost. We are facing a similar problem today.

A multi-version document organises into a single, integrated, digital entity all alternative versions of a work so they can be efficiently searched, edited and compared. Shakespeare's King Lear exists in at least two major alternative versions (and many minor ones); the Greek New Testament has at least 5,600 different versions. What are the texts of these works and countless other more modern ones, if not a collection of alternatives?
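To make the idea concrete, here is a minimal Python sketch of one way such an integrated entity could be modelled. It is an illustration under assumptions, not the format devised in this project: the names Pair and MultiVersionDocument, the version labels "A" and "B", and the sample sentence are all hypothetical. Each fragment of text is stored once, together with the set of versions that contain it; text shared by several versions is therefore not duplicated, and any single version can be recovered by reading off the fragments that belong to it.

```python
from dataclasses import dataclass, field


@dataclass
class Pair:
    """A fragment of text plus the set of versions that contain it (hypothetical model)."""
    versions: frozenset
    text: str


@dataclass
class MultiVersionDocument:
    """All versions of a work held as one ordered list of shared and variant fragments."""
    pairs: list = field(default_factory=list)

    def add(self, versions, text):
        self.pairs.append(Pair(frozenset(versions), text))

    def version_names(self):
        names = set()
        for pair in self.pairs:
            names |= pair.versions
        return sorted(names)

    def text_of(self, version):
        # Recover one version by concatenating every fragment that belongs to it.
        return "".join(p.text for p in self.pairs if version in p.versions)

    def search(self, needle):
        # Names of the versions whose text contains the search string.
        return [v for v in self.version_names() if needle in self.text_of(v)]

    def variants(self, a, b):
        # Fragments present in exactly one of the two versions, tagged with that version.
        return [
            (a if a in p.versions else b, p.text)
            for p in self.pairs
            if (a in p.versions) != (b in p.versions)
        ]


# Invented sample text with two hypothetical versions, "A" and "B".
mvd = MultiVersionDocument()
mvd.add({"A", "B"}, "The quick ")
mvd.add({"A"}, "brown")
mvd.add({"B"}, "red")
mvd.add({"A", "B"}, " fox")

print(mvd.text_of("A"))
print(mvd.text_of("B"))
print(mvd.search("red"))
print(mvd.variants("A", "B"))
```

In this sketch, comparing two versions needs no separate diff step: the differences are simply the fragments whose version sets contain one version but not the other. Editing one version would amount to adding or splitting fragments, which the sketch leaves out.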

For this purpose, I have devised a multi-version document format and a means of implementing it. A user interface to edit it is still under construction. You can read a blog about progress on my multi-version wiki here.