Marilyn Deegan looks at the challenges facing research libraries in the electronic era.
The function of a research library is the preservation of our written cultural heritage for the long term. Libraries are responsible for conserving the artefacts which contain that heritage and for making it available to scholars.
There is a synergistic partnership between libraries and scholars in this process of preservation and access. The work scholars do is part of the intellectual preservation of written culture: by preserving and enhancing the context of the written artefacts, they ensure that the content remains accessible, and amenable to further interpretation by new generations of scholars.
Preservation of written materials, then, is a two-fold process: conserve the object, and maintain and extend the knowledge needed to extract the information encoded on it. These are not necessarily easy tasks, but many centuries of librarianship and scholarship mean that the techniques we use have become so sophisticated that sometimes we do not even realise we are using them at all. Digital documents present research libraries with a whole new range of preservation issues to consider, and because of the rapid and accelerating pace of the digital revolution, these issues change constantly.
In the electronic world the context and content of documents have the same importance as in the non-electronic, with an added layer of complexity: the context of the creation of electronic documents, and also their format, have to be encoded with them, or their status becomes ambiguous.
One of the main difficulties inherent in digital documents is that they cannot be read directly: they are encoded in electronic impulses and complex technology is required before the reader can begin to interpret them. Libraries therefore now have to provide not merely seats and tables for their customers, but also networks and terminals. This stretches already diminished budgets and causes new strains on the physical infrastructure.
A further difficulty is that digital media come in many forms. Different kinds of hardware are needed to access digital documents, depending on whether they are held on CD-Roms, hard drives, tapes or networks.
The preservation of digital documents raises many problems which do not arise in the preservation of written documents, and the costs can be much higher. One estimate suggests that it costs in the region of £5 per year to preserve a written document for the long term, but around £95 per year to preserve a CD-Rom. The definition of a digital document is a thorny problem, too. With the vast amount of digital material being produced every day, do we preserve it all, or must we have selection policies?
Also, the software and hardware with which we access digital documents change all the time, and it is clearly not practical to preserve every version of a computer or a software program: the costs are too great. There are three possible ways of accessing documents produced on obsolete machinery. The first is to "refresh" the data regularly and migrate it through successive generations of software: this is hugely costly in terms of the human resources needed to carry out the task. The second is to build emulation programs which run on current hardware; this has been investigated by the National Science Museum. It is again a large task, given the number of possible programs and versions.
The third option is to convert data into a standard format when it is received for storage. But this, while possibly the most intellectually rigorous solution, is again a time-consuming task because of the number of proprietary formats that exist. With a complex multimedia CD-Rom produced in proprietary formats, one could attempt to preserve the information content by some conversion process. While this might seem better than no preservation at all, much of the meaning is obtained from the format, and future generations might be able to make no sense of such an object.
Standards committees such as the Text Encoding Initiative recommend that proprietary formats be eschewed when creating digital documents in favour of open interchange formats such as Standard Generalised Markup Language. While these recommendations are gaining wide circulation in the academic world, their take-up outside it is not great, and there is no way of enforcing standards for preservation: one just has to preserve what one receives.
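As a rough sketch of what such non-proprietary encoding looks like, the fragment below marks up a short transcription in SGML-style markup. The element names here are illustrative only, and are not taken from the Text Encoding Initiative guidelines or any published document type definition.

```sgml
<!-- Illustrative SGML-style markup; element names are hypothetical -->
<document>
  <header>
    <title>Beowulf, opening lines</title>
    <source>Transcribed from the manuscript British Library Cotton Vitellius A.xv</source>
  </header>
  <text>
    <line n="1">Hwaet! We Gardena in geardagum</line>
    <line n="2">theodcyninga thrym gefrunon</line>
  </text>
</document>
```

Because such a file records structure and provenance in plain text, it can in principle be read, and migrated to future systems, without the software that created it; that independence from any one program is the point of the recommendation.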
Besides the preservation of the actual digital document, we need to consider the question of intellectual preservation, which is concerned with the integrity and authority of the information which we are preserving. It is frighteningly easy to change a digital document, leaving no trace, no ghostly palimpsest to tell us what was there before. If we alter the words on a written document, generally we can decipher the original, and the changes become part of the cultural accretion of meaning which the document carries: a medieval manuscript altered or glossed by later scribes, for instance. A digital document is always pristine, despite changes made by accident or design, and this means that if two scholars are discussing a work, they may not always know whether they are using the same version, or whether there has been some hidden change. Scholarship needs to rely on some certainties in an uncertain world, and these are now under threat.
Another important aspect of preservation is the use of digitisation to preserve fragile and rare originals: medieval manuscripts, carbonised papyri, crumbling photographs. And artefacts other than documents can be preserved in digital form: archaeological findings, architecture. Libraries are becoming increasingly worried about the deterioration caused by allowing scholars access to the original objects. Research on high quality digitisation is being carried out in the Oxford libraries and the British Library under its Initiatives for Access, with encouraging results. Scholars find the electronic surrogates adequate for most purposes, and preferable for some. For example, the Electronic Beowulf project at the British Library has used enhancement techniques to reveal readings in the manuscript which were lost to us for many years.
With carbonised papyri, one of the problems is that when the papyrus rolls are unrolled, they fall into fragments and conventionally have to be reassembled and mounted on to linen. If there is a mistake in the mounting, it is very difficult to correct. With the digital versions the fragments can be pieced together in many different ways, until a satisfactory text is established.

The Tchalenko Archive at the Ashmolean Museum in Oxford contains many architectural photographs from the Middle East, often of buildings which no longer exist. The archive has recently been digitised, allowing scholars enhanced access to a valuable resource.

Despite the problems and issues outlined above, the benefits to scholarship offered by the digital library are so great that serious efforts are being made to address them. In 1993 the Washington-based Commission on Preservation and Access produced a report, Preserving the Intellectual Heritage, which outlined key issues in digital preservation and is currently the subject of much debate. The commission has recommended that a national infrastructure be set up in the US for the purpose of archiving digital materials. In the UK the British Library Research and Innovation Centre and the Joint Information Systems Committee of the higher education funding councils are considering the findings of this report and their application. We increasingly live in a digital world, and we must ensure that the rigorous practices which have been the benchmark of librarianship and scholarship over many years are carried over into it.
Marilyn Deegan is professor of electronic library research in the humanities, De Montfort University.