As the world's pictures are digitised and powerful interests jockey for electronic rights, Charles Oppenheim explores the maze surrounding ownership of diverse works, Roy McKeown describes progress in image library cataloguing and Anne Ramsden recounts the Excalibur search engine's academic and business conquests.
Excalibur EFS is one of many document image processing systems that convert documents to digital form. The digital image page is produced using a desktop scanner. Optical character recognition software converts the images to searchable ASCII text.
In most cases the OCR text is approximately 95-99 per cent accurate. This has important implications for searching, because a document may not be found if the search term has been misrecognised in its text. Excalibur's software gets round this problem with its fuzzy search engine, which can find both exact terms and near matches.
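To illustrate why near-match searching helps with imperfect OCR, here is a minimal sketch in Python. It is not Excalibur's algorithm, which is not described here; it simply uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The function name `fuzzy_find`, the 0.75 threshold and the sample text with its OCR-style errors are all assumptions made for the example.

```python
from difflib import SequenceMatcher

def fuzzy_find(query, text, threshold=0.75):
    """Return (word, score) pairs for words in text that resemble query.

    The threshold is an illustrative assumption, not a value from any product.
    """
    hits = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
        score = SequenceMatcher(None, query.lower(), word).ratio()
        if score >= threshold:
            hits.append((word, round(score, 2)))
    return hits

# Hypothetical OCR output: "l" misread as "1", "rn" misread as "m".
ocr_text = "The 1ibrary holds digitised joumals and newspapers."

print(fuzzy_find("library", ocr_text))
print(fuzzy_find("journals", ocr_text))
```

An exact-match search for "library" or "journals" would miss this sentence entirely, whereas the similarity test still picks out the misrecognised words.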
The Elinor project (http://www.iielr.dmu.ac.uk/Projects/ELINOR/), an electronic library development at De Montfort University's Milton Keynes campus, uses Excalibur EFS as the core system for scanning high-demand course texts and putting them on the campus network.
Other organisations have started electronic library applications using the same software:
* The Internet Library of Early Journals project at Oxford and Leeds universities is digitising 18th and 19th-century journals.
* The British Library is archiving digitised ageing microfilms of 18th-century newspapers as well as converting and indexing an old printed catalogue of seals.
* The library of the United States Naval Research Laboratory is importing Elsevier Science electronic journals in TIFF image and ASCII text format into Excalibur's EFS system. Users access the articles through a web browser.
The latest Excalibur software range, RetrievalWare, combines two retrieval technologies, fuzzy searching and semantic retrieval (a semantic network of searchable words built from various dictionaries and thesauri).
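The idea of semantic retrieval can be sketched roughly as query expansion over a network of related words. The Python below is a toy illustration under stated assumptions: the tiny `SEMANTIC_NET` dictionary is hand-written for the example (a real system would build its network from dictionaries and thesauri), and the names `expand_query` and `concept_search` are hypothetical, not part of RetrievalWare.

```python
import re

# Toy semantic network, hand-written for illustration only.
SEMANTIC_NET = {
    "china": {"asia", "beijing"},
    "newspaper": {"periodical", "journal"},
}

def expand_query(term):
    """Return the term plus every word linked to it in either direction."""
    term = term.lower()
    related = set(SEMANTIC_NET.get(term, set()))
    for head, neighbours in SEMANTIC_NET.items():
        if term in neighbours:
            related.add(head)
    return {term} | related

def concept_search(term, documents):
    """Return documents containing the term or any related word."""
    terms = expand_query(term)
    return [
        doc for doc in documents
        if terms & set(re.findall(r"[a-z0-9]+", doc.lower()))
    ]

documents = [
    "Trade routes across Asia",
    "A catalogue of seals",
    "An 18th-century periodical on microfilm",
]

print(concept_search("China", documents))
print(concept_search("newspaper", documents))
```

Here a search for "China" finds the document that mentions only "Asia", which character-level matching alone could not do. A combined system would presumably apply fuzzy matching to each expanded term as well.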
Late last year Excalibur announced Visual RetrievalWare for indexing and retrieving visual material such as video clips, photographs and fingerprints. Instead of a text-search clue, the user selects an example image as the basis of the search. The image clue can be adjusted through parameters such as colour, texture, shape, brightness, aspect and keywords. The retrieval window presents thumbnails of the images that most closely match the example.
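Query by example with adjustable parameters can be pictured as ranking images by a weighted distance between feature values. The Python sketch below is only an illustration under assumptions: each image is reduced to four made-up scores between 0 and 1, the file names and numbers are invented, and `query_by_example` is a hypothetical function rather than anything from Visual RetrievalWare, whose internals are not described here.

```python
import math

FEATURES = ("colour", "texture", "shape", "brightness")

def distance(a, b, weights):
    """Weighted Euclidean distance between two feature dictionaries."""
    return math.sqrt(sum(weights[f] * (a[f] - b[f]) ** 2 for f in FEATURES))

def query_by_example(example, library, weights, top_n=3):
    """Return the names of the top_n library images closest to the example."""
    ranked = sorted(
        library.items(),
        key=lambda item: distance(example, item[1], weights),
    )
    return [name for name, _ in ranked[:top_n]]

# Invented feature scores for three hypothetical scanned images.
library = {
    "seal_a": {"colour": 0.2, "texture": 0.8, "shape": 0.9, "brightness": 0.5},
    "seal_b": {"colour": 0.9, "texture": 0.7, "shape": 0.85, "brightness": 0.5},
    "newspaper": {"colour": 0.25, "texture": 0.2, "shape": 0.1, "brightness": 0.6},
}
example = {"colour": 0.2, "texture": 0.75, "shape": 0.9, "brightness": 0.5}

equal = {f: 1.0 for f in FEATURES}
colour_only = {"colour": 1.0, "texture": 0.0, "shape": 0.0, "brightness": 0.0}

print(query_by_example(example, library, equal))
print(query_by_example(example, library, colour_only))
```

With equal weights the two seals rank ahead of the newspaper page; when only colour counts, the similarly coloured newspaper page moves above the second seal. Changing the weights plays the role of adjusting the image clue.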
Tony Durham writes:
Since 1991 Excalibur Technologies has cultivated its image as the company that can solve tricky problems in the retrieval of text, images and other data types. The San Diego company found customers in business, government and research, and hit the British headlines last year when the Labour Party deployed the software for finding facts to counter Tory claims. Excalibur's technology was based on pure pattern matching and was sometimes described as a neural network approach, though it avoided much of the heavy numerical computation that conventional neural networks require. The approach was good for "fuzzy matching", where the bit patterns were similar, but not so good for the kind of matching that depends on meaning: for example, matching "China" to "Asia".
The real breakthrough for Excalibur was its acquisition of ConQuest Technologies, which had the concept-searching technology that Excalibur lacked. ConQuest also had a software architecture for indexing and searching large databases distributed over a number of sites - ideal for the Internet age.
Excalibur has matured into a middle-of-the-road vendor offering the full basket of information retrieval techniques, both pattern-based and concept-based.
Indexing and searching image databases is a tough problem widely regarded as "AI-complete": to do it really well by computer requires human-level artificial intelligence. Despite Excalibur's achievements, image retrieval remains a challenging research problem, not the kind that can be vanquished with a few flashing sword-strokes.
Roy McKeown is curator of the National Art Slide Library, Charles Oppenheim is professor and co-director, and Anne Ramsden is a project manager, all at the International Institute of Electronic Library Research, De Montfort University.