3D photography is the Holy Grail for computer graphics researchers. Toby Howard assesses how far they have come and where they plan to go.
It is gratifying when undergraduates come to the front at the conclusion of a lecture, but recently I found I had a small-scale rebellion on my hands. I'd been describing how the computer-generated creatures were achieved in Jurassic Park, but some sceptical students didn't believe me. They suggested I'd got my video clips mixed up, and that what I claimed to be huge meshes of textured polygons were giant hydraulic robots with rubber skins. But the modelling and the computer graphics really were that convincing. From the millions of pixels, life - of a sort - had emerged.
As computer graphics prepares to enter its fifth decade, it's been suggested that all the fundamental problems have essentially been solved. If you've seen Jurassic Park, or almost anything from Hollywood recently, you may agree. If a computer can recreate those prehistoric creatures so realistically, and make them blend so seamlessly with the real world of the human actors, what more is there to say?
Traditionally, creating images using computers has involved a fusion of three separate processes: "modelling", "viewing" and "rendering". Modelling describes what should be in the picture (the shape of a pterodactyl's skeleton, for example), and usually involves constructing precise geometrical descriptions. Viewing then applies the laws of perspective in an attempt to bridge the dimensional gap that arises from using a flat 2D display to draw a 3D object. Finally, rendering determines how the geometry is actually displayed (the texture and gloss of reptilian skin in moonlit rain, perhaps). Much computer graphics research in the past three decades has concentrated on rendering, with astounding results. Any modern PC can now create synthetic images which are almost indistinguishable from photographs of the real world.
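To make the "viewing" step concrete, here is a minimal sketch of perspective projection in Python. It assumes the simplest possible set-up, which is my illustration rather than any particular system's method: the viewer sits at the origin looking along the z axis, and the picture is formed on a flat plane at a distance `focal_length` in front of the eye. The function name and parameters are hypothetical.

```python
def project_point(point, focal_length=1.0):
    """Project a 3D point (x, y, z) onto a flat image plane.

    Assumes the viewer is at the origin looking along +z, with the
    image plane at z = focal_length (an illustrative convention).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the viewer (z > 0)")
    # Dividing by depth is what makes distant objects look smaller.
    scale = focal_length / z
    return (x * scale, y * scale)


# The same corner of an object, seen at two different depths.
near = project_point((1.0, 1.0, 2.0))
far = project_point((1.0, 1.0, 4.0))
```

Doubling the distance halves the projected size, which is the familiar shrinking of far-away objects. It is also why depth is lost once the picture is flat: every point along the same line of sight lands on the same spot.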
Research into modelling has often played second fiddle to interest in rendering. But the ubiquity of graphics software and the increasing power of computers are leading to renewed interest in creating and manipulating models, particularly 3D models of real-world objects.
The applications of 3D computer modelling are legion. Everybody, it seems, is doing it: archaeologists are arranging virtual field trips; doctors are rehearsing surgery with virtual body parts; engineers are experimenting with virtual wind tunnels, all at a fraction of the cost of the real thing, and without anything physical to construct or damage.
Modelling in 3D is a three-stage process: first, obtain your data and get it into the computer; second, use the computer to interact with it, just as if you were manipulating a real object, but without annoyances like gravity and material strength; and third, output or store the modified data. Alternatively, the data might subsequently be made "real" again by automatically fabricating an object using, for example, a numerically controlled milling machine.
This is fine in principle, but there is a major problem. Using current methods, creating 3D models is an extremely time-consuming, unreliable, and labour-intensive business. Models are often made of huge numbers of polygons, usually triangles, linked together into a mesh, rather like chicken-wire. It can take a huge number of polygons to capture geometrical detail. For example, Viewpoint Datalabs, one of the leading suppliers of off-the-shelf 3D model data, will sell you a detailed model of a bee (with hair) made from 129,802 polygons (without hair, it's 44,036 polygons).
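A polygon mesh of the chicken-wire kind described above is commonly stored as a list of corner points plus a list of triangles that refer to those points by number. The sketch below is my own illustration, not Viewpoint's file format: it builds about the smallest closed mesh there is, a four-sided pyramid (tetrahedron), and counts the "wires" the triangles share. All names are hypothetical.

```python
# Corner points of the mesh, as (x, y, z) coordinates.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

# Each triangle names three corners by their position in `vertices`.
triangles = [
    (0, 2, 1),
    (0, 1, 3),
    (0, 3, 2),
    (1, 2, 3),
]


def unique_edges(triangles):
    """Return the set of edges, counting each shared edge only once."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store the smaller index first so (1, 2) and (2, 1) match.
            edges.add((min(u, v), max(u, v)))
    return edges


edges = unique_edges(triangles)
# For a closed surface without holes, corners - edges + faces equals 2.
euler = len(vertices) - len(edges) + len(triangles)
```

Sharing corners between neighbouring triangles is what links the polygons into a single surface. A model with over a hundred thousand polygons uses the same two lists, only much longer ones.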
Marshalling all these tiny shapes together to make the overall object is a daunting task, and there are a number of ways to go about it. If your model is very simple, you might be able to do it by hand, sketching out shapes on graph paper and reading off the coordinates. This is hard work. A more practical solution is to use a computer-aided design (CAD) package, of which there are many hundreds available. The best systems, such as AutoCAD, can simplify the process. However, the dimensionality problem remains: you are trying to build solid 3D objects, but can only see flat images on your screen.
If you have an object whose geometry you wish to capture, a different approach is to scan it, as the Star Trek transporter scans its passengers, and read off a stream of numbers that describe its shape. Machines that do this can be effective, and capturing the shape of 3D objects using laser scanners is an established technology. Cyberware, an American modelling company, sells turnkey systems where the object is placed on a rotating platform and illuminated by a low-power laser scanned rapidly across it. Two video cameras record the spots of laser light visible on the object, rather like following the path traced out by twirling sparklers on Guy Fawkes night, and software digitises the images to extract the shape information. Laser scanning allows automatic capture of complex geometry to reasonably high resolution, and has wide application. Such a system can scan a human body to a resolution of 2mm in 17 seconds, and marine biologists at Purdue University have used this technique to measure fish skeletons to distinguish between species.
But what if you don't have the object you want to model, or if, like the Taj Mahal, it might not fit on the scanner platform? Here, photogrammetry, a long-established technique used in map-making, architecture, medicine and forensics, is useful. This involves photographing terrain or objects with calibrated cameras, and measuring the features from the photographs. Computer vision researchers have long worked on automating the process, to extract 3D structure from video images, and the University of California at Berkeley has a system which can semi-automatically recreate 3D architectural scenes from photographs. However, there is still no known general method for automatically deriving an accurate geometrical model of a scene from arbitrary photographs of it. Many researchers see this as the Holy Grail.
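The principle of measuring from photographs can be illustrated with the simplest textbook case, which is my assumption here rather than a description of the Berkeley system: two identical calibrated cameras side by side, both pointing the same way. A feature appears shifted sideways between the two pictures, and the size of that shift (the disparity) reveals its distance. The function name and the numbers are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance to a feature seen by two side-by-side cameras.

    focal_length_px: camera focal length, measured in pixels
    baseline_m:      distance between the two cameras, in metres
    disparity_px:    sideways shift of the feature between the images
    Assumes an idealised, perfectly aligned pair of cameras.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Similar triangles: depth / baseline = focal length / disparity.
    return focal_length_px * baseline_m / disparity_px


# Cameras half a metre apart; the feature shifts by 20 pixels.
depth = depth_from_disparity(800.0, 0.5, 20.0)
```

The arithmetic is easy. The hard part, and the reason full automation remains elusive, is deciding reliably which point in one photograph corresponds to which point in the other.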
There are times when geometric modelling isn't appropriate. How do you write down a set of numbers which describe a flower, for example? Or a cloud, fireworks, a forest, or a snowstorm? "Fuzzy" objects like these call for methods which can generate geometry algorithmically, such as particle systems, fractals, iterated function systems, and the marvellously-named technique of "blobby modelling".
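As a taste of generating geometry algorithmically, here is a minimal sketch of an iterated function system drawn by the so-called "chaos game". It uses three rules, each of which moves the current point halfway towards one corner of a triangle; repeating a randomly chosen rule thousands of times traces out the fractal known as the Sierpinski triangle. The corner positions, function name and seed are illustrative choices of mine.

```python
import random

# The three corners that the rules pull the point towards.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def sierpinski_points(count, seed=0):
    """Generate `count` points of the Sierpinski triangle."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    x, y = CORNERS[0]          # start on the shape itself
    points = []
    for _ in range(count):
        cx, cy = rng.choice(CORNERS)
        # Apply one rule: jump halfway towards the chosen corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


points = sierpinski_points(1000)
```

Nobody typed in the coordinates of those thousand points: a few lines of rules produced them. Swapping in different rules yields ferns, trees and other shapes that would be hopeless to describe one polygon at a time.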
But regardless of how you've created your model, you will probably wish to interact with it in some way, to edit its shape, change its surface properties, and so on. This is the point at which, in most state-of-the-art systems, all the 3D information you've taken such pains to capture gets squashed down to a 2D image on the screen. Armed with a mouse rolling on a flat mat, we must again struggle to bridge the 2D/3D divide.
The ideal situation is to manipulate the model in true 3D. Not only do we replace the flat screen with a stereoscopic display, so that we feel we are sharing the same space as the model, but we also replace the desk-bound mouse with true 3D input devices, which we can hold in our hands and wave about in space. We're now in the realm of virtual reality, or VR. Unfortunately, VR has suffered to some extent from the problems which plagued artificial intelligence in the 1970s, when research results failed to live up to the hyped promises. Just as people became disillusioned with the disappointing behaviour of programs intended to engage you in believable conversation, they remain unconvinced by the poor-quality imagery of VR headsets which, according to the media, promised the "ultimate virtual experience". "Phooey", most people said when they had a try.
Current affordable VR headsets do not offer wonderful image quality. But it can only get better, and the psychologically engaging interaction techniques pioneered by VR research groups worldwide promise new ways to work with 3D models. There is much scope in using large-screen stereoscopic displays, where multiple participants, unencumbered by special head-mounted displays, can work cooperatively in shared virtual environments across the Internet.
Perhaps the grandest challenge is to be able to scan an environment and automatically create a faithful representation of it inside a computer, where we can explore and manipulate it virtually, export it from the machine, and make it real again.
Such technological alchemy is currently beyond our grasp, but I look forward to the day when it arrives, and with it the opportunity to once again attempt to convince my sceptical students.
Toby Howard is a lecturer in the department of computer science, University of Manchester.