It is estimated that the Internet contains enough information to fill ten million books. Now there is an effort under way to build a library for all of it.
The Internet Archive is working to save a copy of the entire Internet for posterity, collecting everything from downloadable graphics to anonymous real-time chatter onto giant tape drives in San Francisco.
"The Internet is millions of people every day discussing what's important to them, all available now, whereas before we had to collect diaries or personal letters after somebody was dead," said Brewster Kahle, the project's director and the inventor of wide-area information servers (WAIS), which made it possible to search the Internet for specific subjects.
"It's going to be important to scholars and historians to understand this change that's happening in the way people communicate," said Mr Kahle. "We have an ability to study what people are thinking about and dealing with on a day-to-day basis in a much finer granularity than just archiving professionally produced and published materials."
The challenge is in keeping up. Project workers estimate that there are 30 million World Wide Web pages on 225,000 sites. The mean lifetime of a Web object is just 44 days. Since its inception in March, the archive has collected 500 gigabytes of information, or 5 per cent of the entire Internet's ten terabytes. Still, organisers say they will have collected all of it within the next few months, and that the collection will be maintained by a charitable trust. But as the Internet expands, the breadth of what they gather may have to be narrowed.
"In some sense, it's like rolling a rock up a hill," said Mr Kahle, who has helped fund the project. "At some point, we're going to have to be selective."
Project staff said they hope to find commercial uses for the archive. During the presidential election campaign, they collected the home pages of political candidates for the Smithsonian Institution, which plans to maintain a collection of the sites.
Mr Kahle said the archive will become a back-up service for Web sites lost to technical glitches.