By PETER SINCLAIR
"Fast Search & Transfer announces world's biggest, fastest search-engine: 300 million web-pages searched in under one half second with highly relevant results based on newly-developed algorithms…"
That's all very well, but as of last week the Web was nearer 6 billion pages; 300 million is just a drop in the bucket.
Still, Fast Search ["All the Web, All the Time"], like several of the more obsessive engines, isn't leaving it at that. Nothing less than total coverage of the entire Web will satisfy it, updated continuously to include the thousands of new pages appearing every day and eliminate dead links and identical porn-sites masquerading under different names.
And as if this Sisyphean chore weren't enough, the latest mad grail of Man the List-Maker is to preserve every last word of it all, from the sublime to the silly – everything done, thought, guessed, considered or just blurted out on the spur of the moment – for posterity.
Poor posterity! Never again will it be able to distinguish the ephemeral from the immortal by the simple laws of erosion, as time wears the trivial away until only the significant remains. From now on, anyone's fleeting thought on the Web will weigh exactly the same as Einstein's Theory in the scales of eternity.
But this, according to San Francisco's non-profit Internet Archive [free access to bona-fide scholars; registration required], is the whole point.
Your 3-year-old's first crayonings, scanned and posted to the family website – why not? At the moment he may display only the gifts of the average chimpanzee; but if he grows up into Leonardo da Vinci, those childish scribbles are going to be a priceless resource for art historians of the future.
The Archive has been established by Brewster Kahle, developer of the Alexa search-engine, itself named for the great Library of Alexandria.
But where this repository of ancient wisdom amassed only about 500,000 scrolls in the 5 centuries before it was destroyed by fire 1600 years ago [UNESCO is belatedly trying to recreate it], its digital namesake has already collected nearly 1.5 billion webpages and is adding to them at the rate of 120 million a week.
The Archive acquires material in two ways – from gifts of digital collections and with search-bots.
A set of pages that has been retrieved by robot is called a 'crawl'. Each Alexa crawl takes a couple of months, harvesting more than 100 gigabytes of publicly available information a day – its robots don't gather passworded pages or those tagged for 'robot exclusion'.
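For the technically curious, here is a minimal sketch of how a well-behaved robot might honour a 'robot exclusion' tag before gathering a page. It uses Python's standard `urllib.robotparser` module; the robots.txt rules, the `example-bot` name, the `allowed_urls` helper and the example.com addresses are all invented for illustration, and nothing here is claimed to be how Alexa's own robots work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (illustrative rules only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs the exclusion rules let user_agent fetch."""
    parser = RobotFileParser()
    # Parse the rules from text, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical candidate pages for one crawl.
candidates = [
    "http://example.com/index.html",
    "http://example.com/private/diary.html",
]

crawlable = allowed_urls(ROBOTS_TXT, "example-bot", candidates)
print(crawlable)
```

In this sketch the page under /private/ is skipped and the home page is kept, which is the behaviour the column describes: the robot leaves alone whatever the site owner has marked off-limits.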
The system's strength, Kahle claims, is that it makes no value-judgements before acquiring something. Where Alexandria failed to preserve the gossip of the market-place, or a line of verse scribbled by a slave on a scrap of papyrus, Alexa misses almost nothing – and it's exactly this kind of stuff, he says, which will be of most value to the anthropologists, sociologists and historians of the future.
Alexa 1000, a symbolic assemblage of scarlet computer-monitors, now stands flickering in the Library of Congress [http://lcweb.loc.gov] – which, incidentally, contains a mere 17 million volumes.
The Archive's critics – notably Richard Cox, professor of archival studies at the University of Pittsburgh – claim that its all-inclusiveness tends to distort some data by taking it out of context. Even Archive board-member Peter Lyman, professor of information management at Berkeley, admits "it's a mistake not to let some things be ephemeral".
But with all these robot historians around, just remember next time you flame someone in a newsgroup that you are not merely dissing him or her in a brief flash of red rage: you are addressing your insults to the ages…
Related links:
Fast Search & Transfer www.alltheweb.com
Internet Archive www.archive.org
TV Chimpanzee www.kokomojr-tvchimp.com
Leonardo da Vinci www.mos.org/leonardo
Alexa www.alexa.com
Library of Alexandria www.perseus.tufts.edu
Unesco www.unesco.org
Library of Congress http://lcweb.loc.gov
BookMarks
MOST REVVED: PartExpress
For a pundit, the most gratifying thing about the Web is the speed with which it fulfils his predictions. Last Christmas, on the potential of collective action to unlock the smaller-scale potential of the Web, I wrote: "A group of, say, car-part firms could very easily lease space on a server at a reasonable rate, combining inventories so that each individual outlet, while stocking standard lines for its area, would also have access to more specialised items available at other stores while minimising inventory costs". Four months later, here it is – from now on, mechanics won't have to spend endless hours sourcing and pricing parts for commonly available models less than 15 years old. A one-search facility, with ordering and payment functionality – just specify delivery requirements.
Advisory: in association with EzySurf.
FISHIEST: Fishing Tackle Online
There was a time years ago when your columnist knew how to tie a Hardy's Favourite; and, what's more, fool a fat brown trout out of a South Canterbury stream with it. These days I'd have to visit this excellent new site – although I couldn't find a Hardy's, another of my specialties, the enticing Coch-Y-Bondhu, is available for only $1.95. Plus rods, reels, hooks, lines and sinkers for everything from mighty marlin to the most satisfying tiddler ever hauled onto Devonport Wharf by a 10-year-old.
Advisory: don't miss that next nibble…
Email: petersinclair@email.com
Peter Sinclair: Keeping pace with the web