Monday, April 8, 2013

British Library sets out to archive the web

The NZ Herald reports that the British Library is going to archive the web for future historians. The British Library has always tried to keep a copy of everything published in the UK: every book, newspaper, magazine, newsletter, and pamphlet. In 2013, however, quite possibly the majority of information is published online. The problem is that webpages are notoriously ephemeral: here today and gone tomorrow. This threatens to leave a gaping void for future historians, which the British Library now intends to fill by archiving the web. An automated "web harvester" will scan and record 4.8 million sites ending with the suffix ".uk" at least once a year - a total of 1 billion web pages. Rapidly changing websites, like those of newspapers, will be harvested more frequently, as often as daily.
The US-based Internet Archive has been doing just that since 1996 on a slightly ad hoc basis. Its Wayback Machine lets you browse through over 240 billion web pages from 1996 to the present.