New Web Archiving Collections (WAX) Service
On December 1, the public interface for Harvard’s new Web Archive Collection Service (WAX) will launch as a pilot project at http://wax.lib.harvard.edu. Funded by the University’s Library Digital Initiative (LDI) to help collection managers archive web sites for the long term, WAX is the first LDI project specifically oriented toward preserving “born-digital” material. Collection managers must now acquire, in the online environment, the kinds of content they have always collected in physical form. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage. WAX was developed as an initial, and only partial, response to these and other concerns, which range from technical feasibility to legal and financial implications.

The pilot focuses on harvesting content from the surface web: content that search engines can discover through web crawlers, as opposed to content hidden from crawlers in a database or behind password or login protection. The two-year WAX pilot, which began in July 2006, specifically addresses the capture, management, storage, and display of web sites for long-term archiving. It is a collaboration between the University Library’s Office for Information Systems and three University partners, each fielding a single project:

• Harvard University Archives (Harvard University Library): “A Demonstration Project to Collect and Make Accessible Departmental History of the Faculty of Arts and Sciences”
• Arthur and Elizabeth Schlesinger Library on the History of Women in America (Radcliffe Institute for Advanced Study): “Blogs: Capturing the Alternative Voice”
• Edwin O. Reischauer Institute of Japanese Studies (Faculty of Arts and Sciences, with sponsorship from Harvard College Library): “Japanese Constitutional Revision”

According to WAX project manager Wendy Gogel, each partner brought different requirements to the pilot. “For the Schlesinger,” Gogel stated, “the daily thoughts of women captured through centuries in their diaries are now more often found in blogs. The University Archives has a mandate to capture the output of the University, much of which is found only online. The Reischauer Institute, having convened a study group to research and document the process of constitutional revision in Japan, found that the debate among citizens, politicians, and governmental and religious groups was essentially an online phenomenon. This diverse content is providing us with a variety of collection management and technical challenges to address.”

As part of its challenge grant program, which was completed in 2007, LDI funded 49 projects covering subjects as wide-ranging as art, architecture, religion, history, culture, botany, biology, landscape design, music, politics, law, and advertising. The great majority of these projects involved digitizing analog materials. Projects have created digital texts of books, pamphlets, letters, manuscripts, reports, diaries, interviews, legal trial documents, and more. Digital images include photographs, slides, lantern slides, prints, drawings, paintings, sculpture, coins, and archaeological objects, among others. Audio files have documented ethnomusicology, poetry, and epic songs. Musical scores and medieval manuscripts have been digitized, and geospatial data has been captured, including the georeferencing of maps. And now archived web sites have been added to this range of material.