- Harvard University

- Library Notes

- January 2009

- No. 1347
Harvard Goes to DLF

The Harvard libraries were strongly represented at the recent Digital Library Federation Fall Forum, held in Providence, Rhode Island, November 12–14. While numerous members of the Harvard library community were in attendance, staff from Harvard University Library’s Office for Information Systems (OIS), as well as from the Harvard Map Collection in the Harvard College Library, gave three significant presentations.

A Busy Hive Creates Better WAX: Archiving the Web from Many Perspectives

Andrea Goethals and Wendy Gogel, both of OIS, described the two-year process of designing and implementing the pilot for Harvard’s Web Archive Collection Service (WAX). Many diverse viewpoints on what web archiving means needed to be taken into account. To collection managers (librarians, archivists, and faculty), it means collecting, preserving, and providing access to web content in the same way that they have always done for analog material. To lawyers, it means entering a world of risk to be mitigated. To technologists (architects, programmers, graphic designers, and preservationists), it means working with systems and content that are much more complicated than usual. Challenges included defining collections with amorphous boundaries, managing a multitude of complex intellectual property and other legal issues, addressing quality control for material too vast to review comprehensively, navigating crawler traps, dealing with an explosion of formats, and handling duplicate content. Goethals and Gogel also provided a demonstration of the WAX system.

Deep Web Content and Internet Discovery: Exposing Harvard University Library’s Digital Resources to Search Engines

In a presentation representing work by Roberta Fox, Michael Vandermillen, and Spencer McEwen, all also from OIS, Fox discussed OIS’s efforts to expose database-driven applications developed under Harvard University’s Library Digital Initiative (LDI) so that their data can be crawled by search engines such as Google and Yahoo.

The applications originally relied heavily on cookies, sessions, and forms processing, all of which create barriers to search engine crawlers. As students, faculty, and other users increasingly turn to search engines, rather than library catalogs, to meet their research needs, the team reassessed its assumptions. Inadvertently created barriers to crawling were keeping users from knowing about the wealth of digital resources that the applications could provide. Not only did the pages need to be modified so that the deep web content could be crawled, but additional modifications were also needed both to make the presentation of the search engine results meaningful and to place the pages reached by such a search within the context of record groupings. Fox described the reengineering efforts that took place to make more than 400,000 page-turned objects, high-quality images, and other digital objects more accessible both to search engine crawlers and to users, including analysis of the trouble spots, the general technical approach, and the specific solutions implemented.

Usability and the Harvard Geospatial Library

In a presentation representing work by OIS staff members David Siegel and Janet Taylor and by Bonnie Burns of the Harvard Map Collection, Randy Stern of OIS spoke about the 2007 usability study and subsequent redesign of the Harvard Geospatial Library (HGL). HGL, first released in 2001, is a web-based discovery system and repository for geospatial data sets for use in research and teaching, including vector data, raster images of historic maps, satellite imagery, and more. It was initially aimed at an audience of GIS (geographic information systems) experts, and the user interface for searching and mapping data was modeled on expert-oriented GIS applications.

With the explosion in general knowledge of mapping tools (such as Google Maps) and the value of including geospatial analysis in research, a wider range of users now looks to HGL to locate relevant data sets. The presentation reviewed the objectives, methodology, and outcomes of the usability study, demonstrated the new HGL user interface with its innovative categorical and geographic browsing capabilities, and briefly touched on the technologies used to implement the new HGL.
