DPIP Report: Needs, Challenges, Issues
Cross-collection searching and global integration of resources
Brian Kupiec, Derek Merleaux (site visit text by FMM)
December 13, 2005
Issues
At this stage of technology development in the Library system, the topics of cross-collection searching and global integration of resources raise more questions than they answer. This becomes acutely clear when the discussions inevitably turn to areas of interest like presentation and navigation of results, local branding of resources, manipulation of data and objects for classroom or study use, diverse requirements for feature sets, etc.
What strikes one during these discussions is the essentially organic relationship between two things: on the one hand, the choice between a union catalog (harvested or otherwise generated) and federated catalogs; on the other, a resource owner’s ability and desire to control the look and feel of their resources. The two types of catalog can be seen as opposed to each other, but also as complementary. One important contribution of DPIP could be to make some strong policy decisions about how we will approach these issues.
For all the online, searchable collections made available through efforts at YUL:
One model for cross-collection searching that could be accomplished relatively quickly has already been mentioned. Specifically, each collection provides processes that respond to external queries (queries not generated from within that collection’s own local search interface) by delivering an XML data file. The aggregator, in this case, would assemble the cross-collection search results for display. The data file would comply with a YUL standard and would not necessarily be OAI compliant. If the file is OAI compliant, however, each collection can achieve the twin goals of providing data to the Yale community aggregator(s) and to the rest of the world through OAI harvesters without duplicating effort. This is a natural place for DPIP to provide consultation and possibly some tools as well, since there will almost certainly be collections that do not have the resources to do this on their own.
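The model described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the sample collections, the element names (results, record, title), and the function names respond_to_query and aggregate are hypothetical and do not represent any actual YUL standard or OAI format. It shows only the division of labor: each collection answers an external query with an XML file, and the aggregator parses those files and assembles one result list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; a real collection would query its own database.
COLLECTIONS = {
    "maps": [
        {"id": "m1", "title": "Map of New Haven, 1824"},
        {"id": "m2", "title": "Connecticut River survey"},
    ],
    "manuscripts": [
        {"id": "ms7", "title": "Letters from New Haven"},
    ],
}


def respond_to_query(collection: str, term: str) -> str:
    """Collection side: answer an external query with an XML data file.

    The element names used here are illustrative, not a real standard.
    """
    root = ET.Element("results", {"collection": collection, "query": term})
    for record in COLLECTIONS[collection]:
        # Simple case-insensitive substring match on the title.
        if term.lower() in record["title"].lower():
            item = ET.SubElement(root, "record", {"id": record["id"]})
            ET.SubElement(item, "title").text = record["title"]
    return ET.tostring(root, encoding="unicode")


def aggregate(term: str) -> list[dict]:
    """Aggregator side: query every collection and merge the parsed results."""
    merged = []
    for name in COLLECTIONS:
        root = ET.fromstring(respond_to_query(name, term))
        for item in root.findall("record"):
            merged.append({
                "collection": root.get("collection"),
                "id": item.get("id"),
                "title": item.findtext("title"),
            })
    return merged


if __name__ == "__main__":
    for hit in aggregate("new haven"):
        print(f'{hit["collection"]}: {hit["title"]} ({hit["id"]})')
```

In this sketch each collection keeps control of its own data and matching logic, while the aggregator needs to understand only the shared XML format. That is the property that would let a single compliant file serve both a local aggregator and outside harvesters.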
Site Visits
Cornell has invested years of effort working with Endeavor to develop and implement the three components of ENCompass (reference linking, federated searching, content repository). Recently they have concluded that the federated searching component will not satisfy their needs and are beginning a reassessment of the marketplace to determine how best to start over again. They don't feel that their digital collections alone are deep enough to justify spending large amounts of time on building a search across Cornell collections. They see more value in searches across Cornell and other external digital collections. A goal of their Integrated Framework initiative is to enable widespread harvesting of CUL digital collections via OAI and the Ockham registry.
Harvard does not yet have a comprehensive integrated access solution, but has focused on developing a rich and flexible digital library infrastructure which will facilitate future efforts in this area. In addition to components such as persistent naming and OAI compliance, they have a technical architecture which separates the Discovery, Delivery, and Storage/Access Management functions. This approach will enable them to introduce future global search technologies in the Discovery function without necessitating major redesign of the other two functions.