
Disruptive Technology and the Library Catalog

This report synthesizes ideas from the NEASIS&T and NELINET programs I attended on Nov. 15th and 18th. I’ve tried to organize the narrative by theme rather than by speaker, since there was a lot of topical overlap. I’ve also integrated some notes from Amy Benson’s presentation in a few places where it touched on similar issues.

The NEASIS&T (New England Chapter of the American Society for Information Science and Technology) program was entitled “Buy, Hack, or Build: Optimizing your Systems for your Users and Your Sanity.” The speakers were: (1) Joshua Porter, Director of Web Development at User Interface Engineering (“Web 2.0 for the Rest of Us”); (2) Pete Bell, co-founder of Endeca Solutions (“Faceted Navigation for OPACs”); and (3) Casey Bisson, information technologist at Plymouth State University (“OPAC Hacks”).

The NELINET program was entitled “Google vs. the OPAC: the Challenge is On”, and the speakers were: (1) Casey Bisson (“The Social Life of Metadata”); (2) Carl Grant, President and COO of VTLS (“Library Systems in the Age of the Web”); (3) Marty Kurth, Director of Cornell Library Metadata Services (“Catalogers Wanted: Metadata Practice in the Web Era”, based on his in-press book chapter); (4) David Lindahl, Director of Digital Library Initiatives at University of Rochester Libraries (“Metadata that Supports Real User Needs”); and (5) Stuart Weibel, research scientist at OCLC (“Google vs. the OPAC? The Challenge is Wrong”).

Social Software

An important trend cited by several of the speakers is the growing use of self-organizing “social software” such as tagging, folksonomies, ranked reputations, and user-submitted book reviews.

Social tagging is the assignment of descriptive terms by ordinary users to Web sites as a kind of combination bookmark and index term. Once an item has been marked with a particular tag, it can be collocated with other documents that have been given the same tag (as assigned and retrieved by single or multiple users). Del.icio.us is perhaps the most popular tagging application at the moment, but there are several others making their way into the mainstream. Amazon recently added social tagging to its site (see announcement), allowing users to bookmark and index Web pages according to their whim, and at the same time allowing Amazon to collect data on what kinds of words are spontaneously passing through the heads of their customers. The market research and trend-analysis implications of this kind of data collection are impressive.
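To make the collocation idea concrete, here is a minimal sketch of my own (not something shown by any of the speakers) of how a tagging service might gather items that share a tag. The users, URLs, and tags are invented, and real services such as Del.icio.us certainly do more than this.

```python
from collections import defaultdict

# Hypothetical bookmarks: (user, url, tags). All values are invented.
BOOKMARKS = [
    ("alice", "http://example.org/frbr-primer", {"frbr", "cataloging"}),
    ("bob", "http://example.org/frbr-primer", {"FRBR", "metadata"}),
    ("bob", "http://example.org/opac-hacks", {"opac", "cataloging"}),
]


def build_tag_index(bookmarks):
    """Map each (lower-cased) tag to the set of URLs any user filed under it."""
    index = defaultdict(set)
    for _user, url, tags in bookmarks:
        for tag in tags:
            index[tag.lower()].add(url)
    return index


def collocate(index, tag):
    """Return every URL that shares the given tag, sorted for stable display."""
    return sorted(index.get(tag.lower(), set()))


tag_index = build_tag_index(BOOKMARKS)
print(collocate(tag_index, "cataloging"))
```

Note that the only “vocabulary control” here is lower-casing the tag; synonyms and variant spellings would still scatter related items, which is exactly the ambiguity problem discussed below.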

By virtue of its new integrated wiki (“WikiD”), OCLC is allowing users to add their own book reviews to Open WorldCat (and eventually, so we are told, to the totality of WorldCat).  And here at Yale, Dan Chudnov has been hosting an experimental groupware version of social linking at links.med.yale.edu.

While few would argue that social tags and links herald the complete downfall of controlled vocabularies, it seems clear that libraries need to find ways to incorporate the self-organizing properties of this technology. Authority control will continue to be desirable for cross-disciplinary communication and research, since different communities employ different jargons, and the cross-referencing provided by authority records allows these communities to speak to one another more easily. Porter believes controlled vocabularies will continue to be indispensable for command-and-control operations such as space shuttle launches or military campaigns, where there is little time or tolerance for the ambiguities of natural language.

Authority Management

          In addition to the problem of ambiguity, the growing use of social software raises the question of authoritativeness. While book publishers and other media unleash a torrent of knowledge-claims-making materials, libraries have traditionally winnowed out the least authoritative and reliable from among them. This role is now threatened as never before. Peer-to-peer networks make it easy to bypass the traditional gate-keepers (for better and for worse), and allow individual users to decide for themselves whom to trust and whose ideological narratives to adopt. This is particularly evident in the blogosphere, where traditional print journalists must contend with amateur and editorially unrestrained reporters who have virtually unlimited broadcasting reach.

          Porter has an interesting take on this. It turns out, he argues, that authority has always been based on peer-to-peer contacts and recommendations anyway. A teacher, librarian, mentor, parent, older sibling, etc. (i.e., a ‘senior’ peer), recommends Encyclopedia Britannica to us, for example, and so it begins to rise in our esteem. Successful use of the resource (in the form of correct answers, productive research pathfinders, etc.) reinforces its reputation, but the initial moment of trust still starts from a personal recommendation. Porter suggests that peer-to-peer networks simply replicate the way reputations are established in the real world: they are built up over time, and nurtured through repeated demonstrations of reliability and confidence-building testimonials from other users (who have established their own authority in similar ways).

          eBay is a good example of how non-centralized authority emerges on the Internet. Customers rate the quality of their experience doing business with other eBay customers, and, over time, the most reliable trading partners are rewarded with the highest ratings. Similarly, unreliable or dishonest users earn low ratings, and find their ability to exploit the system diminished over time (i.e., as other customers learn to avoid them).
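As a rough illustration of how such ratings might be tallied, the sketch below computes a net score and a percent-positive figure for each trader. This is my own simplification with invented data; it is not eBay’s actual formula, which was not described in the talks.

```python
from collections import Counter

# Hypothetical feedback log: (rated_trader, rating), where rating is +1 or -1.
FEEDBACK = [
    ("reliable_seller", +1), ("reliable_seller", +1), ("reliable_seller", +1),
    ("reliable_seller", -1),
    ("bad_actor", -1), ("bad_actor", -1), ("bad_actor", +1),
]


def reputation(feedback):
    """Return {trader: (net_score, percent_positive)} from raw ratings."""
    positives, negatives = Counter(), Counter()
    for trader, rating in feedback:
        if rating > 0:
            positives[trader] += 1
        elif rating < 0:
            negatives[trader] += 1
    result = {}
    for trader in set(positives) | set(negatives):
        pos, neg = positives[trader], negatives[trader]
        result[trader] = (pos - neg, round(100 * pos / (pos + neg), 1))
    return result


scores = reputation(FEEDBACK)
# List traders from most to least trusted by net score.
for trader, (net, pct) in sorted(scores.items(), key=lambda kv: -kv[1][0]):
    print(trader, net, pct)
```

No central authority assigns these numbers; they emerge from many small peer judgments, which is Porter’s point.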

Identity Management

          Based in part on (or at least in agreement with) Dick Hardt’s Identity 2.0 conference presentation, Bisson believes that enhanced identity management (including ranked reputations and network authorizations) will be the “next big thing” in knowledge management. Current identity management (which he termed “Identity 1.0”) is enterprise-centric, forcing users to develop and manage their identities within a single domain such as Amazon or eBay. The emerging “Identity 2.0” model should replace this with a portable, more user-centered design. It will resemble the state-issued driver’s license, providing proof of identity and authorization across multiple domains of society and business.

          Improved identity management is the key to providing the kind of easy full-text access that our patrons have come to expect. If identity could be established centrally, and made fully portable across Web sites and databases (i.e., as envisioned in the Identity 2.0 model), then users could move from search engine to bibliographic citation to full text in just a few clicks of the mouse, rather than the many clicks now required to penetrate link resolvers, proxy servers, and database authorization screens.

Hacking

          Based on how the term was being used in this context, I would define “hacking” as “the improvised, incremental improvement of services realized through manipulation and remixing of pre-existing technologies.” The incremental improvements aspect was articulated by Joshua Porter as “iterative design”. As Porter sees it, the most innovative Web developers are willing to make mistakes and break the rules. For example, the HTML on Google’s Web site doesn’t actually validate, and Google tools frequently get released (and often remain for long periods) as experimental beta versions. Successive drafts of imperfect content are released, and then, through a mutually-beneficial feedback loop with the public, continuously improved and refined. The end user essentially becomes a co-developer of the product or service, and ensures that new ideas and genuine user needs are always taken into account.

          This is not necessarily the way things work between libraries and their system vendors. In the case of OPACs, for example, development plans and source code are closely guarded secrets, and the library is often reduced to begging for enhancements that, in a more open environment, it could have helped develop. Librarians make a similar mistake when they neglect the open access and open source communities that have transformed other sectors of the information economy.

          Bisson and Grant both quoted Roy Tennant’s description of new OPAC enhancements as merely “putting lipstick on a pig.” In other words, the core technology is ugly on a deep level, while much-heralded improvements tend to be merely cosmetic. Bisson spends much of his time at Plymouth State using LAMP technologies (i.e., Linux, Apache, MySQL, and PHP) to work around the limitations of his OPAC. For example, Innovative Interfaces neglected to provide a billing-history feature, so Bisson found a way to bring it in using LAMP. Despite the various hacks, however, Bisson worries that his and other library sites still barely meet the expectations of users coming in from Google or Yahoo.

Joshua Porter points to Paul Rademacher's Web site, housingmaps.com, as an example of good hacking. Rademacher combined the RSS feed from Craigslist with the API for Google Maps to create a different and very useful new service. The former resource provides Rademacher’s page with current apartment listings, while the latter precisely indicates each address with ‘pins’ pushed into a color-coded map. Thus, in a single Web site, the user receives up-to-date housing availability, directions, local maps, and even, if desired, the topography of the land (through the magic of satellite imagery).
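The general shape of such a mashup can be sketched in a few lines. The example below is my own illustration, not Rademacher’s code: the feed is an invented sample rather than a real Craigslist feed, and the `GEOCODER` dictionary is a hypothetical stand-in for whatever service turns an address into coordinates. The resulting list of markers is the kind of data a mapping API would then draw as pins.

```python
import xml.etree.ElementTree as ET

# Invented sample feed; a real listings feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Hypothetical apartment listings</title>
  <item><title>$900 1BR near campus</title><description>12 Elm St</description></item>
  <item><title>$1400 2BR downtown</title><description>98 Main St</description></item>
  <item><title>$700 studio</title><description>5 Unknown Rd</description></item>
</channel></rss>"""

# Hypothetical stand-in for a geocoding service: address -> (lat, lon).
GEOCODER = {"12 Elm St": (41.31, -72.93), "98 Main St": (41.30, -72.92)}


def listings_to_markers(feed_xml, geocoder):
    """Turn RSS items into map markers, skipping addresses we cannot place."""
    root = ET.fromstring(feed_xml)
    markers = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        address = item.findtext("description", default="").strip()
        coords = geocoder.get(address)
        if coords is None:
            continue
        markers.append({"label": title, "lat": coords[0], "lon": coords[1]})
    return markers


markers = listings_to_markers(SAMPLE_FEED, GEOCODER)
for marker in markers:
    print(marker)
```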

But how do we know what our users actually want from us? Typically we design surveys, conduct interviews, assemble focus groups, analyze transaction records, or use other standard measures. David Lindahl, by contrast, has been working with anthropologists over the past 10 years, using ethnographic techniques to get this information. Based on his findings, and frustrated with the limitations of his vendor-supplied ILS, Lindahl developed his XML-based CUIPID (Catalog User Interface Platform for Iterative Design) for the University of Rochester Libraries. As part of the design, Lindahl exported the library’s MARC-encoded bibliographic data into XML, and supplemented the holdings with maps of where to find indicated items in the stacks. He also implemented the OCLC FRBR collocation tool, along with a home-grown metadata search engine called SARA (Search and Retrieval Application). While he appreciates the link-resolving power of SFX, he decided to keep the native interface and most of the steps hidden from the user (since they had failed usability testing). The suppression and automation of SFX processing is managed through his ColdFusion application “GUF” (Getting Users to Full Text), which moves the user from citation to article text in about two mouse clicks (i.e., the standard apparently set by Google).

“Weaving the Library into the Web”

          The challenge for catalog (and other) librarians is not to defeat ‘Amazoogle’ (i.e., Amazon, Google, eBay, Yahoo, etc.) at its own game, which would in any case be futile. Rather, librarians need to learn how to play the game better, and leverage their traditional expertise along with the emerging technologies in a way that best serves their users’ needs. If libraries can do a better job of exposing their contents on the Web (e.g., by converting OPAC data into XML, exploiting OAI-PMH, incorporating user-supplied book reviews, etc.), the real value they have to offer will cause their Web sites to rise up the list of search engine results. This is the nature of the “Google economy”, where citation analysis (in Google’s case, its proprietary PageRank technology) is the single most important factor in getting resources discovered and used by the general public.

          Weibel notes that Wikipedia articles increasingly rise to the top of search engine results for the simple reason that many Web authors have linked their pages to them. According to Weibel, Wikipedia is now the 10th or 11th most frequently visited site on the Web. With Google’s PageRank formula, the number of external sites linking to your document largely determines how highly it will be ranked in retrieval. A corollary of this is that libraries should encourage as much external linking to their resources as possible in order to increase their search engine visibility (the importance of better identity management comes into play here). OCLC is right to promote “weaving libraries into the Web” as vigorously as possible.
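For readers who want to see why inbound links matter so much, here is a minimal sketch of the published, simplified form of PageRank, computed by repeated iteration. It is my own illustration: the link graph is invented, the damping factor of 0.85 is the value commonly cited in the literature, and Google’s production ranking certainly involves far more than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical miniature Web: three blogs, one encyclopedia article, one catalog.
LINKS = {
    "blog_a": ["wikipedia_article", "library_catalog"],
    "blog_b": ["wikipedia_article"],
    "blog_c": ["wikipedia_article"],
    "wikipedia_article": [],
    "library_catalog": [],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the encyclopedia article outranks the catalog simply because more pages link to it, which is the corollary drawn above: more external links to library resources should mean more visibility.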

          So, rather than denounce the migration of knowledge seekers to Wikipedia, even at the expense of more authoritative sources, says Bisson, librarians should harness the same tools and principles that have made Wikipedia a success, and make sure that the very best libraries have to offer is at least getting into the game.

The bibliographic utilities are pointing the way toward greater Web exposure through Open WorldCat, RedLightGreen, and other Web-savvy initiatives. Joshua Porter reminds us that the basic principle of the usability industry (of which he is a part) is this: the only thing that matters is use. You can have the most brilliant web page with all the right answers on it, but, if no one actually uses it, it’s worthless. Librarians need to keep this in mind, and increasingly they do. In Germany, for example, Die Deutsche Bibliothek (DDB) has teamed up with the German Wikipedia to provide biographical articles that are highly dynamic and accessible (thanks to Wikipedia) while also providing rigorous cross-referencing and disambiguation (thanks to DDB’s national authority file). (See press release.) Our highly structured bibliographic data is extremely valuable, but we need to learn to exploit it better. Or, as Lorcan Dempsey puts it, we need to make our data work harder.

Information seekers tend to follow the path of least resistance. As Stuart Weibel says: “Free and easy beats free and good in the marketplace for the most part.” What distinguishes Wikipedia from traditional library resources is not only that it’s free, but also that it’s easy. Moreover, in terms of authoritative information, for most people most of the time, Wikipedia is good enough.

Pete Bell worries (along with Clay Shirky) that library ontologies fixate too much on how books are arranged on shelves, rather than how ideas naturally relate to one another. (Shirky argues that our attachment to ‘obsolete’ ontologies like LC Classification holds us back. Instead, he suggests, we should be embracing social tagging.) Bell proposes (not surprisingly) adoption of his company’s product, i.e., Endeca’s faceted navigation tool. A large number of institutions have implemented it (including Barnes & Noble and the U.S. Military), but not, so far, any libraries. [Incidentally, given his company’s use of faceted terminology, I asked what he thought of the OCLC FAST project. He replied that he believed many of the FAST terms are hopelessly ambiguous after having been broken apart from their native LCSH syntax. Perhaps a true thesaurus (vs. decomposed subject heading list) would lend itself better to clean faceting? I don’t know what kind of thesaurus (if any) is used by Endeca.]
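For anyone unfamiliar with faceted navigation, the basic mechanism can be sketched briefly. This is my own toy illustration with invented records and facet names; it says nothing about how Endeca’s product is actually built. The user narrows a result set by choosing facet values, and the interface shows how many remaining records fall under each value.

```python
from collections import Counter

# Hypothetical catalog records; titles and facet values are invented.
RECORDS = [
    {"title": "Cataloging Basics", "format": "Book",
     "language": "English", "subject": "Cataloging"},
    {"title": "Metadata in Practice", "format": "Book",
     "language": "English", "subject": "Metadata"},
    {"title": "Katalogisierung heute", "format": "Book",
     "language": "German", "subject": "Cataloging"},
    {"title": "Metadata Quarterly", "format": "Journal",
     "language": "English", "subject": "Metadata"},
]
FACETS = ("format", "language", "subject")


def refine(records, selections):
    """Keep only records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selections.items())]


def facet_counts(records, facets=FACETS):
    """Count, per facet, how many of the given records carry each value."""
    return {facet: Counter(r[facet] for r in records if facet in r)
            for facet in facets}


hits = refine(RECORDS, {"language": "English"})
counts = facet_counts(hits)
print(len(hits), dict(counts["format"]), dict(counts["subject"]))
```

The sketch also shows why Bell’s concern about FAST matters: faceting only works cleanly when each value is unambiguous on its own, outside the string it was originally part of.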

Carl Grant also thinks libraries should do more to weave themselves into the Web. He also suggests that we leverage those skills in which we have the greatest expertise and competitive edge. For example, we could provide virtual reference and referral on a fee-for-service basis, assisting commercial bookstores that are otherwise ill-equipped to bring books and readers together. He also advocates librarians getting more involved in NISO standards, and taking a higher profile role in the development of digital repositories. He believes we could develop a more efficient business model, too, whereby instead of simply locking visitors out of our resources, we could find ways to charge fee-for-service to anyone at all (i.e., even those without a Yale NetID). This would also increase the number of Web sites which would link to us, and therefore increase our Google ranking.

Moreover, says Grant, rather than spending large sums of money on highly-customized OPACs of their own, libraries could customize their interfaces to WorldCat, and allow patrons to use the advanced features only an OCLC-sized database could provide. With money saved from OPAC implementations, librarians could get more involved in collaborative educational software such as Sakai and Blackboard. Furthermore, given OCLC services such as Open WorldCat and Find-in-a-Library, we should find ways to complete the browsing experience and make it possible to deliver items (whether through ILL or, in the case of rush purchases, commercial partnerships) at least as quickly as is done through Amazon or Barnes & Noble. At the same time, librarians could take their relationships with vendors more seriously, and work together, as Andrew Pace has suggested, to create a common professional vision and development agenda. Grant’s impression is that library directors and the ALA currently lack the will to collaborate on that level. He speculates that OCLC, however, might be showing the way forward.

Grant is also a big believer in visualization tools. 2D browsers such as Grokker have become somewhat commonplace. 3D browsers are on the horizon, though, and promise a profound change in the way users seek information. One prototype in particular singled out by Grant is Croquet. Apparently based on gaming industry technology, Croquet allows the user to manipulate an avatar (i.e., an animated user surrogate; in this case, a giant rabbit) on the screen, and navigate through a virtual landscape of visualized research tools and databases. (The assumption seems to be that game design is highly intuitive, and that a crossover of gaming techniques into Web browsing would improve the average search experience.) The avatar creates the illusion that the user is physically inside the computer and able to explore the digital resource, making the idea of ‘virtual browsing’ more than just a euphemism for an on-line shelf list.

          Stuart Weibel recommends studying the OCLC Environmental Scan for 2004 once it is published (at the moment, only the 2003 edition is available). The scan shows that libraries are continuing to lose market share (in terms of the Google or the ‘attention’ economy) to non-library information providers. He suggests we consider the Curiouser prototype, which is based on Open WorldCat, but also incorporates bookmarklets, OCLC’s FRBR work-set algorithm and FictionFinder. Weibel agrees with Grant that 3D browsers and avatars are good ideas, helping create a sense of presence on the Web, and helping “wrap community around the content” and “content around the community.”

Amy Benson’s review of metadata trends (“Metadata 101” at Yale) throws additional light on how libraries can be woven into the Web. In addition to FRBR implementation, which promises a more intuitive catalog display, the increasing precision of unique identifiers is helping bring library services in line with the Web 2.0 vision. Important identifiers include URLs (Uniform Resource Locators), PURLs (Persistent Uniform Resource Locators), OpenURLs (essentially, a platform-independent, portable search query), DOIs (Digital Object Identifiers: commonly used for management of intellectual property rights), and ISTCs (International Standard Text Codes, similar to ISBNs, but representing works rather than manifestations). Unique identifiers used in conjunction with the increased precision of the FRBR model will make machine manipulation and Web 2.0 processing more efficient and powerful than is possible today.
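To show what “portable search query” means in practice, here is a small sketch of my own that packs a citation into an OpenURL-style link and then unpacks it again, as a link resolver would. The resolver address and the citation are invented; the key names (`genre`, `atitle`, `title`, `issn`, `volume`, `spage`, `date`) follow the original key/value style of OpenURL as I understand it, so treat the details as an assumption rather than a statement of the standard.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical link resolver address; each library would substitute its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def make_openurl(citation, base=RESOLVER_BASE):
    """Serialize the non-empty citation fields as a query string on the resolver."""
    query = urlencode({key: value for key, value in citation.items() if value})
    return f"{base}?{query}"


# Invented citation, for illustration only.
citation = {
    "genre": "article",
    "atitle": "An Invented Article Title",
    "title": "Journal of Examples",
    "issn": "1234-5679",
    "volume": "12",
    "spage": "34",
    "date": "2005",
}

link = make_openurl(citation)
print(link)

# The resolver's side of the exchange: recover the citation from the link.
received = parse_qs(urlparse(link).query)
print(received["issn"], received["atitle"])
```

Because the citation travels inside the link itself, the same link can be handed to any institution’s resolver, which then decides where that institution’s users can get the full text.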

The Role of the Metadata Librarian in the Internet Age

Cornell committed itself in 2001 to establish and operate a “consulting to production” metadata unit. Marty Kurth, formerly head of Cornell’s Catalog Department, became head of the new unit, and took with him one professional cataloger, one assistant, and one serials specialist. Recently and enthusiastically, he hired a full-time information technologist as well. The service model includes 5 components: (1) Digital media; (2) Metadata; (3) Copyright; (4) Technology support; and (5) Electronic publishing. According to their mandate, his group “provides metadata consulting, design, development, production, and conversion services to Cornell’s faculty, staff, and community partners to increase the value of their digital resources.”

Kurth made an interesting distinction between traditional cataloging and newly developed metadata services. The former typically begins with an item in hand, and then applies to that item predetermined descriptive standards, subject analysis tools, and encoding schemes. The latter, by contrast, starts at the project level, and only then, based on project specifications, user requests, and interoperability requirements, selects or designs the standards and encoding that will be suitable for that particular case. The different workflows are “mirror images” of each other, however, and Kurth sees the two as sharing the same conceptual foundation.

He suggests, in fact, that the role of the catalog/metadata librarian is the same as it’s always been. We provide the pre-conditions for reconciliation (or semantic interoperability) among different disciplines’ representations of knowledge. In other words, catalog/metadata librarians serve as conceptual translators from one academic discipline to another, reconciling vocabulary and facilitating interdisciplinary research. Metadata, in turn, forms the connective tissue that makes translation, reuse, mapping and transformation possible. This is what NISO calls for in its Framework of Guidance for Building Good Digital Collections, namely, that “Digital objects, metadata and collections are building blocks for reuse and integration.” I believe Amy Benson would concur. As she describes them, XML, Dublin Core, OAI-PMH, and other metadata innovations have opened up vast new opportunities for organizing information. The basic theory and practice of cataloging, however, remains the same: we provide building blocks for semantic interoperability, schematic transformation, and the exchange of knowledge across borders and disciplines.

Conclusion

          Given the unprecedented wave of disruptive (yet wonderful) technologies surging through the workplace, many professions are being forced to reassess the nature and value of their services. Librarians are hardly exempt from this collective soul-searching. Google poses a particularly acute challenge to us, given its explicit mission “to organize the world's information and make it universally accessible and useful” (which sounds suspiciously library-like), and a proven track-record of injecting ‘killer apps’ rapidly into the marketplace. Fortunately, however, and contrary to the half-joking title of the NELINET program, there is no real competition between Google and the OPAC. Google has an army of computer scientists and engineers, and, as of November 17th at least, a market capitalization of $118 billion. Moreover, given the introduction of Google Scholar, Google Book Search (formerly Google Print), Google Maps, Google Directory (based on dmoz) and now the free digital content management system Google Base, it’s no wonder some of our colleagues have started to panic. Fortunately, as most of the program speakers would agree, there’s no need to fight Amazoogle, which would in any case be a losing battle. Rather, these highly innovative companies are developing tools that, when implemented in our local systems, will help us do our jobs much better than ever before. If we can learn to market our skills better than we do now, Amazoogle will have much to gain from us as well.

          The important thing is for us to become more engaged in communal Web development work. We have much to offer in the areas of subject analysis, FRBR data modeling, digital resource management, controlled vocabularies (or ‘ontologies’), highly structured bibliographic data, and a knowledge and love of reading. Just as importantly, we need to absorb the best of the tools and technologies that the Web has to offer. These include blogs, wikis, social tagging and folksonomies, identity and reputation management tools, 3-D browsers, unique identifiers and the LAMP set of programming/hacking instruments. If we can take a greater role in developing Web applications, and learn to trade expertise more freely with members of other knowledge-based communities, then a new golden age of librarianship might be just on the horizon.



This file last modified 07/05/06