DPIP Priorities: Needs, Challenges, Issues
Workflow for digitization projects
Katie Bauer, David Walls
December 20, 2005
The Problem
The Library currently has a dispersed set of discrete digitizing projects conducted by staff with a wide variety of other duties. Library selectors, curators, and faculty approach the various digital groups, which begin scanning projects without the infrastructure support needed to track library materials from the shelf to the production unit, perform quality control, and have the finished product cataloged and made available for use.
One example unit on campus illustrates some of these issues:
This unit has in place a person with good scanning and technical skills, some programming support, high-quality equipment, a subject selector with a strong grasp of her subject area and her collection, and the enthusiastic support of the department head. And yet this group encounters real hurdles in accomplishing its goal of creating a digital library. It takes months to move from selection to creation of a publicly accessible digital object. One problem is that there is no process in place for tracking the workflow, so often there is no way to tell where an object is in the system. Another problem is that the fragmented nature of people's jobs, where digitization is just one of many duties, causes delays.
This small example shows some of the problems inherent in creating a digitization workflow. Without a critical mass of material to process, it is difficult to build staff expertise and an efficient workflow. Without the necessary infrastructure support (e.g., software to track the project), the workflow is almost impossible to manage.
There is little or no sharing of information among library groups who are conducting digitization programs. The cost of internal versus outsourced scanning is largely unexplored. Few if any of our library digital groups have the mandate to consult with each other about their production methods, best practices, or work with outside vendors. The finished scanning work represents a growing critical mass of essentially unique, one-off projects created with no established standards for digitization or archiving, no large-scale storage or repository in which to store them, and no direct means of accessing digital content.
A second factor that works well for Cornell and Harvard is the presence of a fairly large group (an eight-person staff at both institutions) dedicated to producing metadata for digital projects. At Cornell this work is supported by a programmer, an arrangement the group considered key to its work. (The head of systems at Cornell was careful to point out that having programmers attached to other units, such as a metadata unit, with little or no connection to other systems staff does not work well. At Cornell this is dealt with by having the programmer attached to the metadata unit physically present in the systems offices.) Harvard actually has more metadata specialists, dispersed through different departments.
At the Open Collections Program at Harvard a workflow has been established that centers around imaging services, which reports to Preservation. Work is divided into four functional areas: physical selection and coordination; physical preparation and descriptive metadata (done at the same time); digital imaging; and finally the repository. In the third area, they scan the document, create the OCR copy, and create a METS document and a repository XML record. Outside vendors who do work for them are required to produce the XML records as well.
A key divergence between Cornell and Harvard seems to be that Harvard employs a more decentralized workflow than Cornell. Separate units do work, although much of it eventually moves through the impressive digitization space in Widener Library. The difference seems to be that Harvard has a very large pot of money that allows it to do more digitization in more places. Cornell has had to create a more efficient and centralized system because it needs to recoup costs through chargebacks.
· The existing, well-established selector/curator bibliographic preservation review process should be supported as the starting point for digital projects.
· The work of the various digital production units should be shared and coordinated, and a set of best practices and methods for scanning should be developed. A qualified project coordinator should be hired to function as a single point of access for starting digitization projects at Yale; this person should have or develop the expertise needed to plan and execute projects efficiently.
· The process of digitizing, cataloging, and creating access to digital works should be integrated into the same Acquisitions, Preservation, Catalog, and Access Services Departments. Giving these existing departments the resources they need to handle digital content will create a stronger and more integrated digital library than a group of separate library units created to manage digital resources.
· [An alternative to the above] A group should be created with links to these departments, but with a cohesive structure and a single purpose (digitization).
· A metadata production group should be created within the Catalog Department and given the support needed to function as the sole source for assigning metadata elements to all digital production projects for Yale University Library. This support should include programming support. For programming to function most effectively, the programmer should report to ILTS.
· Orbis should be used as the database of record for digital reformatting projects for cataloged items. The 583 MARC field should be implemented to create a means of tracking the selection, scanning, and completion of digitization projects.
· The Library needs a system for tracking the digitization process. We recommend that the Library purchase software (or develop something simple in house) to track the movement of an object through digitization. Such software would enable us to know where an object is and to track the time it takes to complete the discrete parts of a task.
· Orbis item type codes should be rigorously assigned to all digital content on fixed media (compact discs and DVDs) to simplify the future identification of media that will need to be migrated or refreshed.
· Outsourcing digital scanning production should be more thoroughly explored to determine if and when this would be most cost effective.
· Efficiencies of scale will only be achieved if we can build a high-volume digitization program. However, to do this the Library must first establish standards, expertise in applying those standards, and an efficient workflow. At the beginning of DPIP’s program, one way to do this will be to work on a small number of large, fairly homogeneous projects. This will let us develop good workflow practices for a set of material. After expertise is established, we may be able to establish workflow procedures and ensure that a large volume of work moves through them by instituting a policy that makes it standard practice to digitize more items. One such strategy might be to digitize material that we must process anyway for some other reason: interlibrary loan requests, course reserve material, and Eli Express materials from the LSF are all examples.
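To make the tracking recommendation above more concrete (software that shows where an object is and how long each discrete step takes), the following is a minimal sketch of what "something simple in house" might look like. It is an illustration only, not a description of any existing Library system: the stage names, the `TrackedItem` class, and the item identifier are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical stage names, loosely following the steps named in this report
# (selection, preparation, scanning, quality control, cataloging, access).
STAGES = ["selected", "prepared", "scanned", "quality_checked", "cataloged", "available"]


@dataclass
class TrackedItem:
    """One physical object moving through the digitization workflow."""
    item_id: str  # assumed identifier, e.g. a barcode or catalog record number
    history: List[Tuple[str, datetime]] = field(default_factory=list)

    def advance(self, stage: str, when: Optional[datetime] = None) -> None:
        """Record that the item entered `stage`; stages must follow STAGES in order."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        done = len(self.history)
        expected = STAGES[done] if done < len(STAGES) else None
        if stage != expected:
            raise ValueError(f"expected {expected!r}, got {stage!r}")
        self.history.append((stage, when or datetime.now()))

    def current_stage(self) -> Optional[str]:
        """Return the most recent stage, or None if the item has not started."""
        return self.history[-1][0] if self.history else None

    def time_in_stages(self) -> Dict[str, timedelta]:
        """Time spent in each completed stage (from entering it to entering the next)."""
        return {
            stage: later - entered
            for (stage, entered), (_, later) in zip(self.history, self.history[1:])
        }


if __name__ == "__main__":
    item = TrackedItem("example-0001")
    item.advance("selected", datetime(2005, 12, 1))
    item.advance("prepared", datetime(2005, 12, 8))
    item.advance("scanned", datetime(2005, 12, 20))
    print(item.current_stage())
    for stage, spent in item.time_in_stages().items():
        print(stage, spent.days, "days")
```

Even a record this small would answer the two questions raised in the problem statement: where an object currently is, and which step is causing delay. A production system would also need persistent storage and some link to the catalog record, which this sketch leaves out.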