Database infrastructure at University Archives
A report to the Metadata subcommittee, 6 October 1998

Manuscripts and Archives currently uses a variety of databases and systems to describe, manage, and provide access to its collections. This discussion covers only those systems that concern University Archives (UA); the department's long-range plan is to use whatever systems are established for UA as a model for Manuscripts collection information systems and other database applications.

The world as we know it

The information infrastructure as it currently exists involves three general areas of activity: the management of collection information through a system of complex tables, forms, and reports; databases that provide item-level indexes to selected materials or serve as informational research tools; and user access to collection information through a variety of means, including databases, EAD, the OPAC, and the WWW. (In the case of UA, "user" access covers not only researchers but also administrative offices within the university. Staff are also users of the information system, but for clarity they are treated separately here.)

There are several problems with the system as it exists, some of which are software driven, some access driven, and some data entry driven. Although many of the Paradox tables are linked, the system is not completely integrated; when a staff member wants to locate information about a particular collection it is often necessary to look in several different tables. A number of steps are necessary in order to have complete information about a collection retrieved or entered into the system. Not only is this inefficient, it leads to discrepancies and errors (not to mention confusion). These difficulties are magnified when a staff member must also duplicate much of that information for a MARC record and for the finding aid.
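The lookup problem described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for an integrated relational database (it is not Paradox, and not a product UA has chosen). The table names, column names, and sample rows are hypothetical and do not reflect UA's actual tables. The point is only that, once tables are linked by a common key, one query can return what would otherwise require opening several tables in turn.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three linked, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collections (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE accessions (
            id INTEGER PRIMARY KEY,
            collection_id INTEGER NOT NULL REFERENCES collections(id),
            received TEXT
        );
        CREATE TABLE locations (
            id INTEGER PRIMARY KEY,
            collection_id INTEGER NOT NULL REFERENCES collections(id),
            shelf TEXT
        );
        """
    )
    # Made-up sample data, for illustration only.
    conn.execute("INSERT INTO collections VALUES (1, 'Sample collection')")
    conn.executemany(
        "INSERT INTO accessions VALUES (?, ?, ?)",
        [(1, 1, "1997-03-01"), (2, 1, "1998-09-15")],
    )
    conn.execute("INSERT INTO locations VALUES (1, 1, 'A-12')")
    return conn


def collection_summary(conn, collection_id):
    """Return (title, accession date, shelf) rows for one collection."""
    return conn.execute(
        """
        SELECT c.title, a.received, l.shelf
        FROM collections AS c
        LEFT JOIN accessions AS a ON a.collection_id = c.id
        LEFT JOIN locations AS l ON l.collection_id = c.id
        WHERE c.id = ?
        ORDER BY a.received, l.shelf
        """,
        (collection_id,),
    ).fetchall()


if __name__ == "__main__":
    for row in collection_summary(build_demo_db(), 1):
        print(row)
```

Because the accession and location details are joined to the collection by its key, the same title does not have to be re-entered in each table, which is the kind of duplication that leads to the discrepancies noted above.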

One of UA's biggest concerns with the current system is its inability to handle the large data sets that are being generated. Paradox is designed to handle smaller databases with individual tables of up to approximately 10,000 records; UA has surpassed that number (the largest table has over 35,000 records), and with the new records management program in place during the next four years, the number will grow rapidly. Of immediate concern is Paradox's slowness in accessing information, due to the large number of records in the target tables. Other software concerns include Paradox's limited scalability, which makes it unsuitable for future plans; its limitations in handling large text fields; and its records export, which is mostly ASCII text in delimited fields. Some effort could be expended to have MARC records, SGML, and portions of the finding aids exported from Paradox, but given the other concerns mentioned here and below, little work has been devoted to this task.
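To make the export question concrete, the following is a rough sketch of how a delimited ASCII export might be turned into simple tagged text. It is an illustration under stated assumptions, not a description of any work UA has done: the pipe delimiter, the field names, and the sample records are invented, and the output is generic tagged markup rather than valid MARC, SGML, or EAD. It also assumes the header row contains names that are usable as tag names.

```python
import csv
import io
from xml.sax.saxutils import escape


def delimited_to_markup(text, delimiter="|"):
    """Convert delimited text (first row = field names) to tagged records.

    Each field name becomes a tag name; this is a hypothetical mapping,
    not a real archival encoding standard.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    lines = []
    for row in reader:
        lines.append("<record>")
        for field, value in row.items():
            # Missing trailing fields come back as None; treat them as empty.
            lines.append(f"  <{field}>{escape(value or '')}</{field}>")
        lines.append("</record>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up export sample, for illustration only.
    sample = (
        "title|dates\n"
        "Faculty Senate minutes|1950-1960\n"
        "Buildings & Grounds files|1972\n"
    )
    print(delimited_to_markup(sample))
```

A real export would still need a mapping from database fields to the elements of the target standard, which is where most of the effort mentioned above would go.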

Finally, given the variety of software used to describe and manage UA's collections, it is difficult to integrate the various pieces of the system and make them available to users via the Web. UA would like both access to collection information (for users and staff) and data entry to take place through a web-based interface of forms linked to the underlying tables. At the moment, this is not feasible.

The nature of UA's goals and objectives for the future necessitates the creation of a system that meets or improves upon all the needs mentioned above. The creation of additional information sources, such as digital surrogates (audio, video, static, etc.), and the need to provide standardized access to electronic records collections only add to the mix and to the importance of creating an integrated information infrastructure.

The world as we want it

A small working group within UA is currently developing a plan for a system that provides authority, access, and management capabilities. In short, the group is striving to create a system infrastructure that has three functions: intellectual access (exporting data in various forms for all users, whether staff, administrative offices, or researchers), management access (through forms, etc.), and input (into a common database system that supports both of the other functions). The attached diagrams outline the system infrastructure towards which UA is moving. A brief summary of those diagrams is as follows:

The system that will give us the world (or, something akin to it)

To summarize the many points made in the two sections above, UA wants to create a system that will serve three purposes: intellectual access, management access, and data input. The system must be integrated; it must allow web access for input and query; its data export should accommodate all UA needs; it should provide access to electronic records collections and surrogates; it must be powerful enough to accommodate large data sets and queries across multiple tables; it must allow UA to complete its work efficiently and with minimal error; and finally, but no less important, it must work with existing library standards and systems.

UA has determined that it will be necessary to implement a database system significantly more robust than Paradox or Access. Vendors such as Oracle, Microsoft, Sybase, and Informix offer these capabilities in varying degrees of complexity. Which product will best suit the Archives' needs has not been determined, but UA is closely examining Microsoft's SQL Server in conjunction with NT 4.0 server as a product with features that are attractive to UA and meet its needs:

With its new system infrastructure, UA hopes to achieve more than easier access to data or better collection management. Rather, it is creating an infrastructure that will integrate all tasks and all access to information.

Kirsten M. Jensen
7 October, 1998

