EXPANDING THE ROLE OF AUTHORITY FILES
IN THE ARCHIVAL CONTEXT
 
Richard V. Szary
Smithsonian Institution
Office of Information Resource Management
 
Paper presented November 1, 1985
Annual meeting of the Society of American Archivists
Austin, Texas
 
 

Recent developments in the evolution of automated systems for the description and retrieval of historical materials have raised a number of issues stemming from the presumed capability of these systems to act as a vehicle for the interchange of information amongst repositories. In particular, the publication and use of the USMARC format for Archives and Manuscript Control (MARC/AMC) has brought to the fore questions dealing with standards and authority control. MARC/AMC provides a structure within which archival repositories can produce standardized bibliographic descriptions of their holdings that can serve as the basis for shared catalogs, other finding aid systems, and other cooperative efforts. The success of such efforts, however, requires a commitment to standards and consistency that goes beyond adherence to MARC/AMC; compatible vocabularies and shared descriptive standards for use of MARC/AMC are also required.

MARC/AMC, as part of a larger system of bibliographic description standards, has also increased the possibility of cooperation with libraries and other non-archival repositories (such as museums). In some cases, especially for those archival repositories administratively responsible to libraries, the compatibility of the AMC format with other MARC formats already in use may preclude administration support for the development of separate archival systems. Archives-library cooperation, whether voluntary or forced, however, inevitably leads to the examination of what such cooperation implies in the way of descriptive standards and consistent vocabularies. Archival institutions exploring or undertaking such cooperative ventures are faced with a relatively coherent and accepted base of library practice and authorities (such as AACR2, LC Names, LCSH) that they must react to.

MARC/AMC has also opened the possibility of building automated archival systems on the foundations of existing MARC-based systems. While the designers of these systems have built them for the cataloging and retrieval of bibliographic descriptions of library materials, it is becoming clear that they can, with some modifications, handle bibliographic descriptions of historical materials as well.

The existence of the MARC format, the possibility of cooperation with non-archival repositories, and the potential usefulness of existing library-oriented bibliographic systems have all focused discussion on a particular aspect of archival practice: bibliographic description. The MARC formats, by nature and intent, are standard structures for descriptions of materials; library practice has always been based on the description of bibliographic entities; existing library-oriented systems are designed for direct retrieval of bibliographic records. If one accepts the proposition that traditional archival practice is, or attempts to be, provenance-based rather than bibliographic-based, then the current direction of discussion concerning authorities and standards is redirecting archival thinking about arrangement, description, and retrieval away from its focus on the activities of the creating entities as the primary access point, and towards direct indexing of the content of historical materials.

The danger posed by this development stems not from its inherent nature - for any progress in exchange and wider access depends on standard bibliographic descriptive structures and practices - but from the fact that they address only one-half of the archival approach to the description and retrieval of information about materials. Exclusive reliance on bibliographic description and associated content indexing will improve access to archival materials, if only because of the limitations of current archival practices and methods that tend to be highly individualized and unstructured. If description and retrieval systems are going to exploit the power of the provenance approach to historical materials, then they need to address the question of recording and providing access to the other half of the archivist's base of information - knowledge about the history, structure, and activities of the persons and organizations that created the materials. This paper will argue that it is possible to address those questions by expanding the role and structures of authority control as currently practiced.
 

The Role of Authority Control in Current Bibliographic Practice

Librarians have justified the expense of implementing and maintaining authority files that control the selection of headings appearing in their catalogs by appealing to the stated purpose of the catalog - providing access to the bibliographic items the catalog records describe. The library catalog must support two major approaches to the bibliographic items - known item searches, in which the user wishes to determine whether the catalog contains a record for a particular work that he or she already knows exists; and category searches, in which the user wishes to find the records for all works having a common characteristic (author, subject, title, color, etc.). The success of both types of searches depends directly on whether the library has used a standard set of headings in building the bibliographic descriptions in the catalog and on how accessible those headings are to the users of the catalog.

Both cataloging staff and researchers require accessibility to headings, although from different perspectives. Once a cataloger has chosen a concept as an access point for a particular bibliographic record, he must be able to determine what heading should be entered to represent that concept. Similarly, once a researcher has chosen a concept by which he wishes to search the catalog, there must be a way for him to identify the heading that is used in the catalog to represent that concept. The first way that authority control can enhance this accessibility is by directing users from headings that represent the concept differently from the way established for the catalog to the preferred heading (e.g., Cars - known as Automobiles). To do this, the authority record for a particular concept or entity must contain the established form of the heading that represents that concept and a list of non-preferred forms.
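
The reference function just described can be illustrated with a minimal sketch. It assumes an authority record holds only an established heading and a list of non-preferred forms; the class and function names, the case-insensitive matching, and the extra variant "Motor cars" are illustrative assumptions, not part of any existing authority format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuthorityRecord:
    """One concept: its established heading plus non-preferred forms."""
    established: str
    non_preferred: List[str] = field(default_factory=list)


def build_reference_index(records: List[AuthorityRecord]) -> Dict[str, str]:
    """Map every known form (case-insensitively) to its established heading."""
    index: Dict[str, str] = {}
    for record in records:
        index[record.established.casefold()] = record.established
        for variant in record.non_preferred:
            index[variant.casefold()] = record.established
    return index


def resolve(index: Dict[str, str], term: str) -> Optional[str]:
    """Return the established heading for a term, or None if it is unknown."""
    return index.get(term.strip().casefold())


# Hypothetical sample data based on the Cars/Automobiles example above.
records = [AuthorityRecord("Automobiles", ["Cars", "Motor cars"])]
index = build_reference_index(records)
print(resolve(index, "cars"))
```

Both the cataloger and the researcher would consult the same index: the cataloger to learn which heading to enter, the researcher to learn which heading to search under.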

A second way for authority control to enhance accessibility of headings follows from the nature of human thought, conceptualization, and language. The entities and concepts that can be used to describe bibliographic works are not isolated from each other in the real world, but form complex networks of relationships. These relationships between concepts and entities can take many forms, including broader than/narrower than, successor/predecessor, includes/is included in, uses/is used by, among others. Assuming a standard nonclassified library catalog in which the headings in the catalog represent a specific concept directly rather than through some type of taxonomic structure, the success of the user's search may depend not only on whether the term he searches for is an accepted term in the catalog, but on whether the user knows or can determine what related headings may also be used in the catalog.

Authority control provides the mechanism for recording and retrieving these types of relationships, assisting catalogers in selecting the most appropriate heading for the catalog record and researchers in defining their search strategy so that it is compatible with the headings used in the catalog. The authority record for a concept or entity has provisions for identifying related headings by listing those headings and specifying the nature of the relationships.
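
As a rough sketch of how related headings and the nature of each relationship might be recorded, the following assumes that every relationship is stored in both directions under paired labels (for example, narrower than/broader than). The class name, method names, and sample headings are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class RelatedHeadings:
    """Records typed relationships between established headings."""

    def __init__(self) -> None:
        # heading -> list of (relationship label, related heading)
        self._links: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def relate(self, heading: str, relation: str, other: str, inverse: str) -> None:
        """Store 'heading <relation> other' and the inverse statement."""
        self._links[heading].append((relation, other))
        self._links[other].append((inverse, heading))

    def suggestions(self, heading: str) -> List[Tuple[str, str]]:
        """List the related headings a user might also search under."""
        return sorted(self._links.get(heading, []))


# Hypothetical sample headings; the relationship labels come from the text.
refs = RelatedHeadings()
refs.relate("Automobiles", "narrower than", "Motor vehicles", "broader than")
refs.relate("Bureau A", "predecessor of", "Bureau B", "successor of")

for relation, other in refs.suggestions("Motor vehicles"):
    print(relation, other)
```

Storing the inverse statement alongside each relationship lets a search that starts from either heading find the other.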

A final function of traditional authority control enhances the accessibility of headings in a less direct way. The circumstances under which a particular heading is used and the rationale for determining those circumstances may not always be obvious to the cataloger or researcher. In these cases, the user needs an explanation of when the heading is appropriate and why that decision was made in order to make a correct determination as to the proper use of the heading. To support this function, the authority record has provisions for including notes, references, and explanations.
 

The Underlying Function of Authority Control

The previous section of this paper explained the current role of authority control in bibliographic systems in terms of the specific functions it attempts to perform. In addition, it placed those functions into a context where they become special cases of a broader function that authority control seeks to serve - access to access points (headings). To state it directly:

The function of authority control is to enhance the accessibility of access points used in the retrieval system to the system's users. References from non-preferred to established headings, references from a heading to related headings, and explanatory notes on selection and usage are three specific ways in which authority control attempts to fulfill this function in traditional library catalogs. Any examination of the application of authority control to descriptions of historical materials should be conducted in this broader context rather than simply as a means of establishing consistency amongst headings.
 

Applicability of Search Types in Standard Library Catalogs to Catalogs of Historical Materials

Since the role of authority control in the library catalog is so closely tied to the purposes and functions of the catalog, it is necessary to examine whether those purposes and functions are sufficient to support retrieval of historical materials as well. Of the two types of searches that the library catalog is expected to support - known item and category - the second is far more appropriate to searches in catalogs of historical materials. Known item searches are appropriate in library catalogs because the materials described there have usually been published or otherwise distributed, and it is not unusual for the user to have a citation or other prior knowledge of the materials. In conducting a known item search then, the user is expected to arrive at the catalog with an access point already in hand. The role of authority control, in these cases, can be limited to directing the user to the established form of the heading which he brings to the catalog.

Known item searches are much less important in searching catalogs of primary historical materials because their existence is less likely to be known to researchers through citations and other published descriptions. Not only does the researcher not know of their existence, but he is often unaware of the creators, forms, and other circumstances surrounding the creation of the materials, circumstances which may be described by headings that can be used to search the catalog.

Category searches, as supported by library catalogs, are more useful in retrieving descriptions of historical materials, but still do not satisfy the complete range of retrieval requirements. Category searches are commonly limited to titles, authors, and subjects. Title category searches attempt to locate all editions of a work bearing the same title, and as most historical materials do not have titles in the same sense as published works, title category searches are seldom useful. Author category searches have an exact analogy to creator category searches in historical materials, when the user already has the name of the creator. The two most common instances of this are in biographical studies, when the user is interested in documenting the history and activities of a particular person or organization; and when the user has a good understanding of the subject area that he is trying to document and knows the names of persons and organizations likely to have been involved and created materials of interest.

Subject category searches are less useful for many historical materials, especially those that are described from an archival, provenance, perspective. Archivists have recognized that the nature of many historical materials as documentation of an activity, rather than the direct product of an activity (as a published work is), renders direct content indexing of the subject matter of the materials less effective than it might be for more directed materials. As a result, they profess to favor retrieval through a knowledge of the activities of the creators and a knowledge of the materials created by each creator - i.e., the provenance approach. Archivists have yet to exploit fully other avenues for category searches on the direct intellectual and physical characteristics of historical materials, such as form of material and media.

Authority control contributes to category searching, particularly in the area of subject categories, through its ability to suggest other avenues of approach, i.e., related headings. Subject categories, by their nature, are not amenable to the same type of well-defined naming conventions that titles and authors/creators are, and users are more likely to need help in determining what other subject headings in the catalog may be useful to their search, than in the other areas.

If these two approaches, known item and category searching (as defined in traditional catalogs), are insufficient to support the retrieval needs for historical materials, what other approaches are needed, and what role does authority control play in them?
 

Supporting the Provenance Approach in the Catalog

The simple answer to the question of other approaches is the provenance method of using information about the activities of creators and the characteristics of materials to identify likely access points that can then be used to search the catalog. Current practice rarely extends to a systematic, structured, and accessible base of knowledge about creators and characteristics. The individual reference person may be well-versed in a particular area of knowledge pertaining to the repository's holdings (history of the institution, chronological or geographical area of history, etc.) and be able to translate users' requests into a form suitable for addressing the repository's catalog. Unfortunately, however, the knowledge base from which the reference staff operates is limited, unshared amongst members of the staff, and not recorded or transferred in a systematic way. As a result, the provenance approach becomes an inexact method of providing access, with results differing dramatically depending largely on the expertise of the staff member in the area of knowledge addressed by the search request.

There is a direct analogy between the role of authority control as defined above and the translation aspect of the provenance approach - both serve the function of providing access to access points. The difference lies in the methodology and scope of current authority control practice and the provenance approach. This suggests that it might be possible to expand the structures used to support current authority control to encompass the more generalized function of providing access to access points, and in particular, to support the systematic exploitation of the provenance approach.

This expansion would need to accommodate all of the relational aspects of the provenance approach, all of the various ways in which persons, corporate bodies, geographic entities, intellectual concepts, activities, material characteristics, and other entities can be associated. For example, a person can have relationships to other persons (familial, social, legal), to corporate bodies (legal, social), to geographic entities (residence, birth place, area of research), to intellectual entities (discipline, occupation, adherent, etc.), and to activities (participant, sponsor, etc.). In other words, this expanded authority control mechanism would provide a structure for recording any aspect of an entity's history or activity, by defining those aspects as a combination of a relationship to another entity and the type of relationship.
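
One way to picture such a structure is as a set of statements, each combining an entity, a type of relationship, and a related entity. The sketch below is only an illustration under that assumption; the entity kinds follow the paragraph above, while the class names, function names, and sample entities are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "person", "corporate body", "place", "concept", "activity"
    name: str


@dataclass(frozen=True)
class Relationship:
    subject: Entity
    rel_type: str  # e.g. "residence", "occupation", "participant"
    target: Entity


def aspects(entity: Entity, statements: List[Relationship]) -> List[Tuple[str, str]]:
    """Everything recorded about one entity, as (relationship type, related name)."""
    return [(s.rel_type, s.target.name) for s in statements if s.subject == entity]


def creators_related_to(target: Entity, statements: List[Relationship]) -> List[str]:
    """Translate an interest in a place, concept, or activity into the names
    of the entities related to it, which can then be searched as access points."""
    return sorted({s.subject.name for s in statements if s.target == target})


# Hypothetical sample entities and statements.
doe = Entity("person", "Doe, Jane")
society = Entity("corporate body", "Example Botanical Society")
botany = Entity("concept", "Botany")
austin = Entity("place", "Austin, Texas")

statements = [
    Relationship(doe, "discipline", botany),
    Relationship(doe, "residence", austin),
    Relationship(doe, "member", society),
    Relationship(society, "discipline", botany),
]

print(aspects(doe, statements))
print(creators_related_to(botany, statements))
```

The second function mirrors the translation step that a knowledgeable reference archivist performs from memory: a researcher's interest in a subject is converted into the names of the creators whose materials are likely to document it.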
 

Implications of the Provenance Approach

As described above, an expanded concept of authority control would support the provenance approach to historical materials by providing a structure that could record and manipulate information about the history and activities of the entities being described. A maxim of current authority control practice is that in order to provide the user with effective access to the catalog records, the authority system specifies a "preferred," "standardized," "accepted," or "established" heading to be used to name the person, organization, subject, or other concept being discussed. The provenance approach, as embodied in record groups and other arrangement schemes, also makes the assumption that there needs to be one established scheme or model in which all entities fit in a unique way.

The problem with such monolithic schemes, whether they be library classification schemes, subject lists, or record groups, is that they reflect only a selected, restricted view of the reality that was responsible for the existence of the materials. In order to improve access to descriptions, they fit those descriptions into a predetermined mental model of reality and assume the user will eventually learn the model well enough to use the catalog system effectively. For example, the standard arrangement scheme in institutional archives mirrors the organizational structure, and it is assumed that the users (both staff and researchers) need to learn that structure and use it to retrieve materials.

While such schemes are easier for the repository to maintain and may be all that the catalog technology has been able to support, this monolithic classification/arrangement approach has serious drawbacks. On a practical level, it forces the user to adjust his mental model of the reality he is working with in order to use the catalog.

On a more theoretical level, these schemes fail to represent the multi-faceted and complex network of relationships that actually operated to create the described materials. The record group concept, for example, assumes that the preferred approach to the records of an organization is the bureaucratic structure in which the units that created those records operated. In fact, however, the reality of organizational activity is not reflected in organization charts or citations from legal codes. The interactions within and amongst organizations cannot be adequately described in terms of bureaucratic placement or legal mandate. In addition to the formal and informal networks of persons and organizations that actually influence and direct the course of events (which rarely operate the way charts and directives say they should), the structure itself is constantly evolving, with organizations taking on new functions, dropping others, being merged, and split apart. Even if the bureaucratic structure did reflect the functions and activities of an organization accurately, it would only be for a very narrow snapshot of time.

In fact, there is no one model that can adequately describe the reality of organizational or any other human activity. Each model is created to emphasize a particular facet or view of reality and can only be valid in that context. The record group/bureaucratic model, for example, represents a legalistic and structural view of organizational activity, but is not necessarily valid for offering a sociological perspective. Provenance, as an approach based on the knowledge of history and activities, implies multiple models based on different views of that history and those activities, into which historical materials can be placed, not a single, right model.

Each authority file, containing entity identifiers, descriptions, and relationships, embodies a particular model of reality. In order to embody the provenance approach in all its richness, a catalog system would have to support multiple, potentially conflicting, models, or, in catalog terminology, multiple authority files.

The technical, procedural, and presentation aspects of multiple authority files present some difficult conceptual and implementation problems for any catalog system attempting to accommodate them. Taken to its logical conclusion, implementing the provenance approach through multiple authority files in a catalog system would imply that the catalog would contain bibliographic records and concept/entity (authority) records. Access to a bibliographic record would be provided by linking it to the appropriate concept records. The concept records would contain headings that may be used to name that concept as well as descriptions and links to other concept records. The links between concept records would represent the relationships between those concepts. Each user could choose any of the names or links established by other users (in effect, using an established authority file) or add additional names, descriptions, and links to represent conventions and relationships unique to his own model (creating a new authority file).
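
The arrangement described in the preceding paragraph might be sketched as follows. This is an illustration under simplifying assumptions, not a design for an actual system: concepts are identified by short strings, a user's authority file is layered on top of an established one, and a search follows links only one step. All class names, method names, and sample records are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class AuthorityFile:
    """One model of reality: names for concepts and links between them."""

    def __init__(self, name: str, base: Optional["AuthorityFile"] = None) -> None:
        self.name = name
        self.base = base  # an established file that this file extends
        self.headings: Dict[str, List[str]] = {}
        self.links: List[Tuple[str, str, str]] = []

    def add_heading(self, concept_id: str, heading: str) -> None:
        self.headings.setdefault(concept_id, []).append(heading)

    def add_link(self, source: str, relation: str, target: str) -> None:
        self.links.append((source, relation, target))

    def all_headings(self, concept_id: str) -> List[str]:
        inherited = self.base.all_headings(concept_id) if self.base else []
        return inherited + self.headings.get(concept_id, [])

    def all_links(self) -> List[Tuple[str, str, str]]:
        inherited = self.base.all_links() if self.base else []
        return inherited + self.links

    def related(self, concept_id: str) -> Set[str]:
        return {t for s, _, t in self.all_links() if s == concept_id}


class Catalog:
    """Bibliographic records, each linked to concept records by identifier."""

    def __init__(self) -> None:
        self.records: Dict[str, Tuple[str, Set[str]]] = {}

    def add_record(self, record_id: str, description: str, concepts: Set[str]) -> None:
        self.records[record_id] = (description, set(concepts))

    def search(self, concept_id: str, authority: AuthorityFile) -> List[str]:
        """Find records linked to a concept or to concepts related to it
        under the chosen authority file."""
        wanted = {concept_id} | authority.related(concept_id)
        return sorted(r for r, (_, c) in self.records.items() if c & wanted)


# Hypothetical sample data: one catalog, two models of the same organization.
catalog = Catalog()
catalog.add_record("R1", "Correspondence, Office A", {"office_a"})
catalog.add_record("R2", "Reports, Office B", {"office_b"})
catalog.add_record("R3", "Minutes, Division X", {"division_x"})

structural = AuthorityFile("structural")
structural.add_heading("office_a", "Office A")
structural.add_link("division_x", "includes", "office_a")

functional = AuthorityFile("functional", base=structural)
functional.add_heading("office_a", "Licensing office")
functional.add_link("office_a", "shares function with", "office_b")

print(catalog.search("office_a", structural))
print(catalog.search("office_a", functional))
```

The same bibliographic records yield different result sets depending on which authority file the user selects, which is the sense in which each file embodies its own model. The layered file also shows how a user could add names and links of his own without altering the established file.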

Given the current sophistication of most users and their retrieval needs, it is likely that a practical implementation would support a limited number of established authority files for the general user, and additional capabilities for more sophisticated users to use in creating their own authority files.

Leaving aside the implementation problems, however, what does this proposal imply for descriptive practices and the role of the repository? Archives often like to characterize themselves as the memory of their institution, but this is too often construed as applying only to their holdings, when, in fact, it is the combination of the archivist's knowledge of history, structure, and activities of the organization and knowledge of how the holdings document them that constitutes the real memory. If one accepts that the institutional or cultural memory includes both, then it follows that the systematic recording of provenance information, so that it is easily and effectively retrievable, is as important a function for the repository as is a systematic description of holdings.

A renewed emphasis on the importance of provenance information also points up that this information is often as important to the outside researcher as any information he may glean from the holdings. Provenance information can be viewed as an important resource in itself, quite apart from its use in facilitating access to holdings. A structured database of provenance information now buried in finding aids and archivists' heads may prove to be as much of a boon to researchers as improved access to holdings.

This paper opened by identifying a potential danger to the traditional archival way of life that an exclusive reliance on bibliographic description might pose. In examining the nature of the danger, however, and the ways it might be countered, it has uncovered an equally threatening challenge to existing archival practice. In their rush to improve accessibility to their holdings through more standardized and automated bibliographic description techniques, archivists should not only be alert to the dangers of an uncritical acceptance of current bibliographic description and retrieval practices, and how they potentially distort archival principles and approaches. They must also be aware of the extremely limited way in which those principles and approaches have been implemented in current archival practice. There must be a greater recognition of the gap between the theory of provenance-based access to historical materials and the reality of how it is implemented in current practice. In analyzing their activities and specifying requirements for new description and retrieval systems, archivists must avoid the tendency to build the limitations of current practice and technology into those systems. Instead, they should take a closer look and develop a deeper understanding and appreciation of the principles and concepts from which practice stems, and insist on capabilities that exploit the power of those ideas more fully.