PLEASE NOTE: This is an archived document! It is of historical interest only and does not necessarily represent current Yale University Library practice. For other archived documents, go to: Archived Cataloging Documentation. For current documentation, go to: Cataloging at Yale.


Proposal for Automated Authority Control Services
for Yale University

From: OCLC
Date: 15 September 1994


OCLC Vendor Information

OCLC History

OCLC Online Computer Library Center, Incorporated, is a not-for-profit membership organization that provides its members with computer-based products, services and systems designed especially for libraries and other educational organizations. Comprised of over 18,000 affiliated libraries in the United States and 52 other nations, OCLC is the largest library information network in the world.

OCLC originated in 1967 when 54 Ohio college and university libraries formed the Ohio College Library Center to develop a cooperative, computerized regional library network. OCLC's bibliographic database, the Online Union Catalog, began its operation in 1971. Today, the database contains over 30 million bibliographic records. Initially supporting only computerized cataloging, the Online Union Catalog now supports more than 60 related services, including online reference databases, interlibrary loan services and retrospective conversion.

In 1981, OCLC changed its name to OCLC Online Computer Library Center, Incorporated. Currently, approximately 800 employees work at OCLC headquarters in Dublin, Ohio.


Proposed Product Overview

Automated Authority Control Service

The Automated Authority Control Service automatically matches headings found in bibliographic records against the extended authorities databases and replaces their text with the correct heading text. The databases cover personal and corporate names, LC subject headings, series headings and Medical Subject Headings (MeSH). Personal and corporate names used as subjects are also treated. Correction algorithms are applied to headings extracted from bibliographic records, and the corrected text then replaces the old text in the bibliographic record. Corrections made by the software that are judged to be high risk are manually reviewed.

OCLC's automated correction service offers a number of options for the end user. Three file processing options are available: (1) one-time processing, (2) periodic processing, or (3) both. Users may choose to have all heading fields corrected or, based on a pre-analysis of their data, to have only specific heading types corrected. Output options include replacement records with retained local fields, or transaction records with old, changed, and new fields. Records or transactions are available via tape or FTP (File Transfer Protocol). Users who choose to receive bibliographic records may further specify whether they would like to receive all bibliographic records or only those which contain changed headings.

Although market research shows that most libraries prefer replacement records with retained local data, transaction records have one advantage: the library's local database does not need to remain static while OCLC is correcting the headings in its bibliographic records.
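The difference between the two output options can be illustrated with a minimal sketch. The `Transaction` class, the `apply_transactions` function, the record identifier, and the local note are all hypothetical and do not represent OCLC's actual transaction layout. The sketch only shows how a change expressed as old and new heading text can be applied to a local record that was edited in the meantime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One heading change (hypothetical layout, not OCLC's format)."""
    record_id: str
    tag: str
    old: str
    new: str


def apply_transactions(database, transactions):
    """Apply each change where the old heading is still present.

    `database` maps a record id to a list of (tag, text) pairs.
    Returns the number of fields that were changed.
    """
    applied = 0
    for t in transactions:
        fields = database.get(t.record_id, [])
        for i, (tag, text) in enumerate(fields):
            if tag == t.tag and text == t.old:
                fields[i] = (tag, t.new)
                applied += 1
    return applied


# A local record that gained a note while the file was out for processing.
database = {
    "rec1": [
        ("651", "Slany (Czechoslovakia)"),
        ("590", "Local note added during processing"),
    ]
}
changes = [
    Transaction("rec1", "651", "Slany (Czechoslovakia)", "Slany (Czech Republic)")
]
applied = apply_transactions(database, changes)
```

In this sketch the heading is updated and the local note survives, because only the field named in the transaction is touched. A replacement record would overwrite the whole record instead.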

Clients have the opportunity to profile this service to fit their specific needs. Pre-processing options allow institutions to specify the obsolete tags they would like changed, as well as the unwanted fields they want deleted from their bibliographic records.

In addition, each institution has the option to receive one copy of each Library of Congress (LC) authority record which matches each unique heading found in its bibliographic records. Institutions may also receive updated LC authority records when changes are made to authority records they have already received. They can also select various printed reports covering split headings, headings which are not in the LC authority file but are changed by OCLC correction software, changes made to LC records, and headings which could not be corrected.

Tapes that are sent to OCLC become the property of OCLC and will not be returned. Upon completion of processing, OCLC will output either all bibliographic records or only the changed bibliographic records in OCLC-MARC format onto new tapes for delivery.


OCLC's Experience with Automated Authority Control Projects

In 1992, OCLC entered into a contract with Harvard University to correct headings in its bibliographic records. These included headings for personal names, corporate names, series, LC subjects and MeSH, encompassing 2.2 million standard records and 800,000 provisional records. In addition to correcting headings in bibliographic records, OCLC furnished LC authority records for headings found in Harvard's bibliographic records and continues to send updated authority records. At the conclusion of processing Harvard's basefile (3,063,276 records), OCLC had corrected 139,888 corporate name headings, 298,885 LC subject headings and 335,580 personal name headings. The 774,353 total heading corrections generated 485,422 LC authority records.

In addition to corrections made to bibliographic records from Harvard University, OCLC applied the heading corrections to the Online Union Catalog. The corrections for corporate names were applied in May 1993 and resulted in the correction of 1,106,056 headings in fields 110, 410, 710 and 810. Corrections to LC subject headings were completed in December 1993 and 1,988,895 headings were corrected (6XX, second indicator 0). Personal name corrections were completed in April 1994, and resulted in 2,529,284 heading corrections for 100 and 700 fields. The total number of corrections applied to bibliographic records is 5,624,235.

The correction of series (4XX and 8XX) and MeSH (650, second indicator 2) headings for Harvard and Online Union Catalog bibliographic records will begin in the first quarter of 1995.

OCLC plans to enhance its correction software to correct conference names, uniform titles, and name/uniform title headings. These three heading types, based on a 1% sample of the Online Union Catalog (OLUC), comprise 13% of the total headings in the OLUC. Note that name portions of name/uniform title headings are corrected by the personal and corporate names software.

For conference and uniform title headings, OCLC plans to include algorithms that would correct general stylistic, typographical, and subfielding errors. In addition to general corrections, OCLC anticipates developing algorithms to correct other types of errors unique to conference and uniform title headings.

At present, OCLC is working with the Library of Congress on a joint research project to correct uniform titles and the title portion of name/title headings.

OCLC also plans to develop software to correct LC Children's Subject headings.


Additional Services

In addition to correcting headings, OCLC's automated authority control service corrects internal and terminal punctuation in replaced headings. Institutions may also specify field tags they would like deleted from bibliographic records and select obsolete tags for conversion to current MARC practice.

Institutions may also select the conversion of local subject headings (650, second indicator 4, to LC headings 650, second indicator 0) prior to the correction processing.
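As a rough illustration of this pre-processing step, the sketch below recodes 650 fields with second indicator 4 as second indicator 0 so that they would pass through LC subject correction. The `Field` class and `convert_local_subjects` function are hypothetical, subfields are flattened to plain text for brevity, and the sketch assumes the conversion is a simple indicator change, which the proposal does not spell out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    """Simplified MARC field (hypothetical structure for illustration)."""
    tag: str
    ind1: str
    ind2: str
    text: str


def convert_local_subjects(fields):
    """Recode 650 fields with second indicator 4 as second indicator 0."""
    return [
        replace(f, ind2="0") if f.tag == "650" and f.ind2 == "4" else f
        for f in fields
    ]


# Invented sample record.
record = [
    Field("650", " ", "4", "Library catalogs"),
    Field("650", " ", "2", "Cataloging"),
    Field("245", "1", "0", "An example title"),
]
converted = convert_local_subjects(record)
```

Only the first field changes. The MeSH heading (second indicator 2) and the title field are passed through untouched.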

In addition to the correction processing for series headings described in the section titled "Heading Correction Algorithms," an institution may override LC non-traced decisions if the local decision is to trace.


Unique OCLC Features

OCLC's correction files contain the variant forms of headings that occur in the Online Union Catalog, which are linked to the appropriate authorized headings. Because the Online Union Catalog is so large, the correction file contains millions of variants, greatly boosting a library's heading match rate and drastically reducing the need for human intervention to resolve nonmatched headings.

OCLC views heading correction processing as having two goals. The first goal is to collocate variants and correct them to an authorized LC heading. In the absence of an established LC authority record, the second goal is to collocate variants under a preferred heading form from the OLUC. Once heading variants are brought together, retrieval is improved, and if an LC authority record becomes available in the future, it becomes an easy matter to globally update the preferred form from the OLUC to the newly authorized form of the heading.
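The two goals can be read as a two-tier lookup, sketched below under stated assumptions. The table names, the `normalize` match key, and all sample headings are invented for illustration; OCLC's actual correction files and matching rules are not described at this level of detail in the proposal.

```python
def normalize(heading):
    """Crude match key: lowercase, punctuation dropped, single spaces."""
    kept = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in heading.lower()
    )
    return " ".join(kept.split())


# Invented sample data: variant key -> heading to use.
LC_VARIANTS = {
    normalize("Doe, J."): "Doe, Jane, 1900-1980",
    normalize("Doe, Jane"): "Doe, Jane, 1900-1980",
}
OLUC_VARIANTS = {
    normalize("Roe, R."): "Roe, Richard",
    normalize("Roe, Rich."): "Roe, Richard",
}


def correct(heading):
    """Prefer an authorized LC form, then an OLUC preferred form."""
    key = normalize(heading)
    if key in LC_VARIANTS:
        return LC_VARIANTS[key], "lc"
    if key in OLUC_VARIANTS:
        return OLUC_VARIANTS[key], "oluc"
    return heading, "unmatched"


def promote(oluc_preferred, new_lc_form):
    """Re-point every variant of an OLUC preferred form to a new LC form."""
    for key, target in list(OLUC_VARIANTS.items()):
        if target == oluc_preferred:
            LC_VARIANTS[key] = new_lc_form
            del OLUC_VARIANTS[key]
    LC_VARIANTS[normalize(oluc_preferred)] = new_lc_form
```

Because the variants are already collocated under one preferred form, a single call such as `promote("Roe, Richard", "Roe, Richard, 1950-")` moves all of them to the newly authorized heading, which is the global update the second goal is meant to make easy.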

To ensure quality correction databases, staff from the Database Quality and the Online Data Quality Control Sections conduct sampling on a regular basis.


Personal name corrections

The software used to correct personal name headings is able to gather widely varying forms of the same heading, thus replicating the intellectual process of a cataloger who is establishing a personal name heading and further reducing the need for human intervention. The software's decision that varied heading forms represent the same individual is based on weighting of attributes in related bibliographic records.

The attributes that are used for linking personal names include: (1) the dates when an author began publishing, (2) the language(s) of the materials, (3) the country(ies) where the materials were published, (4) the subject(s) of an author's works, (5) the relationship(s) of the author to the publication (e.g., editor, illustrator), and (6) titles by the author.
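A weighted comparison over these six attributes might look like the following sketch. The weights, the ten-year date window, the threshold, and the sample profiles are all invented; the proposal names the attributes but does not give the actual weighting.

```python
from dataclasses import dataclass


@dataclass
class HeadingProfile:
    """Attributes drawn from the records that carry one heading form."""
    heading: str
    first_pub_year: int  # (1) when the author began publishing
    languages: set       # (2) languages of the materials
    countries: set       # (3) countries of publication
    subjects: set        # (4) subjects of the author's works
    roles: set           # (5) relationship to the publication
    titles: set          # (6) titles by the author


# Invented weights and threshold, for illustration only.
WEIGHTS = {
    "dates": 0.15, "languages": 0.10, "countries": 0.10,
    "subjects": 0.25, "roles": 0.10, "titles": 0.30,
}
THRESHOLD = 0.5


def overlap(a, b):
    """Share of values two sets have in common (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_score(p, q):
    """Weighted similarity of two heading profiles, between 0 and 1."""
    dates = 1.0 if abs(p.first_pub_year - q.first_pub_year) <= 10 else 0.0
    return (
        WEIGHTS["dates"] * dates
        + WEIGHTS["languages"] * overlap(p.languages, q.languages)
        + WEIGHTS["countries"] * overlap(p.countries, q.countries)
        + WEIGHTS["subjects"] * overlap(p.subjects, q.subjects)
        + WEIGHTS["roles"] * overlap(p.roles, q.roles)
        + WEIGHTS["titles"] * overlap(p.titles, q.titles)
    )


def same_person(p, q):
    """Treat two heading forms as one individual above the threshold."""
    return link_score(p, q) >= THRESHOLD


a = HeadingProfile("Doe, J.", 1950, {"eng"}, {"us"}, {"botany"},
                   {"author"}, {"Title A", "Title B"})
b = HeadingProfile("Doe, Jane", 1952, {"eng"}, {"us"}, {"botany"},
                   {"author", "editor"}, {"Title A"})
c = HeadingProfile("Doe, John", 1890, {"fre"}, {"fr"}, {"law"},
                   {"author"}, {"Title C"})
```

Here `a` and `b` share most attributes and would be linked, while `c` matches only on role and would be kept separate.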

For personal names, corporate names, and series headings, there exist two situations in which OCLC's authority control processing does not strictly conform to AACR2 practice.

The first situation arises where a Library of Congress authority record is established in a pre-AACR2 form (Rules: a). In this situation OCLC does not override the authority record with an AACR2 form that may occur in the OLUC. The second situation arises for personal and corporate names that are not represented in the LC Name Authority File. In this situation the OCLC software selects a preferred heading from the OLUC. The software's choice of a preferred heading form from the OLUC is based upon a weighting scheme that takes into account cataloging source, descriptive cataloging rules, and number of occurrences of a heading form in the OLUC.

In some cases, the most preferred heading chosen by the correction software does not conform strictly to AACR2 guidelines. The software occasionally selects a preferred heading form that differs from the form that would have been selected by a trained NACO librarian as the proper AACR2 form. In the following example, the results of the weighting scheme caused the software to select the heading without a qualifier, when a NACO librarian would most likely have selected the qualified heading.

OCLC preferred form: Baravelli, G. C.

AACR2 established form: Baravelli, G. C. $q (Giulio Cesare)
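The weighting scheme described above, based on cataloging source, descriptive cataloging rules, and number of occurrences, can be sketched as follows to show how an unqualified form might outscore the AACR2 form. The weights, the source and rules codes, and the occurrence counts are invented; only the two heading forms come from the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One heading form found in the OLUC (hypothetical structure)."""
    form: str
    source: str       # cataloging source, e.g. "dlc" or "member"
    rules: str        # descriptive cataloging rules, e.g. "aacr2"
    occurrences: int  # how often the form occurs


# Invented weights, for illustration only.
SOURCE_WEIGHT = {"dlc": 3.0, "member": 1.0}
RULES_WEIGHT = {"aacr2": 2.0, "other": 0.0}
OCCURRENCE_WEIGHT = 0.1


def score(c):
    """Combine the three factors into a single weight."""
    return (
        SOURCE_WEIGHT.get(c.source, 0.0)
        + RULES_WEIGHT.get(c.rules, 0.0)
        + OCCURRENCE_WEIGHT * c.occurrences
    )


def preferred(candidates):
    """Pick the highest-scoring heading form."""
    return max(candidates, key=score)


# Occurrence counts are invented.
candidates = [
    Candidate("Baravelli, G. C.", "member", "other", 40),
    Candidate("Baravelli, G. C. $q (Giulio Cesare)", "member", "aacr2", 5),
]
chosen = preferred(candidates)
```

With these assumed numbers the frequent unqualified form outweighs the credit given to the AACR2 form, which is the kind of outcome the example illustrates.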


Geographics

Another advantage of OCLC's correction service is the currency of its geographic correction file. OCLC staff have created correction headings which are more up-to-date than the Library of Congress (LC) authority headings for Russia and the former Czechoslovakia.

OCLC's geographic correction file was built to provide users with the most current geographic headings. In some cases, specifically for Russia and the former Czechoslovakia, OCLC has anticipated changes to LC authority records. OCLC has based this decision on end users' needs to access materials reflecting current political boundaries.

OCLC does offer institutions the option of selecting LC headings rather than OCLC's enhanced geographic headings.

LC established form: Myski (Kemerovskaia oblast, R.S.F.S.R.)
OCLC preferred form: Myski (Kemerovskaia oblast, Russia)

LC established form: Slany (Czechoslovakia)
OCLC preferred form: Slany (Czech Republic)
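
The choice between LC headings and OCLC's enhanced geographic headings can be pictured as a simple profile switch. This is only a sketch: the function name and the `use_oclc_enhanced` flag are hypothetical, and the table holds just the two examples given above.

```python
# LC established form -> OCLC preferred form (the two examples above).
OCLC_GEOGRAPHIC = {
    "Myski (Kemerovskaia oblast, R.S.F.S.R.)":
        "Myski (Kemerovskaia oblast, Russia)",
    "Slany (Czechoslovakia)": "Slany (Czech Republic)",
}


def geographic_heading(lc_form, use_oclc_enhanced=True):
    """Return the heading to use under the institution's chosen option."""
    if use_oclc_enhanced:
        # Fall back to the LC form when no enhanced form exists.
        return OCLC_GEOGRAPHIC.get(lc_form, lc_form)
    return lc_form
```

An institution that opts for LC headings would keep `Slany (Czechoslovakia)`. One that accepts the enhanced file would receive `Slany (Czech Republic)`.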


©2001-2007 Yale University Library
Site URL: http://www.library.yale.edu/cataloging/archives/