Yale University Library
Digital Preservation Policy
Yale University Library (YUL) Mission:
The Yale University Library, as one of the world's leading
research libraries, collects, organizes, preserves, and provides access to and
service for a rich and unique record of human thought and creativity. It
fosters intellectual growth and supports the teaching and research missions of
Yale University and scholarly communities worldwide.
Digital Preservation:
Digital preservation is the whole of the activities and
processes involved in the physical and intellectual protection and technical
stabilization of digital resources through time in order to reproduce authentic
copies of these resources.
Digital Preservation Principle Statement:
The Yale University Library Digital Preservation Policy supports
the preservation of digital resources that are within the Library's
collections.
These digital resources are subject to the same criteria for selection and preservation
as other resources in the Yale Libraries. These decisions are made by
selectors, curators, and bibliographers as experts on the value of the content,
in consultation with the relevant information technology and preservation
experts. Digital preservation decisions are made on the basis of this Policy,
the Library's Strategic Plan, the digital resources’ enduring value, and
the feasibility of their preservation. When possible,
decisions about the need for preservation are made at the time of creation,
acquisition, or licensing of digital resources.
Selectors, curators, and bibliographers, in consultation with
technical experts, must specify the preservation requirements for each digital
resource. Preservation responsibility
is retained by YUL whether the digital resource is preserved at YUL or
entrusted to an outside agency. Preservation of digital resources may include
any actions necessary to maintain continued access to the digital
material, ensure its authenticity, and mitigate and/or reverse the effects
of hardware and software obsolescence and media decay.
This Policy recognizes that the maintenance of, and reliable long-term access to, Yale’s digital resources are supported by a preservation planning function. Research (monitoring) into the technology that supports a repository and into the requirements of the designated community it serves is a core activity of preservation planning, as are outreach and education regarding policies, procedures, and best practices for digital resources.
Identification of Content, File Format, Source and Collecting Levels of Digital Resources to be Preserved:
For digital resources, the decision to preserve, as noted
above, is based upon their enduring value and the feasibility of preservation,
and not necessarily upon a digital resource’s content, format type, source,
or collecting level.
This Policy recognizes, however, that preservation strategies and actions vary by these
attributes and characteristics. Below
are examples of the content types, formats, sources of
digital resources, and collecting levels that are likely to be represented in a preservation
repository at YUL or entrusted to a third party to preserve on Yale’s
behalf.
| Content Type | Example |
|---|---|
| Text resources | Theses, dissertations |
| Resources generated using office applications | Spreadsheets, PowerPoint presentations |
| Electronic records | Emails, financial and administrative documents |
| Simple or complex audio, video, and image files | Speech recordings, conference videos, high-resolution images of works of art |
| Datasets | Census data, epidemiological study data |
| Web resources | Yale University Frontdoor |
Digital Format Examples:
The format landscape is ever changing as new formats emerge
over time. Examples of digital formats
include:
| Content Type | Format Example |
|---|---|
| Text | ASCII (ANSI X3.4, ECMA-6, ISO 646), UTF-8 Unicode |
| Audio (voice, music) | AIFF, WAVE, MP2 |
| Video | MPEG, MPEG-2, AVI |
| Image | GIF (GIF87a, GIF89a), JPEG, JPEG 2000, TIFF (TIFF 4.0, TIFF 5.0, TIFF 6.0) |
Sources of Digital Content include:
o Born-digital resources and digital surrogates created by YUL.
o Born-digital resources and digital surrogates originating outside YUL (for example, in other university departments) for which YUL has a mandate to preserve.
o Digital resources acquired by YUL through purchase or donation.
o Digital resources licensed by YUL with perpetual access and archival rights.
Collecting Levels:
Collecting levels for digital resources vary considerably. Examples include resources
hosted at Yale; resources hosted at Yale and elsewhere, i.e., mirrored
resources; and resources hosted elsewhere that YUL makes available through
links to the hosting location. Not all levels of digital
resource collecting include an automatic commitment to preservation, although
any of these resources may be selected for preservation.
Throughout the life cycle of digital resources, a
series of interrelated strategic and procedural decisions and the work of a
number of different stakeholders help ensure that digital
resources survive through time and changing technologies. The information life
cycle is a framework for understanding the cyclical sequence of activities that
all digital resources undergo during their existence. Digital resources may be
conceived; born or adopted; used, shaped, and molded; and protected from harm.
Eventually, digital resources may reach the end of their active lives and be
disposed of in some manner, or be reborn as reformatted, transformed
resources.
Although this Policy primarily focuses upon the preservation stage of the
digital life cycle, the prospects for and the costs involved in preserving
digital resources through time and technological change rest heavily upon
decisions taken about those resources at different stages of their life cycle.
By adhering to a proactive approach to preservation management, we seek to minimize
loss of digital information content, functionality, and accessibility, and
to ensure that digital resources retain their integrity, authenticity,
and reliability over time.
Libraries, museums, or academic departments at Yale that
provide preservation or archival services for their designated communities will
by necessity purchase large quantities of storage for digital resources over
time. Supporting large-scale storage is complicated and requires major
investments in technology and staff to efficiently maintain and operate a
storage management system that delivers basic services such as access, backup,
and disaster recovery. This Policy recognizes that there may be multiple
archives or repositories at Yale that have independent storage systems. The
sheer volume of storage needed to archive Yale’s digital resources makes the
economic and operational management of storage an important issue for the
University.
Yale is best served when distributed and disparate systems
conform to standards and best practices that make communication between these
storage systems possible. The ability to integrate or interoperate within and
between storage systems is likely to make backup, disaster recovery, and
hardware migration services less risky across all storage systems and more
economical for the University. For example, common import and export services
would enable one storage system to serve as a backup for another, thereby
reducing Yale’s total investment in redundancy for these systems. In addition,
to satisfy depositors' custodial requirements for trust and security, the
management of storage will require specialized technology, software, and
management skills. To ensure that Yale makes the right investments in storage
for digital preservation, this Policy recommends that the acquisition and
management of storage be guided by the technical considerations discussed
in the best practice section of this Policy (under development). This Policy
also maintains that storage technologies must be seen as chronically
obsolescent and subject to continuous hardware migrations over time.
YUL strives to ensure the authenticity of digital resources, because their mutable nature opens the possibility of unauthorized and undetectable changes. Confidence in the authenticity of digital resources over time is particularly crucial owing to the ease with which alterations can be made. From the moment that digital resources are created or acquired, YUL undertakes protective procedures to prevent, discover, and correct loss or corruption of digital resources, whether inadvertent or malicious. In addition, supporting evidence, ideally in the form of metadata, must be provided to enable users to evaluate the authenticity of all preserved digital resources.
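This Policy does not prescribe how loss or corruption is discovered. One widely used technique is fixity checking: compute a cryptographic digest when a resource is created or acquired, store it, and recompute it periodically; a mismatch or a missing file signals damage to be corrected from another copy. The Python sketch below assumes SHA-256 digests and a simple in-memory manifest keyed by file path; `record_fixity`, `verify_fixity`, and the manifest layout are hypothetical and not part of any YUL system.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths: list[Path]) -> dict[str, str]:
    """Hypothetical ingest step: store a digest for each file when it is created or acquired."""
    return {str(p): sha256_of(p) for p in paths}


def verify_fixity(manifest: dict[str, str]) -> list[str]:
    """Hypothetical audit step: list files that are missing or whose digest has changed.

    A non-empty result signals loss or corruption to be corrected,
    for example by restoring the file from another stored copy.
    """
    failures = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.is_file() or sha256_of(p) != expected:
            failures.append(name)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp) / "thesis.pdf"
        f.write_bytes(b"original content")
        manifest = record_fixity([f])
        print(verify_fixity(manifest))  # prints []
        f.write_bytes(b"altered content")
        print(verify_fixity(manifest) == [str(f)])  # prints True
```

Digests detect accidental change reliably; guarding against malicious change also requires protecting the stored manifest itself, for example by keeping it separate from the files it describes.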
Metadata is fundamental to preserving Yale University
Library's digital resources. Preservation metadata includes several
different types of metadata: administrative (used in managing
information resources, including rights and permissions), technical
(describing the hardware and software needed to maintain an information object), and
structural (identifying relationships between objects, such as “part of” or
“dependent upon”, that form intellectual entities).
Particular attention is paid to the documentation of digital provenance
(metadata documenting the history of the object and any actions taken to
maintain and provide access to it) and of relationships among different objects
within preservation repositories (as distinct from relationships between resources, i.e.,
structural metadata).
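The metadata categories described above can be pictured as fields of a single preservation record. The sketch below is a hypothetical Python data structure, not a YUL schema or the PREMIS data model; the class names, field names, identifiers, and example values are assumptions chosen only to mirror the categories this Policy names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """One entry in the digital provenance history (hypothetical fields)."""
    event_type: str      # e.g. "ingest", "fixity check", "format migration"
    timestamp: datetime
    agent: str           # person or system that performed the action
    outcome: str


@dataclass
class PreservationRecord:
    identifier: str
    # Administrative: managing the resource, including rights and permissions.
    rights: str
    # Technical: hardware and software needed to maintain the object.
    file_format: str
    rendering_software: str
    # Structural: relationships such as "part of" or "dependent upon".
    relationships: dict[str, list[str]] = field(default_factory=dict)
    # Digital provenance: history of the object and actions taken on it.
    provenance: list[ProvenanceEvent] = field(default_factory=list)

    def log_event(self, event_type: str, agent: str, outcome: str) -> None:
        """Append a provenance event stamped with the current UTC time."""
        self.provenance.append(
            ProvenanceEvent(event_type, datetime.now(timezone.utc), agent, outcome)
        )


if __name__ == "__main__":
    record = PreservationRecord(
        identifier="yul:example-0001",            # hypothetical identifier
        rights="Yale community access only",      # hypothetical rights statement
        file_format="TIFF 6.0",
        rendering_software="any TIFF 6.0 reader",
        relationships={"part of": ["yul:example-collection"]},
    )
    record.log_event("ingest", "repository ingest service", "success")
    print(len(record.provenance))  # prints 1
```

Appending events rather than overwriting fields keeps the history of actions taken on an object, which is what digital provenance metadata is meant to document.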
The preservation process must be able to understand, take in, and maintain
metadata submitted with the digital resource while creating its own metadata to
manage the preservation of that resource. For policies, procedures and best
practices related to the creation and handling of metadata at YUL see the IAC
Metadata Committee's website at: http://www.library.yale.edu/cataloging/metadata/IACmetadata.html. For specific policies, procedures or
guidelines regarding the creation and maintenance of preservation metadata see
the IAC Preservation Metadata Committee’s website (under development).
Access:
In preserving the accessibility of
digital resources, the Library will:
o Maintain information regarding rights and permissions governing access.
Intellectual Property:
Financial:
Enduring preservation of digital
resources requires substantial and ongoing financial commitments over
time, potentially more so than for traditional materials. Digital preservation
is dynamic: responses to technological obsolescence or media decay must be
made more quickly, and the life expectancy of a preservation treatment is
shorter because the underlying technologies keep evolving. Consequently,
preservation strategies must be periodically monitored and reassessed as the
technological environment (standards, protocols, formats, etc.)
evolves.
Digital preservation may be funded
collaboratively, but finances should be monitored centrally to ensure
institutional accountability and to clearly identify costs
across departmental boundaries at Yale.
While the overall financial
commitments to digital preservation are understood to be substantial, the exact
costs of preserving digital resources over time are currently difficult to identify
and define. Normal digital preservation activities may include several
different ongoing costs:
o Technical infrastructure (equipment purchases and ongoing maintenance, technological obsolescence monitoring, and network connectivity; storage media are only a small portion of this cost)
o Staffing (hiring, and general and specialized ongoing staff training)
o Financial planning (seeking project grants, securing ongoing budget commitments)
o Outsourcing (preservation methods undertaken by outside vendors)
Greater centralization of digital
preservation will help reduce costs overall by integrating activities and
exploiting economies of scale. However, central administrators are not the only ones
responsible for understanding the costs of digital preservation. All YUL staff
who create or collect digital resources are stakeholders in digital
preservation and should be aware of the financial implications their activities
have for the costs of digital preservation. All stakeholders should follow the
guidelines set forth in this and related policies in an effort to contain
costs.[1]
Best Practices:
This policy is supported by Yale University Library's digital preservation best practices. (In development FY06)
Glossary:
This glossary offers non-technical
definitions of terms used within this policy.
Access: The
ability, permission (right) and means to locate, display, obtain, determine
availability of or make use of a digital resource, or information about that
resource.
Authentic copy: A duplicate of a digital resource that is
what it purports to be and that is free from tampering or corruption.
Authenticity: The quality of a digital resource that allows it
to be judged trustworthy and genuine, based on internal and external evidence.
Content: The material, information and intellectual
substance of a digital resource.
Digital preservation: The whole of the activities and processes involved in the
physical and intellectual protection and technical stabilization of digital
resources through time in order to reproduce authentic copies of
these resources.
Digital resources: Encodings of intellectual content in digital form.
Enduring value: The continuing
usefulness or significance of digital resources, based on the administrative,
legal, fiscal, evidential, or historical information they contain and function
they serve, justifying their ongoing preservation. The phrase “enduring value”
emphasizes the perceived value of the digital resources when they are
appraised, recognizing that a future selector may reappraise the records and
dispose of them.
File
Format: The organization (fixed, byte-serialized encoding) of
digital information according to a preset specification.
Intellectual
Property: Intellectual
property is intangible property that is created by the mind. Like tangible real
or personal property, the law recognizes the right to own and to control
intellectual property. There are four well-recognized types of intellectual
property rights: copyrights, trademarks, patents, and trade secrets. These
forms of intellectual property differ significantly in the rights they confer,
how they are obtained, and how they are maintained.
Life cycle: The framework for understanding the cyclical sequence of activities that all
digital resources undergo during their existence.
Maintenance: (of
digital resources) The action of keeping the components of digital
resources in working order or in repair. This includes loading digital
resources into storage, managing the storage hierarchy, refreshing the media on
which digital resources are stored, performing routine and special error
checking, providing disaster recovery capabilities, etc. Maintenance may be differentiated
from the broader term Preservation because Maintenance does not include the
metadata management, preservation planning, and access controls necessary to
ensure intellectual protection and to reproduce authentic copies of the digital
resources over time.
Metadata:
Information that describes significant aspects of a resource. Preservation metadata are required to
describe, manage, and preserve digital resources over time, and help ensure
that essential contextual, historical, and technical information is
preserved along with the digital resource.
Perpetual access: Permanent use of publishers’ retrospective backfiles for the subscribed
years of specific publications, with content in the same format and access
method with which the publisher provides current content. (See CDC, 16 Dec 1999, Expectations of Yale
University regarding Creating Archives of and Perpetual Access to Electronic
Resources, http://www.library.yale.edu/ecollections/yalearchiving.pdf).
Preservation: The whole
of the activities and processes involved in the physical and intellectual
protection of administrative, legal, fiscal, evidential, and historical information
and cultural materials. Preservation encompasses a host of
policies, procedures, and processes that together sustain access to, or prevent
further deterioration of, the materials we choose to save.
Preservation repository: Technical
infrastructure, policies, procedures, and corresponding management services that
provide for the storage, maintenance, and preservation of data or information
for long-term use and retrieval.
Provenance: The source and ownership history
of a digital resource.
Stakeholder: A person or group with an
interest, involvement or investment in the digital resource.
SOURCES:
This policy has been informed by the following sources:
Library of Congress, Preservation Reformatting Division. Life-Cycle Management of Digital Data. http://www.loc.gov/preserv/prd/presdig/preslifecycle.html
Beagrie, Neil, and Daniel Greenstein. A Strategic Policy Framework for Creating and Preserving Digital Collections. Version 5.0. Arts and Humanities Data Service Executive, 1998, updated July 2001. http://ahds.ac.uk/strategic.pdf
Shenton, Helen. Life Cycle Collection Management. http://liber.library.uu.nl/publish/articles/000033/article.pdf
Digital Preservation Coalition. Digital Preservation Coalition Handbook. http://www.dpconline.org/graphics/intro/definitions.html
Columbia University Libraries. Policy for Preservation of Digital Resources. July 2000. http://www.columbia.edu/cu/lweb/services/preservation/dlpolicy.html
Garrett, John, and Donald Waters. Preserving Digital Information. May 1996. http://www.rlg.org/legacy/ftpd/pub/archtf/final-report.pdf
National Library of Australia. Digital Preservation Policy. February 2002. http://www.nla.gov.au/policy/digpres.html
Yale Office of Cooperative Research. Yale Intellectual Property Policies. http://www.yale.edu/ocr/indust_policies/
PREMIS. Preservation Metadata: Implementation Strategies. http://www.oclc.org/research/projects/pmwg/
InterPARES. The Long-term Preservation of Authentic Electronic Records: Findings of the InterPARES Project. http://www.interpares.org/book/index.cfm
The International Research on Permanent Authentic Records in Electronic Systems (InterPARES) 2 Project. Experiential, Interactive, Dynamic Records. http://www.interpares.org/ip2/ip2_index.cfm
[1] A stakeholder might mitigate costs through a number of measures, for example: better preparing digital resources for preservation, improving the manner in which materials are transferred to the preserver, creating better descriptions, and understanding the financial implications of acquiring non-standard or unique digital resources.