"I
don't ask for much,
I only want trust,
And you know it don't come easy!"
Ringo Starr (1971)
IUPAP Conference on the Long Term
Archiving of Digital Documents in Physics
4-6 November 2001
Ann Okerson,
ann.okerson@yale.edu
It's an old piece of wisdom that when a problem
seems difficult to solve, a useful tactic is to try to solve a larger
problem. I apologize if my words for
this meeting seem to open larger cans of worms than the ones already seen open
on the table, but I am convinced that the dilemmas of digital preservation will
actually make more sense if we put them in a wider context.
It is not easy for most of us to imagine ways
and means for assuring the very long-term preservation of electronically
published journals, assuring useful access to them by scholars and
scientists. Others will describe the
current state of play in addressing and resolving these ways and means, so I
will confine myself to a schematic list, in order to say better why I think
invocation of a larger context can help us.
1. Physical media in which digital artifacts
are stored have limited life expectancy.
Such artifacts need to migrate forward on a regular basis from one
storage vehicle to another if they are to survive in any minimal form. As Dale Flecker, a colleague in the Harvard
Library, wrote, "All digital materials are fragile. They depend for their continued viability
upon technologies that undergo rapid and continual change."
2. Hardware,
operating systems, and software are the objects of constant strategies of
enhancement and new systems are "backwards compatible" in very
limited ways. (In fact, maintaining
backwards compatibility is one of the greatest headaches in system design. There is an old wheeze among such innovators
that goes like this:
"Q. Why is it that God
was able to create the whole universe in only six days?
A. Because he didn't have
to make it compatible with an installed base of systems.")
Digital artifacts that will survive to be read
and used must either evolve to function in new environments or new
environments must be designed or adapted in such a way as to allow
"obsolete" forms of information to be used in ways approximating
their initial function.
3. Both
(1) and (2) above can only be done for the long term if there is assurance of a
continuing commitment on the part of social institutions with a reasonable
expectation of endurance to take the steps necessary to assure survival. But:
A. The authors and publishers who create and
distribute such artifacts are often not institutions with a reasonable
expectation of endurance (in this age of mergers, acquisitions, etc.); and
B. The publishers who create and distribute
such artifacts do so with expectation of income arising from sales, income that
very likely will not sustain itself in sufficient quantity to make the
information we hope to preserve self-sustaining; even as
C. Libraries, as the depositories of
information in traditional formats, have a long history of preservation of
materials after their publishers have ceased to care for them (or ceased to do
business at all), but libraries have built this history for the following
reasons:
(1) they physically
possess the artifacts they preserve,
(2) preservation of
traditional print materials can be accomplished with a minimum of attention to
individual artifacts and a modest attention to the conditions of mass storage
-- in other words through a kind of "benign neglect", and
(3) the real assurance
of preservation arises because many institutions throughout the world own and
preserve more or less cheaply and easily the same objects. There has never been a sustained and accurate
census of the state of preservation of library contents -- it is simply the
case that the Ruthenian Journal of Palaeobotany survives because dozens
of libraries take ordinary care of their copies. Three fires and four floods among major
libraries in a century will still leave dozens of copies of most research
findings intact, without human thought or intervention.
In other words, the library history of preservation success will not translate
automatically into any kind of success in the new environment.
D. Even the scientists and in some cases the
learned societies who write and publish the materials may have less than a full
engagement in preservation, especially in fields where the new constantly
supplants the old and where the real interest in a given item a hundred years
after its date of publication may lie among sociologists or historians, not the
scientists in the field who originally wrote and read that item.
E. Finally, governments and other national
and supranational cultural agencies typically place a relatively low priority
on digital preservation issues, while private funding sources (foundations and
philanthropists) do not show signs yet of taking much pleasure in invisible
monuments -- donors would still rather have a building named after them than a high
speed computer storage device filled with precious historical materials.
4. Even
when materials have been preserved intact in some useful form, provision of
access is necessary in order for that preservation to have some value, but such
provision requires:
A. a further institutional commitment well
above and beyond that of digital preservation alone; and
B. that the providing institution has the assured
legal right to provide the information. The repeated extension of copyright terms
makes this more difficult than ever, because authors' rights in materials
published may reasonably now be expected (imagine an
article by a 25-year-old post-doc who lives to age 80) to extend until
approximately the year 2131. Some
rights owners will happily forgo those ownership rights in the interests of
science; others will not do so.
Our purpose in meeting here is to address some
of the implications of these issues, so I hope that the above schematic outline
is at least helpful in framing the discussion.
But I want to turn from it to suggest that there is indeed a larger
context and a larger issue, one that is determinative for us,
and that is the issue of trust.
I have outlined the issues above in the most
abstract and external way possible, all in the abstract third person, i.e.,
here are things that persons or institutions unstated must do over the next
century or more to achieve our digital preservation goal. It may be perfectly true to say that, but it
is preposterous to speak so confidently about the behavior of our
contemporaries in the scientific publishing arena and of that of generations of
scientists and information professionals whose grandparents have not yet been
born! In other words, to bring these
issues back to the real world in which we move and have our being, we must put
the specific perspective of our own year, or perhaps our own decade, upon them.
And when we do that, the overarching issue
becomes one of trust and confidence.
None of the concerns outlined above are insoluble in themselves. Those of us who have played 1980-vintage video
games on high-end Intel workstations know how the change in scale of computing
power over time makes possible emulations that seemed perfectly impossible in
the short-term. Those publishers here
who have participated in the negotiations that give library users the
remarkably wide access that they enjoy, even to commercially published
journals, will agree that reasonable people can negotiate solutions to almost
any question of rights, ownership, and access.
But abstract assurance is of little value. The chances that a jetliner will not
strike this room while we meet here are astronomically high and in any
reasonable mathematical terms are only inconsequentially lower than they were a
few months ago, but the behavior and expectations of millions around the world
have been dramatically altered by that statistically negligible alteration in
probabilities.
Let me say what I mean more directly. I believe we could make an extremely strong
case right now that for most of the electronic journals on which physical
sciences depend today -- most of which are published by a community of prestigious
learned societies and a handful of for-profit publishers -- there is every
reason to expect that a purely laissez faire approach would produce a
highly acceptable set of preservation outcomes.
That is to say, the American Physical Society and Elsevier Science are
organizations that could go out of business in this new century, but they are
so rooted in the academic culture that such a departure would immediately
attract significant attention and concern.
No one would ever simply pull the plug on a server and walk away. Moreover, information systems are vastly
more robust and reliable than they were a generation ago, and they show every
sign of continuing to move towards greater reliability. Handing over control from one owner to
another is something that may well become easier to do as time passes.
Our libraries are similarly vulnerable and
similarly rooted. Their attention to the
preservation of information for the benefit of their users and future
generations is legendary and professional at the same time. The future of these institutions as
institutions (whatever the mix of responsibilities and services they assume and
whatever artifacts they care for) is something that I think we can mark as
relatively assured.
But at the same time, we cannot and should not
act as though there were no issue to concern ourselves with. First, there are real risks of loss; second,
there are real risks of not giving the concerned community better assurances
than we can now give them.
What are the real risks? I will suggest three:
1. First,
disaster. The dangers of concentrating
resources too exclusively in one place have never been so well dramatized as
they have this fall. But the ordinary
everyday disasters of all times are equally to be feared. A week after 9/11, the
2. Second,
concern for the vital margins. The way
in which I spoke earlier of the assurance we can bring to the survival of
materials entrusted to the learned societies, the Elseviers, and the libraries
of the world deliberately omitted attention to the risks of survival for materials
of high scholarly and scientific value that lie at the margins of everyday
attention. This is the best way, I would
reason, to think about last year's tempest raised by Nicholson Baker's screed
Double Fold. Baker did not deny that
libraries have done a great deal to assure the preservation of the content of
old newspapers and that a high percentage of what they have done will indeed be
successful.
Baker's concern was about information at the
margins: information from non-standard
sources (local newspapers of great interest); information presented in
non-standard ways (materials in newspapers, such as color illustration);
information that did not lend itself to microfilm reproduction; information
that fell outside the usual bibliographical attention of institutions; and
information whose preservation fell within a margin of error noticeably greater
than 0.001%. The sum total of
information at risk in this way, on the most generous reading of Baker's screed,
is a tiny percentage of the whole, but we should not deny that it is an
important tiny percentage and that the traditional and contemporary missions of
librarianship should leave us concerned with that information. Concern for such information at the margins,
where the returns on investment can diminish rapidly, is concern directed to a
genuine risk.
It is all well and good to hyperventilate about
whether in years to come Elsevier Science can be trusted to maintain access to
electronic backfiles of the Journal of Rhododendron Pathology, but it is
beyond question that huge tracts of culturally, historically, and even
scientifically significant material published in electronic form have already
been lost irrevocably and more are being lost daily. The history of the World-Wide Web is a history
of invention and evanescence, and it is painful to think about how much has
disappeared already. Standards and
systems have not yet evolved, that is quite true: but while we wait for them, information dies
daily. I greatly admire the work of the
Swedish national library, which has taken an approach for archiving the Swedish
web that we could characterize as "grab it now, figure
out what to do with it later". They
will be praised in coming years for having latched on to and preserved huge
swathes of material that are otherwise disappearing on all sides.
3. Third,
obsession with ownership. Traditional
models of information access and preservation have depended on traditional
ideas of property, going back to the 18th century revolution in ideas that led
to the creation of that vital oxymoron, "intellectual property". In the traditional construct, assurance comes
with possession, even fetishistic possession.
I strongly believe that we can and should assume that readers and users
of scientific information in the next generations will rapidly become
accustomed to living in a world of distributed resources. Our desktops and operating systems will
quickly make the network transparent to us, as information located thousands of
miles from us will be as accessible and as versatile for use as the information
on our own desktops.
That listing of risks is not meant to be
exhaustive. But if we take it as
representative, then we can now ask what better assurances would look
like. Here are brief suggestions for
each of the three risks:
1. Disaster: This is the easiest, paradoxically
enough. Reasonable people working
together can build distributed archives.
I have been struck in the last year by the extent to which libraries and
publishers can and should learn from the experience and perhaps even share the
resources of the social institutions with an even more urgent concern for
preservation of information -- banks and credit card companies, for example,
who already manage multi-national distributed "archives" of immense
size and make provision for remarkably easy access.
2. Vital
margins: This is the hardest. To preserve information at the margins
requires us to think carefully about priorities, to identify the marginal items
of highest value, and to convoke the authority and influence of
"mission-based" institutions in society (universities, libraries,
foundations) to identify and fund their inclusion in the more naturally
self-sustaining projects of publishers and libraries.
3. The
same strategy can be invoked to address anxieties of ownership. We need some large-scale partnerships, not
just experimental projects but large-scale prototypes that declare openly and
publicly that publishers, learned societies, scientists, and libraries
recognize that we are all in this together for the long run and that we accept
that fact. The current excitements over
the "Public Library of Science" should not distract us from the
underlying commonalities of interests between all the players in this
arena. Declaring by our deeds that we
recognize not only our differences but also our fundamental collegiality will
take us furthest, fastest.
I suggest that libraries and publishers begin
meeting not to discuss digital preservation. If we confine ourselves to that issue, we can
spend many years edging closer to a perfect solution, while more and more
information is put at risk. I suggest
instead we discuss actively the question of just how rapidly we can move to
discontinuing print subscriptions and print publication of key journals. There is no action that we can take that
would have such a dramatic and visible effect on the people who use information
and the people who pay for it. What we
need to do, paradoxically, is to win the confidence of a broader public by
displaying confidence ourselves. To do
so will arouse outrage, of course, and concerns about the "digital
divide."
But no other type of action would force
attention to questions of reasonable access to digital information in both more
and less privileged settings (whether small organizations in first world
countries or large ones in the "third world"). The summer of 2001 was replete with press
releases from the World Health Organization, the Soros Foundation, and other
groups, in partnership with learned publishers in biomedical fields, making
available high quality electronic information, peer reviewed journals entirely
for free or very cheaply to scores of developing nations. This new set of partnerships and electronic services
to poorer nations will hasten our need to develop electronic preservation
solutions and it will set an example for access for those of us who are more
privileged.
For in the end, we
need to recognize that our issues of trust -- whether of each other or of the
digital medium -- truly are difficult, because those of us who prescribe trust
as a solution must also learn to practice it as a discipline. We have reached a point, I believe, where the
parties to all these debates can and should look at each other, shrug, laugh,
say something like "What the heck!" and get on with the business of
making the future happen now, imperfectly and progressively, rather than wait
for perfect solutions. The trust I urge here, in other words, is not the trust
of abstract people out there, but the trust that people in this room for this
conference will place in the possibility of change, progress, and community.