"I don't ask for much,

I only want trust,

And you know it don't come easy!"

Ringo Starr (1971)

 

IUPAP Conference on the Long Term Archiving of Digital Documents in Physics

Lyon, France

4-6 November 2001

Ann Okerson, Yale University

ann.okerson@yale.edu

 

 

It's an old piece of wisdom that when a problem seems difficult to solve, a useful tactic is to try to solve a larger problem.  I apologize if my words for this meeting seem to open larger cans of worms than the ones already seen open on the table, but I am convinced that the dilemmas of digital preservation will actually make more sense if we put them in a wider context.

 

It is not easy for most of us to imagine ways and means for assuring the very long-term preservation of electronically published journals, assuring useful access to them by scholars and scientists.  Others will describe the current state of play in addressing and resolving these ways and means, so I will confine myself to a schematic list, in order to say better why I think invocation of a larger context can help us.

 

1.       Physical media in which digital artifacts are stored have limited life expectancy.  Such artifacts need to migrate forward on a regular basis from one storage vehicle to another if they are to survive in any minimal form.  As Dale Flecker, a colleague in the Harvard Library, wrote, "All digital materials are fragile.  They depend for their continued viability upon technologies that undergo rapid and continual change."

 

2.       Hardware, operating systems, and software are the objects of constant strategies of enhancement, and new systems are "backwards compatible" in very limited ways.  In fact, maintaining backwards compatibility is one of the greatest headaches in system design.  There is an old wheeze among such innovators that goes like this:

 

"Q.  Why is it that God was able to create the whole universe in only six days?

 

A.  Because he didn't have to make it compatible with an installed base of systems."

 

Either digital artifacts that are to survive to be read and used must evolve to function in new environments, or new environments must be designed or adapted in such a way as to allow "obsolete" forms of information to be used in ways approximating their initial function.

 

3.       Both (1) and (2) above can only be done for the long term if there is assurance of a continuing commitment on the part of social institutions with a reasonable expectation of endurance to take the steps necessary to assure survival.  But:

 

A.      The authors and publishers who create and distribute such artifacts are often not institutions with a reasonable expectation of endurance (in this age of mergers, acquisitions, etc.); and

 

B.      The publishers who create and distribute such artifacts do so with expectation of income arising from sales, income that very likely will not sustain itself in sufficient quantity to make the information we hope to preserve self-sustaining; even as

 

C.      Libraries, as the depositories of information in traditional formats, have a long history of preservation of materials after their publishers have ceased to care for them (or ceased to do business at all), but libraries have built this history for the following reasons:

 

(1)     they physically possess the artifacts they preserve,

 

(2)     preservation of traditional print materials can be accomplished with a minimum of attention to individual artifacts and a modest attention to the conditions of mass storage -- in other words through a kind of "benign neglect", and

 

(3)     the real assurance of preservation arises because many institutions throughout the world own and preserve more or less cheaply and easily the same objects.  There has never been a sustained and accurate census of the state of preservation of library contents -- it is simply the case that the Ruthenian Journal of Palaeobotany survives because dozens of libraries take ordinary care of their copies.  Three fires and four floods among major libraries in a century will still leave dozens of copies of most research findings intact, without human thought or intervention.

 

In other words, the library history of preservation success will not translate automatically into any kind of success in the new environment.

 

D.      Even the scientists and in some cases the learned societies who write and publish the materials may have less than a full engagement in preservation, especially in fields where the new constantly supplants the old and where the real interest in a given item a hundred years after its date of publication may lie among sociologists or historians, not the scientists in the field who originally wrote and read that item.

 

E.      Finally, governments and other national and supranational cultural agencies typically place a relatively low priority on digital preservation issues, while private funding sources (foundations and philanthropists) do not show signs yet of taking much pleasure in invisible monuments -- donors would still rather have a building named after them than a high-speed computer storage device filled with precious historical materials.

 

4.       Even when materials have been preserved intact in some useful form, provision of access is necessary in order for that preservation to have some value, but such provision requires:

 

A.      a further institutional commitment well above and beyond that of digital preservation alone; and

 

B.      that the providing institution has the assured legal right to provide the information.  The repeated extension of copyright terms makes this more difficult than ever, because authors' rights in materials published now may reasonably be expected (imagine an article by a 25-year-old post-doc who lives to age 80) to extend until approximately the year 2131.  Some rights owners will happily forgo those ownership rights in the interests of science; others will not do so.
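
The arithmetic behind that date can be sketched as follows.  This is an illustration only: the function name and the post-mortem term are assumptions, since the talk does not say which term it used.  A term of the author's life plus 70 years (the term in current US and EU law for individual authors) gives 2126; the figure of roughly 2131 corresponds to a term of about 75 years after the author's death.

```python
def copyright_expiry_year(publication_year, age_at_publication,
                          age_at_death, post_mortem_term_years=70):
    """Estimate the year in which an author's rights in a work lapse.

    Hypothetical helper: assumes a "life of the author plus N years" rule
    and ignores details such as terms running to the end of a calendar year.
    """
    # Year of the author's death, counted forward from publication.
    death_year = publication_year + (age_at_death - age_at_publication)
    # Rights continue for the post-mortem term after that.
    return death_year + post_mortem_term_years


# The post-doc in the example: published in 2001 at age 25, lives to age 80.
life_plus_70 = copyright_expiry_year(2001, 25, 80)
life_plus_75 = copyright_expiry_year(2001, 25, 80, post_mortem_term_years=75)
print(life_plus_70, life_plus_75)
```

Either way, the institution that preserves such an article cannot count on the rights lapsing for well over a century.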

 

Our purpose in meeting here is to address some of the implications of these issues, so I hope that the above schematic outline is at least helpful in framing the discussion.  But I want to turn from it to suggest that there is indeed a larger context and a larger issue, one that is determinative for us, and that is the issue of trust.

 

I have outlined the issues above in the most abstract and external way possible, all in the abstract third person, i.e., here are things that persons or institutions unstated must do over the next century or more to achieve our digital preservation goal.  It may be perfectly true to say that, but it is preposterous to speak so confidently about the behavior of our contemporaries in the scientific publishing arena and about that of generations of scientists and information professionals whose grandparents have not yet been born!  In other words, to bring these issues back to the real world in which we move and have our being, we must put the specific perspective of our own year, or perhaps our own decade, upon them.

 

And when we do that, the overarching issue becomes one of trust and confidence.  None of the concerns outlined above is insoluble in itself.  Those of us who have played 1980-vintage video games on high-end Intel workstations know how the change in scale of computing power over time makes possible emulations that seemed perfectly impossible in the short term.  Those publishers here who have participated in the negotiations that give library users the remarkably wide access that they enjoy, even to commercially published journals, will agree that reasonable people can negotiate solutions to almost any question of rights, ownership, and access.

 

But abstract assurance is of little value.  The chances that a jetliner will not strike this room while we meet here are astronomically high and in any reasonable mathematical terms are only inconsequentially lower than they were a few months ago, but the behavior and expectations of millions around the world have been dramatically altered by that statistically negligible alteration in probabilities.

 

Let me say what I mean more directly.  I believe we could make an extremely strong case right now that for most of the electronic journals on which physical sciences depend today -- most of which are published by a community of prestigious learned societies and a handful of for-profit publishers -- there is every reason to expect that a purely laissez faire approach would produce a highly acceptable set of preservation outcomes.  That is to say, the American Physical Society and Elsevier Science are organizations that could go out of business in this new century, but they are so rooted in the academic culture that such a departure would immediately attract significant attention and concern.  No one would ever simply pull the plug on a server and walk away.   Moreover, information systems are vastly more robust and reliable than they were a generation ago, and they show every sign of continuing to move towards greater reliability.  Handing over control from one owner to another is something that may well become easier to do as time passes.

 

Our libraries are similarly vulnerable and similarly rooted.  Their attention to the preservation of information for the benefit of their users and future generations is legendary and professional at the same time.  The future of these institutions as institutions (whatever the mix of responsibilities and services they assume and whatever artifacts they care for) is something that I think we can mark as relatively assured.

 

But at the same time, we cannot and should not act as though there were no issue to concern ourselves with.  First, there are real risks of loss; second, there are real risks of not giving the concerned community better assurances than we can now give them.

 

What are the real risks?  I will suggest three:

 

1.       First, disaster.  The dangers of concentrating resources too exclusively in one place have never been so well dramatized as they have this fall.  But the ordinary everyday disasters of all times are equally to be feared.  A week after 9/11, the University of Maryland was struck by a tornado -- a far likelier and far more damaging event for an institution of higher education, and all the more damaging for taking up a far smaller place in the public consciousness.  Whatever solutions we impose on digital preservation do not necessarily depend, as preservation of traditional materials did, on keeping multiple copies in multiple locations.  But we must and will nevertheless include multi-location strategies in any reasonable digital preservation strategy, and the systems we depend on will need to be redundant and coordinated in many ways.  This suggests that the players in long term digital preservation strategies will want to include some big and robust ones.

 

2.       Second, concern for the vital margins.  The way in which I spoke earlier of the assurance we can bring to the survival of materials entrusted to the learned societies, the Elseviers, and the libraries of the world deliberately omitted attention to the risks of survival for materials of high scholarly and scientific value that lie at the margins of everyday attention.  This is the best way, I would reason, to think about last year's tempest raised by Nicholson Baker's screed Double Fold.  Baker did not deny that libraries have done a great deal to assure the preservation of the content of old newspapers and that a high percentage of what they have done will indeed be successful.

 

Baker's concern was about information at the margins:  information from non-standard sources (local newspapers of great interest); information presented in non-standard ways (materials in newspapers, such as color illustrations, that did not lend themselves to microfilm reproduction); information that fell outside the usual bibliographical attention of institutions; and information whose preservation fell within a margin of error noticeably greater than 0.001%.  The sum total of information at risk in this way, on the most generous reading of Baker's screed, is a tiny percentage of the whole, but we should not deny that it is an important tiny percentage and that the traditional and contemporary missions of librarianship should leave us concerned with that information.  Concern for such information at the margins, where the returns on investment can diminish rapidly, is concern directed to a genuine risk.

 

It is all well and good to hyperventilate about whether in years to come Elsevier Science can be trusted to maintain access to electronic backfiles of the Journal of Rhododendron Pathology, but it is beyond question that huge tracts of culturally, historically, and even scientifically significant material published in electronic form have already been lost irrevocably and more are being lost daily.  The history of the World-Wide Web is a history of invention and evanescence, and it is painful to think about how much has disappeared already.  Standards and systems have not yet evolved, that is quite true:  but while we wait for them, information dies daily.  I greatly admire the work of the Swedish national library, which has taken an approach for archiving the Swedish web that we could characterize as "grab it now, figure out what to do with it later".  They will be praised in coming years for having latched on to and preserved huge swathes of material that are otherwise disappearing on all sides.

 

3.       Third, obsession with ownership.  Traditional models of information access and preservation have depended on traditional ideas of property, going back to the 18th-century revolution in ideas that led to the creation of that vital oxymoron, "intellectual property".  In the traditional construct, assurance comes with possession, even fetishistic possession.  I strongly believe that we can and should assume that readers and users of scientific information in the next generations will rapidly become accustomed to living in a world of distributed resources.  Our desktops and operating systems will quickly make the network transparent to us, as information located thousands of miles from us will be as accessible and as versatile for use as the information on our own desktops.

 

That listing of risks is not meant to be exhaustive.  But if we take it as representative, then we can now ask what better assurances would look like.  Here are brief suggestions for each of the three risks:

 

1.       Disaster:  This is the easiest, paradoxically enough.  Reasonable people working together can build distributed archives.  I have been struck in the last year by the extent to which libraries and publishers can and should learn from the experience and perhaps even share the resources of the social institutions with an even more urgent concern for preservation of information -- banks and credit card companies, for example, who already manage multi-national distributed "archives" of immense size and make provision for remarkably easy access.

 

2.       Vital margins:  This is the hardest.  To preserve information at the margins requires us to think carefully about priorities, to identify the marginal items of highest value, and to convoke the authority and influence of "mission-based" institutions in society (universities, libraries, foundations) to identify and fund their inclusion in the more naturally self-sustaining projects of publishers and libraries.

 

3.       Ownership:  The same strategy can be invoked to address anxieties of ownership.  We need some large-scale partnerships, not just experimental projects but large-scale prototypes that declare openly and publicly that publishers, learned societies, scientists, and libraries recognize that we are all in this together for the long run and that we accept that fact.  The current excitements over the "Public Library of Science" should not distract us from the underlying commonalities of interests among all the players in this arena.  Declaring by our deeds that we recognize not only our differences but also our fundamental collegiality will take us furthest, fastest.
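
The case for distributed, multi-location archives made under "Disaster" can be put in rough numbers.  What follows is a minimal sketch and not a model proposed in this talk: it assumes that each copy is lost independently of the others and with the same probability over some period, and the function name and the sample figures are hypothetical.

```python
def probability_all_copies_lost(per_copy_loss_probability, copies):
    """Chance that every one of `copies` independent copies is lost.

    Assumes independent, identically likely losses, which holds only
    when the copies do not share a building, an operator, or software.
    """
    if not 0.0 <= per_copy_loss_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if copies < 1:
        raise ValueError("there must be at least one copy")
    # Independent events: multiply the per-copy probabilities together.
    return per_copy_loss_probability ** copies


# Hypothetical figure: a 1-in-10 chance of losing any single copy.
for copies in (1, 3, 12):
    print(copies, probability_all_copies_lost(0.1, copies))
```

Independence is the assumption that matters.  Copies kept in one building, or on one organization's systems, tend to fail together, which is why the redundancy has to span locations and institutions.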

 

I suggest that libraries and publishers begin meeting not to discuss digital preservation.  If we confine ourselves to that issue, we can spend many years edging closer to a perfect solution, while more and more information is put at risk.  I suggest instead we discuss actively the question of just how rapidly we can move to discontinuing print subscriptions and print publication of key journals.  There is no action that we can take that would have such a dramatic and visible effect on the people who use information and the people who pay for it.  What we need to do, paradoxically, is to win the confidence of a broader public by displaying confidence ourselves.  To do so will arouse outrage, of course, and concerns about the "digital divide."

But no other type of action would force attention to questions of reasonable access to digital information in both more and less privileged settings (whether small organizations in first world countries or large ones in the "third world").  The summer of 2001 was replete with press releases from the World Health Organization, the Soros Foundation, and other groups, in partnership with learned publishers in biomedical fields, making available high-quality electronic information, including peer-reviewed journals, entirely free or very cheaply to scores of developing nations.  This new set of partnerships and electronic services to poorer nations will hasten our need to develop electronic preservation solutions and it will set an example for access for those of us who are more privileged.

 

For in the end, we need to recognize that our issues of trust -- whether of each other or the digital medium -- truly are difficult, because those of us who prescribe trust as a solution must also learn to practice it as a discipline.  We have reached a point, I believe, where the parties to all these debates can and should look at each other, shrug, laugh, say something like "What the heck!" and get on with the business of making the future happen now, imperfectly and progressively, rather than wait for perfect solutions.  The trust I urge here, in other words, is not the trust of abstract people out there, but the trust that people in this room for this conference will place in the possibility of change, progress, and community.