Global Resources Network :: Conference at Yale March 2005

Priscilla Offenhauer, Ph.D., Research Analyst, Area Studies, Federal Research Division, Library of Congress

 

"Fugitive Sources and Web Archiving"

 

Dr. Offenhauer

 

 

I'm speaking as a researcher who uses web-based material, not as someone well versed in the technical aspects of the universe of Web Archiving. Let's say I am relatively archive naïve. That is, I don't know how current web archiving projects are going to fix my problems or address my concerns. My role here is to voice my sense of these problems as a researcher.

 

Background as Historian and Work as Research Analyst: Problem Context

 

A quick description of my background and what I do now will put my sense of a researcher's problems in context. I wear two hats that shape my perspective: historian and research analyst. By background I am an historian of European intellectual and social history, and as such I am interested in all sorts of evidence and its durability for the sake of the historical record and future scholarship. I want to know how social historians of the future will have access to digital ephemera, the same kind of access to unexpected kinds of evidence out of which they make hay now: written ephemera like letters, purchase orders for food in boys' schools in 1800, pay records, diaries, or images. Or how about the historian 50 years from now who wants to understand the thinking of European Muslim radicals in the 1990s, individuals who maintain personal web-sites that will be the main evidence about their thinking? Or, even more problematic, the thinking of some as yet unrecognized militant group whose members maintain personal websites, or participate in communications that they want to make fleeting?

 

Those are the sorts of concerns I have as someone with training in history: concerns about future scholarship.

 

Although the perspective of the historian informs my current work as a research analyst, I also have another set of concerns, somewhat more short-term and more related to current policy development. In my present work, I tend to address here-and-now research questions.

 

So my issue is not about the disappearance of evidence upon which I might like to work someday. To the contrary, I am generally looking for the most up-to-date information. This is why I rely so heavily on on-line materials. Such use enables me to by-pass bottlenecks in library processing and in other means of access like purchase from an agency or international source.

My concern, in my present-oriented work, is the lack of accessibility of that evidence: the lack of easy access to my sources by the reader of my reports. I have issues around citations to material that disappears or at least cannot be relocated as I cited it, issues about persistent sourcing for current research. I work now in a research institute within the Library of Congress producing substantial research reports, mostly for other government agencies. We draw upon LOC resources but are otherwise not involved in the functioning of the Library.

 

 

The Research and How Clients Use it

 

Examples of topics on which I have written: Current Chinese immigration and emigration, a book-length organizational history of two decades of a huge training program for generals in a branch of the US military, the future of the German military, a country profile on Kenya, the Dutch and Danish space programs, the worldwide phenomenon of human trafficking, and social science research on Muslim women. What I produce has different uses for the clients. Sometimes the work constitutes background information for policy formulation, say, the background on space programs so that NASA knows about potential international partners. Sometimes my reports become reference material for public use, as in the Kenya profile which is posted on the Web at the Library of Congress, and sometimes my findings are the basis for further work in the client agency, as in the case of trafficking, where I gathered wide-ranging gray literature, and the client developed a complex database to accommodate and manipulate it, as well as a full text archive backing up the database.

 

In all cases, I obviously want my evidence to be checkable, verifiable, and possibly a jumping-off point for further research by my reader. I envision my readers as active users in the same way I did when I was writing for a community of historians.

 

At present, I find myself perpetually uneasy in ways I was not before research became so dependent on Web-based materials. Almost all my citations are to Web-based materials. I give the URLs and sometimes the date accessed. In doing so I have the feeling I am participating in a bit of a game in which I am sending the reader on a wild goose chase if the reader wants to track down my source. I go through the same scholarly motions that I did when my access to a source was the same as that of a reader who wanted to check. However, the stability that my mode of citation presupposes no longer exists. Now I cite sources knowing that much of what I cite is not permanently where I say it is going to be. I cite knowing that my source may not be readily available to my reader.

 

To be sure, I recognize that citations made to the Web are not all created equal with respect to Web-site persistence or the challenge they pose to my scholarly conscientiousness. My uneasiness is more warranted about some types of materials than others. There is a continuum of fugitivity. The questions worth considering with respect to my uneasiness are "What factors determine the reliability of my sources?" or, alternatively, "What factors take away the accessibility of my sources to my reader?"

 

 

What Factors Determine the Reliability of My Sources? What Factors Take Away my Sources?

 

A good proportion of what I use for a project relies upon web materials that are posted by authoritative and enduring organizations: government reports, reports by international organizations, and reports by human rights organizations like Human Rights Watch. Much of what is accessed through such sites consists of digitized versions of print works; some of it is "born digital." Whether born digital or not, the materials themselves are substantial reports and tend to have gone through the normal vetting process that is involved in print publication as opposed to creation on the Web. The materials are likely to be recognized as valuable and as candidates for permanent capture somewhere, at the very least within the producing organization itself and, I gather, they are likely to be deposited somewhere as well. I can assume with some confidence that the material is surviving somewhere on the Web. I assume that somebody is taking care of preserving what I have relied upon, but I don't routinely ascertain whether that is true. I am typically ignorant of what my reader's luck will be in accessing my cited materials.

 

This probable survival of much I use means that my uneasiness concerning reference and citation often has more to do with the problem of materials changing address than of disappearing altogether. My problem is more one of directing readers to the wrong place. As you probably know, to get around this problem of mis-addressing sources, a number of journals insist that their authors cite non-Web versions of papers (I know of several through a friend in the field of ecology: The Journal of Economic Entomology and the Canadian Journal of Forest Research). However, if I were to cite only non-Web versions of materials, I would falsely represent my research method, and would have to ignore born digital materials.

 

While true fugitivity doesn't apply to much of what I cite, even if changing addresses are a problem, transience (lack of Web-site persistence) is more real for a portion of what I cite. One example that I've already mentioned, not from my own work, is the personal and organizational Web-postings of radical groups that don't want to be found. I encountered such fugitivity in a big way in connection with my work on human trafficking. I sought all the information I could find on the magnitude of cross-border human trafficking worldwide. I used all sorts of sources of the kind I mentioned before (materials from authoritative institutions, governments, the U.N.) but, in addition, I used Web-based materials from much less substantial NGOs, small groups of grassroots activists in Nigeria, Ukraine, and Thailand. These groups don't have enough stability to guarantee the longevity of their postings, which indeed may disappear from the Web and vanish altogether for lack of anyone to attend to their preservation. There was similar unreliability in small branch offices of international organizations. What such small entities produced was on-the-ground information about the local picture on trafficking (complete with numbers and details about cases). In the case of this trafficking project, as I mentioned, the client ultimately took the bull by the horns and provided full text access to the materials from which the client-created database drew. That provision of full text was to ensure that any user of the database could easily check the judgments made in entering information into it. The provision of full text was at least a local solution to that continuum of fugitivity.

 

Besides true lack of persistence of Web-based material, other major factors that take away my sources (and make me uneasy) are various kinds of barriers to the reader's access to what I have used as sources. There are several kinds of barriers that I am aware of as I cite material. One big one is commercial inaccessibility and another is national security. A familiar and unproblematic example of commercial barriers is the New York Times archive. Any article that is a few days old costs something to retrieve. That is unproblematic for me, because I know the reader is not closed out. Many other digital materials I use are available through library accounts: ProQuest, Lexis-Nexis, FBIS, e-journals. The materials are persistent (they have already been selected for preservation in databases), but accessibility to the reader is another question.

 

On the question of security-related barriers, I will mention one experience of a colleague not in the Library of Congress, and then conclude with a case of one big study that I did, which illustrates a number of points I have been making and has a security-related punch-line.

 

My colleague was working on Web-based materials on a nation's military. Although the sites were official postings, she repeatedly had the experience of the sites going down after she had visited a few times. She and colleagues who were knowledgeable in computer and web technology concluded that their activity accessing the sites was somehow detected and prompted blocking of the sites.

 

 

Cases: Organizational History of a Training Program

 

My somewhat more elaborate security-related tale has to do with a history I wrote on a huge training program for generals in a branch of the US military. This is a training program in which exercises involve 5,000 to 7,000 people. The people are set up in field headquarters with all their communications equipment, etc. They communicate back and forth about what is going on on the battlefield based on reports they are getting about it. They don't actually see it, however, because actual battlefield events are simulated. Anyway, it is a very elaborate program, and I was to trace its origins and development up to the present, including the history of the simulation.

 

The training program was conceived in the early 1980s and formed in 1986. The evidential materials from which I worked for the program's early years were mostly print materials, some of them tossed together in cardboard boxes in the basement, including interviews, slide presentations, journal and newspaper articles, and some annual organization histories compiled by the umbrella organization's historian. For several years of the early 1990s, the annual histories became available both in print and as digitized versions on the Web. After 1994, budget cuts eliminated the organization's historian and archivist, and I pieced things together with heavy reliance on disparate Web materials and postings. The materials became truly patchy. An increasing percentage of material was born digital, as well as ephemeral, for example, schedules of planning meetings of different groups within the organization. Much of my source material was raw, transitory material that would be unlikely to be seen as valuable and worthy of digital preservation.

 

The final straw for the issue of fugitivity and accessibility, however, was that the materials I used all became inaccessible shortly after September 11th, 2001. All of my many footnotes with URLs became immediately inoperable. I have saved everything in a personal archive, as I always do, but that won't help the reader. The branch of the military still had the material, but access now required clearance, a fairly stiff bar.

 

As I was writing this, I had revelations about things that would help with what I go through. If my problem is not so much the persistence of Web materials per se, but the fiction of citing them as if they are like anything else, i.e., stable, I need new ways to tell my reader how to find my evidential material. URLs and personal archives are not enough.

 

A second and related point: I need a better understanding of what the fate of my source materials will be. This, it seems to me, will need to be part of the training of researchers. Where does stuff go in general? Where is the elephant graveyard? Also, more particularly, for individual documents on web sites, is anyone envisioning tags that will identify the fate of web materials and that could be used to clue in researchers? If such things already exist, it is not shared knowledge outside the digital archiving community. If permanent identifiers are being envisioned, researchers need to know.

 

 

 

Priscilla Offenhauer is Senior Research Analyst in Area Studies at the Federal Research Division of the Library of Congress. The Federal Research Division does research projects on contract for other government agencies using the area studies resources of the Library of Congress. She joined the Federal Research Division in 2000 as a specialist in European affairs. She holds degrees from Cornell, Duke, and Boston University, including a Ph.D. in European Intellectual and Social History, with an emphasis on mid-twentieth century critical social theory. Prior to taking up her current research position, she was a university professor and research consultant, teaching most recently at Tufts University on comparative economies and the changing nature of work. At the same time, she was a researcher at Harvard University on international development and business/labor legal issues. Earlier teaching appointments include Boston University in history and social science, and Bonn University, Germany, in comparative literary studies.

Her research at the Federal Research Division focuses on topics of interest to U.S. government agencies and usually involves an international dimension, for example, country profiles of Kenya and France, Chinese migration, the future of the German military, Muslim women, an organizational history of a major military training program for generals, and studies on human trafficking across international borders.

 

© 2005 Yale University Library