XML-encoded finding aids of Holocaust survivor testimonies
==========================================================
Background:
The Fortunoff Video Archive for Holocaust Testimonies (VAHT) holds more than
4,100 testimonies of Holocaust survivors, comprising over 10,000 recorded hours
of videotape. Many of the videos have been transcribed; the transcriptions exist
in various word processor formats, and printed copies are made available to
researchers. These documents include timestamps that indicate at what time
during the videotaped testimony a "speech event" takes place. In this sense the
transcription serves as a "finding aid", permitting the researcher to identify
the section of videotape that is of most interest to her. Over the years, as
scholars and staff have read these finding aids, manuscript annotations such as
corrections, notes and explanations have been added, and these annotations
themselves inform and increase the utility of the finding aid. Such annotations,
however, written in many hands over time, are often hard to decipher. The wide
variation in the printed finding aids impedes researchers in finding the
information they require.
Proposal:
We intend to develop or adopt an XML encoding scheme to structure the finding
aids and their annotations in order to facilitate access and retrieval, and
further to process them uniformly for consistent display in print and on
screen. Owing to the often sensitive nature of information in the finding aids,
we will also implement access and restriction policies, possibly based on
Internet Protocol (IP) address. We envision that the finding aids will be
encoded using the Text Encoding Initiative (TEI) guidelines, following their
provisions for spoken language.
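
As a rough illustration of the kind of structure intended, the sketch below
parses a simplified fragment modelled on the TEI elements for transcribed speech
(`u` for an utterance, `timeline` and `when` for points in time). The speaker
identifiers, time values and sample wording are invented for illustration, and
the fragment has not been validated against any TEI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. Element names follow TEI's transcribed
# speech elements; all content and attribute values are invented.
SAMPLE = """
<text>
  <timeline unit="s" origin="#t0">
    <when id="t0" absolute="00:00:00"/>
    <when id="t1" absolute="00:04:30"/>
    <when id="t2" absolute="00:12:15"/>
  </timeline>
  <body>
    <u who="#interviewer" start="#t0">Where were you born?</u>
    <u who="#witness" start="#t1">I was born in a small town.</u>
    <u who="#witness" start="#t2">We left in the autumn.</u>
  </body>
</text>
"""


def utterances_with_times(xml_text):
    """Return (timestamp, speaker, text) for each utterance, in document order."""
    root = ET.fromstring(xml_text)
    # Map each time point's id to its absolute timestamp.
    times = {w.get("id"): w.get("absolute") for w in root.iter("when")}
    rows = []
    for u in root.iter("u"):
        start_id = u.get("start", "").lstrip("#")  # "#t1" -> "t1"
        speaker = u.get("who", "").lstrip("#")
        rows.append((times.get(start_id), speaker, (u.text or "").strip()))
    return rows


for timestamp, speaker, text in utterances_with_times(SAMPLE):
    print(timestamp, speaker, text)
```

Keeping the timestamps in a separate timeline, rather than inline in the text,
is one way to let the same time points be shared by utterances and annotations.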
Further, we hope to index these finding aids in a database that will allow
searching on specific XML "fields". Database results will also be converted to
HTML on the fly, using server-side scripting. To complement the HTML, we also
aim to create consistently formatted PDF files for print.
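
The search-then-render flow described above might look roughly like the
following sketch. It uses an in-memory dictionary as a stand-in for the
database, so it does not reflect any particular XML database product; the
document identifiers, the element names (`aid`, `name`, `place`) and the
contents are all hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mini-collection standing in for the database.
# Identifiers, element names and values are invented for illustration.
DOCS = {
    "aid-0001.xml": "<aid><name>Witness A</name><place>Town A</place></aid>",
    "aid-0002.xml": "<aid><name>Witness B</name><place>Town B</place></aid>",
}


def search_field(docs, field, term):
    """Return ids of documents with a <field> element whose text contains term."""
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for element in root.iter(field):
            if term.lower() in (element.text or "").lower():
                hits.append(doc_id)
                break  # one match per document is enough
    return hits


def render_html(hits):
    """Render a list of hit identifiers as an HTML list, escaping each one."""
    items = "".join(f"<li>{html.escape(hit)}</li>" for hit in hits)
    return f"<ul>{items}</ul>"


print(render_html(search_field(DOCS, "place", "town b")))
```

Restricting the match to a named element is what distinguishes a "field"
search from a plain full-text search over the whole document.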
Given a significant number of TEI-encoded finding aids, we hope to generate an
EAD finding aid to serve as a Guide to the Holdings of the archive, and possibly
contribute this EAD instance to the Yale Finding Aid Database.
Aim to:
+ Encode approximately 200 finding aids
+ Generate significant portions of testimony metadata, by script, from MARC
  records in Orbis
+ Implement a flexible search interface to a native XML database (access may be
  restricted owing to the sensitivity of certain testimonies)
+ Automatically generate an EAD-encoded finding aid from the encoded finding
  aids
+ Create HTML and PDF versions of the finding aids programmatically
+ Create a sustainable set of tools so that future finding aids can be produced
  as TEI-encoded documents
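
To make the EAD generation step more concrete, here is a minimal sketch that
builds an EAD-style guide from a list of testimony identifiers and titles. The
element names (`ead`, `archdesc`, `did`, `dsc`, `c01`, `unitid`, `unittitle`)
come from EAD, but the output omits the header and other required elements and
is not schema-valid. The identifiers and titles are hypothetical; in practice
they would presumably be extracted from the encoded finding aids.

```python
import xml.etree.ElementTree as ET


def build_ead(collection_title, items):
    """Build a skeletal EAD tree; items is a list of (unit id, title) pairs."""
    ead = ET.Element("ead")
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = collection_title
    dsc = ET.SubElement(archdesc, "dsc")
    for unit_id, title in items:
        # One component per encoded finding aid.
        component = ET.SubElement(dsc, "c01", level="item")
        item_did = ET.SubElement(component, "did")
        ET.SubElement(item_did, "unitid").text = unit_id
        ET.SubElement(item_did, "unittitle").text = title
    return ead


# Hypothetical identifiers and titles, for illustration only.
guide = build_ead(
    "Guide to the Holdings",
    [("T-0001", "Testimony of Witness A"), ("T-0002", "Testimony of Witness B")],
)
print(ET.tostring(guide, encoding="unicode"))
```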
Benefits to department:
+ Uniform formatting of all VAHT testimonies, providing a consistent rendering
  for readers
+ Possibility of integrating with the Yale EAD finding aid database
+ Exploration of the possibility of programmatically creating MARC records from
  the TEI
+ Enhanced transcript management functions, such as ordering/sorting finding
  aids by, for example, name, transcript number, geographical place, etc.
+ Ability to extract data for addition to VAHT/MSSA databases
Wider benefits:
+ Formatted finding aids will look like other printed archival finding aids,
  such as those at BRBL or MSSA
+ Evaluation of a native XML database system that allows complex searching on
  specific "tags"
+ Methodology for dynamic display of XML documents "on the fly"
+ Development of an encoded speech transcription scheme that might usefully be
  adopted by others, for example for encoding oral histories, lectures, or other
  oral performances
Request:
Since we intend to use open source software, and the department has adequate
hardware for the storage and delivery of the encoded descriptions, we are asking
only for student time to TEI-encode the finding aids. Preliminary estimates
indicate an XML encoding rate of 2 finding aids per hour on average.
ca. $1000 for student labor, according to:
200 finding aids (2 per hour) = 100 hours = $1000 (estimating $10 per hour for student labor)
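
The same estimate, written as a small calculation so that the figures can be
adjusted if the encoding rate or hourly wage changes. The function name is
chosen here for illustration only.

```python
def encoding_cost(finding_aids, aids_per_hour, hourly_rate):
    """Return (hours, cost) for encoding a number of finding aids."""
    hours = finding_aids / aids_per_hour
    return hours, hours * hourly_rate


hours, cost = encoding_cost(finding_aids=200, aids_per_hour=2, hourly_rate=10)
print(f"{hours:.0f} hours, ${cost:.0f}")  # prints "100 hours, $1000"
```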