During the last year of World War II, Congress established a permanent committee of the House of Representatives to consider subversive activities and Communist controls. The Committee on Un-American Activities (HUAC) had been in existence as a special committee of the House since 1938 (the Dies Committee). Over the course of the next 25 years (HUAC was terminated in February 1969 when the name was changed to the Committee on Internal Security) the Un-American Activities Committee responded to rising tensions between the United States and the Soviet Union. The outbreak of the Korean War in 1950 increased existing pressures for anti-subversive legislation. Congressional investigations were held on Communist infiltration of the movie industry, of the Federal government and its employees, and of international and political organizations. Hearings on Communist front organizations and Communist propaganda activities in many metropolitan areas, including New Haven, were published during the 1950s.
The House Committee on Un-American Activities is a perennial subject of interest among undergraduate and graduate students writing senior essays and dissertations. It attracted an unusual amount of interest following September 11, 2001. Although some of the hearings have indexes of names and organizations, many do not. During part of this particular period of Yale government documents history (1923-1948), the hearings were bound together in a manner that requires an intermediary finding aid. Most of the 82nd through 86th Congresses’ publications (1951-1960) are available only on microfilm.
We are proposing this collection of materials for a digitization project because of the high level of interest and use, and because the variety of formats (paper, bound volumes, and microfilm) makes the collection difficult to use. In addition, because US government documents are in the public domain, digitizing this material raises no copyright issues.
There is also interest in this project among other institutions. Other university libraries may in the future contribute to the project by digitizing hearings in the series that Yale does not own.
We will select twenty hearings based on importance, interest, and length. At the same time, we will analyze the five hearings that have already been scanned this year to determine the best way to present the text. At present we plan to code the material in XML using a TEI-based DTD. The material resembles a play in many aspects, with “lines” attributed to various speakers. Further research may uncover a more suitable DTD for these types of government documents.
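The play-like structure described above can be illustrated with a minimal sketch in Python. It wraps OCR'd transcript lines in the TEI drama elements `<sp>`, `<speaker>`, and `<p>`. The speaker pattern (a title, a surname in capitals, then a period) is an assumption about how the hearings are printed, and the function name and sample lines are hypothetical; the actual DTD and tagging rules would come out of the analysis of the scanned hearings.

```python
import re
import xml.etree.ElementTree as ET

# Assumed printing convention: a speech opens with a title, a surname in
# capitals, and a period, e.g. "Mr. SMITH. Please state your name."
SPEAKER_RE = re.compile(r"^((?:Mr|Mrs|Miss|Dr)\.? [A-Z][A-Z'\-]+)\. (.*)$")


def transcript_to_tei(lines):
    """Wrap transcript lines in TEI-style <sp>/<speaker>/<p> elements."""
    body = ET.Element("body")
    current_p = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_RE.match(line)
        if match:
            # A new speaker label starts a new speech.
            label, text = match.groups()
            sp = ET.SubElement(body, "sp")
            ET.SubElement(sp, "speaker").text = label
            current_p = ET.SubElement(sp, "p")
            current_p.text = text
        elif current_p is not None:
            # A line without a speaker label continues the current speech.
            current_p.text += " " + line
    return body


# Invented sample lines, not taken from any hearing.
sample = [
    "Mr. SMITH. Please state your name.",
    "Mr. JONES. I will do so,",
    "for the record.",
]
print(ET.tostring(transcript_to_tei(sample), encoding="unicode"))
```

A pattern this simple would miss labels such as "The CHAIRMAN", which is one reason the tagging workflow would still need a student to review the markup by hand.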
We will hire student assistants to scan the material. They will also perform optical character recognition on the scanned images and mark up the material in XML. We will purchase OCR and XML editing software to facilitate this process. This software will reside in the Digital Conversion Facility, where it can be used in the future by other units of the library as well. We know from previous experience with scanning five sample hearings that scanning is not very time-consuming. With help and training from the staff in Electronic Collections, there will be plenty of time available for students to mark up the material.
The digital material will be kept on the Jeeves server, under the auspices of Electronic Collections. It will also be backed up on CD-ROM.
The Government Documents and Information Center staff will create full bibliographic records in ORBIS for the documents selected that do not currently have online catalog records.
While the material is being tagged, we will experiment with access methods for the hearings. Once the hearings are in XML they can be posted on the web for text searching, along with PDFs of the scanned images. However, there are other possibilities for taking full advantage of the marked-up text. We will explore the possibilities of using the Metadata Encoding and Transmission Standard (METS, http://www.loc.gov/standards/mets/) combined with XSL style sheets to provide access to both the images and to full-text searching. METS stores appropriate metadata along with the digital objects in XML, allowing all the information to be used in different ways. The HUAC hearings may also be appropriate for delivery through the Digital Library eXtension Service software, currently in use on the Jeeves server for some locally loaded full-text databases such as the Patrologia Latina (http://jeeves.library.yale.edu/pld/). The hearings would then be delivered in an interface that resembles the Making of America (http://moa.umdl.umich.edu/).
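To make the METS option more concrete, here is a rough sketch in Python of a METS wrapper that ties each page image to its encoded text. It uses the METS `fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, and `fptr` elements. The function name, file names, ID scheme, `USE` labels, and the choice of TIFF page images are all assumptions made for illustration, and descriptive metadata sections are omitted; a real profile would be worked out during the METS implementation phase.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Return a namespace-qualified METS tag name."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(label, pages):
    """Build a METS tree; pages is a list of (image_href, text_href) pairs."""
    mets = ET.Element(q("mets"), {"LABEL": label})
    file_sec = ET.SubElement(mets, q("fileSec"))
    # Hypothetical USE values; one file group per kind of content.
    image_grp = ET.SubElement(file_sec, q("fileGrp"), {"USE": "page images"})
    text_grp = ET.SubElement(file_sec, q("fileGrp"), {"USE": "encoded text"})
    struct_map = ET.SubElement(mets, q("structMap"), {"TYPE": "physical"})
    hearing = ET.SubElement(struct_map, q("div"), {"TYPE": "hearing", "LABEL": label})

    for n, (image_href, text_href) in enumerate(pages, start=1):
        page = ET.SubElement(hearing, q("div"), {"TYPE": "page", "ORDER": str(n)})
        for grp, prefix, mime, href in (
            (image_grp, "IMG", "image/tiff", image_href),  # assumed image format
            (text_grp, "TXT", "text/xml", text_href),
        ):
            file_id = f"{prefix}{n:04d}"
            f = ET.SubElement(grp, q("file"), {"ID": file_id, "MIMETYPE": mime})
            ET.SubElement(f, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
            # The page division points at both the image and the text file.
            ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return mets


# Hypothetical file names for a two-page sample.
doc = build_mets("Sample hearing", [("0001.tif", "0001.xml"), ("0002.tif", "0002.xml")])
print(ET.tostring(doc, encoding="unicode"))
```

Because every page division carries pointers to both files, an XSL style sheet could show the page image alongside the searchable text from a single METS record.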
At the end of the year, we will evaluate the feasibility of using student labor for text projects of this nature. We may also have the opportunity to analyze the success of up to three access methods for the electronic text. Finally, we will assess whether this project aids access to this complex collection of important materials.
Timeline:
January - March: select materials for the project, analyze and develop DTD further
March - May: test the DTD by having student workers mark up the already scanned material.
May - June: tweak DTD, evaluate tagging process and workflow.
July - September: using summer students, continue tagging, which may continue into the fall, contingent on finances
July - November: work on METS implementation, including metadata and style sheets. In addition, explore the possibility of implementation on the Jeeves server.
December: evaluate project.
Item | Calculation | Cost
Student Labor (scanning, markup) | $10.50/hr x 6 hours/week x 16 weeks (96 hours total) | $1008
Software: XMLSpy, Textbridge (for OCR) | $99, $73 | $172
Total: | | $1180
For researchers: This project will provide enhanced access to these materials and enable researchers to use the HUAC hearings more easily than they can at present.
For Yale University Library: This project will enable Yale to explore methods for delivering electronic text.