The Japanese Newspapers Project: Final Report
Submitted by Kalee Sprague
on behalf of
Meng-Ghon Y Tang, Karen Reardon, Audrey Novak, and Sekiko McDonald
Summary
The goal of the Japanese Newspapers Project was to provide vernacular access to Yale’s Japanese newspaper collection, as a means of experimenting with the storage, indexing, and display of non-Roman characters. We were particularly interested in exploring the current level of software support for the CJK sections of the Unicode standard. The Unicode standard has achieved a high level of visibility in the library and computing communities in the past several years, as software vendors from Microsoft to Netscape have promoted their products by claiming conformance with the standard. During the course of our project we discovered that current Unicode support is, in reality, inconsistent and unreliable. However, with many workarounds and some problem-solving help from Microsoft, we were able to create a Japanese bibliographic database with a dynamically generated web interface.
The Project Plan
The original project plan consisted of the following phases:
Research
Research into current Unicode support took several forms. We obtained a copy of the full Unicode standard and did an extensive literature search on current software support for CJK characters. It quickly became apparent that the web browsers and the version of Microsoft Access available in 1999 did not support 16-bit Unicode characters. Microsoft Access 2000 and Internet Explorer 5.0 both promised to include extensive support for the display and storage of CJK characters in Unicode. We hoped that Netscape would also make similar strides in Unicode support over the course of the year.
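To illustrate what 16-bit support means in practice, the following minimal Python sketch encodes a sample Japanese title as UTF-16 and shows that a single-byte encoding cannot represent it. It is not part of the project, which relied on Access 2000 and Internet Explorer; the sample title and the helper name fits_in_single_byte are illustrative assumptions.

```python
# Hypothetical sample title; not taken from the project database.
title = "朝日新聞"

# Every character here lies in the Basic Multilingual Plane, so UTF-16
# stores each one as a single 16-bit code unit (two bytes).
utf16 = title.encode("utf-16-be")
code_units = [utf16[i:i + 2].hex() for i in range(0, len(utf16), 2)]


def fits_in_single_byte(text):
    """Return True if text can be stored in the 8-bit Latin-1 encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(len(title), "characters,", len(utf16), "bytes in UTF-16")
print("code units:", code_units)
print("fits in one byte per character:", fits_in_single_byte(title))
```

Software that assumes one byte per character fails on such text, which is why both the database and the browser had to handle 16-bit characters before the project could proceed.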
Our investigation into existing data translation tools was also enlightening. There were no freely available tools for converting a MARC record from RLIN extended ASCII to Unicode. An RLG representative made it clear during an ALA program on Unicode last summer that RLG would not seriously investigate exporting Unicode records until the majority of library management systems had adopted the standard. A new standard for encoding MARC records in Unicode is now available in the MARC21 document issued by the Library of Congress in January 2000. Eventually this document will encourage the widespread adoption of Unicode. In the meantime, the simple translation tools available in products like Microsoft Word 2000 are insufficient for converting MARC records to MARC21 Unicode records. Ultimately our group decided that the most realistic option for building the Japanese Newspapers database was to key simplified records into Access 2000.
Installation of Japanese Fonts
Meng-Ghon Y Tang set up a workstation with Office 2000, and began to install the supplemental fonts and resource files necessary to support the input and display of Japanese characters. What promised to be a simple installation quickly became a complex process. Meng has created a detailed instruction sheet outlining the necessary steps for consistently displaying Japanese records on a workstation. These instructions will soon be available on the Workstation Support Group web page.
Creation of a Unicode-compliant Database
The next step was the creation of a Unicode-compliant database in Access 2000. Our plan to key records into the newly installed Access 2000 soon ran into an obstacle: Japanese characters could not be keyed directly into Access 2000. We discovered that they had to be created in Microsoft Word or the Microsoft IME, then pasted into Microsoft Access.
Before we began keying in the records, Sekiko McDonald from the East Asian cataloging team identified the Yale Japanese Newspaper catalog records in RLIN. Since the records were being manually keyed, we decided to create a brief record consisting of key bibliographic fields and a link to the full record in Crossplex Orbis. Kalee designed the Access data table consisting of:
Meng Tang keyed in the Japanese Title and Transliterated Title. Kalee created the link to the full record in Crossplex Orbis, and Katherine Eigen, a WSG student worker, keyed in the remaining record fields.
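A brief record of this kind might be modeled as in the sketch below. It uses Python's built-in sqlite3 module as a stand-in for Access 2000, and the table name, column names, and function names are assumptions. Only the three fields named in this report are included; the remaining record fields are not listed here and are left out.

```python
import sqlite3


def create_brief_record_table(conn):
    """Create a table holding only the fields named in the report."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS brief_records (
            id INTEGER PRIMARY KEY,
            japanese_title TEXT NOT NULL,        -- vernacular title
            transliterated_title TEXT NOT NULL,  -- romanized title
            full_record_url TEXT                 -- link to the full catalog record
        )
        """
    )


def add_brief_record(conn, japanese_title, transliterated_title, full_record_url):
    """Insert one brief record and return its row id."""
    cur = conn.execute(
        "INSERT INTO brief_records "
        "(japanese_title, transliterated_title, full_record_url) "
        "VALUES (?, ?, ?)",
        (japanese_title, transliterated_title, full_record_url),
    )
    conn.commit()
    return cur.lastrowid
```

SQLite stores text as Unicode, so the vernacular title needs no special handling in this sketch. The values passed to add_brief_record would be supplied by whoever keys the records.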
Creation of a Dynamic Web Interface
Karen Reardon then installed Access 2000 on a Windows 2000 Server running Internet Information Server 4.0, and the database was copied to this machine. After much experimentation and a few pointed questions to the Microsoft help desk, a web interface was created using Active Server Pages (.asp). The interface queries the database and dynamically generates web pages based on the search criteria entered by the user. The interface itself supports searching in Japanese and Chinese. The final interface can be seen at http://hedorah.library.yale.edu/chineseandjapanesenewspapers.html.
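The query-and-render cycle described above can be sketched as follows. The project itself used Active Server Pages against Access 2000; this Python version, with sqlite3 standing in for the database, only illustrates the general pattern. The table layout, sample rows, example.org URLs, and function names are all assumptions.

```python
import html
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE newspapers ("
        "japanese_title TEXT, transliterated_title TEXT, full_record_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO newspapers VALUES (?, ?, ?)",
        [
            ("朝日新聞", "Asahi shinbun", "https://example.org/record/1"),
            ("読売新聞", "Yomiuri shinbun", "https://example.org/record/2"),
            ("人民日报", "Renmin ribao", "https://example.org/record/3"),
        ],
    )
    return conn


def search_titles(conn, term):
    """Find records whose vernacular or transliterated title contains term."""
    # Parameter binding keeps user input out of the SQL text. Note that
    # % and _ inside term still act as LIKE wildcards in this sketch.
    pattern = "%" + term + "%"
    return conn.execute(
        "SELECT japanese_title, transliterated_title, full_record_url "
        "FROM newspapers "
        "WHERE japanese_title LIKE ? OR transliterated_title LIKE ? "
        "ORDER BY transliterated_title",
        (pattern, pattern),
    ).fetchall()


def render_results(term, rows):
    """Build a UTF-8 HTML results page, escaping all values from the database."""
    items = "\n".join(
        '<li><a href="{}">{}</a> ({})</li>'.format(
            html.escape(url, quote=True), html.escape(jp), html.escape(rom)
        )
        for jp, rom, url in rows
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>Results for {}</title></head>\n"
        "<body><ul>\n{}\n</ul></body></html>"
    ).format(html.escape(term), items)


if __name__ == "__main__":
    conn = build_demo_db()
    print(render_results("新聞", search_titles(conn, "新聞")))
```

Declaring the character set in the generated page matters as much as storing the data correctly, since a browser that guesses the wrong encoding will display the titles as garbage even when the database is sound.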
This was one of the most fruitful research and development projects we have participated in. Since we completed the Japanese portion of the project, a small database of Chinese periodicals has been added. Our interface now supports the search and display of both Japanese and Chinese records.
Expenditures
Due to existing software licenses and the availability of student help in workstation support, we did not need any of the grant funds extended to this project.