Minutes from CDC Digital Collections TF

11/21/03         

 

Guest: Ann Green, Social Science Statistical Laboratory, ITS.

 

Today we discussed the Economic Growth Center Digital Library project. Ann handed out a PowerPoint presentation describing the project.

 

The project grew out of the Mellon Foundation's interest in moving printed statistics to online resources. Don Waters and the DLF had worked with Ann on an earlier statistics project.

 

The EGC collection has existed since 1961 and consists of material from developing countries. The paper the material is printed on can be of poor quality. It is labor-intensive to acquire this material, and it is not indexed. The collection is heavily used and has had a great influence on prominent alumni (such as the former president of Mexico). It is a unique collection.

 

The project is both a preservation and an access project. 

 

Why Mexico: political connections, existing holdings of the material, and a willing curator to help acquire additional material. There have been copyright issues with some of the material, and these are being worked out. The project will need permission not only at the central level but also from the individual states. In the meantime, the material will be available to Yale users only.

 

The project has two parts. First, the material was sent out to be scanned into TIFFs and converted to PDFs, with some metadata provided. This material was OCRed rather than re-keyed because they wanted to test the scraping mechanism. Second, a subset of the material was put into Excel tables; this required nearly 100% accuracy from the OCR, which was one of the challenges. These tables will also be loaded into the NESSTAR system for finer-grained (cell-level) access. The material required DDI metadata, which is very rigorous, and they are testing how much of it they can produce at this stage.

 

We had some questions about the technical details and about their experience with the vendors. We also discussed how the grant developed and what the future holds for the project. This grant will provide services that can be offered to other faculty, who can add to the project as they pursue other grants that use the material. There was some discussion of the data's potential commercial value, but the grant is not meant as seed money for that kind of product.

 

The project added two kinds of value: researchers can now find the data (it was not indexed before), and they can use the data in new ways in the NESSTAR system rather than only in the tables in which it was published.

 

The project is another good example of ITS and the Library working together. 

 

Next meeting: December 5, with the Divinity Library.