Scopa Project Summary

Thirteen documents of historical neurological content were chosen for this project, and ten were completed. Documents ranged from two to 35 pages (13.23 average), and most had at least one image (7.3 average). Once a document was chosen, it was requested from the historical library vaults and a copy was made. Because these were older documents, the quality of the copy had to be considered, as a poor visual copy does not scan well. Copy quality is a function of the color and quality of the original as well as the quality of the copying machine. With more contemporary articles or images (less brittle pages or binding), the original can be scanned directly, saving both time and resources. In either case, the scanning glass must be cleaned of dust, paper particles, and image residues to ensure a clean scanned image.

Scanning the document

Each document was scanned using three programs: Omnipage 9.0 (OCR), Adobe Photoshop (PDF), and Adobe Acrobat (PDF). Omnipage was used to scan an original into a workable Word/html format that would be accessible to a search engine. Photoshop and Acrobat were used to produce a picture of the original that could be displayed on the web and then read or copied. Whereas Omnipage makes the words in a specific document searchable, the Acrobat pdf form provides only a picture.

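| Number  | Title             | Pages   | Images  |
|---------|-------------------|---------|---------|
| 1       | MedChTr 1885-68   | 33      | 4       |
| 2       | MedNew 1884-45    | 6       | 4       |
| 3       | MedNew 1904-84    | 4       | 3       |
| 4       | MedTimes 1872-2   | 3       | 1       |
| 5       | NYMedJ 1892-55    | 3       | 1       |
| 6       | MedRec 1889-35    | 4       | 0       |
| 7       | JAMA 1900-34      | 9       | 9       |
| 8       | SurGyOb 1928-47   | 35      | 41      |
| 9       | Lancet 1890-2     | 2       | 1       |
| 10      | MedChTr 1889-72   | 20      | 1       |
| 11      | AmMedSci 1905-130 | 19      | 6       |
| 12      | SurGyOb 1905-1    | 18      | 17      |
| 13      | BritMed 1891-2    | 16      | 7       |
| Average |                   | 13.2308 | 7.30769 |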
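
The averages reported for the thirteen documents can be checked with a short calculation. The following is a minimal Python sketch; the two lists simply transcribe the Pages and Images columns, and the variable names are illustrative rather than part of the original project.

```python
# Page and image counts for documents 1 through 13, transcribed from the table.
pages = [33, 6, 4, 3, 3, 4, 9, 35, 2, 20, 19, 18, 16]
images = [4, 4, 3, 1, 1, 0, 9, 41, 1, 1, 6, 17, 7]

# Arithmetic mean of each column.
avg_pages = sum(pages) / len(pages)
avg_images = sum(images) / len(images)

print(f"Total pages:  {sum(pages)}, average {avg_pages:.4f}")
print(f"Total images: {sum(images)}, average {avg_images:.5f}")

# How many documents contain at least one image.
with_images = sum(1 for count in images if count > 0)
print(f"Documents with at least one image: {with_images} of {len(images)}")
```

The totals are 172 pages and 95 images, which give the averages of 13.2308 and 7.30769 listed for the set; twelve of the thirteen documents contain at least one image.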

Scanning using Adobe

Photoshop provided better tools than Acrobat for editing the picture of the document. However, Photoshop would recognize only one page at a time, so a four-page document had to be saved as four individual pages. Acrobat did not have sufficient editing tools, but it did allow a multi-page document to be scanned and saved. When Photoshop was used first for editing, the individual pages could be opened in Acrobat and merged into one document. Time was then spent making sure each page had similar contrast and dimensions. A four-page document took about one hour to scan into Photoshop, edit, save, and then merge using Acrobat. Additional time was needed to test the clarity and resolution of the picture on the web and to re-edit. Because the pdf format uses space in the temporary folder of the hard drive, more than sufficient free space should be made available for the program to work. Problems occurred with temporary storage (the program would crash), with uniformity in the contrast and size of each page in the document, and with the scanner not working in multi-page format; all of these added time to the process.

Scanning using Omnipage

The scanning process in Omnipage handled the multi-page format better than the Adobe programs, but there were some initial problems with documents of more than ten pages (Omnipage would scan, then either freeze or crash). The Omnipage AutoScan process gives the technician many options for configuring the scan. In some cases, depending on the number of columns or the font, Omnipage would scan like Adobe, giving only a picture of the original. With adjustments to the AutoScan settings, Omnipage would read the page and produce a Word-like document. In older documents, the font was often unrecognizable, and run-on sentences, large spaces between words or sentences, merged columns, merged pages, and merged image captions could result. With better quality copies, Omnipage did a very good job. A four-page document could be scanned in Omnipage, read, and have its images deleted in ten minutes.

Proofreading was the most time-consuming part of the process. Proofreading in Omnipage was possible, but many words would be changed when the Omnipage document (met) was converted into a Word document. For this researcher, it was easier and more time-efficient to clean up the major problems in Omnipage (spaces between sentences, run-on sentences, merged columns, headings, and images), save the result as an Omnipage document (met), and then save it as a Word document. The document was then opened in Word and proofread. In many cases, the document was compared to the original three to four times for proofreading and format. Depending on the quality of the original and the Omnipage document, a four-page document might take twenty to forty hours of proofreading.

After the Word document was formatted like the original (as a journal article) and proofread, it was saved as a web-based document (html). Images then needed to be scanned and inserted into the document. Word can insert a scanned picture directly into the document, but for better visuals the image can first be saved in a picture format (jpeg). In addition, some characters are not recognized when a Word document is converted to html, so the html must be proofread again using Notepad or Wordpad. As a result, converting a four-page document with a couple of images from Word to html may take ten to twenty-five hours of proofreading and editing.
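
The check for unrecognized characters in the exported html was done by hand in Notepad or Wordpad. As an illustration only, the same kind of check could be sketched in Python as below. The function names and the sample sentence are hypothetical and were not part of the original workflow; the sketch assumes that the problem characters are non-ASCII ones such as curly quotes and long dashes, which may not cover every case encountered in the project.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


def to_char_refs(text):
    """Replace each non-ASCII character with a numeric html character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Hypothetical sample line containing a curly apostrophe and a long dash.
sample = "<p>The patient\u2019s reflexes were normal \u2014 no clonus.</p>"

for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X}")

print(to_char_refs(sample))
```

Listing the positions first keeps a person in the loop, which matches the manual proofreading pass described above; the replacement step only rewrites characters so that a browser can display them and does not correct OCR errors.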

Time Management

Thirteen documents were chosen for this project, with a time frame of 1200 hours. Though only ten documents were completed, some work was done on all thirteen. The average time spent on each document was 92.308 hours, though a four-page document might take thirty to fifty hours, while documents with more pages and images could take well over one hundred twenty hours. Some of this time was consumed by training, trial and error, and computer malfunctions.
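
The time figures can be worked through with a small calculation. The sketch below is a minimal Python illustration: the total hours, the document count, and the per-stage ranges for a four-page document are the figures quoted in this summary, while the stage labels and the idea of simply adding the stages together are assumptions made for illustration, since the stages may have overlapped in practice.

```python
TOTAL_HOURS = 1200
DOCUMENTS_CHOSEN = 13

# Average time per document across the whole project.
average_hours = TOTAL_HOURS / DOCUMENTS_CHOSEN
print(f"Average per document: {average_hours:.3f} hours")

# (low, high) hours per stage for a four-page document, as quoted in the summary.
stages = {
    "Adobe scan, edit, merge": (1.0, 1.0),
    "Omnipage scan": (10 / 60, 10 / 60),
    "Proofreading in Word": (20.0, 40.0),
    "Word-to-html cleanup": (10.0, 25.0),
}

# Add up the low and high ends of every stage.
low = sum(lo for lo, _ in stages.values())
high = sum(hi for _, hi in stages.values())
print(f"Four-page estimate: {low:.1f} to {high:.1f} hours")
```

Added this way, the stage estimates come to roughly 31 to 66 hours. The low end agrees with the thirty-to-fifty-hour figure for a four-page document, while the high end exceeds it, presumably because the worst case for every stage rarely occurred in the same document.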

In retrospect, the time management for this project could have been more efficient if the following recommendations had been considered:

  1. The original or copy must be clear, without shadows, black lines, etc., and practice with resolution, contrast, and other editing tools is necessary.
  2. The computer hard drive must have sufficient free space in its temporary folder for image storage, and zip discs should be used to store the final products.
  3. Training in preferred image saving formats (gif, jpeg) would be helpful.
  4. A suggested final "web-aligned" product in PDF and html should be available as a reference for the other documents being converted. An outline of settings for formatting and editing would help make the PDFs and html files more comparable to each other.