Re: MARC records for LION database
Dear Corinna,
Since this mail has been posted to a number of listservs, I'm
taking the liberty of posting a response to the LES, liblicense
and AUTOCAT lists, with apologies to the many recipients who will
therefore receive this message three times. We take your concerns
very seriously, and would welcome any feedback from the library
community on these issues, either in direct response to these
listservs, or via the kind of collective consultation that you
describe in your mail.
Please accept my apologies for the fact that your initial queries
were not all dealt with in a timely manner, and my assurances
that we aim to respond to all customer queries promptly. Your
document raises a number of important issues, which deserve a
full response: I have therefore attempted to address the broader
editorial questions in the body of this email, and have attached
a document which deals with each of the specific points. Some of
these points are clearly errors on our part, which we are happy
to correct (corrected records will be made available in our
December application release); others, however, raise quite
fundamental editorial or procedural questions, and it is in these
areas that we would particularly welcome input from the members
of these lists.
In response to your first point, about the choice of bibliographic unit,
I think the important distinction to be made is that Literature Online
(LION) is a specialist database of literary texts, unlike Early English
Books Online (EEBO), which is an archive of print volumes in digital
facsimile. The texts are of course originally taken from print volumes,
but the basic unit of the database is the new electronic file that we
have created, rather than, as it is in EEBO, the source printed volume.
In some cases, an electronic file in LION corresponds to one print
volume, but in many cases poems, plays and other works have been
extracted from larger print volumes to create individual files. I agree
that it is therefore misleading to describe our MARC records as
representing the 'volume' level, and we will correct the text on this
page accordingly. By 'volume' we meant electronic file (as opposed to,
say, the individual poems contained within a file), but this needs to
be corrected to avoid the inference that each MARC record corresponds in
all cases to a print volume.
Since EEBO is an archive of print volumes, its MARC records are
effectively book records; LION's, by contrast, faithfully
catalogue the electronic files, which are unique, editorially
created entities. The conventions used in creating the records
are therefore quite different from those adopted in EEBO. LION's
MARC records are provided without charge as finding aids for the
electronic texts in LION, and do not claim to contain the kind of
expanded bibliographic information of the source volumes that you
might expect from a complete catalogue record for a printed book.
They are of course created in accordance with cataloguing
standards, and contain the full bibliographic information
contained within the source texts, but the way we represent the
relationship between the file's contents and the original volume
differs in many cases. The addition of further information that
is not present in the source texts, such as uniform titles,
subject headings, systematic identification of subtitles, and
standardization of the use of brackets, would be a substantial
undertaking, which would probably necessitate either charging for
the records (as we do for the EEBO records), or entering into a
partnership project with external cataloguers. We already know of
at least one librarian who has undertaken some of this work on
the LION MARC records, and we are keen to find the most
appropriate way of sharing and disseminating this enriched data.
Most of the bibliographic inconsistencies which you have
identified relate to how the data was created, rather than how
the records were created. Literature Online was not created all
of a piece: the 16,000 files are taken from 19 separate
electronic collections, published over the course of 15 years,
each of which had its own editorial policy, and most of which
were originally published on CD-ROM with no expectation that they
would one day be cross-searchable. Although each original
collection is internally consistent, there are many editorial
differences between collections in areas such as the title field,
often determined by the practicalities of searching in a
drama database as opposed to a poetry database. In some cases the
data was digitised by collaborating academic institutions, who
made completely different editorial decisions in these areas, and
we have preserved those decisions rather than standardising with
our own policies.
Our current policy for new Chadwyck-Healey collections (such as
the African Writers Series) is to include the full contents of
the print volumes wherever possible. However, this was not
feasible or appropriate for earlier collections, which has left
us with a legacy of inconsistencies across the contents of LION.
We have a long-term aim of standardizing the bibliographic data
in LION: this would involve re-structuring the data and search
functionality, modifying the file titles and bibliographic
headers, and using this new structure as the basis for the MARC
records and Z39.50 database. Clearly, this would be a
considerable task: before embarking on it, we would need to be
sure that we were taking the right approach and providing the
data in the most useful way for our customers. We would therefore
be grateful for any suggestions in this area.
I look forward to hearing your thoughts on these matters, and to working
with you and your colleagues to help improve the service that we
provide.
Best regards,
Matt Kibble,
Development Manager, Literature,
ProQuest Information and Learning
Cambridge, UK
http://lion.chadwyck.co.uk
http://lion.chadwyck.com
-----Original Message-----
From: owner-liblicense-l@lists.yale.edu On Behalf Of Corinna Baksik
Sent: 07 October 2006 01:16
To: liblicense-l@lists.yale.edu; AUTOCAT
Subject: MARC records for LION database
[please excuse cross-postings]
I would like to publicly raise concerns regarding the MARC
records for the full-text titles in the LION database (Literature
Online). The MARC records (over 16,000) are available from the
vendor at no additional cost to subscribers to the full database.
Our intention at Harvard was to load these records into our
catalog, but close analysis reveals that they are problematic and
of poor quality. I have written a document describing the
problems in detail and posted it here:
<http://ois.harvard.edu/%7Ecorinna/docs/LION_problems.pdf>
I am interested in whether other libraries would like to approach
the vendor as a group and work with them to address these issues.
It is my understanding that this is a popular database and good
MARC records would be very valuable to subscribers. I would
appreciate any comments or suggestions you have. We are
investigating whether resolution of these problems can be brought
about through license negotiations, but the more subscribers that
are concerned about the quality of the records, the better.
In short, there are three issues that concern me most:
1) The truncation of titles in the 245, e.g. the MARC record
contains "The poems" when the original work is entitled "The
poems of Maria Lowell."
2) The inconsistent use of brackets in the title field:
[Poems, in] The loyalist poetry of the Revolution
The word of Congress ; the factious demagogue. a portrait [In, The loyalist poetry of the Revolution]
3) Lack of uniform titles, e.g. MARC record contains "The
tragedie of King Richard the Second" and no uniform title for
"King Richard II."
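As a rough illustration only, issues 2 and 3 above could be screened for mechanically along the following lines. This is a minimal sketch under stated assumptions, not a description of how the vendor or any library processes these records: each record is assumed to be a simple mapping from MARC tag to field text (a hypothetical shape; real records would be read with a MARC library), and the function names are invented for the example. Issue 1, truncation, cannot be detected from the record alone, since it needs the source title for comparison.

```python
import re

# Hypothetical, simplified record shape: MARC tag -> field text.
# The titles are the examples quoted in the message above.
SAMPLE_RECORDS = [
    {"245": "The poems"},
    {"245": "[Poems, in] The loyalist poetry of the Revolution"},
    {"245": "The word of Congress ; the factious demagogue. a portrait "
            "[In, The loyalist poetry of the Revolution]"},
    {"245": "The tragedie of King Richard the Second"},
]

LEADING_BRACKET = re.compile(r"^\[[^\]]*\]")
TRAILING_BRACKET = re.compile(r"\[[^\]]*\]$")


def bracket_style(title):
    """Classify where a bracketed phrase appears in a title string."""
    title = title.strip()
    if LEADING_BRACKET.search(title):
        return "leading"
    if TRAILING_BRACKET.search(title):
        return "trailing"
    if "[" in title or "]" in title:
        return "other"
    return "none"


def has_uniform_title(record):
    """True if the record carries a 130 or 240 (uniform title) field."""
    return "130" in record or "240" in record


def report(records):
    """Return (title, bracket style, has uniform title) for each record."""
    rows = []
    for record in records:
        title = record.get("245", "")
        rows.append((title, bracket_style(title), has_uniform_title(record)))
    return rows


if __name__ == "__main__":
    for title, style, uniform in report(SAMPLE_RECORDS):
        print(f"{style:8} uniform={uniform}  {title}")
```

Grouping the output by bracket style would show at a glance how many conventions are in use across the file, and the uniform-title flag gives a count of records lacking a 130 or 240 field.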
Please feel free to contact me on or off list. I will summarize
feedback.
Thank you,
Corinna Baksik
Systems Librarian
Harvard University Library