RE: Libraries criticized for role in Google Book Search (long)
Bernie:
Here are my thoughts:
Overall, I think these comments don't reflect the agreements and
facts, or fail to accept that libraries operate with limited
resources.
Respecting the comments that participating libraries "are just
giving away access to one company that is cornering the market on
on-line access," and have fostered the "centralizing and
commercializing [of] knowledge under a single corporate
umbrella," I disagree. The participating libraries did not "give
away" access to Google; they received what they perceived to be a
valuable consideration, in the form of digital copies of those
books (PDF plus OCR plus work-level and structural metadata),
accompanied by what they perceived to be fair usage rights under
the circumstances. (See final paragraph respecting the
circumstances of bargaining.)
Nor is Google "cornering the market on on-line access" to all of
these titles. Respecting the public domain works, in many
instances digital copies are already available on the Internet
from sources such as the participants in the Open Content
Alliance; and under their individual contracts with Google
(which, I believe, will continue to govern the digitized public
domain titles after the settlement becomes effective),
participating libraries may make their digital copies available
to their own patrons and to nonpatrons, through such third
parties as HathiTrust.
Respecting the in-copyright but out of print titles, vendors
other than Google, such as netLibrary, ebrary, and many others,
have digitized thousands of such titles, which presently compete
with Google's digitized copies. In addition, the Google Book
settlement, a nonexclusive agreement, enables participating
libraries to negotiate new digitization agreements with the
copyright owners and vendors other than Google, and facilitates
such new transactions by permitting the Book Rights Registry to
be used in deals with vendors other than Google.
Although Google may have a temporary advantage respecting older
in-copyright and out of print titles, the settlement lowers entry
barriers to that market. Finally, the notion that the Google
Book endeavor either "centraliz[es]" or "commercializ[es]"
knowledge merits some comment. First, "knowledge" is not the
subject of the original Google contracts or the settlement,
because copyright and other property rights in information at
issue here attach at the level of expression, not of knowledge.
All the materials at issue here are readily accessible to
academic users and the public in print or digital format by
avenues unrelated to Google. The reservoir of human knowledge is
not diminished by one drop by virtue of these agreements.
Second, by lowering barriers to entry to the older in-copyright
and out of print market, the settlement arguably will foster
competition in that market, which may increase the dissemination
of those works. If that dissemination leads to greater
knowledge, then the settlement may nurture, rather than
constrain, the growth of knowledge.
Respecting the claim that participating libraries acted "without
concern for user confidentiality," I think the documents read
otherwise. Respecting the University of Michigan's (UM) and
University of Texas's (UT) original contracts with Google,
personally identifying information of patrons may be protected by
the phrase "customer lists" in section 6.1, or, if not, then the
parties may well have thought that no personally identifiable
information of individual patrons would be disclosed to Google
during the digitization process or downstream. In the settlement
agreement, the parties appear to promise to keep confidential
personally identifiable information of patrons in the phrase
"about any customers" in section 15.1 by means of the
confidentiality agreements referred to in section 15.2, and the
auditors will keep such information confidential under a
nondisclosure agreement pursuant to section 8.2(c)(i).
Respecting the claim that participating libraries acted "without
concern for ... preservation . . . or long-term sustainability,"
I think that's inaccurate. Respecting preservation format, the
Library of Congress deems PDF a preferred digital preservation
format for "[t]ext with page-layout rendering," see
http://www.digitalpreservation.gov/formats/content/text_preferences.shtml
The UM and UT original contracts with Google require Google to
give the libraries OCR, page images, and metadata (work-level and
structural); that is, PDF files with embedded text and structural
metadata (connecting text and images). I believe those PDF files
are consistent with the Library of Congress digital preservation
standard. (Note that the LC standard appears to permit PDF
without structural tags, but Google provided structural tags with
the library digital copies; see, e.g.,
http://babel.hathitrust.org/cgi/pt?id=mdp.39015055053659.)
(The settlement does not appear to specify the digital formats
that Google will give Fully Participating Libraries.)
Respecting preservation environment, the UM and UT original
contracts with Google enabled those libraries to transfer their
digital copies to third parties, and UM has transferred them to
HathiTrust, for, among other purposes, preservation. HathiTrust
appears to be pursuing a preservation strategy that complies with
present standards. See http://www.hathitrust.org/objectives .
What's more, the settlement agreement permits each Fully
Participating Library to "reproduce and make technical
adaptations to ... its [library digital copies] as reasonably
necessary to preserve, maintain, manage, and keep [them]
technologically current." Section 7.2(b)(i).
Respecting the claim that participating libraries acted "without
concern for ... image quality," I think the documents read
otherwise. The UM and UT original contracts with Google expressly
give the libraries the right to engage in quality control of the
images by sampling them on a regular basis.
Respecting the claim that participating libraries acted "without
concern for ... search prowess," that's not how the agreements
read or the end-products appear. To the extent that "search
prowess" depends upon both OCR and structural metadata, the UM
and UT original contracts with Google provided for both. To the
extent that "search prowess" depends upon the quality of the
search engine applied to the copies that Google retained, I think
little needs to be said about the quality of Google's current
full text search service. To the extent that "search prowess"
depends upon the quality of the search engines applied to the
library-retained copies, the UM and UT original contracts with
Google permit access through those libraries' own search
services, as well as through services of third parties.
For example, HathiTrust plans to develop advanced search tools
for retrieval of Google library digital copies transferred to it,
including "[r]obust discovery mechanisms like full-text
cross-repository searching." See
http://www.hathitrust.org/objectives. The settlement permits
each Fully Participating Library to "develop or obtain and . . .
deploy finding tools that allow its users to identify pertinent
Books within its [library digital copies] or generate information
from" the same, section 7.2(b)(iv), including search tools to be
used in data mining. Section 7.2(b)(vi).
Respecting the claim that participating libraries acted "without
concern for . . . metadata standards," again this seems
inaccurate. As noted above, the original UM and UT contracts
with Google required Google to provide work-level and structural
metadata with the library digital copies, and this metadata
appears to conform to the Library of Congress's digital
preservation standards. As one can see by viewing the library
digital copies in HathiTrust, those copies are linked to full
MARC 21 bibliographic records (MARC 21 being an international
metadata standard; see http://www.loc.gov/marc/annmarc21.html),
and feature PDF structural metadata (both structural tags
identifying document segments and metadata linking text and
images), PDF being a national digital preservation standard (see
http://www.digitalpreservation.gov/formats/content/text_preferences.shtml).
Respecting the assertion that the participating libraries "chose
the expedient way rather than the best way to build and extend
their collections," this seems too harsh a view of research
libraries with limited cash resources. Authorities seem to say
that the "best" way to digitize text files, if cost is no issue,
is to generate, for each document, both an XML version and a
PDF/A version that contains embedded text with structural tags,
because, among other reasons, between them they preserve both
logical structure and original layout; see
http://www.digitalpreservation.gov/formats/content/text_preferences.shtml.
But creating two separate files for each document is costly, and
arguably beyond the means of many research institutions. LC also
appears to say that PDF/A or one of the other PDF subtypes alone,
without XML, meets its digital preservation standards, even if
the PDF file lacks structural tags. Though I can't tell whether
the Google participant library digital copies are in PDF/A or
another PDF subtype, I can see that they are PDF and that they
have structural tags, and so they appear to exceed LC's baseline
digital preservation standard. So if "best" is defined to mean
meeting national standards given limited resources, the
participating libraries arguably satisfied that definition
respecting building their digital collections.
In terms of extending libraries' collections, if one has
unlimited resources and can fund all digitization oneself, the
best way to use digital resources to extend one's public domain
collections may be to impose no access or distribution
restrictions on the digital copies. However, where research
libraries' cash resources are limited, "best" should arguably be
defined in terms of the most favorable bargain a library, acting
in the interests of its parent institution and patrons, can
strike with a capable digitization outsourcer willing to accept
noncash consideration. A deliverable conforming to standards but
bearing some usage restrictions may well satisfy that definition.
Respecting in-copyright materials, since rights holders will
practically always insist on usage restrictions as a condition of
digitization no matter what the library offers, there's no basis
for faulting the Google library participants for accepting such
restrictions on digital copies of copyrighted works.
-- Rob Richards
The preceding comments are not offered as legal advice and do not
constitute legal advice.
--
Robert C. Richards, Jr., J.D.*, M.A., M.S.L.I.S.
Philadelphia, PA
E-mail: richards1000@comcast.net
* Member, New York Bar, Retired Status