RE: Data- and text-mining licensing
Data/text mining has been an area of scientific study since the
1990s. For an overview, see: Losiewicz, Paul; Oard, Douglas W.;
Kostoff, Donald N. "Science and Technology Text Mining: Basic
Concepts" (ADA415886). January 1, 2003. 28 pages. Handle:
http://handle.dtic.mil/100.2/ADA415886.
To my mind, it is a relational "fact-finding" and extraction
expedition. I don't see it as any different from a traditional
human-initiated search result displayed with visualization tools.
Some database providers offer this service to customers now; see
Thomson Scientific
http://scientific.thomson.com/press/2005/8298419/ .
Data/text-mining also has application in the information commons.
For a discussion of the not-too-distant future, see Julie Hilden's
article: "Will the Future Bring Even More Important Copyright
Issues Than The Ones Raised by Online File-Swapping?" in FindLaw
Writ, 24 May 2005.
http://writ.news.findlaw.com/hilden/20050524.html
"The issues are as simple and fundamental as they are troubling:
Exactly how much content may be copied on the Internet - and
of what kind -- before copyright is infringed? And more deeply,
when is content "copied" in the first place when it comes to the
Internet? Does the fact that the copying is done via a machine
editor - not a human editor - make a difference?"
Bonnie Klein
-----Original Message-----
[mailto:owner-liblicense-l@lists.yale.edu] On Behalf Of Joseph J.
Esposito
Sent: Monday, May 29, 2006 4:57 PM
To: liblicense-l@lists.yale.edu
Subject: Data- and text-mining licensing
I have been involved in a number of discussions concerning data
and text mining recently and wonder if anyone has any experience
with these topics that they would like to share. The basic
question is whether the license for an electronic resource in a
form suitable to be read by humans extends as well to a license
for machine-reading.
The area of data and text mining for scholarly materials is a new
one, at least to me. My understanding is that materials
(research data, user data, published articles, books, etc.) can
be gathered together in such a way as to enable robots to sift
through them and identify patterns and themes. These new
patterns--effectively robot-generated discoveries--may include
things that are not present in any single document in the
collection. Thus, the collection is greater than the sum of its
parts, but that greater value is only perceptible by machines.
This past week I heard an excellent presentation (it is not yet
online, but when the link becomes available, I will post it) by a
biostatistician, who commented that human access to such
databases is "of low value," in contrast to the "higher value of
robot access."
Data and text mining are sometimes being discussed in the context
of the idea of "Web 2.0," but I think this is a mistake. Web 2.0
is a concept of Tim O'Reilly's to describe the emerging practices
on the Internet today in the areas of community-building and
user-generated content. Web 2.0 is a metaphor, not a technical
specification--but a very valuable metaphor. O'Reilly, for
example, distinguishes between the early Web (his 1.0) and the
evolving Web by contrasting Encyclopaedia Britannica and the
Wikipedia. Both 1.0 and 2.0, however, share the fact that the
users are humans. Data mining is a game for machines. It would
be inaccurate to call it "Web 3.0" because machines don't require
a Web interface at all. Web 2.0 is post-modern, but data-mining
is post-human. Today's neologism: the Post Human Internet, or
PHUNET for short, pronounced either FOO-net or (my preference)
PEE-YOU-net. See Charles Stross's novel Accelerando.
Whether or not database mining of this kind will yield the kind
of new insights some believe it will, I do not know, but it would
be useful for the rights situation to be clarified early on to
fend off litigation at a later time. It seems likely to me that
publishers will begin to separate human- and machine-readable
rights, just as they distinguish between subscriptions for
libraries and individuals. There is an interesting precedent put
forward by some members of the library community, who argue that
it is reasonable for publishers to charge for hardcopy, but
electronic materials should be free. It is conceivable that over
time the "low value" of human-readable rights will become Open
Access, leaving the higher-value PHUNET rights for aggressive
economic exploitation. It boggles the mind to think what a large
collection of science articles could be worth some day.
Joe Esposito