Comment by Peter Brantley
I recently posted a query concerning data-mining to this list.
I happened to share it with Peter Brantley of the California
Digital Library, who replied in his characteristically thoughtful
way. His remarks are pasted in below, with his permission (with
the original informal style of a personal email). Note in
particular the comment about the Open Text Mining Initiative that
is being promulgated by the Nature Group.
Joe Esposito
___
I think this is a very intelligent question, and certainly one
that is being asked. it's not yet a problem at the CDL, and
hasn't been discussed there; but I have discussed flavors of this
with others.
I think there are various ways in which digitized texts could
produce transformative additional IP.
there is the text mining means you describe, in which services or
users are able to elucidate or uncover meanings, linkages, and
patterns that were previously undisclosed. These in turn could
be published or leveraged for revenue in various ways.
(companies like MarkLogic build their businesses off this kind of
work).
there is the value-add that social software techniques could
produce, through the production of lists, perhaps pointing deep
into texts, or at small portions of texts; the IP inherent in
annotation and tagging (who owns these?); and additions to expert
ontologies that might be used within text mining to add further
value. (Just a few examples).
there are also virtual texts, in which users able to search
across a range of material might be able to produce new and
useful derivatives, such as "The 100 Best Salpicon Recipes" -
what portion of that IP could be claimed by the original
publisher? is that akin to the relationship of a movie to a
screenplay?
I would note that one of the innovations that Nature Publishing
has recently provided is the Open Text Mining Initiative, which
explicitly provides a mechanism for publishers to produce machine
readable files that facilitate text mining and indexing without
rendering the text to human readership and without forsaking the
lion's share of the IP. I think OTMI will potentially be very
successful, and I think approaches like it will be embraced for
at least an interim period of time.