Dublin Core

Kalee Sprague


Overview

The Dublin Core (DC) is a metadata set designed to promote discovery of electronic resources. Metadata is, simply put, data about data. The DC element set provides a simple, flexible means of describing documents, images, sound files, and other networked information objects.

Originally, the DC was created to enhance searching of document-like objects on the web. The first Dublin Core workshop, held in Dublin, Ohio, identified 12 descriptive elements common to most web documents: Title, Author or Creator, Subject and Keywords, Description, Publisher, Other Contributor, Date, Resource Type, Format, Resource Identifier, Source, and Language. The DC elements were designed to describe works generated by a wide variety of intellectual disciplines and in a number of formats; in addition to text, the element set applied to graphics, sound, and video files. Three more elements, Relation, Coverage, and Rights Management, were added at later Dublin Core workshops to enhance description of images. (Each of the 15 elements is described in greater detail in Appendix B.)

Using Dublin Core for Primary Description

When looked at as a primary means of description, the Dublin Core possesses several weaknesses. First, the level of description provided by the Dublin Core is not exhaustive. The 15 DC elements are general in nature. This generality gives the Dublin Core its flexibility to describe different resources. However, this same attribute limits the DC's ability to describe a work at more than a basic level of detail. The "Author/Creator" element, for example, does not distinguish between corporate authors and personal authors. The corporate nature of an author can only be indicated using explanatory text within the element value itself. When other details about the author are added, such as e-mail address or affiliation, the simple element field can grow long and unwieldy.

Second, the Dublin Core does not prescribe a syntax for element values. In contrast to highly regulated metadata formats like USMARC, the Dublin Core element values use natural language. The advantage of using natural language is that authors/creators can describe their works using language and formatting appropriate to their respective disciplines. Natural language values, however, can pose serious problems for search engines [Lynch]. For example, a search engine would have to execute a number of complex comparisons to discover a specific date when DC dates are stored in any number of different formats (e.g., "10/25/98" vs. "10-25-98" vs. "October 25, 1998"). Searching elements like the "Author/Creator" field described above poses similar problems: search engines have to parse long text strings to find pertinent information such as the author's name.
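The date problem above can be illustrated with a minimal Python sketch. It is not part of any Dublin Core specification; the function name normalize_dc_date and the list of candidate formats are assumptions made for illustration. The sketch tries each known format in turn and reduces every match to a single YYYY-MM-DD form, which is the kind of comparison work a search engine would otherwise have to repeat for each query.

```python
from datetime import datetime
from typing import Optional

# Candidate formats are an assumption for illustration; a real collection
# would need a longer list and a policy for ambiguous day/month order.
CANDIDATE_FORMATS = ["%m/%d/%y", "%m-%d-%y", "%B %d, %Y", "%Y-%m-%d"]


def normalize_dc_date(value: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no candidate format matches."""
    cleaned = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


if __name__ == "__main__":
    for raw in ["10/25/98", "10-25-98", "October 25, 1998"]:
        print(raw, "->", normalize_dc_date(raw))
```

Values that match none of the candidate formats come back as None, which makes the cost of free-form dates visible: every unrecognized format is a record the search engine cannot compare by date.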

Third, description of collections and surrogates is awkward under the Dublin Core. Data describing a collection is linked to members of the collection through the "Relation" element. A search engine attempting to group like things together will have difficulty identifying both the collection and all of its members. Describing the relationship between surrogates is also difficult. For example, suppose two versions of a famous photo exist: the photo itself, and a digital image made from the photo. Both the image and the photo have metadata associated with them. Under the Dublin Core, the relationship between image and photo is described using free text in the "Relation" element. A patron searching specifically for the original photo may have difficulty distinguishing between references to the photo itself and references to the image.

Qualifiers

To address some of the weaknesses in the Dublin Core, a series of qualifiers has been proposed to refine the core element set. The proposed qualifiers fall into two groups: "schemes" and "types". Schemes describe the syntax used by element values. The scheme "LCSH," for example, indicates that the values contained in a Dublin Core "Subject" element are Library of Congress Subject Headings. Types refine the core element itself. The type "CorporateName," for instance, defines a Dublin Core "Creator" as a corporate author. The Dublin Core Workshops have set two limitations on qualifiers. First, a qualifier can only refine an element, not redefine its semantics. Second, the content must still be understood if the element is used without qualifiers. For a full description of the proposed Dublin Core Qualifiers, see Appendix C.
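The two kinds of qualifier, and the rule that an element must remain understandable without them, can be sketched as a small Python data structure. This is an illustration only: the class name DCElement, the dotted label produced by qualified_name, and the sample values are assumptions, not part of the proposed qualifier documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DCElement:
    name: str                     # core element, e.g. "Creator"
    value: str                    # natural-language element value
    type: Optional[str] = None    # "type" qualifier, e.g. "CorporateName"
    scheme: Optional[str] = None  # "scheme" qualifier, e.g. "LCSH"

    def qualified_name(self) -> str:
        """Dotted label such as "Creator.CorporateName" (assumed convention)."""
        return f"{self.name}.{self.type}" if self.type else self.name

    def unqualified(self) -> "DCElement":
        """Drop both qualifiers; the value must still make sense on its own."""
        return DCElement(self.name, self.value)


if __name__ == "__main__":
    # Sample values are illustrative only.
    creator = DCElement("Creator", "Yale University Library", type="CorporateName")
    subject = DCElement("Subject", "Metadata", scheme="LCSH")
    print(creator.qualified_name(), "->", creator.unqualified().qualified_name())
    print(subject.scheme, subject.value)
```

Because unqualified() returns the same element name and value, a system that ignores qualifiers still receives a usable "Creator" or "Subject"; it loses only the refinement.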

A Networking Approach: The Warwick Framework

While qualifiers enhance the quality of description in the Dublin Core, they add complexity to the schema without necessarily addressing the needs of specialized communities. Many of these communities have developed their own metadata sets, such as MESL and the VRA Core. Specialized metadata sets resolve primary description issues, but they make data exchange difficult. The unqualified Dublin Core element set, in this context, offers a means of communicating and exchanging data among more specialized metadata schemes. The Warwick Framework provides a model for using the Dublin Core in this role.

Proposed at the second Dublin Core workshop in Warwick, England, the Warwick Framework was designed as an architecture for the exchange of metadata. The framework provides a means both for communicating among different metadata schemas, and for defining hierarchical relationships of information objects. The Warwick Framework is composed of a series of "Packages" and "Containers." A container references a specific information object. The packages are the different entities that describe the object. Within one container, for example, there might be a MARC package, a Dublin Core package, and a simple package containing a URL for a related object. A search engine approaching the container can read through the packages and pick the metadata description most compatible with its operation. The Dublin Core fits well within this framework as a lowest common denominator among metadata sets. The Warwick Framework allows databases to use a local metadata element set for primary description, while utilizing the Dublin Core as a means of communicating with databases using other metadata schemes. A site might include two metadata packages in the "container" pertaining to an image: a VRA-core based description, and a Dublin Core description. Search engines capable of reading VRA-core elements would select the VRA description of the image. Other search engines would scan the Dublin Core description for information.
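The container and package model described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Warwick Framework's own specification: the class names, the select method, the example.org identifier, and the field names inside each record are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Package:
    schema: str             # e.g. "VRA Core", "Dublin Core", "MARC"
    record: Dict[str, str]  # element name -> value (illustrative fields)


@dataclass
class Container:
    resource: str           # identifier of the described object
    packages: List[Package] = field(default_factory=list)

    def select(self, understood: List[str]) -> Optional[Package]:
        """Return the first package whose schema the client understands.

        `understood` is ordered by preference, most specific schema first.
        """
        by_schema = {p.schema: p for p in self.packages}
        for schema in understood:
            if schema in by_schema:
                return by_schema[schema]
        return None


if __name__ == "__main__":
    image = Container(
        "http://example.org/images/42",  # hypothetical identifier
        [
            Package("VRA Core", {"Work Type": "photograph"}),
            Package("Dublin Core", {"Type": "image"}),
        ],
    )
    print(image.select(["VRA Core", "Dublin Core"]).schema)  # VRA-aware client
    print(image.select(["Dublin Core"]).schema)              # generic client
```

A client that lists "Dublin Core" last in its preferences treats it as the fallback, which is the lowest-common-denominator role the text assigns to the DC.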

Within the Warwick Framework, packages can themselves be containers, allowing for an infinite hierarchy of objects and metadata sets. The ability to define hierarchies under the Warwick Framework resolves many of the problems describing related objects with the Dublin Core. For example, a hierarchy of packages and containers could be used to describe a painting within a collection, within a museum. The painting, the collection, and the museum are represented with a cascading series of containers linked to each other through URLs or URNs. A similar hierarchy of containers and packages can clearly define the relationship among surrogates. In the example given previously, the photo and its digital reproduction would each be represented with a container, multiple metadata packages, and a package containing a link to the related resource.
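The museum, collection, and painting hierarchy can be sketched as nested containers. The Python below is an illustration under assumptions: the class and function names are hypothetical, the titles are invented sample data, and the containers are nested directly in memory, whereas the framework as described links them through URLs or URNs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class MetadataPackage:
    schema: str  # e.g. "Dublin Core"
    title: str   # a single illustrative field


@dataclass
class Container:
    name: str
    # A package slot may hold metadata or another container.
    packages: List[Union["Container", MetadataPackage]] = field(default_factory=list)


def walk(container: Container, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, label) pairs for a container and everything nested in it."""
    yield depth, f"container: {container.name}"
    for package in container.packages:
        if isinstance(package, Container):
            yield from walk(package, depth + 1)
        else:
            yield depth + 1, f"{package.schema} package: {package.title}"


if __name__ == "__main__":
    painting = Container("painting", [MetadataPackage("Dublin Core", "Untitled painting")])
    collection = Container(
        "collection",
        [MetadataPackage("Dublin Core", "Example collection"), painting],
    )
    museum = Container("museum", [collection])
    for depth, label in walk(museum):
        print("  " * depth + label)
```

Walking from the museum down reaches the painting's own metadata without any free-text "Relation" value, which is the grouping a search engine struggles to recover from the flat Dublin Core alone.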

Conclusion

The low level of detail encompassed by the Core, and the lack of a defined syntax for element values, make the DC a dubious choice for primary description of material. However, the flexibility and universality of the core make it a good medium for promoting exchange of data among databases using a variety of specialized metadata sets. The DC could provide a means for unifying Yale’s diverse digital resources.


Appendix A: Implementing the Dublin Core

HTML

The simplest method of implementing Dublin Core description on the Web is to use HTML. The META tag in both HTML 2.0 and HTML 4.0 supports the Dublin Core element set. HTML 2.0 uses a simple tag syntax indicating metadata scheme and element name, followed by content of the field. For example, the Dublin Core "Creator" of this summary is tagged as:

<META name="DC.creator" content="Kalee Sprague">

Most browsers and many search engines such as Alta Vista support use of the META tag. Under HTML 4.0, the qualifiers "SCHEME" and "LANG" are available to further enhance description in the META tag. For example, the following tag indicates that the DC "Creator" element above is described in English:

<META name="DC.creator" lang="en" content="Kalee Sprague">
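Tags of this form can be generated from a list of element/value pairs. The Python sketch below is one possible approach, not a tool described in this summary; the function name dc_meta_tags is an assumption. It takes pairs rather than a dictionary because DC elements are repeatable, and it escapes values so that quotation marks in a title or name cannot break the tag.

```python
from html import escape
from typing import Iterable, Optional, Tuple


def dc_meta_tags(elements: Iterable[Tuple[str, str]], lang: Optional[str] = None) -> str:
    """Render (element, value) pairs as one META tag per line."""
    lines = []
    for element, value in elements:
        attrs = f'name="DC.{escape(element.lower())}"'
        if lang:
            # LANG is the HTML 4.0 attribute discussed above.
            attrs += f' lang="{escape(lang)}"'
        lines.append(f'<META {attrs} content="{escape(value)}">')
    return "\n".join(lines)


if __name__ == "__main__":
    print(dc_meta_tags([("Creator", "Kalee Sprague"), ("Title", "Dublin Core")], lang="en"))
```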

RDF

The Resource Description Framework (RDF), developed by the World Wide Web Consortium (W3C), is an experimental method for supporting metadata description of networked resources. RDF works within the XML (Extensible Markup Language) Namespace element. Within its Namespace, RDF references both its own and other metadata schemas by their Uniform Resource Identifier (URI). The URI, in theory, marks the reference location of the metadata standard being used. RDF incorporates the Warwick Framework, allowing the use of multiple metadata schemes to describe an object. The different schemes can be used in parallel or within a single element hierarchy.

The following schematic example uses both DC and MESL elements. Namespace declarations are omitted, and the MESL field name is shown as written; well-formed XML would require an element name without the slash.

<RDF>
<Description about="http://www.library.yale.edu/databaseadmin/dublincore.html">
<DC:Title>Dublin Core, a Summary</DC:Title>
<DC:Subject>Metadata, RDF, Dublin Core</DC:Subject>
<MESL:Concepts/Function>Information Management - Internet</MESL:Concepts/Function>
</Description>
</RDF>

Z39.50

Z39.50 offers exciting possibilities for the exchange of Dublin Core metadata. Z39.50 clients and hosts are already widely used for searching across different databases and metadata schemas. The Z39.50 organization plans to incorporate the basic 15 elements of the Dublin Core into the Bib-1 attribute set. The Bib-1 attribute set is the basic bibliographic attribute set used by the Z39.50 Version 2.0 standard. DC qualifiers may be incorporated in a separate attribute set in Z39.50 Version 3.0; plans for this standard are still under way.


Appendix B: Dublin Core Element Set
Reference definition available at
URL:http://purl.org/metadata/dublin_core
1997-11-02

Each element is optional and repeatable; the elements can appear in any order.
Field | Label | Description
Title | Title | The name given to the resource, usually by the Creator or Publisher.
Author or Creator | Creator | The person or organization primarily responsible for creating the intellectual content of the resource.
Subject and Keywords | Subject | The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource.
Description | Description | A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
Publisher | Publisher | The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
Other Contributor | Contributor | A person or organization not specified in a Creator element who has made significant
Date | Date | A date associated with the creation or availability of the resource. Such a date is not to be confused with one belonging in the Coverage element, which would be associated with the resource only insofar as the intellectual content is somehow about that date.
Resource Type | Type | The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. For the sake of interoperability, Type should be selected from an enumerated list that is currently under development in the workshop series.
Format | Format | The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource. For the sake of interoperability, Format should be selected from an enumerated list that is currently under development in the workshop series.
Resource Identifier | Identifier | A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented).
Source | Source | Information about a second resource from which the present resource is derived. While it is generally recommended that elements contain information about the present resource only, this element may contain a date, creator, format, identifier, or other metadata for the second resource when it is considered important for discovery of the present resource; recommended best practice is to use the Relation element instead.
Language | Language | The language of the intellectual content of the resource. Where practical, the content of this field should coincide with RFC 1766 [Tags for the Identification of Languages, http://ds.internic.net/rfc/rfc1766.txt]; examples include en, de, es, fi, fr, ja, th, and zh.
Coverage | Coverage | The spatial or temporal characteristics of the intellectual content of the resource. Spatial coverage refers to a physical region (e.g., celestial sector); use coordinates (e.g., longitude and latitude) or place names that are from a controlled list or are fully spelled out. Temporal coverage refers to what the resource is about rather than when it was created or made available (the latter belonging in the Date element).
Rights Management | Rights | A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.

Appendix C: Dublin Core Qualifiers (Types)
Original documentation available at
http://www.loc.gov/marc/dcqualif.html
1997-10-15

Element | Subelement | Description
Title | Alternative | Used for any titles other than the main title, including subtitle, translated title, series title, vernacular name, etc.
Title | Main | Used where two or more titles are being recorded for the same resource in order to distinguish the main title from alternative titles.
Creator | PersonalName (nested: Address) | The name of an individual associated with the creation of the resource.
Creator | CorporateName (nested: Address) | The name of an institution or corporation associated with the creation of the resource.
Publisher | PersonalName (nested: Address) | The name of an individual associated with the publication of the resource.
Publisher | CorporateName (nested: Address) | The name of an institution or corporation associated with the publication of the resource.
Contributor | PersonalName (nested: Address) | The name of an individual associated with the resource.
Contributor | CorporateName (nested: Address) | The name of an institution or corporation associated with the resource.
Date | Created | Date of creation of the resource.
Date | Issued | Date of formal issuance (e.g., publication) of the resource.
Date | Accepted | Date of acceptance (e.g., for a dissertation or treaty) of the resource.
Date | Available | Date (often a range) that the resource will become or did become available.
Date | Acquired | Date of acquisition or accession.
Date | DataGathered | Date of sampling of the information in the resource.
Date | Valid | Date (often a range) of validity of the resource.
Relation | Type | No definition given.
Relation | Indicator | No definition given.
Coverage | PeriodName | The resource being described is from or related to a named historical period, referred to by this use of the element.
Coverage | PlaceName | The resource being described is associated with a named place, identified by this use of the element.
Coverage | X | The resource being described is associated with a spatial location which may be defined by the use of x, y, (and, possibly, z) co-ordinates.
Coverage | Y | See X.
Coverage | Z | See X.
Coverage | T | The resource being described is from or associated with an instance in time that may be given numerically.
Coverage | Polygon | The resource being described may be located with respect to a shape, or polygon, defined in space as a series of x, y co-ordinate values.
Coverage | Line | The resource being described may be located with respect to a line defined in space by a series of x, y co-ordinate values.
Coverage | 3d | The resource being described may be located with respect to a volume, or hull, defined in three dimensional space as a series of x, y, z co-ordinate values.


Bibliography
"Syntactic Considerations for the Dublin Core."
1997-11-02
http://purl.oclc.org/metadata/dublin_core
(29 Sept. 1998).

Guenther, Rebecca. "Dublin Core Qualifiers/Substructure"
1997-10-15
http://www.loc.gov/marc/dcqualif.html

Iannella, Renato. "An Idiot's Guide to the Resource Description Framework."
1998-09-03
http://www.dstc.edu.au/RDU/reports/RDF-Idiot/
(5 Oct. 1998).

Lagoze, C. "The Warwick Framework: A Container Architecture for Diverse Sets of Metadata."
D-Lib Magazine. July/August 1996.
http://www.dlib.org/dlib/july96/lagoze/07lagoze.html
(5 Oct. 1998).

Lynch, Clifford. "The Dublin Core Descriptive Metadata Program: Strategic Implications for Libraries and Networked Information Access." ARL. February 1998, pp. 5-10.

LeVan, Ralph. "Dublin Core and Z39.50."
Draft Version 1.2. 1998-02-02
http://www.oclc.org/~levan/docs/dublincoreandz3950.html
(3 Nov. 1998).

Weibel, Stuart. "A Proposed Convention for Embedding Metadata in HTML."
1996-06-02
http://purl.oclc.org/docs/metadata/dublin_core/approach.html
(29 Sept. 1998).

