OAI-PMH Concepts
What is the OAI-PMH?
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) specifies a relatively simple method for harvesting data from a data source. The data source, called a repository in OAI-PMH terms, accepts requests that conform to the protocol. The requests can query the repository for information about itself and the service it provides, as well as for the actual data, called metadata records in OAI-PMH terms, that the repository exposes. Responses are returned in the Extensible Markup Language (XML) format, the structure of which is also specified by the protocol. The structure of any metadata included with a response is not specified by the OAI-PMH, and a repository may support any number of XML formats for metadata. However, the protocol requires that, at a minimum, all repositories disseminate metadata in the unqualified Dublin Core (oai_dc) XML format. The format of the metadata included in a response can be specified in the request.
Requests are made to a repository using the HTTP GET or POST methods.
There is a single base URL for all requests. All requests consist of a list of keyword arguments which take the form of key=value pairs. One of the key=value pairs must be the verb key paired with one of the six allowed verb values.
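As a rough illustration, a GET request URL can be assembled from the base URL and the key=value pairs using Python's standard library. The helper name `build_request_url` is hypothetical; the base URL and argument values are taken from the GetRecord example used in this document.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH.
VERBS = {
    "GetRecord", "Identify", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}

def build_request_url(base_url, verb, **arguments):
    """Return a GET URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    # Every request carries the verb key plus any verb-specific arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"

url = build_request_url(
    "http://arXiv.org/oai2",
    "GetRecord",
    identifier="oai:arXiv.org:cs/0112017",
    metadataPrefix="oai_dc",
)
print(url)
```

Note that `urlencode` percent-encodes reserved characters such as `:` and `/`, so the identifier appears in the URL as `oai%3AarXiv.org%3Acs%2F0112017`.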
Verb | Description |
---|---|
GetRecord | Retrieve an individual metadata record from the repository. |
Identify | Retrieve information about a repository. |
ListIdentifiers | Retrieve a list of the identifiers that uniquely identify each metadata record in the repository. |
ListMetadataFormats | Retrieve a list of the metadata formats that the repository supports. |
ListRecords | Retrieve all metadata records in the repository. |
ListSets | Retrieve the sets supported by the repository. Sets are predetermined groups of metadata records that identify related items. |
Depending on the verb used, additional arguments are required or optional:
Parameter | Description |
---|---|
from | Restricts the results to metadata records modified on or after the given date. Optional for ListIdentifiers and ListRecords. |
identifier | Specifies the metadata record to be retrieved. Required for GetRecord. |
metadataPrefix | Specifies the format of the retrieved metadata record(s). Required for GetRecord, ListIdentifiers and ListRecords. |
resumptionToken | Beyond the scope of this document; see the OAI-PMH specification section on Flow Control. |
set | Specifies, by its setSpec value, the set whose identifiers or metadata records are to be retrieved. Optional for ListIdentifiers and ListRecords. |
until | Restricts the results to metadata records modified on or before the given date. Optional for ListIdentifiers and ListRecords. |
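The table above can be turned into a small lookup for checking a request before it is sent. This is only a sketch: the `ARGUMENTS` mapping and the `check_arguments` helper are hypothetical names, the mapping covers only the three verbs the table mentions, and resumptionToken is left out because flow control is out of scope here. Arguments are passed as a dictionary because `from` is a reserved word in Python. The date and set values in the usage lines are illustrative.

```python
# Required and optional arguments per verb, transcribed from the table above.
_LIST_ARGS = {"required": {"metadataPrefix"}, "optional": {"from", "until", "set"}}
ARGUMENTS = {
    "GetRecord": {"required": {"identifier", "metadataPrefix"}, "optional": set()},
    "ListIdentifiers": _LIST_ARGS,
    "ListRecords": _LIST_ARGS,
}

def check_arguments(verb, arguments):
    """Return a list of problems with the arguments; an empty list means OK.

    Raises KeyError for verbs that are not covered by the table.
    """
    spec = ARGUMENTS[verb]
    names = set(arguments)
    missing = sorted(spec["required"] - names)
    unexpected = sorted(names - spec["required"] - spec["optional"])
    problems = [f"missing required argument: {name}" for name in missing]
    problems += [f"unexpected argument: {name}" for name in unexpected]
    return problems

# A ListRecords request limited by date and set (illustrative values).
ok = check_arguments(
    "ListRecords", {"metadataPrefix": "oai_dc", "from": "2012-01-01", "set": "cs"}
)
# A GetRecord request that forgot the identifier.
bad = check_arguments("GetRecord", {"metadataPrefix": "oai_dc"})
print(ok)
print(bad)
```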
Responses are returned in XML. The structure of the XML is mostly specified by the OAI-PMH. However, responses to requests using the verbs GetRecord or ListRecords contain a metadata record section, the structure of which is determined by the metadataPrefix argument. Take, for example, the request:
http://arXiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
The GetRecord verb means that we are attempting to retrieve the metadata record for a specific item in the repository. The identifier argument specifies the particular item we are attempting to retrieve, and the metadataPrefix argument specifies the format of the metadata record section. In the response below, the metadata record section is contained within the <metadata> tags. All other parts of the response are specified by the OAI-PMH:
    <?xml version="1.0"?>
    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
      <responseDate>2012-12-20T07:37:42Z</responseDate>
      <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" metadataPrefix="oai_dc">http://export.arxiv.org/oai2</request>
      <GetRecord>
        <record>
          <header>
            <identifier>oai:arXiv.org:cs/0112017</identifier>
            <datestamp>2007-05-23</datestamp>
            <setSpec>cs</setSpec>
          </header>
          <metadata>
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/"
                       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                       xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
              <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
              <dc:creator>Dushay, Naomi</dc:creator>
              <dc:subject>Computer Science - Digital Libraries</dc:subject>
              <dc:subject>H.3.7</dc:subject>
              <dc:description>
                With the increasing technical sophistication of both information consumers and
                providers, there is increasing demand for more meaningful experiences of digital
                information. We present a framework that separates digital object experience, or
                rendering, from digital object storage and manipulation, so the rendering can be
                tailored to particular communities of users. Our framework also accommodates
                extensible digital object behaviors and interoperability. The two key components of
                our approach are 1) exposing structural metadata associated with digital objects --
                metadata about the labeled access points within a digital object and 2) information
                intermediaries called context brokers that match structural characteristics of
                digital objects with mechanisms that produce behaviors. These context brokers allow
                for localized rendering of digital information stored externally.
              </dc:description>
              <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
              <dc:date>2001-12-14</dc:date>
              <dc:type>text</dc:type>
              <dc:identifier>http://arxiv.org/abs/cs/0112017</dc:identifier>
            </oai_dc:dc>
          </metadata>
        </record>
      </GetRecord>
    </OAI-PMH>
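One way to read such a response is with Python's `xml.etree.ElementTree`, using the namespaces declared in the response to separate the OAI-PMH envelope from the oai_dc metadata section. The sketch below embeds an abbreviated copy of the response above (trimmed to a few elements) so that it runs without network access; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes for the OAI-PMH envelope and the oai_dc metadata section.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated copy of the GetRecord response shown above.
RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2012-12-20T07:37:42Z</responseDate>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2007-05-23</datestamp>
        <setSpec>cs</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Computer Science - Digital Libraries</dc:subject>
          <dc:subject>H.3.7</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

root = ET.fromstring(RESPONSE)
record = root.find("oai:GetRecord/oai:record", NS)

# The header is part of the OAI-PMH envelope.
header = record.find("oai:header", NS)
identifier = header.findtext("oai:identifier", namespaces=NS)
datestamp = header.findtext("oai:datestamp", namespaces=NS)

# The metadata section follows the format named by metadataPrefix (oai_dc here).
dc = record.find("oai:metadata/oai_dc:dc", NS)
title = dc.findtext("dc:title", namespaces=NS)
subjects = [element.text for element in dc.findall("dc:subject", NS)]

print(identifier, datestamp)
print(title)
print(subjects)
```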
Repositories are only required to support one metadata format, unqualified Dublin Core (oai_dc), but may optionally support any number of additional formats. The formats that a repository supports can be queried using the ListMetadataFormats verb.
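A ListMetadataFormats response lists one metadataFormat element per supported format, each carrying a metadataPrefix, a schema and a metadataNamespace. The sketch below extracts the advertised prefixes; the sample response is illustrative rather than captured from a real repository, and `supported_prefixes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative response body, not captured from a real repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

def supported_prefixes(response_xml):
    """Return the metadataPrefix values advertised in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]

prefixes = supported_prefixes(SAMPLE)
print(prefixes)
```

Any of the returned prefixes could then be passed as the metadataPrefix argument of a GetRecord, ListIdentifiers or ListRecords request.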
Sets are an optional OAI-PMH concept that allows repositories to group items. A set is defined by three elements:
- setSpec: the unique identifier of a set.
- setName: a short string naming the set.
- setDescription: an optional description of the set.
Sets that are exposed by the repository may be listed using the ListSets request. They can also be used to limit the responses to ListIdentifiers and ListRecords requests using the set argument.
Sets may be organized in a hierarchy. Hierarchical organization is specified by the syntax of the setSpec element: each level of the hierarchy is separated in the setSpec by a colon. Take, for example, the sets:
setName | setSpec |
---|---|
Institutions | institution |
Oceanside University of Nebraska | institution:nebraska |
Valley View University of Florida | institution:florida |
Subjects | subject |
Existential Kinesiology | subject:kinesiology |
Quantum Psychology | subject:quantum |
Here we have two main sets, institution and subject, each with two subsets.
The advantages of hierarchical sets are:
- Large sets can be partitioned into logically smaller groups.
- Items in all subsets are returned in response to ListIdentifiers and ListRecords requests whose set argument contains a parent setSpec.
Note: The setSpec element may contain only alphanumeric characters, colons, and the characters - _ . ! ~ * ' ( )
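The character rule and the colon-based hierarchy can both be expressed in a few lines of Python. This is a sketch under two assumptions: that each colon-separated segment must be non-empty, and that a record belongs to a requested set when its setSpec equals the requested setSpec or extends it with a colon. The helper names are hypothetical, and the sample values come from the sets table above.

```python
import re

# One or more colon-separated segments built from the characters allowed
# by the note above (assumption: segments may not be empty).
_SEGMENT = r"[A-Za-z0-9\-_.!~*'()]+"
SETSPEC_RE = re.compile(rf"{_SEGMENT}(?::{_SEGMENT})*")

def is_valid_setspec(set_spec):
    """True if set_spec uses only the permitted characters and separators."""
    return SETSPEC_RE.fullmatch(set_spec) is not None

def in_set(record_setspec, requested_setspec):
    """True if record_setspec is requested_setspec or one of its descendants."""
    return (
        record_setspec == requested_setspec
        or record_setspec.startswith(requested_setspec + ":")
    )

SETS = [
    "institution", "institution:nebraska", "institution:florida",
    "subject", "subject:kinesiology", "subject:quantum",
]

# Requesting the parent set "institution" also matches its two subsets.
matches = [spec for spec in SETS if in_set(spec, "institution")]
print(matches)
print(is_valid_setspec("institution:nebraska"))
print(is_valid_setspec("institution nebraska"))
```

Appending the colon before the prefix comparison keeps an unrelated setSpec such as institutional from matching institution.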
Other resources
Additional information about the OAI-PMH can be obtained from a number of sources, including: