Project

fieldhand

0.01
No commit activity in last 3 years
No release in over 3 years
A library to harvest metadata from OAI-PMH repositories.
 Dependencies

Development

~> 2.3, < 2.4
~> 3.6
~> 1.21, < 1.22

Runtime

~> 2.5
 Project Readme

Fieldhand

A Ruby library for harvesting metadata from OAI-PMH repositories.

Current version: 0.12.0
Supported Ruby versions: 1.8.7, 1.9.2, 1.9.3, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6

Installation

gem install fieldhand -v '~> 0.12'

Or, in your Gemfile:

gem 'fieldhand', '~> 0.12'

Usage

require 'fieldhand'

repository = Fieldhand::Repository.new('http://example.com/oai')
repository.identify.name
#=> "Repository Name"

repository.metadata_formats.map { |format| format.prefix }
#=> ["oai_dc"]

repository.sets.map { |set| set.name }
#=> ["Set A.", "Set B."]

repository.records.each do |record|
  # ...
end

repository.get('oai:www.example.com:12345')
#=> #<Fieldhand::Record: ...>

API Documentation

  • Fieldhand::Repository
    • .new(uri[, options])
    • #identify
    • #metadata_formats([identifier])
    • #sets
    • #records([arguments])
    • #identifiers([arguments])
    • #get(identifier[, arguments])
  • Fieldhand::Identify
    • #name
    • #base_url
    • #protocol_version
    • #earliest_datestamp
    • #deleted_record
    • #granularity
    • #admin_emails
    • #compression
    • #descriptions
    • #response_date
  • Fieldhand::MetadataFormat
    • #prefix
    • #schema
    • #namespace
    • #response_date
  • Fieldhand::Set
    • #spec
    • #name
    • #descriptions
    • #response_date
  • Fieldhand::Record
    • #deleted?
    • #status
    • #identifier
    • #datestamp
    • #sets
    • #to_xml
    • #metadata
    • #about
    • #response_date
  • Fieldhand::Header
    • #deleted?
    • #status
    • #identifier
    • #datestamp
    • #sets
    • #response_date
  • Fieldhand::NetworkError
    • Fieldhand::ResponseError
      • Fieldhand::ResponseError#response
  • Fieldhand::ProtocolError
    • Fieldhand::BadArgumentError
    • Fieldhand::BadResumptionTokenError
    • Fieldhand::BadVerbError
    • Fieldhand::CannotDisseminateFormatError
    • Fieldhand::IdDoesNotExistError
    • Fieldhand::NoRecordsMatchError
    • Fieldhand::NoMetadataFormatsError
    • Fieldhand::NoSetHierarchyError

Fieldhand::Repository

A class to represent an OAI-PMH repository:

A repository is a network accessible server that can process the 6 OAI-PMH requests [...]. A repository is managed by a data provider to expose metadata to harvesters.

Fieldhand::Repository.new(uri[, options])

Fieldhand::Repository.new('http://www.example.com/oai')
Fieldhand::Repository.new(URI('http://www.example.com/oai'))
Fieldhand::Repository.new('http://www.example.com/oai', :logger => Logger.new(STDOUT), :timeout => 10, :bearer_token => 'decafbad')
Fieldhand::Repository.new('http://www.example.com/oai', :logger => Logger.new(STDOUT), :timeout => 10, :headers => { 'Custom header' => 'decafbad' })
Fieldhand::Repository.new('http://www.example.com/oai', :logger => Logger.new(STDOUT), :retries => 5, :interval => 30)

Return a new Repository instance accessible at the given uri (specified either as a URI or something that can be coerced into a URI such as a String) with options passed as a Hash:

  • :logger: a Logger-compatible logger, defaults to a platform-specific null logger;
  • :timeout: a Numeric number of seconds to wait before timing out any HTTP requests, defaults to 60;
  • :retries: a Numeric maximum number of times an HTTP request will be retried before raising an error, defaults to 0;
  • :interval: a Numeric number of seconds to wait before the next retry attempt, defaults to 10;
  • :bearer_token: a String bearer token to authorize any HTTP requests, defaults to nil;
  • :headers: a Hash containing custom HTTP headers, defaults to {}.
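
As a rough illustration of the options above, the following sketch builds the options Hash from a Hash of settings such as ENV. The helper name fieldhand_options and the OAI_* setting names are assumptions made for this example and are not part of Fieldhand; only the option keys and their defaults come from the list above.

```ruby
require 'logger'

# Hypothetical helper (not part of Fieldhand): build the options Hash for
# Fieldhand::Repository.new from a Hash of settings. The OAI_* names are
# assumptions chosen for this sketch.
def fieldhand_options(settings)
  options = {
    :logger => Logger.new($stderr),
    :timeout => Integer(settings.fetch('OAI_TIMEOUT', 60)),
    :retries => Integer(settings.fetch('OAI_RETRIES', 0)),
    :interval => Integer(settings.fetch('OAI_INTERVAL', 10))
  }

  # Only set a bearer token when one has been configured.
  token = settings['OAI_TOKEN']
  options[:bearer_token] = token if token
  options
end

# Usage (requires the fieldhand gem):
#   repository = Fieldhand::Repository.new('http://www.example.com/oai', fieldhand_options(ENV))
```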

Fieldhand::Repository#identify

repository.identify
#=> #<Fieldhand::Identify: ...>

Return an Identify for the repository including information such as the repository name, base URL, protocol version, etc.

May raise a NetworkError if there is a problem contacting the repository, or any descendant of ProtocolError if the repository responds with an error.

Fieldhand::Repository#metadata_formats([identifier])

repository.metadata_formats
#=> #<Enumerator: ...>
repository.metadata_formats('oai:www.example.com:1')

Return an Enumerator of MetadataFormats available from the repository. Optionally takes an identifier that specifies the unique identifier of the item for which available metadata formats are being requested.

May raise a NetworkError if there is a problem contacting the repository, or any descendant of ProtocolError if the repository responds with an error.

Fieldhand::Repository#sets

repository.sets
#=> #<Enumerator: ...>

Return an Enumerator of Sets that represent the set structure of a repository.

May raise a NetworkError if there is a problem contacting the repository, or any descendant of ProtocolError if the repository responds with an error.

Fieldhand::Repository#records([arguments])

repository.records
repository.records(:metadata_prefix => 'oai_dc', :from => '2001-01-01')
repository.records(:metadata_prefix => 'oai_dc', :from => Date.new(2001, 1, 1))
repository.records(:set => 'A', :until => Time.utc(2010, 1, 1, 12, 0))

Return an Enumerator of all Records harvested from the repository.

Optional arguments can be passed as a Hash to permit selective harvesting of records based on set membership and/or datestamp:

  • :metadata_prefix: a String or MetadataFormat to specify the metadata format that should be included in the metadata part of the returned record, defaults to oai_dc;
  • :from: an optional argument with a String, Date or Time UTCdatetime value, which specifies a lower bound for datestamp-based selective harvesting;
  • :until: an optional argument with a String, Date or Time UTCdatetime value, which specifies an upper bound for datestamp-based selective harvesting;
  • :set: an optional argument with a set spec value (passed as either a String or a Set), which specifies set criteria for selective harvesting;
  • :resumption_token: an exclusive argument with a String value that is the flow control token returned by a previous request that issued an incomplete list.

Note that datetimes should respect the repository's granularity (see Fieldhand::Identify#granularity); otherwise the request will fail with a BadArgumentError.
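
One way to stay within the repository's granularity is to format datestamps yourself before passing them as Strings. The sketch below assumes the two granularity values documented for Fieldhand::Identify#granularity; the helper name datestamp_for is hypothetical and not part of Fieldhand.

```ruby
# Hypothetical helper (not part of Fieldhand): format a Time as a String that
# matches the granularity reported by Fieldhand::Identify#granularity.
def datestamp_for(time, granularity)
  format = granularity == 'YYYY-MM-DD' ? '%Y-%m-%d' : '%Y-%m-%dT%H:%M:%SZ'

  # getutc returns a new UTC Time and leaves the caller's object untouched.
  time.getutc.strftime(format)
end

# Usage with a repository (requires the fieldhand gem):
#   granularity = repository.identify.granularity
#   repository.records(:from => datestamp_for(Time.now - 86_400, granularity))
```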

May raise a NetworkError if there is a problem contacting the repository, or any descendant of ProtocolError if the repository responds with an error.

Fieldhand::Repository#identifiers([arguments])

repository.identifiers
repository.identifiers(:metadata_prefix => 'oai_dc', :from => '2001-01-01')
repository.identifiers(:metadata_prefix => 'oai_dc', :from => Date.new(2001, 1, 1))
repository.identifiers(:set => 'A', :until => Time.utc(2010, 1, 1, 12, 0))

Return an Enumerator for an abbreviated form of records, retrieving only Headers with the given optional arguments.

See Fieldhand::Repository#records for supported arguments.

May raise a NetworkError if there is a problem contacting the repository, or any descendant of ProtocolError if the repository responds with an error.
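
Because headers carry set membership and deletion status without the metadata payload, they are a natural fit for cheap summaries. As a minimal sketch, the hypothetical helper below (not part of Fieldhand) counts non-deleted headers per set spec; it only assumes objects that respond to #sets and #deleted?, as Header is documented to do.

```ruby
# Hypothetical helper (not part of Fieldhand): count non-deleted headers per
# set spec. `headers` is any Enumerable of objects responding to #sets and
# #deleted?, such as the Enumerator returned by repository.identifiers.
def live_headers_per_set(headers)
  counts = Hash.new(0)

  headers.each do |header|
    next if header.deleted?

    header.sets.each { |spec| counts[spec] += 1 }
  end

  counts
end

# Usage with a repository (requires the fieldhand gem):
#   live_headers_per_set(repository.identifiers(:metadata_prefix => 'oai_dc'))
```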

Fieldhand::Repository#get(identifier[, arguments])

repository.get('oai:www.example.com:1')
repository.get('oai:www.example.com:1', :metadata_prefix => 'oai_dc')
#=> #<Fieldhand::Record: ...>

Return an individual metadata Record from a repository with the given identifier and optional :metadata_prefix argument (defaults to oai_dc).

May raise a NetworkError if there is a problem contacting the repository, or any descendant of ProtocolError if the repository responds with an error.

Fieldhand::Identify

A class to represent information about a repository as returned from the Identify request.

Fieldhand::Identify#name

repository.identify.name
#=> "Repository Name"

Return a human-readable name for the repository as a String.

Fieldhand::Identify#base_url

repository.identify.base_url
#=> #<URI::HTTP http://www.example.com/oai>

Returns the base URL of the repository as a URI.

Fieldhand::Identify#protocol_version

repository.identify.protocol_version
#=> "2.0"

Returns the version of the OAI-PMH protocol supported by the repository as a String.

Fieldhand::Identify#earliest_datestamp

repository.identify.earliest_datestamp
#=> 2011-01-01 00:00:00 UTC
repository.identify.earliest_datestamp
#=> #<Date: 2001-01-01 ((2451911j,0s,0n),+0s,2299161j)>

Returns the guaranteed lower limit of all datestamps recording changes, modifications, or deletions in the repository as a Time or Date. Note that the datestamp will be at the finest granularity supported by the repository.
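
Since the return type depends on the repository's granularity, client code that compares datestamps may want to normalise them first. The following sketch shows one possible approach; the helper name datestamp_to_time is hypothetical and not part of Fieldhand.

```ruby
require 'date'

# Hypothetical helper (not part of Fieldhand): coerce a datestamp that may be
# either a Time or a Date into a UTC Time so values can be compared directly.
def datestamp_to_time(datestamp)
  case datestamp
  when Time then datestamp.getutc
  when Date then Time.utc(datestamp.year, datestamp.month, datestamp.day)
  else raise ArgumentError, "unsupported datestamp: #{datestamp.inspect}"
  end
end

# Usage with a repository (requires the fieldhand gem):
#   datestamp_to_time(repository.identify.earliest_datestamp)
```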

Fieldhand::Identify#deleted_record

repository.identify.deleted_record
#=> "persistent"

Returns the manner in which the repository supports the notion of deleted records as a String. Legitimate values are no, transient and persistent, with meanings defined in the OAI-PMH specification's section on deletion.

Fieldhand::Identify#granularity

repository.identify.granularity
#=> "YYYY-MM-DDThh:mm:ssZ"

Returns the finest harvesting granularity supported by the repository as a String. The legitimate values are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ with meanings as defined in ISO 8601.

Fieldhand::Identify#admin_emails

repository.identify.admin_emails
#=> ["admin@example.com"]

Returns the e-mail addresses of administrators of the repository as an Array of Strings.

Fieldhand::Identify#compression

repository.identify.compression
#=> ["gzip", "deflate"]

Returns the compression encodings supported by the repository as an Array of Strings. The recommended values are those defined for the Content-Encoding header in Section 14.11 of RFC 2616 describing HTTP 1.1.

Fieldhand::Identify#descriptions

repository.identify.descriptions
#=> ["<description>..."]

Returns descriptions of this repository as an Array of Strings.

As descriptions can be in any format, Fieldhand doesn't attempt to parse descriptions but leaves parsing to the client.

Fieldhand::Identify#response_date

repository.identify.response_date
#=> 2017-05-08 11:21:38 +0100

Return the time and date that the response was sent.

Fieldhand::MetadataFormat

A class to represent a metadata format available from a repository.

Fieldhand::MetadataFormat#prefix

repository.metadata_formats.first.prefix
#=> "oai_dc"

Return the prefix of the metadata format to be used when requesting records as a String.

Fieldhand::MetadataFormat#schema

repository.metadata_formats.first.schema
#=> #<URI::HTTP http://www.openarchives.org/OAI/2.0/oai_dc.xsd>

Return the location of an XML Schema describing the format as a URI.

Fieldhand::MetadataFormat#namespace

repository.metadata_formats.first.namespace
#=> #<URI::HTTP http://www.openarchives.org/OAI/2.0/oai_dc/>

Return the XML Namespace URI for the format as a URI.

Fieldhand::MetadataFormat#response_date

repository.metadata_formats.first.response_date
#=> 2017-05-08 11:21:38 +0100

Return the time and date that the response was sent.

Fieldhand::Set

A class representing an optional construct for grouping items for the purpose of selective harvesting.

Fieldhand::Set#spec

repository.sets.first.spec
#=> "A"

Return the unique identifier for the set, which is also the path from the root of the set hierarchy to the respective node, as a String.
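
In OAI-PMH, a set spec is a colon-separated path, so the ancestors of a set can be derived from the spec alone. The sketch below illustrates this; the helper name set_spec_ancestors and the example spec are assumptions for illustration and not part of Fieldhand.

```ruby
# Hypothetical helper (not part of Fieldhand): expand a colon-separated set
# spec into the specs of every node on the path from the root to the set.
def set_spec_ancestors(spec)
  parts = spec.split(':')

  (1..parts.length).map { |depth| parts.first(depth).join(':') }
end

# Usage with a repository (requires the fieldhand gem):
#   set_spec_ancestors(repository.sets.first.spec)
```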

Fieldhand::Set#name

repository.sets.first.name
#=> "Set A."

Return a short human-readable String naming the set.

Fieldhand::Set#descriptions

repository.sets.first.descriptions
#=> ["<setDescription>..."]

Return an Array of Strings of any optional and repeatable containers that may hold community-specific XML-encoded data about the set.

Fieldhand::Set#response_date

repository.sets.first.response_date
#=> 2017-05-08 11:21:38 +0100

Return the time and date that the response was sent.

Fieldhand::Record

A class representing a record from the repository:

A record is metadata expressed in a single format.

Fieldhand::Record#deleted?

repository.records.first.deleted?
#=> true

Return whether or not a record is deleted as a Boolean.

Fieldhand::Record#status

repository.records.first.status
#=> "deleted"

Return the optional status attribute of the record's header as a String or nil.

[A] value of deleted indicates the withdrawal of availability of the specified metadata format for the item, dependent on the repository support for deletions.
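
When mirroring a repository locally, deleted records are typically removed rather than stored. As a minimal sketch of that idea, the hypothetical helper below (not part of Fieldhand) applies a single record to a Hash keyed by identifier, relying only on the #deleted?, #identifier and #metadata methods documented for Record.

```ruby
# Hypothetical helper (not part of Fieldhand): apply one harvested record to a
# local Hash keyed by identifier, removing entries for deleted records.
def apply_record(store, record)
  if record.deleted?
    store.delete(record.identifier)
  else
    store[record.identifier] = record.metadata
  end

  store
end

# Usage with a repository (requires the fieldhand gem):
#   store = {}
#   repository.records.each { |record| apply_record(store, record) }
```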

Fieldhand::Record#identifier

repository.records.first.identifier
#=> "oai:www.example.com:1"

Return the unique identifier for this record in the repository.

Fieldhand::Record#datestamp

repository.records.first.datestamp
#=> 2011-03-03 16:29:24 UTC

Return the date of creation, modification or deletion of the record for the purpose of selective harvesting as a Time or Date depending on the granularity of the repository.

Fieldhand::Record#sets

repository.records.first.sets
#=> ["A", "B"]

Return an Array of String set specs indicating set memberships of this record.

Fieldhand::Record#to_xml

repository.records.first.to_xml
#=> "<record><metadata>...</metadata></record>"

Return the record as a String of XML.

Fieldhand::Record#metadata

repository.records.first.metadata
#=> "<metadata>..."

Return a single manifestation of the metadata from a record as a String or nil if this is a deleted record.

As the metadata can be in any format supported by the repository, Fieldhand doesn't attempt to parse the metadata but leaves parsing to the client.
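
For example, a client harvesting oai_dc records might parse the metadata String with REXML. This is only a sketch: the helper name dc_titles is hypothetical, and it assumes the metadata uses the standard Dublin Core element namespace.

```ruby
require 'rexml/document'

# The Dublin Core element namespace used by oai_dc records.
DC_NAMESPACE = 'http://purl.org/dc/elements/1.1/'

# Hypothetical helper (not part of Fieldhand): extract every dc:title value
# from a metadata XML String, returning an empty Array for deleted records
# whose metadata is nil.
def dc_titles(metadata_xml)
  return [] if metadata_xml.nil?

  document = REXML::Document.new(metadata_xml)
  REXML::XPath.match(document, '//dc:title', 'dc' => DC_NAMESPACE).map { |element| element.text }
end

# Usage with a repository (requires the fieldhand gem):
#   repository.records.each { |record| puts dc_titles(record.metadata) }
```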

Fieldhand::Record#about

repository.records.first.about
#=> ["<about>..."]

Return an Array of Strings of any optional and repeatable containers holding data about the metadata part of the record.

Fieldhand::Record#response_date

repository.records.first.response_date
#=> 2017-05-08 11:21:38 +0100

Return the time and date that the response was sent.

Fieldhand::Header

A class representing the header of a record:

Contains the unique identifier of the item and properties necessary for selective harvesting. The header consists of the following parts:

  • the unique identifier -- the unique identifier of an item in a repository;
  • the datestamp -- the date of creation, modification or deletion of the record for the purpose of selective harvesting.
  • zero or more setSpec elements -- the set membership of the item for the purpose of selective harvesting.
  • an optional status attribute with a value of deleted indicates the withdrawal of availability of the specified metadata format for the item, dependent on the repository support for deletions.

Fieldhand::Header#deleted?

repository.identifiers.first.deleted?
#=> true

Return whether or not a record is deleted as a Boolean.

Fieldhand::Header#status

repository.identifiers.first.status
#=> "deleted"

Return the optional status attribute of the header as a String or nil.

[A] value of deleted indicates the withdrawal of availability of the specified metadata format for the item, dependent on the repository support for deletions.

Fieldhand::Header#identifier

repository.identifiers.first.identifier
#=> "oai:www.example.com:1"

Return the unique identifier for this record in the repository.

Fieldhand::Header#datestamp

repository.identifiers.first.datestamp
#=> 2011-03-03 16:29:24 UTC

Return the date of creation, modification or deletion of the record for the purpose of selective harvesting as a Time or Date depending on the granularity of the repository.

Fieldhand::Header#sets

repository.identifiers.first.sets
#=> ["A", "B"]

Return an Array of String set specs indicating set memberships of this record.

Fieldhand::Header#response_date

repository.identifiers.first.response_date
#=> 2017-05-08 11:21:38 +0100

Return the time and date that the response was sent.

Fieldhand::NetworkError

An error (descended from StandardError) to represent any network issues encountered during interaction with the repository. Any underlying exception is exposed in Ruby 2.1 onwards through Exception#cause.

Fieldhand::ResponseError

An error (descended from NetworkError) to represent any issues in the response from the repository. If the HTTP request is not successful (returning a status code other than 200), a ResponseError exception will be raised containing the error message and the response object.

Fieldhand::ResponseError#response

begin
  repository.records.each do |record|
    # ...
  end
rescue Fieldhand::ResponseError => e
  puts e.response
  #=> #<Net::HTTPServiceUnavailable 503 Service Unavailable readbody=true>
end

Returns the unsuccessful Net::HTTPResponse that caused this error.
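
Since the response is a standard Net::HTTPResponse, its class and status line can guide error handling. The sketch below shows one possible policy (retrying only on 5xx responses); the helper names are hypothetical and the policy itself is an assumption, not something Fieldhand prescribes.

```ruby
require 'net/http'

# Hypothetical helper (not part of Fieldhand): treat only server-side (5xx)
# failures as worth retrying later.
def retryable_response?(response)
  response.is_a?(Net::HTTPServerError)
end

# Hypothetical helper (not part of Fieldhand): summarise a response for logs.
def describe_response(response)
  "#{response.code} #{response.message}"
end

# Usage (requires the fieldhand gem):
#   begin
#     repository.records.each { |record| ... }
#   rescue Fieldhand::ResponseError => e
#     warn describe_response(e.response)
#     raise unless retryable_response?(e.response)
#   end
```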

Fieldhand::ProtocolError

The parent error class (descended from StandardError) for any errors returned by a repository as defined in the protocol's Error and Exception Conditions.

This can be used to rescue all the following child error types.
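
As an illustration, a client might treat NoRecordsMatchError as an empty harvest while letting other protocol errors propagate. In the sketch below, the helper name harvest_identifiers is hypothetical, and the stand-in classes exist only so the example can run without the gem installed; they mirror the documented hierarchy but are not Fieldhand's implementation.

```ruby
begin
  require 'fieldhand'
rescue LoadError
  # Stand-in classes (assumption for this sketch only) mirroring the
  # documented error hierarchy so the example runs without the gem.
  module Fieldhand
    class ProtocolError < StandardError; end
    class NoRecordsMatchError < ProtocolError; end
  end
end

# Hypothetical helper (not part of Fieldhand): return the identifiers of all
# matching records, treating "no records match" as an empty result. Any other
# ProtocolError is left to propagate to the caller.
def harvest_identifiers(repository, arguments = {})
  repository.records(arguments).map { |record| record.identifier }
rescue Fieldhand::NoRecordsMatchError
  []
end

# Usage with a repository:
#   harvest_identifiers(repository, :from => '2001-01-01')
```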

Fieldhand::BadArgumentError

The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax.

Fieldhand::BadResumptionTokenError

The value of the resumptionToken argument is invalid or expired.

Fieldhand::BadVerbError

Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated.

Fieldhand::CannotDisseminateFormatError

The metadata format identified by the value given for the metadataPrefix argument is not supported by the item or by the repository.

Fieldhand::IdDoesNotExistError

The value of the identifier argument is unknown or illegal in this repository.

Fieldhand::NoRecordsMatchError

The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list.

Fieldhand::NoMetadataFormatsError

There are no metadata formats available for the specified item.

Fieldhand::NoSetHierarchyError

The repository does not support sets.

Acknowledgements

License

Copyright © 2017-2019 Altmetric and Paul Mucur

Distributed under the MIT License.