eResearch

UQ Data Collections Registry

Project overview

The UQ Data Collections Registry project developed an institutional metadata store for The University of Queensland.

The UQ Data Collections Registry (UQ-DCR) performs four main functions:

  • Aggregation of metadata from sources across the university.
  • Allocation of NLA party identifiers to parties from the university.
  • Alignment of the metadata with information sourced from authoritative databases at the university.
  • Publication of the metadata records.

Figure: Architectural overview of the UQ Data Collections Registry

Aggregation of metadata

The UQ-DCR aggregates metadata records from sources across the university. Research data collections are distributed across the university in faculties and research groups.

The main sources of research data collection records for the UQ-DCR are research databases. These are systems designed to store, process and manage the research data. These systems are used directly by the researcher to manage and use the actual research data, so the collection metadata can be kept consistent with the research data and created as a part of the research process.

The UQ-DCR automatically harvests metadata records from these research databases:

Anthropology Museum
A research database that contains digital records of anthropological and archaeological artefacts from The University of Queensland Anthropology Museum.
Diffraction Image Repository (DIMER)
A research database that contains diffraction images from the UQ Remote Operation Crystallization and X-Ray Diffraction Facility (UQROCX).
Spatially Integrated Social Science (SISS)
SISS is a research database that contains geospatial and statistical analysis of Australian Bureau of Statistics census data, Australian Electoral Commission voting data and simulations from the National Centre for Social and Economic Modelling.
Microscopy Image Repository (MIRAGE)
A research database that contains microscopy images obtained from electron microscopes and other instruments from the UQ Centre for Microscopy and Microanalysis (CMM).

OzTrack is a research database that contains animal tracking data collections. It is currently undergoing redevelopment and is not yet available for harvesting by the UQ-DCR, but it can be added once it becomes available.

Additional research databases can be added to the UQ-DCR as they become available. The UQ-DCR harvests from the research databases using standard formats (RIF-CS and Atom-RDC) and standard protocols (OAI-PMH and Atom-PMH).
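As an illustration of how an OAI-PMH harvest proceeds, the following is a minimal Python sketch of paging through a ListRecords response by following resumption tokens. It is not taken from the UQ-DCR software. The `fetch` callable is a hypothetical stand-in for an HTTP GET against a research database, and the default `metadataPrefix` value of "rif" is an assumption that a given source may not support.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_records(fetch: Callable[[Dict[str, str]], str],
                    metadata_prefix: str = "rif") -> Iterator[ET.Element]:
    """Yield every <record> element from an OAI-PMH ListRecords harvest.

    `fetch` is a hypothetical callable that takes the request parameters
    and returns the response body as an XML string.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            return  # error response, or nothing to harvest
        for record in list_records.findall(OAI_NS + "record"):
            yield record
        token = list_records.find(OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token marks the last page
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}
```

Passing the transport in as a function keeps the paging logic independent of any particular HTTP library, so the same loop can be exercised against canned XML responses.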

There are also collection records that do not come from research databases. These collection records have been manually created by researchers and librarians. The UQ-DCR currently contains manually entered metadata records from:

The UQ Seeding the Commons project
These are collection records created by an ANDS-funded Seeding the Commons project at UQ. These collection records were stored in the UQ DataSpace system, but have now been copied into the UQ-DCR.
The Urban Water Security Research Alliance (UWSRA) pilot project
These are collection records created by the Urban Water Security Research Alliance (UWSRA), of which the UQ Advanced Water Management Centre (AWMC) was a member.

Allocation of NLA Party Identifiers

The UQ-DCR allocates NLA party identifiers to people and organisational units from UQ that do not yet have them.

NLA party identifiers are identifiers issued by the National Library of Australia (NLA) Trove and People Australia systems. Identifiers are needed to reliably identify people, since names are not always unique.

The NLA party identifiers are used in the party records to ensure reliable matching and attribution outside UQ. This is important for a service like ANDS Research Data Australia, which aggregates metadata from multiple institutions across Australia. For example, when a researcher moves from one institution to another, the identifier is used to positively identify them as the same person; and when two different researchers have similar names, they can be distinguished by their different identifiers.

The UQ-DCR ensures that party records for UQ people and organisational units include an NLA party identifier. This allows ANDS Research Data Australia to positively identify the parties and to correctly associate those parties with the collection records.

Alignment with Institutional Data

The UQ-DCR aligns the harvested metadata records with authoritative information from the university using the UQ Data Hub provided by UQ Information Technology Services (ITS). The UQ Data Hub obtains its information from sources of truth in the university, such as the Human Resources database and Research Master.

This alignment improves the quality of the metadata records by enhancing them with more accurate information. For example, the HR database can supply the correct names and Thomson Reuters ResearcherID for a researcher's party record, and Research Master can supply grant titles and collaborators for an activity record.
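The idea of alignment can be sketched in a few lines of Python. This is an illustrative sketch only, not the UQ-DCR implementation: the `PartyRecord` fields, the `key` used for matching and the shape of the `authoritative` lookup are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class PartyRecord:
    key: str                            # hypothetical local key for the party
    name: str
    researcher_id: Optional[str] = None


def align_party(harvested: PartyRecord,
                authoritative: Dict[str, Dict[str, str]]) -> PartyRecord:
    """Return a copy of `harvested` with authoritative values applied.

    `authoritative` maps a party key to the fields held by the source of
    truth. Fields the source does not hold keep their harvested values.
    """
    truth = authoritative.get(harvested.key)
    if truth is None:
        return harvested  # no authoritative entry, so leave the record alone
    return replace(
        harvested,
        name=truth.get("name", harvested.name),
        researcher_id=truth.get("researcher_id", harvested.researcher_id),
    )
```

The authoritative value wins wherever one exists, and the harvested record is never modified in place, so the original harvest remains available for comparison.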

Publication of Metadata Records

The UQ-DCR publishes the metadata records as a machine-readable feed that other systems can harvest. The metadata records are represented using RIF-CS and served over OAI-PMH at http://research.data.uq.edu.au/oai.
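OAI-PMH requests are plain HTTP requests whose `verb` parameter selects the operation. The Python sketch below builds request URLs for the feed address given above. `Identify`, `ListMetadataFormats` and `ListRecords` are standard OAI-PMH verbs, but the `metadataPrefix` value "rif" is an assumption; a `ListMetadataFormats` request reports the prefixes a feed actually supports. The helper name `oai_request_url` is hypothetical.

```python
from urllib.parse import urlencode

FEED_URL = "http://research.data.uq.edu.au/oai"  # feed address from the text


def oai_request_url(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return FEED_URL + "?" + urlencode({"verb": verb, **arguments})


# Ask the repository to describe itself.
print(oai_request_url("Identify"))
# Ask which metadata formats the feed offers.
print(oai_request_url("ListMetadataFormats"))
# Request records; the "rif" prefix is an assumed value, see above.
print(oai_request_url("ListRecords", metadataPrefix="rif"))
```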

The feed is harvested by the National Library of Australia (NLA) Trove system. This is a part of the process for allocating the NLA party identifiers. Trove harvests the party records from the UQ-DCR, and allocates NLA party identifiers to them. The UQ-DCR then retrieves the NLA party identifiers from Trove, and adds them to the party records. Trove also creates an entry for the party in its system.
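The last step of this round trip, adding the retrieved identifiers to the party records, might look like the following Python sketch. It is a simplified illustration under stated assumptions: party records are modelled as plain dictionaries with hypothetical `key` and `nla_id` fields, and `lookup` is a hypothetical stand-in for whatever query retrieves an allocated identifier from Trove.

```python
from typing import Callable, Dict, List, Optional

Party = Dict[str, Optional[str]]


def add_nla_identifiers(parties: List[Party],
                        lookup: Callable[[str], Optional[str]]) -> List[Party]:
    """Return party records with any newly allocated NLA identifiers added.

    `lookup` takes a party key and returns the allocated identifier, or
    None if no identifier has been allocated to that party yet.
    """
    updated = []
    for party in parties:
        if party.get("nla_id") is None:
            nla_id = lookup(party["key"])
            if nla_id is not None:
                # Copy rather than mutate, so the input list is unchanged.
                party = {**party, "nla_id": nla_id}
        updated.append(party)
    return updated
```

Parties that have not yet been allocated an identifier pass through unchanged, so the same function can simply be run again after a later harvest.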

The feed is also harvested by the ANDS Research Data Australia (RDA) system, a national metadata store that aggregates metadata records from institutions across Australia in order to publish them and make them discoverable.

Reference Group

The project Reference Group included representatives from:
  • UQ Office of the Deputy Vice-Chancellor (Research)
  • UQ Office of the Pro-Vice-Chancellor (Research and International)
  • UQ Library
  • UQ Information Technology Services (ITS)
  • UQ Research Computing Centre (RCC)
  • UQ eResearch Lab

Documentation

UQ Data Collections Registry: Architecture 2.0
Documents the architecture of the UQ-DCR. Architecture PDF document.
Atom Feed Protocol for Metadata Harvesting (Atom-PMH)
Protocol used to harvest metadata records. Specification of Atom-PMH.
Atom representation of Research Data Context (Atom-RDC)
Format used to represent metadata records. (Developed by the Seeding the Commons at UQ project.) Specification of Atom-RDC.

Source code

The software developed by the project has been open sourced under the BSD 2-Clause Licence and is available on GitHub.

Miletus
This software harvests metadata from research databases, queries SRU data sources, merges metadata records and publishes them as an OAI-PMH feed. In the UQ-DCR it harvests the research databases and a deployment of the Thales software (see below), aligns the harvested records with information from NLA Trove and the UQ Data Hub, and publishes the resulting metadata records on the feed.
https://github.com/uq-eresearch/miletus
Thales
This software is an editor for manually created metadata records. A deployment of this software holds the manually created metadata records.
https://github.com/uq-eresearch/thales
ODS-SRU interface
This software provides a Search/Retrieve via URL (SRU) interface to the data held in a relational database backend. It presents the data from the UQ Data Hub as an SRU service for a deployment of Miletus to query. Although the UQ Data Hub is only available at The University of Queensland, this software can serve as the basis for other relational-database-to-SRU interfaces.
https://github.com/uq-eresearch/ods-sru-interface
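
For readers unfamiliar with SRU, a searchRetrieve request is an HTTP GET whose parameters carry a CQL query. The Python sketch below builds such a request URL. The parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU standard, but the base URL and the CQL index name in the usage line are hypothetical and do not describe the actual ODS-SRU interface.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str,
                   maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(maximum_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and index name, for illustration only.
print(sru_search_url("http://sru.example.org/staff", "name=citizen", 5))
```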

Acknowledgement

The project ran from 2 April 2012 through to 1 April 2013.

The project is part of the ANDS Metadata Stores Program.


This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.