Mirage System Architecture

Mirage Architecture

Microscopes

A typical electron microscope consists of a "column" with an electron source at one end, high-voltage gear to focus and direct an electron beam onto a target, and a variety of detectors and cameras to capture images and "analysis" data from the beam as it passes through or bounces off the target. The hardware and electronics are typically controlled by one (or more) computers running some version of Microsoft Windows. Each computer typically has a hard drive and a network interface. Some of CMM's instruments are 15 or more years old, and have old (and fragile) computers running very old versions of Windows. (In reality, not all CMM instruments are electron microscopes. However, all of the data-producing instruments can be catered for by the architecture we describe here, possibly with minor tweaks.)

At the CMM, most instruments are configured so that images and other data captured by the instrument can be written to a permanently mounted network share drive. Each instrument mounts its own read-write share at boot time, and researchers are trained to "save to S:" rather than to the machine's hard drive. (This is reinforced by resetting the instruments' disks every night as part of CMM's virus mitigation scheme.)

The instrument computers per se don't have any access control. At the Windows level, each instrument is permanently logged in using a shared account. This arrangement means that we don't have to integrate Windows-level access control on the instrument computers with anything else. However, it also means that there is no direct way of knowing who "owns" the data files saved to the shared drive. This is where ACLS comes into the picture.

ACLS and the login console

The AC Lab System (or ACLS) is a proprietary laboratory management system developed at the University of New South Wales (UNSW).  CMM's ACLS installation performs a number of important tasks for the Centre:

  • It keeps a record of all of CMM's clients, and their associated accounting details.
  • It keeps track of who has been trained and "certified" to use which instruments.
  • It allows users to book sessions on the primary instruments and on ancillary equipment for sample preparation and so on.
  • It provides mail-outs to users, usage accounting and reporting, and billing.

ACLS also provides a simple "login console" application that can be installed on an instrument computer. This is essentially a screen blocker. When it is maximized, it prompts the user for an account name and password. When these are provided, the console program checks with the ACLS server to see if the user is authorized to use the instrument, and then minimizes itself.

The ACLS Proxy ("Eccles")

Mirage "taps" into the ACLS login console to find out who is logged into which instruments and when, and hence to infer who owns the files written to each instrument's share. The problem is that the ACLS server is a "closed" system. It provides no public APIs that you could use to find out who is logged in, and its database is implemented in such a way that "back door" access is impractical too. So in order to find out who is logged in, we have developed an ACLS proxy that sits between the login console and the ACLS server, and keeps track of the login and logout events. The ACLS proxy can also act as a secondary (slave) service for the primary ACLS server in the event that the primary becomes unavailable.
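The essential job of the proxy, from Mirage's point of view, is to turn a stream of login and logout events into a lookup of "who was logged into instrument X at time T". The following is a minimal sketch of that bookkeeping, not Eccles' actual implementation; the class names, method names and the in-memory list are all hypothetical, and a real proxy would persist the events and parse them out of the ACLS console protocol, which is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Session:
    """One login session on an instrument, as observed by the (hypothetical) proxy log."""
    session_id: str
    user: str
    instrument: str
    login: datetime
    logout: Optional[datetime] = None  # None while the session is still open


class SessionLog:
    """Hypothetical in-memory record of login/logout events relayed through the proxy."""

    def __init__(self) -> None:
        self._sessions: list[Session] = []

    def record_login(self, session_id: str, user: str, instrument: str, when: datetime) -> None:
        self._sessions.append(Session(session_id, user, instrument, when))

    def record_logout(self, instrument: str, when: datetime) -> None:
        # Close whatever session is still open on this instrument.
        for s in self._sessions:
            if s.instrument == instrument and s.logout is None:
                s.logout = when

    def who_was_logged_in(self, instrument: str, when: datetime) -> Optional[Session]:
        """Return the most recent session on `instrument` that covers `when`, or None."""
        for s in reversed(self._sessions):
            if s.instrument != instrument or s.login > when:
                continue
            if s.logout is None or when < s.logout:
                return s
        return None
```

A lookup that returns None corresponds to the case where files appear on a share while (apparently) nobody was logged in, which is what the Data Grabber's "claiming" page is for.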

The Data Grabber ("Paul")

The next part of the picture is the Data Grabber service, which runs on the machine that hosts the instruments' file system shares and prepares data for ingestion into the main data repository.

The core of the Data Grabber is a "file watcher" that monitors file system events that correspond to files being written by an instrument. When an event is detected, the Data Grabber first determines which instrument did the writing, and then starts looking for related events; i.e. more writes to the same file, or writes to another file with the same basename in the same directory. When the file events have finished (based on a per-instrument heuristic), the Data Grabber writes a snapshot of the set of related files to a private area and then assembles a package of administrative metadata. This includes:

  • the name of the instrument,
  • the name of the user who was logged in at the time (obtained from the ACLS Proxy),
  • the user's session identifier,
  • the files' timestamps, and
  • a cryptographic checksum of each of the files.
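The grouping and packaging steps above can be sketched as follows. This is an illustrative sketch, not Paul's actual code: it assumes the "per-instrument heuristic" is a simple quiet period (a grab is finished once no related write has been seen for N seconds), it assumes SHA-256 as the cryptographic checksum, and it omits the snapshot copy to the private area. All class and function names, and the 30-second default, are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PendingGrab:
    """Related files (same directory and basename) written by one instrument."""
    instrument: str
    files: set = field(default_factory=set)
    last_event: float = 0.0  # time of the most recent related write, in seconds


class Grouper:
    """Collects file-write events into grabs keyed by (directory, basename).

    `quiet_seconds` stands in for the per-instrument heuristic (an assumption):
    a grab is finished once no related write has been seen for that long.
    """

    def __init__(self, quiet_seconds: dict[str, float]) -> None:
        self.quiet_seconds = quiet_seconds
        self.pending: dict[tuple[str, str], PendingGrab] = {}

    def on_write(self, instrument: str, path: str, now: float) -> None:
        directory, name = os.path.split(path)
        key = (directory, os.path.splitext(name)[0])  # same basename => same grab
        grab = self.pending.setdefault(key, PendingGrab(instrument))
        grab.files.add(path)
        grab.last_event = now

    def finished(self, now: float) -> list[PendingGrab]:
        """Remove and return grabs whose quiet period has elapsed (default 30 s, hypothetical)."""
        done = [k for k, g in self.pending.items()
                if now - g.last_event >= self.quiet_seconds.get(g.instrument, 30.0)]
        return [self.pending.pop(k) for k in done]


def sha256_of(path: str) -> str:
    """Checksum a file in 1 MiB chunks so large images are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_metadata(grab: PendingGrab, user: str, session_id: str) -> dict:
    """Assemble the administrative metadata listed above for one finished grab."""
    return {
        "instrument": grab.instrument,
        "user": user,              # from the ACLS Proxy session lookup
        "session_id": session_id,
        "files": [
            {
                "path": p,
                "modified": datetime.fromtimestamp(os.path.getmtime(p), timezone.utc).isoformat(),
                "sha256": sha256_of(p),
            }
            for p in sorted(grab.files)
        ],
    }
```

In use, the file watcher calls `on_write` for each event, and a periodic timer calls `finished` and then `build_metadata` with the user and session obtained from the ACLS Proxy.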

The files and metadata for each "grab" are then exposed as an ATOM feed, from which the Mirage MyTardis instance ingests each grab as a MyTardis Dataset.
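To make the feed concrete, here is a minimal sketch that renders one grab as an Atom entry using Python's standard ElementTree. The exact entry layout that the Mirage ATOM ingestion app expects is not described here, so the element mapping, the `urn:example:mirage-grabber` extension namespace, the URL scheme and the shape of the `metadata` dictionary (instrument, user, session_id, and per-file path, ISO 8601 modified time and sha256) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "http://www.w3.org/2005/Atom"
GRAB = "urn:example:mirage-grabber"  # hypothetical extension namespace
ET.register_namespace("", ATOM)
ET.register_namespace("grab", GRAB)


def sub(parent, ns, tag, text=None, **attrs):
    """Append a namespaced child element and return it."""
    e = ET.SubElement(parent, f"{{{ns}}}{tag}", attrs)
    if text is not None:
        e.text = text
    return e


def grab_to_atom_entry(grab_id: str, metadata: dict, base_url: str) -> ET.Element:
    """Render one grab as an Atom <entry> with one enclosure link per file."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    sub(entry, ATOM, "id", f"{base_url}/grabs/{grab_id}")
    sub(entry, ATOM, "title", f"{metadata['instrument']} grab {grab_id}")
    # Atom requires <updated>; use the latest file timestamp in the grab.
    latest = max(datetime.fromisoformat(f["modified"]) for f in metadata["files"])
    sub(entry, ATOM, "updated", latest.isoformat())
    author = sub(entry, ATOM, "author")
    sub(author, ATOM, "name", metadata["user"])
    sub(entry, GRAB, "instrument", metadata["instrument"])
    sub(entry, GRAB, "session", metadata["session_id"])
    for f in metadata["files"]:
        name = f["path"].rsplit("/", 1)[-1]
        sub(entry, ATOM, "link", rel="enclosure", href=f"{base_url}/grabs/{grab_id}/{name}")
        sub(entry, GRAB, "checksum", f["sha256"], file=name, algorithm="sha256")
    return entry
```

`ET.tostring(entry, encoding="unicode")` serializes the entry; a feed is then an Atom `<feed>` element wrapping the entries for recent grabs.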

The Data Grabber also provides ancillary user and administrative functionality via a web interface. This includes:

  • a page for doing proxy login in the case where an instrument's ACLS login console is not working,
  • a page for "claiming" files that were created on an instrument when (apparently) nobody was logged in,
  • administration pages for configuring instruments and turning grabbing and the ATOM feed off and on, and
  • an administration page for checking for data grabbing anomalies.

Mirage MyTardis

The final part of the Mirage architecture is the Mirage MyTardis instance. MyTardis is a somewhat generic web-based portal that is designed to provide user-centric data management services for research data. From a user's perspective, MyTardis stores your research data and associated metadata, and allows you to organize it, view it, share it and publish it. From a system management perspective, it implements an access control model, and has facilities for ingestion and syndication of data and metadata.

The Mirage MyTardis is a standard MyTardis instance with some additional "apps".  These include:

  • an authentication plugin that allows us to tap into ACLS for checking user names and passwords,
  • an ATOM feed app that ingests the Datasets prepared by the Data Grabber,
  • a collection syndication app that provides an OAI-PMH feed of RIF-CS data collection metadata, and
  • a data migration app that moves datafiles between different online and (in the future) offline storage media managed by the Mirage MyTardis instance.
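On the consuming side, a harvester pulls the RIF-CS records from the OAI-PMH feed with `ListRecords` requests, following `resumptionToken`s until the list is exhausted. The sketch below shows that request/response cycle using only the standard library. The base URL is hypothetical, and the `rif` metadata prefix is an assumption; the actual prefix the Mirage app advertises can be checked with a `ListMetadataFormats` request.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url: str, metadata_prefix: str = "rif",
                     from_date: Optional[str] = None, token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL ("rif" prefix is an assumption)."""
    if token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return (record identifiers, resumption token or None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(f"{{{OAI}}}identifier") for h in root.iter(f"{{{OAI}}}header")]
    tok = root.find(f".//{{{OAI}}}resumptionToken")
    # An empty resumptionToken element marks the last page of results.
    return ids, (tok.text if tok is not None and tok.text else None)
```

A harvester would fetch `list_records_url(...)`, parse the response, and repeat with the returned token until it comes back as None.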

MyTardis can also be configured to syndicate data to other MyTardis instances.