
Metadata Tool Development at DSTC, 1997-2006


  1. Background
  2. Metadata Schemas
    1. A Complication
  3. A Generic Tool
    1. More Complications
    2. Pragmatic Solutions
  4. Tools Technology
    1. Common Aspects
    2. Thin-client with server-side processing
    3. Thin-client with client-side processing
    4. The Thick Client: MetaEdit
  5. Conclusions
  6. About the Author

Background

The Co-operative Research Centre for Distributed Systems Technology (DSTC) operated for 14 years under the Australian CRC grants scheme. During its lifespan, from 1992 to 2006, the Internet and the World Wide Web migrated from the realms of largely academic research to the mainstream. As part of its research program, DSTC established a group called the Resource Discovery Unit to investigate technologies and tools that all classes of users might employ to locate and use the rapidly increasing volume of on-line data and services becoming available.

The role that metadata could play in resource discovery was recognized early by DSTC researchers. Their efforts resulted in the first version of AS5044, more popularly known as the Australian Government Locator Service (AGLS), along with tools to create and maintain metadata to this and other standards. This paper gives an overview of the tools and approaches DSTC investigated, highlighting the advantages and disadvantages inherent in the different approaches. It is assumed that the reader understands the basic concepts of client-server technology, metadata and the use of a "schema" to describe metadata types.

Metadata Schemas

A short tour of the sites referenced by Metadata.Net will illustrate the wide variety of metadata schemas developed to describe resources. A detailed reading of the development history of the different schemas will help in understanding the justification each originating body considered sufficient when foisting yet another schema on the world. The number of schemas continues to grow and is not likely to stop or slow any time soon. Creating and maintaining metadata that conforms to all these differing standards clearly poses a problem, but it would seem to be one that a suitable software tool could easily address. Unfortunately, this turns out not to be the case.

It will be useful for the purposes of illustration to agree on some terminology. Predominantly, the metadata concerning us here takes the form of name/value pairs where the names are drawn from a specific, finite set. We shall refer to such a set as the metadata schema and to individual pairs as elements, where the name constitutes the element type and the value its content. A collection of elements describing an individual resource constitutes a record. To limit the proliferation of names within a schema, each name may have a qualifier appended to it. These qualifiers are also referred to as refinements. The schema that defines the element names with their optional refinements may also define additional attributes associated with elements, such as an encoding scheme that imposes semantic restrictions on the content of the element, or cardinality rules that constrain the number of elements of a specific type that a valid record may contain. Lastly, an element may have a language encoding associated with it, expressed using the RFC 1766 or ISO 639-2 codes for spoken languages.
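The terminology above maps naturally onto a small data model. The following Python sketch is illustrative only: the class and field names are hypothetical, not taken from any DSTC tool, and validation is limited to checking an element's type, refinement and encoding against the schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ElementDef:
    """Schema side: one element type, its refinements and encoding schemes."""
    name: str
    refinements: frozenset = frozenset()
    encodings: frozenset = frozenset()
    min_occurs: int = 0               # cardinality lower bound
    max_occurs: Optional[int] = None  # None means unbounded


@dataclass
class Element:
    """Record side: a name/value pair plus optional qualifiers."""
    type: str
    content: str
    refinement: Optional[str] = None
    encoding: Optional[str] = None
    language: Optional[str] = None    # e.g. "en" or "en-AU"


@dataclass
class Schema:
    name: str
    elements: dict                    # element name -> ElementDef

    def problems(self, element: Element) -> list:
        """List unknown types, refinements or encodings in one element."""
        d = self.elements.get(element.type)
        if d is None:
            return [f"unknown element type {element.type!r}"]
        found = []
        if element.refinement and element.refinement not in d.refinements:
            found.append(f"unknown refinement {element.refinement!r}")
        if element.encoding and element.encoding not in d.encodings:
            found.append(f"unknown encoding {element.encoding!r}")
        return found


@dataclass
class Record:
    """All the elements describing a single resource."""
    schema: str
    elements: list = field(default_factory=list)

    def count(self, type_name: str) -> int:
        return sum(1 for e in self.elements if e.type == type_name)


# A tiny Dublin Core-like schema and record, for illustration only.
dc = Schema("DC-subset", {
    "Title": ElementDef("Title", min_occurs=1),
    "Date": ElementDef("Date", refinements=frozenset({"Created", "Modified"}),
                       encodings=frozenset({"W3CDTF"})),
})
rec = Record("DC-subset", [
    Element("Title", "Annual Report", language="en"),
    Element("Date", "2006-06-30", refinement="Created", encoding="W3CDTF"),
])
```

Keeping the schema description (`ElementDef`) separate from the record content (`Element`) is what lets one editor serve many schemas.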

A Complication

Many schemas are "flat", which is to say the order of the elements within the record is of no semantic significance. Dublin Core (DC) and AGLS--which is based on DC--are examples of such schemas. More complex resources, however, may be better modelled by associating groups of elements within a record. This enables an element type such as Address, for example, to be reused within the groups that designate a resource's Custodian as distinct from a Contact. We will term these repeating element groups. Without such a concept, the element type Address would need to be prefixed or suffixed with additional words to distinguish its usage and ensure uniqueness.

As the full definition of a schema might require several such groupings, each comprising several elements, using unique names to express association quickly leads to name space pollution and cumbersome complexity. The approach breaks down completely if a schema permits multiple instances of an individual group, as there is no way to express which Contact_Name, for example, belongs with which Contact_Address!

The trend in metadata schemas is toward increasing use of repeating element groups of arbitrary depth in place of the simple flat schema. This is not a new concept by any means. It is simply another form of the old "Bill of Materials" pattern, where a component is composed of many sub-components. However, it is relatively new in metadata schemas, so tool support for it is uncommon, and lack of support can be a show-stopper.
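To make the difference concrete, here is a hypothetical Python sketch contrasting the two approaches. In the flat list, nothing records which Contact_Name goes with which Contact_Address. In the grouped form each Contact instance owns its elements, Address is reused under Custodian, and a recursive walk handles groups nested to any depth, much as a Bill of Materials is traversed. The record layout and function names are assumptions made for illustration.

```python
# Flat approach: association is carried only by a name prefix, so nothing
# says which Contact_Name goes with which Contact_Address.
flat_record = [
    ("Contact_Name", "A. Smith"),
    ("Contact_Name", "B. Jones"),
    ("Contact_Address", "1 Queen St"),
    ("Contact_Address", "2 King St"),
]

# Grouped approach: each group instance owns its elements, and the
# element type Address is reused under both Contact and Custodian.
grouped_record = {
    "Contact": [
        {"Name": "A. Smith", "Address": "2 King St"},
        {"Name": "B. Jones", "Address": "1 Queen St"},
    ],
    "Custodian": [
        {"Name": "Records Office", "Address": "3 George St"},
    ],
}


def address_of(record, group, name):
    """Return the Address belonging to the named instance of a group."""
    for instance in record.get(group, []):
        if instance.get("Name") == name:
            return instance.get("Address")
    return None


def leaf_paths(node, prefix=""):
    """Yield (path, value) for every leaf, whatever the nesting depth.

    Lists hold repeated group instances and dicts hold the elements of one
    instance; the index in each path keeps instances distinct.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from leaf_paths(child, f"{prefix}[{i}]")
    else:
        yield prefix, node
```

For example, `dict(leaf_paths(grouped_record))["Contact[1].Name"]` is `"B. Jones"`, and the path alone says which Address belongs to that contact.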

A Generic Tool

The description of metadata schema features just provided is by no means complete, but it should serve to illustrate the challenges associated with creating and maintaining metadata conforming to a specific schema. Creating, distributing and maintaining a tool specific to every metadata schema that exists is impractical due to the costs involved. But given the basic similarity in metadata schemas, a generic tool should be practical.

In general, all metadata maintenance tools share a core of generic functionality. This is the so-called CRUD model: Create, Read, Update, and Delete. Our generic editor would need to underpin these operations with a data model able to abstract all the foreseeable quirks of all possible schemas. All that then remains is a way of configuring the data model with the names, encodings, constraints, etc., for specific schemas. So an editor that loads the schema definition and associated rules from a suitably formatted description would appear both viable and cost-effective, but even this has practical problems that are not readily apparent.
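The CRUD core itself can be almost entirely schema-neutral. Here is a minimal Python sketch, assuming records are held as dictionaries mapping element names to lists of values and that a validator built from the schema description is passed in; `RecordStore` and `required` are hypothetical names.

```python
import itertools


class RecordStore:
    """Schema-agnostic CRUD over records held as {element_name: [values]}.

    The store knows nothing about any particular schema; the validator
    supplied by the caller is the only schema-specific part.
    """

    def __init__(self, validate):
        self._records = {}
        self._ids = itertools.count(1)
        self._validate = validate  # callable(record) -> list of problems

    def _check(self, record):
        problems = self._validate(record)
        if problems:
            raise ValueError("; ".join(problems))

    def create(self, record: dict) -> int:
        self._check(record)
        rid = next(self._ids)
        self._records[rid] = dict(record)  # shallow copy of the mapping
        return rid

    def read(self, rid: int) -> dict:
        return dict(self._records[rid])

    def update(self, rid: int, record: dict) -> None:
        if rid not in self._records:
            raise KeyError(rid)
        self._check(record)
        self._records[rid] = dict(record)

    def delete(self, rid: int) -> None:
        del self._records[rid]


def required(*names):
    """Build a validator that only checks for mandatory elements."""
    def validate(record):
        return [f"missing {n}" for n in names if not record.get(n)]
    return validate
```

Swapping `required("Title")` for a richer validator changes what counts as valid without touching the store.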

More Complications

Like a software application, any metadata schema definition that is in active use will not remain static over time. This introduces the second problem: version control. A record that conforms to an earlier revision of a standard may be invalid when viewed against a later revision of that standard. Hence some way is required to express not only the schema the metadata conforms to, but also the version of the schema it was created against. Sadly, experience shows that this small fact may be obvious to a software engineer, but not to a group of specialists in other fields who are tasked with maintaining metadata schemas and associated controlled term lists and thesauri. As every software engineer knows, version control is a non-trivial subject.
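One way to carry the version is sketched below in Python, under stated assumptions: every record stores the schema name and the revision it was created against, and validation looks up that exact revision. The schema `ExampleSchema` and its two revisions are invented for illustration; a record valid under its own revision 1.0 fails against revision 1.1.

```python
from dataclasses import dataclass, field

# Hypothetical revisions of one schema: (name, version) -> element names.
# Revision 1.1 renamed Subject to Keyword and added Rights.
SCHEMA_REVISIONS = {
    ("ExampleSchema", "1.0"): {"Title", "Creator", "Subject"},
    ("ExampleSchema", "1.1"): {"Title", "Creator", "Keyword", "Rights"},
}


@dataclass
class VersionedRecord:
    schema: str
    schema_version: str               # revision the record was created against
    elements: dict = field(default_factory=dict)


def unknown_elements(record, version=None):
    """Element names in the record that the chosen revision does not define.

    Without an explicit version, the record is checked against the revision
    it claims to conform to.
    """
    defined = SCHEMA_REVISIONS[(record.schema, version or record.schema_version)]
    return sorted(set(record.elements) - defined)


rec = VersionedRecord("ExampleSchema", "1.0",
                      {"Title": "Annual Report", "Subject": "Metadata"})
```

Here `unknown_elements(rec)` is empty, while `unknown_elements(rec, "1.1")` reports `["Subject"]`.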

Then there is the problem of ensuring that the metadata is "correct". Simple cardinality rules are easy to implement. For example, a cardinality rule can be imposed that says "A valid record must contain these elements." Not so easy to implement are constraints that say "A valid record must have this element, or these elements, but not this combination of elements." While it is by no means impossible to construct an abstract schema syntax that provides sufficient richness to express arbitrarily complex rules, the cure starts to become worse than the problem.
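The gap between the two kinds of rule shows up quickly in code. This hypothetical Python sketch handles min/max cardinality and one simple "at most one of these" exclusion; every further kind of constraint would need more code or a richer rule language, which is exactly the cost described above.

```python
from collections import Counter


def check_cardinality(record, rules):
    """record: list of (name, value); rules: name -> (min, max or None)."""
    counts = Counter(name for name, _ in record)
    problems = []
    for name, (low, high) in rules.items():
        n = counts[name]
        if n < low:
            problems.append(f"{name}: at least {low} required, found {n}")
        if high is not None and n > high:
            problems.append(f"{name}: at most {high} allowed, found {n}")
    return problems


def check_exclusive(record, groups):
    """Each group is a set of element names of which at most one may appear."""
    present = {name for name, _ in record}
    return [f"only one of {sorted(g)} allowed" for g in groups
            if len(present & g) > 1]


# Illustrative record that satisfies cardinality but breaks an exclusion.
record = [("Title", "Annual Report"), ("ISBN", "0306406152"), ("ISSN", "0000-0000")]
```

With the rule `{"Title": (1, 1)}` the record passes `check_cardinality`, while `check_exclusive(record, [{"ISBN", "ISSN"}])` flags the forbidden combination.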

To compound the complexity, a schema may require mutual exclusion between some element encodings and refinements, or may define element encoding schemes backed by controlled term lists and complex, hierarchical thesauri. A good editor should provide the user with a pick-list of the valid terms when the encoding is selected and warn the user when an invalid combination or content is used. For even more feature-creep, consider encodings for values that contain a date, or an International Standard Book Number (ISBN). A really good editor should be able to validate the date format, or the ISBN check digit. The list goes on, and on.
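As an example of encoding-specific validation, the ISBN check digit and a calendar date can each be verified in a few lines. The sketch below uses the standard ISBN-10 (weights 10 down to 1, modulo 11) and ISBN-13 (alternating weights 1 and 3, modulo 10) check-digit rules; the encoding-scheme names in `VALIDATORS` are hypothetical, and only the full YYYY-MM-DD date form is accepted.

```python
import re
from datetime import date

DIGITS = set("0123456789")


def valid_isbn10(text: str) -> bool:
    """ISBN-10: weights 10 down to 1; the weighted sum must divide by 11.
    A final 'X' stands for a check value of 10."""
    s = text.replace("-", "").replace(" ", "").upper()
    if len(s) != 10 or not all(c in DIGITS for c in s[:9]):
        return False
    if s[9] not in DIGITS and s[9] != "X":
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def valid_isbn13(text: str) -> bool:
    """ISBN-13: alternating weights 1 and 3; the sum must divide by 10."""
    s = text.replace("-", "").replace(" ", "")
    if len(s) != 13 or not all(c in DIGITS for c in s):
        return False
    return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s)) % 10 == 0


def valid_iso_date(text: str) -> bool:
    """Accept only a complete YYYY-MM-DD date that exists in the calendar."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


# Hypothetical encoding-scheme names mapped to content validators.
VALIDATORS = {
    "ISBN": lambda v: valid_isbn10(v) or valid_isbn13(v),
    "ISO-date": valid_iso_date,
}
```

An editor could look up `VALIDATORS[encoding]` whenever the user changes an element's content and warn on a `False` result.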

For a really good time, consider how you might address a change to a thesaurus or controlled term list with no change in the schema that uses it. The body that owns and maintains the thesaurus may even be completely different from the one maintaining the schema. One such thesaurus, MeSH (Medical Subject Headings), maintained by the US National Library of Medicine, is reissued every year. How do you version that?
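One partial answer is to record the thesaurus edition alongside each controlled term, independently of the schema version. The Python sketch below assumes a hypothetical thesaurus, `ExampleHeadings`, with two editions; it checks a term against a stated edition and lists terms that need review when a new edition retires them. It does not model the much harder cases of renamed or split terms.

```python
# Hypothetical editions of an independently maintained thesaurus.
# Between editions a term may be added or retired with no change at all
# to the schema whose encoding scheme refers to the thesaurus.
THESAURUS_EDITIONS = {
    ("ExampleHeadings", 2005): {"Alpha", "Beta"},
    ("ExampleHeadings", 2006): {"Alpha", "Gamma"},
}


def term_valid(term, thesaurus, edition):
    """Is the term valid in the stated edition of the thesaurus?"""
    return term in THESAURUS_EDITIONS[(thesaurus, edition)]


def terms_to_review(terms, thesaurus, old, new):
    """Terms accepted under the old edition that the new edition dropped."""
    return sorted(t for t in terms
                  if term_valid(t, thesaurus, old)
                  and not term_valid(t, thesaurus, new))
```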

Pragmatic Solutions

A fully featured tool able to completely validate a metadata record against an arbitrary set of metadata schemas is not an impossibility, but it does present a significant undertaking in its design and maintenance. It will also, of necessity, impose a not insignificant learning curve on the end user.

An alternative solution to this problem is to throw the responsibility back on the tool user: create a tool that assists with the simple tasks, and expect its user to understand the tool's limitations, along with all the possible intricacies of the metadata schema being used, to ensure that the metadata is valid.

DSTC investigated both approaches, developing a highly complex, robust, configurable editor, together with far simpler, light-weight and easy to use tools having restricted functionality. The next section will outline the technologies available together with their relative merits and their restrictions.

Tools Technology

Today, "client server" technology is so ubiquitous as to be hardly worth mentioning--it is assumed as a given. As our editor will be separate from the metadata storage, client-server is the logical choice and the basic possibilities for the underlying technology are:

  • Thin-client with server-side processing
  • Thin-client with client-side processing
  • Thick client

The speed of technological evolution seldom seems to keep pace with the problems the technology is intended to solve. Frequently, some technology will almost solve a problem and hold out the promise that, given a few years to mature, it will solve all the problems if we can only work around the limitations in the meantime. Of course, by then there will be a new technology that is even better suited, but not quite ready either!

When DSTC began its investigations into metadata maintenance, web servers and Common Gateway Interface (CGI) programs written in scripting languages such as Perl were maturing. The new kids on the block were Sun's Java technology and browser-based scripting languages such as the so-called JavaScript.

DSTC's researchers and engineers built metadata editing tools using combinations of these technologies. We will examine each in turn and consider the impact of the technological and other constraints that applied when the tools were built, and how those constraints have evolved since.

Common Aspects

Each tool to be described depends on an external means of expressing the metadata schema, which is interpreted by the application code to lay out the editor elements. The syntax for this definition needs to be simple enough to permit relatively non-technical users to define their schemas. Extensible Markup Language (XML) appears to be a natural choice for this purpose. Regardless of the format chosen, this approach enables new schemas to be added and existing ones to be modified with relative ease. The intent is to decouple the tool from the schema sufficiently that no changes are required in the tool to support any schema change.
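As an illustration of such an external definition, here is a hypothetical XML schema format (not the format any DSTC tool actually used) and a Python sketch that loads it with the standard library's `xml.etree.ElementTree` and produces a crude one-row-per-element layout, marking mandatory elements with an asterisk.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema definition format, for illustration only.
SCHEMA_XML = """<schema name="DC-subset" version="1.0">
  <element name="Title" min="1" max="1"/>
  <element name="Date" min="0">
    <refinement>Created</refinement>
    <refinement>Modified</refinement>
    <encoding>W3CDTF</encoding>
  </element>
</schema>"""


def load_schema(xml_text):
    """Parse the definition into plain Python data the editor can use."""
    root = ET.fromstring(xml_text)
    elements = []
    for el in root.findall("element"):
        max_attr = el.get("max")  # absent means unbounded
        elements.append({
            "name": el.get("name"),
            "min": int(el.get("min", "0")),
            "max": int(max_attr) if max_attr is not None else None,
            "refinements": [r.text for r in el.findall("refinement")],
            "encodings": [e.text for e in el.findall("encoding")],
        })
    return {"name": root.get("name"), "version": root.get("version"),
            "elements": elements}


def layout(schema):
    """One text row per element type; '*' marks a mandatory element."""
    rows = []
    for e in schema["elements"]:
        mark = "*" if e["min"] > 0 else " "
        refs = f" [{'/'.join(e['refinements'])}]" if e["refinements"] else ""
        rows.append(f"{mark} {e['name']}{refs}")
    return rows
```

Adding an element or refinement means editing only `SCHEMA_XML`; neither function changes.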

Next is the problem of using the metadata created by the editor. Each tool was designed to store metadata in some internal format, but to be able to output it in a selectable standard format (HTML, XML RDF, SOIF, etc.). This transformation relied on procedural code, so adding a new format or modifying an existing one required code changes to varying degrees. Today, an XML transform based on XSLT could be used to reduce the coupling between the application and the formatters.
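Python's standard library has no XSLT processor, so this sketch shows a more modest form of decoupling instead: each output format is a separately registered function, and the editor core only looks formats up by name. The format names and the `DC.Title` style of `<meta>` naming are assumptions for illustration.

```python
from html import escape

FORMATTERS = {}


def formatter(name):
    """Decorator registering a record -> text function under a format name."""
    def register(fn):
        FORMATTERS[name] = fn
        return fn
    return register


@formatter("html-meta")
def to_html_meta(record):
    # One <meta> tag per element, named in the "DC.Title" style.
    return "\n".join(
        f'<meta name="DC.{escape(name)}" content="{escape(value)}">'
        for name, value in record)


@formatter("plain")
def to_plain(record):
    return "\n".join(f"{name}: {value}" for name, value in record)


def render(record, fmt):
    """The editor core only needs the format name, not the formatter code."""
    return FORMATTERS[fmt](record)
```

An XSLT-based design would go one step further, moving each format out of the program and into a stylesheet file.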

Thin-client with server-side processing

The designation "thin-client" implies that no application resides directly on the user's computer. All interaction with the editor takes place through a generic web browser application. When procedural code must execute, it may run on the client or the server, depending on the approach taken. The "thick-client" solution is the traditional install, configure, and execute environment. In the "thin client", responsibility is pushed back to the server.

The fully "thin client" model is typified by "servlet" technology. This too was in its infancy when DSTC began constructing tools. After some prototyping, this approach was passed over because it relied on HTTP, which, being a stateless protocol, requires that any "state" data be passed back and forth in its entirety with each client-server exchange. As network bandwidth is more valuable than desktop processing power, this model was abandoned.

Today, J2EE and Microsoft Dot Net could offer alternatives that approximate thin-client with server-side processing, although there seems no compelling reason to follow this path when network traffic can be avoided by performing the bulk of processing locally through other technology.

Thin-client with client-side processing

To investigate this approach, DSTC built two tools that employed different technology:

  • Reg - JavaScript embedded in a HTML page that is
    dynamically generated by a server-side CGI script
  • Reggie - A Java Applet served by a HTML web page

Both tools were on the bleeding edge of technology in 1997/8 when they were designed. In some ways, this remains true today! It is worth noting that "JavaScript" bears only a superficial resemblance to Sun's Java (tm) Language. One writer has gone so far as to say that the intersection of Java and JavaScript is a null set! The idea of a scripting language built into a web browser was conceived and created by Netscape at the height of its influence. Procedural instructions would be delivered as part of a HTML page to execute in the user's (Netscape) browser. Initially called "LiveScript", its name was changed to JavaScript for dubious marketing reasons, in order to catch some glow from the phenomenon surrounding the launch of Sun's portable, "write once, run everywhere" language. It was followed by Microsoft's JScript, which was almost but not quite compatible, and confusion reigned supreme.

Today, both are implementations of the ECMAScript standard and the confusion is somewhat reduced, although cross-browser portability issues still exist and backwards portability is a nightmare. Although ECMAScript is widely used for Dynamic HTML (DHTML), developers must go to extraordinary lengths to assure consistent operation on even current generation browsers from different vendors. So the selection of JavaScript by DSTC researchers in the late 1990s was a brave choice.

Reg

Entering the Reg URL into a browser causes a Perl based CGI script to run on the server that creates a HTML page for the browser to display. This page is close to being static in content, but its dynamic construction allows the supported metadata schemas to be changed by dropping new definitions into a directory tree on the server. It also allows metadata records that are stored on the server to be enumerated on the dynamic page. If not desired, this feature can be disabled by a simple change in the script, so dynamic generation of the "static" page is not a bad choice. The user selects a metadata schema for the edit session and optionally supplies the URL of a HTML page that may contain metadata embedded in the <HEAD> section.
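Extracting metadata embedded in the <HEAD> section of a page is a small parsing task. Reg did this in Perl; the following is a Python sketch using the standard `html.parser` module, collecting the `name`, `content`, `scheme` and `lang` attributes of HTML 4 `<meta>` tags. Fetching the page from the URL is left out, and `extract_meta` is a hypothetical helper.

```python
from html.parser import HTMLParser


class HeadMetaExtractor(HTMLParser):
    """Collect (name, content, scheme, lang) from <meta> tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values are unescaped.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            a = dict(attrs)
            if "name" in a and "content" in a:
                self.found.append(
                    (a["name"], a["content"], a.get("scheme"), a.get("lang")))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_meta(html_text):
    parser = HeadMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.found
```

Self-closing `<meta ... />` tags are handled too, because `HTMLParser` passes them to `handle_starttag` by default.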

The next exchange uses more server-side Perl to dynamically construct another HTML page with element data from the selected schema, optionally populated with element values extracted from the URL by yet another Perl script. The HTML page contains dynamically constructed ECMAScript that calls on more ECMAScript loaded from a "library" file referenced by the page. These calls, initiated by an onload handler in the page's <BODY> tag, dynamically write HTML into the browser Document Object Model (DOM) to lay out the metadata elements. A Cascading Style Sheet (CSS), also referenced by the generated HTML page, provides the look and feel.

Minimal validation is performed. It is limited to element cardinality, but more complex validation could be incorporated. As most modern browsers reload referenced ECMAScript and CSS files only if the server version is newer than the cached local copy, the system is both economical of bandwidth and easily upgradeable with no action required on the client side.

The generated page contains controls that allow the user to format the edited metadata in a number of available, standard formats. The code for this operation dynamically constructs another HTML page that is opened in a new browser window. The formatted metadata in this window can then be saved, or cut and pasted to a location where it can be used. Optionally, the raw metadata may be saved on the server for later reloading. Versions of Reg existed that allowed the metadata to be saved to DSTC's HotMeta repository. Unlike the default Perl based repository, HotMeta was fully searchable.

Reg Advantages

Reg is lightweight and, as noted above, "roll-out" of upgrades requires no client-side effort. New schemas may be added easily by placing the XML definition files in a server directory (after suitable testing). In the demonstration version provided, the initial Reg page has provision for the user to enter the URL of an XML schema definition. This allows Reg to create and format records using schemas other than those residing on the Reg server.

Reg Disadvantages

In a word, ECMAScript. The highly disjoint implementations of "JavaScript" in earlier browsers, aggravated by the advent of "JScript", can result in more code being required to detect and work around browser differences than actual application code. The release of the W3C DOM specification for browsers and the efforts of ECMA have helped in recent times, but while different users have different browsers at varying revision levels, it is impossible to guarantee that Reg will perform as expected on any given user platform. In the version of Reg provided on Metadata.Net, some effort has been taken to ensure the editor and all associated functionality will run in IE6, Firefox, and Mozilla 5.0 under Microsoft Windows XP and Linux, but given all the problems and the high mutation rate, nothing can be guaranteed.

Additionally, the combination of Perl and ECMAScript requires the maintenance programmer to be proficient in both, as well as in HTML and XML. Moreover, as the User Interface (UI) is 100% HTML based, the widget set is constrained by the browser environment. Theoretically, code could be added to use features of enhanced browsers, with dynamic object detection for portability, but this obfuscates the code, not only making it more fragile but also significantly complicating testing.

Element cardinality is also a problem. A limitation imposed by the state of the art when Reg was written prevented a record from having any more element instances than were created when the page was generated by the Perl script. The demonstration system places a limit of three (3) on any one element type. All three are created when the page is created. When the user clicks the control to add an element, an existing one is "unhidden" on the page, provided one exists. It is possible to increase the limit in the CGI script, but this imposes extra bandwidth requirements, and sure as eggs are eggs, someone will still want more than have been allowed for.
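The fixed limit follows directly from how the page is generated. This Python sketch stands in for the Perl CGI step: it emits all instances of an element up front and hides the unused ones, so the page's "add" control can only reveal rows that already exist. The markup, id scheme and default limit of three are illustrative assumptions.

```python
from html import escape


def element_rows(name, values, limit=3):
    """Emit `limit` input rows for one element type up front.

    Rows beyond those needed for existing values start hidden; an "add"
    control on the page can only unhide the next hidden row, so a record
    can never hold more than `limit` instances of this element.
    """
    if len(values) > limit:
        raise ValueError(f"{name}: {len(values)} values exceed limit {limit}")
    rows = []
    for i in range(limit):
        value = values[i] if i < len(values) else ""
        # Always show at least one row, even for an empty element.
        hidden = "" if i < max(1, len(values)) else ' style="display:none"'
        rows.append(
            f'<div id="{escape(name)}-{i}"{hidden}>'
            f'<input name="{escape(name)}" value="{escape(value)}"></div>')
    return rows
```

Raising `limit` simply makes every generated page larger, which is the bandwidth trade-off described above.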

Finally, Reg does not support repeating element groups.

Reg Conclusion

If it works on a client machine, Reg provides a simple and quick way to create and format simple metadata that can be manually extracted for final use. In many cases, this will be all a user requires, but it must be recognized that the quality of the metadata depends largely on the user's knowledge of the schema being used.

Reggie

Reggie is based on a Java Applet displayed inside a static HTML page served by a standard web server. Again, this was leading edge technology in 1997. The server provides all the code for the Applet together with any other resources required for execution, such as the metadata schemas and help text. The downloaded code is executed by a Java Virtual Machine (JVM) within the user's web browser. This JVM is separate from any other JVM that may be installed on the client machine. After the user selects the schema required and optionally provides the URL of a HTML page containing metadata embedded in the <HEAD> section, the Applet opens a popup metadata editor window.

The editor popup displays at least one of every element type defined by the schema. Where a URL is supplied, element name, content, encodings and language settings are extracted and an attempt is made to match each element type against those provided by the schema. Controls on the editor window provide the means of adding and removing element instances. The edited metadata can be formatted using one of the in-built formats; as we will see, this is not without complications, yet formatting is the only means Reggie offers the user for saving the edited metadata.
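Matching extracted element names against the schema might look like the following Python sketch, which assumes names of the form `DC.Type` or `DC.Type.Refinement` and a schema given as a dictionary of element types to refinements. It is a hypothetical illustration of the matching step, not Reggie's code.

```python
def match_element(meta_name, schema, prefix="DC"):
    """Map a name such as 'DC.Date.Created' to (type, refinement) or None.

    schema maps each element type to a list of its refinements. Matching is
    case-insensitive, since page authors are inconsistent about case.
    """
    parts = meta_name.split(".")
    if len(parts) < 2 or len(parts) > 3 or parts[0].lower() != prefix.lower():
        return None
    types = {t.lower(): t for t in schema}
    etype = types.get(parts[1].lower())
    if etype is None:
        return None
    if len(parts) == 2:
        return etype, None
    refinements = {r.lower(): r for r in schema[etype]}
    refinement = refinements.get(parts[2].lower())
    return (etype, refinement) if refinement else None


SCHEMA = {"Title": [], "Date": ["Created", "Modified"]}
```

Names that fail to match (for example a plain `keywords` tag) would be left for the user to assign by hand.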

Despite what the similarity in their names suggests, Reg and Reggie did not share common schema definition files. While Reg employed XML (more leading edge technology in 1997), Reggie stored the metadata schemas in flat files using a proprietary format. To a degree, this choice was technology driven. XML was an obvious and logical choice, but while there existed a standard Perl module for XML parsing, it would be years before such a facility became part of the standard Java tool kit distribution. Faced with developing a complex parser that was obviously destined to join the ranks of other instant legacy services, the researchers wisely chose the simpler option.

Reggie and the Applet Sandbox

The designers of Java took great pains to ensure that Java would be a totally safe language in which to develop and deliver web applications--a fact of which most end users remain ignorant. A Java Applet runs in what is termed a sandbox, a "safe place" for innocents to play where they are quietly protected from harm. This means that, unlike ActiveX, the range of malicious acts a Java Applet can perform is severely limited. At worst, an Applet may be annoying, never destructive of the client machine and data. The rationale is that, since a HTML page can contain executable content that might be non-obvious, the user needs confidence that it won't harm them; otherwise they will turn off Java support (as most wisely choose to do with ActiveX). The Applet sandbox is extremely safe. This is a blessing and a curse.

In a standard configuration, the sandbox prevents a Java Applet from performing operations that would be most useful to a metadata editor. Security model enhancements made subsequent to the creation of Reggie allow users to selectively relax sandbox restrictions for selected Applets based on code signing using Public Key Infrastructure (PKI). As we shall see later with MetaEdit, this is not as good as it may appear. So, out of the box as it were, the formatted metadata that Reggie creates is unusable. It can't be written to the local disk, nor can it be cut and pasted from the popup editor window! Oh dear. The Reggie designers found a way around this problem. We'll examine it when we get to the Disadvantages section.

Reggie Advantages

First, being a Java Applet, Reggie is not constrained to the widget set provided by the browser that launched it. Theoretically, this should allow Reggie to present a much richer UI than Reg. And it does, to a limited degree. At the time when DSTC developed Reggie, the Java graphics environment was, to be kind, poor. This has since been addressed by the Java Foundation Classes (JFC), aka "Swing". Swing had been pre-released to developers, but was in a sufficient state of flux that the Reggie developers decided to use a more mature set of UI classes known as the Internet Foundation Classes (IFC), developed and released by Netscape (remember them? JavaScript?). In fact, Swing has some roots in the IFC, but it was not in a state that the Reggie team could use, so Reggie still depends intimately on a library that almost no Java developer today remembers, let alone has experience in (except possibly your bald-headed, painfully ancient author). So Reggie could be functionally rich, GUI-wise, but isn't. Not only that, the underlying paradigm of some of the controls is different from what today's users expect.

However, the added flexibility of the Applet environment does remove the restriction that HTML imposes on Reg to limit the number of element instances per record. Reggie is therefore able to create as many elements of any type as a user could reasonably want (within the bounds of any cardinality rules).

Like Reg, roll-out of new Reggie versions is relatively painless to the end user. The developer simply creates a new Java Archive file (Jar file) containing the upgraded code and places it on the server. In the worst case, the user will obtain the new code next time they start their browser (all instances thereof for IE). In the best case (Firefox et al), the browser will notice the change and perform the update transparently.

The other advantages listed for Reg also apply, but that's about it.

Reggie Disadvantages

To get around the Java Applet sandbox restrictions without requiring the user to perform extremely complex local security-related actions, Reggie resorts to CGI Perl scripts to allow the user to actually use the generated and formatted metadata. The sandbox model allows the Applet to communicate only with the server that provided the Applet code to the user's browser, so Reggie can send a HTTP request containing the formatted metadata back to the server to launch a Perl script. This script constructs a HTML page containing the formatted metadata and sends it back to the user's browser, where it pops up as a page from which the metadata can be cut and pasted.
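The server side of this workaround is very small. Here is a Python sketch standing in for the Perl script, assuming the Applet posts a form-encoded body with a field named `metadata` (a hypothetical name). Escaping the content matters: it makes the browser display the metadata as text rather than interpret it as markup.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def echo_page(request_body: str) -> str:
    """Wrap the posted 'metadata' field in an HTML page the user can copy from."""
    fields = parse_qs(request_body)
    metadata = fields.get("metadata", [""])[0]
    return ("<html><head><title>Formatted metadata</title></head><body>"
            f"<pre>{escape(metadata)}</pre></body></html>")


# What the Applet might post, and the page the user would see.
body = urlencode({"metadata": '<meta name="DC.Title" content="Annual Report">'})
page = echo_page(body)
```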

Similarly, another Perl script can be invoked that places the formatted metadata in an email that is sent to an address supplied by the user. In the provided example, the Perl script obligingly sends the mail regardless of who the addressee happens to be. A more cautious implementation would require the user to register first, supplying an email address whose owner must acknowledge and assent to having the address used in this fashion.
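The more cautious registration scheme suggested above could be sketched as follows: the address is held as pending until the recipient returns a one-time token, and mail is only sent to confirmed addresses. This Python sketch is an assumption about how such a scheme might work; actually sending the confirmation mail is left out, and `MailRegistry` is a hypothetical name.

```python
import secrets


class MailRegistry:
    """Register an address, confirm it with a one-time token, and only then
    allow formatted metadata to be mailed to it."""

    def __init__(self):
        self._pending = {}       # token -> address awaiting confirmation
        self._confirmed = set()

    def register(self, address: str) -> str:
        token = secrets.token_urlsafe(16)
        self._pending[token] = address.strip()
        # A real service would now mail the token (or a link containing it)
        # to the address and wait for the recipient to send it back.
        return token

    def confirm(self, token: str) -> bool:
        address = self._pending.pop(token, None)  # each token works once
        if address is None:
            return False
        self._confirmed.add(address)
        return True

    def may_send_to(self, address: str) -> bool:
        return address.strip() in self._confirmed
```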

Then there's the IFC. Reggie could be rewritten to use the JFC, but this is a non-trivial task with questionable return on investment.

The last disadvantage to the Applet based approach that must be mentioned could not have been foreseen by the developers in the last century, namely Microsoft's determination to eliminate any competition to their Dot Net technology. A look at the statistics for virtually any web server will show the dominant position held by Microsoft's Internet Explorer browser over other browser technologies. Initially, IE shipped with a Java 1.0 JVM "plug-in". If a user wanted to run an Applet that used later versions, they needed to upgrade the plug-in. While not difficult, this is still a barrier. Fortunately, Reggie, not being JFC based, is quite happy in the 1.0 environment, so all was well until Microsoft decided not to include any JVM with IE. Now a non-technical user, say a librarian, must follow a complicated procedure to make changes in their environment with possibly unforeseeable and far ranging consequences just to create some simple metadata. You don't have to make something impossible to kill it, just bothersome will do. Despite this, Java Applets continue to be used to deliver feature rich GUI applications. Examples include the Internet banking facilities of many major banks and the Australian on-line census form for 2006.

Reggie Conclusions

The Applet approach employed by Reggie does offer some subtle advantages over the ECMAScript approach used by Reg. However, in some aspects related to the restrictions imposed by the Java Applet sandbox model, it is even worse. As we saw, developers still need to know some Perl to work on the entire application, but unlike the Perl scripts used by Reg, those used by Reggie are relatively simple, and the Java Applet code, IFC aside, is far more reliable and maintainable. More importantly, it will give consistent cross-platform results, even in environments with ancient browsers running ancient JVMs. So overall, the Applet based editor, Reggie, has a subtle but distinct edge over the ECMAScript based version, Reg.

The Thick Client: MetaEdit

Drawing on the lessons learnt by the developers and users of Reg and Reggie, DSTC's Engineering Unit was tasked with building a commercial-grade metadata editor to form part of MetaSuite, a complete, schema-neutral, metadata repository, search engine, and maintenance system. The designers quickly decided that they needed the richness of the full Java environment, so MetaEdit was designed to operate as a stand-alone Java desktop application that could also be executed as an Applet. By running on the desktop, MetaEdit had no sandbox restrictions imposed and could make full use of the local file system and network connections to any host, local firewall settings permitting.

The designers wanted MetaEdit to fully validate metadata, including element content when subject to an applied encoding scheme. The editor was to allow new schemas and output formatters to be added dynamically without any need to change the installed application. To facilitate this, a plug-in architecture was developed so that new schemas and translators could be added to a configuration file using a special GUI editor (the configuration being XML). A restart of the editor would then make the new facilities available. In the case of a schema, the plug-in would include all associated restricted term lists and thesauri, plus any custom validators required by the schema to ensure zero error metadata.
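MetaEdit resolved plug-ins named in an XML configuration file. The Python sketch below imitates that idea with `importlib`, assuming a hypothetical `<plugin kind=... name=... class="module:attribute">` format; `json.dumps` plays the part of a translator plug-in purely for illustration. In Java the equivalent lookup would typically go through `Class.forName`.

```python
import importlib
import xml.etree.ElementTree as ET

# Hypothetical plug-in configuration, for illustration only.
CONFIG = """<plugins>
  <plugin kind="translator" name="json" class="json:dumps"/>
</plugins>"""


def load_plugins(config_xml):
    """Resolve each <plugin class="module:attribute"> to a Python object,
    grouped by kind, so a new plug-in needs only a configuration change."""
    registry = {}
    for p in ET.fromstring(config_xml).findall("plugin"):
        module_name, _, attr = p.get("class").partition(":")
        obj = getattr(importlib.import_module(module_name), attr)
        registry.setdefault(p.get("kind"), {})[p.get("name")] = obj
    return registry


plugins = load_plugins(CONFIG)
```

As with MetaEdit, the configuration is read at start-up, so a restart picks up newly listed plug-ins.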

To support all modes of working, MetaEdit could run stand-alone, or connected to a HotMeta metadata repository to perform CRUD operations on that repository. Security was provided through a work-flow that restricted who was able to do what. The work-flow plug-in, like the schema and translator plug-ins, was shared between MetaEdit and the HotMeta repository and search engine. When run stand-alone, MetaEdit stored metadata to flat files on the host computer (or network drive) in a variety of formats supported by the plug-in translators. The translators naturally provided round-trip and cross-format capability. One of these translators gave MetaEdit the ability to edit HTML 4.0 metadata in-situ from live or off-line HTML files. Other translators supported export of metadata in formats compatible with HTML Content Management Systems (CMS).

All together, this made MetaEdit very, very feature-rich, and hence complex. The only thing it did not provide was support for repeating element groups, a feature that would have made it even more complex.

MetaEdit Advantages

As would be expected with a full, no restrictions, heavy weight application, MetaEdit provided users with the ability to create and maintain high quality metadata guaranteed to conform to the selected schema, including different versions of that schema. The work-flow module took the editor beyond the role of a simple editor, integrating with whatever process the metadata administrator judged essential in producing and maintaining their repository. Authentication and authorization were provided by HotMeta, optionally through a corporate Lightweight Directory Access Protocol (LDAP) server.

Schemas were defined in XML, providing the element names, refinements, encoding schemes, languages, and help text. Element encodings could be associated with controlled term lists and thesauri in a number of formats. This included the ISO 2788 standard for monolingual thesauri that provides synonyms, broad superior terms, etc. The XML definition specified cardinality, mandatory elements, refinements, and encodings intrinsically. These would be enforced at run-time. For more sophisticated validations, the schema could contain the class names of Java modules responsible for special validations at the record, element, and element encoding levels. The XML definition was passed through a compiler to convert it to a Java class that was added to the tool's configuration file. Once loaded, all aspects of the schema were available to the user through the GUI interface and validation could be expected to be of the very highest order.
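The thesaurus relationships mentioned above can be illustrated briefly. In ISO 2788-style thesauri, USE leads from a non-preferred term to its preferred term and BT (broader term) leads upward in the hierarchy. This Python sketch uses an invented thesaurus fragment to normalise a term and collect its broader terms, the kind of lookup behind offering broad superior terms to a user.

```python
# Invented thesaurus fragment using ISO 2788-style relationships:
# USE maps a non-preferred term to its preferred term, and
# BT maps a preferred term to its broader (parent) term.
USE = {"Automobiles": "Cars", "Motor cars": "Cars"}
BT = {"Cars": "Road vehicles", "Road vehicles": "Vehicles"}


def preferred(term):
    """Replace a non-preferred entry term with its preferred term."""
    return USE.get(term, term)


def broader_terms(term):
    """All broader terms, nearest first, following BT links upward.

    The `seen` set stops the walk if a faulty thesaurus contains a cycle.
    """
    chain, seen = [], set()
    current = preferred(term)
    while current in BT and current not in seen:
        seen.add(current)
        current = BT[current]
        chain.append(current)
    return chain
```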

Meta-metadata was also maintained, recording the authenticated user names of the record's creator and modifiers, the relevant dates, and so on. When MetaEdit was used with the HotMeta repository, this audit information became searchable metadata in its own right, an aid to work-flow and record management.
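A minimal Python sketch of the idea, assuming a simple in-memory record; the `AuditedRecord` class, the `audit.*` element names and the `search` function are hypothetical and are not HotMeta's API. What it shows is that once the audit trail is exposed as ordinary element/value pairs, the same search used for descriptive metadata can find records by creator or modifier.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Pair = Tuple[str, str]


@dataclass
class AuditedRecord:
    metadata: List[Pair]                     # descriptive metadata, e.g. ("DC.title", "...")
    created_by: str = ""
    created_at: str = ""
    modifications: List[Pair] = field(default_factory=list)  # (user, timestamp)

    def save(self, user: str, now: Optional[datetime] = None) -> None:
        """Record who saved the record and when; the first save marks the creator."""
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        if not self.created_by:
            self.created_by, self.created_at = user, stamp
        else:
            self.modifications.append((user, stamp))

    def audit_fields(self) -> List[Pair]:
        """Expose the audit trail as element/value pairs so it can be searched."""
        pairs = [("audit.creator", self.created_by), ("audit.created", self.created_at)]
        for user, stamp in self.modifications:
            pairs += [("audit.modifier", user), ("audit.modified", stamp)]
        return pairs


def search(records: List[AuditedRecord], element: str, value: str) -> List[AuditedRecord]:
    """Exact-match search over descriptive and audit metadata alike."""
    return [r for r in records if (element, value) in r.metadata + r.audit_fields()]
```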

Being written entirely in Java, MetaEdit could run on any platform for which a JVM was available, which also simplified maintenance and enhancement. The "core" of MetaEdit was made sufficiently abstract that it could be reused in other parts of MetaSuite that required a GUI editor, for example the properties and configuration editor and the OAI-PMH Harvester Wizard GUI.

MetaEdit Disadvantages

The aspect that conferred the most advantages on MetaEdit also proved to be its biggest disadvantage: it was a heavyweight desktop application that had to be installed and configured. When used in large organizations or highly distributed associations, it therefore suffered from the "roll-out" problem: how do you distribute it and keep everybody in sync with application and plug-in updates? Although MetaSuite shipped with a commercial-grade installation and set-up "Wizard", it became painfully apparent that the general user community wanted everything MetaEdit had to offer, but delivered as a lightweight, browser-based solution requiring neither set-up nor administration.

The roll-out issue could be addressed to a degree by launching MetaEdit as an applet from a central location, but this required users to amend the default security settings of their JVM sandbox. While this could be done, the tools the JVM provided for the task were not well suited to non-technical users, and variations in local settings complicated documenting the process. Those who trialled this approach concluded that the cure was far worse than the disease, and that managing the roll-out problem was preferable.

Further, because MetaEdit was so feature-rich, a significant number of users felt it was too hard to use, despite extensive on-line help and a full, traditional user guide in PDF and hard-copy form. The problem compounded over time as users who had embraced the tool demanded additional features: pop-up windows for viewing record audit trails, auto-completion for thesaurus searching, the inclusion of broader (superior) terms from hierarchical thesauri, spell checking, and the ability to work with a group of records and share them with sub-groups within the wider metadata maintenance group. Each of these added menu items and tool-bar icons that increased the tool's apparent complexity to the casual user.

With the closure of DSTC in June 2006, all rights to the intellectual property in MetaSuite were assigned to the Queensland Government through the Queensland Department of Transport. An agreement was reached with other MetaSuite users that provided them with the source code to support their existing services, but beyond this it is not possible to provide access to MetaEdit in any form.

Conclusions

Metadata associated with on-line and off-line resources is a valuable aid to resource discovery, but creating and maintaining it is costly. The cost is measured in the effort required for people with knowledge of the subject domain to create, verify, and attach the metadata. Keeping the metadata current as the details of the resource, or the schema used to define the metadata, change adds significantly to that cost.

By comparison, the cost of producing a metadata editing tool is small, but still not insignificant. Compounding this, modern users show a marked reluctance to pay for tools like this, so without benevolent organizations like DSTC, metadata managers are thrown back on editors that demand a very high degree of knowledge of the schema being used.

Bad or inaccurate metadata is worse than no metadata, as it represents wasted effort. The production of high-quality metadata can be assured through a sophisticated tool, but such a tool carries a high cost of production, maintenance, and user training. Simpler tools are practical, but if metadata quality is to be maintained they merely shift the cost to training.

DSTC explored both ends of the spectrum of tool sophistication. Their premier product is no longer publicly available. Their basic tools are dated and suffer from operational limitations and idiosyncrasies associated with the technology available to them. However, they remain available and, used with care, can assist in generating acceptable metadata for selected flat schemas.

About the Author

Ron Chernich spent eight happy years with DSTC as a Senior Software Engineer and Director of the Engineering Unit. He maintained a very hands-on involvement in the development of DSTC's commercially licensed products: dCon, a full Java implementation of the OMG CosNotification specification, and MetaSuite, DSTC's complete metadata portal solution. Ron and his team also became the custodians and maintainers of other technologies, including Reg, Reggie, and fnORB, which had been developed by DSTC research and had somehow found their way into widespread use.

Following the closure of DSTC, in which Ron had the privilege and sad duty of effectively turning out the lights by cutting the cables to the DSTC servers, he joined the eResearch team as a Principal Research Fellow in the School of Information Technology and Electrical Engineering at the University of Queensland. There, through the DART and ARCHER projects, he has been reunited with old friends Reg and Reggie, giving them new life in the twenty-first century via the Metadata.Net portal. His research interests include the changing face of metadata in all its aspects.

Copyright © Ronald A Chernich, 2006. All rights reserved.