The role that metadata could play in resource discovery
was recognized early by DSTC researchers. Their efforts resulted in the first
version of AS5044, more popularly known as the Australian Government Locator Service
(AGLS), along with tools to create and maintain metadata to this and other standards.
This paper gives an overview of the tools and approaches DSTC investigated, highlighting
the advantages and disadvantages inherent in the different approaches. It is assumed that
the reader understands the basic concepts of client-server technology, metadata and the use
of a "schema" to describe metadata types.
It will be useful for the purposes of illustration to agree on some terminology.
Predominantly, the metadata concerning us here takes the form of name/value
pairs where the names are drawn from a specific, finite set. We shall refer to such
sets as the metadata schema and individual pairs as elements where the name
constitutes the element type and the value its content. A collection of
elements describing an individual resource constitutes a record.
To economize on the proliferation of names within a schema, each name may have a
qualifier appended to it. These qualifiers are also referred to as refinements.
The schema that defines the element names with their optional refinements may also
define additional attributes associated with elements, such as an encoding scheme
that imposes semantic restrictions on the content of the element, or cardinality
rules that constrain the number of elements of a specific type that a valid
record may contain. Lastly, an element may have a language encoding associated
with it. These are expressed using the RFC1766 or ISO639-2 codes for spoken languages.
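The terminology above can be made concrete with a small sketch. It is illustrative only: the class names, field names and sample values below are assumptions made for this example and are not taken from any DSTC tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    name: str                         # element type, drawn from the schema
    content: str                      # the value of the element
    refinement: Optional[str] = None  # optional qualifier on the name
    scheme: Optional[str] = None      # optional encoding scheme for the content
    lang: Optional[str] = None        # optional language code for the content


@dataclass
class Record:
    # A record is the collection of elements describing one resource.
    elements: list = field(default_factory=list)

    def of_type(self, name):
        """Return every element of the given element type."""
        return [e for e in self.elements if e.name == name]


# Sample values are invented for illustration.
record = Record([
    Element("Title", "Metadata Tool Development at DSTC", lang="en"),
    Element("Date", "2006", refinement="created", scheme="W3CDTF"),
])
```

Keeping the refinement, encoding scheme and language as optional attributes of the element, rather than folding them into the name, is what keeps the set of names in the schema small.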
As the full definition of a schema might require several such groupings,
each comprising several elements, the use of unique names to distinguish association
quickly leads to name space pollution and cumbersome complexity. The approach
totally breaks down if a schema permits multiple instances of an individual group
as there is no way to express which Contact_Name, for example, belongs with
which Contact_Address!
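The pairing problem can be seen in a short sketch. The element names follow the Contact_Name and Contact_Address example above; the values and the helper function are hypothetical.

```python
# Flat name/value pairs: two contacts, but nothing ties a name to an address.
flat = [
    ("Contact_Name", "A. Example"),
    ("Contact_Name", "B. Example"),
    ("Contact_Address", "1 First Street"),
    ("Contact_Address", "2 Second Street"),
]

# Grouped form: each instance of the group keeps its own elements together.
grouped = [
    {"group": "Contact", "Name": "A. Example", "Address": "1 First Street"},
    {"group": "Contact", "Name": "B. Example", "Address": "2 Second Street"},
]


def address_of(groups, name):
    """Return the address that belongs with the given contact name."""
    for g in groups:
        if g["group"] == "Contact" and g["Name"] == name:
            return g["Address"]
    return None
```

In the flat list the association could only be guessed from ordering, which a flat schema defines as having no semantic significance. In the grouped form the lookup is unambiguous.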
The trend in metadata schemas is for an
increasing use of repeating element groups of arbitrary depth over the simple flat
schema. This is not a new concept by any means. It is simply another form of the old
"Bill of Materials" pattern where a component is comprised of many sub-components.
However, it is relatively new in metadata schemas, so tool support for it is infrequent
and lack of support can be a show-stopper.
The description of metadata schema features just provided is by no means complete,
but it should serve to illustrate
the challenges associated with creating and maintaining metadata conforming to a specific
schema. Creating, distributing and maintaining a tool specific to every
metadata schema that exists is impractical due to the costs involved.
But given the basic similarity in metadata schemas, a generic tool should be practical.
In general, all metadata maintenance
tools share a core of generic functionality. This is the so-called
CRUD model: Create, Read, Update, and Delete. Our generic editor would need to
underpin these operations with a
data model able to abstract all the foreseeable quirks of all possible schemas.
All that then remains is a way of configuring the data model with the names,
encodings, constraints, etc for specific schemas.
So an editor that loads
the schema definition and associated rules from a suitably formatted description
would appear both viable and cost-effective, but even this has pragmatic problems
that are not readily apparent.
Like a software application, any metadata schema definition
that is in active use will not remain static over time.
This introduces the second problem: version control. A record that conforms to
an earlier revision of a standard may be invalid when viewed against a later revision
of that standard. Hence some way is required to express not only the schema the
metadata conforms to, but also the version of the schema it was created against.
Sadly, experience shows that this small fact may be obvious to a software engineer,
but not to a group of specialists in other fields who are tasked with maintaining
metadata schemas and associated controlled term lists and thesauri. As every
software engineer knows, version control is a non-trivial subject.
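A minimal sketch of the idea, assuming each record names both its schema and the schema revision it was created against. The schema name, revisions and mandatory element sets are invented for illustration.

```python
# Hypothetical schema revisions, each mapped to its mandatory element types.
SCHEMAS = {
    ("example", "1.0"): {"Title", "Creator"},
    ("example", "1.1"): {"Title", "Creator", "Date"},
}


def missing_elements(record):
    """Validate a record against the schema revision it declares."""
    mandatory = SCHEMAS[(record["schema"], record["version"])]
    present = {name for name, _ in record["elements"]}
    return sorted(mandatory - present)


record = {
    "schema": "example",
    "version": "1.0",
    "elements": [("Title", "A report"), ("Creator", "A. Example")],
}
```

The same record is valid against revision 1.0 but would be reported as missing Date if it were checked against revision 1.1, which is why the version has to travel with the record.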
Then there is the problem of ensuring that the metadata is "correct". Cardinality
rules are easy to implement. For example, a cardinality rule can be imposed on all elements
that says "A valid record must contain these elements".
Not so easy to implement are
constraints that say "A valid record must have this element, or these elements,
but not this combination of elements".
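The difference between the two kinds of rule can be sketched as follows. The element names, limits and the combination rule are all hypothetical. Cardinality fits in a simple table, while the combination rule already needs its own code.

```python
from collections import Counter

# Hypothetical cardinality rules: element type -> (minimum, maximum or None).
CARDINALITY = {"Title": (1, 1), "Creator": (1, None), "Subject": (0, 3)}


def cardinality_errors(names):
    """Return the element types whose occurrence count breaks its limits."""
    counts = Counter(names)
    errors = []
    for name, (low, high) in CARDINALITY.items():
        n = counts[name]
        if n < low or (high is not None and n > high):
            errors.append(name)
    return errors


def combination_ok(names):
    """Hypothetical rule: an Identifier, or both Publisher and Date,
    but not the two alternatives together."""
    present = set(names)
    has_identifier = "Identifier" in present
    has_pair = {"Publisher", "Date"} <= present
    return has_identifier != has_pair  # exactly one alternative is satisfied
```

Each new combination rule of this kind needs either another function or a richer rule syntax, which is where the complexity starts to grow.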
While it is by no means impossible to
construct an abstract schema syntax that provides sufficient richness to
express arbitrarily complex rules, the cure starts to become worse than the problem.
To compound the complexity, a schema may require mutual exclusion between
some element encodings and refinements, or element encoding schemes that define
controlled term lists and complex, hierarchical thesauri.
A good editor should provide the user with a pick-list for the valid terms when the
encoding is selected and warn the user when an invalid combination or content is used.
For even more feature-creep, consider encodings for values
that contain a date, or an International Standard Book Number (ISBN). A really good
editor should be able to validate the date format, or the ISBN check digit. The list
goes on, and on.
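As an indication of what such content validators involve, here is a minimal sketch for the two examples just mentioned. It assumes the ten-digit ISBN form, with its weighted modulo-11 check digit, and a single fixed date format. The function names are hypothetical.

```python
from datetime import datetime


def valid_isbn10(isbn):
    """Check a ten-digit ISBN using its weighted modulo-11 check digit."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10                 # 'X' stands for 10, last position only
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value  # weights run from 10 down to 1
    return total % 11 == 0


def valid_date(text, fmt="%Y-%m-%d"):
    """Check that text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False
```

Every encoding scheme a schema allows needs a validator of this sort, and each one carries its own rules and edge cases.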
For a really good time, consider how you might address a change
to a thesaurus, or controlled term list with no change in the schema that uses it.
It's even possible for the body that owns and maintains the thesaurus to be
completely different from that maintaining the schema. One such body,
MeSH (Medical Subject Headings), reissues every year. How do you version that?
An alternate solution to this problem is to throw the responsibility back on the tool user.
Create a tool that assists with the simple tasks and expect the user of the tool
to understand the tool's limitations along with all the possible intricacies of
the metadata schema being used to ensure that the metadata is valid.
DSTC investigated both approaches, developing a highly complex, robust,
configurable editor, together with far simpler, light-weight and easy to use
tools having restricted functionality. The next section will outline the
technologies available together with their relative merits and their restrictions.
The speed of technological evolution seldom seems to keep pace with the
problems the technology is intended to solve.
Frequently, some technology will almost solve a problem and hold out the promise
that given a few years to mature, it will solve all the problems if we can only
work around the limitations in the meantime. Of course, by then there will be
a new technology that is even better suited, but not quite ready either!
When DSTC began their investigations into metadata maintenance, web servers
and Common Gateway Interface (CGI) using scripting languages such as Perl were
maturing. New kids on the block were Sun's Java technology and browser based
scripting languages such as so-called JavaScript.
DSTC's researchers and engineers built metadata editing tools using combinations
of these technologies. We will examine each in turn and consider the impact of
technological and other constraints as they applied when the tools were built
and how they have evolved today.
Next is the problem of using the metadata created by the editor.
Each tool was designed to store metadata in some internal format, but
be able to output it in a selectable standard format (HTML, XML RDF, SOIF, etc).
This transformation relied on procedural code, so the addition of a new format,
or modification of an existing one required code changes to varying degrees.
Today, an XML transform based on XSLT could be used to reduce the coupling
between the application and the formatters.
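The coupling problem can be illustrated with a small sketch. It does not use XSLT. It shows the simpler idea of looking formatters up by name, so that adding a format does not mean touching the editor code. All names are hypothetical and the output formats are simplified.

```python
from html import escape

FORMATTERS = {}


def formatter(name):
    """Register a function as the formatter for a named output format."""
    def register(func):
        FORMATTERS[name] = func
        return func
    return register


@formatter("html")
def to_html_meta(elements):
    """Render (name, content) pairs as HTML <meta> tags."""
    return "\n".join(
        '<meta name="{}" content="{}">'.format(
            escape(n, quote=True), escape(c, quote=True))
        for n, c in elements
    )


@formatter("text")
def to_text(elements):
    """Render (name, content) pairs as plain 'name: content' lines."""
    return "\n".join("{}: {}".format(n, c) for n, c in elements)


def export(elements, fmt):
    """Format the internal representation with the selected formatter."""
    return FORMATTERS[fmt](elements)
```

The editor only ever calls export(); a new format is added by registering one more function.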
The designation "thin-client" implies that no application resides directly
on the user's computer. All interaction with the editor takes place through
a generic web browser application. When procedural code must execute, it
may run on the client or the server, depending on the approach taken. The
"thick-client" solution is the traditional install, configure, and execute
environment. In the "thin client", responsibility is pushed back to the server.
The fully "thin client" model is typified by "servlet" technology.
This too was in its infancy when DSTC began constructing tools. After some
prototyping, this approach was passed over due
to the fact that it relied on HTTP which, being a stateless protocol,
requires that any "state" data be passed back and forth in its entirety with each
client-server exchange. As network bandwidth is more valuable than desktop
processing power, this model was abandoned.
Today, J2EE and Microsoft Dot Net could offer alternatives that
approximate thin-client with server-side processing, although there seems no
compelling reason to follow this path when network traffic can be avoided by
performing the bulk of processing locally through other technology.
Both tools were on the bleeding technology edge in 1997/8 when they were
designed. In some ways, this remains true today! It is worth noting that
"JavaScript" bears only a superficial resemblance to Sun's Java (tm) Language.
One writer has gone so far as to say that the intersection of Java and
JavaScript is a null set!
The idea of a scripting language built into a web browser was conceived and
created by Netscape at their height. Procedural instructions would be
delivered as part of a HTML page to execute in the user's (Netscape) browser.
Initially called "LiveScript", its name was changed to JavaScript for dubious
marketing reasons in order to
catch some glow from the phenomenon surrounding the launch of Sun's
portable, "write once, run anywhere" language. It was followed by Microsoft's
JScript that was almost but not quite compatible and confusion reigned supreme.
Today, both have been combined into
ECMAScript and the confusion is somewhat reduced, although cross-browser
portability issues still exist and backwards portability is a nightmare.
Although widely used for Dynamic HTML (DHTML), developers must go to extraordinary
lengths to assure consistent operation on even current generation browsers
from different vendors. So selection of JavaScript by DSTC researchers in the late
1990s was a brave choice.
Minimal validation is performed. This is constrained to element cardinality,
but other more complex validation could be incorporated. As referenced ECMAScript
and CSS files are
only reloaded by most modern browsers if the server version is newer
than a cached local version, the system is both economical of bandwidth
and easily upgradeable with no action required on the client side.
The generated page contains controls that allow the user to format the
edited metadata in a number of available, standard formats. The code for this
operation dynamically constructs another HTML page that is opened in a new
browser window. The formatted metadata in this window can then be saved, or
cut and pasted to a location where it can be used. Optionally, the raw
metadata may be saved on the server for later reloading. Versions of Reg
existed that allowed the metadata to be saved to DSTC's HotMeta repository.
Unlike the default Perl based repository, HotMeta was fully searchable.
Additionally, the combination of Perl and ECMAScript requires the
maintenance programmer to be proficient in both, as well as HTML and XML.
Additionally, as the User Interface (UI) is 100% HTML based, the widget set is constrained
by the browser environment. Theoretically, code could be added to utilize
features of enhanced browsers using dynamic object detection for portability,
but this obfuscates the code, not only making it more fragile, but
significantly complicating testing.
Element cardinality is also a problem. A limitation imposed by the state
of the art when Reg was written prevented a record from having any more
element instances than were created when the page was generated by the Perl
script. The demonstration system places a limit of three (3) on any one
element type. All three are created when the page is created. When the user
clicks the control to add an element, an existing one is "unhidden" on the
page, provided one exists. It is possible to increase the limit in the CGI
script, but this
imposes extra bandwidth requirements, and sure as eggs are eggs, someone
will still want more than have been allowed for.
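The mechanism can be modelled in a few lines. This is a language-neutral simulation written in Python, not Reg's ECMAScript. The class name is hypothetical, and the assumption that exactly one instance starts out visible is made only for illustration.

```python
class ElementSlots:
    """Fixed pool of pre-created inputs for one element type, some hidden."""

    def __init__(self, limit=3):
        # Every slot exists from the start; assume only the first is shown.
        self.visible = [True] + [False] * (limit - 1)

    def add(self):
        """Unhide the next hidden slot; return False when none remain."""
        for i, shown in enumerate(self.visible):
            if not shown:
                self.visible[i] = True
                return True
        return False
```

Raising the limit makes every generated page larger whether or not the extra slots are ever used, which is the bandwidth cost described above.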
Finally, Reg does not support repeating element groups.
Despite what the similarity in their names suggests,
Reg and Reggie did not share common schema definition files. While Reg employed
XML (more leading edge technology in 1997), Reggie stored the metadata schemas
in flat files using a proprietary format. To a degree, this choice was technology
driven. XML was an obvious and logical choice, but while there existed a standard
Perl module for XML parsing, it would be years before such a facility became part of
the standard Java tool kit distribution. Faced with developing a complex parser
that was obviously destined to join the ranks of other instant legacy services,
the researchers wisely chose the simpler option.
In a standard configuration, the sandbox prevents a Java Applet from
performing operations that would be most useful to a metadata editor.
Security model enhancements made subsequent to the creation of Reggie allow users to
selectively relax sandbox restrictions for selected Applets based on
code signing using Public Key Infrastructure (PKI). As we shall see later
with MetaEdit, this is not as good as it may appear. So, out of the box as
it were, the formatted metadata that Reggie creates is unusable. It can't be
written to the local disk, nor can it be cut and pasted from the popup editor window!
Oh dear. The Reggie designers found a way around this problem. We'll examine it
when we get to the Disadvantages section.
However, the added flexibility of the Applet environment does remove the
restriction that HTML imposes on Reg to limit the number of element instances
per record. Reggie is therefore able to create as many elements of any type
as a user could reasonably want (within the bounds of any cardinality rules).
Like Reg, roll-out of new Reggie versions is relatively painless to the
end user. The developer simply creates a new Java Archive file (Jar file)
containing the upgraded code and places it on the server. In the worst case, the user
will obtain the new code next time they start their browser (all instances
thereof for IE). In the best case (Firefox et al), the browser will notice
the change and perform the update transparently.
The other advantages listed for Reg also apply, but that's about it.
Similarly, another Perl script can be invoked that places the
formatted metadata in an email that is sent to an address supplied by the user.
In the provided example, the Perl script obligingly sends the mail
regardless of who the addressee happens to be. A more cautious
implementation would require the user to register first, supplying an
email address that requires the recipient to acknowledge and assent
to having this address used in this fashion.
Then there's the IFC. Reggie could be rewritten to use the JFC,
but this is a non-trivial task with questionable return on investment.
The last disadvantage to the Applet based approach that must be mentioned
could not have been foreseen by the developers in the last century, namely Microsoft's
determination to eliminate any competition to their Dot Net technology.
A look at the statistics for virtually any web server will show the dominant
position held by Microsoft's Internet Explorer browser over other browser
technologies. Initially, IE shipped with a Java 1.0 JVM "plug-in". If a
user wanted to run an Applet that used later versions, they needed to
upgrade the plug-in. While not difficult, this is still a barrier. Fortunately,
Reggie, not being JFC based, is quite happy in the 1.0 environment, so all
was well until Microsoft decided not to include any JVM with IE.
Now a non-technical user, say a librarian, must follow a complicated
procedure to make changes in their environment with possibly unforeseeable
and far ranging consequences just to create some simple metadata. You don't
have to make something impossible to kill it, just bothersome will do.
Despite this, Java Applets continue to be used to deliver feature rich
GUI applications. Examples include the Internet banking facilities of
many major banks and the Australian on-line census form for 2006.
The designers wanted MetaEdit to fully validate metadata, including element
content when subject to an applied encoding scheme. The editor was to allow
new schemas and output formatters to be added dynamically without any need to change
the installed application. To facilitate this, a plug-in architecture was developed
so that new schemas and translators could be added to a configuration file
using a special GUI editor (the configuration being XML). A restart of the
editor would then make the new facilities available. In the case of a schema,
the plug-in would include all associated restricted term lists and thesauri, plus
any custom validators required by the schema to ensure zero error
metadata.
To support all modes of working, MetaEdit could run stand-alone, or
connected to a HotMeta
metadata repository to perform CRUD operations on that repository. Security
was provided through a work-flow that restricted who was able to do what.
The work-flow plug-in, like the schema and translator plug-ins, was shared
between MetaEdit and the HotMeta repository and search engine. When run stand-alone,
MetaEdit stored metadata to flat files on the host computer (or network drive)
in a variety of formats supported by the plug-in translators. The translators naturally
provided round-trip and cross-format capability. One of these translators
gave MetaEdit the ability to edit HTML 4.0 metadata in-situ from live
or off-line HTML files. Other translators supported export of metadata
in formats compatible with HTML Content Management Systems (CMS).
All together, this made MetaEdit very, very feature-rich, and hence complex.
The only thing it did not provide was support for repeating element groups,
a feature that would have made it even more complex.
Schemas were defined in XML, providing the element names, refinements,
encoding schemes, languages, and help text. Element encodings could be associated
with controlled term lists and thesauri in a number of formats. This included the ISO 2788
standard for monolingual thesauri that provides synonyms, broad superior terms, etc.
The XML definition specified cardinality, mandatory elements, refinements, and encodings
intrinsically. These would be enforced at run-time. For more sophisticated validations,
the schema could contain the class names of Java modules responsible for
special validations at the record, element, and element encoding levels. The XML
definition was passed through a compiler to convert it to a Java class that
was added to the tool's configuration file. Once loaded, all aspects of the
schema were available to the user through the GUI and validation
could be expected to be of the very highest order.
Meta-metadata was maintained, recording the authenticated user name of the
record creator, modifiers, dates etc. When used in conjunction
with the HotMeta repository, this audit information became searchable metadata
as an assistance to work-flow and record management.
Being 100% Java based, MetaEdit could run on any platform for which
a JVM was available. This also simplified maintenance and enhancement.
The "core" of MetaEdit was made sufficiently abstract that it could be
reused in other parts of MetaSuite that required a GUI editor tool,
for example, the properties and configuration editor and the OAI-PMH
Harvester Wizard GUI.
The roll-out issue could be addressed to a degree by launching MetaEdit as an Applet
from a central location, but this required that the user amend the default security setting
of their JVM sandbox. While this could be done,
the tools provided for the task with the JVM are not well
suited to non-technical users. To make matters worse, variations in
local settings complicated the documentation of the process.
Those who trialled this concluded that
the cure was far worse than the disease and management of the roll-out
problem was preferable.
Further, being so feature rich, a significant number of users
felt that MetaEdit was too hard to use, despite extensive on-line help
and a full, traditional user guide in PDF and hard copy form.
This compounded over time as users who had embraced the tool demanded
additional features such as popup windows for record audit trail
viewing, auto-completion for thesaurus searching, the provision for
including broad superior terms from hierarchical thesauri, spell checking,
the ability to work with a group of records and share these records
with sub-groups within the total metadata maintenance group. Each
added menu items and tool-bar icons that compounded the tool's
apparent complexity to the casual user.
With the closure of DSTC in June 2006, all rights to the intellectual
property for MetaSuite were assigned to the Queensland Government
through the Queensland Department of Transport. Agreement was struck
with other MetaSuite users that provided them with source code to
support their existing service, but beyond this, it is not possible
to provide access to MetaEdit in any form.
In comparison, the cost of producing a metadata editing tool is small, but
still not insignificant. Compounding this is the fact that modern users are showing a
marked reluctance to pay for tools like this, so without benevolent organizations
like DSTC, metadata managers are thrown back on editors that demand a very high
degree of knowledge regarding the schema being used.
Bad or inaccurate metadata is worse than no metadata as it represents wasted
effort. The production of high quality metadata can be assured through a
sophisticated tool, but such a tool carries a high cost of production, maintenance,
and user training. Simpler tools are practical, but merely shift the cost to
training if the metadata quality is to be maintained.
DSTC explored both ends of the spectrum for tool sophistication.
Their premier product is no longer publicly available.
Their basic tools are dated and suffer operational limitations and idiosyncrasies
associated with the available technology.
However, they remain available and, used with care, can assist in generating
acceptable metadata for selected flat schemas.
Following the closure of DSTC, where Ron had the privilege and sad duty of
effectively turning out the lights by cutting the cables to the DSTC servers, Ron joined the
eResearch team as a Principal Research Fellow
within the
Information Technology and Electrical Engineering
school of the University of Queensland. Here, through the
DART
and ARCHER projects,
he has been reunited with old friends Reg and Reggie, giving them new life in the
Twenty-first Century via the Metadata.Net portal.
His research interests include the changing face of metadata in all its aspects.
Copyright © Ronald A Chernich, 2006. All rights reserved.
Metadata Tool Development at DSTC, 1997-2006
Background
The Co-operative Research Centre for Distributed Systems Technology (DSTC) operated
for 14 years under the Australian CRC grants scheme. During its lifespan in the years
1992 to 2006, the Internet and the World Wide Web migrated from the realms of largely academic
research to the mainstream. As part of their research program, DSTC established a
group called the Resource Discovery Unit to investigate technologies and tools
that all classes of users might employ to locate and use the rapidly increasing volume
of on-line data and services becoming available.
Metadata Schemas
A short tour of the sites referenced by Metadata.Net
will illustrate the wide variety of metadata schemas developed to describe resources.
A detailed reading of the development history of the different schemas will help in
understanding the justification each originating body considered when foisting yet
another schema on the world. The number of schemas continues to grow and is not
likely to stop or slow anytime soon. It should be obvious that creating and maintaining
metadata that conforms to all these differing standards poses a problem, but one that
should be easily addressed by a suitable software tool. Unfortunately, this turns out
not to be the case.
A Complication
Many schemas are "flat", which is to say the order of the elements within the record
is of no semantic significance. Dublin Core (DC) and AGLS--which
is based on DC--are examples of such schemas. More complex resources however may
be better modelled by associating groups of elements within a record. This enables
an element type such as Address, for example, to be reused within the groups that
designate a resource's Custodian as distinct from a Contact. These we will
term repeating element groups. Without such a concept, the element type Address
would need to be prefixed or suffixed with additional words to distinguish its usage and
ensure uniqueness.
A Generic Tool
More Complications
Pragmatic Solutions
A fully featured tool able to completely validate a metadata record
against an arbitrary set of metadata schemas is not an impossibility, but it does
present a significant undertaking in its design and maintenance. It will also,
of necessity, impose a not insignificant learning curve on the end user.
Tools Technology
Today, "client server" technology
is so ubiquitous as to be hardly worth mentioning--it is assumed as a given.
As our editor will be separate from the metadata storage, client-server is
the logical choice and the basic possibilities for the underlying technology
are thin-client with server-side processing, thin-client with client-side
processing, and thick-client.
Common Aspects
Each tool to be described depends on an external means of expressing the metadata
schema that will be interpreted by the application code to lay out the editor
elements. The syntax for this definition needs to be sufficiently simple to
permit relatively non-technical users to define their schemas.
Extensible Mark-up Language (XML) appears to be a natural choice for this purpose.
Regardless of the format chosen, this approach enables new schemas to be added
and existing ones to be modified with relative ease. The intent is to sufficiently
decouple the tool from the schema that no changes are required in the tool to
support any schema change.
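A sketch of what such an external definition and its loader might look like is given here. The XML vocabulary is invented for this example and is not the format used by Reg, Reggie or MetaEdit. The parsing uses Python's standard ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema description: the tag and attribute names are
# assumptions for illustration, not the format of any DSTC tool.
SCHEMA_XML = """
<schema name="example" version="1.0">
  <element name="Title" min="1" max="1"/>
  <element name="Creator" min="1"/>
  <element name="Subject" min="0" max="3"/>
</schema>
"""


def load_schema(text):
    """Parse a schema description into its name, version and cardinality rules."""
    root = ET.fromstring(text)
    rules = {}
    for node in root.findall("element"):
        maximum = node.get("max")  # an absent max means "unbounded"
        rules[node.get("name")] = (
            int(node.get("min", "0")),
            int(maximum) if maximum is not None else None,
        )
    return root.get("name"), root.get("version"), rules
```

Because the editor reads only the parsed rules, a change to the schema file alters the editor's behaviour without any change to its code.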
Thin-client with server-side processing
Thin-client with client-side processing
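To investigate this approach, DSTC built two tools that employed
different technology: Reg, which runs ECMAScript in HTML pages
dynamically generated by a server-side CGI script, and Reggie, a Java Applet
displayed inside a static HTML page.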
Reg
Entering the Reg URL into a browser causes a Perl based CGI script to run
in the server that creates a HTML page for the browser to display. This page
is close to being static in content, but its dynamic construction allows
the supported metadata schemas to be changed by dropping new definitions into
a directory tree on the server. It also allows metadata records that are
stored on the server to be enumerated on the dynamic page. If not desired,
this feature can be disabled by a simple change in the script, so dynamic
generation of the "static" page is not a bad choice.
The user selects a metadata schema for the edit session and optionally
supplies the URL of a HTML page that may contain metadata embedded in the
<HEAD> section.
The next exchange uses more server-side Perl to
dynamically construct another HTML page with element data from the selected schema,
optionally populated with element values extracted from the URL by yet another Perl
script. The HTML page contains dynamically constructed ECMAScript that calls
on more ECMAScript loaded from a "library" file referenced by the page. These
calls, initiated by an onload instruction in the page
<BODY> tag, dynamically write HTML into the browser
Document Object Model (DOM) to lay out the metadata elements. A Cascading
Style Sheet (CSS), also referenced by the generated HTML page, provides the
look and feel.
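The extraction of existing element values from a page's <HEAD> section, which Reg performs in Perl, can be sketched as follows. This is an independent illustration using Python's standard html.parser module, not a port of the Reg script. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from <meta> tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.pairs.append((attributes["name"], attributes["content"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_metadata(html_text):
    """Return the name/content pairs embedded in the page head."""
    parser = MetaExtractor()
    parser.feed(html_text)
    return parser.pairs
```

The extracted pairs would then be matched against the element types of the selected schema to pre-populate the editor.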
Reg Advantages
Reg is lightweight and, as noted above, "roll-out" of upgrades requires no
client side effort. New schemas may be added easily by placing the XML
definition files in a server directory (after suitable testing).
In the demonstration version provided, the initial Reg page has provision for
the XML schema definition to be entered by the user as a URL. This allows
Reg to create and format records using schemas other than those residing on
the Reg server.
Reg Disadvantages
In a word, ECMAScript. The highly disjoint implementations of "JavaScript"
in earlier browsers, aggravated by the advent of "JScript", can result in
more code being required to try to detect and work around browser differences
than actual application code. The release of the W3C DOM specification for browsers
and the efforts of the ECMA have helped in recent times, but while different
users have different browsers of varying revision level, it is impossible
to guarantee that Reg will perform as expected on any given user platform.
In the version of Reg provided on Metadata.Net,
some effort has been taken to ensure the editor and all associated
functionality will run in IE6, Firefox, and Mozilla 5.0 under Microsoft
Windows XP and Linux, but given all the problems and the high mutation rate,
nothing can be guaranteed.
Reg Conclusion
If it works on a client machine, Reg provides a simple and quick way to
create and format simple metadata that can be manually extracted for
final use. In many cases, this will be all a user requires, but it must
be recognized that the quality of the metadata is largely up to the
users' knowledge of the schema being used.
Reggie
Reggie is based on a Java Applet displayed inside a static HTML page served by
a standard web server. Again, this was leading edge technology in 1997. The server
provides all the code for the Applet together with any other resources required
for execution, such as the metadata schemas and help text.
The downloaded code is executed by a Java
Virtual Machine (JVM) within the user's web browser.
This JVM is separate from any other JVM that may be installed on the client machine.
After selecting the schema required and optionally providing the URL of a HTML
page containing metadata embedded in the <HEAD> section,
the Applet opens a popup metadata editor window.
The editor popup displays at least one of every element type defined by the schema.
Where a URL is supplied, element name, content, encodings and language settings are extracted
and an attempt made to match the element type against those provided by the
schema. Controls on the editor window provide the means of adding and removing
element instances. The edited metadata can be formatted using one of the in-built
formats, although as we will see, this is not without complications, and formatting is the
only way Reggie offers the user to save the edited metadata.
Reggie and the Applet Sandbox
The designers of Java took great pains to ensure that Java would be a totally safe
language in which to develop and deliver web applications--a fact that most end
users remain ignorant of. A Java Applet runs in what is termed a sandbox;
this being a "safe place" for innocents to play where they are quietly
protected from harm. This means that, unlike ActiveX, the range of
malicious acts a Java Applet can perform is severely limited. At worst, an
Applet may be annoying, never destructive of the client machine and data.
The rationale is that since a HTML page can contain executable content
that might be non-obvious, the user needs confidence that it won't harm
them, otherwise they will turn off Java support (as most wisely choose to
do with ActiveX). The Applet sandbox is extremely safe.
This is a blessing and a curse.
Reggie Advantages
First, being a Java Applet, Reggie is not constrained to the widget set
provided by the browser that launched it. Theoretically, this should allow
Reggie to present a much richer UI than Reg. And it does, to a limited degree.
When DSTC developed Reggie, the Java graphic environment was, to be
kind, poor. This has since been addressed by the Java Foundation Classes
(JFC), aka "Swing". Swing had been pre-released to developers, but was in
such a state of flux that the Reggie developers decided to use a more mature set
of UI classes known as the Internet Foundation Classes (IFC) that had been
developed and released by Netscape (remember them? JavaScript?). In fact,
Swing has some roots in the IFC, but it was not in a state that the Reggie
team could use, so Reggie still depends most intimately on a library
that almost no Java developer today remembers, let alone has experience in
(except possibly your bald-headed, painfully ancient author). So Reggie
could be functionally rich, GUI-wise, but isn't. Not only that, the underlying
paradigm of some of the controls is different from what today's users expect.
Reggie Disadvantages
To get around the Java Applet sandbox restrictions without requiring the
user to perform extremely complex local security-related actions,
Reggie resorts to CGI Perl scripts to allow the user to actually use
the generated and formatted metadata. The sandbox model allows the Applet
to communicate only with the server that provided the Applet code to the user's
browser, so Reggie sends an HTTP request containing the formatted metadata
back to that server to launch a Perl script.
This script constructs an HTML page containing the formatted metadata and sends
it back to the user's browser. The page opens as a popup from which the
metadata can be cut and pasted.
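The round trip described above can be sketched as follows. This is an illustration in Python, not the Perl CGI script Reggie actually used. The form field name "metadata", the function names, and the page layout are assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def build_request_body(formatted_metadata):
    """Client side: form-encode the formatted metadata.

    The field name "metadata" is hypothetical.
    """
    return urlencode({"metadata": formatted_metadata})


def render_page(request_body):
    """Server side: return an HTML page that displays the posted text."""
    fields = parse_qs(request_body, keep_blank_values=True)
    formatted = fields.get("metadata", [""])[0]
    # Escape the text so the browser shows the markup literally,
    # ready for copying, instead of interpreting it as part of the page.
    return (
        "<html><head><title>Formatted metadata</title></head>"
        "<body><pre>" + escape(formatted) + "</pre></body></html>"
    )
```

Escaping matters because the formatted metadata is itself typically HTML markup. Without it, the browser would absorb the tags into the page and the user would have nothing visible to copy.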
Reggie Conclusions
The Applet approach employed by Reggie does offer some subtle advantages over
the ECMAScript approach used by Reg.
However, in some aspects related to the restrictions imposed by the Java Applet
sandbox model, it is even worse.
As we saw, developers still need to know some Perl to work on
the entire application, but unlike the Perl scripts used by Reg, those used
by Reggie are relatively simple,
and the Java Applet code, IFC aside, is far more
reliable and maintainable. More importantly, it gives consistent cross-platform
results, even in environments where ancient browsers run ancient JVMs.
So overall, the Applet-based editor, Reggie, has a subtle but distinct edge
over the ECMAScript-based version, Reg.
The Thick Client: MetaEdit
Drawing on the lessons learnt by the developers and users of Reg and Reggie,
DSTC's Engineering Unit was tasked with building a commercial-grade metadata
editor to form part of MetaSuite, a complete, schema-neutral, metadata
repository, search engine, and maintenance system. The designers quickly
decided that they needed the richness of the full Java environment, so MetaEdit
was designed to operate as a stand-alone Java desktop application that
could also be executed as an Applet. By running on the desktop, MetaEdit had
no sandbox restrictions imposed and could make full use of the local file
system and network connections to any host, local firewall settings permitting.
MetaEdit Advantages
As would be expected of a full, unrestricted, heavyweight application,
MetaEdit provided users with the ability to create and maintain high-quality metadata
guaranteed to conform to the selected schema, including different versions
of that schema. The work-flow module took MetaEdit beyond the role of a
simple editor, integrating with whatever process the metadata administrator
judged essential in producing and maintaining their repository. Authentication
and authorization were provided by HotMeta, optionally through a corporate
Lightweight Directory Access Protocol (LDAP) server.
MetaEdit Disadvantages
The aspect that conferred the most advantages on MetaEdit also proved
to be its biggest disadvantage. Specifically, it was a heavyweight desktop application
that had to be installed and configured. This meant that when used in
large organizations, or highly distributed associations, it suffered the
"roll-out" problem: how do you distribute it and keep everybody in sync
with application and plug-in updates? Despite MetaSuite shipping with a commercial-grade
installation and set-up "Wizard", it became painfully apparent that
the general user community wanted all that MetaEdit had to offer, but
delivered as a lightweight, browser-based solution requiring neither
set-up nor administration.
Conclusions
Metadata associated with on- and off-line resources is a valuable aid in
resource discovery. Creation and maintenance of this metadata is costly.
This cost is measured in the effort required for humans with knowledge of
the subject problem domain to create, verify, and attach the metadata. Maintaining
the metadata as details of the resource, or of the schema used to define the metadata,
change adds significantly to the cost.
About The Author
Ron Chernich spent eight happy years with DSTC as a Senior Software Engineer
and Director of the Engineering Unit. He maintained a very hands-on involvement
in development of DSTC's commercially licensed products: dCon, a full
Java implementation of the OMG CosNotification specification, and MetaSuite,
DSTC's complete metadata portal solution. Ron and his team also became custodians
and maintainers of other technologies, including Reg, Reggie, and fnORB,
which had been developed by DSTC research and had somehow found
their way into widespread use.
