Abstracts of publications - Bob Colomb
- Information Systems / Information Science
- Formal versus Material Ontologies for Information Systems Interoperation in the Semantic Web
- Issues in Mapping Metamodels in the Ontology Development Metamodel
- Operationalising Epistemic Modality
- Information Systems Technology Grounded on Institutional Facts
- Where does the ontology bottom out?
- Why do people pay for information?
- A Digital Library Needs Many Indexes
- Completeness and Quality of an Ontology for an Information System
- Impact of Semantic Heterogeneity on Federating Databases
- Use of a Personal Workstation to Access Open Network Services
- A Power User in Cyberspace: A Database Perspective
- Information Systems Founded on Practice
- Expert Systems
- Representation of Propositional Expert Systems as Partial Functions
- Strategies for Building Propositional Expert Systems
- Computational Stability of Expert Systems
- Applications of Category Theory
- An Approach to Ontology for Institutional Facts in the Semantic Web
- Category-Theoretic Fibration as an Abstraction Mechanism in Information Systems
- Major project proposals
Thesis Abstracts
- Structuring and Visualising Risk Management - Wei Seng Alan Ho
- Using Conceptual Structures and Ontologies to Support E-Commerce - Ahmed Kayed
- Development of a Practical System for Text Content Analysis and Mining - Andrew Smith
- Multiple Device Web-Based Information System Development: A Set of Development Guidelines - Marcin Metter
- Generating Database Presentations of L-systems in Virtual Plant Applications - Phoebe Chen
- Managing complex, open, web-deployable trade objects - Hung Wing
- Database Discovery in an Organizational Environment - Andrew Goodchild
- Distributed Querying with Z+SQL over the Internet - Sonia Finnigan
- A Theory for Multi-function Model-Based Reasoning through Context Management - Anthony Berglas
- Non-Monotonic Reasoning and End-User Conceptual Modelling - Nirad Sharma
- Foundations of massively parallel relational and deductive databases - Ok Cho
- EDI-based interoperation of information systems - Mohsen Rohani
- Interfacing Essential Drug Informatics - Vincent Guerini
Merging Ontologies Requires Interlocking Institutional Worlds
Robert M. Colomb
Mohammad Nazir Ahmad
Abstract
Merging of ontologies is a frequently addressed problem in the ontology literature. This paper argues that, in general, even two very similar ontologies cannot be merged. Further, where two ontologies can be merged, their conceptualizations are special: they are interlocking systems of institutional facts. The argument is based on the literature of the federated database problem and on the concepts of speech act and institutional fact.
To appear in Applied Ontology in 2007
Formal versus Material Ontologies for Information Systems Interoperation in the Semantic Web
Robert M. Colomb
Information systems ontology is intended to facilitate interoperability among the many applications now becoming available on the Internet. In particular, it is intended to facilitate the development of intelligent agents that can automate a large part of a user's task of achieving some end by employing multiple autonomous applications. A large number of ontologies exist that support specific kinds of interoperation among selected, generally mutually aware, applications. The intent of the upper ontology movement is to develop an abstract description of what there is in the world, in an application-independent form, which can be used both to help build specific ontologies and to help find common ground among them. This paper argues that, for the purposes of information systems interoperation and the semantic web, application-independent upper ontologies are unlikely to be successful because of semantic heterogeneity. However, the paper argues for a distinction in upper ontologies between formal and material ontologies, based on analogies with concepts in Kant’s synthetic a priori. It further argues that formal ontologies, whose focus is on how we see the world, are more likely to be developed successfully in the absence of applications than are material ontologies, which attempt to catalogue the world a priori.
The Computer Journal vol. 49, no. 1 pp. 4-19.
Issues in Mapping Metamodels in the Ontology Development Metamodel
Robert M. Colomb, Anna Gerber and Michael Lawley
The Ontology Development Metamodel consists of a number of separate metamodels linked by mappings. Structural heterogeneities among the metamodels make these mappings far from straightforward. This paper shows how the mapping technology Query/View/Transform (QVT) can be used to perform the mappings, addressing some of the problems and showing how QVT can resolve them.
Presented at: 1st International Workshop on the Model-Driven Semantic Web (MDSW 2004), Monterey, California, USA, 20-24 September 2004.
Operationalising Epistemic Modality
Peter Bruza and Robert M. Colomb
In order for intelligent computer agents to interact in a knowledge network, they need to have some idea of what they know and don’t know. This anthropomorphic concept must be operationalised so that we can specify the computer programs. This PowerPoint presentation is a step in that direction.
Presented at DSTC Symposium Brisbane, Australia 16-17 August, 2004.
Information Systems Technology Grounded on Institutional Facts
Robert M. Colomb
Abstract
This paper presents a theory explaining the success of information systems development based on SQL-type database technology, by showing that the assumptions underlying that technology correspond very closely to the way Searle’s institutional facts are created. The theory presented is a theory of action and design, so its productivity is shown by retrodiction of the necessity for business process engineering to achieve integration of information systems within an organisation, and by the prediction that interorganisational integration of information systems using the internet can succeed only if the applications share institutional facts. Finally, the theory is used to predict that autonomous intelligent agent applications can succeed in the information spaces populated by these common institutional facts.
Presented at Information Systems Foundations: Constructing and Criticising, 16-17 July 2004, The Australian National University, Canberra, Australia
Where does the ontology bottom out?
Robert M. Colomb
Abstract
Ontologies are collections of terms with a more or less complex structure, most often used to support interorganisational interoperability. The question is, what are the primitives? I contend that the primitives must be grounded in more-or-less explicit agreements among the parties: the contracts, the standard business practices as formulated in EDI, and the agreed primitive terms for product catalogs, prices, etc., all supported by a legal and audit environment. The structural relationships are grounded in more general agreements, which take the form of mathematical systems standardised in the educational systems of the developed world.
Position paper for panel discussion "Foundations of Information Systems", Australasian Conference on Information Systems 2000, Brisbane, Australia, 6-8 December 2000
Why do people pay for information?
Robert M. Colomb
Abstract
This paper investigates the situations where there are incentives for people to pay a premium over the channel costs for information content. It concludes that there are at least four: where the premium is low relative to channel costs, and monopoly, both of which are less interesting as they are not specific to information; where the information need is idiosyncratic; and where the quality of the information source is critical.
Prometheus (2001) Vol. 19, No. 1, pp. 45-53.
A Digital Library Needs Many Indexes
Robert M. Colomb
Abstract
The main argument of this paper is that one should expect to have not one, but many indexes for a large, heterogeneous digital library. Philosophical considerations, supported by ethnological studies of information-seeking behaviour, lead one to doubt that a single index would work. The undeniable success of single-index physical libraries therefore requires explanation: they work because they are limited in scope, and the conditions that make them work are not satisfied by Internet-scale digital libraries. We look at how people find things in the real world, and notice that one information structure they employ is the specialised magazine, of which there are tens of thousands. We look at how the specialised magazine supports practical reasoning, which leads to the expectation that there should be a large number of indexes on the Web in what we call 'behaviour space', as distinct from the 'resource space' in which the recognised indexes live. The paper concludes with suggestions as to what multiple indexes in both behaviour space and resource space would look like and how they would be interrelated.
for comment.
Category-Theoretic Fibration as an Abstraction Mechanism in Information Systems
Robert M. Colomb
(With C.N.G. Dampney and Michael Johnson, School of Mathematics, Physics, Computing and Electronics, Macquarie University)
Abstract
This paper examines the problem of establishing a formal relationship of abstraction and refinement between abstract enterprise models and the concrete information systems that implement them. It introduces and justifies a number of reasonableness requirements, which in turn warrant the use of category-theoretic concepts, particularly fibrations, to specify precisely a semantics for enterprise models that enables them to be considered as abstractions of the conceptual models from which the implementing information systems are built. The category-theoretic concepts are developed towards the problem of testing whether a system satisfies the fibration axioms, and are applied to case studies to demonstrate their practicability.
Acta Informatica, to appear.
Representation of Propositional Expert Systems as Partial Functions
Robert M. Colomb
Abstract
Propositional expert systems classify cases, and can be
built in several different forms, including production rules, decision tables
and decision trees. These forms are inter-translatable, but the translations
are much larger than the originals, often unmanageably large. In this paper a
method of controlling the size problem is demonstrated, based on induced
partial functional dependencies, which makes the translations practical in a
principled way. The set of dependencies can also be used to filter cases to be
classified, eliminating spurious cases, and cases for which the classification
is likely to be of doubtful validity.
Artificial Intelligence 109, pp. 187-209.
Paper presented at the International Conference on Formal Ontology in Information Systems (FOIS'98), Trento, Italy, 6-8 June 1998. In N. Guarino (ed.) Formal Ontology in Information Systems, IOS Press (Amsterdam), pp. 207-217.
Completeness and Quality of an Ontology for an Information System
Robert M. Colomb
(With Ron Weber, Department of Commerce, The University of Queensland)
Abstract
We examine the problems of completeness and quality in the design of information systems. Taking the view that an information system is a representation of a social reality created by genres of speech acts, we view the state of an information system as a text, and the dynamics of the system as essentially the dynamics of a text editor. This view enables us to use a generalised ontology developed by Bunge to get a clear picture of the functions of an information system, and therefore a set of criteria for ontological completeness. Further, quality in an information system is seen as a match between the semiotics of the system and the semiotics of the organisation in which the system is embedded, allowing us to make use of the quality principles advocated by Debenham. The value of these results is essentially that they validate the large body of existing information systems, and also the basic approach used to construct them, although they suggest some improvements. We can build and use information systems confident that they will remain valid under changes in the understanding of meaning and in the understanding of the metaphysics underlying physical and social reality.
The Computer Journal 40(5) 1997 pp. 235-244.
Impact of Semantic Heterogeneity on Federating Databases
Robert M. Colomb
Abstract
The difficult problems in the design of systems that facilitate interoperation and mediation among information sources and their consumers arise from the presence of semantic heterogeneity among the schemas and ontologies supporting the different services. The purpose of this paper is to develop a taxonomy of semantic heterogeneity and to describe, from the perspective of text databases, the conditions under which different kinds of autonomy-respecting interoperation are likely to be feasible. The main conclusion is that interoperation can be based on structured database technology only if the participating organisations communicate among themselves; otherwise the considerations underlying text databases dominate the technology used.
International Journal of Intelligent Systems (1995) Vol. 10, No. 3, pp 295-328.
Strategies for Building Propositional Expert Systems
Robert M. Colomb
(With Charles Y.C. Chung, CSIRO Division of Information Technology)
Abstract
The core of this paper is a proof that stratified Horn
clause propositional systems are equivalent to and can be efficiently
transformed into decision tables by a process closely related to
assumption-based truth maintenance. The transformed systems execute much faster
and in a bounded time, leading to the possibility of executing real-time expert
systems in microseconds on fine-grained parallel computers. One consequence is
to simplify the consistency and completeness analysis for such systems, in
particular the problem of ambiguity. A deeper consequence is that it makes
sense to view these systems as stochastic processes. This, and an analysis of
the problem of maintenance of these systems, leads to the conclusion that by
and large rule induction approaches are better than rule construction
approaches for building them.
Expert Systems With Applications (1992) Vol. 5, No. 2/3, pp. 411-419.
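The equivalence between propositional Horn clause rule systems and decision tables can be illustrated with a small Python sketch. It is not the transformation described in the paper (which is closely related to assumption-based truth maintenance and avoids exhaustive enumeration); it simply enumerates every combination of input facts, forward-chains the rules to a fixpoint, and records the resulting conclusions as one table row. The rule base and proposition names are hypothetical, and only definite Horn clauses (no negation, hence trivially stratified) are handled.

```python
from itertools import product

# Hypothetical rule base: each rule is (set of antecedent propositions, conclusion).
RULES = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_breath"}, "refer"),
    ({"rash"}, "refer"),
]
INPUTS = ["fever", "cough", "short_breath", "rash"]  # observable facts
GOALS = ["flu_suspected", "refer"]                    # classification outputs


def forward_chain(facts, rules):
    """Apply definite Horn rules until no new proposition can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if conclusion not in derived and antecedents <= derived:
                derived.add(conclusion)
                changed = True
    return derived


def to_decision_table(inputs, rules, goals):
    """One row per combination of input truth values -> tuple of goal values."""
    table = {}
    for values in product([False, True], repeat=len(inputs)):
        facts = {name for name, v in zip(inputs, values) if v}
        closure = forward_chain(facts, rules)
        table[values] = tuple(g in closure for g in goals)
    return table


table = to_decision_table(INPUTS, RULES, GOALS)
# Executing the table is a single lookup, so its run time is bounded.
print(table[(True, True, True, False)])  # (True, True)
```

Once the table is built, classifying a case is one dictionary lookup, which is the bounded-time property mentioned in the abstract; the cost is a table with 2^n rows for n inputs, which is why an efficient transformation matters in practice.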
Computational Stability of Expert Systems
Robert M. Colomb
Abstract
It has been shown that propositional expert systems are equivalent to decision tables, and therefore to classification systems. In many cases, the elementary facts for the classification may not be accurately known. Even if they are, the expert system frequently reasons on the basis of qualitative descriptors of quantitative measurements, which may be subject to borderline effects. This paper considers the computational stability of the classification in the presence of errors in the data, using concepts derived from error-correcting codes, in particular Hamming distance. It suggests a number of methods for analysing the decision table to identify potential instabilities, and methods for correcting or avoiding these problems.
Australian Computer Journal Vol. 25, No. 1 (1993) pp. 7-13.
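A minimal Python sketch of the Hamming-distance idea, assuming a decision table whose conditions are binary: any two rows that are within one bit flip of each other but classify differently mark a potential instability, since a single erroneous or borderline input could move a case across the boundary. The table contents and the `radius` parameter are hypothetical illustrations, not taken from the paper.

```python
from itertools import combinations

# Hypothetical decision table: binary condition vector -> class label.
TABLE = {
    (0, 0, 1): "normal",
    (0, 1, 1): "normal",
    (1, 1, 1): "alarm",
    (1, 0, 0): "alarm",
}


def hamming(a, b):
    """Number of positions at which two condition vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def unstable_pairs(table, radius=1):
    """Rows within `radius` bit flips of each other that classify differently.

    A single misread input (e.g. a borderline qualitative descriptor) could
    move a case from one row to the other and change its classification.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(table), 2)
        if hamming(a, b) <= radius and table[a] != table[b]
    ]


print(unstable_pairs(TABLE))  # [((0, 1, 1), (1, 1, 1))]
```

Raising `radius` widens the net: it flags row pairs that a two-input error could confuse, at the cost of reporting more candidates for review.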
Use of a Personal Workstation to Access Open Network Services
Robert M. Colomb
Work performed while author was visiting the School of
Computing Sciences, University of Technology, Sydney.
Abstract
Imagine that people have powerful, flexible workstations which can adapt to their work habits, and that they use an open distributed computing environment for information and computing resources. The user should have a uniform and seamless view of this computing environment. In addition, the user has a large investment in word processors, spreadsheets and other personal productivity tools. It then becomes natural to argue that network applications should interact with the user through the user's personal productivity tools. We need to develop an abstract view of the capabilities of a User Interface Management System (UIMS), at a much higher level than graphics interface standards like X11, and also standards for the communication of documentation from an application to a UIMS. There must be standards for sharing models of data structures and definitions of the semantics of their various components. This paper sketches some requirements for the solution of these problems, based partly on database technology and on the design of persistent programming languages and access procedures for persistent object stores.
Informal talk presented at DSTC Symposium, 11-12 July 1996
A Power User in Cyberspace: A Database Perspective
Abstract
To a power user, cyberspace has three parts: their own
computing environment, the organizational computing environments in which the
user participates because of their organizational relationships, and the rest
of the world on the Net. This paper considers the database-oriented facilities
required for this view of cyberspace to work smoothly, the present state of
technology and some research issues.
Position paper for Information Systems Foundations: Practice and Ontology Workshop, Macquarie University, Sydney, Australia, 29 September 1999.
Information Systems Founded on Practice
Robert M. Colomb
Abstract
We worry about foundations out of a fear that our practice
might collapse. However, the success of a system must validate the method of
building it. Practice can be profitably problematised, though, since it can be
discussed, must be taught, and can be improved. The theoretical concepts used
to do this do not refer to underlying reality, but are metaphors, limited
interpretations of a complex whole. The foundation of practice is practice
itself. We can discuss, teach and improve practice by metaphor, but the
metaphor is never definitive. We can get as much certainty as it is possible to
have, but we can never be certain we understand what we are doing, even though
what we do might be effective.
Thesis Abstracts
An Architecture for Ubiquitous Mobile Service Delivery
Paul O'Brien
Awarded August 2006
Highly mobile people (HMPs) require flexible, reactive service delivery because of their regularly changing location and activities and the lack of a wired network connection. A mobile service delivery system should be able to detect relevant events, such as a change of location, the availability of new last-minute specials, sales opportunities and safety issues, and then take action reactively in response to these events. This work describes a framework, based on a situation management ontology, for delivering such a system. Issues addressed include HMP and service states and events, context, situations and situation-action rules, and syntactically and semantically compatible XML ontologies for their specification.
A generic situation management ontology is developed in OWL using the ontology development tool, Protégé. This ontology is combined with domain specific classes in the travel domain to create a travel situation management ontology that can be used as the basis for a ubiquitous mobile travel service application. Using a typical independent traveller scenario, the travel situation management ontology is instantiated to demonstrate its effectiveness. The flexibility of the generic situation management ontology is demonstrated by creating an academic situation management ontology by simply replacing a small number of domain specific classes.
A framework is also proposed that is based on the situation management ontology, distributed, co-operating software agents, and context based filtering, and is suitable for mobile service delivery. The example framework uses the situation management ontologies developed in this work and action rules to link situation specification to situation detection and action.
The ontologies and action rules are semantically consistent and are specified in OWL, the XML-based, industry-standard language, thus drawing together previous independent work in a number of diverse disciplines.
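The situation-action idea can be sketched in Python, abstracting away from OWL, Protégé and the agent infrastructure: a situation is a predicate over the current context, and each rule pairs a situation with a reactive action. The `Context` fields, rule names and messages below are hypothetical and only illustrate the detection-then-action loop described in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Hypothetical snapshot of a traveller's current state."""
    location: str
    activity: str
    events: list = field(default_factory=list)


@dataclass
class SituationActionRule:
    name: str
    situation: Callable[[Context], bool]  # situation detection predicate
    action: Callable[[Context], str]      # reactive response


def react(ctx, rules):
    """Fire the action of every rule whose situation holds in the context."""
    return [r.action(ctx) for r in rules if r.situation(ctx)]


RULES = [
    SituationActionRule(
        "last_minute_special",
        lambda c: "special_offer" in c.events and c.activity == "sightseeing",
        lambda c: f"notify: special offer near {c.location}",
    ),
    SituationActionRule(
        "safety_alert",
        lambda c: "safety_warning" in c.events,
        lambda c: f"warn: safety issue reported in {c.location}",
    ),
]

ctx = Context("Brisbane", "sightseeing", ["special_offer"])
print(react(ctx, RULES))  # ['notify: special offer near Brisbane']
```

Swapping the travel-specific rules for a different set, while keeping `react` unchanged, mirrors in miniature the way the thesis reuses its generic ontology for another domain.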
Structuring and Visualising Risk Management
Wei Seng Alan Ho
Awarded July 2006
The dictionary defines risk as the potential harm that may arise from some present process or from some future event, while vulnerability is the state of being vulnerable or exposed. Risk management helps to boost security by analysing current vulnerabilities in the organization and assessing their likelihood in relation to the materialisation of a risk. In this project's context, a relationship between risk and vulnerability is defined as a particular vulnerability contributing to the materialisation of a risk. However, such relationships between risks and vulnerabilities are often complex and pose a challenge for human understanding. Visualisation is therefore needed so that users can easily see the relationships between risks and vulnerabilities, and in particular which vulnerabilities contribute to the materialisation of a risk.
In this project, a causal network was proposed to visualise the relationships between risks and vulnerabilities. By using deduction to reason about the likelihood of a risk given the presence or absence of vulnerabilities, the proposed causal network helps to structure the vulnerabilities into categories according to the risks they contribute to. With visualisation added to present this categorisation, the user gets a structured top-level view of risks that are high, medium or low in severity, as well as a drill-down view of the individual vulnerabilities contributing to a risk.
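One plausible way to compute the likelihood of a risk from the presence or absence of contributing vulnerabilities is a noisy-OR model, sketched below in Python. This is an assumption for illustration: the thesis does not state that it uses noisy-OR, and the per-vulnerability probabilities and the high/medium/low thresholds are hypothetical.

```python
# Hypothetical figures: probability that each vulnerability, if present,
# is sufficient on its own to cause the risk to materialise.
CONTRIB = {"unpatched_server": 0.6, "weak_passwords": 0.4, "no_backups": 0.2}


def risk_likelihood(present, contrib, leak=0.0):
    """Noisy-OR combination: the risk fails to materialise only if every
    present vulnerability (and the background 'leak') independently fails."""
    p_not = 1.0 - leak
    for v in present:
        p_not *= 1.0 - contrib[v]
    return 1.0 - p_not


def severity(p, high=0.7, medium=0.3):
    """Bucket a likelihood into the top-level high/medium/low view."""
    return "high" if p >= high else "medium" if p >= medium else "low"


# Top-level view per scenario; the `present` set is the drill-down view.
for present in [set(), {"no_backups"}, {"unpatched_server", "weak_passwords"}]:
    p = risk_likelihood(present, CONTRIB)
    print(sorted(present), round(p, 2), severity(p))
```

Noisy-OR keeps the number of parameters linear in the number of vulnerabilities, which is why it is a common default for causal networks where each parent can independently trigger the effect.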
In addition, a belief calculus called Subjective Logic (SL) was introduced to help risk experts express their opinions about vulnerabilities and risks more realistically, enabling them to differentiate between their gut feeling and past experience. Instead of representing an opinion as a one-dimensional (1D) scalar, SL is adapted to represent conditional and joint probability calculations, and to combine two joint probabilities, in a three-dimensional (3D) format (belief, disbelief and uncertainty). This provides a richer input for risk assessment, because SL is suited to situations where there is more or less uncertainty about whether a given proposition is true or false. The visualisation strategy is also adapted to exploit the richer risk assessment, giving the user a richer risk picture that supports value-added risk assessment and mitigation strategies.
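The opinion triple can be sketched in Python as follows. The mapping from counts of positive and negative observations to (belief, disbelief, uncertainty), with a non-informative weight of 2, and the expectation b + a*u follow Jøsang's Subjective Logic; the conjunction uses the simple product form of its belief, disbelief and uncertainty components, and taking the result's base rate as the product of the inputs' base rates is a simplifying assumption. This is not the thesis's own formulation and should be checked against the Subjective Logic literature before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Subjective Logic opinion about a proposition, with b + d + u == 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # prior probability assumed in the absence of evidence

    def expectation(self):
        """Probability expectation E = b + a * u."""
        return self.belief + self.base_rate * self.uncertainty


def from_evidence(positive, negative, weight=2.0):
    """Map counts of supporting and opposing observations to an opinion."""
    total = positive + negative + weight
    return Opinion(positive / total, negative / total, weight / total)


def conjunction(x, y):
    """Opinion that both propositions hold (simple product form).

    Taking the product of the base rates is a simplifying assumption.
    """
    return Opinion(
        belief=x.belief * y.belief,
        disbelief=x.disbelief + y.disbelief - x.disbelief * y.disbelief,
        uncertainty=(x.belief * y.uncertainty + x.uncertainty * y.belief
                     + x.uncertainty * y.uncertainty),
        base_rate=x.base_rate * y.base_rate,
    )


gut = from_evidence(0, 0)    # no recorded incidents: pure uncertainty
past = from_evidence(8, 2)   # ten past incidents, eight supporting the proposition
print(gut)  # Opinion(belief=0.0, disbelief=0.0, uncertainty=1.0, base_rate=0.5)
print(round(past.expectation(), 3))  # 0.75
```

The contrast between `gut` and `past` is one way to read the distinction the thesis draws between gut feeling and past experience: both can yield a probability, but only the second carries low uncertainty.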
The project argues that the causal network, together with SL, can help organizations allocate valuable resources to derive mitigation strategies that resolve risks.
Using Conceptual Structures and Ontologies to Support E-Commerce
Ahmed Kayed
Awarded January, 2003
Electronic Commerce (EC) is emerging as a major Web-supported application. EC supports many business transactions via a network. The Internet is an open, widely distributed and relatively inexpensive environment, whereas business transactions usually run in closed environments. To conduct business on the Internet, many problems must be solved, for example security, authentication, heterogeneity, interoperability, and ontological problems. The aim of this research is to provide an infrastructure for business-to-business EC. To narrow the scope of this research, the focus is on a specific business process, the tendering process. To support EC applications (tendering in particular), ontologies were used to solve many problems in this domain. It has been argued that, beyond software engineering and process engineering, ontological engineering is the third capability needed if successful e-commerce is to be realized. Conceptual Graphs (CGs) are used to implement these ontologies. CGs are a method of knowledge representation developed by Sowa, based on Charles Peirce's Existential Graphs and the semantic networks of artificial intelligence.
This research addresses the question: how can explicit ontologies be obtained, constructed, used and implemented to support e-commerce (tendering in particular)? To answer this question in a practical way, three more specific questions are defined: How can ontologies be built and used, generally and in the tendering domain? How can CGs be used to implement these ontologies? What can ontology offer for tendering automation?
The research theme can be summarized as creating a new method for building and managing a tendering system, and solving some problems in CGs in order to implement an ontology for the tendering domain. This thesis shows that ontologies and CGs could be used to facilitate and support e-commerce. An ontology-based tendering system will help in testing the feasibility of the ontological approach, which will contribute to building a new generation of business-to-business EC. The proposed solution deploys the mediator concept to build a shared ontology. The mediator will be responsible for maintaining different types of ontologies and performing different types of matching. This will facilitate the automation of many tendering activities, such as tender forming, buyer and seller matching, and bid evaluation. Four levels of abstraction are defined to build the ontologies. At some levels, two types of ontology have been established: one for concepts and the other for structures. Some CG tools have been used to build CG structures for tendering from existing Electronic Data Interchange (EDI) messages. Algorithms have been developed to extract signatures from CG-EDI templates, where a signature is a primitive CG in which a single relation links two or more concepts. Ontologies have been used to index the tendering data, with indexes built around signatures. An algorithm has been developed to index and retrieve tendering information using CGs and ontologies. The use of CGs to implement ontologies has been formally analyzed using the Bunge-Wand-Weber (BWW) model.
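The signature-based indexing step can be illustrated with a small Python sketch. Here a conceptual graph is reduced to a list of relation arcs, each signature is one relation together with the concept types it links, and retrieval returns the tenders containing every signature of the query. The tender data and function names are hypothetical, and matching is exact: the thesis's algorithm also draws on the ontologies (for example, concept type hierarchies), which this sketch leaves out.

```python
from collections import defaultdict

# A conceptual graph reduced to a list of relation arcs:
# (relation, (concept types linked by that relation)). Hypothetical data.
TENDERS = {
    "T1": [("agnt", ("Buyer", "Request")), ("obj", ("Request", "Steel"))],
    "T2": [("agnt", ("Buyer", "Request")), ("obj", ("Request", "Cement"))],
    "T3": [("obj", ("Request", "Steel")), ("loc", ("Delivery", "Brisbane"))],
}


def signatures(graph):
    """Extract signatures: one relation together with the concepts it links."""
    return {(rel, concepts) for rel, concepts in graph}


def build_index(docs):
    """Inverted index from each signature to the tenders containing it."""
    index = defaultdict(set)
    for doc_id, graph in docs.items():
        for sig in signatures(graph):
            index[sig].add(doc_id)
    return index


def retrieve(index, query_graph):
    """Tenders that contain every signature of the query graph."""
    sets = [index.get(sig, set()) for sig in signatures(query_graph)]
    return set.intersection(*sets) if sets else set()


idx = build_index(TENDERS)
print(sorted(retrieve(idx, [("obj", ("Request", "Steel"))])))  # ['T1', 'T3']
```

Indexing on whole signatures rather than on individual concept names keeps the relational structure in the key, so a query for steel being requested does not match a graph in which steel merely appears in some other role.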
Tendering is well addressed in many disciplines, and many commercial systems have automated all or part of the process. The significant contribution of this research is the use of explicit ontologies and the deployment of the e-mediator concept for matchmaking in the tendering domain.
The existence of these ontologies means that some way of managing them is required. Many ontology-based systems build tools to help manage their ontologies, but there are no clear methodologies for building such a system. This thesis articulates specifications for an Ontology Management System (OMS) using CGs: the meaning of an OMS is defined, its components are identified, and the methodology to build an OMS using CGs is outlined.
This thesis stands between Information Systems (IS), which takes a macro or descriptive view, and Computer Science (CS), which takes a micro or technical view. Readers with an information systems background will find the first chapters of this thesis the more business-oriented and descriptive part. Readers whose concerns are computer science will find the more technical aspects in the later part. The thesis attempts to balance the IS and CS disciplines in clarifying how ontologies can be used to support e-commerce.
Development of a Practical System for Text Content Analysis and Mining
Andrew Edward Smith
Awarded November, 2002
This thesis describes the design, development, and field
testing of a practical and efficient system for tagging, mapping and mining
conceptual information from large text collections. The system was intended to
emulate many of the techniques involved in Content Analysis: a conceptual
overview of the data, trend discovery, and drill-down. The design constraints
for this project were: simplicity, robustness, speed, usability, clarity, and
good precision and recall. The challenge was to see if this could be achieved,
and how well.
The general strategy chosen involved abstracting families of words to thesaurus
concepts. These concepts were then used to classify text at a resolution of
several sentences. The resulting concept tags were then indexed and mapped to
provide a document exploration environment for the user.
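The tagging step can be sketched in Python under simplifying assumptions: a fixed, hypothetical thesaurus maps each concept to a family of words, text is split naively into sentences, and each block of three sentences is tagged with every concept whose words occur at least twice. The thesis's system goes further (for example, its learning optimiser adapts each concept to the word usage in the text), so the window size and threshold here are illustrative only.

```python
import re
from collections import Counter

# Hypothetical thesaurus: each concept abstracts a family of words.
THESAURUS = {
    "finance": {"bank", "loan", "interest", "credit"},
    "weather": {"rain", "storm", "flood", "forecast"},
}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tag_windows(text, thesaurus, window=3, min_hits=2):
    """Tag each block of `window` sentences with the concepts whose word
    family occurs at least `min_hits` times in that block."""
    sentences = split_sentences(text)
    tags = []
    for start in range(0, len(sentences), window):
        block = " ".join(sentences[start:start + window]).lower()
        words = Counter(re.findall(r"[a-z]+", block))
        hits = {c: sum(words[w] for w in family) for c, family in thesaurus.items()}
        tags.append((start, sorted(c for c, n in hits.items() if n >= min_hits)))
    return tags


doc = ("The bank raised interest rates. Loan demand fell. Analysts were calm. "
       "Heavy rain is forecast. A flood warning was issued.")
print(tag_windows(doc, THESAURUS))  # [(0, ['finance']), (3, ['weather'])]
```

The `(start, concepts)` pairs are the kind of concept tags that could then be indexed and mapped for document exploration, with `start` pointing back to the first sentence of each tagged block.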
To achieve this, several novel algorithms were developed, including a learning
optimiser for automatically adapting a concept to the word usage within the
text body, and a many-body clustering process for generating a cluster map of
concepts based on the text data. Novel techniques for automatically selecting
'interesting' concepts and for detecting aliases were also developed.
Extensive testing was performed on real-world document collections, of interest
to real clients where possible. The primary criterion for success was set at
the outset to be the response of users to the system in real applications. Many
real document sets have been mapped and much was learned from the process, and
from the results. Client response has been favorable.
Multiple device web-based information system development: A set of development guidelines
Marcin Metter
Awarded June, 2001
The introduction of the Wireless Application Protocol (WAP) in 1997 by the WAP Forum has provided highly mobile users with access to 'live' Internet-based information services in the 'palm of their hand'. Previously, access to mobile information services was limited either to 'off-line' systems, where the device was periodically refreshed, or to attaching the device to a wireless modem or mobile phone, which required more than one device to be carried.
The aim of the study was to develop a set of design guidelines that allow access to complex information systems to be provided by the majority of web-enabled devices. It is clear from previous examinations that a single device is not capable of satisfying all of a user's requirements, owing to varying device capabilities. Because of the large range of devices available, three general categories, or levels, were used, with each category linked to a specific type of information need.
Using this information, it was found that, because of the differences in device limitations between the levels, the 'lowest common denominator' method would have to be used to provide a single interface for all device levels. The common denominator was found to be WML, although a slight modification of the document header and footer is required for use on higher-level devices. However, with both WAP and HTML being converted into the single XHTML standard, this problem will be removed.
To examine the problems of presenting complex information to the user within the limitations of WAP, the examination concentrated on the area of drug informatics. This decision was made because of the evident increase in demand for such information, resulting from the greater emphasis being placed on 'evidence based medicine'.
Two applications were examined during the guideline development process: a web-based application and a stand-alone application. The examination of the web-based application provided a list of the common set of features found in WML and HTML, and highlighted and discussed the key problems related to navigation, table structures and frames. The resulting guidelines were then tested on the stand-alone application, revealing that a major problem area was the presentation of table structures used for comparing information.
A number of possible solutions to these problems have been developed and are presented, with the guidelines focusing on the use of simple markup features such as text formatting, hyperlinks and table structures. Also detailed is how the fact that HTML browsers ignore unknown markup tags can be used to advantage: information can be split optimally into a set of cards by a WAP browser, yet presented as a single page by an HTML browser.
Generating Database Presentations of L-systems in Virtual Plant Applications
Phoebe Chen
Awarded December, 2000
One of the most important advantages of database systems is
that the underlying mathematics is rich enough to specify very complex
operations with a small number of statements in the database language. This
research covers an aspect of biological informatics, that is, the marriage of
information technology and biology, involving the study of real-world phenomena
using virtual plants derived from L-system simulations.
L-systems were introduced in 1968 by Aristid Lindenmayer as
a mathematical model of multicellular organisms. Not much consideration has
been given to the problem of persistent storage for these simulations. Current
procedures for querying data generated by L-systems for scientific experiments,
simulations and measurements are also inadequate. To address these problems, the
research in this thesis presents a generic data modelling process
(L-DBM) that links L-systems and database systems.
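To make the notion of an L-system concrete, the sketch below expands a deterministic, context-free L-system by rewriting every symbol in parallel at each generation. It uses the well-known algae example (A → AB, B → A) rather than any plant model from the thesis; the function name `expand` is an assumption for illustration.

```python
def expand(axiom, productions, generations):
    """Apply L-system productions to every symbol in parallel, n times.

    Symbols without a production (for example branch brackets) are
    copied unchanged.
    """
    s = axiom
    for _ in range(generations):
        s = "".join(productions.get(ch, ch) for ch in s)
    return s

# Lindenmayer's algae model: A -> AB, B -> A
algae = {"A": "AB", "B": "A"}
for n in range(5):
    print(n, expand("A", algae, n))
# 0 A
# 1 AB
# 2 ABA
# 3 ABAAB
# 4 ABAABABA
```

The strings grow quickly (their lengths follow the Fibonacci sequence here), which is one reason persistent, queryable storage of simulation output becomes a concern.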
This thesis shows how L-system productions can be
generically and automatically represented in database schema and how a database
can be populated from the L-system strings. This thesis further describes the
idea of pre-computing recursive structures in the data into derived attributes
using compiler generation. A method is supplied that establishes a correspondence
between biologists' terms and compiler-generated terms in a biologist-friendly
computing environment. This environment includes a visual query interface.
The L-DBM is a generic procedure. Given the productions of a specific L-system
and their declarations, it can generate the specific schema for both
simple correspondence terminology and complex recursive structure data
attributes and relationships. The same correspondence applies to any L-system
using the same vocabulary. Once established it can be used to support an entire
research program. So the research contributes a generic solution for all kinds
of L-systems.
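A minimal sketch of pre-computing a recursive structure into a derived attribute, assuming a bracketed L-system string in which `[` and `]` delimit branches. The table name `module`, its columns and the sample string are hypothetical; the L-DBM described above generates its schema from the productions, which this sketch does not attempt.

```python
import sqlite3

def modules_with_depth(lstring):
    """Yield (position, symbol, depth) for each non-bracket symbol.

    '[' opens a branch and ']' closes it; depth counts the open branches.
    Computing it once at load time means later queries need no recursion.
    """
    depth = 0
    for pos, ch in enumerate(lstring):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        else:
            yield pos, ch, depth

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE module (pos INTEGER PRIMARY KEY, symbol TEXT, depth INTEGER)")
conn.executemany("INSERT INTO module VALUES (?, ?, ?)",
                 modules_with_depth("F[+F[-F]F]F"))

# How many segments (F) occur at each branching depth?
rows = conn.execute(
    "SELECT depth, COUNT(*) FROM module WHERE symbol = 'F' "
    "GROUP BY depth ORDER BY depth"
).fetchall()
print(rows)  # [(0, 2), (1, 2), (2, 1)]
```

Storing depth as an ordinary column turns a question about nested branches into a plain `GROUP BY`, which is the kind of simple query a visual interface can generate.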
Interfacing Essential Drug Informatics
Vincent Guerrini
Awarded November, 1999
Retrieval of electronic veterinary drug information remains
traditional, or human-based, which inhibits the retrieval of Precise Drug
Information (PDI). This finding was unexpected, since drug terms are unique and
limited to about 800 terms. Electronic drug sites used forms or links that
extracted General Drug Information (GDI), suggesting that heuristics had not been
used in their design. Electronic drug information was mainly (83%) text-based and
indexed in pharmacological, therapeutic or pharmacopeia terms. Survey results
showed that at medical practices, surgery or emergencies, PDI rather than GDI was
required. Replies by 63 veterinarians over the period August 1997 to November
1998 revealed that 82% preferred PDI to GDI. Drug information was mostly
retrieved from textbooks (79%), computers (45%) and colleagues (33%), whereas
only 6% was retrieved from libraries. On computers, 6% used local databases,
62% the internet and 27% programs. Most respondents (67%) expressed a need for
PDI. The program may be viewed at http://www.uq.edu.au/~csvguerr/about.htm.
Pharmacological and toxicological literature terminology, survey results and
heuristic design suggested that PDI terms be restricted to "drug
name", "form", "interactions", "preparations",
"uses", "doses", "administration",
"precautions", "adverse effects", "warnings" and
"overdose". On the main interface, pre-interest levels were
designated by "allabout", "justabout" and
"moreabout", each linked to key attributes "Species" an The
program was built to be compatible with popular and future systems using XML or
SNOMED terminology. PDI-XML allowed explicit, meaningful, descriptive tags,
self-defined data types and multimedia to be created on one interface. PDI-XML
database linking specifications provided a method for extracting PDI from other
drug text sites or databases. When used in combination with standardized
nomenclature (SNOMED), XML provides more concise access to PDI.
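As an illustration of how explicit, descriptive tags support retrieving PDI rather than GDI, the sketch below stores one record per species and drug and returns a single attribute on request. The element names (derived from the PDI terms listed above), the `species` attribute and the placeholder values are assumptions for illustration; they are not the actual PDI-XML schema, and no real drug data is shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names derived from the restricted PDI terms.
PDI_FIELDS = ["drug_name", "form", "interactions", "preparations", "uses",
              "doses", "administration", "precautions", "adverse_effects",
              "warnings", "overdose"]

def make_record(species, values):
    """Build a <pdi> record with one descriptive element per PDI term."""
    record = ET.Element("pdi", species=species)
    for name in PDI_FIELDS:
        ET.SubElement(record, name).text = values.get(name, "")
    return record

def precise_lookup(records, species, drug, field):
    """Return only the requested field rather than the whole monograph."""
    for rec in records:
        if rec.get("species") == species and rec.findtext("drug_name") == drug:
            return rec.findtext(field)
    return None

records = [make_record("canine", {"drug_name": "drug-x", "doses": "(dose text)"})]
print(precise_lookup(records, "canine", "drug-x", "doses"))  # (dose text)
print(ET.tostring(records[0], encoding="unicode")[:60])
```

Because each term is its own element, a lookup can return exactly the "doses" or "warnings" fragment for a given species, instead of a page of general text.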
This thesis provides a description of a Precise Drug
Informatics Database, in HTML or XML, that is searched in 3 steps. The functional
electronic version of this thesis may be viewed at http://www.uq.edu.au/~csvguerr/msthesis.htm.
The search innovations included a pre-determined choice to reduce time and
effort, a limited number of search steps, the avoidance of forms or queries,
limited information, and meaningful file identifiers. The interface was confined
to functional links with combined terminology and minimal
Managing complex, open, web-deployable trade objects
Hung Wing
Awarded September, 1998
Abstract
Worldwide co-operation, coupled with increasing competition
in every aspect of business, has forced companies to be more flexible and
efficient than ever before. Consequently, companies need effective ways of
mass-marketing their products and extending their operations to the open global
markets, while still trying to minimise operating costs. This must be achieved
without reduction in the quality expected in conventional business operations.
A demand, then, is created for a just-in-time, on-demand,
team-based, networked, geographically dispersed, and automatic approach to
business operations, aimed at promoting open trading in the immediate future.
This research provides a generic framework that supports the
distribution, sharing and management of trade documents.
A few years ago, electronic trade transactions in
manufacturing, purchasing and banking were only available to relatively small
'closed' groups of traders who could afford the initial high start-up cost and
elaborate negotiations. Now, due to the successful emergence of the new
supporting technologies, computerised trade transactions can be extended to
accommodate much wider business communities and applications. Ideally, we aim
to have these transactions open and flexible enough to be considered useful to
most Web end-users.
Emerging advanced technologies such as the Internet,
Distributed Object Computing, Component Software, Groupware, Middleware, Global
Directory Services, Electronic Data Interchange (EDI) and Workflow indicate
that, when combined, they will become the primary paradigm for capturing
corporate information and the overall framework for managing non-record,
computerised, trade-oriented information.
However, these new technologies do not come without
inefficiencies and major shortcomings. Serious problems with ambiguous EDI
messages, inflexible workflow, and the informality associated with the
modelling of business processes and their contents are among the key obstacles
for effective co-operation between two or more workspaces. Furthermore, the
disparity of the above technologies has prevented deployment of and support for
innovative and effective Web-deployable trade applications.
These factors emphasise the need for a generic, integrated,
formal framework which can be used to support trade applications involving a
large number of heterogeneous, autonomous, distributed trade objects. A Web
deployable trade object, in this context, is a special kind of 'compound'
document which is composed of many different kinds of contents ranging from a
simple spreadsheet cell with formula, to a framework containing computerised
trade messages and other complex business information and infrastructure.
To facilitate the openness and flexibility of trades, an
architecture based on the so-called Virtual Object Model (VOM) has been
introduced. Based on the generic VOM, this architecture provides an integrated
environment which supports the different trading services. These services
include: 1) the Framework Manager, allowing users to create, view and edit
traded documents; 2) the EDI Mapping Facility, allowing inter-organisation
trade messages to be understood and effectively used within a heterogeneous
trade environment; and finally 3) the Document-based Workflow Management System
(DFMS), allowing trade documents to be systematically routed to the right
interchanges at the appropriate times and in the right situations.
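A minimal sketch of document-based routing in the spirit of the workflow service described above, assuming each routing rule pairs a condition on a trade document with a destination interchange. The document fields, rule conditions and destination names are hypothetical, and the thesis's formal specification of routing is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TradeDocument:
    doc_type: str        # e.g. "purchase_order" (hypothetical)
    state: str           # e.g. "issued", "accepted" (hypothetical)
    amount: float = 0.0

@dataclass
class Rule:
    condition: Callable[[TradeDocument], bool]
    destination: str

@dataclass
class Router:
    rules: list = field(default_factory=list)

    def route(self, doc):
        """Return every destination whose rule condition matches the document."""
        return [r.destination for r in self.rules if r.condition(doc)]

router = Router([
    Rule(lambda d: d.doc_type == "purchase_order" and d.state == "issued", "supplier_sales"),
    Rule(lambda d: d.doc_type == "purchase_order" and d.amount > 10000, "buyer_approvals"),
    Rule(lambda d: d.doc_type == "invoice" and d.state == "accepted", "buyer_accounts"),
])
print(router.route(TradeDocument("purchase_order", "issued", 25000)))
# ['supplier_sales', 'buyer_approvals']
```

Keeping the routing knowledge in declarative rules, rather than in a fixed process graph, makes it easier to add or change routes without rewriting the workflow.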
Provision of the above trading services under one
integrated formal framework helps to mask the complexity associated with
the underlying supporting technologies. In addition, using Conceptual Graphs (a
logic-based formal language) and other well-established formal theories, such
as speech act theory, formal concept analysis and an underlying event logic, to
model and implement the trade objects allows complex collaborations, trade
messages and processes to be specified, enforced and reasoned about. We
believe that simulation, verification and other proactive features of advanced
trade applications can also be facilitated by using the declarative assertions
associated with trade messages and processes.
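One way to picture a trade process being specified and enforced with speech acts is as a conversation protocol. The sketch below enforces a simple request/promise/deliver/accept cycle as a finite-state machine. The acts and transitions are a common textbook pattern chosen here as an assumption; they are not the thesis's formalisation, which uses Conceptual Graphs and event logic.

```python
# Allowed transitions: (current state, speech act) -> next state.
# Hypothetical buyer/seller protocol for illustration only.
PROTOCOL = {
    ("start", "request"): "requested",
    ("requested", "promise"): "promised",
    ("requested", "decline"): "closed",
    ("promised", "deliver"): "delivered",
    ("delivered", "accept"): "closed",
    ("delivered", "reject"): "promised",   # seller must deliver again
}

def check_conversation(acts):
    """Replay a sequence of speech acts; raise ValueError on the first illegal act."""
    state = "start"
    for i, act in enumerate(acts):
        key = (state, act)
        if key not in PROTOCOL:
            raise ValueError(f"act {i} '{act}' not allowed in state '{state}'")
        state = PROTOCOL[key]
    return state

print(check_conversation(["request", "promise", "deliver", "accept"]))  # closed
```

A message log that violates the protocol (for example, a delivery before any promise) is rejected, which is the "enforced" part; the same table can be inspected to reason about which states a conversation can reach.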
In short, this thesis describes how we can effectively
deploy and support the next generation of electronic commerce applications. We
start out by examining what kind of trade contents and collaborations should be
captured and subsequently, how these may be formalised. We then identify and
overcome some relevant interoperability problems associated with complex global
trading. In particular, we provide useful models and algorithms to remove some
serious EDI and workflow limitations. By doing this, we hope to supply a formal
construct which will support Web deployable trading concerned with the
specification, distribution, and management of non-record, trade oriented
documents representing corporate information.
Database Discovery in an Organizational Environment
Andrew Goodchild
Awarded September, 1998
Abstract
In a large organisation with potentially hundreds of online
databases, finding and making sense of an unfamiliar database is a daunting
task. Existing approaches to database discovery fail to deal with this problem
effectively. Either the approach does not scale well, as in the multidatabase
approach, or the approach cannot effectively catalogue systems like databases
that contain few useful terms, as in the general resource discovery approach.
Furthermore, a common problem with both these approaches is that they are
narrowly focussed on the technical problems of discovery and leave usability as
an after-thought.
This thesis treats resource discovery fundamentally as a
problem that is embedded in a user's work activities and considers scalability
as a secondary, yet intrinsically related problem. We shall use lessons learned
from the library science community in identifying a framework for building
effective and usable discovery tools and from this framework we will design a
database discovery system that is less susceptible to the problems of existing
systems.
The database discovery tool described in this thesis is
based upon enterprise models. Many organisations already find that enterprise
models are a valuable aid in activities like developing new information
services or business planning. In this thesis we speculate that another useful
application of enterprise models could be resource discovery. This thesis
explores the idea that by relating detailed database schemas to a coarser
grained enterprise model, users can query the enterprise model to uncover data
buried within organisational databases. In examining the use of enterprise
models in database discovery, we explore various ways of formalising and
implementing the system and consider some of the computational issues in
dealing with the scale of database discovery in the organisational environment.
From this foundation, we extend the system to handle the notion of relevance
ranking, different levels of abstraction in conceptual models, and keyword searching.
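A toy sketch of querying a coarse enterprise model to uncover database tables, with a crude form of relevance ranking. The concept hierarchy, the table names and the ranking rule (tables linked to concepts fewer specialisation steps away from the query concept rank higher) are illustrative assumptions, not the formalisation used in the thesis.

```python
from collections import deque

# Hypothetical enterprise model: concept -> more specific concepts.
SPECIALISATIONS = {
    "Party": ["Customer", "Supplier"],
    "Customer": ["CorporateCustomer"],
    "Supplier": [],
    "CorporateCustomer": [],
}

# Hypothetical links from enterprise concepts to tables in organisational databases.
TABLES = {
    "Party": ["crm.contact"],
    "Customer": ["sales.customer", "billing.account"],
    "Supplier": ["purchasing.vendor"],
    "CorporateCustomer": ["sales.corp_client"],
}

def discover(concept):
    """Breadth-first walk down the model; rank tables by distance from the query concept."""
    results, seen = [], {concept}
    queue = deque([(concept, 0)])
    while queue:
        current, distance = queue.popleft()
        for table in TABLES.get(current, []):
            results.append((distance, table))
        for child in SPECIALISATIONS.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, distance + 1))
    return [table for _, table in sorted(results)]

print(discover("Customer"))
# ['billing.account', 'sales.customer', 'sales.corp_client']
```

The user only needs to recognise a business concept such as "Customer"; the detailed schema names are reached through the links from the model.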
Distributed Querying with Z+SQL over the Internet
Sonya M. Finnigan
Awarded July, 1999
Abstract
The ANSI/NISO Z39.50 Standard defines a protocol to
facilitate the interconnection of computer systems for the search and retrieval
of information in databases. This thesis presents Z+SQL, the adaptation of this
protocol to the SQL domain. Z+SQL unites the advantages of the SQL query
language with the interoperable information retrieval services of Z39.50.
Coupled with the existing Z39.50 profiles, Z+SQL facilitates both dynamic and
interoperable SQL querying and retrieval making distributed querying with SQL
across domain-specific communities a reality.
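Z39.50 session handling is not reproduced here. The sketch below illustrates only the "broadcast one SQL query, merge the results" pattern that Z+SQL is intended to enable, using in-memory SQLite databases as stand-ins for independent museum catalogues. The site names, the shared `object` table, its columns and the rows are hypothetical, and they presuppose the common semantics that a profile such as CIMI would have to supply.

```python
import sqlite3

def make_catalogue(rows):
    """Create an in-memory stand-in for one site's collection database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE object (title TEXT, maker TEXT, year INTEGER)")
    conn.executemany("INSERT INTO object VALUES (?, ?, ?)", rows)
    return conn

def broadcast(query, params, sites):
    """Send the same SQL query to every site and tag each row with its source."""
    merged = []
    for name, conn in sites.items():
        for row in conn.execute(query, params):
            merged.append((name,) + tuple(row))
    return merged

sites = {
    "site_a": make_catalogue([("Vase", "Unknown", 1850), ("Chair", "Smith", 1920)]),
    "site_b": make_catalogue([("Clock", "Jones", 1890)]),
}
print(broadcast("SELECT title, year FROM object WHERE year < ?", (1900,), sites))
# [('site_a', 'Vase', 1850), ('site_b', 'Clock', 1890)]
```

The pattern only works because every site exposes the same table and column names; agreeing on those names is the role a shared profile plays.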
This thesis briefly explains the importance of having
interoperable information retrieval networks. It promotes a standards-based
approach and briefly describes the underlying principles behind the Z39.50
protocol. It then describes in detail the proposed SQL extension, Z+SQL, giving
specific examples of how it could be implemented within the museum community
under the CIMI Z39.50 profile. In conclusion, the thesis outlines the current
status of Z+SQL, future extensions to the proposal and the planned release of
SQL enabled Z39.50 products.
Thesis Structure:
- Chapter 1 gives an overview of the thesis.
- Chapter 2 outlines the current trends in information retrieval
promoting a standards-based approach. It looks at what the ideal
information retrieval standard would look like and then compares this to
both existing and emerging international and industry standards.
- Chapter 3 presents the central idea of the dissertation,
distributing an SQL query using Z+SQL. Presented first is an analysis of
the problem of making an SQL database available on an open network
environment. It then compounds this problem with the requirement of common
semantics in order to broadcast an SQL query across that environment. The
final part of Chapter 3 is a description of several applications which
motivate such a requirement.
- Chapter 4 describes, in layman's terms, the underlying principles
behind the Z39.50 standard: what it is, how it works, how it fulfils the
need for interoperable information retrieval, and how it provides many of the
facilities envisaged for distributing an SQL query.
- Chapter 5 outlines what Z+SQL is, in particular formalising
the Z+SQL architectural design in relation to the existing Z39.50 model.
Examples of benefits of Z+SQL to both the existing Z39.50 community and
the SQL community are then discussed.
- Chapter 6 describes in detail Z+SQL as an extension to
ANSI/NISO Z39.50-1995 (Version 3): its definition and restrictions.
- Chapter 7 outlines the current status of the Z+SQL proposal
both within the standard process and in ongoing commercial software
development.
- Chapter 8 concludes by summarising the advantages of Z+SQL as
an essential tool for distributing SQL queries over the Internet.
A Theory for Multi-function Model-Based Reasoning through Context Management
Nirad Sharma
Awarded January, 1998
Formal descriptions of domain models in a representation language
typically embed task-specific assumptions, hindering their reuse for
problem-solvers other than those for which they were originally captured. To
state models in complete generality requires statement of every possible
qualification for the concepts of the model, a clearly infeasible task.
Further, the exchange of knowledge bases requires an interlingua to be highly
expressive if it is to accommodate exchange of knowledge for a wide array of
tasks and situations.
A trade-off arises between explicating qualifications of
concepts in a theory to improve generality and the cost of reasoning with a
regress of qualifications. The need to reify and structure sets of
qualifications motivates the formalisation of contexts. A further consequence
of a rich interlingua is that specifications cannot be mechanised directly,
due to the intractability associated with the highly expressive form. A more
efficacious approach to knowledge sharing seems to be the design of
representation languages in which models can be stated for sharing between a
few tasks, that have clean unifying semantics, and from which direct
mechanisations are achievable.
The role of contexts as formalised objects in knowledge representation
languages is investigated including the relationships between various
classical, modal and meta-theoretic treatments. The introduction of contexts
into a representation scheme facilitates reasoning with theories and their
languages captured relative to different perspectives and levels of detail
within a uniform formal system. A specific investigation has been undertaken
into the effects on context-naive formulations of subsumption lattices in the
framework of an order-sorted first order logic extended with multiple sort
partial orders and [an essentially multi-modal] context mechanism.
Concentrating on sharing theories between the configuration
and diagnosis tasks, a novel application for a cardinality-minimising variant
of McCarthy's circumscription schema has been observed for providing a common
semantic foundation for the two tasks, particularly when viewed as variants on
finite model generation. While a convenient characterisation of the two tasks
and an interesting application of non-monotonic representations, model
generation from circumscribed first order classical theories is not
computationally feasible using the most general form. CLP(FD) is shown to
provide an effective mechanisation of the model generation tasks for
configuration and diagnosis for an interesting class of constraint-based
specifications for devices and components, demonstrating an interesting
application of constraint logic programming techniques as well as being an
exercise in designing a knowledge representation language.
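To illustrate the diagnosis side, the sketch below finds cardinality-minimal diagnoses for the standard "polybox" circuit from the model-based diagnosis literature (three multipliers feeding two adders) by searching finite domains. It substitutes plain brute-force enumeration in Python for CLP(FD), and the circuit, input values, observations and domain bound are textbook assumptions rather than examples from the thesis. Searching abnormality sets in order of increasing size mirrors the cardinality-minimising circumscription described above.

```python
from itertools import combinations, product

COMPONENTS = ["M1", "M2", "M3", "A1", "A2"]
INPUTS = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3}
OBSERVED = {"F": 10, "G": 12}          # a correct circuit would give F = 12
DOMAIN = range(0, 31)                  # assumed finite domain for signals X, Y, Z

def consistent(abnormal):
    """True if some X, Y, Z satisfy every non-abnormal component and the observations."""
    a, b, c, d, e = (INPUTS[k] for k in "ABCDE")
    for x, y, z in product(DOMAIN, repeat=3):
        checks = {
            "M1": x == a * c,
            "M2": y == b * d,
            "M3": z == c * e,
            "A1": OBSERVED["F"] == x + y,
            "A2": OBSERVED["G"] == y + z,
        }
        if all(ok for comp, ok in checks.items() if comp not in abnormal):
            return True
    return False

def minimal_diagnoses():
    """Return the consistent abnormality sets of smallest size."""
    for size in range(len(COMPONENTS) + 1):
        found = [set(s) for s in combinations(COMPONENTS, size) if consistent(set(s))]
        if found:
            return found
    return []

print(minimal_diagnoses())  # [{'M1'}, {'A1'}]
```

A CLP(FD) system would propagate the constraints instead of enumerating every triple, but the model being generated is the same: an assignment of signal values plus a minimal set of components assumed abnormal.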
Compositional modelling of model fragments from a library
facilitates automatic fragment selection and the construction of simplest
adequate models of the devices as different applications require. An
intermediate modelling language and model selection algorithm are presented for
construction of the simplest adequate constraint system for the diagnosis and
configuration tasks from a shared model fragment library, integrating our work
on contexts for multiple perspective libraries and circumscriptive semantic
characterisations.
Non-Monotonic Reasoning and End-User Conceptual Modelling
Anthony Berglas
Awarded March, 1997
An important trend in the design of information systems is
the production of more generalized designs which can be configured by end users
without the need for professional programmers. Techniques include verticalizing
tables, generalizing subtypes and providing complex parameter tables. However,
these designs are difficult to implement, computationally expensive, and often
provide poor user interfaces.
The growth of personal computers has seen the emergence of
many non-computer professionals who can and do build small information systems
using tools such as spreadsheets and simple database products. This thesis
argues that while these "power users" do not have the years of
experience that is required to build complex systems, they could learn how to
make simple changes to existing ones such as adding a new attribute to an
entity. This enables simpler, more specialized designs to be built because
power users can configure the conceptual model itself rather than just update
configuration tables.
This thesis describes the technology that is required to
make this feasible. In particular:
- Fine grained conceptual authorization mechanisms are developed
that can prevent users from corrupting an application's fundamental
integrity;
- An elegant method of expressing active business rules based on
KL-ONE style defined types is provided; and
- CASE technology is developed that enables changes to a schema
to be automatically reflected in regenerated applications.
Existing CASE tools can significantly automate the
production of information systems because much of the processing required to
implement them can be inferred from their conceptual models. However, a
waterfall approach is inevitably used in which default intermediate
representations of programs are generated and then these default
representations are modified to provide the specific functionality that is
required. These modifications need to be manually reapplied if the generators
are rerun to reflect changes in the conceptual model. Power users are expected
to understand basic modelling techniques but not to be competent 4GL
programmers, so this problem needs to be addressed if end user computing is to
be extended to conceptual modelling.
To avoid the waterfall, conceptual models can be annotated
so that application programs can be generated directly from the schema without
the need for the intermediate representation. Numerous annotations are required
to specify an application but they can usually be given a default value which
may be based on the value of other annotations. This results in complex
networks through which default values need to be determined and multiple
inheritance conflicts resolved.
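A minimal sketch of defaulting annotations through a network, assuming each annotation either has an explicit value supplied by a power user or a default rule computed from other annotations. The annotation names and rules are hypothetical, and the sketch resolves one default rule per annotation; it does not implement the First/Only scheme or its resolution of multiple-inheritance conflicts.

```python
# Explicit annotations supplied by a power user (hypothetical names).
explicit = {"attribute.label": "Customer name"}

# Default rules: annotation -> function of a lookup for other annotations.
defaults = {
    "attribute.name": lambda get: "customer_name",
    "attribute.label": lambda get: get("attribute.name").replace("_", " ").title(),
    "form.field_label": lambda get: get("attribute.label"),
    "report.column_heading": lambda get: get("form.field_label"),
}

def resolve(annotation, explicit, defaults, _stack=()):
    """Return the explicit value if present, else evaluate its default rule recursively."""
    if annotation in explicit:
        return explicit[annotation]
    if annotation in _stack:
        raise ValueError(f"cyclic default for {annotation}")
    rule = defaults[annotation]
    return rule(lambda name: resolve(name, explicit, defaults, _stack + (annotation,)))

print(resolve("report.column_heading", explicit, defaults))  # Customer name
print(resolve("report.column_heading", {}, defaults))        # Customer Name
```

Changing one explicit annotation changes every value defaulted from it, so a regenerated application picks up the edit without manual rework, which is the point of avoiding the waterfall.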
The thesis addresses these conflicts using a new First/Only
inheritance scheme that is based on Touretzky's path-based logics. The
First/Only scheme differs from other path-based logics in that the relationship
between the annotations whose values are being defaulted is not specialization,
so it uses operators different from the conventional IsA/Is-Not-A ones used by
Touretzky. A formal definition is presented, together with an outline of how it
can be used to develop non-waterfall CASE tools which can in turn be manipulated
by power end users.
Foundations of massively parallel relational and deductive databases
Ok Hyeong Cho
Awarded August, 1996
Over recent decades, rapid advances in semiconductor technology
have made it possible to build massively parallel computers containing tens of
thousands of processors. The technology is likely to continue to progress
further for some time to come. In addition, optical three-dimensional storage
and optical interconnections, another rapidly evolving technology, open new opportunities
due to the inherent massive parallelism and non-interference of light beams.
However, the approaches used in current parallel database research cannot take
advantage of the massive parallelism that these technologies can provide,
due to the speedup limitation.
In this thesis, we present a computational framework for
relational and deductive database systems which takes advantage of the emerging
opportunities for massive parallelism and discuss the validity and feasibility
of the framework. The approach we take is based on associative computing and
fine-grained parallelism. Associative computing provides massively parallel
computation and content-addressed search, while data parallelism performs
symbolic computation by means of data shuffling very efficiently. The most
important aspect of the framework is that it does not suffer from the speedup
limitation. Rather it exploits massively parallel processors allowing unlimited
speedup.
The framework consists of the Associative Random Access Machine
(ARAM), a data representation scheme, and a set of algorithms for extended
relational algebra and Datalog query evaluation. The ARAM is an architectural
model for massively parallel processors generalized from associative computers
and massively parallel SIMD machines. The data representation scheme, which is
drawn from the SITDAC model, is based on tabular representation of relational
information. The framework assumes the useful facilities and functions of
associative computers, massively parallel SIMD machines and database machines.
The focus of this thesis is on algorithms for extended relational algebra (ERA)
and their application to deductive databases. Algorithms for extended
relational algebra are presented in two paradigms: associative and data
parallel. Associative algorithms are based on the principle of stepwise
refinement and require O(1) or O(n) parallel steps. Data parallel algorithms
are constructed from the primitives of the scan-vector parallel programming
model and other data-parallel operations. They can be performed in O(log n) or
O(log² n) steps. In the algorithms, set-oriented processing is employed to
achieve massive parallelism. For deductive databases, a differential evaluation
scheme for Datalog queries which is based on magic set transformation is
presented. The scheme is designed to maximize the expressibility of Datalog
programs while maintaining set-oriented computability. In addition, it can
incorporate various optimization techniques developed by the deductive database
community.
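To illustrate how a data-parallel relational operator can be assembled from scan primitives, the sketch below simulates a Hillis-Steele inclusive prefix sum, which takes ⌈log₂ n⌉ lockstep steps, and uses it to pack the tuples that satisfy a selection predicate into contiguous positions. The simulation runs sequentially in Python and the relation is hypothetical; it shows the structure of a scan-based algorithm, not the ARAM model or the thesis's own algorithms.

```python
def inclusive_scan(values):
    """Hillis-Steele prefix sum; each while-iteration is one parallel step."""
    x = list(values)
    steps, offset = 0, 1
    while offset < len(x):
        # On a SIMD machine every element is updated simultaneously from the old x.
        x = [x[i] + x[i - offset] if i >= offset else x[i] for i in range(len(x))]
        offset *= 2
        steps += 1
    return x, steps

def parallel_select(relation, predicate):
    """Selection as flag -> scan -> scatter (stream compaction)."""
    flags = [1 if predicate(t) else 0 for t in relation]   # one parallel step
    positions, _ = inclusive_scan(flags)                    # O(log n) steps
    out = [None] * (positions[-1] if positions else 0)
    for t, f, p in zip(relation, flags, positions):        # one parallel scatter
        if f:
            out[p - 1] = t
    return out

emp = [("ann", 30), ("bob", 45), ("cy", 28), ("dee", 51), ("eve", 39)]
print(parallel_select(emp, lambda t: t[1] > 35))  # [('bob', 45), ('dee', 51), ('eve', 39)]
print(inclusive_scan([1] * 8))                    # ([1, 2, 3, 4, 5, 6, 7, 8], 3)
```

The whole selection costs a constant number of elementwise steps plus one scan, so its parallel time is O(log n) regardless of how many tuples qualify.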
EDI-based interoperation of information systems
Mohsen Rohani
Awarded September, 1996
Traditional modelling of information systems has mostly
focussed on analysing data flows and transactions. However, dramatic
improvements in the cost and capabilities of information technology extend
computer use beyond transaction processing into communication and coordination.
Although the penetration of information systems into internal business
processes has been facilitated by the technical quality and capability of
computers and communication devices, the objectives in this arena have not been
achieved by virtue of information technology alone. At the same time,
these advancements in distributed computing and networking have provided the
technical basis for building information systems across organizational
boundaries. One approach to constructing such systems has been to employ the
relevant in-house information systems and to have them interoperate by
exchanging (structured) messages. Electronic Data Interchange (EDI) is a
special case of this approach. In the most complicated form of EDI, Incremental
Paper Trail (IPT) systems are responsible for pursuing the underlying business
processes as well as maintaining the information in their relevant messages.
Making a transition to transaction-oriented approaches, although possibly
helpful in certain circumstances, cannot solve the problem in general. It may
also either leave an in-house information system in an inconsistent state or
semantically crash a global long-duration transaction.
Once the need for a management system in the IPT approach
to EDI applications is recognized, the next step is to investigate the
boundaries of the responsibilities such a manager should take. That is, the borderline
between the IPT management system and the in-house information systems involved
should be determined. The Dynamic Essential Modelling of Organizations (DEMO)
approach to modelling Open Active Systems is used for this purpose. The essential
model of an IPT system, depicted using the Actor-Bank-Channel Diagramming
technique, shows which actions and/or communications are essential, i.e. require
the decision of a responsible human being. The idea is that the IPT management
system must not be directly involved in accomplishing such actions/communications.
Essential actions must be left to be performed by the relevant
organizations. The IPT management system may accomplish the informational
and/or documental actions.
Assuming such a management system supporting IPT
applications, there is a need for a modelling tool in order to describe the
underlying business process to the system. The model must be rich enough to be
able to represent all scenarios in the IPT systems. Furthermore, it should be
capable of supporting on-the-fly changes by providing enough flexibility. The
majority of the procedure-based models (e.g. DOMINO and AMIGO) as well as the
network approaches to modelling conversations derived from the speech-act
theory (e.g. TheCoordinator, CHAOS and ActionWorkflow) implicitly assume that
the underlying process is routine and predictable, and therefore, lack such
adaptability. A less-rigid model, introduced by the COSMOS project, which
supports situatedness by permitting the local management of interactions that
can take place in a globally structured activity, is recommended for IPT
systems. There is, however, a significant gap between the Structure Definition
Language (SDL) introduced by COSMOS and the essential model of the IPT system
depicted in the Actor-Bank-Channel Diagramming (ABCD) technique. Using a CASE tool,
the process of traversing from ABCD to SDL could be accomplished
semi-automatically. Helpful guidelines and rules required for preparing such a
tool are provided.
Finally, the implementation issues relevant to the proposed
solution for IPT systems are investigated. It is shown that the COSMOS approach
in handling the information of the exchanged messages is not adequate here. On
the other hand, commercially available active database management systems are
unable to manipulate SDL rules directly. Rewriting the SDL rules manually as
Event-Condition-Action (ECA) rules will cause certain problems in later
maintenance of the system, while a fully automatic translation between
these two is not viable. Although some guidelines for a semi-automatic
translation are provided, determining the required rules in this regard needs
more research. For maintaining the IPT document, various approaches, each
suited to certain circumstances, are discussed. It is suggested that
the application of a distributed database management system will solve the
problem in general.
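For readers unfamiliar with the target form, the sketch below shows what hand-written Event-Condition-Action rules over IPT document events might look like, together with a tiny dispatcher. The event name, document fields and actions are hypothetical; they are not translated from any SDL specification in the thesis, and in an active DBMS such rules would typically be declared as triggers rather than Python objects.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EcaRule:
    event: str                         # ON this event ...
    condition: Callable[[dict], bool]  # ... IF this holds for the document ...
    action: Callable[[dict], None]     # ... DO this.

def dispatch(event, doc, rules):
    """Fire every rule registered for the event whose condition holds; return the count."""
    fired = 0
    for rule in rules:
        if rule.event == event and rule.condition(doc):
            rule.action(doc)
            fired += 1
    return fired

# Hypothetical rules for an IPT document trail.
rules = [
    EcaRule("message_received",
            lambda d: d["type"] == "invoice" and not d.get("order_ref"),
            lambda d: d.setdefault("flags", []).append("missing order reference")),
    EcaRule("message_received",
            lambda d: d["type"] == "invoice",
            lambda d: d.setdefault("trail", []).append("invoice logged")),
]

doc = {"type": "invoice"}
fired = dispatch("message_received", doc, rules)
print(fired, doc)
```

Each rule here is written by hand; keeping many such rules consistent with a changing process model is the maintenance problem noted above.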
Voice Navigation of Information Spaces
- Proposed project within EDST CRC
- Outcome of DSTC D.1 workshop, December 1998
Continuous speech recognition systems are getting better
and better. Why not investigate applications for this technology? People tend
to dismiss speech recognition since it isn't easy to see how present-day
computer applications need it. However, the tasks performed by present-day
computer applications are those suited to keyboard interfaces. Information
tasks where speech input is vital have probably been completely overlooked.
The proposal is not especially to develop better speech
processing technology. The technology is already good enough to do some things
and is improving all the time. The proposal is to think of uses for the
technology and build prototypes using existing technology perhaps with some
plausible cheating. The aim is to identify information processing tasks where
speech is critical, ideally to identify killer applications.
Research like this needs to be carried out in some sort of
quasi-real environment, and should address real needs in a real industry. I
suggest that health is a good candidate, partly because health is already being
considered as a significant application area for DSTC II.
If television programs like ER and Peak Practice are any
guide, there should be many applications for speech. Medical diagnosis and
treatment involve interaction of possibly several practitioners with a patient.
This interaction is carried out in an information space, part of which is in
the heads of the practitioners and patient but the overwhelming majority of it
is external to them. The space includes the patient's current record of
treatment (the chart in a hospital), the patient's medical history (possibly
held by a practice but also possibly held on a smart card), the standard
references, other practitioners (more senior in ER, or with longer acquaintance
with the patient in Peak Practice), specialists, pathology laboratories,
pharmacopoeias and the medical literature.
It is possible to organise the external information space
so that it can monitor the practitioners' interaction with the patient in a way
that augments the practitioners' experience and capabilities, without taking
control. The practitioner could be 'wired' as in police shows so that the
environment could monitor the interaction via speech recognition. A key requirement
for such interaction is a standard vocabulary, but suitable vocabularies already
largely exist (Read codes, SNOMED), so that the practitioner would be able to
speak to the environment with directed messages, distinguishable from exchanges
with the patient.
Most interaction is pretty routine. The environment could
signal to the practitioner that the interaction was normal by some sort of
background hum which could be altered as more unusual areas were entered. The
practitioner could interact with the environment directly, getting responses by
voice or displayed on a screen by a number of methods depending on whether the
patient were an active participant (GP surgery) or passive (casualty ward,
veterinary) and the number of people present.
Other people (specialists, senior practitioners) would be
on call and could be consulted through the environment. This kind of feature
would be especially valuable for practices in rural or remote areas.
Finally, some of the information in the space is of
relevance to the patient. A doctor's surgery has a wall plastered with
anatomical diagrams, baby development stage charts, etc., which are used by the
doctor to assist the patient to understand their situation.
There are many research and advanced application issues
which the existing DSTC or its close relatives are competent to address. These
include:
- Instrumentation and input-output devices appropriate to
particular contexts. Communication infrastructure both within the
practitioner's area and external.
- The language of interaction, particularly from the
practitioner's point of view. The practitioner's messages to the
environment would constitute a continuously modified query on the
information space.
- Structure of the information space and query processing / update
procedures which continually bring the information relevant to the
interaction to the surface as the interaction evolves.
- Active agents in the environment which monitor the interaction
for specific purposes. For example, a diagnosis agent could track the
practitioner, generating alternative diagnoses and treatments, and make
suggestions if discrepancies arise. Another example would monitor the
interaction from the point of view of particular specialists. The agent
could have ready questions to be used for differential diagnosis or to
determine whether the specialist should be contacted.
- This would be a natural vehicle for telemedicine. The
environment could record and monitor the interaction in such a way that
its knowledge bases could be updated and improved.
