
 


Abstracts of publications - Bob Colomb


 


Merging Ontologies Requires Interlocking Institutional Worlds

Robert M. Colomb

Mohammad Nazir Ahmad

Abstract

Merging of ontologies is a frequently addressed problem in the ontology literature. This paper argues that in general two even very similar ontologies cannot be merged. Further, where two ontologies can be merged their conceptualizations are special. They are systems of institutional facts which are interlocking. The argument is based on the literature of the federated database problem and on the concepts of speech act and institutional fact.

To appear in Applied Ontology in 2007

paper


Formal versus Material Ontologies for Information Systems Interoperation in the Semantic Web

Robert M. Colomb

Information systems ontology is intended to facilitate interoperability among the many applications which are now becoming available on the Internet. In particular, it is intended to facilitate the development of intelligent agents which can automate a large part of the task of a user achieving some end employing multiple autonomous applications. A large number of ontologies exist supporting specific kinds of interoperation among selected, generally mutually aware, applications. The intent of the upper ontology movement is to develop an abstract description of what there is in the world, in an application-independent form, which can be used both to help build specific ontologies and to help in finding common ground among them. This paper argues that, for the purposes of information systems interoperation and the semantic web, application-independent upper ontologies are unlikely to be successful because of semantic heterogeneity. However, the paper argues for a distinction in upper ontologies between formal and material ontologies, based on analogies with concepts in Kant’s synthetic a priori, and that formal ontologies whose focus is on how we see the world are more likely to be successfully developed in the absence of applications than are material ontologies, which attempt to catalog the world a priori.

The Computer Journal vol. 49, no. 1 pp. 4-19.

paper

 


Issues in Mapping Metamodels in the Ontology Development Metamodel

Robert M. Colomb, Anna Gerber and Michael Lawley

The Ontology Development Metamodel consists of a number of separate metamodels linked by mappings. There are a number of structural heterogeneities among the metamodels which make mapping not straightforward. This paper shows how the mapping technology Query/View/Transform (QVT) can be used to perform the mappings. Some of the problems are addressed, showing how they can be resolved using QVT.

Presented at: 1st International Workshop on the Model-Driven Semantic Web (MDSW 2004) Monterey, California, USA. 20-24 September, 2004.

paper

 


Operationalising Epistemic Modality

Peter Bruza and Robert M. Colomb

In order for intelligent computer agents to interact in a knowledge network, they need to have some idea what they know and don’t know. This anthropomorphic concept must be operationalised so that we can specify the computer programs. This PowerPoint presentation is a step in that direction.

Presented at DSTC Symposium Brisbane, Australia 16-17 August, 2004.

paper


Information Systems Technology Grounded on Institutional Facts

Robert M. Colomb

Abstract

This paper presents a theory explaining the success of information systems development based on SQL-type database technology by showing that the assumptions underlying that technology correspond very closely to the way Searle’s institutional facts are created. The theory presented is a theory of action and design, so its productivity is shown by retrodiction of the necessity for business process engineering to achieve integration of information systems within an organisation, and prediction that interorganisational integration of information systems using the internet can succeed only if the applications share institutional facts. The theory finally is used to predict that autonomous intelligent agent applications can succeed in the information spaces populated by these common institutional facts.

Presented at Information Systems Foundations: Constructing and Criticising 16-17 July 2004, The Australian National University, Canberra, Australia

Paper

Where does the ontology bottom out?

Robert M. Colomb

Abstract

Ontologies are collections of terms with a more or less complex structure, most often used to support interorganisational interoperability. The question is, what are the primitives? I contend that the primitives must be grounded in more-or-less explicit agreements among the parties: the contracts, the standard business practices as formulated in EDI, and agreed primitive terms for the product catalogs, prices, etc., all supported by a legal and audit environment. The structural relationships are grounded in more general agreements which take the form of mathematical systems which are standardised in the educational systems of the developed world.

Position paper for panel discussion "Foundations of Information Systems" Australasian Conference on Information Systems 2000 Brisbane, Australia 6-8 December 2000

Paper

 


Why do people pay for information?

Robert M. Colomb

Abstract

This paper investigates the situations where there are incentives for people to pay a premium over the channel costs for information content. It concludes that there are at least four: where the premium is low relative to channel costs, and where there is a monopoly, both of which are less interesting as they are not specific to information; where the information need is idiosyncratic; and where the quality of the information source is critical.

Prometheus (2001) Vol. 19, No. 1, pp. 45-53.

Paper

 


A Digital Library Needs Many Indexes

Robert M. Colomb

Abstract

The main argument of this paper is that one should expect to have not one, but many indexes for a large, heterogeneous digital library. Philosophical considerations, supported by ethnological studies of information-seeking behaviour, lead one to doubt that a single index would work. Therefore, the undeniable success of single-index physical libraries requires explanation - they work because they are limited in scope, and the reasons why they work are not satisfied by Internet-scale digital libraries. We look at how people find things in the real world, and notice that one information structure they employ is the specialised magazine, of which there are tens of thousands. We look at how the specialised magazine supports practical reasoning, which leads to the expectation that there should be a large number of indexes on the Web in what we call 'behaviour space', as distinct from the 'resource space' in which the recognised indexes live. The paper concludes with suggestions as to what multiple indexes in both behaviour space and resource space would look like and how they would be interrelated.

for comment.

Paper


Category-Theoretic Fibration as an Abstraction Mechanism in Information Systems

Robert M. Colomb

(With C.N.G. Dampney and Michael Johnson, School of Mathematics, Physics, Computing and Electronics, Macquarie University)

Abstract

This paper examines the problem of establishing a formal relationship of abstraction and refinement between abstract enterprise models and the concrete information systems which implement them. It introduces and justifies a number of reasonableness requirements, which turn out to justify the use of category theoretic concepts, particularly fibrations, to precisely specify a semantics for enterprise models which enables them to be considered as abstractions of the conceptual models from which the implementing information systems are built. The category-theoretic concepts are developed towards the problem of testing whether a system satisfies the fibration axioms, and are applied to case studies to demonstrate their practicability.

Acta Informatica to appear. paper (PDF)


Representation of Propositional Expert Systems as Partial Functions

Robert M. Colomb

Abstract

Propositional expert systems classify cases, and can be built in several different forms, including production rules, decision tables and decision trees. These forms are inter-translatable, but the translations are much larger than the originals, often unmanageably large. In this paper a method of controlling the size problem is demonstrated, based on induced partial functional dependencies, which makes the translations practical in a principled way. The set of dependencies can also be used to filter cases to be classified, eliminating spurious cases, and cases for which the classification is likely to be of doubtful validity.

Artificial Intelligence 109 pp. 187-209.

Paper

 


Paper presented at International Conference on Formal Ontology in Information Systems (FOIS'98) Trento, Italy, 6-8 June, 1998. In N. Guarino (ed.) Formal Ontology in Information Systems IOS-Press (Amsterdam) 207-217.

Completeness and Quality of an Ontology for an Information System

Robert M. Colomb

(With Ron Weber, Department of Commerce, The University of Queensland)

Abstract

We examine the problems of completeness and quality in the design of information systems. Taking the view that an information system is a representation of a social reality created by genres of speech acts, we view the state of an information system as a text, and the dynamics of the system as essentially the dynamics of a text editor. This view enables us to make use of a generalised ontology developed by Bunge to get a clear picture of the functions of an information system, and therefore a set of criteria for ontological completeness. Further, quality in an information system is seen as a matching between the semiotics of the system and the semiotics of the organisation in which the system is embedded, allowing us to make use of the quality principles advocated by Debenham. The value of these results is essentially that they validate the large body of existing information systems, and also validate the basic approach used to construct them, although suggesting some improvements. We can build and use information systems confident that they will be valid under changes in the understanding of meaning and also changes in the understanding of the metaphysics underlying physical and social reality.

Paper

 


The Computer Journal 40(5) 1997 pp. 235-244.

Impact of Semantic Heterogeneity on Federating Databases

Robert M. Colomb

Abstract

The difficult problems in the design of systems which facilitate interoperation and mediation among information sources and their consumers arise from the presence of semantic heterogeneity among the schemas and ontologies supporting the different services. The purpose of this paper is to develop a taxonomy of semantic heterogeneity, and to describe, taking the perspective of text databases, the conditions under which autonomy-respecting interoperation of different kinds is likely to be feasible. The main conclusion is that interoperation can be based on structured database technology only if the participating organisations communicate among themselves; otherwise the considerations underlying text databases dominate the technology used.

Paper


International Journal of Intelligent Systems (1995) Vol. 10, No. 3, pp 295-328.

Strategies for Building Propositional Expert Systems

Robert M. Colomb

(With Charles Y.C. Chung, CSIRO Division of Information Technology)

Abstract

The core of this paper is a proof that stratified Horn clause propositional systems are equivalent to and can be efficiently transformed into decision tables by a process closely related to assumption-based truth maintenance. The transformed systems execute much faster and in a bounded time, leading to the possibility of executing real-time expert systems in microseconds on fine-grained parallel computers. One consequence is to simplify the consistency and completeness analysis for such systems, in particular the problem of ambiguity. A deeper consequence is that it makes sense to view these systems as stochastic processes. This, and an analysis of the problem of maintenance of these systems, leads to the conclusion that by and large rule induction approaches are better than rule construction approaches for building them.


Expert Systems With Applications (1992) Vol 5, No 2/3 pp 411-419.

Computational Stability of Expert Systems

Robert M. Colomb

Abstract

It has been shown that propositional expert systems are equivalent to decision tables, and therefore equivalent to classification systems. In many cases, the elementary facts for the classification may not be accurately known. Even if they are, frequently the expert system reasons on the basis of qualitative descriptors of quantitative measurements, which may be subject to borderline effects. This paper considers the computational stability of the classification in the presence of errors in the data, using concepts derived from error-correcting codes, in particular Hamming distance. It suggests a number of methods of analysis of the decision table to identify potential instabilities, and suggests methods of correcting or avoiding these problems.
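
The abstract does not give the analysis methods themselves, so the following is only a minimal sketch of one plausible check in the same spirit: it looks for pairs of decision-table rows that lie within a small Hamming distance of each other but receive different classifications, since an error in a single attribute could then flip the result. The table contents, the attribute names and the function names `hamming` and `unstable_pairs` are hypothetical illustrations, not taken from the paper.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def unstable_pairs(table, radius=1):
    """Return pairs of rows within `radius` of each other that are
    classified differently (candidate instabilities)."""
    pairs = []
    for (row_a, cls_a), (row_b, cls_b) in combinations(sorted(table.items()), 2):
        if cls_a != cls_b and hamming(row_a, row_b) <= radius:
            pairs.append((row_a, row_b))
    return pairs


# Hypothetical decision table: (temperature, pressure, flow) -> classification
TABLE = {
    ("high", "low", "ok"): "alarm",
    ("high", "high", "ok"): "normal",
    ("low", "high", "slow"): "normal",
}

if __name__ == "__main__":
    for row_a, row_b in unstable_pairs(TABLE):
        print(row_a, "vs", row_b)
```

A pair reported here means that a borderline or mis-measured value in one attribute is enough to change the classification; widening `radius` reports progressively less fragile pairs.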


Australian Computer Journal Vol. 25, No. 1 (1993) pp. 7-13.

Use of a Personal Workstation to Access Open Network Services

Robert M. Colomb

Work performed while author was visiting the School of Computing Sciences, University of Technology, Sydney.

Abstract

Imagine that people have powerful, flexible workstations which can adapt to their work habits, and that they use an open distributed computing environment for information and computing resources. The user should have a uniform and seamless view of this computing environment. In addition, the user has a large investment in wordprocessors, spreadsheets and other personal productivity tools. It becomes natural then to argue that the network applications should interact with the user employing the user's personal productivity tools. We need to develop an abstract view of the capabilities of a User Interface Management System (UIMS), which is at a much higher level than graphics interface standards like X11, and also standards for the communication of documentation from an application to a UIMS. There must be standards for sharing models of data structures and definitions of the semantics of their various components. This paper sketches some requirements for the solution of these problems, based partly on data base technology and the design of persistent programming languages and access procedures for persistent object stores.

paper

 


Informal talk presented at DSTC Symposium 11-12 July, 1996

A Power User in Cyberspace: A Database Perspective

Abstract

To a power user, cyberspace has three parts: their own computing environment, the organizational computing environments in which the user participates because of their organizational relationships, and the rest of the world on the Net. This paper considers the database-oriented facilities required for this view of cyberspace to work smoothly, the present state of technology and some research issues.

paper

 


Position paper for Information Systems Foundations: Practice and Ontology Workshop Macquarie University, Sydney Australia, 29 September, 1999.

Information Systems Founded on Practice

Robert M. Colomb

Abstract

We worry about foundations out of a fear that our practice might collapse. However, the success of a system must validate the method of building it. Practice can be profitably problematised, though, since it can be discussed, must be taught, and can be improved. The theoretical concepts used to do this do not refer to underlying reality, but are metaphors, limited interpretations of a complex whole. The foundation of practice is practice itself. We can discuss, teach and improve practice by metaphor, but the metaphor is never definitive. We can get as much certainty as it is possible to have, but we can never be certain we understand what we are doing, even though what we do might be effective.

paper

 


Thesis Abstracts


An Architecture for Ubiquitous Mobile Service Delivery

Paul O'Brien

Awarded August 2006

Highly mobile people (HMPs) require flexible, reactive service delivery due to their regularly changing location and activities and the lack of a wired network connection. A mobile service delivery system should be able to detect relevant events that occur such as change of location, availability of new last-minute specials, sales opportunities and safety issues and then reactively take action in response to these events. This work describes a situation management ontology based framework for delivering such a system. Issues addressed include HMP and service states and events, context, situations and situation-action rules, and syntactically and semantically compatible XML ontologies for their specification.

A generic situation management ontology is developed in OWL using the ontology development tool, Protégé. This ontology is combined with domain specific classes in the travel domain to create a travel situation management ontology that can be used as the basis for a ubiquitous mobile travel service application. Using a typical independent traveller scenario, the travel situation management ontology is instantiated to demonstrate its effectiveness. The flexibility of the generic situation management ontology is demonstrated by creating an academic situation management ontology by simply replacing a small number of domain specific classes.

A framework is also proposed that is based on the situation management ontology, distributed, co-operating software agents, and context based filtering, and is suitable for mobile service delivery. The example framework uses the situation management ontologies developed in this work and action rules to link situation specification to situation detection and action.

The ontologies and action rules are semantically consistent and are specified in the XML based, industry standard language, OWL, thus drawing together previous independent work in a number of diverse disciplines.


Structuring and Visualising Risk Management

Wei Seng Alan Ho

Awarded July 2006

The dictionary defines risk as the potential harm that may arise from some present process or from some future event, while vulnerability is the state of being vulnerable or exposed. Risk management helps to boost security by analysing current vulnerabilities in the organization and assessing their likelihood in relation to the materialisation of a risk. In this project's context, a relationship between risk and vulnerability can be defined as a particular vulnerability that is contributing to the materialisation of a risk. However, such relationships between risks and vulnerabilities are often complex and pose a challenge for human understanding. It is necessary to provide visualisation so that the relationships between risks and vulnerabilities, and the vulnerabilities that are contributing to the materialisation of a risk, can be seen easily.

In this project, the development of a causal network was proposed to visualise the relationships between risks and vulnerabilities. By using deduction to reason about the likelihood of the risk under conditions of the presence or absence of vulnerabilities, the proposed causal network can help to structure the vulnerabilities into categories according to the risks that they are contributing to. Then, with visualisation included to present the categorisation, it gives the user a structured way of seeing a top-level view of risks that are high, medium or low in severity as well as a drill-down view of individual vulnerabilities that are contributing to a risk.

In addition, a belief calculus called Subjective Logic (SL) was introduced to aid risk experts in expressing their opinions about vulnerabilities and risks in a more realistic way, enabling them to differentiate between their gut feeling and past experiences. Instead of representing opinion in a one-dimensional (1D) scalar format, SL is adapted to represent conditional and joint probability calculations, as well as combining two joint probabilities, in a three-dimensional (3D) format (belief, disbelief, and uncertainty). This provides a richer input for risk assessment because SL is suitable for situations where there is more or less uncertainty about whether a given proposition is true or false. The visualisation strategy is also adapted to exploit the richer risk assessment so that it provides the user with a richer risk picture that enables them to make value-added risk assessment and mitigation strategies.

This project argues that the causal network together with SL can help organizations allocate valuable resources to derive mitigation strategies to resolve risks.
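
The three-dimensional opinion format mentioned above can be illustrated with a minimal sketch. It assumes the usual Subjective Logic convention that belief, disbelief and uncertainty sum to one, and that an opinion is collapsed to a single probability by adding the uncertainty weighted by a base rate. The class name `Opinion`, the default base rate of 0.5 and the example figures are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """A binomial opinion as (belief, disbelief, uncertainty)."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumption: prior used when evidence is absent

    def __post_init__(self):
        parts = (self.belief, self.disbelief, self.uncertainty, self.base_rate)
        if any(not 0.0 <= p <= 1.0 for p in parts):
            raise ValueError("all components must lie in [0, 1]")
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def expected_probability(self):
        """Collapse the opinion to a scalar: belief + base_rate * uncertainty."""
        return self.belief + self.base_rate * self.uncertainty


if __name__ == "__main__":
    # Hypothetical expert opinion that a vulnerability contributes to a risk.
    opinion = Opinion(belief=0.6, disbelief=0.1, uncertainty=0.3)
    print(opinion.expected_probability())
```

Two experts can arrive at the same scalar probability with very different amounts of uncertainty, which is the distinction a one-dimensional representation loses.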

 


Using Conceptual Structures and Ontologies to Support E-Commerce

Ahmed Kayed

Awarded January, 2003

Electronic Commerce (EC) is emerging as a major Web-supported application. EC supports many business transactions via a network. The Internet is an open environment, widely distributed, and relatively inexpensive. Business transactions usually run under closed environments. To conduct business on the Internet, many problems must be solved. Examples of these problems are: security, authentication, heterogeneity, interoperability, and ontological problems. It is the aim of this research to provide an infrastructure for business-to-business EC. To narrow the scope of this research the focus is a specific business process, the tendering process. To support EC applications (tendering in particular), ontologies were used to solve many problems in this domain. It has been argued that beyond software engineering and process engineering, ontological engineering is the third capability needed if successful e-commerce is to be realized. Conceptual Graphs (CGs) are used to implement these ontologies. CGs are a method of knowledge representation developed by Sowa based on Charles Peirce's Existential Graphs and semantic networks of artificial intelligence.

This research is directed to answer the question: How can explicit ontologies be obtained, constructed, used and implemented to support e-commerce (tendering in particular)? To answer this question in a practical way, three more specific questions are defined. They are: How can ontologies be built and used generally and in the tendering domain? How can CGs be used to implement these ontologies? What can ontology offer for tendering automation?

The research theme can be summarized as creating a new method for building and managing a tendering system and solving some problems in CGs to implement an ontology for the tendering domain. This thesis shows that ontologies and CGs could be used to facilitate and support e-commerce. An ontological-based tendering system will help in testing the feasibility of the ontological approach, which will contribute to building a new generation of business-to-business EC. The proposed solution deploys the mediator concept to build a shared ontology. The mediator will be responsible for maintaining different types of ontologies and performing different types of matching. This will facilitate the automation of many tendering activities such as tender forming, buyer and seller matching, bid evaluation and other activities. Four levels of abstraction are defined to build the ontologies. At some levels, two types of ontology have been established: one for concepts and the other for structures. Some CG tools have been used to build CG structures for tendering from existing Electronic Data Interchange (EDI) messages. Algorithms have been developed to extract signatures from CG-EDI templates, where a signature is a primitive CG in which a single relation links two or more concepts. Ontologies have been used to index the tendering data. Indexes have been built around signatures. An algorithm has been developed to index and retrieve tendering information using CGs and ontologies. Using CGs to implement ontologies has been formally analyzed using the Bunge-Wand-Weber (BWW) model.

Tendering is well addressed in many disciplines and many commercial systems have automated the process or a part of the process. The significant point in this research is using explicit ontologies and deploying the e-mediator concept for matchmaking in the tendering domain.

The existence of these ontologies means that some means to manage them is required. Many ontology-based systems build tools that help them in managing their ontologies, but there are no clear methodologies to build such a system. This thesis articulates specifications for an Ontology Management System (OMS) using CGs. The meaning is defined, the components are identified, and the methodology to build an OMS using CGs is outlined.

This thesis stands in between Information Systems (IS) (which covers a macro view or a descriptive view) and Computer Science (CS) (which covers a micro view or technical view). The reader whose background is information systems will find the first chapters of this thesis are the more business oriented and descriptive part. Readers whose concerns are computer science will find the more technical aspects in the later part. The thesis attempts to balance the IS and CS disciplines in clarifying how ontologies can be used to support e-commerce.


Development of a Practical System for Text Content Analysis and Mining

Andrew Edward Smith

Awarded November, 2002

This thesis describes the design, development, and field testing of a practical and efficient system for tagging, mapping and mining conceptual information from large text collections. The system was intended to emulate many of the techniques involved in Content Analysis: a conceptual overview of the data, trend discovery, and drill-down. The design constraints for this project were: simplicity, robustness, speed, usability, clarity, and good precision and recall. The challenge was to see if this could be achieved, and how well.

The general strategy chosen involved abstracting families of words to thesaurus concepts. These concepts were then used to classify text at a resolution of several sentences. The resulting concept tags were then indexed and mapped to provide a document exploration environment for the user.

To achieve this, several novel algorithms were developed, including a learning optimiser for automatically adapting a concept to the word usage within the text body, and a many-body clustering process for generating a cluster map of concepts based on the text data. Novel techniques for automatically selecting 'interesting' concepts and for detecting aliases were also developed.

Extensive testing was performed on real-world document collections, of interest to real clients where possible. The primary criterion for success was set at the outset to be the response of users to the system in real applications. Many real document sets have been mapped and much was learned from the process, and from the results. Client response has been favorable.


Multiple device web-based information system development:
A set of development guidelines

Marcin Metter

Awarded June, 2001

The introduction of the Wireless Application Protocol (WAP) in 1997 by the WAP Forum has provided highly mobile users with access to 'live' Internet-based information services in the 'palm of their hand'. Previously, access to mobile information services was limited to either 'off-line' systems, where the device was periodically refreshed, or by attaching the device to a wireless modem or mobile phone, which requires more than one device to be carried.

The aim of the study was to develop a set of design guidelines that allow access to complex information systems to be provided by the majority of web-enabled devices. It is clear from previous examinations that a single device is not capable of satisfying all of the users' requirements, due to varying device capabilities. Because of the large range of devices available, three general categories, or levels, were used, with each category linked to a specific type of information need.

Using this information it was found that, due to the differences in device limitations between the levels, the 'lowest common denominator' method would have to be used in order to provide a single interface for all device levels. The common denominator was found to be WML, although a slight modification of the document header and footer is required for use on higher level devices. With both WAP and HTML being converted into the single XHTML standard, however, this problem will be removed.

In order to examine the problems related to the presentation of complex information to the user with the limitations of WAP, the examination concentrated on the area of drug informatics. This decision was made due to the evident increase in demand for such information because of a greater emphasis being placed on 'evidence based medicine'.

Two applications were examined during the guideline development process: a web-based and a stand-alone application. The examination of the web-based application provided a list of the common set of features found within WML and HTML, with the key problems related to navigation, table structures, and frames being highlighted and discussed. The resulting guidelines were then tested on the stand-alone application, with the discovery that a major problem area was the presentation of table structures used for comparing information.

A number of possible solutions to the problems have been developed and are presented, with the guidelines focusing on the use of simple markup features such as text formatting, hyperlinks, and table structures. Also detailed is how to take advantage of the fact that HTML browsers ignore unknown markup tags: the same information can be optimally split into a set of cards by a WAP browser and presented as a single page by an HTML browser.


Generating Database Presentations of L-systems in Virtual Plant Applications

Phoebe Chen

Awarded December, 2000

One of the most important advantages of database systems is that the underlying mathematics is rich enough to specify very complex operations with a small number of statements in the database language. This research covers an aspect of biological informatics, that is the marriage of information technology and biology, involving the study of real world phenomena using virtual plants derived from L-systems simulation.

L-systems were introduced in 1968 by Aristid Lindenmayer as a mathematical model of multicellular organisms. Not much consideration has been given to the problem of persistent storage for these simulations. Current procedures for querying data generated by L-systems for scientific experiments, simulations and measurements are also inadequate. To address these problems the research in this thesis presents a generic process for data modelling tools (L-DBM) between L-systems and Database systems.

This thesis shows how L-system productions can be generically and automatically represented in a database schema and how a database can be populated from the L-system strings. This thesis further describes the idea of pre-computing recursive structures in the data into derived attributes using compiler generation. A method to allow a correspondence between biologist's terms and compiler generated terms in a biologist-friendly computing environment is supplied. This environment includes a visual query interface. The L-DBM is a generic procedure. Once the L-DBM is given a specific L-system's productions and declarations, it can generate the specific schema for both simple correspondence terminology and also complex recursive structure data attributes and relationships. The same correspondence applies to any L-system using the same vocabulary. Once established it can be used to support an entire research program. So the research contributes a generic solution for all kinds of L-systems.
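
For readers unfamiliar with L-systems, a minimal sketch of parallel string rewriting follows. It uses Lindenmayer's well-known algae system (A → AB, B → A) as a stand-in example, not one of the thesis's plant models. The `to_rows` helper, and the (generation, position, symbol) row layout it produces, are hypothetical illustrations of how a derived string might be flattened for loading into a table; they are not the L-DBM schema.

```python
def rewrite(axiom, productions, steps):
    """Apply the productions to every symbol in parallel, `steps` times.
    Symbols without a production are copied unchanged."""
    current = axiom
    for _ in range(steps):
        current = "".join(productions.get(symbol, symbol) for symbol in current)
    return current


def to_rows(string, generation):
    """Flatten a derived string into (generation, position, symbol) tuples.
    Hypothetical row layout, for illustration only."""
    return [(generation, position, symbol) for position, symbol in enumerate(string)]


# Lindenmayer's algae system, a standard textbook example.
ALGAE = {"A": "AB", "B": "A"}

if __name__ == "__main__":
    for generation in range(5):
        print(generation, rewrite("A", ALGAE, generation))
    print(to_rows(rewrite("A", ALGAE, 2), 2))
```

Storing one row per symbol makes each generation queryable with ordinary relational operations, at the cost of losing any nesting the string encodes unless that structure is precomputed into further attributes.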


Interfacing Essential Drug Informatics

Vincent Guerrini

Awarded November, 1999

Retrieval of electronic veterinary drug information remains traditional or human-based, inhibiting the retrieval of Precise Drug Information (PDI). This finding was unexpected since drug terms are unique and limited to about 800 terms. Electronic drug sites used forms or links extracting General Drug Information (GDI), suggesting that heuristics had not been used in the design. Electronic drug information was mainly (83%) text based and indexed in pharmacological, therapeutic or pharmacopeia terms. Survey results showed that at medical practices, surgery or emergencies, PDI rather than GDI was required. Replies by 63 veterinarians over the period August 1997 to November 1998 revealed that 82% preferred PDI rather than GDI. Drug information was mostly retrieved from textbooks (79%), computers (45%) and colleagues (33%), whereas only 6% was retrieved from libraries. On computers, 6% used local databases, 62% the internet and 27% programs. Most respondents (67%) expressed a need for PDI. The program may be viewed at http://www.uq.edu.au/~csvguerr/about.htm. Pharmacological and toxicological literature terminology, survey results and heuristic design suggested that PDI terms be restricted to "drug name", "form", "interactions", "preparations", "uses", "doses", "administration", "precautions", "adverse effects", "warnings" and "overdose". On the main interface, pre-interest levels were designated by "allabout", "justabout" and "moreabout", each linked to key attributes "Species" an ... The program was built to be compatible with popular and future systems using XML or SNOMED terminology. PDI-XML allowed explicit, meaningful, descriptive tags, self-defined data types and multimedia to be created on one interface. PDI-XML database linking specifications provided a method for extracting PDI from other drug text sites or databases. When used in combination with standardized nomenclature (SNOMED), XML provides more concise access to PDI.

This thesis provides a description of Precise Drug Informatics Database in HTML or XML in 3 search steps. The functional electronic version of this thesis may be viewed at http://www.uq.edu.au/~csvguerr/msthesis.htm. The search innovations included a pre-determined choice to reduce time and effort, limited search steps, avoidance of forms or queries, limiting the information, and using meaningful file identifiers. The interface was confined to functional links with combined terminology and minimal


Managing complex, open, web-deployable trade objects

Hung Wing

Awarded September, 1998

Abstract

Worldwide co-operation, coupled with increasing competition in every aspect of business, has forced companies to be more flexible and efficient than ever before. Consequently, companies need effective ways of mass-marketing their products and extending their operations to the open global markets, while still trying to minimise operating costs. This must be achieved without reduction in the quality expected in conventional business operations.

A demand, then, is created for a just-in-time, on-demand, team-based, networked, geographically dispersed, and automatic approach to business operations, aimed at promoting open trading in the immediate future. The following research provides a generic framework which will support the distribution, sharing and management of trade documents.

A few years ago, electronic trade transactions in manufacturing, purchasing and banking were only available to relatively small `closed' groups of traders who could afford the initial high start-up cost and elaborate negotiations. Now, due to the successful emergence of the new supporting technologies, computerised trade transactions can be extended to accommodate much wider business communities and applications. Ideally, we aim to have these transactions open and flexible enough to be considered useful to most Web end-users.

The emerging advanced technologies such as Internet, Distributed Object Computing, Component Software, Groupware, Middleware, Global Directory Services, Electronic Document Interchange (EDI), and Workflow have indicated that, when combined, these technologies will become the primary paradigm for capturing corporate information and will become the overall framework for managing non-record, computerised trade oriented information.

However, these new technologies do not come without inefficiencies and major shortcomings. Serious problems with ambiguous EDI messages, inflexible workflow, and the informality associated with the modelling of business processes and their contents are among the key obstacles for effective co-operation between two or more workspaces. Furthermore, the disparity of the above technologies has prevented deployment of and support for innovative and effective Web-deployable trade applications.

These factors emphasise the need for a generic, integrated, formal framework which can be used to support trade applications involving a large number of heterogeneous, autonomous, distributed trade objects. A Web deployable trade object, in this context, is a special kind of `compound' document which is composed of many different kinds of contents ranging from a simple spreadsheet cell with formula, to a framework containing computerised trade messages and other complex business information and infrastructure.

To facilitate the openness and flexibility of trades, an architecture based on the so-called Virtual Object Model (VOM), has been introduced. Based on the generic VOM, this architecture provides an integrated environment which supports the different trading services. These services include: 1) the Framework Manager, allowing users to create, view and edit traded documents; 2) the EDI Mapping Facility, allowing inter-organisation trade messages to be understood and effectively used within a heterogeneous trade environment; and finally 3) the Document-based Workflow Management System (DFMS), allowing trade documents to be systematically routed to the right interchanges at the appropriate times and in the right situations.

Provision of the above trading services under one integrated formal framework helps to mask out the complexity associated with the underlying supporting technologies. In addition, using Conceptual Graphs, a logic-based, formal language, and other well established formal theories such as speech act, formal concept analysis, and underlying event logic, to implement and model the trade objects, allows the complex collaborations and trade messages and processes to be specified, enforced, and reasoned about. We believe that simulation, verification and other proactive features of advanced trade applications can also be facilitated by using the declarative assertions associated with trade messages and processes.

In short, this thesis describes how we can effectively deploy and support the next generation of electronic commerce applications. We start out by examining what kind of trade contents and collaborations should be captured and subsequently, how these may be formalised. We then identify and overcome some relevant interoperability problems associated with complex global trading. In particular, we provide useful models and algorithms to remove some serious EDI and workflow limitations. By doing this, we hope to supply a formal construct which will support Web deployable trading concerned with the specification, distribution, and management of non-record, trade oriented documents representing corporate information.


Database Discovery in an Organizational Environment

Andrew Goodchild

Awarded September, 1998

Abstract

In a large organisation with potentially hundreds of online databases, finding and making sense of an unfamiliar database is a daunting task. Existing approaches to database discovery fail to deal with this problem effectively. Either the approach does not scale well, as in the multidatabase approach, or the approach cannot effectively catalogue systems like databases that contain few useful terms, as in the general resource discovery approach. Furthermore, a common problem with both these approaches is that they are narrowly focussed on the technical problems of discovery and leave usability as an after-thought.

This thesis treats resource discovery fundamentally as a problem that is embedded in a user's work activities and considers scalability as a secondary, yet intrinsically related problem. We shall use lessons learned from the library science community in identifying a framework for building effective and usable discovery tools and from this framework we will design a database discovery system that is less susceptible to the problems of existing systems.

The database discovery tool described in this thesis is based upon enterprise models. Many organisations already find that enterprise models are a valuable aid in activities like developing new information services or business planning. In this thesis we speculate that another useful application of enterprise models could be resource discovery. This thesis explores the idea that by relating detailed database schemas to a coarser grained enterprise model, users can query the enterprise model to uncover data buried within organisational databases. In examining the use of enterprise models in database discovery we explore various ways of formalising and implementing the system and consider some of the computational issues in dealing with the scale of database discovery in the organisational environment. From this foundation we have extended the system to handle the notion of relevance ranking, different levels of abstraction in conceptual models and keyword searching.
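The idea of querying a coarse enterprise model to locate detailed databases, with a simple form of relevance ranking, might be sketched as follows. The concepts, keywords, database names and the scoring rule are all hypothetical and chosen only for illustration; the thesis's own formalisation is not reproduced here.

```python
from collections import Counter

# Hypothetical enterprise model: coarse-grained concept -> keywords describing it.
ENTERPRISE_MODEL = {
    "Customer": {"customer", "client", "account"},
    "Order": {"order", "purchase", "invoice"},
    "Product": {"product", "item", "stock"},
}

# Hypothetical links from each concept to the (database, table) pairs that hold its data.
SCHEMA_LINKS = {
    "Customer": [("crm", "clients"), ("billing", "accounts")],
    "Order": [("billing", "invoices"), ("sales", "orders")],
    "Product": [("warehouse", "stock_items")],
}

def discover(query):
    """Rank databases by how many query keywords match concepts linked to them."""
    words = set(query.lower().split())
    scores = Counter()
    for concept, keywords in ENTERPRISE_MODEL.items():
        hits = len(words & keywords)
        if hits:
            for database, _table in SCHEMA_LINKS[concept]:
                scores[database] += hits
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    print(discover("customer invoice"))
```

Because the user searches the small enterprise model rather than every schema, the keyword vocabulary stays manageable even when individual databases contain few useful terms.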


Distributed Querying with Z+SQL over the Internet

Sonya M. Finnigan

Awarded July, 1999

Abstract

The ANSI/NISO Z39.50 Standard defines a protocol to facilitate the interconnection of computer systems for the search and retrieval of information in databases. This thesis presents Z+SQL, the adaptation of this protocol to the SQL domain. Z+SQL unites the advantages of the SQL query language with the interoperable information retrieval services of Z39.50. Coupled with the existing Z39.50 profiles, Z+SQL facilitates both dynamic and interoperable SQL querying and retrieval, making distributed querying with SQL across domain-specific communities a reality.

This thesis briefly explains the importance of having interoperable information retrieval networks. It promotes a standards-based approach, and briefly describes the underlying principles behind the Z39.50 protocol. It then describes in detail the proposed SQL extension, Z+SQL, giving specific examples of how it could be implemented within the museum community under the CIMI Z39.50 profile. In conclusion, the paper outlines the current status of Z+SQL, future extensions to the proposal and the planned release of SQL enabled Z39.50 products.

Thesis Structure:

  • Chapter 1 gives an overview of the thesis.
  • Chapter 2 outlines the current trends in information retrieval promoting a standards-based approach. It looks at what the ideal information retrieval standard would look like and then compares this to both existing and emerging international and industry standards.
  • Chapter 3 presents the central idea of the dissertation, distributing an SQL query using Z+SQL. Presented first is an analysis of the problem of making an SQL database available on an open network environment. It then compounds this problem with the requirement of common semantics in order to broadcast an SQL query across that environment. The final part of Chapter 3 is a description of several applications which motivate such a requirement.
  • Chapter 4 describes, in layman's terms, the underlying principles behind the Z39.50 standard - what it is, how it works, how it fulfils the need for interoperable information retrieval and provides many of the facilities envisaged for distributing an SQL query.
  • Chapter 5 outlines what Z+SQL is, in particular, formalising the Z+SQL architectural design with relation to the existing Z39.50 model. Examples of benefits of Z+SQL to both the existing Z39.50 community and the SQL community are then discussed.
  • Chapter 6 describes in detail Z+SQL as an extension to the ANSI/NISO Z39.50 Version 3 - 1995: its definition and restrictions.
  • Chapter 7 outlines the current status of the Z+SQL proposal both within the standard process and in ongoing commercial software development.
  • Chapter 8 concludes by summarising the advantages of Z+SQL as an essential tool for distributing SQL queries over the Internet.

A Theory for Multi-function Model-Based Reasoning through Context Management

Nirad Sharma

Awarded January, 1998

Formal descriptions of domain models in a representation language typically embed task-specific assumptions, hindering their reuse for problem-solvers other than those for which they were originally captured. To state models in complete generality requires statement of every possible qualification for the concepts of the model, a clearly infeasible task. Further, the exchange of knowledge bases requires an interlingua to be highly expressive if it is to accommodate exchange of knowledge for a wide array of tasks and situations.

A trade-off arises between explicating qualifications of concepts in a theory to improve the generality and the cost of reasoning with a regress of qualifications. The need to reify and structure sets of qualifications motivates the formalisation of contexts. A further consequence of a rich interlingua is that mechanisations cannot be derived directly from its specifications, due to the intractability associated with the highly expressive form. A more efficacious approach to knowledge sharing seems to be the design of representation languages in which models can be stated for sharing between a few tasks, that have clean unifying semantics, and from which direct mechanisations are achievable.

The role of contexts as formalised objects in knowledge representation languages is investigated including the relationships between various classical, modal and meta-theoretic treatments. The introduction of contexts into a representation scheme facilitates reasoning with theories and their languages captured relative to different perspectives and levels of detail within a uniform formal system. A specific investigation has been undertaken into the effects on context-naive formulations of subsumption lattices in the framework of an order-sorted first order logic extended with multiple sort partial orders and [an essentially multi-modal] context mechanism.

Concentrating on sharing theories between the configuration and diagnosis tasks, a novel application for a cardinality-minimising variant of McCarthy's circumscription schema has been observed for providing a common semantic foundation for the two tasks, particularly when viewed as variants on finite model generation. While a convenient characterisation of the two tasks and an interesting application of non-monotonic representations, model generation from circumscribed first order classical theories is not computationally feasible using the most general form. CLP(FD) is shown to provide an effective mechanisation of the model generation tasks for configuration and diagnosis for an interesting class of constraint-based specifications for devices and components, demonstrating an interesting application of constraint logic programming techniques as well as being an exercise in designing a knowledge representation language.
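To give a feel for diagnosis viewed as finite model generation with a minimal number of abnormal components, here is a small brute-force sketch. The two-inverter device, the wire names and the exhaustive search are assumptions made for this example; the search stands in for the CLP(FD) mechanisation and is not the thesis's circumscriptive formalisation.

```python
from itertools import combinations

# Hypothetical device: in -> n1 (inverter) -> mid -> n2 (inverter) -> out.
COMPONENTS = ["n1", "n2"]

def consistent(abnormal, observed):
    """True if some value of the unobserved wire `mid` satisfies the
    constraint of every component that is assumed to behave normally."""
    for mid in (0, 1):
        n1_ok = "n1" in abnormal or mid == 1 - observed["in"]
        n2_ok = "n2" in abnormal or observed["out"] == 1 - mid
        if n1_ok and n2_ok:
            return True
    return False

def minimal_diagnoses(observed):
    """Return every smallest set of abnormal components consistent with the observation."""
    for size in range(len(COMPONENTS) + 1):
        found = [
            set(candidate)
            for candidate in combinations(COMPONENTS, size)
            if consistent(set(candidate), observed)
        ]
        if found:
            return found   # stop at the first (smallest) cardinality that works
    return []

if __name__ == "__main__":
    # Two inverters in series should reproduce the input, so in=1, out=0 is a fault.
    print(minimal_diagnoses({"in": 1, "out": 0}))
```

Trying candidate sets in order of increasing size is what makes the result cardinality-minimal: the empty set is accepted when the observation is normal, and larger sets are only considered when no smaller one is consistent.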

Compositional modelling of model fragments from a library facilitates automatic fragment selection and the construction of simplest adequate models of the devices as different applications require. An intermediate modelling language and model selection algorithm are presented for construction of the simplest adequate constraint system for the diagnosis and configuration tasks from a shared model fragment library, integrating our work on contexts for multiple perspective libraries and circumscriptive semantic characterisations.


Non-Monotonic Reasoning and End-User Conceptual Modelling

Anthony Berglas

Awarded March, 1997

An important trend in the design of information systems is the production of more generalized designs which can be configured by end users without the need for professional programmers. Techniques include verticalizing tables, generalizing subtypes and providing complex parameter tables. However, these designs are difficult to implement, computationally expensive, and often provide poor user interfaces.

The growth of personal computers has seen the emergence of many non-computer professionals who can and do build small information systems using tools such as spreadsheets and simple database products. This thesis argues that while these "power users" do not have the years of experience that is required to build complex systems, they could learn how to make simple changes to existing ones such as adding a new attribute to an entity. This enables simpler, more specialized designs to be built because power users can configure the conceptual model itself rather than just update configuration tables.

This thesis describes the technology that is required to make this feasible. In particular:

  • Fine grained conceptual authorization mechanisms are developed that can prevent users from corrupting an application's fundamental integrity;
  • An elegant method of expressing active business rules based on KL-ONE style defined types is provided; and
  • CASE technology is developed that enables changes to a schema to be automatically reflected in regenerated applications.

Existing CASE tools can significantly automate the production of information systems because much of the processing required to implement them can be inferred from their conceptual models. However, a waterfall approach is inevitably used in which default intermediate representations of programs are generated and then these default representations are modified to provide the specific functionality that is required. These modifications need to be manually reapplied if the generators are rerun to reflect changes in the conceptual model. Power users are expected to understand basic modelling techniques but not to be competent 4GL programmers, so this problem needs to be addressed if end user computing is to be extended to conceptual modelling.

To avoid the waterfall, conceptual models can be annotated so that application programs can be generated directly from the schema without the need for the intermediate representation. Numerous annotations are required to specify an application but they can usually be given a default value which may be based on the value of other annotations. This results in complex networks through which default values need to be determined and multiple inheritance conflicts resolved.

The thesis addresses these conflicts using a new First/Only inheritance scheme that is based on the Touretzky path-based logics. The First/Only scheme differs from other path-based logics in that the relationship between the annotations whose values are being defaulted is not specialization, so different operators are used from the conventional IsA/Is-Not-A ones used by Touretzky. A formal definition is presented together with an outline of how it can be used to develop non-waterfall CASE tools which can in turn be manipulated by power end users.


Foundations of massively parallel relational and deductive databases

Ok Hyeong Cho

Awarded August, 1996

Over the decades, rapid advances in semiconductor technology have made it possible to build massively parallel computers containing tens of thousands of processors. The technology is likely to continue to progress further for some time to come. In addition, optical three dimensional storage and optical interconnections, another rapidly evolving technology, open new opportunities due to inherent massive parallelism and non-interference of light beams. However, the approaches used in current parallel database research cannot take advantage of the massive parallelism which can be provided by these technologies, due to the speedup limitation.

In this thesis, we present a computational framework for relational and deductive database systems which takes advantage of the emerging opportunities for massive parallelism and discuss the validity and feasibility of the framework. The approach we take is based on associative computing and fine-grained parallelism. Associative computing provides massively parallel computation and content-addressed search, while data parallelism performs symbolic computation by means of data shuffling very efficiently. The most important aspect of the framework is that it does not suffer from the speedup limitation. Rather it exploits massively parallel processors allowing unlimited speedup.

The framework consists of the Associative Random Access Machine (ARAM), a data representation scheme, and a set of algorithms for extended relational algebra and Datalog query evaluation. The ARAM is an architectural model for massively parallel processors generalized from associative computers and massively parallel SIMD machines. The data representation scheme, which is drawn from the SITDAC model, is based on tabular representation of relational information. The framework assumes the useful facilities and functions of associative computers, massively parallel SIMD machines and database machines. The focus of this thesis is on algorithms for extended relational algebra (ERA) and their application to deductive databases. Algorithms for extended relational algebra are presented in two paradigms: associative and data parallel. Associative algorithms are based on the principle of stepwise refinement and require O(1) or O(n) parallel steps. Data parallel algorithms are constructed from the primitives of the scan-vector parallel programming model and other data-parallel operations. They can be performed in O(log n) or O(log² n). In the algorithms, set-oriented processing is employed to achieve massive parallelism. For deductive databases, a differential evaluation scheme for Datalog queries which is based on magic set transformation is presented. The scheme is designed to maximize the expressibility of Datalog programs while maintaining set-oriented computability. In addition, it can incorporate various optimization techniques developed by the deductive database community.
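Differential, set-oriented evaluation of a Datalog query can be illustrated with the standard semi-naive technique, shown below for transitive closure. This is a sequential sketch of the general idea only: it does not include the magic set transformation and it is not the parallel scheme developed in the thesis.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Each round joins only the facts derived in the previous round (`delta`)
    with `edge`, instead of re-joining the whole `path` relation.
    """
    edges = set(edges)
    path = set(edges)
    delta = set(edges)
    while delta:
        # Set-oriented join of the newly derived facts with the edge relation.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path   # keep only genuinely new facts
        path |= delta
    return path

if __name__ == "__main__":
    print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

The loop terminates when a round derives nothing new, which also handles cyclic data, since facts already in `path` are removed from `delta`.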


EDI-based interoperation of information systems

Mohsen Rohani

Awarded September, 1996

Traditional modelling of information systems has mostly focussed on analysing data flows and transactions. However, dramatic improvements in the cost and capabilities of information technology extend computer use beyond transaction processing into communication and coordination. Although the penetration of information systems into internal business processes has been facilitated by the technical quality and capability of computers and communication devices, the objectives in this arena have not been achieved by virtue of the information technology alone. At the same time, these advancements in distributed computing and networking have provided the technical basis for building information systems across organizational boundaries. One approach to constructing such systems has been to employ the relevant in-house information systems and to have them interoperate by means of exchanging (structured) messages. Electronic Data Interchange (EDI) is a special case of this approach. The most complicated form of EDI, called Incremental Paper Trail (IPT) systems, is responsible for pursuing the underlying business processes as well as maintaining the information in the relevant messages. Making a transition to transaction-oriented approaches, although possibly helpful in certain circumstances, cannot solve the problem in general. It may also either leave an in-house information system in an inconsistent state or semantically crash a global long-duration transaction.

Once the need for a management system in the IPT approach to EDI applications is recognized, the next step is to investigate the boundaries of the responsibilities such a manager should take. That is, the borderline between the IPT management system and the in-house information systems involved should be determined. The Dynamic Essential Modelling of Organizations (DEMO) approach to modelling Open Active Systems is used in this regard. The essential model of an IPT system demonstrated by the Actor-Bank-Channel-Diagramming technique shows which actions and/or communications are essential, i.e. require the decision of a responsible human being. The idea is that the IPT management system must not be directly involved in accomplishing such actions/communications. Essential actions must be left to be performed by the relevant organizations. The IPT management system may accomplish the informational and/or documental actions.

Assuming such a management system supporting IPT applications, there is a need for a modelling tool in order to describe the underlying business process to the system. The model must be rich enough to be able to represent all scenarios in the IPT systems. Furthermore, it should be capable of supporting on-the-fly changes by providing enough flexibility. The majority of the procedure-based models (e.g. DOMINO and AMIGO) as well as the network approaches to modelling conversations derived from the speech-act theory (e.g. TheCoordinator, CHAOS and ActionWorkflow) implicitly assume that the underlying process is routine and predictable, and therefore, lack such adaptability. A less-rigid model, introduced by the COSMOS project, which supports situatedness by permitting the local management of interactions that can take place in a globally structured activity, is recommended for IPT systems. There is, however, a significant gap between the Structure Definition Language (SDL) introduced by COSMOS and the essential model of the IPT system depicted in Actor-Bank-Channel Diagramming (ABCD) technique. Using a CASE tool the process of traversing from ABCD to SDL could be accomplished semi-automatically. Helpful guidelines and rules required for preparing such a tool are provided.

Finally, the implementation issues relevant to the proposed solution for IPT systems are investigated. It is shown that the COSMOS approach in handling the information of the exchanged messages is not adequate here. On the other hand, commercially available active database management systems are unable to directly manipulate SDL rules. Rewriting the SDL rules as Event-Condition-Action (ECA) rules manually will cause certain problems in later maintenance of the system, while a fully automatic translation between the two is not viable. Although some guidelines for a semi-automatic translation are provided, finding out the required rules in this regard needs more research. For maintaining the IPT document, various approaches, each of which is suitable for certain circumstances, are discussed. It is suggested that the application of a distributed database management system will solve the problem in general.
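For readers unfamiliar with Event-Condition-Action rules, the following minimal sketch shows their general shape: when an event is signalled, each rule registered for that event checks its condition and, if it holds, runs its action. The class names, the event names and the message fields are hypothetical; this is neither SDL nor the rule language of any particular active database system.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    event: str                          # name of the triggering event
    condition: Callable[[dict], bool]   # guard evaluated on the message
    action: Callable[[dict], None]      # effect carried out if the guard holds

class RuleEngine:
    """A toy ECA dispatcher (illustrative only)."""

    def __init__(self):
        self.rules = []

    def register(self, rule):
        self.rules.append(rule)

    def signal(self, event, message):
        """Fire every matching rule and return how many rules fired."""
        fired = 0
        for rule in self.rules:
            if rule.event == event and rule.condition(message):
                rule.action(message)
                fired += 1
        return fired

if __name__ == "__main__":
    needs_approval = []
    engine = RuleEngine()
    # Hypothetical rule: large invoices are queued for a human decision.
    engine.register(Rule("invoice_received",
                         lambda m: m["amount"] > 1000,
                         needs_approval.append))
    engine.signal("invoice_received", {"amount": 5000})
    print(needs_approval)
```

In this example the action only queues the message, leaving the decision itself to a person, in keeping with the distinction between essential and informational actions drawn earlier.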


Voice Navigation of Information Spaces

  • Proposed project within EDST CRC
  • Outcome of DSTC D.1 workshop, December 1998

Continuous speech recognition systems are getting better and better. Why not investigate applications for this technology? People tend to dismiss speech recognition since it isn't easy to see how present-day computer applications need it. However, the tasks performed by present-day computer applications are those suited to keyboard interfaces. Information tasks where speech input is vital have probably been completely overlooked.

The proposal is not especially to develop better speech processing technology. The technology is already good enough to do some things and is improving all the time. The proposal is to think of uses for the technology and build prototypes using existing technology perhaps with some plausible cheating. The aim is to identify information processing tasks where speech is critical, ideally to identify killer applications.

Research like this needs to be carried out in some sort of quasi-real environment, and should address real needs in a real industry. I suggest that health is a good candidate, partly because health is already being considered as a significant application area for DSTC II.

If television programs like ER and Peak Practice are any guide, there should be many applications for speech. Medical diagnosis and treatment involve interaction of possibly several practitioners with a patient. This interaction is carried out in an information space, part of which is in the heads of the practitioners and patient but the overwhelming majority of it is external to them. The space includes the patient's current record of treatment (the chart in a hospital), the patient's medical history (possibly held by a practice but also possibly held on a smart card), the standard references, other practitioners (more senior in ER) (with longer acquaintance with the patient in Peak Practice), specialists, pathology laboratories, pharmacopoeias and the medical literature.

It is possible to organise the external information space so that it can monitor the practitioners' interaction with the patient in a way that augments the practitioners' experience and capabilities, without taking control. The practitioner could be 'wired' as in police shows so that the environment could monitor the interaction via speech recognition. A key aspect of the possibility of interaction is a standard vocabulary, but these already largely exist (Read codes, SNOMED), so that the practitioner would be able to speak to the environment with directed messages, distinguishable from exchanges with the patient.

Most interaction is pretty routine. The environment could signal to the practitioner that the interaction was normal by some sort of background hum which could be altered as more unusual areas were entered. The practitioner could interact with the environment directly, getting responses by voice or displayed on a screen by a number of methods depending on whether the patient were an active participant (GP surgery) or passive (casualty ward, veterinary) and the number of people present.

Other people (specialists, senior practitioners) would be on call and could be consulted through the environment. This kind of feature would be especially valuable for practices in rural or remote areas.

Finally, some of the information in the space is of relevance to the patient. A doctor's surgery has a wall plastered with anatomical diagrams, baby development stage charts, etc which are used by the doctor to assist the patient to understand their situation.

There are many research and advanced application issues which the existing DSTC or its close relatives are competent to address. These include:

  1. Instrumentation and input-output devices appropriate to particular contexts. Communication infrastructure both within the practitioner's area and external.
  2. The language of interaction, particularly from the practitioner's point of view. The practitioner's messages to the environment would constitute a continuously modified query on the information space.
  3. Structure of the information space and query processing / update procedures which continually bring the information relevant to the interaction to the surface as the interaction evolves.
  4. Active agents in the environment which monitor the interaction for specific purposes. For example, a diagnosis agent which tracks the practitioner, generating alternative diagnoses and treatments which can make suggestions if discrepancies arise. Another example would monitor the interaction from the point of view of particular specialists. The agent could have ready questions to be used for differential diagnosis or to determine whether the specialist should be contacted.
  5. This would be a natural vehicle for telemedicine. The environment could record and monitor the interaction in such a way that its knowledge bases could be updated and improved.