This initiative addresses the problem of large-scale, cross-institutional data resource access, sharing and fusion. The core part of this initiative is a pilot study to identify requirements, system architectures, and key technology barriers to establishing an ICT infrastructure to support large-scale data resource sharing between research institutions through a case study in environmental science.
Environmental scientists provide environmental monitoring and research, assessment, modelling and information services for a wide range of areas such as coastal zones, biodiversity, and air and water quality, to support decision making by the government and the community. There exists a large collection of data, comprehensive in terms of geographical and temporal coverage and thematic layers, and a rich body of sophisticated environmental models. However, there are several well-recognised difficulties in realising the full potential of the large investment in collecting environmental data and developing models. Two prominent problems are:
Any benefits from developing and adopting various standards, and from advances in computer hardware, are overwhelmed by the amount and diversity of data and the complexity of models, both of which increase at a much faster pace. This is particularly relevant in emerging advanced applications where multiple data sources and models need to be integrated in a dynamic fashion.
The Queensland Environmental Protection Agency (the EPA) is a partner in this project, providing historical spatial data, real-time spatial data from air and water quality monitoring sensors and other data capturing devices, and simulation data from predictive modelling services, all of which need to be shared among environmental scientists, other scientific communities and the general public. The case study focuses on an application within the EPA called WildNet. WildNet is a database that documents scientific information for the State's animals, including rare and threatened species. It maintains a large store of ecological data which depends heavily on other services and is itself a service to other applications. The primary current means of entering data into WildNet is via distributed human observers, but there are compelling needs to support input from mobile phone technology and radio tracking of animals. Complex spatial analysis is performed to generate geospatial services, as well as complex simulation to generate predictive distributions of species over space and time. These latter simulations require expert elicitation from ecologists to validate and adjust predictions. Visualisation is an important component of the elicitation task, which involves spatial modellers, ecologists and community wildlife groups. Predictive simulations are operated in a workshop environment with researchers distributed at different locations.
We aim to establish a collaborative way to develop and manage metadata, and to support semi-automatic creation of metadata, so that legacy datasets can eventually be calibrated using adequate metadata through a community effort. While users of this system may not be aware of the Semantic Web and Semantic Grid, the system we develop will comply with the relevant standards to ensure our practical solution is compatible with the mainstream effort in this area.
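One way to picture semi-automatic metadata creation is a routine that derives whatever it can from the records themselves and leaves the remaining entries for people to complete. The sketch below is illustrative only: the record layout (`lat`, `lon`, `date`), the function name `draft_metadata` and the metadata keys are assumptions made for this example, not part of WildNet or of any Semantic Web standard.

```python
def draft_metadata(records):
    """Derive a draft metadata record from a non-empty list of observations.

    Each record is assumed (hypothetically) to be a dict with 'lat', 'lon'
    and an ISO 8601 'date' string. Entries that cannot be inferred are set
    to None so that a human curator can fill them in later.
    """
    lats = [r["lat"] for r in records]
    lons = [r["lon"] for r in records]
    dates = [r["date"] for r in records]  # ISO dates sort correctly as text
    return {
        "fields": sorted(records[0].keys()),
        "record_count": len(records),
        # Bounding box as (min_lat, min_lon, max_lat, max_lon)
        "bbox": (min(lats), min(lons), max(lats), max(lons)),
        "temporal_coverage": (min(dates), max(dates)),
        # Left for the community to supply
        "theme": None,
        "custodian": None,
    }


# Made-up records, for illustration only
sample = [
    {"lat": -27.5, "lon": 153.0, "date": "2004-03-01"},
    {"lat": -16.9, "lon": 145.8, "date": "2005-11-20"},
]
draft = draft_metadata(sample)
```

The automatically derived entries give collaborators a starting point, while the `None` entries mark where community input is still needed.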
Many existing tools which support eResearch are concerned with coarse-grained data sharing. Fine-grained data sharing is essential when exchanging a large dataset is too costly or the data custodian is unwilling to share large parts of the data. We build on the Storage Resource Broker (SRB) to enrich its metadata catalogue (MCAT) and employ distributed database technologies to identify a balanced data granularity and efficient strategies for fine-grained data access.
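The difference between coarse-grained and fine-grained sharing can be illustrated with a small query interface: instead of shipping a whole dataset, the custodian answers a request for only the rows and columns that are needed. The sketch below uses Python's built-in `sqlite3` module as a stand-in; it does not use the SRB or MCAT interfaces, and the table `sightings`, its columns, its rows and the function `fetch_subset` are all hypothetical.

```python
import sqlite3


def build_demo_store():
    """Create an in-memory table with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sightings (species TEXT, lat REAL, lon REAL, observer TEXT)"
    )
    conn.executemany(
        "INSERT INTO sightings VALUES (?, ?, ?, ?)",
        [
            ("koala", -27.5, 153.0, "obs-1"),
            ("koala", -16.9, 145.8, "obs-2"),
            ("cassowary", -17.3, 145.9, "obs-3"),
        ],
    )
    return conn


def fetch_subset(conn, species, lat_min, lat_max):
    """Return only (lat, lon) pairs for one species within a latitude band.

    The observer column is never returned, so the custodian can keep part
    of the data private while still answering the request.
    """
    cur = conn.execute(
        "SELECT lat, lon FROM sightings "
        "WHERE species = ? AND lat BETWEEN ? AND ? ORDER BY lat",
        (species, lat_min, lat_max),
    )
    return cur.fetchall()


conn = build_demo_store()
subset = fetch_subset(conn, "koala", -28.0, -27.0)
```

Here the requester receives a single coordinate pair rather than the full table, which is the kind of trade-off between granularity and transfer cost that the paragraph above describes.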
This research involves investigating coarse-grained parallel processing, by high-level decomposition, with the goal of parallelising existing complex models without re-implementing them.
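As a minimal sketch of this idea, the input domain can be split into chunks, an unmodified model run on each chunk in parallel, and the partial results stitched back together. The code below assumes that the model treats each cell independently, so that chunk boundaries do not affect the result; models with spatial coupling would need overlap or boundary exchange. The function `legacy_model` is a hypothetical stand-in for an existing model, and `decompose` and `run_decomposed` are names invented for this example. A thread pool is used for simplicity, which suits models invoked as external programs; a process pool or cluster scheduler may be more appropriate for CPU-bound models.

```python
from concurrent.futures import ThreadPoolExecutor


def legacy_model(cells):
    """Hypothetical stand-in for an existing model, treated as a black box."""
    return [round(0.5 * c + 1.0, 3) for c in cells]


def decompose(cells, n_parts):
    """Split the input into at most n_parts contiguous chunks."""
    size = max(1, -(-len(cells) // n_parts))  # ceiling division, at least 1
    return [cells[i:i + size] for i in range(0, len(cells), size)]


def run_decomposed(model, cells, n_parts=4):
    """Run the unmodified model on each chunk in parallel and merge results."""
    chunks = decompose(cells, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # Executor.map returns results in the same order as the chunks
        partials = list(pool.map(model, chunks))
    return [value for part in partials for value in part]


domain = list(range(10))
parallel_result = run_decomposed(legacy_model, domain, n_parts=3)
serial_result = legacy_model(domain)
```

Because the model is only called, never modified, the same wrapper can be applied to other models that satisfy the independence assumption.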