
 DASFAA2008 Workshop on Data Quality in Collaborative Information Systems


 


Managing Data Quality in
Collaborative Information Systems

Workshop in Conjunction with DASFAA 2008
13th International Conference on Database Systems for Advanced Applications
19-22 March 2008, New Delhi, India

 



Keynote Speaker
Divesh Srivastava, AT&T Labs-Research

Title: The Bellman data quality browser.

Abstract: Data quality is a serious concern in complex industrial-scale databases, which often have thousands of tables and tens of thousands of columns.  Commonly encountered problems include duplicates and default values in columns treated as keys, data inconsistencies, and poor quality join paths.  Compounding the data quality problems are incomplete and out-of-date metadata about the database and the processes used to populate the database. These problems make the task of analyzing data particularly challenging.  The Bellman data quality browser has been built to effectively address such problems.  Bellman profiles the database and computes concise statistical summaries of the contents of the database to identify approximate keys, frequent values of a field (often default values), joinable fields, and to understand database dynamics (changes in a database over time). In this talk, I'll describe the technology underlying Bellman and how it is used to help make sense of complex databases.

 

Accepted Papers

Assessing Data Quality within Available Context
Jingyu Han, Dawei Jiang, and Zhiming Ding

CHARIOT: A Comprehensive Data Integration and Quality Assurance Model for Agro-Meteorological Data
Mark Anthony F. Mateo and Carson Kai-Sang Leung

An Approach to Cadastral Map Quality Evaluation in Republic of Latvia
Anita Jansone

Data Quality for Decision support – The Indian Banking Scenario
Hemalatha Diwakar, Alka Vaidya

Invited Papers

Genomic Information Quality
Qing Liu, Xuemin Lin

DeepDetect: An Extensible System for Detecting Attribute Outliers & Duplicates in XML
Qiangfeng Peter Lau, Wynne Hsu, Judice L. Y. Koh, and Mong Li Lee

Call for Papers

Poor data quality is known to compromise the credibility and efficiency of commercial as well as public endeavours. Several developments from industry as well as academia have contributed significantly towards addressing the problem. Contributors typically include analysts and practitioners who have designed strategies and methodologies for data governance; solution architects, including software vendors, who have developed system architectures that promote data integration; and data experts who have applied computational techniques to data quality problems such as duplicate detection, identification of outliers, consistency checking and many more. The attainment of true data quality lies at the convergence of these three aspects, namely organizational, architectural and computational.

At the same time, the importance of managing data quality has increased manifold in today's global information sharing environments, as the diversity of sources and formats and the volume of data grow. In this workshop we target data quality in the light of collaborative information systems, where data creation and ownership are increasingly difficult to establish. Collaborative settings are evident in enterprise systems, where partner/customer data may pollute enterprise databases, raising the need for data source attribution, as well as in scientific applications, where data lineage across long-running collaborative scientific processes needs to be established.

Collaborative settings thus warrant a pipeline of data quality methods and techniques that commences with (source) data assessment and continues through data cleansing, methods for sustained quality, integration and linkage, and eventually the ability to audit and attribute data.

The workshop will provide a forum to bring together diverse researchers and make a consolidated contribution to new and extended methods to address the challenges of data quality in collaborative settings. Topics covered by the workshop include at least the following:

Data linkage and fusion
Entity resolution, duplicate detection, and consistency checking
Data profiling and preparation
Use of data mining for data quality assessment
Methods for data transformation, reconciliation, consolidation
Data lineage and provenance
Models, frameworks, methodologies and metrics for data quality
Application specific data quality, case studies, experience reports

Submitted papers will be evaluated on the basis of significance, originality, technical quality, and exposition. Papers should clearly establish the research contribution and its relation to previous research. Position and survey papers are also welcome.

Workshop Format

The full-day workshop will consist of oral presentations, discussions, and invited talks. The workshop will also provide an opportunity for demo sessions, where presenters can showcase advanced prototypes based on their research where applicable.

Submission of Papers

Authors should submit papers reporting original work that is not currently under review or published elsewhere. Papers should be submitted as an email attachment to shazia@itee.uq.edu.au in PDF format, with a maximum length of fifteen (15) pages, following Springer-Verlag's LNCS manuscript submission guidelines, available at http://www.springer.de/comp/lncs/authors.html.

Workshop Proceedings

The workshop proceedings are to be published as part of Springer-Verlag's Lecture Notes in Computer Science series. After the workshop, authors of selected papers may be invited to submit an extended version of their paper to an international journal.

Important Dates

November 23, 2007            Submission of Papers
December 12, 2007            Notification to Authors
December 21, 2007            All Camera-Ready Copies Due
March 19, 2008               Workshop

Program Committee

Nick Koudas, University of Toronto
Markus Helfert, Dublin City University
Floris Geerts, University of Edinburgh
Xuemin Lin, University of New South Wales
Rosanne Price, The University of Melbourne
Wasim Sadiq, SAP Research
Graeme Shanks, University of Melbourne
Diane Strong, Worcester Polytechnic Institute
Kerry Taylor, CSIRO ICT Centre
Harry Zhu, Old Dominion University

Workshop Organizers

Shazia Sadiq, Xiaofang Zhou
School of Information Technology and Electrical Engineering
The University of Queensland, Brisbane, Australia

Inquiries: shazia@itee.uq.edu.au