Workshop
in Conjunction with DASFAA 2008
Keynote Speaker
Title: The Bellman data quality browser.
Abstract: Data quality is a serious concern
in complex industrial-scale databases, which often have thousands of tables
and tens of thousands of columns.
Commonly encountered problems include duplicates and default values in
columns treated as keys, data inconsistencies, and poor quality join
paths. Compounding the data quality
problems are incomplete and out-of-date metadata about the database and the
processes used to populate the database. These problems make the task of
analyzing data particularly challenging.
The Bellman data quality browser has been built to effectively address
such problems. Bellman profiles the
database and computes concise statistical summaries of the contents of the
database to identify approximate keys, frequent values of a field (often
default values), joinable fields, and to understand database dynamics
(changes in a database over time). In this talk, I'll describe the technology
underlying Bellman and how it is used to help make sense of complex
databases.

Accepted Papers
Assessing Data Quality within Available Context
CHARIOT: A Comprehensive Data Integration and Quality Assurance Model for Agro-Meteorological Data
An Approach to Cadastral Map Quality Evaluation in Republic of Latvia
Data Quality for Decision support – The Indian Banking Scenario

Invited Papers
Genomic Information Quality
DeepDetect: An Extensible System for Detecting Attribute Outliers & Duplicates in XML

Call for Papers
Poor data quality is known to compromise the credibility and efficiency of commercial as well as public endeavours. Several developments from industry as well as academia have contributed significantly towards addressing the problem. Contributors typically include analysts and practitioners, who have contributed to the design of strategies and methodologies for data governance; solution architects, including software vendors, who have contributed appropriate system architectures that promote data integration; and data experts, who have addressed data quality problems such as duplicate detection, identification of outliers, consistency checking and many more through the use of computational techniques. The attainment of true data quality lies at the convergence of the three aspects, namely organizational, architectural and computational.

At the same time, the importance of managing data quality has increased manifold in today's global information sharing environments, as the diversity of sources, formats and volume of data grows. In this workshop we target data quality in the light of collaborative information systems, where data creation and ownership are increasingly difficult to establish. Collaborative settings are evident in enterprise systems, where partner/customer data may pollute enterprise databases, raising the need for data source attribution, as well as in scientific applications, where data lineage across long-running collaborative scientific processes needs to be established. Collaborative settings thus warrant a pipeline of data quality methods and techniques that commences with (source) data assessment and proceeds through data cleansing, methods for sustained quality, integration and linkage, and eventually the ability for audit and attribution.
The workshop will provide a forum to bring together diverse researchers and make a consolidated contribution to new and extended methods to address the challenges of data quality in collaborative settings. Topics covered by the workshop include at least the following:
Data linkage and fusion

Submitted papers will be evaluated on the basis of significance, originality, technical quality, and exposition. Papers should clearly establish the research contribution and its relation to previous research. Position and survey papers are also welcome.

Workshop Format
The full-day workshop will consist of oral presentations, discussions, and invited talks. The workshop will also provide an opportunity for demo sessions, where presenters can showcase advanced prototypes based on their research where applicable.

Authors should submit papers reporting original work that is not currently under review or published elsewhere. Papers should be submitted as an email attachment to shazia@itee.uq.edu.au in PDF format, with a maximum length of fifteen (15) pages, following Springer-Verlag's LNCS manuscript submission guidelines, available at http://www.springer.de/comp/lncs/authors.html.

Workshop Proceedings
The workshop proceedings are to be published as part of Springer-Verlag's Lecture Notes in Computer Science series. After the workshop, authors of selected papers may be invited to submit an extended version of their paper to an international journal.

November 23, 2007: Submission of Papers

Nick Koudas, University of Toronto

Workshop Organizers
Shazia Sadiq, Xiaofang Zhou

