
 Scamseek: Automatic Identification of Financial Scams on the Internet

Speaker: Jon Patrick (University of Sydney)

When: 10:00, Friday, 13 May 2005

Venue: 78-420

The principal objective of the Scamseek project was to build an industrially viable system that retrieves candidate scam texts from the Internet and classifies them by their potential risk of containing an illegal investment proposal or illegal investment advice. The system was commissioned by the Australian Securities and Investments Commission (ASIC), and its value lies in significant time and efficiency savings for human analysts. The project was developed in two stages over 15 months. It produced multiple classifiers for three different types of data, achieved higher than expected performance statistics on classification, and was completed on time and under budget. Developing the system required solving two major problems in document classification: accurate identification of classes with very small footprints, most of them less than 1% of the corpus, and classification by intended meaning rather than by word strings. The approach used Systemic Functional Grammar to model the semantics of the scam classes, and used unigrams with significant language pre-processing to help separate out irrelevant documents. ASIC can operate the system on a 24/7 basis, and has initiated litigation based on classifications made by the very first production run of the system. The saving in human analyst effort in ASIC's monitoring role is estimated to be of the order of 100-fold. The savings to the community from speedier detection of, and intervention in, scams cannot be readily estimated but are likely to be of the order of tens of millions of dollars per annum. A number of software engineering issues in the efficient generation of language processing systems for a commercial context will also be addressed.

The Scamseek project is the largest computational linguistics research project conducted in Australia, with a total budget of $2.2M. It was commissioned by the Australian Securities and Investments Commission (ASIC) and funded through the University of Sydney, Macquarie University, ASIC, Capital Markets CRC and AC3.

Speaker's biography:

Jon Patrick holds the Chair of Language Technology at the University of Sydney. He has worked on the computation of language since the early 1980s, when he built the first systems to capture verbal descriptions of team sports in real time. This work was later expanded into a generic system for any behavioural events. He has subsequently tackled wider problems of capturing meta-descriptions of behavioural events, especially in the field of psychotherapy, including commentary by experts on therapist training sessions. He is a registered psychologist and has practised as a therapist. More recently he has studied the Basque language and produced the first comprehensive student reference text of Basque grammar in English. He has researched the automated use of static resources, such as dictionaries and grammar descriptions, in training systems for second language learning. He now concentrates on developing methods for using Systemic Functional Grammar to analyse the meanings in texts.

 

Hospitality: Phil Cook

Contact: Phil Cook (SSE seminar co-ordinator) (philc@itee.uq.edu.au)

SSE seminar web page: http://www.itee.uq.edu.au/~sse/Seminars.html