IJCAI 2011 Tutorial: Parallel Data Mining on Multicores

Short presentation

The tutorial Parallel Data Mining on Multicores aims to give an overview of what parallelism can bring to data mining algorithms, with a strong focus on frequent pattern mining.

After an introduction by a specialist in parallelism, the tutorial will give a general presentation of parallel data mining algorithms, with a review of existing work and of today's challenges. The tutorial will end with a specialized talk that goes deeper into parallel pattern mining.

Description of the tutorial

Data mining consists of extracting "valid, novel, potentially useful and ultimately understandable" patterns from data (Fayyad, 96). It relies on applying complex and time-consuming algorithms to the data in order to extract patterns of interest. Nowadays, the volume of data to handle is huge, and the patterns to extract are increasingly complex. As a result, data mining solutions scale poorly to real-world data.

Since 2005, physical limits have prevented further increases in processor clock frequency, ending the performance gains that came with them. However, the number of transistors on a die continues to grow, which has led to a new generation of processors with multiple computation cores. Exploiting these processors fully requires writing parallel programs, which can be difficult, especially with regard to memory management.

Data mining researchers, and especially specialists in frequent pattern mining, are always in need of more computing power and have started investigating new algorithms dedicated to multi-core processors (Buehrer 06, Lucchese 07, Tatikonda 09). Their first results suggest that multi-core processors can indeed be exploited to drastically improve the performance of data mining algorithms. This comes at the price of changing the way one thinks about the design of data mining algorithms, as many "rules of thumb" from the sequential era lead to catastrophic parallel performance.

Outline of the tutorial

The tutorial will be organized in three parts:

  1. General introduction to multi-core parallelism
  2. Overview of parallel data mining
  3. Algorithmic solutions for efficient parallel pattern mining

The introduction to parallelism will be delivered by Marc Snir, a specialist in parallel computing who comes from outside the data mining and machine learning community. The parallel data mining overview will be presented by the tutorial organizers, while the detailed presentation of efficient solutions for parallel pattern mining will be given by one of the most recognized researchers in the field, Shirish Tatikonda, author of a seminal paper in the field published at VLDB 2009.

Target audience

The tutorial aims to bring together people who are interested in discovering how parallelism can help them mine their data. The target audience will thus consist mainly of data mining researchers and practitioners. People who want to learn about parallelism and parallel algorithms should also find this tutorial useful.

The prerequisites are a basic knowledge of data mining algorithms, especially frequent pattern mining algorithms. Prior knowledge of parallelism is not required.

Interest to the IJCAI audience

Researchers with an interest in data mining, either as a research topic or as a tool, constitute a significant portion of the IJCAI audience. These researchers will be directly interested in this tutorial.

More broadly, this tutorial will discuss the parallelization of complex and often irregular algorithms. Such algorithms appear in many domains of AI, so many researchers interested in discovering the challenges of parallelizing complex algorithms should also find this tutorial appealing.

Presenters

Marc Snir

Professor Marc Snir is the Michael Faiman and Saburo Muroga Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign and has a courtesy appointment in the Graduate School of Library and Information Science. He currently pursues research in parallel computing. He is Associate Director for Extreme Scale Computing at NCSA, co-PI for the petascale Blue Waters system and co-director of the Intel- and Microsoft-funded Universal Parallel Computing Research Center (UPCRC).

He was head of the Computer Science Department from 2001 to 2007. Until 2001 he was a senior manager at the IBM T. J. Watson Research Center where he led the Scalable Parallel Systems research group that was responsible for major contributions to the IBM SP scalable parallel system and to the IBM Blue Gene system.

Shirish Tatikonda

Shirish Tatikonda is one of the most recognized young researchers in parallel data mining, especially in the specific domain of semi-structured data and tree mining. He has published in the major conferences in this domain (e.g. ICDE, VLDB, SIGIR, CIKM). Shirish Tatikonda, who recently joined IBM Almaden, also has solid teaching experience, for example with a course on "Introduction to High-Performance Computing".

Anne Laurent

Anne Laurent has been Assistant Professor at the LIRMM lab since September 2003. As a member of the TATOO group, she works on data mining, sequential pattern mining and tree mining, for both trend and exception detection. She is particularly interested in studying how fuzzy logic can provide more valuable results while remaining scalable.

Alexandre Termier

Alexandre Termier's research area is the efficient mining of closed frequent patterns. He has produced several algorithms to mine semi-structured data in order to find tree or DAG (Directed Acyclic Graph) patterns. To reduce mining time, he is currently working on parallel pattern mining algorithms, especially through the work of his PhD student Benjamin Négrevergne.

Contact

Alexandre Termier
first name [dot] last name [at] imag.fr
LIG (Laboratoire d'Informatique de Grenoble)
Université Joseph Fourier
681 rue de la Passerelle
B.P. 72, 38402 Saint Martin d'Hères
FRANCE
Phone: +33 4 76 82 72 07
Fax: +33 4 76 82 72 87
http://membres-liglab.imag.fr/termier/

Last modified:

Date: 2011-05-11 17:09:25 CEST
