Publications from 2019 to 2022

  • Identifying Privacy Risks Raised by Utility Queries. Hira Asghar, Christophe Bobineau and Marie-Christine Rousset. Proceedings of WISE 2022 (23rd International Conference on Web Information Systems Engineering). October 2022, Biarritz, France.

    Personal data are increasingly disseminated over the Web through mobile devices and smart environments, and are exploited for developing more and more sophisticated services and applications. All these advances come with serious risks of privacy breaches that may reveal private information that data producers want to keep undisclosed. It is therefore of utmost importance to help them identify privacy risks raised by requests of service providers for utility purposes. In this paper, we first formalize privacy risks by privacy queries expressed (and kept secret) by each data producer to specify the data they do not want to be disclosed. Then, we develop a formal approach for detecting incompatibility between privacy and utility queries expressed as temporal aggregate conjunctive queries. The distinguishing point of our approach is to be data-independent and to come with an explanation based on the query expressions only. This explanation is intended to help data producers understand the detected privacy breaches and guide their choice of the appropriate technique to correct them.
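
    A minimal illustration of the data-independent flavour of such a check (the paper's actual formalization uses temporal aggregate conjunctive queries; the reduction below of a query to an attribute, an aggregate and a window is invented for the example):

        # A minimal, illustrative sketch (not the paper's actual formalization):
        # a query is reduced to the attribute it exposes, an aggregate operator,
        # and a temporal window in minutes. A utility query is flagged as risky
        # when it exposes a secret attribute at a granularity at least as fine
        # as the one the privacy query forbids.
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class AggQuery:
            attribute: str       # e.g. "power_consumption"
            aggregate: str       # e.g. "AVG", "SUM"
            window_minutes: int  # temporal aggregation window

        def incompatible(privacy: AggQuery, utility: AggQuery) -> bool:
            """Data-independent check on the query expressions only."""
            return (utility.attribute == privacy.attribute
                    and utility.window_minutes <= privacy.window_minutes)

        secret = AggQuery("power_consumption", "AVG", window_minutes=60)
        request = AggQuery("power_consumption", "AVG", window_minutes=15)
        if incompatible(secret, request):
            print(f"risk: {request} refines the secret {secret}")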

  • PARTICUL: Part Identification with Confidence measure using Unsupervised Learning. Romain Xu-Darme, Georges Quénot, Zakaria Chihani and Marie-Christine Rousset. Proceedings of the 2nd Workshop on Explainable and Ethical AI, associated with ICPR 2022 (26th International Conference on Pattern Recognition). August 2022, Montréal.

    In this paper, we present PARTICUL, a novel algorithm for unsupervised learning of part detectors from datasets used in fine-grained recognition. It exploits the macro-similarities of all images in the training set in order to mine for recurring patterns in the feature space of a pre-trained convolutional neural network. We propose new objective functions enforcing the locality and unicity of the detected parts. Additionally, we embed our detectors with a confidence measure based on correlation scores, allowing the system to estimate the visibility of each part. We apply our method on two public fine-grained datasets (Caltech-UCSD Birds 200 and Stanford Cars) and show that our detectors can consistently highlight parts of the object while providing a good measure of the confidence in their prediction. We also demonstrate that these detectors can be directly used to build part-based fine-grained classifiers that provide a good compromise between the transparency of prototype-based approaches and the performance of non-interpretable methods.

  • CONSTRUCT Queries Performance on a Spark-Based Big RDF Triplestore. Adam Sanchez-Ayte, Fabrice Jouanot and Marie-Christine Rousset. Proceedings of ESWC 2022 (The Semantic Web - 19th International Conference). June 2022, Crete, Greece.

    Despite their potential, CONSTRUCT queries have gained little traction so far among data practitioners, vendors and researchers. In this paper, we first exhibit performance bottlenecks of existing triplestores for supporting CONSTRUCT queries over large knowledge graphs. Then, we describe a novel Spark-based architecture for big triplestores, called TESS, that we have designed and implemented to overcome the above limitations by using parallel computing. TESS ensures the ACID properties that are required for a sound and complete implementation of CONSTRUCT-based forward-chaining rule reasoning.
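
    For readers unfamiliar with CONSTRUCT, the sketch below shows one forward-chaining step driven by a CONSTRUCT query, using rdflib on a toy graph (TESS itself runs this kind of rule at scale on Spark; the vocabulary and rule are invented for the example):

        # A runnable sketch of CONSTRUCT-based forward chaining with rdflib.
        from rdflib import Graph

        g = Graph()
        g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:alice ex:supervises ex:bob .
        ex:bob   ex:supervises ex:carol .
        """, format="turtle")

        rule = """
        PREFIX ex: <http://example.org/>
        CONSTRUCT { ?x ex:manages ?z }
        WHERE     { ?x ex:supervises ?y . ?y ex:supervises ?z }
        """

        # one forward-chaining step: materialize the constructed triples
        derived = g.query(rule).graph
        g += derived
        print(len(derived), "triple(s) derived")  # -> 1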

  • IOPE: Interactive Ontology Population and Enrichment Guided by Ontological Constraints. Shadi Baghernezhad-Tabasi, Loïc Druette, Fabrice Jouanot, Celine Meurger, and Marie-Christine Rousset. Proceedings of WISE 2021 (22nd International Conference on Web Information Systems Engineering). October 2021, Melbourne, Australia.

    Specialized ontologies are constructed to capture the skills of experienced experts, with the goal of sharing them with a larger community of trainees and less experienced experts in the domain. A challenging task in the engineering pipeline of specialized ontologies is ontology updating, which encompasses enrichment and population. Domain experts require an understanding of the RDF notation and OWL semantics to update their ontologies, which is not often the case. In this paper, we present IOPE, an interactive framework for the automatic construction of a Graphical User Interface (GUI) to support the controlled update process, by enabling the experts to easily interact with their ontology. We contribute a set of "mapping rules" to transform the ontological constraints into interactive widgets organized in the form of pre-filled Web pages in the GUI, and a set of "binding rules" to transform the expert interactions into RDF graphs, and hence perform the updates. In an extensive set of experiments, we illustrate the efficacy of IOPE in empowering medical experts to update their specialized ontology.

  • RDF graph anonymization robust to data linkage. Remy Delanaux, Angela Bonifati, Marie-Christine Rousset and Romuald Thion. Proceedings of WISE 2019 (20th International Conference on Web Information Systems Engineering). January 2020, Hong Kong.

    Privacy is a major concern when publishing new datasets in the context of Linked Open Data (LOD). A new dataset published in the LOD is indeed exposed to privacy breaches due to the linkage to objects already present in the other datasets of the LOD. In this paper, we focus on the problem of building safe anonymizations of an RDF graph to guarantee that linking the anonymized graph with any external RDF graph will not cause privacy breaches. Given a set of privacy queries as input, we study the data-independent safety problem and the sequence of anonymization operations necessary to enforce it. We provide sufficient conditions under which an anonymization instance is safe given a set of privacy queries. Additionally, we show that our algorithms for RDF data anonymization are robust in the presence of sameAs links that can be explicit or inferred by additional knowledge.
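
    A toy sketch of one such anonymization operation, replacing the subjects of triples matched by a privacy query with blank nodes so that joins, including via sameAs links, can no longer re-identify individuals (vocabulary invented; the paper's operations and safety conditions are more general):

        # An illustrative anonymization step on a tiny RDF graph with rdflib.
        from rdflib import Graph, BNode, Namespace

        EX = Namespace("http://example.org/")
        g = Graph()
        g.add((EX.alice, EX.hasDisease, EX.flu))
        g.add((EX.alice, EX.livesIn, EX.grenoble))

        # privacy query: who has which disease must not be disclosed
        fresh = {}
        for s, p, o in list(g.triples((None, EX.hasDisease, None))):
            b = fresh.setdefault(s, BNode())  # one blank node per hidden subject
            g.remove((s, p, o))
            g.add((b, p, o))

        print(g.serialize(format="turtle"))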

  • OntoSIDES: Ontology-based student progress monitoring on the national evaluation system of French Medical Schools. Olivier Palombi, Fabrice Jouanot, Nafissetou Nziengam, Behrooz Omidvar-Tehrani, Marie-Christine Rousset, Adam Sanchez. Artificial Intelligence in Medicine, Elsevier, 2019, 96, pp. 59-67.

    We introduce OntoSIDES, the core of an ontology-based learning management system in Medicine, in which the educational content, the traces of students' activities and the correction of exams are linked and related to items of an official reference program in a unified RDF data model. OntoSIDES is an RDF knowledge base composed of a lightweight domain ontology that serves as a pivot high-level vocabulary for the query interface with users, and of a dataset made of factual statements relating individual entities to classes and properties of the ontology. Thanks to automatic mapping-based data materialization and rule-based data saturation, OntoSIDES contains around 8 million triples to date, and provides integrated access to useful information for student progress monitoring, using a powerful query language (namely SPARQL) that allows users to express their specific needs of data exploration and analysis. Since we do not expect end-users to master the raw syntax of SPARQL or to express complex queries in it directly, we have designed a set of parametrized queries that users can customize through a user-friendly interface.
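
    The kind of parametrized SPARQL query hidden behind such an interface might look as follows (class and property names here are invented, not the actual OntoSIDES vocabulary):

        # A sketch of a parametrized progress-monitoring query: the interface
        # fills in the student IRI, the template stays fixed.
        from string import Template

        progress_query = Template("""
        PREFIX sides: <http://example.org/sides#>
        SELECT ?item (COUNT(?answer) AS ?nbAnswers)
        WHERE {
          ?answer sides:givenBy       <$student> ;
                  sides:aboutQuestion ?q .
          ?q      sides:linkedToItem  ?item .
        }
        GROUP BY ?item
        """)

        print(progress_query.substitute(student="http://example.org/students/42"))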

    Publications from 2018

  • Query-based Linked Data Anonymization. Remy Delanaux, Angela Bonifati, Marie-Christine Rousset and Romuald Thion. Proceedings of ISWC 2018 (17th International Semantic Web Conference). October 2018, Monterey, USA.

    We introduce and develop a declarative framework for privacy-preserving Linked Data publishing in which privacy and utility policies are specified as SPARQL queries. Our approach is data-independent and leads to inspect only the privacy and utility policies in order to determine the sequence of anonymization operations applicable to any graph instance for satisfying the policies. We prove the soundness of our algorithms and gauge their performance through experiments.

    Publications from 2017

  • Datalog revisited for reasoning in Linked Data. Marie-Christine Rousset, Manuel Atencia, Jerome David, Fabrice Jouanot, Olivier Palombi and Federico Ulliana. Proceedings of Reasoning Web. Semantic Interoperability on the Web - 13th International Summer School 2017. July 2017, London, UK, pages 121-166.

    Linked Data provides access to huge, continuously growing amounts of open data and ontologies in RDF format that describe entities, links and properties on those entities. Equipping Linked Data with inference paves the way to making the Semantic Web a reality. In this survey, we describe a unifying framework for RDF ontologies and databases that we call deductive RDF triplestores. It consists in equipping RDF triplestores with Datalog inference rules. This rule language makes it possible to capture in a uniform manner OWL constraints that are useful in practice, such as property transitivity or symmetry, but also domain-specific rules with practical relevance for users in many domains of interest. The expressivity and the genericity of this framework are illustrated for modeling Linked Data applications and for developing inference algorithms. In particular, we show how it allows the problem of data linkage in Linked Data to be modeled as a reasoning problem on possibly decentralized data. We also explain how it makes it possible to efficiently extract expressive modules from Semantic Web ontologies and databases with formal guarantees, whilst effectively controlling their succinctness. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its usefulness in practice for data integration and information extraction.
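
    As a concrete illustration, a single Datalog-style rule such as transitivity can be applied to a set of RDF triples by a naive saturation loop (a pure-Python toy, not the paper's deductive-triplestore machinery):

        # Naive forward chaining of one Datalog rule over RDF-like triples:
        #   partOf(x, y), partOf(y, z) -> partOf(x, z)
        def saturate_transitive(triples, prop):
            facts = set(triples)
            changed = True
            while changed:
                changed = False
                for (x, p1, y) in list(facts):
                    for (y2, p2, z) in list(facts):
                        if p1 == p2 == prop and y == y2:
                            new = (x, prop, z)
                            if new not in facts:
                                facts.add(new)
                                changed = True
            return facts

        triples = {("lyon", "partOf", "france"), ("france", "partOf", "europe")}
        print(saturate_transitive(triples, "partOf"))
        # adds ("lyon", "partOf", "europe")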

    Publications from 2016

  • Uncertainty-Sensitive Reasoning for Inferring sameAs Facts in Linked Data. Mustafa Al-Bakri, Manuel Atencia, Jerome David, Steffen Lalande and Marie-Christine Rousset. Proceedings of ECAI 2016 (22nd European Conference on Artificial Intelligence). August 2016, The Hague, Netherlands. 9 pages (double-column).

    Discovering whether or not two URIs described in Linked Data, in the same or different RDF datasets, refer to the same real-world entity is crucial for building applications that exploit the cross-referencing of open data. A major challenge in data interlinking is to design tools that effectively deal with incomplete and noisy data, and exploit uncertain knowledge. In this paper, we model data interlinking as a reasoning problem with uncertainty. We introduce a probabilistic framework for modelling and reasoning over uncertain RDF facts and rules that is based on the semantics of probabilistic Datalog. We have designed an algorithm, ProbFR, based on this framework. Experiments on real-world datasets have shown the usefulness and effectiveness of our approach for data linkage and disambiguation.
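
    A loose illustration of the probabilistic flavour of such reasoning: a derived sameAs fact can be scored from the probabilities of the rule and facts used in each derivation, with independent derivations combined by noisy-or (the actual probabilistic Datalog semantics tracks dependencies between events; the numbers below are made up):

        # Scoring a derived sameAs fact under a naive independence assumption.
        from math import prod

        def derivation_prob(rule_p, fact_ps):
            # probability that one derivation goes through
            return rule_p * prod(fact_ps)

        def noisy_or(ps):
            # probability that at least one derivation goes through
            q = 1.0
            for p in ps:
                q *= (1.0 - p)
            return 1.0 - q

        d1 = derivation_prob(0.9, [0.8, 0.95])  # e.g. same title + same year
        d2 = derivation_prob(0.7, [0.85])       # e.g. same identifier prefix
        print(f"P(sameAs) = {noisy_or([d1, d2]):.3f}")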

  • Ontology-Mediated Queries for NOSQL Databases. Marie-Laure Mugnier, Marie-Christine Rousset and Federico Ulliana. Proceedings of AAAI 2016 (30th Conference on Artificial Intelligence). February 2016, Phoenix, Arizona. 7 pages (double-column).

    Ontology-Based Data Access has been studied so far for relational structures and deployed on top of relational databases. This paradigm enables a uniform access to heterogeneous data sources, also coping with incomplete information. Whether OBDA is suitable also for non-relational structures, like those shared by increasingly popular NOSQL languages, is still an open question. In this paper, we study the problem of answering ontology-mediated queries on top of key-value stores. We formalize the data model and core queries of these systems, and introduce a rule language to express lightweight ontologies on top of data. We study the decidability and data complexity of query answering in this setting.

  • TopPI: An Efficient Algorithm for Item-Centric Mining. Martin Kirchgessner, Vincent Leroy, Alexandre Termier, Sihem Amer-Yahia and Marie-Christine Rousset. Proceedings of DaWaK 2016 (8th International Conference on Big Data Analytics and Knowledge Discovery). September 2016, Porto, Portugal. 15 pages.

    We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, and regardless of its frequency, TopPI finds the k most frequent closed itemsets that item belongs to. For example, in our retail dataset, TopPI finds the itemset "nori seaweed, wasabi, sushi rice, soy sauce", which occurs in only 133 store receipts out of 290 million. It also finds the itemset "milk, puff pastry", which appears 152,991 times. Thanks to a dynamic threshold adjustment and an adequate pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-cores. Our experiments on datasets with different characteristics show the high performance of TopPI and its superiority when compared to state-of-the-art mining algorithms. We show experimentally on real datasets that TopPI allows the analyst to explore and discover valuable itemsets.
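
    TopPI's semantics can be rendered by brute force on a toy dataset, keeping for each item its k most frequent closed itemsets (TopPI itself achieves this with dynamic thresholding, pruning and multi-core parallelism):

        # Brute-force item-centric mining on a toy transaction set.
        from itertools import combinations

        transactions = [{"milk", "pastry"}, {"milk", "pastry", "butter"},
                        {"milk", "bread"}, {"bread", "butter"}]

        def support(itemset):
            return sum(1 for t in transactions if itemset <= t)

        items = set().union(*transactions)
        candidates = [frozenset(c) for r in range(1, len(items) + 1)
                      for c in combinations(items, r) if support(frozenset(c)) > 0]
        # closed = no strict superset with the same support
        closed = [c for c in candidates
                  if not any(c < d and support(c) == support(d) for d in candidates)]

        k = 2
        for item in sorted(items):
            top = sorted((c for c in closed if item in c),
                         key=support, reverse=True)[:k]
            print(item, [(sorted(c), support(c)) for c in top])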

    Publications from 2015

  • My Corporis Fabrica Embryo: An ontology-based 3D spatio-temporal modeling of human embryo development. Pierre-Yves Rabattu, Benoit Massé, Federico Ulliana, Marie-Christine Rousset, Damien Rohmer, Jean-Claude Léon and Olivier Palombi. Journal of Biomedical Semantics 2015, 6:36. September 2015, 15 pages.

    In this work, we present a novel ontology describing human embryo development together with deforming 3D models. Beyond describing how organs and structures are composed, our ontology integrates a procedural description of their 3D representations, temporal deformations and relations with respect to their development. We also created inference rules to express complex connections between entities. The result is a unified description of both the knowledge of the organs' deformation and their 3D representations, making it possible to dynamically visualize the embryo's deformation during the Carnegie stages. Through a simplified ontology, containing representative entities which are linked to spatial position and temporal process information, we illustrate the added value of such a declarative approach for interactive simulation and visualization of 3D embryos.

  • Extracting Bounded-level Modules from Deductive RDF Triplestores. Marie-Christine Rousset and Federico Ulliana. Proceedings of AAAI 2015 (29th Conference on Artificial Intelligence). January 2015, Austin, Texas. 7 pages (double-column).

    We present a novel semantics for extracting bounded-level modules from RDF ontologies and databases augmented with safe inference rules, à la Datalog. Dealing with a recursive rule language poses challenging issues for defining the module semantics, and also makes module extraction algorithmically unsolvable in some cases. Our results include a set of module extraction algorithms compliant with the novel semantics. Experimental results show that the resulting framework is effective in extracting expressive modules from RDF datasets with formal guarantees, whilst controlling their succinctness.

  • Inferring same-as facts from Linked Data: an iterative import-by-query approach. Mustafa Al-Bakri, Manuel Atencia, Steffen Lalande, Marie-Christine Rousset. Proceedings of AAAI 2015 (29th Conference on Artificial Intelligence). January 2015, Austin, Texas. 7 pages (double-column).

    In this paper we model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We describe a novel import-by-query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on a real-world dataset have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation.

    Publications from 2014

  • My Corporis Fabrica: an ontology-based tool for reasoning and querying on complex anatomical models. Olivier Palombi, Federico Ulliana, Valentin Favier, Jean-Claude Leon and Marie-Christine Rousset. Journal of Biomedical Semantics 2014, 5:20. May 2014, 13 pages. Open Access.

    First, we provide a new ontology of anatomy called My Corporis Fabrica (MyCF), which conforms to FMA but extends it by making explicit how anatomical structures are composed, how they contribute to functions, and also how they can be related to 3D complex objects. Second, we have equipped MyCF with automatic reasoning capabilities that enable model checking and complex queries answering. We illustrate the added-value of such a declarative approach for interactive simulation and visualization as well as for teaching applications.

  • Semantic Filtering of Scientific Articles guided by a Domain Ontology. Fabrice Jouanot, Cyril Labbe, Elena Michael, Marie-Christine Rousset, Maithe Tauber and Federico Ulliana. Proceedings of ICMLDA'14 (International Conference on Machine Learning and Data Analysis). October 2014, 6 pages.

    The problem that we address in this paper is how to improve the accuracy of retrieving specialized information within a textual scientific corpus. We present a new approach in which the keywords expressing the bibliographical needs of a researcher are related to a domain ontology. We illustrate how such a declarative ontology-based approach can be used both for computing varied statistics, and also for helping experts to find useful fine-grained information within a textual corpus.

    Publications from 2013

  • Efficiently Rewriting Large Multimedia Application Execution Traces with few Event Sequences. Christiane Kamdem Kengne, Leon Fopa, Alexandre Termier, Noha Ibrahim, Marie-Christine Rousset, Takashi Washio and Miguel Santana. KDD 2013 (19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining), Industrial Track, Chicago, August 2013.

    The analysis of multimedia application traces can reveal important information to enhance program execution comprehension. However, traces are typically gigabytes in size, which hinders their effective exploitation by application developers. In this paper, we study the problem of finding a set of sequences of events that allows a reduced-size rewriting of the original trace. These sequences of events, which we call blocks, can simplify the exploration of large execution traces by allowing application developers to see an abstraction instead of low-level events. The problem of computing such a set of blocks is NP-hard, and naive approaches lead to prohibitive running times that prevent analysing real-world traces. We propose a novel algorithm that directly mines the set of blocks. Our experiments show that our algorithm can analyse real traces of up to two hours of video. We also show experimentally the quality of the set of blocks proposed, and the interest of the rewriting for understanding actual trace data.

  • Trust in Networks of Ontologies and Alignments. Manuel Atencia, Mustafa Al Bakri and Marie-Christine Rousset. Knowledge and Information Systems, Volume 37, Number 3, December 2013, 28 pages.

    In this paper we introduce a mechanism of trust adapted to semantic peer-to-peer (P2P) networks in which every peer is free to organize its local resources as instances of classes of its own ontology. Peers use their ontologies to query other peers, and alignments between peers' ontologies make it possible to reformulate queries from one local peer's vocabulary to another. Alignments are typically the result of manual or (semi-)automatic ontology matching. However, resulting alignments may be unsound and/or incomplete, and, therefore, query reformulation based on alignments may lead to unsatisfactory answers. Trust can assist peers to select the peers in the network that are better suited to answer their queries. In our model, the trust that a peer has towards another peer depends on a specific query, and it represents the probability that the latter peer will provide a satisfactory answer to the query. In order to compute trust, we perform Bayesian inference that exploits ontologies, alignments and user feedback. We have implemented our method and conducted an evaluation. Experimental results show that trust values converge as more queries are sent and answers received. Furthermore, when query-answering is guided by trust the quality of peers' answers, measured with precision and recall, is improved.
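
    A minimal Beta-Bernoulli sketch of such query-specific trust, where user feedback updates a posterior whose mean estimates the probability of a satisfactory answer (the paper's model additionally exploits ontologies and alignments; this sketch keeps only the feedback part):

        # Trust as the mean of a Beta posterior over "satisfactory answer".
        class Trust:
            def __init__(self, alpha=1.0, beta=1.0):  # uniform prior
                self.alpha, self.beta = alpha, beta

            def observe(self, satisfactory: bool):
                if satisfactory:
                    self.alpha += 1
                else:
                    self.beta += 1

            @property
            def value(self):  # posterior mean
                return self.alpha / (self.alpha + self.beta)

        t = Trust()
        for fb in [True, True, False, True]:
            t.observe(fb)
        print(f"trust = {t.value:.2f}")  # converges as feedback accumulates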

  • PARAMINER: a generic pattern mining algorithm for multi-core architectures. Benjamin Negrevergne, Alexandre Termier, Marie-Christine Rousset and Jean-Francois Mehaut. Data Mining and Knowledge Discovery, March 2013, 41 pages.

    In this paper, we present ParaMiner, which is a generic and parallel algorithm for closed pattern mining. ParaMiner is built on the principles of pattern enumeration in strongly accessible set systems. Its efficiency is due to a novel dataset reduction technique (that we call EL-reduction), combined with a novel technique for performing dataset reduction in a parallel execution on a multi-core architecture. We illustrate ParaMiner's genericity by using this algorithm to solve three different pattern mining problems: the frequent itemset mining problem, the frequent connected relational graph mining problem and the gradual itemset mining problem. In this paper, we prove the soundness and the completeness of ParaMiner. Furthermore, our experiments show that despite being a generic algorithm, ParaMiner can compete with specialized state-of-the-art algorithms designed for the pattern mining problems mentioned above. Besides, for the particular problem of gradual itemset mining, ParaMiner outperforms the state-of-the-art algorithm by two orders of magnitude.

  • Robust Module-based Data Management. Francois Goasdoue and Marie-Christine Rousset. IEEE Transactions on Knowledge and Data Engineering, Volume 25, Issue 3, March 2013, pages 648-661.

    The current trend for building an ontology-based data management system (DMS) is to capitalize on efforts made to design a preexisting well-established DMS (a reference system). The method amounts to extracting from the reference DMS a piece of schema relevant to the new application needs (a module), possibly personalizing it with extra constraints w.r.t. the application under construction, and then managing a dataset using the resulting schema. In this paper, we extend the existing definitions of modules and we introduce novel properties of robustness that provide means for easily checking that a robust module-based DMS evolves safely w.r.t. both the schema and the data of the reference DMS. We carry out our investigations in the setting of description logics which underlie modern ontology languages, like RDFS, OWL, and OWL2 from W3C. Notably, we focus on the DL-LiteA dialect of the DL-Lite family, which encompasses the foundations of the QL profile of OWL2 (i.e., DL-LiteR): the W3C recommendation for efficiently managing large datasets.

    Publications from 2012

  • Web Data Management. Serge Abiteboul, Ioana Manolescu, Philippe Rigaux, Marie-Christine Rousset, Pierre Senellart. Book published by Cambridge University Press.

    The ambition of this book is to cover the many facets of distributed data management on the Web. We will explain the foundations of the Web standard for data management, XML. We will travel through logical countries (e.g., description logics) that provide foundations for the Semantic Web that is emerging in modern data integration applications. We will show the beauty of software tools that everyone is already using today, for example Web search engines. And finally, we will explain the impressive machinery used nowadays to manipulate amounts of data of unprecedented size. We are witnessing the emergence of a new, global information system created, explored, and shared by the whole of humankind. The book aims at exposing the recent achievements that help make this system usable.

    Publications from 2011

  • Alignment-based trust for resource finding in semantic P2P networks. Manuel Atencia, Jerome Euzenat, Giuseppe Pirro and Marie-Christine Rousset. Proceedings of ISWC 2011 (10th International Semantic Web Conference).

    In a semantic P2P network, peers use separate ontologies and rely on alignments between their ontologies for translating queries. Nonetheless, alignments may be limited (unsound or incomplete) and generate flawed translations, leading to unsatisfactory answers. In this paper we present a trust mechanism that can assist peers to select those in the network that are better suited to answer their queries. The trust that a peer has towards another peer depends on a specific query and represents the probability that the latter peer will provide a satisfactory answer. In order to compute trust, we exploit both alignments and peers' direct experience, and perform Bayesian inference. We have implemented our technique and conducted an evaluation. Experimental results showed that trust values converge as more queries are sent and answers received. Furthermore, the use of trust improves both precision and recall.

  • Discovery of Probabilistic Mappings between Taxonomies: Principles and Experiments. Remi Tournaire, Jean-Marc Petit, Marie-Christine Rousset, and Alexandre Termier. Journal of Data Semantics (JoDS), Volume 15, pages 66-101.

    In this paper, we investigate a principled approach for defining and discovering probabilistic mappings between two taxonomies. First, we compare two ways of modeling probabilistic mappings which are compatible with the logical constraints declared in each taxonomy. Then we describe a generate-and-test algorithm which minimizes the number of calls to the probability estimator for determining those mappings whose probability exceeds a certain threshold. Finally, we provide an experimental analysis of this approach.
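
    One natural instance-based estimator that such a generate-and-test algorithm could threshold is the ratio of shared instances between the two classes (an illustration only; the paper compares two probabilistic models of mappings):

        # Estimating the probability of a candidate mapping A -> B from
        # instance extensions (class names and documents are invented).
        ext = {
            "A:jazz":  {"d1", "d2", "d3", "d4"},
            "B:music": {"d1", "d2", "d3", "d5", "d6"},
        }

        def p_subsumption(a, b):
            """Estimate P(a subsumed-by b) as |ext(a) & ext(b)| / |ext(a)|."""
            return len(ext[a] & ext[b]) / len(ext[a])

        threshold = 0.7
        p = p_subsumption("A:jazz", "B:music")
        print(p, p >= threshold)  # 0.75 True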

    Publications from 2009

  • DL-LiteR in the Light of Propositional Logic for Decentralized Data Management. Nada Abdallah, Francois Goasdoue and Marie-Christine Rousset. Proceedings of IJCAI 2009 (21st International Joint Conference on Artificial Intelligence).

    This paper provides a decentralized data model and associated algorithms for peer data management systems (PDMS) based on the DL-LiteR description logic. Our approach relies on reducing query reformulation and consistency checking for DL-LiteR into reasoning in propositional logic. This enables a straightforward deployment of DL-LiteR PDMSs on top of SomeWhere, a scalable propositional peer-to-peer inference system. We also show how to use the state-of-the-art Minicon algorithm for rewriting queries using views in DL-LiteR in the centralized and decentralized cases.

  • Combining a Logical and a Numerical method for Reference Reconciliation. Fatiha Sais, Nathalie Pernelle and Marie-Christine Rousset. Journal of Data Semantics, Volume 12, 2009.

    The reference reconciliation problem consists in deciding whether different identifiers refer to the same data, i.e. correspond to the same real-world entity. In this article we present a reference reconciliation approach which combines a logical method for reference reconciliation called L2R and a numerical one called N2R. This approach exploits the schema and data semantics, which are translated into a set of Horn FOL rules of reconciliation. These rules are used in L2R to infer exact decisions both of reconciliation and non-reconciliation. In the second method, N2R, the semantics of the schema is translated into an informed similarity measure which is used by a numerical computation of the similarity of reference pairs. This similarity measure is expressed in a non-linear equation system, which is solved using an iterative method. Experiments with both methods on two different domains show good results for both recall and precision. The two methods can be used separately or in combination, and we have shown that their combination improves runtime performance.
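
    A toy rendering of N2R's iterative resolution: similarities of reference pairs depend on one another (two papers are more similar if their authors are), so the non-linear system x = f(x) is iterated to a fixpoint (the dependencies and weights below are invented):

        # Iterative fixpoint computation for mutually dependent similarities.
        def f(x):
            # pair 0: papers; pair 1: their authors; pair 2: their venues
            return [max(0.6, 0.5 * x[1] + 0.3 * x[2]),  # papers depend on authors/venues
                    max(0.8, 0.4 * x[0]),               # authors depend on papers
                    0.5]                                # venue similarity is fixed

        x = [0.0, 0.0, 0.0]
        for step in range(20):
            nxt = f(x)
            if max(abs(a - b) for a, b in zip(nxt, x)) < 1e-6:
                break
            x = nxt
        print([round(v, 3) for v in x])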

    Publications from 2008

  • DryadeParent, An Efficient and Robust Closed Attribute Tree Mining Algorithm. Alexandre Termier, Marie-Christine Rousset, Michèle Sebag, Kouzou Ohara, Takashi Washio, Hiroshi Motoda. IEEE Transactions on Knowledge and Data Engineering, 20(2), pages 300-320.

    In this paper we present a new tree mining algorithm, DryadeParent, based on the hooking principle first introduced in Dryade. In the experiments, we demonstrate that the branching factor and depth of the frequent patterns to find are key factors of complexity for tree mining algorithms. We show that DryadeParent outperforms the current fastest algorithm, CMTreeMiner, by orders of magnitude on data sets where the frequent tree patterns have a high branching factor.

  • A Probabilistic Trust Model for Semantic Peer to Peer Systems. Gia-Hien Nguyen, Philippe Chatalic, Marie-Christine Rousset. Proceedings of the EDBT'08 Workshop on Data Management in Peer-to-Peer Systems, pages 59-65.

    Semantic peer to peer (P2P) systems are fully decentralized overlay networks of people or machines (called peers) sharing and searching varied resources (documents, videos, photos, data, services) based on their semantic annotations using ontologies. They provide a support for the emergence of open and decentralized electronic social networks, in which no central or external authority can control the reliability of the peers participating in the network. This lack of control may however cause some of the results provided by some peers to be unsatisfactory, because of inadequate or obsolete annotations. In this paper, we propose a probabilistic model to handle trust in a P2P setting. Our model supports a local computation of the trust of peers into classes of other peers. We claim that our model is well suited to the dynamics of P2P networks and to the freedom of each peer within the network to hold different viewpoints towards the peers with whom it interacts.

    Publications from 2007

  • L2R: a Logical method for Reference Reconciliation. Fatiha Sais, Nathalie Pernelle, Marie-Christine Rousset. Proceedings of AAAI 2007 (22nd Conference on Artificial Intelligence), pages 329-334.

    The reference reconciliation problem consists in deciding whether different identifiers refer to the same data, i.e., correspond to the same real-world entity. The L2R system exploits the semantics of a rich data model, which extends RDFS by a fragment of OWL-DL and SWRL rules. In L2R, the semantics of the schema is translated into a set of logical reconciliation rules, which are then used to infer correct decisions both of reconciliation and non-reconciliation. In contrast with other approaches, the L2R method has a precision of 100% by construction. First experiments show promising results for recall and, most importantly, significant increases when rules are added.

  • SomeRDFS in the Semantic Web. Philippe Adjiman, Francois Goasdoué, Marie-Christine Rousset. Journal of Data Semantics (JoDS), Volume 8, pages 158-181.

    The Semantic Web envisions a world-wide distributed architecture where computational resources will easily inter-operate to coordinate complex tasks such as query answering. Semantic marking up of web resources using ontologies is expected to provide the necessary glue for making this vision work. Using ontology languages, (communities of) users will build their own ontologies in order to describe their own data. Adding semantic mappings between those ontologies, in order to semantically relate the data to share, gives rise to the Semantic Web: data on the web that are annotated by ontologies networked together by mappings. In this vision, the Semantic Web is a huge semantic peer data management system. In this paper, we describe the SomeRDFS peer data management systems that promote a "simple is beautiful" vision of the Semantic Web based on data annotated by RDFS ontologies.

  • Mining XML patterns. Laurent Candillier, Ludovic Denoyer, Patrick Gallinari, Marie-Christine Rousset, Alexandre Termier, Anne-Marie Vercoustre. Book chapter in Data Mining Patterns: New Methods and Applications. Pascal Poncelet, Maguelonne Teisseire, Florent Masseglia, Editors.

    XML documents are becoming ubiquitous because of their rich and flexible format, which can be used for a variety of applications. Given the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted, and new methods invented, to exploit the particular structure of XML documents. Basically, XML documents can be seen as trees, which are well known to be complex structures. This chapter describes various ways of using and simplifying this tree structure to model documents and support efficient mining algorithms. We focus on three mining tasks: classification and clustering, which are standard for text collections, and discovery of frequent tree structures, which is especially important for heterogeneous collections. This chapter presents some recent approaches and algorithms to support these tasks, together with experimental evaluations on a variety of large XML collections.

    Publications from 2006

  • SomeWhere: a scalable P2P infrastructure for querying distributed ontologies. Marie-Christine Rousset, Philippe Chatalic, Philippe Adjiman, Francois Goasdoué, Laurent Simon. Invited talk, Proceedings of ODBASE 2006 (5th International Conference on Ontologies, Databases and Applications of Semantics), pages 698-703.

    In this invited talk, we present the SomeWhere approach and infrastructure for building semantic peer-to-peer data management systems based on simple personalized ontologies distributed at a large scale. SomeWhere is based on a simple class-based data model in which the data is a set of resource identifiers (e.g., URIs), the schemas are (simple) definitions of classes possibly constrained by inclusion, disjunction or equivalence statements, and mappings are inclusion, disjunction or equivalence statements between classes of different peer ontologies. In this setting, query answering over peers can be done by distributed query rewriting, which can be equivalently reduced to distributed consequence finding in propositional logic. It is done by using the message-passing distributed algorithm that we have implemented for consequence finding of a clause w.r.t. a set of distributed propositional theories. We summarize its main properties (soundness, completeness and termination), and we report experiments showing that it already scales up to a thousand peers. Finally, we mention ongoing work on extending the current data model to RDF(S) and on handling possible inconsistencies between the ontologies of different peers.

  • Reasoning with Inconsistencies in Propositional Peer-to-Peer Inference Systems. Philippe Chatalic, Gia-Hien Nguyen, Marie-Christine Rousset. Proceedings of ECAI 2006 (European Conference on Artificial Intelligence), pages 352-357.

    In a peer-to-peer inference system, there is no centralized control or hierarchical organization: each peer is equivalent in functionality and cooperates with other peers in order to solve a collective reasoning task. Since peer theories model possibly different viewpoints, even if each local theory is consistent, the global theory may be inconsistent. We exhibit a distributed algorithm detecting inconsistencies in a fully decentralized setting. We provide a fully distributed reasoning algorithm, which computes only well-founded consequences of a formula, i.e., with a consistent set of support.

  • A Compact Representation for Least Common Subsumers in the description logic ALE. Chan Le Duc, Nhan Le Thanh, Marie-Christine Rousset. AI Communications, Volume 19, Number 3, pages 239-273.

    This paper introduces a compact representation which helps to avoid the exponential blow-up in space of the Least Common Subsumer (lcs) of two ALE-concept descriptions. Based on the compact representation we define a space of specific graphs which represents all ALE-concept descriptions including the lcs. Next, we propose an algorithm exponential in time and polynomial in space for deciding subsumption between concept descriptions represented by graphs in this space. These results provide better understanding of the double exponential blow-up of the approximation of ALC-concept descriptions by ALE-concept descriptions: double exponential size of the approximation in the ordinary representation is unavoidable in the worst case.

  • Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web. Philippe Adjiman, Philippe Chatalic, Francois Goasdoué, Marie-Christine Rousset, Laurent Simon. Journal of Artificial Intelligence Research, Volume 25, pages 269-314.

    In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The main contribution of this paper is to provide the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Another important contribution is to apply this general distributed reasoning setting to the setting of the Semantic Web through the SomeWhere semantic peer-to-peer data management system. The last contribution of this paper is to provide an experimental analysis of the scalability of the peer-to-peer infrastructure that we propose, on large networks of 1000 peers.

  • Somewhere in the Semantic Web. Marie-Christine Rousset, Philippe Adjiman, Philippe Chatalic, Francois Goasdoué, Laurent Simon. Invited talk, Proceedings of SOFSEM 2006 (International Conference on Current Trends in Theory and Practice of Computer Science), Prague, January 2006. Also in Proceedings of PPSWR 2005 (International Workshop on Principles and Practice of Semantic Web Reasoning).

    In this paper, we describe the SomeWhere semantic peer-to-peer data management system that promotes a "small is beautiful" vision of the Semantic Web based on simple personalized ontologies (e.g., taxonomies of classes) distributed at a large scale. In this vision of the Semantic Web, no user imposes their own ontology on others. Logical mappings between ontologies make possible the creation of a web of people in which personalized semantic marking up of data cohabits nicely with a collaborative exchange of data. In this view, the Web is a huge peer-to-peer data management system based on simple distributed ontologies and mappings.

    Publications from 2005

  • Scalability Study of Peer-to-Peer Consequence Finding. Philippe Adjiman, Philippe Chatalic, Francois Goasdoué, Marie-Christine Rousset, Laurent Simon. Proceedings of IJCAI 2005 (International Joint Conference on Artificial Intelligence).

    In peer-to-peer inference systems, each peer can reason locally but also solicit some of its acquaintances, sharing part of its vocabulary. This paper studies both theoretically and experimentally the problem of computing proper prime implicates for propositional peer-to-peer systems, the global theory (union of all peer theories) of which is not known (as opposed to partition-based reasoning).

  • Efficient mining of high branching factor attribute trees. By Alexandre Termier, Marie-Christine Rousset, Michèle Sebag, Kouzou Ohara, Takashi Washio, Hiroshi Motoda. Proceedings of ICDM 05 (International Conference on Data Mining).

    In this paper, we present a new tree mining algorithm, DRYADEPARENT, based on the hooking principle first introduced in DRYADE. In the experiments, we demonstrate that the branching factor and depth of the frequent patterns to find are key factors of complexity for tree mining algorithms. We show that DRYADEPARENT outperforms the current fastest algorithm, CMTreeMiner, by orders of magnitude on datasets where the frequent patterns have a high branching factor.

  • Enriching a relational data warehouse by integrating XML data: report on the e.dot project. By Hélène Gagliardi, Ollivier Haemmerlé, Damiano Migliori, Nathalie Pernelle, Marie-Christine Rousset, Fatiha Sais. Proceedings of the Second Franco-Japanese Workshop on Information Search, Integration and Personalization (ISIP).

    In this paper we present a method for integrating and querying XML data in a relational setting. Though this method is generic, it has been motivated and validated by a knowledge management application in microbiology: the e.dot project.

    Publications from 2004

  • Small Can Be Beautiful in the Semantic Web. By Marie-Christine Rousset. Invited talk, Proceedings of the International Semantic Web Conference (ISWC 2004), pages 6-16, 2004.

    In 1984, Peter Patel-Schneider published a paper entitled "Small can be Beautiful in Knowledge Representation" in which he advocated for limiting the expressive power of knowledge representation formalisms in order to guarantee good computational properties and thus make knowledge-based systems usable as part of larger systems. In this paper, I aim at showing that the same argument holds for the Semantic Web: if we want to give a chance for the Semantic Web to scale up and to be broadly used, we have to limit the expressive power of the ontologies serving as semantic marking up of resources. In addition, due to the scale of the Web and the disparity of its users, it is unavoidable to have to deal with distributed heterogeneous ontologies. In this paper, I will argue that a peer-to-peer infrastructure enriched with simple distributed ontologies (e.g., taxonomies) reconciled through mappings is appropriate and scalable for supporting the future Semantic Web.

  • DRYADE: a new approach for discovering closed frequent trees in heterogeneous tree databases. By Alexandre Termier, Marie-Christine Rousset and Michele Sebag. Proceedings of the International Conference on Data Mining (ICDM 2004), pages 543-546, 2004.

    In this paper, we present a novel algorithm for discovering tree patterns in a tree database. This algorithm uses a relaxed tree inclusion definition, which makes the problem complex (checking tree inclusion is NP-complete) but allows highly heterogeneous databases to be mined. To obtain good performance, our DRYADE algorithm discovers only closed frequent tree patterns.

  • Distributed Reasoning in a Peer-to-Peer Setting. By Philippe Adjiman, Philippe Chatalic, Francois Goasdoué, Marie-Christine Rousset and Laurent Simon. Proceedings of the European Conference on Artificial Intelligence (ECAI 2004), pages 945-946, 2004.

    In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The contribution of this paper is twofold. We provide the first consequence finding algorithm in a peer-to-peer setting: it is incremental, anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. We report first promising experimental results.

  • Highlighting latent structure in documents. By Helka Folch, Benoit Habert, Michele Jardino, Nathalie Pernelle, Marie-Christine Rousset and Alexandre Termier. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2004), Vol. IV, pages 1331-1334, 2004.

    Extensible Markup Language (XML) is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. It is a simple, very flexible text format, used to annotate data by means of markup. XML documents can be checked for syntactic well-formedness and semantic coherence through DTD and schema validation, which makes their processing easier. In particular, data with nested structure can be easily represented with embedded tags. This structured representation should be used in information retrieval models which take structure into account. As such, it is meta-data and therefore a contribution to the Semantic Web. However, nowadays there exist huge quantities of raw text, and the issue is how to find an easy way to provide these texts with a sensible XML structure. Here we present an automatic method to extract tree structure from raw texts. This work has been supported by Paris XI University (BQR2002 project).

  • Answering Queries using Views: a KRDB Perspective for the Semantic Web. By Francois Goasdoué and Marie-Christine Rousset. ACM Transactions on Internet Technology (TOIT), Volume 4(3), pages 255-288, 2004.

    In this paper, we investigate a first step towards the long-term vision of the Semantic Web by studying the problem of answering queries posed through a mediated ontology to multiple information sources whose content is described as views over the ontology relations. The contributions of this paper are twofold. We first offer a uniform logical setting which allows us to encompass and to relate the existing work on answering and rewriting queries using views. In particular, we make clearer the connection between the problem of rewriting queries using views and the problem of answering queries using extensions of views. Then we focus on an instance of the problem of rewriting conjunctive queries using views through an ontology expressed in a description logic, for which we exhibit a complete algorithm.

    Publications from 2003

  • Querying Distributed Data through Distributed Ontologies: a Simple but Scalable Approach. By Francois Goasdoué and Marie-Christine Rousset. IEEE Intelligent Systems, 18(5), pp. 60-65.

    This article presents a simple but scalable framework for peer-to-peer data-sharing systems, in which the problem of answering queries over a network of semantically related peers is always decidable. The approach is characterized by a simple class-based language for defining peer schemas as hierarchies of atomic classes, and mappings as inclusions of logical combinations of atomic classes. We provide an anytime and incremental method for computing all the answers to a query posed to a given peer such that the answers are ordered from ones involving peers close to the queried peer to those involving more distant peers.

  • Knowledge Representation for Information Integration. By Marie-Christine Rousset and Chantal Reynaud. Information Systems International Journal, 29(1), pp. 3-22.

    An information integration system provides a uniform query interface to a collection of distributed and heterogeneous information sources, giving users or other agents the illusion that they interrogate a centralized and homogeneous information system. In this paper, we focus on the use of knowledge representation techniques for building mediators for information integration. A mediator is based on the specification of a single mediated schema describing a domain of interest, and on a set of source descriptions expressing how the content of each source available to the system is related to the domain of interest. These source descriptions, also called mappings because they model the correspondence between the mediated schema and the schemas of the data sources, play a central role in the query answering process. We present two recent information integration systems, namely Picsel and Xyleme, which are illustrative of two radically different choices concerning the expressivity of the mediated schema.

  • Semantic Integration in Xyleme: a Uniform Tree-based Approach. By Claude Delobel, Chantal Reynaud, Marie-Christine Rousset, Jean-Pierre Sirot and Dan Vodislav. Data & Knowledge Engineering, 44(2), pp. 267-298.

    Xyleme is a huge warehouse integrating XML data of the Web. Xyleme considers a simple data model with data trees and tree types for describing the data sources, and a simple query language based on tree queries with boolean conditions. The main components of the data model are a mediated schema modeled by an abstract tree type, as a view of a set of tree types associated with actual data trees, called concrete tree types, and a mapping expressing the connection between the mediated schema and the concrete tree types. The first contribution of this paper is formal: we provide a declarative model-theoretic semantics for Xyleme tree queries, a way of checking tree query containment, and a characterization of tree queries as a composition of branch queries. The other contributions are algorithmic and handle the potentially huge size of the mapping relation which is a crucial issue for semantic integration and query evaluation in Xyleme. First, we propose a method for pre-evaluating queries at compile time by storing some specific meta-information about the mapping into map translation tables. These map translation tables summarize the set of all the branch queries that can be generated from the mediated schema and the set of all the mappings. Then, we propose different operators and strategies for relaxing queries which, having an empty map translation table, will have no answer if they are evaluated against the data. Finally, we present a method for semi-automatically generating the mapping relation.

    Publications from 2002

  • TreeFinder: a First Step towards XML Data Mining. By Alexandre Termier, Marie-Christine Rousset and Michèle Sebag. Proceedings of the International Conference on Data Mining (ICDM 02).

    In this paper, we consider the problem of searching for frequent trees in a collection of tree-structured data modeling XML data. The TreeFinder algorithm aims at finding trees such that their exact or perturbed copies are frequent in a collection of labelled trees. To cope with complexity issues, TreeFinder is correct but not complete: it finds a subset of the actually frequent trees. This lack of completeness is experimentally investigated on artificial medium-size datasets; it is shown that TreeFinder reaches completeness or falls short of it for a range of experimental settings.

  • Zoom: a nested Galois lattices-based system for conceptual clustering. By Nathalie Pernelle, Marie-Christine Rousset, Henry Soldano and Véronique Ventos. Journal of Experimental and Theoretical Artificial Intelligence, 14 (2002), pp. 157-187.

    This paper deals with the representation of multi-valued data by clustering them in a small number of classes organized in a hierarchy and described at an appropriate level of abstraction. First, we investigate a partial order, namely nesting, relating Galois lattices. Second, we investigate the intensional and extensional aspects of the languages used in our Zoom system. Finally, the nesting order between the Galois lattices corresponding to these various languages is exploited in the interactive system Zoom.

  • Compilation and Approximation of Conjunctive Queries by Concept Descriptions. By Francois Goasdoué and Marie-Christine Rousset. Proceedings of the 15th European Conference on Artificial Intelligence (ECAI 2002), also presented at DL 2002.

    In this paper, we characterize the logical correspondence between conjunctive queries and concept descriptions. We exhibit a necessary and sufficient condition for the compilation of a conjunctive query into an equivalent ALE concept description. We provide a necessary and sufficient condition for the approximation of conjunctive queries by maximally subsumed ALN concept descriptions.

  • The Xyleme Project. By Serge Abiteboul, Sophie Cluet, Guy Ferran and Marie-Christine Rousset. Computer Networks, 39 (2002), pp. 225-238.

    The current development of the Web and the generalization of XML technology provide a major opportunity to radically change the management of distributed data. We have developed a prototype of a dynamic warehouse for XML data, namely Xyleme. In the present paper, we briefly present some motivation and important aspects of the work performed in the framework of the Xyleme project.

  • The Semantic Web Needs Languages for Representing (Complex) Mappings Between (Simple) Ontologies. By Marie-Christine Rousset. Invited paper in Trends & Controversies of IEEE Intelligent Systems, Volume 17, Number 2.

    The main issue for a Web ontology language is less the choice of a language for describing ontologies than the choice of a language for representing mappings between ontologies.

  • Picsel and Xyleme: two illustrative information integration agents. By Marie-Christine Rousset and Chantal Reynaud. Book chapter in Intelligent Information Agents: Research and Development in Europe: An AgentLink Perspective. Editors: Matthias Klusch, Sonia Bergamaschi, Paolo Petta, Pete Edwards. Publisher: Springer Verlag, LNCS State of the Art Surveys.

    An information integration agent provides a uniform query interface to a collection of distributed and heterogeneous information sources, giving users or other agents the illusion that they interrogate a centralized and homogeneous information system. In this chapter we focus on information integration agents that follow a mediator approach. A mediator is based on the specification of a single mediated schema describing a domain of interest, and on a set of source descriptions expressing how the content of each source available to the system is related to the domain of interest. These source descriptions, also called mappings because they model the correspondence between the mediated schema and the schemas of the data sources, play a central role in the query answering process. We present two recent information integration agents, namely Picsel and Xyleme, which are illustrative of two radically different choices concerning the expressivity of the mediated schema.

  • Mining XML Data with Frequent Trees. By Alexandre Termier, Marie-Christine Rousset and Michèle Sebag. Proceedings of the workshop DBFusion 2002

    The paper aims at extracting common tree structures in heterogeneous XML data. This approach extends Frequent Item Set techniques to tree structure representation. We provide an algorithm which extracts all subtrees embedded in the data above a given frequency threshold. As a result, this technique provides both a clustering of the data and a symbolic characterization of each cluster. This characterization is made of the maximally specific subtrees embedded in all the data trees of each cluster. The scalability of the algorithm is investigated on artificial medium-sized data.

  • Construction de médiateurs pour intégrer des sources d'information multiples et hétérogènes : le projet PICSEL [Building mediators to integrate multiple heterogeneous information sources: the PICSEL project]. By Marie-Christine Rousset, Alain Bidault, Christine Froidevaux, Helene Gagliardi, Francois Goasdoué, Chantal Reynaud, Brigitte Safar. To appear in Revue I3.

    The growing amount of data accessible via networks (intranet, internet, etc.) raises the problem of integrating pre-existing, often remote and heterogeneous information sources, in order to make them easier to query for a broad audience. One of the first approaches proposed for information integration advocates the construction of mediators. A mediator acts as a query interface between a user and data sources. It gives the user the illusion of querying a homogeneous and centralized system, sparing them from having to find the data sources relevant to their query, query them one by one, and combine the retrieved information themselves. The goal of this article is to present the PICSEL project, which offers a declarative environment for building mediators. PICSEL differs from existing information integration systems in its ability to express the mediator schema in a language (Carin) that combines the expressive power of a rule-based formalism with that of a class-based formalism (the description logic ALN). PICSEL includes a query refinement module, the first building block of a module for cooperative dialogue between a mediator and its users.

    Publications from 2001

  • A Uniform Approach for Querying Large Tree-structured Data through a Mediated Schema. By Claude Delobel and Marie-Christine Rousset. Proceedings of Foundations of Models For Information Integration Workshop (FMII-2001)

    We are interested in defining and querying a large and highly heterogeneous database repository for XML data and their DTDs. We consider a very simple data model with data trees and tree types for describing the data sources, and a very simple query language based on tree queries with boolean conditions. The main components of the data model are a mediated schema, as a view of a set of concrete tree types, and a mapping expressing the connection between the mediated schema and the concrete tree types. We provide a general tree query semantics and a characterization of tree queries as compositions of branch queries. We also show that we can pre-evaluate queries by storing specific meta-information related to the mapping. We also consider the problem of query containment for tree queries, and we propose relaxation policies for queries without answers. The research presented here was motivated by the Xyleme project at INRIA, whose objective is to develop a data warehouse for Web XML documents.
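
    The branch-query characterization can be sketched in a few lines: split a tree query into its root-to-leaf label paths (branch queries), then accept a document when every branch matches. This boolean-only sketch is ours; it ignores conditions, answer variables and the mediated-schema mapping treated in the paper.

      # Trees and queries encoded as (label, [children]). Illustrative only.

      def branches(query):
          """All root-to-leaf label paths of a tree query."""
          label, children = query
          if not children:
              return [[label]]
          return [[label] + b for c in children for b in branches(c)]

      def matches_path(path, tree):
          """True if `path` can be read downwards from the root of `tree`."""
          label, children = tree
          if label != path[0]:
              return False
          return len(path) == 1 or any(matches_path(path[1:], c) for c in children)

      query = ("catalog", [("book", [("title", [])]), ("book", [("price", [])])])
      doc   = ("catalog", [("book", [("title", []), ("price", [])])])
      # a tree query matches iff all of its branch queries match
      print(all(matches_path(b, doc) for b in branches(query)))  # True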

  • Automatic Construction and Refinement of a Class Hierarchy over Semistructured Data. By Nathalie Pernelle, Marie-Christine Rousset and Veronique Ventos. Proceedings of Principles of Data Mining and Knowledge Discovery (PKDD 2001), also in Proceedings of the Ontology Learning IJCAI-2001 Workshop.

    In many applications, it becomes crucial to help users access a huge amount of data by clustering it into a small number of classes described at an appropriate level of abstraction. In this paper, we present an approach based on the use of two class-description languages for the automatic clustering of multi-valued data. The first language has a high power of abstraction and guides the construction of a lattice of classes covering the whole data set. The second language, more expressive and more precise, is the basis for the refinement of the part of the lattice that the user wants to focus on.

  • Combining Statistics and Semantics for Word and Document Clustering. By Alexandre Termier, Marie-Christine Rousset and Michele Sebag. Proceedings of the Ontology Learning IJCAI-2001 Workshop.

    A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process, where the underlying similarity reflects both statistical and semantic properties, respectively detected through Latent Semantic Analysis and WordNet. Sense Units are used to recode documents and are evaluated by the performance increase they permit in classification tasks. Experimental results show that accounting for semantic information in fact decreases performance compared to standalone LSA. The main weaknesses of the current hybrid scheme are discussed and several avenues for improvement are sketched.
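
    To make the hybrid scheme concrete, here is a minimal sketch that blends a statistical (LSA-style) word-similarity matrix with a semantic (WordNet-style) one and groups words by single-link clustering. The blending weight, the threshold and the clustering method are our assumptions, not the paper's exact construction.

      import numpy as np

      def sense_units(sim_lsa, sim_wordnet, words, alpha=0.5, threshold=0.7):
          """Cluster words whose blended similarity exceeds `threshold`."""
          sim = alpha * sim_lsa + (1 - alpha) * sim_wordnet
          parent = list(range(len(words)))      # union-find for single-link clusters
          def find(i):
              while parent[i] != i:
                  parent[i] = parent[parent[i]]
                  i = parent[i]
              return i
          for i in range(len(words)):
              for j in range(i + 1, len(words)):
                  if sim[i, j] >= threshold:
                      parent[find(i)] = find(j)
          units = {}
          for i, w in enumerate(words):
              units.setdefault(find(i), []).append(w)
          return list(units.values())           # each cluster is one "Sense Unit"

      words = ["car", "auto", "bank"]
      s_lsa = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
      s_wn  = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]])
      print(sense_units(s_lsa, s_wn, words))    # [['car', 'auto'], ['bank']]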

  • Construction et affinement automatiques de hiérarchies de classes à partir de données semi-structurées. By Nathalie Pernelle, Veronique Ventos and Marie-Christine Rousset. Proceedings of the Journées Francophones d'Extraction et de Gestion de Connaissances (EGC'2001), Nantes, January 2001.

    In many applications, it becomes crucial to help users access a huge amount of data by clustering it into a small number of classes described at an appropriate level of abstraction. In this paper, we present an approach based on the use of two class-description languages for the automatic clustering of semistructured data. The first language has a high power of abstraction and guides the construction of a basic lattice of classes covering the whole data set. The second language, more expressive and more precise, is the basis for the refinement of the part of the lattice that the user wants to focus on.

    Publications from 2000

  • The use of Carin language and algorithms for Information Integration: the Picsel system By Francois Goasdoue, Veronique Lattes and Marie-Christine Rousset. International Journal on Cooperative Information Systems, Vol. 9, No. 4 (2000), 383-401.

    Picsel is an information integration system over sources that are distributed and possibly heterogeneous. The approach chosen in Picsel is to define an information server as a knowledge-based mediator in which Carin is used as the core logical formalism to represent both the application domain and the contents of the information sources relevant to that domain. In this paper, we describe the way the expressive power of the Carin language is exploited in the Picsel information integration system while maintaining the decidability of query answering. We illustrate it on examples from the tourism domain, the first real case considered in Picsel, in collaboration with France Telecom (CNET) and the travel agency Degriftour.

  • Rewriting Conjunctive Queries using Views in Description Logics with Existential Restrictions By Francois Goasdoue and Marie-Christine Rousset. Proceedings of the International Workshop on Description Logics (DL'00), Aachen, Germany, 2000.

    In databases, rewriting queries using views has received significant attention because of its relevance to several fields such as query optimization, data warehousing, and information integration. In those settings, the data used to answer a query are restricted to be extensions of a set of predefined queries (views). Information integration is a typical setting in which queries must be rewritten using views in order to be answered. Users of information integration systems do not pose queries directly to the (possibly remote) sources in which data are stored, but to a set of virtual relations that have been designed to provide uniform and homogeneous access to a domain of interest. When the contents of the sources are described as views over the (virtual) domain relations, the problem of reformulating a user query into a query that refers directly to the relevant sources becomes a problem of rewriting queries using views. In this paper, we study the problem of rewriting conjunctive queries over DL expressions into conjunctive queries over a set of views, where the views are themselves distinguished DL expressions, for three DLs allowing existential restrictions: FLE, ALE and ALEN. For FLE, we present an algorithm that computes a representative set of all the rewritings of a query. In the full version of this paper (cf. technical report), we show how to adapt it to deal with the negation of atomic concepts in the queries and in the views, in order to obtain a rewriting algorithm for ALE. Finally, we show that obtaining a finite set representative of all the rewritings of a query is not guaranteed in ALEN.

  • A zero-one law for random sentences in description logics. By Bernard Ycart and Marie-Christine Rousset. Proceedings of the Colloquium on Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities.

    We consider the set of sentences in a decidable fragment of first order logic, restricted to unary and binary predicates, expressed on a finite set of objects. Probability distributions on that set are defined and studied. For large sets of objects, it is shown that under these distributions, random sentences typically have a very particular form, and that all monotone symmetric properties of these random sentences have a probability close to 0 or 1.
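
    Stated formally (our notation, not the paper's): writing \mu_n for the probability distribution over sentences on n objects, the result says that for every monotone symmetric property A of sentences,

      \[
        \lim_{n \to \infty} \Pr_{\varphi \sim \mu_n}\bigl[\varphi \text{ satisfies } A\bigr] \in \{0, 1\},
      \]

    which is the classical zero-one law pattern, here established for random sentences rather than random structures.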

    Publications from 1999

  • Query Expansion in Description Logics and CARIN By Marie-Christine Rousset. Proceedings of the AAAI Fall Symposium on Question Answering Systems, Cape Cod, November 99.

    In this paper, we address the problem of determining the expansions of a query in description logics possibly combined with Datalog rules, and we show its use for information integration. Expanding a query consists of determining all the ways of deriving it from atoms built on some distinguished predicates.

  • Backward Reasoning in Aboxes for Query Answering By Marie-Christine Rousset. Proceedings of the International Workshop on Description Logics (DL'99), Linkoping, Sweden, July 99, also in Proceedings of the 6th International Workshop on Knowledge Representation meets Databases (KRDB'99), Linkoping, Sweden, July 99.

    In this paper, we address the problem of answering (unions of) conjunctive queries on Aboxes. We propose a backward-chaining query evaluation based on query expansion. Expanding a query consists of determining all the ways of deriving it from atoms built on some distinguished predicates.
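
    Both this and the previous paper rest on the same expansion step. Below is a propositional Python sketch of it (the papers handle conjunctive queries with variables and description-logic atoms): rules are unfolded backwards until only distinguished predicates remain. The rules and predicate names are invented.

      # head -> alternative bodies; expansion stops at distinguished predicates
      rules = {
          "trip":   [["flight"], ["train"]],
          "flight": [["airline", "booking"]],
      }
      distinguished = {"train", "airline", "booking"}

      def expansions(goal):
          """All ways of deriving `goal` from distinguished predicates."""
          if goal in distinguished:
              return [[goal]]
          result = []
          for body in rules.get(goal, []):
              combos = [[]]
              for atom in body:         # combine the expansions of each body atom
                  combos = [c + e for c in combos for e in expansions(atom)]
              result.extend(combos)
          return result

      print(expansions("trip"))  # [['airline', 'booking'], ['train']]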

    Publications from 1998

  • Combining Horn Rules and Description Logics in CARIN By Alon Levy and Marie-Christine Rousset. Artificial Intelligence Journal, Volume 104, September 1998.

    50 pages

    We describe CARIN, a novel family of representation languages that combine the expressive power of Horn rules and of description logics. We address the issue of providing sound and complete inference procedures for such languages. We identify existential entailment as a core problem in reasoning in CARIN, and describe an existential entailment algorithm for the ALCNR description logic. As a result, we obtain a sound and complete algorithm for reasoning in non-recursive CARIN-ALCNR knowledge bases, and an algorithm for rule subsumption over ALCNR. We show that in general, the reasoning problem for recursive CARIN-ALCNR knowledge bases is undecidable, and identify the constructors of ALCNR causing the undecidability. We show two ways in which CARIN-ALCNR knowledge bases can be restricted while preserving sound and complete reasoning.

    This paper is an extended version of two papers that were earlier published in the conferences AAAI'96 and ECAI'96.
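
    To give a flavor of the hybrid statements CARIN admits, here is an invented example (our predicate, concept and role names, not the paper's): a Horn rule whose body mixes an ordinary predicate with concept atoms, one of them a complex concept expression,

      \[
        \mathit{riskySupplier}(x) \leftarrow
          \mathit{supplies}(x, y) \wedge
          \mathit{Plant}(y) \wedge
          (\exists\, \mathit{component}.\mathit{Toxic})(y)
      \]

    where supplies is an ordinary rule predicate, Plant and Toxic are concept names, and component is a role of the description logic component.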

  • The use of CARIN language and algorithms for Information Integration: the PICSEL project By Veronique Lattes and Marie-Christine Rousset. Proceedings of the ECAI'98 workshop on Intelligent Information Integration, August 1998, Brighton (UK).

    PICSEL is an information integration system over sources that are distributed and possibly heterogeneous. The approach chosen in PICSEL is to define an information server as a knowledge-based mediator in which CARIN is used as the core logical formalism to represent both the application domain and the contents of the information sources relevant to that domain. The goal of this paper is to show how we exploit the expressive power of CARIN in a way that avoids the general problem of query reformulation, by mixing the global-as-view and local-as-view approaches. We illustrate it on examples from the tourism domain, which is the first real case that we had to consider in PICSEL, in collaboration with the Web (and Minitel) travel agency Degriftour (http://www.degriftour.fr/).

  • Verification of Knowledge Bases based on Containment Checking, By Alon Levy and Marie-Christine Rousset. Artificial Intelligence Journal, Volume 101, Number 1-2, May 1998.

    27 pages

    Building complex knowledge-based applications requires encoding large amounts of domain knowledge. After acquiring knowledge from domain experts, much of the effort in building a knowledge base goes into verifying that the knowledge is encoded correctly. A knowledge base is verified if it can be shown that certain constraints always hold between the inputs and the outputs. We consider the knowledge base verification problem for Horn rule knowledge bases and for three kinds of constraints: I/O consistency constraints, I/O dependency constraints and input completeness constraints. For the first two cases, we establish tight complexity results on the problem, and show in what cases it is decidable. In the third case, we show that the problem is, in general, undecidable, and we identify two decidable cases. In our analysis we show how the properties of the problem vary depending on the presence of recursion in the Horn rules, the presence of interpreted predicates, and the presence of negation in the antecedents of the rules. Our approach to the verification problem is based on showing a close relationship to the problem of query containment, studied in the database literature. This connection also provides novel algorithms for the knowledge base verification problem. Finally, we provide the first algorithm for verifying hybrid knowledge bases that combine the expressive power of Horn rules and the description logic ALCNR.

    This paper is an extended version of a paper that was earlier published at AAAI'96, where it received a best paper award.
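
    The database connection at the core of the approach is the classical homomorphism test for conjunctive-query containment. The sketch below implements that test by brute force (the problem is NP-complete, so exponential enumeration is expected); the query encoding is ours, and the paper's translation from rules and constraints into queries is not shown.

      from itertools import product

      def contained_in(q1, q2):
          """q = (head_vars, body), body a list of (pred, (vars...)).
          True iff q1 is contained in q2, via the homomorphism criterion:
          q2 must map into the canonical database of q1."""
          head1, body1 = q1
          head2, body2 = q2
          facts = set(body1)                       # q1's variables frozen as constants
          consts = {a for _, args in body1 for a in args}
          vars2 = sorted({a for _, args in body2 for a in args})
          for assignment in product(consts, repeat=len(vars2)):
              h = dict(zip(vars2, assignment))
              if tuple(h[v] for v in head2) == tuple(head1) and \
                 all((p, tuple(h[a] for a in args)) in facts for p, args in body2):
                  return True
          return False

      # Q1(x) :- edge(x,y), edge(y,x)      Q2(x) :- edge(x,y)
      q1 = (("x",), [("edge", ("x", "y")), ("edge", ("y", "x"))])
      q2 = (("x",), [("edge", ("x", "y"))])
      print(contained_in(q1, q2))  # True: every answer of Q1 is an answer of Q2
      print(contained_in(q2, q1))  # False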

    Publications from 1997

  • Verifying the Web: a Position Statement By Marie-Christine Rousset. Proceedings of the 4th European Symposium on the Validation and Verification of Knowledge Based Systems (EUROVAV-97), June 26-28, 1997, Leuven, Belgium

    This paper investigates the issues raised by checking potential update anomalies of a web site. It is of course impossible to control the updates for the whole World Wide Web. However, it is conceivable to provide tools for pointing out potential missing updates for a given web site (i.e., a set of web pages, for instance related to a given matter or to a given institution, which is under the control of a web site manager). The assumption that we make in this paper is that modeling the semantics of the considered web site is possible, while modeling the semantics of the sources outside of it, which can be connected to it, is not. Therefore, two different update problems have to be distinguished, depending on whether the updates that have to be done inside the web site are triggered by changes occurring inside or outside the web site.

  • Verification of Knowledge Bases: a Unifying Logical View By Alon Levy and Marie-Christine Rousset. Proceedings of the 4th European Symposium on the Validation and Verification of Knowledge Based Systems (EUROVAV-97), June 26-28, 1997, Leuven, Belgium

    Notions of correctness and completeness of a KB are impossible to capture completely by a formal definition. However, when the knowledge base is represented in a declarative logical formalism, they can be approached by a logical analysis of its contents. A logical analysis of the knowledge base and constraints that are known to hold on the domain enables us to detect anomalies or discrepancies between the knowledge represented in the KB and the domain. This paper describes a unified logical framework for the verification problem of knowledge bases represented by logical rules (i.e., Horn rules and some extensions). We consider several instances of the verification problem, describe algorithms for verification, and establish the computational complexity of the verification problem. In particular, we consider the verification w.r.t. consistency constraints that are specified separately on the inputs and on the outputs. Next, we consider dependency constraints on the relations between inputs and outputs. Finally, we consider several extensions to our logical formalism that enable us to express richer classes of constraints. The extensions involve the use of negation and the use of constructors of description logics in the constraints.

  • Revision of Rule Bases By Fatma Bouali, Stephane Loiseau and Marie-Christine Rousset. Proceedings of the 4th European Symposium on the Validation and Verification of Knowledge Based Systems (EUROVAV-97), June 26-28, 1997, Leuven, Belgium

    Building a correct knowledge base (KB) is an incremental and difficult process, made of successive verification and revision steps. Verification steps aim at ensuring that a KB is correct. Revision steps aim at modifying the knowledge base in order to eliminate the anomalies that the verification steps have pointed out. In this paper, we propose a method for the revision of an incorrect rule base, based on characterizing the possible causes of the detected inconsistencies. The possible actions of revision that are considered are adding integrity constraints, deleting rules, and more interestingly refining existing rules by adding new conjuncts.

  • Rewriting Queries Using Views in Description Logics By Catriel Beeri, Alon Y. Levy and Marie-Christine Rousset. Proceedings of the 16th ACM Symposium on Principles of Database Systems (PODS-97), Tucson, May 1997

    The problem of rewriting queries using views is to find a query expression that uses only a set of views and is equivalent to (or maximally contained in) a given query. Rewriting queries using views is important for query optimization and for applications such as information integration and data warehousing. Description logics are a family of logics that were developed for modeling complex hierarchical structures, and can also be viewed as a query language with an interesting tradeoff between complexity and expressive power. We consider the problem of rewriting queries using views expressed in description logics and conjunctive queries over description logics. We show that if the view definitions do not contain existential variables, then it is always possible to find a rewriting that is a union of conjunctive queries, and furthermore, this rewriting produces the maximal set of answers possible from the views. If the views have existential variables, the rewriting may be recursive. We present an algorithm for producing a recursive rewriting that is guaranteed to be maximal when the underlying database forms a tree of constants. We show that, in general, it is not always possible to find a maximal rewriting.
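
    As a taste of the machinery on the relational side, here is the first step of a bucket-style rewriting sketch for plain conjunctive queries (the paper's description-logic setting is substantially harder): for each query atom, collect the views that could supply it. Combining bucket entries into candidate rewritings and checking containment are omitted, and all names are invented.

      # Queries and views as (head_vars, body); body atoms are (pred, (vars...)).
      views = {
          "v1": (("a", "b"), [("cites", ("a", "b"))]),
          "v2": (("b",),     [("journal", ("b",))]),
      }
      query_body = [("cites", ("x", "y")), ("journal", ("y",))]

      def buckets(query_body, views):
          """For each query atom, the views whose bodies mention its predicate.
          (A full bucket algorithm also checks variable compatibility.)"""
          return {
              atom: [name for name, (_, body) in views.items()
                     if any(p == atom[0] for p, _ in body)]
              for atom in query_body
          }

      print(buckets(query_body, views))
      # {('cites', ('x', 'y')): ['v1'], ('journal', ('y',)): ['v2']}
      # -> one candidate rewriting: join v1 and v2 on their shared variable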

    Publications from 1996

  • CARIN: A Representation Language Combining Horn Rules and Description Logics By Alon Y. Levy and Marie-Christine Rousset. Proceedings of European Conference on Artificial Intelligence, ECAI 96, Budapest, August 1996.

    We describe CARIN, a novel family of representation languages, which integrate the expressive power of Horn rules and of description logics. We address the key issue in designing such a language, namely, providing a sound and complete inference procedure. We identify existential entailment as a core problem in reasoning in CARIN, and describe an existential entailment algorithm for CARIN languages whose description logic component is ALCNR. This algorithm entails several important results for reasoning in CARIN, most notably: (1) a sound and complete inference procedure for non-recursive CARIN-ALCNR, and (2) an algorithm for determining rule subsumption over ALCNR.

  • Modeling and Verifying Complex Objects: A Declarative Approach Based on Description Logics By Marie-Christine Rousset and Pascale Hors. Proceedings of European Conference on Artificial Intelligence, ECAI 96, Budapest, August 1996.

    Description logics are good candidates for representing complex objects. Their expressive and computational properties have been extensively studied and are well understood. However, from a knowledge engineering perspective, the current reasoning services provided by description logic systems do not suffice for representing objects. Adding services (e.g., help for debugging) is necessary to make description logic systems useful tools for building object-centered knowledge bases. In this paper, we present a principled glass-box approach both for trace-based debugging and for supporting transparent modeling of whole-part relationships, using description logics. This approach has proven very useful in a real application: modeling components of power plants.

  • Verification of Knowledge Bases based on Containment Checking By Alon Levy and Marie-Christine Rousset. Proceedings of American Conference on Artificial Intelligence, AAAI 96, Portland, August 1996.

    Building complex knowledge-based applications requires encoding large amounts of domain knowledge. After acquiring knowledge from domain experts, much of the effort in building a knowledge base goes into verifying that the knowledge is encoded correctly. We consider the problem of verifying hybrid knowledge bases that contain both Horn rules and a terminology in a description logic. Our approach to the verification problem is based on showing a close relationship to the problem of query containment. Our first contribution, based on this relationship, is a thorough analysis of the decidability and complexity of the verification problem for knowledge bases containing recursive rules and the standard interpreted predicates. Second, we show that important new classes of constraints on correct inputs and outputs can be expressed in a hybrid setting, in which a description logic class hierarchy is also considered, and we present the first complete algorithm for verifying such hybrid knowledge bases.

  • The Limits on Combining Horn Rules with Description Logics By Alon Levy and Marie-Christine Rousset. Proceedings of American Conference on Artificial Intelligence, AAAI 96, Portland, August 1996.

    Horn rule languages have formed the basis for many Artificial Intelligence application languages, but are not expressive enough to model domains with a rich hierarchical structure. Description logics have been designed especially to model rich hierarchies. Several applications would significantly benefit from combining the expressive power of both formalisms. This paper focuses on combining recursive function-free Horn rules with the expressive description logic ALCNR, and shows exactly when a hybrid language with decidable inference can be obtained. First, we show that several of the core constructors of description logics lead by themselves to undecidability of inference when combined with recursive function-free Horn rules. We then show that without these constructors we obtain a maximal subset of ALCNR that yields a decidable hybrid language. Finally, we describe a restriction on the Horn rules that guarantees decidable inference when combined with all of ALCNR, and covers many of the common usages of recursive rules.