Robust Module-Based Data Management

October 21, 2013 Comments Off Posted in: Final year projects

Robust Module-Based Data Management
Franc¸ois Goasdoue´ and Marie-Christine Rousset
Abstract—The current trend for building an ontology-based data management system (DMS) is to capitalize on efforts made to
design a preexisting well-established DMS (a reference system). The method amounts to extracting from the reference DMS a piece
of schema relevant to the new application needs—a module—, possibly personalizing it with extra constraints w.r.t. the application
under construction, and then managing a data set using the resulting schema. In this paper, we extend the existing definitions of
modules and we introduce novel properties of robustness that provide means for checking easily that a robust module-based DMS
evolves safely w.r.t. both the schema and the data of the reference DMS. We carry out our investigations in the setting of description
logics which underlie modern ontology languages, like RDFS, OWL, and OWL2 from W3C. Notably, we focus on the DL-liteA dialect
of the DL-lite family, which encompasses the foundations of the QL profile of OWL2 (i.e., DL-liteR): the W3C recommendation for
efficiently managing large data sets.
Index Terms—Models and principles, database management, personalization, algorithms for data and knowledge management,
artificial intelligence, intelligent web services, Semantic Web
Ç
1 INTRODUCTION
IN many application domains (e.g., medicine or biology),
comprehensive schemas resulting from collaborative
initiatives are made available. For instance, SNOMED is
an ontological schema containing more than 400.000 concept
names covering various areas such as anatomy, diseases,
medication, and even geographic locations. Such wellestablished
schemas are often associated with reliable data
that have been carefully collected, cleansed, and verified,
thus providing reference ontology-based data management
systems (DMSs) in different application domains.
A good practice is therefore to build on the efforts made
to design reference DMSs whenever we have to develop our
own DMS with specific needs. A way to do this is to extract
from the reference DMS the piece of schema relevant to our
application needs, possibly to personalize it with extra
constraints w.r.t. our application under construction, and
then to manage our own data set using the resulting schema.
Recent work in description logics (DLs, [1]) provides
different solutions to achieve such a reuse of a reference
ontology-based DMS. Indeed, modern ontological languages—
like the W3C recommendations RDFS, OWL, and
OWL2—are actually XML-based syntactic variants of wellknown
DLs. All those solutions consist in extracting a
module from an existing ontological schema such that all
the constraints concerning the relations of interest for the
application under construction are captured in the module
[2]. Existing definitions of modules in the literature basically
resort to the notion of (deductive) conservative extension of
a schema or of uniform interpolant of a schema, a.k.a.
forgetting about noninteresting relations of a schema.
Ghilardi et al. [3] formalize those two notions for schemas
written in DLs and discusses their connection. Up to now,
conservative extension has been considered for defining a
module as a subset of a schema. In contrast, forgetting has
been considered for defining a module as only logically
implied by a schema (by definition forgetting cannot lead to a
subset of a schema in the general case). Both kinds of
modules have been investigated in various DLs, e.g., DL-lite
[4], [5], EL [6], [7], [8], and ALC [7], [9], [10].
In this paper, we revisit the reuse of a reference ontologybased
DMS in order to build a new DMS with specific
needs. We go one step further by not only considering the
design of a module-based DMS (i.e., how to extract a module
from a ontological schema): we also study how a modulebased
DMS can benefit from the reference DMS to enhance
its own data management skills. We carry out our
investigations in the setting of DL-lite, which is the
foundation of the QL profile of OWL2 recommended by
the W3C for efficiently managing large RDF data sets. RDF
is the W3C’s Semantic Web data model, which is rapidly
spreading in more and more applications, and can be seen
as a simple relational model restricted to unary and binary
predicates. In addition, DL-lite comes with efficient
inference algorithms [11] for querying RDF data through
(DL-lite) ontologies and for checking data consistency w.r.t.
integrity constraints (ICs) expressed in DL-lite.
Our contribution is to introduce and study novel properties
of robustness for modules that provide means for
checking easily that a robust module-based DMS evolves
safely w.r.t. both the schema and the data of the reference
DMS. From a module robust to consistency checking, for any
data update in a corresponding module-based DMS, we
show how to query the reference DMS for checking whether
the local update does not bring any inconsistency with the
data and the constraints of the reference DMS. From a
module robust to query answering, for any query asked to
648 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
. F. Goasdoue´ is with INRIA Saclay and Laboratoire de Recherche en
Informatique (LRI), Baˆtiment 650, Universite´ Paris-Sud, 91405 Orsay
Cedex, France. E-mail: fg@lri.fr.
. M.-C. Rousset is with the Institut Universitaire de France and Laboratoire
d’Informatique de Grenoble (LIG), Universite´ de Grenoble, 681 rue de la
Passerelle, BP 72, 38402 St. Martin d’Heres Cedex, France.
E-mail: Marie-Christine.Rousset@imag.fr.
Manuscript received 8 Mar. 2011; revised 29 Sept. 2011; accepted 30 Nov.
2011; published online 8 Dec. 2011.
Recommended for acceptance by J. Dix.
For information on obtaining reprints of this article, please send e-mail to:
tkde@computer.org, and reference IEEECS Log Number TKDE-2011-03-0113.
Digital Object Identifier no. 10.1109/TKDE.2011.255.
1041-4347/13/$31.00 2013 IEEE Published by the IEEE Computer Society
a module-based DMS, we show how to query the reference
DMS for obtaining additional answers by also exploiting the
data stored in the reference DMS.
It is worth noticing that our investigations are sustained
by real use cases. For instance, the MyCF DMS (MyCorporisFabrica,
www.mycorporisfabrica.org, [12]) has been built
by hand from the FMA DMS (Foundational Model of
Anatomy, sig.biostr.washington.edu/projects/fm). The extraction
step has focused on particular parts of the human
body (e.g., hand, foot, and knee), while the personalization
step has enriched the descriptions of these parts with both
3D geometrical and bio-mechanical information. Notably,
careful attention was paid so that MyCF still conforms with
FMA at the end of the manual process.
The paper is organized as follows: We start with an
illustrative example in Section 2 that highlights the issues
and solutions on which we elaborate in the rest of the paper.
In Section 3, we present the DL-lite description logic, which
provides the formal basis of Section 4, in which we study
robust modules and safe personalization. In Section 5, we
provide algorithms and complexity results for extracting
robust modules from schemas and for checking the safe
personalization of modules. We conclude in Section 6 with
related work and perspectives.
2 ILLUSTRATIVE EXAMPLE
Consider a reference DMS for scientific publications (like
DBLP) defined by the ontological schema O and the data set
D in Fig. 1.
The schema O is built upon the unary relations
Publication, ConfPaper, ShortPaper, FullPaper,
JournPaper, Survey, and the binary relations hasTitle,
hasDate, hasVenue, and hasAuthor. It consists of
inclusion constraints and of integrity constraints (disjointness
and functional constraints). These constraints
are written in Fig. 1 using DL-lite, in which 9r denotes
the usual (relational) projection on the first attribute of the
binary relation r and ðfunct rÞ denotes the functional
dependency from the first attribute of the binary relation r
to the second one. The constraints in O state that any
publication has a single title (step 1), a single date of
publication (step 2), a single venue (step 3), and at least one
author (step 4). In addition, only publications have a title
(step 5), papers in conference proceedings or in journals
(which are disjoint) are publications (step 6), short papers or
full papers (which are disjoint) are papers in conference
proceedings, and surveys are journal papers (step 7).
The data set D consists of instances for the relations in O.
It is expressed as relational tables in Fig. 1. In particular,
those tables state that:
. doi1 is the Digital Object Identifier1 (DOI) of the full
paper entitled “Complexity of Answering Queries
Using Materialized Views” and published in
PODS ’98 by Serge Abiteboul (”SA”) and Oliver M.
Duschka (”OD”),
. doi2 is the DOI of the survey entitled “Answering
queries using views: A survey” and published in
VLDB Journal in 2001 by Alon Y. Halevy (”AH”), and
. doi3 is the DOI of the journal paper entitled
“MiniCon: A scalable algorithm for answering
queries using views” and published in VLDB
Journal in 2001 by Rachel Pottinger (”RP”) and Alon
Y. Halevy (”AH”).
It is worth noticing here that in contrast with the
relational model, data management in DLs needs some
reasoning to exhibit all the relevant implicit data regarding
a given task, e.g., consistency checking or query answering.
For instance, doi1 does not have to be explicitly stored in the
ConfPaper and Publication tables due to the inclusion
constraints in O, while it implicitly belongs to those
relations due to these constraints.
2.1 Designing a Module-Based DMS
Suppose that we have to develop a DMS about scientific
publications, e.g., for a company or a university. If we are
interested in managing journal papers and their authors
only, we can extract a module from O w.r.t. the relations of
interest JournPaper and hasAuthor. A possible module O0
consists of the constraint JournPaper v 9hasAuthor.
Suppose now that the person in charge of populating this
module-based DMS stores by mistake doi1 in the local
JournPaper table, and its authors ”SA” and ”OD” in the
local hasAuthor table, as illustrated in Fig. 2.
2.2 Global Consistency: Illustration
It is easy to see that though our module-based DMS is
consistent, it is inconsistent together with the reference
DMS: doi1 is a journal paper in our DMS, while it is a
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 649
Fig. 1. A reference DMS defined by the schema O and the data set D.
1. http://www.doi.org.
conference paper in the reference DMS. This violates a
constraint of the reference schema ((step 6) in O).
Detecting this kind of inconsistency, called a global
inconsistency, is important since it indicates that some of
our data contradicts the reference DMS, and thus is
probably erroneous. Our basic idea is therefore to use the
whole reference DMS (schema and data) as extra constraints
to be satisfied by a module-based DMS. Of course, we do
not want to import the whole reference DMS into our own
DMS in order to do this. Instead, we extend the notion of
module to robustness to consistency checking, so that global
consistency checking can be performed on demand or upon
update: We ensure that the module captures the (possibly
implied) constraints in the reference schema that are
required to detect inconsistency related to the relations of
interest. Then, at global consistency checking time, those
constraints are verified against the distributed data set
consisting of the data set of the module-based DMS plus
that of the reference DMS.
Making our module O0 robust to consistency checking
requires adding (integrity) constraints like JournPaper v
:FullPaper which allows detecting inconsistency related to
the relation of interest JournPaper. Note that this constraint
brings the relation FullPaper into the module, while it is not
of interest w.r.t. our application needs. At global consistency
checking time, the constraints in the module that allow
detecting inconsistency w.r.t. the relation of interests are
verified by evaluating a boolean union of conjunctive
queries QðÞ: ½9x JournPaperðxÞ ^ FullPaperðxÞ _
which looks for the existence of counter-examples to any
of those constraints. Here, the first conjunctive query in QðÞ
looks for a possible counterexample to JournPaper v
:FullPaper. The subtle point is that the evaluation of QðÞ
is distributed among the module-based DMS and the
reference one. As a result, the query to evaluate against the
DMSs is QðÞ: ½9x ðJournPaperðxÞ _ JournPaperrefðxÞÞ ^
FullPaperref ðxÞ _ where the distributed evaluation is
reflected in the names of the relations: r denotes a local
relation, while rref denotes the corresponding relation in the
reference DMS. The above QðÞ exhibits a global inconsistency
due to doi1 belonging to the local JournPaper table of
our module-based DMS and to the FullPaperref table of the
reference DMS.
2.3 Global Answers: Illustration
Suppose now that our DMS can answer conjunctive
queries (a.k.a. select-project-join queries), e.g., QðxÞ :
JournPaperðxÞ ^ hasAuthorðx; ’’AH’’Þ asking for the journal
papers written by Alon Y. Halevy. In some situation, it
is interesting to provide answers from our DMS together
with the reference one, called global answers, typically
when our own DMS provides no or too few answers. To
do so, we extend the notion of module to robustness to
query answering, so that global query answering can be
performed on demand. We ensure that the module
captures the knowledge in the reference schema that is
required to answer any query built upon the relations of
interest. Then, at global query answering time, this
knowledge is used to identify the relevant data for a
given query within the distributed data set consisting of the
data set of the module-based DMS plus that of the
reference DMS.
Making O0 robust to query answering requires adding
inclusion constraints like Survey v JournPaper which
allows exhibiting implicit tuples for the relation of interest
JournPaper: those explicitly stored for the relation Survey.
Again, such a constraint brings the relation Survey into the
module, while it is not of interest w.r.t. our application
needs. At global query answering time, the constraints in
the module that allow answering a given query built upon
relations of interest are used to reformulate this query into a
union of conjunctive queries
QðxÞ: ½JournPaperðxÞ ^ hasAuthorðx; ’’AH’’Þ
_ ½SurveyðxÞ ^ hasAuthorðx; ’’AH’’Þ _
which models all the ways to answer it from a data set.
Here, the second conjunctive query in QðxÞ results from the
inclusion constraint Survey v JournPaper. Again, since the
data set is distributed among the module-based DMS and
the reference one, the query to evaluate in fact is
QðxÞ: ½ðJournPaperðxÞ _ JournPaperref ðxÞÞ
^ ðhasAuthorðx; ’’AH’’Þ _ hasAuthorref ðx; ’’AH’’ÞÞ
_ ½Surveyref ðxÞ ^ ðhasAuthorðx; ’’AH’’Þ
_ hasAuthorref ðx; ’’AH’’ÞÞ _ . . . :
In particular, QðxÞ finds doi2 and doi3 as global answers,
due to the presence in the reference DMS of: doi2 in the
Surveyref table, ðdoi2; ’’AH’’Þ in the hasAuthorref table, doi3
in the JournPaperref table, and ðdoi3; ’’AH’’Þ in the
hasAuthorref table.
2.4 Safe Personalization: Illustration
Suppose now that a (possibly robust) module does not meet
all the constraints for our application under development. A
personalization step—which amounts to adding the appropriate
constraints—is thus necessary. However, it must be
carefully done since personalizing can lead to loose global
data management skills (i.e., robustness) or even the
essence of the notion of module. To prevent this, we exhibit
sufficient conditions for a safe personalization.
For instance, suppose that we personalize O0 with the
constraints hasAuthor v hasRightsOn and
9hasRightsOn v JournPaper
in order to express that any author of a journal paper has
some rights on that paper, the notion of rights concerning
only journal papers. Note that in DLs, r denotes the inverse
of the binary relation r, i.e., the relation obtained by
swapping its two attributes. Thus, 9r denotes the usual
(relational) projection on the second attribute of r. Adding
the above constraints to O0 leads to the implied constraint
9hasAuthor v JournPaper, which makes sense w.r.t. the
650 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
Fig. 2. A module-based DMS defined by the schema O0 and the
data set D0.
reference DMS as it is built upon relations in O only. Yet, this
constraint does not hold in the reference DMS. As a result,
the personalization of O0 mentioned above is not safe. In fact,
the extra-constraint 9hasAuthor v JournPaper makes the
reference DMS insconsistent: on one hand, conference
papers are declared disjoint from journal papers, on
the other hand, by having authors they are inferred by the
extra constraint as being journal papers, therefore making
any data set including a conference paper inconsistent.
2.5 Reducing Data Storage: Illustration
Robust module-based DMSs offer an interesting peculiarity
w.r.t. data storage. Indeed, global data management is
performed on a data set that is distributed among the
module-based DMS and the reference one. Notably,
redundancy can occur in the distributed data set when
some same instances of the relations of interest are both
stored in the module-based DMS and stored—explicitly or
implicitly—in the reference DMS. Therefore, a way of
reducing data storage in a robust module-based DMS is to
store only data that are not already somehow stored in the
reference DMS. This can be easily checked by asking queries
to this DMS.
For instance, consider a robust version of the module
O0 that we safely personalize with the constraints
hasContactAuthor v hasAuthor and (funct hasContact
Author) stating that having a single contact author is a
particular case of having an author. If the purpose of our
DMS is only to store the contact authors for the journal
papers of the reference DMS, the corresponding modulebased
DMS with minimal storage is depicted in Fig. 3. In
particular, nothing is stored in the local tables JournPaper
and hasAuthor for the sake of nonredundancy: doi2 and
doi3 are not stored locally in JournPaper because doi2 is
explicitly stored in JournPaperref and doi3 is implicitly
stored in JournPaperref since it is explicitly stored in
Surveyref and Survey v JournPaper holds in O.
It is worth noticing that distributing data storage may be
done in order to optimize the distributed evaluation of
queries on relations of interest over the module-based DMS
and the reference one. For instance, consider the query
asking for the authors of journal papers whose contact
author is Rachel Pottinger: QðxÞ: 9y JournPaperðyÞ ^
hasAuthorðy; xÞ ^ hasContactAuthorðy; ’’RP’’Þ. The possible
paper identifiers for whom Rachel Pottinger is the
contact author are obtained locally from the local table
hasContactAuthor: here, we obtain doi3 only. Verifying that
doi3 is a journal paper and getting its authors is not done
locally but by querying the reference DMS. In particular, the
fact that doi3 is a journal paper is obtained by inference from
the storage of doi3 in the reference table Surveyref and from
the constraint Survey v JournPaper declared in the reference
schema. Sharing data storage this way avoids
redundancy in the distributed evaluation of the query Q
over the module-based DMS and the reference one.
3 DL-LITE DATA MODEL
Generally speaking, in DLs [1], a schema is called a Tbox
and its associated data set is called an Abox.
A Tbox T is defined upon a signature (a.k.a. vocabulary),
denoted sigðT Þ, which is the disjoint union of a set of unary
relations called atomic concepts and a set of binary relations
called atomic roles. It consists of a set of constraints called
terminological axioms, typically inclusion constraints between
complex concepts or roles, i.e., unary or binary DL
formulae built upon atomic relations using the constructors
allowed in DL under consideration.
An Abox defined upon sigðT Þ is a set of facts called
assertional axioms, relating DL formulae to their instances.
A knowledge base (KB) K ¼ hT ; Ai is made of a Tbox T and
an Abox A. The legal KBs vary according to the DL used to
express terminological and assertional axioms, and to the
restrictions imposed on those axioms.
In this paper, we focus on DL-lite KBs.
3.1 DL-Lite KBs
In DL-lite, the concepts and roles that can be built from
atomic concepts and atomic roles are of the following form:
B ! A j 9R, C ! B j :B, R ! P j P, E ! R j :R where A
denotes an atomic concept, P an atomic role, and P the inverse
of P. B denotes a basic concept (i.e., an atomic concept A or an
unqualified existential quantification on a basic role 9R) and R a
basic role (i.e., an atomic role P or its inverse P). Finally, C
denotes a general concept (i.e., a basic concept or its negation)
and E a general role (i.e., a basic role or its negation).
The (set) semantics of concepts and roles is given in
terms of interpretations. An interpretation I ¼ ðI; :I Þ consists
of a nonempty interpretation domain I and an
interpretation function :I that assigns a subset of I to each
atomic concept, and a binary relation over I to each
atomic role. The semantics of nonatomic concepts and
nonatomic roles are defined as follows:
. ðPÞI ¼ fðo2; o1Þ j ðo1; o2Þ 2 PIg,
. ð9RÞI ¼ fo1 j 9o2 ðo1; o2Þ 2 RIg, and
. ð:BÞI ¼ InBI and ð:RÞI ¼ I InRI .
The axioms allowed in a Tbox of DL-lite2 are concept
inclusion constraints of the form B v C, role inclusion
constraints of the form R v E, and functionality constraints
on roles of the form ðfunct RÞ. It is important to note that
general concepts or roles (i.e., with negation) are only
allowed on the right hand side of inclusion constraints,
whereas only basic concepts or roles (i.e., without negation)
occur on the left hand side of such constraints. Moreover,
only basic roles occur in functionality constraints.
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 651
Fig. 3. A nonredundant robust module-based DMS defined by the
schema O00 and the data set D00.
2. These axioms actually correspond to the dialect DL-liteFR of DL-lite.
Inclusions of the form B1 v B2 or R1 v R2 are called
positive inclusions (PIs), while inclusions of the form B1 v
:B2 or of the form R1 v :R2 are called negative inclusions
(NIs). PIs allow expressing inclusion dependencies, while
NIs and functionalities allow expressing integrity constraints.
For instance, in Fig. 1, the PI 9hasTitle v
Publication expresses that only publications have titles,
the NI ConfPaper v :JournPaper expresses that conference
proceeding papers and journal papers must be disjoint, and
the functionality ðfunct hasTitleÞ expresses the usual
(relational) functional dependency from the first attribute
of hasTitle to the second one.
An interpretation I ¼ ðI; :I Þ is a model of an inclusion
B v C (resp. R v E) if BI CI (resp. RI EI ). It is a model
of a functionality constraint ðfunct RÞ if the binary relation RI
is a function, i.e., ðo; o1Þ 2 RI and ðo; o2Þ 2 RI implies
o1 ¼ o2. I is a model of a Tbox if it is a model of all of its
constraints. A Tbox is satisfiable if it has a model. A Tbox T
logically entails (a.k.a. implies) a constraint , written T ,
if every model of T is a model of . For instance, in Fig. 1,
the PI JournPaper v 9hasAuthor is implied by O due to the
presence of JournPaper v Publication and Publication v
9hasAuthor in O. Finally, a Tbox T logically entails (a.k.a.
implies) a Tbox T 0, written T T0 , if every model of T is a
model of T 0; and two Tboxes T and T 0 are logically
equivalent, written T T0 , iff T T0 and T 0 T .
An Abox consists of a finite set of membership
assertions of the form AðaÞ and Pða; bÞ, i.e., on atomic
concepts and on atomic roles, stating, respectively, that a
is an instance of A and that the pair of constants ða; bÞ is
an instance of P. The interpretation function of an
interpretation I ¼ ðI; :IÞ is extended to constants by
assigning to each constant a a distinct object aI 2 I
(i.e., the so called unique name assumption holds). An
interpretation I is a model of the membership assertion AðaÞ
(resp. Pða; bÞ) if aI 2 AI (resp., ðaI; bIÞ 2 PI ). It is a model of
an Abox if it satisfies all of its assertions.
An interpretation I is a model of a KB K ¼ hT ; Ai if it is a
model of both T and A. A KB K is satisfiable (a.k.a.
consistent) if it has at least one model. A KB K logically entails
(a.k.a. implies) a constraint or assertion , written K , if
every model of K is a model of . For instance, in Fig. 1, the
KB hO; Di of the DMS implies JournPaperðdoi2Þ due to
Survey v JournPaper in O and Surveyðdoi2Þ in D.
Observe that any KB can be written equivalently as a first
order logic (FOL) KB and a relational database following
the open-world assumption (OWA). The correspondences for
Tbox constraints are summarized in Fig. 4 for PIs, in Fig. 5
for NIs, and in Fig. 6 for functionalities. As for Abox
assertions, they are simply FOL facts (i.e., ground atoms)
and instances for atomic concepts and roles. Remark that, in
constrast with usual databases following the closed-world
assumption (CWA), databases following OWA have incompletely
specified data and their schemas (i.e., their constraints)
are used to infer implicit facts, in addition to detect
data inconsistencies. For example, we have seen above that
hO; Di JournPaperðdoi2Þ, while doi2 is not explicitly
stored in D in Fig. 1. That is, doi2 is an implicit instance of
JournPaper in D. This peculiarity has an important impact
on most of the classical data management tasks like
consistency checking and query answering [13].
3.2 Queries over a KB
A query q is of the form qðxÞ: 9y ðx; yÞ where ðx; yÞ is a
FOL formula, the variables of which are only the free
variables x and the bound variables y, and the predicates of
652 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
Fig. 4. DL-lite PI axioms in FOL and relational notations. For the relational notation, which corresponds to unary and binary inclusion dependencies,
we assume that the first and second attributes of any atomic role are named 1 and 2, respectively.
Fig. 5. DL-lite NI axioms in FOL and relational notations. For the relational notation, which corresponds to exclusion/disjointness dependencies, we
assume that the first and second attributes of any atomic role are named 1 and 2, respectively. We also assume that ? the empty relation.
Fig. 6. DL-lite functionality axioms in FOL and relational notations. For
the relational notation, which corresponds to functional dependencies,
we assume that the first and second attributes of any atomic role are
named 1 and 2, respectively.
which are either atomic concepts or roles of the KB. The arity
of a query is the number of its free variables, e.g., 0 for a
boolean query. When ðx; yÞ is of the form conjðx; yÞ where
conjðx; yÞ is a conjunction of atoms, q is called a conjunctive
query. Conjunctive queries, a.k.a. select-project-join queries,
are the core database queries.
Given an interpretation I ¼ ðI; :IÞ, the semantics qI of a
boolean query q is defined as true if ½ð;; yÞI ¼ true, and
false otherwise, while the semantics qI of a query q of arity
n 1 is the relation of arity n defined on I as follows:
qI ¼ fe 2 ðIÞn j ½ðe; yÞI ¼ trueg. An interpretation that
evaluates a boolean query to true, respectively, a nonboolean
query to a non empty set, is a model of that query.
3.3 Answer Set of a Query over a KB
Let q be a query over a KB K ¼ hT ; Ai.
If q is nonboolean, the answer set of q over K is defined
as: ansðq; KÞ ¼ ft 2 Cn j K qðtÞg where C is the set of
constants appearing in the KB, qðtÞ is the closed formula
obtained by replacing in the query definition the free
variables in x by the constants in t, and K qðtÞ means as
usual that every model of K is a model of qðtÞ.
If q is boolean, the answer set of q over K is by
convention either ftrueg or ffalseg: ansðq; KÞ ¼ ftrueg if
and only if K qðÞ, i.e., every model of K is a model of qðÞ.
This corresponds to the so-called certain answers semantics
requiring that an answer to a query, given a set of
constraints (expressed here as a Tbox), to be an answer in
all the models satisfying the constraints.
3.4 FOL-Reducibility of Data Management
The DL-Lite family [11] has been designed so that data
management is FOL-reducible. This property allows reducing
a data management task over a KB hT ; Ai to the evaluation
against A only of a FOL query computed using T only.
The main idea of FOL-reducibility is to be able to
perform a data management task in two separate steps: a
first reasoning step that produces the FOL query and a
second step which evaluates that query in a pure relational
fashion. Indeed, FOL queries can be processed by SQL
engines, thus taking advantage of well-established query
optimization strategies supported by standard relational
database management systems.
In fact, FOL-reducibility of data management holds in DLlite
only if we forbid functionality constraints on roles
involved in right-hand sides of role inclusion constraints.3
For instance, in Fig. 3, having hasContactAuthor v
hasAuthor and ðfunct hasContactAuthorÞ in O00 is legal,
while having hasContactAuthor v hasAuthor and (funct
hasAuthor would be illegal. In the following, we only
consider DL-lite Tboxes and KBs in which this restriction
holds. Note that, as shown in [11], if we do not impose the
above restriction on DL-lite, instance checking (a particular
case of query answering) is P-complete in data complexity,
ruling out FOL-reducibility of query answering since data
complexity of answering a FOL query is inAC0
P [13], [14].
3.4.1 Consistency
It has been shown in [11] that given a Tbox T , it is always
possible to construct a FOL query qunsat such that
ansðqunsat; AÞ ¼ ftrueg4 iff the KB hT ; Ai is inconsistent,
for any Abox A associated with T .
Building the qunsat query relies on the computation of the
IC-closure of T , i.e., the set of the integrity constraints (NI or
functionality constraints) that are implied by T : each
constraint in the IC-closure is transformed into a conjunctive
boolean query looking for counterexamples to it; qunsat is the
union of these unsat queries. The transformation of the
integrity constraints into unsat queries corresponds in fact to
their negation and is summarized in Fig. 7 for NIs and in Fig. 8
for functionalities. For instance, the unsat query corresponding
to the negation of the NI JournPaper v :ConfPaper is
qðÞ: 9xJournPaperðxÞ ^ ConfPaperðxÞ.
3.4.2 Query Answering
It has been shown in [11] that given a Tbox T and for any
query q built upon atomic concepts and roles of T , it is
always possible to construct a FOL query qrew (called its
perfect rewriting) such that ansðq; hT ; AiÞ ¼ ansðqrew; AÞ for
any Abox A associated with T .
Notably, [11] provides the PerfectRefðq;T Þ algorithm
which computes the perfect rewriting qrew of q using—PIs
of—T only (i.e., independently of A), which is a union of
conjunctive queries built upon atomic concepts and roles
of T .
4 MODULE-BASED DATA MANAGEMENT
The main idea underlying the notion of module of a Tbox is
to capture some constraints of the Tbox, including all the
(implied) constraints built upon a given signature, denoted
the signature of interest.
Our definition of module extends and encompasses the
existing definitions. In contrast with [3], [4], [7], [9], we do
not impose modules of a Tbox to be subsets of it. For a
module to capture some constraints of the Tbox, it is indeed
sufficient to impose that it is logically entailed by the Tbox.
In contrast with [5], [6], [8], [10], we do not impose the
signature of modules to be restricted to the signature of
interest. In fact, as we have shown through the illustrative
example, the robustness properties may enforce the
signature of modules to contain additional relations that
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 653
3. This corresponds to the dialect DL-liteA of DL-lite.
4. By a slight abuse of notation, we denote hereinafter the answer set
ansðq; h;; AiÞ of a query q over a KB h;; Ai by ansðq; AÞ, which corresponds
exactly to the standard relational evaluation of q against the relational
database A.
Fig. 7. From NI axioms to unsat queries.
Fig. 8. From functionality axioms to unsat queries.
are not relations of interest but that are logically related to
them.
Definition 1 (Module). Let T be a Tbox and sigðT Þ a
signature of interest. A module of T w.r.t. is a Tbox T
such that:
. sigðT Þ sigðT Þ,
. T T, and
. for any Tbox constraint built upon , T iff
T .
Notations. For distinguishing the relations of interest in the
signature of a module T from those possibly imported from
the reference Tbox for robustness purposes, we use the
following notations: r denotes a relation of interest (i.e., in
), while rref denotes a relation of the reference Tbox.
We denote sigþðT Þ the set difference between the
signature of T and , i.e., the set of relations r1
ref ; . . . ; rk
ref
of the reference Tbox that are involved in constraints of the
module T . Later on, we will denote rmod the novel relations
that may be added to the signature of a module for
personalization purposes.
Example (continued). Consider the reference Tbox O of the
running example. Consider the signature of interest
¼ fJournPaper; HasAuthorg. Let T 1
and T 2
be the
following Tboxes: T 1
¼ fJournPaper v 9hasAuthorg
and T 2
¼ fJournPaper v 9hasAuthor; JournPaper v
:ConfPaperrefg.
T 1
and T 2
are both modules of O w.r.t. : they use
relations from O only; without being subsets of O, they
are implied by O (JournPaper v 9hasAuthor is implied
by JournPaper v Publicationref and Publicationref v
9hasAuthor in O, and JournPaper v :ConfPaperref is
equivalent to ConfPaperref v :JournPaper in O); the
only constraint built upon that is implied by O is
JournPaper v 9hasAuthor, which is in both T 1
and T 2

(and thus implied by them).
Here, sigþðT 1
Þ ¼ ; since T 1
involves only relations of
interest (i.e., in ), while sigþðT 2
Þ ¼ fConfPaperrefg.
It is worth noticing that, as the above example shows, a
module of a Tbox w.r.t. a signature of interest may not be
unique.
4.1 Robust Module-Based Data Management
We define now the two notions of robustness for modules
that have been illustrated in Section 2.
Notations. From now on, A=sig denotes the restriction of an Abox
A to the assertions of A built upon the signature sig only.
4.1.1 Robustness to Consistency Checking
A module of a Tbox T w.r.t. a signature is robust to
consistency checking (Definition 2) if it contains enough
constraints to detect data inconsistency due to .
Definition 2 (Robustness to consistency checking). Let T
be a module of a Tbox T w.r.t. a signature sigðT Þ. T is
robust to consistency checking iff for every Abox A built
upon and for every Abox A
built upon sigðT Þn that is
consistent with T : hT ;A [A
i is consistent iff hT ; ðA [
A
Þ=sigðT Þi is consistent.
Example (continued). Consider again the above two
modules T 1
and T 2
of O w.r.t. ¼ fJournPaper;
hasAuthorg.
Neither T 1
nor T 2
is robust to consistency checking,
as shown by the following counterexample. Let T be
either T 1
or T 2
, A be fPublicationðaÞg, and A
be
fFullPaperrefðaÞg. A
=sigðT Þ is empty, thus ðA [
A
Þ=sigðT Þ ¼ fPublicationðaÞg, so hT ; ðA [A
Þ=sigðT Þi
is consistent. However, hO;A [A
i is inconsistent:
FullPaperref v :JournPaper is implied by
FullPaperref v ConfPaperref
and ConfPaperref v :JournPaper in O, and violated by
ðA [A
Þ ¼ fJournPaperðaÞ; FullPaperrefðaÞg.
The algorithms and results of Section 5 will allow
checking that the following T 3
is robust to consistency
checking:
T 3
¼ fJournPaper v 9hasAuthor;
JournPaper v :ConfPaperref ;
JournPaper v :FullPaperref ;
JournPaper v :ShortPaperref ;
Surveyref v JournPaperg:
Theorem 1 shows that global consistency (see Definition 3)
of a module-based DMS built from a module robust to
consistency checking can be verified by restricting the
construction of the unsat queries to the integrity constraints
implied by the module, and then by evaluating their union
against the data set distributed among the module-based
DMS and the reference one.
Definition 3 (Global consistency). Let hT ; Ai and hT 0;A0i be
two consistent KBs. hT 0;A0i is globally consistent w.r.t.
hT ; Ai iff hT [ T 0;A[A0 i is consistent.
In the following, we consider the global consistency of a
module-based KB hT ;A0i w.r.t. a reference KB hT ; Ai
where T is a module of T . From now on, if it is clear from
the context, we omit to mention the reference KB when we
refer to global consistency.
Theorem 1 (Global consistency checking). Let hT ; Ai and
hT ;A0i be consistent KBs such that T is a module robust to
consistency checking of T w.r.t. a signature , and A0 is an
Abox built upon . Let qunsatðT Þ be the union of unsat
queries obtained from the IC-closure of T . Then: hT ;A0i is
globally consistent iff ansðqunsatðT Þ;A=sigðT Þ [A0Þ ¼
ffalseg.
Proof. We first show, based on the standard first order
semantics of DL-lite, that: hT ;A[A0i is consistent iff
hT ;A=sigðT Þ [A0i is consistent. For one direction, this
follows from: hT ;A[A0 i hT ; ðA [ A0Þ=sigðT Þi with
ðA [ A0Þ=sigðT Þ ¼ A=sigðT Þ [A0. For the converse direction,
hT ;A=sigðT Þ [A0i is consistent, so hT ; ðA [
A0Þ=sigðT Þi is also consistent, since ðA [ A0Þ=sigðT Þ ¼
A=sigðT Þ [A0. Now, consider the subset ðA [ A0Þ
of A[
A0 built upon sigðT Þn, and its complement ðA [ A0Þ.
654 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
We therefore get: hT ; ððA [ A0Þ [ ðA[A0Þ
Þ=sigðT Þi is
consistent. By construction, ðA [ A0Þ
A. Since the
KB hT ; Ai is consistent and hT ; Ai hT ; ðA [ A0Þ
i,
hT ; ðA [ A0Þ
i is also consistent. As a result, Definition 2
applies and we get: hT ; ðA [ A0Þ [ ðA[A0 Þ
i is consistent,
i.e., hT ;A[A0i is consistent.
Theorem 1 then follows from the two-step process
for checking consistency in a DL-liteKB hT ; Ai: computation
of the IC-closure of the Tbox T and translation of
each IC in the IC-closure into a conjunctive unsat query;
the KB is consistent iff the union qunsat of all the unsat
queries is evaluated to false against A seen as a
relational database. Since checking the global consistency
(i.e., whether hT ;A[A0i is consistent) is equivalent
to checking the consistency of hT ;A=sigðT Þ [A0i, as
shown above, it can be done by evaluating the union of
unsat queries obtained from the IC-closure of T
(instead of T ) against the distributed data set A=sigðT Þ [
A0 (instead of A[A0). tu
Corollary 1 provides a simple way to optimize global
consistency checking by evaluating a necessary and sufficient
subset of the unsat queries obtained from the IC-closure of a
module robust to consistency checking. It directly follows
from the fact that in Theorem 1 each KB hT ; Ai and hT ;A0i
taken in isolation is consistent. That is, the only (implied)
ICs that can be violated are those requiring the two Aboxes
in order to be checked.
Corollary 1. Let hT ; Ai and hT ;A0i be consistent KBs such that
T is a module robust to consistency checking of T w.r.t. a
signature , and A0 is an Abox built upon . Then: hT ;A0i is
globally consistent iff ansðqopt
unsatðT Þ;A=sigðT Þ [A0Þ ¼
ffalseg, where qopt
unsatðT Þ is the union of the unsat queries
obtained from the IC-closure of T , without considering the
integrity constraints involving relations from sigþðT Þ only
(i.e., of the form rref ).
Example (continued). The NI Surveyref v :FullPaperref is
in the IC-closure of the module T 3
: it is implied by
Surveyref v JournPaper and
JournPaper v :ConfPaperref
in T 3
. However, the corresponding unsat query is
useless to check global consistency (as it would check
the consistency of the reference DMS which is known to
be consistent), and can be discarded from the unsat
queries to evaluate.
A global inconsistency may arise from an update of the
reference DMS or of the module-based DMS. Using the
above results, such a global inconsistency can be checked on
demand or on a periodical basis.
In addition, global consistency can also be checked upon
each module-based DMS’s update. Corollary 2 provides a
way to simplify the unsat queries to be posed to the
reference DMS, in function of the update.
In the following, we assume w.l.o.g. that an update adds
or modifies a single tuple in the module-based DMS. The
generalization to an update involving several tuples is
obvious. Observe that deletions are not considered here
since they cannot lead to a global inconsistency. Indeed, DLlite,
as a FOL language, is monotonic.
Definition 4 (Unsat query relevant to update). Let T be a
module defined w.r.t. a signature . Let A and P be an atomic
concept and an atomic role in , respectively, and A0
ðrefÞ and
QðrefÞ, respectively, be an atomic concept and an atomic role in
either or sigþðT Þ.
An unsat query is relevant to an update AðaÞ iff it is of one
of the following forms:
. qðÞ: A0
ðrefÞðaÞ s.t. either A v :A0
ðrefÞ or A0
ðrefÞ v :A
is in the IC-closure of T ,
. qðÞ: 9Y QðrefÞða; Y Þ s.t. either A v :9QðrefÞ or
9QðrefÞ v :A is in the IC-closure of T , and
. qðÞ: 9Y QðrefÞðY ;aÞ s.t. either A v :9Q
ðrefÞ or
9Q
ðrefÞ v :A is in the IC-closure of T .
An unsat query is relevant to an update Pða; bÞ iff it is of
one of the following forms:
. qðÞ: QðrefÞða; bÞ s.t. either P v :QðrefÞ or QðrefÞ v
:P, or P v :Q
ðrefÞ or Q
ðrefÞ v :P, is in the ICclosure
of T ,
. qðÞ: QðrefÞðb; aÞ s.t. either P v :Q
ðrefÞ or Q
ðrefÞ v
:P, or P v :QðrefÞ or QðrefÞ v :P, is in the ICclosure
of T ,
. qðÞ: 9Y QðrefÞða; Y Þ s.t. either 9P v :9QðrefÞ or
9QðrefÞ v :9P is in the IC-closure of T ,
. qðÞ: 9Y QðrefÞðY ;aÞ s.t. either 9P v :9Q
ðrefÞ or
9Q
ðrefÞ v :9P is in the IC-closure of T ,
. qðÞ: 9Y QðrefÞðY ; bÞ s.t. either 9P v :9Q
ðrefÞ or
9Q
ðrefÞ v :9P is in the IC-closure of T ,
. qðÞ: 9Y RðrefÞðb; Y Þ s.t. either 9P v :9QðrefÞ or
9QðrefÞ v :9P is in the IC-closure of T ,
. qðÞ: 9YPða; Y Þ ^ Y 6¼ b s.t. ðfunct PÞ is in the ICclosure
of T , and
. qðÞ: 9YPðY ; bÞ ^ Y 6¼ a s.t. ðfunct PÞ is in the ICclosure
of T .
Corollary 2 (Global consistency upon local update). Let
hT ; Ai and hT ;A0i be consistent KBs such that T is a
module robust to consistency checking of T w.r.t. a signature
, and A0 is an Abox built upon . Let be an update of A0
and ðA0Þ its result. Let qunsatð; T Þ be the union of unsat
queries relevant to . Then: hT ; ðA0Þi is globally consistent
iff ansðqunsatð; T Þ;A=sigðT ÞÞ ¼ ffalseg.
Example (continued). Consider the update that amounts to
adding JournPaperðdoi1Þ to a module-based DMS built
from the module T 3
. Checking the global consistency
upon this update can be done by asking the union of the
unsat queries built from the IC-closure of T 3
that are
relevant to the update: q1ðÞ :- ConfPaperrefðdoi1Þ, q2ðÞ :-
FullPaperrefðdoi1Þ, and q3ðÞ :- ShortPaperrefðdoi1Þ.
4.1.2 Robustness to Query Answering
A module of a Tbox T defined w.r.t. a signature is robust
to query answering if it contains enough knowledge to
compute the answer set of any query built upon asked to
any Abox associated with T .
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 655
Definition 5 (Robustness to query answering). Let T be a
module of a Tbox T w.r.t. a signature sigðT Þ. T is
robust to query answering iff for every Abox A consistent
with T and for every query q built upon :
ansðq; hT ; AiÞ ¼ ansðq; hT ;A=sigðT ÞiÞ.
Example (continued). The algorithms and results of Section 5
will enable to check that the following module is robust to
query answering:
T 4
¼ fJournPaper v 9hasAuthor;
Publicationref v 9hasTitleref ;
Publicationref v 9hasAuthor;
9hasTitleref v Publicationref ;
ConfPaperref v Publicationref ;
JournPaper v Publicationref ;
ShortPaperref v ConfPaperref ;
FullPaperref v ConfPaperref ;
Surveyref v JournPaperg:
However, the module T 3
is not robust to query
answering, as shown by the following counterexample.
Consider the Abox A ¼ fConfPaperrefðaÞg for the reference
Tbox O and the query qðxÞ: 9y hasAuthorðx; yÞ
built upon . a 2 ansðq; hT ; AiÞ because ConfPaperref v
9hasAuthor is implied by O, but ansðq; hT 3
;A=sigðT 3
ÞiÞ ¼
; since, although
ConfPaperrefðaÞ 2 A=sigðT 3
Þ; ConfPaperref v 9hasAuthor
is not implied by T 3
.
Theorem 2 shows that if a module-based DMS is built
from a module robust to query answering, global query
answering (see Definition 6) can be done by computing the
perfect rewriting of the query using the module only, and
then by evaluating it against the data set distributed among
the module-based DMS and the reference one.
Definition 6 (Global answer set). Let hT ; Ai and hT 0;A0i be
consistent KBs, and q a query over hT 0;A0i. The global answer
set of q w.r.t. hT ; Ai, denoted global ansðq; hT ; Ai; hT 0;A0iÞ,
is: ansðq; hT [ T 0;A[A0 iÞ.
In the following, we consider the global answer sets of
queries over a module-based KB hT ;A0i and w.r.t. a
reference KB hT ; Ai where T is a module of T . From now
on, if it is clear from the context, we omit to mention the
reference KB when we refer to global answer sets.
Theorem 2 (Global query answering). Let hT ; Ai and
hT ;A0i be KBs such that T is a module of T w.r.t.
which is robust to query answering, and A0 is built upon .
Let q be a conjunctive query built upon and qrewðT Þ its
perfect rewriting w.r.t. T . If hT ;A0i is globally consistent,
the n : global ansðq; hT ; Ai; hT ; A0iÞ ¼ ansðqrewðT Þ;
A=sigðT Þ [A0Þ.
Proof. By definition, global ansðq; hT ; Ai; hT ;A0iÞ ¼ ansðq;
hT [ T ;A[A0iÞ ¼ ansðq; hT ;A[A0 iÞ since T T.
From the global consistency of hT ;A0i, we get the
consistency of hT [ T ;A[A0i from Definition 3, i.e., of
hT ;A[A0i. Definition 5 now applies and we get:
ansðq; hT ;A[A0iÞ ¼ ansðq; hT ; ðA [ A0Þ=sigðT ÞiÞ
¼ ansðq; hT ;A=sigðT Þ [A0iÞ;
since ðA [ A0Þ=sigðT Þ ¼ A=sigðT Þ [A0.
Theorem 2 then follows from the fact that in DL-lite
ansðq; hT ;A=sigðT Þ [A0iÞ can be obtained by evaluating
the perfect rewriting qrewðT Þ of q w.r.t. T against the
distributed data set A=sigðT Þ [A0. tu
Observe that Theorem 2 applies when a module-based
DMS is globally consistent. From a practical viewpoint, this
means that a module robust to query answering should also
be robust to consistency checking, so that global consistency
can be effectively checked by a module-based DMS. In
general, Theorem 2 does not hold when a module-based
DMS is globally inconsistent. The reason is that any
tuple—of the arity of a query—built from the Aboxes of
the reference and module-based DMSs (i.e., A[A0) is then
an answer, while it may not be possible to exhibit every
such tuple at the module-based DMS level which only has a
partial view of those two Aboxes (i.e., A=sigðT Þ [A0).
Example (continued). Consider again the module T 4
of O
w.r.t. ¼ fJournPaper; hasAuthorg that is robust to
query answering. Let the data set D in Fig. 1 be the
Abox A associated with O. Let A0 ¼ fJournPaperðdoi1Þg
be the Abox associated with T 4
. hT 4
;A0i is globally
inconsistent since doi1 cannot be at the same time both a
journal paper and a (full) paper in conference proceedings.
As a result, ’’1998’’ (like any other constant in A) is
a global answer to the query qðxÞ: JournPaperðxÞ.
However, that answer cannot be obtained from, as
’’1998’’ is not in, A=sigðT 4
Þ [A0.
4.2 Safe Personalization in Module-Based Data
Management
We investigate now how the personalization of a module—
through updates—can preserve its global data management
skills. Indeed, it is often necessary to personalize an
extracted module so that it fully copes with the new
application needs.
Definition 7 (Personalization of a module). Let T be a Tbox
and let T be a module of T w.r.t. sigðT Þ. A Tbox T 0 is a
personalization of T iff T 0 is obtained from T after a
sequence of insertions or deletions of Tbox constraints (i.e.,
PIs, NIs, or functionalities).
Definitions 8 and 9 exhibit safeness conditions both at the
Tbox and the Abox levels of a module-based DMS.
Theorems 3 and 4 then show that these conditions are
sufficient to perform global data management of any safe
instance of a safe personalization of a module.
A personalization of a module is safe provided that: first,
the updates cannot involve atomic concepts and roles of the
reference DMS’s Tbox other than those in the signature of
the module; second, the updates must conform with the
reference DMS’s Tbox; third, the resulting updated Tbox
must be a module of the reference DMS’s Tbox with the
same robustness(es) as the personalized module.
656 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
Definition 8 (Safe personalization of a module). Let T be a
module of a Tbox T w.r.t. sigðT Þ. A personalization T 0 of
T is safe w.r.t. T iff
1. sigðT Þ \ ðsigðT 0ÞnsigðT ÞÞ ¼ ;,
2. T is a module of T [T0 w.r.t. sigðT Þ, and
3. T 0 is a module ofT [T0 w.r.t. sigðT 0ÞnsigþðT Þ with
the same robustness(es) as T .
Example (continued). Consider the personalization T 0 of
the module T 4
obtained by adding the constraints (thus
the novel atomic role hasRightOnmod): hasAuthor v
hasRightOn
mod and 9hasRightOn
mod v JournPaper, stating
that authors have rights on their journal papers.
T 0 is not a safe personalization of T 4
w.r.t. Obecause O
is not a module of O [ T 0: 9hasAuthor v JournPaper
(which is built upon the signature of O) is indeed implied
by O [ T 0—actually by the above two constraints—but
not by O alone, thus does not conform with O.
In contrast, the algorithms and results of Section 5 will
enable to check that T 00, obtained by adding to T 4
the
constraint hasAuthor v hasRightOn
mod, is a safe personalization
of T 4
w.r.t. O.
A safe instance of a personalization T 0 of a module T is
an Abox built only upon the relations of that remain in T 0
after personalization of T , plus the novel relations
introduced in T 0 by personalization of T .
Definition 9 (Safe instance). Let T be a module of a Tbox T
w.r.t. sigðT Þ. Let T 0 be a personalization of T . A safe
instance of T 0 is an Abox built upon sigðT 0ÞnsigþðT Þ, where
sigþðT Þ is the set difference between the signature of T and .
Example (continued). Consider the above safe personalization
T 00 of T 4
w.r.t. O.
The Abox A1 is an instance of T 00 that is not safe
because it states information on the relation
Publicationref which is in sigþðT 4
Þ:
A1 ¼ fPublicationref ðaÞ; hasContactAuthormodða; cÞg:
In contrast, the Abox A2 is a safe instance of T 0:
A2 ¼ fJournPaperðaÞ; hasContactAuthormodða; cÞg:
Theorems 3 and 4 result directly from Definitions 8 and 9
by adapting accordingly the proofs of Theorems 1 and 2.
Theorem 3 (Safe global consistency checking). Let hT ; Ai
and hT 0;A0i be consistent KBs, such that T 0 is a safe
personalization of a module T of T w.r.t. , T being robust
to consistency checking, and A0 is a safe instance of T 0. Let
qunsatðT 0Þ be the union of unsat queries obtained from the ICclosure
of T 0. Then: hT 0;A0i is globally consistent iff
ansðqunsatðT 0Þ;A=sigðT 0Þ [A0Þ ¼ ffalseg.
Theorem 4 (Safe global query answering). Let hT ; Ai and
hT 0;A0i be KBs, such that T 0 is a safe personalization of a
module T of T w.r.t. , T being robust to query answering,
and A0 is a safe instance of T 0. Let q be a query built upon
sigðT 0ÞnsigþðT Þ and qrewðT 0Þ its perfect rewriting w.r.t. T 0.
If hT 0;A0i is globally consistent, then:
global ansðq; hT ; Ai; hT 0;A0iÞ ¼ ansðqrewðT 0Þ;A=sigðT 0Þ [A0Þ:
Remark 1. Similarly to Corollary 1, safe global consistency
checking can be optimized by evaluating the unsat queries
obtained from a necessary and sufficient subset of the ICclosure
of the personalized module: those that are not built
upon relations of the reference DMS only (i.e., of the form
rref ), and those built only upon novel relations resulting
from personalization (i.e., of the form rmod).
Remark 2. Corollary 2 can also be adapted to safe global
consistency upon update, provided a slight extension of the
definition of an unsat query relevant to an update
(Definition 4): updates now concern both atomic concepts
and roles of the form A or Amod, and P or Pmod,
respectively; and unsat queries relevant to an update are
defined from the minimal subset of ICs for checking safe
global consistency (Remark 1).
4.3 Optimizing Practical Module-Based Data
Management Using Minimal Modules
In general, several (possibly personalized) modules of a
Tbox may exist for a same signature of interest. Notably, a
module always exists since a Tbox is a module of itself
(robust to both consistency checking and query answering).
To compare the existing modules for a given Tbox and a
given signature, we define minimal modules based on the
notions of syntactic minimality and of semantic minimality.
Syntactic minimality deals with redundancy within a
module, while semantic minimality deals with the amount
of useless extra-knowledge captured within a module w.r.t.
the given signature and expected robustness(es).
Definition 10 (Minimal module). Let T 0 be a (safely
personalized) module of a Tbox T w.r.t. , possibly robust to
query answering and/or consistency checking.
. T 0 is syntactically minimal iff any strict subset of T 0
is not equivalent to T 0.
. T 0 is semantically minimal iff for any (safely and
identically personalized) module T 00 of T w.r.t. with
the same robustness(es): if T 0 T 00 then T 00 T 0.
. T 0 is minimal iff it is both syntactically and
semantically minimal.
It is worth noticing that minimal modules are desirable,
since nonminimality induces useless extra computation in
practical module-based DMSs (e.g., QuOnto5), as illustrated
below.
Example (continued). Consider again the module T 3
that is
robust to consistency checking. T 3
is not—semantically—
minimal, as the constraint Surveyref v JournPaper is
neither required by a module w.r.t. nor by its
robustness to consistency checking. The issue is that at
(global) consistency checking time, such a constraint is
used—by state-of-the-art algorithms (e.g., [11])—to compute
the IC-closure of the module, thus may lead to the
generation of useless ICs and to the construction and
evaluation of the corresponding unsat queries. Here, the
useless generated constraints would be:
Surveyref v :ConfPaperref ; Surveyref v :FullPaperref ;
and Surveyref v :ShortPaperref :
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 657
5. http://www.dis.uniroma1.it/quonto/.
Now, consider again the module T 4
that is robust to
query answering. T 4
is not—syntactically—minimal, as
the constraint JournPaper v 9hasAuthor is redundant:
it is implied by JournPaper v Publicationref and
Publicationref v 9hasAuthor in T 4
. The issue is that at
(global) query answering time, such a constraint is
used—by state-of-the-art algorithms (e.g., [11])—to compute
the perfect rewriting of a query, leading to useless
computation. For instance, consider the query qðxÞ:
9y hasAuthorðx; yÞ. From q and Publicationref v
9hasAuthor, PerfectRefðq; T 4
Þ first produces q0ðxÞ:
Publicationref ðxÞ from which in turn it produces
q00ðxÞ: JournPaperðxÞ using
JournPaper v Publicationref :
This reformulation is redundant since from q and
JournPaper v 9hasAuthor, PerfectRefðq; T 4
Þ also produces
q00ðxÞ: JournPaperðxÞ.
5 ALGORITHMS FOR ROBUST MODULE-BASED
DATA MANAGEMENT
We provide here algorithms for extracting modules from a
DL-lite Tbox (Section 5.1), checking safe module personalization
(Section 5.2)—checking whether an instance is safe
is trivial—, and minimizing a module (Section 5.3).
5.1 Extracting Robust Modules
The ERM algorithm for extracting robust modules from a
given DL-lite Tbox (Algorithm 1) relies on the notion of
(deductive) closure of a Tbox.
Definition 11 (Closure of a Tbox). The closure of a Tbox T is
the Tbox clðT Þ made of every constraint (PI, NI, and
functionality) implied by T (i.e., clðT Þ ¼ f j T g).
The following theorem characterizes the main properties
of the closure of a Tbox that will be used later on.
Theorem 5 (Properties of Tbox closure). Let T be a DL-lite
Tbox.
1. The size of clðT Þ is at most quadratic in the size of
sigðT Þ.
2. Computing clðT Þ is polynomial in the size of sigðT Þ.
3. T and clðT Þ are equivalent and for any constraint
built upon sigðT Þ: T iff 2 clðT Þ.
Proof. The first item follows from the fact that the number
of atomic concepts, respectively, of atomic roles, is
bounded by the size #sigðT Þ of sigðT Þ. It follows that
the number of possible B v C in T is bounded by 18
#sigðT Þ2 since there are 3 #sigðT Þ possible Bs and 6
#sigðT Þ possible Cs, as a B is of the form A, 9P, or 9P,
while a C is of the form B or :B; the number of possible
R v E in T is bounded by 8 #sigðT Þ2 since there are
2 #sigðT Þ possible Rs and 4 #sigðT Þ possible Es, as
a R is of the form P or P, while a E is of the form R or
:R; the number of possible ðfunct RÞ is bounded by 2
#sigðT Þ since there are 2 #sigðT Þ possible Rs as
mentioned above. As a result, the size of clðT Þ is at most
quadratic in the size of sigðT Þ.
The second item follows from the fact that checking
whether a Tbox T implies a constraint is polynomial in
the size of T in DL-lite [11, Theorem 27], this size being
bounded by that of its closure (first item).
Finally, the third item trivially follows from Definition
11: T clðT Þ and T clðT Þ, thus clðT Þ T . tu
With the definition of closure in place, we provide the
Extract Robust Module (ERM) algorithm (Algorithm 1) that
builds a module of a Tbox w.r.t. a signature and the desired
robustness(es). The resulting module is semantically minimal
and is obtained by filtering the closure of the Tbox or
the Tbox itself for keeping the relevant constraints.
Algorithm 1. the ERM algorithm
ERMðT ; ; RQA;RCCÞ
Input: a DL-lite Tbox T , a signature sigðT Þ, two
booleans RQA and RCC
Output: a module T of T w.r.t. , which is semantically
minimal, robust to query answering if RQA ¼ true, and
robust to consistency checking if RCC ¼ true
(1) T ;
(2) foreach 2 clðT Þ
(3) if is built upon only
(4) T T [ fg
(5) else if RCC ¼ true and is a NI X v :Y s.t. X or Y is
built upon
(6) T T [ fg
(7) if RQA ¼ true
(8) sig ; T 0
;
(9) while T 6¼ T 0

(10) T 0
T ; sig0 sig
(11) foreach PI X v Y 2 T s.t. Y is built upon sig0
(12) T T [ fX v Y g
(13) sig sig [ sigX (for X built upon the signature
sigX)
(14) return T
Theorem 6 (Properties of the ERM algorithm). Let T be a
DL-lite Tbox and a subset of sigðT Þ. ERMðT ; ; RQA;
RCCÞ terminates and returns a module of T w.r.t. , which is
semantically minimal, robust to query answering if RQA ¼
true, and robust to consistency checking if RCC ¼ true.
Proof. Termination for lines 2-6 follows from the finiteness
of clðT Þ, and for lines 7-13 because in the worst case
T T, so T ¼ T 0
eventually.
T , the output of ERMðT ; ; RQA;RCCÞ, is a module
of Tbox T w.r.t. (Definition 1) since T clðT Þ and the
filtering condition of line 3.
If RCC ¼ true, robustness to consistency checking is
guaranteed by the lines 5-6, which precisely extract all
(and only) the implied ICs exhibited in Corollary 1.
If RQA ¼ true, robustness to query answering is
guaranteed by the lines 7-13, which precisely extract all
(and only) the PIs required for constructing the perfect
rewriting of any conjunctive query q built upon ,
i.e., those used by PerfectRefðq;T Þ ( [11, Lemma 39]).
Finally, if RCC ¼ false and RQA ¼ false, T is semantically
minimal since it is constructed from lines 1-4, thus
built upon only. If RCC ¼ true or RQA ¼ true,
658 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
minimality results from the fact that only the required
constraints are added by lines 5-6 and 7-13. tu
Corollary 3 states the complexity of computing (robust)
modules.
Corollary 3 (Complexity of module extraction). Let T be a
DL-lite Tbox and sigðT Þ. Computing a module of T w.r.t.
, possibly robust to query answering and/or consistency
checking, is polynomial in the size of sigðT Þ.
Proof. The worst computational cost is that of a module
robust to both consistency checking and query answering.
ERMðT ; ; true; trueÞ first computes a module robust to
consistency checking using lines 1-6. The closure of T
must be computed and then its constraints are filtered.
This can be done in polynomial time in the size of sigðT Þ
according to Theorem 5. Then, ERMðT ; ; true; trueÞ filters
T for robustness to query answering. In the worst case,
according to Theorem 5, we have a number of iterations of
the while loop that is at most quadratic in the size of
sigðT Þ (since T clðT Þ), each of which has to filter T
whose size is at most quadratic in the size of sigðT Þ. As a
result, the worst case computation of a module of T w.r.t.
is polynomial in the size of sigðT Þ. tu
5.2 Checking Safe Personalization of a Module
The Safe Personalization Checking (SPC) algorithm (Algorithm
2) checks whether the personalization T 0 of a module
T a of Tbox T w.r.t. a signature is safe. It proceeds by
checking sequentially whether one of the three conditions of
Definition 8 is violated.
Algorithm 2. The SPC algorithm
SPCðT 0; T ; T ; RQA;RCCÞ
Input: a Tbox T 0 that is a personalization of the module T
of a Tbox T w.r.t. sigðT Þ, and two booleans RQA and
RCC denoting respectively whether T is robust to query
answering and/or consistency checking
Output: true if T 0 is safe, false otherwise
(1) if sigðT Þ \ ðsigðT 0ÞnsigðT ÞÞ 6¼ ;
(2) return false
(3) if clðT Þ 6¼ ERMðT [ T 0; sigðT Þ; false; falseÞ
(4) return false
(5) if clðERMðT [ T 0; sigðT 0ÞnsigþðT Þ; RQA;RCCÞÞ
6¼ clðT 0Þ
(6) return false
(7) return true
Theorem 7 (Properties of the SPC algorithm). Let T be a
Tbox and T a module of T w.r.t. a signature sigðT Þ, the
robustness of which is specified by two booleans RQAand RCC
(RQA ¼ true iff it is robust to query answering, RCC ¼ true
iff it is robust to consistency checking). Let T 0 be a personalization
of T : SPCðT 0; T ; T ; RQA;RCCÞ terminates and
returns true iff T 0 is a safe personalization of T w.r.t. T .
Proof. Termination directly follows from the finiteness of
the closure of a Tbox and from the termination of the ERM
algorithm (Theorem 6).
As for correctness, line 1 of Algorithm 2 obviously
checks whether the first item 1 of Definition 8 is falsified.
For checking whether T is not a module of T [T0 w.r.t.
sigðT Þ, the second item of Definition 8, based on the
properties of the closure of a Tbox (item 3. of Theorem 5),
line 3 of Algorithm 2 compares the closure of T with the
closure of a semantically minimal module ofT [T0 w.r.t.
sigðT Þ. The latter is computed by ERMðT [ T 0; sigðT Þ;
false; falseÞ, as the result of the loop starting at Line 2 in
Algorithm 1. Finally, for checking whether the third item
of Definition 8 is falsified, line 5 of Algorithm 2 compares
the closure of T 0 with the closure of a semantically
minimal module ofT [T0 w.r.t. sigðT 0ÞnsigþðT Þ with the
same robustness(es) as T . Because of the lines 7-13 of
Algorithm 1 that work on T instead of clðT Þ like the
previous lines, when RQA ¼ true, it may not be the case
that ERMðT [ T 0; sigðT 0ÞnsigþðT Þ; RQA;RCCÞ outputs
the closure of a module. Therefore, comparing the closure
of T 0 with the closure of ERMðT [ T 0; sigðT 0Þn sigþðT Þ;
RQA;RCCÞ is necessary in line 5 of Algorithm 2 for
checking the last item of Definition 8. tu
Corollary 4 states the complexity of checking whether a
personalization of a module is safe. It directly follows from
the polynomial time and size of the closure construction
(Theorem 5).
Corollary 4 (Complexity of checking safeness). Let T 0 be the
personalization of a (possibly robust) module T of a Tbox T
w.r.t. a signature sigðT Þ. Deciding whether T 0 is a safe
personalization of T w.r.t. T is polynomial in the size of
sigðT [ T 0Þ.
5.3 Computing Minimal Modules by Reduction
As mentioned in Section 4, minimal modules play an
important role for the efficiency of practical module-based
data management.
We build minimal modules by computing syntactically
minimal modules from (possibly personalized) semantically
minimal modules. To do so, we introduce the notion of
reduction of a Tbox for removing redundant constraints.
This operation is dual to that of closure.
Definition 12 (Tbox reduction). The reduction of a Tbox T ,
denoted redðT Þ, is obtained by starting from redðT Þ ¼ T ,
then applying exhaustively the following rule until no more
constraints can be removed from redðT Þ: If 2 redðT Þ and
redðT Þnfg , then: remove from redðT Þ.
The following theorem characterizes the main properties
of the reduction of a Tbox.
Theorem 8 (Properties of Tbox reduction). Let T be a Tbox.
1. Computing redðT Þ is polynomial in the size of sigðT Þ.
2. T and redðT Þ are equivalent.
3. Any strict subset of redðT Þ is not equivalent to T .
Proof. The first item results from the fact that the size of a
Tbox is bounded by that of its closure and checking
whether a Tbox implies a constraint is polynomial in the
size of that Tbox in DL-lite ([11, Theorem 27]), that size
being bounded by that of its closure. The last two items
directly follow from Definition 12. tu
Corollary 5 provides a simple means to compute
(possibly personalized) minimal modules. It follows from
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 659
Definition 8, Theorem 6 (which states that ERM outputs
semantically minimal modules), and Theorem 8.
Corollary 5 (Computation/complexity of minimal modules).
Let T be a Tbox and sigðT Þ. Let T 0 be a safe
personalization of (the reduction of) ERMðT ; ; RQA;RCCÞ.
redðT 0Þ is minimal and is computed in polynomial time in the
size of sigðT 0Þ.
6 CONCLUSION
The modules introduced in this paper generalize both the
modules obtained by extracting a subset of a Tbox w.r.t.
selected relations (e.g., [3], [4], [7], [9]) or by forgetting about
relations (e.g., [5], [6], [8], [10]).
In addition, in contrast with existing work, we have
considered the problem of safe personalization of modules
built from an existing reference DMS. This raises new issues
to check easily that a module-based DMS evolves independently
but coherently w.r.t. the reference DMS from which
it has been built. We have introduced two notions of
module robustness that make possible to build locally the
relevant queries to ask to the reference database in order to
check global consistency (possibly upon each update), and
to obtain global answers for local queries. We have
provided polynomial time algorithms that extract minimal
and robust modules from a reference ontological schema
expressed as a DL-lite Tbox.
Wang et al. [5] extract modules from DL-lite schemas
following a forgetting approach. It proposes an alternative
to our result about global query answering, which applies
under the severe constraints that the data set of the
reference DMS has to be modified (write access is required).
Compared to the algorithm developed by Konev et al. [7]
for extracting modules from acyclic EL ontological schemas,
our approach handles possibly cyclic DL-liteA schemas,
while keeping data consistency and query answering
reducible to standard database queries [11].
In contrast with the recent work on extracting modules
from DL-liteontological schema [4], we focus on the
DL-liteA fragment for which consistency checking and
query answering are FOL-reducible. This is crucial when
ontologies are used as schemas over large data sets stored
and queried as relational databases.
Datalog [15] is an extension of Datalog that has also
been designed for query answering over ontologies. Since it
captures the fragment of DL-lite that we consider, our
results can be easily transposed into it.
Contrarily to recent works in distributed databases, data
replication can be avoided while guaranteeing global
consistency. Our approach is a good tradeoff between the
NoSQL approaches and the SQL approaches for managing
distributed data stores (see [16] for a survey). While most of
the NoSQL approaches are schema less, our approach
makes possible to handle useful schema constraints. It
provides efficient means to check global consistency, a
stronger property than eventual consistency that is prevalent
in distributed data stores. On the other hand, we are
more flexible than the SQL approaches since global
consistency is checked periodically and not at each update
of the reference DMS.
In the next future, we plan to evaluate our approach, in
particular to compare the size of the modules extracted by
our algorithm to the results provided by Cuenca Grau et al.
[17], [18]. We also plan to apply our algorithms to the real
use case of the MyCorporisFabrica DMS, mentioned in the
introduction, which has been developed manually as a
personalization of the (reference) Foundational Model of
Anatomy DMS. Finally, we plan to extend our approach to
distributed module-based DMSs, where answering queries
combines knowledge of several modules associated with
possibly several reference DMSs.
REFERENCES
[1] The Description Logic Handbook: Theory, Implementation, and
Applications, F. Baader, D. Calvanese, D. McGuinness, D. Nardi,
P.F. Patel-Schneider, eds. Cambridge Univ. Press, 2003.
[2] Modular Ontologies: Concepts, Theories and Techniques for Knowledge
Modularization, H. Stuckenschmidt, C. Parent, S. Spaccapietra, eds.
Springer, 2009.
[3] S. Ghilardi, C. Lutz, and F. Wolter, “Did I Damage My Ontology?
A Case for Conservative Extensions in Description Logics,” Proc.
10th Int’l Conf. Principles of Knowledge Representation and Reasoning
(KR), 2006.
[4] R. Kontchakov, L. Pulina, U. Sattler, T. Schneider, P. Selmer, F.
Wolter, and M. Zakharyaschev, “Minimal Module Extraction from
DL-Lite Ontologies Using QBF Solvers,” Proc. 21st Int’l Joint Conf.
Artificial Intelligence (IJCAI), 2009.
[5] Z. Wang, K. Wang, R.W. Topor, and J.Z. Pan, “Forgetting concepts
in DL-Lite,” Proc. Fifth European Semantic Web Conf. Semantic Web:
Research and Applications (ESWC), 2008.
[6] B. Konev, D. Walther, and F. Wolter, “Forgetting and Uniform
Interpolation in Extensions of the Description Logic EL,” Proc.
22nd Int’l Workshop Description Logics, 2009.
[7] B. Konev, C. Lutz, D. Walther, and F. Wolter, “Semantic
Modularity and Module Extraction in Description Logics,” Proc.
18th European Conf. Artificial Intelligence (ECAI), 2008.
[8] B. Konev, D. Walther, and F. Wolter, “Forgetting and Uniform
Interpolation in Large-Scale Description Logic Terminologies,”
Proc. 21st Int’l Joint Conf. Artifical intelligence (IJCAI), 2009.
[9] B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler, “Just the
Right Amount: Extracting Modules from Ontologies,” Proc. 16th
Int’l Conf. World Wide Web (WWW), 2007.
[10] K. Wang, Z. Wang, R.W. Topor, J.Z. Pan, and G. Antoniou,
“Concept and Role Forgetting in ALC Ontologies,” Proc. Eighth
Int’l Semantic Web Conf. (ISWC), 2009.
[11] D. Calvanese, G.D. Giacomo, D. Lembo, M. Lenzerini, and R.
Rosati, “Tractable Reasoning and Efficient Query Answering in
Description Logics: The DL-Lite Family,” J. Automated Reasoning,
vol. 39, no. 3, pp. 385-429, 2007.
[12] O. Palombi, G. Bousquet, D. Jospin, S. Hassan, L. Reve´ret, and F.
Faure, “My Corporis Fabrica: A Unified Ontological, Geometrical
and Mechanical View of Human Anatomy,” Proc. Second Workshop
3D Physiological Human (3DPH), 2009.
[13] S. Abiteboul, R. Hull, and V. Vianu, Foundations of Databases.
Addison-Wesley, 1995.
[14] M.Y. Vardi, “The Complexity of Relational Query Languages,”
Proc. 14th Ann. ACM Symp. Theory of Computing (STOC), 1982.
[15] A. Cali, G. Gottlob, and T. Lukasiewicz, “Datalog+-: A Unified
Approach to Ontologies and Integrity Constraints,” Proc. Int’l
Conf. Database Theory (ICDT), 2009.
[16] R. Cattell, “Scalable Sql and Nosql Data Stores,” SIGMOD Record,
vol. 39, no. 4, pp. 12-27, 2010.
[17] B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler,
“Extracting Modules from Ontologies: A Logic-Based Approach,”
Proc. Third Int’l Workshop OWL Experiences and Directions
(OWLED), 2007.
[18] B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler, “Modular
Reuse of Ontologies: Theory and Practice,” J. Artificial Intelligence
Research, vol. 31, pp. 273-318, 2008.
660 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 3, MARCH 2013
Franc¸ois Goasdoue´ is an associate professor
of computer science at the Universite´ Paris-
Sud, where he is a member of the Laboratoire
de Recherche en Informatique (LRI). His
research interests include databases, knowledge
representation and reasoning, and the
semantic web.
Marie-Christine Rousset is a professor of
Computer Science at the Universite´ de Grenoble,
where she is a member of the Laboratoire
d’Informatique de Grenoble (LIG ). Her research
interests include knowledge representation and
reasoning, information integration, pattern
mining, and the semantic web.
. For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.
GOASDOUE AND ROUSSET: ROBUST MODULE-BASED DATA MANAGEMENT 661

Comments are closed.