Consistency Checking
Attention! GraphDB is not supported anymore, therefore parts related to it in this document are redundant.
Requirements
Knora is designed to prevent inconsistencies in RDF data, as far as is practical, in a triplestore-independent way (see Triplestore Updates). However, it is also useful to enforce consistency constraints in the triplestore itself, for two reasons:
- To prevent inconsistencies resulting from bugs in the DSP-API server.
- To prevent users from inserting inconsistent data directly into the triplestore, bypassing Knora.
The design of the knora-base
ontology supports two ways of specifying
constraints on data (see knora-base: Consistency Checking
for details):
- A property definition should specify the types that are allowed as
subjects and objects of the property, using
knora-base:subjectClassConstraint
and (if it is an object property)knora-base:objectClassConstraint
. Every subproperty ofknora-base:hasValue
or aknora-base:hasLinkTo
(i.e. every property of a resource that points to aknora-base:Value
or to another resource) is required have this constraint, because the DSP-API server relies on it to know what type of object to expect for the property. Use ofknora-base:subjectClassConstraint
is recommended but not required. - A class definition should use OWL cardinalities (see OWL 2 Quick Reference Guide)
to indicate the properties that instances of the class are allowed to
have, and to constrain the number of objects that each property can
have. Subclasses of
knora-base:Resource
are required to have a cardinality for each subproperty ofknora-base:hasValue
or aknora-base:hasLinkTo
that resources of that class can have.
Specifically, consistency checking should prevent the following:
- An object property or datatype property has a subject of the wrong class, or an object property has an object of the wrong class (GraphDB's consistency checke cannot check the types of literals).
- An object property has an object that does not exist (i.e. the
object is an IRI that is not used as the subject of any statements
in the repository). This can be treated as if the object is of the
wrong type (i.e. it can cause a violation of
knora-base:objectClassConstraint
, because there is no compatiblerdf:type
statement for the object). - A class has
owl:cardinality 1
orowl:minCardinality 1
on an object property or datatype property, and an instance of the class does not have that property. - A class has
owl:cardinality 1
orowl:maxCardinality 1
on an object property or datatype property, and an instance of the class has more than one object for that property. - An instance of
knora-base:Resource
has an object property pointing to aknora-base:Value
or to anotherResource
, and its class has no cardinality for that property. - An instance of
knora-base:Value
has a subproperty ofknora-base:valueHas
, and its class has no cardinality for that property. - A datatype property has an empty string as an object.
Cardinalities in base classes are inherited by derived classes. Derived classes can override inherited cardinalities by making them more restrictive, i.e. by specifying a subproperty of the one specified in the original cardinality.
Instances of Resource
and Value
can be marked as deleted, using the
property isDeleted
. This must be taken into account as follows:
- With
owl:cardinality 1
orowl:maxCardinality 1
, if the object of the property can be marked as deleted, the property must not have more than one object that has not been marked as deleted. In other words, it's OK if there is more than one object, as long only one of them hasknora-base:isDeleted false
. - With
owl:cardinality 1
orowl:minCardinality 1
, the property must have an object, but it's OK if the property's only object is marked as deleted. We allow this because the subject and object may have different owners, and it may not be feasible for them to coordinate their work. The owner of the object should always be able to mark it as deleted. (It could be useful to notify the owner of the subject when this happens, but that is beyond the scope of consistency checking.)
Design
Ontotext GraphDB provides a mechanism for checking the consistency of data in a repository each time an update transaction is committed. Knora provides GraphDB-specific consistency rules that take advantage of this feature to provide an extra layer of consistency checks, in addition to the checks that are implemented in Knora.
When a repository is created in GraphDB, a set of consistency rules can
be provided, and GraphDB's consistency checker can be turned on to
ensure that each update transaction respects these rules, as described
in the section
Reasoning
of the GraphDB documentation. Like custom inference rules, consistency
rules are defined in files with the .pie
filename extension, in a
GraphDB-specific syntax.
We have added rules to the standard RDFS inference rules file
builtin_RdfsRules.pie
, to create the file KnoraRules.pie
. The .ttl
configuration file that is used to create the repository must contain
these settings:
owlim:ruleset "/path/to/KnoraRules.pie" ;
owlim:check-for-inconsistencies "true" ;
The path to KnoraRules.pie
must be an absolute path. The scripts
provided with Knora to create test repositories set this path
automatically.
Consistency checking in GraphDB relies on reasoning. GraphDB's reasoning is Forward-chaining, which means that reasoning is applied to the contents of each update, before the update transaction is committed, and the inferred statements are added to the repository.
A GraphDB rules file can contain two types of rules: inference rules and consistency rules. Before committing an update transaction, GraphDB applies inference rules, then consistency rules. If any of the consistency rules are violated, the transaction is rolled back.
An inference rule has this form:
Id: <rule_name>
<premises> <optional_constraints>
-------------------------------
<consequences> <optional_constraints>
The premises are a pattern that tries to match statements found in the data. Optional constraints, which are enclosed in square brackets, make it possible to specify the premises more precisely, or to specify a named graph (see examples below). Consequences are the statements that will be inferred if the premises match. A line of hyphens separates premises from consequences.
A GraphDB consistency rule has a similar form:
Consistency: <rule_name>
<premises> <optional_constraints>
-------------------------------
<consequences> <optional_constraints>
The differences between inference rules and consistency rules are:
- A consistency rule begins with
Consistency
instead ofId
. - In a consistency rule, the consequences are optional. Instead of representing statements to be inferred, they represent statements that must exist if the premises are satisfied. In other words, if the premises are satisfied and the consequences are not found, the rule is violated.
- If a consistency rule doesn't specify any consequences, and the premises are satisfied, the rule is violated.
Rules use variable names for subjects, predicates, and objects, and they can use actual property names.
Empty string as object
If subject i
has a predicate p
whose object is an empty string, the
constraint is violated:
Consistency: empty_string
i p ""
------------------------------------
Subject and object class constraints
If subject i
has a predicate p
that requires a subject of type t
,
and i
is not a t
, the constraint is violated:
Consistency: subject_class_constraint
p <knora-base:subjectClassConstraint> t
i p j
------------------------------------
i <rdf:type> t
If subject i
has a predicate p
that requires an object of type t
,
and the object of p
is not a t
, the constraint is violated:
Consistency: object_class_constraint
p <knora-base:objectClassConstraint> t
i p j
------------------------------------
j <rdf:type> t
Cardinality constraints
A simple implementation of a consistency rule to check
owl:maxCardinality 1
, for objects that can be marked as deleted, could
look like this:
Consistency: max_cardinality_1_with_deletion_flag
i <rdf:type> r
r <owl:maxCardinality> "1"^^xsd:nonNegativeInteger
r <owl:onProperty> p
i p j
i p k [Constraint j != k]
j <knora-base:isDeleted> "false"^^xsd:boolean
k <knora-base:isDeleted> "false"^^xsd:boolean
------------------------------------
This means: if resource i
is a subclass of an owl:Restriction
r
with owl:maxCardinality 1
on property p
, and the resource has two
different objects for that property, neither of which is marked as
deleted, the rule is violated. Note that this takes advantage of the
fact that Resource
and Value
have owl:cardinality 1
on isDeleted
(isDeleted
must be present even if false), so we do not need to check
whether i
is actually something that can be marked as deleted.
However, this implementation would be much too slow. We therefore use two optimisations suggested by Ontotext:
- Add custom inference rules to make tables (i.e. named graphs) of pre-calculated information about the cardinalities on properties of subjects, and use those tables to simplify the consistency rules.
- Use the
[Cut]
constraint to avoid generating certain redundant compiled rules (see Entailment rules).
For example, to construct a table of subjects belonging to classes that
have owl:maxCardinality 1
on some property p
, we use the following
custom inference rule:
Id: maxCardinality_1_table
i <rdf:type> r
r <owl:maxCardinality> "1"^^xsd:nonNegativeInteger
r <owl:onProperty> p
------------------------------------
i p r [Context <onto:_maxCardinality_1_table>]
The constraint [Context <onto:_maxCardinality_1_table>]
means that the
inferred triples are added to the context (i.e. the named graph)
http://www.ontotext.com/_maxCardinality_1_table
. (Note that we have
defined the prefix onto
as http://www.ontotext.com/
in the
Prefices
section of the rules file.) As the GraphDB documentation on
Rules
explains:
If the context is provided, the statements produced as rule consequences are not ‘visible’ during normal query answering. Instead, they can only be used as input to this or other rules and only when the rule premise explicitly uses the given context.
Now, to find out whether a subject belongs to a class with that
cardinality on a given property, we only need to match one triple. The
revised implementation of the rule
max_cardinality_1_with_deletion_flag
is as follows:
Consistency: max_cardinality_1_with_deletion_flag
i p r [Context <onto:_maxCardinality_1_table>]
i p j [Constraint j != k]
i p k [Cut]
j <knora-base:isDeleted> "false"^^xsd:boolean
k <knora-base:isDeleted> "false"^^xsd:boolean
------------------------------------
The constraint [Constraint j != k]
means that the premises will be
satisfied only if the variables j
and k
do not refer to the same
thing.
With these optimisations, the rule is faster by several orders of magnitude.
Since properties whose objects can be marked as deleted must be handled
differently to properties whose objects cannot be marked as deleted, the
knora-base
ontology provides a property called
objectCannotBeMarkedAsDeleted
. All properties in knora-base
whose
objects cannot take the isDeleted
flag (including datatype properties)
should be derived from this property. This is how it is used to check
owl:maxCardinality 1
for objects that cannot be marked as deleted:
Consistency: max_cardinality_1_without_deletion_flag
i p r [Context <onto:_maxCardinality_1_table>]
p <rdfs:subPropertyOf> <knora-base:objectCannotBeMarkedAsDeleted>
i p j [Constraint j != k]
i p k [Cut]
------------------------------------
To check owl:minCardinality 1
, we do not care whether the object can
be marked as deleted, so we can use this simple rule:
Consistency: min_cardinality_1_any_object
i p r [Context <onto:_minCardinality_1_table>]
------------------------------------
i p j
This means: if a subject i
belongs to a class that has
owl:minCardinality 1
on property p
, and i
has no object for p
,
the rule is violated.
To check owl:cardinality 1
, we need two rules: one that checks whether
there are too few objects, and one that checks whether there are too
many. To check whether there are too few objects, we don't care whether
the objects can be marked as deleted, so the rule is the same as
min_cardinality_1_any_object
, except for the cardinality:
Consistency: cardinality_1_not_less_any_object
i p r [Context <onto:_cardinality_1_table>]
------------------------------------
i p j
To check whether there are too many objects, we need to know whether the
objects can be marked as deleted or not. In the case where the objects
can be marked as deleted, the rule is the same as
max_cardinality_1_with_deletion_flag
, except for the cardinality:
Consistency: cardinality_1_not_greater_with_deletion_flag
i p r [Context <onto:_cardinality_1_table>]
i p j [Constraint j != k]
i p k [Cut]
j <knora-base:isDeleted> "false"^^xsd:boolean
k <knora-base:isDeleted> "false"^^xsd:boolean
------------------------------------
In the case where the objects cannot be marked as deleted, the rule is
the same as max_cardinality_1_without_deletion_flag
, except for the
cardinality:
Consistency: cardinality_1_not_less_any_object
i p r [Context <onto:_cardinality_1_table>]
------------------------------------
i p j
Knora allows a subproperty of knora-base:hasValue
or
knora-base:hasLinkTo
to be a predicate of a resource only if the
resource's class has some cardinality for the property. For convenience,
knora-base:hasValue
and knora-base:hasLinkTo
are subproperties of
knora-base:resourceProperty
, which is used to check this constraint in
the following rule:
Consistency: resource_prop_cardinality_any
i <knora-base:resourceProperty> j
------------------------------------
i p j
i <rdf:type> r
r <owl:onProperty> p
If resource i
has a subproperty of knora-base:resourceProperty
, and
i
is not a member of a subclass of an owl:Restriction
r
with a
cardinality on that property (or on one of its base properties), the
rule is violated.
A similar rule, value_prop_cardinality_any
, ensures that if a value
has a subproperty of knora-base:valueHas
, the value's class has some
cardinality for that property.