Triplestore Updates
Requirements
General
The supported update operations are:
- Create a new resource with its initial values.
- Add a new value.
- Change a value.
- Delete a value (i.e. mark it as deleted).
- Delete a resource (i.e. mark it as deleted).
Users must be able to edit the same data concurrently.
Each update must be atomic and leave the database in a consistent, meaningful state, respecting ontology constraints and permissions.
The application must not use any sort of long-lived locks, because they tend to hinder concurrent edits, and it is difficult to ensure that they are released when they are no longer needed. Instead, if a user requests an update based on outdated information (because another user has just changed something, and the first user has not found out yet), the update must not be performed, and the application must notify the user who requested it, suggesting that the user check the relevant data and try again if necessary. (We may eventually provide functionality to help users merge edits in such a situation. The application can also encourage users to coordinate with one another when they are working on the same data, and may eventually provide functionality to facilitate this coordination.)
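The rule for outdated requests can be illustrated with a minimal Python sketch. It is not Knora's implementation: the in-memory `store`, the `change_value` function, and the `OutdatedUpdateError` exception are hypothetical names used only to show the idea of rejecting an update whose starting point is no longer current.

```python
class OutdatedUpdateError(Exception):
    """Raised when a request refers to a version that is no longer current."""


def change_value(store, resource_iri, expected_current_iri, new_value_iri):
    """Replace the current value only if the caller saw the latest version."""
    actual_current_iri = store[resource_iri]
    if actual_current_iri != expected_current_iri:
        # Another user changed the value first: reject instead of overwriting.
        raise OutdatedUpdateError(
            f"{expected_current_iri} is no longer current; reload and try again"
        )
    store[resource_iri] = new_value_iri


# Hypothetical stand-in for the triplestore: resource IRI -> current value IRI.
store = {"http://rdfh.ch/c5058f3a": "values/v1"}

change_value(store, "http://rdfh.ch/c5058f3a", "values/v1", "values/v2")  # user A
try:
    # User B still believes that v1 is the current version.
    change_value(store, "http://rdfh.ch/c5058f3a", "values/v1", "values/v3")
    outcome = "updated"
except OutdatedUpdateError:
    outcome = "rejected"
```

No lock is held between the moment a user reads the data and the moment the update is requested; the stale request is simply refused and the user is asked to retry.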
We can assume that each SPARQL update operation will run in its own database transaction with an isolation level of 'read committed'. This is what GraphDB does when it receives a SPARQL update over HTTP (see GraphDB SE Transactions). We cannot assume that it is possible to run more than one SPARQL update in a single database transaction. (The SPARQL 1.1 Protocol does not provide a way to do this, and currently it can be done only by embedding the triplestore in the application and using a vendor-specific API, but we cannot require this in Knora.)
Permissions
To create a new value (as opposed to a new version of an existing value), the user must have permission to modify the containing resource.
To create a new version of an existing value, the user needs only to have permission to modify the current version of the value; no permissions on the resource are needed.
Since changing a link requires deleting the old link and creating a new one (as described in Linking), a user wishing to change a link must have modify permission on both the containing resource and the knora-base:LinkValue for the existing link.
When a new resource or value is created, it can be given default permissions specified in the project's admin data, or (only in API v2) custom permissions can be specified.
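The modify-permission rules above can be summarised in a small lookup table. The following Python sketch is only an illustration: the operation names and the `is_allowed` helper are hypothetical, and Knora's real permission-checking algorithm is considerably more involved.

```python
# Which entities the user must be allowed to modify, per operation.
REQUIRED_MODIFY_PERMISSIONS = {
    "create_value": {"resource"},
    "new_value_version": {"current_value"},
    "change_link": {"resource", "link_value"},
}


def is_allowed(operation, modifiable_entities):
    """Return True if the user may modify everything the operation requires."""
    required = REQUIRED_MODIFY_PERMISSIONS[operation]
    return required <= set(modifiable_entities)


# A user who may modify only the current value can add a new version of it,
# but cannot change a link, which needs both the resource and the LinkValue.
can_add_version = is_allowed("new_value_version", ["current_value"])
can_change_link = is_allowed("change_link", ["resource"])
```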
Ontology Constraints
Knora must not allow an update that would violate an ontology constraint.
When creating a new value (as opposed to adding a new version of an existing value), Knora must not allow the update if the containing resource's OWL class does not contain a cardinality restriction for the submitted property, or if the new value would violate the cardinality restriction.
It must also not allow the update if the type of the submitted value does not match the knora-base:objectClassConstraint of the property, or if the property has no knora-base:objectClassConstraint. In the case of a property that points to a resource, Knora must ensure that the target resource belongs to the OWL class specified in the property's knora-base:objectClassConstraint, or to a subclass of that class.
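The checks described in this section can be sketched as a single function. This is a simplified Python model, not Knora's code: the `ontology` dictionary layout, the sample class and property names, and the single-inheritance `subclass_of` map are assumptions made for the example, and subclass matching is applied uniformly to keep it short.

```python
def check_new_value(ontology, prop, existing_count, value_class):
    """Return None if the new value may be created, else a reason string."""
    cardinality = ontology["cardinalities"].get(prop)
    if cardinality is None:
        return "no cardinality restriction for this property"
    _min_count, max_count = cardinality  # max_count None means unbounded
    if max_count is not None and existing_count + 1 > max_count:
        return "cardinality restriction violated"
    constraint = ontology["object_class_constraints"].get(prop)
    if constraint is None:
        return "property has no objectClassConstraint"
    # Accept the constraint class itself or any of its subclasses.
    cls = value_class
    while cls is not None:
        if cls == constraint:
            return None
        cls = ontology["subclass_of"].get(cls)
    return "type does not match objectClassConstraint"


# Hypothetical ontology data for one resource class.
ontology = {
    "cardinalities": {"book_comment": (0, None), "title": (1, 1), "hasAuthor": (0, None)},
    "object_class_constraints": {
        "book_comment": "TextValue",
        "title": "TextValue",
        "hasAuthor": "Person",
    },
    "subclass_of": {"Author": "Person"},
}
```

For instance, `check_new_value(ontology, "title", 1, "TextValue")` reports a cardinality violation, because the resource already has the one title it is allowed to have.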
Duplicate and Redundant Values
When creating a new value, or changing an existing value, Knora checks
whether the submitted value would duplicate an existing value for the
same property in the resource. The definition of 'duplicate' depends on
the type of value; it does not necessarily mean that the two values are
strictly equal. For example, if two text values contain the same Unicode
string, they are considered duplicates, even if they have different
Standoff markup. If resource R has property P with value V1, and V1 is
a duplicate of V2, the API server must not add another instance of
property P with value V2. However, if the requesting user does not have
permission to see V2, the duplicate is allowed, because forbidding it
would reveal the contents of V2 to the user.
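A minimal Python sketch of this rule for text values follows. The function name, the tuple layout of `existing_values`, and the `user_can_view` callback are hypothetical; the point is only that a matching value which the user is not allowed to see does not block the update.

```python
def is_forbidden_duplicate(new_text, existing_values, user_can_view):
    """Return True if the submitted text duplicates a value the user can see.

    existing_values: list of (value IRI, Unicode string) for the same property.
    user_can_view: callable taking a value IRI and returning a bool.
    """
    for iri, text in existing_values:
        # Text values are compared by string only; Standoff markup is ignored.
        if text == new_text and user_can_view(iri):
            return True
    return False


existing = [("values/visible", "Comment 1"), ("values/hidden", "Secret note")]
visible_only = lambda iri: iri == "values/visible"

blocked = is_forbidden_duplicate("Comment 1", existing, visible_only)
# Matching a hidden value is allowed, so nothing about it is revealed.
allowed = not is_forbidden_duplicate("Secret note", existing, visible_only)
```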
When creating a new version of a value, Knora also checks whether the new version is redundant, given the existing value. The definition of 'redundant' can depend on the type of value, but in practice, it means that the values are strictly equal: any change, however trivial, is allowed.
Versioning
Each Knora value (i.e. something belonging to an OWL class derived from
knora-base:Value) is versioned. This means that once created, a value
is never modified. Instead, 'changing' a value means creating a new
version of the value (actually a new value) that points to the
previous version using knora-base:previousValue. The versions of a
value are a singly-linked list, pointing backwards into the past. When a
new version of a value is made, the triple that points from the resource
to the old version (using a subproperty of knora-base:hasValue) is
removed, and a triple is added to point from the resource to the new
version. Thus the resource always points only to the current version of
the value, and the older versions are available only via the current
version's knora-base:previousValue predicate.
Unlike values, resources (members of OWL classes derived from
knora-base:Resource) are not versioned. The data that is attached to a
resource, other than its values, can be modified.
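The version list can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the dictionaries stand in for triples, each property holds a single value, and the names `add_version` and `history` are invented for the example.

```python
values = {}              # value IRI -> {"string": ..., "previousValue": ...}
resource_points_to = {}  # (resource, property) -> IRI of the current version


def add_version(resource, prop, new_iri, string):
    """Create a new value that points back to the version it supersedes."""
    previous = resource_points_to.get((resource, prop))
    values[new_iri] = {"string": string, "previousValue": previous}
    # The resource stops pointing to the old version and points to the new one.
    resource_points_to[(resource, prop)] = new_iri


def history(resource, prop):
    """Walk the singly-linked list from the current version into the past."""
    iri = resource_points_to.get((resource, prop))
    result = []
    while iri is not None:
        result.append(iri)
        iri = values[iri]["previousValue"]
    return result


add_version("book1", "book_comment", "testComment001", "Comment 1")
add_version("book1", "book_comment", "testComment002", "Comment 2")
versions = history("book1", "book_comment")
```

The first version is left untouched by the second call; it is merely no longer reachable from the resource directly.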
Deleting
Knora does not actually delete resources or values; it only marks them
as deleted. Deleted data is normally hidden. All resources and values
must have the predicate knora-base:isDeleted, whose object is a
boolean. If a resource or value has been marked as deleted, it has
knora-base:isDeleted true and has a knora-base:deleteDate. An
optional knora-base:deleteComment may be added to explain why the
resource or value has been marked as deleted.
Normally, a value is marked as deleted without creating a new version of
it. However, link values must be treated as a special case. Before a
LinkValue can be marked as deleted, its reference count must be
decremented to 0. Therefore, a new version of the LinkValue is made,
with a reference count of 0, and it is this new version that is marked
as deleted.
Since it is necessary to be able to find out when a resource was deleted, it is not possible to undelete a resource. Moreover, to simplify the checking of cardinality constraints, and for consistency with resources, it is not possible to undelete a value, and no new versions of a deleted value can be made. Instead, if desired, a new resource or value can be created by copying data from a deleted resource or value.
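The deletion rules can be sketched as follows. This Python model is an illustration only: entities are plain dictionaries whose keys mirror the knora-base predicate names, and the two helper functions are hypothetical.

```python
from datetime import datetime, timezone


def mark_deleted(entity, comment=None):
    """Mark a resource or value as deleted; this can never be undone."""
    if entity["isDeleted"]:
        raise ValueError("already deleted; deleted data cannot be changed")
    entity["isDeleted"] = True
    entity["deleteDate"] = datetime.now(timezone.utc)
    if comment is not None:
        entity["deleteComment"] = comment


def delete_link_value(current_version):
    """Make a new LinkValue version with reference count 0 and delete that."""
    new_version = {
        "isDeleted": False,
        "valueHasRefCount": 0,
        "previousValue": current_version,
    }
    mark_deleted(new_version)
    return new_version


value = {"isDeleted": False}
mark_deleted(value, comment="entered by mistake")

link_v1 = {"isDeleted": False, "valueHasRefCount": 1, "previousValue": None}
link_v2 = delete_link_value(link_v1)
```

An ordinary value is marked in place, whereas the link value gets a new, deleted version and the earlier version keeps its reference count of 1.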
Linking
Links must be treated differently to other types of values. Knora needs
to maintain information about the link, including permissions and a
version history. Since the link does not have a unique IRI of its own,
Knora uses RDF reifications for this purpose. Each link between two
resources has exactly one (non-deleted) knora-base:LinkValue. The
resource itself has a predicate that points to the LinkValue, using a
naming convention in which the word Value is appended to the name of the
link predicate to produce the link value predicate. For example, if a
resource representing a book has a predicate called hasAuthor that
points to another resource, it must also have a predicate called
hasAuthorValue that points to the LinkValue in which information about
the link is stored. To find a particular LinkValue, one can query it
either by using its IRI (if known), or by using its rdf:subject,
rdf:predicate, and rdf:object (and excluding link values that are
marked as deleted).
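The naming convention and the lookup by reified triple can be sketched in Python. The dictionary layout and the function names are assumptions made for the example; in Knora the lookup is a SPARQL query over rdf:subject, rdf:predicate and rdf:object.

```python
def link_value_predicate(link_predicate):
    """Derive the link value predicate, e.g. hasAuthor -> hasAuthorValue."""
    return link_predicate + "Value"


def find_link_value(link_values, subject, predicate, obj):
    """Find the non-deleted LinkValue that reifies the given link, if any."""
    for link_value in link_values:
        reified = (link_value["subject"], link_value["predicate"], link_value["object"])
        if reified == (subject, predicate, obj) and not link_value["isDeleted"]:
            return link_value
    return None


# Hypothetical data: an old, deleted LinkValue and the current one.
link_values = [
    {"iri": "lv1", "subject": "book1", "predicate": "hasAuthor",
     "object": "person1", "isDeleted": True},
    {"iri": "lv2", "subject": "book1", "predicate": "hasAuthor",
     "object": "person1", "isDeleted": False},
]
current = find_link_value(link_values, "book1", "hasAuthor", "person1")
```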
Like other values, link values are versioned. The link value predicate
always points from the resource to the current version of the link
value, and previous versions are available only via the current
version's knora-base:previousValue predicate. Deleting a link means
deleting the triple that links the two resources, and making a new
version of the link value, marked with knora-base:isDeleted. A triple
then points from the resource to this new, deleted version (using the
link value property).
The API allows a link to be 'changed' so that it points to a different
target resource. This is implemented as follows: the existing triple
connecting the two resources is removed, and a new triple is added using
the same link property and pointing to the new target resource. A new
version of the old link's LinkValue is made, marked with
knora-base:isDeleted. A new LinkValue is made for the new link. The
new LinkValue has no connection to the old one.
When a resource contains a knora-base:TextValue with Standoff markup
that includes a reference to another resource, this reference is
materialised as a direct link between the two resources, to make it
easier to query. A special link property,
knora-base:hasStandoffLinkTo, is used for this purpose. The
corresponding link value property, knora-base:hasStandoffLinkToValue,
points to a LinkValue. This LinkValue contains a reference count,
indicated by knora-base:valueHasRefCount, that represents the number
of text values in the containing resource that include one or more
Standoff references to the specified target resource. Each time this
number changes, a new version of this LinkValue is made. When the
reference count reaches zero, the triple with
knora-base:hasStandoffLinkTo is removed, and a new version of the
LinkValue is made and marked with knora-base:isDeleted. If the same
resource reference later appears again in a text value, a new triple is
added using knora-base:hasStandoffLinkTo, and a new LinkValue is
made, with no connection to the old one.
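The life cycle of a standoff link's reference count can be modelled as below. This Python sketch is a simplification, not Knora's code: the `resource` dictionary, its keys, and `change_ref_count` are hypothetical, and deleted link value versions are kept in a separate list rather than being attached to the resource by the link value property.

```python
def change_ref_count(resource, target, delta):
    """Adjust the standoff reference count for target and return the new version."""
    current = resource["current_link_values"].get(target)
    if current is None:
        if delta <= 0:
            raise ValueError("there is no existing link to decrement")
        # First (or reappearing) reference: a fresh LinkValue with no history.
        new = {"refCount": delta, "isDeleted": False, "previous": None}
        resource["standoff_links"].add(target)
        resource["current_link_values"][target] = new
        return new
    count = current["refCount"] + delta
    if count < 0:
        raise ValueError("reference count cannot be negative")
    # Every change of the count produces a new version of the LinkValue.
    new = {"refCount": count, "isDeleted": count == 0, "previous": current}
    if count == 0:
        # The direct hasStandoffLinkTo triple is removed.
        resource["standoff_links"].discard(target)
        del resource["current_link_values"][target]
        resource["deleted_link_values"].append(new)
    else:
        resource["current_link_values"][target] = new
    return new


resource = {"standoff_links": set(), "current_link_values": {}, "deleted_link_values": []}
versions = [change_ref_count(resource, "target", d) for d in (+1, +1, -1, -1)]
removed_after_zero = "target" not in resource["standoff_links"]
reappeared = change_ref_count(resource, "target", +1)
```

Here two text values reference the target and are then removed one after the other; when the reference appears again, the new LinkValue starts a fresh history.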
For consistency, every LinkValue contains a reference count. If the
link property is not knora-base:hasStandoffLinkTo, the reference count
will always be either 1 (if the link exists) or 0 (if it has been
deleted, in which case the link value will also be marked with
knora-base:isDeleted).
When a LinkValue is created for a standoff resource reference, it is
given the same permissions as the text value containing the reference.
Design
Responsibilities of Responders
The resources responder (ResourcesResponderV1 in API v1,
ResourcesResponderV2 in API v2) has sole responsibility for generating
SPARQL to create and update resources, and the values responder
(ValuesResponderV1 or ValuesResponderV2) has sole responsibility for
generating SPARQL to create and update values. When a new resource is
created with its values, the values responder generates SPARQL
statements that can be included in the INSERT clause of a SPARQL update
to create the values, and the resources responder adds these statements
to the SPARQL update that creates the resource. This ensures that the
resource and its values are created in a single SPARQL update
operation, and hence in a single triplestore transaction.
Application-level Locking
The 'read committed' isolation level cannot prevent a scenario where two
users want to add the same data at the same time. It is possible that
both requests would do pre-update checks and simultaneously find that it
is OK to add the data, and that both updates would then succeed,
inserting redundant data and possibly violating ontology constraints.
Therefore, Knora uses short-lived, application-level write locks on
resources, to ensure that only one request at a time can update a given
resource. Before each update, the application acquires a lock on the
resource to be updated; to prevent deadlocks, Knora locks only one
resource per API operation. The application then does the pre-update
checks and the update, and finally releases the lock. The lock
implementation (in IriLocker) requires each API request message to
include a random UUID, which is generated in the API Routing package.
Using application-level locks allows us to do pre-update checks in
their own transactions, and finally to do the SPARQL update in its own
transaction.
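The idea of a short-lived, per-resource lock keyed by request UUID can be sketched in Python. This is not the IriLocker implementation: the class name, the `run_with_lock` method, the use of the UUID to identify the lock holder, and the choice to fail immediately instead of waiting are all assumptions made for this example.

```python
import threading
import uuid


class IriLockerSketch:
    """Allows one request at a time to run a task against a given IRI."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # IRI -> UUID of the request holding the lock

    def run_with_lock(self, request_id, iri, task):
        with self._guard:
            owner = self._owners.get(iri)
            if owner is not None and owner != request_id:
                raise RuntimeError(f"{iri} is locked by another request")
            reentrant = owner == request_id
            self._owners[iri] = request_id
        try:
            # Pre-update checks and the update itself would run here.
            return task()
        finally:
            # Always release, so the lock stays short-lived.
            if not reentrant:
                with self._guard:
                    del self._owners[iri]


locker = IriLockerSketch()
request_a, request_b = uuid.uuid4(), uuid.uuid4()
resource_iri = "http://rdfh.ch/c5058f3a"


def update_as_a():
    # While A holds the lock, B's attempt on the same resource is refused.
    try:
        return locker.run_with_lock(request_b, resource_iri, lambda: "b ran")
    except RuntimeError:
        return "b refused"


result_during = locker.run_with_lock(request_a, resource_iri, update_as_a)
result_after = locker.run_with_lock(request_b, resource_iri, lambda: "b ran")
```

Because the lock is released in a `finally` block, it cannot outlive the request that acquired it, which is the property that distinguishes it from the long-lived locks ruled out in the requirements.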
Ensuring Data Consistency
Knora enforces consistency constraints using three redundant mechanisms:
- By doing pre-update checks using SPARQL SELECT queries and cached ontology data.
- By doing checks in the WHERE clauses of SPARQL updates.
- By using GraphDB's built-in consistency checker (see Consistency Checking).
We take the view that redundant consistency checks are a good thing.
Pre-update checks are SPARQL SELECT queries that are executed while
holding an application-level lock on the resource to be updated. These
checks should work with any triplestore, and can return helpful,
Knora-specific error messages to the client if the request would violate
a consistency constraint.
However, the SPARQL update itself is our only chance to do pre-update
checks in the same transaction that will perform the update. The design
of the SPARQL 1.1 Update standard makes it possible to ensure that if
certain conditions are not met, the update will not be performed. In our
SPARQL update code, each update contains a WHERE clause, possibly a
DELETE clause, and an INSERT clause. The WHERE clause is executed
first. It performs consistency checks and provides values for variables
that are used in the DELETE and/or INSERT clauses. In our updates, if
the expectations of the WHERE clause are not met (e.g. because the data
to be updated does not exist), the WHERE clause should return no
results; as a result, the update will not be performed.
Regardless of whether the update changes the contents of the
triplestore, it returns nothing. If the update did nothing because the
conditions of the WHERE clause were not met, the only way to find out is
to do a SELECT afterwards. Moreover, in this case, there is no
straightforward way to find out which condition was not met. This is
one reason why Knora does pre-update checks using separate SELECT
queries and/or cached ontology data, before performing the update.
This makes it possible to return specific error messages to the user to
indicate why an update cannot be performed.
Moreover, while some checks are easy to do in a SPARQL update, others
are difficult, impractical, or impossible. Easy checks include checking
whether a resource or value exists or is deleted, and checking that the
knora-base:objectClassConstraint of a predicate matches the rdf:type
of its intended object. Cardinality checks are not very difficult, but
they perform poorly on Jena. Knora does not do permission checks in
SPARQL, because its permission-checking algorithm is too complex to be
implemented in SPARQL. Consequently, Knora's check for duplicate
values, which relies on permission checks, cannot be done in SPARQL
update code either.
In a bulk import operation, which can create a large number of resources
in a single SPARQL update, a WHERE clause can become very expensive
for the triplestore, in terms of memory as well as execution time.
Moreover, RDF4J (and hence GraphDB) uses a recursive algorithm to parse
SPARQL queries with WHERE clauses, so the size of a WHERE clause is
limited by the stack space available to the Java Virtual Machine.
Therefore, in bulk import operations, Knora uses INSERT DATA, which
does not involve a WHERE clause. Bulk imports thus rely on the first
and third of the mechanisms listed above: pre-update checks and
GraphDB's consistency checker.
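To show what an update without a WHERE clause looks like, here is a small Python sketch that assembles a SPARQL INSERT DATA request from a list of triples. The function `build_insert_data` is hypothetical and assumes that every term has already been serialised as a valid SPARQL term (an IRI in angle brackets or a quoted literal); it is not how Knora generates its bulk import updates.

```python
def build_insert_data(graph_iri, triples):
    """Build a SPARQL INSERT DATA update from already-serialised RDF terms."""
    lines = [f"        {s} {p} {o} ." for s, p, o in triples]
    return (
        "INSERT DATA {\n"
        f"    GRAPH <{graph_iri}> {{\n"
        + "\n".join(lines)
        + "\n    }\n}"
    )


KNORA_BASE = "http://www.knora.org/ontology/knora-base#"
update = build_insert_data(
    "http://www.knora.org/ontology/0803/incunabula",
    [
        ("<http://rdfh.ch/c5058f3a/values/testComment001>",
         f"<{KNORA_BASE}valueHasString>", '"Comment 1"'),
        ("<http://rdfh.ch/c5058f3a/values/testComment001>",
         f"<{KNORA_BASE}isDeleted>", "false"),
    ],
)
```

Since INSERT DATA accepts only ground triples, with no variables and no conditions, all validation has to happen before the request is built or in the triplestore's own consistency checker.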
SPARQL Update Examples
The following sample SPARQL update code is simpler than what Knora actually does. It is included here to illustrate the way Knora's SPARQL updates are structured and how concurrent updates are handled.
Finding a value IRI in a value's version history
We will need this query below. If a value is present in a resource property's version history, the query returns everything known about the value, or nothing otherwise:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix knora-base: <http://www.knora.org/ontology/knora-base#>
SELECT ?p ?o
WHERE {
    BIND(IRI("http://rdfh.ch/c5058f3a") AS ?resource)
    BIND(IRI("http://www.knora.org/ontology/0803/incunabula#book_comment") AS ?property)
    BIND(IRI("http://rdfh.ch/c5058f3a/values/testComment002") AS ?searchValue)
    ?resource ?property ?currentValue .
    ?currentValue knora-base:previousValue* ?searchValue .
    ?searchValue ?p ?o .
}
Creating the initial version of a value
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix knora-base: <http://www.knora.org/ontology/knora-base#>
WITH <http://www.knora.org/ontology/0803/incunabula>
INSERT {
    ?newValue rdf:type ?valueType ;
        knora-base:valueHasString """Comment 1""" ;
        knora-base:attachedToUser <http://rdfh.ch/users/91e19f1e01> ;
        knora-base:attachedToProject <http://rdfh.ch/projects/77275339> ;
        knora-base:hasPermissions "V knora-admin:KnownUser,knora-admin:UnknownUser|M knora-admin:ProjectMember" ;
        knora-base:valueTimestamp ?currentTime .
    ?resource ?property ?newValue .
} WHERE {
    BIND(IRI("http://rdfh.ch/c5058f3a") AS ?resource)
    BIND(IRI("http://www.knora.org/ontology/0803/incunabula#book_comment") AS ?property)
    BIND(IRI("http://rdfh.ch/c5058f3a/values/testComment001") AS ?newValue)
    BIND(IRI("http://www.knora.org/ontology/knora-base#TextValue") AS ?valueType)
    BIND(NOW() AS ?currentTime)
    # Do nothing if the resource doesn't exist.
    ?resource rdf:type ?resourceClass .
    # Do nothing if the submitted value has the wrong type.
    ?property knora-base:objectClassConstraint ?valueType .
}
To find out whether the insert succeeded, the application can use the query in Finding a value IRI in a value's version history to look for the new IRI in the property's version history.
Adding a new version of a value
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix knora-base: <http://www.knora.org/ontology/knora-base#>
WITH <http://www.knora.org/ontology/0803/incunabula>
DELETE {
    ?resource ?property ?currentValue .
} INSERT {
    ?newValue rdf:type ?valueType ;
        knora-base:valueHasString """Comment 2""" ;
        knora-base:previousValue ?currentValue ;
        knora-base:attachedToUser <http://rdfh.ch/users/91e19f1e01> ;
        knora-base:attachedToProject <http://rdfh.ch/projects/77275339> ;
        knora-base:hasPermissions "V knora-admin:KnownUser,knora-admin:UnknownUser|M knora-admin:ProjectMember" ;
        knora-base:valueTimestamp ?currentTime .
    ?resource ?property ?newValue .
} WHERE {
    BIND(IRI("http://rdfh.ch/c5058f3a") AS ?resource)
    BIND(IRI("http://rdfh.ch/c5058f3a/values/testComment001") AS ?currentValue)
    BIND(IRI("http://rdfh.ch/c5058f3a/values/testComment002") AS ?newValue)
    BIND(IRI("http://www.knora.org/ontology/knora-base#TextValue") AS ?valueType)
    BIND(NOW() AS ?currentTime)
    # Do nothing if ?currentValue is no longer the resource's current version.
    ?resource ?property ?currentValue .
    # Do nothing if the submitted value has the wrong type.
    ?property knora-base:objectClassConstraint ?valueType .
}
The update request must contain the IRI of the most recent version of
the value (http://rdfh.ch/c5058f3a/values/testComment001). If this is
not in fact the most recent version (because someone else has done an
update), this operation will do nothing (because the WHERE clause will
return no rows). To find out whether the update succeeded, the
application will then need to run the SELECT query in
Finding a value IRI in a value's version history.
In the case of concurrent updates, there are two possibilities:
- Users A and B are looking at version 1. User A submits an update and it succeeds, creating version 2, which user A verifies using a SELECT. User B then submits an update to version 1 but it fails, because version 1 is no longer the latest version. User B's SELECT will find that user B's new value IRI is absent from the value's version history.
- Users A and B are looking at version 1. User A submits an update and it succeeds, creating version 2. Before user A has time to do a SELECT, user B reads the new value and updates it again. Both users then do a SELECT, and find that both their new value IRIs are present in the value's version history.
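Both scenarios can be replayed with a small in-memory model. The following Python sketch only imitates the behaviour of the SPARQL update and of the version-history SELECT; the `store` layout and the names `conditional_update`, `in_history` and `fresh_store` are invented for the example.

```python
def conditional_update(store, resource, expected_current, new_value):
    """Mimic the SPARQL update: silently do nothing if the WHERE pattern fails."""
    if store["current"].get(resource) == expected_current:
        store["previous"][new_value] = expected_current
        store["current"][resource] = new_value
    # Like a SPARQL update, this returns nothing either way.


def in_history(store, resource, value):
    """Mimic the version-history SELECT: is value reachable from the current one?"""
    node = store["current"].get(resource)
    while node is not None:
        if node == value:
            return True
        node = store["previous"].get(node)
    return False


def fresh_store():
    return {"current": {"r": "v1"}, "previous": {"v1": None}}


# Scenario 1: B's update is based on the stale version v1 and has no effect.
s1 = fresh_store()
conditional_update(s1, "r", "v1", "v2a")  # user A
conditional_update(s1, "r", "v1", "v2b")  # user B, stale
scenario_1 = (in_history(s1, "r", "v2a"), in_history(s1, "r", "v2b"))

# Scenario 2: B reads A's new version first, so both updates succeed.
s2 = fresh_store()
conditional_update(s2, "r", "v1", "v2a")   # user A
conditional_update(s2, "r", "v2a", "v3b")  # user B, up to date
scenario_2 = (in_history(s2, "r", "v2a"), in_history(s2, "r", "v3b"))
```

In the first scenario only user A's IRI is found in the history; in the second, both are found, with user B's version on top of user A's.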
Getting all versions of a value
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix knora-base: <http://www.knora.org/ontology/knora-base#>
SELECT ?value ?valueTimestamp ?previousValue
WHERE {
    BIND(IRI("http://rdfh.ch/c5058f3a") AS ?resource)
    BIND(IRI("http://www.knora.org/ontology/0803/incunabula#book_comment") AS ?property)
    BIND(IRI("http://rdfh.ch/c5058f3a/values/testComment002") AS ?currentValue)
    ?resource ?property ?currentValue .
    ?currentValue knora-base:previousValue* ?value .
    OPTIONAL {
        ?value knora-base:valueTimestamp ?valueTimestamp .
    }
    OPTIONAL {
        ?value knora-base:previousValue ?previousValue .
    }
}
This assumes that we know the current version of the value. If the version we have is not actually the current version, this query will return no rows.