Gravsearch: Virtual Graph Search
Basic Concept
Gravsearch is intended to offer the advantages of SPARQL endpoints (particularly the ability to perform queries using complex search criteria) while avoiding their drawbacks in terms of performance and security (see The Enduring Myth of the SPARQL Endpoint). It also has the benefit of enabling clients to work with a simpler RDF data model than the one the API actually uses to store data in the triplestore, and makes it possible to provide better error-checking.
Rather than being processed directly by the triplestore, a Gravsearch query is interpreted by the API, which enforces certain restrictions on the query, and implements paging and permission checking. The API server generates SPARQL based on the Gravsearch query submitted, queries the triplestore, filters the results according to the user's permissions, and returns each page of query results as an API response. Thus, Gravsearch is a hybrid between a RESTful API and a SPARQL endpoint.
A Gravsearch query conforms to a subset of the syntax of a SPARQL
CONSTRUCT query, with
some additional restrictions and functionality. In particular, the
variable representing the top-level (or 'main') resource that will
appear in each search result must be identified, statements must be
included to specify the types of the entities being queried, OFFSET is
used to control paging, and ORDER BY
is used to sort the results.
It is certainly possible to write Gravsearch queries by hand, but we expect that in general, they will be automatically generated by client software, e.g. by a client user interface.
For a more detailed overview of Gravsearch, see Gravsearch: Transforming SPARQL to query humanities data.
Submitting Gravsearch Queries
The recommended way to submit a Gravsearch query is via HTTP POST:
HTTP POST to http://host/v2/searchextended
This works like query via POST directly in the SPARQL 1.1 Protocol: the query is sent unencoded as the HTTP request message body, in the UTF-8 charset.
It is also possible to submit a Gravsearch query using HTTP GET. The entire query must be URL-encoded and included as the last element of the URL path:
HTTP GET to http://host/v2/searchextended/QUERY
The response to a Gravsearch query is an RDF graph, which can be requested in various formats (see Responses Describing Resources).
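As a minimal sketch of the two submission styles described above, the following Python code builds (but does not send) the corresponding requests using only the standard library. The helper names and the `Content-Type` value are assumptions made for illustration; they are not part of the documented API.

```python
from urllib.parse import quote
from urllib.request import Request

SEARCH_PATH = "/v2/searchextended"

# Example query assembled from fragments used elsewhere in this document.
EXAMPLE_QUERY = """PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
    ?book knora-api:isMainResource true .
    ?book incunabula:title ?title .
} WHERE {
    ?book a incunabula:book .
    ?book incunabula:title ?title .
}
OFFSET 0"""


def build_post_request(host: str, query: str) -> Request:
    """Build a POST request whose body is the unencoded query in UTF-8."""
    return Request(
        url=host.rstrip("/") + SEARCH_PATH,
        data=query.encode("utf-8"),
        # Assumption: this media type mirrors "query via POST directly" in the
        # SPARQL 1.1 Protocol; check what your server actually expects.
        headers={"Content-Type": "application/sparql-query; charset=utf-8"},
        method="POST",
    )


def build_get_url(host: str, query: str) -> str:
    """Build a GET URL with the URL-encoded query as the last path element."""
    # safe="" also encodes "/" so the query stays a single path element.
    return host.rstrip("/") + SEARCH_PATH + "/" + quote(query, safe="")
```

A request built this way could then be sent with `urllib.request.urlopen`; POST is usually the better choice for long queries, since it avoids URL length limits.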
To request the number of results rather than the results themselves, you can do a count query:
HTTP POST to http://host/v2/searchextended/count
The response to a count query request is an object with one predicate, http://schema.org/numberOfItems, with an integer value.
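A client might read the count from such a response as sketched below. This assumes the response is requested as JSON-LD; the function name is hypothetical, and the compacted key `schema:numberOfItems` is an assumption about the JSON-LD context rather than something this document specifies.

```python
import json

NUMBER_OF_ITEMS = "http://schema.org/numberOfItems"


def number_of_items(response_body: str) -> int:
    """Extract the result count from a count-query response (JSON-LD text)."""
    doc = json.loads(response_body)
    # Assumption: the predicate may appear as a full IRI or compacted with a
    # "schema" prefix, depending on the JSON-LD context of the response.
    for key in (NUMBER_OF_ITEMS, "schema:numberOfItems"):
        if key in doc:
            value = doc[key]
            # Tolerate a JSON-LD value object such as {"@value": 402}.
            if isinstance(value, dict):
                value = value["@value"]
            return int(value)
    raise KeyError(NUMBER_OF_ITEMS)
```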
If a Gravsearch query times out, a 504 Gateway Timeout will be returned.
Gravsearch and API Schemas
A Gravsearch query can be written in either of the two DSP-API v2 schemas. The simple schema is easier to work with, and is sufficient if you don't need to query anything below the level of a DSP-API value. If your query needs to refer to standoff markup, you must use the complex schema. Each query must use a single schema, with one exception (see Date Comparisons).
Gravsearch query results can be requested in the simple or complex schema; see API Schema.
All examples hereafter run with the DSP stack started locally as documented in the section Getting Started with DSP-API. If you access another stack, you can check the IRI of the ontology you are targeting by requesting the ontologies metadata.
Using the Simple Schema
To write a query in the simple schema, use the knora-api
ontology in
the simple schema, and use the simple schema for any other DSP ontologies
the query refers to, e.g.:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
In the simple schema, DSP-API values are represented as literals, which can be used in FILTER expressions
(see Filtering on Values in the Simple Schema).
Using the Complex Schema
To write a query in the complex schema, use the knora-api
ontology in
the complex schema, and use the complex schema for any other DSP ontologies
the query refers to, e.g.:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/v2#>
In the complex schema, DSP-API values are represented as objects belonging
to subclasses of knora-api:Value, e.g. knora-api:TextValue, and have
predicates of their own, which can be used in FILTER expressions
(see Filtering on Values in the Complex Schema).
Main and Dependent Resources
The main resource is the top-level resource in a search result. Other
resources that are in some way connected to the main resource are
referred to as dependent resources. If the client asks for a resource A
relating to a resource B, then all matches for A will be presented as
main resources and those for B as dependent resources. The main resource
must be represented by a variable, marked with knora-api:isMainResource,
as explained under CONSTRUCT Clause.
Virtual Incoming Links
Depending on the ontology design, a resource A points to B or vice versa.
For example, a page A is part of a book B using the property incunabula:partOf.
If A is marked as the main resource, then B is nested as a dependent resource
in its link value incunabula:partOfValue. But if B is marked as the main resource,
B does not have a link value pointing to A, because in fact B is pointed to by A.
Instead, B has a virtual property knora-api:hasIncomingLink
containing A's link value:
"knora-api:hasIncomingLinkValue" : {
"@id" : "http://rdfh.ch/A/values/xy",
"@type" : "knora-api:LinkValue",
"knora-api:linkValueHasSource" : {
"@id" : "http://rdfh.ch/A",
"@type" : "incunabula:page",
"incunabula:partOfValue" : {
"@id" : "http://rdfh.ch/A/values/xy",
"@type" : "knora-api:LinkValue",
"knora-api:linkValueHasTargetIri" : {
"@id" : "http://rdfh.ch/B"
}
}
}
},
Note that the virtually inserted link value inverts the relation by using knora-api:linkValueHasSource.
The source of the link is A, and its target B is represented only by an IRI (knora-api:linkValueHasTargetIri),
since B is the main resource.
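To illustrate how a client could read this structure, here is a minimal sketch that collects the IRIs of the resources pointing at a main resource. It assumes the response has been parsed from JSON-LD into Python dictionaries using the compacted keys shown above; the function name is hypothetical, and handling both a single object and a list is a precaution, not documented behaviour.

```python
def incoming_link_sources(resource: dict) -> list[str]:
    """Return the IRIs of resources that link to the given main resource."""
    values = resource.get("knora-api:hasIncomingLinkValue", [])
    # JSON-LD may serialise a single value as an object rather than a list.
    if isinstance(values, dict):
        values = [values]
    return [
        value["knora-api:linkValueHasSource"]["@id"]
        for value in values
        if "knora-api:linkValueHasSource" in value
    ]
```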
Graph Patterns and Result Graphs
The WHERE clause of a Gravsearch query specifies a graph pattern. Each query result will match this graph pattern, and will have the form of a graph whose starting point is a main resource. The query's graph pattern, and hence each query result graph, can span zero or more levels of relations between resources. For example, a query could request regions in images on pages of books written by a certain author, articles by authors who were students of a particular professor, or authors of texts that refer to events that took place within a certain date range.
Permission Checking
Each matching resource is returned with the values that the user has permission to see. If the user does not have permission to see a matching main resource, it is hidden in the results. If a user does not have permission to see a matching dependent resource, the link value is hidden.
Paging
Gravsearch results are returned in pages. The maximum number of main
resources per page is determined by the API (and can be configured
in application.conf via the setting app/v2/resources-sequence/results-per-page).
If some resources have been filtered out because the user does not have
permission to see them, a page could contain fewer results, or no results.
If it is possible that more results are available in subsequent pages,
the Gravsearch response will contain the predicate knora-api:mayHaveMoreResults
with the boolean value true; otherwise it will not contain this predicate.
Therefore, to retrieve all available results, the client must request each page
one at a time, until the response does not contain knora-api:mayHaveMoreResults.
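The paging loop described above can be sketched as follows. This is a minimal illustration, not a client library: `fetch_page` is a hypothetical callable that submits a query string and returns the parsed JSON-LD response as a dictionary, and the key `knora-api:mayHaveMoreResults` assumes that the response compacts the predicate with the `knora-api` prefix.

```python
from typing import Callable, Iterator

MAY_HAVE_MORE = "knora-api:mayHaveMoreResults"  # assumed compacted key


def iter_pages(fetch_page: Callable[[str], dict], query: str) -> Iterator[dict]:
    """Yield each page of results for a query that has no OFFSET clause yet."""
    offset = 0
    while True:
        # OFFSET is a page index here, not a number of triples.
        page = fetch_page(f"{query}\nOFFSET {offset}")
        yield page
        # Stop as soon as the predicate is absent from the response.
        if page.get(MAY_HAVE_MORE) is not True:
            break
        offset += 1
```

The loop deliberately relies on the predicate rather than on page size, because a page can be short or empty after permission filtering while later pages still hold results.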
Inference
Gravsearch queries are understood to imply a subset of RDFS reasoning. This is done by the API by expanding the incoming query.
Specifically, if a statement pattern specifies a property, the pattern will
also match subproperties of that property, and if a statement specifies that
a subject has a particular rdf:type, the statement will also match subjects
belonging to subclasses of that type.
If you know that reasoning will not return any additional results for
your query, you can disable it by adding this line to the WHERE clause,
which may improve query performance:
knora-api:GravsearchOptions knora-api:useInference false .
Gravsearch Syntax
Every Gravsearch query is a valid SPARQL 1.1 CONSTRUCT query. However, Gravsearch supports only a subset of the elements that can be used in a SPARQL CONSTRUCT query, and a Gravsearch CONSTRUCT clause has to indicate which variable is to be used for the main resource in each search result.
Supported SPARQL Syntax
The current version of Gravsearch accepts CONSTRUCT queries whose WHERE clauses use the following patterns, with the specified restrictions:
- OPTIONAL: cannot be nested in a UNION.
- UNION: cannot be nested in a UNION.
- FILTER: may contain a complex expression using the Boolean operators AND and OR, as well as comparison operators. The left argument of a comparison operator must be a query variable. A Knora ontology entity IRI used in a FILTER must be a property IRI.
- FILTER NOT EXISTS
- MINUS
- OFFSET: the OFFSET is needed for paging. It does not actually refer to the number of triples to be returned, but to the requested page of results. The default value is 0, which refers to the first page of results.
- ORDER BY: In SPARQL, the result of a CONSTRUCT query is an unordered set of triples. However, a Gravsearch query returns an ordered list of resources, which can be ordered by the values of specified properties. If the query is written in the complex schema, items below the level of DSP-API values may not be used in ORDER BY.
- BIND: The value assigned must be a DSP resource IRI.
Resources, Properties, and Values
Resources can be represented either by an IRI or by a variable, except for the main resource, which must be represented by a variable.
It is possible to do a Gravsearch query in which the IRI of the main resource
is already known, e.g. to request specific information about that resource and
perhaps about linked resources. In this case, the IRI of the main resource must
be assigned to a variable using BIND. Note that BIND statements slow the query down,
so we recommend that you do not use them unless you have to.
Properties can be represented by an IRI or a query variable. If a
property is represented by a query variable, it can be restricted to
certain property IRIs using a FILTER.
A Knora value (i.e. a value attached to a knora-api:Resource)
must be represented as a query variable.
Filtering on Values
Filtering on Values in the Simple Schema
In the simple schema, a variable representing a DSP-API value can be used
directly in a FILTER
expression. For example:
?book incunabula:title ?title .
FILTER(?title = "Zeitglöcklein des Lebens und Leidens Christi")
Here the type of ?title is xsd:string.
The following value types can be compared with literals in FILTER
expressions in the simple schema:
- Text values (xsd:string)
- URI values (xsd:anyURI)
- Integer values (xsd:integer)
- Decimal values (xsd:decimal)
- Boolean values (xsd:boolean)
- Date values (knora-api:Date)
- List values (knora-api:ListNode)
List values can only be searched for using the equal operator (=),
performing an exact match on a list node's label. Labels can be given in different languages for a specific list node.
If one of the given list node labels matches, it is considered a match.
Note that in the simple schema, uniqueness is not guaranteed (as opposed to the complex schema).
A DSP-API value may not be represented as the literal object of a predicate; for example, this is not allowed:
?book incunabula:title "Zeitglöcklein des Lebens und Leidens Christi" .
Filtering on Values in the Complex Schema
In the complex schema, variables representing DSP-API values are not literals. You must add something to the query (generally a statement) to get a literal from a DSP-API value. For example:
?book incunabula:title ?title .
?title knora-api:valueAsString "Zeitglöcklein des Lebens und Leidens Christi" .
Here the type of ?title is knora-api:TextValue. Note that no FILTER is needed
in this example. But if you want to use a different comparison operator,
you need a FILTER:
?page incunabula:seqnum ?seqnum .
?seqnum knora-api:intValueAsInt ?seqnumInt .
FILTER(?seqnumInt <= 10)
To match a date value in the complex schema, you must use the
knora-api:toSimpleDate function in a FILTER (see Date Comparisons). The predicates of
knora-api:DateValue (knora-api:dateValueHasStartYear, etc.) are not
available in Gravsearch.
Date Comparisons
In the simple schema, you can compare a date value directly with a knora-api:Date in a FILTER:
?book incunabula:pubdate ?pubdate .
FILTER(?pubdate < "JULIAN:1497"^^knora-api:Date)
In the complex schema, you must use the function knora-api:toSimpleDate,
passing it the variable representing the date value. The date literal used
in the comparison must still be a knora-api:Date in the simple schema.
This is the only case in which you can use both schemas in a single query:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX knora-api-simple: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?book knora-api:isMainResource true .
?book incunabula:pubdate ?pubdate .
} WHERE {
?book a incunabula:book .
?book incunabula:pubdate ?pubdate .
FILTER(knora-api:toSimpleDate(?pubdate) < "JULIAN:1497"^^knora-api-simple:Date)
} ORDER BY ?pubdate
You can also use knora-api:toSimpleDate to search for date tags in standoff
text markup (see Matching Standoff Dates).
Note that the given date value for comparison must have the following format:
```
(GREGORIAN|JULIAN|ISLAMIC):\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?(:\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?)?
```
E.g. an exact date like GREGORIAN:2015-12-03 or a period like GREGORIAN:2015-12-03:2015-12-04.
Dates may also have month or year precision, e.g. ISLAMIC:1407-02
(the whole second month of the year 1407) or JULIAN:1330
(the whole year 1330). An optional era indicator (BCE, CE, or BC, AD) can be added to the date; when no
era is provided, the default era AD is assumed. An era can be given for a single date, as in GREGORIAN:1220 BC,
or in a range, as in GREGORIAN:600 BC:480 BC.
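A client can check a date literal against this format before sending a query. The sketch below reuses the pattern given above unchanged; the function name is hypothetical. It checks syntax only, so a string such as GREGORIAN:2015-13-45 passes even though it is not a real calendar date.

```python
import re

# The pattern from the text, split over several lines for readability.
DATE_PATTERN = re.compile(
    r"(GREGORIAN|JULIAN|ISLAMIC):"
    r"\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?"
    r"(:\d{1,4}(-\d{1,2}(-\d{1,2})?)?( BC| AD| BCE| CE)?)?"
)


def is_well_formed_date(text: str) -> bool:
    """Return True if the whole string matches the documented date format."""
    return DATE_PATTERN.fullmatch(text) is not None
```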
Searching for Matching Words
The function knora-api:matchText searches for matching words anywhere in a
text value, and is implemented using a full-text search index if available.
The first argument must represent a text value (a knora-api:TextValue in
the complex schema, or an xsd:string in the simple schema). The second
argument is a string literal containing the words to be matched, separated by spaces.
The function supports the Lucene Query Parser syntax.
Note that Lucene's default operator is a logical OR when submitting several search terms.
This function can only be used as the top-level expression in a FILTER.
For example, to search for titles that contain the words 'Zeitglöcklein' and 'Lebens':
?book incunabula:title ?title .
FILTER knora-api:matchText(?title, "Zeitglöcklein Lebens")
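Because the default operator is OR, a client that wants every word to be present may prefer to join the terms with Lucene's explicit AND operator. The following sketch generates such a FILTER line; the helper names are hypothetical, and it is an assumption that the deployed full-text index honours AND in the same way as the standard Lucene query parser. Lucene special characters inside the words are not escaped here.

```python
def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def match_text_filter(variable: str, words: list[str], require_all: bool = True) -> str:
    """Build a FILTER line that calls knora-api:matchText on the given variable."""
    # " AND " requires every word; a plain space falls back to the default OR.
    separator = " AND " if require_all else " "
    return f"FILTER knora-api:matchText({variable}, {sparql_string(separator.join(words))})"
```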
Filtering Text by Language
To filter a text value by language in the simple schema, use the SPARQL lang function
on the text value, e.g.:
FILTER(lang(?text) = "fr")
In the complex schema, the lang function is not supported. Use the text
value's knora-api:textValueHasLanguage predicate instead:
?text knora-api:textValueHasLanguage "fr" .
Regular Expressions
The SPARQL regex function is supported. In the simple schema, you can use it directly on the text value,
e.g.
?book incunabula:title ?title .
FILTER regex(?title, "Zeit", "i")
In the complex schema, use it on the object of the text value's
knora-api:valueAsString predicate:
?book incunabula:title ?title .
?title knora-api:valueAsString ?titleStr .
FILTER regex(?titleStr, "Zeit", "i")
Searching for Text Markup
To refer to standoff markup in text values, you must write your query in the complex schema.
A knora-api:TextValue can have the property
knora-api:textValueHasStandoff, whose objects are the standoff markup
tags in the text. You can match the tags you're interested in using
rdf:type or other properties of each tag.
Matching Text in a Standoff Tag
The function knora-api:matchTextInStandoff searches for standoff tags containing certain terms.
The implementation is optimised using the full-text search index if available. The
function takes three arguments:
- A variable representing a text value.
- A variable representing a standoff tag.
- A string literal containing space-separated search terms.
This function can only be used as the top-level expression in a FILTER.
For example:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:hasText ?text .
} WHERE {
?letter a beol:letter .
?letter beol:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffParagraphTag .
?standoffParagraphTag a standoff:StandoffParagraphTag .
FILTER knora-api:matchTextInStandoff(?text, ?standoffParagraphTag, "Grund Richtigkeit")
}
Here we are looking for letters containing the words "Grund" and "Richtigkeit" within a single paragraph.
Matching Standoff Links
If you are only interested in specifying that a resource has some text
value containing a standoff link to another resource, the most efficient
way is to use the property knora-api:hasStandoffLinkTo, whose subjects and objects
are resources. This property is automatically maintained by the API. For example:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:hasText ?text .
} WHERE {
?letter a beol:letter .
?letter beol:hasText ?text .
?letter knora-api:hasStandoffLinkTo ?person .
?person a beol:person .
?person beol:hasIAFIdentifier ?iafIdentifier .
?iafIdentifier knora-api:valueAsString "(VIAF)271899510" .
}
Here we are looking for letters containing a link to the historian
Claude Jordan, who is identified by his Integrated Authority File
identifier, (VIAF)271899510.
However, if you need to specify the context in which the link tag occurs, you must
use the function knora-api:standoffLink. It takes three arguments:
- A variable or IRI representing the resource that is the source of the link.
- A variable representing the standoff link tag.
- A variable or IRI representing the resource that is the target of the link.
This function can only be used as the top-level expression in a FILTER.
For example:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:hasText ?text .
} WHERE {
?letter a beol:letter .
?letter beol:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffLinkTag .
?standoffLinkTag a knora-api:StandoffLinkTag .
FILTER knora-api:standoffLink(?letter, ?standoffLinkTag, ?person)
?person a beol:person .
?person beol:hasIAFIdentifier ?iafIdentifier .
?iafIdentifier knora-api:valueAsString "(VIAF)271899510" .
?standoffLinkTag knora-api:standoffTagHasStartParent ?standoffItalicTag .
?standoffItalicTag a standoff:StandoffItalicTag .
}
This has the same effect as the previous example, except that because we are matching
the link tag itself, we can specify that its immediate parent is a StandoffItalicTag.
If you actually want to get the target of the link (in this example, ?person)
in the search results, you need to add a statement like
?letter knora-api:hasStandoffLinkTo ?person . to the WHERE clause and to the
CONSTRUCT clause:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:hasText ?text .
?letter knora-api:hasStandoffLinkTo ?person .
} WHERE {
?letter a beol:letter .
?letter beol:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffLinkTag .
?standoffLinkTag a knora-api:StandoffLinkTag .
FILTER knora-api:standoffLink(?letter, ?standoffLinkTag, ?person)
?person a beol:person .
?person beol:hasIAFIdentifier ?iafIdentifier .
?iafIdentifier knora-api:valueAsString "(VIAF)271899510" .
?standoffLinkTag knora-api:standoffTagHasStartParent ?standoffItalicTag .
?standoffItalicTag a standoff:StandoffItalicTag .
?letter knora-api:hasStandoffLinkTo ?person .
}
Matching Standoff Dates
You can use the knora-api:toSimpleDate function (see Date Comparisons)
to match dates in standoff date tags, i.e. instances of knora-api:StandoffDateTag or
of one of its subclasses. For example, here we are looking for a text containing
an anything:StandoffEventTag (which is a project-specific subclass of knora-api:StandoffDateTag)
representing an event that occurred sometime during the month of December 2016:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
PREFIX knora-api-simple: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?thing knora-api:isMainResource true .
?thing anything:hasText ?text .
} WHERE {
?thing a anything:Thing .
?thing anything:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffEventTag .
?standoffEventTag a anything:StandoffEventTag .
FILTER(knora-api:toSimpleDate(?standoffEventTag) = "GREGORIAN:2016-12 CE"^^knora-api-simple:Date)
}
Matching Ancestor Tags
Suppose we want to search for a standoff date in a paragraph, but we know
that the paragraph tag might not be the immediate parent of the date tag.
For example, the date tag might be in an italics tag, which is in a paragraph
tag. In that case, we can use the inferred property
knora-api:standoffTagHasStartAncestor. We can modify the previous example to
do this:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX standoff: <http://api.knora.org/ontology/standoff/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
PREFIX knora-api-simple: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?thing knora-api:isMainResource true .
?thing anything:hasText ?text .
} WHERE {
?thing a anything:Thing .
?thing anything:hasText ?text .
?text knora-api:textValueHasStandoff ?standoffDateTag .
?standoffDateTag a knora-api:StandoffDateTag .
FILTER(knora-api:toSimpleDate(?standoffDateTag) = "GREGORIAN:2016-12-24 CE"^^knora-api-simple:Date)
?standoffDateTag knora-api:standoffTagHasStartAncestor ?standoffParagraphTag .
?standoffParagraphTag a standoff:StandoffParagraphTag .
}
Filtering on rdfs:label
The rdfs:label of a resource is not a DSP-API value, but you can still search for it.
This can be done in the same ways in the simple or complex schema:
Using a string literal object:
?book rdfs:label "Zeitglöcklein des Lebens und Leidens Christi" .
Using a variable and a FILTER:
?book rdfs:label ?label .
FILTER(?label = "Zeitglöcklein des Lebens und Leidens Christi")
Using the regex function:
?book rdfs:label ?bookLabel .
FILTER regex(?bookLabel, "Zeit", "i")
To match words in an rdfs:label using the full-text search index, use the
knora-api:matchLabel function, which works like knora-api:matchText,
except that the first argument is a variable representing a resource:
FILTER knora-api:matchLabel(?book, "Zeitglöcklein")
Filtering on Resource IRIs
A FILTER can compare a variable with another variable or IRI
representing a resource. For example, to find a letter whose
author and recipient are different persons:
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:hasAuthor ?person1 .
?letter beol:hasRecipient ?person2 .
} WHERE {
?letter a beol:letter .
?letter beol:hasAuthor ?person1 .
?letter beol:hasRecipient ?person2 .
FILTER(?person1 != ?person2) .
}
OFFSET 0
To find a letter whose author is not a person with a specified IRI:
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:hasAuthor ?person1 .
?letter beol:hasRecipient ?person2 .
} WHERE {
?letter a beol:letter .
?letter beol:hasAuthor ?person1 .
?letter beol:hasRecipient ?person2 .
FILTER(?person1 != <http://rdfh.ch/0801/F4n1xKa3TCiR4llJeElAGA>) .
}
OFFSET 0
CONSTRUCT Clause
In the CONSTRUCT clause of a Gravsearch query, the variable representing the
main resource must be indicated with knora-api:isMainResource true. Exactly
one variable representing a resource must be marked in this way.
Any other statements in the CONSTRUCT clause must also be present in the WHERE
clause. If a variable representing a resource or value is used in the WHERE
clause but not in the CONSTRUCT clause, the matching resources or values
will not be included in the results.
If the query is written in the complex schema, all variables in the CONSTRUCT
clause must refer to DSP-API resources, DSP-API values, or properties. Data below
the level of values may not be mentioned in the CONSTRUCT clause.
Predicates from the rdf, rdfs, and owl ontologies may not be used
in the CONSTRUCT clause. The rdfs:label of each matching resource is always
returned, so there is no need to mention it in the query.
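Since Gravsearch queries are expected to be generated by client software, the rules above can be built into a small query builder. The sketch below is a hypothetical helper, not part of any DSP client library: it marks exactly one main resource and repeats every returned statement in the WHERE clause, so the CONSTRUCT clause cannot mention anything that the WHERE clause lacks. It assumes the caller supplies a `knora-api` prefix and well-formed statements.

```python
def build_gravsearch(
    prefixes: dict[str, str],
    main_var: str,
    returned: list[str],
    where_only: list[str],
    offset: int = 0,
) -> str:
    """Assemble a Gravsearch query from prefix IRIs and statement strings."""
    if "knora-api" not in prefixes:
        raise ValueError("a knora-api prefix is required to mark the main resource")
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    # Exactly one variable is marked as the main resource.
    construct = [f"{main_var} knora-api:isMainResource true ."] + returned
    # Every returned statement is repeated in WHERE; restrictions that should
    # not appear in the results go into WHERE only.
    where = returned + where_only
    body = (
        ["CONSTRUCT {"]
        + ["    " + line for line in construct]
        + ["} WHERE {"]
        + ["    " + line for line in where]
        + ["}", f"OFFSET {offset}"]
    )
    return "\n".join(prefix_lines + body)
```

For example, passing `returned=["?book incunabula:title ?title ."]` and `where_only=["?book a incunabula:book ."]` yields a query that returns titles while using the type statement only as a restriction.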
Gravsearch by Example
In this section, we provide some sample queries of different complexity to illustrate the usage of Gravsearch.
Getting All the Components of a Compound Resource
In order to get all the components of a compound resource, the following Gravsearch query can be sent to the API.
In this case, the compound resource is an incunabula:book identified
by the IRI http://rdfh.ch/0803/c5058f3a and the components are of
type incunabula:page (test data for the Incunabula project). Since
inference is assumed, we can use knora-api:StillImageRepresentation
(incunabula:page is one of its subclasses). This makes the query more
generic and allows for reuse (for instance, a client may want to query
different types of compound resources defined in different ontologies).
ORDER BY is used to sort the components by their sequence number.
OFFSET is set to 0 to get the first page of results.
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?component knora-api:isMainResource true . # marking of the component searched for as the main resource, required
?component knora-api:seqnum ?seqnum . # return the sequence number in the response
?component knora-api:hasStillImageFileValue ?file . # return the StillImageFile in the response
} WHERE {
?component a knora-api:StillImageRepresentation . # restriction of the type of component
?component knora-api:isPartOf <http://rdfh.ch/0803/c5058f3a> . # component relates to a compound resource via this property
?component knora-api:seqnum ?seqnum . # component must have a sequence number
?component knora-api:hasStillImageFileValue ?file . # component must have a StillImageFile
}
ORDER BY ASC(?seqnum) # order by sequence number, ascending
OFFSET 0 # get first page of results
The incunabula:book with the IRI http://rdfh.ch/0803/c5058f3a has
402 pages. (This result can be obtained by doing a count query; see
Submitting Gravsearch Queries.)
However, with OFFSET 0, only the first page of results is returned.
The same query can be sent again with OFFSET 1 to get the next page of
results, and so forth. When a page of results is not full (see settings
in app/v2 in application.conf) or is empty, no more results are
available.
By design, it is not possible for the client to get more than one page of results at a time; this is intended to prevent performance problems that would be caused by huge responses. A client that wants to download all the results of a query must request each page sequentially.
Let's assume the client is not interested in all of the book's pages,
but just in the first ten of them. In that case, the sequence number can be
restricted using a FILTER that is added to the query's WHERE clause:
FILTER (?seqnum <= 10)
The first page starts with sequence number 1, so with this FILTER only
the first ten pages are returned.
This query would be exactly the same in the complex schema, except for
the expansion of the knora-api prefix:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
Traversing Multiple Links
Here we are looking for regions of pages that are part of books that have a particular title. In the simple schema:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?region knora-api:isMainResource true ;
knora-api:isRegionOf ?page .
?page incunabula:partOf ?book .
?book incunabula:title ?title .
} WHERE {
?region a knora-api:Region ;
knora-api:isRegionOf ?page .
?page a incunabula:page ;
incunabula:partOf ?book .
?book incunabula:title ?title .
FILTER(?title = "Zeitglöcklein des Lebens und Leidens Christi")
}
In the complex schema:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
CONSTRUCT {
?region knora-api:isMainResource true ;
knora-api:isRegionOf ?page .
?page incunabula:partOf ?book .
?book incunabula:title ?title .
} WHERE {
?region a knora-api:Region ;
knora-api:isRegionOf ?page .
?page a incunabula:page ;
incunabula:partOf ?book .
?book incunabula:title ?title .
?title knora-api:valueAsString "Zeitglöcklein des Lebens und Leidens Christi" .
}
If we remove the line ?book incunabula:title ?title . from the CONSTRUCT
clause, so that the CONSTRUCT clause no longer mentions ?title, the response
will contain the same matching resources, but the titles of those resources
will not be included in the response.
Requesting a Graph Starting with a Known Resource
Here the IRI of the main resource is already known, and we want specific information
about it, as well as about related resources. In this case, the IRI of the main
resource must be assigned to a variable using BIND:
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true ;
beol:creationDate ?date ;
?linkingProp1 ?person1 .
?person1 beol:hasFamilyName ?familyName .
} WHERE {
BIND(<http://rdfh.ch/0801/_B3lQa6tSymIq7_7SowBsA> AS ?letter)
?letter a beol:letter ;
beol:creationDate ?date ;
?linkingProp1 ?person1 .
FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient)
?person1 beol:hasFamilyName ?familyName .
} ORDER BY ?date
This query would be the same in the complex schema, except for the prefix expansions:
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
Searching for a List Value Referring to a Particular List Node
Since list nodes are represented by their IRI in the complex schema, uniqueness is guaranteed (as opposed to the simple schema). Also, all the subnodes of the given list node are considered a match.
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
CONSTRUCT {
?thing knora-api:isMainResource true .
?thing anything:hasListItem ?listItem .
} WHERE {
?thing anything:hasListItem ?listItem .
?listItem knora-api:listValueAsListNode <http://rdfh.ch/lists/0001/treeList02> .
}
Type Inference
Gravsearch needs to be able to determine the types of the entities that
query variables and IRIs refer to in the WHERE clause. In most cases, it can
infer these from context and from the ontologies used. In particular, it needs to
know:
- The type of the subject and object of each statement.
- The type that is expected as the object of each predicate.
Type Annotations
When one or more types cannot be inferred, Gravsearch will return an error message indicating the entities for which it could not determine types. The missing information must then be given by adding type annotations to the query. This can always be done by adding statements with the predicate rdf:type. The subject must be a resource or value, and the object must either be knora-api:Resource (if the subject is a resource) or the subject's specific type (if it is a value).
For example, consider this query that uses a non-DSP property:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX dcterms: <http://purl.org/dc/terms/>
CONSTRUCT {
?book knora-api:isMainResource true ;
dcterms:title ?title .
} WHERE {
?book dcterms:title ?title .
}
This produces the error message:
The types of one or more entities could not be determined:
?book, <http://purl.org/dc/terms/title>, ?title
To solve this problem, it is enough to specify the types of ?book and ?title; the type of the expected object of dcterms:title can then be inferred from the type of ?title:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?book knora-api:isMainResource true ;
dcterms:title ?title .
} WHERE {
?book rdf:type incunabula:book ;
dcterms:title ?title .
?title rdf:type xsd:string .
}
It would also be possible to annotate the property itself, using the predicate knora-api:objectType; then the type of ?title would be inferred:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?book knora-api:isMainResource true ;
dcterms:title ?title .
} WHERE {
?book rdf:type incunabula:book ;
dcterms:title ?title .
dcterms:title knora-api:objectType xsd:string .
}
Note that it only makes sense to use dcterms:title in the simple schema, because its object is supposed to be a literal.
Here is another example, using a non-DSP class:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
?person knora-api:isMainResource true .
} WHERE {
?person a foaf:Person .
?person foaf:familyName ?familyName .
FILTER(?familyName = "Meier")
}
This produces the error message:
Types could not be determined for one or more entities: ?person
The solution is to specify that ?person is a knora-api:Resource:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
?person knora-api:isMainResource true .
} WHERE {
?person a foaf:Person .
?person a knora-api:Resource .
?person foaf:familyName ?familyName .
FILTER(?familyName = "Meier")
}
Inconsistent Types
Gravsearch will also reject a query if an entity is used with inconsistent types. For example:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?book knora-api:isMainResource true ;
incunabula:pubdate ?pubdate .
} WHERE {
?book a incunabula:book ;
incunabula:pubdate ?pubdate .
FILTER(?pubdate = "JULIAN:1497-03-01") .
}
This returns the error message:
One or more entities have inconsistent types:
<http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#pubdate>
knora-api:objectType <http://api.knora.org/ontology/knora-api/simple/v2#Date> ;
knora-api:objectType <http://www.w3.org/2001/XMLSchema#string> .
?pubdate rdf:type <http://api.knora.org/ontology/knora-api/simple/v2#Date> ;
rdf:type <http://www.w3.org/2001/XMLSchema#string> .
This is because the incunabula ontology says that the object of incunabula:pubdate must be a knora-api:Date, but the FILTER expression compares ?pubdate with an xsd:string. The solution is to specify the type of the literal in the FILTER:
PREFIX incunabula: <http://0.0.0.0:3333/ontology/0803/incunabula/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?book knora-api:isMainResource true ;
incunabula:pubdate ?pubdate .
} WHERE {
?book a incunabula:book ;
incunabula:pubdate ?pubdate .
FILTER(?pubdate = "JULIAN:1497-03-01"^^knora-api:Date) .
}
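The check behind this error can be pictured as collecting every type that is asserted or inferred for an entity and flagging the entities that end up with more than one. The Python sketch below illustrates that idea only. The function name and the (entity, type) pair representation are hypothetical and are not part of Gravsearch or the API.

```python
from collections import defaultdict

KNORA_DATE = "http://api.knora.org/ontology/knora-api/simple/v2#Date"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def find_inconsistent_types(assertions):
    """Return the entities that have more than one distinct type.

    `assertions` is an iterable of (entity, type_iri) pairs; this input
    shape is an assumption made for illustration.
    """
    types_by_entity = defaultdict(set)
    for entity, type_iri in assertions:
        types_by_entity[entity].add(type_iri)
    return {
        entity: sorted(types)
        for entity, types in types_by_entity.items()
        if len(types) > 1
    }


# The ontology says the object of incunabula:pubdate is a Date, while the
# untyped literal in the FILTER suggests that ?pubdate is a string.
uncorrected = [("?pubdate", KNORA_DATE), ("?pubdate", XSD_STRING)]
# With "..."^^knora-api:Date in the FILTER, both sources agree.
corrected = [("?pubdate", KNORA_DATE), ("?pubdate", KNORA_DATE)]

print(find_inconsistent_types(uncorrected))
print(find_inconsistent_types(corrected))
```

In this toy model, the uncorrected query yields two types for ?pubdate, while the corrected query yields none to report.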
Scoping Issues
SPARQL is evaluated from the bottom up. A UNION block therefore opens a new scope, in which variables bound at higher levels are not necessarily in scope. This can cause unexpected results if queries are not carefully designed. Gravsearch tries to prevent this by rejecting queries in the following cases.
FILTER in UNION
A FILTER in a UNION block can only use variables that are bound in the same block; otherwise, the query will be rejected. This query is invalid because ?text is not bound in the UNION block containing the FILTER where the variable is used:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX mls: <http://0.0.0.0:3333/ontology/0807/mls/simple/v2#>
CONSTRUCT {
?lemma knora-api:isMainResource true .
?lemma mls:hasLemmaText ?text .
} WHERE {
?lemma a mls:Lemma .
?lemma mls:hasLemmaText ?text .
{
?lemma mls:hasPseudonym ?pseudo .
FILTER regex(?pseudo, "Abel", "i") .
} UNION {
FILTER regex(?text, "Abel", "i") .
}
}
ORDER BY ASC(?text)
OFFSET 0
It can be corrected like this:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX mls: <http://0.0.0.0:3333/ontology/0807/mls/simple/v2#>
CONSTRUCT {
?lemma knora-api:isMainResource true .
?lemma mls:hasLemmaText ?text .
} WHERE {
?lemma a mls:Lemma .
?lemma mls:hasLemmaText ?text .
{
?lemma mls:hasPseudonym ?pseudo .
FILTER regex(?pseudo, "Abel", "i") .
} UNION {
?lemma mls:hasLemmaText ?text .
FILTER regex(?text, "Abel", "i") .
}
}
ORDER BY ASC(?text)
OFFSET 0
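The rule can be illustrated with a small Python sketch that compares the variables used in FILTER lines of a single UNION block with the variables bound by the other lines of that block. This is a toy, line-based check, not a SPARQL parser and not how Gravsearch is implemented; the function name and the input format are assumptions.

```python
import re

VARIABLE = re.compile(r"\?[A-Za-z_][A-Za-z0-9_]*")


def unbound_filter_variables(block_lines):
    """Return variables used in FILTERs but not bound in the same block.

    `block_lines` holds the lines of one UNION block. Any line that does
    not start with FILTER is treated as a statement pattern that binds the
    variables it mentions (a simplification for this sketch).
    """
    bound, used = set(), set()
    for line in block_lines:
        stripped = line.strip()
        variables = set(VARIABLE.findall(stripped))
        if stripped.upper().startswith("FILTER"):
            used |= variables
        else:
            bound |= variables
    return used - bound


# Second UNION block of the invalid query: ?text is used but not bound.
rejected_block = ['FILTER regex(?text, "Abel", "i") .']
# Second UNION block of the corrected query: ?text is bound first.
accepted_block = [
    "?lemma mls:hasLemmaText ?text .",
    'FILTER regex(?text, "Abel", "i") .',
]

print(unbound_filter_variables(rejected_block))
print(unbound_filter_variables(accepted_block))
```

A non-empty result corresponds to a block that would be rejected under the rule described above.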
ORDER BY
A variable used in ORDER BY must be bound at the top level of the WHERE clause. This query is invalid, because ?int is not bound at the top level of the WHERE clause:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
CONSTRUCT {
?thing knora-api:isMainResource true .
?thing anything:hasInteger ?int .
?thing anything:hasRichtext ?richtext .
?thing anything:hasText ?text .
} WHERE {
?thing a knora-api:Resource .
?thing a anything:Thing .
{
?thing anything:hasRichtext ?richtext .
FILTER knora-api:matchText(?richtext, "test")
?thing anything:hasInteger ?int .
}
UNION
{
?thing anything:hasText ?text .
FILTER knora-api:matchText(?text, "test")
?thing anything:hasInteger ?int .
}
}
ORDER BY (?int)
It can be corrected like this:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
CONSTRUCT {
?thing knora-api:isMainResource true .
?thing anything:hasInteger ?int .
?thing anything:hasRichtext ?richtext .
?thing anything:hasText ?text .
} WHERE {
?thing a knora-api:Resource .
?thing a anything:Thing .
?thing anything:hasInteger ?int .
{
?thing anything:hasRichtext ?richtext .
FILTER knora-api:matchText(?richtext, "test")
}
UNION
{
?thing anything:hasText ?text .
FILTER knora-api:matchText(?text, "test")
}
}
ORDER BY (?int)
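To make "bound at the top level" concrete, the following Python sketch collects the variables that occur in statement patterns outside any nested block of a WHERE clause body. It is a simplified, line-based illustration under stated assumptions (braces on their own lines, FILTER lines bind nothing); the function name is hypothetical and the code does not reflect the actual Gravsearch implementation.

```python
import re

VARIABLE = re.compile(r"\?[A-Za-z_][A-Za-z0-9_]*")


def top_level_variables(where_body_lines):
    """Collect variables bound outside any nested { ... } block.

    `where_body_lines` are the lines between the outer braces of the
    WHERE clause. Lines containing braces only change the nesting depth.
    """
    depth = 0
    bound = set()
    for line in where_body_lines:
        stripped = line.strip()
        if "{" in stripped or "}" in stripped:
            depth += stripped.count("{") - stripped.count("}")
            continue
        if depth == 0 and not stripped.upper().startswith("FILTER"):
            bound |= set(VARIABLE.findall(stripped))
    return bound


union_blocks = [
    "{",
    "?thing anything:hasRichtext ?richtext .",
    'FILTER knora-api:matchText(?richtext, "test")',
    "}",
    "UNION",
    "{",
    "?thing anything:hasText ?text .",
    'FILTER knora-api:matchText(?text, "test")',
    "}",
]
# Corrected query: ?int is bound before the UNION blocks.
accepted = ["?thing a anything:Thing .",
            "?thing anything:hasInteger ?int ."] + union_blocks
# Invalid query: ?int is bound only inside a nested block.
rejected = ["?thing a anything:Thing .",
            "{", "?thing anything:hasInteger ?int .", "}"]

print("?int" in top_level_variables(accepted))
print("?int" in top_level_variables(rejected))
```

Only in the first case could ?int be used in ORDER BY under the rule described above.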
Query Optimization by Dependency
The query performance of triplestores, such as Fuseki, is highly dependent on the order of query patterns. To improve performance, Gravsearch automatically reorders the statement patterns in the WHERE clause according to their dependencies on each other, to minimise the number of possible matches for each pattern.
Consider the following Gravsearch query:
PREFIX beol: <http://0.0.0.0:3333/ontology/0801/beol/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter ?linkingProp1 ?person1 .
?letter ?linkingProp2 ?person2 .
?letter beol:creationDate ?date .
} WHERE {
?letter beol:creationDate ?date .
?letter ?linkingProp1 ?person1 .
FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient )
?letter ?linkingProp2 ?person2 .
FILTER(?linkingProp2 = beol:hasAuthor || ?linkingProp2 = beol:hasRecipient )
?person1 beol:hasIAFIdentifier ?gnd1 .
?gnd1 knora-api:valueAsString "(DE-588)118531379" .
?person2 beol:hasIAFIdentifier ?gnd2 .
?gnd2 knora-api:valueAsString "(DE-588)118696149" .
} ORDER BY ?date
Gravsearch optimises the performance of this query by moving these statements to the top of the WHERE clause:
?gnd1 knora-api:valueAsString "(DE-588)118531379" .
?gnd2 knora-api:valueAsString "(DE-588)118696149" .
The rest of the WHERE clause then reads:
?person1 beol:hasIAFIdentifier ?gnd1 .
?person2 beol:hasIAFIdentifier ?gnd2 .
?letter ?linkingProp1 ?person1 .
FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient )
?letter ?linkingProp2 ?person2 .
FILTER(?linkingProp2 = beol:hasAuthor || ?linkingProp2 = beol:hasRecipient )
?letter beol:creationDate ?date .
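The effect of this reordering can be imitated with a small greedy procedure: start with the patterns whose object is a constant, then repeatedly add the patterns whose object has already been constrained by an earlier pattern. The Python sketch below is only an illustration of that idea. It is not the algorithm Gravsearch uses, it ignores FILTER expressions, and the function names and tuple representation are assumptions.

```python
def is_variable(term):
    """A term is a query variable if it starts with '?'."""
    return term.startswith("?")


def reorder_by_dependency(patterns):
    """Greedy reordering of (subject, predicate, object) patterns.

    A pattern is "ready" once its object is a constant or a variable that
    an earlier pattern has already constrained. Patterns that never become
    ready keep their original relative order at the end.
    """
    ordered, constrained = [], set()
    remaining = list(patterns)
    while remaining:
        ready = [p for p in remaining
                 if not is_variable(p[2]) or p[2] in constrained]
        if not ready:
            break
        ordered.extend(ready)
        constrained.update(p[0] for p in ready if is_variable(p[0]))
        remaining = [p for p in remaining if p not in ready]
    return ordered + remaining


# Statement patterns of the query above, in their original order.
patterns = [
    ("?letter", "beol:creationDate", "?date"),
    ("?letter", "?linkingProp1", "?person1"),
    ("?letter", "?linkingProp2", "?person2"),
    ("?person1", "beol:hasIAFIdentifier", "?gnd1"),
    ("?gnd1", "knora-api:valueAsString", '"(DE-588)118531379"'),
    ("?person2", "beol:hasIAFIdentifier", "?gnd2"),
    ("?gnd2", "knora-api:valueAsString", '"(DE-588)118696149"'),
]

for subject, predicate, obj in reorder_by_dependency(patterns):
    print(subject, predicate, obj, ".")
```

For this input, the sketch puts the two knora-api:valueAsString patterns first and the beol:creationDate pattern last, which is the same statement order as shown above.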