How to build queries
The simplest form of a query is a full-text search. For more complex queries, DSP-API offers a query language called Gravsearch, which is based on SPARQL (SPARQL Protocol and RDF Query Language).
Full-text search
DSP-API offers a full-text search that searches all textual representations of values and the rdfs:label of resources. The full-text search supports the Lucene Query Parser syntax.
Search terms can be separated by white space, in which case they are combined with the Boolean operator OR, since this is Lucene's default operator. Be aware that the search terms have to be URL encoded!
The recommended way to submit a full-text query is via HTTP GET
in the following form.
HTTP GET request sent to https://api.dasch.swiss/v2/search/searchValue[limitToResourceClass=resourceClassIRI][limitToStandoffClass=standoffClassIri][limitToProject=projectIRI][offset=Integer]
Most of these parameters can be set optionally:
- limitToResourceClass=resourceClassIRI: restricts the search to resources of the specified resource class and its subclasses.
- limitToStandoffClass=standoffClassIRI: makes DSP-API look for search terms that are marked up with the indicated standoff class.
- limitToProject=projectIRI: restricts the search to resources of the specified project.
- offset=Integer: enables paging, so that all results can be retrieved request by request. The default value of offset is 0, which returns the first page of search results. Subsequent pages can be retrieved by increasing offset by one: offset=1 fetches the second page of search results, offset=2 the third, and so on. The number of search results per page is defined in app/v2 in the file application.conf.
The first search parameter has to be preceded by a question mark ?
and any following parameter by an ampersand &
. A search value must have a minimal length of three characters (default value) as defined in app/v2
in the file application.conf
.
Wildcards may be used.
- A question mark ? represents a single missing character. However, it has to be URL encoded as %3F because it has a special meaning in the URL syntax.
- An asterisk * represents zero, one or multiple missing characters. The URL encoding of an asterisk is %2A.
To request the number of results rather than the results themselves, a count query can be made, where the first parameter has to be preceded by a question mark ?
and any following parameter by an ampersand &
:
HTTP GET request sent to https://api.dasch.swiss/v2/search/count/searchValue[limitToResourceClass=resourceClassIRI][limitToStandoffClass=standoffClassIri][limitToProject=projectIRI][offset=Integer]
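The request forms described above can also be assembled with a script. The following is a minimal sketch in Python that uses only the standard library; the function name build_search_url and its arguments are assumptions made for this illustration and are not part of DSP-API. It URL-encodes the search value, so spaces and wildcards are escaped automatically, and it puts a question mark before the first parameter and an ampersand before each following one. Note that the parameter values (the IRIs) are percent-encoded as well, which a server decodes back to the original value.

```python
from urllib.parse import quote, urlencode

# Base URL of the full-text search route as used in this document.
SEARCH_URL = "https://api.dasch.swiss/v2/search"


def build_search_url(search_value, count=False, **params):
    """Return the URL of a full-text search or, with count=True, of a count query.

    build_search_url is a hypothetical helper written for this illustration.
    """
    # Percent-encode everything that is not a letter, a digit or one of "_.-~".
    encoded_value = quote(search_value, safe="")
    route = SEARCH_URL + ("/count/" if count else "/") + encoded_value
    # urlencode joins the parameters with "&"; the leading "?" is added here.
    query_string = urlencode(params)
    return route + ("?" + query_string if query_string else "")


print(build_search_url("Bern*"))
print(build_search_url("Bernoulli AND Euler", count=True))
print(build_search_url(
    "Bernoulli",
    limitToProject="http://rdfh.ch/projects/yTerZGyxjZVqFMNNKXCDPF",
    offset=1,
))
```

The keyword arguments are passed through unchanged, so any of the optional parameters listed above can be supplied in the same way.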
To require that both search terms occur, the two terms have to be joined with the Boolean operator AND:
A AND B
To express that a result MUST contain the first search term and MAY contain the second search term, the plus sign (+; required operator) can be used:
+A B
Sometimes it may be useful to search for one term that doesn't appear in connection with another term. For such a query, the NOT
operator can be used:
A NOT B
Be aware that a NOT search with only one term is not possible; there have to be two search terms!
The minus sign (-; prohibit operator) excludes hits that contain the term after the prohibit operator:
A -B
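The operators above can be pictured as set operations on the hits of each term. The following toy model in Python is only an illustration of these semantics, assuming four made-up documents; it is not how Lucene evaluates or ranks a query.

```python
# Made-up mini corpus: document id -> terms occurring in the document.
documents = {
    1: {"bernoulli", "euler"},
    2: {"bernoulli"},
    3: {"euler"},
    4: {"leibniz"},
}


def hits(term):
    """Return the ids of all documents that contain the term."""
    return {doc_id for doc_id, terms in documents.items() if term in terms}


a, b = hits("bernoulli"), hits("euler")

either = a | b      # "A B" or "A OR B": at least one of the terms
both = a & b        # "A AND B": both terms
required = a        # "+A B": A is required, B is optional
without_b = a - b   # "A NOT B" or "A -B": A, but not B

print(sorted(either), sorted(both), sorted(required), sorted(without_b))
```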
A short note on the following examples
All examples in the following sections are from Bernoulli-Euler-Online (BEOL) which is a research platform for the study of early modern mathematics and science implemented within DSP-API.
Examples of full-text searches
Let's assume we intend to search for the term "Bernoulli" with and without wildcard options. To send HTTP GET requests you can use a program such as Postman or simply use the Firefox request detail panel or a similar tool in your favorite browser. For HTTP POST requests, a program such as Postman has to be used.
The query
HTTP GET request sent to https://api.dasch.swiss/v2/search/Bernoulli
retrieves all resources containing the term "Bernoulli". The beginning of the response when using Postman and choosing "JSON" as output format looks as follows:
{
"@graph": [
{
"@id": "http://rdfh.ch/0801/--rbZIzLTNC4qrcpAAkjwA",
"@type": "beol:letter",
"beol:hasText": {
"@id": "http://rdfh.ch/0801/--rbZIzLTNC4qrcpAAkjwA/values/wiWf4Ta_QX2jYOiRcUNZlQ",
"@type": "knora-api:TextValue",
"knora-api:arkUrl": {
"@type": "xsd:anyURI",
"@value": "http://ark.dasch.swiss/ark:/72163/1/0801/==rbZIzLTNC4qrcpAAkjwAc/K0Ik_ElfQtOOTkgn4h4MnQQ"
},
We learn that our search term "Bernoulli" appears in a letter which belongs to the BEOL project. If we possess the necessary permissions, the web address given in @value, copied to a browser, guides us to the corresponding entry of the letter and all of its available additional information in the DSP-APP.
The following query with the URL encoded (%3F) wildcard ?
HTTP GET request sent to https://api.dasch.swiss/v2/search/Bernoull%3F
retrieves all resources containing the term "Bernoulli" in the same way as in the preceding example.
The wildcard *
(URL encoded %2A
) in the following query
HTTP GET request sent to https://api.dasch.swiss/v2/search/Bern%2A
retrieves all resources containing the term "Bernoulli", but would also retrieve hits for e.g. "Bern" or "Bernina".
To search only for the number of hits for the term "Bernoulli" the following count query can be used:
HTTP GET request sent to https://api.dasch.swiss/v2/search/count/Bernoulli
The answer to this query is at the moment "1989":
{
"schema:numberOfItems": 1989,
"@context": {
"schema": "http://schema.org/"
}
}
If we only intend to search within the BEOL project for the term "Bern*", we can specify this in our full-text query, but we have to know the projectIRI of the BEOL project, which is http://rdfh.ch/projects/yTerZGyxjZVqFMNNKXCDPF:
HTTP GET request sent to https://api.dasch.swiss/v2/search/Bern%2A?limitToProject=http://rdfh.ch/projects/yTerZGyxjZVqFMNNKXCDPF
For the use of wildcards, just replace the letter(s) in the search term by one of the wildcards. Be aware that the first parameter after the search term has to be preceded by a question mark ?. However, any following parameter has to be preceded by an ampersand &! Thus, a successful request with two parameters looks as follows:
HTTP GET request sent to https://api.dasch.swiss/v2/search/Bernoulli?limitToProject=http://rdfh.ch/projects/yTerZGyxjZVqFMNNKXCDPF&offset=1
Here we additionally request the second page of the results instead of the first one, which is the default (offset=0).
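Paging through all results can be sketched as a loop that increases offset until a page comes back empty. In the following Python sketch, iter_pages and fetch_page are hypothetical names: fetch_page stands for whatever function sends the request for a given offset and returns the list of results on that page. Stopping at the first empty page is an assumption made for this illustration, and the real requests are replaced by a stand-in with made-up data.

```python
def iter_pages(fetch_page, first_offset=0):
    """Yield result pages for increasing offsets until a page is empty."""
    offset = first_offset
    while True:
        page = fetch_page(offset)
        if not page:
            return
        yield page
        offset += 1


# Stand-in for real requests: three pages of made-up result labels.
fake_pages = [["letter 1", "letter 2"], ["letter 3", "letter 4"], ["letter 5"]]


def fake_fetch(offset):
    """Return the made-up page for the offset, or an empty list past the end."""
    return fake_pages[offset] if offset < len(fake_pages) else []


all_results = [item for page in iter_pages(fake_fetch) for item in page]
print(all_results)
```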
To combine the two search terms "Bernoulli" and "Euler" the logical operator AND
has to be used:
HTTP GET request sent to https://api.dasch.swiss/v2/search/Bernoulli%20AND%20Euler
For expressing that a query must contain the first search term and may contain the second search term, the required operator +
can be used:
HTTP GET request sent to https://api.dasch.swiss/v2/search/+Bernoulli%20Euler
Here it is required that the search results contain the term "Bernoulli", but it is optional that they contain the second term "Euler".
Sometimes we may be interested in searching for one term that doesn't appear in connection with another term. For such a query, the NOT
operator can be used:
https://api.dasch.swiss/v2/search/Bernoulli%20NOT%20Euler
With this query we are searching for everything that contains Bernoulli, but not Euler.
If we want to exclude all search results that contain "Euler", the prohibit operator -
has to be used:
https://api.dasch.swiss/v2/search/Bernoulli%20-Euler
Count queries for the above-mentioned combinations of the search terms "Bernoulli" and "Euler", restricted to the BEOL project, currently (Dec. 2019) produce the following numbers:

|search term|number|
|-----|:----:|
|Bernoulli|1937|
|Euler|1435|
|Bernoulli%20AND%20Euler|165|
|Bernoulli%20NOT%20Euler|1775|
|Bernoulli%20OR%20Euler|3201|
|Bernoulli%20Euler|3201|
|+Bernoulli%20Euler|1937|
|Bernoulli%20-Euler|1775|
SPARQL queries
SPARQL is a set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF triplestore. In the following the Turtle data format will be used to show each triple.
The following introduction to SPARQL draws heavily on Bob DuCharme's book Learning SPARQL. Querying and Updating with SPARQL 1.1, 2nd edition, Sebastopol 2013, which we warmly recommend.
The most basic keywords are SELECT
and WHERE
. Moreover, a very useful keyword is LIMIT
.
- SELECT names which part of the data we want to see. If all data should be shown, use SELECT *; the asterisk functions as a wildcard.
- WHERE is used to formulate a clause: it decides which data should be retrieved from the whole dataset.
- LIMIT limits the query to the first e.g. 20 (LIMIT 20) or 100 hits. The limit has to be specified outside the curly braces of the WHERE-clause. If you've no idea what the test data are about or if you know that the dataset is large, it's very useful to limit your query in order not to ask too much from the query endpoint. Be aware that if you're sorting your data, the LIMIT keyword will only apply after all data have been retrieved and sorted! Thus, LIMIT will give you the first results of the sorted data.
Query forms
In SPARQL, four different query forms exist: SELECT, CONSTRUCT, ASK and DESCRIBE.
- The most common one is probably SELECT, which enables requesting data from a collection.
- Using CONSTRUCT instead will return triples. Triples can be retrieved without modifying them, or the retrieved triples can be used to generate new triples. CONSTRUCT makes it possible to copy, create and convert RDF data, and it makes it easier to identify data that do not conform to specific rules.
- ASK asks the processor whether a given graph pattern matches a set of triples in the requested dataset or not. The processor returns a boolean true or false.
- DESCRIBE asks for triples which describe a particular resource. This query form isn't popular because different processors return different triples as a description of the named resource.
DSP-API only allows CONSTRUCT queries and a limited set of SPARQL keywords. Thus, most of the following examples illustrating the possibilities of SPARQL use the SPARQL endpoint of DBpedia, which provides data extracted from Wikipedia and is accessible at http://dbpedia.org/snorql/. There, the following prefixes are predefined:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
Additional prefixes can be defined in the query window before the SELECT
statement, e.g.
PREFIX knora-api: <https://api.dasch.swiss/ontology/knora-api/v2#>
A query asking for all data
If we have no idea what our data is all about, a nice first query is the following - but don't forget to set a limit because it asks for all the triples in the dataset!
Using your local DSP-API installation
Go to http://0.0.0.0:7200/sparql in your browser, which will take you to the GraphDB SPARQL Query & Update page if you have DSP-API installed locally. Copy the following code into the window:
PREFIX knora-api: <https://api.dasch.swiss/ontology/knora-api/v2#>
select * where {
?s ?p ?o .
} limit 100
This query retrieves the first 100 triples of the data. The asterisk after SELECT
indicates that all variables should be selected to get bound in this query. Setting a limit with the keyword LIMIT
prevents a server timeout.
Using Postman and HTTP POST
If you haven't installed DSP-API locally, or if you want to run the same query on our live server, you'll see that this query isn't allowed because it is considered too broad to be meaningful. Additionally, the query has to be adapted: only CONSTRUCT queries are allowed, the main resource has to be named, and the keyword LIMIT is not supported. Remember that all variables in the CONSTRUCT clause must also appear in the WHERE-clause! Open Postman and choose
HTTP POST sent to https://api.dasch.swiss/v2/searchextended
Then click "Body" and choose "raw". Theoretically, the adapted query asking for all resources in DSP-API would look like the following:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?s knora-api:isMainResource true .
}
WHERE {
?s a knora-api:Resource .
}
This is a syntactically correct query, but it returns no results due to access limitations.
However, we can ask for all resources of a certain kind, e.g. for all who are a foaf:Person
:
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
?s knora-api:isMainResource true .
}
WHERE {
?s a knora-api:Resource .
?s a foaf:Person .
}
If you do a count search, i.e. send a POST request to https://api.dasch.swiss/v2/searchextended/count, you retrieve 2643 results at the moment.
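Instead of Postman, the request can also be prepared with a script. The following Python sketch builds the HTTP POST request for the foaf:Person query above with the standard library, but does not send it; passing the returned object to urllib.request.urlopen would do that. The helper name build_gravsearch_request is an assumption made for this illustration, and the plain-text content type is likewise an assumption that mirrors the "raw" body chosen in Postman.

```python
import urllib.request

SEARCH_EXTENDED_URL = "https://api.dasch.swiss/v2/searchextended"

GRAVSEARCH_QUERY = """
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
    ?s knora-api:isMainResource true .
}
WHERE {
    ?s a knora-api:Resource .
    ?s a foaf:Person .
}
"""


def build_gravsearch_request(query, url=SEARCH_EXTENDED_URL):
    """Return a POST request that carries the query as raw text in its body.

    Hypothetical helper; the content type is an assumption (see lead-in).
    """
    return urllib.request.Request(
        url,
        data=query.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_gravsearch_request(GRAVSEARCH_QUERY)
print(request.get_method(), request.full_url)
```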
A query asking for one specific variable
If we are interested in works of the artist Marc Chagall, we can send a first query to the DBpedia endpoint, asking for information about Chagall with results stored in the variable ?artist
:
SELECT ?artist
WHERE
{
<http://dbpedia.org/resource/Chagall> ?p ?artist .
}
The angle brackets surrounding "http://dbpedia.org/resource/Chagall" tell the processor that it is a URI. The query retrieves the following results in tabular form (if the simple output format "Browse" is chosen):

|artist|
|-----|
|"Chagall"@en|
|158693|
|600433204|
|:Marc_Chagall|
|http://en.wikipedia.org/wiki/Chagall|
|http://en.wikipedia.org/wiki/Chagall?oldid=600433204|
After having verified in this way that http://dbpedia.org/resource/Chagall is a valid URI which points to the resource :Marc_Chagall, we can ask for the properties and values of that resource with the following query:
SELECT ?property ?hasValue
WHERE {
<http://dbpedia.org/resource/Marc_Chagall> ?property ?hasValue .
}
One of the results obtained is the property dbpedia:ontology/wikiPageExternalLink with the value <http://www.thejewishmuseum.org/collection/search?artist=Marc%20Chagall%2C%20French%2C%20b.%20Belorussia%2C%201887-1985>. Following this link leads us to the website of the Jewish Museum in New York and the works of Marc Chagall the museum possesses.
If you intend to query data that isn't stored locally, the keyword FROM
could be used to specify the data set. It is also possible to read data from a file when using the FROM
keyword. If the data are stored in a file named example.ttl
stored in the same folder as the query itself, then the SELECT
-part of the request looks as follows:
SELECT * FROM <example.ttl>
WHERE {
...
}
A query with matching on multiple triples
The following query uses a ?films
variable to tie together two triple patterns in the WHERE
-clause.
SELECT ?films
WHERE {
?films <http://dbpedia.org/ontology/starring> :Iggy_Pop .
?films <http://dbpedia.org/ontology/starring> :Johnny_Depp .
}
Such a set of triple patterns within curly braces is called a graph pattern. The ?films
variable is used in two different triple patterns to find connected triples in the queried data. The first pattern matches films starring Iggy Pop; the second restricts these to films in which Johnny Depp also starred. The result shows all the films these actors played in together: Cry Baby and Dead Man.
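How the shared variable ties the two triple patterns together can be imitated with plain Python sets. The following toy model works on a handful of hand-written triples (Film_X and Film_Y are made-up entries, and the predicate is abbreviated) and is not a SPARQL engine; it only shows that a value for ?films has to satisfy both patterns at once.

```python
# Hand-written sample triples (subject, predicate, object).
# Film_X and Film_Y are made up for this illustration.
triples = {
    ("Cry-Baby", "starring", "Iggy_Pop"),
    ("Cry-Baby", "starring", "Johnny_Depp"),
    ("Dead_Man", "starring", "Iggy_Pop"),
    ("Dead_Man", "starring", "Johnny_Depp"),
    ("Film_X", "starring", "Iggy_Pop"),
    ("Film_Y", "starring", "Johnny_Depp"),
}


def subjects(predicate, obj):
    """Return all subjects that match the triple pattern (?s, predicate, obj)."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# Each pattern yields candidate bindings for ?films; the join keeps the common ones.
films = subjects("starring", "Iggy_Pop") & subjects("starring", "Johnny_Depp")
print(sorted(films))
```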
Searching for strings and values
The keyword FILTER
tells the query processor to return only those triples which meet the given condition in the filter. This time we're searching for films starring David Bowie, but only those which have a runtime longer than 89 minutes.
SELECT ?BowieFilm ?runtime
WHERE {
?BowieFilm <http://dbpedia.org/ontology/starring> :David_Bowie .
?BowieFilm <http://dbpedia.org/ontology/Work/runtime> ?runtime .
FILTER (xsd:double(?runtime) > 89.0 )
}
ORDER BY xsd:double(?runtime)
The keyword ORDER BY sorts the results by runtime in increasing order. Be aware that it is necessary to cast the ?runtime variable to the datatype xsd:double; otherwise neither the filter nor the sorting works properly!
Searching for data that may exist or not
The keyword OPTIONAL
can be used to express "show this value if it exists".
SELECT *
WHERE
{
<http://dbpedia.org/resource/Pink_Floyd> <http://dbpedia.org/ontology/formerBandMember> ?members .
OPTIONAL {
?members <http://dbpedia.org/ontology/deathDate> ?deathDate .
}
}
The above query asks for the former members of the band Pink Floyd and for their death date if they passed away already. If they still enjoy their life, their names are listed anyway since the additional information about the death date is set in an OPTIONAL
graph pattern. If there is more than one OPTIONAL
triple pattern, they
are processed in the order the processor sees them. Hence, the order of OPTIONAL
patterns matters - this fact can be taken advantage of. However, excessive use of the keyword OPTIONAL
can slow down the queries when dealing with large datasets!
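The effect of OPTIONAL can be imitated in a few lines of Python. The sketch below uses made-up members and a made-up date rather than DBpedia data: the required pattern decides which rows exist, and the optional part only adds a value where one is available.

```python
# Hand-written sample data; the member names and the date are made up.
former_members = ["Member_A", "Member_B"]
death_dates = {"Member_A": "1900-01-01"}

# Every former member yields a row. The death date is filled in if it exists;
# otherwise the variable stays unbound, represented here by None.
rows = [(member, death_dates.get(member)) for member in former_members]
print(rows)
```

Without OPTIONAL, the query would behave like a lookup that drops every member for whom no death date is recorded.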
Searching for data that doesn't meet a condition
While cleaning up data it is very useful to be able to search for missing entries. FILTER NOT EXISTS is a filter condition that evaluates to true if the specified graph pattern has no match in the data.
SELECT *
WHERE
{
?artist rdf:type <http://dbpedia.org/ontology/Artist> .
FILTER NOT EXISTS {?artist dbpedia2:works ?works }
}
LIMIT 1000
When searching DBpedia data I realised that the information provided for different artists isn't very consistent. If we intended to homogenise the basic information given for each artist, we could use the above query to identify those artists for whom no information about their works has been recorded yet.
Another possibility is to use the keyword MINUS. As its name suggests, it subtracts from a set of results all those which match the pattern in its braces. The previous query, rewritten with the keyword MINUS, looks as follows:
SELECT *
WHERE
{
?artist rdf:type <http://dbpedia.org/ontology/Artist> .
MINUS {?artist dbpedia2:works ?works }
}
LIMIT 1000
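Be aware that FILTER NOT EXISTS and MINUS may return different results! They differ, for example, when the inner pattern shares no variable with the rest of the query: MINUS then removes nothing, whereas FILTER NOT EXISTS removes every result as soon as the inner pattern matches anything in the data.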
A joined search
To link up different sets of data the same variable can be in the object position in one triple and in the subject position of another triple as in the following example:
SELECT DISTINCT ?actor ?value
WHERE
{
?film <http://dbpedia.org/ontology/starring> ?actor .
?actor <http://dbpedia.org/ontology/birthYear> ?value .
}
LIMIT 1000
In this query, the first triple pattern asks for all actors and the second for the birth years of these actors. The DISTINCT keyword tells the processor not to display duplicate answers, so redundant results are eliminated. Without the DISTINCT keyword we would receive lots of identical responses, since most actors played in more than one film.
Whether the queried data stem from the same repository or from different ones doesn't matter, as long as there are resource URIs in one dataset that can be matched with resource URIs in another.
Asking about patterns
Property paths are a way to express more extensive patterns to look for. They make it possible to search for the requested data and, in addition, to keep following links to more data. An example where this may be useful is the citation pattern of a certain paper. By simply adding a + sign we can tell the query processor to look for papers that cite paper A, and papers that cite those, etc. until the tree of papers is exhausted. The plus sign means "one or more". One could use an asterisk instead, which means "zero or more". It is also possible to be much more specific by using a property path which asks for papers that are exactly three links away, i.e. papers that cited papers that cited papers that cited paper A. This is expressed by a series of steps separated by slashes. The WHERE-clauses of such queries could look as follows:
|WHERE-clause|meaning|
|---|---|
|WHERE {?s :cites :paperA .}|papers that cited paper A.|
|WHERE {?s :cites+ :paperA .}|one or more links away: papers that cited paper A, and papers that cited those, and papers that cited those, etc.|
|WHERE {?s :cites* :paperA .}|zero or more links away: paper A itself, papers that cited paper A, and papers that cited those, etc.|
|WHERE {?s :cites/:cites/:cites :paperA .}|three links away: papers that cited papers that cited papers that cited paper A.|
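What the three kinds of property paths compute can be imitated on a small made-up citation chain. The following Python sketch is a toy model, not a SPARQL engine; the paper names and all function names are assumptions made for this illustration.

```python
# Made-up citation pairs: (citing paper, cited paper).
cites = {("B", "A"), ("C", "B"), ("D", "C"), ("E", "D")}


def citing(papers):
    """Return all papers that directly cite one of the given papers."""
    return {src for (src, dst) in cites if dst in papers}


def one_or_more(target):
    """Imitate ?s :cites+ target: follow the links until nothing new is found."""
    found, frontier = set(), {target}
    while frontier:
        frontier = citing(frontier) - found
        found |= frontier
    return found


def zero_or_more(target):
    """Imitate ?s :cites* target: like :cites+, plus the target itself."""
    return one_or_more(target) | {target}


def exactly(target, steps):
    """Imitate a sequence path such as :cites/:cites/:cites for steps == 3."""
    papers = {target}
    for _ in range(steps):
        papers = citing(papers)
    return papers


print(sorted(one_or_more("A")), sorted(zero_or_more("A")), sorted(exactly("A", 3)))
```

Because one_or_more has to keep following links until no new paper turns up, this also hints at why such paths can be costly on large datasets.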
Property paths are a powerful way to ask about patterns in all kinds of datasets. They are also very useful for coping with data which has been prepared or published with differing naming conventions. One such example arises if we're interested in the total population of the countries of the British Isles:
CONSTRUCT {
?country <http://dbpedia.org/ontology/populationTotal> ?pop
} WHERE {
VALUES ?country {
:England
:Scotland
:Wales
:Northern_Ireland
:Ireland
}
?country <http://dbpedia.org/property/populationCensus> | <http://dbpedia.org/ontology/populationTotal> ?pop .
}
For all countries except Ireland the corresponding number is returned when asking for <http://dbpedia.org/property/populationCensus>
. To find Ireland's population we need to use another predicate, namely <http://dbpedia.org/ontology/populationTotal>
. The so-called alternative property path |, as used in the above example, allows searching for both options at the same time. Furthermore, using CONSTRUCT instead of SELECT makes it possible to connect each country to its population with <http://dbpedia.org/ontology/populationTotal>, thus returning consistent triples!
Combining search conditions
The UNION keyword allows specifying multiple graph patterns and asking for a combination of all the data which fit any of these patterns. No connection between the sets of data needs to be specified. However, it is probably more useful to use UNION for retrieving two overlapping sets of data.
SELECT *
WHERE
{
{?artist rdf:type <http://dbpedia.org/ontology/Artist> .
?artist <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Russia> .
?artist <http://dbpedia.org/ontology/deathPlace> <http://dbpedia.org/resource/Paris> . }
UNION
{?artist rdf:type <http://dbpedia.org/ontology/Artist> .
?artist <http://dbpedia.org/ontology/deathPlace> <http://dbpedia.org/resource/Saint-Paul-de-Vence> . }
}
The above query asks for all artists that were born in Russia and died in Paris and for all those artists who died in St. Paul de Vence.
Filtering data
The keyword FILTER
takes a single argument and it is used to retrieve those triples that match the filter argument. The filter argument can contain strings, a function or a simple comparison. Here are a few examples of filter arguments and the corresponding queries:
|Filter argument|Meaning|
|---|---|
|FILTER (regex(?o, "Leiden","i"))|searches in the ?o value for the substring "Leiden", the "i" means case insensitivity.|
|FILTER (?o < 10)|searches for all items with ?o values of less than 10.|
|FILTER (!(isURI(?city)))|returns a boolean true if the value for ?city isn't a proper URI, the ! means "not".|
SELECT *
WHERE
{
?s rdfs:label ?o .
FILTER (regex(?o, "Leiden","i"))
}
LIMIT 1000
SELECT DISTINCT ?actor ?year
WHERE
{
?film <http://dbpedia.org/ontology/starring> ?actor .
?actor <http://dbpedia.org/ontology/birthDate> ?year .
FILTER (xsd:date(?year) > "1980-01-01"^^xsd:date)
}
LIMIT 1000
Be aware that it is necessary to specify the datatype xsd:date
of the ?year
variable, otherwise the filter doesn't work properly.
SELECT *
WHERE
{
<http://dbpedia.org/resource/Leiden> a ?o .
FILTER (!(isURI(?o)))
}
This query retrieves no results since all values for ?o
for the city of Leiden in dbpedia are proper URIs.
Skip a number of results
The OFFSET
keyword can be used to tell the processor to skip a certain number of search results before picking those to return. This is useful for paging. Just type OFFSET
and an integer number (e.g. OFFSET 5
) after the last closing curly brace of the query.
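For paging, OFFSET is usually combined with LIMIT. The following Python sketch appends both keywords to a query string; the helper name paged_query and the page sizes are assumptions made for this illustration. For stable pages the query should also contain an ORDER BY, since the order of the results is otherwise not guaranteed.

```python
def paged_query(query, page, page_size=20):
    """Append LIMIT and OFFSET so that the query returns the given zero-based page."""
    return f"{query.rstrip()}\nLIMIT {page_size}\nOFFSET {page * page_size}"


base_query = """
SELECT ?film
WHERE {
    ?film <http://dbpedia.org/ontology/starring> :David_Bowie .
}
ORDER BY ?film
"""

# Third page (page index 2) with five results per page: skips the first ten results.
print(paged_query(base_query, page=2, page_size=5))
```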
Subqueries
Subqueries are queries inside queries. With the help of subqueries, a complex query can be broken down into more easily manageable parts. Each subquery must be enclosed in its own set of curly braces.
SELECT ?capital ?latitude
WHERE {
{
SELECT ?capital
WHERE { ?capital ?p <http://dbpedia.org/resource/Category:Capitals_in_Europe> . }
}
{
SELECT *
WHERE { ?capital <http://dbpedia.org/property/latd> ?latitude .
FILTER (?latitude < 52.0)}
}
}
ORDER BY DESC(?latitude)
This query first asks for all capital cities of Europe. The second subquery then retrieves the corresponding geographical latitudes and keeps only those below 52.0°. Finally, the results are sorted in decreasing order of geographical latitude.
Storing results in variables
The AS
keyword can be used to store the query result in a variable.
SELECT (SUM(?height) AS ?total)
WHERE {
?skyscrapers <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Skyscrapers_in_Manhattan> .
?skyscrapers <http://dbpedia.org/ontology/height> ?height .
}
The above example asks for the heights of all skyscrapers in Manhattan, the sum of all their heights is calculated and then stored in the variable ?total
. The SUM()
function adds up all values.
If we're not interested in the sum of the heights of the skyscrapers in Manhattan, but rather in the tallest, the shortest or the average value, the functions MAX(), MIN() or AVG() can be used to find the maximum, minimum or average value. Just replace SUM in the query by MAX, MIN or AVG.
A more common way of assigning a value to a variable is with the keyword BIND
. Arithmetic expressions can be used when a new value is created with the help of the keyword BIND
: +
for addition, -
for subtraction, *
for multiplication and /
for division.
SELECT ?elmax ?elmin ?eldiff
WHERE {
<http://dbpedia.org/resource/Rouffignac> <http://dbpedia.org/ontology/maximumElevation> ?elmax .
<http://dbpedia.org/resource/Rouffignac> <http://dbpedia.org/ontology/minimumElevation> ?elmin .
BIND (?elmax - ?elmin AS ?eldiff)
}
We asked for the maximum and minimum elevation of the cave of Rouffignac, calculated the elevation difference and stored it in a variable ?eldiff
with the help of the keyword BIND
.
Sorting
The keyword ORDER BY
enables the sorting of data. Values can e.g. be sorted in ascending or descending order - the ascending order is the default, for a descending order the sort key has to be wrapped in the DESC()
function.
SELECT ?BowieFilm ?runtime
WHERE {
?BowieFilm <http://dbpedia.org/ontology/starring> :David_Bowie .
?BowieFilm <http://dbpedia.org/ontology/Work/runtime> ?runtime .
}
ORDER BY DESC(xsd:double(?runtime))
This example is almost identical to the one in the section "Searching for strings and values", but we eliminated the filter and now list all films in decreasing order of runtime. Be aware that it is necessary to cast the ?runtime variable to the datatype xsd:double; otherwise the sorting doesn't work properly!
Sorting on multiple keys is possible if the sort keys are separated by spaces:
SELECT ?height ?floors
WHERE {
?skyscrapers <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Skyscrapers_in_Manhattan> .
?skyscrapers <http://dbpedia.org/ontology/height> ?height .
?skyscrapers <http://dbpedia.org/ontology/floorCount> ?floors .
}
ORDER BY DESC(?floors) DESC(?height)
This query returns the heights and floor counts of all skyscrapers of Manhattan, sorted in descending order by their number of floors and, in the second place, by their height.
Querying a remote endpoint
The SERVICE
keyword provides a possibility to query remote data from a distant SPARQL endpoint. The usual way to formulate such a query is the following with an outer SELECT
or CONSTRUCT
indicating which values we're interested in:
PREFIX xy: <http://your.external.querypoint>
SELECT ?p ?o
WHERE {
SERVICE <http://your.external.querypoint/sparql>
{xy:Person_XY ?p ?o}
}
However, the SPARQL Explorer for DBpedia doesn't allow such queries to external query points. Such requests have to be sent with e.g. Python scripts - see section "Python example scripts".
Queries using CONSTRUCT
In contrast to a SELECT query, a CONSTRUCT query specifies a triple template that is filled in with each set of values bound to the variables, while the graph pattern following the WHERE keyword stays the same. Thus, the following SELECT version of a query asking for all information (subjects, predicates and objects of all triples) in DBpedia where the same subject has a foaf:givenName value of "Peter" and a foaf:surname value of "Morris"
SELECT ?person ?p ?o
WHERE {
?person <http://xmlns.com/foaf/0.1/givenName> "Peter"@en ;
<http://xmlns.com/foaf/0.1/surname> "Morris"@en ;
?p ?o .
}
corresponds to the following CONSTRUCT
version:
CONSTRUCT
{ ?person ?p ?o . }
WHERE {
?person <http://xmlns.com/foaf/0.1/givenName> "Peter"@en ;
<http://xmlns.com/foaf/0.1/surname> "Morris"@en ;
?p ?o .
}
Query efficiency
Sometimes there are different ways to formulate a query that asks for the same set of information but which differ in efficiency. The heart of any query is the WHERE-clause, and the order of its components and the functions it calls can speed things up or slow them down. Although the order of a graph pattern's triples in a WHERE-clause does not affect the query results, the ordering may have a huge effect on the speed of the query's execution. Here are a few rules of thumb one should keep in mind:
* You can speed up your searches if you reduce the search space as much as possible as soon as possible.
* The keyword OPTIONAL will very likely slow your query down considerably. The best optimization is to avoid the use of OPTIONAL whenever possible.
* The order of triple patterns also influences the execution time. The fewer triples a triple pattern matches, the more it narrows down the search space and hence the faster the query processor will finish its job. Usually, a triple pattern with more unbound variables will match more triples than a triple pattern with fewer unbound variables. Furthermore, the variable's position in the triple matters: a given dataset is more likely to have the same property in the predicate position of a large number of triples than in the subject position.
* Moving a FILTER statement earlier can help to reduce the search space as long as all variables referenced in the FILTER statement have already been bound.
* The use of property paths can be expensive in terms of execution time.
Outside the WHERE
-clause the following points should be kept in mind:
* Naming fewer variables in the SELECT
statement can make the query run faster.
* Sorting of the returned values with ORDER BY
costs some time.
SPARQL Query Result Formats
The retrieved results can usually be returned in different formats, suitable for the different needs of different target groups. Standardized formats are: Extensible Markup Language (XML), JavaScript Object Notation (JSON), Comma-Separated Values (CSV) and Tab-Separated Values (TSV). These formats differ not only in the syntax used to represent the query results, but also in the amount of metadata provided with the results. For our purposes, the XML and JSON formats are especially useful.
SPARQL Query Result XML Format
The SPARQL query result XML format describes a standard XML format for returning the results of a SPARQL query. The structure of a returned XML document is as follows:
* The document element is called sparql
and it has two child elements - the head
element lists the selected variable names and the results
element contains the actual results.
* The returned results are stored in result
children of the results
element with a binding
child for each bound variable.
SPARQL Query Result JSON Format
A JSON object is defined as “an unordered collection of zero or more name-value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.” The syntax is as follows:
* Objects are in curly braces.
* A name and its value are separated by a colon. The name has to be unique within an object.
* A list of name-value pairs is delimited by commas.
* Arrays are in square brackets.
The results of a SELECT
query in JSON format form a table. The top-level JSON object has two name-value pairs, head
and results
- both have an object as their value. The variables the SELECT
query asked for are stored in the vars
array of the head
object. The most important part of the results
object is its bindings
array. Each object in the bindings
array has a name-value pair for each requested variable.
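To illustrate the table structure, the following minimal sketch turns a JSON result into a header and a list of rows using Python's standard library. The function name bindings_to_table and the shortened sample document are assumptions made for illustration.

```python
import json

# Shortened sample result document (for illustration only).
SAMPLE_JSON = """
{
  "head": {"vars": ["label"]},
  "results": {
    "bindings": [
      {"label": {"type": "literal", "xml:lang": "en", "value": "Switzerland"}},
      {"label": {"type": "literal", "xml:lang": "de", "value": "Schweiz"}}
    ]
  }
}
"""


def bindings_to_table(document):
    """Return the variable names and one row of values per binding."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    table = []
    for binding in data["results"]["bindings"]:
        # A variable that is unbound in a solution is omitted from the binding,
        # so .get() is used and the cell is set to None in that case.
        table.append([binding.get(var, {}).get("value") for var in variables])
    return variables, table


header, table = bindings_to_table(SAMPLE_JSON)
print(header)
for row in table:
    print(row)
```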
Python example scripts
If you have Python 3 installed on your computer, the following script can be stored in a file testxml.py
and run from the directory where the file is stored with the command python testxml.py
.
We query the dbpedia SPARQL endpoint for the ID of Jakob Bernoulli and return the result in XML format.
# Send a SPARQL query to the dbpedia SPARQL endpoint and print the result in XML format.
import urllib.parse
import urllib.request

endpointURL = "http://dbpedia.org/sparql"
query = """
SELECT ?bernoulliID WHERE {
<http://dbpedia.org/resource/Jakob_Bernoulli>
<http://dbpedia.org/ontology/wikiPageID> ?bernoulliID . }
"""
# The query has to be URL encoded before it is appended to the endpoint URL.
escapedQuery = urllib.parse.quote(query)
requestURL = endpointURL + "?query=" + escapedQuery
request = urllib.request.Request(requestURL)
# The response body is returned as bytes, so it is decoded before printing.
with urllib.request.urlopen(request) as result:
    print(result.read().decode("utf-8"))
The output in XML format looks as follows:
<sparql xmlns="http://www.w3.org/2005/sparql-results#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
<head>
<variable name="bernoulliID"/>
</head>
<results distinct="false" ordered="true">
<result>
<binding name="bernoulliID">
<literal datatype="http://www.w3.org/2001/XMLSchema#integer">7500054</literal>
</binding>
</result>
</results>
</sparql>
The head element tells us that the selected variable is named "bernoulliID", and the binding shows that the requested ID is a literal value, namely the integer 7500054.
The following script, which requires the third-party package SPARQLWrapper, can be stored in a file testjson.py
and run from the directory where the file is stored with the command python testjson.py
.
We query the dbpedia SPARQL endpoint for the labels of the resource "Switzerland" and return the result in JSON format.
# Query dbpedia SPARQL endpoint about labels of resource Switzerland
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE { <http://dbpedia.org/resource/Switzerland> rdfs:label ?label }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
if len(results["results"]["bindings"]) == 0:
    print("No results found.")
else:
    for result in results["results"]["bindings"]:
        print(result["label"]["value"])
The script extracts the label values from the JSON result and prints them as follows:
Switzerland
سويسرا
Schweiz
Suiza
Suisse
Svizzera
スイス
Zwitserland
Szwajcaria
Suíça
Швейцария
瑞士
Thus, the result is presented in list form.
Different ways to send queries
As we've seen in the previous section, SPARQL queries can be embedded in a script and run from the command line, e.g. with Python. However, there are other ways to send your query, and the resulting output will differ slightly depending on which way you choose. Here are a few examples that query the dbpedia SPARQL endpoint for the labels of the resource Switzerland, as in the Python example in the previous section.
If the query part of the script
SELECT ?label
WHERE { <http://dbpedia.org/resource/Switzerland> rdfs:label ?label . }
is directly copied into the query text box of the Virtuoso SPARQL Query Editor at http://dbpedia.org/sparql and the results format is chosen as "JSON", the output looks as follows:
head:
link: []
vars:
0: "label"
results:
distinct: false
ordered: true
bindings:
0:
label:
type: "literal"
xml:lang: "en"
value: "Switzerland"
1:
label:
type: "literal"
xml:lang: "ar"
value: "سويسرا"
2:
label:
type: "literal"
xml:lang: "de"
value: "Schweiz"
3:
label:
type: "literal"
xml:lang: "es"
value: "Suiza"
4:
label:
type: "literal"
xml:lang: "fr"
value: "Suisse"
5:
label:
type: "literal"
xml:lang: "it"
value: "Svizzera"
6:
label:
type: "literal"
xml:lang: "ja"
value: "スイス"
7:
label:
type: "literal"
xml:lang: "nl"
value: "Zwitserland"
8:
label:
type: "literal"
xml:lang: "pl"
value: "Szwajcaria"
9:
label:
type: "literal"
xml:lang: "pt"
value: "Suíça"
10:
label:
type: "literal"
xml:lang: "ru"
value: "Швейцария"
11:
label:
type: "literal"
xml:lang: "zh"
value: "瑞士"
Another possibility to retrieve the result of the same query is to send the whole query URL encoded via your browser. To URL encode your query, you can use a tool such as URL Decode and Encode:
http://dbpedia.org/sparql?query=SELECT%20%3Flabel%20WHERE%20%7B%20%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FSwitzerland%3E%20rdfs%3Alabel%20%3Flabel%20.%0A%7D
The result will be an HTML table that includes the language tags:
label
"Switzerland"@en
"سويسرا"@ar
"Schweiz"@de
"Suiza"@es
"Suisse"@fr
"Svizzera"@it
"スイス"@ja
"Zwitserland"@nl
"Szwajcaria"@pl
"Suíça"@pt
"Швейцария"@ru
"瑞士"@zh
Or you can send the whole query URL encoded as HTTP GET request using e.g. Postman:
HTTP GET sent to http://dbpedia.org/sparql?query=SELECT%20%3Flabel%20WHERE%20%7B%20%0A%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FSwitzerland%3E%20rdfs%3Alabel%20%3Flabel%20.%0A%7D
This retrieves the following result in XML format:
<sparql xmlns="http://www.w3.org/2005/sparql-results#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
<head>
<variable name="label"/>
</head>
<results distinct="false" ordered="true">
<result>
<binding name="label"><literal xml:lang="en">Switzerland</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="ar">سويسرا</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="de">Schweiz</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="es">Suiza</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="fr">Suisse</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="it">Svizzera</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="ja">スイス</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="nl">Zwitserland</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="pl">Szwajcaria</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="pt">Suíça</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="ru">Швейцария</literal></binding>
</result>
<result>
<binding name="label"><literal xml:lang="zh">瑞士</literal></binding>
</result>
</results>
</sparql>
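The URL-encoded request URL used in the last two examples can also be built programmatically instead of with an online tool. The following is a minimal sketch using Python's standard library; the function name build_request_url is hypothetical, and the query string is written with the same whitespace as the encoded URL above.

```python
import urllib.parse

ENDPOINT_URL = "http://dbpedia.org/sparql"

# The same query as above, with explicit line breaks.
QUERY = (
    "SELECT ?label WHERE { \n"
    "<http://dbpedia.org/resource/Switzerland> rdfs:label ?label .\n"
    "}"
)


def build_request_url(endpoint, query):
    """Return the endpoint URL with the URL-encoded query appended."""
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    query_string = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return endpoint + "?" + query_string


request_url = build_request_url(ENDPOINT_URL, QUERY)
print(request_url)
```

For this query string, the function should reproduce the encoded URL shown above, which can then be pasted into a browser or into Postman.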
Gravsearch queries
Gravsearch is a query language that supports queries with complex search criteria while performing well and remaining secure. Furthermore, it enables clients to work with a simpler RDF data model than the one DSP-API actually uses to store data in the triplestore, and it allows for better error checking than SPARQL.
A Gravsearch query is not processed directly by the triplestore. Instead, the query is interpreted by DSP-API, which enforces certain restrictions on the query and implements paging and permission checking. The API server then generates a SPARQL query based on the submitted Gravsearch query, queries the triplestore, filters the results according to the user’s permissions, and returns each page of query results as a DSP-API response. Thus, Gravsearch is a hybrid between a RESTful API and a SPARQL endpoint.
A Gravsearch query conforms to a subset of the syntax of a SPARQL CONSTRUCT query, with some additional restrictions and functionality. In particular, the variable representing the top-level (or ‘main’) resource that will appear in each search result must be identified, statements must be included to specify the types of the entities being queried, OFFSET
is used to control paging, and ORDER BY
is used to sort the results.
The current version of Gravsearch accepts CONSTRUCT
queries whose WHERE
-clauses use the following patterns, with the specified restrictions:
* OPTIONAL: cannot be nested in a UNION.
* UNION: cannot be nested in a UNION.
* FILTER: may contain a complex expression using the Boolean operators AND (&&) and OR (||), as well as comparison operators. The left argument of a comparison operator must be a query variable. A DSP ontology entity IRI used in a FILTER must be a property IRI.
* FILTER NOT EXISTS
* MINUS
* OFFSET: the OFFSET is needed for paging. It does not actually refer to the number of triples to be returned, but to the requested page of results. The default value is 0, which refers to the first page of results. The number of results per page is defined in app/v2 in application.conf.
* ORDER BY: In SPARQL, the result of a CONSTRUCT query is an unordered set of triples. However, a Gravsearch query returns an ordered list of resources, which can be ordered by the values of specified properties. If the query is written in the complex schema, items below the level of DSP values may not be used in ORDER BY.
* BIND: The value assigned must be a DSP resource IRI.
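Because OFFSET denotes a page number rather than a number of triples, stepping through all results means resending the same query with OFFSET 0, 1, 2, and so on. The following minimal sketch shows one way to set the page in a query string; the helper name with_offset and the shortened placeholder query are assumptions made for illustration and are not part of DSP-API.

```python
import re

# Matches an "OFFSET <number>" clause at the very end of the query (case-insensitive).
OFFSET_PATTERN = re.compile(r"\s*\bOFFSET\s+\d+\s*$", re.IGNORECASE)


def with_offset(query, page):
    """Return the query with its OFFSET set to the requested page (0 = first page)."""
    if page < 0:
        raise ValueError("page must not be negative")
    # Drop an existing trailing OFFSET clause, then append the new one.
    without_offset = OFFSET_PATTERN.sub("", query.rstrip())
    return f"{without_offset}\nOFFSET {page}"


# Placeholder query text; a real Gravsearch query goes here.
base_query = "CONSTRUCT { ?s ?p ?o . } WHERE { ?s ?p ?o . } ORDER BY ?date\nOFFSET 0"

# Generate the query text for the first three pages.
pages = [with_offset(base_query, page) for page in range(3)]
print(pages[2])
```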
Gravsearch query examples
To send queries to our live server, it is easiest to use the program Postman. You have to choose the method POST
. Open Postman and type
HTTP POST sent to https://api.dasch.swiss/v2/searchextended
Then click "Body" and choose "raw". You can then write your query in the window before sending your request. Be aware that only CONSTRUCT
requests are supported by dsp-api
.
Here is one example query; more will follow.
PREFIX beol: <http://api.dasch.swiss/ontology/0801/beol/simple/v2#>
PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
CONSTRUCT {
?letter knora-api:isMainResource true .
?letter beol:creationDate ?date .
?letter ?linkingProp1 ?person1 .
?letter ?linkingProp2 ?person2 .
} WHERE {
?letter a knora-api:Resource .
?letter a beol:letter .
?letter beol:letterHasTranslation ?translation .
beol:letterHasTranslation knora-api:objectType knora-api:Resource .
?translation a knora-api:Resource .
?letter beol:creationDate ?date .
beol:creationDate knora-api:objectType knora-api:Date .
?date a knora-api:Date .
?letter ?linkingProp1 ?person1 .
?linkingProp1 knora-api:objectType knora-api:Resource .
FILTER(?linkingProp1 = beol:hasAuthor || ?linkingProp1 = beol:hasRecipient )
?person1 a beol:person .
?person1 a knora-api:Resource .
?person1 beol:hasIAFIdentifier ?gnd1 .
FILTER(?gnd1 = "(DE-588)116610832")
?gnd1 a xsd:string .
?letter ?linkingProp2 ?person2 .
?linkingProp2 knora-api:objectType knora-api:Resource .
FILTER(?linkingProp2 = beol:hasAuthor || ?linkingProp2 = beol:hasRecipient )
?person2 a beol:person .
?person2 a knora-api:Resource .
?person2 beol:hasIAFIdentifier ?gnd2 .
FILTER(?gnd2 = "(DE-588)118696149")
?gnd2 a xsd:string .
beol:hasIAFIdentifier knora-api:objectType xsd:string .
} ORDER BY ?date
OFFSET 0
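As an alternative to Postman, a query like the one above can be sent from a script. The following is a minimal sketch using Python's standard library. The function names are hypothetical, and the Content-Type header is an assumption that mirrors Postman's "raw" text body; it may need to be adjusted to what the server expects.

```python
import urllib.request

# Endpoint as given in the text above.
SEARCH_URL = "https://api.dasch.swiss/v2/searchextended"


def build_gravsearch_request(query):
    """Create (but do not send) an HTTP POST request with the query as raw body."""
    return urllib.request.Request(
        SEARCH_URL,
        data=query.encode("utf-8"),
        # Assumption: plain text, as sent by Postman for a "raw" text body.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def run_gravsearch(query):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_gravsearch_request(query)) as response:
        return response.read().decode("utf-8")


# Placeholder text; paste the complete Gravsearch query here.
gravsearch_query = "CONSTRUCT { ?letter ?p ?o . } WHERE { ?letter ?p ?o . } OFFSET 0"
request = build_gravsearch_request(gravsearch_query)
print(request.get_method(), request.full_url)
```

Building the request is kept separate from sending it, so the request can be inspected without contacting the server; calling run_gravsearch(gravsearch_query) performs the actual HTTP POST.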