Gravsearch Trace Runbook
A one-page guide to diagnosing a slow Gravsearch query from its trace. The goal: you can open a trace cold, name the dominant stage, and read the time decomposition without prior briefing.
1. Find a slow trace
Open Grafana → Explore → grafanacloud-dasch-traces (Tempo), TraceQL tab, and run the
slow-query recipe:
{ span:name = "gravsearch" && span:duration > 2s }
See TraceQL Recipes for the full set (threshold relative to the baseline
p95, drill-down by stage, filter by query shape, find errors and interruptions). Click a result to
open the trace, then find the gravsearch span — it is an INTERNAL span nested under the HTTP
SERVER span for the request.
2. The span tree
A full-path query (the prequery returned at least one main resource) produces this tree:
graph TD
H["HTTP SERVER span<br/>(endpoint path template)"] --> G["gravsearch (INTERNAL, root)"]
G --> P["gravsearch.parse"]
G --> TI["gravsearch.type_inspection"]
G --> PG["gravsearch.prequery.generate"]
G --> PE["gravsearch.prequery.execute"]
PE --> TSP["triplestore query<br/>(CLIENT span)"]
G --> MG["gravsearch.mainquery.generate"]
G --> ME["gravsearch.mainquery.execute"]
ME --> TSM["triplestore query<br/>(CLIENT span)"]
G --> RT["gravsearch.result_transform"]
All stage spans are direct children of the gravsearch root. The triplestore round-trips appear as
CLIENT spans nested under the two *.execute stages — that nesting is how you separate time
spent in the triplestore from time spent generating SPARQL or transforming results.
3. What each stage means
| Stage span | Measures | Typical cost driver |
|---|---|---|
| `gravsearch.parse` | Parsing the Gravsearch string into a ConstructQuery AST | Negligible; only notable on parse failure |
| `gravsearch.type_inspection` | Inferring entity/value types for the query | Large or deeply-typed queries |
| `gravsearch.prequery.generate` | Building the prequery SPARQL (resource IRIs + ordering) | Complex WHERE clauses, many patterns/joins |
| `gravsearch.prequery.execute` | Running the prequery against Fuseki (CLIENT span nested here) | Most common hotspot: triplestore time |
| `gravsearch.mainquery.generate` | Building the main query for the page of resource IRIs | Many properties / large page |
| `gravsearch.mainquery.execute` | Running the main query against Fuseki (CLIENT span nested here) | Triplestore time for the value graph |
| `gravsearch.result_transform` | Permission filtering + assembling the API response | Large result pages, heavy markup |
A count query (/v2/searchextended/count) runs only the prequery side: gravsearch.parse,
gravsearch.type_inspection, gravsearch.prequery.generate, gravsearch.prequery.execute under
the root — four prequery-side stages, no main-query or result-transform spans. This is expected
(see §6), not a truncated trace.
4. Root-span attributes
The gravsearch root span carries a query shape — a bounded fingerprint of what kind of query
this was, with no user data in it. Use it to group "queries like this one" without leaking FILTER
literals or instance IRIs.
| Attribute | Example | Use |
|---|---|---|
| `gravsearch.query.shape` | `resource-list\|has_filter\|has_order_by\|patterns:4-7\|joins:1` | Bounded label; safe to group/aggregate by. Format: result-type, then each true flag, then `patterns:<bucket>` and `joins:<bucket>` (buckets: `0`, `1`, `2-3`, `4-7`, `8+`) |
| `gravsearch.shape.has_filter` | `true` | Per-flag booleans for TraceQL filtering. The other flags are `has_optional`, `has_union`, `has_order_by`, `has_offset`, `has_link_traversal`, `is_fulltext` |
| `gravsearch.schema_predicates` | `hasTitle,isPartOf` | Sorted, de-duplicated ontology predicate names only (never instance IRIs). Drill-down detail, not a metric label |
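The shape format described in the table can be sketched in a few lines of Python. This is an illustration only, not the DSP-API implementation: the function names (`bucket`, `query_shape`, `parse_shape`) are hypothetical, and the flag ordering in `FLAG_ORDER` is an assumption taken from the order in which this runbook lists the flags.

```python
from typing import Dict, List, Optional, Union

# Assumed emission order for true flags (hypothetical; taken from the order
# in which the runbook lists the flag names).
FLAG_ORDER: List[str] = [
    "has_filter", "has_optional", "has_union", "has_order_by",
    "has_offset", "has_link_traversal", "is_fulltext",
]


def bucket(n: int) -> str:
    """Map a raw count onto the bounded buckets 0, 1, 2-3, 4-7, 8+."""
    if n <= 0:
        return "0"
    if n == 1:
        return "1"
    if n <= 3:
        return "2-3"
    if n <= 7:
        return "4-7"
    return "8+"


def query_shape(result_type: str, flags: Dict[str, bool],
                patterns: int, joins: int) -> str:
    """Build a shape label: result type, each true flag, then the two buckets."""
    parts = [result_type]
    parts += [name for name in FLAG_ORDER if flags.get(name, False)]
    parts.append(f"patterns:{bucket(patterns)}")
    parts.append(f"joins:{bucket(joins)}")
    return "|".join(parts)


def parse_shape(shape: str) -> Dict[str, Union[str, List[str], None]]:
    """Split a shape label back into result type, flags, and bucket labels."""
    result_type, *rest = shape.split("|")
    parsed: Dict[str, Union[str, List[str], None]] = {
        "result_type": result_type, "flags": [], "patterns": None, "joins": None,
    }
    flags: List[str] = []
    for part in rest:
        key, sep, value = part.partition(":")
        if sep and key in ("patterns", "joins"):
            parsed[key] = value
        else:
            flags.append(part)
    parsed["flags"] = flags
    return parsed
```

Because every component is drawn from a fixed vocabulary (result type, seven flags, five buckets per count), the label stays bounded however varied the underlying queries are, which is what makes it safe to aggregate by.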
On a failed or interrupted stage span you may also see:
| Attribute / field | Meaning |
|---|---|
| span status `ERROR`, description `"<stage>: <ClassName>"` | A typed stage failure, e.g. `gravsearch.prequery.execute: TriplestoreException`. The message is sanitized: never the raw SPARQL or FILTER literal |
| `error.type` | The exception class simple name |
| `gravsearch.exit_reason = interrupted` | The fiber was interrupted (client disconnect / timeout / cancellation); see §6 |
5. Reading the time decomposition
- Note the root `gravsearch` duration; that is the responder's total.
- Walk the stage spans in order; the one with the largest duration is the dominant stage.
- For an `*.execute` stage, compare the stage duration with its nested triplestore `CLIENT` span: if they are close, the time is in Fuseki; if the stage is much longer than the client span, the time is in DSP-API around the query.
- Stage durations do not perfectly sum to the root (there is glue between stages), but one stage normally dominates. `gravsearch.prequery.execute` is the most common hotspot.
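The steps above can be sketched as a small Python routine over spans exported from a trace. Everything here is illustrative: the `Span` record, the function names, and the sample durations are made up, parents are matched by name for brevity (real spans link by span ID), and the `close_ratio` of 0.8 is an assumed cut-off for "close", not a documented threshold.

```python
from dataclasses import dataclass
from typing import List, Optional

ROOT = "gravsearch"


@dataclass
class Span:
    name: str
    duration_ms: float
    kind: str = "INTERNAL"
    parent: Optional[str] = None  # parent span name (simplification)


def stage_spans(spans: List[Span]) -> List[Span]:
    """Stage spans are the INTERNAL spans directly under the gravsearch root."""
    return [s for s in spans if s.parent == ROOT and s.kind == "INTERNAL"]


def dominant_stage(spans: List[Span]) -> Optional[Span]:
    """Return the stage with the largest duration, or None if no stage ran."""
    stages = stage_spans(spans)
    return max(stages, key=lambda s: s.duration_ms) if stages else None


def time_location(stage: Span, spans: List[Span], close_ratio: float = 0.8) -> str:
    """For an *.execute stage, decide whether the time sits in the triplestore."""
    client_ms = sum(s.duration_ms for s in spans
                    if s.parent == stage.name and s.kind == "CLIENT")
    # close_ratio is an assumed threshold for "stage and client span are close".
    return "triplestore" if client_ms >= close_ratio * stage.duration_ms else "dsp-api"


def glue_ms(root: Span, spans: List[Span]) -> float:
    """Root time not covered by any stage span."""
    return root.duration_ms - sum(s.duration_ms for s in stage_spans(spans))


# Made-up sample trace for illustration.
root = Span(ROOT, 3000.0)
spans = [
    Span("gravsearch.parse", 2.0, parent=ROOT),
    Span("gravsearch.type_inspection", 40.0, parent=ROOT),
    Span("gravsearch.prequery.generate", 15.0, parent=ROOT),
    Span("gravsearch.prequery.execute", 2400.0, parent=ROOT),
    Span("triplestore query", 2350.0, kind="CLIENT",
         parent="gravsearch.prequery.execute"),
    Span("gravsearch.mainquery.generate", 10.0, parent=ROOT),
    Span("gravsearch.mainquery.execute", 300.0, parent=ROOT),
    Span("triplestore query", 120.0, kind="CLIENT",
         parent="gravsearch.mainquery.execute"),
    Span("gravsearch.result_transform", 200.0, parent=ROOT),
]
```

In this sample, `dominant_stage(spans)` picks `gravsearch.prequery.execute`, and `time_location` attributes its time to the triplestore because the nested client span covers nearly all of the stage.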
6. Absent spans: four normal topologies
The instrumentation deliberately omits spans for work that did not happen rather than emitting zero-duration placeholders. So a trace with fewer than eight spans is usually correct. Four distinct shapes look like "missing spans" but each means something specific — do not read any of them as broken instrumentation, and do not mistake one for another.
| Topology | What you see | What it means | Tell-tale |
|---|---|---|---|
| Empty result | `parse` → `type_inspection` → `prequery.generate` → `prequery.execute` present; no `mainquery.*`, no `result_transform` | The prequery returned zero main resources, so there was nothing to fetch: "no rows", not an error | All present spans are OK; root has its shape attributes |
| Parse failure | root + `gravsearch.parse` only, parse span is ERROR | The Gravsearch string was malformed; the pipeline never started | Only the parse span exists and it is ERROR (`gravsearch.parse: <Class>`) |
| Interruption / timeout | early stages present, later stages absent, last open span + root are ERROR | The request fiber was interrupted (client disconnect, timeout, cancellation) mid-query | `gravsearch.exit_reason = interrupted` on the open span and the root |
| Shape-less early interrupt | root present but without `gravsearch.query.shape` / `gravsearch.shape.*`, little or nothing below it | Interrupted (or failed) before parse completed, so the shape was never derived | Missing shape attributes plus `exit_reason = interrupted` or ERROR on the root; not a broken shape derivation |
How to tell them apart quickly:
- Later stages missing + everything `OK` + shape present → empty result. Benign.
- Only the parse span + it is `ERROR` → parse failure. Look at the client's query, not the instrumentation.
- Later stages missing + an `ERROR` span carrying `exit_reason = interrupted` → interruption. The query was probably slow and got cancelled. This is exactly the trace you are hunting; read the stages that did run to see where the time went before the cut.
- Root has no shape attributes at all → shape-less early interrupt. The interruption happened so early that parse/shape never ran; the absence of shape is expected, not a bug.
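The checklist above can be written as a small decision function. This is a hedged sketch, not tooling that ships with DSP-API: the `Span` record and `classify` are hypothetical, the shape check looks only at `gravsearch.query.shape` (not the per-flag attributes), and any status other than `ERROR` is treated as OK.

```python
from dataclasses import dataclass, field
from typing import Dict, List

EXIT_REASON = "gravsearch.exit_reason"
SHAPE = "gravsearch.query.shape"


@dataclass
class Span:
    name: str
    status: str = "OK"  # anything other than "ERROR" is treated as OK
    attributes: Dict[str, str] = field(default_factory=dict)


def is_interrupted(span: Span) -> bool:
    return span.attributes.get(EXIT_REASON) == "interrupted"


def classify(root: Span, stages: List[Span]) -> str:
    """Name the topology of a gravsearch trace from its root and stage spans."""
    everything = [root] + stages
    names = [s.name for s in stages]
    any_error = any(s.status == "ERROR" for s in everything)
    any_interrupt = any(is_interrupted(s) for s in everything)

    # Only the parse span exists and it failed without being interrupted.
    if names == ["gravsearch.parse"] and stages[0].status == "ERROR" and not any_interrupt:
        return "parse failure"
    # No shape on the root: the cut happened before the shape was derived.
    if SHAPE not in root.attributes:
        return "shape-less early interrupt"
    if any_interrupt:
        return "interruption"
    # An ERROR span without exit_reason is a typed stage failure.
    if any_error:
        return "typed stage failure"
    # Everything OK but the main-query side never ran. Count queries also
    # stop after prequery.execute, so check the endpoint before concluding.
    if "gravsearch.result_transform" not in names:
        return "empty result"
    return "full path"
```

The order of the checks matters: the parse-failure test runs before the shape test because a trace that failed in parse would not be expected to carry a shape either.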
Why interruption is called out separately
OTel span status has no cancelled value (only Unset/Ok/Error). Without the
gravsearch.exit_reason = interrupted attribute, an interrupted slow query — early stages
present, later stages absent — would be indistinguishable from a benign empty result, and from a
typed stage failure. The attribute is what disambiguates them.