Four ways to run SPARQL queries on your RDF data
SPARQL looks scary at first. All caps, weird keywords, curly braces in odd places. But trust me, it’s not that bad once you get the hang of it.
If SQL is for databases, SPARQL is for graphs. Specifically RDF graphs (Resource Description Framework). Think of it as querying relationships instead of just tables.
If SPARQL is the query language, RDF is the thing it queries.
The RDF idea
Everything in RDF is a triple: subject, predicate, object. Like this:
```turtle
<George> <likes> <Keyboards> .
```

That’s it. Just who, what, and something about them. SPARQL is the way we query those triples.
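If it helps to see the idea outside RDF syntax, here is a plain-Python sketch (the names are invented for illustration): facts are just 3-tuples, and “querying” is filtering them.

```python
# Facts as (subject, predicate, object) triples -- the core RDF idea,
# sketched with plain Python tuples. Names are made up for illustration.
triples = [
    ("George", "likes", "Keyboards"),
    ("George", "likes", "Coffee"),
    ("Anna", "knows", "George"),
]

# "What does George like?" is just a filter over the triples.
likes = [o for (s, p, o) in triples if s == "George" and p == "likes"]
print(likes)  # ['Keyboards', 'Coffee']
```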
Serializations
You can serialize RDF in different formats.
XML:

```xml
<rdf:Description rdf:about="http://gcgbarbosa.com/George">
  <ex:likes rdf:resource="http://gcgbarbosa.com/Keyboards"/>
</rdf:Description>
```

JSON-LD:

```json
{
  "@context": { "ex": "http://gcgbarbosa.com/" },
  "@id": "ex:George",
  "ex:likes": { "@id": "ex:Keyboards" }
}
```

Turtle:

```turtle
@prefix ex: <http://gcgbarbosa.com/> .

ex:George ex:likes ex:Keyboards .
```

What’s with the prefix?
RDF uses URIs (web-style identifiers) for everything. Why URIs?
If I just write:
```turtle
<George> <likes> <Keyboards> .
```

You have no idea which George I mean. Your friend George? My uncle George? Is it Curious George?
But if I say:
```turtle
<http://gcgbarbosa.com/George> <http://gcgbarbosa.com/likes> <http://gcgbarbosa.com/Keyboards> .
```

Now you know exactly which George I mean.
“George” in our dataset is globally unique.
No mix-ups. No collisions.
Then using a prefix makes sense:
```turtle
@prefix ex: <http://gcgbarbosa.com/> .

ex:George ex:likes ex:Keyboards .
```

First SPARQL query
Let’s say we have a dataset of people and what they like. Here’s the most boring SPARQL query:
```sparql
SELECT ?s ?p ?o
WHERE {
  ?s ?p ?o .
}
```

In plain English, this means “give me everything.” The ?s, ?p, and ?o are variables. You’ll get back a table of all triples.
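Under the hood, matching ?s ?p ?o against the data is pattern matching over triples. Here is a toy, stdlib-only sketch of that idea (not how real SPARQL engines are implemented):

```python
def match(triples, s=None, p=None, o=None):
    """None behaves like a SPARQL variable: it matches anything."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    ]

data = [
    ("ex:George", "ex:likes", "ex:Keyboards"),
    ("ex:Anna", "ex:knows", "ex:George"),
]

# SELECT ?s ?p ?o WHERE { ?s ?p ?o . } -- all variables unbound, everything matches.
print(match(data))

# Binding the predicate narrows the result, like WHERE { ?s ex:likes ?o . }
print(match(data, p="ex:likes"))
```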
Tools to set up SPARQL endpoints
To begin with, our SPARQL endpoint needs to read RDF data.
Let’s assume we have a Turtle file named dump.ttl.
HDT
HDT = Header, Dictionary, Triples. It’s a binary format for RDF that’s:
- Compressed → much smaller on disk.
- Queryable → you can still run lookups directly.
- Streamable → load big graphs without eating all your RAM.
It’s like a .zip for RDF, but smarter.
You don’t have to decompress everything just to read it.
Setup
To run SPARQL queries on HDT files, we first need to convert our RDF to HDT. To do that, we’re going to use rdf2hdt. Rust and Cargo are required for this step. Once you’ve got Cargo installed, run:

```shell
cargo install rdf2hdt
rdf2hdt convert -i dump.ttl -o dump.hdt
```

Running queries
To run the queries, we are going to use rdflib-hdt.
Assuming you have uv installed, create a new project and add rdflib-hdt:
```shell
uv init sparql-hdt
cd sparql-hdt
uv add rdflib-hdt
```

We can use the following code to run a query:
```python
from rdflib import Graph
from rdflib_hdt import HDTStore, optimize_sparql

# Calling this function optimizes the RDFlib SPARQL engine for HDT documents
optimize_sparql()

graph = Graph(store=HDTStore("dump.hdt"))

q = (
    "SELECT ?s ?p ?o "
    "WHERE { "
    "  ?s ?p ?o . "
    "}"
)

# You can execute SPARQL queries using the regular RDFlib API
qres = graph.query(q)
for row in qres:
    print(row)
```

That’s it.
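Any SPARQL works through the same graph.query call, not just SELECT *. For example, a more selective query might look like this (the prefix and URIs reuse this post’s toy namespace, so adapt them to your own data):

```sparql
PREFIX ex: <http://gcgbarbosa.com/>

SELECT ?person
WHERE {
  ?person ex:likes ex:Keyboards .
}
```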
Blazegraph
Blazegraph is an RDF graph database (aka a triplestore). It’s open source and written in Java.
Setup
We can download Blazegraph from GitHub: https://github.com/blazegraph/database/releases.
Then run the server with:
```shell
java -server -Xmx4g -jar blazegraph.jar
```

Then upload the data using the UI at http://localhost:9999/blazegraph/#update.
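If you’d rather skip the UI, Blazegraph also accepts RDF over plain HTTP POST. Something like the following should work, though the endpoint path may differ depending on your namespace setup:

```shell
# Sketch: bulk-load dump.ttl over HTTP instead of the web UI.
# Adjust the endpoint path to match your Blazegraph namespace.
curl -X POST \
  -H "Content-Type: text/turtle" \
  --data-binary @dump.ttl \
  http://localhost:9999/blazegraph/sparql
```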
Running queries
To run the queries we can use SPARQLWrapper:
```shell
uv add sparqlwrapper
```

Then we can run the queries:
```python
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = "http://localhost:9999/bigdata/sparql"

sparql = SPARQLWrapper(endpoint)

q = (
    "SELECT ?s ?p ?o "
    "WHERE { "
    "  ?s ?p ?o . "
    "}"
)

sparql.setQuery(q)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result)
```

QLever
Most triplestores (Blazegraph, Virtuoso, Fuseki) follow the same database approach. QLever takes a search-engine approach instead:
- Indexes everything → subjects, predicates, objects, and even text.
- Hybrid search → mix structured SPARQL with full-text search.
- Optimized for joins → the expensive part of SPARQL queries.
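The hybrid part is the most distinctive feature: QLever exposes full-text search through special predicates that join directly with your triples. A sketch of what such a query looks like (ql:contains-entity and ql:contains-word come from QLever’s text-search feature and need a text index; the ex: terms are this post’s toy namespace):

```sparql
# Find people who like keyboards, together with text snippets
# that mention them alongside the word "mechanical".
SELECT ?person ?text WHERE {
  ?person ex:likes ex:Keyboards .
  ?text ql:contains-entity ?person .
  ?text ql:contains-word "mechanical"
}
```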
Setup
We can set up QLever using its CLI. Let’s install it by creating a new Python project and adding qlever.

```shell
uv init qlever-test
cd qlever-test
uv add qlever
```

Once we have it installed, we can index our data. QLever uses a configuration file to define the dataset and indexing options.
We are going to create a file named Qleverfile and add the following:
```ini
[data]
NAME = toydata
GET_DATA_CMD = echo "We don't need to download any data"
DESCRIPTION = This is our toy data

[index]
INPUT_FILES = dump.ttl
CAT_INPUT_FILES = cat ${INPUT_FILES}
SETTINGS_JSON = { "ascii-prefixes-only": false, "num-triples-per-batch": 1000000 }
TEXT_INDEX = from_literals

[server]
PORT = 7041
ACCESS_TOKEN = ${data:NAME}
MEMORY_FOR_QUERIES = 10G

[runtime]
SYSTEM = docker
IMAGE = docker.io/adfreiburg/qlever:latest

[ui]
UI_CONFIG = toydata
```

We can then index the data and start QLever:

```shell
qlever index
qlever start
```

The SPARQL endpoint will be available at http://localhost:7041/.
Running queries
To run the queries we can use the same steps as for Blazegraph. We just need to change the endpoint:

```python
endpoint = "http://localhost:7041"
```

That’s it!
Oxigraph
Oxigraph is an RDF database + SPARQL engine, written in Rust. It is embeddable, and you use it as a library inside your own app.
Setup
First, we need to install pyoxigraph:

```shell
uv init oxigraph-test
cd oxigraph-test
uv add pyoxigraph
```
Running queries
To run the queries, we can use the code below:
```python
from pyoxigraph import RdfFormat, Store

store = Store()

with open("dump.ttl", "rb") as f:
    store.bulk_load(f, format=RdfFormat.TURTLE)

q = (
    "SELECT ?s ?p ?o "
    "WHERE { "
    "  ?s ?p ?o . "
    "}"
)

result = store.query(q)
for binding in result:
    print(binding)
```

That’s it. Good luck!