A Compendium of Regular Expression Shapes in SPARQL Queries
Description
Regular path queries (RPQs) are at the heart of navigational queries in graph databases. Motivated by new features of regular path queries in the languages Cypher, GQL, and SQL/PGQ, which require new approaches for indexing and compactly storing intermediate query results, we investigate a large corpus of real-world RPQs. Our corpus consists of 148.7 million RPQs occurring in 937.2 million SPARQL queries, used on 29 different data sets.
We investigate three main questions on these logs. First, what is the syntactic structure of RPQs in practice? Second, how much non-determinism do they have? Third, can they be evaluated tractably under simple path and trail semantics?
Concerning the first question, we show that all the RPQs can be classified in only 572 different syntactic shapes, which we provide as a downloadable data set on Zenodo. Furthermore, we classify the relative use of various RPQ operators and the popular predicates that are used for transitive navigation. Concerning the second question, we show that although non-determinism occurs in the RPQs, fewer than one in ten million require a deterministic finite automaton with more states than the size of the regular expression. This is remarkable because this blow-up is known to be exponential in the worst case.
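To make the worst-case blow-up mentioned above concrete, here is a minimal Python sketch. It is not taken from the paper or the data set. It uses the textbook expression family (a|b)* a (a|b)^n, builds a small NFA for it by hand, and counts the states reached by the standard subset construction. The function names `nfa_delta` and `count_dfa_states` are illustrative choices.

```python
def nfa_delta(n):
    """Transitions of an (n + 2)-state NFA for (a|b)* a (a|b)^n over {a, b}.

    State 0 loops on both symbols and guesses, on an 'a', that this symbol
    is the (n + 1)-th from the end. States 1..n count the remaining symbols.
    State n + 1 is accepting and has no outgoing transitions.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n + 1):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def count_dfa_states(n):
    """Count the subsets reachable from {0} in the subset construction."""
    delta = nfa_delta(n)
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for symbol in "ab":
            # Union of the NFA successors of every state in the current subset.
            target = frozenset(
                q for state in current for q in delta.get((state, symbol), ())
            )
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 6):
        # Columns: n, number of NFA states, number of reachable DFA states.
        print(n, n + 2, count_dfa_states(n))
```

For this family the NFA grows linearly in n while the number of reachable subsets doubles with each step. That is the kind of growth the description reports as almost never occurring in the query logs.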
When using this data set, please cite the following paper:
@inproceedings{HM25,
author = {Janik Hammerer and Wim Martens},
title = {A Compendium of Regular Expression Shapes in SPARQL Queries},
booktitle = {Joint International Workshop on Graph Data Management Experiences {\&} Systems {(GRADES)} and Network Data Analytics (NDA)},
publisher = {{ACM}},
year = {2025},
url = {https://doi.org/10.1145/3735546.3735853},
doi = {10.1145/3735546.3735853}
}
Files
Previewed file: affymetrix.csv
Total size: 144.2 kB
| MD5 checksum | Size |
|---|---|
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| d9adc915601ddc821ff22d8193dbd719 | 509 Bytes |
| 76b65caa7a6d4a527d9f299e353fc3ca | 196 Bytes |
| 6440912491482ef3b4b4dae58ea4c6d8 | 5.4 kB |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| ea1fd6ae972d63d89eed14e5d125bfaa | 314 Bytes |
| 3e7b75ceefa0633daef0c071812be8ba | 99.5 kB |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| 87da6e5d36978479ceb22c365be61a0c | 196 Bytes |
| b1d641450ea73141306c6dc9659e0c22 | 197 Bytes |
| 85ef38d9ccf3d9e3c517af0333fc9fad | 196 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| fe9c2d863e189e3f8762d739c57aaaca | 157 Bytes |
| ad720c7947832a9d9186235209281219 | 445 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| 05a833740ec55a14c1561074d9263dcd | 195 Bytes |
| 2ec7c5b507225d64c47f83f5962eee9b | 196 Bytes |
| 72d1fea37f991ef16962f174fd113507 | 310 Bytes |
| 85fdff2c4893342b13b1225f1e1f9b91 | 156 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
| 50cd452e53b4754421c619e95175def7 | 195 Bytes |
| 5926b9d4337a1e47db45e7fc9e62bacd | 634 Bytes |
| 764a2a1307f236664c61433618d9073c | 196 Bytes |
| 13be5bc14461f248fab3ca1d5ecbaaab | 26.3 kB |
| afb3d405997e166097aed7e49cfcc868 | 7.2 kB |
| cf44a966ac88eb77d921611002b60826 | 157 Bytes |
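The MD5 checksums listed for the files can be used to check that a download arrived intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_checksum` are illustrative, and which checksum belongs to which file has to be read off the record page, since that pairing is not reproduced here.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that larger files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file name and checksum supplied by the user):
    #   python verify_md5.py <downloaded file> md5:<checksum from the record>
    file_path, checksum = sys.argv[1], sys.argv[2]
    print("OK" if matches_checksum(file_path, checksum) else "MISMATCH")
```

MD5 is used here only because it is the checksum the record lists. It detects accidental corruption but is not suitable as a security guarantee.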
Additional details
Related works
- Is supplement to
- Dataset: 10.1145/3735546.3735853 (DOI)
Software
- Programming language
- CSV