Published April 19, 2022 | Version 1
Journal article Open

A geospatial source selector for federated GeoSPARQL querying

  • 1. Institute of Informatics and Telecommunications, National Center for Scientific Research (NCSR) Demokritos, Ag. Paraskevi, 15341, Greece
  • 2. Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, 16122, Greece

Description

Background: Geospatial linked data brings into the scope of the Semantic Web and its technologies a wealth of datasets that combine semantically rich descriptions of resources with their geo-location. However, several Semantic Web technologies still need technical work before geospatial data is fully integrated, and federated query processing is one of them.

Methods: In this paper, we explore the idea of annotating each data source with a bounding polygon that summarizes the spatial extent of the resources in that source, and of using this summary as an additional source selection criterion in order to reduce the set of sources that are tested as potentially holding relevant data. We present our source selection method and discuss its correctness and implementation.
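The idea of pruning sources by their spatial summary can be illustrated with a minimal sketch. It is not the implementation described in the paper: it assumes axis-aligned bounding boxes as a coarse stand-in for bounding polygons, and the names `BBox`, `select_sources`, and the endpoint names and coordinates are hypothetical. It also assumes that a source without a summary is always kept, so that pruning never excludes a source that might hold relevant data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BBox:
    """Axis-aligned box (assumption: a coarse stand-in for a bounding polygon)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes are disjoint only if one lies entirely to one side of the other.
        return not (
            self.max_x < other.min_x or other.max_x < self.min_x
            or self.max_y < other.min_y or other.max_y < self.min_y
        )


def select_sources(summaries: Dict[str, Optional[BBox]], query_region: BBox) -> List[str]:
    """Keep every source whose summary may overlap the query region.

    A source with no summary (None) is kept, since nothing is known
    about where its resources are located.
    """
    return [
        name for name, box in summaries.items()
        if box is None or box.intersects(query_region)
    ]


# Hypothetical sources and extents, invented for illustration only.
summaries = {
    "crops_north": BBox(20.0, 39.0, 24.0, 41.5),
    "crops_south": BBox(21.0, 35.0, 26.0, 37.5),
    "water_bodies": None,  # no spatial summary available
}
query_region = BBox(22.0, 40.0, 23.0, 41.0)

selected = select_sources(summaries, query_region)
print(selected)  # ['crops_north', 'water_bodies']
```

Because a summary encloses all resources of its source, a summary that is disjoint from the query region means the source cannot contribute spatially matching results, so excluding it is safe under these assumptions.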

Results: We evaluate the proposed source selection using three different types of summaries with different degrees of accuracy, against a baseline that uses no geospatial summaries. We use datasets and queries from a practical use case that combines crop-type data with water availability data for food security. The experimental results suggest that more complex summaries lead to slower source selection, but also to more precise exclusion of unneeded sources. Moreover, we observe that the time spent on source selection is (partially or fully) recovered by shorter planning and execution times. As a result, the federated sources are not burdened by pointless queries from the federation engine.
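The trade-off between summary complexity and pruning precision can be sketched with a toy comparison. This does not reproduce the three summary types evaluated in the paper: it assumes, purely for illustration, that a summary is a list of `(min_x, min_y, max_x, max_y)` boxes, and the helper names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def may_hold_data(summary, region):
    """Return (decision, number of box tests) for a summary given as a list of boxes."""
    tests = 0
    for box in summary:
        tests += 1
        if overlaps(box, region):
            return True, tests
    return False, tests


# Hypothetical source whose resources form two clusters that lie far apart.
fine = [(0.0, 0.0, 2.0, 2.0), (8.0, 8.0, 10.0, 10.0)]  # one box per cluster
coarse = [(0.0, 0.0, 10.0, 10.0)]                      # one box around everything

# Query region located in the empty area between the two clusters.
region = (4.0, 4.0, 6.0, 6.0)

coarse_result = may_hold_data(coarse, region)
fine_result = may_hold_data(fine, region)
print(coarse_result)  # (True, 1): cheap test, but the source is kept needlessly
print(fine_result)    # (False, 2): more tests, but the source is excluded
```

In this toy setting the finer summary needs more geometric tests per source, yet it avoids sending a query to a source that cannot answer it.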

Conclusions: The evaluation draws on data and queries from the agroenvironmental domain and shows that our source selection method substantially improves the effectiveness of federated GeoSPARQL query processing.

Files

openreseurope-2-15771.pdf (1.0 MB)
md5:12dfb0efd5c22f6c2a59edf644ebb66f

Additional details

References

  • OGC GeoSPARQL: A geographic query language for RDF data, version 1.0.
  • Bereta K, Caumont H, Daniels U (2019). The Copernicus App Lab project: Easy access to Copernicus data. Advances in Database Technology - 22nd International Conference on Extending Database Technology. doi:10.5441/002/edbt.2019.46
  • Alexander K, Cyganiak R, Hausenblas M (2011). Describing linked datasets with the VoID vocabulary. W3C Interest Group Note.
  • Quilitz B, Leser U (2008). Querying distributed RDF data sources with SPARQL. doi:10.1007/978-3-540-68234-9_39
  • Schwarte A, Haase P, Hose K (2011). FedX: A federation layer for distributed query processing on linked open data. doi:10.1007/978-3-642-21064-8_39
  • Acosta M, Vidal ME, Lampo T (2011). ANAPSID: an adaptive query processing engine for SPARQL endpoints. The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference. doi:10.1007/978-3-642-25073-6_2
  • Charalambidis A, Troumpoukis A, Konstantopoulos S (2015). SemaGrow: optimizing federated SPARQL queries. Proceedings of the 11th International Conference on Semantic Systems. doi:10.1145/2814864.2814886
  • Görlitz O, Staab S (2011). SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. Proceedings of the Second International Workshop on Consuming Linked Data (COLD2011).
  • Wang X, Tiropanis T, Davis HC (2013). LHD: optimising linked data query processing using parallelisation.
  • Saleem M, Ngomo AN (2014). HiBISCuS: Hypergraph-based source selection for SPARQL endpoint federation. doi:10.1007/978-3-319-07443-6_13
  • Montoya G, Skaf-Molli H, Hose K (2017). The Odyssey approach for optimizing federated SPARQL queries. doi:10.1007/978-3-319-68288-4_28
  • Konstantopoulos S, Charalambidis A, Troumpoukis A (2017). The Sevod vocabulary for dataset descriptions for federated querying.
  • Caldwell DR (2005). Unlocking the mysteries of the bounding box. Coordinates: Online Journal of the Map and Geography Round Table Series.
  • Samet H (1984). The quadtree and related hierarchical data structures. ACM Comput Surv. doi:10.1145/356924.356930
  • Kyzirakos K, Karpathiotakis M, Koubarakis M (2012). Strabon: A semantic geospatial DBMS. The Semantic Web - ISWC 2012 - 11th International Semantic Web Conference. doi:10.1007/978-3-642-35176-1_19
  • Kostopoulos C, Mouchakis G, Troumpoukis A (2021). KOBE: Cloud-native Open Benchmarking Engine for federated query processors. The Semantic Web - 18th International Conference. doi:10.1007/978-3-030-77385-4_40
  • Masmoudi M, Lamine SBAB, Zghal HB (2021). Knowledge hypergraph-based approach for data integration and querying: Application to Earth Observation. Future Generation Computer Systems. doi:10.1016/j.future.2020.09.029
  • Malik T, Szalay AS, Budavari T (2003). Skyquery: A webservice approach to federate databases. First Biennial Conference on Innovative Data Systems Research. doi:10.48550/arXiv.cs/0211023
  • Zimmermann R, Ku WS, Chu W (2004). Efficient query routing in distributed spatial databases. doi:10.1145/1032222.1032249
  • Tang G, Chen L, Liu YL (2005). Integrated k-NN query processing based on geospatial data services. doi:10.1007/11590354_71