Published May 31, 2024 | Version v1
Presentation Open

High-performance Spatial Data Management and Analysis with DuckDB

  • 1. DuckDB Labs

Description

DuckDB is a novel in-process SQL database system designed for analytical workloads that has been making waves in the data science and engineering community, not only for its impressive performance but also for its focus on ease of use and integration with the wider data ecosystem. A key part of making this possible is DuckDB's flexible extension system, which enables DuckDB to be used across different domains while the core system itself remains small and focused. One such extension is the DuckDB Spatial Extension, which brings geospatial data processing capabilities to DuckDB, allowing users to perform complex spatial queries and transformations. By incorporating the trifecta of foundational open source GIS libraries (GDAL, GEOS and PROJ) as well as natively implemented geospatial algorithms, all neatly packaged into a single binary with no runtime dependencies, the spatial extension provides hundreds of familiar spatial SQL functions and import and export capabilities to and from dozens of different vector file formats.

Just as DuckDB tries to default to the behavior of PostgreSQL, the spatial extension is heavily inspired by PostGIS and similarly follows the Simple Features SQL standard. However, while the Simple Features geometry model undoubtedly provides a great deal of flexibility with its hierarchy of subtypes (such as points, linestrings and multipolygons) and optional Z and M dimensions, it is not always the most efficient representation for modern high-performance processing. While the spatial extension implements a number of geospatial algorithms natively to make the most of DuckDB's vectorized execution engine and memory model, it also complements the GEOMETRY type that we all know and love with a new set of strongly typed spatial types backed by a columnar storage model, similar to what is being proposed in the GeoArrow project. This makes DuckDB's spatial extension an exciting project, as it stands with one foot firmly in the traditional open source GIS world and the other in the modern data science and engineering movement.
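The contrast between a generic geometry value and a strongly typed, columnar representation can be illustrated with a minimal sketch in plain Python. This is not DuckDB's or GeoArrow's actual implementation: the sample coordinates and all function names below are hypothetical, and the row-oriented side simply uses the standard little-endian WKB encoding of a 2D point as a stand-in for an opaque geometry blob.

```python
import struct
from array import array

# Hypothetical sample coordinates (x, y), for illustration only.
points = [(4.9, 52.37), (6.89, 52.22), (5.12, 52.09)]


def to_wkb_point(x, y):
    """Encode one 2D point as little-endian WKB: byte order, type code 1, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def from_wkb_point(blob):
    """Decode a little-endian WKB point, checking the header of every value."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("not a little-endian WKB point")
    return x, y


# Row-oriented: one opaque blob per geometry; each must be parsed before use.
blobs = [to_wkb_point(x, y) for x, y in points]

# Columnar: one contiguous array of doubles per coordinate dimension.
xs = array("d", (x for x, _ in points))
ys = array("d", (y for _, y in points))


def bbox_from_blobs(blobs):
    """Bounding box over blobs: decode and type-check each value first."""
    decoded = [from_wkb_point(b) for b in blobs]
    return (
        min(x for x, _ in decoded),
        min(y for _, y in decoded),
        max(x for x, _ in decoded),
        max(y for _, y in decoded),
    )


def bbox_from_columns(xs, ys):
    """Bounding box over columns: the type is known up front, so scan directly."""
    return (min(xs), min(ys), max(xs), max(ys))
```

Both functions return the same bounding box, but the columnar version never inspects a per-value header, which is the property that makes a fixed, strongly typed layout a natural fit for a vectorized engine.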

In this talk, we introduce DuckDB and the DuckDB Spatial Extension, walk through some of the internals that make DuckDB special as well as some of the challenges and design decisions encountered when adapting it for geospatial processing. We also showcase some of the main features the spatial extension brings to the table today and share some insights into the future of the project.

Files

20240531-Big-Geodata-Talk-DuckDB-Spatial.pdf

Files (2.7 MB)

md5:20eb83a112883c03e868e076e56c07c6

Additional details