Published March 2, 2022 | Version v1
Journal article Open

Array databases: concepts, standards, implementations

  • 1. Jacobs University

Description

Multi-dimensional arrays (also known as raster data or gridded data) play a key role in
many, if not all, science and engineering domains, where they typically represent
spatio-temporal sensor, image, simulation output, or statistics “datacubes”. As classic database
technology does not support arrays adequately, such data today are maintained
mostly in silo solutions, with architectures that tend to erode and fail to keep up with the
increasing requirements on performance and service quality. Array Database systems
attempt to close this gap by providing declarative query support for flexible ad-hoc
analytics on large n-D arrays, similar to what SQL offers on set-oriented data, XQuery
on hierarchical data, and SPARQL and Cypher on graph data. Today, Petascale Array
Database installations exist, employing massive parallelism and distributed processing.
Hence, questions arise about technology and standards available, usability, and overall
maturity. Several papers have compared models and formalisms, and benchmarks have
been undertaken as well, typically comparing two systems against each other. While
each of these represents valuable research, to the best of our knowledge there is no
comprehensive survey combining model, query language, architecture, practical
usability, and performance aspects. The size of this comparison also differentiates our study:
19 systems are compared and four are benchmarked, to an extent and depth clearly
exceeding previous papers in the field; for example, subsetting tests were designed
so that systems cannot be tuned specifically to these queries. It is hoped that
this gives a representative overview to all who want to immerse themselves in the field, as well
as clear guidance to those who need to choose the best-suited datacube tool for
their application. This article presents results of the Research Data Alliance (RDA) Array
Database Assessment Working Group (ADA:WG), a subgroup of the Big Data Interest
Group. It has elicited the state of the art in Array Databases, technically supported by
IEEE GRSS and CODATA Germany, to answer the question: how can data scientists and
engineers benefit from Array Database technology? As it turns out, Array Databases
can offer significant advantages in terms of flexibility, functionality, and extensibility, as well
as performance and scalability. Overall, the database approach of offering analysis-ready
“datacubes” heralds a new level of service quality. Investigation shows that there is
a lively ecosystem of technology with increasing uptake, and proven array analytics
standards are in place. Consequently, such approaches have to be considered a serious
option for datacube services in science, engineering, and beyond. As it turns out, though,
tools vary greatly in functionality and performance.

 

Files

Springer-BigData_Array-Databases-Survey.pdf

Files (4.5 MB)

md5:09b1a8f8a080aeeec7aced533e96183e

Additional details

Funding

EarthServer-2 – Agile Analytics on Big Data Cubes 654367
European Commission