Info: Zenodo’s user support line is staffed on regular business days between Dec 23 and Jan 5. Response times may be slightly longer than normal.

Published May 18, 2022 | Version v1
Journal article Open

User-defined operators: efficiently integrating custom algorithms into modern databases

  • 1. TUM

Description

In recent years, complex data mining and machine learning algorithms have become more common in data analytics. Several specialized systems exist to evaluate these algorithms on ever-growing data sets, which are built to efficiently execute different types of complex analytics queries.

However, using these various systems comes at a price. Moving data out of traditional database systems is often slow as it requires exporting and importing data, which is typically performed using the relatively inefficient CSV format. Additionally, database systems usually offer strong ACID guarantees, which are lost when adding new, external systems. This disadvantage can be detrimental to the consistency of the results.

Most data scientists still prefer not to use classical database systems for data analytics. The main reason why RDBMS are not used is that SQL is difficult to work with due to its declarative and set-oriented nature, and is not easily extensible.

We present User-Defined Operators (UDOs) as a concept to include custom algorithms into modern query engines. Users can write idiomatic code in the programming language of their choice, which is then directly integrated into existing database systems. We show that our implementation can compete with specialized tools and existing query engines while retaining all beneficial properties of the database system.

Files

UserDefinedOperators.pdf

Files (1.1 MB)

Name Size Download all
md5:93bc27dd1117776fc7026736ccad3f52
1.1 MB Preview Download

Additional details

Funding

CompDB – The Computational Database for Real World Awareness 725286
European Commission