A curation interface for temporal databases
Creators
- 1. University of Edinburgh
- 2. University of Glasgow
Description
Curated databases play a crucial role in the scientific endeavour, and software tools are needed to support the work of curators. This talk describes a curation interface that we are developing using the cross-tier programming language Links [1]. Links uses language-integrated query, making it possible to generate well-formed and type-safe queries instead of manually constructing SQL. Links combines database and web programming capabilities, which eases the implementation of typical scientific databases together with their Web front-ends.
A recent reimplementation in Links of the IUPHAR/BPS Guide to PHARMACOLOGY [2] has demonstrated that Links is well-suited for developing tools for the display, querying and curation of real-world scientific data [3].
We consider the scenario in which update data arrives periodically to be added to the database, and this update data includes corrections to previous data. The update data could come from a single source, such as a release of government figures, or from multiple users proposing changes.
These corrections are significant, and part of the curation effort is to understand why they have happened, both to ensure data quality and to understand the limitations of the data. This update scenario is seen in the regular statistics that have been released about the Covid-19 pandemic. When this type of scientific data is provided or collected intermittently, it is often stored as CSV files on GitHub. This makes the data available but does not provide tools to understand how the datasets have changed. Similarly, curation tools are lacking for online permanent data repositories such as Figshare.
A standard database can be seen as a snapshot of reality at a particular time point. However, for certain curation tasks, more information is required. For example, a history of the changes to the data may be important for reasons of provenance and to support data change analysis. Furthermore, when information is updated, the database may need to record when each fact was true. Databases that support this functionality are called temporal databases [4].
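As a rough illustration of the idea of recording when a fact was true, the following Python sketch stamps each row with a validity period and reconstructs a snapshot at a chosen time. It is not Links code and does not reflect how Links implements temporal tables; the `Version` class, the `snapshot` function, the integer timestamps and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One period-stamped version of a fact (hypothetical schema)."""
    key: str
    value: int
    valid_from: int            # inclusive start of validity
    valid_to: Optional[int]    # exclusive end; None means still current


def snapshot(history, at):
    """Return {key: value} for the versions that were valid at time `at`."""
    result = {}
    for v in history:
        if v.valid_from <= at and (v.valid_to is None or at < v.valid_to):
            result[v.key] = v.value
    return result


# Invented sample data: region-A was corrected from 100 to 120 at time 3.
history = [
    Version("region-A", 100, valid_from=1, valid_to=3),
    Version("region-A", 120, valid_from=3, valid_to=None),
    Version("region-B", 40, valid_from=2, valid_to=None),
]

print(snapshot(history, 1))  # {'region-A': 100}
print(snapshot(history, 3))  # {'region-A': 120, 'region-B': 40}
```

Because superseded versions are closed rather than overwritten, earlier states of the data remain queryable, which is what distinguishes this from a standard single-snapshot table.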
Links has recently been extended with temporal database features, and we are developing a web-based curation interface that will use these features to permit a curator to add a new batch of data to the database and to highlight changes in the data as the addition occurs. The curator will then be able to choose how to handle each piece of changed data: add it with temporal updates to indicate that it replaces an existing piece of data (which is likely to be the most common choice), mark it as rejected (but still recorded), or indicate that it is in fact a new piece of data and not an update.
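The three curator choices described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the interface's implementation: the `Decision` enum, `find_changes`, `apply_decision`, the dictionary-based rows and the sample values are hypothetical, and treating a "new" record as an extra row that leaves the existing one open is one possible interpretation.

```python
from enum import Enum


class Decision(Enum):
    REPLACE = "replace"  # new value supersedes the existing one
    REJECT = "reject"    # existing value stands; the proposal is still recorded
    NEW = "new"          # a separate new record, not an update


def find_changes(current, batch):
    """Split an incoming batch into additions and changes to current data."""
    additions = {k: v for k, v in batch.items() if k not in current}
    changes = {k: (current[k], v) for k, v in batch.items()
               if k in current and current[k] != v}
    return additions, changes


def apply_decision(rows, rejected, key, value, decision, now):
    """Apply one curator decision to period-stamped rows (list of dicts)."""
    if decision is Decision.REJECT:
        rejected.append({"key": key, "value": value, "at": now})
        return
    if decision is Decision.REPLACE:
        for row in rows:
            if row["key"] == key and row["to"] is None:
                row["to"] = now  # close the superseded version
    # REPLACE and NEW both add an open-ended row starting now.
    rows.append({"key": key, "value": value, "from": now, "to": None})


# Invented sample data.
rows = [{"key": "region-A", "value": 100, "from": 1, "to": None}]
rejected = []
current = {r["key"]: r["value"] for r in rows if r["to"] is None}

additions, changes = find_changes(current, {"region-A": 120, "region-B": 40})
print(additions)  # {'region-B': 40}
print(changes)    # {'region-A': (100, 120)}

apply_decision(rows, rejected, "region-A", 120, Decision.REPLACE, now=3)
apply_decision(rows, rejected, "region-A", 999, Decision.REJECT, now=4)
```

After these calls the original `region-A` row is closed at time 3, a new open-ended row holds 120, and the rejected proposal is kept in `rejected` rather than discarded.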
The interface will also support data change analysis: it will be possible to run queries that assess the state of the data at individual timepoints and compare the data at these snapshots, covering aspects such as the number of records, which fields are most commonly changed, trends in quantities, and other metrics that may be appropriate to the specific database. Our prototype will illustrate how a curation interface can support data quality by promoting the analysis of newly collected data and by allowing the assessment of data change over the lifetime of a database.
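A comparison between two snapshots of the kind described above might look like the following Python sketch, which reports record counts, added and removed records, and how often each field changed. The `compare_snapshots` function, the field names and the figures are hypothetical and chosen only for illustration; the metrics a real database needs may differ.

```python
from collections import Counter


def compare_snapshots(old, new):
    """Summarise differences between two snapshots keyed by record id."""
    shared = old.keys() & new.keys()
    field_changes = Counter()
    for key in shared:
        # Count every field whose value differs between the two snapshots.
        for field in old[key].keys() | new[key].keys():
            if old[key].get(field) != new[key].get(field):
                field_changes[field] += 1
    return {
        "records_old": len(old),
        "records_new": len(new),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "field_changes": field_changes.most_common(),
    }


# Invented sample snapshots at two timepoints.
old = {
    "r1": {"cases": 100, "deaths": 2},
    "r2": {"cases": 50, "deaths": 1},
}
new = {
    "r1": {"cases": 120, "deaths": 2},
    "r2": {"cases": 55, "deaths": 3},
    "r3": {"cases": 10, "deaths": 0},
}

summary = compare_snapshots(old, new)
print(summary["added"])          # ['r3']
print(summary["field_changes"])  # [('cases', 2), ('deaths', 1)]
```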
1. E. Cooper, S. Lindley, P. Wadler, J. Yallop, Links: Web Programming Without Tiers. 5th International Symposium on Formal Methods for Components and Objects (FMCO 2006): 266-296, https://doi.org/10.1007/978-3-540-74792-5_12
2. J.F. Armstrong, E. Faccenda, S.D. Harding, A.J. Pawson, C. Southan, J.L. Sharman, B. Campo, D.R. Cavanagh, S.P.H. Alexander, A.P. Davenport, M. Spedding, J.A. Davies. The IUPHAR/BPS Guide to PHARMACOLOGY in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV Guide to MALARIA PHARMACOLOGY. Nucleic Acids Research 48:D1006–D1021, 2020, https://doi.org/10.1093/nar/gkz951
3. S. Fowler, S. Harding, J. Sharman, J. Cheney, Cross-tier Web Programming for Curated Databases: A Case Study, to appear, International Journal of Digital Curation 16(1), 2020, http://www.ijdc.net/article/view/717
4. C.S. Jensen, R.T. Snodgrass, Temporal Database. Encyclopedia of Database Systems 2009: 2957-2960, https://doi.org/10.1007/978-1-4614-8265-9_1419
Files
vgalpin-idcc21.pdf (1.1 MB, md5:f4465f9266dd025ac7209334364e581d)