Dataset Open Access

Ariel Big Challenge (ABC) Database

Quentin Changeat; Kai Hou Yip

Ingo Waldmann

This is the database for the publication: ESA-Ariel Data Challenge NeurIPS 2022: Introduction to exo-atmospheric studies and presentation of the Ariel Big Challenge (ABC) Database.

The database contains 105,877 realistic Ariel observations at Tier-2 resolution. All examples are generated under a pre-determined set of atmospheric assumptions. 26% (26,109) of these observations are complemented with retrieval results produced by Nested Sampling algorithms.

Inside you will find Level 1 and Level 2 data. Level 1 data is for general-purpose use, and Level 2 data is specifically designed for the NeurIPS 2022 data challenge.

We have also included a simple tutorial on how to manipulate the files and an introductory tutorial on TauREx3, the software we used to generate these spectra.


This is an exciting era for exoplanetary exploration. Over the past two decades, astronomers have harvested data from all the observatories at their disposal. Those collective efforts have given us a glimpse of the convoluted process of planetary formation and evolution and its connections to atmospheric composition, but they have remained limited by the low quality and scarcity of exo-atmospheric data. Now the recently launched JWST and upcoming facilities such as Ariel, Twinkle and the ELTs are set to renew the landscape, bringing fresh insights into these remote worlds. However, with new opportunities come new challenges. The field of exoplanet atmospheres is already struggling with the incoming volume and quality of data, and machine learning (ML) techniques present themselves as a promising alternative. Developing techniques of this kind is an interdisciplinary task, one that requires domain knowledge of the field, access to relevant tools and expert insight into the capabilities and limitations of current ML models. These stringent requirements have so far limited the development of ML in the field to a few isolated initiatives. As part of the data product of the NeurIPS 2022 Data Challenge, we would like to present the Ariel Big Challenge Database (ABC Database), a carefully designed, organised and publicly available database. With 105,887 forward models, 26,109 complementary posterior distributions and easy-to-understand documentation, this represents an unprecedented effort to invite cross-disciplinary experts to the study of the inverse problem in the context of exoplanetary studies.

For more information on the data and how they were generated, please refer to our publication.

This work utilised resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service, provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/P020259/1), and DiRAC funding from the Science and Technology Facilities Council.

Accepted in NeurIPS 2022 Competition Track; submitted to RASTI.
Funded by UK Space Agency, ST/W00254X/1.
Funded by Turing Post-Doctoral Enrichment Award (PDEA).
We would like to acknowledge the generous HPC allocations from the DiRAC HPC facility, without which the dataset would not have been possible.
Files (7.0 GB)
Name: Size
3.8 GB
3.2 GB
NeurIPS taurex: 67.6 MB
Tutorial - How To Use.ipynb: 7.0 kB