Published June 28, 2022 | Version 1
Dataset Open

Ariel Big Challenge (ABC) Database

  • 1. UCL



This is the database for the publication: ESA-Ariel Data Challenge NeurIPS 2022: Introduction to exo-atmospheric studies and presentation of the Ariel Big Challenge (ABC) Database.

The database contains 105,877 realistic Ariel observations at Tier-2 resolution. All examples are generated with pre-determined atmospheric assumptions. About 25% (26,109) of these observations are complemented with retrieval results produced by Nested Sampling algorithms.

Inside you will find Level 1 and Level 2 data. Level 1 data is for general purpose use, and Level 2 data is specifically designed for the NeurIPS 2022 data challenge.

We have also included a simple tutorial on how to manipulate the files and an introductory tutorial on TauREx3, the software we used to generate these spectra.


This is an exciting era for exo-planetary exploration. In the past two decades, astronomers have harvested data from all the observatories at their disposal. These collective efforts have given us a glimpse of the convoluted process of planetary formation and evolution and its connections to atmospheric composition, but they have remained limited by the low quality and scarcity of exo-atmospheric data. Now the recently launched JWST and upcoming facilities such as Ariel, Twinkle and the ELTs are set to renew the landscape, bringing fresh insights into these remote worlds. However, with new opportunities come new challenges. The field of exoplanet atmospheres is already struggling with the incoming volume and quality of data, and machine learning (ML) techniques present themselves as a promising alternative. Developing techniques of this kind is an inter-disciplinary task, one that requires domain knowledge of the field, access to relevant tools and expert insight into the capabilities and limitations of current ML models. These stringent requirements have so far limited the development of ML in the field to a few isolated initiatives. As part of the data products of the NeurIPS 2022 Data Challenge, we would like to present the Ariel Big Challenge Database (ABC Database), a carefully designed, organised and publicly available database. With 105,887 forward models, 26,109 complementary posterior distributions and easy-to-understand documentation, this represents an unprecedented effort to invite cross-disciplinary experts to the study of the inverse problem in the context of exoplanetary studies.

For more information on the data and how they were generated, please refer to our publication.

This work utilised resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service, provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/P020259/1), and DiRAC funding from the Science and Technology Facilities Council.


Accepted in the NeurIPS 2022 Competition Track; submitted to RASTI.
Funded by the UK Space Agency, ST/W00254X/1.
Funded by a Turing Post-Doctoral Enrichment Award (PDEA).
We would like to acknowledge the generous HPC allocations from the DiRAC HPC facility, without which the dataset would not have been possible.


Files (7.0 GB)

Size
3.8 GB
3.2 GB
67.6 MB
7.0 kB

Additional details


Funding: European Commission, ExoAI – Deciphering super-Earths using Artificial Intelligence (grant 758892)