Published March 19, 2023 | Version v2
Dataset Open

Expert-annotated dataset for band gap prediction

  • 1. Lomonosov Moscow State University

Description

This dataset accompanies the paper "Toward Accurate Interpretable Predictions of Materials Properties within Transformer Language Models" (arXiv:2303.12188).

The dataset_annotated.json file is organized as follows:

{
    "JARVIS-DFT id": {
        "text": "text description of the material generated with the Robocrystallographer library",
        "tokens": "sequence of tokens generated by the MatBERT tokenizer",
        "rationales": "rationales proposed by a domain expert",
        "label": "label predicted by the MatBERT model",
    },
    ...: ...,
}
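
As a minimal sketch of how this layout might be consumed, the snippet below loads the file and reports entries that lack any of the four documented fields. The helper names `load_annotated` and `missing_fields` are hypothetical, the local path is an assumption, and no assumption is made about the value types of "tokens", "rationales", or "label", since the description above does not specify them.

```python
import json
from pathlib import Path

# Field names taken from the layout described above.
EXPECTED_KEYS = {"text", "tokens", "rationales", "label"}


def load_annotated(path):
    """Load the JSON file and return a dict keyed by JARVIS-DFT id."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def missing_fields(data):
    """Map each id to the documented keys its record lacks.

    Records that contain all four keys are omitted from the result.
    """
    return {
        jid: EXPECTED_KEYS - set(record)
        for jid, record in data.items()
        if not EXPECTED_KEYS <= set(record)
    }


if __name__ == "__main__":
    # Assumed location: the file downloaded into the working directory.
    path = Path("dataset_annotated.json")
    if path.exists():
        data = load_annotated(path)
        print(len(data), "entries")
        print("entries with missing fields:", len(missing_fields(data)))
```

Keeping the top-level dict keyed by JARVIS-DFT id makes it straightforward to join the annotations with other JARVIS-DFT records by the same identifier.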

Files

dataset_annotated.json (449.8 kB)
md5:0c57ea5a20af40752276824a527bbe9d
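
The MD5 checksum listed above can be used to check that a download is intact. The following is a sketch only: the helper name `md5_of` is hypothetical, and it assumes the file was saved as dataset_annotated.json in the working directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the file entry above.
EXPECTED_MD5 = "0c57ea5a20af40752276824a527bbe9d"


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location of the downloaded file.
    path = Path("dataset_annotated.json")
    if path.exists():
        ok = md5_of(path) == EXPECTED_MD5
        print("checksum ok" if ok else "checksum mismatch")
```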