A Dataset of Inorganic Crystal Structures with Hybrid-DFT Derived Band Gaps: Integration of Crystallography Open Database and HSE-Band Gaps Database
Authors/Creators
- 1. Teaching Assistant, Faculty of Computer Science, Ain Shams University
- 2. School of Artificial Intelligence, Egyptian Russian University, Cairo, Egypt
- 3. Solid-State Electronics Laboratory, Solid-State Physics Department, Physics Research Institute, National Research Centre, 33 El-Bohouth St., Dokki, Giza, 12622, Egypt
- 4. Science and Engineering Department, Faculty of Postgraduate Studies for Advanced Science, Renewable Energy, Beni-Suef University, Beni-Suef 62511, Egypt
- 5. X-ray Crystallography Laboratory, Solid-State Physics Department, Physics Research Institute, National Research Centre, 33 El-Bohouth St., Dokki, Giza, 12622, Egypt
Description
This dataset comprises 4,542 inorganic crystalline materials, compiled into a unified repository of Crystallographic Information Files (CIF) paired with high-accuracy electronic band gap values. The structural data were curated from the Crystallography Open Database (COD), a comprehensive open-access collection of crystal structures [1, 2].
To ensure predictive reliability, the corresponding electronic band gaps were sourced from the validated HSE database developed by Kim et al. (2020) [3]. In this underlying work, electronic structures were characterized using hybrid density functional theory (DFT) with the Heyd–Scuseria–Ernzerhof (HSE06) screened hybrid functional. This approach significantly mitigates the well-known "band-gap problem" inherent in standard semilocal exchange-correlation approximations, such as the Local Density Approximation (LDA) or Generalized Gradient Approximation (GGA), thereby providing a more physically accurate representation of the semiconducting properties within the dataset.
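For reference, the screened hybrid form usually written for HSE-type functionals is sketched below. The parameter values are the ones commonly quoted for HSE06 and are not taken from this record, so the settings actually used in [3] should be checked against that paper.

```latex
E_{xc}^{\mathrm{HSE}}
  = a\,E_{x}^{\mathrm{HF,SR}}(\omega)
  + (1-a)\,E_{x}^{\mathrm{PBE,SR}}(\omega)
  + E_{x}^{\mathrm{PBE,LR}}(\omega)
  + E_{c}^{\mathrm{PBE}},
\qquad a = \tfrac{1}{4},\quad \omega \approx 0.2\ \text{\AA}^{-1}
```

Here SR and LR denote the short-range and long-range parts of the exchange interaction. Only the short-range part mixes in a fraction a of exact (Hartree-Fock) exchange, which is what opens the gap relative to LDA or GGA.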
Methods
Data Acquisition and Workflow
The dataset was constructed through a multi-stage integration of the HSE band-gap database and the COD. Chemical formulas were systematically extracted from the HSE repository and used as query keys for programmatic searches of the COD, performed with the aiida-cod database importer.
To ensure high data fidelity, a strict string-matching protocol was implemented: CIF entries were retrieved only when the query formula exactly matched the chemical_formula_sum field in the COD metadata. After verification, the corresponding crystallographic files were archived locally under a standardized naming convention based on stoichiometric identifiers.
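The exact-match rule described above can be illustrated with a minimal sketch. It is not the authors' pipeline code: the record layout (`cod_id`, `formula_sum` keys), the placeholder identifiers, and the whitespace and quote normalization are all assumptions made for illustration.

```python
def normalize_formula(formula: str) -> str:
    """Strip surrounding quotes and collapse runs of whitespace (assumed normalization)."""
    return " ".join(formula.strip().strip("'\"").split())


def select_exact_matches(query_formula, cod_records):
    """Keep only records whose formula string equals the query after normalization."""
    target = normalize_formula(query_formula)
    return [
        rec for rec in cod_records
        if normalize_formula(rec["formula_sum"]) == target
    ]


# Hypothetical records; the identifiers are placeholders, not real COD entries.
records = [
    {"cod_id": "id-1", "formula_sum": "Ga As"},
    {"cod_id": "id-2", "formula_sum": "Ga2 As2"},   # different string, rejected
    {"cod_id": "id-3", "formula_sum": "'Ga  As'"},  # quoted, extra space, accepted
]

matches = select_exact_matches("Ga As", records)
matched_ids = [rec["cod_id"] for rec in matches]
```

A string comparison of this kind treats "Ga As" and "Ga2 As2" as different compounds even though they share a reduced formula, which is the conservative behaviour a strict protocol implies.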
The final curation stage involved filtering for completeness, retaining only those entries where structural coordinates and hybrid-DFT electronic data were concurrently present. This pipeline yielded a validated ensemble of 4,542 inorganic compounds, providing a robust basis for structure-property relationship analysis.
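A completeness filter of this kind might look like the sketch below. The `<formula>.cif` naming scheme, the function name, and the numeric values are assumptions chosen for illustration; the numbers are placeholders, not band gaps from the dataset.

```python
from pathlib import PurePath


def pair_structures_with_gaps(cif_names, band_gaps):
    """Return entries that have both a CIF file and a non-missing band gap."""
    paired = {}
    for name in cif_names:
        formula = PurePath(name).stem      # "GaAs.cif" -> "GaAs" (assumed naming)
        gap = band_gaps.get(formula)       # None if absent or recorded as missing
        if gap is not None:
            paired[formula] = {"cif": name, "band_gap_eV": gap}
    return paired


# Placeholder inputs: one complete entry, one with a missing gap,
# one CIF with no gap record, and one gap with no CIF.
cif_names = ["GaAs.cif", "ZnO.cif", "NaCl.cif"]
band_gaps = {"GaAs": 1.0, "ZnO": None, "Si": 2.0}

paired = pair_structures_with_gaps(cif_names, band_gaps)
```

Only `GaAs` survives here, since it is the only entry with both a structure file and a band-gap value.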
Data Sources
- Crystal Structures: Crystallography Open Database (COD) [1,2]
- Band Gap Values: HSE band-gap database (hybrid DFT calculations using the Heyd–Scuseria–Ernzerhof (HSE06) functional) [3]
Files
| Name | Size |
|---|---|
| dataset[1].zip (md5:d484d0a68b4a3c83ed7c9616282fbab9) | 10.2 MB |
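After downloading, the archive can be checked against the MD5 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and the demonstration runs on in-memory bytes rather than the real archive.

```python
import hashlib
import io

# Digest as published in the file listing of this record.
EXPECTED_MD5 = "d484d0a68b4a3c83ed7c9616282fbab9"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(stream, expected=EXPECTED_MD5):
    """Return True when the stream's MD5 equals the expected hex digest."""
    return md5_of_stream(stream) == expected.lower()


# In-memory demonstration: these bytes are not the archive, so the check fails.
demo_ok = matches_published_md5(io.BytesIO(b"not the real archive"))
```

For the actual download, the same function can be applied to an open file handle, for example `with open("dataset[1].zip", "rb") as fh: matches_published_md5(fh)`, assuming the file was saved under that name.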
Additional details
Related works
- Is supplement to
- Dataset: Crystallography Open Database (COD) (Other)
References
- [1] Gražulis, S., et al. (2009). Crystallography Open Database – an open-access collection of crystal structures. Journal of Applied Crystallography, 42(4), 726-729. https://doi.org/10.1107/S0021889809016690
- [2] Gražulis, S., et al. (2012). Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Research, 40(D1), D420-D427. https://doi.org/10.1093/nar/gkr900
- [3] Kim, C., et al. (2020). A band-gap database for semiconducting inorganic materials calculated with hybrid functional. Scientific Data, 7, 387. https://doi.org/10.1038/s41597-020-00723-8