Published April 22, 2026 | Version v5
Dataset Open

OneDZ: A Global Detrital Zircon Database and Implications for Constructing Giant Geoscience Database

  • 1. State Key Laboratory of Mineral Deposit Research
  • 2. School of Earth Sciences and Engineering, Nanjing University
  • 3. Chinese Academy of Geological Sciences
  • 4. School of Earth Sciences, China University of Geosciences (Wuhan)
  • 5. State Key Laboratory of Isotope Geochemistry
  • 6. College of Computer Science and Cyber Security, Chengdu University of Technology
  • 7. Hangzhou Research Institute, Huawei Technologies
  • 8. State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation
  • 9. Institute of Sedimentary Geology, Chengdu University of Technology
  • 10. Key Laboratory of Deep-time Geography and Environment Reconstruction and Applications of Ministry of Natural Resources, Chengdu University of Technology
  • 11. College of Earth and Planetary Sciences, University of Chinese Academy of Sciences
  • 12. School of Resources and Environment, Henan Polytechnic University
  • 13. College of Marine Geosciences, Ocean University of China
  • 14. School of Earth Sciences and Engineering, Hohai University
  • 15. College of Oceanography, Hohai University
  • 16. College of Tourism, Henan Normal University, Xinxiang
  • 17. College of Earth and Planetary Sciences, Chengdu University of Technology
  • 18. State Key Laboratory of Continental Dynamics
  • 19. Department of Geology, Northwest University
  • 20. State Key Laboratory of Palaeobiology and Stratigraphy
  • 21. Nanjing Institute of Geology and Paleontology
  • 22. Center for Excellence in Life and Palaeoenvironment, Chinese Academy of Sciences

Description

# onedz_datasets_csv
 
This directory contains the **split CSV datasets** of the ZirconRegular_LLM project. All files are partitioned into manageable parts (~100,000–130,000 rows each) for batch processing, LLM ingestion, or memory-constrained workflows.
 
## Directory Structure
```
onedz_datasets_csv/
├── Total_UPb_split_parts/              # Main U-Pb geochronology database
│   ├── zircon_upb_part_01.csv
│   ├── zircon_upb_part_02.csv
│   └── ... (22 parts total)
├── Total_LuHf_split_parts/             # Lu-Hf isotope database (all files expert-checked)
│   ├── zircon_luhf_part_01.csv
│   ├── zircon_luhf_part_02.csv
│   └── zircon_luhf_part_03.csv
└── Experts_checked_UPb_split_parts/    # Expert-reviewed U-Pb subsets
    ├── expert_upb_part_01.csv
    ├── expert_upb_part_02.csv
    └── ... (14 parts total)
```
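Given the layout above, the part files of one dataset can be collected and ordered by their numeric suffix. The following is a minimal sketch using only the Python standard library; the helper name `list_parts` and the filename pattern `_part_NN.csv` (inferred from the names shown in the tree) are assumptions, not part of the project.

```python
import re
from pathlib import Path

# Assumed filename pattern, inferred from names such as zircon_upb_part_01.csv.
PART_RE = re.compile(r"_part_(\d+)\.csv$")

def list_parts(dataset_dir):
    """Return the part files in dataset_dir, sorted by part number."""
    numbered = []
    for path in Path(dataset_dir).iterdir():
        match = PART_RE.search(path.name)
        if match:
            # Compare part numbers as integers rather than as strings.
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0])
    return [path for _, path in numbered]
```

If the archive is extracted into the current directory, `list_parts("onedz_datasets_csv/Total_UPb_split_parts")` would be expected to return the U-Pb parts in ascending order.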
## Dataset Summary
 
| Dataset | Parts | Est. Total Rows | Columns | Content |
|---------|-------|-----------------|---------|---------|
| `Total_UPb_split_parts` | 22 | ~2,550,000 | 64 | Full detrital zircon U-Pb age database |
| `Total_LuHf_split_parts` | 3 | ~297,000 | 33 | Lu-Hf isotope data linked to U-Pb records (expert-checked) |
| `Experts_checked_UPb_split_parts` | 14 | ~1,497,000 | 64 | Peer-reviewed regional compilations (quality-controlled) |

---
 
## File Format
 
All CSV files follow the project standard:

| Property | Specification |
|----------|---------------|
| **Encoding** | UTF-8 with BOM (`utf-8-sig`) |
| **Delimiter** | Comma (`,`) |
| **Line endings** | LF (`\n`) |
| **Header** | Single header row with standardized column names |
| **Quoting** | Double-quoted fields when containing commas or newlines |
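The format properties above map directly onto options of Python's built-in `csv` module. The sketch below is one possible way to read a single part; the function name `read_part` is hypothetical, and loading all rows into memory is a simplification that may not suit the larger parts.

```python
import csv

def read_part(path):
    """Read one part file and return (header, rows)."""
    # utf-8-sig strips the BOM, so the first column name is not
    # prefixed with '\ufeff'.
    # newline="" lets the csv module handle newlines inside quoted fields.
    with open(path, encoding="utf-8-sig", newline="") as handle:
        reader = csv.reader(handle)  # default dialect: comma-delimited, double-quoted
        header = next(reader)
        rows = list(reader)
    return header, rows
```

Opening the same file with plain `utf-8` would leave the BOM attached to the first header name, which is a common cause of failed column lookups.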
 
### U-Pb Standard Columns (64 total)
 
- **Bibliographic**: `Lead_Author`, `Year`, `Journal`, `Vol`, `Pages`, `Title`, `Web_Link`
- **Sample**: `Published_Sample_ID`, `Country_State`, `Region`, `Continent`, `Major_Geographic_Geologic_Unit`, `Minor_Geologic_Geographic_Unit`, `Group`, `Formation`, `Member`, `Locality`, `Profile`, `Latitude`, `Longitude`
- **Depositional Age**: `Depos_Age_Period`, `Depos_Age_Epoch`, `Depos_Age_Stage`, `Max_Depos_Age_Ma`, `Est_Depos_Age_Ma`, `Min_Depos_Age_Ma`
- **Analytical**: `Spectrometer`, `Spectrometer_Location`, `Institution`, `Spectrometer_Mode`, `Rock_Type_one`, `Rock_Type_two`, `Rock_Type_three`, `Grain`, `Spot_Location`, `Spot_diam`
- **Isotope Ratios**: `Pb206U238_iso`, `Pb207U235_iso`, `Pb207Pb206_iso`, `Pb208Th232_iso` (with one-sigma uncertainties)
- **Calculated Ages**: `Pb206U238_age`, `Pb207U235_age`, `Pb207Pb206_age`, `Best_age` (with one- and two-sigma uncertainties), `Discord`
- **Elemental**: `U_ppm`, `Th_ppm`, `Pb_ppm`, `Pb206Pb204`, `Pb204Pb206`, `UTh_ratio`, `ThU_ratio`
 
### Lu-Hf Columns (33 total)
 
Includes all bibliographic and sample metadata columns above, plus:
- `Upb_Age`, `Upb_Age_two_sigma`
- `176Hf177Hf_iso`, `176Lu177Hf_iso`, `176Yb177Hf_iso` (with 2-sigma uncertainties)
- `epsilon_Hf_0`, `epsilon_Hf_t` (with 1-sigma and 2-sigma uncertainties)
- `TDM1_Ma`, `TDM2_Ma` (with 2-sigma uncertainties)

---
 
## Usage Notes
 
1. **Load order**: When reassembling a dataset, load its parts in numerical order (for example, `01` → `22` for `Total_UPb_split_parts`).
2. **Row overlap**: Parts are split sequentially; no duplicate rows exist across parts of the same dataset.
3. **Cross-dataset linkage**: Use `Lead_Author` + `Year` + `Published_Sample_ID` + `Grain` to link U-Pb records with Lu-Hf records.
4. **Expert vs. Total**: `Experts_checked_UPb_split_parts` is a **subset** of the total database, curated from peer-reviewed regional compilations. It does not contain all rows from `Total_UPb_split_parts`.
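
The cross-dataset linkage in note 3 can be sketched as a lookup on the four-column composite key. This is an illustrative example, not the project's own tooling: the function names are hypothetical, rows are assumed to be dictionaries such as those produced by `csv.DictReader`, and trimming surrounding whitespace before comparison is an added assumption.

```python
from collections import defaultdict

# Key columns named in usage note 3.
KEY_COLUMNS = ("Lead_Author", "Year", "Published_Sample_ID", "Grain")

def link_key(row):
    """Build the composite key for one row (missing values become '')."""
    return tuple((row.get(col) or "").strip() for col in KEY_COLUMNS)

def index_upb(upb_rows):
    """Group U-Pb rows by composite key."""
    index = defaultdict(list)
    for row in upb_rows:
        index[link_key(row)].append(row)
    return index

def link_luhf(luhf_rows, upb_index):
    """Yield (luhf_row, matching_upb_rows); the list is empty if no match."""
    for row in luhf_rows:
        yield row, upb_index.get(link_key(row), [])
```

Keeping a list of matches per key, rather than a single row, avoids silently dropping records if one grain has more than one U-Pb analysis.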

Files

onedz_datasets_csv.zip

Files (161.4 MB)

md5:ed2edefc62053b62fbc71a140fec32b0

Additional details

Identifiers

Other
onedz

Dates

Created
2025-10-22