There is a newer version of the record available.

Published March 16, 2026 | Version 2.0
Dataset Open

USDA Phytochemical Database — Enriched v2.0 (400-Row Sample)

Description

A 400-record sample of the USDA Dr. Duke's Phytochemical and Ethnobotanical Database, denormalized into a flat 8-column schema and enriched with quantitative signals from four sources:

- pubmed_mentions_2026: PubMed publication count per compound (NCBI E-utilities)
- clinical_trials_count_2026: ClinicalTrials.gov v2 study count per compound
- chembl_bioactivity_count: ChEMBL v35 bioassay data points (CC BY-SA 3.0)
- patent_count_since_2020: USPTO patents since 2020-01-01 (PatentsView REST API)
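As an illustration of how a per-compound publication count such as pubmed_mentions_2026 could be obtained from NCBI E-utilities, here is a minimal sketch. It only builds an `esearch` request URL and parses the `count` field of a JSON response. The function names and the use of the bare compound name as the search term are assumptions for illustration; the query construction actually used for this dataset is described in the linked methodology, not here.

```python
import json
from urllib.parse import urlencode

# Public E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_count_url(compound):
    """Build an esearch URL that asks PubMed for a hit count only.

    Assumption: the compound name is used verbatim as the search term.
    """
    params = {
        "db": "pubmed",
        "term": compound,
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(response_text):
    """Extract the integer hit count from an esearch JSON response body."""
    payload = json.loads(response_text)
    # esearch returns the count as a string inside "esearchresult".
    return int(payload["esearchresult"]["count"])
```

The URL from `pubmed_count_url` can be fetched with any HTTP client, and the response body passed to `parse_count`. Keeping the two steps separate lets the parsing be checked offline against a saved response.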

Schema: chemical, plant_species, application, dosage, pubmed_mentions_2026, clinical_trials_count_2026, chembl_bioactivity_count, patent_count_since_2020
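A consumer of the sample may want to confirm that each record carries exactly these eight columns. The sketch below checks one record at a time. It assumes each record is a JSON object (a Python dict) keyed by column name and that the four enrichment columns hold non-negative integers or null; neither assumption is confirmed by this page, and `check_record` is a hypothetical helper name.

```python
# Column names as listed in the schema line of this record.
EXPECTED_COLUMNS = [
    "chemical",
    "plant_species",
    "application",
    "dosage",
    "pubmed_mentions_2026",
    "clinical_trials_count_2026",
    "chembl_bioactivity_count",
    "patent_count_since_2020",
]

# The four enrichment columns; assumed to be non-negative integers or null.
COUNT_COLUMNS = EXPECTED_COLUMNS[4:]


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for col in EXPECTED_COLUMNS:
        if col not in record:
            problems.append(f"missing column: {col}")
    for key in record:
        if key not in EXPECTED_COLUMNS:
            problems.append(f"unexpected column: {key}")
    for col in COUNT_COLUMNS:
        value = record.get(col)
        if value is None:
            continue  # absent or null counts are tolerated in this sketch
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"invalid count in {col}: {value!r}")
    return problems


# Made-up placeholder record, not taken from the dataset.
example = {
    "chemical": "EXAMPLE-COMPOUND",
    "plant_species": "Example species",
    "application": None,
    "dosage": None,
    "pubmed_mentions_2026": 10,
    "clinical_trials_count_2026": 0,
    "chembl_bioactivity_count": 3,
    "patent_count_since_2020": 1,
}
print(check_record(example))
```

If the JSON file turns out to be a top-level array of such objects, running `check_record` over the result of `json.load` would flag any record that deviates from the schema.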

Records: 400 (top compounds by PubMed mentions)
Total dataset: 76,907 records across 24,746 compounds and 2,313 species.
Full dataset: https://ethno-api.com

Formats: JSON (16 MB) + Parquet (800 KB, Snappy compression).
Methodology: https://github.com/wirthal1990-tech/USDA-Phytochemical-Database-JSON/blob/main/METHODOLOGY.md

Files

ethno_sample_400.json

Files (130.7 kB)

Checksum Size
md5:02c2ebe990b70dce8f03662585e72189 110.4 kB
md5:6bbfa2ca5e9aef4c7fb86c3f78bb8e60 20.3 kB
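The MD5 checksums listed for the two files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`. Because the listing above does not pair checksums with file names, it only tests whether a file's digest matches either listed value; `md5_of` and `matches_listing` are hypothetical helper names.

```python
import hashlib

# Checksums as listed in the Files section, without the "md5:" prefix.
LISTED_MD5 = {
    "02c2ebe990b70dce8f03662585e72189",
    "6bbfa2ca5e9aef4c7fb86c3f78bb8e60",
}


def md5_of(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of(path) in LISTED_MD5
```

For example, `matches_listing("ethno_sample_400.json")` should return `True` for an unmodified copy of that file, assuming it is one of the two files in the listing.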
