A2H_Clinical_Data
Description
General
The two files in this resource are used in the analysis of animal-to-human translation for the project Preclinical_DrugDisease_Translation_Pipeline.
Raw AACT Snapshot
raw_aact/mv_interventional_drug_studies_20260302.csv
Tabular snapshot of interventional drug-related studies derived from the AACT / ClinicalTrials.gov relational database. The file was generated from a materialized view built on a database snapshot dated 1 December 2025. It includes one row per nct_id for studies with study_type = 'INTERVENTIONAL' and at least one intervention of type DRUG, DIETARY_SUPPLEMENT, BIOLOGICAL, COMBINATION_PRODUCT, GENETIC, or OTHER. The table combines study-level metadata from ctgov.studies, brief summaries from ctgov.brief_summaries, aggregated intervention names and types from ctgov.interventions, and aggregated condition names from ctgov.conditions.
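The inclusion rule described above can be sketched as a small predicate. This is only an illustration of the stated criteria, not the SQL that defines the materialized view; the function name `is_included` and the idea of passing intervention types as a list are assumptions made for this example.

```python
# Intervention types named in the description of the snapshot.
ALLOWED_INTERVENTION_TYPES = {
    "DRUG",
    "DIETARY_SUPPLEMENT",
    "BIOLOGICAL",
    "COMBINATION_PRODUCT",
    "GENETIC",
    "OTHER",
}


def is_included(study_type, intervention_types):
    """Return True if a study meets the stated inclusion criteria.

    Hypothetical helper: study_type is a string, intervention_types is an
    iterable of intervention type strings for that study.
    """
    if study_type != "INTERVENTIONAL":
        return False
    # At least one intervention must be of an allowed type.
    return any(t in ALLOWED_INTERVENTION_TYPES for t in intervention_types)


included = is_included("INTERVENTIONAL", ["DEVICE", "DRUG"])
excluded = is_included("OBSERVATIONAL", ["DRUG"])
```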
Included columns:
- nct_id
- brief_title
- study_official_title
- start_date
- completion_date
- study_first_submitted_date
- phase
- overall_status
- brief_summary
- intervention_names
- intervention_types
- condition_names
Notes:
- intervention_names, intervention_types, and condition_names are aggregated as pipe-separated strings (" | ").
- Only studies matching the SQL selection criteria are included.
- This file is intended as the structured trial metadata input for downstream entity-linking and integration steps.
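A minimal sketch of how the aggregated columns might be expanded back into lists when reading the snapshot, using only the Python standard library. The helper names `split_pipe` and `read_trials` are hypothetical, and the sample row is made up purely to keep the example self-contained. Splitting on the bare `|` character assumes that no individual name contains a pipe, which the description does not guarantee.

```python
import csv
import io

# Columns described in the notes as pipe-separated aggregates.
AGGREGATED_COLUMNS = ("intervention_names", "intervention_types", "condition_names")


def split_pipe(value):
    """Split a pipe-separated string into a list of stripped, non-empty items."""
    if not value:
        return []
    return [item.strip() for item in value.split("|") if item.strip()]


def read_trials(handle):
    """Yield one dict per trial with the aggregated columns expanded to lists."""
    for row in csv.DictReader(handle):
        for column in AGGREGATED_COLUMNS:
            row[column] = split_pipe(row.get(column, ""))
        yield row


# Made-up sample row (placeholder values, not taken from the dataset).
sample = io.StringIO(
    "nct_id,intervention_names,intervention_types,condition_names\n"
    "NCT00000000,Drug A | Placebo,DRUG | OTHER,Condition X\n"
)
trials = list(read_trials(sample))
```

For the real file, the same function could be applied to a handle opened with `open(path, newline="", encoding="utf-8")`; the encoding is an assumption and may need adjusting.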
Linked NER Drug and Disease Entities
linked_to_ontologies/entities_drug_disease_clin.csv
Normalized drug and disease ontology annotations applied to the NER results. Disease concepts are mapped to MONDO, while drug concepts are mapped to UMLS CUIs. Multiple entities are represented as pipe-separated values (|).
Disease / condition mapping (MONDO)
- merged_condition_names: Original condition names aggregated from the trial record
- disease_mondo_termid: Assigned MONDO identifier
- disease_mondo_term_norm: Normalized MONDO label
- disease_term_mondo_clean: Cleaned disease string used for matching
- disease_termid_mondo_clean: MONDO ID after cleaning step
- nearest_dataset_parent_mondo: Closest parent MONDO concept in the reference dataset (-1 if none)
- nearest_dataset_parent_label: Label of the nearest parent concept
- merged_mondo_termid: Final merged MONDO identifier(s)
- merged_mondo_label: Final merged MONDO label(s)
Drug / intervention mapping (UMLS)
- ner_predicted_drugs: Drug names extracted via NER
- linkbert_umls_drugs: Drug names after normalization / linking model
- drug_umls_termid: UMLS concept identifiers (CUIs)
- drug_umls_term_norm: Normalized UMLS labels
- nearest_dataset_parent_umls: Closest parent UMLS concept (-1 if none)
- nearest_dataset_parent_umls_label: Label of the parent concept
- merged_umls_termid: Final merged UMLS identifier(s)
- merged_umls_label: Final merged UMLS label(s)
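A rough sketch of two details a reader of this file will likely need to handle: the -1 sentinel in the nearest-parent columns and the pipe-separated multi-value fields. The helper names are hypothetical. Pairing identifiers with labels by position is an assumption that the description does not state, so the sketch raises an error when the two lists differ in length rather than pairing silently. The identifiers in the example are placeholders, not real MONDO or UMLS values.

```python
def split_values(value):
    """Split a pipe-separated field into stripped, non-empty items."""
    if not value:
        return []
    return [item.strip() for item in value.split("|") if item.strip()]


def parent_or_none(value):
    """Map the documented -1 sentinel (or an empty field) to None."""
    value = (value or "").strip()
    return None if value in ("", "-1") else value


def pair_ids_with_labels(ids, labels):
    """Pair identifiers with labels by position.

    Assumption: the two fields list their entries in the same order.
    """
    id_list = split_values(ids)
    label_list = split_values(labels)
    if len(id_list) != len(label_list):
        raise ValueError("identifier and label counts differ")
    return list(zip(id_list, label_list))


# Placeholder values for illustration only.
pairs = pair_ids_with_labels("ID_A | ID_B", "label a | label b")
no_parent = parent_or_none("-1")
```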
Files
clinical.zip (122.6 MB)
md5:334ffd84e97a6b0cc394e6d35f1fcada