Published March 31, 2026 | Version v1
Dataset Open

A2H_Clinical_Data

Description

General

The two files in this resource are used to analyze animal-to-human translation in the Preclinical_DrugDisease_Translation_Pipeline project.

Raw AACT Snapshot

raw_aact/mv_interventional_drug_studies_20260302.csv
Tabular snapshot of interventional drug-related studies derived from the AACT / ClinicalTrials.gov relational database. The file was generated from a materialized view built on a database snapshot dated 1 December 2025. It includes one row per nct_id for studies with study_type = 'INTERVENTIONAL' and at least one intervention of type DRUG, DIETARY_SUPPLEMENT, BIOLOGICAL, COMBINATION_PRODUCT, GENETIC, or OTHER. The table combines study-level metadata from ctgov.studies, brief summaries from ctgov.brief_summaries, aggregated intervention names and types from ctgov.interventions, and aggregated condition names from ctgov.conditions.
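The inclusion rule described above can be restated as a small predicate. This is only an illustrative sketch in Python, not the SQL that defines the materialized view; the function name and the assumption that intervention types arrive as a list of upper-case strings are hypothetical.

```python
# Intervention types named in the description of the snapshot.
ALLOWED_INTERVENTION_TYPES = {
    "DRUG",
    "DIETARY_SUPPLEMENT",
    "BIOLOGICAL",
    "COMBINATION_PRODUCT",
    "GENETIC",
    "OTHER",
}


def matches_selection(study_type, intervention_types):
    """Return True if a study satisfies the stated inclusion rule.

    Hypothetical helper: a study qualifies when it is interventional and
    at least one of its interventions has an allowed type.
    """
    return study_type == "INTERVENTIONAL" and any(
        t in ALLOWED_INTERVENTION_TYPES for t in intervention_types
    )
```

Under this reading, a study that combines a device with a drug still qualifies, because a single matching intervention is enough.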

Included columns:

  • nct_id
  • brief_title
  • study_official_title
  • start_date
  • completion_date
  • study_first_submitted_date
  • phase
  • overall_status
  • brief_summary
  • intervention_names
  • intervention_types
  • condition_names

Notes:

  • intervention_names, intervention_types, and condition_names are aggregated as pipe-separated strings (" | ").
  • Only studies matching the SQL selection criteria are included.
  • This file is intended as the structured trial metadata input for downstream entity-linking and integration steps.
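Because three columns hold aggregated, pipe-separated values, a reader will usually want to expand them into lists before use. The following is a minimal sketch using only the Python standard library; the helper names and the sample row are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Columns described in the notes as pipe-separated aggregates.
AGGREGATED_COLUMNS = ("intervention_names", "intervention_types", "condition_names")


def split_pipe(value):
    """Split a pipe-separated string into trimmed, non-empty items."""
    return [part.strip() for part in (value or "").split("|") if part.strip()]


def read_studies(handle):
    """Yield one dict per study with aggregated columns expanded to lists."""
    for row in csv.DictReader(handle):
        for column in AGGREGATED_COLUMNS:
            row[column] = split_pipe(row.get(column))
        yield row


# Invented sample with a subset of the columns; real rows carry all twelve.
sample = io.StringIO(
    "nct_id,phase,intervention_names,intervention_types,condition_names\n"
    "NCT00000000,PHASE2,Drug A | Placebo,DRUG | OTHER,Condition X\n"
)
studies = list(read_studies(sample))
```

For the real file, the same function could be called on a handle opened with `open("raw_aact/mv_interventional_drug_studies_20260302.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.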

Linked NER Drug and Disease Entities

linked_to_ontologies/entities_drug_disease_clin.csv
Normalized drug and disease ontology annotations applied to the NER results. Disease concepts are mapped to MONDO, while drug concepts are mapped to UMLS CUIs. Multiple entities are represented as pipe-separated values (|).

Disease / condition mapping (MONDO)

  • merged_condition_names: Original condition names aggregated from the trial record
  • disease_mondo_termid: Assigned MONDO identifier
  • disease_mondo_term_norm: Normalized MONDO label
  • disease_term_mondo_clean: Cleaned disease string used for matching
  • disease_termid_mondo_clean: MONDO ID after cleaning step
  • nearest_dataset_parent_mondo: Closest parent MONDO concept in the reference dataset (-1 if none)
  • nearest_dataset_parent_label: Label of the nearest parent concept
  • merged_mondo_termid: Final merged MONDO identifier(s)
  • merged_mondo_label: Final merged MONDO label(s)

Drug / intervention mapping (UMLS)

  • ner_predicted_drugs: Drug names extracted via NER
  • linkbert_umls_drugs: Drug names after normalization / linking model
  • drug_umls_termid: UMLS concept identifiers (CUIs)
  • drug_umls_term_norm: Normalized UMLS labels
  • nearest_dataset_parent_umls: Closest parent UMLS concept (-1 if none)
  • nearest_dataset_parent_umls_label: Label of the parent concept
  • merged_umls_termid: Final merged UMLS identifier(s)
  • merged_umls_label: Final merged UMLS label(s)
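When reading the linked-entity columns, two conventions from the lists above matter: multiple entities are pipe-separated, and -1 marks a missing parent concept. The sketch below shows one way to handle both. The helper names are hypothetical, the identifiers in the sample row are placeholders, and the positional alignment between an ID column and its label column is an assumption that should be checked against the actual file.

```python
def split_pipe(value):
    """Split a pipe-separated string into trimmed, non-empty items."""
    return [part.strip() for part in (value or "").split("|") if part.strip()]


def parent_or_none(value):
    """Map the -1 sentinel (no parent concept) or an empty cell to None."""
    value = (value or "").strip()
    return None if value in ("", "-1") else value


def merged_pairs(row, id_column, label_column):
    """Pair merged identifiers with labels, assuming they align by position."""
    ids = split_pipe(row.get(id_column))
    labels = split_pipe(row.get(label_column))
    if len(ids) != len(labels):
        raise ValueError(f"{id_column} and {label_column} differ in length")
    return list(zip(ids, labels))


# Placeholder row: identifiers and labels are invented for illustration.
row = {
    "merged_mondo_termid": "MONDO:0000000|MONDO:9999999",
    "merged_mondo_label": "disease a|disease b",
    "nearest_dataset_parent_mondo": "-1",
}
pairs = merged_pairs(row, "merged_mondo_termid", "merged_mondo_label")
parent = parent_or_none(row["nearest_dataset_parent_mondo"])
```

The same helpers apply to the UMLS columns, for example `merged_pairs(row, "merged_umls_termid", "merged_umls_label")`.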

Files

clinical.zip (122.6 MB)
md5:334ffd84e97a6b0cc394e6d35f1fcada
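The MD5 value listed above can be used to check that the archive downloaded intact. A minimal sketch with Python's `hashlib`, assuming the archive was saved locally as `clinical.zip`; the function name is hypothetical.

```python
import hashlib

# Checksum published with the file listing.
EXPECTED_MD5 = "334ffd84e97a6b0cc394e6d35f1fcada"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A download would then be accepted only if `md5_of_file("clinical.zip") == EXPECTED_MD5`. Reading in chunks keeps memory use low for an archive of this size.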

Additional details