MUSK: A Vision-Language Foundation Model for Precision Oncology
Description
Abstract: Clinical decision-making is driven by multimodal data, including clinical notes and pathologic characteristics. Artificial intelligence approaches that can effectively integrate multimodal data hold significant promise for advancing clinical care. However, the scarcity of well-annotated multimodal datasets in the clinical setting hinders the development of useful models. Here, we develop Multimodal transformer with Unified maSK modeling (MUSK), a vision-language foundation model designed to leverage large-scale, unlabeled, unpaired image-text data. MUSK is pre-trained on 50 million pathology images and 1 billion pathology-related text tokens using unified masked modeling. MUSK is further pre-trained on 1 million pathology image-text pairs to align vision and language features efficiently. After pretraining, MUSK is applied to a wide range of downstream tasks involving pathology images and/or text with minimal or no further fine-tuning. MUSK achieves superior performance across 21 patch-level and slide-level benchmarks, including image-to-text and text-to-image retrieval, visual question answering, and image classification. Importantly, MUSK shows promising performance in outcome prediction, including melanoma relapse prediction, pan-cancer prognosis prediction, and immunotherapy response prediction in lung and gastro-esophageal cancers. MUSK effectively combines complementary information from pathology images and clinical reports and can potentially improve diagnosis and precision cancer therapy.
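The abstract describes a second pretraining stage that aligns vision and language features on paired data. The sketch below is a minimal, hypothetical illustration of that kind of image-text alignment objective (a symmetric contrastive loss, as in CLIP-style training); it is not the authors' implementation, and all names and dimensions here are assumptions for illustration only.

```python
# Hypothetical sketch of image-text contrastive alignment, NOT the MUSK codebase:
# embeddings from matched image-text pairs are pulled together, mismatched
# pairs in the batch are pushed apart.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) features from the vision and language
    branches; row i of each tensor is assumed to come from the same pair.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (batch, batch) similarity matrix; the diagonal holds matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image cross-entropy terms.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Random features standing in for pathology image/report embeddings.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(contrastive_alignment_loss(img, txt).item())
```

Once aligned this way, the same embeddings support the zero-shot image-to-text and text-to-image retrieval benchmarks the abstract reports, since retrieval reduces to ranking by the cosine similarities above.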
Additional details
Dates
- Submitted: 2024-08-13