What Do Genomic Transformers Attend To? Interpreting Attention Heads Across DNABERT, Nucleotide Transformer, and scGPT
Creators
Description
Transformer models have shown strong performance on biological sequence prediction tasks, but the interpretability of their internal mechanisms remains underexplored. Because these models are intended for use in biomedical research, a mechanistic understanding of their predictions is key to their widespread adoption. We introduce a method to interpret attention heads in genomic transformers by correlating per-token attention scores with curated biological annotations and summarizing each head's focus using GPT-4. Applying this method to DNABERT, Nucleotide Transformer, and scGPT, we find that attention heads learn biologically meaningful associations during unsupervised pre-training and that these associations shift with fine-tuning. We show that interpretability varies with tokenization scheme and that context dependence plays a key role in head behaviour. Through ablation, we demonstrate that heads strongly associated with biological features are more important for task performance than uninformative heads in the same layers. In DNABERT trained for TATA promoter prediction, we observe heads with positive and negative associations reflecting positive and negative learning dynamics. Our results offer a framework for tracing how biological features are learned from random initialization through pre-training to fine-tuning, enabling insight into how these models represent nucleotides, genes, and cells.
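The core idea of correlating per-token attention with annotation tracks can be sketched as follows. This is a minimal illustration, not the repository's implementation: the shapes, the aggregation (one attention score per token per head), and the use of a binary annotation track are assumptions made for the example.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy setup: for each attention head, one attention score per token,
# plus a binary annotation track over the same tokens (e.g. 1 = token
# overlaps an annotated feature such as a TATA box, 0 = otherwise).
rng = np.random.default_rng(0)
n_heads, n_tokens = 4, 200

annotation = (rng.random(n_tokens) < 0.1).astype(float)

# Simulated per-head, per-token attention; head 0 is deliberately
# biased toward annotated tokens so an association is visible.
attention = rng.random((n_heads, n_tokens))
attention[0] += 0.5 * annotation

# Rank-correlate each head's attention profile with the annotation.
for head in range(n_heads):
    rho, p = spearmanr(attention[head], annotation)
    print(f"head {head}: rho={rho:+.3f}, p={p:.2e}")
```

Heads whose attention tracks an annotation yield high-magnitude correlations; in the actual pipeline, these per-head association profiles are what is summarized and compared across pre-trained and fine-tuned checkpoints.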
Files
(8.1 GB)
Additional details
Dates
- Available: 2025-05-21
Software
- Repository URL: https://github.com/meconsens/genome-head-interpreter
- Programming languages: Python, R