Published April 28, 2026 | Version v1
Dataset Open

AlphaFold 3 hexameric resistosome models for 637 NRC-clade NLR proteins from Solanaceae

  • 1. Sainsbury Laboratory

Description

This dataset contains all AlphaFold 3 (AF3) structural predictions generated for the accompanying study. The set comprises 1,911 AF3 inference runs covering 637 non-redundant NRC-clade NLR proteins drawn from a reannotation of 346 Solanaceae genomes spanning 85 species. Each NRC sequence was modeled as a homohexamer with 25 oleic-acid molecules as a stand-in for the plasma membrane, in three independent seeds (1, 2 and 3); each AF3 inference returned five sampled diffusion outputs per seed. These models are the input to the Structural Novelty Index (SNI) framework introduced in the paper.
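The counts in the description can be cross-checked with a few lines of shell arithmetic. This is only a worked example of the numbers quoted above; the total number of sampled models assumes that every run kept all five diffusion samples, which the archive manifest should confirm.

```bash
# Counts taken from the description above.
proteins=637   # non-redundant NRC-clade NLR sequences
seeds=3        # seeds 1, 2 and 3
samples=5      # diffusion samples returned per seed

jobs=$((proteins * seeds))     # one inference run per sequence and seed
models=$((jobs * samples))     # ASSUMPTION: all five samples retained per run

echo "inference jobs: $jobs"     # inference jobs: 1911
echo "sampled models: $models"   # sampled models: 9555
```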

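The full dataset is a single tar archive (output.tar, ~164 GB) split into 17 parts of up to 10 GB each. Because the total size exceeds Zenodo’s 50 GB per-record limit, the parts are distributed across five sibling Zenodo records, labelled 1/4, 2/4, 3a/4, 3b/4 and 4/4. This index record holds none of the parts: it provides the README, reassembly script and checksums, plus two smaller supplementary archives (benchmark.zip and NRC7_structures.zip).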

 

The 17 split parts are distributed across five Zenodo records (part 3 is split across two records, 3a and 3b):

| Record | DOI | Parts included | Size |
|---|---|---|---|
| Data part 1/4 | https://doi.org/10.5281/zenodo.19859996 | output.tar.part-00, -01, -02, -03 | ~40 GB |
| Data part 2/4 | https://doi.org/10.5281/zenodo.19862508 | output.tar.part-04, -05, -06, -07 | ~40 GB |
| Data part 3a/4 | https://doi.org/10.5281/zenodo.19880570 | output.tar.part-08, -09, -10 | ~30 GB |
| Data part 3b/4 | https://doi.org/10.5281/zenodo.19924056 | output.tar.part-11 | ~10 GB |
| Data part 4/4 | https://doi.org/10.5281/zenodo.19880646 | output.tar.part-12, -13, -14, -15, -16 | ~43.3 GB |
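If a single part fails its checksum and has to be fetched again, a small lookup saves rereading the record list. The helper below is a hypothetical convenience (the name record_for_part is not part of the dataset) that simply encodes the part-to-record mapping given above.

```bash
# Print the DOI of the Zenodo record holding a given part number (0-16).
# Hypothetical helper; it only encodes the part-to-record mapping listed above.
record_for_part() {
  case $1 in
    ''|*[!0-9]*) echo "not a part number: $1" >&2; return 1 ;;
  esac
  n=${1#0}     # drop one leading zero so "08" is compared as 8
  n=${n:-0}    # "0" becomes empty after stripping; restore it
  if   [ "$n" -le 3 ];  then echo "https://doi.org/10.5281/zenodo.19859996"   # part 1/4
  elif [ "$n" -le 7 ];  then echo "https://doi.org/10.5281/zenodo.19862508"   # part 2/4
  elif [ "$n" -le 10 ]; then echo "https://doi.org/10.5281/zenodo.19880570"   # part 3a/4
  elif [ "$n" -eq 11 ]; then echo "https://doi.org/10.5281/zenodo.19924056"   # part 3b/4
  elif [ "$n" -le 16 ]; then echo "https://doi.org/10.5281/zenodo.19880646"   # part 4/4
  else echo "no such part: $1" >&2; return 1
  fi
}
```

For example, `record_for_part 11` prints the DOI of record 3b/4.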

 

This record (the index) contains:

| File | Description |
|---|---|
| README.md | Dataset description, methods, file structure |
| SHA256SUMS.txt | SHA-256 of every part and of the original output.tar |
| reassemble.sh | One-command reassembly + verification (macOS / Linux / WSL / Git Bash) |
| benchmark.zip | AF3 models of the 18-protein SNI benchmark set (9 hexameric NRCs + 9 NRC-S) |
| NRC7_structures.zip | AF3 models of StNRC7 and SlNRC7 as 5-mer to 11-mer assemblies |

 

How to access the data

1. Download README.md, SHA256SUMS.txt and reassemble.sh from this record.

2. Download all 17 output.tar.part-* files from the data records listed above into the same folder as the script and checksums. Parts from different records can go straight into one working folder; the script does not care which record a part came from.

3. Reassemble and extract:

macOS / Linux / WSL / Git Bash

bash reassemble.sh
tar -xf output.tar

reassemble.sh verifies each part against SHA256SUMS.txt, concatenates the parts in order, then verifies that the reassembled output.tar matches the original SHA-256.
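Because the parts come from several records, it is easy to start reassembly with one of them missing. The function below is a minimal sketch of a completeness check to run before reassembling, not an excerpt from reassemble.sh; the name missing_parts is hypothetical, and the file names follow the output.tar.part-NN pattern used in the record list.

```bash
# List any of the 17 expected parts (output.tar.part-00 ... part-16) that are
# absent from a directory. Prints nothing and returns 0 when all are present.
# Hypothetical helper, not taken from reassemble.sh.
missing_parts() {
  dir=${1:-.}
  missing=0
  i=0
  while [ "$i" -le 16 ]; do
    part=$(printf 'output.tar.part-%02d' "$i")
    if [ ! -f "$dir/$part" ]; then
      echo "$part"
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  [ "$missing" -eq 0 ]
}
```

Running `missing_parts . && bash reassemble.sh` starts the reassembly only when every part is in the current folder.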

Plain shell (no script)

# Verify parts
sha256sum    -c SHA256SUMS.txt   # Linux
shasum -a 256 -c SHA256SUMS.txt  # macOS

# Reassemble
cat output.tar.part-* > output.tar

# Extract
tar -xf output.tar
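The manual route above verifies the parts but not the reassembled archive. SHA256SUMS.txt is described as also carrying the hash of the original output.tar on a '# original-sha256:' line, so the final check could be sketched as below. The exact layout of that line (the marker followed by 64 hex digits) is an assumption, and check_original is a hypothetical name.

```bash
# Compare a file's SHA-256 with the value on the '# original-sha256:' line.
# ASSUMPTION: the line has the form '# original-sha256: <64 hex digits>'.
check_original() {
  tarfile=$1
  sums=$2
  expected=$(sed -n 's/^# original-sha256:[[:space:]]*\([0-9a-fA-F]\{64\}\).*/\1/p' "$sums" \
             | head -n 1 | tr 'A-F' 'a-f')
  # Linux ships sha256sum; macOS ships shasum.
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$tarfile" | cut -d ' ' -f 1)
  else
    actual=$(shasum -a 256 "$tarfile" | cut -d ' ' -f 1)
  fi
  # Fail if no hash was found in the checksum file or the hashes differ.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

A typical call would be `check_original output.tar SHA256SUMS.txt && tar -xf output.tar`, so that extraction runs only on a verified archive.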

Windows (PowerShell, no WSL)

cmd /c "copy /b output.tar.part-* output.tar"
Get-FileHash output.tar -Algorithm SHA256
# Compare with the '# original-sha256:' line in SHA256SUMS.txt

Disk requirements

• During reassembly: ~328 GB (parts + reassembled output.tar) before extraction.

• After full extraction of the per-job archives: an additional ~250 GB.

• For partial use, the manifest TSV (included in output.tar and attached to this record) lets you target individual jobs without extracting everything.
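A reassembly that runs out of space part-way wastes a lot of time, so it may be worth checking free space first. The function below is a minimal sketch using POSIX df; has_space is a hypothetical name, and it interprets GB as decimal gigabytes (10^9 bytes), which is an assumption about how the sizes above are quoted.

```bash
# Succeed if the filesystem holding DIR has at least NEED_GB gigabytes free.
# ASSUMPTION: 1 GB = 10^9 bytes.
has_space() {
  dir=$1
  need_gb=$2
  # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte
  # blocks, and the fourth column is the space still available.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  need_kb=$((need_gb * 1000000000 / 1024))
  [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, `has_space . 328 || echo "less than ~328 GB free"` covers reassembly. If the parts and output.tar are kept during full extraction, the figures above add up to roughly 328 + 250 = 578 GB.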

 

 

AlphaFold 3 modeling

All structural models were generated with a local installation of AlphaFold 3 (https://github.com/google-deepmind/alphafold3). For each of the 637 input sequences, the CC–NB-ARC region (from the N-terminus to the end of the NB-ARC domain) was used as the modeling target. Each sequence was modeled as a homohexamer (six identical protomers) in complex with 25 oleic-acid (OLA) molecules as a stand-in for the plasma-membrane lipid environment.

To process the dataset efficiently, the AF3 data pipeline (template search and multiple sequence alignment generation) was decoupled from the inference step (see https://github.com/google-deepmind/alphafold3/blob/main/docs/performance.md#data-pipeline). Templates and MSAs were generated once per sequence, then inference was run three times with random seeds 1, 2 and 3.

Inference for the 1,911 hexamer jobs was run on NVIDIA A100 80 GB GPUs, within the ~5,000-token limit imposed by that hardware. The larger NRC7 5-mer to 11-mer assemblies modeled in the paper exceed that token budget and were generated separately on H200 141 GB GPUs.
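To see why the larger assemblies outgrow the A100 budget, a rough token estimate helps. The sketch below assumes AF3's usual tokenization (one token per amino-acid residue, one token per ligand heavy atom) and 20 heavy atoms per oleic acid (C18H34O2). The 600-residue protomer length is purely illustrative, and whether the NRC7 assemblies included the same 25 lipids is not stated here.

```bash
# Rough AF3 token count for a homo-oligomer modeled with lipid ligands.
# ASSUMPTIONS: one token per residue, one token per ligand heavy atom,
# 20 heavy atoms per oleic acid (C18H34O2).
af3_tokens() {
  residues=$1; copies=$2; lipids=$3
  echo $((residues * copies + lipids * 20))
}

# Longest protomer (in residues) that fits a token budget.
max_len() {
  budget=$1; copies=$2; lipids=$3
  echo $(((budget - lipids * 20) / copies))
}

af3_tokens 600 6 25    # hexamer of a hypothetical 600-residue region: 4100
af3_tokens 600 11 25   # 11-mer of the same region: 7100
max_len 5000 6 25      # hexamer with 25 lipids under ~5,000 tokens: 750
```

Under these assumptions a hexamer fits the ~5,000-token limit up to about 750 residues per protomer, whereas an 11-mer of a 600-residue region would need about 7,100 tokens.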

 

SNI benchmark set (benchmark.zip)

benchmark.zip contains the AlphaFold 3 models of the 18-protein benchmark set used in the paper to demonstrate that SNI(NRC-Hexa) separates canonical hexameric NRCs from their phylogenetically related NRC-S (sensor) NLRs; this is the analysis underlying Figure 1A.

The benchmark comprises 9 NRC helpers (the three NRCs with experimentally resolved hexameric structures — NbNRC2a, SlNRC3 and NbNRC4c — plus two orthologs from each of their respective phylogenetic clades) and 9 NRC-S sensor NLRs drawn from across the NRC superclade as outgroups. Each protein was modeled as a homohexamer with 25 oleic-acid molecules as a stand-in for the plasma membrane, in three independent seeds — yielding 18 × 3 = 54 inference jobs.

 

NRC7 5-mer to 11-mer assemblies (NRC7_structures.zip)

NRC7_structures.zip contains AlphaFold 3 models of the CC–NB-ARC domains of StNRC7 (Solanum tuberosum) and SlNRC7 (Solanum lycopersicum) modeled across a range of stoichiometries — 5-mer, 6-mer, 7-mer, 8-mer, 9-mer, 10-mer and 11-mer assemblies, in three independent seeds per stoichiometry per protein.
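The modeling grid described above works out to 2 proteins × 7 stoichiometries × 3 seeds = 42 jobs. The loop below enumerates it as a sanity check against the contents of the zip. The label format and the seed numbers 1 to 3 are assumptions made for illustration, not the naming used inside the archive.

```bash
# Enumerate the NRC7 modeling grid: 2 proteins x 7 stoichiometries x 3 seeds.
# ASSUMPTION: labels and seed numbers are illustrative, not the archive's naming.
nrc7_grid() {
  for protein in StNRC7 SlNRC7; do
    for n in 5 6 7 8 9 10 11; do
      for seed in 1 2 3; do
        echo "${protein}_${n}mer_seed${seed}"
      done
    done
  done
}

nrc7_grid | wc -l   # 42 jobs in total
```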

 

Related resources

• Genome annotations and NLR sequence data (Toghani et al.): https://doi.org/10.5281/zenodo.19855163

• Analysis code for SNI computation: https://github.com/amiralito/SolNRCH_foldome

 

 

Files (13.6 GB)

| MD5 | Size |
|---|---|
| da2ec8cd9f9acc56e8995d2f7724655f | 7.8 GB |
| 1c3211bf43c95663007d7f549fb8eaaa | 5.7 GB |
| c66bcab4a7b0de5bfd91d7bd40519c80 | 1.6 kB |
| 60fcd711de010fa6494b25931e82d37b | 1.6 kB |
| 5413f6688ec2217d2eaa96ad1bc74673 | 379.3 kB |