Published February 4, 2026 | Version v1
Dataset | Open access

COL5A1 (Collagen alpha-1(V) chain) full-length prediction via E8 lattice topological optimization (1838 residues, UniProt P20908)

Authors/Creators

Description

Protein target
This record contains the predicted 3D structure of the full-length pro-alpha-1 chain of type V collagen (COL5A1, UniProt P20908, 1838 amino acids). It was generated with the E8 Navigator, a symmetry-based folding method that is not trained on structural or sequence data.

MDVHTRWKARSALRPGAPLLPPLLLLLLWAPPPSRAAQPADLLKVLDFHNLPDGITKTTG
FCATRRSSKGPDVAYRVTKDAQLSAPTKQLYPASAFPEDFSILTTVKAKKGSQAFLVSIY
NEQGIQQIGLELGRSPVFLYEDHTGKPGPEDYPLFRGINLSDGKWHRIALSVHKKNVTLI
LDCKKKTTKFLDRSDHPMIDINGIIVFGTRILDEEVFEGDIQQLLFVSDHRAAYDYCEHY
SPDCDTAVPDTPQSQDPNPDEYYTEGDGEGETYYYEYPYYEDPEDLGKEPTPSKKPVEAA
KETTEVPEELTPTPTEAAPMPETSEGAGKEEDVGIGDYDYVPSEDYYTPSPYDDLTYGEG
EENPDQPTDPGAGAEIPTSTADTSNSSNPAPPPGEGADDLEGEFTEETIRNLDENYYDPY
YDPTSSPSEIGPGMPANQDTIYEGIGGPRGEKGQKGEPAIIEPGMLIEGPPGPEGPAGLP
GPPGTMGPTGQVGDPGERGPPGRPGLPGADGLPGPPGTMLMLPFRFGGGGDAGSKGPMVS
AQESQAQAILQQARLALRGPAGPMGLTGRPGPVGPPGSGGLKGEPGDVGPQGPRGVQGPP
GPAGKPGRRGRAGSDGARGMPGQTGPKGDRGFDGLAGLPGEKGHRGDPGPSGPPGPPGDD
GERGDDGEVGPRGLPGEPGPRGLLGPKGPPGPPGPPGVTGMDGQPGPKGNVGPQGEPGPP
GQQGNPGAQGLPGPQGAIGPPGEKGPLGKPGLPGMPGADGPPGHPGKEGPPGEKGGQGPP
GPQGPIGYPGPRGVKGADGIRGLKGTKGEKGEDGFPGFKGDMGIKGDRGEIGPPGPRGED
GPEGPKGRGGPNGDPGPLGPPGEKGKLGVPGLPGYPGRQGPKGSIGFPGFPGANGEKGGR
GTPGKPGPRGQRGPTGPRGERGPRGITGKPGPKGNSGGDGPAGPPGERGPNGPQGPTGFP
GPKGPPGPPGKDGLPGHPGQRGETGFQGKTGPPGPPGVVGPQGPTGETGPMGERGHPGPP
GPPGEQGLPGLAGKEGTKGDPGPAGLPGKDGPPGLRGFPGDRGLPGPVGALGLKGNEGPP
GPPGPAGSPGERGPAGAAGPIGIPGRPGPQGPPGPAGEKGAPGEKGPQGPAGRDGLQGPV
GLPGPAGPVGPPGEDGDKGEIGEPGQKGSKGDKGEQGPPGPTGPQGPIGQPGPSGADGEP
GPRGQQGLFGQKGDEGPRGFPGPPGPVGLQGLPGPPGEKGETGDVGQMGPPGPPGPRGPS
GAPGADGPQGPPGGIGNPGAVGEKGEPGEAGEPGLPGEGGPPGPKGERGEKGESGPSGAA
GPPGPKGPPGDDGPKGSPGPVGFPGDPGPPGEPGPAGQDGPPGDKGDDGEPGQTGSPGPT
GEPGPSGPPGKRGPPGPAGPEGRQGEKGAKGEAGLEGPPGKTGPIGPQGAPGKPGPDGLR
GIPGPVGEQGLPGSPGPDGPPGPMGPPGLPGLKGDSGPKGEKGHPGLIGLIGPPGEQGEK
GDRGLPGPQGSSGPKGEQGITGPSGPIGPPGPPGLPGPPGPKGAKGSSGPTGPKGEAGHP
GPPGPPGPPGEVIQPLPIQASRTRRNIDASQLLDDGNGENYVDYADGMEEIFGSLNSLKL
EIEQMKRPLGTQQNPARTCKDLQLCHPDFPDGEYWVDPNQGCSRDSFKVYCNFTAGGSTC
VFPDKKSEGARITSWPKENPGSWFSEFKRGKLLSYVDAEGNPVGVVQMTFLRLLSASAHQ
NVTYHCYQSVAWQDAATGSYDKALRFLGSNDEEMSYDNNPYIRALVDGCATKKGYQKTVL
EIDTPKVEQVPIVDIMFNDFGEASQKFGFEVGPACFMG

Biological relevance
Type V collagen is a quantitatively minor fibrillar collagen that co-assembles with type I collagen to regulate fibril diameter and tissue integrity. Pathogenic variants in COL5A1 (and COL5A2) are a major cause of classical Ehlers-Danlos syndrome (cEDS), acting through haploinsufficiency or dominant-negative effects. Many disease-causing missense variants are glycine substitutions within the long Gly-X-Y triple-helical domain. These substitutions disrupt the helical register, interchain hydrogen bonding, and supercoil twist required for stable collagen assembly.

Why this is a hard folding problem
The COL5A1 sequence contains an exceptionally long (~1,000-residue) repetitive Gly-X-Y collagenous domain, which poses severe challenges for conventional structure prediction methods:

  • Minimal unique sequence signal due to high repetition
  • Strict requirement for glycine at every third position, the only residue small enough to fit at the crowded center of the triple helix
  • Long-range geometric constraints for assembling three left-handed, polyproline II-like chains into a right-handed triple helix (10/3 helical symmetry, ~2.9 Å axial rise per residue)
  • Sensitivity to small sequence changes (relevant to EDS pathology)
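The glycine-periodicity constraint can be checked directly on the sequence. The following minimal Python sketch is not part of the E8 Navigator, and its function name is hypothetical. It scans all three reading frames and reports the longest uninterrupted run of triplets with glycine in the first position. It checks only glycine placement, not the identity of the X and Y residues.

```python
def longest_gxy_run(seq):
    """Return (start, length) of the longest run of consecutive Gly-X-Y triplets.

    start is the 0-based index of the run's first glycine; length is in residues.
    Only the glycine at every third position is checked; X and Y may be anything.
    """
    seq = "".join(seq.split()).upper()  # drop line breaks/spaces from pasted sequences
    best_start, best_len = 0, 0
    for frame in range(3):  # the repeat can start at any of three offsets
        run_start, n_triplets = None, 0
        for i in range(frame, len(seq) - 2, 3):  # i..i+2 is one complete triplet
            if seq[i] == "G":
                if run_start is None:
                    run_start = i
                n_triplets += 1
                if 3 * n_triplets > best_len:
                    best_start, best_len = run_start, 3 * n_triplets
            else:
                # A non-glycine at the Gly position breaks the repeat in this frame.
                run_start, n_triplets = None, 0
    return best_start, best_len
```

Passing the sequence listed above (whitespace is stripped automatically) returns a 0-based start index and a run length in residues. A single glycine substitution inside the helix splits the run, which illustrates how strict the periodicity requirement is.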

Deep-learning structure predictors such as AlphaFold3 often produce low-confidence or structurally inaccurate models of long collagens, in part because they rely on evolutionary co-variation signals from multiple sequence alignments, and these signals are weak in repetitive regions.

Method: E8 Navigator
The structure was generated using the E8 Navigator, a topology-driven folding engine that operates without multiple sequence alignments, neural networks, or classical molecular dynamics. Instead, it uses physicochemical properties of the residues to project the amino acid sequence onto the E8 lattice (the 8-dimensional root lattice of the exceptional Lie group E8), then performs symmetry-constrained optimization to identify stable topological configurations. Folding is treated as a holographic compression problem on the E8 manifold, guided by a coherence metric (Ψ) and by convergence toward states with very low error.
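The description does not say how residues are mapped to E8 or how Ψ is computed, so the following is only a generic illustration, not the E8 Navigator's code. This minimal Python sketch shows what "projecting onto the E8 lattice" can mean geometrically: snapping an arbitrary 8-dimensional vector to its nearest E8 lattice point with the standard Conway and Sloane decoder. The input could be, for example, a hypothetical vector of eight physicochemical descriptors for one residue. E8 consists of the vectors in R^8 whose coordinates are all integers or all half-integers and sum to an even number. Its 240 shortest nonzero vectors are the roots of the Lie group E8. All function names are illustrative.

```python
from itertools import combinations, product


def nearest_dn(x):
    """Nearest point of the D_n lattice (integer vectors with an even sum) to x.

    Conway-Sloane decoder: round every coordinate; if the sum is odd, re-round
    the coordinate with the largest rounding error in the other direction.
    """
    f = [round(v) for v in x]
    if sum(f) % 2 == 0:
        return f
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    g = list(f)
    g[k] += 1 if x[k] > f[k] else -1
    return g


def nearest_e8(x):
    """Nearest E8 lattice point to an 8-vector, using E8 = D8 union (D8 + 1/2)."""
    if len(x) != 8:
        raise ValueError("E8 is an 8-dimensional lattice")
    a = [float(v) for v in nearest_dn(x)]                   # best candidate in D8
    b = [v + 0.5 for v in nearest_dn([v - 0.5 for v in x])]  # best candidate in D8 + 1/2
    da = sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    db = sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    return a if da <= db else b


def in_e8(v, tol=1e-9):
    """True if coordinates are all integers or all half-integers, with an even sum."""
    all_int = all(abs(c - round(c)) < tol for c in v)
    all_half = all(abs((c - 0.5) - round(c - 0.5)) < tol for c in v)
    half_sum = sum(v) / 2
    return (all_int or all_half) and abs(half_sum - round(half_sum)) < tol


def e8_roots():
    """The 240 shortest nonzero E8 vectors (squared norm 2), i.e. the E8 roots."""
    roots = []
    # 112 roots of the form (+-1, +-1, 0, 0, 0, 0, 0, 0) in any two positions
    for i, j in combinations(range(8), 2):
        for si, sj in product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    # 128 roots with every coordinate +-1/2 and an even number of minus signs
    for signs in product((0.5, -0.5), repeat=8):
        if sum(1 for s in signs if s < 0) % 2 == 0:
            roots.append(signs)
    return roots
```

How residue properties would be scaled and arranged into the eight input coordinates is an assumption left open here; the decoder itself is exact for any real 8-vector.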

Key features of this approach:

  • Enforces long-range geometric and symmetry constraints intrinsically
  • Does not depend on PDB-derived patterns or MSA depth
  • Capable of capturing repetitive, extended structures such as collagen triple-helices through lattice-derived invariants

Results
The output PDB file (COL5A1_E8_prediction.pdb) shows a long, rod-like triple-helical domain in the collagenous region, flanked by compact N- and C-terminal non-collagenous domains.
The prediction converged with Ψ = 2.00 and a final mean error below 0.00005 Å, producing an extended conformation consistent with the geometric requirements of collagen triple-helix assembly.
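The "long, rod-like" and "extended" descriptions can be compared quantitatively with the ~2.9 Å axial rise per residue expected for a collagen triple helix. The following minimal Python sketch assumes a single-model PDB file in the standard fixed-column format and a roughly straight rod. The function names are hypothetical. The reader must supply the residue range of the collagenous region, because the record does not state the domain boundaries.

```python
import math

RISE_PER_RESIDUE_A = 2.9  # approximate axial rise per residue in a collagen triple helix (Å)


def ca_coords(pdb_lines, first_res, last_res):
    """Collect C-alpha coordinates (Å) for residues first_res..last_res from ATOM records.

    Uses the fixed PDB columns: atom name 13-16, residue number 23-26, x/y/z 31-54.
    Assumes one model and one chain, as in a single-chain prediction.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_num = int(line[22:26])
            if first_res <= res_num <= last_res:
                coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def mean_axial_rise(coords):
    """End-to-end C-alpha distance divided by the number of residue steps.

    This approximates the axial rise only if the segment is a roughly straight rod.
    """
    if len(coords) < 2:
        raise ValueError("need at least two C-alpha atoms")
    return math.dist(coords[0], coords[-1]) / (len(coords) - 1)


def expected_length_nm(n_residues, rise=RISE_PER_RESIDUE_A):
    """Expected rod length in nm for n_residues at the given rise per residue (Å)."""
    return n_residues * rise / 10.0
```

For example, `with open("COL5A1_E8_prediction.pdb") as fh: rise = mean_axial_rise(ca_coords(fh, first, last))`, where `first` and `last` bound the triple-helical region. A value close to 2.9 Å is consistent with a straight, triple-helix-like conformation. By the same arithmetic, `expected_length_nm(1000)` gives about 290 nm for a 1,000-residue helix.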

Significance
If the predicted structure aligns well with known collagen triple-helix geometry (e.g., the collagen-like peptide structures in PDB entries 1CAG and 1BKV) and reproduces the correct helical parameters, this would demonstrate that higher-dimensional symmetry groups such as E8 can encode protein folding rules in a way that is independent of current AI-based methods. This is particularly relevant to collagenopathies such as classical EDS, where small sequence perturbations have large structural consequences.

The prediction is provided openly for community inspection, comparison with experimental collagen structures, and further testing of lattice-based folding approaches.

Files

Files (700.9 kB)

md5:ab3e720f132ad68728427b422a5e7e76 | 700.9 kB
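The MD5 checksum in the file listing lets users confirm that a download is complete and unmodified. The following is a minimal sketch using Python's standard `hashlib`. The file path is an assumption, because the listing does not show the file name, so substitute the path of the downloaded file.

```python
import hashlib

EXPECTED_MD5 = "ab3e720f132ad68728427b422a5e7e76"  # checksum shown in the file listing


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

`verify_download(path)` returns True only if the computed digest equals ab3e720f132ad68728427b422a5e7e76.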

Additional details

Related works

Is supplemented by
Standard: 10.5281/zenodo.18493715 (DOI)