Published May 26, 2022 | Version v1
Dataset Open

Sequence Similarity Network (SSN) and Genome Neighbourhood Network (GNN) for Mycobacterium Cytochrome P450 enzymes

  • 1. Swansea University


This dataset was generated in the context of the Horizon 2020 MSCA IF action deCrYPtion (Grant 839116). The aim of this project is to use comparative genomics in order to propose and then test the function of uncharacterised Cytochrome P450 enzymes that are present among Mycobacterium species.

More information about this project can be found at:

This dataset contains:

- The FASTA sequence files obtained from the UniProt database for members of the PF00067 protein family (CYP).

- A set of reference FASTA sequences, matching the supplementary material from the following publication: Parvez, M. et al. (2016) ‘Molecular evolutionary dynamics of cytochrome P450 monooxygenases across kingdoms: Special focus on mycobacterial P450s’, Scientific Reports, 6(1), p. 33099. doi:10.1038/srep33099.

- A combined FASTA file merging the two sets of sequences described above, which was used to generate the SSNs

- A PNG image produced from the analysis of the Sequence Similarity Networks generated at AST78 (corresponding to 40% identity, defining CYP families)

- A PNG image produced from the analysis of the Sequence Similarity Networks generated at AST141 (corresponding to 55% identity, defining CYP subfamilies)

- A Cytoscape session for the Sequence Similarity Networks generated from the combined FASTA file using the Enzyme Function Initiative web tools, at AST78

- A Cytoscape session containing the Sequence Similarity Networks and Genome Neighborhood Network generated from the combined FASTA file using the Enzyme Function Initiative web tools, at AST141
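
For readers who want to rebuild a combined FASTA file like the one listed above, the merge step can be sketched in plain Python. This is only an illustration and not the procedure used to produce this dataset: the function names (parse_fasta, combine_fasta, to_fasta) and the toy records are hypothetical, and the sketch assumes that the first token of each header line identifies a record, so a sequence present in both inputs is kept once.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_fasta(*sources: Iterable[str]) -> Dict[str, str]:
    """Merge several FASTA sources, keeping the first record seen per identifier."""
    combined: Dict[str, str] = {}
    for source in sources:
        for header, sequence in parse_fasta(source):
            tokens = header.split()
            # Assumption: the first header token is the record identifier.
            seq_id = tokens[0] if tokens else ""
            combined.setdefault(seq_id, sequence)
    return combined


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialise records as FASTA text (identifiers only, descriptions dropped)."""
    out: List[str] = []
    for seq_id, sequence in records.items():
        out.append(f">{seq_id}")
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy records standing in for the UniProt and reference sets.
    uniprot = [">seqA toy record", "MKT", "AAL", ">seqB", "MSS"]
    reference = [">seqB duplicate entry", "MSS", ">seqC", "MTT"]
    print(to_fasta(combine_fasta(uniprot, reference)), end="")
```

With real data, each source would be an open file handle for one of the FASTA files rather than a list of strings. Keeping the first record per identifier is a design choice of this sketch; a different rule may be preferable when duplicated identifiers carry different sequences.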


Publication_plus_CYPs_from_genus_Mycobacterium_1_Oct_19_fasta_AST131_ID55 Full Network colorized with Title.png

Additional details


European Commission
deCrYPtion – Decrypting Mycobacterium cytochrome P450 (CYP) physiological functions by testing hypotheses emitted from large-scale comparative genomics analysis 839116