Fungal Divergent Actin: Trait mapping and function hypothesis
Project member:
This project investigates the potential new function of a divergent actin form found mainly in fungal species, Fungal Divergent Actin (FDA). We use the ProteinCartography pipeline, combined with trait mapping and trait association analysis, as a novel strategy to generate hypotheses about the function of FDA.
The overall idea is to identify a working set of fungal species for which we can confidently determine the presence or absence of FDA, gather fungal trait information for these species (ecological, structural, genetic traits, etc.), and determine whether FDA presence/absence correlates with any of these traits. Any such correlation can then be used to generate hypotheses about the function of FDA.
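For a binary trait, one simple way to test the kind of association described above is Fisher's exact test on a 2×2 table of FDA status versus trait status. The sketch below is an illustrative choice, not necessarily the model used in Step 4; the function name `fisher_exact_two_sided` and the counts in the example are hypothetical, and the test treats species as independent observations (it ignores their shared ancestry).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table

                       trait present   trait absent
        FDA present          a               b
        FDA absent           c               d
    """
    fda = a + b    # species with FDA (row total)
    trait = a + c  # species with the trait (column total)
    n = a + b + c + d

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals (null: no association).
        return comb(trait, x) * comb(n - trait, fda - x) / comb(n, fda)

    # Range of top-left values compatible with the margins.
    lo, hi = max(0, fda + trait - n), min(fda, trait)
    p_obs = table_prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance so that exact ties are not lost to rounding).
    p = sum(q for q in map(table_prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: 6 of 8 FDA-positive species and 1 of 5 FDA-negative
# species carry the trait.
print(round(fisher_exact_two_sided(6, 2, 1, 4), 4))  # 0.1026
```

Continuous traits, or corrections for phylogenetic relatedness, would call for a different model; this sketch only covers the simplest presence/absence case.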
Our approach is divided into four main steps:
- Step 1: Expanding the initial set of fungal species that possess FDA
- Step 2: Defining the 'working set of species' (set of fungal species for which we can determine their FDA status (presence or absence))
- Step 3: Curating fungal trait information
- Step 4: Statistical modeling of the association between FDA presence/absence and selected fungal traits
In this upload, we provide files that are part of Steps 1 and 2:
- ProteinCartography folder of the ProteinCartography run for the FDA proteins:
- .csv table of all fungal proteins (and associated species) in UniProt that have an available structure in AlphaFold: Fungi_prot_uniprot.csv
ProteinCartography run:
(Step 1 - Expanding the initial set of fungal species that possess FDA)
The aim of the first step is to detect as many fungal species as possible that possess FDA. For this, we used ProteinCartography to screen for proteins with structures similar to the previously identified Fungal Divergent Actin (REF actin pub) and to expand the original FDA cluster.
After identifying 6 representative sequences of Fungal Divergent Actin, we used each of the six proteins as an input protein for the "Search Mode" of the ProteinCartography pipeline. Full details on the ProteinCartography pipeline can be found in the GitHub repository and the accompanying publication.
The ProteinCartography folder contains all the inputs and outputs of the ProteinCartography run:
- FASTA and PDB files for all 6 input proteins
- the configuration file for the ProteinCartography run
- all output files (including the main files presented in the publication: similarity matrix, semantic analysis and aggregated_features_pca_umap, found in the subfolder output/clusteringresults)
UniProt list of fungal proteins with available structures: Fungi_prot_uniprot.csv
(Step 2 - Defining the 'working set of species')
The aim of Step 2 is to identify the fungal species for which we can confidently determine FDA presence/absence status. While ProteinCartography allows us to identify species that possess FDA, we also need to be able to tell confidently when a species does not possess FDA. The working set is therefore defined as the set of species for which the presence or absence of FDA was putatively established.
Because ProteinCartography relies on protein structures available in UniProt and AlphaFold, we define our working set as all fungal species with at least 6,000 protein structures in AlphaFold, so that each species' proteome is well represented and the absence of FDA from the search results is more likely to reflect true absence. We then consider that any species of this set that is not part of the extended FDA cluster does not possess FDA.
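The working-set rule can be written down as a short function. This is a minimal sketch assuming that per-species structure counts and the set of species found in the extended FDA cluster are already available; the function and parameter names, and the species in the example, are hypothetical.

```python
from __future__ import annotations


def assign_fda_status(
    structure_counts: dict[str, int],
    cluster_species: set[str],
    min_structures: int = 6000,
) -> dict[str, bool]:
    """Map each working-set species to True (FDA present) or False (FDA absent).

    structure_counts: number of AlphaFold structures per fungal species
    cluster_species:  species with at least one protein in the extended FDA cluster
    min_structures:   working-set threshold (6,000 structures in this project)
    """
    working_set = {
        sp for sp, n in structure_counts.items() if n >= min_structures
    }
    # Within the working set, absence from the cluster is read as absence of FDA.
    return {sp: sp in cluster_species for sp in sorted(working_set)}


# Hypothetical species names and counts, for illustration only.
status = assign_fda_status(
    {"Species A": 7200, "Species B": 6500, "Species C": 1800},
    cluster_species={"Species A", "Species C"},
)
print(status)  # {'Species A': True, 'Species B': False}
```

Species C appears in the FDA cluster but has fewer than 6,000 structures, so it falls outside the working set and receives no status, consistent with the definition above.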
Fungi_prot_uniprot.csv lists the fungal proteins, and their associated species, that have a structure available in AlphaFold from UniProt. It serves as the starting file for counting the number of proteins with available structures per species and thus for identifying the working set of fungal species. To obtain this list, we ran an 'Advanced search' in UniProt with the following query:
- 'Fungi' in Taxonomy field
- '*' (any value) in the AlphaFoldDB cross-reference field (found under Cross-references / 3D structure)
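The per-species count can be obtained from Fungi_prot_uniprot.csv with a few lines of Python. This sketch assumes a comma-separated file with a header row, one row per protein, and a species column named "Organism"; that column name is a hypothetical default, so `species_column` should be set to the file's actual header.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_structures_per_species(
    lines: Iterable[str], species_column: str = "Organism"
) -> Counter:
    """Count proteins (rows) per species in a UniProt export.

    Raises KeyError if `species_column` (hypothetical default) is not in the header.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        species = (row[species_column] or "").strip()
        if species:  # skip rows with no organism recorded
            counts[species] += 1
    return counts


if __name__ == "__main__":
    path = Path("Fungi_prot_uniprot.csv")
    if path.exists():
        with path.open(newline="", encoding="utf-8") as fh:
            counts = count_structures_per_species(fh)
        working_set = sorted(sp for sp, n in counts.items() if n >= 6000)
        print(f"{len(working_set)} species with >= 6000 structures")
```

UniProt organism names can include strain designations, so proteins from different strains of one species are counted separately unless the names are normalized first; whether to merge strains is left open in this sketch.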
Files (1.6 GB total): two files of 858.8 MB and 773.4 MB.