Innate antiviral systems are major defensome components that influence prophage distribution in Acinetobacter baumannii
Description
In this project, we analysed the defensome of Acinetobacter baumannii to profile the defense systems associated with particular prophage profiles and to predict which systems are most effective, and against which specific phages. Machine learning techniques were used to identify both positive and negative associations between prophages and defense systems.
DOI (bioRxiv): https://doi.org/10.1101/2024.10.26.620419
Python scripts
Package versions: numpy 1.26.4, pandas 2.2.2
binary_matrix.py
Generate a binary matrix of defense systems from genomes without prophages, used as input for the UpSet plot.
coocurr_matrix.py
Generate a co-occurrence matrix of defense systems, used as input for Fig. 2A.
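As an illustration of the idea only (not the code in coocurr_matrix.py), a co-occurrence matrix can be obtained from a genome-by-system binary matrix by multiplying its transpose by itself. The genome and system names below are made up for the example.

```python
import pandas as pd

def cooccurrence(binary: pd.DataFrame) -> pd.DataFrame:
    """Count, for every pair of columns, the rows in which both are present.

    `binary` is assumed to be a 0/1 matrix with genomes as rows and
    defense systems as columns (an assumption of this sketch).
    """
    # (systems x genomes) . (genomes x systems) -> (systems x systems)
    return binary.T.dot(binary)

# Hypothetical toy input: three genomes, three systems.
binary = pd.DataFrame(
    {"sysA": [1, 1, 1], "sysB": [1, 0, 1], "sysC": [0, 1, 0]},
    index=["G1", "G2", "G3"],
)
co = cooccurrence(binary)
```

The diagonal of the result holds the number of genomes carrying each system; off-diagonal cells hold the number of genomes carrying both systems.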
defsys_pres_ann.py
Create a presence-absence matrix of defense systems.
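A minimal sketch of how such a presence-absence matrix could be built with pandas, assuming a long table with one row per detected system. The column names `genome` and `system` and the toy values are assumptions of this example, not the script's actual input format.

```python
import pandas as pd

def presence_absence(hits: pd.DataFrame,
                     row_col: str = "genome",
                     item_col: str = "system") -> pd.DataFrame:
    """Turn a long (row, item) table into a 0/1 matrix."""
    # crosstab counts occurrences; anything above zero becomes 1.
    counts = pd.crosstab(hits[row_col], hits[item_col])
    return (counts > 0).astype(int)

# Hypothetical toy input: G3 carries two copies of the same system.
hits = pd.DataFrame({
    "genome": ["G1", "G1", "G2", "G3", "G3"],
    "system": ["RM", "CRISPR", "RM", "RM", "RM"],
})
matrix = presence_absence(hits)
```

Duplicate hits collapse to a single 1, so the matrix records presence rather than copy number.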
freq_phages_bymlst.py
Get the most frequent prophages (10% of genomes) per MLST (provided in a list).
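The selection step can be sketched as follows, assuming that "frequent" means present in at least a given fraction of the genomes of an MLST group. That reading of the threshold, the function name and the toy data are assumptions of this example.

```python
import pandas as pd

def frequent_prophages(presence: pd.DataFrame,
                       mlst: pd.Series,
                       min_fraction: float = 0.10) -> dict:
    """Return, per MLST group, the prophages reaching `min_fraction`.

    `presence` is a 0/1 genome-by-prophage matrix and `mlst` maps each
    genome (same index) to its MLST group.
    """
    # The mean of a 0/1 column within a group is the fraction of carriers.
    fractions = presence.groupby(mlst).mean()
    return {
        group: sorted(row.index[row >= min_fraction])
        for group, row in fractions.iterrows()
    }

# Hypothetical toy input; a 50% threshold keeps the example small.
presence = pd.DataFrame(
    {"P1": [1, 0, 0, 0], "P2": [1, 1, 0, 1]},
    index=["g1", "g2", "g3", "g4"],
)
mlst = pd.Series(["ST1", "ST1", "ST2", "ST2"], index=presence.index)
frequent = frequent_prophages(presence, mlst, min_fraction=0.5)
```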
matrix_mlst_phages_freq.py
Generate two matrices, of absolute and relative frequency respectively, of prophages by frequent MLST group.
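One possible way to derive both matrices from a prophage presence-absence matrix is shown below. It assumes the relative frequency is the absolute count divided by the number of genomes in the MLST group; the names and data are illustrative only.

```python
import pandas as pd

def frequency_matrices(presence: pd.DataFrame, mlst: pd.Series):
    """Return (absolute, relative) prophage frequencies per MLST group."""
    # Absolute frequency: number of genomes in the group carrying the prophage.
    absolute = presence.groupby(mlst).sum()
    # Group sizes, restricted to the genomes present in the matrix.
    group_sizes = mlst.loc[presence.index].value_counts()
    # Relative frequency: divide each group's row by its size.
    relative = absolute.div(group_sizes, axis=0)
    return absolute, relative

# Hypothetical toy input.
presence_toy = pd.DataFrame(
    {"P1": [1, 0, 0, 0], "P2": [1, 1, 0, 1]},
    index=["g1", "g2", "g3", "g4"],
)
mlst_toy = pd.Series(["ST1", "ST1", "ST2", "ST2"], index=presence_toy.index)
absolute, relative = frequency_matrices(presence_toy, mlst_toy)
```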
pres_aus_matrix_cl.py
Create a presence-absence matrix of prophages.
matrix_preaus_ml.py
Add two columns to the presence-absence matrix of prophages: one with the defense systems of each genome and another with the MLST group to which the genome belongs.
cdhit_heamtap.py
Read CD-HIT output files and build a variant matrix with the most prevalent clusters.
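The first part of this step, reading CD-HIT cluster files and ranking clusters by size, might look like the sketch below. It assumes the standard `.clstr` layout (a `>Cluster N` header followed by member lines containing `>name...`) and does not attempt to reproduce how the script builds its variant matrix; the sequence names are invented.

```python
import re

# In .clstr files each member line contains the sequence name as ">name..."
MEMBER = re.compile(r">(.+?)\.\.\.")

def parse_clstr(text: str) -> dict:
    """Map each cluster header to the list of its member sequence names."""
    clusters = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            current = line[1:].strip()
            clusters[current] = []
        elif current is not None:
            match = MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters

def most_prevalent(clusters: dict, top_n: int) -> list:
    """Return the names of the `top_n` largest clusters."""
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_n]]

# Hypothetical toy .clstr content.
example = "\n".join([
    ">Cluster 0",
    "0\t120aa, >seqA... *",
    "1\t118aa, >seqB... at 98.31%",
    ">Cluster 1",
    "0\t95aa, >seqC... *",
])
clusters = parse_clstr(example)
top = most_prevalent(clusters, top_n=1)
```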
triangle_to_square.py
Read an upper triangular matrix (EMBOSS format) and convert it into a square matrix.
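The conversion itself can be illustrated with numpy once the numeric values have been read. Parsing of the EMBOSS header and label columns is deliberately left out of this sketch, and the input values are made up.

```python
import numpy as np

def triangle_to_square(upper) -> np.ndarray:
    """Mirror an upper triangular matrix across its diagonal."""
    # Keep only the upper triangle in case the lower part holds padding.
    upper = np.triu(np.asarray(upper, dtype=float))
    # Adding the transpose doubles the diagonal, so subtract it once.
    return upper + upper.T - np.diag(np.diag(upper))

# Hypothetical distances between three sequences.
upper = [
    [0.0, 1.0, 2.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
]
square = triangle_to_square(upper)
```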
merge_dist.py
Merge both distance values (phylogenetic distance from the tree built in IQ-TREE and Kimura distance from the sequence alignment) for the same pair of genomes, and determine the MLST relationship between those genomes.
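A rough sketch of such a merge with pandas is given below. It assumes each distance table has one row per genome pair, and that the MLST relationship is simply whether both genomes share the same group. The column names (`genome_a`, `genome_b`, `phylo_dist`, `kimura_dist`, `same_mlst`) and the values are hypothetical.

```python
import numpy as np
import pandas as pd

PAIR = ["genome_a", "genome_b"]

def normalise_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Order each pair alphabetically so (G2, G1) matches (G1, G2)."""
    out = df.copy()
    ordered = np.sort(out[PAIR].to_numpy(), axis=1)
    out["genome_a"], out["genome_b"] = ordered[:, 0], ordered[:, 1]
    return out

def merge_distances(phylo: pd.DataFrame, kimura: pd.DataFrame,
                    mlst: dict) -> pd.DataFrame:
    """Join both distance tables on the genome pair and flag shared MLST."""
    merged = normalise_pairs(phylo).merge(normalise_pairs(kimura), on=PAIR)
    merged["same_mlst"] = (
        merged["genome_a"].map(mlst) == merged["genome_b"].map(mlst)
    )
    return merged

# Hypothetical toy input; the second table lists the pairs in reverse order.
phylo = pd.DataFrame({"genome_a": ["G1", "G1"], "genome_b": ["G2", "G3"],
                      "phylo_dist": [0.01, 0.20]})
kimura = pd.DataFrame({"genome_a": ["G2", "G3"], "genome_b": ["G1", "G1"],
                       "kimura_dist": [0.02, 0.25]})
mlst_groups = {"G1": "ST2", "G2": "ST2", "G3": "ST1"}
merged = merge_distances(phylo, kimura, mlst_groups)
```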
Phylogeny
Use assembly_seq.pl and uniq_sl.pl to build the initial multi-FASTA file containing only the core genes, which serves as input for MAFFT. The resulting MSA is processed with ClipKIT to remove gaps and keep the most informative regions. The trimmed MSA is then used as input to IQ-TREE to generate the tree.
Circos
Circos plots were generated from files produced by prepareForCircos2.pl. This script uses "defsys_presaus_ann.tsv", "logical_viruses.tsv" and a list of the genomes in each MLST group to create the input file for the figure. These files are also provided.
README.txt
A more detailed version of the protocol used to generate the results and figures used in the paper.
Files
aba_defensome.zip (7.8 MB)
md5:84cb66080bab86e7ec5b4dfe47ad6aca