There is a newer version of the record available.

Published July 11, 2025 | Version v3
Dataset Open

Innate antiviral systems are major defensome components that influence prophage distribution in Acinetobacter baumannii

  • 1. Universidad Pablo de Olavide
  • 2. Centro Andaluz de Biología del Desarrollo

Description

In this project, we analysed the defensome of Acinetobacter baumannii to profile the defense systems associated with particular prophage profiles and to predict which systems are most effective and against which specific phages. Prophages were associated, both positively and negatively, with defense systems using machine learning techniques.

DOI (bioRxiv): https://doi.org/10.1101/2024.10.26.620419

Python scripts

Package versions: numpy 1.26.4, pandas 2.2.2

binary_matrix.py 

Generate a binary matrix of defense systems using genomes without prophages, as input for the UpSet plot.

coocurr_matrix.py

Generate a co-occurrence matrix of defense systems, as input for Fig. 2A.
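
As an illustration of the idea only (not the code in coocurr_matrix.py), a co-occurrence matrix can be derived from a genome-by-system presence-absence table with one matrix product. The toy table and the system names below are hypothetical.

```python
import pandas as pd


def cooccurrence_matrix(presence: pd.DataFrame) -> pd.DataFrame:
    """Count, for every pair of systems, the genomes that carry both.

    `presence` is assumed to have genomes as rows and defense systems as
    columns, with non-zero values meaning "present".
    """
    binary = (presence > 0).astype(int)
    # (systems x genomes) @ (genomes x systems) -> systems x systems counts.
    # The diagonal holds the number of genomes carrying each system.
    return binary.T @ binary


# Hypothetical toy input, for illustration only.
toy_presence = pd.DataFrame(
    {"RM": [1, 1, 0], "CRISPR": [1, 0, 0], "Gabija": [0, 1, 1]},
    index=["g1", "g2", "g3"],
)
toy_cooc = cooccurrence_matrix(toy_presence)
```

The result is symmetric, so either triangle is enough for plotting.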

defsys_pres_ann.py

Create a presence-absence matrix of defense systems.
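
A minimal sketch of how such a matrix could be built with pandas, assuming the input is a long table with one row per detected system. The column names "genome" and "system" are assumptions, not the names used by defsys_pres_ann.py.

```python
import pandas as pd


def presence_absence(hits: pd.DataFrame, row: str, col: str) -> pd.DataFrame:
    """Turn a long table of hits into a 0/1 matrix (rows x columns)."""
    counts = pd.crosstab(hits[row], hits[col])
    # Collapse multiple hits of the same system in one genome to 1.
    return (counts > 0).astype(int)


# Hypothetical toy input: g1 carries two RM copies and one CRISPR locus.
toy_hits = pd.DataFrame(
    {
        "genome": ["g1", "g1", "g1", "g2"],
        "system": ["RM", "RM", "CRISPR", "Gabija"],
    }
)
toy_pa = presence_absence(toy_hits, "genome", "system")
```

Genomes with no hits at all do not appear in the long table, so they would need to be added afterwards, for example with `toy_pa.reindex(all_genomes, fill_value=0)`, where `all_genomes` is a hypothetical list of every genome in the dataset.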

freq_phages_bymlst.py

Get the most frequent prophages (10% of genomes) per MLST (provided in a list).
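
The selection can be sketched as follows. This assumes that "10% of genomes" means a prophage is present in at least 10% of the genomes of a given MLST group; that reading, the function name and the toy data are assumptions rather than details taken from freq_phages_bymlst.py.

```python
import pandas as pd


def frequent_prophages(presence: pd.DataFrame, mlst: pd.Series,
                       min_fraction: float = 0.10) -> dict:
    """Return, per MLST group, the prophages present in >= min_fraction of genomes.

    `presence` is a genomes x prophages 0/1 matrix; `mlst` maps genome -> group.
    """
    # The mean of a 0/1 column within a group is the fraction of carriers.
    fractions = presence.groupby(mlst).mean()
    return {
        group: row[row >= min_fraction].index.tolist()
        for group, row in fractions.iterrows()
    }


# Hypothetical toy data; a 0.5 threshold is used so the cut-off is visible.
toy_phages = pd.DataFrame(
    {"P1": [1, 1, 0, 0], "P2": [0, 0, 1, 1]},
    index=["g1", "g2", "g3", "g4"],
)
toy_mlst = pd.Series({"g1": "ST2", "g2": "ST2", "g3": "ST2", "g4": "ST1"})
toy_frequent = frequent_prophages(toy_phages, toy_mlst, min_fraction=0.5)
```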

matrix_mlst_phages_freq.py

Generate two matrices, of absolute and relative frequency respectively, of prophages by frequent MLST group.
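
One possible way to obtain both matrices from a presence-absence table is sketched below. It assumes that relative frequency means the count divided by the number of genomes in the MLST group; the names and toy data are hypothetical.

```python
import pandas as pd


def frequency_matrices(presence: pd.DataFrame, mlst: pd.Series):
    """Return (absolute, relative) prophage frequencies per MLST group."""
    absolute = presence.groupby(mlst).sum()
    # Number of genomes per group, restricted to genomes in the matrix.
    group_sizes = mlst.loc[presence.index].value_counts()
    # Divide each group's row by that group's size (aligned on group label).
    relative = absolute.div(group_sizes, axis=0)
    return absolute, relative


# Hypothetical toy data: three ST2 genomes and one ST1 genome.
freq_presence = pd.DataFrame(
    {"P1": [1, 1, 0, 0], "P2": [0, 0, 1, 1]},
    index=["g1", "g2", "g3", "g4"],
)
freq_mlst = pd.Series({"g1": "ST2", "g2": "ST2", "g3": "ST2", "g4": "ST1"})
freq_abs, freq_rel = frequency_matrices(freq_presence, freq_mlst)
```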

pres_aus_matrix_cl.py

Create a presence-absence matrix of prophages.

matrix_preaus_ml.py

Add two columns to the presence-absence matrix of prophages: one with the defense systems of each genome and another with the MLST group to which the genome belongs.
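
A sketch of this annotation step, assuming the defense systems of a genome are stored as a single ";"-joined string. The separator, the column names "defense_systems" and "mlst", and the toy data are assumptions made for illustration.

```python
import pandas as pd


def annotate_matrix(prophages: pd.DataFrame, hits: pd.DataFrame,
                    mlst: pd.Series) -> pd.DataFrame:
    """Append a defense-system column and an MLST column to the matrix."""
    # One string per genome listing its distinct systems, sorted by name.
    systems = hits.groupby("genome")["system"].agg(
        lambda s: ";".join(sorted(set(s)))
    )
    out = prophages.copy()
    # Genomes without any detected system get an empty string.
    out["defense_systems"] = systems.reindex(out.index).fillna("")
    out["mlst"] = mlst.reindex(out.index)
    return out


# Hypothetical toy inputs.
ann_prophages = pd.DataFrame({"P1": [1, 0, 1]}, index=["g1", "g2", "g3"])
ann_hits = pd.DataFrame(
    {"genome": ["g1", "g1", "g2"], "system": ["RM", "CRISPR", "Gabija"]}
)
ann_mlst = pd.Series({"g1": "ST2", "g2": "ST2", "g3": "ST1"})
ann_matrix = annotate_matrix(ann_prophages, ann_hits, ann_mlst)
```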

cdhit_heamtap.py

Read CD-HIT output files and build a variant matrix with the most prevalent clusters.
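
For orientation, the sketch below parses the standard CD-HIT .clstr layout (a ">Cluster N" header followed by member lines) and builds a genome-by-cluster matrix for the largest clusters. Two points are assumptions, not details of cdhit_heamtap.py: "most prevalent" is taken to mean the clusters with the most members, and the genome name is taken to be the part of the sequence ID before "|".

```python
import re
from collections import defaultdict

import pandas as pd

# Member lines look like: "0\t300aa, >seq_id... *" or "... at 99.00%".
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Map each cluster name to the list of sequence IDs it contains."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = line[1:]
        elif line and current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def top_cluster_matrix(clusters, top_n,
                       genome_of=lambda seq_id: seq_id.split("|")[0]):
    """0/1 matrix of genomes x the top_n clusters with the most members."""
    top = sorted(clusters, key=lambda c: len(clusters[c]), reverse=True)[:top_n]
    records = [(genome_of(s), c) for c in top for s in clusters[c]]
    long = pd.DataFrame(records, columns=["genome", "cluster"])
    return (pd.crosstab(long["genome"], long["cluster"]) > 0).astype(int)


# Hypothetical toy .clstr content.
toy_clstr = [
    ">Cluster 0",
    "0\t300aa, >g1|p1... *",
    "1\t298aa, >g2|p7... at 99.00%",
    ">Cluster 1",
    "0\t150aa, >g1|p2... *",
]
toy_clusters = parse_clstr(toy_clstr)
toy_variants = top_cluster_matrix(toy_clusters, top_n=1)
```

With a real file, the lines would come from something like `open(path)`; the `genome_of` function should be replaced to match the actual sequence naming scheme.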

triangle_to_square.py

Read an upper triangular matrix (EMBOSS format) and convert it into a square matrix.
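
The conversion itself can be sketched with numpy as shown below. The sketch does not parse the EMBOSS file; it assumes the values have already been read into a list of rows, where row i holds the distances from item i to items i, i+1, and so on (diagonal included). That layout is an assumption.

```python
import numpy as np


def upper_triangle_to_square(rows):
    """Build a symmetric square matrix from upper-triangular rows."""
    n = len(rows)
    square = np.zeros((n, n), dtype=float)
    for i, values in enumerate(rows):
        if len(values) != n - i:
            raise ValueError(f"row {i} should have {n - i} values")
        square[i, i:] = values
    # Mirror the strictly upper triangle into the lower triangle.
    return square + np.triu(square, k=1).T


# Hypothetical toy distances between three sequences.
toy_rows = [[0.0, 1.5, 2.0], [0.0, 3.0], [0.0]]
toy_square = upper_triangle_to_square(toy_rows)
```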

merge_dist.py

Merge both distance values (phylogenetic distance from the tree built in IQ-TREE and Kimura distance from the sequence alignment) for the same genomes and determine the MLST relationship between those genomes.
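
A rough sketch of such a merge with pandas follows. It assumes both distance sets are long tables with one row per genome pair, and that the MLST relationship is whether both genomes share a group. The column names ("genome_a", "genome_b", "tree_dist", "kimura_dist", "same_mlst") are hypothetical and not taken from merge_dist.py.

```python
import numpy as np
import pandas as pd

PAIR = ["genome_a", "genome_b"]


def _ordered_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Sort each pair alphabetically so (g2, g1) matches (g1, g2)."""
    out = df.copy()
    out[PAIR] = np.sort(out[PAIR].to_numpy(), axis=1)
    return out


def merge_distances(tree: pd.DataFrame, kimura: pd.DataFrame,
                    mlst: pd.Series) -> pd.DataFrame:
    """Join both distances per genome pair and flag pairs sharing an MLST."""
    merged = _ordered_pairs(tree).merge(_ordered_pairs(kimura),
                                        on=PAIR, how="inner")
    merged["same_mlst"] = (
        merged["genome_a"].map(mlst) == merged["genome_b"].map(mlst)
    )
    return merged


# Hypothetical toy inputs; the pair order differs between the two tables.
toy_tree = pd.DataFrame(
    {"genome_a": ["g1", "g1"], "genome_b": ["g2", "g3"],
     "tree_dist": [0.01, 0.20]}
)
toy_kimura = pd.DataFrame(
    {"genome_a": ["g2", "g3"], "genome_b": ["g1", "g1"],
     "kimura_dist": [0.012, 0.25]}
)
dist_mlst = pd.Series({"g1": "ST2", "g2": "ST2", "g3": "ST1"})
toy_merged = merge_distances(toy_tree, toy_kimura, dist_mlst)
```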

 

Phylogeny

Use assembly_seq.pl and uniq_sl.pl to build the initial multi-FASTA containing only the core genes, which serves as input for MAFFT. The resulting MSA is processed with ClipKIT to eliminate gaps and keep the most informative regions. The processed MSA is then used as input for IQ-TREE to generate the tree.

Circos

Circos plots were generated from files produced by prepareForCircos2.pl. This script uses "defsys_presaus_ann.tsv", "logical_viruses.tsv" and a list of the genomes in each MLST group to create the input file for the figure. These files are also provided.

README.txt

A more detailed version of the protocol used to generate the results and figures used in the paper.

Files

aba_defensome.zip (7.8 MB)
md5:84cb66080bab86e7ec5b4dfe47ad6aca

Additional details

Software

Programming language
Python, R, Perl