Published March 25, 2025 | Version v10
Software
Open
Analysis of CRISPR-Cas systems in Pseudomonas aeruginosa and PAM sequences
Description
Here you will find the scripts and data associated with the research article "Bacteriophages in Pseudomonas aeruginosa evade the CRISPR-Cas I-F system by depletion of PAM sequences" published in Microbial Genomics:
Ortega-Sanz, I., Rubio, A., & Pérez-Pulido, A. J. (2025). Bacteriophages in Pseudomonas aeruginosa evade the CRISPR-Cas I-F system by depletion of PAM sequences. Microbial Genomics, 11(6), 001423. https://doi.org/10.1099/mgen.0.001423
Technical info (English)
The purpose of each script is briefly described below:
- script1_spacers_df.R: Construction of the Pseudomonas aeruginosa spacer dataframe used as input for the df2fasta() function of the Spacer2PAM library. The spacers were collected from the CRISPRCasFinder output and filtered to keep only those from CRISPR-Cas arrays with known orientation and an evidence level of 4, and with a known subtype assigned by CRISPRCasTyper.
- script2_PAM_prediction.R: Prediction of the PAM for each CRISPR-Cas subtype with Spacer2PAM, using the spacer information collected by script1_spacers_df.R.
- script3_PLSDB_IMGVR_sequences_filtering.sh: Extraction of the P. aeruginosa sequences from the PLSDB database (v2020_06_23_v2) and the IMG/VR v3 high-quality genomes database.
- script4_plasmids_viruses_BLAST.sh: BLAST search of the spacers representing each P. aeruginosa CRISPR-Cas subtype against the P. aeruginosa plasmids from PLSDB and viruses from IMG/VR (by default, the script is configured for the PAM recognized by the I-C subtype and the IMG/VR database).
- script5_DNA_logos_plasmids_viruses.R: Construction of the DNA logo for the P. aeruginosa plasmids from PLSDB and viruses from IMG/VR that are recognized by each P. aeruginosa CRISPR-Cas subtype (an example is provided for subtype I-C and the IMG/VR database).
- script6_PAM_freq_GC.sh: Determination of PAM occurrence and GC content in the foreign sequences (plasmids from PLSDB and viruses from IMG/VR) recognized by each P. aeruginosa CRISPR-Cas system (by default, the script is configured for the PAM recognized by the I-C subtype and the IMG/VR database).
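To illustrate the kind of computation described for script6_PAM_freq_GC.sh, the minimal Python sketch below counts occurrences of a PAM motif in a sequence and computes its GC content. It is not the authors' implementation (the deposited scripts are written in R and shell); the function names, the choice to scan both strands and the counting of overlapping matches are all assumptions. The example motif, 5'-CC, is the I-F PAM reported in Supplementary Figure S2.

```python
"""Hypothetical sketch of PAM occurrence and GC content for one sequence.

Not the authors' script6_PAM_freq_GC.sh: all names and counting rules
(both strands, overlapping matches) are illustrative assumptions.
"""

# Translation table for DNA complementation (upper- and lower-case bases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count overlapping occurrences of motif in seq (case-insensitive)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def pam_count_both_strands(seq: str, pam: str) -> int:
    """Assumption: PAM sites are counted on the forward and reverse strands."""
    return count_motif(seq, pam) + count_motif(reverse_complement(seq), pam)


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    demo = "ATGCCGGTTCCAAG"  # toy sequence, not data from the study
    print(pam_count_both_strands(demo, "CC"))  # 3: two forward sites, one reverse
    print(round(gc_content(demo), 1))  # 57.1
```

Counting overlapping matches means that a run such as CCC contributes two CC sites; whether the original script counts overlaps or both strands is not stated in this record.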
The content of the Supplementary Material is described below:
- Supplementary Table S1. Metadata collection of P. aeruginosa spacers.
- Supplementary Table S2. Comprehensive PAM predictions for each P. aeruginosa CRISPR-Cas subtype using Spacer2PAM.
- Supplementary Table S3. Abundance (%) of the different PAM sequences, of the same length as those predicted by Spacer2PAM, found in the plasmids and viruses targeted by the P. aeruginosa CRISPR-Cas systems. Targeted plasmids and viruses were defined as those matching spacers of a P. aeruginosa CRISPR-Cas system with a sequence identity ≥ 95% and a query coverage of 100%.
- Supplementary Table S4. Expected and observed PAM frequencies in the plasmids and viruses targeted by the P. aeruginosa CRISPR-Cas systems. Targeted plasmids and viruses were defined as those with a sequence identity ≥ 95%. Expected frequencies were calculated from the frequency of each nucleotide in the sequences.
- Supplementary Figure S1. Distribution of plasmids (A) and viruses (B) being targeted by the P. aeruginosa CRISPR-Cas subtypes I-C, I-E and I-F.
- Supplementary Figure S2. Abundance (%) of the different PAM sequences, of the same length as that predicted by Spacer2PAM, found in the plasmids (A) and viruses (B) targeted by the P. aeruginosa CRISPR-Cas subtypes I-C, I-E and I-F. The most frequent PAM sequence for each CRISPR-Cas system is shown in gray (5’-TTC for subtype I-C, 5’-AAG for subtype I-E, and 5’-CC for subtype I-F), while other PAMs are shown in white. The four most frequent mutated PAMs are labelled. Frequencies for all identified PAM sequences are given in Supplementary Table S3.
- Supplementary Figure S3. Abundance (%) of the different PAM sequences, of the same length as those predicted by Spacer2PAM, found in the (A) plasmids and (B) viruses targeted by the P. aeruginosa CRISPR-Cas systems. Targeted plasmids and viruses were defined as those matching spacers of a P. aeruginosa CRISPR-Cas system with a sequence identity ≥ 95% (or 100%) and a query coverage of 100%. Frequencies for all identified PAM sequences are given in Supplementary Table S3.
- Supplementary Figure S4. Distribution of expected and observed PAM frequencies in the sequences targeted and not targeted by the P. aeruginosa CRISPR-Cas subtypes I-C, I-E and I-F. (A) Targeted plasmids, (B) non-targeted plasmids, (C) targeted viruses, and (D) non-targeted viruses. Dashed lines show the PAM frequency in the P. aeruginosa reference strain PAO1 (GCF_000006765.1). The Holm method was used to adjust the p-values for multiple comparisons. The numerical values can be found in Supplementary Table S4.
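Supplementary Table S4 states that expected PAM frequencies were derived from the nucleotide frequencies in the sequences, and Supplementary Figure S4 notes a Holm correction of the p-values. The Python sketch below shows one plausible reading of both steps. It assumes a zero-order (independent-nucleotide) model, in which the expected frequency of a PAM is the product of the frequencies of its bases, scans only the strand it is given, and takes the p-values as already computed. None of these function names come from the deposited scripts; in R, the same correction is available as `p.adjust(p, method = "holm")`.

```python
"""Hypothetical sketch: expected vs observed PAM frequency and Holm adjustment.

Assumes an independent-nucleotide (zero-order) model and a single strand;
this is not the authors' code.
"""
from collections import Counter


def base_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of A, C, G and T, ignoring any other symbols."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: counts[b] / total for b in "ACGT"}


def expected_pam_frequency(seq: str, pam: str) -> float:
    """Expected per-position PAM frequency: product of its base frequencies."""
    freqs = base_frequencies(seq)
    p = 1.0
    for base in pam.upper():
        p *= freqs[base]
    return p


def observed_pam_frequency(seq: str, pam: str) -> float:
    """Fraction of windows of len(pam) matching the PAM (overlaps allowed)."""
    seq, pam = seq.upper(), pam.upper()
    k = len(pam)
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than the PAM")
    hits = sum(1 for i in range(windows) if seq[i:i + k] == pam)
    return hits / windows


def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted
```

For a sequence in which each base makes up a quarter of the composition, the expected 5'-CC frequency under this model is 0.25 × 0.25 = 0.0625 per position; an observed frequency well below the expected one would be consistent with the PAM depletion described in the article.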
Files
Supplementary Figures.pdf

All files (11.6 MB):

| File (MD5 checksum) | Size |
|---|---|
| md5:49d15dc6c0465c92acf3305e055816bf | 15.3 kB |
| md5:0b3d6fa770f8d5db62e35b2a9ba887c4 | 3.5 kB |
| md5:85b84c064902cf418885ce85bb283272 | 4.5 kB |
| md5:2f284f565ec6e2029318f53b202dfa09 | 1.5 kB |
| md5:2098fe11d1203b7770fe7c8699ec45a3 | 7.7 kB |
| md5:0acf8e9311732fa1979629d791d74a8e | 4.3 kB |
| md5:865ef6cd942c6b712fa0a95791ae4d9c | 613.3 kB |
| md5:daf688127de29e83fb35a7677ebd8ec2 | 10.9 MB |
Additional details
Related works
- Is published in
  - Journal article: 10.1099/mgen.0.001423 (DOI)