Published October 24, 2023 | Version v1
Dataset | Open access

Machine learning suggests small size is a key determinant of plasmid host range

1. The Ohio State University

Description

Plasmids mediate gene exchange across taxonomic barriers through conjugation and have shaped bacterial evolution for billions of years. Plasmid mobility can be harnessed for genetic engineering and drug delivery, but the rapid plasmid-mediated spread of resistance genes has rendered most clinical antibiotics useless, posing an existential threat to human society. Solving this urgent problem requires understanding how plasmids spread across bacterial communities. Here, we applied machine-learning models to identify features that determine plasmid host range. We assembled an up-to-date dataset of more than 30,000 bacterial plasmids, separated them into 1125 clusters, and assigned each cluster a distribution possibility score that accounts for its host distribution at each taxonomic rank and for the sampling bias of existing sequencing data. Using this score and an optimized plasmid feature pool, we built a model stack with DecisionTreeRegressor, EvoTreeRegressor, and LGBMRegressor as base models and LinearRegressor as the meta-learner. Our analysis reveals that short sequence length is the most important feature for successful plasmid spread, followed by P-loop NTPases, mobility factors, and β-lactamases. Our results, together with other recent findings, suggest that small plasmids broaden their host range by evading host defenses and by using alternative modes of transfer instead of autonomous conjugation.

Other

Funding provided by: National Institutes of Health
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000002
Award Number: GM067153

Files

data_matrix.csv

Files (376.4 MB)

Size       MD5 checksum
541.5 kB   c0ad32520b0d8eb1c88709fc17d7cb88
122.8 MB   07d029924923479b707a4f254b472ac9
142.3 MB   5e3a067d3f284b17b4748d0770856677
110.7 MB   183357a1623a51d98553524d366ea506
6.0 kB     e342f3f4fa63c09d55ba399d0731cc4a
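The MD5 checksums published with the files let a downloader confirm that each file arrived intact. The following is a minimal Python sketch using only the standard library (`hashlib`, `pathlib`). Because the listing gives checksums without file names, it matches each downloaded file against the set of published digests rather than by name. The helper names `md5_of` and `check_downloads` are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for this dataset (sizes noted for reference).
PUBLISHED_MD5 = {
    "c0ad32520b0d8eb1c88709fc17d7cb88",  # 541.5 kB
    "07d029924923479b707a4f254b472ac9",  # 122.8 MB
    "5e3a067d3f284b17b4748d0770856677",  # 142.3 MB
    "183357a1623a51d98553524d366ea506",  # 110.7 MB
    "e342f3f4fa63c09d55ba399d0731cc4a",  # 6.0 kB
}


def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory):
    """Map each regular file in `directory` to whether its MD5 is a published one."""
    return {
        p.name: md5_of(p) in PUBLISHED_MD5
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
```

Calling `check_downloads` on the directory that holds the downloaded files returns a dictionary mapping each file name to `True` when its digest matches one of the five published checksums. A `False` entry points to a corrupted or incomplete download, or to an unrelated file in that directory.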