Published February 12, 2025 | Version v0.1.0
Software Open

TEgenomeSimulator: A tool to simulate TE mutation and insertion into a random-synthesised or user-provided genome.

Description

TEgenomeSimulator

A tool to simulate transposable element (TE) mutation and insertion into a random-synthesised or user-provided genome.

This tool is released under the GPL license to promote advancements in TE research. You are welcome to raise issues at TEgenomeSimulator's GitHub repository with questions about TEgenomeSimulator, or to contribute to its development and improvement. For more details, please refer to the Contributing Guideline.

Introduction

TEgenomeSimulator is based on the work of Matias Rodriguez & Wojciech Makałowski, "Software evaluation for de novo detection of transposons" (Mobile DNA, 2022). It adopts and modifies scripts from denovoTE-eval and adds several new functionalities. The following table shows the major features that were kept, modified, or created in TEgenomeSimulator.

| Features | denovoTE-eval | TEgenomeSimulator v0.1.0 |
|---|---|---|
| Random synthesized genome | O | O |
| Custom genome | X | O |
| Simulation with multiple chromosomes | X | O |
| Automatic generation of TE mutation parameter table [a] | X | O |
| Automatic generation of configuration yml file [a] | X | O |
| TE mutation: copy number [b] | O | O (simplified) |
| TE mutation: substitution rate | O | O |
| TE mutation: INDEL rate | O | O |
| TE mutation: standard deviation of SNP | O | O |
| TE mutation: 5' fragmentation | O | O |
| TE mutation: target site duplication (TSD) [c] | O | O (enhanced) |
| TE mutation: strandedness | O | O |
| TE gff file: TE coordinates | O | O |
| TE gff file: TE family | O | O |
| TE gff file: TE superfamily if provided in TE lib fasta file [d] | X | O |
| TE gff file: TE subclass if provided in TE lib fasta file [d] | X | O |
| TE gff file: nucleotide identity info | O | O |
| TE gff file: fragmentation info | O | O |
| TE gff file: coordinates of disruption if TE cut by nested TE | O | O |
| TE gff file: tag to show if TE nested in other TE | O | O |
| TE gff file: ID of associated nested/disrupted TEs | X | O |

[a] denovoTE-eval requires users to provide a configuration file and a TE mutation parameter table with information such as copy number, nucleotide identity, fragmentation, and TSD, whereas TEgenomeSimulator simplifies the process by generating these files automatically.

[b] denovoTE-eval requires users to specify the copy number of each TE family in the TE mutation parameter table, whereas TEgenomeSimulator allows users to specify a range of integer values from which the copy number of each TE family is randomly sampled.

[c] denovoTE-eval lets users switch TSD simulation on or off (y|n), with the TSD length picked at random between 5 and 20, whereas TEgenomeSimulator tries to recognise the TE family information from the sequence headers in the TE library fasta file and then sets the TSD length based on the literature. For example, TEgenomeSimulator would set "5" as the TSD length of a Copia LTR-retrotransposon but "8–9" for a Mutator DNA transposon. You can check the code of prep_sim_TE_lib.py for more details about the TSD length settings.

[d] To have TEgenomeSimulator include TE superfamily and class/subclass information in the final TE gff file, format the sequence headers in the TE library fasta file as >{TE_family}#{TE_subclass}/{TE_superfamily}, for example >ATCOPIA10#LTR/Copia.
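
The notes above on copy number sampling, TSD length, and header format can be illustrated with a short Python sketch. This is not the code in prep_sim_TE_lib.py: the function names, the regular expression, and the lookup table are hypothetical, and the table contains only the two TSD lengths quoted above (Copia and Mutator). It assumes the copy number is drawn uniformly from an inclusive integer range, which the notes do not state explicitly.

```python
import random
import re

# Header format described above: >{TE_family}#{TE_subclass}/{TE_superfamily}
# The "#subclass/superfamily" part is treated as optional (an assumption).
HEADER_RE = re.compile(
    r"^>(?P<family>[^#\s]+)"
    r"(?:#(?P<subclass>[^/\s]+)(?:/(?P<superfamily>\S+))?)?"
)

# Illustrative only: just the two values quoted in the notes above.
# The complete settings are in prep_sim_TE_lib.py and are not reproduced here.
TSD_LENGTH_RANGE = {
    "Copia": (5, 5),
    "Mutator": (8, 9),
}


def parse_header(header):
    """Split a TE library fasta header into (family, subclass, superfamily)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Not a fasta header: {header!r}")
    return match.group("family"), match.group("subclass"), match.group("superfamily")


def pick_tsd_length(superfamily, rng):
    """Return a TSD length, or None if the superfamily is not in the sketch table."""
    bounds = TSD_LENGTH_RANGE.get(superfamily)
    if bounds is None:
        return None
    return rng.randint(*bounds)  # inclusive on both ends


def sample_copy_number(min_copies, max_copies, rng):
    """Draw a copy number from an inclusive integer range (uniform is assumed)."""
    return rng.randint(min_copies, max_copies)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same draws
    family, subclass, superfamily = parse_header(">ATCOPIA10#LTR/Copia")
    print(family, subclass, superfamily)
    print("TSD length:", pick_tsd_length(superfamily, rng))
    print("Copy number:", sample_copy_number(1, 10, rng))
```

A header without the #{TE_subclass}/{TE_superfamily} suffix still parses in this sketch, but yields no subclass or superfamily, so no TSD length can be looked up for it.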

TEgenomeSimulator GitHub repository

For more details regarding installation, testing etc., please refer to https://github.com/Plant-Food-Research-Open/TEgenomeSimulator/ 

Files

TEgenomeSimulator-v0.1.0.main.zip

Files (7.3 MB)

md5:10885acd9aa19cb955ed607f9a38bf82

Additional details

Software

Repository URL
https://github.com/Plant-Food-Research-Open/TEgenomeSimulator/
Programming language
Python
Development Status
Active