Published February 3, 2023 | Version 1
Dataset Open

Datasets for "Targeted insertion and reporter transgene activity at a gene safe harbor of the human blood fluke, Schistosoma mansoni"

Description

To identify sites that could serve as genomic safe harbours (GSHs) for transgene integration, we conducted a genome-wide bioinformatic search based on established, widely accepted criteria, along with newly introduced criteria (below), intended to ensure benign integration and stable transgene expression.

At the outset, we identified regions that are euchromatic in all developmental stages of S. mansoni, to avoid silencing of genes integrated by CRISPR/Cas manipulation. To this end, we enriched for regions that were (i) close to peaks of H3K4me3, a histone modification associated with euchromatin and transcription start sites; (ii) free of H3K27me3, a histone modification associated with heterochromatin; (iii) in open euchromatin accessible to Tn5 integration, as shown by the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), which provides a positive display of integration events; and (iv) supported by sites of HIV proviral integration known from S. mansoni, which we used to likewise support predictions of euchromatic regions, given that HIV-1 integrates preferentially into euchromatin in human cell lines.

Examination of the draft genome of S. mansoni in WormBase ParaSite, version 7, identified 6,884 regions with enrichment of H3K4me3 in the absence of H3K27me3 in the available developmental stages (H3K4me3 not H3K27me3). In mature, adult schistosomes, we consistently found 10,533 ATAC-positive regions. There were 4,027 ATAC regions that overlapped with H3K4me3 but not H3K27me3, and 2,915 genes overlapped with this set (ATAC and H3K4me3 not H3K27me3). Forty-two unambiguous HIV integration sites were identified, and eight genes were ≤ 11 kb upstream or downstream from these integration sites. Repeats were masked with RepeatMasker V4.1.0 using a specific repeat library produced with RepeatModeler2 V2.0.1, and the result was stored as a GFF file.
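The overlap logic described above (H3K4me3 not H3K27me3, then intersected with ATAC) can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the datasets. The tuple layout, the function names, and the toy coordinates are all assumptions made for illustration, and a real analysis would more likely use a dedicated interval tool.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open as in BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def keep_overlapping(regions: List[Interval], others: List[Interval]) -> List[Interval]:
    """Regions that overlap at least one interval in `others`."""
    return [r for r in regions if any(overlaps(r, o) for o in others)]


def drop_overlapping(regions: List[Interval], others: List[Interval]) -> List[Interval]:
    """Regions that overlap no interval in `others`."""
    return [r for r in regions if not any(overlaps(r, o) for o in others)]


# Toy peaks, invented for illustration only (not values from the datasets).
h3k4me3 = [("chr1", 100, 500), ("chr1", 2000, 2600), ("chr2", 50, 300)]
h3k27me3 = [("chr1", 400, 900)]
atac = [("chr1", 2100, 2300), ("chr2", 1000, 1200)]

# Step 1: H3K4me3 peaks with no H3K27me3 signal.
active = drop_overlapping(h3k4me3, h3k27me3)

# Step 2: ATAC-positive regions that overlap the peaks kept in step 1.
candidates = keep_overlapping(atac, active)

print(active)
print(candidates)
```

The quadratic scan is adequate for a toy example; for thousands of regions, sorting by chromosome and start position or using an interval tree would be the usual choice.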

To identify intergenic GSHs, we located 10,149 intergenic regions. There were 9,985 regions beyond 2 kb upstream and 8,837 regions outside long non-coding RNA (lncRNA), which were intersected to give 95,587 unique intergenic regions of ≥100 bp outside 2 kb and lncRNA. Two hundred regions were identified that intersected with the merged ATAC and H3K4me3 signal. Four of these were situated ≤ 11 kb from HIV integration sites.
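As a rough sketch of the last two filters, the snippet below keeps regions of at least 100 bp and then retains those lying within 11 kb of an integration site. It is an illustration under stated assumptions rather than the authors' method: coordinates are treated as inclusive, integration sites as single positions, and every name and value is hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates (assumed)
Site = Tuple[str, int]         # (chromosome, position) of an integration site (assumed)

MIN_LENGTH = 100       # regions of >= 100 bp, as stated in the text
MAX_DISTANCE = 11_000  # <= 11 kb from an integration site, as stated in the text


def region_length(region: Region) -> int:
    """Length in bp of an inclusive interval."""
    return region[2] - region[1] + 1


def distance_to_site(region: Region, site: Site) -> int:
    """Distance in bp from a region to a site; 0 if the site lies inside the region."""
    _, start, end = region
    pos = site[1]
    return max(start - pos, pos - end, 0)


def near_any_site(region: Region, sites: List[Site]) -> bool:
    """True if any site on the same chromosome is within MAX_DISTANCE."""
    return any(
        site[0] == region[0] and distance_to_site(region, site) <= MAX_DISTANCE
        for site in sites
    )


# Toy inputs, invented for illustration only.
regions = [("chr1", 10_000, 10_500), ("chr1", 50_000, 50_050), ("chr2", 5_000, 6_000)]
sites = [("chr1", 20_000), ("chr2", 30_000)]

long_enough = [r for r in regions if region_length(r) >= MIN_LENGTH]
near_sites = [r for r in long_enough if near_any_site(r, sites)]

print(near_sites)
```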

Made at George Washington University, Justus Liebig University Giessen, Khon Kaen University, Naresuan University, Aberystwyth University, Schistosomiasis Resource Center, IHPE. 

Files (107.6 MB)

Checksum  Size
md5:33153b78271b42048eb5a3cdecfb9b6e  432.9 kB
md5:23e299abd7061ae20dc06298ea8bcfb6  96 Bytes
md5:0fa3977ec7f744eeb02de28f56381bd6  60.4 kB
md5:945392aeae88881233072d8bff06151f  457.4 kB
md5:0cd88916ca614e9959f08abfe340adcb  259.7 kB
md5:2cd69144c77877b559cd5e663ab0ec0e  707.7 kB
md5:7615aa71214f51c0a183b6cf82685502  707.7 kB
md5:41c6edfe2e1e4ed3c6659d570cbb4570  76.7 kB
md5:22965ba508fb184c2bf1d630e1ea8bc8  5.3 kB
md5:5c6dd8ad24a848580c08092457ce1817  255.6 kB
md5:8477228cc2bc1bbe631577c4873d4024  226.3 kB
md5:08cb033ae9b48735089ff776d144ee2f  1.1 kB
md5:6e8cd48bee30c4828f76d4c39fbd6768  87.1 MB
md5:55e03e9c110e8f6cfdeb901efc806e57  452.3 kB
md5:acc6fb0d5b9306894d0637744e769446  8.4 kB
md5:2cd94a46e3f62ba26df3d57385b26b07  4.1 MB
md5:3f0e3a703b5a0ef40910339ad797d209  12.8 MB
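
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the file path in the usage comment are hypothetical, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Usage (hypothetical local path; substitute the file actually downloaded):
# matches_checksum("downloaded_file", "md5:33153b78271b42048eb5a3cdecfb9b6e")
```

Reading in chunks keeps memory use flat, which matters for the largest file in the listing (87.1 MB).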