Datasets for "Targeted insertion and reporter transgene activity at a gene safe harbor of the human blood fluke, Schistosoma mansoni"
Creators
- 1. George Washington University
- 2. Justus Liebig University Giessen
- 3. IHPE
- 4. Khon Kaen University
- 5. Schistosomiasis Resource Center
Description
To identify sites that could serve as genomic safe harbours (GSHs) for transgene integration, we conducted a genome-wide bioinformatic search based on established, widely accepted criteria, along with newly introduced criteria (below), intended to support benign and stable transgene expression.
At the outset, we identified regions that are euchromatic in all developmental stages of S. mansoni, to avoid silencing of the transgene after CRISPR/Cas-mediated integration. Accordingly, we enriched for regions that (i) were close to peaks of H3K4me3, a histone modification associated with euchromatin and transcription start sites; (ii) did not include H3K27me3, a histone modification associated with heterochromatin; (iii) lay in open euchromatin accessible to Tn5 integration, as shown by the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), which provides a positive display of integration events; and (iv) were supported by known sites of HIV proviral integration in S. mansoni, given that HIV-1 integrates preferentially into euchromatin in human cell lines.
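Criteria (i) to (iii) amount to an interval filter, which can be sketched as follows. This is a minimal illustration and not the pipeline used for the datasets: the function names, the toy coordinates, and the simplification of "close to" an H3K4me3 peak to a direct overlap are all assumptions.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based and half-open; a convention assumed here.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def select_candidates(
    atac: List[Interval],
    h3k4me3: List[Interval],
    h3k27me3: List[Interval],
) -> List[Interval]:
    """Keep ATAC regions that overlap an H3K4me3 peak and no H3K27me3 peak."""
    kept = []
    for region in atac:
        has_active_mark = any(overlaps(region, peak) for peak in h3k4me3)
        has_repressive_mark = any(overlaps(region, peak) for peak in h3k27me3)
        if has_active_mark and not has_repressive_mark:
            kept.append(region)
    return kept


# Hypothetical toy data, not taken from the datasets.
atac = [("chr1", 100, 500), ("chr1", 1000, 1500), ("chr2", 50, 300)]
h3k4me3 = [("chr1", 400, 600), ("chr1", 1200, 1300)]
h3k27me3 = [("chr1", 1400, 2000)]

selected = select_candidates(atac, h3k4me3, h3k27me3)
print(selected)
```

The nested scan is quadratic in the number of regions, which is fine for illustration; for genome-scale peak sets a sorted sweep or a dedicated interval tool would be the more practical choice.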
Examination of the draft genome of S. mansoni, version 7, in WormBase ParaSite identified 6,884 regions with enrichment of H3K4me3 in the absence of H3K27me3 in available developmental stages (H3K4me3 not H3K27me3). In mature, adult schistosomes, we consistently found 10,533 ATAC-positive regions. There were 4,027 ATAC regions that overlapped with H3K4me3 but not H3K27me3, and 2,915 genes overlapped with these regions (ATAC and H3K4me3 not H3K27me3). Forty-two unambiguous HIV integration sites were identified, and eight genes were ≤ 11 kb upstream or downstream of these integration sites. Repeats were masked with RepeatMasker V4.1.0 using a specific repeat library produced with RepeatModeler2 V2.0.1 and stored as a GFF file.
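The ≤ 11 kb proximity test between genes and HIV integration sites can be illustrated with a short sketch. The gene names, coordinates, and helper names below are hypothetical, and the half-open coordinate convention is an assumption rather than something stated for the datasets.

```python
from typing import Dict, List, Tuple

MAX_DISTANCE = 11_000  # the 11 kb window described above


def distance_to_site(start: int, end: int, site: int) -> int:
    """Distance in bp from a point site to a half-open interval [start, end)."""
    if site < start:
        return start - site
    if site >= end:
        return site - end + 1  # last base of the interval is end - 1
    return 0  # the site falls inside the interval


def genes_near_sites(
    genes: Dict[str, Tuple[str, int, int]],
    sites: List[Tuple[str, int]],
    max_distance: int = MAX_DISTANCE,
) -> List[str]:
    """Names of genes lying within max_distance of any site on the same chromosome."""
    near = []
    for name, (chrom, start, end) in genes.items():
        for site_chrom, position in sites:
            if site_chrom == chrom and distance_to_site(start, end, position) <= max_distance:
                near.append(name)
                break  # one nearby site is enough
    return near


# Hypothetical toy data.
genes = {
    "geneA": ("chr1", 20_000, 25_000),
    "geneB": ("chr1", 60_000, 62_000),
    "geneC": ("chr2", 5_000, 9_000),
}
sites = [("chr1", 30_000), ("chr2", 30_000)]

near = genes_near_sites(genes, sites)
print(near)
```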
To identify intergenic GSHs, we located 10,149 intergenic regions. There were 9,985 regions beyond 2 kb upstream and 8,837 regions outside long non-coding RNA (lncRNA), which were intersected to yield 95,587 unique intergenic regions of ≥100 bp outside 2 kb and lncRNA. Two hundred regions were identified that intersected with the merged ATAC and H3K4me3 signal. Four of these were situated ≤ 11 kb from HIV integration sites.
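One way to picture the intergenic search is to block out each gene together with 2 kb upstream of it and keep the remaining gaps of at least 100 bp. The sketch below does this for a single chromosome under stated assumptions: the function name, the strand-aware padding, and the toy coordinates are illustrative and not taken from the datasets.

```python
from typing import List, Tuple

Gene = Tuple[int, int, str]  # (start, end, strand), half-open, one chromosome


def intergenic_regions(
    genes: List[Gene],
    chrom_length: int,
    upstream_pad: int = 2_000,
    min_length: int = 100,
) -> List[Tuple[int, int]]:
    """Gaps of at least min_length lying outside genes and their upstream pad."""
    blocked = []
    for start, end, strand in genes:
        if strand == "+":
            # Upstream of a forward-strand gene is to its left.
            blocked.append((max(0, start - upstream_pad), end))
        else:
            # Upstream of a reverse-strand gene is to its right.
            blocked.append((start, min(chrom_length, end + upstream_pad)))
    blocked.sort()

    gaps = []
    cursor = 0  # first position not yet covered by a blocked interval
    for start, end in blocked:
        if start - cursor >= min_length:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if chrom_length - cursor >= min_length:
        gaps.append((cursor, chrom_length))
    return gaps


# Hypothetical toy chromosome of 20 kb with three genes.
genes = [(5_000, 7_000, "+"), (7_050, 9_000, "-"), (11_050, 12_000, "+")]
gaps = intergenic_regions(genes, chrom_length=20_000)
print(gaps)
```

Excluding lncRNA could be handled in the same pass by appending the lncRNA intervals to the blocked list before sorting.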
Made at George Washington University, Justus Liebig University Giessen, Khon Kaen University, Naresuan University, Aberystwyth University, Schistosomiasis Resource Center, IHPE.
Files
(107.6 MB)
MD5 checksum | Size |
---|---|
33153b78271b42048eb5a3cdecfb9b6e | 432.9 kB |
23e299abd7061ae20dc06298ea8bcfb6 | 96 Bytes |
0fa3977ec7f744eeb02de28f56381bd6 | 60.4 kB |
945392aeae88881233072d8bff06151f | 457.4 kB |
0cd88916ca614e9959f08abfe340adcb | 259.7 kB |
2cd69144c77877b559cd5e663ab0ec0e | 707.7 kB |
7615aa71214f51c0a183b6cf82685502 | 707.7 kB |
41c6edfe2e1e4ed3c6659d570cbb4570 | 76.7 kB |
22965ba508fb184c2bf1d630e1ea8bc8 | 5.3 kB |
5c6dd8ad24a848580c08092457ce1817 | 255.6 kB |
8477228cc2bc1bbe631577c4873d4024 | 226.3 kB |
08cb033ae9b48735089ff776d144ee2f | 1.1 kB |
6e8cd48bee30c4828f76d4c39fbd6768 | 87.1 MB |
55e03e9c110e8f6cfdeb901efc806e57 | 452.3 kB |
acc6fb0d5b9306894d0637744e769446 | 8.4 kB |
2cd94a46e3f62ba26df3d57385b26b07 | 4.1 MB |
3f0e3a703b5a0ef40910339ad797d209 | 12.8 MB |
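
The md5 values listed for each file can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the demonstration hashes a temporary file rather than one of the dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Compare a file against a listed checksum, with or without the 'md5:' prefix."""
    expected_hex = expected.split(":")[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Demonstration on a temporary file whose MD5 is well known.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

ok = verify_download(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
bad = verify_download(demo_path, "md5:33153b78271b42048eb5a3cdecfb9b6e")
os.remove(demo_path)
print(ok, bad)
```

For a real check, pass the path of the downloaded file and the checksum shown next to it in the listing.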