Published April 13, 2022
| Version v2
Dataset
Restricted
Data from: High-resolution methylome analysis in the clonal Populus nigra cv. 'Italica' reveals environmentally sensitive hotspots and drought-responsive TE superfamilies
Creators
- 1. Department of Terrestrial Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Droevendaalsesteeg 10, 6708 PB Wageningen, the Netherlands
- 2. Department of Biology, Philipps-University Marburg, Karl-von-Frisch Strasse 8 | D-35043 Marburg, Germany
- 3. IGA Technology Services Srl. Via Jacopo Linussio 51, 33100 Udine UD, Italy
- 4. Genetics, Faculty of Biology, Ludwig Maximilians University Munich, 82152 Martinsried, Germany
- 5. Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
- 6. Department of Agri-Food, Environmental and Animal Sciences, University of Udine. via delle Scienze 206, 33100 Udine, Italy
Description
This dataset contains the processed data presented in the article "High-resolution methylome analysis in the clonal Populus nigra cv. ‘Italica’ reveals environmentally sensitive hotspots and drought-responsive TE superfamilies":
- BedGraph files (CpG.bed, CHG.bed, CHH.bed): contain methylation levels (%) for each cytosine in the Lombardy poplar genome, in the respective sequence context. The first three columns give the genomic coordinates of the cytosine; the following 56 columns give the methylation level in each of the samples. Missing values (cytosines that were not captured by the sequencing method) are represented with NA.
- DMR_annotation_Populus_nigra_Italica_after_biotic_and_abiotic_treatments.txt: contains all the identified regions that showed significant stress-induced differential methylation (DMRs). The file includes the full annotation for each DMR: genomic location, genomic feature, gene, TE, sequence context and stress treatment, along with other relevant information.
- sample_IDs_basic_metadata.txt: contains the sample ID and the associated metadata (stress treatment and ortet location and ID) for all samples used in the analysis.
- italica_denovo_TE_280920.gff: contains the predicted TEs, obtained as follows. First, TEs were annotated de novo using the Extensive de-novo TE Annotator (EDTA, version 1.8.3; https://github.com/oushujun/EDTA) with default parameters, except for the option --sensitive 1, which uses RepeatModeler (version 2.0.1) to identify remaining TEs. All steps of the EDTA pipeline (filter, final and anno) were selected in order to perform whole-genome annotation and analysis after the TE library was constructed. Then, in the annotation produced by EDTA, overlapping fragments and fragments located at a close distance (<10 bp) were merged in a strand-wise manner. Each merged fragment was annotated with the family of the longer of the fragments that were merged. Structural variants derived from nanopore data were used to redefine the boundaries of overlapping TE fragments and make them more precise. LINE elements were identified independently with RepeatModeler in order to construct a more comprehensive de novo TE library.
- SaliS.fasta: contains the consensus sequences of Salicaceae SINE families (SaliS). The file was built from supplementary table 2 of the publication "Divergence of 3′ ends as a driver of short interspersed nuclear element (SINE) evolution in the Salicaceae" (https://doi.org/10.1111/tpj.14721).
- Pnigra_Italica_SaliS.bed: contains the SaliS copies annotated by blastn searches against the P. nigra Italica reference genome (-qcov_hsp_perc 90 -perc_identity 70 -word_size 7). Column headers: chr, start, end, length, strand, perc_identity, SaliS family.
- Pnigra_Italica_all_TEs_for_anno.bed: contains the merged information from italica_denovo_TE_280920.gff and Pnigra_Italica_SaliS.bed. Column headers: chr, start, end, length, strand, perc_identity (only for SaliS), TE superfamily.
- CXX_ortet_DMRs_merged.bed: contains DMRs merged from all pairwise DMR callings between two ortets. One file per context. Column headers: chr, start, end, number of comparisons in which the DMR occurs, avg number of cytosines (when called in multiple DMR callings), avg differential methylation vs. control (when called in multiple DMR callings), avg adjusted p value (when called in multiple DMR callings), avg DMR length (when called in multiple DMR callings).
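The fragment-merging step described for italica_denovo_TE_280920.gff can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the tuple layout, the half-open BED-style coordinates and the family labels in the example are assumptions made for illustration, and the refinement of boundaries with nanopore structural variants is not covered.

```python
from collections import defaultdict

MAX_GAP = 10  # fragments closer than 10 bp are merged (the "<10 bp" rule)

def merge_te_fragments(fragments, max_gap=MAX_GAP):
    """Merge overlapping or nearby TE fragments per (chromosome, strand).

    fragments: iterable of (chrom, start, end, strand, family) tuples,
    assumed here to use half-open, BED-style coordinates.
    Each merged fragment keeps the family of its longest input fragment.
    """
    groups = defaultdict(list)
    for chrom, start, end, strand, family in fragments:
        groups[(chrom, strand)].append((start, end, family))

    merged = []
    for (chrom, strand), items in sorted(groups.items()):
        items.sort()
        cur_start, cur_end, cur_family = items[0]
        longest = cur_end - cur_start
        for start, end, family in items[1:]:
            if start - cur_end < max_gap:  # overlap, or gap below max_gap
                cur_end = max(cur_end, end)
                if end - start > longest:
                    longest, cur_family = end - start, family
            else:
                merged.append((chrom, cur_start, cur_end, strand, cur_family))
                cur_start, cur_end, cur_family = start, end, family
                longest = end - start
        merged.append((chrom, cur_start, cur_end, strand, cur_family))
    return merged

# Toy input with made-up coordinates and illustrative family labels.
example = [
    ("chr1", 100, 200, "+", "Gypsy"),
    ("chr1", 205, 500, "+", "Copia"),    # 5 bp gap: merged, Copia is longer
    ("chr1", 520, 600, "+", "Copia"),    # 20 bp gap: kept separate
    ("chr1", 150, 180, "-", "Mutator"),  # other strand: never merged with "+"
]
result = merge_te_fragments(example)
for row in result:
    print(row)
```

Grouping by chromosome and strand before sorting keeps the merge strand-wise, so fragments on opposite strands are never joined even when they overlap.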
SCRIPTS
- cov_filtering.sh: to filter individual positions according to a custom coverage threshold.
- unionbedg_with_NAs.sh: to merge information from different samples in a single file taking into account the percentage of missing values per position across the given samples.
- anovas_and_contrasts_boxplots_barplots_cld.r: to perform statistical tests for the effect of treatments and ortets on the average global methylation. Each sequence context was analyzed separately.
- CHH_noise_filter.sh: to remove cytosines with invariable methylation values across 90% of the samples.
- GlobalMethAvg_calculation.r: to calculate global average methylation given a methylation file (CpG.bed, CHG.bed or CHH.bed) and a sample file.
- Hclustering_and_PCAs_analysis.r: to perform hierarchical clustering, principal component analysis and plot the respective figures.
- ICC_matrices_analysis.r: to calculate intraclass correlation coefficients among all pairwise combinations and plot colored grids.
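For readers who want to inspect the methylation tables without the shell and R scripts listed above, the following Python sketch shows one way to read a file laid out like CpG.bed, CHG.bed or CHH.bed, compute a per-sample global average, and flag invariable positions. It is an illustration only and does not reproduce the deposited scripts. Assumptions: the files are tab-separated, the presence of a header line is unknown (hence the `has_header` switch), and "invariable across 90% of the samples" is read as one value being shared by at least 90% of the samples with data. All function names are hypothetical, and the toy table uses three sample columns instead of 56.

```python
import csv
import io

def parse_methylation_table(handle, has_header=False):
    """Yield (chrom, start, end, values); values are floats, or None for NA."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)
    for row in reader:
        if not row:
            continue
        values = [None if v == "NA" else float(v) for v in row[3:]]
        yield row[0], int(row[1]), int(row[2]), values

def global_average(rows):
    """Per-sample mean methylation (%) over all non-missing cytosines."""
    sums, counts = None, None
    for _, _, _, values in rows:
        if sums is None:
            sums = [0.0] * len(values)
            counts = [0] * len(values)
        for i, v in enumerate(values):
            if v is not None:
                sums[i] += v
                counts[i] += 1
    if sums is None:
        return []
    return [s / c if c else None for s, c in zip(sums, counts)]

def is_invariable(values, fraction=0.9):
    """True if one value is shared by at least `fraction` of samples with data."""
    observed = [v for v in values if v is not None]
    if not observed:
        return True
    most_common = max(observed.count(v) for v in set(observed))
    return most_common / len(observed) >= fraction

# Toy table with made-up values.
toy = (
    "chr1\t10\t11\t80.0\t90.0\tNA\n"
    "chr1\t20\t21\t0.0\t0.0\t0.0\n"
    "chr1\t30\t31\t50.0\tNA\t70.0\n"
)
rows = list(parse_methylation_table(io.StringIO(toy)))
averages = global_average(rows)
kept = [r for r in rows if not is_invariable(r[3])]
print(averages)
print([r[:3] for r in kept])
```

Missing values are skipped rather than treated as zero, so each sample's average is taken only over the cytosines that were actually captured in that sample.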
Annotations are based on the de novo reference genome of the Populus nigra cv. Italica clone, available under ENA project PRJEB44889. Bisulfite sequencing data are available under ENA project PRJEB51831.