Published March 27, 2022 | Version v1
Dataset Open

Reconstruction of full-length LINE-1 progenitors from ancestral genomes (Supplementary Data)

  • 1. University of Toronto
  • 2. University of British Columbia
  • 3. McGill University

Description

Web Supplementary Files

  • Web Supplementary File 1 - FASTA files containing full-length reconstruction input sequences: full_length_reconstruction_input_sequence_fastas.zip
  • Web Supplementary File 2 - FASTA files containing Muscle alignments of the full-length reconstruction input sequences: full_length_reconstruction_input_sequence_alns.zip
  • Web Supplementary File 3 - FASTA file of full-length reconstructed sequences: full_length_reconstructions.fa
  • Web Supplementary File 4 - Table of full-length reconstruction statistics: full_length_reconstruction_stats.csv
  • Web Supplementary File 5 - FASTA files containing ORF reconstruction input sequences: orf_fastas.zip
  • Web Supplementary File 6 - FASTA files containing Macse alignments of the ORF reconstruction input sequences: ORF_reconstruction_input_sequence_alns.zip
  • Web Supplementary File 7 - FASTA file of reconstructed ORF sequences: ORF_reconstructions.fa
  • Web Supplementary File 8 - Table of ORF reconstruction statistics: ORF_reconstruction_stats.csv
  • Web Supplementary File 9 - Table of Composite Sequences: bestfl_selection_fixed_CS_seqs.csv
  • Web Supplementary File 10 - Database of gold standards: L1_goldstandards.csv
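
Most of the supplementary files above are FASTA files. As a rough illustration of how they might be read, the sketch below parses FASTA-formatted lines into (header, sequence) pairs. It assumes standard FASTA layout (a ">" header line followed by one or more sequence lines); the function name read_fasta is hypothetical and is not part of the dataset. The header conventions used inside these particular files are not described on this page.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Assumes standard FASTA: each record starts with a '>' line and the
    sequence may be wrapped over several lines. Gap characters in aligned
    FASTA files are kept as they appear.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, after downloading full_length_reconstructions.fa, opening it in text mode and passing the file handle to read_fasta would yield one pair per reconstructed sequence.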

Data Underlying Figures

  • RepeatMasker scans of hg38 and ancestral genomes: anc_gen_RM_out_files.zip
  • Figure 4
    • 4A
      • Source alignment of 54 composite sequences: 220121_dropped12+L1ME3A_muscle.nt.afa
      • Tree produced using the alignment and FastTree: 220121_dropped12+L1ME3A.tree
    • 4B
      • Source alignment of 67 Dfam L1 subfamily 3’ end models: 200123_dfam_3ends.fa.muscle.aln
      • Tree produced using the alignment: 200123_dfam_3ends.fa.muscle.aln.tree
  • Figure 5
    • KZFP-TE enrichment p-values (from Barazandeh et al. 2018): TE_KZFP_enrichment_pvals.xlsx
    • KZFP-TE top 500 peak overlap (from Barazandeh et al. 2018): top500_peak_overlap.xlsx
  • Figure 6
    • RepeatMasker .out file for the Composite Sequence custom library queried against hg38: CS_RM_hg38.fa.out.gz
  • Figure S2
    • RepeatMasker scan .out file of hg38 (CG-corrected Kimura divergence values are in the last column): hg38+KimDiv_RM.out
    • RepeatMasker scan .out file of the Progressive Cactus eutherian ancestral genome (CG-corrected Kimura divergence values are in the last column): Progressive_Cactus_Euth+KimDiv_RM.out
    • RepeatMasker scan .out file of the Ancestors 1.1 eutherian ancestral genome (CG-corrected Kimura divergence values are in the last column): Ancestors_Euth+KimDiv_RM.out
  • Figure S5
    • RepeatMasker scan .out files for Progressive Cactus simian and primate reconstructed ancestral genomes: progCactus_RM_outfiles.zip
    • S5A
      • FASTA files containing Cactus genome-derived reconstructed sequences equivalent to the L1MA2, L1MA4, and L1MD1-3 best full-length sequences: progCactus_reconstruction_bestFL_equivalents.zip
    • S5B
      • FASTA files containing Muscle alignments of Cactus genome-derived full-length reconstruction input sequences: progCactus_reconstruction_input_sequence_alns.zip
  • Figure S6
    • S6A
      • Results of Conserved Domain scans of Cactus genome-derived full-length reconstructed sequences: CD_search_results_short_nms.txt
    • S6B-D
      • Character posterior probabilities of “best” full-length reconstructed sequences: best_fl_post_probs.zip
  • Figure S7
    • S7B-C
      • Results of Conserved Domain scans of translated initial full-length reconstructed sequences: initial_recons_all_3frametrans_CD-search.txt
      • Results of Conserved Domain scans of translated reconstructed ORFs: recons_ORF1-2_all_3frametrans_CD-search.csv
  • Figure S15
    • S15A
      • Source alignment of 67 composite sequences: bestfl_selection_fixed_CS_seqs_muscle.nt.afa
      • Tree produced using the alignment: bestfl_selection_fixed_CS_seqs_muscle.nt.afa.tree
    • S15B-E
      • Source Muscle alignments for phylogenetic trees of reconstructed sequence components:
        • ORF2: ORF2_keep54_muscle.nt.afa
        • 5’ UTR: 5utr_keep54_muscle.nt.afa
        • ORF1: ORF1_keep54_muscle.nt.afa
        • 3’ UTR: 3utr_keep54_muscle.nt.afa
      • Trees produced using above alignments:
        • ORF2: ORF2_keep54_muscle.nt.afa.tree
        • 5’ UTR: 5utr_keep54_muscle.nt.afa.tree
        • ORF1: ORF1_keep54_muscle.nt.afa.tree
        • 3’ UTR: 3utr_keep54_muscle.nt.afa.tree
  • Figure S17
    • Unfiltered BLAST results of Composite Sequences queried against hg38: CS_hg38_blastn.csv.zip
    • BED file of L1 instances annotated using BLAST pipeline: BLAST_L1_hits.bed
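
The Figure S2 files listed above are described as RepeatMasker .out files with the Kimura divergence value in the last column. The sketch below shows one way such a file might be read. It assumes the usual whitespace-separated RepeatMasker .out layout (repeat name in the tenth field, class/family in the eleventh) with one extra numeric field appended at the end of each row; the function name iter_kimura is hypothetical, and the exact layout of these files should be checked before relying on it.

```python
from typing import Iterable, Iterator, Tuple


def iter_kimura(lines: Iterable[str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (repeat_name, repeat_class, kimura_divergence) per annotation row.

    Assumption: rows follow the standard RepeatMasker .out column order
    (15 fields, optionally followed by a '*' flag) plus one appended
    divergence value as the final field.
    """
    for raw in lines:
        fields = raw.split()
        # Header and blank lines do not start with an integer score.
        if len(fields) < 16 or not fields[0].isdigit():
            continue
        try:
            kimura = float(fields[-1])
        except ValueError:
            continue  # last field is not numeric; skip rather than guess
        yield fields[9], fields[10], kimura
```

Rows whose last field is not numeric are skipped rather than interpreted, so a mismatch between this assumed layout and the real files shows up as missing rows instead of wrong values.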

Files

Files (4.2 GB)

Checksum  Size
md5:b7eafc3b0bf562960f6cd487e1eadcb2  235.1 kB
md5:54566671a3688f25df92d9acecc34d68  1.9 kB
md5:fb439c8deaca93cab852018a672fa31d  6.9 kB
md5:f913dad06dbc30b87e797035f8e32808  959.2 kB
md5:9ca11321617b513902c6ee0c79a4385c  111.9 kB
md5:9e0d7ee7c0e860550fe6682ad2b6516f  5.9 kB
md5:8b4365244150de4682b311db5d258fbc  212.4 kB
md5:eb3de0fa1cb41a7438d45254e39e3db0  1.3 kB
md5:a6cb5eaee56eedf818221c385183353d  1.9 GB
md5:3906ceede6d3ef965fa4ae1cb3eafdce  286.6 MB
md5:ddc026915ade726f2437a95bf7697060  5.3 MB
md5:1ff3419b4247360aaa02fd6afaa02501  455.8 kB
md5:657c821fbf7315b2f9255020d58154a8  1.3 MB
md5:366ed5aaf3ba54ad756db18e6b29c8a2  1.9 kB
md5:bd736d9fd3b35aed189af653b7caec15  33.4 MB
md5:93e5eb2d00a52ca8b96efd30dc3a15b4  2.0 MB
md5:251342c950c637a4115979dd104f7c87  421.9 MB
md5:9b8df606d3a73cf44fe6d26678247c65  138.5 MB
md5:540d2bd346cf828ec6b209fec06933bc  60.6 MB
md5:9b30cc9937f0973c519ea8279c74c56a  41.9 MB
md5:540ad802342f851dcc90c5090da78898  511.0 kB
md5:44d60e1591affde0abafe6d8eb9b86b0  10.1 MB
md5:881c90838c31c8b8752ba19a130bf1fa  735.4 MB
md5:cb2b155e12130e3735efac4e815f7acd  8.4 MB
md5:e7d0e5f1b642bccf3739dafd217becd6  647.3 kB
md5:370866528740d33b210943c170bdb450  72.0 kB
md5:ef80f80db13bb5c284bede75fc5f4e51  1.5 kB
md5:c3b46ac531b7e12a585e4b11e598c3d9  426.6 kB
md5:86166de0c44df29be7aaba7c40ea7fed  1.5 kB
md5:dba75683771828e2fe1d30be1e3bd37d  25.5 MB
md5:3aeec7fab606b026cd0027907a06425a  37.7 MB
md5:785694e92c30f7023e748409c36d7a6b  597.2 kB
md5:3eb5ea80eae450d2f2337f6e84fdf281  6.6 MB
md5:a94b2d007079e5192b56e451e64c1ad3  11.1 kB
md5:717db0e76e09faaab91d446acff55751  546.2 kB
md5:30320abc3b157d115fc0d20e7c4a0a5c  262.4 MB
md5:01c8f6de60618c249939c74cf91fcad2  247.0 MB
md5:cac4abd5764d62875e01bde10d7bc7de  41.7 MB
md5:2afcf156f035a7e5c832f40c25293cb1  1.1 MB
md5:4e998a967267cd57312b38244afe8dfe  841.9 kB
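
Because several of the files are large, it may be worth confirming that a download is intact by comparing its MD5 digest with the md5 value shown for that file on the record page. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_chunks, md5_of_file, and matches_listing are hypothetical.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives fit in memory."""
    with open(path, "rb") as fh:
        return md5_of_chunks(iter(lambda: fh.read(chunk_size), b""))


def matches_listing(computed: str, listed: str) -> bool:
    """Compare a computed digest with a listed value such as 'md5:ab12...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return computed.lower() == expected
```

A call such as matches_listing(md5_of_file(path), listed_value) returns True when the downloaded file at path matches the listed checksum.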

Additional details

Related works

Is supplement to
Journal article: 10.1093/genetics/iyac074 (DOI)