dNTPs and adjuvant reagent solutions in 3' RACE improve the characterization of noncanonical RNA SARS-CoV-2 genomes
Creators
- 1. Laboratorio de Genómica Celular Aplicada (LGCA), Grupo de Investigación en Computo Avanzado y a Gran Escala - CAGE, Universidad Industrial de Santander, Bucaramanga, Santander, 680006, Colombia.
- 2. Grupo de Investigación en Demografía, Salud Pública y Sistemas de Salud - GUINDESS, Universidad Industrial de Santander, Bucaramanga, Santander, 680006, Colombia.
- 3. Grupo de Investigación en Computo Avanzado y a Gran Escala - CAGE, Centro de Supercomputación y Cálculo Científico de la Universidad Industrial de Santander - SC3UIS, Universidad Industrial de Santander, Bucaramanga, Santander, 680006, Colombia.
- 4. Laboratorio de Genómica Celular Aplicada (LGCA), Grupo de Investigación en Computo Avanzado y a Gran Escala - CAGE, Centro de Supercomputación y Cálculo Científico de la Universidad Industrial de Santander - SC3UIS, Universidad Industrial de Santander, Bucaramanga, Santander, 680006, Colombia.
Description
The data correspond to the article entitled "dNTPs and adjuvant reagent solutions in 3’ RACE improve the characterization of noncanonical RNA SARS-CoV-2 genomes".
R1. RACE 3’ Primer Blast Alignment. Contains BLAST alignments against the GenBank database using the consensus nucleotide sequence from the 3’ end of the SARS-CoV-2 genome and the polylinker. This file also shows the restriction enzyme patterns and a graphical representation of the cDNA synthesis methodology and of the samples used for genomic sequencing.
R2. Reads and assembled SARS-CoV-2 genomes.
The folder "1) Reads - Ion torrent" contains the reads obtained from sequencing via Ion Torrent technology and the reagents used in this study.
The folder named "2) FastQC" contains the FastQC quality reports for the Ion Torrent sequencing reads. In each file name, the number indicates the sample, and the letters "RNA" indicate sequencing according to the Ion Torrent protocol. The cDNA synthesis procedures used in this study are abbreviated as follows: dNTPs-R = dNTPs SARS-CoV-2 solution, DES-R = denaturation reagent, and COM PRO = commercial procedure.
The folders named "3) IRMA" and "4) Bowtie2" contain the assemblies of the genomes.
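The procedure abbreviations listed for the FastQC files can be looked up programmatically. The following is a minimal sketch, not part of the deposited data: the abbreviations and their meanings come from the description above, while the function name and the example file name are hypothetical, since the exact file naming pattern is not stated here.

```python
from typing import Optional

# Abbreviations and meanings as given in the dataset description.
PROCEDURES = {
    "dNTPs-R": "dNTPs SARS-CoV-2 solution",
    "DES-R": "denaturation reagent",
    "COM PRO": "commercial procedure",
}


def describe_procedure(file_name: str) -> Optional[str]:
    """Return the cDNA synthesis procedure whose abbreviation occurs in file_name.

    Assumption: the abbreviation appears literally in the file name, possibly
    with underscores instead of spaces and in any letter case.
    """
    normalized = file_name.replace("_", " ").lower()
    for abbreviation, meaning in PROCEDURES.items():
        if abbreviation.lower() in normalized:
            return meaning
    return None


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(describe_procedure("07_RNA_dNTPs-R_fastqc.html"))
```

If a file name carries none of the three abbreviations, the function returns `None` rather than guessing.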
R3. BLAST alignment of assembled SARS-CoV-2 genomes. Contains two folders named "BLAST - IRMA" and "BLAST - Bowtie2," which contain plain text documents with the results of the BLAST alignment for the genomes obtained with each of the assemblies.
R4. Pangolin v1.16 and Nextclade v2.9.1 lineages for SARS-CoV-2 genomes. Contains the folders "Pangolin and Nextclade (Bowtie2)" and "Pangolin and Nextclade (IRMA)." Each folder shows the data obtained with the Pangolin v1.16 and Nextclade v2.9.1 software for the classification of the genomes reported in this study, which were assembled with the IRMA and Bowtie2 software.
R5. Reference genome alignment and assembled genomes. Contains the folders "1) IRMA genomes," "2) Bowtie2 genomes," and "3) Genomes 07dN120320 and 27St122620." The files show the sequences and alignments of the examined genomes (the file name indicates the analyzed genome) relative to the SARS-CoV-2 reference genome both in FASTA and Clustal W formats.
R6. Programmed −1 Ribosomal Frameshifting Structure. The folder "1) Gibbs free energy 2D" contains a plain text document with the secondary structures of the open reading frame stimulation element in dot-bracket format. The folder "2) modeling Data Modeling 3D" contains the data needed to generate 3D models of the structures in folder 1.
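Dot-bracket notation encodes each base pair as a matching pair of brackets and each unpaired base as a dot. As a minimal sketch of how such a string can be read, the function below converts it into a list of paired positions. The function name is hypothetical, positions are 0-based, and the support for additional bracket types (often used for pseudoknots) is an assumption about what the deposited files may contain.

```python
from typing import Dict, List, Tuple

# Closing symbol -> opening symbol. Round brackets are standard; the other
# types are commonly used for pseudoknots (assumed here, not confirmed).
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def dot_bracket_to_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return the sorted list of (i, j) base pairs, using 0-based positions."""
    stacks: Dict[str, List[int]] = {opener: [] for opener in BRACKETS.values()}
    pairs: List[Tuple[int, int]] = []
    for index, symbol in enumerate(structure):
        if symbol in stacks:
            # Opening bracket: remember its position until it is closed.
            stacks[symbol].append(index)
        elif symbol in BRACKETS:
            opener = BRACKETS[symbol]
            if not stacks[opener]:
                raise ValueError(f"unmatched {symbol!r} at position {index}")
            pairs.append((stacks[opener].pop(), index))
        # Any other symbol (typically ".") marks an unpaired position.
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    print(dot_bracket_to_pairs("((..))"))  # [(0, 5), (1, 4)]
```

Keeping one stack per bracket type lets crossing pairs such as `([.)]` be read without confusing the two helices.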
R7. SARS-CoV-2 Database.
1) GISAID_sequences.zip is a ZIP archive containing a folder named GISAID, which holds plain text documents with the genomes of each variant. The variant is indicated in the file name of each document.
2) The depuration of sequences_GISAID contains two subfolders. The first subfolder, named "1) SARS-CoV-2 complete genome" contains plain text documents with the genomes downloaded from GISAID without undetermined nucleotides. The file name of each document corresponds to the analyzed variant. The subfolder "2) SARS-CoV-2 eliminate genome" contains the sequences eliminated from subfolder 1 because they differed from the majority of the analyzed sequences.
3) SARS-CoV-2 consensus variants. Contains plain text documents with consensus sequences for each variant, with frequency thresholds of 20 and 100 indicated in the file name of each document.
4) SARS-CoV-2 alignment consensus variants. Contains two subfolders, with the number indicating the alignment frequency threshold. The "Alignment 20_" subfolder contains four documents: those named "with Ns" are in FASTA and Clustal formats and include undetermined nucleotides, whereas those named "without" do not include undetermined nucleotides. The "100_" folder follows the same file pattern.
5) SARS-CoV-2 codons alignment consensus variants and nc-sgRNA. Contains a document with the alignment of the genomes characterized in this study with the reference genome of SARS-CoV-2. A subfolder named "SARS-CoV-2 codons nc-sgRNA" shows each of the nc-sgRNAs obtained in this study aligned with the reference genome; each file name corresponds to an nc-sgRNA. The subfolder "SARS-CoV-2 Geneious Prime" contains four documents. Each document includes a graphical representation of the alignment, relative to the reference genome, of the nc-sgRNAs obtained with each SARS-CoV-2 cDNA synthesis treatment. Three of these documents, labeled with the numbers 25, 50, and 100, correspond to the percentage of identity with respect to the number of annotations relative to the reference genome, which is indicated in the title of each document.
6) Variant Alignment – Ns. Contains eight documents corresponding to the FASTA and Clustal formats with SARS-CoV-2 genomes obtained in this study from the reference genome and from genomes containing undetermined nucleotides of the Gamma, Lambda, Mu, and Omicron variants.
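Several folders in R7 distinguish genomes with and without undetermined nucleotides. The sketch below shows one way such a split could be reproduced on a FASTA file. It is an illustration only and not the authors' pipeline: it assumes that an undetermined nucleotide is written as `N`, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a {header: sequence} dictionary (upper case)."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[header] += line.upper()
    return records


def split_by_undetermined(
    records: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split records into (without Ns, with Ns). Assumes 'N' marks an undetermined base."""
    without_ns = {h: s for h, s in records.items() if "N" not in s}
    with_ns = {h: s for h, s in records.items() if "N" in s}
    return without_ns, with_ns


if __name__ == "__main__":
    # Toy records for illustration; real input would come from open(path).
    example = [">genome_a", "ACGT", ">genome_b", "ACNT", "GG"]
    complete, incomplete = split_by_undetermined(read_fasta(example))
    print(sorted(complete), sorted(incomplete))
```

For a file on disk, `read_fasta(open(path))` works because a text file handle iterates over its lines.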
R8. Phylogeny SARS-CoV-2. Contains two subfolders with the results of the phylogenetic analyses conducted via the maximum likelihood method of the genomes characterized in this study compared to the variants. The subfolder named "Phylogeny with Ns" indicates the analysis of genomes containing undetermined nucleotides, whereas "Phylogeny without Ns" corresponds to the analysis of complete genomes.
Files
(9.6 GB)
| Name | Size |
|---|---|
| md5:b08a8d9e732498186b2549924e4454be | 9.6 GB |
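The MD5 checksum listed for the file can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The checksum value is the one listed above; the function name and the local file path are hypothetical. The file is read in chunks so that a 9.6 GB archive does not need to fit in memory.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "b08a8d9e732498186b2549924e4454be"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the computed digest with the expected one, ignoring letter case."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_download_intact("downloaded_archive.zip")` would return `True` only if the local copy matches the published checksum; the path shown is a placeholder for wherever the file was saved.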