Published February 13, 2021 | Version v4
Dataset Open

Characterization of an infectious clone of a novel Coronavirus closely related to Bat coronavirus HKU4 from PRJNA602160

Description

This is the full supplementary information, sequences and 3D models associated with the publication "Characterization of an infectious clone of a novel Coronavirus closely related to Bat coronavirus HKU4 from PRJNA602160"

All contig sequences with homology to KJ473822.1 found within PRJNA602160 have been deposited as 10915167.fa, 10915172.fa, 10915173.fa and 10915174.fa

Reads supporting the vector-virus junctions within SRR10915173 have been included in 10915173.fa

The raw MEGAHIT contig sequences from PRJNA602160 have been deposited as SRR10915167_final.contigs.fa.gz, SRR10915172_final.contigs.fa.gz, SRR10915173_final.contigs.fa.gz and SRR10915174_final.contigs.fa.gz

Sequences that were found to be near-identical to MERS-CoV have been deposited as MERS_CoV from SRR10915173.fa and MERS_CoV from SRR10915174.fa

The Addgene analysis results supporting Fig. 1 have been deposited as 10915173_addgene_analysis_result.gb

The annotated genome of the HKU4-related Coronavirus clone found in SRR10915173 has been deposited as 10915173_annotation.gb

Additional analyses of SRR10915173 were performed using coronaSPAdes, and the results have been deposited as SRR10915173_coronaspades_default.tar.gz

The 3-dimensional model of the RBD of the HKU4-related Coronavirus clone has been deposited as HKU4_RBD.pdb

Sequences with homology to the bat Tylonycteris pachypus have been deposited as Bat_candidate.fa, together with their similarity (identity/length of match) to the bat sequence, the most similar sequence in the NCBI nt database, and their similarity (identity/length of match) to that sequence

Methods

Sequencing data and assembly

Using the NCBI STAT phylogenetic analysis tool from the SRA run browser, we identified four sequencing datasets that were positive for Coronaviruses (SRR10915167, SRR10915172, SRR10915173 and SRR10915174) from the Huazhong Agricultural University Oryza sativa BioProject PRJNA602160.

These four sequencing datasets were then downloaded and assembled using MEGAHIT[4]. The resulting contig sequences were searched against BtTp-BetaCoV/GX2012 (KJ473822.1), the sequence identified by the NCBI STAT phylogenetic analysis tool. This revealed, in the dataset SRR10915173, a complete sequence 32725 nt in length, which was 98.38% similar to the closest related sequence on NCBI, KJ473822.1.
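The contig search described above can be post-processed with a few lines of Python. The following is a minimal sketch, assuming the search was run with BLAST and saved in the standard 12-column tabular format (`-outfmt 6`); the record does not state which output format was used. The names `Hit` and `best_hits`, the contig names and all values in `example` are hypothetical and only illustrate the layout.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str     # query id (qseqid)
    subject: str    # subject id (sseqid)
    pident: float   # percent identity
    length: int     # alignment length


def best_hits(blast_tsv: str, subject: str) -> dict[str, Hit]:
    """Keep the longest alignment per contig against one subject accession."""
    best: dict[str, Hit] = {}
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))
        if hit.subject != subject:
            continue
        if hit.contig not in best or hit.length > best[hit.contig].length:
            best[hit.contig] = hit
    return best


# Made-up rows in outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
example = "\n".join([
    "contig_a\tKJ473822.1\t97.50\t1200\t30\t0\t1\t1200\t501\t1700\t0.0\t2000",
    "contig_a\tKJ473822.1\t95.00\t800\t40\t0\t1300\t2099\t1800\t2599\t0.0\t1200",
    "contig_b\tOTHER.1\t99.00\t500\t5\t0\t1\t500\t1\t500\t0.0\t900",
])

hits = best_hits(example, "KJ473822.1")
for contig, hit in hits.items():
    print(contig, hit.pident, hit.length)
```

Keeping only the longest alignment per contig is one possible design choice; ranking by bit score or by summed coverage would be equally reasonable.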

We also searched this dataset for sequences from the natural host of HKU4-related Coronaviruses, the bat Tylonycteris pachypus; however, no sequences could be identified as originating from this species.

Identification of the sequence as an infectious clone

As the contig sequence was found to be longer than the genome size of Merbecoviruses (30247 nt for HKU4), we performed a BLAST analysis of the sequences flanking the HKU4 genome on this contig. This revealed homology to many expression and cloning vector sequences that were directly fused to the 5’-end and 3’-end of the Coronavirus genome. A BLAST search of the 5’-end and 3’-end of the Coronavirus genome was then performed, which verified the presence of reads covering the vector-virus junctions at both ends of the genome.
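The length comparison and the junction check can be illustrated with a short Python sketch. It assumes that the position of the viral genome on the contig and the read alignment coordinates are already known (for example from a BLAST alignment); `flanks`, `spans_junction` and the `min_overhang` threshold are hypothetical names and values, not taken from the publication.

```python
def flanks(contig: str, start: int, end: int) -> tuple[str, str]:
    """Return the sequence before `start` and after `end` (1-based, inclusive)."""
    return contig[: start - 1], contig[end:]


def spans_junction(read_start: int, read_end: int, junction: int,
                   min_overhang: int = 10) -> bool:
    """True if a read covers at least `min_overhang` bases on both sides.

    `junction` is the 1-based position of the last base before the boundary;
    `read_start` and `read_end` are 1-based inclusive alignment coordinates.
    """
    left = junction - read_start + 1   # bases at or before the junction
    right = read_end - junction        # bases after the junction
    return left >= min_overhang and right >= min_overhang


# Lengths quoted in the text: 32725 nt contig, 30247 nt HKU4 genome.
# The difference is only a rough estimate of the non-viral sequence,
# assuming the cloned genome is about as long as HKU4.
extra_nt = 32725 - 30247
print(extra_nt)

# Toy contig: 4 flanking bases, 8 "genome" bases, 2 flanking bases.
toy = "AAAA" + "CCCCCCCC" + "TT"
five_prime, three_prime = flanks(toy, start=5, end=12)
print(five_prime, three_prime)
```

Requiring a minimum overhang on both sides is a common way to avoid counting reads that merely touch the boundary; the value of 10 bases here is arbitrary.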

Sequence analysis was then performed using the Addgene sequence analyzer[5], which revealed a CMV promoter before the 5’-end of the Coronavirus genome and a bGH polyA signal after the 3’-end of the Coronavirus genome, confirming the origin of the sequence as an infectious clone.

The complete genome of the HKU4-related Coronavirus was manually annotated to indicate all open reading frames (ORFs) and deposited as 10915173_annotation.gb.
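The annotation above was done manually. As an illustration of what an automated first pass could look like, here is a minimal forward-strand ORF scan in Python. It is a sketch under simplifying assumptions, not the method used for the deposited annotation: it only reports ATG-to-stop stretches in the three forward frames, and it does not model features such as the ribosomal frameshift that joins ORF1a and ORF1b in coronaviruses. The function name `find_orfs` and the `min_len` default are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 90) -> list[tuple[int, int]]:
    """Return (start, end) of forward-strand ORFs, 1-based and inclusive.

    An ORF runs from the first ATG after the previous stop codon through
    the next in-frame stop codon (stop included in the coordinates).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:   # length includes the stop codon
                    orfs.append((start + 1, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


# Toy sequence: ATG AAA TTT TAG embedded in short flanks.
toy_orfs = find_orfs("CCATGAAATTTTAGCC", min_len=9)
print(toy_orfs)
```

Candidates found this way would still need to be checked by hand against known Merbecovirus gene models.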

Files

Files (74.1 MB)

Checksum Size
md5:ad9cadc2df8b6c997b60a48f6c2df6cf 1.5 kB
md5:c06daa6bc12bb2122913cd9cde585263 11.8 kB
md5:fe5cec37e3457be2cc8772586ac39e38 733 Bytes
md5:7701fa71e75136caabe62253154fe690 46.0 kB
md5:53fb32574c04e55fe2690a8dcf8bb3a4 57.5 kB
md5:823360b670f7449ad8657be079e1451f 62.9 kB
md5:dbf150ee6c47cb5ead5ce1cbec3457ac 30.6 kB
md5:f8767ae8e38b4ab4c0b90a5539b05bbc 26.6 kB
md5:9329ccfa035840031d5349588fe258ad 134.8 kB
md5:86cec7c99fdfab1032f9a19bf0b26633 450 Bytes
md5:951d01034334697b60526801543a3164 1.4 kB
md5:8ff8f5bc1d6cea8a689dba4a6f70ae96 2.1 kB
md5:f9ada0ee28e84755390813c0aa9fed16 4.5 MB
md5:5a164f1bcfb38b4b13bb581d5128fc51 7.5 MB
md5:d39caf7df50db7ca9a34cb0207c2422e 8.5 MB
md5:14fd333524326cca88de4b641b3d2098 38.0 MB
md5:3ee9e022f90858c1052b4074ea24c7a3 7.1 MB
md5:f310c4265c4086255790a8a25b263bf0 8.1 MB
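The MD5 checksums listed above can be used to confirm that a download is intact. A minimal Python sketch follows, using only the standard library. The helper names `md5_of_file` and `matches_listed` are hypothetical, and because the listing does not pair checksums with file names, the path in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed(digest: str, listed: str) -> bool:
    """Compare a hex digest with a listed value such as 'md5:ad9c...'."""
    return digest.lower() == listed.removeprefix("md5:").lower()


# Usage (placeholder path; substitute the file you downloaded):
# ok = matches_listed(md5_of_file("downloaded_file"),
#                     "md5:ad9cadc2df8b6c997b60a48f6c2df6cf")
```

Reading in chunks keeps memory use flat for the larger archives in the record. `str.removeprefix` requires Python 3.9 or later.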