4486195
doi
10.5281/zenodo.4486195
oai:zenodo.org:4486195
Vector sequences in early WIV SRA sequencing data of SARS-CoV-2 inform on a potential large-scale security breach at the beginning of the COVID-19 pandemic
Daoyu Zhang
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
SARS-CoV-2
Infectious Clone
Metagenomic analysis
Security breach
Contamination
<p>DESCRIPTION</p>
<p>Sequences identified as Influenza A virus, Spodoptera frugiperda rhabdovirus and Nipah henipavirus have previously been identified within the early HiSeq 1000 and HiSeq 3000 sequencing data of SARS-CoV-2 (SRR11092059, SRR11092060, SRR11092061 and SRR11092062), and have been used to support the hypothesis that a "simultaneous outbreak of multiple zoonotic viruses" occurred in the Huanan Seafood market. https://doi.org/10.31219/osf.io/s4td6</p>
<p>However, a closer examination of these sequences revealed that they were not sequences of actual wild viruses, but were instead fragments left behind from PCR products and cloning vectors harboring both cDNA clones and infectious clones of such viruses, with evidence of viral sequences being joined directly to DNA sequences of vector and non-human origin within the same short reads.</p>
<p>Here are the vector sequences and PCR product-like sequences recovered from the earliest WIV SRA sequencing data of human SARS-CoV-2, from datasets SRR11092059, SRR11092060, SRR11092061 and SRR11092062.</p>
<p>Sequences associated with vectors and PCR products from 3 distinct viral species have been obtained: the 3'-end of a Nipah henipavirus fused to a Hepatitis D virus ribozyme, a T7 terminator and a tetracycline resistance gene; the 5'-end of the same Nipah henipavirus fused to sequences found in diverse vectors; a complete vector genome encoding the HA gene of Influenza A virus subtype H7N9 under a CMV promoter and a bGH polyA terminator; and 221 contiguous sequences (contigs) corresponding to the Spodoptera frugiperda rhabdovirus reference genome fused to sequences homologous to multiple plastid sequences and, notably, mitochondrial sequences of rodents.</p>
<p>Sequences corresponding to a rescued infectious clone of a BSL-4 organism (Nipah henipavirus) were found in sequencing data that supposedly represent patient samples obtained from a hospital ICU and sequenced in a pathogen diagnosis laboratory. The infectious-clone context, evident from the 3'-HDV ribozyme and T7 terminator fused directly to the 3'-terminus of the Nipah henipavirus reads, implies a virology research laboratory, which is separate from the diagnosis laboratory. The discovery of artifact-containing sequences from at least 3 pathogen species that are phylogenetically and methodologically distinct from each other, in samples supposedly submitted by a laboratory separate from the virological research laboratories that could have hosted such clones, implies extensive crosstalk and cross-contamination between the various laboratories within the Wuhan Institute of Virology. These include at least one BSL-4 laboratory, with evidence of a containment breach of a BSL-4 organism and its subsequent introduction into RNA-seq samples processed by a laboratory whose purpose is distinct from the basic virological research evidenced by the infectious clone of the Nipah henipavirus.</p>
<p>Such a discovery therefore likely implies that a major security breach occurred within the Wuhan Institute of Virology at the time when the first sequences of SARS-CoV-2 were sampled and sequenced, which has important implications for the origins of the SARS-CoV-2 virus itself.</p>
<p>METHODS</p>
<p>The metagenomic sequencing datasets SRR11092059, SRR11092060, SRR11092061 and SRR11092062 were first analyzed using the NCBI phylogenetic analysis tool, which identified viral sequences that are not related to SARS-CoV-2 itself. These include Influenza A virus (IAV, subtype H7N9), Spodoptera frugiperda rhabdovirus and Nipah henipavirus.</p>
<p>The datasets were then subjected to BLAST search using MEGABLAST against the reference sequences of these viruses, to verify the existence of the viral sequences and to determine the exact subtype of each virus and the closest GenBank sequences corresponding to the reads. These sequences are MH926031.1 for the Spodoptera frugiperda rhabdovirus, KY199425.1 for the Influenza A virus and AY988601.1 for the Nipah henipavirus.</p>
<p>A second round of BLAST analysis with these identified sequences was then performed, which unexpectedly revealed numerous reads in which cloning vector sequences and non-human mitochondrial and plastid sequences were fused directly to the sequences of the identified viral species. Reads were then downloaded and assembled using the CAP3 sequence assembly program and the EGassembler tool. Contig sequences were then queried against the NCBI nr/nt database, which consistently identified the original sample sequences as viral sequences inserted into cloning vectors.</p>
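<p>Reads in which a viral segment is joined to a non-viral segment can be screened for programmatically. The following is a minimal sketch, not the workflow used for this record: it assumes the BLAST hits have been exported in the standard 12-column tabular format (<code>-outfmt 6</code>), and it flags a read when a hit to one of the viral accessions and a hit to any other subject cover largely separate parts of the same read. The function names and the <code>max_overlap</code> and <code>min_span</code> thresholds are hypothetical choices made for illustration.</p>

```python
from collections import defaultdict

# Viral accessions named in the text; every other subject is treated as
# "non-viral" (vector, plastid, mitochondrial, ...). This split is an assumption.
VIRAL = {"MH926031.1", "KY199425.1", "AY988601.1"}


def parse_outfmt6(lines):
    """Yield (query, subject, qstart, qend) from BLAST tabular (outfmt 6) lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Columns 7 and 8 (1-based) are the query start and end coordinates.
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        yield fields[0], fields[1], qstart, qend


def overlap(a, b):
    """Number of query positions shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def chimeric_reads(lines, viral=VIRAL, max_overlap=10, min_span=20):
    """Return {read_id: [(viral_subject, other_subject), ...]} for candidate fusions.

    A read is flagged when it has one viral hit and one non-viral hit whose
    query intervals share at most `max_overlap` positions.
    """
    hits = defaultdict(lambda: {"viral": [], "other": []})
    for query, subject, qstart, qend in parse_outfmt6(lines):
        if qend - qstart + 1 < min_span:
            continue  # ignore very short alignments
        kind = "viral" if subject in viral else "other"
        hits[query][kind].append((subject, (qstart, qend)))

    flagged = {}
    for query, by_kind in hits.items():
        for v_subject, v_span in by_kind["viral"]:
            for o_subject, o_span in by_kind["other"]:
                if overlap(v_span, o_span) <= max_overlap:
                    flagged.setdefault(query, []).append((v_subject, o_subject))
    return flagged
```

<p>Calling <code>chimeric_reads(open(path))</code> on a tabular hit file returns only the reads with both kinds of hit, so candidate junction reads can then be inspected by hand before assembly.</p>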
<p>The complete sequence of the Influenza A virus haemagglutinin (HA) gene clone was obtained from SRR11092061 and SRR11092062 using multiple rounds of BLAST search and sequence assembly expansion on the existing vector-virus junction contigs, and a partial sequence corresponding to the 3'-end of Nipah henipavirus AY988601.1 fused to a 3'-HDV ribozyme, a T7 terminator and a Tet resistance gene was obtained from SRR11092059. In addition, 221 contig sequences corresponding to the rhabdovirus MH926031.1 fused to chloroplast sequence MN524635.1 and rodent mitochondrial sequence MT241668.1 were recovered from SRR11092061.</p>
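<p>The idea of expanding a junction contig over repeated rounds can be illustrated with a toy example. The sketch below is not CAP3 and not the procedure used for this record: it assumes error-free reads on the forward strand only, and it grows a seed contig to the right by greedily appending any read whose prefix exactly matches the current end of the contig. The function name <code>extend_right</code> and the <code>min_overlap</code> parameter are hypothetical.</p>

```python
def extend_right(contig, reads, min_overlap=8):
    """Greedily extend `contig` to the right using exact suffix/prefix overlaps.

    Each round looks for an unused read whose first k bases (k >= min_overlap)
    equal the last k bases of the contig, and appends the remainder of that read.
    The loop stops when no remaining read extends the contig.
    """
    unused = list(reads)
    extended = True
    while extended:
        extended = False
        for read in unused:
            # Try the longest overlap first; the read must add at least one base.
            longest = min(len(contig), len(read) - 1)
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    contig += read[k:]
                    unused.remove(read)
                    extended = True
                    break
            if extended:
                break  # restart the scan against the longer contig
    return contig
```

<p>A real assembler additionally handles sequencing errors, reverse-complement reads and repeats, where exact greedy extension of this kind can join unrelated sequences, so any contig grown this way would still need to be checked against the underlying reads.</p>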
<p>We then performed a BLAST search using the identified vector sequences against SRR11092059, SRR11092060, SRR11092061 and SRR11092062, which confirmed the existence of these two vector sequences in all 4 datasets.</p>
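<p>A per-dataset confirmation of this kind amounts to counting distinct matching reads for each vector in each run. The following is a minimal sketch under stated assumptions: the hits are supplied as <code>(run, read_id, vector_name)</code> tuples produced by some earlier search step, and the vector labels are whatever names the user assigns. The function names and the <code>min_reads</code> threshold are hypothetical.</p>

```python
from collections import defaultdict

# The four SRA runs named in the text.
RUNS = ["SRR11092059", "SRR11092060", "SRR11092061", "SRR11092062"]


def presence_table(hits, runs=RUNS):
    """Count distinct matching reads per vector and per run.

    `hits` is an iterable of (run, read_id, vector_name) tuples.
    Returns {vector_name: {run: number_of_distinct_reads}}.
    """
    reads = defaultdict(set)
    for run, read_id, vector in hits:
        reads[(vector, run)].add(read_id)  # a set removes duplicate hits per read
    vectors = sorted({vector for vector, _ in reads})
    return {v: {r: len(reads.get((v, r), ())) for r in runs} for v in vectors}


def present_in_all(table, min_reads=1):
    """Report, for each vector, whether every run has at least `min_reads` reads."""
    return {
        vector: all(count >= min_reads for count in counts.values())
        for vector, counts in table.items()
    }
```

<p>Raising <code>min_reads</code> above 1 makes the check stricter, which may be useful because a single matching read is weak evidence on its own.</p>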
Zenodo
2021-02-01
info:eu-repo/semantics/other
4486194
1616284920.107558
67033
md5:6dbaecaba7decc0c6288846d5de73931
https://zenodo.org/records/4486195/files/Contig sequences containing Spodoptera frugiperda rhabdovirus and Plastid sequences from SRR11092061.fasta
6442
md5:f9c892be79e3fb3db9e84ae074c38330
https://zenodo.org/records/4486195/files/Vector sequences recovered from human (WIV BALF) data of SARS-CoV-2.fasta
322406
md5:33dfb95de0c4e1d66f9b021797f771e7
https://zenodo.org/records/4486195/files/Reads corresponding to Spodoptera frugiperda rhabdovirus in SRR11092061.fasta
1665519
md5:4fb61366d78412740f06bad5adc337f2
https://zenodo.org/records/4486195/files/Sequences corresponding to Influenza HA gene with NeoKan,CMV and bgH pA in SRR11092059,SRR11092060,SRR11092061,SRR11092062.fasta
47447
md5:996f9465a2fa697dca629ebd8697558f
https://zenodo.org/records/4486195/files/Sequences corresponding to Nipah Virus with 3'-HDV ribozyme and Tet resistance in SRR11092059,SRR11092060,SRR11092061,SRR11092062.fasta
public
10.5281/zenodo.4486194
isVersionOf
doi