Uncovering disease determinants of Covid-19 through analysis of its molecular evolution

Covid-19 was first reported in Wuhan, China, but has now spread globally, with overwhelming impacts on human health and health systems. The disease is caused by SARS-CoV-2, which is related to SARS-CoV-1, the virus that causes SARS. There is evidence suggesting that the virus originated from rhinolophid bats and subsequently underwent recombination that allowed natural selection for a human host; the recombination is thought to have occurred either prior to or upon infection of the human host. These natural selection events are presumed to have hastened human-to-human transmission. However, analyses of sequences from Covid-19 isolates from China and across the globe point to a unique scenario that may suggest selection and evolution of the virus upon infection of the human host. In this paper we examine the role of the human ACE-2 receptor in determining predisposition to Covid-19 and analyze Covid-19 sequences deposited in the virus database from around the globe to provide evidence that the disease burden differs across regions and among individuals as determined by the genetics of the virus, and that the virus is rapidly evolving across regions and populations. Our phylogenetic data confirm that all strains circulating around the globe are related to the strains from China, the origin of the virus, and further depict varying degrees of similarity and disparity, which suggests that the virus is mutating. While the USA has already been shown to have sequences closely related to the Wuhan sequences, here we report that the sequences from Spain and Africa are distantly related to the Wuhan sequences. Owing to the unique presentation of the disease in regions across the globe, the summary presented here points to the need to tailor the management of the disease as dictated by viral and host genetic factors and the observed regional burden of the disease.


Introduction
Coronaviruses have been shown to exist in both humans and wild animals, and the possibility of animal-to-human transmission has been demonstrated (Andersen et al., 2020). Among the coronaviruses that can be transmitted from animals to humans are the severe acute respiratory syndrome coronavirus-1 (SARS-CoV-1), which causes SARS, and the Middle East respiratory syndrome coronavirus (MERS-CoV), which causes MERS (Johnson et al., 2018; Schoeman and Fielding, 2019). SARS-CoV-1 has been traced back to bats, while MERS-CoV has been associated with dromedary camels and bats (Johnson et al., 2018; Schoeman and Fielding, 2019). Although the two viruses are both transmitted from animals to humans and are both associated with respiratory ailments (Schoeman and Fielding, 2019), entry into the human host is driven by different receptors that are encoded by different genes: SARS-CoV-1 attaches to the human angiotensin converting enzyme-2 (ACE-2), while MERS-CoV attaches to dipeptidyl peptidase-4 (DPP4) (Johnson et al., 2018). The novel SARS-CoV-2, which causes Covid-19, has been described as a descendant of SARS-CoV-1 that also utilizes the human ACE-2 receptor to enter the host cell and similarly causes respiratory ailments (Hoffmann et al., 2020). Attachment to the host receptor is via a receptor binding domain located in the spike (S) protein, with subsequent entry into the host cell via endocytosis. However, unlike SARS, Covid-19 has a high transmission rate, which has been associated with the higher affinity with which SARS-CoV-2, compared with SARS-CoV-1, binds to the ACE-2 receptor (Yuen et al., 2020). Sequence analysis of SARS-CoV-2 suggests that it originated from rhinolophid bats and subsequently underwent recombination in its spike protein that conferred a host selection advantage (Fehr and Perlman, 2015; Johnson et al., 2018; Ou et al., 2020).
This is likely to be true, since the RaTG13 coronavirus isolated from rhinolophid bats has 96% sequence similarity to SARS-CoV-2 but differs in its binding affinity to the human ACE-2 receptor: it lacks key amino acid sequences in its receptor binding domain that are needed for effective binding to this human receptor (Hussain et al., 2020). The pangolin, which is thought to be an intermediate host between bats and humans (Yuen et al., 2020), carries a coronavirus whose receptor binding domain is much more closely related to that of SARS-CoV-2 but which lacks a polybasic cleavage site in the S protein that is vital for binding to the human ACE-2 receptor (Andersen et al., 2020). However, because sequences analyzed from Covid-19 isolates across the globe depict disparities from the first virus isolated in Wuhan, it is likely that evolution of the virus receptor binding domain is occurring in the human host in order to evade immune responses and foster higher human-to-human transmission (Yuen et al., 2020).

Case presentation
The presentation of Covid-19 has been shown to vary among patients, but the most common symptoms include coughing, fever, headache, and fatigue, among other non-specific symptoms (Bernheim et al., 2020). A few studies have reported diarrhea and vomiting among Covid-19 patients, suggesting the fecal-oral route as an alternative transmission pathway for the virus (Yuen et al., 2020). Unlike in both SARS and MERS, fever has been found to be less common among Covid-19 patients and has thus been described as non-specific for symptomatic diagnosis of Covid-19 (Yuen et al., 2020). The radiographic manifestation of the disease includes unilateral and bilateral ground glass opacities as well as pulmonary infiltration, as reported in chest CT scans. The virulence and prognosis of any disease depend on a wide range of factors that may influence immune responses; hence the clinical outcome of Covid-19 has been reported to range from asymptomatic or mild to severe, with a risk of death (Heneghan et al., 2020). Heneghan et al. (2020) further suggest that the outcome of the disease is also highly influenced by the viral load, especially the dose of the virus on exposure. Almost 80% of the reported cases are mild, while 20% are severe (Yuen et al., 2020). Severe cases are characterized by alveolar damage resulting from the virus, dyspnea, blood oxygen saturation (SpO2) ≤93%, PaO2/FiO2 ratio <300, lung infiltrates >50% within 48 h, and respiratory frequency >30/min (Cascella et al., 2020; Heneghan et al., 2020). Five percent of the severe cases develop into critical cases characterized by multiple organ failure, respiratory failure, septic shock, and severe pneumonia, and most of these cases result in death.
However, many of the severe cases are reported in middle-aged patients, elderly patients (Verity et al., 2020) as well as patients with preexisting medical conditions especially heart diseases, hypertension, Parkinson's disease, and diabetes (Li et al., 2020a).
The majority of patients with mild disease present with symptoms of respiratory infection such as mild fever, dry cough, headache, and muscle pains, and lack radiological indicators of severe disease (Cascella et al., 2020). Progression to severe disease depends on the host immunological response and can be averted by early diagnosis and supportive management (WHO, 2020). According to Chen et al. (2020a), disease progression can result from overly exuberant host immune responses, suggesting that the host immune response plays an important role in the pathogenesis and clinical outcome of Covid-19. Although any viral infection triggers immune responses, immunopathogenesis in Covid-19 has been associated with an aberrant or uncontrolled immune response that results in pulmonary tissue damage, reduced lung capacity, and cytokine storm (Huang et al., 2019; Chen et al., 2020a). A few studies on the host-virus interactome have been carried out to elucidate the virulence factors associated with SARS-CoV-2 pathogenesis, one of which demonstrates that SARS-CoV-2 has evolved multiple mechanisms to modulate the host's key cellular processes, leading to evasion of the immune system and disease pathogenesis (Li et al., 2020c; Yuen et al., 2020). Proteome profiling of mild and severe cases of Covid-19 indicates that the nsp9 and nsp10 proteins are the key virulence factors of the virus; they interact with NKRF to induce IL-8 and IL-6, which ultimately lead to the uncontrolled infiltration and activation of neutrophils as well as the inflammatory response observed in patients (Wan et al., 2020). Increased levels of IL-6 have been reported in SARS-CoV-2 patients in the ICU (Chen et al., 2020a; Wan et al., 2020). These observations thus elucidate important pathways that are modulated by the virus proteins and that determine the outcome of the disease.

ACE-2 expression, polymorphisms and the risk for Covid-19
The gene that codes for the human ACE-2 receptor has been mapped to the X chromosome, suggesting different expression levels in males and females (Benjafield et al., 2004), and the expression of the receptor has also been shown to vary with age. The ACE-2 receptor has a role in the control of hypertension (Hussain et al., 2020), where it promotes vasodilation via conversion of angiotensin II to angiotensin 1-7. Deficiencies in the expression of ACE-2 have been associated with an increased risk of cardiovascular diseases and hypertension, and it is now evident that certain polymorphisms of ACE-2 have a role in predisposing individuals to hypertension (Luo et al., 2019). In addition, He et al. (2018) suggest that certain ACE-2 polymorphisms inherited from mothers who are carriers increase the likelihood of developing cardiovascular conditions in old age. It is not clear whether any of the polymorphisms associated with an increased risk of cardiovascular disease have a biphasic effect on susceptibility to Covid-19, but there is evidence suggesting that individuals with cardiovascular conditions and hypertension have a higher risk and case fatality for Covid-19 (Fang et al., 2020). Although studies recently conducted in Asia indicate that Asian populations have similar expression of ACE-2 to populations of other races, Lu et al. (2012) provide evidence suggesting that ACE-2 polymorphisms predispose Asian populations to hypertension, while Guo et al. (2020) and Fang et al. (2020) point to the severity of Covid-19 in patients with cardiovascular conditions and hypertension in Asia. China was severely affected by the pandemic, and it is not clear whether the effects of the disease were in any way related to the nature of the ACE-2 receptor across populations in China or to the high risk of hypertension. There is thus a need to assess the postulated confounding factors using data from hypertensive Covid-19 patients of other races.
Because of the available evidence suggesting a relationship between the expression of ACE-2 and SARS-CoV-1 (Hussain et al., 2020), it has been expected that SARS-CoV-2, which similarly binds to ACE-2, would exhibit a similar relationship (Fang et al., 2020). This expectation has prompted fears around the use of conventional ACE inhibitors and angiotensin receptor antagonists for the treatment of cardiovascular conditions and hypertension in the wake of Covid-19 (Sommerstein et al., 2020). Since the use of these drugs increases the availability of ACE-2 (Fang et al., 2020), it would be expected to correlate positively with the risk and severity of Covid-19 (Li et al., 2020a; Fang et al., 2020). However, despite an observed clinical association between cardiovascular conditions and the risk of Covid-19, the clinical evidence for a risk from the use of ACE inhibitors and angiotensin receptor antagonists in the wake of Covid-19 is debatable, and these drugs are therefore still recommended for cardiovascular conditions (Sommerstein et al., 2020). In the most recent evaluation, it emerges that the high risk of Covid-19 in patients with cardiovascular conditions and in individuals expressing low levels of ACE-2 emanates from the concomitant accumulation of angiotensin II upon the association of the ACE-2 receptor with SARS-CoV-2. This accumulation not only causes vasoconstriction but also triggers an increase in macrophages and interleukins that drive inflammatory reactions, which subsequently result in organ destruction and collapse or organ failure (Kai and Kai, 2020). Similarly, Gracia-Ramos (2020) indicates that the risk of severe disease characterized by organ failure in MERS is also increased in elderly patients and those with cardiovascular conditions, even though MERS-CoV binds to the DPP4 receptor and not ACE-2, suggesting that the observed risk of organ failure is a result of inflammatory reactions that might ensue upon virus infection.
Since Covid-19 presents differently among individuals, with a large proportion of asymptomatic patients (Biscayart et al., 2020), it would be expected that certain genetic factors drive the observed disparities. Host genetic factors that drive the expression of ACE-2 have been shown to vary with sex (Benjafield et al., 2004) and age (Guo et al., 2020), but it is not clear whether these variations are sufficient to determine susceptibility to Covid-19, although elderly people, those with cardiovascular conditions, and men rather than women have been shown to exhibit a high Covid-19 risk and mortality (Caramelo et al., 2020). Besides host genetic factors that may be related to the expression of ACE-2 and the risk of Covid-19, lessons drawn from HIV in Africa indicate that mutations in the virus receptor (Marmor et al., 2006; Hussain et al., 2020) and immune responses are protective against HIV/AIDS (Aids Anal Africa, 1995; Kaul et al., 2001). Kaiser (2020) suggests that differences in susceptibility to Covid-19 could likewise be driven by disparities in the host receptor gene. Similarly, Hussain et al. (2020) indicate that ACE-2 variants such as rs73635825 (S19P) and rs143936283 (E329G) may be protective against Covid-19 owing to a low binding affinity to the S protein.
In addition, Delanghe et al. (2020) report that the varying presentation and case fatality in Europe and Asia are associated with an insertion/deletion mutation in the ACE-2 gene that codes for a D allele, which has been associated with reduced expression of ACE-2. A lower frequency of this D allele has been associated with Covid-19, suggesting that high expression of ACE-2 is likely to be positively correlated with Covid-19. There is thus an urgent need to examine the genetics of the ACE-2 receptor in detail to further explain the varying disease presentation, besides exploring other mechanisms in the host that may hinder or facilitate the replication of the virus upon entry.

Phylogenetic analysis
To further understand the genetic relatedness between Covid-19 strains from across the world and the earliest strains reported from China, the origin, an alignment of Covid-19 gene sequences (regional comparison) was done using the Virus Pathogen Database and Analysis Resource (ViPR) workbench (https://www.viprbrc.org/brc/workbench). We compared sequences from selected regions, including Africa, to the Wuhan sequences using the quick tree option in the ViPR workbench, and trees were visualized using the Archaeopteryx viewer (https://www.viprbrc.org/brc/tree.spg?). Figure 1 depicts the alignment of sequences from the USA and sequences from China. The phylogenetic analysis of the Covid-19 strains from the two regions suggests that they are genetically related, as they cluster together. Figure 2 depicts the alignment of sequences from Spain and sequences from China; the phylogenetic analysis suggests that they are genetically distant, as the strains from the two countries cluster separately. Figure 3 depicts the alignment of sequences drawn from Nigeria and Tunisia and the sequences from Wuhan, China. The analysis revealed that the African strains were genetically distant from the Wuhan strains. Figure 4 illustrates the relationship between Covid-19 sequences across the globe. Strains clustered together according to their respective regions, but there was some similarity across regions, since some sequences clustered within clades formed by other regions. For instance, sequences from Europe (highlighted in green) clustered alongside sequences from other regions such as North America and Asia.
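As a rough illustration of what "clustering together" means for aligned sequences, the sketch below computes a pairwise p-distance (the proportion of differing sites) and groups sequences with UPGMA, a simple distance-based method. It is not the ViPR quick tree procedure, whose internals are not described here; UPGMA is swapped in only because it is short to write out. The function names and the three 15-base toy sequences are hypothetical and are not real SARS-CoV-2 data.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; return a Newick topology string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Merge the two closest clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del dist[pair]
        # Distance to the merged cluster is the size-weighted average.
        for other in sizes:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (
                d_a * size_a + d_b * size_b
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Hypothetical toy alignment (NOT real viral sequences).
toy = {
    "Wuhan_ref": "ATGCTAGCTAGGCTA",
    "USA_toy": "ATGCTAGCTAGGCTT",
    "Spain_toy": "ATGTTAGGTAGACTA",
}

for left, right in combinations(toy, 2):
    print(left, right, round(p_distance(toy[left], toy[right]), 3))
print(upgma(toy))
```

In this toy case the two sequences with the fewest differences are joined first, which is the pattern the figures describe for the USA and Wuhan sequences; a real analysis would use full-length genomes, a proper multiple alignment, and a model-based tree method.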

Discussion
It remains unclear why different regions and populations experience differences in Covid-19 case presentation and fatality, but the phylogenetic data suggest that all the SARS-CoV-2 sequences are related to the sequences drawn from Wuhan, where the virus was first reported, with varying disparities and similarities. Human migratory patterns would then have greatly contributed to the transportation of the virus from China and its subsequent transmission across the globe. Whether the different case presentation is driven by unique host-related genetics coupled with the genetics of the virus is being intensely investigated, but it is more likely that host immune responses have a vital role in the evolution of the virus. Because of the unique involvement of the ACE-2 receptor in Covid-19, it would be ideal to further investigate the host-related factors that modulate the infectivity and replication of the virus and to establish in a clinical setting whether certain genes or mutations in ACE-2 are protective or predisposing to the disease. Understanding the relationship between the expression and polymorphisms of this receptor and susceptibility to Covid-19 would help identify populations at higher risk. The close relatedness between the sequences in the USA and the Wuhan sequences suggests that the disease was introduced to the USA by an individual who travelled from Wuhan and that the viral genetics are very closely related. The latter would also explain why the USA has had a severe case presentation and fatality that would match that reported in China had strict measures not been adopted. This suggests that countries outside China that have reported sequences more closely related to the Wuhan sequences are bound to have a severe disease burden if ideal containment measures are not adopted in time.
Spain has equally had a high number of confirmed cases and fatalities, but it is surprising that sequences from Spain show some degree of disparity from the Wuhan sequence. This observed disparity is a possible indicator of natural selection of the virus in infected human hosts as a mechanism of evading immune responses. Understanding the host immune mechanisms under which the virus is mutating would equally inform further action, especially in drug and vaccine development. Among all regions where Covid-19 has been reported, Africa has relatively few confirmed cases and few fatalities. It is not clear how this has come about, since prevention approaches such as social distancing are very difficult to foster given African social interaction patterns, and the close involvement with China for trade and expertise would have fostered disease importation. Critics argue that Africa has a greater proportion of young people than most Western countries, in addition to a lower prevalence of preexisting conditions such as hypertension, cardiovascular diseases, and diabetes, and thus may not suffer serious case presentation and fatalities. Old age and preexisting conditions have been associated with a high risk of death from Covid-19, but it would be ideal for future studies to explore the mechanisms that drive the unique disease presentation in this region. However, the few sequences reported here from Africa indicate disparity from the Wuhan virus, suggesting that the virus reported in Africa is equally a mutated version of the Wuhan virus and may thus be less potent.

Conclusion
In summary, because the presentation of the disease is evidently different from one region to another, it would be ideal for each region to consider adopting prevention and management practices as dictated by the regional burden of the disease, population dynamics, social interaction patterns, and host and viral genetic factors. This would also allow for triaging of patients according to disease severity, reduce the strain on available health care resources, and foster management approaches that are people-friendly. While containment measures across the globe have been borrowed from the Wuhan experience and past flu pandemics, our data suggest that not every region would require very strict measures, and that countries whose viral sequence data closely match the Wuhan Covid-19 sequence data should prepare in advance for ideal containment measures to avoid severe fatalities. Our data suggest that sequence analysis is an ideal surveillance tool for Covid-19 that reveals viral genetic relatedness; such data also serve as a tool for evaluating viral clusters that might define community transmission if used at a regional level.

Conflicts of Interest
The authors declare that there is no conflict of interest.