Genome Mutations of Novel Human Betacoronavirus 2c
It is useful to trace the mutations that arise in a pathogen genome during disease emergence.
Here is an analysis of the mutations that occurred in the four novel human betacoronavirus 2c (HuCoV2c) genome sequences (strains Jordan-N3, EMC, England1 and England2, from patients 1, 3, 4 and 10, respectively; see Background Information for more details). Briefly, a time-scaled phylogeny of the four genome sequences was estimated by Bayesian MCMC analysis (in BEAST), assuming a substitution rate of 0.00177 subs/site/year (estimated previously by Andrew; details here). Based on the phylogeny, the ancestral genome sequences at the tree nodes were estimated using the marginal reconstruction method (in PAML). The substitutions predicted to have occurred along each branch were obtained by comparing the ancestral or taxon sequences at the two ends of that branch. The result is shown in the following figure.
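The branch-by-branch comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not the script used for the figure: the function name and the toy sequences are hypothetical, and it assumes the two sequences are already aligned, of equal length, and that only unambiguous A/C/G/T differences should be counted.

```python
def branch_substitutions(parent_seq, child_seq):
    """List substitutions between the sequences at the two ends of a branch.

    Both sequences are assumed to be aligned and of equal length.
    Returns tuples of (1-based position, parent base, child base).
    """
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    substitutions = []
    for pos, (a, b) in enumerate(zip(parent_seq.upper(), child_seq.upper()), start=1):
        # Skip gaps and ambiguity codes; only count clear base changes.
        if a != b and a in unambiguous and b in unambiguous:
            substitutions.append((pos, a, b))
    return substitutions


# Toy example with made-up sequences (not real HuCoV2c data).
ancestor = "ATGCAAGT-A"
descendant = "ATGCATGTCA"
for pos, a, b in branch_substitutions(ancestor, descendant):
    print(f"{a}{pos}{b}")
```

In practice the parent sequence would be the reconstructed ancestral sequence of a node, and the child sequence either another reconstructed node or one of the four sampled genomes.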
Nucleotide substitutions (and their corresponding amino acid changes) are displayed in the blocks. The genomic positions of the substitutions are indicated by numbers (numbering starts from the 34th nt of the EMC sequence; amino acid positions start from the first residue of each protein), and the corresponding genes are highlighted with colours. Posterior clade supports are shown next to the nodes. A PDF of this figure is available here.
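To relate the numbers in the figure to the EMC sequence, a small coordinate helper may be useful. The sketch below assumes that position 1 in the figure corresponds to nucleotide 34 of the EMC sequence (one reading of the numbering described above). The function names and the `gene_start` parameter are hypothetical, and the codon arithmetic assumes an uninterrupted reading frame, so it would not hold across the ribosomal frameshift within ORF1ab.

```python
EMC_OFFSET = 33  # assumption: figure position 1 == EMC nucleotide 34


def figure_to_emc(figure_pos):
    """Convert a figure (alignment) position to an EMC sequence position."""
    if figure_pos < 1:
        raise ValueError("figure positions start at 1")
    return figure_pos + EMC_OFFSET


def emc_to_figure(emc_pos):
    """Convert an EMC sequence position to a figure (alignment) position."""
    if emc_pos <= EMC_OFFSET:
        raise ValueError("position lies before the start of the numbering")
    return emc_pos - EMC_OFFSET


def codon_of(pos, gene_start):
    """Return (codon number, position within codon), both 1-based.

    `pos` and `gene_start` must use the same coordinate system, and
    `gene_start` is the first nucleotide of the gene's start codon.
    """
    offset = pos - gene_start
    if offset < 0:
        raise ValueError("position lies before the gene start")
    return offset // 3 + 1, offset % 3 + 1


print(figure_to_emc(1))
print(codon_of(105, 100))
```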
Because a proper outgroup is not available (the HKU4 and HKU5 bat-CoVs are too distant for this analysis), the substitutions inferred for the Jordan-N3 branch (in the leftmost block) include those along the branch from the root to the most recent common ancestor (MRCA) of EMC, England1 and England2, and their direction could not be confirmed (indicated by <>). More sequences from other HuCoV2c samples would help to increase the resolution of this analysis.
While most polymorphic codons have a single substitution, occurring in either the Jordan-N3, EMC, England1 or England2 branch, codon#1000 of ORF1ab and codon#1020 of the Spike gene each have two substitutions (at different codon positions) in two branches. For example, residue#1020 of the Spike protein has changed from glutamine (Q) to histidine (H) in the England1 branch and to arginine (R) in the England2 branch. Spike is an important host-receptor binding protein for the virus, and hence the repeated non-synonymous substitutions in the same Spike codon may hint at adaptation to human transmission after the MRCA of England1 and England2; however, this conjecture needs further study to be confirmed. It is also noteworthy that recent studies suggest that HuCoV2c (EMC) does not require angiotensin-converting enzyme 2 for infection and has broad infectivity across cells from several different mammals, including human airway epithelial cells (Müller MA, et al. 2012; Kindler E, et al. 2013).
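How two single-nucleotide changes at different positions of one glutamine codon can give histidine in one lineage and arginine in another can be illustrated with the standard genetic code. The codon sequences are not given above, so the ancestral codon CAA used below is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def describe_codon_change(ancestral, derived):
    """Compare two codons.

    Returns (changed codon positions, ancestral amino acid,
    derived amino acid, is_synonymous).
    """
    ancestral, derived = ancestral.upper(), derived.upper()
    if len(ancestral) != 3 or len(derived) != 3:
        raise ValueError("codons must be three nucleotides long")
    positions = [i + 1 for i in range(3) if ancestral[i] != derived[i]]
    aa_from = CODON_TABLE[ancestral]
    aa_to = CODON_TABLE[derived]
    return positions, aa_from, aa_to, aa_from == aa_to


# Illustrative codons only; the real codon#1020 sequence is not stated here.
print(describe_codon_change("CAA", "CAT"))  # third position, Q to H
print(describe_codon_change("CAA", "CGA"))  # second position, Q to R
```

Both changes are non-synonymous and fall at different positions within the same codon, which is the pattern described for codon#1020 of Spike.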
The dN/dS ratio (an indicator of selection pressure; shown on top of the mutation boxes) of each branch was estimated from the whole-genome alignment (a concatenation of 10 genes) using a codon model (in PAML). The Jordan-N3 branch has a substantially higher dN/dS (0.58) than the other branches (0.11-0.17), and a general decrease in diversifying selection over time is also observed. This appears to be a typical evolutionary process of cross-species transmission and adaptation.
Tommy Lam, School of Public Health, The University of Hong Kong