Preliminary genetic analyses

Genetic analysis

We currently have 2 complete genome sequences, hCoV-EMC and England1_CoV:

- Polyprotein 1a (pp1a) coding sequence: 61 differences in 21,237 nucleotides, divergence 0.00287 subst/site
- S protein: 10 differences in 4,062 nucleotides, divergence 0.00246 subst/site
- Complete genome: 99 differences in 30,031 aligned nucleotides, divergence 0.00330 subst/site
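The divergences given for the two strains appear to be uncorrected p-distances: the number of differences divided by the number of aligned sites, with no correction for multiple substitutions at the same site. A quick Python check of that reading follows. The dictionary restates the counts from the text; the names are illustrative.

```python
# Counts restated from the hCoV-EMC vs England1_CoV comparison:
# region -> (number of differences, number of aligned nucleotides)
REGIONS = {
    "pp1a": (61, 21_237),
    "S": (10, 4_062),
    "genome": (99, 30_031),
}


def p_distance(differences: int, aligned_sites: int) -> float:
    """Uncorrected divergence: fraction of aligned sites that differ."""
    return differences / aligned_sites


for region, (diffs, sites) in REGIONS.items():
    print(f"{region}: {p_distance(diffs, sites):.5f} subst/site")
```

This prints 0.00287, 0.00246 and 0.00330 subst/site, matching the values quoted for the three regions.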

When did these two strains share a common ancestor?

This depends on the rate of evolution, which cannot be determined directly from only 2 sequences. Instead, we can use estimates for the SARS coronavirus, which in humans has been estimated to evolve at 4.0 × 10⁻⁴ (2.0 × 10⁻⁴ to 6.0 × 10⁻⁴) substitutions per site per year in the 1a polyprotein (Salemi et al., 2004, J Virol 78:1602).

To convert divergence into time, divide the divergence between the strains (2.87 × 10⁻³ subst/site for pp1a) by the rate (say 4.0 × 10⁻⁴ subst/site/year). This gives 7.18 years, the evolutionary time summed along both lineages since they split, so the common ancestor lies at roughly the midpoint. The 101-day interval between the sampling of the two strains (about 0.28 years) is allowed for by subtracting it before dividing by 2: (7.18 - 0.28) / 2 = 3.45 years prior to June 2012, i.e., around January 2009. Given the range of SARS rate estimates above, the common ancestor would have existed between 2.25 and 7.04 years prior to June 2012, i.e., between June 2005 and March 2010.
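The midpoint calculation can be written out as a small Python sketch. The reference date of 15 June 2012 for the earlier sample is an assumption (the text gives only "June 2012"), as is treating a year as 365.25 days. The function names are illustrative.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25


def tmrca_years(divergence: float, rate: float, gap_days: float) -> float:
    """Years from the common ancestor to the earlier-sampled strain.

    divergence / rate is the evolutionary time summed over both lineages.
    The later-sampled lineage is longer by the sampling gap, so
    2 * T + gap = divergence / rate, i.e. T = (divergence / rate - gap) / 2.
    """
    gap_years = gap_days / DAYS_PER_YEAR
    return (divergence / rate - gap_years) / 2


def years_before(reference: date, years: float) -> date:
    """Approximate calendar date `years` before `reference`.

    date - timedelta ignores the fractional part of a day.
    """
    return reference - timedelta(days=years * DAYS_PER_YEAR)


PP1A_DIVERGENCE = 2.87e-3      # subst/site between the two strains
SAMPLING_GAP_DAYS = 101        # days between sampling the two strains
REFERENCE = date(2012, 6, 15)  # assumed date standing in for "June 2012"

# SARS pp1a rate estimate: upper bound, central value, lower bound
for rate in (6.0e-4, 4.0e-4, 2.0e-4):
    t = tmrca_years(PP1A_DIVERGENCE, rate, SAMPLING_GAP_DAYS)
    when = years_before(REFERENCE, t)
    print(f"rate {rate:.1e}: TMRCA {t:.2f} years before, ~{when:%B %Y}")
```

With these inputs it reproduces 2.25, 3.45 and 7.04 years, corresponding to roughly March 2010, January 2009 and June 2005.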

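This calculation does not account for stochastic variation in the number of substitutions that accumulate over a given time at a given rate: even if the rate were known exactly, the observed number of differences is only one realization of a random process.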
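One way to gauge the size of this stochastic component is to treat the 61 pp1a differences as a Poisson count. You can then take an exact (Garwood) 95% interval for its expected value and push both ends through the same midpoint calculation, with the rate held at 4.0 × 10⁻⁴. This Python sketch rests on those assumptions and is not part of the original analysis. It still ignores uncertainty in the rate itself and multiple substitutions at the same site.

```python
import math

PP1A_SITES = 21_237
OBSERVED_DIFFS = 61
RATE = 4.0e-4                 # subst/site/year, held fixed (assumption)
GAP_YEARS = 101 / 365.25      # sampling interval between the two strains


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summing terms iteratively."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def bisect_decreasing(f, target: float, lo: float, hi: float, iters: int = 200) -> float:
    """Find x in [lo, hi] with f(x) == target, for f decreasing in x."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still above target: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_ci(k: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Garwood) two-sided interval for the mean of a Poisson count k."""
    bracket = 10.0 * k + 10.0  # generous upper bracket for the search
    # Upper limit: P(X <= k) = alpha / 2
    upper = bisect_decreasing(lambda m: poisson_cdf(k, m), alpha / 2, 0.0, bracket)
    if k == 0:
        return 0.0, upper
    # Lower limit: P(X >= k) = alpha / 2, i.e. P(X <= k - 1) = 1 - alpha / 2
    lower = bisect_decreasing(lambda m: poisson_cdf(k - 1, m), 1 - alpha / 2, 0.0, bracket)
    return lower, upper


def tmrca_from_count(diffs: float) -> float:
    """Midpoint TMRCA (years before the earlier sample) for a given count."""
    return (diffs / PP1A_SITES / RATE - GAP_YEARS) / 2


lo_count, hi_count = poisson_exact_ci(OBSERVED_DIFFS)
print(f"expected differences: {lo_count:.1f} to {hi_count:.1f}")
print(f"TMRCA: {tmrca_from_count(lo_count):.2f} to {tmrca_from_count(hi_count):.2f} years")
```

Under these assumptions the interval for the expected count is roughly 47 to 78 differences. That translates into a TMRCA of roughly 2.6 to 4.5 years before June 2012, around the point estimate of 3.45 years.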

The calculation also assumes that the virus is evolving at the same rate as other human coronaviruses such as SARS. Because most of the divergence probably occurred in another (reservoir) species, this may not hold. The following figure shows an estimate of the time of the MRCA as a function of evolutionary rate:

Analysis of the likely date of the most recent common ancestor of the 2 human strains, computed using BEAST. For each point the rate of molecular evolution was fixed and the time of the MRCA estimated. Both axes are on a log scale. The bars represent 95% HPD intervals. The analysis used complete genomes under the HKY+G model of molecular evolution, with a coalescent tree prior and a 1/x prior on population size.
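Much of the relationship in this figure follows from the simple midpoint calculation. The TMRCA is roughly divergence / (2 × rate), so on log-log axes it falls close to a line of slope -1. The line steepens only at high rates, where the 101-day sampling gap becomes a noticeable fraction of the total. The Python sketch below evaluates that formula for the whole-genome divergence (0.00330 subst/site) over an assumed range of rates. It gives point values only and does not reproduce the BEAST model, its priors or its HPD intervals.

```python
import math

GENOME_DIVERGENCE = 3.30e-3   # subst/site, complete genomes
GAP_YEARS = 101 / 365.25      # sampling interval between the two strains


def simple_tmrca(rate: float) -> float:
    """Midpoint TMRCA in years: a point estimate with no stochastic error.

    Becomes negative for rates above about 1.2e-2, where the sampling gap
    exceeds divergence / rate.
    """
    return (GENOME_DIVERGENCE / rate - GAP_YEARS) / 2


# Rates evenly spaced on a log scale from 1e-5 to 1e-2 (an assumed range)
rates = [10 ** (-5 + 0.5 * i) for i in range(7)]
for r_lo, r_hi in zip(rates, rates[1:]):
    # Local slope of log10(TMRCA) against log10(rate)
    slope = (math.log10(simple_tmrca(r_hi)) - math.log10(simple_tmrca(r_lo))) / (
        math.log10(r_hi) - math.log10(r_lo)
    )
    print(f"rate {r_lo:.1e}: TMRCA {simple_tmrca(r_lo):8.2f} y, local log-log slope {slope:.3f}")
```

At low rates the local slope is very close to -1. Near 10⁻² it becomes much steeper, because the fixed sampling gap is then comparable to the total divergence time.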

How fast do CoV evolve?

Vijgen et al. (2005, J Virol 79:1595-1604) estimate rates of evolution for a group 2 CoV in cattle. This is probably one of the best available estimates of the rate of evolution of a zoonotic CoV. Later papers estimate rates for a number of bat CoVs (Pfefferle et al., 2009, Emerg Infect Dis 15:1377-84; Huynh et al., 2012, J Virol), but these depend on the dating given by Vijgen et al. for calibration and are therefore not independent estimates. Vijgen et al. use several methods to estimate rates, and all give similar estimates, of the order of 4 × 10⁻⁴ substitutions per site per year.

What is the closest relative from a non-human host?

The closest non-human sequence to both human viruses is a short fragment from a CoV isolated from a pipistrelle bat collected in the Netherlands in 2008. The fragment consists of 332 nucleotides of pp1b, located at nucleotide position 15,033 in the human CoV genomes. There are 41 differences between the human viruses and the bat sequence, giving a divergence of 0.123 subst/site. At the same rate as above (4 × 10⁻⁴ subst/site/year), this corresponds to a common ancestor about 150 years ago. This fragment can therefore tell us little about the likely location or species of the reservoir host for the human cases.
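The bat-fragment estimate follows from the same midpoint arithmetic, sketched below in Python. The constant names are illustrative, and the rate is the 4 × 10⁻⁴ subst/site/year value used throughout.

```python
BAT_DIFFERENCES = 41   # differences between the human viruses and the bat fragment
FRAGMENT_SITES = 332   # nucleotides of pp1b in the fragment
RATE = 4.0e-4          # subst/site/year, as in the pp1a calculation

divergence = BAT_DIFFERENCES / FRAGMENT_SITES  # uncorrected, ~0.123 subst/site
total_years = divergence / RATE                # time summed over both lineages
tmrca = total_years / 2                        # midpoint: years back to the common ancestor

print(f"divergence {divergence:.3f} subst/site, "
      f"total {total_years:.0f} years, TMRCA ~{tmrca:.0f} years ago")
```

This gives about 308 years of combined branch length and a common ancestor roughly 154 years back. Allowing for the roughly 4-year gap between the bat (2008) and human (2012) samples would shift this by only about 2 years.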