Novel Human Coronavirus Molecular Epidemiology and Genetic Analysis
This is an update of an older analysis based on 3 sequences.
We now have 5 complete genome sequences:
| Name | Accession | Source | Date of collection |
|---|---|---|---|
| CoV_EMC | JX869059.2 | Patient 3 | 2012-06-13 |
| England_Qatar | KC667074.1 | Patient 4 | 2012-09-12 |
| England_2 | HPA Website | Patient 10 | 2013-02-10 |
| Jordan-N3 | KC776174 | Patient 1 | between 2012-04-09 and 2012-04-19 |
| Munich/Abu_Dhabi | Institute of Virology Website | Patient 17 | 2013-03-22 |
When did these strains share a common ancestor?
With 4 sequences sampled at different times, we can attempt to estimate the rate of evolution. To do this, we estimated a maximum likelihood tree under the GTR + gamma model of substitution using PhyML. This is the unrooted maximum likelihood topology with the estimated branch lengths:

A maximum likelihood tree estimated using PhyML and the GTR + G model. Branch lengths are in substitutions per site. The tree is arbitrarily rooted midway between the most distant sequences.
A rate of evolution for these sequences can be estimated using root-to-tip regression. This plots genetic distance from the root of the tree against the time of isolation of each virus:

The root-to-tip regression of genetic distance against time of isolation, using the maximum likelihood tree above. The root of the tree was positioned to minimize the residuals on this plot; as a result, the oldest sequence lies directly on the line.
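The caption describes placing the root where the regression residuals are smallest. As a rough illustration of that idea only (not necessarily the procedure used for this tree), the sketch below restricts the search to a single branch that splits the tips into two hypothetical clades. Moving the root a distance `x` along that branch adds `x` to every tip on one side and removes it from every tip on the other. The function names and the example data are hypothetical.

```python
# Hypothetical sketch: choose the root position along ONE branch so that the
# root-to-tip regression has the smallest residual sum of squares (RSS).
# Real tools search every branch of the tree; this only scans a single one.

def rss_of_fit(dates, distances):
    """Least-squares fit of distance on date; return the residual sum of squares."""
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    # Centred residuals are numerically more stable than using the raw intercept.
    return sum((d - mean_d - slope * (t - mean_t)) ** 2 for t, d in zip(dates, distances))

def best_root_on_branch(clade_a, clade_b, branch_length, steps=1000):
    """clade_a, clade_b: lists of (date, distance from that clade's end of the branch).

    A root placed x from clade A's end adds x to each clade A tip and
    (branch_length - x) to each clade B tip. Scans x on a regular grid.
    """
    dates = [t for t, _ in clade_a] + [t for t, _ in clade_b]
    best_x, best_rss = None, float("inf")
    for i in range(steps + 1):
        x = branch_length * i / steps
        dists = [d + x for _, d in clade_a] + [d + branch_length - x for _, d in clade_b]
        rss = rss_of_fit(dates, dists)
        if rss < best_rss:
            best_x, best_rss = x, rss
    return best_x, best_rss

# Made-up example: two earlier tips in clade A, two later tips in clade B.
clade_a = [(2012.0, 0.0006), (2012.5, 0.0011)]
clade_b = [(2013.0, 0.0014), (2013.5, 0.0019)]
x, rss = best_root_on_branch(clade_a, clade_b, branch_length=0.001)
print(f"root {x:.5f} subst/site from clade A, RSS = {rss:.2e}")
```

A grid scan keeps the sketch short and transparent; because the RSS is a smooth function of `x` on a single branch, a closed-form or one-dimensional minimizer would give the same answer more efficiently.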
The estimate of the rate of evolution is given by the slope of the line, and the time of the most recent common ancestor (tMRCA) by the x-intercept:
| Parameter | Estimate |
|---|---|
| Rate | 1.52 × 10⁻³ subst/site/year |
| tMRCA | 2011.5 |
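The regression itself is ordinary least squares of root-to-tip distance on sampling date: the slope is the rate, and the x-intercept (the date at which the fitted distance falls to zero) is the tMRCA. A minimal sketch follows; the function name is hypothetical and the example uses synthetic data constructed to lie exactly on a line, not the distances from the tree above.

```python
# Minimal root-to-tip regression sketch (ordinary least squares).
# dates: sampling dates as decimal years; distances: root-to-tip distances
# in substitutions/site, read off a rooted tree.

def root_to_tip(dates, distances):
    """Return (rate, tmrca): slope of distance vs. date, and its x-intercept."""
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx                  # subst/site/year
    tmrca = mean_t - mean_d / rate    # date at which the fitted distance is zero
    return rate, tmrca

# Synthetic example: tips generated from a strict clock of 1.5e-3 subst/site/year
# starting at 2011.5, so the fit should recover those two values.
dates = [2012.0, 2012.5, 2013.0, 2013.25]
distances = [0.0015 * (t - 2011.5) for t in dates]
rate, tmrca = root_to_tip(dates, distances)
print(f"rate = {rate:.2e} subst/site/year, tMRCA = {tmrca:.2f}")
```

With real data the points do not fall exactly on the line, and the scatter around it is one informal indication of how clock-like the sequences are.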
We can use this rate to give a timescale for the tree:

The same tree as above but with a timescale added given by the estimated rate of evolution.
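Adding a timescale under a strict clock amounts to dividing branch lengths by the rate: a branch of length *b* subst/site spans *b* / rate years, and a node at distance *d* from the root sits at tMRCA + *d* / rate. The sketch below uses the rate and tMRCA from the table above; the node names and distances in the example are hypothetical.

```python
# Strict-clock conversion from substitutions/site to calendar time.
RATE = 1.52e-3   # subst/site/year, from the root-to-tip regression above
TMRCA = 2011.5   # decimal year of the root, from the x-intercept

def branch_duration_years(length, rate=RATE):
    """Duration in years of a branch whose length is in subst/site."""
    return length / rate

def node_date(root_to_node_distance, rate=RATE, tmrca=TMRCA):
    """Calendar date (decimal year) of a node, assuming a constant rate from the root."""
    return tmrca + root_to_node_distance / rate

# Hypothetical root-to-node distances (subst/site), for illustration only.
example_nodes = {"internal_A": 0.00076, "tip_B": 0.00228}
for name, dist in example_nodes.items():
    print(name, round(node_date(dist), 2))
```

The same constant-rate assumption underlies the timescaled tree, so any variation in rate across the tree (for example between hosts) would distort these dates.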
This would place the common ancestor of all four viruses in the middle of 2011. This rate of evolution is about 5 times higher than that estimated from only the 2 UK sequences in a previous analysis (that slower rate placed the common ancestor in the first half of 2009). There are a number of possible reasons for this discrepancy:
1. No uncertainty is estimated in these analyses, and with 4 sequences it is likely to be large.
2. The rate of evolution might not be constant over the tree; e.g., the rate in humans could differ from that in an animal host.
3. Most importantly, this is a point estimate based on very few data points (effectively fewer than 3 independent points, because of phylogenetic non-independence).

The only way to resolve this would be to obtain additional sequences sampled from the other cases recorded so far.
This rate (and the timescale for the tree) is likely to shift noticeably as additional sequences are added to the analysis because, with so few sequences, the stochastic effect of any single one can be large.
What is the nearest non-human host relative?
The closest non-human sequence to the human cases is a short fragment from a CoV isolated from a pipistrelle bat in the Netherlands, collected in 2008. The fragment consists of 332 nucleotides of pp1b, located at nucleotide 15033 in the human CoV genomes. There are 41 differences between the human cases and the bat sequence, giving a divergence of 0.123 subst/site. Because this divergence has accumulated along both lineages since their common ancestor, at the same rate as above (1.52 × 10⁻³ subst/site/year) it corresponds to an MRCA about 40 years ago. With the previous, slower rate of 4.4 × 10⁻⁴ this would be over 150 years. This fragment can therefore tell us little about the possible location and species of the reservoir host for the human cases.
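The 40-year figure follows from two steps: the uncorrected divergence is the number of differences divided by the fragment length, and since both lineages have been evolving since their common ancestor, the time back to it is the divergence divided by twice the rate. A small sketch of that arithmetic, assuming an uncorrected p-distance (no correction for multiple substitutions) and a strict clock; the constant and function names are hypothetical.

```python
# Back-of-the-envelope time to the human/bat MRCA from a p-distance.
DIFFERENCES = 41        # nucleotide differences, human cases vs. bat fragment
FRAGMENT_LENGTH = 332   # nucleotides of pp1b compared
RATE = 1.52e-3          # subst/site/year, from the root-to-tip regression

def years_to_mrca(differences, length, rate):
    """Years back to the common ancestor of two sequences under a strict clock."""
    p = differences / length   # uncorrected divergence, subst/site
    # Substitutions accumulate independently along BOTH lineages, hence 2 * rate.
    return p / (2 * rate)

years = years_to_mrca(DIFFERENCES, FRAGMENT_LENGTH, RATE)
print(f"divergence = {DIFFERENCES / FRAGMENT_LENGTH:.3f} subst/site, "
      f"MRCA about {years:.0f} years before sampling")
```

At this level of divergence a multiple-hit correction would push the estimate somewhat further back, so the uncorrected figure is best read as a lower bound under these assumptions.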
Interpretation
Based on the above results and the restricted geographical range of the known cases, it seems unlikely that this virus has been circulating entirely in humans since these sequences shared a common ancestor. Although it is certain that the virus can spread from human to human (two familial clusters have been noted), a single introduction into humans followed by an epidemic would be unlikely to have remained restricted to the Arabian Peninsula (the UK case from January was a temporary visitor to Saudi Arabia).
A more likely interpretation of the data would be multiple zoonotic transmissions from an animal reservoir. If the reservoir has a high contact rate with humans (e.g., a domesticated or farmed animal), then multiple small chains of human-to-human transmission could be hypothesized, consistent with the contacts among the cases described so far.
Whilst the (relatively) close phylogenetic relationship of the human virus to bat coronaviruses may indicate bats as the ultimate source of this virus, it seems unlikely that bats are the immediate source of infection for the human cases, as contact between humans and bats is relatively infrequent. Speculatively, the virus could plausibly have crossed from bats to a domesticated or agricultural animal, which then spread widely within the Arabian Peninsula in the last few years. Further surveillance of both bats and other potential reservoirs is undoubtedly under way, and the epidemiology of this virus should become clearer.
Further sequencing of the currently reported human cases, where samples exist, would certainly help resolve the timescale of this virus. It may also help estimate the number of zoonotic events that the human cases represent.
Update 2013-02-25:
This paper reports anecdotally that one patient had indirect contact with ill goats:
> While our patient denied contact to bats, he remembered ill goats among the animals on his farm. Albarrak et al. reported that the first Saudi case was exposed to farm animals, but the first Qatari patient and the second Saudi patient were not [15]. Although our patient reported no direct contact with his animals, one animal caretaker working for him was ill with cough and might have been an intermediate link in the chain of infection.